Python Pandas: Identify Duplicated Rows with an Additional Column

- September 15, 2011

I have the following DataFrame:

    In [23]: df
    Out[23]:
       pplnum  roomnum  value
    0       1        0    265
    1       1       12    170
    2       2        0    297
    3       2       12     85
    4       2        0     41
    5       2       12    144

Generally, pplnum and roomnum are generated like this, and they always follow this format:

    for ppl in [1, 2, 2]:
        for room in [0, 12]:
            print(ppl, room)

    1 0
    1 12
    2 0
    2 12
    2 0
    2 12
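For anyone who wants to follow along, the sample DataFrame can be rebuilt from that same nested-loop pattern. This is a sketch under one assumption: the question does not say where the value column comes from, so the numbers are simply copied from the sample output rather than generated.

```python
import pandas as pd

# (pplnum, roomnum) pairs produced by the same nested loop as in the question.
pairs = [(ppl, room) for ppl in [1, 2, 2] for room in [0, 12]]

# Assumption: "value" is arbitrary data, copied verbatim from the sample output.
values = [265, 170, 297, 85, 41, 144]

df = pd.DataFrame(
    [(ppl, room, val) for (ppl, room), val in zip(pairs, values)],
    columns=["pplnum", "roomnum", "value"],
)
print(df)
```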

What I would like to do is mark the duplicate combinations of pplnum and roomnum, so that I know which row is the first occurrence of a combination, which is the second occurrence, and so on. The expected output DataFrame looks like this:

       pplnum  roomnum  value  c
    0       1        0    265  1
    1       1       12    170  1
    2       2        0    297  1
    3       2       12     85  1
    4       2        0     41  2
    5       2       12    144  2

You can do this with groupby() and the cumcount() function:

    In [102]: df['c'] = df.groupby(['pplnum','roomnum']).cumcount() + 1

    In [103]: df
    Out[103]:
       pplnum  roomnum  value  c
    0       1        0    265  1
    1       1       12    170  1
    2       2        0    297  1
    3       2       12     85  1
    4       2        0     41  2
    5       2       12    144  2
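The interactive session can also be written as a standalone script. It is the same one-line groupby()/cumcount() assignment, just with the sample data defined inline so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pplnum": [1, 1, 2, 2, 2, 2],
    "roomnum": [0, 12, 0, 12, 0, 12],
    "value": [265, 170, 297, 85, 41, 144],
})

# cumcount() numbers the rows inside each (pplnum, roomnum) group 0, 1, 2, ...
# in their original row order; adding 1 makes it a 1-based occurrence counter.
# The result keeps df's index, so the assignment lines up row by row.
df["c"] = df.groupby(["pplnum", "roomnum"]).cumcount() + 1
print(df)
```

Because the counter is aligned on the index, the rows do not need to be sorted by pplnum or roomnum first. Each row is numbered according to where it appears relative to the other rows with the same key.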

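Explanation: cumcount() numbers the rows within each (pplnum, roomnum) group in order of appearance, starting at 0, so adding 1 gives the occurrence number of each combination: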

    In [101]: df.groupby(['pplnum','roomnum']).cumcount() + 1
    Out[101]:
    0    1
    1    1
    2    1
    3    1
    4    2
    5    2
    dtype: int64
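To make the behaviour of cumcount() + 1 concrete, here is a sketch of a hand-rolled equivalent in plain Python. It walks the rows in order and counts how many times each (pplnum, roomnum) key has been seen so far. The snippet also checks the result against DataFrame.duplicated(). With its default keep="first", that method flags exactly the rows whose counter is greater than 1. The loop is only for illustration. The groupby() version is the idiomatic one.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "pplnum": [1, 1, 2, 2, 2, 2],
    "roomnum": [0, 12, 0, 12, 0, 12],
    "value": [265, 170, 297, 85, 41, 144],
})

# Manual equivalent of groupby(...).cumcount() + 1: a running count per key.
seen = defaultdict(int)
manual = []
for key in zip(df["pplnum"], df["roomnum"]):
    seen[key] += 1
    manual.append(seen[key])

via_cumcount = (df.groupby(["pplnum", "roomnum"]).cumcount() + 1).tolist()
print(manual)                  # [1, 1, 1, 1, 2, 2]
print(manual == via_cumcount)  # True

# duplicated() with the default keep="first" marks every occurrence after
# the first one, i.e. the rows whose counter is greater than 1.
is_dup = df.duplicated(subset=["pplnum", "roomnum"])
print(is_dup.tolist())         # [False, False, False, False, True, True]
```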

