Python Pandas: Identify Duplicated Rows with an Additional Column
I have the following DataFrame:
    In [23]: df
    Out[23]:
       pplnum  roomnum  value
    0       1        0    265
    1       1       12    170
    2       2        0    297
    3       2       12     85
    4       2        0     41
    5       2       12    144
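To make the example reproducible, here is a minimal sketch that rebuilds the same frame. It assumes the three columns are plain integers and that the default integer index is used; the values are copied from the table above.

```python
import pandas as pd

# Rebuild the sample frame; values are copied from the table in the question.
df = pd.DataFrame({
    "pplnum":  [1, 1, 2, 2, 2, 2],
    "roomnum": [0, 12, 0, 12, 0, 12],
    "value":   [265, 170, 297, 85, 41, 144],
})

print(df)
```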
Generally, pplnum and roomnum are generated like this, and they always follow this format:
    for ppl in [1, 2, 2]:
        for room in [0, 12]:
            print(ppl, room)

    1 0
    1 12
    2 0
    2 12
    2 0
    2 12
What I want to achieve is to mark the duplicate combinations of pplnum and roomnum, so that I can tell which row is the first occurrence of a combination, which is the second occurrence, and so on. The expected output DataFrame looks like this:
       pplnum  roomnum  value  c
    0       1        0    265  1
    1       1       12    170  1
    2       2        0    297  1
    3       2       12     85  1
    4       2        0     41  2
    5       2       12    144  2
You can do it using groupby() together with the cumcount() method:
    In [102]: df['c'] = df.groupby(['pplnum','roomnum']).cumcount() + 1

    In [103]: df
    Out[103]:
       pplnum  roomnum  value  c
    0       1        0    265  1
    1       1       12    170  1
    2       2        0    297  1
    3       2       12     85  1
    4       2        0     41  2
    5       2       12    144  2
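Explanation: cumcount() numbers the rows within each (pplnum, roomnum) group starting from 0, in order of appearance, so adding 1 gives the occurrence number: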
    In [101]: df.groupby(['pplnum','roomnum']).cumcount() + 1
    Out[101]:
    0    1
    1    1
    2    1
    3    1
    4    2
    5    2
    dtype: int64
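For comparison, the sketch below runs the same cumcount() approach as a standalone script and sets it next to DataFrame.duplicated(), which only says whether a combination has been seen before rather than how many times. The is_repeat column name is made up for illustration and is not part of the original question or answer.

```python
import pandas as pd

# Sample frame, rebuilt from the question's table.
df = pd.DataFrame({
    "pplnum":  [1, 1, 2, 2, 2, 2],
    "roomnum": [0, 12, 0, 12, 0, 12],
    "value":   [265, 170, 297, 85, 41, 144],
})

keys = ["pplnum", "roomnum"]

# cumcount() numbers rows inside each group as 0, 1, 2, ... in order of
# appearance; adding 1 turns that into "first occurrence", "second", etc.
df["c"] = df.groupby(keys).cumcount() + 1

# duplicated() (default keep="first") flags every row whose key combination
# already appeared earlier. "is_repeat" is a hypothetical column name.
df["is_repeat"] = df.duplicated(subset=keys)

print(df)
```

If only a yes/no flag is needed, duplicated() is enough; the counter from cumcount() is the better fit when the position of each repeat matters, as in the question.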