Compare each pair of dates in two columns in Python efficiently
I have a data frame with a column of start dates and a column of end dates. I want to check the integrity of the dates by ensuring that the start date is before the end date (i.e. start_date < end_date). I have over 14,000 observations to run through.
I have data in the form of:
            start         end
    0  2008-10-01  2008-10-31
    1  2006-07-01  2006-12-31
    2  2000-05-01  2002-12-31
    3  1971-08-01  1973-12-31
    4  1969-01-01  1969-12-31
I have added a column to write the result to, although I really only want to highlight whether there are incorrect rows so that I can delete them:
    dates['correct'] = " "
I have begun checking each date pair using the following, where the data frame is called dates:
    for index, row in dates.iterrows():
        if dates.start[index] < dates.end[index]:
            dates.correct[index] = "correct"
        elif dates.start[index] == dates.end[index]:
            dates.correct[index] = "same"
        elif dates.start[index] > dates.end[index]:
            dates.correct[index] = "incorrect"
This works, but it takes a really long time (about 15 minutes). I need code that runs more efficiently. Is there something I am doing wrong, or something I could improve?
Why not do it in a vectorized way:
    is_correct = dates['start'] < dates['end']
    is_incorrect = dates['start'] > dates['end']
    is_same = ~is_correct & ~is_incorrect
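To go from those boolean masks to the labelled column and the cleaned data frame the question asks for, something like the following should work. It is only a sketch: it assumes both columns are already datetime dtype (converted here with pd.to_datetime), the last two sample rows are made up to show the "same" and "incorrect" cases, and np.select and the name valid are my own choices rather than anything from the question.

```python
import numpy as np
import pandas as pd

# First three rows are from the question; the last two are hypothetical
# rows added to exercise the "same" and "incorrect" branches.
dates = pd.DataFrame({
    "start": pd.to_datetime(
        ["2008-10-01", "2006-07-01", "2000-05-01", "1999-01-01", "2005-06-30"]
    ),
    "end": pd.to_datetime(
        ["2008-10-31", "2006-12-31", "2002-12-31", "1999-01-01", "2005-06-01"]
    ),
})

# Whole-column comparisons: each line produces a boolean Series in one pass.
is_correct = dates["start"] < dates["end"]
is_incorrect = dates["start"] > dates["end"]

# Label every row at once; rows matching neither mask fall through to "same".
dates["correct"] = np.select(
    [is_correct, is_incorrect],
    ["correct", "incorrect"],
    default="same",
)

# Keep only the rows whose start date is not after the end date.
valid = dates[~is_incorrect]
```

If all you need is to delete the bad rows, the label column is optional and the last line on its own is enough. One caveat: comparisons against missing dates (NaT) evaluate to False, so such rows would end up labelled "same" here and may need a separate check.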