Compare each pair of dates in two columns in Python efficiently


I have a data frame with a column of start dates and a column of end dates. I want to check the integrity of the dates by ensuring that the start date is before the end date (i.e. start_date < end_date). I have over 14,000 observations to run through.

I have data in the form of:

            start         end
    0  2008-10-01  2008-10-31
    1  2006-07-01  2006-12-31
    2  2000-05-01  2002-12-31
    3  1971-08-01  1973-12-31
    4  1969-01-01  1969-12-31

I have added a column to write the result to, though I only want to highlight whether there are any incorrect pairs so that I can delete them:

    dates['correct'] = " "

I have begun checking each date pair using the following, where my dataframe is called dates:

    for index, row in dates.iterrows():
        if dates.start[index] < dates.end[index]:
            dates.correct[index] = "correct"
        elif dates.start[index] == dates.end[index]:
            dates.correct[index] = "same"
        elif dates.start[index] > dates.end[index]:
            dates.correct[index] = "incorrect"

This works, but it takes a really long time (more than 15 minutes). I need code that runs more efficiently. Is there something I am doing wrong, or something I could improve?

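Why not do it in a vectorized way? Comparing the two columns directly evaluates every row at once instead of looping over them in Python: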

    is_correct = dates['start'] < dates['end']
    is_incorrect = dates['start'] > dates['end']
    is_same = ~is_correct & ~is_incorrect
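To get from those boolean masks back to the labelled column the question asks for, one option is `numpy.select`. The following is a minimal sketch, assuming both columns are already datetime values (converted here with `pd.to_datetime`). The sample frame is hypothetical: its first three rows come from the question, and the last two are invented to exercise the "same" and "incorrect" cases. The name `valid_dates` is also illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first three rows are from the question,
# the last two are made up to cover the "same" and "incorrect" cases.
dates = pd.DataFrame({
    "start": pd.to_datetime(
        ["2008-10-01", "2006-07-01", "2000-05-01", "2001-01-01", "2005-06-01"]
    ),
    "end": pd.to_datetime(
        ["2008-10-31", "2006-12-31", "2002-12-31", "2001-01-01", "2004-06-01"]
    ),
})

# Column-wise comparisons produce one boolean per row.
is_correct = dates["start"] < dates["end"]
is_incorrect = dates["start"] > dates["end"]

# np.select takes, for each row, the label of the first condition that is True;
# rows matching neither condition get the default label, "same".
dates["correct"] = np.select(
    [is_correct, is_incorrect],
    ["correct", "incorrect"],
    default="same",
)

# Keep only the rows whose start date is strictly before the end date.
valid_dates = dates[is_correct]
```

If the only goal is to delete bad rows, the label column can be skipped and the frame filtered with `dates[is_correct]` directly. Note that comparisons involving a missing date (`NaT`) evaluate to False, so such rows would be labelled "same" here and may need separate handling.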
