compare - Comparing observations


Suppose my dataset includes the following variables:

    set obs 100
    generate var1 = rnormal()
    generate var2 = rnormal()

    input double(id var5 var6)
    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    4 2028 17.396
    5 1810 17.402
    6 2034 17.407
    end

    input double(id var5 var6)
    1 10000 0.4
    2 22000 0.55
    3 25000 0.5
    4 40000 1
    end

I need to delete the rows of ids that have an increased value of var5 and a reduced value of var6 compared to at least one other id. In the first example, number 4 (2028, 17.396) should be deleted, because id 5 has a lower var5 (1810) and a higher var6 (17.402). In the second example, number 3 (25000, 0.5) should be deleted. After elimination, the observations of the three variables should look like this:

    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    5 1810 17.402
    6 2034 17.407

    1 10000 0.4
    2 22000 0.55
    4 40000 1

Meanwhile, var1 and var2 should remain intact.

How can I do this?

This is odd because you appear to have a dataset of unrelated variables. You have an initial dataset of 100 observations with variables var1 and var2, and a secondary dataset of 6 observations with variables var5 and var6. The objective appears to be to remove observations, but only the values contained in variables var5 and var6. This looks like spreadsheet thinking, since Stata has a single dataset in memory at any given time.

The task of identifying the observations to drop requires that you compare each observation's values of var5 and var6 with every other observation's values of those variables. This can be done in Stata by forming all pairwise combinations using the cross command.
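To make the pairing step concrete, here is a minimal sketch of what cross produces, using the second example from the question. The tempfile name `pairs` is an arbitrary choice for illustration, and the block is meant to be run in one go so the tempfile macro stays defined.

```stata
* second example data from the question
clear
input double(id var5 var6)
1 10000 0.4
2 22000 0.55
3 25000 0.5
4 40000 1
end
tempfile pairs
save "`pairs'"

* add a suffix to the copy in memory so its names do not clash
* with the variables in the saved copy
rename * =_0

* pair every observation in memory with every observation on disk
cross using "`pairs'"

* 4 x 4 = 16 pairwise combinations, including each id paired with itself
count

* all comparisons available for id 3
list if id_0 == 3, noobs
```

Each row now holds one observation (the `_0` variables) next to one candidate for comparison, so the increase/decrease condition becomes a simple row-wise expression.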

Here's a solution that starts with the data organized as presented and then separates the two datasets in order to perform the task of dropping observations based on the var5 and var6 values. Since the datasets appear unrelated, an unmatched merge is used to recombine the data.

    clear
    set obs 100
    generate var1 = rnormal()
    generate var2 = rnormal()

    input double(id var5 var6)
    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    4 2028 17.396
    5 1810 17.402
    6 2034 17.407
    end
    tempfile main
    save "`main'"

    * extract the secondary dataset
    keep id var5 var6
    keep if !mi(id)
    tempfile data2
    save "`data2'"

    * form all pairwise combinations
    rename * =_0
    cross using "`data2'"

    * identify cases where there's an increase in var5 and a decrease in var6
    gen todrop = var5_0 > var5 & var6_0 < var6

    * drop an id if there's at least one such case (todrop sorts last within
    * id_0, so todrop[_N] is its maximum), then reduce to original obs and vars
    bysort id_0 (todrop): keep if !todrop[_N]
    keep if id == id_0
    keep id var5 var6
    list

    * merge back with the original data, using an unmatched merge since
    * the secondary data is unrelated
    sort id
    tempfile newdata2
    save "`newdata2'"
    use "`main'", clear
    drop id var5 var6
    merge 1:1 _n using "`newdata2'", nogen
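As a check, the same comparison steps can be run on the second example from the question. This is only a sketch of the core step: it leaves out var1 and var2 and the final merge, and reuses the variable names from the code above.

```stata
* second example data from the question
clear
input double(id var5 var6)
1 10000 0.4
2 22000 0.55
3 25000 0.5
4 40000 1
end
tempfile data2
save "`data2'"

* form all pairwise combinations
rename * =_0
cross using "`data2'"

* flag pairs where the _0 observation has a higher var5 and a lower var6
gen todrop = var5_0 > var5 & var6_0 < var6

* keep only ids that are never flagged, then reduce to one row per id
bysort id_0 (todrop): keep if !todrop[_N]
keep if id == id_0
keep id var5 var6
sort id
list, noobs
```

Only id 3 is flagged (against id 2, which has a lower var5 of 22000 and a higher var6 of 0.55), so ids 1, 2 and 4 remain, matching the result requested in the question.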
