compare - Comparing observations
Suppose my dataset includes the following variables.

First example:

    set obs 100
    generate var1 = rnormal()
    generate var2 = rnormal()
    input double(id var5 var6)
    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    4 2028 17.396
    5 1810 17.402
    6 2034 17.407
    end

Second example:

    input double(id var5 var6)
    1 10000 0.4
    2 22000 0.55
    3 25000 0.5
    4 40000 1
    end
I need to delete the rows of ids that have an increased value of var5 and a reduced value of var6 compared to at least one other id. In the first example, number 4 (2028, 17.396) should be deleted. In the second example, number 3 (25000, 0.5) should be deleted. After elimination, the observations of the three variables should look like this:
First example:

    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    5 1810 17.402
    6 2034 17.407

Second example:

    1 10000 0.4
    2 22000 0.55
    4 40000 1
while var1 and var2 should remain intact.

How can I do this?
This is odd because you appear to have a dataset of unrelated variables. You have an initial dataset of 100 observations with variables var1 and var2, and a secondary dataset of 6 observations with variables var5 and var6. Your objective appears to be to remove observations, but only the values contained in variables var5 and var6. This looks like spreadsheet thinking, since Stata has a single dataset in memory at any given time.
The task of identifying which observations to drop requires that you compare each observation's values of var5 and var6 with every other observation's values of those variables. This can be done in Stata by forming all pairwise combinations using the cross command.
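To make the pairwise idea concrete, here is a minimal sketch using the four observations from the second example. The tempfile name pairs and the flag variable worse are illustrative names, not part of the original post. Each in-memory observation (suffix _0) is paired with every observation in the saved file, and a pair is flagged when the _0 observation has a higher var5 and a lower var6.

```stata
clear
input double(id var5 var6)
1 10000 0.4
2 22000 0.55
3 25000 0.5
4 40000 1
end

* save a copy to pair against (tempfile name is illustrative)
tempfile pairs
save "`pairs'"

* suffix the in-memory copy so names do not clash with the saved copy
rename (id var5 var6) (id_0 var5_0 var6_0)

* pair every observation in memory with every observation in the file
cross using "`pairs'"

* 4 x 4 = 16 pairs, including each observation paired with itself
count

* flag pairs where the _0 observation has a higher var5 and a lower var6
generate byte worse = var5_0 > var5 & var6_0 < var6
list id_0 id var5_0 var5 var6_0 var6 if worse
```

Only the pair id_0 = 3, id = 2 is flagged, which matches the observation the question says should be removed. Note that cross produces N squared observations, so this approach suits small secondary datasets.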
Here's a solution that starts with the data organized as you presented it and separates it into two datasets in order to perform the task of dropping observations based on var5 and var6 values. Since the two datasets appear to be unrelated, an unmatched merge is used to recombine the data.
    clear
    set obs 100
    generate var1 = rnormal()
    generate var2 = rnormal()
    input double(id var5 var6)
    1 1052 17.348
    2 1288 17.378
    3 1536 17.387
    4 2028 17.396
    5 1810 17.402
    6 2034 17.407
    end
    tempfile main
    save "`main'"

    * extract the secondary dataset
    keep id var5 var6
    keep if !mi(id)
    tempfile data2
    save "`data2'"

    * form all pairwise combinations
    rename * =_0
    cross using "`data2'"

    * identify cases where there's an increase in var5 and a decrease in var6
    gen todrop = var5_0 > var5 & var6_0 < var6

    * drop an id if there's at least one such case; after sorting on todrop,
    * the last observation in each group (todrop[_N]) holds the group maximum
    bysort id_0 (todrop): keep if !todrop[_N]

    * reduce to the original observations and variables
    keep if id == id_0
    keep id var5 var6
    list

    * merge back with the original data; use an unmatched merge since
    * the secondary data is unrelated
    sort id
    tempfile newdata2
    save "`newdata2'"
    use "`main'", clear
    drop id var5 var6
    merge 1:1 _n using "`newdata2'", nogen
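The unmatched merge in the last step pairs observations by position rather than by a key variable. A small sketch of that behaviour follows, using made-up data (the tempfile name small and the values are illustrative only): a 5-observation master is merged with a 3-observation file, so the first three rows receive the file's values and the remaining rows are left missing.

```stata
clear
input double(id var5)
1 10
2 20
3 30
end

* illustrative 3-observation file to merge in
tempfile small
save "`small'"

* illustrative 5-observation master dataset
clear
set obs 5
generate var1 = _n * 100

* 1:1 _n matches on observation number, not on a key variable
merge 1:1 _n using "`small'"

* rows 1-3 are matched; rows 4-5 come from the master only, with id and var5 missing
list
```

This is why, in the solution, the surviving var5 and var6 rows end up in the first observations of the 100-observation dataset while var1 and var2 are untouched.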