Forming groups of spatio-temporally near trajectories in R or PostgreSQL
I'm doing some trajectory analysis using R and PostgreSQL. In order to form groups of trajectory segments whose successive positions are spatio-temporally near, I've created the following table. What I'm still missing is the column group_id, which is what my question is about.
    bike_id1  datetime             bike_id2  near   group_id
    1         2016-05-28 11:00:00  2         TRUE   1
    1         2016-05-28 11:00:05  2         TRUE   1
    1         2016-05-28 11:00:10  2         FALSE  NA
    [...]
    2         2016-05-28 11:00:05  3         TRUE   1
    2         2016-05-28 11:00:10  3         TRUE   1
This is the result of multiple comparisons between each trajectory and every other one (all combinations without repetitions), with an inner join on datetime (always sampled on a multiple of 5 seconds). It shows that, for certain positions, bikes 1 and 2 were sampled at the same time and were spatially near (within some arbitrary threshold).
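For readers who want to reproduce a table of this shape, here is a minimal sketch of the pairwise comparison described above. The positions table, its planar x/y columns, the sample rows and the 10-unit threshold are all assumptions made for illustration; the question does not say how positions are stored or which distance measure is used.

```sql
-- Hypothetical input: one row per bike and 5-second sample, planar coordinates.
CREATE TABLE positions
        ( bike_id INTEGER NOT NULL
        , stamp timestamp NOT NULL
        , x double precision NOT NULL
        , y double precision NOT NULL
        , PRIMARY KEY (bike_id, stamp)
        );

-- Hypothetical sample rows: bikes 1 and 2 are 5 units apart, bike 3 is far away.
INSERT INTO positions (bike_id, stamp, x, y) VALUES
 (1, '2016-05-28 11:00:00', 0, 0)
,(2, '2016-05-28 11:00:00', 3, 4)
,(3, '2016-05-28 11:00:00', 30, 40)
        ;

-- Every unordered pair of bikes exactly once (b.bike_id > a.bike_id),
-- matched on identical timestamps by the inner join.
SELECT a.bike_id AS bike_id1
        , a.stamp AS datetime
        , b.bike_id AS bike_id2
        , sqrt((a.x - b.x)^2 + (a.y - b.y)^2) <= 10.0 AS near -- 10.0: assumed threshold
FROM positions a
JOIN positions b
        ON b.stamp = a.stamp
        AND b.bike_id > a.bike_id
ORDER BY bike_id1, bike_id2, datetime
        ;
```

The condition b.bike_id > a.bike_id is what yields "all combinations without repetitions": each pair appears once, with the smaller id first.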
Now I'd like to give unique ids to the segments where two bikes are spatio-temporally near (group_id). This is where I'm stuck: I'd want group_id to respect groups with multiple trajectories. The method for assigning group_id should realize that if bikes 1 and 2 are in a group at 2016-05-28 11:00:05, then bike 3 belongs to the same group if it is near bike 2 at that same timestamp (2016-05-28 11:00:05).
Are there tools within R or PostgreSQL that would help me with this task? Running a loop through the table seems like the wrong way to go about this.
Edit: As @wildplasser pointed out, this seems to be a gaps-and-islands problem, which is traditionally solved using SQL. He has kindly produced some sample data, which I have extended and include in the question.
    CREATE TABLE nearness
            -- ( seq SERIAL NOT NULL UNIQUE -- surrogate for convenience
            ( bike1 INTEGER NOT NULL
            , bike2 INTEGER NOT NULL
            , stamp timestamp NOT NULL
            , near boolean
            , PRIMARY KEY(bike1,bike2,stamp)
            );
    INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
     (1,2, '2016-05-28 11:00:00', TRUE)
    ,(1,2, '2016-05-28 11:00:05', TRUE)
    ,(1,2, '2016-05-28 11:00:10', TRUE)
    ,(1,2, '2016-05-28 11:00:20', TRUE) -- <<-- gap here
    ,(1,2, '2016-05-28 11:00:25', TRUE)
    ,(1,2, '2016-05-28 11:00:30', FALSE)
    ,(4,5, '2016-05-28 11:00:00', FALSE)
    ,(4,5, '2016-05-28 11:00:05', FALSE)
    ,(4,5, '2016-05-28 11:00:10', TRUE)
    ,(4,5, '2016-05-28 11:00:15', TRUE)
    ,(4,5, '2016-05-28 11:00:20', TRUE)
    ,(2,3, '2016-05-28 11:00:05', TRUE) -- <<-- bike 1, 2, 3 in 1 grp @ 11:00:05
    ,(2,3, '2016-05-28 11:00:10', TRUE) -- <<-- no group here
    ,(6,7, '2016-05-28 11:00:00', FALSE)
    ,(6,7, '2016-05-28 11:00:05', FALSE)
            ;
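To make the gaps-and-islands idea concrete, here is a minimal sketch for a single pair of bikes. It is not the method from the answer: it ignores groups of more than two bikes and only numbers runs of consecutive near = TRUE samples. The CTE names flagged, is_start and segment_no are made up for this example, and the rows are the (1,2) rows from the sample data, repeated inline so the query runs on its own.

```sql
WITH nearness(bike1, bike2, stamp, near) AS (
        VALUES (1, 2, timestamp '2016-05-28 11:00:00', TRUE)
        , (1, 2, timestamp '2016-05-28 11:00:05', TRUE)
        , (1, 2, timestamp '2016-05-28 11:00:10', TRUE)
        , (1, 2, timestamp '2016-05-28 11:00:20', TRUE) -- gap before this row
        , (1, 2, timestamp '2016-05-28 11:00:25', TRUE)
        , (1, 2, timestamp '2016-05-28 11:00:30', FALSE)
        )
, flagged AS (
        SELECT bike1, bike2, stamp
                -- 1 when the previous near row of the same pair is not exactly
                -- 5 seconds earlier (or does not exist), i.e. a new island starts
                , CASE WHEN lag(stamp) OVER (PARTITION BY bike1, bike2 ORDER BY stamp)
                                = stamp - interval '5 seconds'
                        THEN 0 ELSE 1 END AS is_start
        FROM nearness
        WHERE near
        )
SELECT bike1, bike2, stamp
        -- running count of island starts = island number within the pair
        , sum(is_start) OVER (PARTITION BY bike1, bike2 ORDER BY stamp) AS segment_no
FROM flagged
ORDER BY bike1, bike2, stamp
        ;
```

For these rows, the samples from 11:00:00 to 11:00:10 get segment_no 1, and those at 11:00:20 and 11:00:25 get segment_no 2, because the missing 11:00:15 sample breaks the run.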
Update: [after understanding the real question ;-] Finding the equivalence groups of bikes (set, bike_set) is in fact a relational-division problem. Finding the begin and end of segments (clust) within a set of bikes is basically the same as in the first attempt.
- The clusters are stored in arrays (I trust the clusters not to become too large).
- The array is built by a recursive query: every pair of bikes that has one member in common with the current cluster is merged into it.
- At the end, the arrays contain all the bike_ids that happened to be within reach at that particular time
- (plus some intermediate rows, which need to be suppressed later by the uniq CTE).
- The rest is more-or-less standard gap-detection in time-series.
Note: the code trusts on (bike2 > bike1). This is needed to keep the arrays sorted and thus canonical. The actual content is not guaranteed to be canonical, because the order of addition in the recursive query cannot be guaranteed. This may need some extra work.
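One possible way to do that extra work is to normalize each array after it has been built. The helper below is only a sketch: the name canonical_set is hypothetical, it is not part of the query in this answer, and wiring it into the recursive term has not been tested against that query.

```sql
-- Hypothetical helper: returns the input array sorted ascending
-- with duplicate elements removed.
CREATE FUNCTION canonical_set(integer[]) RETURNS integer[]
LANGUAGE sql IMMUTABLE AS
$$
        SELECT ARRAY(SELECT DISTINCT x FROM unnest($1) AS x ORDER BY x);
$$;

SELECT canonical_set(ARRAY[3,1,2,3]);   -- {1,2,3}
```

With sets normalized this way, two arrays holding the same bikes in a different order would compare as equal, which is what the set comparisons and the grouping on the arrays rely on.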
    CREATE TABLE nearness
            ( bike1 INTEGER NOT NULL
            , bike2 INTEGER NOT NULL
            , stamp timestamp NOT NULL
            , near boolean
            , PRIMARY KEY(bike1,bike2,stamp)
            );
    INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
     (1,2, '2016-05-28 11:00:00', TRUE)
    ,(1,2, '2016-05-28 11:00:05', TRUE)
    ,(1,2, '2016-05-28 11:00:10', TRUE)
    ,(1,2, '2016-05-28 11:00:20', TRUE) -- <<-- gap here
    ,(1,2, '2016-05-28 11:00:25', TRUE)
    ,(1,2, '2016-05-28 11:00:30', FALSE) -- <<-- these false-records serve no purpose
    ,(4,5, '2016-05-28 11:00:00', FALSE) -- <<-- the result is the same without them
    ,(4,5, '2016-05-28 11:00:05', FALSE)
    ,(4,5, '2016-05-28 11:00:10', TRUE)
    ,(4,5, '2016-05-28 11:00:15', TRUE)
    ,(4,5, '2016-05-28 11:00:20', TRUE)
    ,(2,3, '2016-05-28 11:00:05', TRUE) -- <<-- bike 1, 2, 3 in 1 grp @ 11:00:05
    ,(2,3, '2016-05-28 11:00:10', TRUE) -- <<-- no group here
    ,(6,7, '2016-05-28 11:00:00', FALSE)
    ,(6,7, '2016-05-28 11:00:05', FALSE)
            ;

            -- Recursive union-find to glue together sets of bike_ids
            -- ,occurring at the same moment.
            -- Sets are represented as {ordered,unique} arrays here
    WITH RECURSIVE wood AS (
            WITH omg AS (
                    SELECT bike1 ,bike2,stamp
                    , row_number() OVER(ORDER BY bike1,bike2,stamp) AS seq
                    , ARRAY[bike1,bike2]::integer[] AS arr
                    FROM nearness n WHERE near = TRUE
                    )
            -- Find all existing combinations of bikes
            SELECT o1.stamp, o1.seq
                    , ARRAY[o1.bike1,o1.bike2]::integer[] AS arr
            FROM omg o1
            UNION
            SELECT o2.stamp, o2.seq -- avoid duplicates inside the array
                    , CASE WHEN o2.bike1 = ANY(w.arr) THEN w.arr || o2.bike2
                            ELSE w.arr || o2.bike1 END AS arr
            FROM omg o2
            JOIN wood w
                    ON o2.stamp = w.stamp AND o2.seq > w.seq
                    AND (o2.bike1 = ANY(w.arr) OR o2.bike2 = ANY(w.arr))
                    AND NOT (o2.bike1 = ANY(w.arr) AND o2.bike2 = ANY(w.arr))
            )
    , uniq AS (     -- suppress partial sets caused by the recursive union-find buildup
            SELECT * FROM wood w
            WHERE NOT EXISTS (SELECT * FROM wood nx
                    WHERE nx.stamp = w.stamp
                    AND nx.arr @> w.arr AND nx.arr <> w.arr -- contains but not equal
                    )
            )
    , xsets AS (    -- make unique sets of bikes
            SELECT DISTINCT arr
            -- , MIN(seq) AS grp
            FROM uniq
            GROUP BY arr
            )
    , sets AS (     -- enumerate the sets of bikes
            SELECT arr
            , row_number() OVER () AS setnum
            FROM xsets
            )
    , drag AS (     -- detect beginning and end of segments of consecutive observations
            SELECT u.*      -- within a constant set of bike_ids
            -- edge-detection: begin of group
            , NOT EXISTS (SELECT * FROM uniq nx
                    WHERE nx.arr = u.arr
                    AND nx.stamp < u.stamp
                    AND nx.stamp >= u.stamp - '5 sec'::interval
                    ) AS is_first
            -- edge-detection: end of group
            , NOT EXISTS (SELECT * FROM uniq nx
                    WHERE nx.arr = u.arr
                    AND nx.stamp > u.stamp
                    AND nx.stamp <= u.stamp + '5 sec'::interval
                    ) AS is_last
            , row_number() OVER(ORDER BY arr,stamp) AS nseq
            FROM uniq u
            )
    , top AS (      -- id and groupnum for the start of a group
            SELECT nseq
            , row_number() OVER () AS clust
            FROM drag
            WHERE is_first
            )
    , bot AS (      -- id and groupnum for the end of a group
            SELECT nseq
            , row_number() OVER () AS clust
            FROM drag
            WHERE is_last
            )
    SELECT w.seq AS orgseq  -- results, please ...
            , w.stamp
            , g0.clust AS clust
            , row_number() OVER(www) AS rn
            , s.setnum, s.arr AS bike_set
            FROM drag w
            JOIN sets s ON s.arr = w.arr
            JOIN top g0 ON g0.nseq <= w.seq
            JOIN bot g1 ON g1.nseq >= w.seq AND g1.clust = g0.clust
            WINDOW www AS (PARTITION BY g1.clust ORDER BY w.stamp)
            ORDER BY g1.clust, w.stamp
            ;
Result:
     orgseq |        stamp        | clust | rn | setnum | bike_set
    --------+---------------------+-------+----+--------+----------
          1 | 2016-05-28 11:00:00 |     1 |  1 |      1 | {1,2}
          4 | 2016-05-28 11:00:20 |     3 |  1 |      1 | {1,2}
          5 | 2016-05-28 11:00:25 |     3 |  2 |      1 | {1,2}
          6 | 2016-05-28 11:00:05 |     4 |  1 |      3 | {1,2,3}
          7 | 2016-05-28 11:00:10 |     4 |  2 |      3 | {1,2,3}
          8 | 2016-05-28 11:00:10 |     4 |  3 |      2 | {4,5}
    (6 rows)