hadoop - How to use the MAX and COUNT functions simultaneously on two different tables that are combined with a JOIN?
-- Pig program
user = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);
a = JOIN user BY id, post BY id;
DUMP a;
user_group = GROUP a ALL;
max_reputation = FOREACH user_group GENERATE (user.displayname, user.reputation, post.id), MAX(user.reputation), COUNT(post.id);
So I have two different tables, user and post, and I have applied a JOIN on them.
Problem statement: find the displayname and the number of posts of the user having the maximum reputation.
So I need displayname and reputation from user, and id from post, and I want to apply MAX(user.reputation) and COUNT(post.id) on the join, i.e. on a.
Please help.
Which is more useful: applying the JOIN and then doing MAX and COUNT, or applying MAX and COUNT and then doing the JOIN?
Problem statement: find the displayname and the number of posts of the user having the maximum reputation.
First, find the displayname of the user having the maximum reputation in the relation "user". Then join with the relation "post" to gather the posts of that user, group on the basis of the user id, and take the count.
The code below should reach the goal:
user = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray);
-- Put all users in a single group so the ORDER/LIMIT below picks the one top-reputation user
user_grp = GROUP user ALL;
user_each = FOREACH user_grp {
    user_order = ORDER user BY reputation DESC;
    user_limit = LIMIT user_order 1;
    user_nested = FOREACH user_limit GENERATE id, displayname;
    GENERATE FLATTEN(user_nested) AS (user_id, displayname);
};
-- Join on the post owner's user id so that each joined row is one post by that user
user_join = JOIN user_each BY user_id, post BY owneruser_id;
user_grouping = GROUP user_join BY user_id;
user_output = FOREACH user_grouping GENERATE group AS user_id, MAX(user_join.displayname) AS displayname, COUNT(user_join.post_type) AS post_cnts;
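As a hedged alternative, the same "find the top user first, then join and count" idea can be sketched with a top-level ORDER and LIMIT instead of a nested FOREACH. This is only a sketch: the 'path' placeholders are kept from the question, the aliases (user_sorted, top_user, top_slim, post_slim, joined, grouped, result) are made up for illustration, and it assumes owneruser_id in post holds the id of the user who wrote the post.

```pig
-- Sketch under assumptions: 'path' is a placeholder, and owneruser_id is assumed
-- to reference user.id. All aliases below are hypothetical names.
user = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);

-- Step 1: pick the single user with the highest reputation.
user_sorted = ORDER user BY reputation DESC;
top_user = LIMIT user_sorted 1;
top_slim = FOREACH top_user GENERATE id AS user_id, displayname;

-- Step 2: keep only the column needed from post, then join on the owner id.
post_slim = FOREACH post GENERATE owneruser_id;
joined = JOIN top_slim BY user_id, post_slim BY owneruser_id;

-- Step 3: count the joined rows (one per post) for that user.
grouped = GROUP joined BY (top_slim::user_id, top_slim::displayname);
result = FOREACH grouped GENERATE FLATTEN(group) AS (user_id, displayname), COUNT(joined) AS post_cnt;
DUMP result;
```

Selecting the top user before the join keeps the join input down to a single row on the user side. Because this is an inner join, a top user with no posts would produce no output row rather than a count of 0, and ties in reputation are resolved arbitrarily by LIMIT.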