hadoop - How to use the MAX and COUNT functions simultaneously on two different tables combined with a JOIN?


    -- Pig program
    user = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
    post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);

    -- posts reference their author through owneruser_id
    a = JOIN user BY id, post BY owneruser_id;
    DUMP a;

    user_group = GROUP a ALL;
    -- This runs, but it returns the overall maximum reputation and the total
    -- number of joined rows, not the post count of the top-reputation user.
    max_reputation = FOREACH user_group GENERATE MAX(a.user::reputation), COUNT(a);

So I took two different tables, user and post, and applied a JOIN to them.

Problem statement: find the display name and the number of posts of the user with the maximum reputation.

So I need displayname and reputation from user, and id from post, and I want to apply MAX(user.reputation) and COUNT(post.id) on the joined relation, i.e. a.

Please help.
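To make the intended computation concrete outside Pig, here is a minimal Python sketch of the join-then-aggregate order on small in-memory lists. The sample rows and the function name `top_user_post_count_join_first` are hypothetical, and the sketch assumes posts reference their author through `owneruser_id` (the field from the `post` schema), since joining on `post.id` would match a user to at most one unrelated post.

```python
from collections import Counter

# Hypothetical sample data mirroring the two Pig schemas (only the fields used here).
users = [
    # (id, reputation, displayname)
    (1, 120, "alice"),
    (2, 450, "bob"),
    (3, 300, "carol"),
]
posts = [
    # (id, owneruser_id)
    (10, 2),
    (11, 1),
    (12, 2),
    (13, 3),
    (14, 2),
]

def top_user_post_count_join_first(users, posts):
    """Join every user to the posts they own, then aggregate (hypothetical helper)."""
    # Inner join on user.id == post.owneruser_id, like Pig's JOIN ... BY.
    joined = [(u, p) for u in users for p in posts if u[0] == p[1]]
    # MAX over the joined rows: the highest reputation among users who have posts.
    max_rep = max(u[1] for u, _ in joined)
    # COUNT the joined rows per display name, keeping only users with that reputation.
    return dict(Counter(u[2] for u, _ in joined if u[1] == max_rep))

print(top_user_post_count_join_first(users, posts))  # {'bob': 3}
```

Note that the inner join drops users who have no posts, so in this order MAX is taken only over users who have posted, which can differ from the user with the highest reputation overall.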

Which is more useful: applying the JOIN first and then MAX and COUNT, or applying MAX first and then doing the JOIN and COUNT?

Problem statement: find the display name and the number of posts of the user with the maximum reputation.

First, find the display name of the user with the maximum reputation in relation "user".

Then join that result with relation "post" to gather the posts of that user, group on the basis of the user id, and take the count.
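The filter-first order described above can be sketched in Python on the same kind of hypothetical data. The function name `top_user_post_count_filter_first` is illustrative only, and the join again assumes posts point to their author via `owneruser_id`.

```python
# Hypothetical sample data mirroring the two Pig schemas (only the fields used here).
users = [
    # (id, reputation, displayname)
    (1, 120, "alice"),
    (2, 450, "bob"),
    (3, 300, "carol"),
]
posts = [
    # (id, owneruser_id)
    (10, 2),
    (11, 1),
    (12, 2),
    (13, 3),
    (14, 2),
]

def top_user_post_count_filter_first(users, posts):
    """Pick the top-reputation user first, then join and count (hypothetical helper)."""
    # Step 1: like ORDER user BY reputation DESC + LIMIT 1.
    # max() keeps the first user it sees on ties, so tie-breaking is arbitrary.
    top_id, _, top_name = max(users, key=lambda u: u[1])
    # Step 2: join the single top user with posts on owneruser_id.
    joined = [p for p in posts if p[1] == top_id]
    # Step 3: only one user remains, so grouping by id leaves a single group
    # whose size is the post count.
    return (top_id, top_name, len(joined))

print(top_user_post_count_filter_first(users, posts))  # (2, 'bob', 3)
```

Filtering first means the join only has to match posts against one user row instead of every user. Unlike the Pig inner join, this sketch reports a count of 0 rather than no row when the top user has no posts.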

The code below achieves this:

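    user = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
    post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);

    -- Put all users in one group so the nested ORDER/LIMIT sees every user
    -- (grouping by the unique id would leave one user per group).
    user_grp = GROUP user ALL;
    user_each = FOREACH user_grp {
        user_order  = ORDER user BY reputation DESC;
        user_limit  = LIMIT user_order 1;   -- on ties, one top user is kept arbitrarily
        user_nested = FOREACH user_limit GENERATE id, displayname;
        GENERATE FLATTEN(user_nested) AS (user_id, displayname);
    };

    -- Posts reference their author through owneruser_id, not post.id.
    user_join = JOIN user_each BY user_id, post BY owneruser_id;

    user_grouping = GROUP user_join BY user_id;
    user_output = FOREACH user_grouping GENERATE
        group AS user_id,
        MAX(user_join.displayname) AS displayname,  -- a single display name per user_id
        COUNT(user_join.post_type) AS post_cnts;    -- counts non-null post_type values

    DUMP user_output;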
