perl - Analyse text records based on the value of two fields -
i using perl test couple of conditions in each line of input file. one-liner below works records not all.
in current output, lines 2,3, , 5 correct, lines 1 , 4 not, presumably because stb value has 2 comma-separated values in instead of one. instance stb=0.5,0.645036; instead of stb=0.590597;.
i can not seem figure out how apply same logic both conditions, if stb >= 0.8, "strand bias" , reads value of fdp field.
the input file have lines in 1 stb value , two.
perl
perl -ple '/^[^#].*fdp=(\d+);.*stb=(\d+\.\d+);/ , $_.=($2 >= 0.8?" strand bias ":" ").$1." reads"' input > out input
chr1 93159358 . ct ac,c 51.3482 pass af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1 93073228 . c 142.937 pass af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 chr1 93089823 . t c 1038.33 pass af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 chr11 36596027 . ag aa,a 1031.71 pass af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 chr11 95825383 . c t 143.023 pass af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 current output (lines 2,3, , 5 correct)
line 1 stb=0.5,0.645036 line 4 stb=0.789128,0.571642 chr1 93159358 . ct ac,c 51.3482 pass af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1 93073228 . c 142.937 pass af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 105 reads chr1 93089823 . t c 1038.33 pass af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 111 reads chr11 36596027 . ag aa,a 1031.71 pass af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 chr11 95825383 . c t 143.023 pass af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 strand bias 82 reads desired output
chr1 93159358 . ct ac,c 51.3482 pass af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5 39 reads readsgt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1 93073228 . c 142.937 pass af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 105 reads chr1 93089823 . t c 1038.33 pass af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 111 reads chr11 36596027 . ag aa,a 1031.71 pass af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 320 reads chr11 95825383 . c t 143.023 pass af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr 0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 strand bias 82 reads
it nice have more information
on face of it, long you're happy ignore second value stb, can remove insistence of semicolon ; after value. this
perl -ple '/^[^#].*fdp=(\d+);.*stb=(\d+\.\d+)/ , $_.=($2 >= 0.8?" strand bias ":" ").$1." reads"' but that's pretty awful perl code and, have found, damned hard debug
i prefer this: script extracts all of labelled values hash , uses them directly there
i've guessed want all of stb values under 0.8 result, i've used all function list::util test that. split stb value on commas , create boolean $all_ok status variable indicates whether true. works fine whether there's 1 value or 1 thousand
then printf builds output string components we've calculated
i've emptied $line variable can see appended each line debugging purposes. delete statement or comment out real run
use strict; use warnings 'all'; use list::util 'all'; while ( <> ) { next unless /\s/; @fields = split; chomp( $line = $_ ); %values = map { split /=/ } split /;/, $fields[7]; $all_ok = { $_ < 0.8 } split /,/, $values{stb}; $line = ''; # debugging printf "%s %s %s reads\n", $line, $all_ok ? 'good' : 'strand bias', $values{fdp}; } output
39 reads 105 reads 111 reads 320 reads strand bias 82 reads update
now you've explained want list of status values in output can write better solution. no longer need list::util::all, instead need create array of boolean status values list of stb data
don't forget comment out $line = '' before run data real
it looks this
use strict; use warnings 'all'; use list::util 'all'; while ( <> ) { next unless /\s/; @fields = split; chomp( $line = $_ ); %values = map { split /=/ } split /;/, $fields[7]; @stb_ok = map { $_ < 0.8 } split /,/, $values{stb}; @good = map { $_ ? 'good' : 'strand bias' } @stb_ok; $line = ''; # debugging printf "%s %s %s reads\n", $line, "@good", $values{fdp}; } output
good 39 reads 105 reads 111 reads good 320 reads strand bias 82 reads
Comments
Post a Comment