perl - Analyse text records based on the value of two fields -


i using perl test couple of conditions in each line of input file. one-liner below works records not all.

in current output, lines 2,3, , 5 correct, lines 1 , 4 not, presumably because stb value has 2 comma-separated values in instead of one. instance stb=0.5,0.645036; instead of stb=0.590597;.

i can not seem figure out how apply same logic both conditions, if stb >= 0.8, "strand bias" , reads value of fdp field.

the input file have lines in 1 stb value , two.

perl

perl -ple '/^[^#].*fdp=(\d+);.*stb=(\d+\.\d+);/ , $_.=($2 >= 0.8?" strand bias ":" ").$1." reads"' input > out 

input

chr1    93159358    .   ct  ac,c    51.3482 pass    af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1    93073228    .   c     142.937 pass    af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 chr1    93089823    .   t   c   1038.33 pass    af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 chr11   36596027    .   ag  aa,a    1031.71 pass    af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 chr11   95825383    .   c   t   143.023 pass    af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 

current output (lines 2,3, , 5 correct)

line 1 stb=0.5,0.645036 line 4 stb=0.789128,0.571642 
chr1    93159358    .   ct  ac,c    51.3482 pass    af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1    93073228    .   c     142.937 pass    af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 105 reads chr1    93089823    .   t   c   1038.33 pass    af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 111 reads chr11   36596027    .   ag  aa,a    1031.71 pass    af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 chr11   95825383    .   c   t   143.023 pass    af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 strand bias 82 reads 

desired output

chr1    93159358    .   ct  ac,c    51.3482 pass    af=0,0.538462;ao=4,12;dp=39;fao=0,21;fdp=39;fr=.;fro=18;fsaf=0,11;fsar=0,10;fsrf=15;fsrr=3;fwdb=0.0379899,0.0954749;fxx=0;hrun=1,5;len=2,1;mlld=22.441,10.1519;oalt=ac,-;oid=.,.;omapalt=ac,c;opos=93159358,93159359;oref=ct,t;pb=0.5,0.5;pbp=1,1;qd=5.26648;rbi=0.0698716,0.219287;refb=-0.0299799,-0.0774582;revb=0.0586414,0.197411;ro=22;saf=0,9;sar=4,3;srf=17;srr=5;ssen=0,0;ssep=0,0;sssb=-0.747246,-0.0336118;stb=0.5,0.645036;stbp=1,0.086;type=mnp,del;varb=0.059091,0.135819;ann=evi5    39 reads readsgt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:46:39:39:22:18:4,12:0,21:0,0.538462:4,3:0,9:17:5:0,10:0,11:15:3 chr1    93073228    .   c     142.937 pass    af=0.4;ao=42;dp=105;fao=42;fdp=105;fr=.;fro=63;fsaf=25;fsar=17;fsrf=28;fsrr=35;fwdb=-0.00213313;fxx=0.00943307;hrun=2;len=1;mlld=178.966;oalt=a;oid=.;omapalt=a;opos=93073228;oref=c;pb=0.5;pbp=1;qd=5.44523;rbi=0.00753887;refb=-0.0179184;revb=-0.00723079;ro=63;saf=25;sar=17;srf=28;srr=35;ssen=0;ssep=0;sssb=0.159972;stb=0.590597;stbp=0.144;type=snp;varb=0.0207923;ann=evi5 gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:142:105:105:63:63:42:42:0.4:17:25:28:35:17:25:28:35 105 reads chr1    93089823    .   t   c   1038.33 pass    af=1;ao=110;dp=111;fao=111;fdp=111;fr=.;fro=0;fsaf=76;fsar=35;fsrf=0;fsrr=0;fwdb=0.0247073;fxx=0.00892777;hrun=2;len=1;mlld=59.5565;oalt=c;oid=.;omapalt=c;opos=93089823;oref=t;pb=0.5;pbp=1;qd=37.4173;rbi=0.025038;refb=-0.0649256;revb=-0.0040564;ro=1;saf=75;sar=35;srf=1;srr=0;ssen=0;ssep=0;sssb=-0.00628837;stb=0.5;stbp=1;type=snp;varb=0.000880627;ann=evi5    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   1/1:42:111:111:1:0:110:111:1:35:75:1:0:35:76:0:0 111 reads chr11   36596027    .   ag  aa,a    1031.71 pass    af=0.121875,0.703125;ao=52,118;dp=333;fao=39,225;fdp=320;fr=.;fro=56;fsaf=2,136;fsar=37,89;fsrf=14;fsrr=42;fwdb=0.0148693,0.00188064;fxx=0.0615818;hrun=5,5;len=1,1;mlld=11.6837,10.3394;oalt=a,-;oid=.,.;omapalt=aa,a;opos=36596028,36596028;oref=g,g;pb=0.5,0.5;pbp=1,1;qd=12.8964;rbi=0.065829,0.11083;refb=-0.0510698,-0.110624;revb=0.0641277,0.110814;ro=85;saf=2,84;sar=50,34;srf=17;srr=68;ssen=0,0;ssep=0,0.328125;sssb=-0.504054,0.394265;stb=0.789128,0.571642;stbp=0.007,0;type=snp,del;varb=0.0245642,0.134756;ann=rag1    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/2:42:333:320:85:56:52,118:39,225:0.121875,0.703125:50,34:2,84:17:68:37,89:2,136:14:42 320 reads chr11   95825383    .   c   t   143.023 pass    af=0.47561;ao=28;dp=71;fao=39;fdp=82;fr=.;fro=43;fsaf=6;fsar=33;fsrf=40;fsrr=3;fwdb=-0.0301041;fxx=0.0238067;hrun=1;len=1;mlld=189.321;oalt=t;oid=.;omapalt=t;opos=95825383;oref=c;pb=0.5;pbp=1;qd=6.97675;rbi=0.153139;refb=-0.0165525;revb=0.150151;ro=43;saf=6;sar=22;srf=40;srr=3;ssen=0;ssep=0;sssb=-0.666847;stb=0.875258;stbp=0;type=snp;varb=0.0275999;ann=maml2    gt:gq:dp:fdp:ro:fro:ao:fao:af:sar:saf:srf:srr:fsar:fsaf:fsrf:fsrr   0/1:143:71:82:43:43:28:39:0.47561:22:6:40:3:33:6:40:3 strand bias 82 reads 

it nice have more information

on face of it, long you're happy ignore second value stb, can remove insistence of semicolon ; after value. this

perl -ple '/^[^#].*fdp=(\d+);.*stb=(\d+\.\d+)/ , $_.=($2 >= 0.8?" strand bias ":" ").$1." reads"' 

but that's pretty awful perl code and, have found, damned hard debug

i prefer this: script extracts all of labelled values hash , uses them directly there

i've guessed want all of stb values under 0.8 result, i've used all function list::util test that. split stb value on commas , create boolean $all_ok status variable indicates whether true. works fine whether there's 1 value or 1 thousand

then printf builds output string components we've calculated

i've emptied $line variable can see appended each line debugging purposes. delete statement or comment out real run

use strict; use warnings 'all';  use list::util 'all';  while ( <> ) {     next unless /\s/;     @fields = split;      chomp( $line = $_ );      %values = map { split /=/ } split /;/, $fields[7];     $all_ok = { $_ < 0.8 } split /,/, $values{stb};      $line = '';  # debugging      printf "%s %s %s reads\n", $line, $all_ok ? 'good' : 'strand bias', $values{fdp}; } 

output

 39 reads  105 reads  111 reads  320 reads  strand bias 82 reads 



update

now you've explained want list of status values in output can write better solution. no longer need list::util::all, instead need create array of boolean status values list of stb data

don't forget comment out $line = '' before run data real

it looks this

use strict; use warnings 'all';  use list::util 'all';  while ( <> ) {     next unless /\s/;     @fields = split;      chomp( $line = $_ );      %values = map { split /=/ } split /;/, $fields[7];      @stb_ok = map { $_ < 0.8 } split /,/, $values{stb};     @good   = map { $_ ? 'good' : 'strand bias' } @stb_ok;      $line = '';  # debugging      printf "%s %s %s reads\n", $line, "@good", $values{fdp}; } 

output

 good 39 reads  105 reads  111 reads  good 320 reads  strand bias 82 reads 

Comments

Popular posts from this blog

ios - RestKit 0.20 — CoreData: error: Failed to call designated initializer on NSManagedObject class (again) -

java - Digest auth with Spring Security using javaconfig -

laravel - PDOException in Connector.php line 55: SQLSTATE[HY000] [1045] Access denied for user 'root'@'localhost' (using password: YES) -