python - Filter information of a txt file by regular expressions

June 15, 2011

I have a file with information that looks like this:

****alignment****
sequence:  gi|86755972|gb|abd15130.1| cold acclimation protein cor413-pm1 [chimonanthus praecox]
length:  201
e-value:  2.66576e-82
kylamktdqlavanmidsdinelkmatmrlindasmlghygfgthflkwlaclaaiyllildrtnwrtnmltsll...
+ylamktd+ +   +i +d+ e+   +l+ da+ lg  g gt  lkw+a  aaiyllildrtnw+tnmlt+ll...
eylamktdewsaqqliqtdlkemgkaakklvydatklgslgvgtsilkwvasfaaiyllildrtnwktnmltall...

Now I want to filter this information and store the pieces in variables. I think I should use a regular expression for this, but I don't know how to deal with, for example, the large amount of information on the second line.

I need the hitsid, protein, organism, and evalue.

The corresponding data:

hitsid = 86755972
protein = cold acclimation protein cor413-pm1
organism = chimonanthus praecox
evalue = 2.66576e-82

So what I want is that, when I ask for hitsid, Python prints '86755972'.

Could someone help me with this? Thanks!

Use a regex like this:

^sequence:[^|]*\|(?P<hitsid>[^|]*)\|\S*\s*(?P<protein>[^][]*?)\s*\[(?P<organism>[^][]*)][\s\S]*?\ne-value:\s*(?P<evalue>.*)

See the regex demo.
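To make the pattern easier to follow, here is a sketch of the regex above written with `re.VERBOSE`, so each part carries a comment. The name `ALIGNMENT_RE` and the shortened `sample` string are illustrative choices, and the pattern assumes the lowercase `sequence:` and `e-value:` labels shown in the question.

```python
import re

# Commented version of the pattern; re.VERBOSE ignores the layout whitespace
# and everything after "#" on each line of the pattern.
ALIGNMENT_RE = re.compile(
    r"""
    ^sequence:[^|]*        # a line starting with the label, up to the first pipe
    \|(?P<hitsid>[^|]*)    # hit id: the field between the first and second pipe
    \|\S*\s*               # skip the rest of the pipe-delimited header, then spaces
    (?P<protein>[^][]*?)   # protein: shortest run of non-bracket characters
    \s*\[(?P<organism>[^][]*)]   # organism: the text inside the square brackets
    [\s\S]*?\ne-value:\s*  # lazily skip ahead to the next e-value line
    (?P<evalue>.*)         # e-value: the rest of that line
    """,
    re.MULTILINE | re.VERBOSE,
)

# First four lines of the record from the question (sequence lines omitted).
sample = (
    "****alignment****\n"
    "sequence:  gi|86755972|gb|abd15130.1| cold acclimation protein cor413-pm1 [chimonanthus praecox]\n"
    "length:  201\n"
    "e-value:  2.66576e-82\n"
)

match = ALIGNMENT_RE.search(sample)
if match:
    print(match.group("hitsid"))    # 86755972
    print(match.group("organism"))  # chimonanthus praecox
```

If your file uses capitalized labels such as `Sequence:` or `E-value:`, adding `re.IGNORECASE` to the flags should let the same pattern match.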

Sample Python code that collects the values from every match into a list of dictionaries:

import re

# Named groups capture the four fields; re.MULTILINE lets ^ match at each line start.
p = re.compile(r'^sequence:[^|]*\|(?P<hitsid>[^|]*)\|\S*\s*(?P<protein>[^][]*?)\s*\[(?P<organism>[^][]*)][\s\S]*?\ne-value:\s*(?P<evalue>.*)', re.MULTILINE)

s = ("****alignment****\n"
     "sequence:  gi|86755972|gb|abd15130.1| cold acclimation protein cor413-pm1 [chimonanthus praecox]\n"
     "length:  201\n"
     "e-value:  2.66576e-82\n"
     "kylamktdqlavanmidsdinelkmatmrlindasmlghygfgthflkwlaclaaiyllildrtnwrtnmltsll...\n"
     "+ylamktd+ +   +i +d+ e+   +l+ da+ lg  g gt  lkw+a  aaiyllildrtnw+tnmlt+ll...\n"
     "eylamktdewsaqqliqtdlkemgkaakklvydatklgslgvgtsilkwvasfaaiyllildrtnwktnmltall...")

# One dictionary per matched record.
res = [m.groupdict() for m in p.finditer(s)]
for x in res:
    print(x['hitsid'])
    print(x['protein'])
    print(x['organism'])
    print(x['evalue'])
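Since the data lives in a text file, the same approach can be wrapped in small helpers. This is only a minimal sketch: `parse_alignments`, `parse_alignment_file`, and the file name `blast_output.txt` are hypothetical names, and it assumes the file is small enough to read into memory in one go and uses the lowercase labels shown in the question.

```python
import re
from pathlib import Path

# Pattern from the answer, split over two raw strings for readability.
PATTERN = re.compile(
    r'^sequence:[^|]*\|(?P<hitsid>[^|]*)\|\S*\s*(?P<protein>[^][]*?)\s*'
    r'\[(?P<organism>[^][]*)][\s\S]*?\ne-value:\s*(?P<evalue>.*)',
    re.MULTILINE,
)


def parse_alignments(text):
    """Return one dict (hitsid, protein, organism, evalue) per record in text."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


def parse_alignment_file(path):
    """Read the whole file at `path` and parse every record in it."""
    return parse_alignments(Path(path).read_text())


# Demo on an in-memory string; for a real file you would call something like
# parse_alignment_file("blast_output.txt")  # hypothetical file name
demo = (
    "****alignment****\n"
    "sequence:  gi|86755972|gb|abd15130.1| cold acclimation protein cor413-pm1 [chimonanthus praecox]\n"
    "length:  201\n"
    "e-value:  2.66576e-82\n"
)
records = parse_alignments(demo)

# Index the records by hit id so a single record can be looked up directly.
by_hitsid = {rec["hitsid"]: rec for rec in records}

print(records[0]["hitsid"])               # 86755972
print(by_hitsid["86755972"]["organism"])  # chimonanthus praecox
```

All values stay strings, as in the answer's code; `float(rec["evalue"])` converts the e-value to a number if you need to compare it against a threshold.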
