Solved

regular expression

Posted on 2004-09-15
Last Modified: 2010-03-05
Hi All,

I need help modifying a regular expression. My current one is:

if (/^ATOM\s+\d+\s+\CA\s+(\S+)\s+/)

which gets the "VAL" code from the line below in a large file:

ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C
 

However, sometimes the file contains a line like this:
ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C  

and I can't match the "LEU" code.

Can someone help modify my expression so it accommodates both of these input lines and returns the desired code, like VAL and LEU in the examples above?

Thanks

Sarah
Question by:sarahJo

Accepted Solution

by:
Kim Ryan earned 125 total points
ID: 12071903
Try this one instead. It ignores an optional A before the VAL or LEU:
/^ATOM\s+\d+\s+CA\s+A?(\S+)\s+/
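One caveat with an optional literal A, sketched below: if the file also contains residues whose own name starts with A, the A? will consume the first letter of the name. The ALA line here is made up for illustration and is not from the question, and the helper name residue_optional_a is hypothetical. The sketch writes CA rather than \CA, because \C is rejected by Perl 5.24 and later.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the "optional A" pattern and return the capture,
# or undef when the line does not match.
sub residue_optional_a {
    my ($line) = @_;
    return $line =~ /^ATOM\s+\d+\s+CA\s+A?(\S+)\s+/ ? $1 : undef;
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
    # Made-up line for illustration: a residue whose name begins with A.
    'ATOM     10  CA  ALA A   5      10.000  20.000  30.000  1.00 20.00           C',
);

for my $line (@lines) {
    my $code = residue_optional_a($line);
    print defined $code ? "Match: $code\n" : "No match\n";
}
```

The first two lines give VAL and LEU, but the ALA line gives LA, so this pattern is only safe if no residue name in the file begins with A.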

Assisted Solution

by:rj2
rj2 earned 125 total points
ID: 12071913
Sample code below matches both VAL and LEU.

# Line with a plain three-letter residue code
$_ = 'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C';

if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
    print "Match: $1\n";
}

# Second line, typed here without an extra letter before LEU
$_ = 'ATOM   1358  CA LEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C';
if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
    print "Match: $1\n";
}

Author Comment

by:sarahJo
ID: 12071936
Hi Teraplane,

Thanks for that. It might be any letter before the VAL or LEU, not just A. Can it be changed to accommodate this?

Thanks!!!

Author Comment

by:sarahJo
ID: 12071943
Hi rj2,

the second line is

ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C  

with an A before the LEU...that was the problem so the expression needs a little tinkering!

Thanks
Sarah

Expert Comment

by:Kim Ryan
ID: 12071955
Yes, you can, but is it always followed by LEU or VAL? If so, this would work:
/^ATOM\s+\d+\s+CA\s+\w?(VAL|LEU)\s+/

Author Comment

by:sarahJo
ID: 12071958
Oh, it's not always followed by LEU or VAL, I'm afraid. It could be any 3-letter code!

Expert Comment

by:Kim Ryan
ID: 12071965
Could you specify your data format more completely so I can define your problem more precisely? If you are looking for any character followed by 2 or 3 characters, we cannot filter out the first one consistently. For example, it would grab the V from VAL, but ALEU would be OK. We need to use a pattern based either on string data or on character position.

Assisted Solution

by:ozo
ozo earned 125 total points
ID: 12076145
If you always want the last 3 letters of a 3- or 4-letter field, that would be
/^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/
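As a quick check of the pattern above, here is a small sketch that runs it against both lines from the question plus one made-up ALA line (not from the original data). The helper name residue_last_three is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern above: an optional leading
# word character, then exactly three word characters captured as the code.
sub residue_last_three {
    my ($line) = @_;
    return $line =~ /^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/ ? $1 : undef;
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
    # Made-up line for illustration: a three-letter code starting with A.
    'ATOM     10  CA  ALA A   5      10.000  20.000  30.000  1.00 20.00           C',
);

for my $line (@lines) {
    my $code = residue_last_three($line);
    print defined $code ? "Match: $code\n" : "No match\n";
}
```

Because the capture must be exactly three word characters followed by whitespace, the optional \w? only matches when the field has four letters, so VAL, LEU and ALA all come out whole. The pattern still relies on whitespace separating the fields it names.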

Assisted Solution

by:joedundas
joedundas earned 125 total points
ID: 12089361
Hey Sarah,
  When you are parsing PDB files, it is usually better to use substr(). The A in ALEU is in the Alternate Location Indicator column. There are many more instances where there won't be spaces between data columns.
The following link will give you the column information to use for substr()
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html

Joe
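A minimal sketch of the substr() approach, assuming the standard fixed-width ATOM record layout: record name in columns 1-6, atom name in columns 13-16, alternate location indicator in column 17, residue name in columns 18-20. The helper name ca_residue is made up for illustration, and the offsets should be checked against the column guide linked above before relying on them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the residue name of a CA atom record, or undef.
# substr() offsets are 0-based, so column N starts at offset N-1.
sub ca_residue {
    my ($line) = @_;
    return if length($line) < 20;                      # too short to hold columns 18-20
    return unless substr($line, 0, 6) eq 'ATOM  ';     # columns 1-6: record name

    my $atom = substr($line, 12, 4);                   # columns 13-16: atom name
    $atom =~ s/\s+//g;                                 # drop padding spaces
    return unless $atom eq 'CA';

    return substr($line, 17, 3);                       # columns 18-20: residue name
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
);

for my $line (@lines) {
    my $code = ca_residue($line);
    print defined $code ? "Match: $code\n" : "No match\n";
}
```

Since the residue name is read from fixed positions, the letter in column 17 never needs to be stripped, and the result does not depend on spaces between neighbouring fields.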