regular expression

Hi All,

I need help modifying a regular expression. My current one is:

if (/^ATOM\s+\d+\s+\CA\s+(\S+)\s+/)

which gets the "VAL" code from the line below in a large file:

ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C
 

However, sometimes the file contains a line like this:
ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C  

and I can't match the "LEU" code.

Can someone help modify my expression so it can accommodate both of these input lines and return the desired code, like VAL and LEU in the examples above?

Thanks

Sarah
Asked by sarahJo
 
Kim Ryan (IT Consultant) commented:
Try this one instead; it ignores an optional A before the VAL or LEU:
/^ATOM\s+\d+\s+CA\s+A?(\S+)\s+/
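
A quick way to see how the optional-A pattern behaves is to run it over a few lines. The sketch below is only an illustration: the sub name `residue_optional_a` is made up for this example, the pattern is written with a plain `CA`, and the third input is a hypothetical, shortened line for an alanine (ALA) residue that does not come from the original file.

```perl
use strict;
use warnings;

# The pattern suggested above: an optional literal "A" in front of the capture.
sub residue_optional_a {
    my ($line) = @_;
    return $line =~ /^ATOM\s+\d+\s+CA\s+A?(\S+)\s+/ ? $1 : undef;
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
    # Hypothetical, shortened line for an alanine residue (not from the original file).
    'ATOM      9  CA  ALA A   4 ',
);

for my $line (@lines) {
    my $code = residue_optional_a($line);
    print defined $code ? "Match: $code\n" : "No match\n";
}
```

The first two lines give VAL and LEU, but the hypothetical ALA line gives LA, because the optional A consumes the first letter of any residue code that itself starts with A.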
 
rj2 commented:
Sample code below matches both VAL and LEU.

$_ = 'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C';

# $1 holds the first run of non-space characters after the CA atom name
if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
    print "Match: $1\n";
}

$_ = 'ATOM   1358  CA LEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C';
if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
    print "Match: $1\n";
}
 
sarahJo (author) commented:
Hi Teraplane,

Thanks for that. It might be any letter before the VAL or LEU, not just A. Can it be changed to accommodate this?

Thanks!!!
 
sarahJo (author) commented:
Hi rj2,

The second line is

ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C

with an A before the LEU. That was the problem, so the expression needs a little tinkering!

Thanks
Sarah
 
Kim Ryan (IT Consultant) commented:
Yes you can, but is it always followed by LEU or VAL? If so, this would work:
/^ATOM\s+\d+\s+CA\s+\w?(VAL|LEU)\s+/
 
sarahJo (author) commented:
Oh, it's not always followed by LEU or VAL, I'm afraid. It could be any 3-letter code!
 
Kim Ryan (IT Consultant) commented:
Could you specify your data format more completely so I can define the problem more precisely? If you are looking for any character followed by 2 or 3 characters, we cannot filter out the first one consistently. For example, ALEU would be OK, but the pattern would also grab the V from VAL. We need to use a pattern based on either string data or character position.
 
ozo commented:
If you always want the last 3 letters of 3 or 4 letters that would be
/^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/
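
As a rough check, the pattern above can be run against the two lines from the question. This is a minimal sketch; the sub name `residue_last3` is invented for the example, and it assumes the residue field is always three or four word characters followed by whitespace.

```perl
use strict;
use warnings;

# Capture the last three word characters of a 3- or 4-character residue field.
sub residue_last3 {
    my ($line) = @_;
    return $line =~ /^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/ ? $1 : undef;
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
);

for my $line (@lines) {
    my $code = residue_last3($line);
    print defined $code ? "Match: $code\n" : "No match\n";
}
```

For VAL, the optional `\w?` first tries to take the V, fails to find three more word characters before the space, and backtracks to match nothing, so the capture is VAL. For ALEU, `\w?` takes the A and the capture is LEU.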
 
joedundas commented:
Hey Sarah,
When you are parsing PDB files, it is usually better to use substr(). The A in ALEU is in the alternate location indicator column. There are many more instances where there won't be spaces between data columns.
The following link will give you the column information to use for substr():
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html

Joe
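
The substr() approach could look something like the sketch below. It assumes the standard fixed-column ATOM record layout from the PDB format guide (columns 13-16 atom name, column 17 alternate location indicator, columns 18-20 residue name); the sub name `parse_ca` is invented for this example.

```perl
use strict;
use warnings;

# Extract the residue name and alternate location indicator from a CA ATOM line.
# Columns are 1-based in the PDB guide; substr() offsets are 0-based.
sub parse_ca {
    my ($line) = @_;
    return unless length($line) >= 20 && substr($line, 0, 6) eq 'ATOM  ';

    # Atom name, columns 13-16, with padding removed
    (my $atom = substr($line, 12, 4)) =~ s/\s+//g;
    return unless $atom eq 'CA';

    my $alt_loc = substr($line, 16, 1);   # column 17
    my $residue = substr($line, 17, 3);   # columns 18-20
    return ($residue, $alt_loc);
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
);

for my $line (@lines) {
    my ($residue, $alt_loc) = parse_ca($line);
    next unless defined $residue;
    print "Residue: $residue, alternate location: '$alt_loc'\n";
}
```

Because the fields are read by position, the result does not depend on whitespace between columns, and the alternate location letter never ends up in the residue code.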
Question has a verified solution.
