
regular expression

Hi All,

I need help modifying a regular expression. My current one is:

if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/)

which gets the "VAL" code from the line below in a large file:

ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C
 

However, sometimes the file contains a line like this:
ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C  

and I can't match the "LEU" code.

Can someone help modify my expression so it can accommodate both of these lines in the input and return the desired code, like VAL and LEU in the examples above?

Thanks

Sarah
 
Kim Ryan (IT Consultant) commented:
Try this one instead; it ignores an optional A before the VAL or LEU:
/^ATOM\s+\d+\s+CA\s+A?(\S+)\s+/
 
rj2 commented:
The sample code below matches both VAL and LEU.

$_='ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C';

if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
      print "Match: $1\n";
}
$_='ATOM   1358  CA LEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C';
if (/^ATOM\s+\d+\s+CA\s+(\S+)\s+/) {
      print "Match: $1\n";
}
 
sarahJo (Author) commented:
Hi Teraplane,

Thanks for that. It might be any letter before the VAL or LEU, not just A. Can it be changed to accommodate this?

Thanks!!!
 
sarahJo (Author) commented:
Hi rj2,

the second line is

ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C  

with an A before the LEU...that was the problem so the expression needs a little tinkering!

Thanks
Sarah
 
Kim Ryan (IT Consultant) commented:
Yes, you can, but is it always followed by LEU or VAL? If so, this would work:
/^ATOM\s+\d+\s+CA\s+\w?(VAL|LEU)\s+/
 
sarahJo (Author) commented:
Oh, it's not always followed by LEU or VAL, I'm afraid. It could be any 3-letter code!
 
Kim Ryan (IT Consultant) commented:
Could you specify your data format more completely so I can define your problem more precisely? If you are looking for any character followed by 2 or 3 characters, we cannot filter out the first one consistently. For example, it would grab the V from VAL, but ALEU would be OK. We need to use a pattern based either on string data or on character position.
 
ozo commented:
If you always want the last 3 letters of 3 or 4 letters that would be
/^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/
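A minimal sketch of how this pattern behaves on the two sample lines from the question. The helper name residue_from_regex is made up for illustration and is not from the thread. The pattern still assumes whitespace between the atom name and the residue field, and that the residue code is exactly three word characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 3-letter residue code of a CA ATOM line,
# skipping one optional leading character, or undef if the line does not match.
sub residue_from_regex {
    my ($line) = @_;
    return $1 if $line =~ /^ATOM\s+\d+\s+CA\s+\w?(\w\w\w)\s+/;
    return;
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
);

for my $line (@lines) {
    my $res = residue_from_regex($line);
    print "Match: $res\n" if defined $res;   # prints VAL, then LEU
}
```

For VAL the optional \w? backtracks to match nothing, because otherwise only "AL" would be left for the three-letter group; for ALEU it consumes the leading A.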
 
joedundas commented:
Hey Sarah,
  When you are parsing PDB files, it is usually better to use substr(), because PDB is a fixed-width format. The A in ALEU is in the Alternate Location Indicator column. There are many more instances where there won't be spaces between data columns.
The following link will give you the column information to use for substr()
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html

Joe
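A rough sketch of the substr() approach, assuming the standard fixed-column ATOM layout from the PDB format guide: record name in columns 1-6, atom name in columns 13-16, alternate location indicator in column 17, and residue name in columns 18-20. The function name ca_residue is hypothetical, and substr() offsets are zero-based, so column 18 is offset 17.

```perl
use strict;
use warnings;

# Hypothetical helper: return the residue name of a C-alpha ATOM record
# by column position, or undef if the line is not such a record.
sub ca_residue {
    my ($line) = @_;
    return if length($line) < 20;                    # too short to hold columns 18-20
    return unless substr($line, 0, 6) eq 'ATOM  ';   # columns 1-6: record name

    my $atom = substr($line, 12, 4);                 # columns 13-16: atom name
    $atom =~ s/\s+//g;                               # strip padding, " CA " becomes "CA"
    return unless $atom eq 'CA';

    return substr($line, 17, 3);                     # columns 18-20: residue name
}

my @lines = (
    'ATOM      2  CA  VAL A   3      17.591  48.101  25.416  1.00 27.93           C',
    'ATOM   1358  CA ALEU A 199      -3.698 -19.821 -32.696  0.50 21.71           C',
);

for my $line (@lines) {
    my $res = ca_residue($line);
    print "Residue: $res\n" if defined $res;         # prints VAL, then LEU
}
```

Because the residue name is read from fixed columns, the alternate location indicator in column 17 never leaks into the result, whatever letter it holds.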