Posted on 2004-04-07
Last Modified: 2010-03-04
Hi All,

I know I've been posting a lot of questions recently, but this one is easy enough for you guys and difficult for me! The script below parses the input file below and prints two lists of residue numbers, one for chain (A) and one for chain (B), each in numeric order with every number appearing once. For example, it prints:

Interface residues A: 5, 6, 7, 9
Interface residues B: 8, 10, 11, 12

and so on for the whole file, in a nice format. What I have to do now is parse the first part of the file as I've done before, but when the format changes, at the line "SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018", I need to print all of those numbers as before, but on separate lines headed "Neighbouring residues". So my output file should look like:

Interface residues A: (chain A numbers from the first part of the file)
Interface residues B: (chain B numbers from the first part of the file)
Neighbouring residues A: (chain A numbers after the format changes)
Neighbouring residues B: (chain B numbers after the format changes)

I know it's quite simple, but my efforts didn't work! Thanks


Input file:
PHE 119(A)( 906)   - THR  10(B)( 996)   :   4.441
PHE 119(A)( 911)   - GLN  11(B)(1002)   :   4.486
PHE 119(A)( 911)   - PRO  12(B)(1016)   :   4.203
PHE 119(A)( 914)   - TRP  13(B)(1025)   :   4.372
PHE 119(A)( 913)   - VAL  16(B)(1056)   :   3.810
PHE 119(A)( 913)   - PHE 119(B)(1874)   :   4.362
SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018
PRO   6(A)(CA)   - SER   5(A)(CA)   :   3.018
PRO   6(A)(CA)   - SER   7(A)(CA)   :   3.831
THR  10(A)(CA)   - SER   9(A)(CA)   :   3.816
THR  10(A)(CA)   - GLN  11(A)(CA)   :   3.778
GLN  11(A)(CA)   - THR  10(A)(CA)   :   3.778
GLN  11(B)(CA)   - PRO  12(B)(CA)   :   3.830
PRO  12(B)(CA)   - PRO   8(B)(CA)   :   4.140
PRO  12(B)(CA)   - GLN  11(B)(CA)   :   3.830
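Each pair in the file has the form `NNN(C)(X)`: a residue number, a one-letter chain ID, then either a numeric atom serial such as `( 906)` (first part, interface contacts) or an atom name such as `(CA)` (second part, neighbouring residues). A minimal sketch of pulling those fields out of one line, assuming the layout shown above holds for the whole file; `parse_contacts` and `section_of` are hypothetical helper names, not part of either script in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one [residue, chain, atom] triple for each
# "NNN(C)(X)" group on the line. The atom field may be space-padded, e.g. "( 906)".
sub parse_contacts {
    my ($line) = @_;
    my @found;
    while ($line =~ /(\d+)\((\w)\)\(\s*(\w+)\)/g) {
        push @found, [ $1, $2, $3 ];
    }
    return @found;
}

# Hypothetical helper: a purely numeric atom field marks the interface part of
# the file; an atom name such as "CA" marks the neighbouring-residue part.
sub section_of {
    my ($atom) = @_;
    return $atom =~ /^\d+$/ ? 'Interface' : 'Neighbouring';
}

# Two sample lines copied from the input file above.
for my $line (
    'PHE 119(A)( 906)   - THR  10(B)( 996)   :   4.441',
    'SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018',
) {
    for my $contact (parse_contacts($line)) {
        my ($res, $chain, $atom) = @$contact;
        printf "%-12s chain %s residue %d\n", section_of($atom), $chain, $res;
    }
}
```

Because the section is decided per group rather than by counting lines, this does not need to know where in the file the format switches.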


Script
#!/usr/local/bin/perl

use strict;
use warnings;

my $datafile = '/home/paul/list/antotest';

# %chain maps chain ID => { residue number => residue number }
my %chain;

open FILE, '<', $datafile or die "Can not open $datafile: $!\n";
open OUTFILE, '>', 'antotest2' or die "Can not open antotest2: $!\n";

while (<FILE>) {
    # Capture the residue number and chain ID on each side of the "-"
    if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
        $chain{$2}->{$1} = $1;
        $chain{$4}->{$3} = $3;
    }
}
close FILE;

foreach my $chain (sort keys %chain) {
    print OUTFILE "Interfacing Residues Chain $chain: ";
    my $c = 0;
    # Print the residue numbers in numeric order, comma-separated
    foreach my $skey (sort {$a <=> $b} keys %{$chain{$chain}} ) {
        print  OUTFILE ","   if ($c > 0);
        $c++;
        printf OUTFILE "%-3d", $skey;
    }
    print OUTFILE "\n";
}
close OUTFILE;
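The `printf OUTFILE "%-3d"` call pads every number to three characters, so the list comes out as `5  ,6  ,10 ` rather than the `5, 6, 7, 9` style described in the question. A minimal sketch of an alternative formatter, assuming the residues for one chain are held as keys of a hash, as in the script above; `format_residues` is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort residue numbers numerically and join them with
# ", " instead of padding each one to a fixed width.
sub format_residues {
    my ($label, $chain_id, $residues) = @_;   # $residues: hash ref keyed by residue number
    my @sorted = sort { $a <=> $b } keys %$residues;
    return "$label Residues Chain $chain_id: " . join(', ', @sorted);
}

my %chain_a;
$chain_a{$_} = 1 for (119, 5, 10, 6);

print format_residues('Interfacing', 'A', \%chain_a), "\n";
# prints: Interfacing Residues Chain A: 5, 6, 10, 119
```

Building the whole line with `join` also removes the need for the `$c` counter that decides when to print a comma.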
Question by:paulieomeara

Expert Comment

by:ozo
ID: 10780904
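#!/usr/local/bin/perl

use strict;

my $datafile = '/home/paul/list/antotest';

# $chain[0]{chain}{residue}: interface residues (numeric atom field, e.g. "( 906)")
# $chain[1]{chain}{residue}: neighbouring residues (atom name field, e.g. "(CA)")
my @chain;

open FILE, $datafile or die "Can not open $datafile $!\n";

while (<FILE>) {
    # Match every "NNN(C)(X)" group on the line: residue, chain, atom field
    while( /(\d+)\((.)\)\(\s*(\w*)\)/g ){
        my($r,$c,$p)=($1,$2,$3);
        # $p !~ /\d+/ is false (index 0) for a numeric atom field, true (index 1) for "CA"
        $chain[$p!~/\d+/]{$c}{$r}++;
    }
}
close FILE;
open(OUTFILE,">antotest2")||die $!;
for my $part ( 0,1 ){
  foreach my $chain ( sort keys %{$chain[$part]} ){
    # Pick the heading word from an anonymous list by section index
    print OUTFILE "${[qw(Interfacing Neighboring)]}[$part] Residues Chain $chain: ";
    my $c = 0;
    foreach my $skey (sort {$a <=> $b} keys %{$chain[$part]{$chain}} ) {
        print  OUTFILE ","   if ($c > 0);
        $c++;
        printf OUTFILE "%-3d", $skey;
    }
    print OUTFILE "\n";
  }
}
close OUTFILE;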

Author Comment

by:paulieomeara
ID: 10781146
Hi ozo,

That's fantastic, thanks so much! Just one more thing: is there any way I can exclude the numbers that are in Interfacing Residues Chain A from Neighboring Residues Chain A, and likewise exclude the Interfacing Residues Chain B numbers from Neighboring Residues Chain B?

My current output is like this:
Interfacing Residues Chain A: 5  ,6  ,10 ,11 ,12 ,13 ,16 ,17 ,19 ,20 ,23 ,49 ,114,115,116,117,118,119
Interfacing Residues Chain B: 5  ,6  ,10 ,11 ,12 ,13 ,16 ,19 ,20 ,23 ,49 ,114,115,116,117,118,119
Neighboring Residues Chain A: 5  ,6  ,7  ,8  ,9  ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,48 ,49 ,50 ,113,114,115,116,117,118,119,120
Neighboring Residues Chain B: 5  ,6  ,7  ,8  ,9  ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,48 ,49 ,50 ,113,114,115,116,117,118,119,120




Accepted Solution

by: ozo
ID: 10781256
#!/usr/local/bin/perl

use strict;

my $datafile = '/home/paul/list/antotest';
my @chain;

open FILE, $datafile or die "Can not open $datafile $!\n";

# Assuming all the Interfacing Residue lines come before the Neighboring Residue lines
while (<FILE>) {
    while( /(\d+)\((.)\)\(\s*(\w*)\)/g ){
        my($r,$c,$p)=($1,$2,$3);
        # Skip any residue already recorded as interfacing for this chain
        next if $chain[0]{$c}{$r};
        $chain[$p!~/\d+/]{$c}{$r}++;
    }
}
close FILE;
open(OUTFILE,">antotest2")||die $!;
for my $part ( 0,1 ){
  foreach my $chain ( sort keys %{$chain[$part]} ){
    print OUTFILE "${[qw(Interfacing Neighboring)]}[$part] Residues Chain $chain: ";
    print OUTFILE join',',map{sprintf"%-3d",$_} sort{$a<=>$b}keys %{$chain[$part]{$chain}};
    print OUTFILE "\n";
  }
}
close OUTFILE;
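The accepted answer relies on every interface line appearing before the neighbouring-residue lines, as its opening comment says. If that ordering cannot be guaranteed, one alternative (a different technique from the accepted answer) is to record both sections in full and remove the interface residues only when printing. A minimal sketch, assuming the data is held in the same `$chain[$part]{$chain}{$residue}` shape as ozo's script; `exclude_interface` is a hypothetical helper name and the sample data is hand-built, not taken from the poster's file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. $chain->[0] holds interface residues and $chain->[1]
# holds neighbouring residues, each as { chain_id => { residue => count } }.
# Returns the neighbouring residues of chain $id that are NOT interface
# residues of the same chain, in numeric order.
sub exclude_interface {
    my ($chain, $id) = @_;
    my $iface = $chain->[0]{$id} || {};
    my $neigh = $chain->[1]{$id} || {};
    return sort { $a <=> $b } grep { !exists $iface->{$_} } keys %$neigh;
}

# Small hand-built example data.
my @chain = (
    { A => { 5 => 1, 6 => 1, 10 => 1 } },
    { A => { 5 => 1, 6 => 1, 7 => 1, 8 => 1, 10 => 1, 11 => 1 } },
);

print "Neighboring Residues Chain A: ",
    join(', ', exclude_interface(\@chain, 'A')), "\n";
# prints: Neighboring Residues Chain A: 7, 8, 11
```

Since the filtering happens after the whole file has been read, the order of lines in the input no longer matters.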