  • Status: Solved
  • Priority: Medium
  • Security: Public

question

Hi All,

I know I've been posting a lot of questions recently, but this is easy enough for you guys and difficult for me! I have a script below that parses the input file below and gives two lists of numbers, one for (A) and one for (B), in order and with each number occurring once. For example, it prints:
Interface residues A: 5, 6, 7, 9
Interface residues B: 8, 10, 11, 12
and so on, in a nice format, for the whole file. What I have to do is parse the first part of the file as I've done before, but when the format changes, as at the line "SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018", I need to print all these numbers, like before, but on separate lines headed "Neighbouring residues". So my output file looks like:
Interface residues A: (numbers A, first part of file)
Interface residues B: (numbers B, first part of file)
Neighbouring residues A: (numbers A, after format changes)
Neighbouring residues B: (numbers B, after format changes)

I know it's quite simple, but my efforts wouldn't work! Thanks


Input file:
PHE 119(A)( 906)   - THR  10(B)( 996)   :   4.441
PHE 119(A)( 911)   - GLN  11(B)(1002)   :   4.486
PHE 119(A)( 911)   - PRO  12(B)(1016)   :   4.203
PHE 119(A)( 914)   - TRP  13(B)(1025)   :   4.372
PHE 119(A)( 913)   - VAL  16(B)(1056)   :   3.810
PHE 119(A)( 913)   - PHE 119(B)(1874)   :   4.362
SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018
PRO   6(A)(CA)   - SER   5(A)(CA)   :   3.018
PRO   6(A)(CA)   - SER   7(A)(CA)   :   3.831
THR  10(A)(CA)   - SER   9(A)(CA)   :   3.816
THR  10(A)(CA)   - GLN  11(A)(CA)   :   3.778
GLN  11(A)(CA)   - THR  10(A)(CA)   :   3.778
GLN  11(B)(CA)   - PRO  12(B)(CA)   :   3.830
PRO  12(B)(CA)   - PRO   8(B)(CA)   :   4.140
PRO  12(B)(CA)   - GLN  11(B)(CA)   :   3.830


Script
#!/usr/local/bin/perl

use strict;

my $datafile = '/home/paul/list/antotest';

my %chain;

open FILE, $datafile or die "Can not open $datafile $!\n";

open(OUTFILE,">antotest2")||die;

while (<FILE>) {
    if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
     $chain{$2}->{$1} = $1;
     $chain{$4}->{$3} = $3;
    } else {
       print OUTFILE "";
    }
}
foreach my $chain (sort keys %chain) {
     print OUTFILE "Interfacing Residues Chain $chain: ";
     my $c = 0;
     foreach my $skey (sort {$a <=> $b} keys %{$chain{$chain}} ) {
          print  OUTFILE ","   if ($c > 0);
          $c++;
          printf OUTFILE "%-3d", $skey;
     }
     print OUTFILE "\n";
}
Asked by: paulieomeara
1 Solution
 
ozo Commented:
#!/usr/local/bin/perl

use strict;

my $datafile = '/home/paul/list/antotest';

my @chain;

open FILE, $datafile or die "Can not open $datafile $!\n";

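while (<FILE>) {
    # Each residue looks like "119(A)( 906)" or "5(A)(CA)":
    # $1 = residue number, $2 = chain letter, $3 = a number or a name such as CA.
    while( /(\d+)\((.)\)\(\s*(\w*)\)/g ){
        my($r,$c,$p)=($1,$2,$3);
        # $p!~/\d+/ is false (index 0, interfacing) when the third field is a number
        # and true (index 1, neighboring) when it is a name such as CA.
        $chain[$p!~/\d+/]{$c}{$r}++;
    }
}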
open(OUTFILE,">antotest2")||die $!;
for my $part ( 0,1 ){
  foreach my $chain ( sort keys %{$chain[$part]} ){
    print OUTFILE "${[qw(Interfacing Neighboring)]}[$part] Residues Chain $chain: ";
    my $c = 0;
    foreach my $skey (sort {$a <=> $b} keys %{$chain[$part]{$chain}} ) {
        print  OUTFILE ","   if ($c > 0);
        $c++;
        printf OUTFILE "%-3d", $skey;
    }
    print OUTFILE "\n";
  }
}
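
To see how the pattern in the script above sorts residues into the two groups, here is a minimal sketch that runs it on two lines from the input file in the question. The helper name `classify` and the `@label` array are made up for this illustration and are not part of the original script; the digit test mirrors the `$p!~/\d+/` check, just written the other way round.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample lines copied from the input file in the question.
my @lines = (
    'PHE 119(A)( 906)   - THR  10(B)( 996)   :   4.441',
    'SER   5(A)(CA)   - PRO   6(A)(CA)   :   3.018',
);

my @label = ( 'Interfacing', 'Neighboring' );

# classify() is a hypothetical helper: it returns one [part, chain, residue]
# triple for every "number(chain)(field)" group found on the line.
sub classify {
    my ($line) = @_;
    my @found;
    while ( $line =~ /(\d+)\((.)\)\(\s*(\w*)\)/g ) {
        my ( $residue, $chain, $field ) = ( $1, $2, $3 );
        # part 0 when the third field contains a digit, part 1 otherwise (e.g. CA)
        my $part = $field =~ /\d/ ? 0 : 1;
        push @found, [ $part, $chain, $residue ];
    }
    return @found;
}

for my $line (@lines) {
    for my $hit ( classify($line) ) {
        my ( $part, $chain, $residue ) = @$hit;
        print "$label[$part]: chain $chain, residue $residue\n";
    }
}
```

The first line yields two interfacing hits (119 on chain A, 10 on chain B) and the second yields two neighboring hits (5 and 6 on chain A). The trailing distance such as 4.441 is never picked up because it is not followed by a parenthesised chain letter.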
 
paulieomeara (Author) Commented:
Hi ozo,

That's fantastic, thanks so much! Just one more thing: is there any way that I can exclude the numbers that are in Interfacing Residues Chain A from Neighboring Residues Chain A, and the same for Chain B?

My current output is like this:
Interfacing Residues Chain A: 5  ,6  ,10 ,11 ,12 ,13 ,16 ,17 ,19 ,20 ,23 ,49 ,114,115,116,117,118,119
Interfacing Residues Chain B: 5  ,6  ,10 ,11 ,12 ,13 ,16 ,19 ,20 ,23 ,49 ,114,115,116,117,118,119
Neighboring Residues Chain A: 5  ,6  ,7  ,8  ,9  ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,48 ,49 ,50 ,113,114,115,116,117,118,119,120
Neighboring Residues Chain B: 5  ,6  ,7  ,8  ,9  ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,48 ,49 ,50 ,113,114,115,116,117,118,119,120



 
ozo Commented:
# Replaces the read loop and output section of the script above.
# Assumes all the Interfacing Residue lines come before the Neighboring Residue lines.
while (<FILE>) {
    while( /(\d+)\((.)\)\(\s*(\w*)\)/g ){
        my($r,$c,$p)=($1,$2,$3);
        next if $chain[0]{$c}{$r};    # already seen as an interfacing residue
        $chain[$p!~/\d+/]{$c}{$r}++;
    }
}
close FILE;
open(OUTFILE,">antotest2")||die $!;
for my $part ( 0,1 ){
  foreach my $chain ( sort keys %{$chain[$part]} ){
    print OUTFILE "${[qw(Interfacing Neighboring)]}[$part] Residues Chain $chain: ";
    print OUTFILE join',',map{sprintf"%-3d",$_} sort{$a<=>$b}keys %{$chain[$part]{$chain}};
    print OUTFILE "\n";
  }
}
close OUTFILE;
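
The version above relies on the interfacing lines coming first in the file. If that ordering cannot be guaranteed, one alternative (a different technique from the `next if` check, sketched here under that assumption) is to collect both groups first and then delete the interfacing residues from the neighboring set afterwards. The helper name `collect` and the `@label` array are hypothetical; the sample lines are taken from the input file in the question, deliberately listed with the neighboring lines first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# collect() is a hypothetical helper for this sketch, not part of the original script.
sub collect {
    my (@lines) = @_;
    my @chain;    # $chain[0] = interfacing, $chain[1] = neighboring
    for my $line (@lines) {
        while ( $line =~ /(\d+)\((.)\)\(\s*(\w*)\)/g ) {
            my ( $r, $c, $p ) = ( $1, $2, $3 );
            $chain[ $p =~ /\d/ ? 0 : 1 ]{$c}{$r}++;
        }
    }
    # Set difference: drop every interfacing residue from the neighboring set.
    for my $c ( keys %{ $chain[1] || {} } ) {
        delete $chain[1]{$c}{$_} for keys %{ $chain[0]{$c} || {} };
        delete $chain[1]{$c} unless %{ $chain[1]{$c} };    # no residues left
    }
    return @chain;
}

# Neighboring lines come first here to show that order does not matter.
my @chain = collect(
    'GLN  11(B)(CA)   - PRO  12(B)(CA)   :   3.830',
    'PRO  12(B)(CA)   - PRO   8(B)(CA)   :   4.140',
    'PHE 119(A)( 911)   - GLN  11(B)(1002)   :   4.486',
    'PHE 119(A)( 911)   - PRO  12(B)(1016)   :   4.203',
);

my @label = ( 'Interfacing', 'Neighboring' );
for my $part ( 0, 1 ) {
    for my $c ( sort keys %{ $chain[$part] || {} } ) {
        print "$label[$part] Residues Chain $c: ",
            join( ',', sort { $a <=> $b } keys %{ $chain[$part]{$c} } ), "\n";
    }
}
```

For these four lines it prints:

Interfacing Residues Chain A: 119
Interfacing Residues Chain B: 11,12
Neighboring Residues Chain B: 8

Residues 11 and 12 on chain B are removed from the neighboring list because they also appear as interfacing residues.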