  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 269

Perl script

Hi ,

I need a Perl script to do the following, because I'm having real problems with it! I have a file which contains text (see the example below). I need to parse this file and create an output for the 2 chains in the file (A and D). The numbers I'm interested in capturing are, for example, 38 and 146 in the first line below. I need all the numbers for every A and every D in the file. For the entire file I need my output to look like this, with every position captured:
E.g.
Chain A: 34,35,38,39,41
Chain D: 37,38,141,143
I'm having major problems with it, so any help would be great!


Example Text:

PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840
THR  39(A)( 298)   - PRO 100(D)(4058)   :   3.748
LYS  41(A)( 315)   - HIS 146(D)(4409)   :   3.787
THR  42(A)( 320)   - HIS  97(D)(4030)   :   3.683
THR  42(A)( 322)   - ASP  99(D)(4044)   :   3.780
ARG 142(A)(1084)   - TYR  35(D)(3553)   :   3.788
ARG 142(A)(1079)   - PRO  36(D)(3564)   :   3.809
ARG 142(A)(1081)   - TRP  37(D)(3578)   :   4.002
PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092
PRO  38(A)(CA)   - LEU  35(A)(CA)   :   5.761
PRO  38(A)(CA)   - PHE  37(A)(CA)   :   3.800
PRO  38(A)(CA)   - THR  39(A)(CA)   :   3.775
ARG 142(A)(CA)   - TYR 141(A)(CA)   :   3.792
VAL  34(D)(CA)   - ARG  30(D)(CA)   :   5.766
VAL  34(D)(CA)   - LEU  31(D)(CA)   :   5.397
VAL  34(D)(CA)   - LEU  32(D)(CA)   :   5.830
VAL  34(D)(CA)   - VAL  33(D)(CA)   :   3.816
VAL  34(D)(CA)   - TYR  35(D)(CA)   :   3.823
VAL  34(D)(CA)   - PRO  36(D)(CA)   :   5.936
TYR  35(D)(CA)   - LEU  31(D)(CA)   :   5.682
TYR  35(D)(CA)   - LEU  32(D)(CA)   :   5.579
Asked by: paulieomeara
1 Solution
 
ozo Commented:
Could you please explain the relationship between your example output and the example text?
 
paulieomeara (Author) Commented:
The example text above contains a list of interactions between (A) and (D), (A) and (A), and also (D) and (D). I want to capture the numbers before all the (A)'s and all the (D)'s in the input file, as shown above. I also want to remove redundancy, so if one of these numbers occurs more than once, I will only have it once in my output file. My output file should be split into two parts, (A) and (D), with each number that occurs in (A) and (D) appearing once and with the numbers in order, like:
E.g.
Chain (A): 34,35,38,39,41
Chain (D): 37,38,141,143
 

Does this make it clearer?
 
Tintin Commented:
#!/usr/bin/perl
use strict;
use warnings;
my $datafile = '/path/to/data.dat';
my %chain;

# Three-argument open with a lexical filehandle
open my $fh, '<', $datafile or die "Can not open $datafile: $!\n";

while (<$fh>) {
        # First number on the line, one separator character, then the chain letter
        if (/(\d+).(.)/) {
                $chain{$2}->{$1} = $1;
        }
        else {
                print "Incorrect format: $_\n";
        }
}

close $fh;

foreach my $chain (keys %chain) {
        print "Chain $chain: ";
        print join(',',keys %{$chain{$chain}}) . "\n";
}

The output from your sample data is:

Chain A: 142,38,39,41,42
Chain D: 34,35

You don't specify if you want the output sorted or not, so let me know if this is a requirement.
 
ozo Commented:
OK, then why doesn't chain (A) contain 42,141,142,
and why doesn't chain (D) contain 31,32,34,35,36,97,99,100,146?
 
ozo Commented:
Tintin's sample output does not match your sample output.
Which, if any, is correct?
 
paulieomeara (Author) Commented:
Hi Tintin,

Thanks for that, but I do need the data sorted so it's in order.

Ozo, that was just an example output. It will contain all the numbers in both the chains.
 
Tintin Commented:
I misunderstood the requirements.

Change the if test to:

        if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
                $chain{$2}->{$1} = $1;
                $chain{$4}->{$3} = $3;
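
To see what the revised pattern captures, here is a minimal sketch that runs it against one line of the sample data. The sub name `residue_pairs` is made up for illustration and is not part of the script above; the sketch assumes every line has exactly two residues separated by a "-".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (number, chain, number, chain) for one
# interaction line, or an empty list if the line does not match.
sub residue_pairs {
    my ($line) = @_;
    # $1/$2: first residue number and chain; $3/$4: the residue after the "-"
    if ($line =~ /(\d+)\((.).*-.*\s+(\d+)\((.)/) {
        return ($1, $2, $3, $4);
    }
    return;
}

my @fields = residue_pairs('PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840');
print join(' ', @fields), "\n";    # prints: 38 A 146 D
```

Both residues on the line end up in the hash, which the single-capture version of the test did not do.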
 
paulieomeara (Author) Commented:
My output will include all the numbers before all the (A)'s and all the (D)'s in the file. I want only one copy of each number, and I need the output to be in order.

E.g., from the above file sample:
Chain (A): 34,35,37,38,39,41,42,141,142
Chain (D): 30,31,32,33,34,35,36,37,99,100,146
 
Tintin Commented:
Putting the whole thing together:

#!/usr/bin/perl
use strict;
use warnings;
my $datafile = 'data';
my %chain;

# Three-argument open with a lexical filehandle
open my $fh, '<', $datafile or die "Can not open $datafile: $!\n";

while (<$fh>) {
        # $1/$2: residue number and chain before the "-"; $3/$4: the pair after it
        if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
                $chain{$2}->{$1} = $1;
                $chain{$4}->{$3} = $3;
        }
        else {
                print "Incorrect format: $_\n";
        }
}

close $fh;

# Chains in alphabetical order, residue numbers in numeric order
foreach my $chain (sort keys %chain) {
        print "Chain $chain: ";
        print join(',',sort {$a <=> $b} keys %{$chain{$chain}} ) . "\n";
}


Output is now:

Chain A: 34,35,37,38,39,41,42,141,142
Chain D: 30,31,32,33,34,35,36,37,97,99,100,146
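
The `sort {$a <=> $b}` in the last loop is what puts the numbers in order: Perl's default sort compares strings, so 100 would come before 35. A small sketch of the difference, using a few residue numbers from the sample (the variable names are only for illustration):

```perl
use strict;
use warnings;

my @residues = (146, 100, 97, 35, 36, 37);

my @as_strings = sort @residues;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @residues;   # numeric comparison

print "string sort:  @as_strings\n";   # 100 146 35 36 37 97
print "numeric sort: @as_numbers\n";   # 35 36 37 97 100 146
```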
 
paulieomeara (Author) Commented:
Hi Tintin,

That works good....but I need them to work in order?  Is this possible?
 
paulieomeara (Author) Commented:
Thank you...that works great.....eased my mind!
 
ozo Commented:
Sorry, I was trying to reproduce your sample output from your sample input.
Without that requirement the program is simple:

#!/usr/bin/perl
use strict;
use warnings;
my $datafile = shift || "Example.text";
my %chain;

open my $fh, '<', $datafile or die "Can not open $datafile: $!\n";

while( <$fh> ){
        # collect every number(chain) pair on the line, not just the first
        $chain{$2}->{$1}++ while /(\d+)\((\w)\)/g;
}

close $fh;

print "Chain ($_): ",join(",",sort{$a<=>$b}keys %{$chain{$_}}),"\n" for sort keys %chain;

__DATA__
This produces
Chain (A): 34,35,37,38,39,41,42,141,142
Chain (D): 30,31,32,33,34,35,36,37,97,99,100,146
from
PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840
THR  39(A)( 298)   - PRO 100(D)(4058)   :   3.748
LYS  41(A)( 315)   - HIS 146(D)(4409)   :   3.787
THR  42(A)( 320)   - HIS  97(D)(4030)   :   3.683
THR  42(A)( 322)   - ASP  99(D)(4044)   :   3.780
ARG 142(A)(1084)   - TYR  35(D)(3553)   :   3.788
ARG 142(A)(1079)   - PRO  36(D)(3564)   :   3.809
ARG 142(A)(1081)   - TRP  37(D)(3578)   :   4.002
PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092
PRO  38(A)(CA)   - LEU  35(A)(CA)   :   5.761
PRO  38(A)(CA)   - PHE  37(A)(CA)   :   3.800
PRO  38(A)(CA)   - THR  39(A)(CA)   :   3.775
ARG 142(A)(CA)   - TYR 141(A)(CA)   :   3.792
VAL  34(D)(CA)   - ARG  30(D)(CA)   :   5.766
VAL  34(D)(CA)   - LEU  31(D)(CA)   :   5.397
VAL  34(D)(CA)   - LEU  32(D)(CA)   :   5.830
VAL  34(D)(CA)   - VAL  33(D)(CA)   :   3.816
VAL  34(D)(CA)   - TYR  35(D)(CA)   :   3.823
VAL  34(D)(CA)   - PRO  36(D)(CA)   :   5.936
TYR  35(D)(CA)   - LEU  31(D)(CA)   :   5.682
TYR  35(D)(CA)   - LEU  32(D)(CA)   :   5.579
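
For anyone adapting this, here is a minimal sketch of the same global-match idea wrapped in a sub that takes lines directly instead of reading a file. The names `collect_chains` and `@sample` are made up here, and only three lines of the sample data are used.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each chain letter to a sorted list of the
# unique residue numbers seen for it.
sub collect_chains {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        # every "number(chain)" pair on the line, not just the first
        while ($line =~ /(\d+)\((\w)\)/g) {
            $seen{$2}{$1}++;
        }
    }
    my %result;
    for my $chain (keys %seen) {
        $result{$chain} = [ sort { $a <=> $b } keys %{ $seen{$chain} } ];
    }
    return %result;
}

my @sample = (
    'PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840',
    'ARG 142(A)(1081)   - TRP  37(D)(3578)   :   4.002',
    'PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092',
);
my %chains = collect_chains(@sample);
print "Chain ($_): ", join(',', @{ $chains{$_} }), "\n" for sort keys %chains;
# prints:
# Chain (A): 34,38,142
# Chain (D): 37,146
```

Because the hash keys are the residue numbers, duplicates such as the repeated 38(A) collapse on their own, and the numeric sort handles the ordering.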

Question has a verified solution.
