
Perl script

Posted on 2004-03-30
Last Modified: 2010-03-04
Hi,

I need a Perl script to do the following, because I'm having real problems with it! I have a file which contains text (see the example below). I need to parse this file and create an output for the two chains in the file (A and D). The numbers I'm interested in capturing are, for example, 38 and 146 in the first line below. I need all the numbers for every A and every D in the file. For the entire file, I need my output to look like this:
e.g.
Chain A :34,35,38,39,41
Chain D :37,38,141,143
with every position captured. I'm having major problems with it, so any help would be great!


Example Text:

PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840
THR  39(A)( 298)   - PRO 100(D)(4058)   :   3.748
LYS  41(A)( 315)   - HIS 146(D)(4409)   :   3.787
THR  42(A)( 320)   - HIS  97(D)(4030)   :   3.683
THR  42(A)( 322)   - ASP  99(D)(4044)   :   3.780
ARG 142(A)(1084)   - TYR  35(D)(3553)   :   3.788
ARG 142(A)(1079)   - PRO  36(D)(3564)   :   3.809
ARG 142(A)(1081)   - TRP  37(D)(3578)   :   4.002
PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092
PRO  38(A)(CA)   - LEU  35(A)(CA)   :   5.761
PRO  38(A)(CA)   - PHE  37(A)(CA)   :   3.800
PRO  38(A)(CA)   - THR  39(A)(CA)   :   3.775
ARG 142(A)(CA)   - TYR 141(A)(CA)   :   3.792
VAL  34(D)(CA)   - ARG  30(D)(CA)   :   5.766
VAL  34(D)(CA)   - LEU  31(D)(CA)   :   5.397
VAL  34(D)(CA)   - LEU  32(D)(CA)   :   5.830
VAL  34(D)(CA)   - VAL  33(D)(CA)   :   3.816
VAL  34(D)(CA)   - TYR  35(D)(CA)   :   3.823
VAL  34(D)(CA)   - PRO  36(D)(CA)   :   5.936
TYR  35(D)(CA)   - LEU  31(D)(CA)   :   5.682
TYR  35(D)(CA)   - LEU  32(D)(CA)   :   5.579
Question by:paulieomeara
 

Expert Comment

by:ozo
ID: 10720667
Could you please explain the relationship between your example output and the example text?
 

Author Comment

by:paulieomeara
ID: 10720735
The example text above contains a list of interactions between (A) and (D), (A) and (A), and also (D) and (D). I want to capture the numbers before all the (A)'s and all the (D)'s in both lists of the input file, as shown above. I also want to remove redundancy: if a number occurs more than once, it should appear only once in my output file. My output file should be split into two parts, (A) and (D), with each number that occurs in (A) or (D) listed once, in order, like:
EG
Chain (A) : 34,35,38,39,41
Chain (D) : 37,38,141,143      
 

Does this make it clearer?
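For the "only once" part, a Perl hash does most of the work: hash keys are unique, so storing each number as a key drops the repeats, and a numeric sort puts what is left in order. A minimal sketch of just that step, assuming the chain D numbers have already been pulled out of the file (the list below is hand-copied from the example text for illustration, not parsed):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chain D numbers in the order they appear in the example text,
# repeats included (illustrative values, not read from a file).
my @seen_in_order = (146, 100, 146, 97, 99, 35, 36, 37, 35);

# Using each number as a hash key keeps only one copy of it.
my %unique;
$unique{$_} = 1 for @seen_in_order;

# Sort numerically so that 97 comes before 100.
my @residues = sort { $a <=> $b } keys %unique;

print "Chain (D): ", join(',', @residues), "\n";
# prints "Chain (D): 35,36,37,97,99,100,146"
```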
 

Expert Comment

by:Tintin
ID: 10720775
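#!/usr/bin/perl
use strict;
use warnings;

my $datafile = '/path/to/data.dat';
my %chain;

open my $fh, '<', $datafile or die "Cannot open $datafile: $!\n";

while (<$fh>) {
        # Number and chain letter of the first residue on each line only
        if (/(\d+).(.)/) {
                $chain{$2}->{$1} = $1;
        }
        else {
                print "Incorrect format: $_";    # $_ already ends in a newline
        }
}

close $fh;

# Hash keys come back in no particular order, so this output is unsorted.
foreach my $chain (keys %chain) {
        print "Chain $chain: ";
        print join(',', keys %{ $chain{$chain} }) . "\n";
}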

The output from your sample data is:

Chain A: 142,38,39,41,42
Chain D: 34,35

You didn't specify whether you want the output sorted, so let me know if this is a requirement.
 

Expert Comment

by:ozo
ID: 10720793
OK, then why doesn't chain (A) contain 42,141,142?
And why doesn't chain (D) contain 31,32,34,35,36,97,99,100,146?
 

Expert Comment

by:ozo
ID: 10720828
Tintin's sample output does not match your sample output.
Which, if any, is correct?
 

Author Comment

by:paulieomeara
ID: 10720830
Hi Tintin,

Thanks for that, but I do need the data sorted so it's in order.

Ozo, that was just an example output. It will contain all the numbers in both chains.
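Two Perl details matter once sorting is added: `keys` returns hash keys in no particular order (which is why the first script printed 142 before 38), and a plain `sort` compares values as strings rather than numbers. A small sketch using a few chain A numbers from the example shows the difference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (142, 38, 39, 41, 42, 141, 34);

# Default sort compares strings character by character, so "142" < "34".
my @as_strings = sort @numbers;

# A numeric comparison block gives the expected ascending order.
my @as_numbers = sort { $a <=> $b } @numbers;

print "default: ", join(',', @as_strings), "\n";   # prints "default: 141,142,34,38,39,41,42"
print "numeric: ", join(',', @as_numbers), "\n";   # prints "numeric: 34,38,39,41,42,141,142"
```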

Expert Comment

by:Tintin
ID: 10720836
I misunderstood the requirements.

Change the if test to:

        if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
                $chain{$2}->{$1} = $1;    # residue before the " - "
                $chain{$4}->{$3} = $3;    # residue after the " - "
        }
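To see what the revised test captures, here is a sketch that runs the same pattern over two lines copied from the example text. `$1`/`$2` are the number and chain before the ` - ` separator, `$3`/`$4` the ones after it. Because `(.)` grabs a single character, this assumes one-letter chain IDs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    "PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840\n",
    "ARG 142(A)(CA)   - TYR 141(A)(CA)   :   3.792\n",
);

my @captured;
for my $line (@lines) {
    # Same pattern as the revised if test above
    if ($line =~ /(\d+)\((.).*-.*\s+(\d+)\((.)/) {
        push @captured, "$1$2 $3$4";    # number+chain on each side
    }
}

print "$_\n" for @captured;
# prints "38A 146D" then "142A 141A"
```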
 

Author Comment

by:paulieomeara
ID: 10720869
My output will include all the numbers before all the (A)'s and all the (D)'s in the file. I want only one copy of each number, and I need the output to be in order.

E.g., from the above file sample:
Chain (A): 34,35,37,38,39,41,42,141,142
Chain (D): 30,31,32,33,34,35,36,37,97,99,100,146
 

Accepted Solution

by:Tintin
ID: 10720887
Putting the whole thing together:

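#!/usr/bin/perl
use strict;
use warnings;

my $datafile = 'data';
my %chain;    # $chain{chain letter}{residue number} = residue number

open my $fh, '<', $datafile or die "Cannot open $datafile: $!\n";

while (<$fh>) {
        # Capture number and chain on both sides of the " - " separator.
        if (/(\d+)\((.).*-.*\s+(\d+)\((.)/) {
                $chain{$2}->{$1} = $1;
                $chain{$4}->{$3} = $3;
        }
        else {
                print "Incorrect format: $_";    # $_ already ends in a newline
        }
}

close $fh;

# Print each chain with its residue numbers in ascending numeric order.
foreach my $chain (sort keys %chain) {
        print "Chain $chain: ";
        print join(',', sort { $a <=> $b } keys %{ $chain{$chain} }) . "\n";
}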


Output is now:

Chain A: 34,35,37,38,39,41,42,141,142
Chain D: 30,31,32,33,34,35,36,37,97,99,100,146
 

Author Comment

by:paulieomeara
ID: 10720889
Hi Tintin,

That works well, but I need them in order. Is this possible?
 

Author Comment

by:paulieomeara
ID: 10720910
Thank you...that works great.....eased my mind!

Expert Comment

by:ozo
ID: 10720992
Sorry, I was trying to reproduce your sample output from your sample input.
Without that requirement, the program is simple:

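#!/usr/bin/perl
use strict;
use warnings;

my $datafile = shift || "Example.text";    # file name from the command line, or a default
my %chain;

open my $fh, '<', $datafile or die "Cannot open $datafile: $!\n";

while (<$fh>) {
        # Every "number(letter)" on the line is a residue and its chain;
        # "( 291)" and "(CA)" don't match because no digits precede the "(".
        $chain{$2}->{$1}++ while /(\d+)\((\w)\)/g;
}

close $fh;

print "Chain ($_): ", join(",", sort { $a <=> $b } keys %{ $chain{$_} }), "\n" for sort keys %chain;
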
This produces
Chain (A): 34,35,37,38,39,41,42,141,142
Chain (D): 30,31,32,33,34,35,36,37,97,99,100,146
from
PRO  38(A)( 291)   - HIS 146(D)(4409)   :   3.840
THR  39(A)( 298)   - PRO 100(D)(4058)   :   3.748
LYS  41(A)( 315)   - HIS 146(D)(4409)   :   3.787
THR  42(A)( 320)   - HIS  97(D)(4030)   :   3.683
THR  42(A)( 322)   - ASP  99(D)(4044)   :   3.780
ARG 142(A)(1084)   - TYR  35(D)(3553)   :   3.788
ARG 142(A)(1079)   - PRO  36(D)(3564)   :   3.809
ARG 142(A)(1081)   - TRP  37(D)(3578)   :   4.002
PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092
PRO  38(A)(CA)   - LEU  35(A)(CA)   :   5.761
PRO  38(A)(CA)   - PHE  37(A)(CA)   :   3.800
PRO  38(A)(CA)   - THR  39(A)(CA)   :   3.775
ARG 142(A)(CA)   - TYR 141(A)(CA)   :   3.792
VAL  34(D)(CA)   - ARG  30(D)(CA)   :   5.766
VAL  34(D)(CA)   - LEU  31(D)(CA)   :   5.397
VAL  34(D)(CA)   - LEU  32(D)(CA)   :   5.830
VAL  34(D)(CA)   - VAL  33(D)(CA)   :   3.816
VAL  34(D)(CA)   - TYR  35(D)(CA)   :   3.823
VAL  34(D)(CA)   - PRO  36(D)(CA)   :   5.936
TYR  35(D)(CA)   - LEU  31(D)(CA)   :   5.682
TYR  35(D)(CA)   - LEU  32(D)(CA)   :   5.579
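ozo's loop uses `/g` in scalar context, one match per iteration. The same pattern in list context returns every (number, chain) pair on a line at once, which can be handy for checking what gets picked up. A sketch over two lines from the example; the `@pairs`/`splice` handling is just one way to walk the flat list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    "THR  42(A)( 320)   - HIS  97(D)(4030)   :   3.683",
    "PRO  38(A)(CA)   - PHE  34(A)(CA)   :   5.092",
);

my %chain;
for my $line (@lines) {
    # List-context //g: each match contributes (number, chain),
    # so the first line yields (42, 'A', 97, 'D').
    my @pairs = $line =~ /(\d+)\((\w)\)/g;
    while (@pairs) {
        my ($num, $id) = splice @pairs, 0, 2;
        $chain{$id}{$num}++;    # count occurrences; keys give the unique set
    }
}

for my $id (sort keys %chain) {
    print "Chain ($id): ", join(',', sort { $a <=> $b } keys %{ $chain{$id} }), "\n";
}
# prints "Chain (A): 34,38,42" then "Chain (D): 97"
```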
