Solved

final output question ever!

Posted on 2004-04-12
Last Modified: 2010-03-04
Hi guys,

OK, this is the last question I shall ever ask about Perl... I promise! Ozo was helping me with this, and I thought I could fix it, but due to my Perl dyslexia I failed!
Basically, I have an output like this:

Output file:
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114
,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P
,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116
,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H
,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115
,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A
,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113
,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V
,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA:1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,
37 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,
68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,
134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,
P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,
N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,
T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB:1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,
21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,
73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,
135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,
D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,
D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,
A  ,N  ,H  ,K  ,Y  ,H

I have to parse the file above. If any number from the "Interfacing Residues Chain A" line or the "Neighboring Residues Chain A" line occurs in the "Surface Residues chainA" line, I have to remove that number from "Surface Residues chainA", along with its corresponding "Surface Resitype chainA" letter (which lies directly beneath it), without altering the format I have! The same has to be done for chain B: any number from "Interfacing Residues Chain B" or "Neighboring Residues Chain B" must be removed from "Surface Residues chainB", along with its corresponding "Surface Resitype chainB" letter.

Ozo's program looked like this:

# Slurp the whole input file into $_.
open (IN,'1bbbAB60') || die "Unable to open the Input File";
undef($/); $_=<IN>; close IN;

# Collect the residue numbers to exclude for each chain.
@ChainA=();  @ChainB=();
if (m#Interfacing Residues Chain A: ?([\s,\d]*)#) {push(@ChainA,split(/\s*,\s*/,$1))};
if (m#Neighboring Residues Chain A: ?([\s,\d]*)#) {push(@ChainA,split(/\s*,\s*/,$1))};
if (m#Interfacing Residues Chain B: ?([\s,\d]*)#) {push(@ChainB,split(/\s*,\s*/,$1))};
if (m#Neighboring Residues Chain B: ?([\s,\d]*)#) {push(@ChainB,split(/\s*,\s*/,$1))};

# Collect the surface residue numbers for each chain.
@SurfaceResidueA=();  @SurfaceResidueB=();
if (m#Surface Residues chainA: ?([\s\d,]*)#) {push(@SurfaceResidueA,split(/\s*,\s*/,$1))};
if (m#Surface Residues chainB: ?([\s,\d]*)#) {push(@SurfaceResidueB,split(/\s*,\s*/,$1))};

# Turn each exclusion list into a lookup hash, then rewrite the
# Surface Residues line with only the numbers that are not in it.
@ChainA{@ChainA}=(1)x@ChainA;
$SRA=join(",", map{sprintf("%-3s", $_)} grep {!$ChainA{$_}} @SurfaceResidueA);
s#(Surface Residues chainA:)[\d\s,]*#$1 $SRA#;
@ChainB{@ChainB}=(1)x@ChainB;
$SRB=join(",", map{sprintf("%-3s", $_)} grep {!$ChainB{$_}} @SurfaceResidueB);
s#(Surface Residues chainB:)[\d\s,]*#$1 $SRB#;

# Write the modified text to the output file.
open (OUT,">outty.txt") or die $!;
print OUT;

But unfortunately, it was still not removing the numbers from "Surface Residues chainA" (or chainB), nor the corresponding "Surface Resitype chainA" (or chainB) letters.
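
As posted, the program only rewrites the "Surface Residues" lines and never touches the "Surface Resitype" lines. A minimal sketch of one way to drop a number and the letter at the same position together is shown here. The helper names fields, filter_pairs and pad_join are hypothetical, and the short sample strings stand in for the real lines; it assumes the two lists are the same length and in the same order.

```perl
use strict;
use warnings;

# Split a comma-separated record into fields, trimming the padding
# and any trailing newline from each field.
sub fields {
    my ($text) = @_;
    my @f = split /,/, $text;
    s/^\s+|\s+$//g for @f;
    return grep { length } @f;
}

# Drop every residue number found in %$exclude, together with the
# residue type at the same position. Returns two array references.
sub filter_pairs {
    my ($numbers, $types, $exclude) = @_;
    my (@keep_num, @keep_type);
    for my $i (0 .. $#$numbers) {
        next if $exclude->{ $numbers->[$i] };
        push @keep_num,  $numbers->[$i];
        push @keep_type, $types->[$i];
    }
    return (\@keep_num, \@keep_type);
}

# Re-pad each field to three characters, as in the original file.
sub pad_join {
    return join ',', map { sprintf '%-3s', $_ } @_;
}

# Shortened sample data (an assumption, not the full file).
my @numbers = fields("1  ,3  ,4  ,37 ,38 \n");
my @types   = fields("V  ,S  ,P  ,P  ,T  \n");
my %exclude = map { ($_ => 1) } fields("30 ,31 ,37 "), fields("29 ,32 ,33 ");

my ($num, $type) = filter_pairs(\@numbers, \@types, \%exclude);
print 'Surface Residues chainA:', pad_join(@$num),  "\n";
print 'Surface Resitype chainA:', pad_join(@$type), "\n";
```

Walking both lists by index keeps each letter tied to its number, which two independent substitutions cannot guarantee.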

Ozo's output is below, in case he's here and can help me, because he's a genius!
Thanks, Sarah XX



1bbbAB
Interfacing Residues Chain A: 30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interfacing Residues Chain B: 30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Neighbouring Residues Chain A: 29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighbouring Residues Chain B: 29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132

1bbbAB
Surface Residues chainA: 1  ,4  ,8  ,15 ,16 ,19 ,38 ,41 ,44 ,45 ,50 ,51 ,53 ,61
,64 ,71 ,74 ,82 ,85 ,90 ,92 ,115,139,141
Surface Resitype chainA:V  ,P  ,T  ,G  ,K  ,A  ,T  ,T  ,P  ,H  ,H  ,G  ,A  ,K  ,
D  ,A  ,D  ,A  ,D  ,K  ,R  ,A  ,K  ,R
Surface Residues chainB: 2  ,5  ,6  ,9  ,16 ,21 ,22 ,40 ,43 ,44 ,47 ,49 ,52 ,56
,76 ,79 ,87 ,95 ,97 ,99 ,120,146
Surface Resitype chainB:H  ,P  ,E  ,S  ,G  ,D  ,E  ,R  ,E  ,S  ,D  ,S  ,D  ,G  ,
A  ,D  ,T  ,K  ,H  ,D  ,K  ,H
Question by:sarahJo
5 Comments
 
LVL 4

Accepted Solution

by:
vi_srikanth earned 500 total points
ID: 10804317
Could you clarify something for me? You said that the input to the above program looks like this:

Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114
,117,118,119,120,122,123,126
.
.
.

If you look at the above input, there are line breaks (newline characters) within each record; for example, there is a line break after 114. Is this really the case, or were these line breaks introduced when you posted your question? In other words, will the input to the above program have line breaks within each record or not? If it does, we might have to tweak the code a little. Tell me if I'm not being clear.
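
If the real input did contain such breaks, one possible tweak is to join the wrapped records before parsing. The following is only a sketch: the unwrap helper is hypothetical, and it assumes a wrap always falls directly before or after a comma, so it would not repair a break in the middle of a number.

```perl
use strict;
use warnings;

# Join wrapped records: a line break directly before or after a comma
# is treated as a wrap, not as the end of a record.
sub unwrap {
    my ($text) = @_;
    $text =~ s/\n(?=,)//g;       # break before a leading comma
    $text =~ s/,[ \t]*\n/,/g;    # break after a trailing comma
    return $text;
}

# Shortened sample records (an assumption, not the full file).
my $wrapped = "Interfacing Residues Chain A:30 ,31 ,114\n,117,126\n"
            . "Surface Residues chainA:1  ,3  ,\n37 ,38 \n";

print unwrap($wrapped);
```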

Author Comment

by:sarahJo
ID: 10804333
Hi vi_srikanth,

No... each record in the input will be on a single line,
like this:
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103

Sorry, it just got corrupted when I pasted it in. Thanks! Sarah
LVL 4

Expert Comment

by:vi_srikanth
ID: 10804357
Run on the input below, the above program produces the output below. You said that "it was still not removing the numbers from ...". Can you pinpoint exactly which number was retained? For example, the input has "37" in Surface Residues chainA, and it has been deleted in the final output.

Input:
--------------------------------------
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA:1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,37 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,
68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB:1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,A  ,N  ,H  ,K  ,Y  ,H


Output:
----------------------------------------
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA: 1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB: 1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,A  ,N  ,H  ,K  ,Y  ,H
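
To pinpoint any number that was retained, a quick check along these lines may help. It is a sketch only: the overlap helper is hypothetical, and the short lists stand in for the parsed "Surface Residues" and "Interfacing"/"Neighboring" numbers.

```perl
use strict;
use warnings;

# Return the residue numbers that appear in both lists, in the order
# in which they occur in the first list.
sub overlap {
    my ($surface, $exclude) = @_;
    my %seen = map { ($_ => 1) } @$exclude;
    return grep { $seen{$_} } @$surface;
}

# Shortened sample data (an assumption, not the full file).
my @surface = (1, 3, 37, 38, 130);
my @exclude = (29, 30, 37, 126);

my @left = overlap(\@surface, \@exclude);
print @left ? "still present: @left\n" : "no overlap\n";
```

Running the same check on the output file's surface list should report no overlap once the filtering has worked.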

Author Comment

by:sarahJo
ID: 10804395
Hi Vi_srikanth,

My sincere apologies... it's working fine. One of my files was corrupted! Thank you.

LVL 4

Expert Comment

by:vi_srikanth
ID: 10804424
That's great!

Question has a verified solution.
