Solved

final output question ever!

Posted on 2004-04-12
Last Modified: 2010-03-04
Hi guys,

OK, this is the last question I shall ever ask about Perl... I promise! Ozo was helping me with this, and I thought I could fix it, but due to my Perl dyslexia I failed!
Basically, I have an output like this:

Output file:
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114
,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P
,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116
,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H
,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115
,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A
,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113
,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V
,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA:1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,
37 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,
68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,
134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,
P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,
N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,
T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB:1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,
21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,
73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,
135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,
D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,
D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,
A  ,N  ,H  ,K  ,Y  ,H

I have to parse the file above and, if any of the numbers from the "Interfacing Residues Chain A" line or the "Neighboring Residues Chain A" line occur in the "Surface Residues chainA" line, remove that number from "Surface Residues chainA" together with its corresponding "Surface Resitype chainA" letter (the one that lies directly beneath it), without altering the format I have! The same has to be repeated for chain B: any numbers in "Interfacing Residues Chain B" or "Neighboring Residues Chain B" must be removed from "Surface Residues chainB", along with the corresponding "Surface Resitype chainB" letter.
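A minimal sketch of the whole task, assuming each record sits on a single line (as clarified later in the thread) and that every "Surface Resitype chainX" line directly follows its "Surface Residues chainX" line. The helper names `fields`, `pack_fields` and `filter_surface` are hypothetical. The idea is to remember which positions survive on the residue line and apply the same mask to the letter line, re-padding entries to 3 characters (no padding on the last entry, as in the sample data).

```perl
use strict;
use warnings;

# Split a comma-separated list and trim surrounding whitespace from each entry.
sub fields {
    my ($s) = @_;
    return map { my $f = $_; $f =~ s/^\s+|\s+$//g; $f } split /,/, $s;
}

# Rebuild a list in the input's layout: entries left-justified to 3
# characters and joined with ",", with no padding on the last entry.
sub pack_fields {
    my @f = @_;
    return '' unless @f;
    my $last = pop @f;
    return join ',', (map { sprintf '%-3s', $_ } @f), $last;
}

sub filter_surface {
    my ($text) = @_;
    my %exclude;    # $exclude{A}{37} = 1: 37 is interfacing/neighboring in chain A
    my %keep;       # $keep{A}[i] is true if the i-th surface entry of chain A survives

    # Pass 1: gather the interfacing and neighboring numbers per chain
    # (both spellings "Neighboring" and "Neighbouring" appear in this thread).
    while ($text =~ /^(?:Interfacing|Neighbou?ring) Residues Chain ([AB]):(.*)$/mg) {
        my ($chain, $list) = ($1, $2);
        $exclude{$chain}{$_} = 1 for fields($list);
    }

    # Pass 2: filter each surface line and apply the same mask to the
    # Resitype line that follows it; all other lines pass through unchanged.
    my @out;
    for my $line (split /\n/, $text, -1) {
        if ($line =~ /^(Surface Residues chain([AB]):)(.*)$/) {
            my ($label, $chain, $list) = ($1, $2, $3);
            my @nums = fields($list);
            $keep{$chain} = [ map { !$exclude{$chain}{$_} } @nums ];
            $line = $label . pack_fields(grep { !$exclude{$chain}{$_} } @nums);
        }
        elsif ($line =~ /^(Surface Resitype chain([AB]):)(.*)$/ && $keep{$2}) {
            my ($label, $chain, $list) = ($1, $2, $3);
            my @letters = fields($list);
            my @mask = @{ $keep{$chain} };
            $line = $label . pack_fields(@letters[ grep { $mask[$_] } 0 .. $#letters ]);
        }
        push @out, $line;
    }
    return join "\n", @out;
}
```

To process a real file you could slurp it (for example with `local $/`) and write `filter_surface($text)` to the output file; lines other than the two "Surface" lines per chain are copied through as-is.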

Ozo's program looked like this:

# Read the whole input file into $_ at once (slurp mode)
open(my $in, '<', '1bbbAB60') or die "Unable to open the input file: $!";
{ local $/; $_ = <$in>; }
close $in;

# Collect the interfacing and neighboring residue numbers for each chain
@ChainA=();  @ChainB=();
if (m#Interfacing Residues Chain A: ?([\s,\d]*)#) {push(@ChainA,split(/\s*,\s*/,$1))};
if (m#Neighboring Residues Chain A: ?([\s,\d]*)#) {push(@ChainA,split(/\s*,\s*/,$1))};
if (m#Interfacing Residues Chain B: ?([\s,\d]*)#) {push(@ChainB,split(/\s*,\s*/,$1))};
if (m#Neighboring Residues Chain B: ?([\s,\d]*)#) {push(@ChainB,split(/\s*,\s*/,$1))};

# Collect the surface residue numbers for each chain
@SurfaceResidueA=();  @SurfaceResidueB=();
if (m#Surface Residues chainA: ?([\s,\d]*)#) {push(@SurfaceResidueA,split(/\s*,\s*/,$1))};
if (m#Surface Residues chainB: ?([\s,\d]*)#) {push(@SurfaceResidueB,split(/\s*,\s*/,$1))};

# Chain A: build a lookup hash of the excluded numbers, keep only the surface
# residues not in it, re-pad each to 3 characters and write the line back
@ChainA{@ChainA}=(1)x@ChainA;
$SRA=join(",", map{sprintf("%-3s", $_)} grep {!$ChainA{$_}} @SurfaceResidueA);
s#(Surface Residues chainA:)[\d\s,]*#$1 $SRA#;

# Chain B: same as above
@ChainB{@ChainB}=(1)x@ChainB;
$SRB=join(",", map{sprintf("%-3s", $_)} grep {!$ChainB{$_}} @SurfaceResidueB);
s#(Surface Residues chainB:)[\d\s,]*#$1 $SRB#;

# Write the modified text out (note: the Surface Resitype lines are not modified)
open(my $out, '>', 'outty.txt') or die $!;
print $out $_;
close $out;

But unfortunately, it was still not removing the numbers from "Surface Residues chainA" (or chainB), nor the corresponding "Surface Resitype chainA" (or chainB) letters.
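The filtering step in the program above turns the excluded numbers into a lookup table with a hash slice (`@h{@list} = (1) x @list`) and then `grep`s the surface residues against it. A small standalone illustration with made-up numbers follows. It also shows one pitfall worth knowing: because `[\s,\d]*` also matches the newline that ends the line, the last value split out of the capture is "141\n", which never equals a hash key "141". Trimming each field before the lookup avoids that.

```perl
use strict;
use warnings;

# Made-up record: one surface line followed by its Resitype line.
my $record   = "Surface Residues chainA:23 ,37 ,141\nSurface Resitype chainA:E  ,P  ,R\n";
my @excluded = ('37', '141');

# Same capture and split as the program above; [\s,\d]* also matches the
# trailing "\n", so the fields are ("23", "37", "141\n").
my @surface;
if ($record =~ m#Surface Residues chainA: ?([\s,\d]*)#) {
    @surface = split /\s*,\s*/, $1;
}

# Hash slice: every excluded number becomes a key with value 1.
my %is_excluded;
@is_excluded{@excluded} = (1) x @excluded;

# Without trimming, "141\n" is not found in the hash and survives the filter.
my @kept_raw = grep { !$is_excluded{$_} } @surface;

# Trimming trailing whitespace first makes the lookup behave as intended.
my @kept_trimmed = grep { !$is_excluded{$_} }
                   map  { (my $f = $_) =~ s/\s+$//; $f } @surface;
```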

Ozo's output is below, in case he's here and can help me, because he's a genius!
Thanks Sarah XX



1bbbAB
Interfacing Residues Chain A: 30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interfacing Residues Chain B: 30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Neighbouring Residues Chain A: 29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighbouring Residues Chain B: 29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132

1bbbAB
Surface Residues chainA: 1  ,4  ,8  ,15 ,16 ,19 ,38 ,41 ,44 ,45 ,50 ,51 ,53 ,61
,64 ,71 ,74 ,82 ,85 ,90 ,92 ,115,139,141
Surface Resitype chainA:V  ,P  ,T  ,G  ,K  ,A  ,T  ,T  ,P  ,H  ,H  ,G  ,A  ,K  ,
D  ,A  ,D  ,A  ,D  ,K  ,R  ,A  ,K  ,R
Surface Residues chainB: 2  ,5  ,6  ,9  ,16 ,21 ,22 ,40 ,43 ,44 ,47 ,49 ,52 ,56
,76 ,79 ,87 ,95 ,97 ,99 ,120,146
Surface Resitype chainB:H  ,P  ,E  ,S  ,G  ,D  ,E  ,R  ,E  ,S  ,D  ,S  ,D  ,G  ,
A  ,D  ,T  ,K  ,H  ,D  ,K  ,H
Question by:sarahJo

Accepted Solution

by: vi_srikanth (earned 1500 total points)
ID: 10804317
Could you clarify something for me? You said that the input to the above program will look like this:

Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114
,117,118,119,120,122,123,126
.
.
.

If you look at the input above, there are line breaks (newline characters) within each line; for example, there is a line break after 114. Will that be the case in the real input, or did you put these line breaks in deliberately while posting your comment? In other words, will the input to the above program have line breaks within each line or not? If it does, we might have to tweak the code a little. If I'm not clear, tell me.
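If the real input did contain such breaks, one possible tweak (a sketch, not code from this thread) is to rejoin the wrapped lines before parsing. It assumes a wrap always falls at a comma boundary, so either the broken line ends with "," or its continuation starts with ","; a break in the middle of a number would not be repaired. `unwrap` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rejoin data lines that were wrapped at a comma boundary.
sub unwrap {
    my ($text) = @_;
    $text =~ s/,[ \t]*\r?\n/,/g;    # broken line ends with a comma
    $text =~ s/\r?\n(?=,)//g;       # continuation line starts with a comma
    return $text;
}
```

Blank lines and lines such as "1bbbAB" are left alone, because neither rule applies to them.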

Author Comment

by:sarahJo
ID: 10804333
Hi Vi srikanth,

No... each line of input will be all on one line, like this:
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103

Sorry, it just got corrupted when I pasted it in. Thanks! Sarah

Expert Comment

by:vi_srikanth
ID: 10804357
For the above input, the above program outputs the following. You said that "it was still not removing the numbers from ...". Can you pinpoint exactly which number was retained? For example, the input has "37" in Surface Residues chainA, and it was deleted in the final output.

Input:
--------------------------------------
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA:1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,37 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,
68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB:1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,A  ,N  ,H  ,K  ,Y  ,H


Output:
----------------------------------------
Interfacing Residues Chain A:30 ,31 ,34 ,35 ,36 ,99 ,103,104,106,107,110,111,114,117,118,119,120,122,123,126
Interface Residue matched cA:E  ,R  ,L  ,S  ,F  ,K  ,H  ,C  ,L  ,V  ,A  ,A  ,P,F  ,T  ,P  ,A  ,H  ,A  ,D
Interfacing Residues Chain B:30 ,33 ,34 ,35 ,51 ,55 ,101,108,109,111,112,115,116,119,122,123,124,125,127,128,131
Interface Residue matched cB:R  ,V  ,V  ,Y  ,P  ,M  ,E  ,N  ,V  ,V  ,C  ,A  ,H,G  ,F  ,T  ,P  ,P  ,Q  ,A  ,Q
Neighboring Residues Chain A:29 ,32 ,33 ,37 ,98 ,100,102,105,108,109,112,113,115,116,121,124,125,127
Neighboring Residue match cA:L  ,M  ,F  ,P  ,F  ,L  ,S  ,L  ,T  ,L  ,H  ,L  ,A,E  ,V  ,S  ,L  ,K
Neighboring Residues Chain B:29 ,31 ,32 ,36 ,50 ,52 ,54 ,56 ,100,102,107,110,113,114,117,118,120,121,126,129,130,132
Neighboring Residue match cB:G  ,L  ,L  ,P  ,T  ,D  ,V  ,G  ,P  ,N  ,G  ,L  ,V,L  ,H  ,F  ,K  ,E  ,V  ,A  ,Y  ,K

1bbbAB
Surface Residues chainA: 1  ,3  ,4  ,5  ,8  ,9  ,11 ,12 ,15 ,16 ,18 ,19 ,20 ,23 ,38 ,40 ,41 ,42 ,44 ,45 ,47 ,48 ,49 ,50 ,51 ,53 ,54 ,56 ,57 ,60 ,61 ,64 ,67 ,68 ,71 ,72 ,74 ,75 ,77 ,78 ,79 ,81 ,82 ,83 ,85 ,86 ,89 ,91 ,92 ,94 ,95 ,96 ,130,134,137,138,139,140,141
Surface Resitype chainA:V  ,S  ,P  ,A  ,T  ,N  ,K  ,A  ,G  ,K  ,G  ,A  ,H  ,E  ,P  ,T  ,K  ,T  ,Y  ,P  ,H  ,D  ,L  ,S  ,H  ,G  ,A  ,Q  ,K  ,G  ,K  ,K  ,D  ,T  ,N  ,A  ,H  ,D  ,D  ,P  ,N  ,A  ,S  ,A  ,L  ,D  ,L  ,H  ,L  ,R  ,D  ,P  ,V  ,A  ,T  ,T  ,S  ,K  ,Y  ,R
Surface Residues chainB: 1  ,2  ,4  ,5  ,6  ,8  ,9  ,10 ,12 ,13 ,16 ,17 ,19 ,20 ,21 ,22 ,37 ,39 ,40 ,41 ,43 ,44 ,46 ,47 ,49 ,58 ,59 ,61 ,62 ,63 ,65 ,66 ,69 ,72 ,73 ,76 ,77 ,79 ,80 ,82 ,83 ,84 ,86 ,87 ,88 ,90 ,91 ,92 ,94 ,95 ,96 ,97 ,99 ,104,135,139,143,144,145,146
Surface Resitype chainB:V  ,H  ,T  ,P  ,E  ,K  ,S  ,A  ,T  ,A  ,G  ,K  ,N  ,V  ,D  ,E  ,W  ,Q  ,R  ,F  ,E  ,S  ,G  ,D  ,S  ,P  ,K  ,K  ,A  ,H  ,K  ,K  ,G  ,S  ,D  ,A  ,H  ,D  ,N  ,K  ,G  ,T  ,A  ,T  ,L  ,E  ,L  ,H  ,D  ,K  ,L  ,H  ,D  ,R  ,A  ,N  ,H  ,K  ,Y  ,H
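One quick way to check output like this is to compare the number of entries on each "Surface Residues" line with the number on the "Surface Resitype" line below it, since the letters are supposed to stay aligned with the numbers. A minimal sketch follows (`check_alignment` is a hypothetical name). On the output above it should flag chain A: 37 was dropped from the residue line, but the Resitype line is unchanged, so it still holds 60 letters against 59 numbers.

```perl
use strict;
use warnings;

# Return the chains whose "Surface Residues" line and "Surface Resitype"
# line contain different numbers of entries.
sub check_alignment {
    my ($text) = @_;
    my (%residues, %types);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Surface Residues chain([AB]):(.*)$/) {
            my ($chain, $list) = ($1, $2);
            $residues{$chain} = () = $list =~ /\d+/g;     # count the numbers
        }
        elsif ($line =~ /^Surface Resitype chain([AB]):(.*)$/) {
            my ($chain, $list) = ($1, $2);
            $types{$chain} = () = $list =~ /[A-Z]/g;      # count the letters
        }
    }
    return grep { !defined $types{$_} || $residues{$_} != $types{$_} }
           sort keys %residues;
}
```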

Author Comment

by:sarahJo
ID: 10804395
Hi Vi_srikanth,

My sincere apologies... it's working fine. One of my files was corrupted! Thank you.


Expert Comment

by:vi_srikanth
ID: 10804424
That's great!