Solved

Miss-cleaves

Posted on 2009-05-18
Last Modified: 2012-05-07
I have a chunk of code which selects an enzyme and, depending on the enzyme, cuts a sequence at specific sites:

my $enzyme = $query->param('enzyme');

# Select an enzyme from the radio buttons on form
my $re;
if   ($enzyme eq   'TRYPSIN') { $re=qr/(?<=[KR])(?!P)/; }
elsif($enzyme eq 'ENDOPROTL') { $re=qr/(?<=K)(?!P)/; }
elsif($enzyme eq 'ENDOPROTA') { $re=qr/(?<=R)(?!P)/; }
elsif($enzyme eq    'V8PROT') { $re=qr/(?<=E)(?!P)/; }
else {die "Unknown enzyme selection '$enzyme'\n";}

So in the case above, Trypsin cuts after K and R, but not when the next residue is P;
EndoprotL cuts after K, but not when the next residue is P;
etc.
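
To see what these zero-width patterns do, here is a small self-contained sketch. The sample sequence and the `digest` helper name are made up for illustration and are not part of the script. The lookbehind `(?<=[KR])` matches the position right after a K or R, and the lookahead `(?!P)` rejects that position when the next residue is P, so `split` cuts there without consuming any characters.

```perl
use strict;
use warnings;

# Hypothetical helper: split a protein at every position matched by $re.
sub digest {
    my ($re, $protein) = @_;
    return split $re, $protein;
}

# Same pattern as the TRYPSIN branch above.
my $trypsin = qr/(?<=[KR])(?!P)/;

# Made-up sequence: the "KP" and "RP" sites must stay uncut.
my @peptides = digest($trypsin, 'AAKPGGRCCKDDRPEE');
print "$_\n" for @peptides;
```

With this input the fragments are AAKPGGR, CCK and DDRPEE: the K and R that are followed by P stay inside their fragments.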

Anyway, I'm trying to adapt this code to count how many times a miss-cleave happens, i.e. if Trypsin is selected:
how many times does "K" followed by "P" occur?
how many times does "R" followed by "P" occur?
(these two will be added up)
If EndoprotL is selected, how many times does "K" followed by "P" occur?
etc.

These will be known as miss-cleaves and stored in the variable $miss_cleave.

I've copy/pasted my script below if any further information is required.

Thanks.
#!/usr/bin/perl -w
use CGI::Carp 'fatalsToBrowser';
# ORFfinder.pl
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFS) beginning with ATG and ending with a stop codon,
# TGA, TAA, TAG)
 
# Analyse all six open reading frames and predict ORFS in all six. Only
# longest ORF will be used.
 
require 'module.pm';
use CGI;
use strict;
use warnings;
use DNALib;
use ReadingFrameModules;
my $query = new CGI;
 
# Initialise variables
my ($dna, $dna1, $dna2, $dna3, $dna5, $dna6, $revcom, $revcom1, $revcom2, $longorf1, $longorf2, $longorf3, $longorf4, $longorf5, $longorf6, 
$dna_filename);
$dna=$dna1=$dna2=$dna3=$dna5=$dna6=$revcom=$revcom1=$revcom2=$longorf1=$longorf2=$longorf3=$longorf4=$longorf5=$longorf6=$dna_filename='';
my $dna_file;
my @file_data;
my $dna_header;
 
   # If a text box provided, take from that
if ($query->param('dna-textbox')) {
   $dna1 = $query->param('dna-textbox');
   # take header and save it as a string $dna_header
   ($dna_header, $dna1) = split(/\n/, $dna1, 2);
 
   $dna = extract_string_sequence_from_fasta_data($dna1);
 }
   # Else see if file upload
elsif($query->param('fileupload'))  {
 
   #  Retrieve the file from the web post instead of the filesystem
  @file_data = get_file_data();
   #Extract the sequence from the contents of the file
   $dna = extract_sequence_from_fasta_data(@file_data);
}
 
 
# Add ACGT Validation, changing all non ACGT code to A
$dna =~ s/[^acgt]/a/g;
 
 
# feed the dna data into open_reading_frame to return the longest ORF
 
$longorf1 = open_reading_frame($dna);
 
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
 
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
 
# Reverse complement the DNA sequence
$revcom = revcom($dna);
$longorf4 = open_reading_frame($revcom);
 
 
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
 
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
 
# SECOND HALF OF THE PROGRAM - THIS WAS ORIGINALLY TO BE SENT TO A SECOND SCRIPT
# FOR TASK 2 BUT HAD PROBLEMS WITH THE CGI IMPLEMENTING TWO SCRIPTS ON ONE HTML FORM
 
# my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 
#Transfer Open Reading Frames over to ProteinDigest
# system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;
 
# Initialise second program variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
 
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
 
for (my $i = 0; $i < (length($longorf1) - 2); $i += 3) {
    $codon = substr($longorf1, $i, 3);
    $orfprotein1 .= codon2aa($codon);
}
 
for (my $i = 0; $i < (length($longorf2) - 2); $i += 3) {
    $codon = substr($longorf2, $i, 3);
    $orfprotein2 .= codon2aa($codon);
}
 
for (my $i = 0; $i < (length($longorf3) - 2); $i += 3) {
    $codon = substr($longorf3, $i, 3);
    $orfprotein3 .= codon2aa($codon);
}
 
for (my $i = 0; $i < (length($longorf4) - 2); $i += 3) {
    $codon = substr($longorf4, $i, 3);
    $orfprotein4 .= codon2aa($codon);
}
 
for (my $i = 0; $i < (length($longorf5) - 2); $i += 3) {
    $codon = substr($longorf5, $i, 3);
    $orfprotein5 .= codon2aa($codon);
}
 
for (my $i = 0; $i < (length($longorf6) - 2); $i += 3) {
    $codon = substr($longorf6, $i, 3);
    $orfprotein6 .= codon2aa($codon);
}
 
# Add N-terminal to each reading frame
 
$orfprotein1 = "_$orfprotein1";
$orfprotein2 = "_$orfprotein2";
$orfprotein3 = "_$orfprotein3";
$orfprotein4 = "_$orfprotein4";
$orfprotein5 = "_$orfprotein5";
$orfprotein6 = "_$orfprotein6";
 
 
 
 
 
 
 
 
 
my $enzyme = $query->param('enzyme');
 
# Select an enzyme from the radio buttons on form
my $re;
if   ($enzyme eq   'TRYPSIN') { $re=qr/(?<=[KR])(?!P)/; }
elsif($enzyme eq 'ENDOPROTL') { $re=qr/(?<=K)(?!P)/; }
elsif($enzyme eq 'ENDOPROTA') { $re=qr/(?<=R)(?!P)/; }
elsif($enzyme eq    'V8PROT') { $re=qr/(?<=E)(?!P)/; }
else {die "Unknown enzyme selection '$enzyme'\n";}
 
 
# To cleave all proteins, and put them in the same array
my @parts;
foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {
    push @parts, split($re, $seq);
}
 
# Now, @parts contains every fragment from all six reading frames.
# Join them into one HTML string; @fragments therefore holds a single element.
my @fragments = join("<br>\n", @parts); 
 
print "Content-type:  text/html
 
<html>
<head>
<link href='thrColElsHdr.css' rel='stylesheet' type='text/css' />
</head>
<div class='thrColElsHdr'>
 
<div id='container'>
  <div id='header'>
     
     <img src='dna.png' alt='DNA double helix' />
 
         <h2>Peptide mass/charge analyser</h2>
 
    
  <!-- end #header --></div>
  <div id='sidebar1'>
  
  <!-- end #sidebar1 --></div>
  <div id='sidebar2'>
  
  <!-- end #sidebar2 --></div>
  <div id='mainContent'>
  
<label>
<h2>Protein Digestion Results for $dna_header</h2>
 
 
</label>
<form id='form3' name='form3' method='post' action='mass.pl'>
<label>Please select a Mass to be analysed before continuing to the mass 
analyser:<br />    <br />
    <label>
      <input type='radio' name='mass' value='average' 
id='average' />
      Average</label>
    <label>
      <input type='radio' name='mass' value='mono-isotopic' 
id='mono-isotopic'
/>
      Mono-Isotopic</label>
    <br />
<br />
 
Please click here:
<form method= 'link' action='mass.pl'> <input class='form-button' type='submit' value='M/Z Analyser'>
 
</form>
 
<hr />
 
<p>List of protein cleavage fragments, cleaved with enzyme $enzyme;</p>
<p>@fragments</p>  
 
  
  
  
  
 
 
 
 
 
	<!-- end #mainContent --></div>
	<!-- This clearing element should immediately follow the #mainContent div in order to force the #container div to contain all child floats --><br class='clearfloat' />
   <div id='footer'>
<p><a href='Help.pl#references'>REFERENCES</a> | <a href='Help.pl#about'>ABOUT</a></p>
  <!-- end #footer --></div>
<!-- end #container --></div>
</div>
</html>
 
";

Question by:StephenMcGowan
Expert Comment

by:Adam314
ID: 24413258

my $enzyme = $query->param('enzyme');
 
# Select an enzyme from the radio buttons on form
my ($reS, $reC);
if   ($enzyme eq   'TRYPSIN') { $reS=qr/(?<=[KR])(?!P)/; $reC = qr/[KR]P/;}
elsif($enzyme eq 'ENDOPROTL') { $reS=qr/(?<=K)(?!P)/;    $reC = qr/KP/;}
elsif($enzyme eq 'ENDOPROTA') { $reS=qr/(?<=R)(?!P)/;    $reC = qr/RP/;}
elsif($enzyme eq    'V8PROT') { $reS=qr/(?<=E)(?!P)/;    $reC = qr/EP/;}
else {die "Unknown enzyme selection '$enzyme'\n";}
 
 
# To cleave all proteins, and put them in the same array
my @parts;
my $miss_cleave = 0;
foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {
    push @parts, split($reS, $seq);
    # Add up the uncleaved sites without modifying $seq (it aliases the originals)
    $miss_cleave += () = $seq =~ /$reC/g;
}
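
A total across all sequences can also be obtained without altering them. The following is only a minimal sketch: the helper name `total_miss_cleaves` and the sample sequences are made up for illustration. It relies on a global match in list context returning every match, so assigning that list to `()` in scalar context gives the number of matches.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping matches of $re in each sequence
# and return the grand total. The sequences are left unmodified.
sub total_miss_cleaves {
    my ($re, @seqs) = @_;
    my $total = 0;
    for my $seq (@seqs) {
        # List-context global match, counted via assignment to an empty list.
        my $count = () = $seq =~ /$re/g;
        $total += $count;
    }
    return $total;
}

# Made-up example sequences, not taken from the script.
my @proteins = ('_MKPAARPLK', '_MEEKR', '_GRPKPK');
print total_miss_cleaves(qr/[KR]P/, @proteins), "\n";
```

For these three sequences the total is 4 (KP and RP in the first, none in the second, RP and KP in the third).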

 

Author Comment

by:StephenMcGowan
ID: 24414760
Hi Adam,

Really sorry about this, but I think I described what I want to achieve incorrectly.

My script currently creates an array called @fragments, which is a list of small peptides that varies depending on which enzyme is doing the cutting. Each enzyme is different:

Trypsin cuts at K and R, but not when followed by a P ("KP", "RP")
EndoprotL cuts at K, but not when followed by a P ("KP")
etc., you get the gist.

Anyway, this all depends on the enzyme selected, so each enzyme has a different type of miss-cleave: (KP + RP), (KP), (RP) or (EP).

What I'm trying to do is find a way, dependent on the enzyme, to scan through every line of @fragments, count the miss-cleaves of the relevant type on each line, and return the numbers in an array. So:

Enzyme: Trypsin

Peptide                    Miscleaves

SAEVIHQ "RP" VEEALDTDEK    1
EMLR                       0
DVAI "KP" DVVPPNVR         1
DLALVELDILR                0
ER "KP" R                  1
GK                         0
LSVGDLAELLYR               0

Thanks

Accepted Solution

by:
Adam314 earned 500 total points
ID: 24416376

my $enzyme = $query->param('enzyme');
 
# Select an enzyme from the radio buttons on form
my ($reS, $reC);
if   ($enzyme eq   'TRYPSIN') { $reS=qr/(?<=[KR])(?!P)/; $reC = qr/[KR]P/;}
elsif($enzyme eq 'ENDOPROTL') { $reS=qr/(?<=K)(?!P)/;    $reC = qr/KP/;}
elsif($enzyme eq 'ENDOPROTA') { $reS=qr/(?<=R)(?!P)/;    $reC = qr/RP/;}
elsif($enzyme eq    'V8PROT') { $reS=qr/(?<=E)(?!P)/;    $reC = qr/EP/;}
else {die "Unknown enzyme selection '$enzyme'\n";}
 
 
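# To cleave all proteins, and put them in the same array
my @parts;
foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {
    push @parts, split($reS, $seq);
}
 
# One entry per fragment: the number of uncleaved sites it contains
my @miss_cleave;
foreach my $part (@parts) {
    # Count the matches; this yields 0 (not an empty string) when there are none
    my $count = () = $part =~ /$reC/g;
    push @miss_cleave, $count;
}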
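
For anyone who wants to try the per-fragment counting outside the CGI script, here is a standalone sketch. It is not the code from the thread: the `%ENZYMES` lookup table, the `digest_with_counts` helper and the test protein are assumptions made for illustration. The patterns themselves are the ones above, and the protein is simply three of the peptides from the question joined together.

```perl
use strict;
use warnings;

# Each enzyme maps to a cut pattern (where to split) and a miss pattern
# (the uncut "residue followed by P" sites). Keys mirror the form values.
my %ENZYMES = (
    TRYPSIN   => { cut => qr/(?<=[KR])(?!P)/, miss => qr/[KR]P/ },
    ENDOPROTL => { cut => qr/(?<=K)(?!P)/,    miss => qr/KP/    },
    ENDOPROTA => { cut => qr/(?<=R)(?!P)/,    miss => qr/RP/    },
    V8PROT    => { cut => qr/(?<=E)(?!P)/,    miss => qr/EP/    },
);

# Hypothetical helper: returns a list of [peptide, miss_cleave_count] pairs.
sub digest_with_counts {
    my ($enzyme, @proteins) = @_;
    my $rules = $ENZYMES{$enzyme}
        or die "Unknown enzyme selection '$enzyme'\n";
    my ($cut_re, $miss_re) = @{$rules}{qw(cut miss)};

    my @rows;
    for my $protein (@proteins) {
        for my $peptide (split $cut_re, $protein) {
            # Count matches without modifying the peptide.
            my $count = () = $peptide =~ /$miss_re/g;
            push @rows, [ $peptide, $count ];
        }
    }
    return @rows;
}

# Made-up protein built from three peptides listed in the question.
my @rows = digest_with_counts('TRYPSIN', 'SAEVIHQRPVEEALDTDEKEMLRDVAIKPDVVPPNVR');
printf "%-20s %d\n", @$_ for @rows;
```

This prints the three peptides SAEVIHQRPVEEALDTDEK, EMLR and DVAIKPDVVPPNVR with counts 1, 0 and 1. Keeping the peptide and its count together in one row avoids having to keep @parts and @miss_cleave in step by index.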
