Solved

Miss-cleaves

Posted on 2009-05-18
Last Modified: 2012-05-07
I have a chunk of code which selects an enzyme and, depending on the enzyme, cuts a sequence at specific sites:

my $enzyme = $query->param('enzyme');

# Select an enzyme from the radio buttons on form
my $re;
if   ($enzyme eq   'TRYPSIN') { $re=qr/(?<=[KR])(?!P)/; }
elsif($enzyme eq 'ENDOPROTL') { $re=qr/(?<=K)(?!P)/; }
elsif($enzyme eq 'ENDOPROTA') { $re=qr/(?<=R)(?!P)/; }
elsif($enzyme eq    'V8PROT') { $re=qr/(?<=E)(?!P)/; }
else {die "Unknown enzyme selection '$enzyme'\n";}

So in the case above:
Trypsin cuts after K and R, but not when the next residue is P
EndoprotL cuts after K, but not when the next residue is P
etc.
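
A minimal sketch of how these zero-width patterns behave when handed to split. The helper name digest and the sample sequence are assumptions for illustration (the sequence is stitched together from peptides quoted later in this thread); only the trypsin pattern comes from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $seq at every position matched by $re.
# The patterns are zero-width, so split consumes no residues.
sub digest {
    my ($re, $seq) = @_;
    return split $re, $seq;
}

# Trypsin rule from the post: cut after K or R, unless the next residue is P.
my $trypsin = qr/(?<=[KR])(?!P)/;

# Illustrative sequence only, not real output of the script.
my @fragments = digest($trypsin, 'EMLRDVAIKPDVVPPNVRGK');
print "$_\n" for @fragments;
```

The K at position 9 is followed by P, so no cut is made there and the KP stays inside the second fragment.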

Anyway, I'm trying to adapt this code to count how many times a miscleavage happens, i.e. if Trypsin is selected:
how many times does "K" followed by "P" occur?
how many times does "R" followed by "P" occur?
(these two will be added up)
If EndoprotL is selected, how many times does "K" followed by "P" occur?
etc.

These will be known as miss cleaves and stored in the variable $miss_cleave.

I've copy/pasted my script below if any further information is required.

Thanks.
#!/usr/bin/perl -w

use CGI::Carp 'fatalsToBrowser';

# ORFfinder.pl

# Perl programme to read in FastA format to find all possible open

# reading frames (ORFS) beginning with ATG and ending with a stop codon,

# TGA, TAA, TAG)
 

# Analyse all six open reading frames and predict ORFS in all six. Only

# longest ORF will be used.
 

require 'module.pm';

use CGI;

use strict;

use warnings;

use DNALib;

use ReadingFrameModules;

my $query = new CGI;
 

# Initialise variables

my ($dna, $dna1, $dna2, $dna3, $dna5, $dna6, $revcom, $revcom1, $revcom2, $longorf1, $longorf2, $longorf3, $longorf4, $longorf5, $longorf6, 

$dna_filename);

$dna=$dna1=$dna2=$dna3=$dna5=$dna6=$revcom=$revcom1=$revcom2=$longorf1=$longorf2=$longorf3=$longorf4=$longorf5=$longorf6=$dna_filename='';

my $dna_file;

my @file_data;

my $dna_header;
 

   # If a text box provided, take from that

if ($query->param('dna-textbox')) {

   $dna1 = $query->param('dna-textbox');

   # take header and save it as a string $dna_header

   ($dna_header, $dna1) = split(/\n/, $dna1, 2);
 

   $dna = extract_string_sequence_from_fasta_data($dna1);

 }

   # Else see if file upload

elsif($query->param('fileupload'))  {
 

   #  Retrieve the file from the web post instead of the filesystem

  @file_data = get_file_data();

   #Extract the sequence from the contents of the file

   $dna = extract_sequence_from_fasta_data(@file_data);

}
 
 

# Add ACGT Validation, changing all non ACGT code to A

$dna =~ s/[^acgt]/a/g;
 
 

# feed the dna data into open_reading_frame to return the longest ORF
 

$longorf1 = open_reading_frame($dna);
 

# remove first base from sequence

$dna2 = substr $dna, 1;

$longorf2 = open_reading_frame($dna2);
 

# remove first base from $dna2

$dna3 = substr $dna2, 1;

$longorf3 = open_reading_frame($dna3);
 

# Reverse complement the DNA sequence

$revcom = revcom($dna);

$longorf4 = open_reading_frame($revcom);
 
 

#remove first base from sequence

$dna5 = substr $revcom, 1;

$longorf5 = open_reading_frame($dna5);
 

#remove a further base from the sequence

$dna6 = substr $dna5, 1;

$longorf6 = open_reading_frame($dna6);
 

# SECOND HALF OF THE PROGRAM - THIS WAS ORIGINALLY TO BE SENT TO A SECOND SCRIPT

# FOR TASK 2 BUT HAD PROBLEMS WITH THE CGI IMPLEMENTING TWO SCRIPTS ON ONE HTML FORM
 

# my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 

#Transfer Open Reading Frames over to ProteinDigest

# system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;
 

# Initialise second program variables

my $orfprotein1 = '';

my $orfprotein2 = '';

my $orfprotein3 = '';

my $orfprotein4 = '';

my $orfprotein5 = '';

my $orfprotein6 = '';

my $codon;
 

# Convert DNA sequence to Protein sequence - Translate each three base

# codon into an amino acid, and append to the protein
 

for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {

$codon = substr($longorf1,$i,3);

$orfprotein1 .= codon2aa($codon);

}
 

for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {

$codon = substr($longorf2,$i,3);

$orfprotein2 .= codon2aa($codon);

}
 

for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {

$codon = substr($longorf3,$i,3);

$orfprotein3 .= codon2aa($codon);

}
 

for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {

$codon = substr($longorf4,$i,3);

$orfprotein4 .= codon2aa($codon);

}
 

for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {

$codon = substr($longorf5,$i,3);

$orfprotein5 .= codon2aa($codon);

}
 

for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {

$codon = substr($longorf6,$i,3);

$orfprotein6 .= codon2aa($codon);

}
 

# Add N-terminal to each reading frame
 

$orfprotein1 = "_$orfprotein1";

$orfprotein2 = "_$orfprotein2";

$orfprotein3 = "_$orfprotein3";

$orfprotein4 = "_$orfprotein4";

$orfprotein5 = "_$orfprotein5";

$orfprotein6 = "_$orfprotein6";
 
 
 
 
 
 
 
 
 

my $enzyme = $query->param('enzyme');
 

# Select an enzyme from the radio buttons on form

my $re;

if   ($enzyme eq   'TRYPSIN') { $re=qr/(?<=[KR])(?!P)/; }

elsif($enzyme eq 'ENDOPROTL') { $re=qr/(?<=K)(?!P)/; }

elsif($enzyme eq 'ENDOPROTA') { $re=qr/(?<=R)(?!P)/; }

elsif($enzyme eq    'V8PROT') { $re=qr/(?<=E)(?!P)/; }

else {die "Unknown enzyme selection '$enzyme'\n";}
 
 

# To cleave all proteins, and put them in the same array

my @parts;

foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {

    push @parts, split($re, $seq);

}
 

# Now, @parts contains everything

# Join all digested protein fragments into one HTML string for display

my @fragments = join("<br>\n", @parts); 
 

print "Content-type:  text/html
 

<html>

<head>

<link href='thrColElsHdr.css' rel='stylesheet' type='text/css' />

</head>

<div class='thrColElsHdr'>
 

<div id='container'>

  <div id='header'>

     

     <img src='dna.png' alt='DNA double helix' />
 

         <h2>Peptide mass/charge analyser</h2>
 

    

  <!-- end #header --></div>

  <div id='sidebar1'>

  

  <!-- end #sidebar1 --></div>

  <div id='sidebar2'>

  

  <!-- end #sidebar2 --></div>

  <div id='mainContent'>

  

<label>

<h2>Protein Digestion Results for $dna_header</h2>
 
 

</label>

<form id='form3' name='form3' method='post' action='mass.pl'>

<label>Please select a Mass to be analysed before continuing to the mass 

analyser:<br />    <br />

    <label>

      <input type='radio' name='mass' value='average' 

id='average' />

      Average</label>

    <label>

      <input type='radio' name='mass' value='mono-isotopic' 

id='mono-isotopic'

/>

      Mono-Isotopic</label>

    <br />

<br />
 

Please click here:

<form method= 'link' action='mass.pl'> <input class='form-button' type='submit' value='M/Z Analyser'>
 

</form>
 

<hr />
 

<p>List of protein cleavage fragments, cleaved with enzyme $enzyme;</p>

<p>@fragments</p>  
 

  

  

  

  
 
 
 
 
 

	<!-- end #mainContent --></div>

	<!-- This clearing element should immediately follow the #mainContent div in order to force the #container div to contain all child floats --><br class='clearfloat' />

   <div id='footer'>

<p><a href='Help.pl#references'>REFERENCES</a> | <a href='Help.pl#about'>ABOUT</a></p>

  <!-- end #footer --></div>

<!-- end #container --></div>

</div>

</html>
 

";

Question by:StephenMcGowan
3 Comments
 
LVL 39

Expert Comment

by:Adam314
ID: 24413258

my $enzyme = $query->param('enzyme');

# Select an enzyme from the radio buttons on form
# $reS is the pattern to split (cleave) on; $reC matches a miscleavage site
my ($reS, $reC);
if   ($enzyme eq   'TRYPSIN') { $reS=qr/(?<=[KR])(?!P)/; $reC = qr/[KR]P/;}
elsif($enzyme eq 'ENDOPROTL') { $reS=qr/(?<=K)(?!P)/;    $reC = qr/KP/;}
elsif($enzyme eq 'ENDOPROTA') { $reS=qr/(?<=R)(?!P)/;    $reC = qr/RP/;}
elsif($enzyme eq    'V8PROT') { $reS=qr/(?<=E)(?!P)/;    $reC = qr/EP/;}
else {die "Unknown enzyme selection '$enzyme'\n";}

# To cleave all proteins, and put them in the same array
my @parts;
my $miss_cleave = 0;
foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {
    push @parts, split($reS, $seq);
    # Count the sites without modifying $seq, and add them up across all six sequences
    my $count = () = $seq =~ /$reC/g;
    $miss_cleave += $count;
}
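
One way to total the miscleavage sites across several sequences while leaving the sequences untouched is to count matches rather than substitute them. This is only a sketch: the helper name count_matches and the placeholder sequences are assumptions standing in for $orfprotein1 to $orfprotein6, and the pattern is the trypsin $reC from this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping matches of $re in $seq
# without modifying $seq. Assigning the match list to () in scalar
# context yields the number of matches (0 when there are none).
sub count_matches {
    my ($re, $seq) = @_;
    my $count = () = $seq =~ /$re/g;
    return $count;
}

my $reC = qr/[KR]P/;   # trypsin miscleavage sites

# Placeholder sequences, made up for illustration.
my @proteins = ('_MKPLRPAK', '_MEEK', '_RPKPR');

my $miss_cleave = 0;
$miss_cleave += count_matches($reC, $_) for @proteins;
print "Total miscleavage sites: $miss_cleave\n";
```

Because the match is read-only, the strings can still be passed to split or printed afterwards exactly as they were.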

 

Author Comment

by:StephenMcGowan
ID: 24414760
Hi Adam,

Really sorry about this, but I think I've described what I want to achieve wrongly.

My script currently creates an array called @fragments, which is a list of small peptides that varies depending on which enzyme is doing the cutting. Each enzyme is different:

Trypsin cuts at K and R, but not when followed by a P ("KP", "RP")
EndoprotL cuts at K, but not when followed by a P ("KP")
etc. You get the gist...

Anyway, this all depends on the enzyme selected, so each enzyme has a different type of miss cleave: (KP + RP), (KP), (RP) or (EP).

What I'm trying to do is find a way, dependent on the enzyme, to scan through every line of @fragments, count the number of that enzyme's type of miscleave on each line, and return the counts in an array. So:

Enzyme: Trypsin

Peptide                        Miscleaves

SAEVIHQ "RP" VEEALDTDEK        1
EMLR                           0
DVAI "KP" DVVPPNVR             1
DLALVELDILR                    0
ER "KP" R                      1
GK                             0
LSVGDLAELLYR                   0
Thanks
 
LVL 39

Accepted Solution

by:
Adam314 earned 500 total points
ID: 24416376

my $enzyme = $query->param('enzyme');

# Select an enzyme from the radio buttons on form
# $reS is the pattern to split (cleave) on; $reC matches a miscleavage site
my ($reS, $reC);
if   ($enzyme eq   'TRYPSIN') { $reS=qr/(?<=[KR])(?!P)/; $reC = qr/[KR]P/;}
elsif($enzyme eq 'ENDOPROTL') { $reS=qr/(?<=K)(?!P)/;    $reC = qr/KP/;}
elsif($enzyme eq 'ENDOPROTA') { $reS=qr/(?<=R)(?!P)/;    $reC = qr/RP/;}
elsif($enzyme eq    'V8PROT') { $reS=qr/(?<=E)(?!P)/;    $reC = qr/EP/;}
else {die "Unknown enzyme selection '$enzyme'\n";}

# To cleave all proteins, and put them in the same array
my @parts;
foreach my $seq ($orfprotein1,$orfprotein2,$orfprotein3,$orfprotein4,$orfprotein5,$orfprotein6) {
    push @parts, split($reS, $seq);
}

# One miscleave count per fragment, in the same order as @parts
my @miss_cleave;
foreach my $part (@parts) {
    # Count the sites in this fragment; this gives 0 (not an empty string) when there are none
    my $count = () = $part =~ /$reC/g;
    push @miss_cleave, $count;
}
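
As a rough usage sketch, the per-fragment counts can be printed next to each peptide to give the two-column listing asked for in the question. The peptides below are the examples quoted in the question (with the spaces and quotes removed); the column width and the printf layout are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Sample data: peptides quoted in the question.
my @parts = qw(
    SAEVIHQRPVEEALDTDEK
    EMLR
    DVAIKPDVVPPNVR
    DLALVELDILR
    ERKPR
    GK
    LSVGDLAELLYR
);

my $reC = qr/[KR]P/;   # trypsin miscleavage pattern

# One count per fragment, in the same order as @parts.
# The match is read-only, so @parts is left unchanged.
my @miss_cleave = map { my $n = () = /$reC/g; $n } @parts;

printf "%-22s %s\n", 'Peptide', 'Miscleaves';
printf "%-22s %d\n", $parts[$_], $miss_cleave[$_] for 0 .. $#parts;
```

Keeping @miss_cleave parallel to @parts means the same index can be used for both when building the HTML output.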
