Solved

DNA Reading Frames

Posted on 2009-05-10
Last Modified: 2012-05-06
Hi,

I'm having trouble with a bit of code. I am trying to work out the three reading frames of a DNA sequence. For example, take the hypothetical sequence:

AAGAAAATGAAAAAAAAATAACCGCATG

Reading Frame 1 would be: AAG | AAA | ATG | AAA | AAA | AAA | TAA | CCG | CAT | G
Reading Frame 2 would be: AGA | AAA | TGA | AAA | AAA | AAT | AAC | CGC | ATG
Reading Frame 3 would be: GAA | AAT | GAA | AAA | AAA | ATA | ACC | GCA | TG

You will see that for reading frame 2 the first base has been removed, and for reading frame 3 the first two bases, causing a frame shift in the triplet code. The triplets (codons) of interest are ATG (start codon) and TAA/TGA/TAG (stop codons).
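The frame shift described above can be reproduced with a short sketch. The helper name codons_in_frame is an assumption made for illustration and is not part of the script in the question; it drops the leading bases and then cuts the remainder into chunks of up to three.

```perl
use strict;
use warnings;

# Split a sequence into codons for a given frame offset (0, 1 or 2).
# codons_in_frame is a hypothetical helper, not from the original script.
sub codons_in_frame {
    my ($dna, $offset) = @_;
    my $shifted = substr($dna, $offset);   # drop the first $offset bases
    my @codons  = $shifted =~ /(.{1,3})/g; # chunks of up to three bases
    return @codons;
}

my $dna = 'AAGAAAATGAAAAAAAAATAACCGCATG';
for my $offset (0 .. 2) {
    printf "Reading Frame %d: %s\n", $offset + 1,
        join(' | ', codons_in_frame($dna, $offset));
}
```

Run against the test sequence, this should print the same three rows of triplets listed above, with any incomplete trailing codon left as a shorter final chunk.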

So I've been trying to write a script where you can input an arbitrary DNA sequence. The script then creates the three reading frames by keeping the sequence as it is (Frame 1), removing the first base (Frame 2), or removing the first two bases (Frame 3).

Having created the three frames, I then wrote a subroutine which uses a while loop to move along each sequence in triplets, looking for a start codon (ATG) and terminating at the first stop codon in its triplet frame. Staying in the same triplet frame is essential, as shown above with the test sequence.

So far my script is:


# feed the dna data into open_reading_frame to return the longest ORF

print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);


print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
print $longorf2;

print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
print $longorf3;

You will see that each reading frame calls the subroutine "open_reading_frame". Its loop should move along the reading frame sequence in triplets, look for a start codon (ATG), terminate at a stop codon, and return the longest of these open reading frames.

The code I have so far for this subroutine is shown below:
# A subroutine to find the longest open reading frame (ORF) for a sequence

sub open_reading_frame {

    my($dna) = @_;

    use strict;
    use warnings;

    #Declare and initialise variables
    my $longest_str ='';
    my $longest_len = 0;

    local $_ = $dna;
    s/\s+//g;
#   print $_,"\n";

    # longest of the shortest sequences ending with TAA|TAG|TGA
    while( /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
        if( length $1 >$longest_len ){
             $longest_str=$1;
             $longest_len=length $1;
             print $1, "\n";
          }
      }
    return $longest_str;
}

---My Problem---

OK, when I run the script with a test DNA sequence, reading frames 1, 2 and 3 all return the same sequence. They shouldn't, as they are in different triplet frames and should detect different ATG start codons depending on the frame they are in. Is there a problem with my loop, or with the call to the subroutine?

Many Thanks

Stephen

 
Question by:StephenMcGowan

Accepted Solution

by:
ozo earned 500 total points
ID: 24347627
while( /\G(?:...)*?ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
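One way to read this one-line fix, offered as a sketch rather than ozo's own explanation: the original pattern can find ATG at any position, so all three frames hit the same start codon. Here \G pins each attempt to where the previous match ended (position 0 on the first pass), and (?:...)*? then skips whole codons only, so every ATG it accepts lies on a codon boundary. The wrapper name in_frame_orfs is hypothetical.

```perl
use strict;
use warnings;

# Collect every in-frame candidate: the codons after an in-frame ATG,
# up to and including the first in-frame stop codon.
# in_frame_orfs is a hypothetical name used only for this sketch.
sub in_frame_orfs {
    my ($dna) = @_;
    my @found;
    # \G anchors at the end of the previous match; the lookahead does not
    # consume text, so the next search resumes right after the ATG.
    while ($dna =~ /\G(?:...)*?ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig) {
        push @found, $1;
    }
    return @found;
}

my $dna = 'AAGAAAATGAAAAAAAAATAACCGCATG';
for my $offset (0 .. 2) {
    my @orfs = in_frame_orfs(substr($dna, $offset));
    printf "Frame %d: %s\n", $offset + 1, @orfs ? "@orfs" : '(none)';
}
```

With the test sequence from the question, only frame 1 contains an ATG that is followed by an in-frame stop codon, so frames 2 and 3 should report no match instead of repeating the frame 1 result.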

Author Comment

by:StephenMcGowan
ID: 24347654
Hi ozo,

I've entered this instead of the previous loop and it seems to be working fine, but it seems to be printing out all of the frames instead of only printing out the longest one?

I'd have thought:

             $longest_str=$1;
             $longest_len=length $1;
             print $1, "\n";

would have seen to this after the loop?

Expert Comment

by:ozo
ID: 24347672
The function only returns the longest one after finishing the loop,
but the print $1, "\n"; is inside the loop, so it runs every time a longer match is found.

Author Comment

by:StephenMcGowan
ID: 24347718
Thanks again ozo, sorry, I'm kind of new to this.

So you're saying I should have it outside the loop? As in:

    # longest of the shortest sequences ending with TAA|TAG|TGA
#    while( /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
     while( /\G(?:...)*?ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
        if( length $1 >$longest_len ){
             $longest_str=$1;
             $longest_len=length $1;
          }
      }
 print $1, "\n";

    return $longest_str;
}

If I try this I receive: Use of uninitialized value in print at ReadingFrameModules.pm
I'm not really sure where the print needs to go in order to print only the longest one :o/

Expert Comment

by:ozo
ID: 24347733
$1 is only defined when the match succeeds, and the loop ends when the match fails.
Did you mean to print $longest_str?
Even that seems unnecessary if whoever calls the function is responsible for printing its result.
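Putting that advice together, a minimal sketch might look like the following. It assumes the subroutine should stay silent and only return $longest_str, leaving the printing to the caller; the loop over frames and the "(no ORF found)" message are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# Return the longest in-frame ORF (codons after ATG through the stop codon),
# or an empty string when the frame has none. Nothing is printed here.
sub open_reading_frame {
    my ($dna) = @_;
    $dna =~ s/\s+//g;                       # strip whitespace and newlines
    my $longest_str = '';
    while ($dna =~ /\G(?:...)*?ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig) {
        # $1 is only valid here, while the match has just succeeded
        $longest_str = $1 if length($1) > length($longest_str);
    }
    return $longest_str;
}

# The caller decides what to print (test sequence taken from the question).
my $dna = 'AAGAAAATGAAAAAAAAATAACCGCATG';
for my $frame (1 .. 3) {
    my $orf = open_reading_frame(substr($dna, $frame - 1));
    print "\n -------Reading Frame $frame-------\n\n";
    print length($orf) ? "$orf\n" : "(no ORF found)\n";
}
```

Keeping the print in the caller avoids both the repeated output from inside the loop and the uninitialized $1 after the loop has ended.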

Author Comment

by:StephenMcGowan
ID: 24347791
Never mind! Sorted it! (I think!) Thanks ozo. :)