Solved

longest sequence...

Posted on 2009-04-10
Last Modified: 2012-05-06
I got the code below from Experts Exchange for finding the longest sequence that starts with ATG and ends with any of the three stop codons (TAA, TAG or TGA).

When I give it one sequence as input, it gives the correct answer.

But when I give it two sequences as input, it gives an answer for only one of them.

What modifications should I make?


sample sequences:


>G7H5
AATATGATTTTGAATTTGGTTCAAAATGAAACCGTCTCCGTTCATTGTTTTGATATTTGCTGTTATTATAGGCCTGTGTGGTTGTGCACCACCCAAGGCCGAAGAAACTCAATCTGCTACGAGTACGAAAGCCGAGTCTTCTAATGCGGGTCAGAGCGGAAATCGATATCCACCGGTGAAGATGAATTTTGAAAAAGTGTTTACTCCTAGTTTTTGTAAAGGTTTGCAAGATCAGCAATCAAAAATTGAAGAACTTTCGGCAGACTTGGAGAGGTTTGAGGGTCAGGAATTGAAGTCAAATTATGGAACATATTCCGACAAAAAGGACCATAAATAAAAATTTGTCCAGCAAAAGATATGGTTGCATAATAAACGCAAATATAATCATACACGCCCAAAAAAAAAAAAAAAAAAAAAAAA

>G8C4
GGGAGTATAATCTTGAATTTGGTTCAAAATGAAACTGTCTCTGTTCATTATTTTTTTGATATTTGCTGTTATTATAGGCCTGTGTGGTTGTGCACCACCCAAGGCCGAAGGAACTAAATCTGGTATGGGAACGCAAGCCGAGTCTTCTAATGCGGGTCAGAGAGGAAGTCGAAACAATGGCATCTCATCGGCGGAGTTGAACTTTGACAGAATTTCTCCTGGTTTTATTAAAGGTTTGCGTGAAGATCAATCAGGATATGAAAAAGTTGGAGAGATCTTGAAGAGGGCTCAGGATCAGCAATTGAAGTCAAATTATGGAAAATATTCCGACAAAAAGGCCCATAATTAAAAATTTGTTCAGCAAAAAATTTGGTTGCATAATAAACCCAAAAATAATCATCCCCGCAAAAAAAAAAAAAAAAAAAAAAAAAA
#!/usr/bin/perl
use warnings;
use strict;
 
open INPUT,"<input.txt" or die $!;
my $name=<INPUT>;
local $/;
$_ = <INPUT>;
close INPUT;
s/\s+//g;
my $longest_str ='';
my $longest_len = 0;
open P,">possible.txt" or die $!;
while( /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/g ){
   print P "ATG$1\n";
    if( length $1 >$longest_len ){
        $longest_str=$1;
        $longest_len=length $1;
    }
}
close P;
open OUTPUT,">output.fasta" or die $!;
print OUTPUT $name,"ATG",$longest_str if $longest_len;
close OUTPUT
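
For readers unfamiliar with the pattern in the script above, here is a minimal sketch of what it matches. The sub name candidate_orfs and the toy sequence are made up for illustration; only the regular expression is taken from the script. Because the part after ATG sits inside a lookahead, each match consumes just the three bases of ATG, so a later ATG that lies inside an earlier candidate is still found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every candidate that starts at an ATG and runs, in steps of
# three bases, to the first in-frame stop codon (TAA, TAG or TGA).
sub candidate_orfs {
    my ($dna) = @_;
    my @found;
    # The lookahead consumes only "ATG", so the next search resumes right
    # after it and can find ATGs that lie inside an earlier candidate.
    while ( $dna =~ /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/g ) {
        push @found, "ATG$1";
    }
    return @found;
}

# Hypothetical toy sequence, not taken from the sample data.
my @orfs = candidate_orfs('CCATGAAATGCCCTAGGTAA');
print "$_\n" for @orfs;
```

On this toy input the sketch prints two candidates, ATGAAATGCCCTAGGTAA and ATGCCCTAG, the second of which starts inside the first.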

Question by:shragi
 
LVL 25

Accepted Solution

by:
lwadwell earned 500 total points
ID: 24120119
Hi shragi,

You will need to change the code to loop through the input file and process each sequence record in turn. I have had an attempt at it below; I think it does what you want.

lwadwell
#!/usr/bin/perl
use warnings;
use strict;

open INPUT,"<input.txt" or die $!;
open OUTPUT,">output.fasta" or die $!;
# open possible.txt once, so later sequences do not overwrite earlier ones
open P,">possible.txt" or die $!;
my $name;
while ( my $string = <INPUT> ) {
    chomp($string);            # remove trailing newline
    $string =~ s/\s+//g;       # remove whitespace
    next if ( $string eq "" ); # skip blank lines
    if ( $string =~ /^>/ ) {   # a line starting with '>' is saved as $name
        $name = $string;
        next;
    }
    my $longest_str ='';
    my $longest_len = 0;
    while( $string =~ /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/g ){
        print P "ATG$1\n";
        if ( length $1 >$longest_len ){
            $longest_str=$1;
            $longest_len=length $1;
        }
    }
    print OUTPUT $name,"ATG",$longest_str,"\n" if $longest_len;
}
close INPUT;
close P;
close OUTPUT;
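
One caveat: the loop above treats every input line as a complete sequence, which fits the sample data but not FASTA files whose sequences are wrapped over several lines. A minimal sketch of one way to handle that case follows. The sub names read_fasta_records and longest_orf and the two-record input are hypothetical, and upper-casing the sequence is an added assumption that the original script does not make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split FASTA text into [header, sequence] pairs, joining wrapped lines.
sub read_fasta_records {
    my ($text) = @_;
    my @records;
    for my $line ( split /\r?\n/, $text ) {
        $line =~ s/\s+$//;                 # drop trailing whitespace
        next if $line eq '';
        if ( $line =~ /^>/ ) {
            push @records, [ $line, '' ];  # new record starts at each '>'
        }
        elsif (@records) {
            # Assumption: upper-case the bases so lower-case input matches.
            $records[-1][1] .= uc $line;
        }
    }
    return @records;
}

# Longest ATG...stop candidate in one sequence ('' if there is none).
sub longest_orf {
    my ($dna) = @_;
    my $best = '';
    while ( $dna =~ /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/g ) {
        $best = "ATG$1" if length($1) + 3 > length $best;
    }
    return $best;
}

# Hypothetical two-record input with the second sequence wrapped.
my $fasta = ">seq1\nCCATGAAATAGCC\n>seq2\nGGATGCCC\nGGGTGATT\n";
for my $rec ( read_fasta_records($fasta) ) {
    my ( $name, $dna ) = @$rec;
    my $orf = longest_orf($dna);
    print "$name $orf\n" if $orf ne '';
}
```

Collecting whole records first keeps the search itself unchanged; only the reading step differs from the line-by-line loop.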

 

Author Comment

by:shragi
ID: 24128135
One small change is required:

there is no space between the name and the sequence.
 

Author Comment

by:shragi
ID: 24128136
I need a space between them...
 
LVL 5

Expert Comment

by:vikaskhoria
ID: 24128310
Just change line 27 in the script above to the following
(a space is added before and after ATG inside the quotes):
print OUTPUT $name," ATG ",$longest_str,"\n" if $longest_len;

 
LVL 25

Expert Comment

by:lwadwell
ID: 24128492
If you don't want one after the ATG then use
print OUTPUT $name," ATG",$longest_str,"\n" if $longest_len;

 

Author Closing Comment

by:shragi
ID: 31568994
Perfect..