
HTML output different from its Perl output

I've set up a basic web interface with a text box at:
http://biolinux.smith.man.ac.uk/~campus12sm/assessment.pl

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttgaagaagcacttgatacagatgagaaggagatg
ctgcgggatgttgctatagatgtggttccacctaatgtcagggaccttgcgctcgtcgag
ctggatattttacgggaaagaggtaagctgtctgtcggggacttggctgaactgctctac
agagtgaggcgatttgacctgctcaaacgtatcttgaagatggacagaaaagctgtggag
acccacctgctcaggaaccctcaccttgtttcggactatagagtgctgatggcagagatt
ggtgaggatttggataaatctgatgtgtcctcattaattttcctcatgaaggattacatg
ggccgaggcaagataagcaaggagaagagtttcttggaccttgtggttgagttggagaaa
ctaaatctggttgccccagatcaactggatttattagaaaaatgcctaaagaacatccac
agaatagacctgaagacaaaaatccagaagtacaagcagtctgttcaaggagcagggaca
agttacaggaatgttctccaagcagcaatccaaaagagtctcaaggatccttcaaataac
ttcaggctccataatgggagaagtaaagaacaaagacttaaggaacagcttggcgctcaa
caagaaccagtgaagaaatccattcaggaatcagaagcttttttgcctcagagcatacct
gaagagagatacaagatgaagagcaagcccctaggaatctgcctgataatcgattgcatt
ggcaatgagacagagcttcttcgagacaccttcacttccctgggctatgaagtccagaaa
ttcttgcatctcagtatgcatggtatatcccagattcttggccaatttgcctgtatgccc
gagcaccgagactacgacagctttgtgtgtgtcctggtgagccgaggaggctcccagagt
gtgtatggtgtggatcagactcactcagggctccccctgcatcacatcaggaggatgttc
atgggagattcatgcccttatctagcagggaagccaaagatgttttttattcagaactat

You should be able to copy/paste the above sequence into the text box and hit submit to return 6 DIFFERENT sequences. If I run this through Perl, reading the same DNA sequence from a file, it works fine:

Reading Frame 1: SAEVIHQVEEALDTDEKEMLRDVAIDVVPPNVRDLALVELDILRERGKLSVGDLAELLYRVRRFDLLKRILKMDRKAVETHLLRNPHLVSDYRVLMAEIGEDLDKSDVSSLIFLMKDYMGRGKISKEKSFLDLVVELEKLNLVAPDQLDLLEKCLKNIHRIDLKTKIQKYKQSVQGAGTSYRNVLQAAIQKSLKDPSNNFRLHNGRSKEQRLKEQLGAQQEPVKKSIQESEAFLPQSIPEERYKMKSKPLGICLIIDCIGNETELLRDTFTSLGYEVQKFLHLSMHGISQILGQFACMPEHRDYDSFVCVLVSRGGSQSVYGVDQTHSGLPLHHIRRMFMGDSCPYLAGKPKMFFIQNYVVSEGQLEDSSLLEVDGPAMKNVEFKAQKRGLCTVHREADFFWSLCTADMSLLEQSHSSPSLYLQCLSQKLRQERKRPLLDLHIELNGYMYDWNSRVSAKEKYYVWLQHTLRKKLILSYT_; 
Reading Frame 2: RQSFFETPSLPWAMKSRNSCISVCMVYPRFLANLPVCPSTETTTALCVSW_; 
Reading Frame 3: PLSSREAKDVFYSELCGVRGPAGGQQPLGGGWASDEECGIQGSEARAVHSSPRS_; 
Reading Frame 4: HTEMQEFLDFIAQGSEGVSKKLCLIANAIDYQADS_; 
Reading Frame 5: QSIIRQIPRGLLFILYLSSGML_; 
Reading Frame 6: RTHQIYPNPHQSLPSALYSPKQGEGS_; 

But if this is run through the HTML form, sequences 4, 5 and 6 come back exactly the same, which shouldn't be the case. In fact, none of the sequences are in the order shown above, which I find bizarre!

My file processing script is shown below:

Thanks

#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
 
use strict;
use warnings;
use ReadingFrameModules;
 
use CGI; # a predefined module
my $query = new CGI;
 
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
 
 
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 
 
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
 
# READING FRAME 1
 
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
$codon = substr($longorf1,$i,3);
$orfprotein1 .= codon2aa($codon);
}
 
# READING FRAME 2
 
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
$codon = substr($longorf2,$i,3);
$orfprotein2 .= codon2aa($codon);
}
 
# READING FRAME 3
 
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
$codon = substr($longorf3,$i,3);
$orfprotein3 .= codon2aa($codon);
}
 
# READING FRAME 4
 
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
$codon = substr($longorf4,$i,3);
$orfprotein4 .= codon2aa($codon);
}
 
# READING FRAME 5
 
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
$codon = substr($longorf5,$i,3);
$orfprotein5 .= codon2aa($codon);
}
 
# READING FRAME 6
 
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
$codon = substr($longorf6,$i,3);
$orfprotein6 .= codon2aa($codon);
}
 
#HTML OUTPUT
 
print "Content-type: text/html
 
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
 
";


ozo

How do you pass @ARGV to the program when it is run through HTML?

StephenMcGowan

ASKER

Hey ozo,

Sorry, how do you mean?

My assumption was that:
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

brings in the $longorf values generated by my first script, so that this script can work with them and then create the new HTML page shown at the bottom of the script.

So basically, script 1 takes the DNA sequence from my form and generates 6 $longorfs;
script 2 generates 6 $orfproteins from the $longorfs and reports them as a new HTML page.

Hope this helps.
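
For reference, a minimal sketch of the hand-off described above, assuming the list form of system is used as in the first script. The three short sequences and the inline child program are hypothetical stand-ins for the real $longorf values and for proteindigest.pl. It only illustrates that each list element arrives as one entry of the child's @ARGV, and that on Unix-like systems an empty string is still passed as an argument of its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for three of the $longorf values; the second is
# deliberately empty, as it might be if no ORF were found in that frame.
my @orfs = ('atgaaatga', '', 'atgccctag');

# Hypothetical child program, standing in for proteindigest.pl: it reports
# how many arguments it received and prints each one in brackets.
my $child = 'print scalar(@ARGV), " arguments\n"; print "[$_]\n" for @ARGV;';

# List-form system bypasses the shell, so every list element becomes exactly
# one entry in the child's @ARGV. $^X is the path of the running perl.
my $status = system($^X, '-e', $child, @orfs);
die "child exited with status $?\n" if $status != 0;
```

Printing the arguments in brackets like this inside the second script is one way to check what it actually receives when the first script runs under the web server rather than from the command line.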

ozo

Are you calling from your first script, or through HTML?

StephenMcGowan

ASKER

Ahh right, calling from my first script:

-----FIRST SCRIPT-----

#!/usr/bin/perl -w
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFS) beginning with ATG and ending with a stop codon,
# TGA, TAA, TAG)

# Analyse all six open reading frames and predict ORFS in all six. Only
# longest ORF will be used.

use strict;
use warnings;
use ReadingFrameModules;
use CGI;
my $query = new CGI;

# Initialise variables
my @file_data = ();
my $dna = '';
my $dna2 = '';
my $dna3 = '';
my $dna5 = '';
my $dna6 = '';
my $revcom = '';
my $revcom1 = '';
my $revcom2 = '';
my $longorf1 = '';
my $longorf2 = '';
my $longorf3 = '';
my $longorf4 = '';
my $longorf5 = '';
my $longorf6 = '';

$dna = $query->param('dna-textbox');

# Extract the sequence data from the contents of the Fasta file

# $dna = extract_sequence_from_fasta_data(@file_data);

# feed the dna data into open_reading_frame to return the longest ORF

# print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);
# print $longorf1;

# print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
# print $longorf2;

# print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
# print $longorf3;

# Reverse complement the DNA sequence
$revcom = revcom($dna);

# print "\n -------Reading Frame 4-------\n\n";
# print $revcom;
$longorf4 = open_reading_frame($revcom);
# print $longorf4;

# print "\n -------Reading Frame 5-------\n\n";
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
# print $longorf5;

# print "\n -------Reading Frame 6-------\n\n";
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
# print $longorf6;

#Transfer Open Reading Frames over to ProteinDigest
system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;

----SCRIPT 2----
 
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
 
use strict;
use warnings;
use ReadingFrameModules;
 
use CGI; # a predefined module
my $query = new CGI;
 
# my $enzyme = [FILL THIS OUT....]
 
 
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
 
 
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 
 
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
 
# READING FRAME 1
# print "\n -------Reading Frame 1-------\n\n";
 
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
$codon = substr($longorf1,$i,3);
$orfprotein1 .= codon2aa($codon);
}
# print $orfprotein1;
 
# READING FRAME 2
# print "\n -------Reading Frame 2-------\n\n";
 
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
$codon = substr($longorf2,$i,3);
$orfprotein2 .= codon2aa($codon);
}
# print $orfprotein2;
 
# READING FRAME 3
# print "\n -------Reading Frame 3-------\n\n";
 
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
$codon = substr($longorf3,$i,3);
$orfprotein3 .= codon2aa($codon);
}
# print $orfprotein3;
 
# READING FRAME 4
# print "\n -------Reading Frame 4-------\n\n";
 
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
$codon = substr($longorf4,$i,3);
$orfprotein4 .= codon2aa($codon);
}
# print $orfprotein4;
 
# READING FRAME 5
# print "\n -------Reading Frame 5-------\n\n";
 
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
$codon = substr($longorf5,$i,3);
$orfprotein5 .= codon2aa($codon);
}
# print $orfprotein5;
 
# READING FRAME 6
# print "\n -------Reading Frame 6-------\n\n";
 
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
$codon = substr($longorf6,$i,3);
$orfprotein6 .= codon2aa($codon);
}
# print $orfprotein6;
 
 
 
#HTML OUTPUT
 
print "Content-type: text/html
 
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
 
";


StephenMcGowan

ASKER

Just noticed something strange with this...

I assumed the difference between the Perl output and the HTML text box output was due to the text box itself.

When copying and pasting the DNA sequence into the text box, I was leaving the cursor on an empty line below the last line of sequence (see attached picture), which returned the incorrect sequences.

But when I pressed backspace to remove that empty last line, so the cursor sat at the end of the last sequence line, frames 4, 5 and 6 read the same as the Perl output... success!

Which raises the question: is there any way for Perl to take this into account for text box entry, i.e. removing the last line:

accctgctg
atagatcta
|

if this happens to be the case?

And strangely, 4, 5 and 6 have come out in the right order, whereas 1, 2 and 3 are still in the wrong order?

The plot thickens!

Stephen.
textboxpic1.jpg

StephenMcGowan

ASKER

I'm guessing the text box issue can be resolved in Perl:

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttg...
|

So, as pseudocode off the top of my head, the first line would need to be chopped:
if the input data begins with ">", remove that line

remove all blank lines and spaces from the input data (this should resolve the text cursor problem)

This should return data which starts at the first base of the sequence and finishes at the last base, with nothing else.

That's just a little pseudocode; I have no idea what actual code I'd use.
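
The pseudocode above could be sketched in Perl roughly as follows. This is only an illustration: clean_fasta_text is a hypothetical helper name (not something from ReadingFrameModules), and the sample input assumes the browser sends CRLF line endings plus the stray empty last line described earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce pasted FASTA text to the bare sequence.
sub clean_fasta_text {
    my ($text) = @_;
    return '' unless defined $text;

    $text =~ s/^>.*$//mg;   # drop any header line starting with ">"
    $text =~ s/\s+//g;      # drop spaces, tabs, CR and LF, including a trailing empty line
    return $text;
}

# Example input shaped like a text box submission: CRLF line endings and an
# empty last line where the cursor was sitting.
my $pasted = ">TESTSEQUENCE\r\natgtctgctg\r\naagtcatcca\r\n\r\n";
print clean_fasta_text($pasted), "\n";   # prints atgtctgctgaagtcatcca
```

Because all whitespace is removed in one pass, it should not matter whether the cursor was left on an empty line when the sequence was pasted.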

StephenMcGowan

ASKER

Here's where I think I'm going wrong:

OK, so to begin with I feed in the data from the text box:
$dna1 = $query->param('dna-textbox');
Originally, I had a subroutine which would load a file (not a text box) and convert it into the array @file_data:

@file_data = get_file_data('testdnasequence');

subroutine:

sub get_file_data {

my($filename) = @_;

# Initialize variables
my @filedata = ( );

unless( open(GET_FILE_DATA, $filename) ) {
print STDERR "Cannot open file \"$filename\"\n\n";
exit;
}

@filedata = <GET_FILE_DATA>;

close GET_FILE_DATA;

return @filedata;
}

So this would be from a file and not a text box input.

A further subroutine would then take the array and create a string containing pure DNA sequence, with no spaces and no header. But this accepts an array!

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

my(@fasta_file_data) = @_;

use strict;
use warnings;

# Declare and initialise variables
my $sequence = '';

foreach my $line (@fasta_file_data) {

# discard blank line
if ($line =~ /^\s*$/) {
next;

# discard comment line
} elsif($line =~ /^\s*#/) {
next;

# discard fasta header line
} elsif($line =~ /^>/) {
next;

# keep line, add to sequence string
} else {
$sequence .= $line;
}
}

# remove non-sequence data (in this case, whitespace) from $sequence
# string
$sequence =~ s/\s//g;

return $sequence;
}

The problem I think I have is that:

$dna1 = $query->param('dna-textbox');

returns the data as a string. I'm then trying to pass this into a subroutine which strips all header lines, whitespace etc., but it is expecting an array:

# A subroutine to extract FASTA sequence data from an array

Is there any way of making it accept my $dna1 string instead? I'm guessing I can skip the file-loading subroutine if I'm going with the text box approach.
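
One possible approach, sketched under assumptions, is to split the string into lines before handing it to an array-based subroutine. The sample string below is a hypothetical stand-in for the value returned by $query->param('dna-textbox'), and extract_sequence is a compact stand-in for extract_sequence_from_fasta_data that follows the same filtering rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical text box content, standing in for $query->param('dna-textbox').
my $dna1 = ">TESTSEQUENCE\r\natgtctgctg\r\naagtcatcca\r\n\r\n";

# split turns the single string into a list of lines, which is what an
# array-based subroutine receives in @_. \r?\n accepts both CRLF and LF.
my @lines = split /\r?\n/, $dna1;

my $dna = extract_sequence(@lines);
print "$dna\n";

# Compact stand-in for extract_sequence_from_fasta_data: skip blank, comment
# and header lines, join the rest, then strip any remaining whitespace.
sub extract_sequence {
    my @fasta_lines = @_;
    my $sequence = '';
    foreach my $line (@fasta_lines) {
        next if $line =~ /^\s*$/;   # blank line
        next if $line =~ /^\s*#/;   # comment line
        next if $line =~ /^>/;      # FASTA header line
        $sequence .= $line;
    }
    $sequence =~ s/\s//g;
    return $sequence;
}
```

With this shape the existing subroutine would not need to change; only the call site splits the string first.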

ASKER CERTIFIED SOLUTION

ozo
