StephenMcGowan
asked on
HTML output different to its Perl output
I've set up a basic web interface with a text box at:
http://biolinux.smith.man.ac.uk/~campus12sm/assessment.pl
>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttgaagaagcacttgatacagatgagaaggagatg
ctgcgggatgttgctatagatgtggttccacctaatgtcagggaccttgcgctcgtcgag
ctggatattttacgggaaagaggtaagctgtctgtcggggacttggctgaactgctctac
agagtgaggcgatttgacctgctcaaacgtatcttgaagatggacagaaaagctgtggag
acccacctgctcaggaaccctcaccttgtttcggactatagagtgctgatggcagagatt
ggtgaggatttggataaatctgatgtgtcctcattaattttcctcatgaaggattacatg
ggccgaggcaagataagcaaggagaagagtttcttggaccttgtggttgagttggagaaa
ctaaatctggttgccccagatcaactggatttattagaaaaatgcctaaagaacatccac
agaatagacctgaagacaaaaatccagaagtacaagcagtctgttcaaggagcagggaca
agttacaggaatgttctccaagcagcaatccaaaagagtctcaaggatccttcaaataac
ttcaggctccataatgggagaagtaaagaacaaagacttaaggaacagcttggcgctcaa
caagaaccagtgaagaaatccattcaggaatcagaagcttttttgcctcagagcatacct
gaagagagatacaagatgaagagcaagcccctaggaatctgcctgataatcgattgcatt
ggcaatgagacagagcttcttcgagacaccttcacttccctgggctatgaagtccagaaa
ttcttgcatctcagtatgcatggtatatcccagattcttggccaatttgcctgtatgccc
gagcaccgagactacgacagctttgtgtgtgtcctggtgagccgaggaggctcccagagt
gtgtatggtgtggatcagactcactcagggctccccctgcatcacatcaggaggatgttc
atgggagattcatgcccttatctagcagggaagccaaagatgttttttattcagaactat
You should be able to copy/paste the above sequence into the text box and hit submit to return 6 DIFFERENT sequences. If I run this through Perl, reading the same DNA sequence from a file, it works fine:
Reading Frame 1: SAEVIHQVEEALDTDEKEMLRDVAIDVVPPNVRDLALVELDILRERGKLSVGDLAELLYRVRRFDLLKRILKMDRKAVETHLLRNPHLVSDYRVLMAEIGEDLDKSDVSSLIFLMKDYMGRGKISKEKSFLDLVVELEKLNLVAPDQLDLLEKCLKNIHRIDLKTKIQKYKQSVQGAGTSYRNVLQAAIQKSLKDPSNNFRLHNGRSKEQRLKEQLGAQQEPVKKSIQESEAFLPQSIPEERYKMKSKPLGICLIIDCIGNETELLRDTFTSLGYEVQKFLHLSMHGISQILGQFACMPEHRDYDSFVCVLVSRGGSQSVYGVDQTHSGLPLHHIRRMFMGDSCPYLAGKPKMFFIQNYVVSEGQLEDSSLLEVDGPAMKNVEFKAQKRGLCTVHREADFFWSLCTADMSLLEQSHSSPSLYLQCLSQKLRQERKRPLLDLHIELNGYMYDWNSRVSAKEKYYVWLQHTLRKKLILSYT_;<br>
Reading Frame 2: RQSFFETPSLPWAMKSRNSCISVCMVYPRFLANLPVCPSTETTTALCVSW_;<br>
Reading Frame 3: PLSSREAKDVFYSELCGVRGPAGGQQPLGGGWASDEECGIQGSEARAVHSSPRS_;<br>
Reading Frame 4: HTEMQEFLDFIAQGSEGVSKKLCLIANAIDYQADS_;<br>
Reading Frame 5: QSIIRQIPRGLLFILYLSSGML_;<br>
Reading Frame 6: RTHQIYPNPHQSLPSALYSPKQGEGS_;<br>
But when this is run through the HTML form, sequences 4, 5 and 6 come back exactly the same, which shouldn't be the case. In fact, none of the sequences are in the order shown above, which I find bizarre!
My file processing script is shown below:
Thanks
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
use strict;
use warnings;
use ReadingFrameModules;
use CGI; # a predefined module
my $query = new CGI;
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
# READING FRAME 1
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}
# READING FRAME 2
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}
# READING FRAME 3
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}
# READING FRAME 4
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}
# READING FRAME 5
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}
# READING FRAME 6
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}
#HTML OUTPUT
print "Content-type: text/html\n
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
";
How do you pass @ARGV to the program when it is run through HTML?
ASKER
Hey ozo,
Sorry, how do you mean?
My assumption was that:
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
brought in the $longorfs generated by my first script, so that this script can work on them and then create the new HTML page shown at the bottom of the script.
So basically, script 1 takes the DNA sequence from my form and generates 6 $longorfs;
script 2 generates 6 $orfproteins from the $longorfs and reports them as a new HTML page.
Hope this helps.
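For reference, a minimal sketch of how a list handed to a child Perl process arrives in that process's @ARGV. This is not the actual setup from the thread: the one-line child program and the helper name child_argv are hypothetical, and the sketch uses the list form of a piped open instead of system only so that the child's output can be captured. Both forms skip the shell and pass each list element as one argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a tiny child Perl program with the given
# arguments and return what the child saw in its @ARGV, one per element.
sub child_argv {
    my @args = @_;

    # Stand-in for the second script: print each argument in brackets.
    my $child = 'print "[$_]\n" for @ARGV;';

    # List form of open: no shell is involved, so each element of @args
    # becomes exactly one entry in the child's @ARGV.
    open(my $fh, '-|', $^X, '-e', $child, @args)
        or die "Cannot start child: $!";
    my @seen = <$fh>;
    close $fh;

    chomp @seen;
    return @seen;
}

# Made-up ORF strings, including an empty one.
my @seen = child_argv('ATGAAATAG', '', 'ATGTAA');
print "$_\n" for @seen;
```

An empty ORF still occupies its own slot in @ARGV, so the six values stay aligned with the six variables on the receiving side.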
Are you calling from your first script, or through HTML?
ASKER
Ahh right, calling from my first script:
-----FIRST SCRIPT-----
#!/usr/bin/perl -w
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFS) beginning with ATG and ending with a stop codon,
# TGA, TAA, TAG)
# Analyse all six open reading frames and predict ORFS in all six. Only
# longest ORF will be used.
use strict;
use warnings;
use ReadingFrameModules;
use CGI;
my $query = new CGI;
# Initialise variables
my @file_data = ();
my $dna = '';
my $dna2 = '';
my $dna3 = '';
my $dna5 = '';
my $dna6 = '';
my $revcom = '';
my $revcom1 = '';
my $revcom2 = '';
my $longorf1 = '';
my $longorf2 = '';
my $longorf3 = '';
my $longorf4 = '';
my $longorf5 = '';
my $longorf6 = '';
$dna = $query->param('dna-textbox');
# Extract the sequence data from the contents of the Fasta file
# $dna = extract_sequence_from_fasta_data(@file_data);
# feed the dna data into open_reading_frame to return the longest ORF
# print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);
# print $longorf1;
# print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
# print $longorf2;
# print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
# print $longorf3;
#Reverse complement the DNA sequence
$revcom = revcom($dna);
# print "\n -------Reading Frame 4-------\n\n";
# print $revcom;
$longorf4 = open_reading_frame($revcom);
# print $longorf4;
# print "\n -------Reading Frame 5-------\n\n";
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
# print $longorf5;
# print "\n -------Reading Frame 6-------\n\n";
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
# print $longorf6;
#Transfer Open Reading Frames over to ProteinDigest
system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;
----SCRIPT 2----
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
use strict;
use warnings;
use ReadingFrameModules;
use CGI; # a predefined module
my $query = new CGI;
# my $enzyme = [FILL THIS OUT....]
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
# READING FRAME 1
# print "\n -------Reading Frame 1-------\n\n";
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}
# print $orfprotein1;
# READING FRAME 2
# print "\n -------Reading Frame 2-------\n\n";
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}
# print $orfprotein2;
# READING FRAME 3
# print "\n -------Reading Frame 3-------\n\n";
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}
# print $orfprotein3;
# READING FRAME 4
# print "\n -------Reading Frame 4-------\n\n";
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}
# print $orfprotein4;
# READING FRAME 5
# print "\n -------Reading Frame 5-------\n\n";
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}
# print $orfprotein5;
# READING FRAME 6
# print "\n -------Reading Frame 6-------\n\n";
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}
# print $orfprotein6;
#HTML OUTPUT
print "Content-type: text/html\n
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
";
ASKER
Just noticed something strange with this...
I assumed the difference between the Perl output and the HTML textbox output was due to the textbox itself.
When copying and pasting the DNA sequence into the textbox, I was pasting with the cursor ending up below the last line (see attached picture), which returned the incorrect sequences.
But when I pressed backspace, returning the cursor to the end of the last sequence line and removing the empty last line, frames 4, 5 and 6 read the same as the Perl output... success!
Which raises the question: is there any way for Perl to take this into account for textbox entry, i.e. removing the empty last line:
accctgctg
atagatcta
|
if it is present?
And strangely, 4, 5 and 6 now come out in the right order, whereas 1, 2 and 3 are still in the wrong order.
The plot thickens!
Stephen.
textboxpic1.jpg
ASKER
I'm guessing the textbox issue can be resolved in Perl:
>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttg...
|
So, as pseudocode off the top of my head, the header line would first need to be chopped:
if the input data begins with ">", remove that line
remove all blank lines and spaces from the input data (this should resolve the text cursor problem)
This should return data which starts at the first base of the sequence and finishes at the last base, with nothing else.
It's only pseudocode; I have no idea what actual code I'd use.
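The pseudocode above could be sketched in Perl roughly as follows. The helper name clean_fasta_text is made up for illustration, and the sketch assumes the textbox value arrives as a single string that may contain a FASTA header, Windows-style line endings and a trailing empty line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce raw textbox input to bare sequence letters.
sub clean_fasta_text {
    my ($text) = @_;
    $text = '' unless defined $text;   # guard against a missing form field

    # Remove any FASTA header line (a line starting with ">").
    $text =~ s/^>.*$//mg;

    # Remove all whitespace: spaces, tabs, \r and \n, including the
    # empty line left when the cursor sits below the last row.
    $text =~ s/\s+//g;

    return $text;
}

# Invented sample input with a header, CRLF line endings and a blank last line.
my $raw = ">TESTSEQUENCE\r\natgtctgctg\r\naagtcatcca\r\n\r\n";
print clean_fasta_text($raw), "\n";   # atgtctgctgaagtcatcca
```

Cleaning the string before it reaches open_reading_frame would mean a stray line break can no longer shift the reading frames.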
ASKER
Here's where I think I'm going wrong:
OK, so to begin with I read in the data from the textbox:
$dna1 = $query->param('dna-textbox');
Originally I had a subroutine which would load a file (not a textbox) and convert it into the array @file_data:
@file_data = get_file_data('testdnasequence');
Subroutine:
sub get_file_data {
    my($filename) = @_;
    # Initialize variables
    my @filedata = ( );
    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }
    @filedata = <GET_FILE_DATA>;
    close GET_FILE_DATA;
    return @filedata;
}
So this reads from a file and not from textbox input.
A further subroutine would then take the array and create a string containing pure DNA sequence, with no spaces and no header. But it accepts an array!
# A subroutine to extract FASTA sequence data from an array
sub extract_sequence_from_fasta_data {
    my(@fasta_file_data) = @_;
    use strict;
    use warnings;
    # Declare and initialise variables
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        # discard blank line
        if ($line =~ /^\s*$/) {
            next;
        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;
        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;
        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }
    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;
    return $sequence;
}
The problem, I think, is that:
$dna1 = $query->param('dna-textbox');
returns the data as a string. I'm then trying to pass this into a subroutine which strips header lines, whitespace etc., but it expects an array:
# A subroutine to extract FASTA sequence data from an array
Is there any way of making it accept my $dna1 string instead? I'm guessing I can skip the file-loading subroutine used for test files if I'm going with the textbox approach.
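One possible way to reuse an array-based routine with a string, sketched under assumptions: the textbox contents are split on line breaks into a list, which is what a sub reading @_ receives anyway. The sub below is a condensed stand-in for extract_sequence_from_fasta_data, and the sample string is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Condensed stand-in for the array-based extraction routine quoted above.
sub extract_sequence_from_fasta_data {
    my (@fasta_file_data) = @_;
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        next if $line =~ /^\s*$/;    # blank line
        next if $line =~ /^\s*#/;    # comment line
        next if $line =~ /^>/;       # FASTA header line
        $sequence .= $line;
    }
    $sequence =~ s/\s//g;            # strip remaining whitespace
    return $sequence;
}

# Invented example of what the textbox might submit (browsers send \r\n).
my $dna1 = ">TESTSEQUENCE\r\natgtct\r\ngctgaa\r\n\r\n";

# Split the single string into lines; \r?\n copes with both line endings.
my @lines = split /\r?\n/, $dna1;

my $dna = extract_sequence_from_fasta_data(@lines);
print "$dna\n";   # atgtctgctgaa
```

With this approach the file-reading step can be skipped for textbox input, while the same extraction routine still serves both sources.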