Solved

HTML output different to its Perl output

Posted on 2009-05-11
Last Modified: 2012-06-27
I've set up a basic web interface with a text box at:
http://biolinux.smith.man.ac.uk/~campus12sm/assessment.pl

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttgaagaagcacttgatacagatgagaaggagatg
ctgcgggatgttgctatagatgtggttccacctaatgtcagggaccttgcgctcgtcgag
ctggatattttacgggaaagaggtaagctgtctgtcggggacttggctgaactgctctac
agagtgaggcgatttgacctgctcaaacgtatcttgaagatggacagaaaagctgtggag
acccacctgctcaggaaccctcaccttgtttcggactatagagtgctgatggcagagatt
ggtgaggatttggataaatctgatgtgtcctcattaattttcctcatgaaggattacatg
ggccgaggcaagataagcaaggagaagagtttcttggaccttgtggttgagttggagaaa
ctaaatctggttgccccagatcaactggatttattagaaaaatgcctaaagaacatccac
agaatagacctgaagacaaaaatccagaagtacaagcagtctgttcaaggagcagggaca
agttacaggaatgttctccaagcagcaatccaaaagagtctcaaggatccttcaaataac
ttcaggctccataatgggagaagtaaagaacaaagacttaaggaacagcttggcgctcaa
caagaaccagtgaagaaatccattcaggaatcagaagcttttttgcctcagagcatacct
gaagagagatacaagatgaagagcaagcccctaggaatctgcctgataatcgattgcatt
ggcaatgagacagagcttcttcgagacaccttcacttccctgggctatgaagtccagaaa
ttcttgcatctcagtatgcatggtatatcccagattcttggccaatttgcctgtatgccc
gagcaccgagactacgacagctttgtgtgtgtcctggtgagccgaggaggctcccagagt
gtgtatggtgtggatcagactcactcagggctccccctgcatcacatcaggaggatgttc
atgggagattcatgcccttatctagcagggaagccaaagatgttttttattcagaactat


You should be able to copy/paste the above sequence into the text box and hit submit to return 6 DIFFERENT sequences. If I run this through Perl, reading the same DNA sequence from a file, it works fine:

Reading Frame 1: SAEVIHQVEEALDTDEKEMLRDVAIDVVPPNVRDLALVELDILRERGKLSVGDLAELLYRVRRFDLLKRILKMDRKAVETHLLRNPHLVSDYRVLMAEIGEDLDKSDVSSLIFLMKDYMGRGKISKEKSFLDLVVELEKLNLVAPDQLDLLEKCLKNIHRIDLKTKIQKYKQSVQGAGTSYRNVLQAAIQKSLKDPSNNFRLHNGRSKEQRLKEQLGAQQEPVKKSIQESEAFLPQSIPEERYKMKSKPLGICLIIDCIGNETELLRDTFTSLGYEVQKFLHLSMHGISQILGQFACMPEHRDYDSFVCVLVSRGGSQSVYGVDQTHSGLPLHHIRRMFMGDSCPYLAGKPKMFFIQNYVVSEGQLEDSSLLEVDGPAMKNVEFKAQKRGLCTVHREADFFWSLCTADMSLLEQSHSSPSLYLQCLSQKLRQERKRPLLDLHIELNGYMYDWNSRVSAKEKYYVWLQHTLRKKLILSYT_;<br>
Reading Frame 2: RQSFFETPSLPWAMKSRNSCISVCMVYPRFLANLPVCPSTETTTALCVSW_;<br>
Reading Frame 3: PLSSREAKDVFYSELCGVRGPAGGQQPLGGGWASDEECGIQGSEARAVHSSPRS_;<br>
Reading Frame 4: HTEMQEFLDFIAQGSEGVSKKLCLIANAIDYQADS_;<br>
Reading Frame 5: QSIIRQIPRGLLFILYLSSGML_;<br>
Reading Frame 6: RTHQIYPNPHQSLPSALYSPKQGEGS_;<br>

But if this is run through the HTML form, frames 4, 5 and 6 come back exactly the same, which shouldn't be the case. In fact, none of the sequences come out in the order shown above, which I find bizarre!

My file processing script is shown below:

Thanks
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
 
use strict;
use warnings;
use ReadingFrameModules;
 
use CGI; # a predefined module
my $query = new CGI;
 
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
 
 
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 
 
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
 
# READING FRAME 1
 
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
$codon = substr($longorf1,$i,3);
$orfprotein1 .= codon2aa($codon);
}
 
# READING FRAME 2
 
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
$codon = substr($longorf2,$i,3);
$orfprotein2 .= codon2aa($codon);
}
 
# READING FRAME 3
 
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
$codon = substr($longorf3,$i,3);
$orfprotein3 .= codon2aa($codon);
}
 
# READING FRAME 4
 
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
$codon = substr($longorf4,$i,3);
$orfprotein4 .= codon2aa($codon);
}
 
# READING FRAME 5
 
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
$codon = substr($longorf5,$i,3);
$orfprotein5 .= codon2aa($codon);
}
 
# READING FRAME 6
 
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
$codon = substr($longorf6,$i,3);
$orfprotein6 .= codon2aa($codon);
}
 
#HTML OUTPUT
 
print "Content-type: text/html
 
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
 
";

Question by:StephenMcGowan
8 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 24360315
How do you pass @ARGV to the program when it is  run through HTML?
 

Author Comment

by:StephenMcGowan
ID: 24360353
Hey ozo,

Sorry, how do you mean?

My assumption was that:
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

would take the $longorfs generated by my first script and import them into this script, so that I could work with them and then create the new HTML page shown at the bottom of the script.

So basically:
script 1 reads the DNA sequence from my form and generates 6 $longorfs
script 2 generates 6 $orfproteins from the $longorfs and reports them as a new HTML page

hope this helps
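
A minimal sketch of that hand-off, for anyone following along. It assumes nothing about the real scripts: the child here is a hypothetical one-liner standing in for script 2, and the two ORF strings are made up. A list-form pipe open is used instead of system so the child's output can be captured, but the arguments reach @ARGV the same way, without going through a shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the second script: a one-liner that prints
# each command-line argument it receives, wrapped in brackets.
my $child_code = 'print "[$_]" for @ARGV';

# Hypothetical ORF strings; the second one carries a stray trailing newline,
# as text pasted into a textbox might.
my @orfs = ('atgaaatag', "atgccctga\n");

# List-form pipe open runs the child without a shell and passes each list
# element as one argument, the same way system LIST does.
open(my $child, '-|', $^X, '-e', $child_code, @orfs)
    or die "Cannot start child: $!";
my $received = do { local $/; <$child> };   # slurp everything the child printed
close $child;

print "$received\n";
```

The point of the sketch is that arguments arrive in order and unmodified, so any header line or stray whitespace left in the input of script 1 is carried straight through to script 2.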
 
LVL 84

Expert Comment

by:ozo
ID: 24360380
Are you calling from your first script, or through HTML?

 

Author Comment

by:StephenMcGowan
ID: 24360418
Ahh right, calling from my first script:

-----FIRST SCRIPT-----

#!/usr/bin/perl -w
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFS) beginning with ATG and ending with a stop codon,
# TGA, TAA, TAG)

# Analyse all six open reading frames and predict ORFS in all six. Only
# longest ORF will be used.

use strict;
use warnings;
use ReadingFrameModules;
use CGI;
my $query = new CGI;

# Initialise variables
my @file_data = ();
my $dna = '';
my $dna2 = '';
my $dna3 = '';
my $dna5 = '';
my $dna6 = '';
my $revcom = '';
my $revcom1 = '';
my $revcom2 = '';
my $longorf1 = '';
my $longorf2 = '';
my $longorf3 = '';
my $longorf4 = '';
my $longorf5 = '';
my $longorf6 = '';

$dna = $query->param('dna-textbox');

# Extract the sequence data from the contents of the Fasta file

# $dna = extract_sequence_from_fasta_data(@file_data);

# feed the dna data into open_reading_frame to return the longest ORF

# print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);
# print $longorf1;

# print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
# print $longorf2;

# print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
# print $longorf3;

# Reverse complement the DNA sequence
$revcom = revcom($dna);

# print "\n -------Reading Frame 4-------\n\n";
# print $revcom;
$longorf4 = open_reading_frame($revcom);
# print $longorf4;

# print "\n -------Reading Frame 5-------\n\n";
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
# print $longorf5;

# print "\n -------Reading Frame 6-------\n\n";
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
# print $longorf6;

#Transfer Open Reading Frames over to ProteinDigest
system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;



----SCRIPT 2----
 
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user
 
use strict;
use warnings;
use ReadingFrameModules;
 
use CGI; # a predefined module
my $query = new CGI;
 
# my $enzyme = [FILL THIS OUT....]
 
 
# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;
 
 
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;
 
 
# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein
 
# READING FRAME 1
# print "\n -------Reading Frame 1-------\n\n";
 
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
$codon = substr($longorf1,$i,3);
$orfprotein1 .= codon2aa($codon);
}
# print $orfprotein1;
 
# READING FRAME 2
# print "\n -------Reading Frame 2-------\n\n";
 
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
$codon = substr($longorf2,$i,3);
$orfprotein2 .= codon2aa($codon);
}
# print $orfprotein2;
 
# READING FRAME 3
# print "\n -------Reading Frame 3-------\n\n";
 
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
$codon = substr($longorf3,$i,3);
$orfprotein3 .= codon2aa($codon);
}
# print $orfprotein3;
 
# READING FRAME 4
# print "\n -------Reading Frame 4-------\n\n";
 
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
$codon = substr($longorf4,$i,3);
$orfprotein4 .= codon2aa($codon);
}
# print $orfprotein4;
 
# READING FRAME 5
# print "\n -------Reading Frame 5-------\n\n";
 
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
$codon = substr($longorf5,$i,3);
$orfprotein5 .= codon2aa($codon);
}
# print $orfprotein5;
 
# READING FRAME 6
# print "\n -------Reading Frame 6-------\n\n";
 
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
$codon = substr($longorf6,$i,3);
$orfprotein6 .= codon2aa($codon);
}
# print $orfprotein6;
 
 
 
#HTML OUTPUT
 
print "Content-type: text/html
 
<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
 
";

 

Author Comment

by:StephenMcGowan
ID: 24360662
Just noticed something strange with this...

I assumed the difference between the Perl output and the output from the HTML textbox would have been due to the textbox itself.

When copying and pasting the DNA sequence into the textbox, I was pasting it so that the cursor ended up on an empty line below the last line of sequence (see attached picture), which returned the incorrect sequences.

But when I pressed backspace to remove that empty last line, so that the cursor sat at the end of the last sequence line, frames 4, 5 and 6 read the same as the Perl output... success!

Which raises the question: is there any way for Perl to take this into account for textbox entry, i.e. removing the empty last line:
 
accctgctg
atagatcta
|

if this happens to be the case?

And strangely, 4, 5 and 6 have come out in the right order, whereas 1, 2 and 3 are still in the wrong order?

the plot thickens!

Stephen.
textboxpic1.jpg
 

Author Comment

by:StephenMcGowan
ID: 24360709
I'm guessing the textbox issue can be resolved in Perl:

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttg...
|

So, as pseudocode off the top of my head, the first line would need to be chopped:
if input data begins with ">", remove that line

remove all blank lines and spaces from the input data (this should resolve the text cursor problem)

This should return data which starts at the first base of the sequence and finishes at the last base, with nothing else.

Just a little pseudocode; no idea on the actual code I'd use.
 

Author Comment

by:StephenMcGowan
ID: 24360895
Here's where I think I'm going wrong:

OK, so to begin with I read in the data from the text box:
$dna1 = $query->param('dna-textbox');
Originally I had a subroutine which would load a file (not a textbox) and read it into the array @file_data:

@file_data = get_file_data('testdnasequence');

subroutine:

sub get_file_data {

    my($filename) = @_;

    # Initialize variables
    my @filedata = (  );

    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }

    @filedata = <GET_FILE_DATA>;

    close GET_FILE_DATA;

    return @filedata;
}

So this would be from a file and not from textbox input.

A further subroutine would then take the array and create a string containing pure DNA sequence, with no spaces and no header. But this accepts an array!!

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

    my(@fasta_file_data) = @_;

    use strict;
    use warnings;

    # Declare and initialise variables
    my $sequence = '';

    foreach my $line (@fasta_file_data) {

        # discard blank line
        if ($line =~ /^\s*$/) {
            next;

        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;

        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;

        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;

    return $sequence;
}

The problem I think I have is that:

$dna1 = $query->param('dna-textbox');

returns the data as a single string. I'm then trying to run this through a subroutine which strips header lines, whitespace etc., but it is expecting an array:

# A subroutine to extract FASTA sequence data from an array

Is there any way of getting it to accept my $dna1 string instead? I'm guessing I can skip the file-loading subroutine (used for test files) if I'm going with the textbox approach.
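
One possible way to reuse the line-based subroutine unchanged is to split the string into lines before passing it in. The sketch below is only an illustration: the subroutine is a condensed copy of the one above, and the textbox contents (with CRLF line endings and a trailing blank line) are a made-up sample, not the real form data.

```perl
use strict;
use warnings;

# The line-based subroutine from this post, condensed.
sub extract_sequence_from_fasta_data {
    my (@fasta_file_data) = @_;
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        next if $line =~ /^\s*$/;    # discard blank line
        next if $line =~ /^\s*#/;    # discard comment line
        next if $line =~ /^>/;       # discard FASTA header line
        $sequence .= $line;          # keep line
    }
    $sequence =~ s/\s//g;            # strip newlines and other whitespace
    return $sequence;
}

# Hypothetical textbox contents: CRLF line endings and a trailing blank line.
my $dna1 = ">TESTSEQUENCE\r\natgtct\r\ngctgaa\r\n\r\n";

# split /^/ breaks the string after each newline, keeping the newline,
# so the subroutine sees the same kind of list it would get from a file.
my @lines = split /^/, $dna1;
my $dna   = extract_sequence_from_fasta_data(@lines);

print "$dna\n";    # atgtctgctgaa
```

Passing $dna1 directly would not work with this version: the whole string is then treated as one "line", it starts with ">", and so everything is discarded as a header and an empty string comes back.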
 
LVL 84

Accepted Solution

by:ozo
ID: 24361985
sub extract_sequence_from_fasta_data {
  local $_ = join'',@_;
      s/^>.*//gm;
      s/^\s*#.*//gm;
      s/\s+//g;
     return $_;
 }
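
A short usage sketch of the subroutine above, with comments added. The pasted text is a made-up sample with a header line and a trailing blank line; because the subroutine joins its arguments first, it should give the same result for a single string and for a list of lines.

```perl
use strict;
use warnings;

# The accepted subroutine, reproduced with comments.
sub extract_sequence_from_fasta_data {
    local $_ = join '', @_;   # accept a single string or a list of lines
    s/^>.*//gm;               # drop FASTA header lines
    s/^\s*#.*//gm;            # drop comment lines
    s/\s+//g;                 # drop all remaining whitespace, including newlines
    return $_;
}

# Hypothetical textbox contents, ending with an empty line.
my $pasted = ">TESTSEQUENCE\natgtct\ngctgaa\n\n";

my $from_string = extract_sequence_from_fasta_data($pasted);
my $from_lines  = extract_sequence_from_fasta_data(split /^/, $pasted);

print "$from_string\n";   # atgtctgctgaa
print "$from_lines\n";    # atgtctgctgaa
```

In the first script this would presumably be called as extract_sequence_from_fasta_data(scalar $query->param('dna-textbox')) before the reading frames are computed; that call site is an assumption, not something shown in the thread.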