Solved

HTML output differs from the Perl output

Posted on 2009-05-11
Last Modified: 2012-06-27
I've set up a basic web interface with a text box at:
http://biolinux.smith.man.ac.uk/~campus12sm/assessment.pl

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttgaagaagcacttgatacagatgagaaggagatg
ctgcgggatgttgctatagatgtggttccacctaatgtcagggaccttgcgctcgtcgag
ctggatattttacgggaaagaggtaagctgtctgtcggggacttggctgaactgctctac
agagtgaggcgatttgacctgctcaaacgtatcttgaagatggacagaaaagctgtggag
acccacctgctcaggaaccctcaccttgtttcggactatagagtgctgatggcagagatt
ggtgaggatttggataaatctgatgtgtcctcattaattttcctcatgaaggattacatg
ggccgaggcaagataagcaaggagaagagtttcttggaccttgtggttgagttggagaaa
ctaaatctggttgccccagatcaactggatttattagaaaaatgcctaaagaacatccac
agaatagacctgaagacaaaaatccagaagtacaagcagtctgttcaaggagcagggaca
agttacaggaatgttctccaagcagcaatccaaaagagtctcaaggatccttcaaataac
ttcaggctccataatgggagaagtaaagaacaaagacttaaggaacagcttggcgctcaa
caagaaccagtgaagaaatccattcaggaatcagaagcttttttgcctcagagcatacct
gaagagagatacaagatgaagagcaagcccctaggaatctgcctgataatcgattgcatt
ggcaatgagacagagcttcttcgagacaccttcacttccctgggctatgaagtccagaaa
ttcttgcatctcagtatgcatggtatatcccagattcttggccaatttgcctgtatgccc
gagcaccgagactacgacagctttgtgtgtgtcctggtgagccgaggaggctcccagagt
gtgtatggtgtggatcagactcactcagggctccccctgcatcacatcaggaggatgttc
atgggagattcatgcccttatctagcagggaagccaaagatgttttttattcagaactat


You should be able to copy and paste the above sequence into the text box and hit submit to get 6 DIFFERENT sequences back. If I run this directly through Perl, reading the same DNA sequence from a file, it works fine:

Reading Frame 1: SAEVIHQVEEALDTDEKEMLRDVAIDVVPPNVRDLALVELDILRERGKLSVGDLAELLYRVRRFDLLKRILKMDRKAVETHLLRNPHLVSDYRVLMAEIGEDLDKSDVSSLIFLMKDYMGRGKISKEKSFLDLVVELEKLNLVAPDQLDLLEKCLKNIHRIDLKTKIQKYKQSVQGAGTSYRNVLQAAIQKSLKDPSNNFRLHNGRSKEQRLKEQLGAQQEPVKKSIQESEAFLPQSIPEERYKMKSKPLGICLIIDCIGNETELLRDTFTSLGYEVQKFLHLSMHGISQILGQFACMPEHRDYDSFVCVLVSRGGSQSVYGVDQTHSGLPLHHIRRMFMGDSCPYLAGKPKMFFIQNYVVSEGQLEDSSLLEVDGPAMKNVEFKAQKRGLCTVHREADFFWSLCTADMSLLEQSHSSPSLYLQCLSQKLRQERKRPLLDLHIELNGYMYDWNSRVSAKEKYYVWLQHTLRKKLILSYT_;<br>
Reading Frame 2: RQSFFETPSLPWAMKSRNSCISVCMVYPRFLANLPVCPSTETTTALCVSW_;<br>
Reading Frame 3: PLSSREAKDVFYSELCGVRGPAGGQQPLGGGWASDEECGIQGSEARAVHSSPRS_;<br>
Reading Frame 4: HTEMQEFLDFIAQGSEGVSKKLCLIANAIDYQADS_;<br>
Reading Frame 5: QSIIRQIPRGLLFILYLSSGML_;<br>
Reading Frame 6: RTHQIYPNPHQSLPSALYSPKQGEGS_;<br>

But if this is run through the HTML form, sequences 4, 5 and 6 come back exactly the same, which shouldn't be the case. In fact, none of the sequences come back in the order shown above, which I find bizarre!

My file processing script is shown below:

Thanks
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user

use strict;
use warnings;
use ReadingFrameModules;

use CGI; # a predefined module
my $query = new CGI;

# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;

my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein

# READING FRAME 1
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}

# READING FRAME 2
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}

# READING FRAME 3
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}

# READING FRAME 4
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}

# READING FRAME 5
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}

# READING FRAME 6
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}

#HTML OUTPUT
print "Content-type: text/html

<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>

";

Question by:StephenMcGowan
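
The six near-identical translation loops in the script above could be folded into one subroutine applied to each frame. The following is only a sketch under stated assumptions: `codon2aa_sketch` and its five-entry table are hypothetical stand-ins for `codon2aa` from `ReadingFrameModules` (which is not shown in the thread), and the two input sequences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stand-in for codon2aa() from
# ReadingFrameModules; the real codon table is not shown in the thread.
my %toy_table = (atg => 'M', tct => 'S', gct => 'A', gaa => 'E', tga => '_');

sub codon2aa_sketch {
    my ($codon) = @_;
    my $aa = $toy_table{ lc $codon };
    return defined $aa ? $aa : 'X';    # 'X' marks codons missing from the toy table
}

# Translate one DNA string codon by codon, ignoring a trailing partial codon.
# This is the same loop as in the script, written once.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i < length($dna) - 2; $i += 3) {
        $protein .= codon2aa_sketch(substr($dna, $i, 3));
    }
    return $protein;
}

# Hypothetical inputs standing in for the six $longorf values.
my @longorfs    = ('atgtctgctgaatga', 'atggaatga');
my @orfproteins = map { translate($_) } @longorfs;

print "Reading Frame $_: $orfproteins[$_ - 1];<br>\n" for 1 .. @orfproteins;
```

Keeping the frames in an array means a seventh input or a changed output format only needs one edit rather than six.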

Expert Comment

by:ozo
ID: 24360315
How do you pass @ARGV to the program when it is run through HTML?
 

Author Comment

by:StephenMcGowan
ID: 24360353
Hey ozo,

Sorry, how do you mean?

My assumption was that:
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

took the $longorfs generated by my first script and imported them into this script, so that I could work with them and then create the new HTML page shown at the bottom of the script.

So basically:
script 1 takes the DNA sequence from my form and generates 6 $longorfs
script 2 generates 6 $orfproteins from the $longorfs and reports them as a new HTML page

hope this helps
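
As a rough sketch of the hand-off described above: when `system` is called with a list, each element is passed to the child process as one entry of its `@ARGV`, without going through a shell, so any stray characters in a `$longorf` (line breaks included) arrive unchanged. The snippet below uses a list-form pipe `open`, which passes arguments the same way, so that the child's view can be captured and printed. The sequences and the inline child program are hypothetical stand-ins for the real data and for `proteindigest.pl`.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the $longorf values; the second one
# contains an embedded line break, as pasted text-box data might.
my @orfs = ('atgaaatga', "atgccc\ntaa", 'atgtag');

# Inline child program: report the length of each argument it received.
my $child_code = 'print join(",", map { length($_) } @ARGV)';

# List-form pipe open: no shell is involved, so every element of @orfs
# becomes exactly one element of the child's @ARGV.
open(my $child, '-|', $^X, '-e', $child_code, @orfs)
    or die "Cannot start child process: $!";
my $seen = <$child>;
close $child;

print "Argument lengths seen by the child: $seen\n";    # 9,10,6
```

The middle length is 10 rather than 9 because the line break is passed along as part of the argument, which is why the input needs cleaning before the frames are computed.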

Expert Comment

by:ozo
ID: 24360380
Are you calling from your first script, or through HTML?
 

Author Comment

by:StephenMcGowan
ID: 24360418
Ahh right, calling from my first script:

-----FIRST SCRIPT-----

#!/usr/bin/perl -w
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFS) beginning with ATG and ending with a stop codon,
# TGA, TAA, TAG)

# Analyse all six open reading frames and predict ORFS in all six. Only
# longest ORF will be used.

use strict;
use warnings;
use ReadingFrameModules;
use CGI;
my $query = new CGI;

# Initialise variables
my @file_data = ();
my $dna = '';
my $dna2 = '';
my $dna3 = '';
my $dna5 = '';
my $dna6 = '';
my $revcom = '';
my $revcom1 = '';
my $revcom2 = '';
my $longorf1 = '';
my $longorf2 = '';
my $longorf3 = '';
my $longorf4 = '';
my $longorf5 = '';
my $longorf6 = '';

$dna = $query->param('dna-textbox');

# Extract the sequence data from the contents of the Fasta file

# $dna = extract_sequence_from_fasta_data(@file_data);

# feed the dna data into open_reading_frame to return the longest ORF

# print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);
# print $longorf1;

# print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
# print $longorf2;

# print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
# print $longorf3;

# Reverse complement the DNA sequence
$revcom = revcom($dna);

# print "\n -------Reading Frame 4-------\n\n";
# print $revcom;
$longorf4 = open_reading_frame($revcom);
# print $longorf4;

# print "\n -------Reading Frame 5-------\n\n";
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
# print $longorf5;

# print "\n -------Reading Frame 6-------\n\n";
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
# print $longorf6;

#Transfer Open Reading Frames over to ProteinDigest
system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;



----SCRIPT 2----
 

#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user

use strict;
use warnings;
use ReadingFrameModules;

use CGI; # a predefined module
my $query = new CGI;

# my $enzyme = [FILL THIS OUT....]

# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;

my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein

# READING FRAME 1
# print "\n -------Reading Frame 1-------\n\n";
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}
# print $orfprotein1;

# READING FRAME 2
# print "\n -------Reading Frame 2-------\n\n";
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}
# print $orfprotein2;

# READING FRAME 3
# print "\n -------Reading Frame 3-------\n\n";
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}
# print $orfprotein3;

# READING FRAME 4
# print "\n -------Reading Frame 4-------\n\n";
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}
# print $orfprotein4;

# READING FRAME 5
# print "\n -------Reading Frame 5-------\n\n";
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}
# print $orfprotein5;

# READING FRAME 6
# print "\n -------Reading Frame 6-------\n\n";
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}
# print $orfprotein6;

#HTML OUTPUT
print "Content-type: text/html

<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>

";


Author Comment

by:StephenMcGowan
ID: 24360662
Just noticed something strange with this...

I assumed the difference between the Perl output and the output from the HTML text box was due to the text box itself.

When I copied and pasted the DNA sequence into the text box, the cursor was left on an empty line below the last line of sequence (see attached picture), and this returned the incorrect sequences.

But when I pressed backspace to remove that empty last line, frames 4, 5 and 6 read the same as the Perl output... success!

Which raises the question of whether Perl can take this into account for text box entry, i.e. remove the empty last line:

accctgctg
atagatcta
|

if it is present?

And strangely, frames 4, 5 and 6 now come out in the right order, whereas 1, 2 and 3 are still in the wrong order.

the plot thickens!

Stephen.
textboxpic1.jpg
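
One plausible explanation for why the empty last line affected frames 4, 5 and 6 in particular, sketched under assumptions: reversing the input moves a trailing line break to the front of the reverse complement, which shifts every reverse-strand frame. `revcom_sketch` below is a hypothetical stand-in for `revcom` from `ReadingFrameModules` (not shown in the thread), and the CRLF line ending is an assumption about how the browser submits text box line breaks.

```perl
use strict;
use warnings;

# Hypothetical stand-in for revcom() from ReadingFrameModules.
sub revcom_sketch {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

my $clean  = 'atgaaatag';
my $pasted = "atgaaatag\r\n";   # assumed: text box line breaks arrive as CRLF

my $rc_clean  = revcom_sketch($clean);
my $rc_pasted = revcom_sketch($pasted);

# Count the whitespace characters that ended up at the front.
my ($leading) = $rc_pasted =~ /^(\s*)/;

print "clean reverse complement: $rc_clean\n";    # ctatttcat
printf "pasted version starts with %d whitespace character(s)\n", length $leading;
```

Line breaks inside the pasted text and the `>TESTSEQUENCE` header would similarly disturb frames 1, 2 and 3, which may be why those remained wrong after only the last line was removed.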
 

Author Comment

by:StephenMcGowan
ID: 24360709
I'm guessing the text box issue can be resolved in perl:

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttg...
|

So, as pseudocode off the top of my head, the header line would need to be removed:
if input data begins with ">" remove line

remove all blank lines and spaces from the input data (this should resolve the text cursor problem)

This should return data which starts at the first base of the sequence and finishes at the last base, with nothing else.

Just a little pseudocode off the top of my head; I have no idea what actual code I'd use.
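
The pseudocode above could be turned into Perl roughly as follows. This is a minimal sketch, not the code used on the site: `clean_fasta_string` is a hypothetical name, and the sample input is made up to resemble what the text box might submit.

```perl
use strict;
use warnings;

# Sketch: clean a FASTA-formatted string pasted into a text box.
sub clean_fasta_string {
    my ($text) = @_;
    return '' unless defined $text;

    # Split on any common line ending (CRLF, CR or LF).
    my @lines = split /\r\n|\r|\n/, $text;

    # "if input data begins with '>' remove line"
    my @kept = grep { !/^\s*>/ } @lines;

    # "remove all blanks and spaces from the input data"
    my $sequence = join '', @kept;
    $sequence =~ s/\s+//g;

    return $sequence;
}

# Hypothetical pasted input: header, a stray space, and an empty last line.
my $pasted = ">TESTSEQUENCE\r\natgtct gctgaa\r\ngtcatc\r\n\r\n";
my $dna    = clean_fasta_string($pasted);
print "$dna\n";    # atgtctgctgaagtcatc
```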
 

Author Comment

by:StephenMcGowan
ID: 24360895
Here's where I think I'm going wrong:

OK, so to begin with I read in the data from the text box:
$dna1 = $query->param('dna-textbox');
Originally, I had a subroutine which loaded a file (not a text box) and read it into the array @file_data:

@file_data = get_file_data('testdnasequence');

subroutine:

sub get_file_data {

    my($filename) = @_;

    # Initialize variables
    my @filedata = (  );

    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }

    @filedata = <GET_FILE_DATA>;

    close GET_FILE_DATA;

    return @filedata;
}

So this reads from a file, not from text box input.

A further subroutine then takes the array and builds a string containing pure DNA sequence (no spaces and no header), but it accepts an array!!

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

    my(@fasta_file_data) = @_;

    use strict;
    use warnings;

    # Declare and initialise variables
    my $sequence = '';

    foreach my $line (@fasta_file_data) {

        # discard blank line
        if ($line =~ /^\s*$/) {
            next;

        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;

        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;

        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;

    return $sequence;
}


The problem I think I have is that:

$dna1 = $query->param('dna-textbox');

returns the data as a string. I'm then trying to pass this into a subroutine which strips header lines, whitespace etc., but it expects an array:

# A subroutine to extract FASTA sequence data from an array

Is there any way of making it accept my $dna1 string instead? I'm guessing I can skip the file-loading subroutine (which I used for testing with files) if I'm going with the text box approach.
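
One way to bridge the string and the array-based subroutine, sketched here as an illustration: `split /^/` breaks a string into lines while keeping the line endings, which gives the same shape of data that reading a file handle in list context produces. The subroutine below is a compacted copy of the one quoted above, and the `$dna1` value is a hypothetical example of what the text box might send.

```perl
use strict;
use warnings;

# Compact copy of the array-based subroutine quoted above.
sub extract_sequence_from_fasta_data {
    my (@fasta_file_data) = @_;
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        next if $line =~ /^\s*$/;    # discard blank line
        next if $line =~ /^\s*#/;    # discard comment line
        next if $line =~ /^>/;       # discard fasta header line
        $sequence .= $line;          # keep line, add to sequence string
    }
    $sequence =~ s/\s//g;            # remove remaining whitespace
    return $sequence;
}

# Hypothetical text box value; in the CGI script this would come from
# $query->param('dna-textbox').
my $dna1 = ">TESTSEQUENCE\r\natgtctgct\r\ngaagtcatc\r\n\r\n";

# Turn the single string into a list of lines, line endings included.
my @lines = split /^/, $dna1;

my $dna = extract_sequence_from_fasta_data(@lines);
print "$dna\n";    # atgtctgctgaagtcatc
```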

Accepted Solution

by:
ozo earned 500 total points
ID: 24361985
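sub extract_sequence_from_fasta_data {
    # Accept either a single string or a list of lines; join into one string
    local $_ = join '', @_;
    s/^>.*//gm;        # remove FASTA header lines
    s/^\s*#.*//gm;     # remove comment lines
    s/\s+//g;          # remove all whitespace, including line breaks
    return $_;
}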
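
A short usage sketch for the subroutine above, showing that it gives the same result for a single string (as from the text box) and for an array of lines (as from a file). The sample data is hypothetical, and the subroutine is copied in so the example runs on its own.

```perl
use strict;
use warnings;

# The accepted subroutine, copied here so the example is self-contained.
sub extract_sequence_from_fasta_data {
    local $_ = join '', @_;
    s/^>.*//gm;
    s/^\s*#.*//gm;
    s/\s+//g;
    return $_;
}

# Hypothetical text box value: header, CRLF line breaks, empty last line.
my $from_textbox = ">TESTSEQUENCE\r\natgtctgct\r\ngaagtcatc\r\n\r\n";

# The same data as an array of lines, as the file-based code produced it.
my @from_file = (">TESTSEQUENCE\n", "# a comment line\n", "atgtctgct\n", "gaagtcatc\n");

my $seq_from_string = extract_sequence_from_fasta_data($from_textbox);
my $seq_from_array  = extract_sequence_from_fasta_data(@from_file);

print "string input: $seq_from_string\n";    # atgtctgctgaagtcatc
print "array input : $seq_from_array\n";     # atgtctgctgaagtcatc
```

In the first script this would presumably mean passing `$query->param('dna-textbox')` through the subroutine before computing any reading frames, though the thread does not show the final version.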