
HTML output different to its Perl output

Posted on 2009-05-11
Last Modified: 2012-06-27
I've set up a basic web interface with a text box at:
http://biolinux.smith.man.ac.uk/~campus12sm/assessment.pl

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttgaagaagcacttgatacagatgagaaggagatg
ctgcgggatgttgctatagatgtggttccacctaatgtcagggaccttgcgctcgtcgag
ctggatattttacgggaaagaggtaagctgtctgtcggggacttggctgaactgctctac
agagtgaggcgatttgacctgctcaaacgtatcttgaagatggacagaaaagctgtggag
acccacctgctcaggaaccctcaccttgtttcggactatagagtgctgatggcagagatt
ggtgaggatttggataaatctgatgtgtcctcattaattttcctcatgaaggattacatg
ggccgaggcaagataagcaaggagaagagtttcttggaccttgtggttgagttggagaaa
ctaaatctggttgccccagatcaactggatttattagaaaaatgcctaaagaacatccac
agaatagacctgaagacaaaaatccagaagtacaagcagtctgttcaaggagcagggaca
agttacaggaatgttctccaagcagcaatccaaaagagtctcaaggatccttcaaataac
ttcaggctccataatgggagaagtaaagaacaaagacttaaggaacagcttggcgctcaa
caagaaccagtgaagaaatccattcaggaatcagaagcttttttgcctcagagcatacct
gaagagagatacaagatgaagagcaagcccctaggaatctgcctgataatcgattgcatt
ggcaatgagacagagcttcttcgagacaccttcacttccctgggctatgaagtccagaaa
ttcttgcatctcagtatgcatggtatatcccagattcttggccaatttgcctgtatgccc
gagcaccgagactacgacagctttgtgtgtgtcctggtgagccgaggaggctcccagagt
gtgtatggtgtggatcagactcactcagggctccccctgcatcacatcaggaggatgttc
atgggagattcatgcccttatctagcagggaagccaaagatgttttttattcagaactat


You should be able to copy and paste the sequence above into the text box and hit Submit to get 6 DIFFERENT sequences. If I run this through Perl from the command line, reading the same DNA sequence from a file, it works fine:

Reading Frame 1: SAEVIHQVEEALDTDEKEMLRDVAIDVVPPNVRDLALVELDILRERGKLSVGDLAELLYRVRRFDLLKRILKMDRKAVETHLLRNPHLVSDYRVLMAEIGEDLDKSDVSSLIFLMKDYMGRGKISKEKSFLDLVVELEKLNLVAPDQLDLLEKCLKNIHRIDLKTKIQKYKQSVQGAGTSYRNVLQAAIQKSLKDPSNNFRLHNGRSKEQRLKEQLGAQQEPVKKSIQESEAFLPQSIPEERYKMKSKPLGICLIIDCIGNETELLRDTFTSLGYEVQKFLHLSMHGISQILGQFACMPEHRDYDSFVCVLVSRGGSQSVYGVDQTHSGLPLHHIRRMFMGDSCPYLAGKPKMFFIQNYVVSEGQLEDSSLLEVDGPAMKNVEFKAQKRGLCTVHREADFFWSLCTADMSLLEQSHSSPSLYLQCLSQKLRQERKRPLLDLHIELNGYMYDWNSRVSAKEKYYVWLQHTLRKKLILSYT_;<br>
Reading Frame 2: RQSFFETPSLPWAMKSRNSCISVCMVYPRFLANLPVCPSTETTTALCVSW_;<br>
Reading Frame 3: PLSSREAKDVFYSELCGVRGPAGGQQPLGGGWASDEECGIQGSEARAVHSSPRS_;<br>
Reading Frame 4: HTEMQEFLDFIAQGSEGVSKKLCLIANAIDYQADS_;<br>
Reading Frame 5: QSIIRQIPRGLLFILYLSSGML_;<br>
Reading Frame 6: RTHQIYPNPHQSLPSALYSPKQGEGS_;<br>

But if I run it through the HTML form, sequences 4, 5 and 6 come out exactly the same, which shouldn't be the case. In fact, none of the sequences are in the order shown above, which I find bizarre!

My file processing script is shown below:

Thanks
#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user

use strict;
use warnings;
use ReadingFrameModules;
use CGI; # a predefined module

my $query = new CGI;

# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;

my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein

# READING FRAME 1
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}

# READING FRAME 2
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}

# READING FRAME 3
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}

# READING FRAME 4
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}

# READING FRAME 5
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}

# READING FRAME 6
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}

#HTML OUTPUT
print "Content-type: text/html

<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
";

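The six translation loops above are identical apart from their variable names, so they could be collapsed into one subroutine applied to an array. A minimal sketch follows. `codon2aa` lives in the author's ReadingFrameModules, which isn't shown, so the sketch substitutes a hypothetical `translate` helper built on the standard genetic code; using `_` for stop codons (to match the output above) and `X` for any codon containing a non-base character are assumptions.

```perl
use strict;
use warnings;

# Standard genetic code, codons ordered T, C, A, G at each position.
# '_' marks a stop codon (assumed, to match the output in the question).
my @bases       = qw(T C A G);
my $amino_acids = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $idx = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($amino_acids, $idx++, 1);
        }
    }
}

# Hypothetical stand-in for codon2aa: translate a whole DNA string codon by codon.
# Any codon containing something other than A/C/G/T (e.g. a stray \r) becomes 'X'.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i < length($dna) - 2; $i += 3) {
        my $codon = uc substr($dna, $i, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# One array replaces $longorf1..$longorf6 and $orfprotein1..$orfprotein6
my @longorfs    = @ARGV;
my @orfproteins = map { translate($_) } @longorfs;

for my $frame (1 .. @orfproteins) {
    print "Reading Frame $frame: $orfproteins[$frame - 1]<br>\n";
}
```

Keeping the frames in an array also makes it harder to mix up which protein belongs to which frame when the HTML is printed.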

Question by:StephenMcGowan
 

Expert Comment

by:ozo
ID: 24360315
How do you pass @ARGV to the program when it is run through HTML?
 

Author Comment

by:StephenMcGowan
ID: 24360353
Hey ozo,

Sorry, how do you mean?

My assumption was that:
my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

takes the $longorfs generated by my first script and imports them into this script, so that I can work with them and then create the new HTML page shown at the bottom of the script.

So basically:
script 1 reads in the DNA sequence from my form and generates 6 $longorfs
script 2 generates 6 $orfproteins from the $longorfs and reports them as a new HTML page

hope this helps
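When one script starts another with the list form of `system` or a pipe `open`, each extra list element becomes one command-line argument, and the child reads them back from `@ARGV` in order. The sketch below shows that mechanism in a single file, using a list-form pipe `open` so the parent can capture what the child received. The ORF strings are made-up placeholders, the inline child program merely stands in for script 2, and list-form pipe opens assume a Unix-like system.

```perl
use strict;
use warnings;

# Made-up placeholders for the six $longorf values produced by script 1
my @longorfs = qw(ATGAAATGA ATGCCCTAA ATGGGATAG ATGTTTTAG ATGCATTGA ATGGGGTGA);

# Child program (standing in for script 2): report how many @ARGV entries
# arrived, then echo each one on its own line
my $child = 'print scalar(@ARGV), "\n"; print "$_\n" for @ARGV;';

# List-form pipe open runs $^X (the current perl) without a shell,
# so every element of @longorfs becomes exactly one @ARGV entry
open(my $from_child, '-|', $^X, '-e', $child, @longorfs)
    or die "Cannot start child: $!";
chomp(my @received = <$from_child>);
close $from_child or die "Child failed: $?";

my $count = shift @received;
print "child received $count arguments: @received\n";
```

With `system` instead of a pipe, the child inherits the parent's STDOUT, which is why script 2's `print` ends up in the browser.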

Expert Comment

by:ozo
ID: 24360380
Are you calling from your first script, or through HTML?
 

Author Comment

by:StephenMcGowan
ID: 24360418
Ahh right, calling from my first script:

-----FIRST SCRIPT-----

#!/usr/bin/perl -w
# Perl programme to read in FastA format to find all possible open
# reading frames (ORFs) beginning with ATG and ending with a stop codon
# (TGA, TAA or TAG)

# Analyse all six reading frames and predict ORFs in each of them. Only
# the longest ORF in each frame will be used.

use strict;
use warnings;
use ReadingFrameModules;
use CGI;
my $query = new CGI;

# Initialise variables
my @file_data = ();
my $dna = '';
my $dna2 = '';
my $dna3 = '';
my $dna5 = '';
my $dna6 = '';
my $revcom = '';
my $revcom1 = '';
my $revcom2 = '';
my $longorf1 = '';
my $longorf2 = '';
my $longorf3 = '';
my $longorf4 = '';
my $longorf5 = '';
my $longorf6 = '';

$dna = $query->param('dna-textbox');

# Extract the sequence data from the contents of the Fasta file

# $dna = extract_sequence_from_fasta_data(@file_data);

# feed the dna data into open_reading_frame to return the longest ORF

# print "\n -------Reading Frame 1-------\n\n";
$longorf1 = open_reading_frame($dna);
# print $longorf1;

# print "\n -------Reading Frame 2-------\n\n";
# remove first base from sequence
$dna2 = substr $dna, 1;
$longorf2 = open_reading_frame($dna2);
# print $longorf2;

# print "\n -------Reading Frame 3-------\n\n";
# remove first base from $dna2
$dna3 = substr $dna2, 1;
$longorf3 = open_reading_frame($dna3);
# print $longorf3;

# Reverse-complement the DNA sequence
$revcom = revcom($dna);

# print "\n -------Reading Frame 4-------\n\n";
# print $revcom;
$longorf4 = open_reading_frame($revcom);
# print $longorf4;

# print "\n -------Reading Frame 5-------\n\n";
#remove first base from sequence
$dna5 = substr $revcom, 1;
$longorf5 = open_reading_frame($dna5);
# print $longorf5;

# print "\n -------Reading Frame 6-------\n\n";
#remove a further base from the sequence
$dna6 = substr $dna5, 1;
$longorf6 = open_reading_frame($dna6);
# print $longorf6;

#Transfer Open Reading Frames over to ProteinDigest
system './proteindigest.pl', $longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6;



----SCRIPT 2----
 

#!/usr/bin/perl -w
# Perl programme to take the Reading Frame Sequences from
# "ORFfinder.pl", convert the DNA sequences into protein sequences and
# cleave the protein sequence depending on the option selected by
# user

use strict;
use warnings;
use ReadingFrameModules;
use CGI; # a predefined module

my $query = new CGI;

# my $enzyme = [FILL THIS OUT....]

# Initialise variables
my $orfprotein1 = '';
my $orfprotein2 = '';
my $orfprotein3 = '';
my $orfprotein4 = '';
my $orfprotein5 = '';
my $orfprotein6 = '';
my $codon;

my($longorf1,$longorf2,$longorf3,$longorf4,$longorf5,$longorf6)=@ARGV;

# Convert DNA sequence to Protein sequence - Translate each three base
# codon into an amino acid, and append to the protein

# READING FRAME 1
# print "\n -------Reading Frame 1-------\n\n";
for(my $i=0; $i < (length($longorf1) -2) ; $i += 3) {
    $codon = substr($longorf1,$i,3);
    $orfprotein1 .= codon2aa($codon);
}
# print $orfprotein1;

# READING FRAME 2
# print "\n -------Reading Frame 2-------\n\n";
for(my $i=0; $i < (length($longorf2) -2) ; $i += 3) {
    $codon = substr($longorf2,$i,3);
    $orfprotein2 .= codon2aa($codon);
}
# print $orfprotein2;

# READING FRAME 3
# print "\n -------Reading Frame 3-------\n\n";
for(my $i=0; $i < (length($longorf3) -2) ; $i += 3) {
    $codon = substr($longorf3,$i,3);
    $orfprotein3 .= codon2aa($codon);
}
# print $orfprotein3;

# READING FRAME 4
# print "\n -------Reading Frame 4-------\n\n";
for(my $i=0; $i < (length($longorf4) -2) ; $i += 3) {
    $codon = substr($longorf4,$i,3);
    $orfprotein4 .= codon2aa($codon);
}
# print $orfprotein4;

# READING FRAME 5
# print "\n -------Reading Frame 5-------\n\n";
for(my $i=0; $i < (length($longorf5) -2) ; $i += 3) {
    $codon = substr($longorf5,$i,3);
    $orfprotein5 .= codon2aa($codon);
}
# print $orfprotein5;

# READING FRAME 6
# print "\n -------Reading Frame 6-------\n\n";
for(my $i=0; $i < (length($longorf6) -2) ; $i += 3) {
    $codon = substr($longorf6,$i,3);
    $orfprotein6 .= codon2aa($codon);
}
# print $orfprotein6;

#HTML OUTPUT
print "Content-type: text/html

<html>
<title>Page 2</title>
<body>
Reading Frame 1: $orfprotein1;<br>
Reading Frame 2: $orfprotein2;<br>
Reading Frame 3: $orfprotein3;<br>
Reading Frame 4: $orfprotein4;<br>
Reading Frame 5: $orfprotein5;<br>
Reading Frame 6: $orfprotein6;<br>
</body>
</html>
";


Author Comment

by:StephenMcGowan
ID: 24360662
Just noticed something strange with this...

I assumed the difference between the Perl output and the output from the HTML text box would be due to the text box itself.

When copying and pasting the DNA sequence into the text box, I was leaving the cursor on an empty line below the last line of sequence (see attached picture), which returned the incorrect sequences.

But when I pressed Backspace to go back to the end of the last sequence line, removing that empty line, frames 4, 5 and 6 read the same as the Perl output... success!

Which raises the question: is there any way for Perl to take this into account for text box entry, i.e. removing the trailing empty line if there is one?

accctgctg
atagatcta
|

And strangely, 4, 5 and 6 now come out in the right order, whereas 1, 2 and 3 are still in the wrong order.

The plot thickens!

Stephen.
textboxpic1.jpg
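A quick way to check what the text box actually submitted is to make the invisible characters visible. Browsers submit textarea line breaks as CRLF (`\r\n`), so the value returned by `$query->param('dna-textbox')` also carries the `>TESTSEQUENCE` header, a `\r\n` after every line, and one more `\r\n` when the cursor is left on an empty last line; any of these can shift the codon framing. A minimal sketch; the helper names and the sample input are hypothetical.

```perl
use strict;
use warnings;

# Make carriage returns and newlines visible so stray input is easy to spot
sub show_hidden {
    my ($text) = @_;
    (my $shown = $text) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    return $shown;
}

# Count every character that is not a DNA base (a, c, g, t in either case)
sub count_non_bases {
    my ($text) = @_;
    return scalar(() = $text =~ /[^acgtACGT]/g);
}

# Made-up text box value, pasted with the cursor left on an empty last line
my $raw = ">TEST\r\natgtct\r\ngctgaa\r\n";

print show_hidden($raw), "\n";    # prints >TEST\r\natgtct\r\ngctgaa\r\n
print count_non_bases($raw), " non-base characters\n";
```

Note that the two `T`s in the header are counted as bases here, which is a reminder that the header has to be removed, not just the whitespace.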
 

Author Comment

by:StephenMcGowan
ID: 24360709
I'm guessing the text box issue can be resolved in Perl:

>TESTSEQUENCE
atgtctgctgaagtcatccatcaggttg...
|

So, as pseudocode off the top of my head:
if the input data begins with ">", remove that first (header) line

remove all blank lines and spaces from the input data (this should resolve the text cursor problem)

This should return data that starts at the first base of the sequence and finishes at the last base, with nothing else.

That's only pseudocode; I have no idea what actual code I'd use.
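The pseudocode above translates fairly directly into Perl. A minimal sketch, assuming the input is the raw string from `$query->param('dna-textbox')`; `clean_textbox_dna` is a hypothetical name. Splitting on `\r?\n` handles both CRLF (what browsers submit) and plain LF line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw text box input into a bare DNA string
sub clean_textbox_dna {
    my ($raw) = @_;
    my @kept;
    # Work line by line; \r?\n copes with CRLF as well as LF line breaks
    for my $line (split /\r?\n/, $raw) {
        next if $line =~ /^>/;    # drop the FASTA header line
        push @kept, $line;
    }
    my $seq = join '', @kept;
    $seq =~ s/\s+//g;             # drop blanks, spaces, tabs and any stray \r
    return $seq;
}

my $pasted = ">TESTSEQUENCE\r\natgtctgctgaa\r\ngtcatccatcag\r\n\r\n";
print clean_textbox_dna($pasted), "\n";    # prints atgtctgctgaagtcatccatcag
```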
 

Author Comment

by:StephenMcGowan
ID: 24360895
Here's where I think I'm going wrong.

OK, so to begin with I read in the data from the text box:
$dna1 = $query->param('dna-textbox');
Originally, I had a subroutine that would load a file (not the text box) and read it into the array @file_data:

@file_data = get_file_data('testdnasequence');

subroutine:

sub get_file_data {

    my($filename) = @_;

    # Open the file read-only through a lexical filehandle (three-argument open)
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";

    # Read every line of the file into an array
    my @filedata = <$fh>;

    close $fh;

    return @filedata;
}

So this reads from a file, not from text box input.

A further subroutine would then take the array and build a string containing pure DNA sequence (no spaces and no header), but it accepts an array!

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

    my(@fasta_file_data) = @_;

    use strict;
    use warnings;

    # Declare and initialise variables
    my $sequence = '';

    foreach my $line (@fasta_file_data) {

        # discard blank line
        if ($line =~ /^\s*$/) {
            next;

        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;

        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;

        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;

    return $sequence;
}

The problem, I think, is that:

$dna1 = $query->param('dna-textbox');

returns the data as a single string, and I'm then passing it to a subroutine that strips header lines, whitespace, etc., but which expects an array:

# A subroutine to extract FASTA sequence data from an array

Is there any way to make it accept my $dna1 string instead? I'm guessing I can skip the file-loading subroutine (which I used for testing with files) if I'm going with the text box approach.
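Passing `$dna1` straight in doesn't fail loudly: the whole string arrives as one element, that single "line" begins with `>`, and the header test discards all of it, so the result is an empty string. Splitting the string on newlines first makes each element look like a line read from a file. A minimal sketch using a condensed copy of the subroutine above and a made-up text box value:

```perl
use strict;
use warnings;

# Condensed copy of the author's subroutine (same logic)
sub extract_sequence_from_fasta_data {
    my (@fasta_file_data) = @_;
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        next if $line =~ /^\s*$/;    # blank line
        next if $line =~ /^\s*#/;    # comment line
        next if $line =~ /^>/;       # FASTA header line
        $sequence .= $line;
    }
    $sequence =~ s/\s//g;            # also removes the \r left by CRLF line breaks
    return $sequence;
}

# Made-up text box value, with the CRLF line breaks a browser submits
my $dna1 = ">TESTSEQUENCE\r\natgtctgctgaa\r\ngtcatccatcag\r\n";

# Passed whole, the single "line" starts with '>' and is discarded as a header
my $whole = extract_sequence_from_fasta_data($dna1);

# Split into lines first, so each element looks like a line read from a file
my $split = extract_sequence_from_fasta_data(split /\n/, $dna1);

print "whole: '$whole'\n";    # prints whole: ''
print "split: '$split'\n";    # prints split: 'atgtctgctgaagtcatccatcag'
```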

Accepted Solution

by:ozo
ID: 24361985
sub extract_sequence_from_fasta_data {
  local $_ = join'',@_;
      s/^>.*//gm;
      s/^\s*#.*//gm;
      s/\s+//g;
     return $_;
 }
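ozo's version sidesteps the string-versus-array question: `join` turns either a single string or a list of lines into one string, the `/m` flag lets `^` match at the start of every line so header and comment lines are removed in one pass, and the final substitution strips all whitespace, including the `\r` and `\n` a browser submits. A short check with made-up input, using the same subroutine with comments added:

```perl
use strict;
use warnings;

# ozo's subroutine, with comments added
sub extract_sequence_from_fasta_data {
    local $_ = join '', @_;    # accepts one string or a list of lines
    s/^>.*//gm;                # /m: ^ matches at every line start; drop header text
    s/^\s*#.*//gm;             # drop comment lines
    s/\s+//g;                  # drop all whitespace, including \r and \n
    return $_;
}

my $textbox    = ">TESTSEQUENCE\r\natgtctgctgaa\r\ngtcatccatcag\r\n\r\n";
my @file_lines = (">TESTSEQUENCE\n", "atgtctgctgaa\n", "gtcatccatcag\n");

my $from_textbox = extract_sequence_from_fasta_data($textbox);
my $from_file    = extract_sequence_from_fasta_data(@file_lines);
print "$from_textbox\n$from_file\n";
```

In the first script, the commented-out extraction line could then become `$dna = extract_sequence_from_fasta_data(scalar $query->param('dna-textbox'));`, where `scalar` just makes sure `param` returns a single value.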