Link to home
Start Free TrialLog in
Avatar of StephenMcGowan
StephenMcGowan

asked on

Data Upload

On an online form i want to give my user the option to either copy-paste text into a text box... or have the option to upload a file from a directory on their computer.

So far i've mainly been dealing with text box:

# Feed in data from text box on web-form

$dna1 = $query->param('dna-textbox');

i'm just wondering.. at the front end of my script how i'd go about making it more versatile to also accept file upload.

$dna1 = $query->param('dna-textbox'); ELSE $query->param('fileupload')
Avatar of DocSeltsam
DocSeltsam
Flag of Germany image

Hi there,

if (defined ($query->param('fileupload')))
{
  handle the upload;
}
else
{
  go on with  $query->param('dna-textbox');
}


--TheDoctor
When you say front-end, I assume you mean in the HTML web form, not the Perl CGI script.

Try the HTML below.

However, also set some things in your Perl program, are you using the -T flag for taint checking, probably should.
Also set a max file upload size:

$CGI::POST_MAX = 2000000;  # Max 2 mb file

Then in your Perl script, you can access like your code above:  $query->param("fileupload");

Use $query->param("fileupload")  to get the filename
Use $query->upload("fileupload") to get the file handle so you can read the contents


<form action="/cgi-bin/action.pl" method="POST" enctype="multipart/form-data">
 
  <input type="text" name="dna-textbox" id="dna-textbox"/>
 
  File to upload:  <input type="file" id="fileupload" name="fileupload"/>
 
</form>

Open in new window

Avatar of StephenMcGowan
StephenMcGowan

ASKER

On a HTML level... i've tried:

  <label>Enter a DNA Sequence:<br>
    <textarea name='dna-textbox' id='dna-textbox' cols='45' rows='5'></textarea><form id='form1' name='form1' method='post' action='ORFfinder.pl'>
  </label>
  <p>File to upload:</p>
<input type="file" id="fileupload" name="fileupload"/>

Which doesnt seem to have worked: Internal Server Error.

I was also talking about my perl script accepting two form options:

$dna1 = EITHER ACCEPT $query->param('dna-textbox'); OR ELSE ACCEPT $query->param('fileupload')

Thanks
HTML by itself will not generate an internal server error, you must be embedding the HTML inside a Perl script? If so, please post the Perl script. I copied this from a working Perl script I have, so I know the approach works. It is just a syntax error on your side.

DrSeltsam already posted the if/then solution portion.
Hi DocSeltsam,

so you're saying:

if (defined ($query->param('fileupload')))
{
$dna_filename = <STDIN>;
unless ( open(DNAFILE, $dna_filename) ) {
print "Cannot open file \"$dna_filename\"\n\n";
exit;
}
@dna = <DNAFILE>;
close DNAFILE;
$dna = join( ' ', @DNA);
$dna = ~ s/\s//g;
}
else
{
dna1 = $query->param('dna-textbox');
$dna = extract_string_sequence_from_fasta_data($dna1);
}

would this work?

You don't have to open an input file, its already provided, as I said use $query->upload('fileupload') to obtain the open file stream, don't use <STDIN>

Also, since this is Perl, you must pay attention to case sensitivity. Your variables are all mixed. In 1 place you say @dna, then you say @DNA

Try this way.
if (defined ($query->param('fileupload')))
{
   $dna_filename = $query->param('fileupload');
   $dna_file = $query->upload('fileupload');
   @dna = <$dna_file>;
   close $dna_file;
   $dna = join( ' ', @dna);
   $dna = ~ s/\s//g;
}

Open in new window

Hi there,

there is nothing I could add to mrjoltcola i guess.
You might want to read the file different, but that depends on your use case.

--TheDoctor
Here is a full working sample, however it does not save the file, it just reads it in and prints it to the browser in response

#!/usr/bin/perl
 
use CGI qw/:standard/;
 
print header,
        start_html('A Simple Example'),
        h1('A Simple Example'),
        start_form,
        "What's your name? ",textfield('name'),p,
        "What's the combination?", p,
        checkbox_group(-name=>'words',
                       -values=>['eenie','meenie','minie','moe'],
                       -defaults=>['eenie','minie']), p,
        "What's your favorite color? ",
        popup_menu(-name=>'color',
                   -values=>['red','green','blue','chartreuse']),p,
        filefield(-name => 'fileupload'),
        submit,
        end_form,
        hr;
 
if (param()) {
       my $name      = param('name');
       my $keywords  = join ', ',param('words');
       my $color     = param('color');
       my $filename  = param('fileupload');
 
       if($filename) {
          my $uploadfh = upload("fileupload");
          while(<$uploadfh>) {
             $str .= $_;
          }
          close $uploadfh;
          print
             "Your file ($filename) was: " . $str,
             hr;
       }
 
       print "Your name is",em(escapeHTML($name)),p,
             "The keywords are: ",em(escapeHTML($keywords)),p,
             "Your favorite color is ",em(escapeHTML($color)),
             hr;
 
}
 
print end_html;

Open in new window

for:

if (defined ($query->param('fileupload')))
{
   $dna_filename = $query->param('fileupload');
   $dna_file = $query->upload('fileupload');
   @dna = <$dna_file>;
   close $dna_file;
   $dna = join( ' ', @dna);
   $dna = ~ s/\s//g;
}

how should i initialise the variables $dna_file and @dna?
You don't need to initialize them, they are intialized in the code above. Now, if using "strict", you should "declare" them in  your program, either globally, or local to the procedure, depending on where you need the variables.

my $dna_file;
my @dna;
I've tried running the above script, but receive the error message:

Can't use an undefined value as a symbol reference

for the line: close $dna_file;
Hi Stephen, I took your own script as a starting point. It is normally recommended for you, as the asker, to post either the full script, or an adequate portion to assist us in helping you.
The way i've been shown to go about this using a file not uploaded using CGI is as follows:

########

#Read in the contents of the file "sample.dna"'
@file_data = get_file_data("sample.dna");

# Subroutine code:
# A subroutine to get data from a file given its filename

sub get_file_data {

    my($filename) = @_;

    use strict;
    use warnings;

    # Initialize variables
    my @filedata = (  );

    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }

    @filedata = <GET_FILE_DATA>;

    close GET_FILE_DATA;

    return @filedata;
}



#Extract the sequence data from the contents of the file "sample.dna"
$dna = extract_sequence_from_fasta_data(@file_data);

#Subroutine code

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

    my(@fasta_file_data) = @_;

    use strict;
    use warnings;

    # Declare and initialise variables
    my $sequence = '';

    foreach my $line (@fasta_file_data) {

        # discard blank line
        if ($line =~ /^\s*$/) {
            next;

        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;

        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;

        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;

    return $sequence;


########

print $dna;

The difference of what i'm trying to do is call pull the file from an online web-form and then run this through the subroutines shown above... (maybe not get_file_data... if it's not needed?)
So i guess what i'm saying is, when a user uploads a file on the webform, it will be transferred over to a Perl script, converted into an array, and fed into:

# A subroutine to extract FASTA sequence data from an array

sub extract_sequence_from_fasta_data {

    my(@fasta_file_data) = @_;

    use strict;
    use warnings;

    # Declare and initialise variables
    my $sequence = '';

    foreach my $line (@fasta_file_data) {

        # discard blank line
        if ($line =~ /^\s*$/) {
            next;

        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;

        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;

        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }

    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;

    return $sequence;

returning a $sequence.
Below is a picture of my front end HTML.

What i'm trying to do in a perl script is basically say:

#If a user copies/pastes into a text box, take this approach and make the file into an array called @dna

else

#If not, then use the file upload, upload the file and make the file into an array called @dna
Input.jpg
Here is a rewrite. Try it out. I tested it but have no samples or DNA processing code, it just prints the contents of either the textarea, or the file upload.

If both are submitted, it ignores the file.

#!/usr/bin/perl
 
use strict;
use warnings;
use CGI qw/:standard/;
 
my $query = new CGI;
 
my $dna;
 
# If text box provided, take from that
if($query->param('dna-textbox')) {
   $dna = $query->param('dna-textbox');
}
# Else see if file upload
elsif($query->param('fileupload')) {
   # Retrieve the file from the web post instead of the filesystem
   my @file_data = get_file_data();
 
   #Extract the sequence data from the contents of the file "sample.dna"
   $dna = extract_sequence_from_fasta_data(@file_data);
}
 
print_output($dna);
 
 
# Subroutines go below:
 
# A subroutine to get data from a file given its filename
sub get_file_data {
    # Initialize variables
    my @filedata = ();
    if(defined $query->upload('fileupload')) {
       my $GET_FILE_DATA = $query->upload('fileupload');
       @filedata = <$GET_FILE_DATA>;
       close $GET_FILE_DATA;
    }
    return @filedata;
}
 
 
# A subroutine to extract FASTA sequence data from an array
 
sub extract_sequence_from_fasta_data {
 
    my(@fasta_file_data) = @_;
 
    # Declare and initialise variables
    my $sequence = '';
 
    foreach my $line (@fasta_file_data) {
 
        # discard blank line
        if ($line =~ /^\s*$/) {
            next;
 
        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;
 
        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;
 
        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }
 
    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;
 
    return $sequence;
}
 
 
sub print_output {
    my $dna = shift;
 
    print header,
        start_html('DNA Sample'),
        h1('DNA Sample'),
        start_form,
        "Please enter a Fasta DNA Sequence or upload a file:", p,
        textarea('dna-textbox', 'DNA Sequence Here', 5, 60 ), p,
        "Filename to upload", p,
        filefield('fileupload'), p,
        submit('Go'),
        end_form,
        hr;
 
    if($dna) {
        print "RESULTS:  $dna\n", p;
    }
}

Open in new window

hey mrjoltcola,

thanks for the feedback.. i've had a go at puuting the code into my script, have gone to the form and tried uploading a file, but i'm hit with the error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.

This doesn't really give any hints to my error, and i'm using

 use CGI::Carp 'fatalsToBrowser';

aswell.. is there anyway i can find out what's wrong?
Try running my script standalone and see if it works, because I tested it before sending it.

Otherwise, try running it from the command line, you can run Perl CGI's like this.

./dna.pl dna-textbox=AABB
The ./ is optional in most environments

dna.pl should work by itself

Make sure it is executable, if on UNIX do: chmod +x dna.pl

ok.. i copy pasted your code into a file i called inputmethod.pl

I've had a play with it, if you type in random code into the textbox it comes up in results, but if you try to upload a file... like the one i've attached, it errors.
New-Text-Document.txt
Been looking into this further... The code below seems to be working fine for textbox when it's all implemented into my main script, it's just the upload a file option which seems to be falling over.

At lines:    # Else see if file upload
                 elsif($query->param('fileupload'))  {

                  #  Retrieve the file from the web post instead of the filesystem
                  @file_data = get_file_data();

does there need to be a $query->upload('fileupload') in there somewhere?

it just seems the code starts off trying to generate @file_data by feeding () into the sub-routine "get_file_data". Does this retrieve from the web-post 'fileupload'? because at the moment i don't think the information is being passed from form to perl. Not sure how to test it with "print" though.

These are my thoughts anyway.
   # If a text box provided, take from that
if ($query->param('dna-textbox')) {
   $dna1 = $query->param('dna-textbox');
   $dna = extract_string_sequence_from_fasta_data($dna1);
}
   # Else see if file upload
elsif($query->param('fileupload'))  {
 
   #  Retrieve the file from the web post instead of the filesystem
   @file_data = get_file_data();
 
   #Extract the sequence from the contents of the file
   $dna = extract_sequence_from_fasta_data(@file_data);
}

Open in new window

>> does there need to be a $query->upload('fileupload') in there somewhere?

There is, I put it inside get_file_data().
I get the error message: Can't call method "upload" on an undefined value at ReadingFrameModules.pm line 10.

in the subroutine at:

if(defined $query->upload('fileupload')) {

Do i need to call CGI for the subroutine? use CGI;?
hmmm would seem not....

error message is still kicking in, with "use CGI;" in the sub-routine. :o/
It is hard to help you when you don't post your modifications.

Also, in the other thread you opened, your posted code sample is a mish-mosh of valid and invalid Perl code, mixed with HTML.
Sorry, here is all of my code that i'm using.. i'll clarify that i'm having problems with the file upload part of the online HTML form and how it is transferred into the Perl Script.

I hope this helps alot more than the snippets of code i've been giving.

Thanks.

form.txt
ORFfinder.txt
ReadingFrameModules.txt
It might be simpler if we put all of this into a single Perl script, then work like that, and you can modularize it later when it is working. What do you think?
that will be alot clearer, yeah... i can then go through it and modularise it later... i have updated my scripts since we last chatted so i will submit these on this thread in 2 seconds.
Attached DNALib.pm (ORFfinder.pl follows)
# Module DNALib.pm
#
 
use strict;
 
# A subroutine to extract FASTA sequence data from a string (for use 
# with text box data entry).
 
sub extract_string_sequence_from_fasta_data {
   local $_ = join'',@_;
   s/^>.*//gm;
   s/^\s*#.*//gm;
   s/\s+//g;
   return $_;
}
 
# A subroutine to extract FASTA sequence data from an array
 
sub extract_sequence_from_fasta_data {
 
    my(@fasta_file_data) = @_;
 
    # Declare and initialise variables
    my $sequence = '';
 
    foreach my $line (@fasta_file_data) {
 
        # discard blank line
        if ($line =~ /^\s*$/) {
            next;
 
        # discard comment line
        } elsif($line =~ /^\s*#/) {
            next;
 
        # discard fasta header line
        } elsif($line =~ /^>/) {
            next;
 
        # keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }
 
    # remove non-sequence data (in this case, whitespace) from $sequence
    # string
    $sequence =~ s/\s//g;
 
    return $sequence;
}
 
# A subroutine to find the longest open reading frame (ORF) for a sequence
 
sub open_reading_frame {
 
    my($dna) = @_;
 
    #Declare and initialise variables
    my $longest_str ='';
    my $longest_len = 0;
 
    local $_ = $dna;
    s/\s+//g;
 
    # longest of the shortest sequences ending with TAA|TAG|TGA
#    while( /ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
     while( /\G(?:...)*?ATG(?=((?:...)*?(?:TAA|TAG|TGA)))/ig ){
        if( length $1 >$longest_len ){
             $longest_str=$1;
             $longest_len=length $1;
          }
      }
 
    return $longest_str;
}
 
# revcom
#
# A subroutine to compute the reverse complement of DNA sequence
 
sub revcom {
 
    my($dna) = @_;
 
    # First reverse the sequence
    my $revcom = reverse $dna;
 
    # Next, complement the sequence, dealing with upper and lower case
    # A->T, T->A, C->G, G->C
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
 
    return $revcom;
}
 
# codon2aa
#
# A subroutine to translate a DNA 3-character codon to an amino acid
# Using hash lookup
 
sub codon2aa {
    my($codon) = @_;
 
    $codon = uc $codon;
 
    my(%genetic_code) = (
 
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine
    'TTG' => 'L',    # Leucine
    'TAC' => 'Y',    # Tyrosine
    'TAT' => 'Y',    # Tyrosine
    'TAA' => '_',    # Stop
    'TAG' => '_',    # Stop
    'TGC' => 'C',    # Cysteine
    'TGT' => 'C',    # Cysteine
    'TGA' => '_',    # Stop
    'TGG' => 'W',    # Tryptophan
    'CTA' => 'L',    # Leucine
    'CTC' => 'L',    # Leucine
    'CTG' => 'L',    # Leucine
    'CTT' => 'L',    # Leucine
    'CCA' => 'P',    # Proline
    'CCC' => 'P',    # Proline
    'CCG' => 'P',    # Proline
    'CCT' => 'P',    # Proline
    'CAC' => 'H',    # Histidine
    'CAT' => 'H',    # Histidine
    'CAA' => 'Q',    # Glutamine
    'CAG' => 'Q',    # Glutamine
    'CGA' => 'R',    # Arginine
    'CGC' => 'R',    # Arginine
    'CGG' => 'R',    # Arginine
    'CGT' => 'R',    # Arginine
    'ATA' => 'I',    # Isoleucine
    'ATC' => 'I',    # Isoleucine
    'ATT' => 'I',    # Isoleucine
    'ATG' => 'M',    # Methionine
    'ACA' => 'T',    # Threonine
    'ACC' => 'T',    # Threonine
    'ACG' => 'T',    # Threonine
    'ACT' => 'T',    # Threonine
    'AAC' => 'N',    # Asparagine
    'AAT' => 'N',    # Asparagine
    'AAA' => 'K',    # Lysine
    'AAG' => 'K',    # Lysine
    'AGC' => 'S',    # Serine
    'AGT' => 'S',    # Serine
    'AGA' => 'R',    # Arginine
    'AGG' => 'R',    # Arginine
    'GTA' => 'V',    # Valine
    'GTC' => 'V',    # Valine
    'GTG' => 'V',    # Valine
    'GTT' => 'V',    # Valine
    'GCA' => 'A',    # Alanine
    'GCC' => 'A',    # Alanine
    'GCG' => 'A',    # Alanine
    'GCT' => 'A',    # Alanine
    'GAC' => 'D',    # Aspartic Acid
    'GAT' => 'D',    # Aspartic Acid
    'GAA' => 'E',    # Glutamic Acid
    'GAG' => 'E',    # Glutamic Acid
    'GGA' => 'G',    # Glycine
    'GGC' => 'G',    # Glycine
    'GGG' => 'G',    # Glycine
    'GGT' => 'G',    # Glycine
    );
 
    if(exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }else{
 
            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
    }
}
 
1;

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of mrjoltcola
mrjoltcola
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
See above, I combined into 2 files and tested with upload file as well as textbox. You must save the 1st file under DNALib.pm in the same directory.
Thanks mrjoltcola! it works a charm! how was I with my perl programming? this is my first real encounter with perl, so i'm really just starting out!
Was there much work involved in patching up the original script? what exactly was wrong with it?

Thanks alot again for all of your time, effort and mainly patience!

Thanks!
You are welcome.

Since you asked, a few of things I found.

1) There were re-defined variables in one pl module that were already defined in the main module (example :  my $query = new CGI;)
2) There was a typo, in 1 place there was @file_data and other place @filedata (misname)

3) Also, there is no need to "use strict, .etc" inside each function just do it once at the top of the script.

But the main problem, with the form upload, was the lack of the enctype in the form tab (multipart/form-data)

<form id='form1' name='form1' method='post' action='ORFfinder.pl' enctype="multipart/form-data">


Other than that, not bad for a beginner. I hope I wasn't too hard on you :)
I meant "form tag"