• Status: Solved

Perl: print statements, print to screen, print to file problems

perl newbie

I am a newbie.
I know some of the basics of Perl programming.
I now have to write code that will:
open a file
read from the file
remove the header
print the headerless portion to a new file.
This is my fourth week with Perl, and I can handle STDIN.
My question: where in the code should I place my "print to file" statement?
I guess my understanding of the "flow" of the program is messed up.
Where should I declare the variables with "my"?
At the very beginning? Outside the subroutine?



Here is the code:


print "PLEASE ENTER THE FILENAME OF THE YOUR SEQUENCE:=";
chomp($seq_filename=<STDIN>);
#
open(PROTFILE,$seq_filename) or die "unable to open the file";
@seq=<PROTFILE>;
close PROTFILE;
#
#
foreach $newline (@seq) {
#
## discard blank newline
if ($newline =~ /^\s*$/) {
next;

## discard comment newline
} elsif($newline =~ /^\s*/) {
next;

# discard fasta header newline
} elsif($newline =~ /^>/) {
next;

## keep newline, add to sequence string
} else {
$sequence1 .= $newline;
}
#
}

# remove non-sequence data (in this case, whitespace) from $sequence string
 #Remove whitespace
$newline =~ s/\s//g;
@seq=split("",$newline); #splits string into an array

print " \nThe original file is:\n$sequence1 \n";


Asked by: thestarcrossed
 
jasonsbytesCommented:
So you want to output the contents of @seq back to a new file?

I guess at the end of the script something like:

open (NEWFILE, "somefile.dat") || die "blah";
foreach(@seq)
{
   print NEWFILE $_ . "\n";
}
close (NEWFILE);

That will print each element in the array seq on a new line in somefile.dat.

Where you declare your variables with my depends on the scope of the variable.  If you declare it inside a sub, it is only available inside the sub; if you declare it outside, it is visible to the rest of the file, so it is essentially global.

Most people declare globals at the top, and then if you need a local in a sub, you declare it at the top of the sub.

Make sure to add "use strict;" at the beginning so Perl will complain if you use an undeclared variable name.
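
As a minimal sketch of the scoping rules described above (the names $header_count and count_headers are made up for illustration and are not from the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# File-scoped lexical: visible from here to the end of the file,
# including inside subs defined below.
my $header_count = 0;

sub count_headers {
    my @lines = @_;                  # private to this call of the sub
    foreach my $line (@lines) {      # $line exists only inside the loop
        $header_count++ if $line =~ /^>/;
    }
    return $header_count;
}

my $n = count_headers(">seq1\n", "ACGT\n", ">seq2\n");
print "Headers seen: $n\n";

# Referring to @lines or $line out here would be a compile-time
# error under "use strict", because they are out of scope.
```

Declaring each variable in the smallest block that needs it tends to make this kind of mistake show up at compile time rather than at run time.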
 
thestarcrossedAuthor Commented:
Hello Jason,
Thank you for the code.
The trouble is that when I used it yesterday and today,
it gave me this error message.

Filehandle NEWFILE opened only for input at nonseqdat.pl line 63.


Thank you for your prompt response!
 
TalmashCommented:
open (NEWFILE, ">somefile.dat") || die "blah";

the ">" sign opens the file for output
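The same idea can be sketched with the three-argument form of open and a lexical filehandle, which keeps the mode separate from the filename. This is only an illustration: the scratch directory and the two sample lines are assumptions, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/somefile.dat";       # hypothetical output file

# '>' creates the file, or truncates it if it already exists.
open(my $out, '>', $path) or die "Cannot open $path for writing: $!";
print {$out} "first line\n";
close($out) or die "Cannot close $path: $!";

# '>>' keeps the existing content and adds to the end.
open(my $app, '>>', $path) or die "Cannot open $path for appending: $!";
print {$app} "second line\n";
close($app) or die "Cannot close $path: $!";

# '<' is read-only; it is also what you get when no mode is given,
# which is why printing to such a handle fails.
open(my $in, '<', $path) or die "Cannot open $path for reading: $!";
my @lines = <$in>;
close($in);

print scalar(@lines), " lines in $path\n";
```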

tal
 
jasonsbytesCommented:
Yes, I forgot the '>'.

Do what Talmash says. Or, if you want to open the file for append, use '>>'.
 
thestarcrossedAuthor Commented:
I guess there is a problem with the logic flow in my program.

I used the ">>" and yet it's giving me error messages. I give up for now.
 
ozoCommented:
what error messages is it giving you?
 
jasonsbytesCommented:
You probably don't want to use ">>"; use ">" instead. I was just letting you know that ">>" is available if you want to append to the file later.

What is the error?
 
ozoCommented:
} elsif($newline =~ /^\s*/) {
This will always match, since there will always be at least zero whitespace characters at the start of the line.
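
A small sketch of the difference, using made-up sample lines and assuming (since the thread does not say) that comment lines begin with '#':

```perl
use strict;
use warnings;

my @lines = ("ACGT\n", "\n", ">header\n", "  # a comment\n");

# /^\s*/ succeeds on every line, because "zero whitespace
# characters" is a valid match at the start of any string.
my @always = grep { /^\s*/ } @lines;

# Requiring a specific character after the optional whitespace
# makes the test selective. Assumption: comments start with '#'.
my @comments = grep { /^\s*#/ } @lines;

print scalar(@always), " of ", scalar(@lines), " lines match /^\\s*/\n";
print scalar(@comments), " line(s) match /^\\s*#/\n";
```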
 
ozoCommented:
After the foreach loop ends, $newline is empty (the loop variable gets its earlier value back when the loop exits), so this will cause @seq to be empty too:
@seq=split("",$newline); #splits string into an array
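
A minimal sketch of that behaviour and one possible fix, using two made-up input lines: clean and split the accumulated string ($sequence1 in the original code) rather than the loop variable.

```perl
use strict;
use warnings;

our $newline  = "before the loop";    # package variable, as in the original
my $sequence1 = '';

foreach $newline ("ACG T\n", "TTA\n") {
    $sequence1 .= $newline;           # accumulate inside the loop
}

# The loop variable is localized to the loop, so its old value is back.
print "After loop: $newline\n";

# Strip whitespace from a copy of the accumulated string, then split it.
(my $clean = $sequence1) =~ s/\s//g;
my @seq = split //, $clean;
print "Bases: @seq\n";
```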
 
TintinCommented:
I'd approach it like:

^\s*

Means match zero or more whitespace from the beginning of the line.  This will match *all* lines.

What do your comment lines begin with?
 
TintinCommented:
Whoops, deleted part of my previous post.  The code I'd use would be along the lines of:

#!/usr/bin/perl
use strict;
print "PLEASE ENTER THE FILENAME OF THE YOUR SEQUENCE:=";
chomp(my $seq_filename=<STDIN>);

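open PROTFILE, $seq_filename or die "Unable to open $seq_filename: $!\n";
open NEWFILE, ">newfile.dat" or die "Can not open newfile.dat: $!\n";

while (<PROTFILE>) {
  next if (/^\s*$|^>/);   # skip blank lines and FASTA header lines
  next if (/^\s*#/);      # skip comment lines (assumes they start with '#'; /^\s*/ alone would match every line)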

  s/\s+//g;
  print NEWFILE "$_\n";
}
 
thestarcrossedAuthor Commented:
Error messages:
"my" variable $input masks earlier declaration in same scope at codon3.pl line 18.
Name "main::INFILE" used only once: possible typo at codon3.pl line 112.
PLEASE ENTER THE FILENAME OF THE YOUR SEQUENCE:=25na.pep
25na.pep
Use of uninitialized value in substr at codon3.pl line 33, <STDIN> line 2.
Use of uninitialized value in substr at codon3.pl line 33, <STDIN> line 2.
codon2aaTCASTCCSTCGSTCTSTTCFTTTFTTALTTGLTACYTATYTAA_TAG_TGCCTGTCTGA_TGGWCTALCTCL
CTGLCTTLCCAPCCCPCCGPCCTPCACHCATHCAAQCAGQCGARCGCRCGGRCGTRATAIATCIATTIATGMACATACCT
ACGTACTTAACNAATNAAAKAAGKAGCSAGTSAGARAGGRGTAVGTCVGTGVGTTVGCAAGCCAGCGAGCTAGACDGATD
GAAEGAGEGGAGGGCGGGGGGGTGUse of uninitialized value in concatenation (.) or string at codon3.pl line 110, <STDIN> line 2.
I translated the sequence



 into the protein

128

Use of uninitialized value in print at codon3.pl line 111, <STDIN> line 2.

C:\Perl>



My code:

#!/usr/bin/perl
use strict;
print "PLEASE ENTER THE FILENAME OF THE YOUR SEQUENCE:=";
chomp(my $input=<STDIN>);

open PROTFILE,$input  or die "Unable to open $input $!\n";
open NEWFILE, ">newfile.dat" or die "Can not open /path/to/newfile.dat $!\n";
#open (OUTFILE,">>$myfile");
while (<PROTFILE>) {
  next if (/^\s*$|^>/);
  next if (/^\s*/);

  s/\s+//g;
  print NEWFILE "$_\n";
}
use strict;
use warnings;
my $input;
my @newarray1;
my $newline;
my $i;
my $myfile;
my $protein;
my $codon;
my $myresults;
my %genetic_code;
my $sequence;
$myfile = 'codon.txt';
$input = <STDIN>;
open (PROTFILE, $input);
open (OUTFILE,">>$myfile");
while($input){
$codon = substr($newline,$i,3);
$protein .= codon2aa($codon);
 ##calling the sub routine
## codon2aa
# # A subroutine to translate a sequence 3-character codon to an amino acid
# Version 3, using hash lookup
print "codon2aa";
sub codon2aa {
my($codon) = @newarray1;
$codon = uc ($codon);
 %genetic_code = (
'TCA' => 'S', # Serine
'TCC' => 'S', # Serine
'TCG' => 'S', # Serine
'TCT' => 'S', # Serine
'TTC' => 'F', # Phenylalanine
'TTT' => 'F', # Phenylalanine
'TTA' => 'L', # Leucine
'TTG' => 'L', # Leucine
'TAC' => 'Y', # Tyrosine
'TAT' => 'Y', # Tyrosine
'TAA' => '_', # Stop
'TAG' => '_', # Stop
'TGC' => 'C', # Cysteine
'TGT' => 'C', # Cysteine
'TGA' => '_', # Stop
'TGG' => 'W', # Tryptophan
'CTA' => 'L', # Leucine
'CTC' => 'L', # Leucine
'CTG' => 'L', # Leucine
'CTT' => 'L', # Leucine
'CCA' => 'P', # Proline
'CCC' => 'P', # Proline
'CCG' => 'P', # Proline
'CCT' => 'P', # Proline
'CAC' => 'H', # Histidine
'CAT' => 'H', # Histidine
'CAA' => 'Q', # Glutamine
'CAG' => 'Q', # Glutamine
'CGA' => 'R', # Arginine
'CGC' => 'R', # Arginine
'CGG' => 'R', # Arginine
'CGT' => 'R', # Arginine
'ATA' => 'I', # Isoleucine
'ATC' => 'I', # Isoleucine
'ATT' => 'I', # Isoleucine
'ATG' => 'M', # Methionine
'ACA' => 'T', # Threonine
'ACC' => 'T', # Threonine
'ACG' => 'T', # Threonine
'ACT' => 'T', # Threonine
'AAC' => 'N', # Asparagine
'AAT' => 'N', # Asparagine
'AAA' => 'K', # Lysine
'AAG' => 'K', # Lysine
'AGC' => 'S', # Serine
'AGT' => 'S', # Serine
'AGA' => 'R', # Arginine
'AGG' => 'R', # Arginine
'GTA' => 'V', # Valine
'GTC' => 'V', # Valine
'GTG' => 'V', # Valine
'GTT' => 'V', # Valine
'GCA' => 'A', # Alanine
'GCC' => 'A', # Alanine
'GCG' => 'A', # Alanine
'GCT' => 'A', # Alanine
'GAC' => 'D', # Aspartic Acid
'GAT' => 'D', # Aspartic Acid
'GAA' => 'E', # Glutamic Acid
'GAG' => 'E', # Glutamic Acid
'GGA' => 'G', # Glycine
'GGC' => 'G', # Glycine
'GGG' => 'G', # Glycine
'GGT' => 'G', # Glycine
);} print OUTFILE codon2aa ;
print codon2aa;
print "I translated the sequence\n\n$sequence\n\n into the protein\n\n$protein\n\n";
 print OUTFILE $myresults;
 close INFILE;
close OUTFILE;
exit;   }

 
ozoCommented:
splain
/usr/bin/splain: Reading from STDIN
"my" variable $input masks earlier declaration in same scope at codon3.pl line 18
"my" variable $input masks earlier declaration in same scope at codon3.pl line
        18 (#1)
    (W misc) A "my" or "our" variable has been redeclared in the current
    scope or statement, effectively eliminating all access to the previous
    instance.  This is almost always a typographical error.  Note that the
    earlier variable will still exist until the end of the scope or until
    all closure referents to it are destroyed.
   
Name "main::INFILE" used only once: possible typo at codon3.pl line 112.
Name "main::INFILE" used only once: possible typo at codon3.pl line 112 (#2)
    (W once) Typographical errors often show up as unique variable names.
    If you had a good reason for having a unique name, then just mention it
    again somehow to suppress the message.  The our declaration is
    provided for this purpose.
   
    NOTE: This warning detects symbols that have been used only once so $c, @c,
    %c, *c, &c, sub c{}, c(), and c (the filehandle or format) are considered
    the same; if a program uses $c only once but also uses any of the others it
    will not trigger this warning.
   
Use of uninitialized value in substr at codon3.pl line 33, <STDIN> line 2.
Use of uninitialized value in substr at codon3.pl line 33, <STDIN> line 2 (#3)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
   
    To help you figure out what was undefined, perl tells you what operation
    you used the undefined value in.  Note, however, that perl optimizes your
    program and the operation displayed in the warning may not necessarily
    appear literally in your program.  For example, "that $foo" is
    usually optimized into "that " . $foo, and the warning will refer to
    the concatenation (.) operator, even though there is no . in your
    program.
   
Use of uninitialized value in concatenation (.) or string at codon3.pl line 110, <STDIN> line 2.
Use of uninitialized value in concatenation (.) or string at codon3.pl line
        110, <STDIN> line 2 (#3)
Use of uninitialized value in print at codon3.pl line 111, <STDIN> line 2.
Use of uninitialized value in print at codon3.pl line 111, <STDIN> line 2 (#3)
 
TintinCommented:
You can't just concatenate two scripts together and hope it will work.  My script was based on the original code you supplied; I didn't realise you had a whole lot of other stuff in the script as well.

 
ozoCommented:
You declare my $input twice.
You never open INFILE, so what are you closing?
You never assign a value to $myresults or to $sequence.
 
ozoCommented:
You never assign any value to $newline or $i, so what should substr($newline,$i,3) do?
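
For comparison, a rough sketch of a codon loop in which every variable has a value before it is used. The four-entry lookup table, the sample sequence, and the 'X' placeholder for unknown codons are assumptions made to keep the example short; a real table would hold all 64 codons.

```perl
use strict;
use warnings;

# Deliberately tiny lookup table, for illustration only.
my %genetic_code = (ATG => 'M', TTT => 'F', GGA => 'G', TAA => '_');

sub codon2aa {
    my ($codon) = @_;                 # take the argument passed by the caller
    $codon = uc $codon;
    # 'X' for a codon missing from the table is an assumption of this sketch.
    return exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
}

my $sequence = 'atgtttggataa';        # hypothetical input sequence
my $protein  = '';                    # initialized before the loop

# $i starts at 0 and advances one codon (3 bases) at a time.
for (my $i = 0; $i + 3 <= length($sequence); $i += 3) {
    my $codon = substr($sequence, $i, 3);
    $protein .= codon2aa($codon);
}

print "I translated the sequence\n\n$sequence\n\n into the protein\n\n$protein\n\n";
```

Defining the sub outside the loop and passing the codon in as an argument keeps the lookup separate from the loop.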
 
thestarcrossedAuthor Commented:
So that was it. I will work on these, ozo and Tintin.
Thank you so much.
I tried learning this on my own.

 
mjcoyneCommented:
#!/usr/bin/perl -w
use strict;
my $file;

open (IN, "input_dna.fasta") or die "Can't open input_dna.fasta: $!\n";

{local $/; $file = <IN>;}

my ($header, $dna) = ($file =~ /(>.+?\n)(.+)/s);

$dna =~ s/\n//g;

my $pep = translate($dna);
my $pep_fasta = make_fasta($pep);

open (OUT, ">output_pep.fasta") or die;
print OUT $header, $pep_fasta;

sub translate {
    my $seq = shift;
    my ($codon, $trans);
    my %code = (
        'NNN' => 'X', 'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S',
        'TCT' => 'S', 'TTC' => 'F', 'TTT' => 'F', 'TTA' => 'L',
        'TTG' => 'L', 'TAC' => 'Y', 'TAT' => 'Y', 'TAA' => '',
        'TAG' => '', 'TGC' => 'C', 'TGT' => 'C', 'TGA' => '',
        'TGG' => 'W', 'CTA' => 'L', 'CTC' => 'L', 'CTG' => 'L',
        'CTT' => 'L', 'CCA' => 'P', 'CCC' => 'P', 'CCG' => 'P',
        'CCT' => 'P', 'CAC' => 'H', 'CAT' => 'H', 'CAA' => 'Q',
        'CAG' => 'Q', 'CGA' => 'R', 'CGC' => 'R', 'CGG' => 'R',
        'CGT' => 'R', 'ATA' => 'I', 'ATC' => 'I', 'ATT' => 'I',
        'ATG' => 'M', 'ACA' => 'T', 'ACC' => 'T', 'ACG' => 'T',
        'ACT' => 'T', 'AAC' => 'N', 'AAT' => 'N', 'AAA' => 'K',
        'AAG' => 'K', 'AGC' => 'S', 'AGT' => 'S', 'AGA' => 'R',
        'AGG' => 'R', 'GTA' => 'V', 'GTC' => 'V', 'GTG' => 'V',
        'GTT' => 'V', 'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A',
        'GCT' => 'A', 'GAC' => 'D', 'GAT' => 'D', 'GAA' => 'E',
        'GAG' => 'E', 'GGA' => 'G', 'GGC' => 'G', 'GGG' => 'G',
        'GGT' => 'G',);
   
    for (my $i = 0; $i < (length($seq) - 2) ; $i += 3) {
        $codon = substr($seq, $i, 3);
        $codon = uc $codon;
        $codon = "NNN" if ($codon =~ /N/);
        if (exists $code {$codon}) {
            $trans .= $code{$codon};
        } else {
            warn "Bad codon $codon\n";
        }
    }
    return $trans;
}

sub make_fasta {
    my $seq = shift;
    my $length = length ($seq);
    my ($i, $fasta);
    my $len = 60;
    my $out_pat = "A$len";
    my $whole = int($length/$len) * $len;
   
    for ($i = 0; $i < $whole; $i += $len) {
        my $line = pack ($out_pat, substr($seq, $i, $len)) . "\n";
        $fasta .= $line;
    }
   
    if (my $last = substr($seq, $i)) {
        $fasta .= $last . "\n";
    }
    return $fasta;
}