Modifying Perl code

Hi there,

I have an Excel CSV file that contains two columns:

Column A: mass
Column B: intensity


I also have a section of code which I'm currently looking to modify:

# record all masses from the file
    my %masses;
    while (<$in>) {
        chomp;
        # skip header line
        next if m{mass.*intensity};
        my ($mass) = split /,/;
        unless ($mass =~ m{^\d+(?:\.\d+)?$}) {
            warn "mass ($mass) not a recognized number - skipping";
            next;
        }
        $mass = round($mass);
        $masses{$mass}++;
    }
    close $in;
    # pass masses hash to subroutine
    my $data = analyze(\%masses);
    output($wellposition, $data);
}

close $out;



At the moment, the code records all of the masses from the file.

I'm looking to change the code so that:

1) the script sorts the CSV file into "intensity" order: highest to lowest. So initially it focuses on column B.

2) the script then uses the "mass" values (column A) for the first 50 intensity values

For example, the top 5 would work like this:

mass    intensity
0.3     7
4       0.8
5       0.1
0.8     4
1.9     9
2.6     2
3       5.6
2       3.2

1) sort into Intensity Order (Highest first)

mass    intensity
1.9     9
0.3     7
3       5.6
0.8     4
2       3.2
2.6     2
4       0.8
5       0.1

2) Take the first 5 masses from the list

mass
1.9
0.3
3
0.8
2

This is just a small example; I'd be doing this for the first 50.

Thanks,

Stephen.
Asked by: StephenMcGowan

ozo commented:
# record all masses from the file
    my %masses;
    while (<$in>) {
        chomp;
        # skip header line
        next if m{mass.*intensity};
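        my ($mass, $intensity) = split /,/;
        unless ($mass =~ m{^\d+(?:\.\d+)?$}) {
            warn "mass ($mass) not a recognized number - skipping";
            next;
        }
        push @top, [$mass, $intensity];
    }
    close $in;
    # count the rounded masses of the 50 most intense rows:
    # sort the whole list by intensity (highest first), then slice
    $masses{round($_->[0])}++ for grep { defined } (sort { $b->[1] <=> $a->[1] } @top)[0..49];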
    # pass masses hash to subroutine
   
    my $data = analyze(\%masses);
    output($wellposition, $data);
}

close $out;
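
The sort-and-slice step can be tried on its own against the example data from the question. What follows is only a minimal sketch: `top_masses` is a hypothetical helper name, not part of the original script. It sorts the full list by intensity (highest first) before taking the slice, and caps the slice so that a file with fewer rows than requested does not produce undefined entries.

```perl
use strict;
use warnings;

# Hypothetical helper: return the masses of the $n most intense rows.
# Each row is an array ref of [mass, intensity].
sub top_masses {
    my ($n, @rows) = @_;
    # Sort the whole list by intensity, highest first, before slicing.
    my @sorted = sort { $b->[1] <=> $a->[1] } @rows;
    # Cap $n so that short inputs do not yield undefined entries.
    $n = @sorted if $n > @sorted;
    return map { $_->[0] } @sorted[0 .. $n - 1];
}

# Example data from the question, as [mass, intensity] pairs.
my @rows = (
    [0.3, 7], [4, 0.8], [5, 0.1], [0.8, 4],
    [1.9, 9], [2.6, 2], [3, 5.6], [2, 3.2],
);

my @top5 = top_masses(5, @rows);
print join(', ', @top5), "\n";   # 1.9, 0.3, 3, 0.8, 2
```

Calling `top_masses(50, @rows)` on this data simply returns all eight masses in intensity order.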

StephenMcGowan (author) commented:
Hi ozo,

Thanks for getting back to me.

I ran the script and was given two errors regarding @top:

"Global symbol "@top" requires explicit package name at Id_script4.pl line 70"

which is this line:

 push @top,[$mass,$intensity];

and

"Global symbol "@top" requires explicit package name at Id_script4.pl line 74"

which is this line:

"$masses{round($_->[0])}++ for grep { defined } (sort { $b->[1] <=> $a->[1] } @top)[0..49];"



Stephen.
ozo commented:
echo 'Global symbol "@top" requires explicit package name at Id_script4.pl line 70' | splain
Global symbol "@top" requires explicit package name at Id_script4.pl line 70 (#1)
    (F) You've said "use strict" or "use strict vars", which indicates
    that all variables must either be lexically scoped (using "my" or "state"),
    declared beforehand using "our", or explicitly qualified to say
    which package the global variable is in (using "::")


I did not include the declaration because the section of code I was modifying also seemed to be missing some declarations. Without the rest of the code, I did not know whether you would want to put the new declaration together with whatever other declarations you might have, or even whether you were using strict vars.
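
If `use strict` is in effect, a lexical declaration with `my` is one way to satisfy it. The sketch below assumes the read loop runs once per input file; the `%files` hash and its file names are hypothetical stand-ins for the real CSV files. Declaring `@top` inside the per-file loop means each file starts with an empty list, whereas a declaration outside the loop would let rows accumulate across files.

```perl
use strict;
use warnings;

# Hypothetical per-file data standing in for the CSV files: name => rows.
my %files = (
    'A1.csv' => [ [1.9, 9], [0.3, 7] ],
    'A2.csv' => [ [3, 5.6] ],
);

my %rows_seen;
foreach my $fil (sort keys %files) {
    # Declared inside the loop, so each file starts with an empty list.
    my @top;
    push @top, [@$_] for @{ $files{$fil} };
    $rows_seen{$fil} = scalar @top;
}

# Each count reflects only that file's rows.
print "$_: $rows_seen{$_}\n" for sort keys %rows_seen;
```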
StephenMcGowan (author) commented:
Ohh I think I see...

The complete script is shown below. Would I need to declare @top with a "my @top =" statement?

#!/usr/bin/perl
use strict;
use warnings;

my $len = 0; # hack global because it's simpler

##########################################################################
#Script to identify animal species using monoisotopic peak markers against
#MS data
##########################################################################

# forward slashes in dir name should work
my $dir = 'C:/Users/Stephen/Desktop/test/relmonopeaklists';
chdir $dir or die "could not cd to $dir: $!";

# create or overwrite SpeciesId
open my $out, '>', 'SpeciesId' or die "could not write SpeciesId: $!";

##########################################################################
#FILE HANDLING
##########################################################################

# get the list of csv files
opendir DIR, '.' or die "could not open dir: $!";
my @files = sort grep m{^\d+_\w+_[A-P]\d+\.csv$}, readdir DIR;
closedir DIR;

####################
#FOR EACH CSV FILE:
####################

foreach my $fil (@files) {
    # get wellposition from filename
    my ($wellposition) = $fil =~ m{^\d+_\w+_([A-P]\d+)\.csv$};
    open my $in, '<', $fil or die "could not open $fil: $!";
    
# record all masses from the file
#    my %masses;
#    while (<$in>) {
#        chomp;
#        # skip header line
#        next if m{mass.*intensity};
#        my ($mass) = split /,/;
#        unless ($mass =~ m{^\d+(?:\.\d+)$}) {
#            warn "mass ($mass) not a recognized number - #skipping";
#            next;
#        }
#        $mass = round($mass);
#        $masses{$mass}++;
#    }
#    close $in;
#    # pass masses hash to subroutine
#    my $data = analyze(\%masses);
#    output($wellposition, $data);
#}
#
#close $out;

# record all masses from the file
    my %masses;
    while (<$in>) {
        chomp;
        # skip header line
        next if m{mass.*intensity};
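        my ($mass, $intensity) = split /,/;
        unless ($mass =~ m{^\d+(?:\.\d+)?$}) {
            warn "mass ($mass) not a recognized number - skipping";
            next;
        }
        push @top, [$mass, $intensity];
    }
    close $in;
    # count the rounded masses of the 50 most intense rows:
    # sort the whole list by intensity (highest first), then slice
    $masses{round($_->[0])}++ for grep { defined } (sort { $b->[1] <=> $a->[1] } @top)[0..49];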
    # pass masses hash to subroutine
   
    my $data = analyze(\%masses);
    output($wellposition, $data);
}

close $out;

##########################################################################
#SUB-ROUTINES
##########################################################################

sub round {
    my ($num) = @_;
    my ($start, $dig) = $num =~ m{^(\d+(?:\.\d)?)(\d)?};
    $start += 0.1 if (defined $dig and $dig >= 5);
    # XXX - you probably want one of these two uncommented
    # remove .0 from end of number
    # $start =~ s{\.0$}{};
    #add .0 to end of number if no decimal
    $start .= '.0' unless ($start =~ m{\.\d$});
    return $start;
}

# main sub
{ # closure
# keep %species local to sub-routine but only init it once
my %species;

my $Z='Z';
sub _init {

    open my $in, '<', 'Species_Int.txt' or die "could not open Species_Int.txt: $!";
    my $spec;
    while (<$in>) {
        chomp;
        next if /^\s*$/; # skip blank lines
        if (m{^([A-Z]?)\s*=?\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?\s*$}) {
            # handle letter = lines
            push @{$species{$spec}{$1||++$Z}}, $2;
            push @{$species{$spec}{$1||$Z}}, $3 if $3;
        } else {
            # handle species name lines
            $spec = $_;
            $len = length($spec) if (length($spec) > $len);
        }
    }
    close $in;
}

sub analyze {
    my ($masses) = @_;
    _init() unless %species;
    my %data;
    # loop over species entries
SPEC:
    foreach my $spec (keys %species) {
        # loop over each letter of a species
LTR:
        foreach my $ltr (keys %{$species{$spec}}) {
            # loop over each mass for a letter
            foreach my $mass (@{$species{$spec}{$ltr}}) {
                # skip to next letter if it is not found
                next LTR unless exists($masses->{$mass});
            }
            # if we get here, all mass values were found for the species/letter
            $data{$spec}{cnt}++;
        }
    }
    # add percentages
    foreach my $spec (keys %data) {
        $data{$spec}{pct} = round($data{$spec}{cnt} / scalar(keys %{$species{$spec}}) * 100);
    }
    return \%data;
}
} # end closure

##########################################################################
#RESULTS
##########################################################################

{ # begin closure
my $data;
sub _cust_sort {
    if ($data->{$b}{pct} == $data->{$a}{pct}) {
        return $data->{$b}{cnt} <=> $data->{$a}{cnt};
    }
    return $data->{$b}{pct} <=> $data->{$a}{pct};
}
sub output {
    my $wellposition = shift;
    $data = shift;
    my @order = sort _cust_sort keys %$data;
    print {$out} "Wellposition ($wellposition) Results:\n\n",
                 "Top 5 Species Identities:\n";
    # print out the top 5
    for my $i (0..4) {
        my $spec = $order[$i];
        unless ($order[$i]) {
            print "no more matches\n";
            last; # exit loop
        }
        printf {$out} "%d) %-${len}s  %d matches  %0.1f%%\n", $i+1, $spec, $data->{$spec}{cnt}, $data->{$spec}{pct};
    }
}
} # end closure
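
For reference, here is one way the per-file read loop might look once `@top` is declared, written as a standalone sketch rather than a drop-in replacement. `masses_from_handle` is a hypothetical helper name, and `round` is copied from the script so the example runs on its own. Two things here are assumptions: the mass pattern treats the decimal part as optional (so a value such as `4` is accepted), and the input is read from an in-memory filehandle holding the example data instead of a real CSV file.

```perl
use strict;
use warnings;

# Copy of the script's round() so the sketch is self-contained.
sub round {
    my ($num) = @_;
    my ($start, $dig) = $num =~ m{^(\d+(?:\.\d)?)(\d)?};
    $start += 0.1 if (defined $dig and $dig >= 5);
    $start .= '.0' unless ($start =~ m{\.\d$});
    return $start;
}

# Hypothetical helper: read "mass,intensity" lines from a filehandle and
# return a hash ref counting the rounded masses of the $limit most intense rows.
sub masses_from_handle {
    my ($in, $limit) = @_;
    my @top;    # lexical, so every call starts with an empty list
    while (<$in>) {
        chomp;
        next if m{mass.*intensity};    # skip header line
        my ($mass, $intensity) = split /,/;
        # Assumption: the decimal part is optional, so "4" is accepted too.
        unless (defined $mass and $mass =~ m{^\d+(?:\.\d+)?$}) {
            warn "mass not a recognized number - skipping";
            next;
        }
        push @top, [$mass, $intensity];
    }
    # Sort everything by intensity (highest first), then keep the first $limit.
    my @sorted = sort { $b->[1] <=> $a->[1] } @top;
    $limit = @sorted if $limit > @sorted;
    my %masses;
    $masses{ round($_->[0]) }++ for @sorted[0 .. $limit - 1];
    return \%masses;
}

# In-memory filehandle holding the example data from the question.
my $csv = "mass,intensity\n0.3,7\n4,0.8\n5,0.1\n0.8,4\n1.9,9\n2.6,2\n3,5.6\n2,3.2\n";
open my $in, '<', \$csv or die "could not open in-memory file: $!";
my $masses = masses_from_handle($in, 5);
close $in;

print join(' ', sort keys %$masses), "\n";   # 0.3 0.8 1.9 2.0 3.0
```

In the full script, the returned hash reference could be passed straight to `analyze`, with `50` as the limit.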
