Modifying a Perl script

Hi there,

I'm trying to modify an existing Perl script because one of the text files it uses has changed.

The original script uses SpeciesId.txt, a file in the following format:

African Elephant

B = 1453.7
C = 1577.8
D = 2115.1
E = 2808.4
F = 2853.5 AND 2869.5
G = 2999.4 AND 3015.4

Indian Elephant

B = 1453.7
C = 1577.8
D = 2115.1
E = 2808.4
F = 2853.5 AND 2869.5
G = 2999.4 AND 3015.4


However, I'd like it to use SpeciesId3.txt, a file in the following format (basically no letters [A =, B = etc.] and no "AND" instances):

African Elephant

826.4
836.4
840.4
852.4
858.4
886.4
892.5
898.5
904.5
920.5
950.5
1001.5
1015.5
etc....

Alpaca

785.4
809.4
836.4
837.4
840.5
841.4
852.4
853.4
868.5
869.4
886.4
892.5
898.5
899.5
908.5
etc....


Would this be an easy modification to make to the existing script (Script.txt)?

Thanks,

Stephen.
SpeciesId.txt
SpeciesId3.txt
Script.txt
StephenMcGowan Asked:
ozo Commented:
sub _init {
    open my $in, '<', 'SpeciesId3.txt' or die "could not open SpeciesId3.txt: $!";
    my $spec;
    while (<$in>) {
        chomp;
        next if /^\s*$/; # skip blank lines
        if (m{^([A-Z]?)\s*=?\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?$}) {
            # handle mass lines (the letter is optional; bare masses get an empty key)
            push @{$species{$spec}{$1}}, $2;
            push @{$species{$spec}{$1}}, $3 if $3;
        } else {
            # handle species name lines
            $spec = $_;
            $len = length($spec) if (length($spec) > $len);
        }
    }
    close $in;
}
ozo Commented:
The modification in http:#a39469769 assumed that
African Elephant

826.4
836.4
840.4
852.4
858.4
886.4
892.5
898.5
904.5
920.5
950.5
1001.5
1015.5
etc....
should be treated like
African Elephant

 = 826.4 AND 836.4 AND 840.4 AND 852.4 AND 858.4 AND 886.4 AND 892.5 AND 898.5 AND 904.5 AND 920.5 AND 950.5 AND 1001.5 AND 1015.5
etc....
Is that correct?
If not, can a SpeciesId3.txt line be translated to an equivalent SpeciesId.txt line?
What would a correct translation of a SpeciesId3.txt line be?
wilcoxon Commented:
Looking at your code change (and leaving the AND regex alone), I think you are actually treating it as:
African Elephant
A = 826.4
B = 836.4
C = 840.4
...

because the optional AND portion of the regex will never match, so $3 will never be defined.

I'm assuming that is the correct treatment...
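
To see what the regex from the earlier comment captures for each kind of line, a small check like the following may help. It is only a sketch: `parse_line` is a hypothetical helper introduced for this illustration (it is not part of Script.txt), and the sample lines are copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the regex from the _init sub to one line and
# returns (letter, first mass, second mass), or an empty list if no match.
sub parse_line {
    my ($line) = @_;
    if ($line =~ m{^([A-Z]?)\s*=?\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?$}) {
        return ($1, $2, $3);
    }
    return;
}

for my $line ('F = 2853.5 AND 2869.5', '826.4', 'African Elephant') {
    my @parts = parse_line($line);
    if (@parts) {
        my ($letter, $first, $second) = @parts;
        printf "%-22s letter='%s' first=%s second=%s\n",
            $line, $letter, $first, defined $second ? $second : '(none)';
    } else {
        printf "%-22s treated as a species name\n", $line;
    }
}
```

For a bare mass such as 826.4 the letter capture is the empty string and the second mass is undefined, so how the masses end up grouped depends on what the code uses as the hash key.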
StephenMcGowan (Author) Commented:
Hey,

Sorry, I should have explained better.

So, the original file had data which looked like this:

African Elephant

B = 1453.7
C = 1577.8
D = 2115.1
E = 2808.4
F = 2853.5 AND 2869.5
G = 2999.4 AND 3015.4


The difference with the new, updated file is that:

1) The letters preceding each mass have been removed (e.g. "B = 1453.7", "C = 1577.8"); each animal now has a plain list of masses, one per line, with no identifying letters.
2) There are many more masses per animal: the example above has 6 lines, whereas the new file has roughly 100 per animal.
3) The "AND" handling is no longer required and can be removed from the script. There are no longer any cases where two masses are both required to generate a +1 score.

So, in essence, the script takes the masses from a CSV data file and checks them against each animal's list, looking for matches.
The top 5 matches are reported with their match scores and match percentages (the score divided by the number of masses in that animal's list).
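
As a rough illustration of reading the new letter-free format, the sketch below collects each animal's masses into a list. It is not the code from Script.txt: `read_species` is a hypothetical helper, and it assumes that every non-blank line is either a bare number (a mass) or a species name. The sample data is a shortened copy of the lists in the question, read from an in-memory filehandle so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the new letter-free format from a filehandle and
# returns a hash reference of species name => array of masses.
sub read_species {
    my ($fh) = @_;
    my (%species, $name);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        next if $line eq '';                # skip blank lines
        if ($line =~ /^\d+(?:\.\d+)?$/) {   # a bare mass
            die "mass '$line' appears before any species name\n"
                unless defined $name;
            push @{ $species{$name} }, $line;
        } else {                            # anything else is a species name
            $name = $line;
        }
    }
    return \%species;
}

# Shortened sample data, read from an in-memory filehandle.
my $sample = "African Elephant\n\n826.4\n836.4\n\nAlpaca\n\n785.4\n809.4\n836.4\n";
open my $fh, '<', \$sample or die "could not open in-memory sample: $!";
my $species = read_species($fh);
close $fh;

printf "%s: %d masses\n", $_, scalar @{ $species->{$_} } for sort keys %$species;
```

With the real file, the in-memory filehandle would be replaced by one opened on SpeciesId3.txt.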

Thanks for getting back to me, sorry if this has caused confusion!

Hope this is clearer!
ozo Commented:
my $Z='Z';
sub _init {
    open my $in, '<', 'SpeciesId3.txt' or die "could not open SpeciesId3.txt: $!";
    my $spec;
    while (<$in>) {
        chomp;
        next if /^\s*$/; # skip blank lines
        if (m{^([A-Z]?)\s*=?\s*(\d+(?:\.\d)?)(?:\s+AND\s+(\d+(?:\.\d)?))?\s*$}) {
            # handle mass lines; a line without a letter gets a generated key (AA, AB, ...)
            push @{$species{$spec}{$1||++$Z}}, $2;
            push @{$species{$spec}{$1||$Z}}, $3 if $3;
        } else {
            # handle species name lines
            $spec = $_;
            $len = length($spec) if (length($spec) > $len);
        }
    }
    close $in;
}
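
The `$1||++$Z` expression relies on Perl's string auto-increment: incrementing 'Z' gives 'AA', then 'AB', and so on, so each bare mass gets its own key, while a line that still has a letter keeps that letter. A minimal sketch of just that behaviour follows. The `%masses` hash and the simplified regex are assumptions made for the illustration and are not taken from Script.txt.

```perl
use strict;
use warnings;

my $Z = 'Z';          # same starting value as in the code above
my %masses;           # generated key => mass, for illustration only

for my $line ('826.4', '836.4', 'B = 1453.7') {
    my ($letter, $mass) = $line =~ /^([A-Z]?)\s*=?\s*(\d+(?:\.\d)?)\s*$/
        or next;
    # An empty letter is false, so the counter supplies the key instead.
    my $key = $letter || ++$Z;
    $masses{$key} = $mass;
}

print "$_ => $masses{$_}\n" for sort keys %masses;
```

Because `$Z` is declared outside the sub and never reset, the generated keys keep counting up across species rather than restarting for each animal.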
StephenMcGowan (Author) Commented:
Here's a simple example.

So the CSV file has 12 masses.

It then checks the masses against the SpeciesID text file:

Hippo

2
3
5
6
7
32
94
109

Rhino

2
3
5
12
24
33
58

Hyena

1
4
30
34
56
60
64


Penguin

1
2
4
9
12
30
32
33
36
50
58
60

Ox

1
3
4
13
70


Matching against the mass column in the CSV file, the scores would be like this:

Hippo

2 [+1 score]
3
5
6
7
32 [+1 score]
94
109

Rhino

2 [+1 score]
3
5
12 [+1 score]
24
33 [+1 score]
58 [+1 score]

Hyena

1 [+1 score]
4 [+1 score]
30 [+1 score]
34
56
60 [+1 score]
64


Penguin

1 [+1 score]
2 [+1 score]
4 [+1 score]
9 [+1 score]
12 [+1 score]
30 [+1 score]
32 [+1 score]
33 [+1 score]
36 [+1 score]
50 [+1 score]
58 [+1 score]
60 [+1 score]

Ox

1 [+1 score]
3
4 [+1 score]
13
70


So the masses in CSV file A5 are most likely to be Penguin, as it matches 100%, followed by Rhino and Hyena with a score of 4 each (both lists are 7 masses long, so each gets a match percentage of 57%).

So top five for A5 would be:

1) Penguin  12 matches  100%
2) Rhino    4 matches   57%
3) Hyena    4 matches   57%
4) Ox       2 matches   40%
5) Hippo    2 matches   25%


Just a simplified example of what I'm trying to achieve out of this script.
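
The scoring in this example could be sketched roughly as below. This is not the logic from Script.txt: it assumes exact matching of masses, and the 12 CSV masses are inferred from the [+1 score] marks above rather than read from the attached CSV file. All variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Species lists copied from the example above.
my %species = (
    Hippo   => [2, 3, 5, 6, 7, 32, 94, 109],
    Rhino   => [2, 3, 5, 12, 24, 33, 58],
    Hyena   => [1, 4, 30, 34, 56, 60, 64],
    Penguin => [1, 2, 4, 9, 12, 30, 32, 33, 36, 50, 58, 60],
    Ox      => [1, 3, 4, 13, 70],
);

# Assumption: the 12 CSV masses, inferred from the [+1 score] marks above.
my %observed = map { $_ => 1 } (1, 2, 4, 9, 12, 30, 32, 33, 36, 50, 58, 60);

my @ranked;
for my $name (keys %species) {
    my $list    = $species{$name};
    my $matches = grep { $observed{$_} } @$list;   # count of matching masses
    push @ranked, {
        name    => $name,
        matches => $matches,
        percent => 100 * $matches / @$list,        # share of the species list
    };
}

# Highest percentage first; ties broken by name so the order is stable.
@ranked = sort { $b->{percent} <=> $a->{percent} or $a->{name} cmp $b->{name} } @ranked;

# Report at most the top five.
my $top = @ranked < 5 ? $#ranked : 4;
printf "%d) %-8s %2d matches  %3.0f%%\n", $_ + 1, @{ $ranked[$_] }{qw(name matches percent)}
    for 0 .. $top;
```

Rhino and Hyena tie on both score and percentage, so the sketch orders them alphabetically. Real masses with decimals may need a tolerance rather than an exact comparison.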
20130730-p12-A5-EXAMPLE.csv
SpeciesID-EXAMPLE.txt