• Status: Solved

Perl Script to split the text

Hi,

I want to extract an identification number from the text below, so I am trying to write a script for it.
Can someone help me with this?

Example Text:

$_ = 'ERP*ABBB*TEXT*ME*205498748565460$MAIN1*87*2$M3*APT 21$M4*ERIE*PA*54654572$KAYAK*21312*123*12312*123$BROAD*P*18*SDFGASDF******ASDF$MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546$ASDF*ASDF8*112157409*ASDF$MAIN1*ASDF*2*FASDFA*****ASDFASDF*65748756$ASDFAS*ASDFDAS*ASDFASD';

I split the text above on "$" using the script below:

my @values = split /\$/, $_;   # '$' must be escaped: split takes a regex, and a bare '$' means end of string
foreach my $val (@values) {
    print "$val\n";
}

The output is below:

ERP*ABBB*TEXT*ME*205498748565460
MAIN1*87*2
M3*APT 21
M4*ERIE*PA*54654572
KAYAK*21312*123*12312*123
BROAD*P*18*SDFGASDF******ASDF
MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546
ASDF*ASDF8*112157409*ASDF
MAIN1*ASDF*2*FASDFA*****ASDFASDF*65748756
ASDFAS*ASDFDAS*ASDFASD
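A minimal sketch of why the separator needs escaping, using a shortened, made-up sample string: `split` treats its first argument as a regular expression, so a bare `'$'` is the end-of-string anchor rather than a literal dollar sign. Single quotes around the data also keep Perl from interpolating `$MAIN1`, `$M3` and so on as variables.

```perl
use strict;
use warnings;

# Shortened sample in single quotes, so '$MAIN1' and '$M3' stay literal text
# (inside double quotes Perl would try to interpolate them as variables).
my $text = 'ERP*ABBB$MAIN1*87*2$M3*APT 21';

# split's first argument is a regex: a bare '$' is the end-of-string anchor,
# so nothing is split and the whole string comes back as one field.
my @unescaped = split '$', $text;

# Escaping the dollar sign makes it match the literal '$' separator.
my @escaped = split /\$/, $text;

print "unescaped: ", scalar(@unescaped), " field(s)\n";
print "escaped:   ", scalar(@escaped),   " field(s)\n";
print "  $_\n" for @escaped;
```

The same gotcha applies to other regex metacharacters passed to `split` as strings, such as `'.'`, `'|'` or `'*'`.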

Now my main problem: I want to extract the employee ID using the following conditions:
1) The employee ID is the field after the 9th '*' of a record that starts with MAIN1.
2) The example above contains more than one MAIN1 record, so the second condition is that
the employee ID is in the MAIN1 record that is immediately preceded by two records starting with "KAYAK" and "BROAD".
So in the example above, the employee ID is in this record:

MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546

because the two records immediately before it are

KAYAK*21312*123*12312*123
BROAD*P*18*SDFGASDF******ASDF

which start with 'KAYAK' and 'BROAD'.

So the text after the 9th '*' in the record below is our employee ID, 140056546:
MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546

How can I extract that employee ID?

Thanks for the help
shragi
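To make the "after the 9th `*`" rule in the question concrete, here is a small sketch that splits the target record on a literal `*`. Empty fields produced by the run of `****` are kept (`split` only drops trailing empty fields), so the counting stays aligned with the asterisks and the ID is field index 9.

```perl
use strict;
use warnings;

# The target record from the question.
my $record = 'MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546';

# Split on a literal '*'. The empty fields between consecutive asterisks
# are kept (split only drops trailing empty fields), so the text after
# the 9th '*' lands at index 9 of the 0-based list.
my @fields = split /\*/, $record;
my $empid  = $fields[9];

print "field count: ", scalar(@fields), "\n";
print "empid: $empid\n";
```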
mankowitz commented:
Start with something like this:

\$KAYAK.*\$BROAD.*\$MAIN1(\*[^*]*){9}(\d{9})

The employee ID is in the last capture group of the match, $2.
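A minimal sketch of applying that regex in Perl, assuming (as the pattern itself does) that the ID is exactly nine digits. Note that the `.*` between the record names means KAYAK, BROAD and MAIN1 do not have to be adjacent records for this pattern to match.

```perl
use strict;
use warnings;

my $str = 'ERP*ABBB*TEXT*ME*205498748565460$MAIN1*87*2$M3*APT 21$M4*ERIE*PA*54654572$KAYAK*21312*123*12312*123$BROAD*P*18*SDFGASDF******ASDF$MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546$ASDF*ASDF8*112157409*ASDF$MAIN1*ASDF*2*FASDFA*****ASDFASDF*65748756$ASDFAS*ASDFDAS*ASDFASD';

my $empid;
if ($str =~ /\$KAYAK.*\$BROAD.*\$MAIN1(\*[^*]*){9}(\d{9})/) {
    # $1 only holds the last repetition of (\*[^*]*); the ID is in $2.
    $empid = $2;
}

print defined $empid ? "empid = $empid\n" : "no match\n";
```

Because the second `.*` is greedy, the engine first tries the last `$MAIN1` record; its field after the 9th `*` is only eight digits, so it backtracks to the earlier MAIN1 record and captures `140056546`.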
wilcoxon commented:
use strict;
use warnings;
my $str = 'ERP*ABBB*TEXT*ME*205498748565460$MAIN1*87*2$M3*APT 21$M4*ERIE*PA*54654572$KAYAK*21312*123*12312*123$BROAD*P*18*SDFGASDF******ASDF$MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546$ASDF*ASDF8*112157409*ASDF$MAIN1*ASDF*2*FASDFA*****ASDFASDF*65748756$ASDFAS*ASDFDAS*ASDFASD';
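my @lines = split /\$/, $str;   # one element per '$'-separated record
my $empid;
# Look for a MAIN1 record immediately preceded by BROAD and KAYAK records,
# then take the text after its 9th '*' (field index 9).
for my $i (2..$#lines) {
    if ($lines[$i] =~ /^MAIN1\*/ and $lines[$i-1] =~ /^BROAD\*/ and $lines[$i-2] =~ /^KAYAK\*/) {
        $empid = (split /\*/, $lines[$i])[9];
    }
}
# this is not necessary if there is always a MAIN1 preceded by KAYAK and BROAD
# It wasn't clear from your description whether there could be a record without the KAYAK and BROAD
# fallback: take the first MAIN1 record found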
unless ($empid) {
    while (@lines and not $empid) {
        if ($lines[0] =~ /^MAIN1\*/) {
            $empid = (split /\*/, $lines[0])[9];
        }
        shift @lines;
    }
}

print "empid = $empid\n";

shragi (author) commented:
@wilcoxon
Your solution works, and yes, there is always a MAIN1 preceded by KAYAK and BROAD.
Now a small extension: I am trying to store these values in an array because later I want to compare these employee IDs with another, similar array of employee IDs.

So I updated the script as follows:

use strict;
use warnings;
my $str = 'ERP*ABBB*TEXT*ME*205498748565460$MAIN1*87*2$M3*APT 21$M4*ERIE*PA*54654572$KAYAK*21312*123*12312*123$BROAD*P*18*SDFGASDF******ASDF$MAIN1*PA*1231*ASDFSDF*ARWTFGADF****BDFBV*140056546$ASDF*ASDF8*112157409*ASDF$MAIN1*ASDF*2*FASDFA*****ASDFASDF*65748756$ASDFAS*ASDFDAS*ASDFASD';
my @lines = split /\$/, $str;
my $empid;
my @empArray;
for my $i (2..@lines-1) {
    if ($lines[$i] =~ /^MAIN1\*/ and $lines[$i-1] =~ /^BROAD\*/ and $lines[$i-2] =~ /^KAYAK\*/) {
        $empid = (split /\*/, $lines[$i])[9];
        push(@empArray, $empid);   # collect every matching ID
    }
}

foreach (@empArray) {
    print "empid = $_\n";   # each element is a plain ID string, not a hash reference
}

but I got a runtime error, with no clear description that I could post here.
So how can I extend this to an array (it need not be an array; any data structure works)?
In the end, I collect employee IDs from this file and from another file, and find the missing ones.
Is there a simple way to do this?

Thanks,
Shragi
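One way to extend wilcoxon's loop, sketched under the assumption that each file's content can be handled as one '$'-separated string: wrap the loop in a sub (the name `extract_empids` is hypothetical) that returns every matching ID, then call it once per input. The two input strings below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every employee ID in one '$'-separated string,
# i.e. field 9 of each MAIN1 record directly preceded by KAYAK and BROAD.
sub extract_empids {
    my ($str) = @_;
    my @lines = split /\$/, $str;
    my @ids;
    for my $i (2 .. $#lines) {
        next unless $lines[$i]     =~ /^MAIN1\*/
                and $lines[$i - 1] =~ /^BROAD\*/
                and $lines[$i - 2] =~ /^KAYAK\*/;
        my $id = (split /\*/, $lines[$i])[9];
        push @ids, $id if defined $id;
    }
    return @ids;
}

# Made-up inputs standing in for the contents of the two files.
my @ids1 = extract_empids('KAYAK*1$BROAD*2$MAIN1*A*B*C*D*E*F*G*H*111111111$KAYAK*3$BROAD*4$MAIN1*A*B*C*D*E*F*G*H*222222222');
my @ids2 = extract_empids('KAYAK*1$BROAD*2$MAIN1*A*B*C*D*E*F*G*H*111111111');

print "file 1: @ids1\n";
print "file 2: @ids2\n";
```

With real files, you could read each file (line by line or slurped) and push the results of the sub onto one array per file; the two arrays can then be compared as a set difference.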
shragi (author) commented:
@wilcoxon

I fixed my compilation error.
That leaves one question: is there a simple way to compare two arrays and find the missing entries?
One way is to compare them one by one using foreach(@employees),
but I think there should be a simpler approach to:

1) compare two arrays and list the employees that are in array 1 but not in array 2.

Thanks,
mankowitz commented:
This sounds like another question, but here's a hint:

use Array::Utils qw(:all);
my @missing = array_minus(@a, @b);   # elements of @a that are not in @b
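`Array::Utils` is a CPAN module rather than part of core Perl. If installing it is not an option, the same "elements of `@a` that are not in `@b`" result can be sketched in core Perl with a lookup hash and `grep`. This is a swapped-in technique, not the module's own code, and the ID lists are made up.

```perl
use strict;
use warnings;

# Made-up ID lists standing in for the results from the two files.
my @a = qw(111111111 222222222 333333333);
my @b = qw(111111111 333333333);

# Build a lookup hash from @b, then keep the elements of @a it doesn't contain.
my %in_b    = map { $_ => 1 } @b;
my @missing = grep { !exists $in_b{$_} } @a;

print "in \@a but not in \@b: @missing\n";
```

The hash makes each membership check a single lookup instead of a scan of `@b`, and `grep` preserves the order of `@a`.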
shragi (author) commented:
Yes, but thank you all.
I found this link, and I think it will help:

http://stackoverflow.com/questions/2933347/comparing-two-arrays-using-perl


Thanks
wilcoxon commented:
The simplest way to compare two arrays of scalars is actually to convert them to hashes:
my %hash1 = map { $_ => 1 } @arr1;
my %hash2 = map { $_ => 1 } @arr2;
foreach my $key (keys %hash1) {
    if (exists $hash2{$key}) {
        delete $hash2{$key};   # in both; remove it so only hash2-only keys remain
    } else {
        print "$key missing from hash2\n";
    }
}
# whatever is left in %hash2 never appeared in @arr1
foreach my $key (keys %hash2) {
    print "$key missing from hash1\n";
}
