asked on

perl search and replace

Hi - My file looks like below

ISDFFS*00* *00* *30*7984581235 *ZZ*AKLSJFHADSJF *624654*1805*^*00501*3465465*1*T*:~RS*GS*5145454121*AKLSJFHADSJF*624654*1805*7984581235*X*005010X306~ST*820*7984581235*005010X306~SDF*I*0*C*NON************624654~SDFS*3*99999~SDF*582****RD8*20141201-20141231~SWRE*1121326~REF*DOL*100000037~RSRRE*RER*ERER**-2132154021.14~DRERF*582****RD8*20150701-20150731~REF*DOL*100000257~SE*8*7984581235~RT*1*7984581235~IAS*1*7984581235~

Just to explain my problem clearly I split the file into multiple lines using '~' and it looks like below

ISDFFS*00* *00* *30*7984581235 *ZZ*AKLSJFHADSJF *624654*1805*^*00501*3465465*1*T*:~
RS*GS*5145454121*AKLSJFHADSJF*624654*1805*7984581235*X*005010X306~
ST*820*7984581235*005010X306~
SDF*I*0*C*NON************624654~
SDFS*3*99999~
SDF*582****RD8*20141201-20141231~
SWRE*1121326~
REF*DOL*100000037~
RSRRE*RER*ERER**-2132154021.14~
DRERF*582****RD8*20150701-20150731~
REF*DOL*100000257~
SE*8*7984581235~
RT*1*7984581235~
IAS*1*7984581235~

Now I want to replace all occurences line starting with REF*DOL* ....... with other values

first i want to find the occurence of REF*DOL*
example: REF*DOL*100000037~
now I want the number 100000037 and use it for a query like below

select memid from emp where eid = '100000037'

Open in new window

once i got the memid I want to use it for my replace option, lets say the memid returned from the query is 100000123
then i want to replace REF*DOL*100000037~
with
REF*DOL*100000123~REF*AP*100000123~

I want to do this for all occurances, this is a simple file so i have just two REF*DOL* but for big files i have these in 1000's

Can some one provide me perl script for this.

Thanks,

ozo

use DBI;
$dbh = DBI->connect("DBI:mysql:database=$database", $user, $password) or die $!;
while( <> ){
s/(?<=REF\*DOL\*)(\d+)~/@{[@r=$dbh->selectrow_array( "select memid from emp where eid = '$1'")]}~REF*AP*@r~/g;
}

wilcoxon

Can this be done before you split the file into individual lines (or just on the multi-line version of the file data)? Is the multi-line version of the file available in a single var or is it in a list (one line per entry) or something else?

Can a single eid appear in more than one record within the file? If so, for efficiency, you want to only do the db access once. Will all of the eids appear within a file? Is the emp table large? If not, I'd think about just caching the table.

shragi

ASKER

@wilcoxon
your answers
Can this be done before you split the file into individual lines (or just on the multi-line version of the file data)?
YES THERE IS NO NEED TO SPLIT - I JUST SPLITTED HERE TO EXPLAIN MY PROBLEM
Is the multi-line version of the file available in a single var or is it in a list (one line per entry) or something else?
THE FILE I GET HAS ONLY ONE LINE AND ONCE AGAIN NO NEED TO SPLIT, BUT EACH LINE IS SEPARATED BY ~

Can a single eid appear in more than one record within the file?
NO
If so, for efficiency, you want to only do the db access once. Will all of the eids appear within a file?
IF THAT IS POSSIBLE I WANT TO ACCESS DB ONCE BUT IS IT POSSIBLE
Is the emp table large? If not, I'd think about just caching the table.
YES IT IS AROUND 500K RECORDS AND 60 COLUMNS, ALSO ITS JUST NOT EMPID TABLE
I GAVE THAT AS EXAMPLE THERE IS A BIG QUERY THAT I WILL RUN FOR THIS.

Thanks,

shragi

ASKER

@ozo
i got compilation errors, can you please check my attached perl script and let me know where did i go wrong

ERRORS:
Possible unintended interpolation of @r in string at line 40
global symbol @r requires explicit package name at lien 40
perlscript.txt

ASKER CERTIFIED SOLUTION

ozo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

shragi

ASKER

thanks ozo - that worked.
I want a small extension to this, when i ran the query i got multiple results so to filter out and get just one result i thought of adding one add more condition to where in sql.
but for that i want to grab a value from the text file.

Going back to previous example:
ISDFFS*00* *00* *30*7984581235 *ZZ*AKLSJFHADSJF *624654*1805*^*00501*3465465*1*T*:~
RS*GS*5145454121*AKLSJFHADSJF*624654*1805*7984581235*X*005010X306~
ST*820*7984581235*005010X306~
SDF*I*0*C*NON************624654~
SDFS*3*99999~
SDF*582****RD8*20141201-20141231~
SWRE*1121326~
NM1*IL*1*BOURNE*BOND*L***C1*100000257~
REF*DOL*100000037~
RSRRE*RER*ERER**-2132154021.14~
DRERF*582****RD8*20150701-20150731~
REF*DOL*100000257~
SE*8*7984581235~
RT*1*7984581235~
IAS*1*7984581235~

There is this below line before 4 lines of REF*DOL which gives user last name and first name
NM1*IL*1*BOURNE*BOND*L***C1*100000257~

here bourne is lastname
and bond is first name
i want both to filter out, so that i can write query as below

select DISTINCT member_id
from relation as cr
join Contacts as c on c.id = cr.id and c.status = cr.status
join Coverage as ec on ec.id = cr.id and ec.status = cr.status
where cr.status = 'A' and ec.member_id = '$1' and lastname = 'bourne' and firstname = 'bond'

the important point is i want only the name from NM1*IL* which is 4 lines before REF*DOL*
these things can be repeated and i mean in file there are multiple sections like this...
so each NM1*IL* is associated to REF*DOL*
so if i find NM1*IL* then i use the name from this to update the immediate REF*DOL*

is this a quick fix.