[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 368
  • Last Modified:

Help with perl script to convert UIEE file to csv

Hi Experts,

I have a huge file (308mb) which I believe to be in the UIEE format. An example of one entry....

**START
IB 0702026174
BI Hardback
AU EVANS, WILLIAM CHARLES
BC GBC
CO SCOTLAND
EI 15 REV ED
IU 275 ILLS.
PD 20011213
NP 600
RP 68.99
RI 68.99
RE 68.99
PU ELSEVIER HEALTH SCIENCES
YP 2001
RC U
RS TERTIARY EDUCATION (US: COLLEGE)
TI TREASE AND EVANS PHARMACOGNOSY
DE Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and aromatherapy.
EA 9780702026171
RF R
WE 1780
SG 1
GC O00
I3 9780702026171
**

Obviously it's all book related :) All entries follow the same format, with the first 2 characters indicating the column header. Now my ideal is to convert this file to a csv with one set of column headings. I've been given a perl script by a colleague, and he believes it was designed to do what I want but has never got it to work, or even know if it does. The script is as follows. Would this script work? If not does it need to be tweaked or a do over from scratch?

#! /usr/bin/perl -w
 
@columns = ();
@values = ();
$query = "";
 
while( $line = <STDIN> ){
    $line =~ s/\n//g;
    if( $line =~ /BOOS\|/ ){
        @columns = ();
        @values = ();
        while( $line !~ /BOOE\|/ ){
            $line = <STDIN>;
            $line =~ s/\n//g;
            if( $line =~ /(.+)\|(.+)/ ){
                push( @columns, $1 );
                push( @values, $2 );
            }
        }
        $query = "INSERT INTO books\n(";
        foreach $column ( @columns ){
            $query .= " `$column`,\n";
        }
        chop( $query );
        chop( $query );
        $query .= " )\nVALUES\n(";
        foreach $value ( @values ){
            $value =~ s/\'/\\\'/g;
            $value =~ s/\r//g;
            $query .= " \'$value\',\n";
        }
        chop( $query );
        chop( $query );
        $query .= " );\n";
        print $query;
    }
}

Open in new window

0
fyreheart
Asked:
fyreheart
  • 5
  • 5
2 Solutions
 
ozoCommented:
If we change $line =~ /(.+)\|(.+)/
to $line =~ /(.+)\s+(.+)/
and add the line
BOOS|
to the start of the file, and
BOOE|
to the end of the file
then running the program produces

INSERT INTO books
( `IB`,
 `BI`,
 `AU EVANS, WILLIAM`,
 `BC`,
 `CO`,
 `EI 15 REV`,
 `IU 275`,
 `PD`,
 `NP`,
 `RP`,
 `RI`,
 `RE`,
 `PU ELSEVIER HEALTH`,
 `YP`,
 `RC`,
 `RS TERTIARY EDUCATION (US:`,
 `TI TREASE AND EVANS`,
 `DE Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and`,
 `EA`,
 `RF`,
 `WE`,
 `SG`,
 `GC`,
 `I3` )
VALUES
( '0702026174',
 'Hardback',
 'CHARLES',
 'GBC',
 'SCOTLAND',
 'ED',
 'ILLS.',
 '20011213',
 '600',
 '68.99',
 '68.99',
 '68.99',
 'SCIENCES',
 '2001',
 'U',
 'COLLEGE)',
 'PHARMACOGNOSY',
 'aromatherapy.',
 '9780702026171',
 'R',
 '1780',
 '1',
 'O00',
 '9780702026171' );

Is that what you wanted it to do?
0
 
fyreheartAuthor Commented:
Thankyou for the response as I have no idea with scripting! The output file is not quite want I had hoped for. I would like it to seperate out the first 2 letters and produce something more along the lines of....

INSERT INTO books
( `IB`,
 `BI`,
 `AU`,
 `BC`,
 `CO`,
 `EI`,
 `IU`,
 `PD`,
 `NP`,
 `RP`,
 `RI`,
 `RE`,
 `PU`,
 `YP`,
 `RC`,
 `RS`,
 `TI`,
 `DE`,
 `EA`,
 `RF`,
 `WE`,
 `SG`,
 `GC`,
 `I3` )
VALUES
( '0702026174',
 'Hardback',
 'CHARLES EVANS, WILLIAM',
 'GBC',
 'SCOTLAND',
 '15 REV ED',
 '275 ILLS.',
 '20011213',
 '600',
 '68.99',
 '68.99',
 '68.99',
 'ELSEVIER HEALTH SCIENCES',
 '2001',
 'U',
 'TERTIARY EDUCATION (US: COLLEGE)',
 'TREASE AND EVANS PHARMACOGNOSY',
 'Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and aromatherapy.',
 '9780702026171',
 'R',
 '1780',
 '1',
 'O00',
 '9780702026171' );

Is that do-able within the script?
0
 
ozoCommented:
$line =~ /(.+?)\s+(.+)/
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
nitinsawhneyCommented:
Hi,

It is not feasible to edit an 308 MB file and change the start and end notation of the records as suggested by ozo in the first comment.
The following code is all you would need to parse your file. It assumes that the INPUT file is named as input.txt.

Cheers
#! /usr/bin/perl -w
 
@columns = ();
@values = ();
$query = "";
 
open(INFILE,'< input.txt') || die "File not found";
while( $line = <INFILE> ){
    $line =~ s/\n//g;
    if( $line =~ /^\*\*START$/ ){
        @columns = ();
        @values = ();
        while( $line !~ /^\*\*$/ ){
            $line = <INFILE>;
            $line =~ s/\n//g;
            if( $line =~ /^(.{2})\s+?(.+)$/ ){
                push( @columns, $1 );
                push( @values, $2 );
            }
        }
        $query = "INSERT INTO books\n(";
        foreach $column ( @columns ){
            $query .= " `$column`,\n";
        }
        chop( $query );
        chop( $query );
        $query .= " )\nVALUES\n(";
        foreach $value ( @values ){
            $value =~ s/\'/\\\'/g;
            $value =~ s/\r//g;
            $query .= " \'$value\',\n";
        }
        chop( $query );
        chop( $query );
        $query .= " );\n";
        print $query;
    }
}
close INFILE;

Open in new window

0
 
ozoCommented:
Assuming that
**START
in the file in meant to take the place of BOOS|  and
**
takes the place of BOOE|

perl -ne 'BEGIN{$/="**START\n"}print "INSERT INTO books\n(`",join("`,\n `",keys %h),"` )\nVALUES\n(`",join("`,\n `",values %h),"` );\n" if %h=/^(\w\w)\s+(.*)/gm;' input.txt
0
 
fyreheartAuthor Commented:
Ok so now for an entirely dense question, at the end of the program running will it then output to a csv? I see the insert command..but I just wanna check I don't need an already created file for it to output to or an open file of some sort?

Many thanks to both of you for your assistance so far!
0
 
ozoCommented:
running the program will print

INSERT INTO books
( `IB`,
 `BI`,
 `AU`,
 `BC`,
 `CO`,
 `EI`,
 `IU`,
 `PD`,
 `NP`,
 `RP`,
 `RI`,
 `RE`,
 `PU`,
 `YP`,
 `RC`,
 `RS`,
 `TI`,
 `DE`,
 `EA`,
 `RF`,
 `WE`,
 `SG`,
 `GC`,
 `I3` )
VALUES
( '0702026174',
 'Hardback',
 'CHARLES EVANS, WILLIAM',
 'GBC',
 'SCOTLAND',
 '15 REV ED',
 '275 ILLS.',
 '20011213',
 '600',
 '68.99',
 '68.99',
 '68.99',
 'ELSEVIER HEALTH SCIENCES',
 '2001',
 'U',
 'TERTIARY EDUCATION (US: COLLEGE)',
 'TREASE AND EVANS PHARMACOGNOSY',
 'Serves as the encyclopaedic reference work on pharmacognosy, the study of those natural substances, principally plants, that find a use in medicine. This book balances between classical and modern aspects of this branch of science, and covers the importance of complementary medicines, including herbal, homeopathic and aromatherapy.',
 '9780702026171',
 'R',
 '1780',
 '1',
 'O00',
 '9780702026171' );

what you do with that output is up to you.
0
 
fyreheartAuthor Commented:
Ok so now I'm entirely confused...not a difficult thing to do!

If I run the code nitinsawhney suggested, my dos screen scrolls endless through

INSERT INTO books )
VALUES );
INSERT INTO books )
VALUES );

You get the idea...but doesn't appear to output anything.

Ozo....where should your code (below) be put?

perl -ne 'BEGIN{$/="**START\n"}print "INSERT INTO books\n(`",join("`,\n `",keys %h),"` )\nVALUES\n(`",join("`,\n `",values %h),"` );\n" if %h=/^(\w\w)\s+(.*)/gm;' input.txt
0
 
ozoCommented:
> my dos screen scrolls endless through
Does input.txt contain endless
**START
**


> where should your code (below) be put?
type it (or cut and paste it) into the dos command shell
But in a dos command line you would have to use " instead of '
perl -ne "BEGIN{$/='**START'}print qq'INSERT INTO books\n(`',join(qq'`,\n `',keys %h),qq'` )\nVALUES\n(`',join(qq'`,\n `',values %h),qq'` );\n' if %h=/^(\w\w)\s+(.*)/gm;" input.txt
0
 
fyreheartAuthor Commented:
Have had to abandon this script due to a problem discovered within the input data. Thanks for all your help.
0
 
fyreheartAuthor Commented:
Thanks for  your help, however have had to abandon this script as it was discovered there was a problem with the data.
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

  • 5
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now