Solved

File 're-write'

Posted on 2000-01-17
Medium Priority
159 Views
Last Modified: 2010-03-05
I have a file with repetitive entries like this:

<STARTSIG>
Name
Location
DATE TIME
email
URL
Message
<STOPSIG>

It's a guestbook which I am porting to a MySQL-based solution. The 'message' can span multiple lines; all other fields are single lines.

I would like to convert this to something more useful, like a single line entry:

'name','Location','etc','etc','etc'

Note that all variables can contain pretty much any character (including international ones). Also note that I want the start/stop sigs stripped.

Someone up for this?
(platform is Unix, if that makes a difference)

TIA

Jan - Decent in MySQL / PHP... awful in Perl :)
Question by:j2
19 Comments
 
LVL 2

Expert Comment

by:ventolin
ID: 2360366
# slurp your file into $sig_file and then:

# Walk through every <STARTSIG> ... <STOPSIG> block in turn. With /g in
# scalar context each match resumes where the previous one ended, so
# there is no need to recurse or to delete the matched text afterwards.
while ($sig_file =~ m#<STARTSIG>\s*(.*?)\s*<STOPSIG>#gis) {
        my $guts = $1;
        # pass guts to another sub
        parsethisandstoreit($guts);
}


sub parsethisandstoreit {
        # here $guts has the content of each guestbook entry
        # parse this out and do with it as you please
        my $guts = shift;
        print "$guts\n";
}
 
LVL 1

Expert Comment

by:ilia
ID: 2360389
The multi-line message makes it kinda hard, but here's my try, assuming URL comes right before Message:

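use strict;
use warnings;

my $file     = 'guest.txt';   # path to the guestbook file (assumed name)
my @records  = ();
my @fields   = ();            # single-line fields of the current entry
my @msg      = ();            # message lines of the current entry
my $msg_flag = 0;             # true once the URL line has been seen

open(my $in, '<', $file) or die "Can't read $file: $!";
while (my $line = <$in>) {
  chomp $line;
  if ($line =~ /<STARTSIG>/) {            # start of an entry: reset
    @fields = (); @msg = (); $msg_flag = 0;
    next;
  }
  if ($line =~ /<STOPSIG>/) {             # end of an entry: build the record
    push @records, join(',', map { "'$_'" } @fields, join(' ', @msg)) . "\n";
    next;
  }
  if ($msg_flag) {
    push @msg, $line;                     # every line after the URL is message
  }
  else {
    push @fields, $line;
    $msg_flag = 1 if $line =~ /^http/;    # the URL comes right before Message
  }
}
close($in);

open(my $out, '>', "$file.new") or die "Can't write $file.new: $!";
print {$out} @records;
close($out);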
 
LVL 12

Author Comment

by:j2
ID: 2360393
Uhm... slurp?

Consider me a total newbie here: how do I get the above code to do what I want? Or does it? :)

 
LVL 12

Author Comment

by:j2
ID: 2360402
ilia: the message follows directly on the line after the URL, so that is right. However, the URL field contains "http://" if no URL was given. Does that help or complicate things?
 
LVL 2

Expert Comment

by:ventolin
ID: 2360431
slurp, in this case, is slang for bringing the entire contents of a file into the variable $sig_file.

this could be a challenge for you if you have little experience, but the code provided here should get you in the right direction.

For the code I posted:

Open your sig file.
Read its contents into $sig_file.
Paste in the code I posted. It loops through the file and gets the content of each guestbook entry.
Edit parsethisandstoreit where commented and do whatever you please with the data. To convert it as mentioned, you will need to parse $guts into its fields using regular expressions. If you are new to those as well, pick up the book Mastering Regular Expressions or go to http://www.perl.com/reference/query.cgi?section=regexp&x=5&y=11
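A minimal sketch of that parsing step, assuming $guts holds the lines between the markers exactly as described in the question: five single-line fields (name, location, date/time, email, URL), then a message of one or more lines. The helpers parse_entry and quote_field are hypothetical names. The quoting doubles single quotes and escapes backslashes, which matches MySQL's default string-literal rules, so values can contain almost any character.

```perl
use strict;
use warnings;

# Hypothetical helper: split one entry body into its six fields.
sub parse_entry {
    my ($guts) = @_;
    my @lines  = split /\n/, $guts, -1;   # -1 keeps empty fields such as a missing email
    my @fields = splice(@lines, 0, 5);    # the five single-line fields
    push @fields, join(' ', @lines);      # whatever is left is the message
    return @fields;
}

# Hypothetical helper: wrap a value in single quotes, escaping any
# backslash and doubling any embedded single quote.
sub quote_field {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;
    $s =~ s/'/''/g;
    return "'$s'";
}

# A filled-in version of parsethisandstoreit: print one line per entry.
sub parsethisandstoreit {
    my $guts = shift;
    print join(',', map { quote_field($_) } parse_entry($guts)), "\n";
}

parsethisandstoreit("Name\nLocation\nDATE TIME\n\nhttp://\nfirst line\nsecond line");
```

Joining the message lines with a space gives the single-line format asked for; join with "\n" instead if the line breaks should survive.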


 
LVL 1

Expert Comment

by:ilia
ID: 2360440
j2: it helps, as it confirms the assumption my code depends on.

I'm not sure ventolin's solution scales to multi-line messages, but to slurp a file, use this code:

open(FILE, $file) or die $!;
$sig_file = do { local $/; <FILE> };   # with $/ undefined, <FILE> returns the whole file
close(FILE);
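The same $/ variable (Perl's input record separator) can also split the file for you: set it to the end marker instead of undef, and each read of the filehandle returns one whole entry. A sketch under these assumptions: the marker is passed in (use whichever one your file really contains), lines end in plain "\n", and read_entries is a hypothetical helper. An in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Return the body of every <STARTSIG> ... end-marker entry read from $fh.
sub read_entries {
    my ($fh, $end_marker) = @_;
    my @entries;
    local $/ = $end_marker;                      # each <$fh> read ends at the marker
    while (my $chunk = <$fh>) {
        chomp $chunk;                            # chomp strips the trailing $/ (the marker)
        next unless $chunk =~ /<STARTSIG>\n(.*)/s;   # skips leftover text after the last entry
        (my $body = $1) =~ s/\n\z//;             # drop the newline just before the marker
        push @entries, $body;
    }
    return @entries;
}

# For a real file: open(my $fh, '<', 'guest.txt') or die $!;
my $text = "<STARTSIG>\nName1\nLoc1\nDate1\nmail1\nhttp://\nline 1\nline 2\n<STOPSIG>\n"
         . "<STARTSIG>\nName2\nLoc2\nDate2\n\nhttp://\nhello\n<STOPSIG>\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my @entries = read_entries($fh, '<STOPSIG>');
close($fh);

print scalar(@entries), " entries\n";
```

This reads one entry at a time instead of holding the whole file in memory, and the start/stop markers are stripped as a side effect.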


 
LVL 3

Expert Comment

by:guadalupe
ID: 2360445
#!/usr/local/bin/perl


open(DATA, "guest.txt");

open(OUT, ">guest2.txt");

while ($line = <DATA>)
{
      if ($line =~ /<STARTSIG>/)
      {
            $name = <DATA>;
            chomp($name);
            $location = <DATA>;
            chomp($location);
            $date = <DATA>;
            chomp($date);
            $email = <DATA>;
            chomp($email);
            $url = <DATA>;
            chomp($url);

            undef($msg_line);
            undef($message);

            until ($msg_line =~ /<STOPSIG>/)
            {
                  $message .= " $msg_line";
                  $msg_line = <DATA>;
                  chomp($msg_line);
            }

            
            print OUT "'$name','$location','$date','$email','$url','$message'\n";

      }
}

Another way. The only ugly thing is the chomps, but Perl won't let you chomp the <HANDLE> read operator directly.
 
LVL 1

Expert Comment

by:ilia
ID: 2360446
oops, one $ not two


is there no edit feature here?
 
LVL 1

Expert Comment

by:ilia
ID: 2360456
guadalupe, I thought chomp was for arrays?
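For the record, chomp accepts a scalar as well as a list. A small demo of the behaviour discussed in this thread: it removes a trailing copy of $/ (a newline by default), returns the number of characters removed, and accepts an assignment, which is why chomp($x = <FH>) works. An in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

my $line    = "Name1\n";
my $removed = chomp($line);        # $line is now "Name1", $removed is 1
my $again   = chomp($line);        # nothing left to remove: $again is 0

# A list or an array works too: every element is chomped.
my @lines = ("a\n", "b\n", "c");
my $total = chomp(@lines);         # @lines is ("a", "b", "c"), $total is 2

# An assignment is a valid argument, so the read can be chomped in one step.
my $data = "first\nsecond\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
chomp(my $first = <$fh>);          # $first is "first"
close($fh);

print "$line $removed $again $total @lines $first\n";
```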
 
LVL 3

Expert Comment

by:guadalupe
ID: 2360460
One more comment: this depends on the format you described, with each element on its own line, including the multi-line messages. Each <HANDLE> read returns the next line of the file, so if everything really is separated by newlines, each successive <HANDLE> gets the next piece of data, always in order. The only special case is the message.
 
LVL 12

Author Comment

by:j2
ID: 2360464
Seems your code goes into an endless loop, guadalupe. guest2.txt is created, but empty. The program does not terminate (apart from Ctrl-C).
 
LVL 3

Accepted Solution

by:
guadalupe earned 800 total points
ID: 2360524
This is a little cleaner:

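#!/usr/local/bin/perl


open(DATA, "guest.txt") or die "Can't read guest.txt: $!";

open(OUT, ">guest2.txt") or die "Can't write guest2.txt: $!";

while ($line = <DATA>)
{
      if ($line =~ /<STARTSIG>/)
      {
            # the five single-line fields follow <STARTSIG> in a fixed order
            chomp($name = <DATA>);

            chomp($location = <DATA>);
            chomp($date = <DATA>);
            chomp($email = <DATA>);
            chomp($url = <DATA>);

            undef($msg_line);
            undef($message);

            # everything up to the end marker is the (multi-line) message;
            # also stop at end of file so a missing marker can't loop forever
            until ($msg_line =~ /<STOPSIG>/)
            {
                  $message .= " $msg_line";
                  last unless defined($msg_line = <DATA>);
                  chomp($msg_line);
            }

            
            print OUT "'$name','$location','$date','$email','$url','$message'\n";

      }
}
close(OUT);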
 
LVL 3

Expert Comment

by:guadalupe
ID: 2360536
First, chomp removes the last character of a line ONLY if it is a newline character (strictly, it removes a trailing copy of $/, which is a newline by default). It works on scalars as well as arrays.

Second, the endless loop may have to do with the format of your file. I used this prototype to test the script and it worked great:

<STARTSIG>
Name1
Location1
DATE TIME1
email1
URL1
Message1
111
<STOPSIG>
<STARTSIG>
Name2
Location2
DATE TIME2
email2
URL2
Message2
222
222
<STOPSIG>
<STARTSIG>
Name3
Location3
DATE TIME3
email3
URL3
Message3
333
333
333
<STOPSIG>

The results were:

'Name1','Location1 ','DATE TIME1 ','email1','URL1','  Message1 111'
'Name2 ','Location2','DATE TIME2','email2','URL2','  Message2 222 222'
'Name3','Location3','DATE TIME3','email3','URL3','  Message3 333 333 333'

The key here is to find what in my test file is distinct from your actual guest log record.
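One quick way to look for that difference is to count the marker lines the real file contains; if the start and end markers are not the ones the script looks for, it shows up immediately. A sketch: count_markers is a hypothetical helper that treats any line holding only an uppercase <TAG> as a marker, and an in-memory sample stands in for the real file.

```perl
use strict;
use warnings;

# Count each distinct marker line (a line holding only <SOMETHING>) read from $fh.
sub count_markers {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ if $line =~ /^\s*(<[A-Z]+>)\s*$/;
    }
    return %count;
}

# For the real file: open(my $fh, '<', 'guest.txt') or die $!;
my $sample = "<STARTSIG>\nName1\nLocation1\nDATE TIME1\nemail1\nhttp://\nMessage1\n<ENDSIG>\n"
           . "<STARTSIG>\nName2\nLocation2\nDATE TIME2\n\nhttp://\nMessage2\n<ENDSIG>\n";
open(my $fh, '<', \$sample) or die "Can't open sample: $!";
my %count = count_markers($fh);
close($fh);

printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```

If the report lists a marker the script never checks for, or the start and end counts differ, that is the place to look.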
 
LVL 12

Author Comment

by:j2
ID: 2360564
Here is an actual snippet:

<STARTSIG>
Markisen
Södra Sverige
10.1.2000 17:48:55
wet71@hotmail.com
http://
God Fortsättning alla. tyvärr är det dåligt med kinky tjejer i Bleking
Men ni som läser detta får gärna höra av er.
kanske kan vi ordna något trevligt tillsammans,
<ENDSIG>
<STARTSIG>
MFs_slavinna
Östra Sverige
13.1.2000 18:27:3

http://
Hejsan rara!

Jodå jag är här. *ler* Kul att man är saknad. Fast jag skriver inte lika mycke nu som tidigare.
Saknar dig oxå!

Kram/älskling
<ENDSIG>

(note the blank line when no email is given; I hadn't noticed that before)

 
LVL 3

Expert Comment

by:guadalupe
ID: 2360588
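AHHHHHHH OK, the problem is that in your question you wrote STOPSIG, but it is really ENDSIG. Because the script looks for <STOPSIG>, which never appears in your file, it never finds the end of an entry.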

Just change:

until ($msg_line =~ /<STOPSIG>/)


To:

until ($msg_line =~ /<ENDSIG>/)
 
LVL 12

Author Comment

by:j2
ID: 2360704
Sorry, it must be the flu talking :)

let me try that :)
 
LVL 12

Author Comment

by:j2
ID: 2360713
On the money.

Though, I just realized something.

I'm quite a skilled NetRexx coder; why on EARTH didn't I use NetRexx??? :)

Anyway, good show.

/j
 
LVL 12

Author Comment

by:j2
ID: 2360837
Question: would it be possible to pipe the data directly into the mysql client and load it that way?
 
LVL 2

Expert Comment

by:ventolin
ID: 2360880
yes
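Piping works because the mysql command-line client reads SQL statements from standard input. A minimal sketch that prints one INSERT per entry, assuming a table named guestbook with the columns shown (the table, the column names and the script name guest2sql.pl are all hypothetical), and accepting either end marker since the real file uses <ENDSIG>. Run it as: perl guest2sql.pl guest.txt | mysql -u USER -p DBNAME. Without an argument it prints an INSERT for a built-in sample entry.

```perl
use strict;
use warnings;

# Hypothetical table and columns: adjust to your schema.
my $insert_prefix =
    'INSERT INTO guestbook (name, location, posted, email, url, message) VALUES (';

# Escape a value for a MySQL single-quoted string literal.
sub sql_quote {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;     # MySQL treats backslash as an escape character by default
    $s =~ s/'/''/g;        # a doubled quote stands for one literal quote
    return "'$s'";
}

# Turn one entry body (the lines between the markers) into an INSERT statement.
sub entry_to_insert {
    my ($body) = @_;
    my @lines  = split /\n/, $body, -1;
    my @fields = splice(@lines, 0, 5);     # name, location, date, email, URL
    push @fields, '' while @fields < 5;    # pad a truncated entry
    push @fields, join(' ', @lines);       # the message
    return $insert_prefix . join(',', map { sql_quote($_) } @fields) . ");\n";
}

my $text;
if (@ARGV) {
    local $/;              # slurp mode
    $text = <>;            # the guestbook file named on the command line
} else {
    $text = "<STARTSIG>\nName\nLocation\n17.1.2000 12:00:00\n\nhttp://\nIt's a\ntest\n<ENDSIG>\n";
}

while ($text =~ m#<STARTSIG>[ \t]*\n(.*?)\n<(?:STOP|END)SIG>#gs) {
    print entry_to_insert($1);
}
```

The date is stored as plain text here; converting "10.1.2000 17:48:55" into a DATETIME is left out. If the Swedish characters come out garbled, the mysql client's --default-character-set option may need to match the file's encoding.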

Question has a verified solution.
