File 're-write'

I have a file which has repetitive entries like this:

<STARTSIG>
Name
Location
DATE TIME
email
URL
Message
<STOPSIG>

It's a guestbook which I am porting to a MySQL-based solution. The 'message' can span multiple lines; all the others are single lines.

I would like to convert this to something more useful, like a single line entry:

'name','Location','etc','etc','etc'

Note that all variables can contain pretty much any characters (including international ones). Also note that I want the start/stop sigs stripped.

Someone up for this?
(platform is unix if that makes a diff)

TIA

Jan - Decent in mySQL / PHP... awful in perl :)
j2Asked:

ventolinCommented:
# slurp your file into $sig_file and then:

loopidty_loop();

sub loopidty_loop {
        # /g steps through the entries one by one; /s lets . match newlines
        while ($sig_file =~ m#<STARTSIG>\s*(.*?)\s*<STOPSIG>#gis) {
                # pass guts to another sub
                parsethisandstoreit($1);
                }
        }


sub parsethisandstoreit {
        # here $guts has the content of each guestbook entry
        # parse this out and do with it as you please
        my $guts = shift;
        print $guts;
        }
iliaCommented:
The fact that the message can be multi-line is kinda hard, but here's my try assuming URL comes right before Message:

my @records = ();
my $record = '';
my $msg_flag = 0;
my $msg = '';

open (FILE, $file) or die $!;
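while (<FILE>) {
  chomp;
  next if /<STARTSIG>/;
  if ( /<STOPSIG>/ ) {
    # end of entry: the collected message becomes the last field
    $record .= "'$msg'\n";
    push (@records, $record);
    $msg_flag = 0;
    $record = ''; $msg = '';
    next;
  }
  if ( $msg_flag ) {
    # join the message lines with a space so the entry stays on one line
    $msg .= ($msg eq '' ? '' : ' ') . $_;
  }
  else {
    $record .= "'$_',";
    # assumes the URL line starts with http; the message follows it
    $msg_flag = 1 if /^http/;
  }
}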
close (FILE);

open (FILE, ">$file.new") or die $!;
print FILE @records;
close (FILE);
j2Author Commented:
Uhm.. slurp?

consider me a total newbie here, how do i get the above code to do what i want? or does it? :)
j2Author Commented:
ilia: the message follows directly on the line after the URL, so that is right. However, the URL field contains "http://" if no URL was given. Does that help or complicate things?
ventolinCommented:
Slurp, in this case, is slang for reading the entire contents of a file into the variable $sig_file.

This could be a challenge if you have little experience, but the code provided here should point you in the right direction.

For the code I posted:

Open your sig file.
Bring the contents into $sig_file.
Now paste in the code I posted.
This loops through the file and gets the content of each guestbook entry.
Edit parsethisandstoreit where commented and do whatever you please with this data. To change it as mentioned, you will need to parse $guts to get each field using regular expressions. If you are new to those as well, pick up the book Mastering Regular Expressions or go to http://www.perl.com/reference/query.cgi?section=regexp&x=5&y=11
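As a rough sketch of what parsing $guts might look like: this variant of parsethisandstoreit returns the single-line form instead of printing the raw entry. It assumes the field order from the question (name, location, date, email, URL, then the message) and that embedded single quotes should be escaped with a backslash. The helper name quote_field is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a value in single quotes, escaping
# backslashes and embedded single quotes with a backslash.
sub quote_field {
    my ($value) = @_;
    $value =~ s/([\\'])/\\$1/g;
    return "'$value'";
}

# Split one entry into its six fields and return the single-line form.
# The limit of 6 keeps every remaining line inside the message field.
sub parsethisandstoreit {
    my ($guts) = @_;
    my ($name, $location, $date, $email, $url, $message) = split /\n/, $guts, 6;
    $message = '' unless defined $message;
    $message =~ s/\s*\n\s*/ /g;    # fold a multi-line message onto one line
    return join ',', map { quote_field($_) } $name, $location, $date, $email, $url, $message;
}

print parsethisandstoreit("Name\nLocation\nDATE TIME\nemail\nURL\nline one\nline two"), "\n";
```

split is used here instead of a long regular expression because every field except the message sits on its own line.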


iliaCommented:
j2: it helps as it validates my code dependencies.

I'm not sure if ventolin's solution scales to multi-line messages..., but to slurp a file, use this code:

open(FILE, $file) or die $!;
$$sig_file = do { local $/; <FILE> };
close(FILE);


guadalupeCommented:
#!/usr/local/bin/perl


open(DATA, "guest.txt") or die "Cannot read guest.txt: $!";

open(OUT, ">guest2.txt") or die "Cannot write guest2.txt: $!";

while ($line = <DATA>)
{
      if ($line =~ /<STARTSIG>/)
      {
            $name = <DATA>;
            chomp($name);
            $location = <DATA>;
            chomp($location);
            $date = <DATA>;
            chomp($date);
            $email = <DATA>;
            chomp($email);
            $url = <DATA>;
            chomp($url);

            undef($msg_line);
            undef($message);

            until ($msg_line =~ /<STOPSIG>/)
            {
                  $message .= " $msg_line";
                  $msg_line = <DATA>;
                  chomp($msg_line);
            }

            
            print OUT "'$name','$location','$date','$email','$url','$message'\n";

      }
}

Another way. The only ugly thing is the chomps, but Perl won't let you chomp the "filehandle/shift" operator <HANDLE> directly.
iliaCommented:
Oops, that should be $sig_file with one $, not two.


is there no edit feature here?
iliaCommented:
guadalupe, I thought chomp was for arrays?
guadalupeCommented:
One more comment: this depends on the format you mentioned, which was a new line for each element, including the multi-line messages. This is because <HANDLE> feeds the next line in the file to the assignment, so if everything really is separated by newlines, each successive <HANDLE> gets a new piece of data and they are always in order. The only special consideration is the message.
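To illustrate the point about successive reads, here is a small sketch. The sub name read_fixed_fields is made up, and an in-memory filehandle stands in for guest.txt so the example runs on its own.

```perl
use strict;
use warnings;

# Read the five single-line fields that follow a <STARTSIG> line.
# Each <$fh> in scalar context returns the next unread line.
sub read_fixed_fields {
    my ($fh) = @_;
    my @fields;
    for (1 .. 5) {
        my $line = <$fh>;
        last unless defined $line;    # stop quietly at end of file
        chomp $line;
        push @fields, $line;
    }
    return @fields;
}

# An in-memory filehandle stands in for guest.txt here.
my $sample = "<STARTSIG>\nName1\nLocation1\nDATE TIME1\nemail1\nURL1\nMessage1\n<STOPSIG>\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $marker = <$fh>;                   # consumes the <STARTSIG> line
my @fields = read_fixed_fields($fh);
print join('|', @fields), "\n";
close $fh;
```

After the five reads, the filehandle is positioned at the first line of the message, which is why the message can be collected separately.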
j2Author Commented:
Seems your code goes into an endless loop, guadalupe. guest2.txt is created, but empty. The program does not terminate (apart from Ctrl-C).
guadalupeCommented:
This is a little cleaner:

#!/usr/local/bin/perl


open(DATA, "guest.txt") or die "Cannot read guest.txt: $!";

open(OUT, ">guest2.txt") or die "Cannot write guest2.txt: $!";

while ($line = <DATA>)
{
      if ($line =~ /<STARTSIG>/)
      {
            chomp($name = <DATA>);

            chomp($location = <DATA>);
            chomp($date = <DATA>);
            chomp($email = <DATA>);
            chomp($url = <DATA>);

            undef($msg_line);
            undef($message);

            until ($msg_line =~ /<STOPSIG>/)
            {
                  $message .= " $msg_line";
                  chomp($msg_line = <DATA>);
            }

            
            print OUT "'$name','$location','$date','$email','$url','$message'\n";

      }
}
guadalupeCommented:
First, chomp removes the last character of a line ONLY if it is a newline character.
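A quick sketch of that behaviour, which also shows that chomp works on both a single scalar and a whole array. The sample strings are made up, and an in-memory filehandle is used so it runs without a real file.

```perl
use strict;
use warnings;

my $line = "Name1\n";
chomp $line;               # removes the trailing newline
chomp $line;               # nothing left to remove, so no change
print "[$line]\n";

my @lines = ("Location1\n", "email1", "URL1\n");
chomp @lines;              # chomp also accepts a list and trims each element
print join('|', @lines), "\n";

# chomp cannot take <HANDLE> directly, but it can take an assignment,
# because the assignment yields the variable that was assigned to.
my $text = "DATE TIME1\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
chomp(my $date = <$fh>);
close $fh;
print "[$date]\n";
```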

Second, the endless loop may have to do with the format of your file. I used this prototype to test the script and it worked great:

<STARTSIG>
Name1
Location1
DATE TIME1
email1
URL1
Message1
111
<STOPSIG>
<STARTSIG>
Name2
Location2
DATE TIME2
email2
URL2
Message2
222
222
<STOPSIG>
<STARTSIG>
Name3
Location3
DATE TIME3
email3
URL3
Message3
333
333
333
<STOPSIG>

The results were:

'Name1','Location1 ','DATE TIME1 ','email1','URL1','  Message1 111'
'Name2 ','Location2','DATE TIME2','email2','URL2','  Message2 222 222'
'Name3','Location3','DATE TIME3','email3','URL3','  Message3 333 333 333'

The key here is to find what in my test file is distinct from your actual guest log record.
j2Author Commented:
here is an actual snippet

<STARTSIG>
Markisen
Södra Sverige
10.1.2000 17:48:55
wet71@hotmail.com
http://
God Fortsättning alla. tyvärr är det dåligt med kinky tjejer i Bleking
Men ni som läser detta får gärna höra av er.
kanske kan vi ordna något trevligt tillsammans,
<ENDSIG>
<STARTSIG>
MFs_slavinna
Östra Sverige
13.1.2000 18:27:3

http://
Hejsan rara!

Jodå jag är här. *ler* Kul att man är saknad. Fast jag skriver inte lika mycke nu som tidigare.
Saknar dig oxå!

Kram/älskling
<ENDSIG>

(Note the blank line if no email is given; I hadn't noticed that before.)

guadalupeCommented:
AHHHHHHH OK the problem is that in your question you put STOPSIG but it is really ENDSIG.

Just change:

until ($msg_line =~ /<STOPSIG>/)


To:

until ($msg_line =~ /<ENDSIG>/)
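
With the real end marker in place, the loop could also be made to stop at end of file, so a missing or misspelled marker no longer hangs the script. A rough sketch, not a drop-in replacement: the sub name convert_guestbook is hypothetical, blank message lines are dropped, and single quotes are escaped with a backslash, which is an assumption about what the import step expects.

```perl
use strict;
use warnings;

# Hypothetical helper: read entries from $in, write one quoted line per
# entry to $out, and return the number of entries written.
sub convert_guestbook {
    my ($in, $out) = @_;
    my $count = 0;
    while (defined(my $line = <$in>)) {
        next unless $line =~ /<STARTSIG>/;
        my @fields;
        for (1 .. 5) {                       # name, location, date, email, url
            my $field = <$in>;
            $field = '' unless defined $field;
            chomp $field;
            push @fields, $field;
        }
        my @message;
        # Stop at <ENDSIG> or at end of file, whichever comes first.
        while (defined(my $msg_line = <$in>)) {
            last if $msg_line =~ /<ENDSIG>/;
            chomp $msg_line;
            push @message, $msg_line if length $msg_line;
        }
        push @fields, join ' ', @message;
        s/([\\'])/\\$1/g for @fields;        # escape backslashes and quotes
        print {$out} join(',', map { "'$_'" } @fields), "\n";
        $count++;
    }
    return $count;
}

# Demo on an in-memory entry with a blank email line and a blank message line.
my $sample = "<STARTSIG>\nName1\nLocation1\n10.1.2000 17:48:55\n\nhttp://\nHello\n\nworld\n<ENDSIG>\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
convert_guestbook($in, \*STDOUT);
close $in;
```

Passing filehandles in, rather than opening guest.txt inside the sub, keeps the same code usable for a real file or for STDOUT.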
j2Author Commented:
Sorry, it must be the flu talking :)

let me try that :)
j2Author Commented:
On the money.

Tho, i just realized something.

I'm quite a skilled NetREXX coder, why on EARTH didn't I use NetREXX??? :)

Anyway, good show.

/j
j2Author Commented:
Question, would it be possible to pipe the data directly to the mysql client and load the data?
ventolinCommented:
yes
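
For instance, the script could print INSERT statements instead of quoted lines, and its output could be piped to the mysql client, which reads SQL from standard input. A rough sketch only: the table name guestbook and the column names are made up, the date is passed through as plain text, and the backslash escaping assumes MySQL's default string-literal rules.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one entry's six fields into an INSERT statement.
# The table and column names are placeholders, not from the thread.
sub insert_statement {
    my @fields = @_;
    s/([\\'])/\\$1/g for @fields;    # escape backslashes and single quotes
    return "INSERT INTO guestbook (name, location, posted, email, url, message) VALUES ("
         . join(', ', map { "'$_'" } @fields) . ");";
}

print insert_statement('Name1', 'Location1', '10.1.2000 17:48:55', '', 'http://', "It's a test"), "\n";
```

The output could then be piped along the lines of "perl convert.pl | mysql -u USER -p DATABASE", where convert.pl, USER and DATABASE are placeholders.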