How do I loop so that it reads the rest of the file?

Posted on 2007-10-09
Last Modified: 2010-03-05
Hi,

I am trying to parse data files and have a few questions.
The code below works for a file with one record, and I am trying to change it to run against a data file that contains multiple records separated by a line of four n's (nnnn).

1) I put a while ($record = <DATA>) {} loop around it, but it didn't read the rest of the file.
It read just one record.
How do I make it read the rest of the file?

2) $record = <DATA> reads one line at a time.
I also saw a line like this, {local $/; $record = <DATA>;}, which seems to read all the lines at once (I see no while loop to process each line, so I assume it reads the entire file).

Can you please explain this to me?
My understanding is that $/ is the newline separator, $record = <DATA> reads one line, and @records = <DATA> reads all the lines and puts them into an array.
Thanks so much.



#!/usr/bin/perl -w
#Code was helped by Perl Master, mjcoyne/Adam314
#ParseRecords.pl on Data12
use strict;

my $record;
#open (IN, "record.txt") or die;
{local $/; $record = <DATA>;}
#close IN;

$record = "_-_\n" . $record . "_-_\n";

my ($cmatch, $company) = ($record =~ /(Company\s*:\s*(.+)\n)/i);
$record =~ s/\Q$cmatch\E/_-_/;    # \Q...\E matches the captured text literally, not as a regex
print "<company>$company</company>\n";

my ($ematch, $email) = ($record =~ /(Email\s*:\s*(.+)\n)/i);
$record =~ s/\Q$ematch\E/_-_/;
print "<email>$email</email>\n";

print "<phones>\n";

while ($record =~ /((Tel|Fax|Cell):\s*([\d-]+))/ig) {
    print "    <phone type=$2>$3</phone>\n";
    $record =~ s/\Q$1\E/_-_/;
}

print "</phones>\n";

my($info) = ($record =~ /Information\s*:\s*(.+?)\n_-_/si);
print "<info>$info</info>\n";

my ($address) = ($record =~ /Address\s*:\s*(.+?)\n_-_/si);
print "<address>$address</address>\n";


# New data file:
__DATA__
Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com
Tel: 123-333-3333
Fax: 343434-33-333
Information: ccccc:  ccccccccc:ccccccccccc:ccccccccccccccccccccccccc
ccccc:ccccccccccc                      ******Yes, this is a continuation from above
Tel: 3443-34344-3443
Cell:343434.3444
HomeNumber: 123-456-7891
Address: 454 street
NY city, NY 34344

nnnn

Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com


nnnn
Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com

_ END _
Question by:dkim18

Expert Comment

by:Adam314
ID: 20044034
$/ is the input record separator.  It is "\n" by default.  It is the string that indicates the end of a record.

my $record=<DATA>;
This will read the next record from the DATA handle.  It will read and return everything up to and including the input record separator.

my @records=<DATA>;
This will read until the end of file for the DATA handle.  It will split the data using $/, and return an array of records.

#this will read one record at a time.  It will continue in the while loop until the end of the file
while ($record = <DATA>) {}
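A minimal sketch of the reads described above, assuming an in-memory filehandle (open on a reference to a string) in place of DATA and a hypothetical sample text. It shows a scalar-context read returning one line, a list-context read returning the rest, an undefined $/ slurping everything, and a custom $/ returning one "nnnn"-terminated record per read:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the DATA section
my $text = "Company: A\nTel: 1\nnnnn\nCompany: B\nTel: 2\n";

# Scalar context: one record per read; with the default $/ ("\n") that is one line
open my $fh, '<', \$text or die "open: $!";
my $first = <$fh>;
# List context: every remaining record, split on $/
my @rest = <$fh>;
close $fh;

# Undefined $/ (slurp mode): the whole input is a single record
open $fh, '<', \$text or die "open: $!";
my $all = do { local $/; <$fh> };
close $fh;

# Custom $/: each read returns text up to and including "nnnn\n"
my @records;
open $fh, '<', \$text or die "open: $!";
{
    local $/ = "nnnn\n";
    while (my $rec = <$fh>) {
        chomp $rec;    # chomp removes the current $/ string from the end
        push @records, $rec;
    }
}
close $fh;

print "first: $first";
print "remaining lines: ", scalar(@rest), "\n";
print "records: ", scalar(@records), "\n";
```

Note that chomp strips whatever string $/ currently holds, not only "\n", which makes it a convenient way to drop the separator from each record.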


Author Comment

by:dkim18
ID: 20045622
my $record=<DATA>;
doesn't require a while loop to read each record, then?
Like the code above: there is no while loop, but it reads each line up to the first record.
(It reads each line without a while loop until the next record.)
Then how do I read the next records?
I did put a while loop in the code above, but it only read one record...

Author Comment

by:dkim18
ID: 20045775
I believe all my data sets have a company name, but they may or may not have tel, fax, email, address, website, or information fields.
Also, a record might contain the same field more than once...

How do I parse this file continuously?

Expert Comment

by:Adam314
ID: 20048490
This:
    my $record=<DATA>;
will read one record from DATA.  If the input record separator is undef, then one record will be the entire file.


Here is some code that will read each record and store it in a hash of arrays (the hash key is the field name; the array holds all of the values for that field):
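use strict;
use warnings;
use Data::Dumper;

$/="nnnn\n";
my @fields=qw(Company Tel Email Fax Information Tel Cell HomeNumber Address);


my $pattern = join("|", @fields);
while(my $record=<DATA>) {
      chomp $record;    # remove the trailing "nnnn\n" ($/) so it is not appended to the last field
      my @lines=split(/\n/, $record);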
      
      my %recdata;
      my $lastfield;
      foreach (@lines) {
            if(/^($pattern)\s*:\s*(.*)$/) {
                  push @{$recdata{$1}}, $2;
                  $lastfield=$1;
            }
            elsif($lastfield) {
                  $recdata{$lastfield}->[-1] .= "\n$_";
            }
      }
      print Dumper(\%recdata);
}

Author Comment

by:dkim18
ID: 20049733
Thanks Adam314,

1) Hashes are a little confusing to me right now.
push @{$recdata{$1}}, $2   <== You are adding a key and value to the hash array, right?

print Dumper(\%recdata) <== I guess this prints out one record's hash values (it should print all records).
How do I print separate field values?
print "Company : recdata[Company]";  ??
print "Tel: recdata[Tel]"; ??

2) This only printed one record, and the output looks like my data file.
I don't know why it is not printing all the records:

Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com

nnnn

Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com

nnnn

...more..


Author Comment

by:dkim18
ID: 20049798
print Dumper(\%recdata{$fields[0]})
This printed the company value, but it also printed some other info like this:
$VAR1 = \[
               'Company Name'
              ];

I also get this message:
Modification of non-creatable array value attempted, subscript -1 at DumperExample.pl line 24, <DATA> chunk 2.

What does this mean?
Thanks much.
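That message is Perl's fatal error for modifying element -1 of an empty array: there is no last element, and Perl cannot create one at a negative index. A minimal sketch reproducing it with a hypothetical empty record and field name, plus a guarded append that only touches [-1] when the field already holds a value:

```perl
use strict;
use warnings;

my %recdata;                  # hypothetical: nothing stored yet
my $lastfield = 'Address';    # hypothetical field name with no pushed value

# Unguarded append: $recdata{Address} springs into existence as [],
# and element -1 of an empty array cannot be created, so this dies.
my $ok    = eval { $recdata{$lastfield}->[-1] .= "\nmore text"; 1 };
my $error = $@;
print $ok ? "no error\n" : "error: $error";

# Guarded append: only modify [-1] when the array already holds a value.
my %safe;
if ($safe{$lastfield} && @{ $safe{$lastfield} }) {
    $safe{$lastfield}->[-1] .= "\nmore text";
}
print exists $safe{$lastfield} ? "created\n" : "left untouched\n";
```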

Accepted Solution

by:Adam314
ID: 20049873
This line:
    push @{$recdata{$1}}, $2
The $1 is the field name (e.g. Company).  The $2 is the value (e.g. Yahoo Inc.).
This line will create a hash element in %recdata with key $1 if it does not already exist.  That element is a reference to an array, and $2 is added to the end of that array.
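A small sketch of that behavior (Perl calls it autovivification), assuming a simplified hypothetical field pattern `^(\w+)` instead of the `$pattern` alternation: the first push for a key creates the array reference, and later pushes for the same key append to it.

```perl
use strict;
use warnings;

my %recdata;   # starts empty: no keys, no arrays

# Each push names a key that may not exist yet; Perl autovivifies
# $recdata{$field} as a new array reference the first time it is used.
for my $line ('Company: Yahoo Inc.', 'Tel: 111', 'Tel: 222') {
    if ($line =~ /^(\w+)\s*:\s*(.*)$/) {
        push @{ $recdata{$1} }, $2;
    }
}

print "keys: ", join(',', sort keys %recdata), "\n";
print "Tel values: ", join(', ', @{ $recdata{Tel} }), "\n";
```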

Are you on Windows? If so, then your data file might have different line endings.  I made a small change to the script that should help.  If "nnnn" appears anywhere in the file other than as a record separator, it will not be read correctly.  Let me know if that is the case.
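To illustrate the line-ending point, a minimal sketch assuming hypothetical Windows-style (CRLF) data read on a system that does no CRLF translation: a separator of "nnnn\n" never matches, because each line really ends in "\r\n", so the whole input comes back as one record. Normalizing the line endings first (one option among several) lets the separator line up again.

```perl
use strict;
use warnings;

# Hypothetical CRLF data, as a file saved on Windows might look when read byte-for-byte
my $text = "Company: A\r\nTel: 1\r\nnnnn\r\n\r\nCompany: B\r\nTel: 2\r\n";

# With $/ = "nnnn\n" the separator never matches "nnnn\r\n": one big record
my @records;
{
    open my $fh, '<', \$text or die "open: $!";
    local $/ = "nnnn\n";
    @records = <$fh>;
    close $fh;
}

# Normalizing CRLF (or a lone CR) to "\n" first makes the separator match again
(my $clean = $text) =~ s/\r\n?/\n/g;
my @fixed = grep { /\S/ } split /^nnnn\n/m, $clean;

print "records with raw CRLF: ", scalar(@records), "\n";
print "records after cleanup: ", scalar(@fixed), "\n";
```

Setting $/ to just "nnnn" and then stripping the separator from each record, as in the version below, also sidesteps the question of which line ending follows "nnnn".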

To print one field (e.g. Company):
    print "Company: " . join(", ", @{$recdata{Company}}) . "\n";
The join is necessary because there could be more than one Company listed.  If so, they will be separated by commas.

Here is another version of the code, with comments added and changed to print the data in a different format:
use strict;
use warnings;
use Data::Dumper;

$/="nnnn";
my @fields=qw(Company Tel Email Fax Information Tel Cell HomeNumber Address);


my $pattern = join("|", @fields);
while(my $record=<DATA>) {
      #Remove "nnnn" portion
      $record =~ s/nnnn\s*$//;
      
      #Split record into lines
      my @lines=split(/\n/, $record);
      
      #Hash to store data
      my %recdata;
      
      #Variable to store field name
      my $lastfield;
      
      #Loop through each line
      foreach (@lines) {
            #If this line is a new field
            if(/^($pattern)\s*:\s*(.*)$/) {
                  #Save the data to this field
                  push @{$recdata{$1}}, $2;
                  $lastfield=$1;
            }
            
            #If this line is not a new field
            elsif($lastfield) {
                  #Save the data to the previous line's field
                  $recdata{$lastfield}->[-1] .= "\n$_";
            }
      }
      
      #Print this record
      foreach my $key (sort keys %recdata) {
            print "$key:" . join(",", @{$recdata{$key}}) . "\n";
      }
      print "\n";
}

__DATA__
Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com
Tel: 123-333-3333
Fax: 343434-33-333
Information: ccccc:  ccccccccc:ccccccccccc:ccccccccccccccccccccccccc
ccccc:ccccccccccc
Tel: 3443-34344-3443
Cell:343434.3444
HomeNumber: 123-456-7891
Address: 454 street
NY city, NY 34344

nnnn

Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com


nnnn
Company: Yahoo Inc.
Tel: 2323-2323-2332
Tel: 3422-333-3333
Email: yahoo@yahoo.com


Author Comment

by:dkim18
ID: 20049879
I used this instead and it worked:
$/="nnnn\n\n";

How can I access those values individually (only the values)?

One more thing: I see the same field's values are grouped together like this:

Tel => [ tel #1, tel #2, Tel #3]

How do I access an individual number?
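Because each field maps to an array reference, an individual value is reached by indexing through that reference. A minimal sketch, assuming a hypothetical %recdata filled in the same shape the parser above builds:

```perl
use strict;
use warnings;

# Hypothetical record in the parser's shape: field name => array ref of values
my %recdata = (
    Company => ['Yahoo Inc.'],
    Tel     => ['2323-2323-2332', '3422-333-3333', '123-333-3333'],
);

my $first_tel = $recdata{Tel}[0];           # first Tel value
my $last_tel  = $recdata{Tel}[-1];          # last Tel value
my $tel_count = scalar @{ $recdata{Tel} };  # how many Tel values

# Walk every Tel value by index
for my $i (0 .. $#{ $recdata{Tel} }) {
    print "Tel #", $i + 1, ": $recdata{Tel}[$i]\n";
}
```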


Author Comment

by:dkim18
ID: 20050109
Your second script worked. Thanks much.

Is there a way to separate out the multiple values?
Is there a built-in function for it?
If not, I will just write something to split them.
Thanks much.


Tel: 333, 444, 55555 ==>
Tel: 333
Tel: 444
Tel: 55555
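The values are already separate elements of each array, so no split is needed: an inner loop can emit one line per value instead of joining them. A minimal sketch, assuming a hypothetical %recdata in the parser's shape:

```perl
use strict;
use warnings;

# Hypothetical record in the parser's shape: field name => array ref of values
my %recdata = (
    Company => ['Yahoo Inc.'],
    Tel     => ['333', '444', '55555'],
);

# One output line per value, fields in sorted order
my @out;
for my $key (sort keys %recdata) {
    for my $value (@{ $recdata{$key} }) {
        push @out, "$key: $value";
    }
}
print "$_\n" for @out;
```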

Author Comment

by:dkim18
ID: 20051081
Figured it out.
Thanks so much.

How does this Dumper module work, by the way?
I read its documentation page but didn't quite understand it.

Can this be used for any kind of
field: value data where a value may or may not run on to multiple lines?

Again, thanks so much for your help.

Expert Comment

by:Adam314
ID: 20051653
The Data::Dumper module is useful for displaying complex data, such as the hash of arrays used in this case.
You pass it whatever you would like it to display.  The output format is such that you could eval the output to recreate the same data structure.
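A minimal sketch of that round trip, assuming a small hypothetical %recdata: Sortkeys keeps the key order stable, the dumped text is Perl code that assigns to $VAR1, and evaluating it rebuilds an equivalent but separate structure.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical record in the parser's shape
my %recdata = (
    Company => ['Yahoo Inc.'],
    Tel     => ['2323-2323-2332', '3422-333-3333'],
);

$Data::Dumper::Sortkeys = 1;        # stable key order in the output
my $text = Dumper(\%recdata);       # Perl source of the form "$VAR1 = { ... };"
print $text;

# The dumped code assigns to $VAR1; declaring it here keeps "use strict" happy
my $VAR1;
eval $text;
die "eval failed: $@" if $@;
my $copy = $VAR1;

print "Second Tel: $copy->{Tel}[1]\n";
```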