Solved

How to read complex data into an array?

Posted on 2004-10-29
Last Modified: 2010-03-05
Hi experts,
I have a probably weird data structure, but I need to read it into an array (from file) and put it back out as a file.
It looks like this:
###################################################################
FHGHMSSK      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      21      21      8      0      0      -1      0      -1      2 '\n'
321.4      3494'\n'
345.2      963'\n'
...
988.6      1551'\n'
1117.5      1303'\n'
1117.5      1303'\n'
|'\n'
WPGTGAWR      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      20      20      8      0      0      -1      0      -1      2'\n'
321.4      3494'\n'
345.2      963'\n'
384.6      1942'\n'
...
#'\n' = newline (to avoid misconceptions)
##############################################################################
I need one array with all the data in it (the data on the line with the characters at the beginning), and within that
array another array with the x/y data.

The x/y data is complete when '|' is encountered.
On the next line the next dataset starts.
One input set is everything between one '|' and the next '|'.
Each input set may occupy only one element of the array.
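
The layout described above can be pictured as an array of array references. This is only a minimal sketch of one possible shape, assuming each element is [ header line, [ [x, y], ... ] ]; the header strings are shortened, made-up stand-ins for the real lines.

```perl
use strict;
use warnings;

# Each element of @data is one dataset: [ header_line, [ [x, y], ... ] ].
# Header strings are shortened, hypothetical stand-ins for the real lines.
my @data = (
    [ "FHGHMSSK sample.dta 928.836", [ [ 321.4, 3494 ], [ 345.2, 963 ] ] ],
    [ "WPGTGAWR sample.dta 928.836", [ [ 321.4, 3494 ] ] ],
);

for my $set (@data) {
    my ($header, $points) = @$set;
    printf "%s has %d x/y pairs\n", $header, scalar @$points;
}

# Second y value of the first dataset.
my $y = $data[0][1][1][1];
print "y = $y\n";
```

With this shape one dataset is exactly one array element, so a range of datasets can be handed on as a plain array slice.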

The output in some other subroutine has to mirror part
of the input file back to a new file in exactly the same scheme
as before.

In C++ I have a class that serializes itself.
Too bad, I am only starting to learn Perl.

Thanks,
Jens

Question by:allmer
16 Comments
 
LVL 7

Expert Comment

by:rugdog
ID: 12448999
#!/usr/bin/perl
use strict;
my $in_file="input_file_name";

my @d=ReadData($in_file);
PrintData(@d);

sub ReadData{
   my ($in_file)=@_;
   open(F,"$in_file") or die "failed to open $in_file\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         $l=~m/(.+?)\s+(.+)/;
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   for(@_){
      print $_->[0]."\n";
      for my $p (@{$_->[1]}){
         print $p->[0]." ".$p->[1]."\n";
      }
      print "|\n";
   }
}
 
LVL 5

Author Comment

by:allmer
ID: 12449471
Thanks a lot, rugdog!
I will try tomorrow morning.
Jens
 
LVL 5

Expert Comment

by:ITcrow
ID: 12450774
#! /usr/local/bin/perl -w

$/ = '|\'\n\'';                   # Your actual separator here.

while(<>) {                     # While there are records in the file
  push(@records, "$_");    # Add them in an array.
}

#- Providing a sample of how printed records will look like:
foreach( @records ) {
  print "RecordBegin---------------------------------------\n";    # Optional Begin Record Separator
  print "$_";                                                                      # Actual Record.
  print "RecordEND-----------------------------------------\n";    # Optional End Record Separator
}
 
LVL 5

Expert Comment

by:ITcrow
ID: 12450778
Usage:  <script from my previous comment> <record_file>

Eg.
> myscript records.txt
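
For reference, here is a minimal sketch of the same record-separator idea, assuming the real separator is a '|' followed by an actual newline (not the literal '\n' text from the question). The sample text is made up and is read through an in-memory filehandle so the snippet runs without records.txt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of records.txt.
my $text = "HEADER1\n1.0 10\n2.0 20\n|\nHEADER2\n3.0 30\n|\n";

my @records;
{
    # Treat a pipe followed by a newline as the end of a record.
    # local restores the old separator when this block ends.
    local $/ = "|\n";
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    while (my $record = <$fh>) {
        chomp $record;               # chomp removes the current value of $/
        push @records, $record;
    }
    close $fh;
}

print scalar(@records), " records\n";
```

Each element of @records then holds one whole dataset (header plus x/y lines) as a single string.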
 
LVL 5

Author Comment

by:allmer
ID: 12453130
@rugdog:
Almost there ;)
It mirrors the array quite well; two minor changes are needed, however.

At the end of the x/y data, the '|' separator should appear on a line of its own;
then the next dataset should be printed.
Like:
Line with data\n
lines with xy data\n
..\n
| #separator
Line with data\n
lines with xy data\n
..\n
| #separator
....

How can the print sub be adjusted so that only part of the array will be printed ($startPosition $endPosition) ?

@ITcrow:
Looks easy, but I think it's not quite what I need.
Problem is:
That the array is distributed by a sub that I cannot change.
Therefore all the data in between two '|' has to be on one line in the array.
Meaning:
Array(..first line of data ... array(xy.data))
Something like the above.
 
LVL 5

Author Comment

by:allmer
ID: 12453153
I changed the PrintData sub to:

sub PrintData{
   for(@_){
      print $_->[0]."\n";
      for my $p (@{$_->[1]}){
         print $p->[0]." ".$p->[1]."|\n";
      }
   }
}

Right now the output is almost right, but
there are 2 spaces before '|'.
Probably doesn't matter but if it can easily be changed, that would be great.
Anyway, how do I change the sub to print only part of the array?
Thanks,
Jens
 
LVL 5

Author Comment

by:allmer
ID: 12453185
One more thing:
the data should be printed to file.
Maybe something like:
sub InitSubTask($start, $end ...) {
    open(F,">$serverSubTaskDir/queries.input");
    for($i = $start; $i < $end; $i++)
       print F @_[$i][0]." ".@_[$i][$1]."|\n";
    close F;
}
My perl ignorance is probably perfectly visible here,
but I guess you know what I am getting at.
Thanks,
Jens
 
LVL 5

Expert Comment

by:ITcrow
ID: 12453845

To give you an idea of how to strip leading spaces from e.g. $data:

$data =~ s/^\ +//;

Or, to strip any leading whitespace (spaces, tabs, newlines):

$data =~ s/^\s+//;

BTW, as to my solution:

Each array element holds one complete record; it has '\n' characters in between, but it is still a single record:

while(<>) {                     # While there are records in the file
  push(@records, "$_");    # Add them in an array.
}

The parts of a record are always accessible with:

$sep = "\n";     # Double quotes give a real newline; correct it to whatever is right for your data.
@lines_of_a_record = split( /$sep/, $record );

....  make changes in record and reconstruct record .....

$record = join( $sep, @lines_of_a_record );
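
A minimal, runnable sketch of that split / modify / join round trip, assuming records are newline-separated lines as above. The record text and the "drop the last pair" change are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record as it would sit in @records: one string, many lines.
my $record = "HEADER1\n1.0 10\n2.0 20\n";

my $sep = "\n";                               # double quotes: a real newline
my @lines = split /\Q$sep\E/, $record;        # header first, then x/y lines

pop @lines;                                   # example change: drop last pair

my $rebuilt = join($sep, @lines) . $sep;      # restore the trailing newline
print $rebuilt;
```

Note that split drops the empty field after the final newline, which is why the sketch appends $sep again after the join.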

 
LVL 7

Expert Comment

by:rugdog
ID: 12457477
To fix the spaces at the end of the x,y pair, send the output to a file, and print only part of the array:
 
#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file= "out_file" ;
my $start_pos=2;
my $end_pos=4;

my @d=ReadData($in_file);
PrintData($out_file,$start_pos,$end_pos,@d);

sub ReadData{
   my ($in_file)=@_;
   open(F,"$in_file") or die "failed to open $in_file\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         $l=~m/(.+?)\s+(.+?)\s*$/;
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   my ($fname,$start_pos,$end_pos,@arr)=@_;
   open(F,">  $fname") or die "failed to open file $fname: $!\n";
   for(my $i=$start_pos;$i<=$end_pos;$i++){
      print F $arr[$i]->[0]."\n";
      for my $p (@{$arr[$i]->[1]}){
         print F $p->[0]." ".$p->[1]."|\n";
      }
   }
   close(F);
}
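
As a variation, here is a hedged sketch of the range output, assuming the '|' separator belongs on its own line once per dataset (as in the sample in the question) rather than after every x/y pair. write_range is an illustrative name, and the data is made up; it writes to an in-memory filehandle so it runs without touching disk.

```perl
use strict;
use warnings;

# write_range is an illustrative name. It writes datasets $start..$end
# (inclusive, like the loop above) and clamps $end to the last index.
sub write_range {
    my ($fh, $start, $end, @arr) = @_;
    $end = $#arr if $end > $#arr;
    for my $set (@arr[ $start .. $end ]) {
        print {$fh} $set->[0], "\n";
        print {$fh} "$_->[0] $_->[1]\n" for @{ $set->[1] || [] };
        print {$fh} "|\n";           # one separator line per dataset
    }
    return;
}

# Hypothetical data in the [ header, [ [x, y], ... ] ] layout.
my @d = (
    [ "H0", [ [ 1, 10 ] ] ],
    [ "H1", [ [ 2, 20 ], [ 3, 30 ] ] ],
    [ "H2", [ [ 4, 40 ] ] ],
);

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory file: $!";
write_range($fh, 1, 5, @d);
close $fh;
print $out;
```

Clamping $end avoids dereferencing undefined elements when the requested range runs past the end of the array.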


 
LVL 5

Author Comment

by:allmer
ID: 12464494
Good morning,
@rugdog,
doesn't work quite yet. The complete file is in data[0].
It should be in the array dataset by dataset.
@ITcrow:
I did:

my $in_file = "$inputPath/$queriesFileName";
my $out_file = "$inputPath/res.txt";
my @records;
my $firstLine;
ReadData($in_file);
WriteInput($out_file,2,5);
sub ReadData {
  my($in_file) = @_;
  $/ = '|';
  my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
  $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
  print "$firstLine, $num, \n";
  open(F,"$in_file");
  while(<F>) {
    push(@records, "$_");    # Add them in an array.
  }
  close(F);
}
sub WriteInput {
  my($filePath,$start,$end) = @_;
  open(F,">  $filePath") or die "Failed to open $filePath at $start\n";
  print $firstLine;
  my $diff = $end-$start;
  print F "$diff\n";
  for(my $i=$start; $i<$end; $i++) {
    print F @records[$i]."\n";
  }
  close(F);
}
Only thing that I need right now is the ability to take the first
two lines from the file and then process the rest of the data
while storing the two lines in any variables.
Thanks,
Jens
 
LVL 5

Expert Comment

by:ITcrow
ID: 12469568
 my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
  $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
  print "$firstLine, $num, \n";
  open(F,"$in_file");
while(<F>) {

should be:
=====================================================================================
  open(F,"$in_file");
  my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
  $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
  print "$firstLine, $num, \n";
  while(<F>) {

 
LVL 5

Accepted Solution

by:
ITcrow earned 1000 total points
ID: 12469634

I forgot to tell you to move the separator assignment to just above the while loop,
so that \n remains the separator for the first two lines and you switch only afterwards.

 $/ = '|';

Here it is:
==================================================================================
sub ReadData {
  my($in_file) = @_;

  open(F,"$in_file") || die "Failed to open $in_file: $!\n";

  my $num = <F>;                    #Read line one.
  $firstLine = <F>;                 #Read line two into the file-level $firstLine that WriteInput prints.

  print "$firstLine, $num, \n";

  # Change the separator and process remaining lines.
  # local restores the previous separator when the sub returns.
  local $/ = '|';
  while(<F>) {
    push(@records, "$_");    # Add them in an array.
  }
  close(F);
}
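
The same idea as a self-contained sketch, assuming a two-line preamble followed by '|'-separated records. read_with_preamble is an illustrative name, the sample text is made up, and the sub returns its results instead of filling file-level variables.

```perl
use strict;
use warnings;

# read_with_preamble is an illustrative name; the two-line preamble and the
# '|' separator follow the thread, the sample text is made up.
sub read_with_preamble {
    my ($fh) = @_;
    my $num        = <$fh>;          # line one, read with the default "\n"
    my $first_line = <$fh>;          # line two
    chomp($num, $first_line);

    my @records;
    local $/ = '|';                  # restored automatically on return
    while (my $record = <$fh>) {
        chomp $record;               # strips the trailing '|'
        $record =~ s/\A\n//;         # newline left over after the previous '|'
        push @records, $record if length $record;
    }
    return ($num, $first_line, \@records);
}

my $text = "2\nfirst line\nHEADER1\n1.0 10\n|\nHEADER2\n2.0 20\n|\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my ($num, $first_line, $records) = read_with_preamble($fh);
close $fh;

print "num=$num first=$first_line records=", scalar(@$records), "\n";
```

Because the separator is only '|', every record after the first starts with the newline that followed the previous '|'; the sketch strips it and skips the empty remainder at end of file.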
 
LVL 7

Expert Comment

by:rugdog
ID: 12473076
allmer,
   I've added some lines to print each header element in the array with its index number. With the sample data you provided it appears to behave OK; can you test it and send what this script prints with the input file you are using?

#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file= "out_file" ;
my $start_pos=0;
my $end_pos=1;

my @d=ReadData($in_file);

for(my $i=0;$i<=$#d;$i++){
   print "$i ".$d[$i]->[0]."\n";
}

PrintData($out_file,$start_pos,$end_pos,@d);

sub ReadData{
   my ($in_file)=@_;
   open(F,"$in_file") or die "failed to open $in_file\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         $l=~m/(.+?)\s+(.+?)\s*$/;
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   my ($fname,$start_pos,$end_pos,@arr)=@_;
   open(F,">  $fname") or die "failed to open file $fname: $!\n";
   for(my $i=$start_pos;$i<=$end_pos;$i++){
      print F $arr[$i]->[0]."\n";
      for my $p (@{$arr[$i]->[1]}){
         print F $p->[0]." ".$p->[1]."|\n";
      }
   }
   close(F);
}
 
LVL 5

Author Comment

by:allmer
ID: 12478089
Sorry,
I cannot test today, but I put a file on a server:
http://hippler.bio.upenn.edu/2c06.qgp
Anyway,
when viewing the input file with emacs I keep seeing:
^@
What is that and would it pose a problem?

Before I write anything to disk the terminal '|' should be removed:
@queries = @records = @d.
sub someSub{
  my ($self, $start, $end, $node, $inputDir, $serverSubTaskDir, $nodeSubTaskDir) = @_;
  open(F,">$serverSubTaskDir/queries.input");
  my $diff = $end-$start;
  print F "$diff"."\n";
  print F "$self->{firstline}";
  my @tmp = @{$self->{queries}}[$start..$end];
        #Here I would like to delete the last line of the sub array in the last element of the tmp array
        #Some magic code:
        #my $discard = pop @{tmp[$#tmp]->[1]};
        #why does the above not work as I thought it would?
  print F @tmp;
  close F;
  $node->runCmd("cp $serverSubTaskDir/queries.input $nodeSubTaskDir/queries.input");
}
Any ideas?
Thanks,
Jens
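
On the commented-out pop line above: a minimal sketch of one likely cause, assuming the queries use the [ header, [ [x, y], ... ] ] layout from rugdog's ReadData (that layout is an assumption here, and the data is made up). Written as tmp[...] the element has no sigil; it needs to be $tmp[...].

```perl
use strict;
use warnings;

# Hypothetical data in the [ header, [ [x, y], ... ] ] layout.
my @tmp = (
    [ "H0", [ [ 1, 10 ], [ 2, 20 ] ] ],
    [ "H1", [ [ 3, 30 ], [ 4, 40 ] ] ],
);

# tmp[...] without a sigil is not an array element; $tmp[-1] is the last one.
my $discard = pop @{ $tmp[-1][1] };

print "removed: @$discard\n";
print "left in last dataset: ", scalar @{ $tmp[-1][1] }, "\n";
```

Two caveats: an array slice copies only the references, so the pop also changes the original data, and printing an array of references directly prints ARRAY(0x...) strings rather than the contents.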
 
LVL 7

Assisted Solution

by:rugdog
rugdog earned 1000 total points
ID: 12486148
Jens,
  yes, the ^@ will cause a problem since I wrote the script thinking the "|" was the only thing in the line, but try this modification:

#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file= "out_file" ;
my $start_pos=0;
my $end_pos=1;

my @d=ReadData($in_file);

for(my $i=0;$i<=$#d;$i++){
  print "$i ".$d[$i]->[0]."\n";
}

PrintData($out_file,$start_pos,$end_pos,@d);

sub ReadData{
  my ($in_file)=@_;
  open(F,"$in_file") or die "failed to open $in_file\n";
  my @data;
  my $reading_what="header";
  my $l;
  my ($x,$y);
  while($l=<F>){
     chomp $l;
     #print "-$l-\n";
     if($l =~ /^\|/){
        $reading_what="header";
        next;
     }
     if($reading_what eq "header"){
        push(@data,[$l]);
        $reading_what="xy";
     } else {
        $l=~m/(.+?)\s+(.+?)\s*$/;
        ($x,$y)=($1,$2);
        push(@{$data[$#data]->[1]},[$x,$y]);
     }
  }
  close(F);
  return @data;
}

sub PrintData{
  my ($fname,$start_pos,$end_pos,@arr)=@_;
  open(F,">  $fname") or die "failed to open file $fname: $!\n";
  for(my $i=$start_pos;$i<=$end_pos;$i++){
     print F $arr[$i]->[0]."\n";
     for my $p (@{$arr[$i]->[1]}){
        print F $p->[0]." ".$p->[1]."|\n";
     }
  }
  close(F);
}
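
If the stray characters turn out to be NUL bytes (which is what emacs shows as ^@) or Windows line endings, a cleanup step before the comparisons might help. This is only a sketch under that assumption; clean_line is an illustrative helper and the sample lines are made up. If the file is actually in a multi-byte encoding such as UTF-16, decoding it properly would be the better fix.

```perl
use strict;
use warnings;

# clean_line is an illustrative helper: it drops NUL bytes (shown as ^@ in
# emacs) and a trailing CR/LF so that Windows line endings compare cleanly.
sub clean_line {
    my ($line) = @_;
    $line =~ tr/\0//d;               # delete every NUL byte
    $line =~ s/\r?\n\z//;            # strip LF or CRLF at the end
    return $line;
}

my @raw = ("HEADER1\r\n", "1.0 10\r\n", "|\0\r\n");
my @clean = map { clean_line($_) } @raw;

print join(",", @clean), "\n";
```

In ReadData this would replace the plain chomp, so that a separator line compares equal to "|" again.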
 
LVL 5

Author Comment

by:allmer
ID: 12497421
Thanks a lot, you two.
The problem probably resides somewhere else.
My c++ class actually allows for some errors,
but it seems like the file transfer from Windows
to Unix screws up the files.
Anyway,
both solutions worked as far as I asked you.

I will switch to <XML> tags now, to make it
clearer.

I will split the points among you.
Thanks again,
Jens