allmer (Türkiye) asked:

How to read complex data into an array?

Hi experts,
I have a somewhat unusual data structure that I need to read from a file into an array and later write back out to a file.
It looks like this:
###################################################################
FHGHMSSK      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      21      21      8      0      0      -1      0      -1      2 '\n'
321.4      3494'\n'
345.2      963'\n'
...
988.6      1551'\n'
1117.5      1303'\n'
1117.5      1303'\n'
|'\n'
WPGTGAWR      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      20      20      8      0      0      -1      0      -1      2'\n'
321.4      3494'\n'
345.2      963'\n'
384.6      1942'\n'
...
#'\n' = newline (to avoid misconceptions)
##############################################################################
I need one array that holds all the data: each element contains the header line (the line that starts with the letter sequence) and, nested inside it, another array with the x/y data.

The x/y data of a set is complete when a '|' is encountered;
the next dataset starts on the following line.
One input set is everything between one '|' and the next,
and it must occupy exactly one element of the array.

Another subroutine then has to write part of the input
back to a new file in exactly the same format
as before.

In C++ I have a class that serializes itself;
unfortunately, I am only starting to learn Perl.

Thanks,
Jens
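To make the layout described above concrete, here is a minimal Perl sketch of one possible in-memory shape. It assumes each element is an array reference holding the header line plus a nested array of [x, y] pairs; the header lines are shortened and the variable names are illustrative only:

```perl
use strict;
use warnings;

# Each element of @datasets is one input set (everything between two '|').
# Assumed element layout: [ $header_line, [ [x, y], [x, y], ... ] ]
my @datasets = (
    [ 'FHGHMSSK      present_cw15_gelA_2C06.1194.1194.2.dta      928.836',
      [ [ 321.4, 3494 ], [ 345.2, 963 ], [ 988.6, 1551 ] ] ],
    [ 'WPGTGAWR      present_cw15_gelA_2C06.1194.1194.2.dta      928.836',
      [ [ 321.4, 3494 ], [ 345.2, 963 ], [ 384.6, 1942 ] ] ],
);

# Accessing the parts of the second dataset:
my $header = $datasets[1][0];        # the header line
my $pairs  = $datasets[1][1];        # reference to the array of [x, y] pairs
my ($x, $y) = @{ $pairs->[2] };      # the third [x, y] pair

printf "%d datasets; dataset 1 has %d pairs; third pair is %s %s\n",
    scalar @datasets, scalar @$pairs, $x, $y;
```

Because every dataset is a single array reference, one element of @datasets always corresponds to one '|'-delimited input set.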

rugdog (Mexico)

#!/usr/bin/perl
use strict;
my $in_file="input_file_name";

my @d=ReadData($in_file);
PrintData(@d);

sub ReadData{
   my ($in_file)=@_;
   open(F, '<', $in_file) or die "failed to open $in_file: $!\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         next unless $l=~m/(.+?)\s+(.+)/;   # skip lines that are not x/y pairs
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   for(@_){
      print $_->[0]."\n";
      for my $p (@{$_->[1]}){
         print $p->[0]." ".$p->[1]."\n";
      }
      print "|\n";
   }
}
allmer (ASKER)

Thanks a lot, rugdog!
I will try tomorrow morning.
Jens
ITcrow

#! /usr/local/bin/perl -w
use strict;

$/ = "|\n";                     # Record separator: a '|' line ending in a real newline.

my @records;
while(<>) {                     # While there are records in the file,
  push(@records, $_);           # add them to an array.
}

#- Print each record to show what it looks like:
foreach( @records ) {
  print "RecordBegin---------------------------------------\n";    # Optional Begin Record Separator
  print "$_";                                                                      # Actual Record.
  print "RecordEND-----------------------------------------\n";    # Optional End Record Separator
}
Usage: save the script above (e.g. as myscript) and pass it the record file:

E.g.
> myscript records.txt
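The record-separator idea can also be wrapped in a sub with `local $/`, so the changed separator does not leak into the rest of the program. This is a sketch that assumes each record ends with a `|` on its own line followed by a Unix newline; the in-memory sample and the name read_records are illustrative stand-ins for the real file and code:

```perl
use strict;
use warnings;

# Sample input standing in for the real file (assumed Unix "\n" line endings).
my $text = "HDR1 a.dta 1\n321.4 3494\n345.2 963\n|\nHDR2 b.dta 2\n384.6 1942\n|\n";

sub read_records {
    my ($fh) = @_;
    local $/ = "|\n";      # one read returns everything up to and including "|\n"
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;        # chomp removes the current $/, i.e. the trailing "|\n"
        push @records, $rec;
    }
    return @records;
}

open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my @records = read_records($fh);
close $fh;

print "Record $_:\n$records[$_]\n" for 0 .. $#records;
```

Because `$/` is localized, it reverts to "\n" as soon as read_records returns.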
allmer (ASKER)

@rugdog:
Almost there ;)
It mirrors the array quite well; two minor changes are needed, however.

At the end of the x/y data, the '|' separator should appear on its own line,
and then the next dataset should be printed.
Like:
Line with data\n
lines with xy data\n
..\n
| #separator
Line with data\n
lines with xy data\n
..\n
| #separator
....

How can the print sub be adjusted so that only part of the array will be printed ($startPosition $endPosition) ?

@ITcrow:
Looks easy, but I think it's not quite what I need.
The problem is that the array is distributed by a sub that I cannot change.
Therefore all the data between two '|' has to be in one element of the array.
Meaning:
Array(..first line of data ... array(xy.data))
Something like the above.
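To get from the one-string-per-record form to the nested layout described here, each record could be split into its header and its x/y lines. A sketch, assuming the trailing '|' has already been removed from the record and that x and y are separated by whitespace (record_to_element is a hypothetical helper):

```perl
use strict;
use warnings;

# Turn "header\nx y\nx y\n" into [ $header, [ [x, y], ... ] ].
sub record_to_element {
    my ($record) = @_;
    my ($header, @lines) = split /\n/, $record;
    my @pairs;
    for my $line (@lines) {
        next unless $line =~ /^\s*(\S+)\s+(\S+)\s*$/;   # skip blank or malformed lines
        push @pairs, [ $1, $2 ];
    }
    return [ $header, \@pairs ];
}

my $elem = record_to_element(
    "WPGTGAWR present_cw15_gelA_2C06.1194.1194.2.dta 928.836\n321.4      3494\n345.2      963\n");
print "$elem->[0]: ", scalar @{ $elem->[1] }, " pairs\n";
```

Each returned reference can then be pushed onto the array as a single element.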
allmer (ASKER)

I changed the PrintData sub to:

sub PrintData{
   for(@_){
      print $_->[0]."\n";
      for my $p (@{$_->[1]}){
         print $p->[0]." ".$p->[1]."|\n";
      }
   }
}

Right now the output is almost right, but
there are two spaces before the '|'.
It probably doesn't matter, but if it can easily be changed, that would be great.
Anyway, how do I change the sub to print only part of the array?
Thanks,
Jens
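One way to print only part of the array is an array slice. A minimal sketch, assuming the [header, [[x, y], ...]] layout from rugdog's ReadData and inclusive, 0-based start/end positions (print_range is a hypothetical name); it prints each dataset followed by a '|' line:

```perl
use strict;
use warnings;

# Print datasets $start..$end (inclusive, 0-based) to the given filehandle,
# each followed by a line containing only '|'.
sub print_range {
    my ($out, $start, $end, @data) = @_;
    $end = $#data if $end > $#data;           # clamp to the last valid index
    for my $set (@data[$start .. $end]) {     # array slice: only the wanted elements
        print {$out} $set->[0], "\n";                          # header line
        print {$out} "$_->[0] $_->[1]\n" for @{ $set->[1] };   # x/y pairs
        print {$out} "|\n";                                    # separator on its own line
    }
}

my @d = (
    [ 'HDR0', [ [ 1, 2 ] ] ],
    [ 'HDR1', [ [ 3, 4 ], [ 5, 6 ] ] ],
    [ 'HDR2', [ [ 7, 8 ] ] ],
);

# Capture the output in a string for the demo; \*STDOUT or a file handle works the same way.
my $buf = '';
open(my $out, '>', \$buf) or die "cannot open in-memory buffer: $!";
print_range($out, 1, 2, @d);
close $out;
print $buf;
```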
allmer (ASKER)

One more thing:
the data should be printed to a file.
Maybe something like:
sub InitSubTask($start, $end ...) {
    open(F,">$serverSubTaskDir/queries.input");
    for($i = $start; $i < $end; $i++)
       print F @_[$i][0]." ".@_[$i][$1]."|\n";
    close F;
}
My Perl ignorance is probably perfectly visible here,
but I guess you know what I am getting at.
Thanks, Jens
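A runnable version of the InitSubTask idea might look like the following sketch. It assumes the caller passes the output directory and the array of [header, [[x, y], ...]] elements explicitly, treats $end as exclusive as in the loop above, and writes '|' on its own line after each dataset; init_sub_task is a hypothetical name, and the demo uses a temporary directory in place of the real $serverSubTaskDir:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write datasets $start .. $end-1 (end exclusive, as in the original loop)
# to "$dir/queries.input". Returns the path that was written.
sub init_sub_task {
    my ($dir, $start, $end, @data) = @_;
    my $path = File::Spec->catfile($dir, 'queries.input');
    open(my $fh, '>', $path) or die "cannot write $path: $!\n";
    for my $i ($start .. $end - 1) {
        last if $i > $#data;                  # stop if the range runs past the end
        my $set = $data[$i];
        print {$fh} $set->[0], "\n";
        print {$fh} "$_->[0] $_->[1]\n" for @{ $set->[1] };
        print {$fh} "|\n";
    }
    close($fh) or die "cannot close $path: $!\n";
    return $path;
}

# Demo with a temporary directory standing in for $serverSubTaskDir.
my $dir  = tempdir(CLEANUP => 1);
my @data = ([ 'HDR0', [ [ 1, 2 ] ] ], [ 'HDR1', [ [ 3, 4 ] ] ], [ 'HDR2', [ [ 5, 6 ] ] ]);
my $path = init_sub_task($dir, 0, 2, @data);

open(my $in, '<', $path) or die "cannot read $path: $!\n";
my $written = do { local $/; <$in> };     # slurp the whole file
close $in;
print $written;
```

Passing the arguments through @_ and using a lexical filehandle with three-argument open avoids the global F handle and any surprises from spaces in the file name.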

To give you an idea of how to strip leading spaces from, e.g., $data:

$data =~ s/^\ +//;

Or, to remove any leading whitespace (spaces, tabs, etc.):

$data =~ s/^\s+//;
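The same idea extends to trailing whitespace, which is likely where the extra characters before the '|' come from. A small sketch (trim is a hypothetical helper); note that `\s` also matches the carriage return that Windows line endings leave behind after chomp:

```perl
use strict;
use warnings;

# Remove leading and trailing whitespace (spaces, tabs, "\r", "\n", ...).
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $line = "  345.2      963  \r";
my ($x, $y) = split ' ', trim($line);   # split ' ' splits on runs of whitespace
print "[$x] [$y]\n";
```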

By the way, regarding my solution:

Each record is always available as a single string; it has newlines in it, but it is still a single record:

while(<>) {                     # While there are records in the file
  push(@records, "$_");    # Add them in an array.
}

The individual lines of a record are available as:

my $sep = "\n";     # Double quotes: a real newline. Adjust if your data uses another line separator.
my @lines_of_a_record = split( /$sep/, $record );

....  make changes to the lines, then reconstruct the record .....

$record = join( $sep, @lines_of_a_record );
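Put together, the split/modify/join round trip could look like this sketch. It assumes records use a real newline ("\n", double-quoted) as the line separator; dropping the last x/y line is only an example modification:

```perl
use strict;
use warnings;

my $sep    = "\n";   # double quotes: a real newline, not the two characters \ and n
my $record = join $sep, 'HDR line', '321.4 3494', '345.2 963', '384.6 1942';

my @lines_of_a_record = split /\Q$sep\E/, $record;
pop @lines_of_a_record;                        # example change: drop the last x/y line
$record = join $sep, @lines_of_a_record;

print "$record\n";
```

With single quotes ('\n'), join would insert a literal backslash and "n" between the lines, so the reconstructed record would no longer match the input format.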

For removing the spaces at the end of the x/y pair, sending the output to a file, and printing only part of the array:
 
#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file= "out_file" ;
my $start_pos=2;
my $end_pos=4;

my @d=ReadData($in_file);
PrintData($out_file,$start_pos,$end_pos,@d);

sub ReadData{
   my ($in_file)=@_;
   open(F, '<', $in_file) or die "failed to open $in_file: $!\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         next unless $l=~m/(.+?)\s+(.+?)\s*$/;   # skip lines that are not x/y pairs
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   my ($fname,$start_pos,$end_pos,@arr)=@_;
   open(F, '>', $fname) or die "failed to open file $fname: $!\n";
   for(my $i=$start_pos;$i<=$end_pos;$i++){
      print F $arr[$i]->[0]."\n";
      for my $p (@{$arr[$i]->[1]}){
         print F $p->[0]." ".$p->[1]."|\n";
      }
   }
   close(F);
}


allmer (ASKER)

Good morning,
@rugdog,
it doesn't quite work yet: the complete file ends up in data[0].
It should be stored in the array dataset by dataset.
@ITcrow:
I did:

my $in_file = "$inputPath/$queriesFileName";
my $out_file = "$inputPath/res.txt";
my @records;
my $firstLine;
ReadData($in_file);
WriteInput($out_file,2,5);
sub ReadData {
  my($in_file) = @_;
  $/ = '|';
  my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
  $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
  print "$firstLine, $num, \n";
  open(F,"$in_file");
  while(<F>) {
    push(@records, "$_");    # Add them in an array.
  }
  close(F);
}
sub WriteInput {
  my($filePath,$start,$end) = @_;
  open(F,">  $filePath") or die "Failed to open $filePath at $start\n";
  print $firstLine;
  my $diff = $end-$start;
  print F "$diff\n";
  for(my $i=$start; $i<$end; $i++) {
    print F $records[$i]."\n";
  }
  close(F);
}
The only thing I need right now is the ability to read the first
two lines of the file into variables and then process the rest
of the data.
Thanks,
Jens
This part:

  my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
  $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
  print "$firstLine, $num, \n";
  open(F,"$in_file");
while(<F>) {

should be:
=====================================================================================
  open(F,"$in_file") or die "failed to open $in_file: $!\n";
  my $num = <F>;                   # first line -> $num (still read with the default "\n" separator)
  $firstLine = <F>;                # second line, taken off the file before the rest is processed
  print "$firstLine, $num, \n";
  $/ = '|';                        # switch to the '|' record separator only now (move it here from above)
  while(<F>) {
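A self-contained sketch of the whole sequence: open the file, read the first two lines with the default "\n" separator, and only then switch `$/` to the record separator (if `$/` is changed first, the two "line" reads already read up to a '|'). It assumes the file starts with exactly two header lines and that records end with "|\n"; the in-memory file and the names read_file, $num and $first_line are illustrative:

```perl
use strict;
use warnings;

sub read_file {
    my ($fh) = @_;
    my $num        = <$fh>;    # first line, read with the default $/ = "\n"
    my $first_line = <$fh>;    # second line
    chomp($num, $first_line);

    my @records;
    {
        local $/ = "|\n";      # from here on, one read = one '|'-terminated record
        while (my $rec = <$fh>) {
            chomp $rec;        # removes the trailing "|\n"
            push @records, $rec;
        }
    }
    return ($num, $first_line, \@records);
}

my $text = "2\nfirst header line\nHDR1\n1 2\n|\nHDR2\n3 4\n|\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my ($num, $first_line, $records) = read_file($fh);
close $fh;

print "num=$num first=$first_line records=", scalar @$records, "\n";
```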

ASKER CERTIFIED SOLUTION
ITcrow
[members-only content, not included]
allmer,
   I've added some lines that print each header element in the array with its index number. With the sample data you provided it appears to behave correctly. Can you test it and post what this script prints with the input file you are using?

#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file= "out_file" ;
my $start_pos=0;
my $end_pos=1;

my @d=ReadData($in_file);

for(my $i=0;$i<=$#d;$i++){
   print "$i ".$d[$i]->[0]."\n";
}

PrintData($out_file,$start_pos,$end_pos,@d);

sub ReadData{
   my ($in_file)=@_;
   open(F, '<', $in_file) or die "failed to open $in_file: $!\n";
   my @data;
   my $reading_what="header";
   my $l;
   my ($x,$y);
   while($l=<F>){
      chomp $l;
      #print "-$l-\n";
      if($l eq "|"){
         $reading_what="header";
         next;
      }
      if($reading_what eq "header"){
         push(@data,[$l]);
         $reading_what="xy";
      } else {
         next unless $l=~m/(.+?)\s+(.+?)\s*$/;   # skip lines that are not x/y pairs
         ($x,$y)=($1,$2);
         push(@{$data[$#data]->[1]},[$x,$y]);
      }
   }
   close(F);
   return @data;
}

sub PrintData{
   my ($fname,$start_pos,$end_pos,@arr)=@_;
   open(F, '>', $fname) or die "failed to open file $fname: $!\n";
   for(my $i=$start_pos;$i<=$end_pos;$i++){
      print F $arr[$i]->[0]."\n";
      for my $p (@{$arr[$i]->[1]}){
         print F $p->[0]." ".$p->[1]."|\n";
      }
   }
   close(F);
}
allmer (ASKER)

Sorry,
I cannot test today, but I put a file on a server:
http://hippler.bio.upenn.edu/2c06.qgp
Also,
when viewing the input file in Emacs I keep seeing:
^@
What is that, and would it pose a problem?
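For reference, Emacs displays a NUL byte (character 0) as ^@. Stray NULs can indicate a UTF-16 encoded file or a damaged transfer, and they, like the "\r" left by Windows line endings, would keep a comparison such as `$l eq "|"` from matching, so every line would be treated as x/y data of the first set. A defensive sketch, assuming the real data is plain ASCII so that NUL and "\r" can simply be deleted (clean_line is a hypothetical helper; if the whole file turns out to be UTF-16, opening it with an `:encoding(UTF-16)` layer would be the cleaner fix):

```perl
use strict;
use warnings;

# Delete NUL bytes and carriage returns, then the trailing newline.
sub clean_line {
    my ($l) = @_;
    $l =~ tr/\x00\r//d;   # remove every "\0" and "\r"
    chomp $l;
    return $l;
}

# A '|' line as it might look with Windows line endings and stray NULs.
my $raw   = "\x00|\r\x00\n";
my $clean = clean_line($raw);
print $clean eq '|' ? "separator recognized\n" : "no match\n";
```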

Before I write anything to disk, the terminal '|' should be removed
(@queries here is the same array as @records / @d):
sub someSub{
  my ($self, $start, $end, $node, $inputDir, $serverSubTaskDir, $nodeSubTaskDir) = @_;
  open(F,">$serverSubTaskDir/queries.input") or die "cannot open $serverSubTaskDir/queries.input: $!\n";
  my $diff = $end-$start;
  print F "$diff"."\n";
  print F $self->{firstline};
  my @tmp = @{$self->{queries}}[$start..$end];
        #Here I would like to delete the last line of the sub array in the last element of the tmp array
        #Some magic code:
        #my $discard = pop @{tmp[$#tmp]->[1]};
        #why does the above not work as I thought it would?
  print F @tmp;
  close F;
  $node->runCmd("cp $serverSubTaskDir/queries.input $nodeSubTaskDir/queries.input");
}
Any ideas?
Thanks,
Jens
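On the commented-out line: `tmp[$#tmp]` is missing the `$` sigil; the last element is `$tmp[-1]` (or `$tmp[$#tmp]`), so the expression would be `pop @{ $tmp[-1][1] }`. Note, however, that a slice copies only the references, so popping through @tmp would also remove the pair from `$self->{queries}`, and `print F @tmp` would print strings like ARRAY(0x...) rather than the data. A sketch that copies the last dataset before trimming it, assuming the [header, [[x, y], ...]] layout (the sample data and names are illustrative):

```perl
use strict;
use warnings;

my @queries = (
    [ 'HDR0', [ [ 1, 2 ], [ 3, 4 ] ] ],
    [ 'HDR1', [ [ 5, 6 ], [ 7, 8 ] ] ],
);

my ($start, $end) = (0, 1);
my @tmp = @queries[$start .. $end];     # copies the references, not the data

# Replace the last element with a copy whose pair list lacks the last pair,
# so @queries itself is left unchanged.
my $last  = $tmp[-1];
my @pairs = @{ $last->[1] };
pop @pairs;
$tmp[-1] = [ $last->[0], \@pairs ];

# Print the data itself; printing @tmp directly would only show ARRAY(0x...) strings.
for my $set (@tmp) {
    print $set->[0], "\n";
    print "$_->[0] $_->[1]\n" for @{ $set->[1] };
    print "|\n";
}
```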
SOLUTION
[members-only content, not included]
allmer (ASKER)

Thanks a lot, you two.
The problem probably lies somewhere else.
My C++ class actually tolerates some errors,
but it seems that the file transfer from Windows
to Unix corrupts the files.
Anyway,
both solutions did what I asked for.

I will now switch to XML tags
to make the format clearer.

I will split the points between you.
Thanks again,
Jens