Solved

How to read complex data into an array?

Posted on 2004-10-29
120 Views
Last Modified: 2010-03-05
Hi experts,
I have a probably weird data structure, but I need to read it into an array (from file) and put it back out as a file.
It looks like this:
###################################################################
FHGHMSSK      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      21      21      8      0      0      -1      0      -1      2 '\n'
321.4      3494'\n'
345.2      963'\n'
...
988.6      1551'\n'
1117.5      1303'\n'
1117.5      1303'\n'
|'\n'
WPGTGAWR      present_cw15_gelA_2C06.1194.1194.2.dta      928.836      20      20      8      0      0      -1      0      -1      2'\n'
321.4      3494'\n'
345.2      963'\n'
384.6      1942'\n'
...
# '\n' = newline (to avoid misconceptions)
##############################################################################
I need one array holding all the data: each element contains the header line (the line that starts with the letters)
and, within that element, another array with the x/y data.

The x/y data is complete when '|' is encountered.
On the next line the next dataset starts.
One input set is the complete thing between '|' and '|'.
It is only allowed to occupy one array line.

The output in some other subroutine has to mirror part
of the input file back to a new file in exactly the same scheme
as before.

In C++ I have a class that serializes itself.
Too bad, I am only starting to learn Perl.

Thanks,
Jens

0
Question by:allmer
    16 Comments
     
    LVL 7

    Expert Comment

    by:rugdog
    #!/usr/bin/perl
    use strict;
    my $in_file="input_file_name";

    my @d=ReadData($in_file);
    PrintData(@d);

    sub ReadData{
       my ($in_file)=@_;
       open(F,"$in_file") or die "failed to open $in_file\n";
       my @data;
       my $reading_what="header";
       my $l;
       my ($x,$y);
       while($l=<F>){
          chomp $l;
          #print "-$l-\n";
          if($l eq "|"){
             $reading_what="header";
             next;
          }
          if($reading_what eq "header"){
             push(@data,[$l]);
             $reading_what="xy";
          } else {
             $l=~m/(.+?)\s+(.+)/;
             ($x,$y)=($1,$2);
             push(@{$data[$#data]->[1]},[$x,$y]);
          }
       }
       close(F);
       return @data;
    }

    sub PrintData{
       for(@_){
          print $_->[0]."\n";
          for my $p (@{$_->[1]}){
             print $p->[0]." ".$p->[1]."\n";
          }
          print "|\n";
       }
    }
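
A small sketch of the structure that ReadData above builds: an array of records, each one a reference to [ header line, reference to a list of [x, y] pairs ]. The sample values and the helper name pair_count are made up for illustration and are not part of the posted script.

```perl
use strict;
use warnings;

# Hypothetical sample data: one record = [ header_line, [ [x, y], ... ] ]
my @data = (
    [ "FHGHMSSK sample.dta 928.836", [ [ 321.4, 3494 ], [ 345.2, 963 ] ] ],
    [ "WPGTGAWR sample.dta 928.836", [ [ 384.6, 1942 ] ] ],
);

# Return the number of x/y pairs stored in record $i.
sub pair_count {
    my ($records, $i) = @_;
    return scalar @{ $records->[$i][1] };
}

# Walk the structure: one outer element per dataset, pairs nested inside.
for my $i (0 .. $#data) {
    my ($header, $pairs) = @{ $data[$i] };
    printf "%d: %s (%d pairs)\n", $i, $header, pair_count(\@data, $i);
    printf "   x=%s y=%s\n", @$_ for @$pairs;
}
```

Each dataset occupies exactly one element of @data, which seems to be what the question asks for.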
    0
     
    LVL 5

    Author Comment

    by:allmer
    Thanks a lot, rugdog!
    I will try tomorrow morning.
    Jens
    0
     
    LVL 5

    Expert Comment

    by:ITcrow
    #! /usr/local/bin/perl -w

    $/ = "|\n";                   # Record separator: a '|' followed by a newline. Adjust to your actual separator.

    while(<>) {                     # While there are records in the file
      push(@records, "$_");    # Add them in an array.
    }

    #- Providing a sample of what the printed records will look like:
    foreach( @records ) {
      print "RecordBegin---------------------------------------\n";    # Optional Begin Record Separator
      print "$_";                                                                      # Actual Record.
      print "RecordEND-----------------------------------------\n";    # Optional End Record Separator
    }
    0
     
    LVL 5

    Expert Comment

    by:ITcrow
    Usage:  <script_from_previous_comment> <record_file>

    Eg.
    > myscript records.txt
    0
     
    LVL 5

    Author Comment

    by:allmer
    @rugdog:
    Almost there ;)
    It mirrors the array quite well; two minor changes are needed, however.

    At the end of the xy data, the separator '|' should appear on a line of its own;
    then the next dataset should be printed.
    Like:
    Line with data\n
    lines with xy data\n
    ..\n
    | #separator
    Line with data\n
    lines with xy data\n
    ..\n
    | #separator
    ....

    How can the print sub be adjusted so that only part of the array will be printed ($startPosition $endPosition) ?

    @ITcrow:
    Looks easy, but I think it's not quite what I need.
    Problem is:
    That the array is distributed by a sub that I cannot change.
    Therefore all the data in between two '|' has to be on one line in the array.
    Meaning:
    Array(..first line of data ... array(xy.data))
    Something like the above.
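
The two approaches in this thread can be combined: read one dataset per read by setting $/ and then split each record into the nested layout described above. This is a minimal sketch, assuming the separator is a '|' on a line of its own; parse_record is a hypothetical helper and the input is a made-up in-memory string rather than the real file.

```perl
use strict;
use warnings;

# Turn one "|"-terminated record string into [ header, [ [x, y], ... ] ].
sub parse_record {
    my ($record) = @_;
    $record =~ s/\|\s*\z//;                       # drop the trailing separator, if any
    my @lines = grep { /\S/ } split /\r?\n/, $record;
    return unless @lines;                         # nothing but whitespace
    my $header = shift @lines;
    my @pairs  = map { [ (split ' ', $_)[0, 1] ] } @lines;
    return [ $header, \@pairs ];
}

# Hypothetical input, read from an in-memory string instead of a file.
my $input = "HEADER A\n321.4      3494\n345.2      963\n|\nHEADER B\n384.6      1942\n";
open(my $fh, '<', \$input) or die "cannot open in-memory file: $!";

my @data;
{
    local $/ = "|\n";                             # one read = one dataset
    while (my $record = <$fh>) {
        my $parsed = parse_record($record);
        push @data, $parsed if $parsed;
    }
}
close($fh);

print scalar(@data), " records\n";
```

Every element of @data then holds one complete dataset, with the x/y pairs in a nested array.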
    0
     
    LVL 5

    Author Comment

    by:allmer
    I changed the PrintData sub to:

    sub PrintData{
       for(@_){
          print $_->[0]."\n";
          for my $p (@{$_->[1]}){
             print $p->[0]." ".$p->[1]."|\n";
          }
       }
    }

    Right now the output is almost right, but
    there are 2 spaces before '|'.
    Probably doesn't matter but if it can easily be changed, that would be great.
    Anyway, how do I change the sub to print only part of the array?
    Thanks,
    Jens
    0
     
    LVL 5

    Author Comment

    by:allmer
    One more thing:
    the data should be printed to file.
    Maybe something like:
    sub InitSubTask($start, $end ...) {
        open(F,">$serverSubTaskDir/queries.input");
        for($i = $start; $i < $end; $i++)
           print F @_[$i][0]." ".@_[$i][$1]."|\n";
        close F;
    }
    My perl ignorance is probably perfectly visible here,
    but I guess you know what I am getting at.
    Thanks,
    Jens
    0
     
    LVL 5

    Expert Comment

    by:ITcrow

    To give you an idea of how to strip leading spaces, e.g. from $data:

    $data =~ s/^\ +//;

    Or, to strip any leading whitespace (tabs and newlines included):

    $data =~ s/^\s+//;

    BTW, as to my solution:

    You always have access to data in a single line, it has '\n' in between but it's still a single record:

    while(<>) {                     # While there are records in the file
      push(@records, "$_");    # Add them in an array.
    }

    parts of record are always visible as:

    $sep = "\n";     # Double quotes, so join() inserts a real newline; adjust to whatever is correct for your data.
    @lines_of_a_record = split( /$sep/, $record );

    ....  make changes in record and reconstruct record .....

    $record = join( $sep, @lines_of_a_record );
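
A short sketch of the split / modify / join round trip described above, assuming records are newline-separated lines held in one string. replace_header is a hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;

# Rewrite the header (first line) of one multi-line record string.
sub replace_header {
    my ($record, $new_header) = @_;
    my @lines = split /\n/, $record, -1;   # -1 keeps the trailing empty field
    $lines[0] = $new_header;
    return join "\n", @lines;              # rebuild the record, newline included
}

my $record = "OLD HEADER\n321.4 3494\n345.2 963\n";
print replace_header($record, "NEW HEADER");
```

The limit of -1 on split matters here: without it the trailing empty field is dropped and the rebuilt record loses its final newline.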

    0
     
    LVL 7

    Expert Comment

    by:rugdog
    To fix the spaces at the end of the x,y pair, send the output to a file, and print only part of the array:
     
    #!/usr/bin/perl
    use strict;
    my $in_file="input_file_name";
    my $out_file= "out_file" ;
    my $start_pos=2;
    my $end_pos=4;

    my @d=ReadData($in_file);
    PrintData($out_file,$start_pos,$end_pos,@d);

    sub ReadData{
       my ($in_file)=@_;
       open(F,"$in_file") or die "failed to open $in_file\n";
       my @data;
       my $reading_what="header";
       my $l;
       my ($x,$y);
       while($l=<F>){
          chomp $l;
          #print "-$l-\n";
          if($l eq "|"){
             $reading_what="header";
             next;
          }
          if($reading_what eq "header"){
             push(@data,[$l]);
             $reading_what="xy";
          } else {
             $l=~m/(.+?)\s+(.+?)\s*$/;
             ($x,$y)=($1,$2);
             push(@{$data[$#data]->[1]},[$x,$y]);
          }
       }
       close(F);
       return @data;
    }

    sub PrintData{
       my ($fname,$start_pos,$end_pos,@arr)=@_;
       open(F,">  $fname") or die "failed to open file $fname: $!\n";
       for(my $i=$start_pos;$i<=$end_pos;$i++){
          print F $arr[$i]->[0]."\n";
          for my $p (@{$arr[$i]->[1]}){
             print F $p->[0]." ".$p->[1]."|\n";
          }
       }
       close(F);
    }
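
A hedged variant of the range output, for comparison: it builds the text for records $start..$end in the layout from the original question (header line, "x y" lines, then '|' on a line of its own) and clamps the indices so a range past the end of the array does not produce empty records. format_range and the demo data are hypothetical names and values, not part of the script above.

```perl
use strict;
use warnings;

# Format records $start..$end (inclusive) in the input layout:
# header line, "x y" lines, then "|" on a line of its own.
sub format_range {
    my ($records, $start, $end) = @_;
    $start = 0          if $start < 0;
    $end   = $#$records if $end > $#$records;   # clamp to the last index
    my $out = '';
    for my $rec (@{$records}[ $start .. $end ]) {
        my ($header, $pairs) = @$rec;
        $out .= "$header\n";
        $out .= "$_->[0] $_->[1]\n" for @{ $pairs || [] };
        $out .= "|\n";
    }
    return $out;
}

my @demo = (
    [ "H0", [ [ 1, 2 ] ] ],
    [ "H1", [ [ 3, 4 ], [ 5, 6 ] ] ],
    [ "H2", [] ],
);
print format_range(\@demo, 1, 5);   # the end index is clamped to 2
```

To write to a file instead, the returned string can be printed to a handle opened with the three-argument form, e.g. open(my $out, '>', $path).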


    0
     
    LVL 5

    Author Comment

    by:allmer
    Good morning,
    @rugdog,
    doesn't work quite yet. The complete file is in data[0].
    It should be in the array dataset by dataset.
    @ITcrow:
    I did:

    my $in_file = "$inputPath/$queriesFileName";
    my $out_file = "$inputPath/res.txt";
    my @records;
    my $firstLine;
    ReadData($in_file);
    WriteInput($out_file,2,5);
    sub ReadData {
      my($in_file) = @_;
      $/ = '|';
      my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
      $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
      print "$firstLine, $num, \n";
      open(F,"$in_file");
      while(<F>) {
        push(@records, "$_");    # Add them in an array.
      }
      close(F);
    }
    sub WriteInput {
      my($filePath,$start,$end) = @_;
      open(F,">  $filePath") or die "Failed to open $filePath at $start\n";
      print $firstLine;
      my $diff = $end-$start;
      print F "$diff\n";
      for(my $i=$start; $i<$end; $i++) {
        print F @records[$i]."\n";
      }
      close(F);
    }
    Only thing that I need right now is the ability to take the first
    two lines from the file and then process the rest of the data
    while storing the two lines in any variables.
    Thanks,
    Jens
    0
     
    LVL 5

    Expert Comment

    by:ITcrow
     my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
      $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
      print "$firstLine, $num, \n";
      open(F,"$in_file");
    while(<F>) {

    should be:
    =====================================================================================
      open(F,"$in_file");
      my $num = <F>;                   #here I would like to swallow the first line from the file and put it into the var $num
      $firstLine = <F>;                   #Another line I would like to take of the file before it is processed.
      print "$firstLine, $num, \n";
      while(<F>) {

    0
     
    LVL 5

    Accepted Solution

    by:

    I forgot to mention that the separator assignment has to move to just above the while loop,
    so that \n remains the separator for the first two lines and you switch to '|' only afterwards.

     $/ = '|';

    Here it is:
    ==================================================================================
    sub ReadData {
      my($in_file) = @_;

      open(F,"$in_file") || die "Failed to open $in_file: $!\n";

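      my $num = <F>;                    # Read line one.
      $firstLine = <F>;                 # Read line two into the file-level $firstLine (no "my", so WriteInput can see it).

      print "$firstLine, $num, \n";

       # Change the separator and process remaining lines.
       # "local" restores the default "\n" separator when the sub returns.
       local $/ = '|';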
      while(<F>) {
        push(@records, "$_");    # Add them in an array.
      }
      close(F);
    }
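
The same idea as a self-contained sketch, assuming the file starts with two ordinary lines followed by '|'-separated records. It uses a lexical filehandle and returns the values instead of relying on global variables; read_with_preamble and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Read two leading lines, then the remaining "|"-separated records.
sub read_with_preamble {
    my ($fh) = @_;
    my $num       = <$fh>;             # line one, read with the default "\n" separator
    my $firstLine = <$fh>;             # line two
    die "input has fewer than two lines\n"
        unless defined $num && defined $firstLine;
    chomp($num, $firstLine);

    local $/ = '|';                    # switch separator; restored when the sub returns
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                 # chomp removes the current $/, i.e. the '|'
        $record =~ s/\A\s+//;          # drop the newline left over after the previous '|'
        push @records, $record if length $record;
    }
    return ($num, $firstLine, \@records);
}

# Hypothetical input, read from an in-memory string instead of a file.
my $input = "2\nsome first line\nHEADER A\n1 2\n|\nHEADER B\n3 4\n|\n";
open(my $fh, '<', \$input) or die "cannot open in-memory file: $!";
my ($num, $firstLine, $records) = read_with_preamble($fh);
close($fh);
print "num=$num, first=$firstLine, records=", scalar(@$records), "\n";
```

Because chomp strips whatever $/ currently holds, the stored records no longer end in '|'.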
    0
     
    LVL 7

    Expert Comment

    by:rugdog
    allmer,
       I've added some lines to print each header element in the array with its index number. Using it with the sample data you provided, it appears to behave OK. Can you test it and send what this script prints with the input file you are using?

    #!/usr/bin/perl
    use strict;
    my $in_file="input_file_name";
    my $out_file= "out_file" ;
    my $start_pos=0;
    my $end_pos=1;

    my @d=ReadData($in_file);

    for(my $i=0;$i<=$#d;$i++){
       print "$i ".$d[$i]->[0]."\n";
    }

    PrintData($out_file,$start_pos,$end_pos,@d);

    sub ReadData{
       my ($in_file)=@_;
       open(F,"$in_file") or die "failed to open $in_file\n";
       my @data;
       my $reading_what="header";
       my $l;
       my ($x,$y);
       while($l=<F>){
          chomp $l;
          #print "-$l-\n";
          if($l eq "|"){
             $reading_what="header";
             next;
          }
          if($reading_what eq "header"){
             push(@data,[$l]);
             $reading_what="xy";
          } else {
             $l=~m/(.+?)\s+(.+?)\s*$/;
             ($x,$y)=($1,$2);
             push(@{$data[$#data]->[1]},[$x,$y]);
          }
       }
       close(F);
       return @data;
    }

    sub PrintData{
       my ($fname,$start_pos,$end_pos,@arr)=@_;
       open(F,">  $fname") or die "failed to open file $fname: $!\n";
       for(my $i=$start_pos;$i<=$end_pos;$i++){
          print F $arr[$i]->[0]."\n";
          for my $p (@{$arr[$i]->[1]}){
             print F $p->[0]." ".$p->[1]."|\n";
          }
       }
       close(F);
    }
    0
     
    LVL 5

    Author Comment

    by:allmer
    Sorry,
    I cannot test today, but I put a file on a server:
    http://hippler.bio.upenn.edu/2c06.qgp
    Anyway,
    when viewing the input file with emacs I keep seeing:
    ^@
    What is that and would it pose a problem?

    Before I write anything to disk the terminal '|' should be removed:
    @queries = @records = @d.
    sub someSub{
      my ($self, $start, $end, $node, $inputDir, $serverSubTaskDir, $nodeSubTaskDir) = @_;
      open(F,">$serverSubTaskDir/queries.input");
      my $diff = $end-$start;
      print F "$diff"."\n";
      print F "$self->{firstline}";
      my @tmp = @{$self->{queries}}[$start..$end];
            #Here I would like to delete the last line of the sub array in the last element of the tmp array
            #Some magic code:
            #my $discard = pop @{tmp[$#tmp]->[1]};
            #why does the above not work as I thought it would?
      print F @tmp;
      close F;
      $node->runCmd("cp $serverSubTaskDir/queries.input $nodeSubTaskDir/queries.input");
    }
    Any ideas?
    Thanks,
    Jens
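
On the pop question above, two things appear to be going on, shown here as a sketch with made-up data. First, an element of @tmp is written $tmp[...], so the dereference needs the $ sigil: @{ $tmp[-1][1] }. Second, an array slice copies only the outer references, so popping from the copy would also shorten the original record unless the inner arrays are copied too. copy_range is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical records in the [ header, [ [x, y], ... ] ] layout.
my @queries = (
    [ "H0", [ [ 1, 2 ], [ 3, 4 ] ] ],
    [ "H1", [ [ 5, 6 ], [ 7, 8 ] ] ],
);

# Copy records $start..$end with fresh outer and pair-list arrays,
# so changes to the copy leave the original records untouched.
sub copy_range {
    my ($records, $start, $end) = @_;
    return map { [ $_->[0], [ @{ $_->[1] } ] ] } @{$records}[ $start .. $end ];
}

my @tmp = copy_range(\@queries, 0, 1);

# Remove the last entry of the nested array in the last element of @tmp.
my $discard = pop @{ $tmp[-1][1] };

print "copy has ",     scalar @{ $tmp[-1][1] },     " pair(s), ",
      "original has ", scalar @{ $queries[-1][1] }, " pair(s)\n";
```

Note also that printing @tmp directly would print reference strings such as ARRAY(0x...), so the records need to be formatted before they are written out.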
    0
     
    LVL 7

    Assisted Solution

    by:rugdog
    Jens,
      yes, the ^@ will cause a problem since I wrote the script thinking the "|" was the only thing in the line, but try this modification:

    #!/usr/bin/perl
    use strict;
    my $in_file="input_file_name";
    my $out_file= "out_file" ;
    my $start_pos=0;
    my $end_pos=1;

    my @d=ReadData($in_file);

    for(my $i=0;$i<=$#d;$i++){
      print "$i ".$d[$i]->[0]."\n";
    }

    PrintData($out_file,$start_pos,$end_pos,@d);

    sub ReadData{
      my ($in_file)=@_;
      open(F,"$in_file") or die "failed to open $in_file\n";
      my @data;
      my $reading_what="header";
      my $l;
      my ($x,$y);
      while($l=<F>){
         chomp $l;
         #print "-$l-\n";
         if($l =~ /^\|/){
            $reading_what="header";
            next;
         }
         if($reading_what eq "header"){
            push(@data,[$l]);
            $reading_what="xy";
         } else {
            $l=~m/(.+?)\s+(.+?)\s*$/;
            ($x,$y)=($1,$2);
            push(@{$data[$#data]->[1]},[$x,$y]);
         }
      }
      close(F);
      return @data;
    }

    sub PrintData{
      my ($fname,$start_pos,$end_pos,@arr)=@_;
      open(F,">  $fname") or die "failed to open file $fname: $!\n";
      for(my $i=$start_pos;$i<=$end_pos;$i++){
         print F $arr[$i]->[0]."\n";
         for my $p (@{$arr[$i]->[1]}){
            print F $p->[0]." ".$p->[1]."|\n";
         }
      }
      close(F);
    }
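
The ^@ that emacs shows is a NUL byte (character code 0). A minimal sketch of cleaning each line before comparing or matching it, assuming the stray bytes are NULs and Windows carriage returns; clean_line is a hypothetical helper. If the file turns out to be in a wide encoding such as UTF-16, decoding it on open would be the more appropriate fix, which this sketch does not attempt.

```perl
use strict;
use warnings;

# Remove NUL bytes (shown as ^@ in emacs) and carriage returns from a line,
# then strip the newline and any trailing blanks.
sub clean_line {
    my ($line) = @_;
    $line =~ tr/\0\r//d;
    $line =~ s/\s+\z//;
    return $line;
}

my $raw = "|\0\r\n";    # a separator line as it might arrive from Windows
print clean_line($raw) eq '|' ? "separator\n" : "data\n";
```

With lines cleaned this way, the plain comparison $l eq "|" from the earlier versions of ReadData would match again.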
    0
     
    LVL 5

    Author Comment

    by:allmer
    Thanks a lot, you two,
    the problem probably resides somewhere else.
    My C++ class actually allows for some errors,
    but it seems like the file transfer from Windows
    to Unix screws up the files.
    Anyway,
    both solutions worked as far as I asked you.

    I will switch to XML tags now to make it
    clearer.

    I will split the points among you.
    Thanks again,
    Jens
    0
