allmer
asked on
How to read complex data into an array?
Hi experts,
I have a probably weird data structure, but I need to read it into an array (from file) and put it back out as a file.
It looks like this:
##########################################################################
FHGHMSSK present_cw15_gelA_2C06.1194.1194.2.dta 928.836 21 21 8 0 0 -1 0 -1 2 '\n'
321.4 3494'\n'
345.2 963'\n'
...
988.6 1551'\n'
1117.5 1303'\n'
1117.5 1303'\n'
|'\n'
WPGTGAWR present_cw15_gelA_2C06.1194.1194.2.dta 928.836 20 20 8 0 0 -1 0 -1 2'\n'
321.4 3494'\n'
345.2 963'\n'
384.6 1942'\n'
...
#'\n' = newline (to avoid misconceptions)
##########################################################################
I need one array with all the data in it (the data on the line with the characters at the beginning), and within that
array another array with the x/y data.
The x/y data is complete when '|' is encountered.
On the next line the next dataset starts.
One input set is the complete thing between '|' and '|'.
It is only allowed to occupy one array line.
The output in some other subroutine has to mirror part
of the input file back to a new file in exactly the same scheme
as before.
In c++ I have a class that serializes itself.
Too bad, I am only starting to learn Perl.
Thanks,
Jens
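One way to picture the structure being asked for, as a minimal sketch: each dataset occupies one array element, which holds the header line plus a reference to an array of x/y pairs. The variable name @datasets, the helper pair_count and the shortened header strings are made up for illustration and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical layout: one element per dataset,
# each element = [ header line, [ [x, y], [x, y], ... ] ].
my @datasets = (
    [ 'FHGHMSSK ...', [ [ 321.4, 3494 ], [ 345.2, 963 ] ] ],
    [ 'WPGTGAWR ...', [ [ 321.4, 3494 ], [ 384.6, 1942 ] ] ],
);

# Return the number of x/y pairs stored for dataset $i.
sub pair_count {
    my ($i) = @_;
    return scalar @{ $datasets[$i][1] };
}

# Access: header of dataset 0, then the y value of its second pair.
print "$datasets[0][0]\n";
print "$datasets[0][1][1][1]\n";
```

Because every dataset is a single element, a sub that hands out array elements one by one would pass a whole dataset at a time.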
ASKER
Thanks a lot, rugdog!
I will try tomorrow morning.
Jens
#!/usr/local/bin/perl -w
$/ = "|\n";                # Record separator: '|' followed by a newline. Put your actual separator here.
while(<>) {                # While there are records in the file
    push(@records, $_);    # add them to an array.
}
#- A sample of how the printed records will look:
foreach( @records ) {
    print "RecordBegin-----------------------------------------\n"; # Optional begin-record separator
    print $_;                                                        # Actual record
    print "RecordEND-------------------------------------------\n"; # Optional end-record separator
}
Usage: Script_in_previous_append <record_file>
Eg.
> myscript records.txt
ASKER
@rugdog:
Almost there ;)
It mirrors the array quite well; two minor changes are needed, however.
At the end of the xy data, the separator '|' should appear on a line of its own,
then the next dataset should be printed.
Like:
Line with data\n
lines with xy data\n
..\n
| #separator
Line with data\n
lines with xy data\n
..\n
| #separator
....
How can the print sub be adjusted so that only part of the array will be printed ($startPosition $endPosition) ?
@ITcrow:
Looks easy, but I think it's not quite what I need.
Problem is:
That the array is distributed by a sub that I cannot change.
Therefore all the data in between two '|' has to be on one line in the array.
Meaning:
Array(..first line of data ... array(xy.data))
Something like the above.
ASKER
I changed the PrintData sub to:
sub PrintData{
    for(@_){
        print $_->[0]."\n";
        for my $p (@{$_->[1]}){
            print $p->[0]." ".$p->[1]."|\n";
        }
    }
}
Right now the output is almost right, but
there are 2 spaces before '|'.
Probably doesn't matter but if it can easily be changed, that would be great.
Anyway, how do I change the sub to print only part of the array?
Thanks,
Jens
ASKER
One more thing:
the data should be printed to file.
Maybe something like:
sub InitSubTask($start, $end ...) {
open(F,">$serverSubTaskDir/queries.input");
for($i = $start; $i < $end; $i++)
print F @_[$i][0]." ".@_[$i][$1]."|\n";
close F;
}
My perl ignorance is probably perfectly visible here,
but I guess you know what I am getting at.
Thanks,
Jens
To give you an idea of how to strip leading spaces from e.g. $data:
$data =~ s/^ +//;
Or, to strip any leading whitespace (spaces, tabs, newlines):
$data =~ s/^\s+//;
BTW, as to my solution:
You always have access to data in a single line, it has '\n' in between but it's still a single record:
while(<>) { # While there are records in the file
push(@records, "$_"); # Add them in an array.
}
The parts of a record are always accessible like this:
$sep = "\n"; # Adjust to whatever line separator is correct for your data
@lines_of_a_record = split( /$sep/, $record );
# ... make changes to the lines, then reconstruct the record ...
$record = join( $sep, @lines_of_a_record );
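As a rough sketch of how the record-separator approach and the split step could be combined, the following reads '|'-terminated records and turns each one into a header plus a list of x/y pairs. It assumes the separator is a '|' followed by a newline; parse_record is a hypothetical helper name, and the sample string stands in for the real input file.

```perl
use strict;
use warnings;

# Split one raw record (header line, x/y lines, trailing '|') into
# a header string and a list of [x, y] pairs. parse_record is a
# hypothetical helper name, not part of the scripts in this thread.
sub parse_record {
    my ($record) = @_;
    my @lines = grep { length } split /\n/, $record;
    pop @lines if @lines && $lines[-1] eq '|';   # drop the separator line
    my $header = shift @lines;
    my @pairs  = map { [ split ' ', $_ ] } @lines;
    return [ $header, \@pairs ];
}

# Read '|'-terminated records from an in-memory sample instead of a file.
my $sample = "HEADER one\n321.4 3494\n345.2 963\n|\nHEADER two\n384.6 1942\n|\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory sample: $!";
my @records;
{
    local $/ = "|\n";            # record separator, restored after this block
    while (my $rec = <$fh>) {
        push @records, parse_record($rec);
    }
}
close($fh);

print scalar(@records), " records\n";
```

Each element of @records is then one complete dataset, which matches the "one array line per dataset" requirement.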
For fixing the spaces at the end of the x,y pair, sending the output to a file, and printing only part of the array:
#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file="out_file";
my $start_pos=2;
my $end_pos=4;
my @d=ReadData($in_file);
PrintData($out_file,$start_pos,$end_pos,@d);
sub ReadData{
    my ($in_file)=@_;
    open(F,"$in_file") or die "failed to open $in_file\n";
    my @data;
    my $reading_what="header";
    my $l;
    my ($x,$y);
    while($l=<F>){
        chomp $l;
        #print "-$l-\n";
        if($l eq "|"){
            $reading_what="header";
            next;
        }
        if($reading_what eq "header"){
            push(@data,[$l]);
            $reading_what="xy";
        } else {
            $l=~m/(.+?)\s+(.+?)\s*$/;
            ($x,$y)=($1,$2);
            push(@{$data[$#data]->[1]},[$x,$y]);
        }
    }
    close(F);
    return @data;
}
sub PrintData{
    my ($fname,$start_pos,$end_pos,@arr)=@_;
    open(F,"> $fname") or die "failed to open file $fname: $!\n";
    for(my $i=$start_pos;$i<=$end_pos;$i++){
        print F $arr[$i]->[0]."\n";
        for my $p (@{$arr[$i]->[1]}){
            print F $p->[0]." ".$p->[1]."|\n";
        }
    }
    close(F);
}
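For comparison, here is a small variant that writes the '|' on a line of its own after each dataset, as requested earlier in the thread, and prints only a range of datasets by using an array slice. It is a sketch, not a drop-in replacement: write_range is a hypothetical name, the sample data is invented, and the output goes to an in-memory string so the example runs without touching disk.

```perl
use strict;
use warnings;

# Write datasets $start..$end (inclusive) to an already opened filehandle:
# one header line, then "x y" lines, then '|' on a line of its own.
# write_range is a hypothetical name; the data layout is the
# [ header, [ [x, y], ... ] ] structure built by ReadData.
sub write_range {
    my ($fh, $start, $end, @arr) = @_;
    $end = $#arr if $end > $#arr;          # clamp to the last dataset
    for my $set (@arr[$start .. $end]) {
        print {$fh} $set->[0], "\n";
        print {$fh} "$_->[0] $_->[1]\n" for @{ $set->[1] || [] };
        print {$fh} "|\n";
    }
}

my @d = (
    [ 'HEADER zero', [ [ 1.0, 10 ] ] ],
    [ 'HEADER one',  [ [ 321.4, 3494 ], [ 345.2, 963 ] ] ],
    [ 'HEADER two',  [ [ 384.6, 1942 ] ] ],
);

# Capture the output in a string instead of a file.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory buffer: $!";
write_range($fh, 1, 2, @d);
close($fh);
print $out;
```

Passing a filehandle rather than a file name keeps the sub usable for both real files and tests.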
ASKER
Good morning,
@rugdog,
doesn't work quite yet. The complete file is in data[0].
It should be in the array dataset by dataset.
@ITcrow:
I did:
my $in_file = "$inputPath/$queriesFileName";
my $out_file = "$inputPath/res.txt";
my @records;
my $firstLine;
ReadData($in_file);
WriteInput($out_file,2,5);
sub ReadData {
my($in_file) = @_;
$/ = '|';
my $num = <F>; #here I would like to swallow the first line from the file and put it into the var $num
$firstLine = <F>; #Another line I would like to take of the file before it is processed.
print "$firstLine, $num, \n";
open(F,"$in_file");
while(<F>) {
push(@records, "$_"); # Add them in an array.
}
close(F);
}
sub WriteInput {
my($filePath,$start,$end) = @_;
open(F,"> $filePath") or die "Failed to open $filePath at $start\n";
print $firstLine;
my $diff = $end-$start;
print F "$diff\n";
for(my $i=$start; $i<$end; $i++) {
print F @records[$i]."\n";
}
close(F);
}
Only thing that I need right now is the ability to take the first
two lines from the file and then process the rest of the data
while storing the two lines in any variables.
Thanks,
Jens
my $num = <F>; #here I would like to swallow the first line from the file and put it into the var $num
$firstLine = <F>; #Another line I would like to take of the file before it is processed.
print "$firstLine, $num, \n";
open(F,"$in_file");
while(<F>) {
should be:
==========================================================================
open(F,"$in_file");
my $num = <F>; #here I would like to swallow the first line from the file and put it into the var $num
$firstLine = <F>; #Another line I would like to take of the file before it is processed.
print "$firstLine, $num, \n";
while(<F>) {
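One thing to watch in the ReadData attempt: `$/ = '|'` is already in effect when the two leading lines are read, so each `<F>` would return everything up to the next '|' rather than a single line. A possible arrangement, sketched here under the assumption that the separator is '|' followed by a newline, is to read the two lines first and only then change `$/`. The name read_with_preamble and the sample input are invented for the example.

```perl
use strict;
use warnings;

# Read two leading lines with the default line-based separator, then
# switch to '|'-terminated records for the rest of the input.
# read_with_preamble is a hypothetical name for this sketch.
sub read_with_preamble {
    my ($fh) = @_;
    my $num        = <$fh>;     # first line (e.g. a count)
    my $first_line = <$fh>;     # second line
    chomp($num, $first_line);
    local $/ = "|\n";           # from here on, one read = one dataset
    my @records = <$fh>;
    return ($num, $first_line, \@records);
}

my $sample = "2\nsome header line\nA\n1 2\n|\nB\n3 4\n|\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory sample: $!";
my ($num, $first_line, $records) = read_with_preamble($fh);
close($fh);
print "$num, $first_line, ", scalar(@$records), " records\n";
```

Using `local` means the separator change ends when the sub returns, so other code that reads line by line is not affected.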
allmer,
I've added some lines to print each header element in the array with its index number. Using it with the sample data you provided, it appears to behave OK. Can you test it and send what this script prints with the input file you are using?
#!/usr/bin/perl
use strict;
my $in_file="input_file_name";
my $out_file="out_file";
my $start_pos=0;
my $end_pos=1;
my @d=ReadData($in_file);
for(my $i=0;$i<=$#d;$i++){
    print "$i ".$d[$i]->[0]."\n";
}
PrintData($out_file,$start_pos,$end_pos,@d);
sub ReadData{
    my ($in_file)=@_;
    open(F,"$in_file") or die "failed to open $in_file\n";
    my @data;
    my $reading_what="header";
    my $l;
    my ($x,$y);
    while($l=<F>){
        chomp $l;
        #print "-$l-\n";
        if($l eq "|"){
            $reading_what="header";
            next;
        }
        if($reading_what eq "header"){
            push(@data,[$l]);
            $reading_what="xy";
        } else {
            $l=~m/(.+?)\s+(.+?)\s*$/;
            ($x,$y)=($1,$2);
            push(@{$data[$#data]->[1]},[$x,$y]);
        }
    }
    close(F);
    return @data;
}
sub PrintData{
    my ($fname,$start_pos,$end_pos,@arr)=@_;
    open(F,"> $fname") or die "failed to open file $fname: $!\n";
    for(my $i=$start_pos;$i<=$end_pos;$i++){
        print F $arr[$i]->[0]."\n";
        for my $p (@{$arr[$i]->[1]}){
            print F $p->[0]." ".$p->[1]."|\n";
        }
    }
    close(F);
}
ASKER
Sorry,
I cannot test today, but I put a file on a server:
http://hippler.bio.upenn.edu/2c06.qgp
Anyway,
when viewing the input file with emacs I keep seeing:
^@
What is that and would it pose a problem?
Before I write anything to disk the terminal '|' should be removed:
@queries = @records = @d.
sub someSub{
    my ($self, $start, $end, $node, $inputDir, $serverSubTaskDir, $nodeSubTaskDir) = @_;
    open(F,">$serverSubTaskDir/queries.input");
    my $diff = $end-$start;
    print F "$diff"."\n";
    print F "$self->{firstline}";
    my @tmp = @{$self->{queries}}[$start..$end];
    #Here I would like to delete the last line of the sub array in the last element of the tmp array
    #Some magic code:
    #my $discard = pop @{tmp[$#tmp]->[1]};
    #why does the above not work as I thought it would?
    print F @tmp;
    close F;
    $node->runCmd("cp $serverSubTaskDir/queries.input $nodeSubTaskDir/queries.input");
}
Any ideas?
Thanks,
Jens
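On the commented-out `pop` line: `tmp[$#tmp]` is missing the `$` sigil, so Perl does not treat it as an element of `@tmp`. Two further points are worth knowing: an array slice copies the references, not the inner arrays, so popping through `@tmp` would also shorten the original data; and printing an array of references prints their addresses, not the contents. The sketch below illustrates all three with invented sample data; it is not the code from the hidden answers.

```perl
use strict;
use warnings;

# Hypothetical data in the [ header, [ [x, y], ... ] ] layout.
my @queries = (
    [ 'HEADER one', [ [ 321.4, 3494 ], [ 345.2, 963 ] ] ],
    [ 'HEADER two', [ [ 384.6, 1942 ], [ 988.6, 1551 ] ] ],
);

my @tmp = @queries[0 .. 1];     # a slice copies the references, not the data

# Copy the last element before changing it, so @queries stays intact.
$tmp[-1] = [ $tmp[-1][0], [ @{ $tmp[-1][1] } ] ];

# Note the '$' in front of tmp: this pops the last pair of the last dataset.
my $discard = pop @{ $tmp[$#tmp]->[1] };

# References have to be expanded by hand when printing.
my $text = '';
for my $set (@tmp) {
    $text .= "$set->[0]\n";
    $text .= "$_->[0] $_->[1]\n" for @{ $set->[1] };
}
print $text;
```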
ASKER
Thanks a lot, you two,
the problem resides probably somewhere else.
My c++ class actually allows for some errors,
but it seems like the file transfer from Windows
to Unix screws up the files.
Anyway,
both solutions worked as far as I asked you.
I will switch to <XML> tags, now to make it
more clear.
I will split the points among you.
Thanks again,
Jens
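For what it's worth, `^@` is how Emacs displays a NUL byte (0x00). A quick way to check whether a transferred file picked up NUL bytes or Windows-style carriage returns is simply to count them. This is only a diagnostic sketch: count_debris is a made-up name, and the sample string stands in for the real file contents, which were not available here.

```perl
use strict;
use warnings;

# Count NUL bytes and carriage returns in a chunk of raw file content.
sub count_debris {
    my ($bytes) = @_;
    my $nul = ($bytes =~ tr/\0//);   # tr/// without a replacement list just counts
    my $cr  = ($bytes =~ tr/\r//);
    return ($nul, $cr);
}

# Stand-in for data read in binary mode from the transferred file.
my $sample = "HEADER\r\n321.4 3494\r\n\0|\r\n";
my ($nul, $cr) = count_debris($sample);
print "NUL bytes: $nul, carriage returns: $cr\n";
```

If either count is non-zero, a test like `$l eq "|"` after `chomp` can fail, because the line then still contains an extra character.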
use strict;
my $in_file="input_file_name";
my @d=ReadData($in_file);
PrintData(@d);
sub ReadData{
    my ($in_file)=@_;
    open(F,"$in_file") or die "failed to open $in_file\n";
    my @data;
    my $reading_what="header";
    my $l;
    my ($x,$y);
    while($l=<F>){
        chomp $l;
        #print "-$l-\n";
        if($l eq "|"){
            $reading_what="header";
            next;
        }
        if($reading_what eq "header"){
            push(@data,[$l]);
            $reading_what="xy";
        } else {
            $l=~m/(.+?)\s+(.+)/;
            ($x,$y)=($1,$2);
            push(@{$data[$#data]->[1]},[$x,$y]);
        }
    }
    close(F);
    return @data;
}
sub PrintData{
    for(@_){
        print $_->[0]."\n";
        for my $p (@{$_->[1]}){
            print $p->[0]." ".$p->[1]."\n";
        }
        print "|\n";
    }
}