
Store value from text file into array and appending a unique value

Posted on 2011-10-03
Last Modified: 2012-08-13
Hi all,

I have a list of values stored in a text file as per below:
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024

I am trying to add an extra column with a unique value/number at the end of each line before storing the output into an array:

1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040|1
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040|2
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024|3
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024|4

I have tried the following code but when I print the output, only one line is displayed instead of all 4.

      my $i=1;

      while(<GL>) {

      @result = split /\|/;
      push(@result, "|$i");
      $i++;

      }
      close(GL);
      return @results;
      
Would appreciate some advice. Thank you
Jason
      
Question by:Jason_Sutiono
24 Comments
 
LVL 12

Expert Comment

by:tel2
ID: 36908372
Hi Jason,

Could you please tell me what this is for?  Is it homework?

You have a one dimensional array, but it sounds as if you're wanting to store 2 dimensions of data into it.  Do you want each row in a separate element of a 1 dimensional array, like this:
    $result[1] = '1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040|1'
    $result[2] = '1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040|2'
or split the fields up, and store them in a 2 dimensional array like this:
    $result[1][0] = '1BSH0008655'
    $result[1][1] = 'AEO0001'
    $result[1][2] = '1BSH0008200'
    $result[1][3] = 'CI'
    $result[1][4] = 'SPPNP3040'
    $result[1][5] = 1
    $result[2][0] = '1BSH0008655'
    ...etc...

Right now you're just overwriting the same one-dimensional array with the fields of each row.

Also, where did you open the GL file?
Also, you're trying to return @results (note the "s"), but the data is in @result.
Also, this doesn't look like a subroutine, so you can't "return" from it anyway.  Or is this just part of the code?
Also, you should probably do a "chomp;" before the "...split..." line, to remove the newline from the last field.
Also, instead of pushing "|$i", you should just push $i, since you have split the fields based on the "|" separator.

You could detect some of these issues by using the trick ozo gave you in your first question, i.e.:
    perl -Mdiagnostics yourscript.pl
and/or you could put this at the beginning of your code:
    use strict;
    use warnings;
But then you have to declare everything (e.g. with "my").
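Pulling those points together, a minimal sketch of the loop might look like this. The sub name number_lines and the in-memory sample data are made up for illustration, and it assumes the 1D layout (one string per line):

```perl
use strict;
use warnings;

# Hypothetical helper: takes an open filehandle and returns one string
# per input line, with "|<counter>" appended.
sub number_lines {
    my ($fh) = @_;
    my @result;
    my $i = 1;
    while (my $line = <$fh>) {
        chomp $line;                 # drop the newline before appending
        push @result, "$line|$i";    # push adds a new element on each pass
        $i++;
    }
    return @result;
}

# Read from an in-memory string so the sketch runs without a data file.
my $sample = "A|B|SPPNP3040\nA|B|SPPNP3024\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
my @rows = number_lines($fh);
close $fh;

print "$_\n" for @rows;
```

Because push appends rather than assigns, earlier lines are no longer overwritten.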

Awaiting your answers to my questions above...

tel2
 
LVL 9

Expert Comment

by:oheil
ID: 36908482
Only the last line is written out, because you overwrite the array @result in each step of the while loop.

      my $i=1;

      @result = ();
      while(<GL>) {

      chomp;
      @tmp_result = split /\|/;
      push(@tmp_result, "$i");

      push(@result, [@tmp_result]);
      $i++;

      }
      close(GL);
      return @result;


Now @result is an array of arrays. Each array in @result contains the split elements of the corresponding line plus the new integer value.
If you want an array of the lines, change
  push(@result, [@tmp_result]);
to
  $tmp_result = join('|', @tmp_result);
  push(@result, $tmp_result);

Oli

 
LVL 28

Expert Comment

by:FishMonger
ID: 36910576
Based on the problem statement I would not assume the OP wants a 2D array and based on my read, a simple assignment would suffice.

#!/usr/bin/perl

use strict;  
use warnings;

my $i = 0;
my @result;

while ( <DATA> ) {
    chomp;
    $result[$i++] = "$_|$.\n";
}
print @result;

__DATA__
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024

 
LVL 28

Expert Comment

by:FishMonger
ID: 36910626
If a 2D array is what is wanted, then I'd probably do it this way.
#!/usr/bin/perl

use strict;  
use warnings;
use Data::Dumper;

my $i = 0;
my @result;

while ( <DATA> ) {
    chomp;
    $result[$i++] = [ split /\|/ ];
}
print Dumper \@result;

__DATA__
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024


Which outputs:
$VAR1 = [
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3040'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3040'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3024'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3024'
          ]
        ];
 
LVL 12

Expert Comment

by:tel2
ID: 36912785
Not bad, FishMonger, except I don't think you can simply do this:
    $result[$i++] = [ split /\|/ ];
as Jason's wanting $i on the end, so I guess something like this is needed:
    $result[$i] = [ split /\|/, "$_|$i" ];
    $i++;
 
LVL 28

Expert Comment

by:FishMonger
ID: 36913104
tel2,

But why the 2D array?  The OP never indicated that the 2D was needed/wanted.  If it isn't wanted, then my first post would be the best option of what's been suggested so far.

For the 2D assignment, the $i array index would need to be initialized to 0 and the value being added to the end, based on the OP's sample, would need to be $i + 1.

I'd use post increment on $i and $. for the value to accomplish that. (assuming all lines in the file are being processed and none are skipped.)
$result[$i++] = [ split(/\|/, $_), $. ];
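
For anyone unfamiliar with $., here is a small sketch of how it behaves. The in-memory sample data is made up, and push is used in place of an explicit index purely for brevity:

```perl
use strict;
use warnings;

my $sample = "A|B\nC|D\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";

my @result;
while (<$fh>) {
    chomp;
    # $. holds the 1-based number of the line most recently read.
    push @result, [ split(/\|/), $. ];
}
close $fh;

print join('|', @$_), "\n" for @result;
```

Note that $. only matches the value wanted when every line of the file is read and none are skipped.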

 
LVL 12

Expert Comment

by:tel2
ID: 36913345
Hi FishMonger,

> But why the 2D array?
I assume you mean something like: "Good point, but why the 2D array?".  We can only guess what Jason wants, and I discussed the 2 possibilities (1D & 2D) in my post, as did you.  I guessed 2D is possibly what he's wanting, based on the fact that he had "split" the fields of each line into separate array elements.  Maybe he split it without good reason.  Maybe he split it so he could easily access the individual fields later.  I don't think we're seeing all his code, so who knows what he's gonna do after this section.  That's why I covered both possibilities, and asked him the question "Do you want each row...".  But yes, 1D is most likely all he needs, though.

> For the 2D assignment, the $i array index would need to be initialized to 0...
Good point, or it could be initialised to 1 (as Jason did), and then the "+ 1" would not be required.

> I'd use post increment on $i and $. for the value to accomplish that.
Nice.
 
LVL 28

Expert Comment

by:FishMonger
ID: 36913451
>> For the 2D assignment, the $i array index would need to be initialized to 0...
>Good point, or it could be initialised to 1 (as Jason did), and then the "+ 1" would not be required.

But then you'd need $i - 1 when indexing the array, or later on account for $result[0] being undef.
 
LVL 12

Expert Comment

by:tel2
ID: 36913487
Not if I was doing it as I'd mentioned above, which was:
    $result[$i] = [ split /\|/, "$_|$i" ];
    $i++;
Or have I misunderstood you?
 
LVL 28

Expert Comment

by:FishMonger
ID: 36913613
Arrays are zero indexed, so if you initialize $i to 1, then your first assignment will be assigning the second element, not the first.
#!/usr/bin/perl

use strict;  
use warnings;
use Data::Dumper;

my $i = 1;
my @result;

while ( <DATA> ) {
    chomp;
    $result[$i] = [ split /\|/, "$_|$i" ];
    $i++
}
print Dumper \@result;

__DATA__
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|SPPNP3024


Which outputs:
$VAR1 = [
          undef,
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3040',
            '1'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3040',
            '2'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3024',
            '3'
          ],
          [
            '1BSH0008655',
            'AEO0001',
            '1BSH0008200',
            'CI',
            'SPPNP3024',
            '4'
          ]
        ];
 
LVL 12

Expert Comment

by:tel2
ID: 36913661
I know, but what problem does that cause, FishMonger?
 
LVL 28

Expert Comment

by:FishMonger
ID: 36913700
It means that you need to remember to account for that undefined element later in the script.  Forgetting to account for it later introduces a common programming bug known as "off by 1".  http://en.wikipedia.org/wiki/Off-by-one_error

A better question to ask yourself is, "why intentionally code it in such a way that easily introduces bugs, when the proper solution would be to use the correct indexes?"
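
A small sketch of the difference, using made-up sample data. Starting the index at 1 leaves a hole at element 0, whereas push needs no index at all:

```perl
use strict;
use warnings;

my @lines = ('a|b', 'c|d');

# Starting the index at 1 leaves $by_one[0] undefined.
my @by_one;
my $i = 1;
$by_one[$i++] = $_ for @lines;

# push needs no index, so the first row lands in element 0.
my @by_push;
push @by_push, $_ for @lines;

printf "by_one has %d elements, first is %s\n",
    scalar @by_one, defined $by_one[0] ? $by_one[0] : 'undef';
printf "by_push has %d elements, first is %s\n",
    scalar @by_push, $by_push[0];
```

Any later loop over @by_one has to skip or test for that undefined first element.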
 

Author Comment

by:Jason_Sutiono
ID: 36913761
Hi guys,

Thanks for the feedback.  @tel2 yes, I have only displayed a portion of the code. It's for work actually and I'm obviously new to Perl. Not much of a programmer either. But I do find learning Perl quite fun (if you do get the code working, of course!)

Basically I need to process over 200k lines from a text file. After putting each line into array and assigning the integer, I would need to access the individual fields. Would you guys suggest using the 2d array or 1d array? I'll need time to try out  the aforementioned suggestions. I really appreciate the overwhelming support.
 
LVL 28

Expert Comment

by:FishMonger
ID: 36913879
Based on the info you've given so far a 1D array would be the proper choice.
 
LVL 12

Expert Comment

by:tel2
ID: 36913944
Hi FishMonger,

Fair point about the "off by 1" thing.  Depends on the kind of accessing that's going to happen next, I guess, but 0 is safer.  It's just that many (human) types prefer 1, I guess, and I was trying to cater to that preference that seemed to be evident in Jason's original post.

Why are you suggesting a 1D array if Jason says he "would need to access the individual fields"?
 
LVL 28

Expert Comment

by:FishMonger
ID: 36914127
I'm suggesting the 1D array because it's a little easier to deal with for beginners and in most cases I try to follow the "KISS" principle.  However, since we don't have enough details to see the bigger picture, the choice comes down to a flip of the coin.  It may turn out that the 2D array is a better choice.

The approaches suggested so far dictate that you loop over the data at least twice.  In general, that may not be the best approach.  I generally try to loop over the data once, processing each record as I go.  However, that is not always possible or the best approach.

The devil is in the details which we don't have.
 

Author Comment

by:Jason_Sutiono
ID: 36914135
Hi all,

To make it easier I'll outline all that I am trying to achieve.

I have got the following text file as the input:

Col 0|Col 1|Col 2|Col 3|Col 4|Col 5|Col 6|Col 7|Col 8|Col 9|Col 10|Col 11
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|-381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|-45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|107.00|1|15-JUN-2004|SPPNP3044
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|207.00|1|15-JUN-2004|SPPNP3044
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|423|BN03054C|0001Q1|427.50|1|15-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|423|BN03054C|0001Q1|527.50|1|15-JUN-2004|SPPNP3040



I need to sum the amount in column 8 based on the reference column (column 5). If the sum is 0, exclude them. otherwise, include all entries in the output file.

The output I am trying to achieve is as per below:

1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|107.00|1|15-JUN-2004|SPPNP3044
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|207.00|1|15-JUN-2004|SPPNP3044
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|423|BN03054C|0001Q1|427.50|1|15-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|423|BN03054C|0001Q1|527.50|1|15-JUN-2004|SPPNP3040

Because there is no unique 'primary id' for each line, I appended the extra integer to act as a 'dummy id', since a hash would keep only one line per unique key.

I would later on drop the dummy ID to allow me to display everything that has a difference in the sum value. Otherwise, I will end up with:(which I am trying to avoid)

1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|307.00|1|15-JUN-2004|SPPNP3044
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|423|BN03054C|0001Q1|955|1|15-JUN-2004|SPPNP3040

I have attached my code. I tried to incorporate FishMonger's suggestion, but I am getting the error 'use of uninitialized value in concatenation (.) or string ./reconcile.pl line 27'. Also, I get the following lines in the output file:

|ARRAY(0x80e6a10)|ARRAY(0x8063504)|ARRAY(0x80635b8)|ARRAY(0x806366c)|ARRAY(0x8063720)|ARRAY(0x80637d4)|ARRAY(0x8063888)|ARRAY(0x80e8618)|ARRAY(0x80e86cc)|ARRAY(0x80e8780)|ARRAY(0x80e8834)

Hope it's clear enough. Thanks in advance!

#!/usr/bin/perl -s

use strict;
use warnings;
use Data::Dumper;

#-------------------------------------------------------------------------
# Configuration Variables
my $data = "/u/xi6505/pronto/cus/imports/test.txt";
my $file = "/u/xi6505/pronto/cus/imports/sample.csv";
#-------------------------------------------------------------------------

my @cola;
my %price;
my %columns;
my $num=0;

	@cola = process_import_file($data);
	
	#print Dumper \@cola;
	
	for (@cola){
	chomp;
	$price{$cola[5]} += $cola[8];
    $a{$cola[5]}{$cola[12]}="$cola[0]|$cola[1]|$cola[2]|$cola[3]|$cola[4]|$cola[5]|$cola[6]|$cola[7]|$cola[8]|$cola[9]|$cola[10]|$cola[11]";
	
	}
	

open(INFO, ">$file");	# Open for output

foreach my $id (sort keys %a) {
   
   
   foreach my $name (keys %{$a{$id}}) { #for each price(id)
		if ($price{$id} != $num) {

	print INFO "$a{$id}{$name}\n";

	 
	 }
	 }
	 }
	 
	 close INFO;
	 

   
# Takes file name as input and returns
sub process_import_file {
	
	open(GL,$data) or debug(0,"Can't parse po file - $data",1);
	my @result;
	my $i=1;
	
	while(<GL>) {
    chomp;
    $result[$i] = [ split /\|/, "$_|$i" ];
    $i++;
	
}

	
	close(GL);
	return @result;
	
	
}

 
LVL 12

Expert Comment

by:tel2
ID: 36914212
Hi Jason,

Based on the info you gave prior to your last post, i.e. you "would need to access the individual fields", I think a 2D array would be the proper choice, as that would allow easy access to every field of every row.

Having seen your last post, it looks as if there's more work left than I have time for, so I'll leave this to FishMonger or anyone else who has time.
 
LVL 12

Expert Comment

by:tel2
ID: 36914256
...that is, my 2D array suggestion, is, in my view, in line with the "KISS" principle, because it's generally simpler to access individual fields when they are in individual array elements.
 
LVL 28

Expert Comment

by:FishMonger
ID: 36914377
The warning is due to the fact that your sub is building a 2D array, but you then loop over that array as if it were a 1D array.
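
To illustrate the mismatch, here is a rough sketch with made-up, shortened rows in the same shape the sub returns (one array reference per line). The fields are reached through the loop variable, not through the outer array:

```perl
use strict;
use warnings;

# Two sample rows: each element is a reference to an array of fields,
# with the dummy ID as the last field.
my @rows = (
    [ split /\|/, 'X|Y|Z|CI|03-MAY-2004|44|BN|K|-381.64|1|07-JUN-2004|S1|1' ],
    [ split /\|/, 'X|Y|Z|CI|03-MAY-2004|44|BN|K|381.64|1|07-JUN-2004|S1|2' ],
);

my %price;
for my $row (@rows) {
    # $row->[5] is the reference field and $row->[8] the amount.
    # $rows[5] would instead be the sixth row, which does not exist here.
    $price{ $row->[5] } += $row->[8];
}

printf "%s => %.2f\n", $_, $price{$_} for sort keys %price;
```

With the index starting at 1 in the posted sub, the loop would also have to skip the undefined first element.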

You also didn't declare the %a hash, which was probably a typo when posting, because that missing declaration would generate compilation errors preventing the script from running.

Unless the data needs to be sorted by "col5", I'd not use either the 2D array or the HoH.  Instead, I'd simply process the file line-by-line which is more efficient.  If you do need it sorted by that field, I'd use a HoA (Hash-of-Arrays).
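
As a rough idea of what the HoA could look like (the helper name group_by_reference and the shortened sample lines are invented for this sketch):

```perl
use strict;
use warnings;

# Hypothetical helper: group raw lines by field 5 (the reference) into a
# hash of arrays, so duplicate lines are kept without a dummy ID.
sub group_by_reference {
    my @lines = @_;
    my %group;
    for my $line (@lines) {
        my $ref = (split /\|/, $line)[5];
        push @{ $group{$ref} }, $line;
    }
    return %group;
}

my @sample = (
    'a|b|c|CI|03-MAY-2004|425|CM|PM|45.00|1|15-JUN-2004|S24',
    'a|b|c|CI|03-MAY-2004|44|BN|5K|-381.64|1|07-JUN-2004|S40',
    'a|b|c|CI|03-MAY-2004|425|CM|PM|-45.00|1|15-JUN-2004|S24',
);
my %group = group_by_reference(@sample);

# Numeric sort on the reference field.
for my $ref (sort { $a <=> $b } keys %group) {
    print "$ref: ", scalar @{ $group{$ref} }, " line(s)\n";
}
```

Each hash value is an array reference holding every line for that reference, in input order.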

Other recommendations:

1) Fix your indentation, it's all over the map.

2) Use the 3 arg form of open and a lexical var for the filehandle.

3) ALWAYS check the return code of your open calls and take proper action if they fail.

If you don't need the sorting, your script could be reduced to this:
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

# Configuration Variables
my $data = "/u/xi6505/pronto/cus/imports/test.txt";
my $file = "/u/xi6505/pronto/cus/imports/sample.csv";
my $num  = 0;

open(my $gl_fh, '<', $data) or debug(0, "Can't parse po file '$data' <$!>", 1);
open(my $csv_fh, '>', $file) or die "can't open '$file' <$!>";

while (my $line = <$gl_fh> ) {
    my @csv_data = split /\|/, $line;
    next if $csv_data[5] + $csv_data[8] == $num;
    print {$csv_fh} $line;
}

close $gl_fh;
close $csv_fh;

 
LVL 12

Expert Comment

by:tel2
ID: 36914402
Nice work, FishMonger.

Keeping you on...

tel2
 
LVL 28

Expert Comment

by:FishMonger
ID: 36914434
Since we're only concerned with 2 fields in the decision, an array slice would be better.

    my ($fld5, $fld8) = (split /\|/, $line)[5,8];
    next if $fld5 + $fld8 == $num;

 

Author Comment

by:Jason_Sutiono
ID: 36914573
Hi Fishmonger,

Thank you for your help. I just tried your code, but it is printing everything as it is to the output instead of adding col 8 and excluding the lines if the sum is 0.

Here is my input

1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|-381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|-45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|107.00|1|15-JUN-2004|SPPNP3044


After I run the code the output is:
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|-381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|44|BN03054B|00015K|381.64|1|07-JUN-2004|SPPNP3040
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|425|CM03054G|0001PM|-45.00|1|15-JUN-2004|SPPNP3024
1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|107.00|1|15-JUN-2004|SPPNP3044


Instead of printing just:

1BSH0008655|AEO0001|1BSH0008200|CI|03-MAY-2004|424|BN030504|0001PW|107.00|1|15-JUN-2004|SPPNP3044

since the 1st 2 lines (-381.64+381.64) = 0 based on reference 44  and 3rd and 4th lines (45+-45) = 0 based on reference 425 and so are excluded.

I guess I wasn't clear in explaining my requirements earlier.

Would appreciate your advice again. Thank you.
 
 
LVL 28

Accepted Solution

by:
FishMonger earned 1000 total points
ID: 36921792
Try this:
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

# Configuration Variables
my $data = "/u/xi6505/pronto/cus/imports/test.txt";
my $file = "/u/xi6505/pronto/cus/imports/sample.csv";
my $num  = 0;

# debug() is the helper from the original script (not shown here).
open(my $gl_fh, '<', $data) or debug(0, "Can't parse po file - $data <$!>", 1);
open(my $csv_fh, '>', $file) or die "can't open '$file' <$!>";

# Assumes the two lines of each pair are adjacent in the input.
while (my $line1 = <$gl_fh>) {
    # A trailing unpaired line has nothing to cancel against, so keep it.
    print {$csv_fh} $line1 and last if eof($gl_fh);

    my $line2 = <$gl_fh>;
    my $num1  = (split /\|/, $line1)[8];
    my $num2  = (split /\|/, $line2)[8];
    next if $num1 + $num2 == $num;    # pair cancels out, so skip both lines
    print {$csv_fh} $line1, $line2;
}

close $gl_fh;
close $csv_fh;
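
The loop above pairs each line with the one that follows it. If matching lines might not be adjacent, or a reference could have more than two lines, one possible alternative is to total the amounts per reference first and then filter. This is only a sketch: the helper name filter_nonzero_groups, the shortened sample lines, and the half-cent tolerance are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only lines whose reference (field 5) has a
# non-zero total in the amount column (field 8). Input order is kept.
sub filter_nonzero_groups {
    my @lines = @_;
    my %sum;
    for my $line (@lines) {
        my ($ref, $amount) = (split /\|/, $line)[5, 8];
        $sum{$ref} += $amount;
    }
    # Assumption: amounts have two decimals, so anything under half a
    # cent is treated as zero to sidestep floating-point rounding.
    return grep { abs( $sum{ (split /\|/)[5] } ) >= 0.005 } @lines;
}

my @input = (
    'a|b|c|CI|03-MAY-2004|44|BN|5K|-381.64|1|07-JUN-2004|S40',
    'a|b|c|CI|03-MAY-2004|425|CM|PM|45.00|1|15-JUN-2004|S24',
    'a|b|c|CI|03-MAY-2004|44|BN|5K|381.64|1|07-JUN-2004|S40',
    'a|b|c|CI|03-MAY-2004|424|BN|PW|107.00|1|15-JUN-2004|S44',
    'a|b|c|CI|03-MAY-2004|425|CM|PM|-45.00|1|15-JUN-2004|S24',
);

print "$_\n" for filter_nonzero_groups(@input);
```

This holds every line in memory, which should be manageable for a couple of hundred thousand short lines, and it needs no dummy ID because the lines live in an array rather than as hash keys.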
