Jason_Sutiono


Perl Script to count the number of elements in an array

Hi all,

Would really appreciate some input on how to do the following with a Perl script that processes a text file.

Here is my input file:
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12
BLA|001036|S|3228|10|1|2|3|001036|W035|S|
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092918
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092926
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092934
BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092942
BLT|600123|S|3437|0|20|0|0|001036|W035|S|
BRO|900177|S|3531|-1|0|0|0|001036|W035|S|
CHL|123777|S|3327|3|0|0|0|001036|W035|S|
CHL|123777|S|3327|0|0|0|0|001036|W035|S|08961029909655093791
CHL|123777|S|3327|0|0|0|0|001036|W035|S|08961029909655093775

The final output that I am trying to achieve:
BLA|001036|S|3228|10|1|2|3|001036|W035|S| |4
BLT|600123|S|3437|0|20|0|0|001036|W035|S| |0
BRO|900177|S|3531|-1|0|0|0|001036|W035|S| |0
CHL|123777|S|3327|3|0|0|0|001036|W035|S| |2

Basically I am trying to count the number of strings that appear in the last column for each group of rows and append that count as a new column in the output file.

My references/main keys for the initial array are column 2 (001036) and column 4 (3228).

For each new occurrence of col 2 and col 4 (e.g. 001036 and 3228), the last column is always a space (" ").

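So whenever the last column is not " " (in Perl terms $col[11] ne " ", since array indexes start at 0 and ne is the string comparison), I need to count that row against the key row that appeared before it.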
W035|S|
W035|S|08961029909655092918
W035|S|08961029909655092926
W035|S|08961029909655092934
W035|S|08961029909655092942

As such, the outcome for line 1 would be:
BLA|001036|S|3228|10|1|2|3|001036|W035|S| |4

In other words, $lastcol(001036)(3228)=4

The count of the strings is appended to the last column.

I would also need to keep columns 5, 6, 7 and 8 from line 1 (the key row).

Likewise for 123777 and 3327, since 2 strings appear in the entries below it (08961029909655093791 and 08961029909655093775), the outcome is:
CHL|123777|S|3327|3|0|0|0|001036|W035|S| |2

If there are no entries below it, I would just append a 0 to the end of the line,
e.g. BLT|600123|S|3437|0|20|0|0|001036|W035|S| |0

I hope I am clear in my brief.

Looking forward to the responses!!

Thank you in advance!

Jason
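
A minimal sketch of the counting described above. This is not the accepted solution (which is not shown on this page), only one way it could be done. It assumes that a row with an empty or blank last column starts a group, that col 2 plus col 4 identify the group, and that the header line begins with "col1|". The name count_serials is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one output line per key row.
sub count_serials {
    my @lines = @_;
    my (@order, %head, %count);
    for my $line (@lines) {
        (my $row = $line) =~ s/[\r\n]+\z//;   # strip the line ending
        next if $row =~ /^col1\|/;            # assumed header row, skipped
        my @col = split /\|/, $row, -1;       # -1 keeps an empty last column
        next if @col < 12;                    # ignore short or blank lines
        my $key = "$col[1]|$col[3]";          # col 2 + col 4 identify a group
        if ($col[11] =~ /\S/) {
            $count{$key}++;                   # row carries a string: count it
        } elsif (!exists $head{$key}) {
            push @order, $key;                # remember the input order
            $col[11] = ' ';                   # last column is printed as " "
            $head{$key} = join '|', @col;     # keeps col 5 to 8 of the key row
            $count{$key} ||= 0;
        }
    }
    return map { "$head{$_}|$count{$_}" } @order;
}

# Small demo on a few rows taken from the question.
my @sample = (
    "col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12\n",
    "BLA|001036|S|3228|10|1|2|3|001036|W035|S|\n",
    "BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092918\n",
    "BLA|001036|S|3228|0|0|0|0|001036|W035|S|08961029909655092926\n",
    "BLT|600123|S|3437|0|20|0|0|001036|W035|S|\n",
);
print "$_\n" for count_serials(@sample);
```

To run it over a real file, the lines read from the filehandle could be passed in place of @sample.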
ASKER CERTIFIED SOLUTION
ozo
Jason_Sutiono

ASKER

Thanks Ozo for your help. I will be trying out your solution tomorrow.

Actually there is one thing I forgot to ask.

Input:
col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12
MCA|U8350WHT|S|3320|1|0|0|166.5400|U8350WHT|W007|S|
MCA|U8350WHT|S|3320|0|0|0|0|U8350WHT|W007|S|356899040614534
MEL|U8350WHT|S|3532|2|0|0|166.5400|U8350WHT|W007|S|
MEL|U8350WHT|S|3532|0|0|0|0|U8350WHT|W007|S|356899040614526
MEL|U8350WHT|S|3532|0|0|0|0|U8350WHT|W007|S|356899040614658
MOR|U8350WHT|S|3867|1|0|0|166.5400|U8350WHT|W007|S|
MOR|U8350WHT|S|3867|0|0|0|0|U8350WHT|W007|S|356899040614971
PEN|U8350WHT|S|3526|1|0|0|166.5400|U8350WHT|W007|S|
PEN|U8350WHT|S|3526|0|0|0|0|U8350WHT|W007|S|356899040614690

What should I do to get only the rows where the last column is equal to " "?

Outcome:
MCA|U8350WHT|S|3320|1|0|0|166.5400|U8350WHT|W007|S|
MEL|U8350WHT|S|3532|2|0|0|166.5400|U8350WHT|W007|S|
MOR|U8350WHT|S|3867|1|0|0|166.5400|U8350WHT|W007|S|
PEN|U8350WHT|S|3526|1|0|0|166.5400|U8350WHT|W007|S|

I have tried to filter on $col12 = " " as per the attached code, but it prints out everything, including the rows where the last column is not " ".

Help would be much appreciated.

Thank you!

foreach (<FILE4>) {
	chomp;                              # drop the trailing newline
	my @col = split /\|/, $_, -1;       # -1 keeps the empty last column
	# keep the row only if the last column has no non-whitespace character
	if ($col[11] !~ /\S/) {
		push(@soh, join('|', @col));
	}
}


@soh = grep /\|\s*$/,<FILE4>;
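
A small sketch of why that one-liner works where the attempt above did not. The pattern /\|\s*$/ only matches lines whose final "|" is followed by nothing but whitespace. In the attempt, "= ~" (with a space) is parsed as an assignment of a bitwise complement rather than the "=~" match operator, which is most likely why nothing was filtered out. The helper name blank_last_col is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only lines whose last pipe-delimited field is blank.
sub blank_last_col {
    return grep { /\|\s*$/ } @_;
}

# Demo on two rows from the question; only the first should be printed.
my @rows = (
    "MCA|U8350WHT|S|3320|1|0|0|166.5400|U8350WHT|W007|S|\n",
    "MCA|U8350WHT|S|3320|0|0|0|0|U8350WHT|W007|S|356899040614534\n",
);
print blank_last_col(@rows);
```

The kept lines still carry their trailing newline, which matters if anything is appended to them later.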
Hi Ozo,

Thank you for your help!!

It's almost there, just one thing though. The output that I get is:

BLA|001036|S|3228|10|1|2|3|001036|W035|S|
|4
CHL|123777|S|3327|3|0|0|0|001036|W035|S|
|2
BLT|600123|S|3437|0|20|0|0|001036|W035|S|
|0
BRO|900177|S|3531|-1|0|0|0|001036|W035|S|
|0


How do I get the count value to not print to a new line?

BLA|001036|S|3228|10|1|2|3|001036|W035|S| |4
CHL|123777|S|3327|3|0|0|0|001036|W035|S| |2
BLT|600123|S|3437|0|20|0|0|001036|W035|S| |0
BRO|900177|S|3531|-1|0|0|0|001036|W035|S| |0

Thank you in advance!!
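
A count that lands on its own line usually means each row still ends with the newline it was read with, so the "|4" gets appended after it. A minimal sketch of the fix, assuming that is the cause here: strip the line ending (chomp does this for "\n") before appending. The name append_count is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: append " |<count>" to a row on the same line.
sub append_count {
    my ($line, $count) = @_;
    $line =~ s/\s+\z//;            # like chomp, but also drops "\r" and a trailing space
    return "$line |$count\n";      # one newline, added after the count
}

print append_count("BLA|001036|S|3228|10|1|2|3|001036|W035|S|\n", 4);
```

The substitution is used instead of a plain chomp so that files with Windows line endings behave the same way.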
Thanks Ozo u rock!!