• Status: Solved

# Counting lines in a file

Hi,

I have a file that contains many strings of the form

name1:name2:                          23

basically of the form

name1:name2:                          (some positive number)

Can you help me do the following?

1) Count the number of lines with name1
2) Count the unique lines with "name1:name2"
3) Sum the total (3rd column) for each unique key in cases 1) and 2) above

Vlearns

Author Commented:
Actually, the format is

name1:name2:name3                          (some positive number)

name1:name2:name3                          23

Commented:
1) grep -c "name1" inputfile

2) grep -c "name1:name2" inputfile

3a)

awk  '/name1/ {count=count+$NF} END {print count}' inputfile

3b)

awk  '/name1:name2/ {count=count+$NF} END {print count}' inputfile

wmp
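
One caveat with unanchored patterns like these: `name1` matches anywhere in the line, so a key such as `name10:...` would be counted too. A minimal sketch of an anchored variant follows; the sample file and the function names `count_name1` and `sum_name1` are made up for illustration and are not from the thread.

```shell
# Hypothetical sample data, not from the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name1:name2:name3     23
name1:other:name3     5
name10:name2:name3    7
EOF

# Count lines whose first colon-separated field is exactly name1.
count_name1() { grep -c '^name1:' "$1"; }

# Sum the last whitespace-separated column for those same lines.
sum_name1() { awk '/^name1:/ {sum += $NF} END {print sum + 0}' "$1"; }

count_name1 "$sample"   # prints 2
sum_name1 "$sample"     # prints 28
```

The `^` anchors the match at the start of the line and the trailing `:` stops `name1` from matching a longer name.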

Author Commented:
Hi, thanks for the response. I will not know all the name1 values ahead of time; name1 is a field that can hold different names,

like
aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

I want to group by the name1 field,

like (total)aaaa:dontcare   total count = 13
(total)bbbbbbb:dontcare   total count = 14

so on...


Author Commented:
I guess the algorithm is:

Total for each "name1:name2:name3":

1) Look at each line of the file.
2) If name1:name2:name3 is new and unseen, add it to the set of seen entries and increment its count by the last column (some positive number).
3) If name1:name2:name3 has already been seen (exists in the set), increment the count of that entry in the set by 1.

Number of unique name1:name2:name3:

1) Look at every line.
2) If name1:name2:name3 is new, then increment the count by 1.
3) If name1:name2:name3 has been seen before, don't increment the count.
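
The two tallies described above can be sketched in awk roughly as follows. This assumes the key is the first whitespace-separated field and the number is the last one, and it assumes the total should add the last column on every occurrence of a key (not 1 for repeats). The sample data and the function names `sums_by_key` and `unique_keys` are hypothetical.

```shell
# Hypothetical sample data, not from the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaaa:cccc:bbbb     1
aaaa:cccc:bbbb     4
bbbb:iiii:nnnn     2
EOF

# Total of the last column for each distinct key, sorted by key.
sums_by_key() {
    awk 'NF >= 2 {sum[$1] += $NF}
         END {for (k in sum) print k, sum[k]}' "$1" | LC_ALL=C sort
}

# Number of distinct keys: count a key only the first time it is seen.
unique_keys() {
    awk 'NF >= 2 && !seen[$1]++ {n++} END {print n + 0}' "$1"
}

sums_by_key "$sample"   # prints "aaaa:cccc:bbbb 5" then "bbbb:iiii:nnnn 2"
unique_keys "$sample"   # prints 2
```

The associative array plays the role of the "set of seen entries": a key that is not yet in the array is new, and `seen[$1]++` marks it as seen.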


Commented:
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp

Commented:
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Commented:
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

Commented:
Add "inputfile"  at the end of each awk line!
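
To see what the `substr`/`match` part does: with `-F:` the last field still holds the final name plus the padding and the number (for example `name3      23`). `match` returns the position of the first blank or tab, `substr` keeps everything from there on, and awk converts that to a number when adding. A small sketch with a hypothetical input line and a made-up helper name `tail_number`:

```shell
# Hypothetical one-line input, not from the thread.
line='name1:name2:name3                          23'

# Print the numeric tail of the last colon-separated field.
tail_number() {
    printf '%s\n' "$1" |
        awk -F: '{
            pos = match($NF, "[\t ]")      # position of first blank or tab
            print substr($NF, pos) + 0     # "   23" converts to the number 23
        }'
}

tail_number "$line"   # prints 23
```

If the last field contains no blank at all, `match` returns 0, `substr` returns the whole field, and a non-numeric field then counts as 0.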

Author Commented:
Hi, thanks.

How can I change your code

to calculate unique entries

such that my output file is

Then I can run wc -l on the new file and count the number of lines, right?


Author Commented:
grep "MAP:AD:TOTAL:" 23.map  > /home/u


Commented:
awk does not have true multidimensional arrays (it only simulates them by joining the subscripts with SUBSEP), so your new requirement would take much more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
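
For reference, awk does accept `A[i, j]` and stores it under the single string key `i SUBSEP j`, which is often enough for two-level grouping. A rough sketch, assuming keys of the form name1:name2:name3 followed by a number; the sample data and the function name `pair_sums` are hypothetical.

```shell
# Hypothetical sample data, not from the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaaa:cccc:bbbb     1
aaaa:cccc:dddd     4
aaaa:eeee:bbbb     2
EOF

# Sum the last column per (name1, name2) pair using a two-subscript array.
pair_sums() {
    awk '{
             split($1, part, ":")               # part[1]=name1, part[2]=name2
             total[part[1], part[2]] += $NF     # key is part[1] SUBSEP part[2]
         }
         END {
             for (key in total) {
                 split(key, k, SUBSEP)          # recover the two subscripts
                 print k[1], k[2], total[key]
             }
         }' "$1" | LC_ALL=C sort
}

pair_sums "$sample"   # prints "aaaa cccc 5" then "aaaa eeee 2"
```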

Commented:
... or what do you mean by "48 (45+3)"?

Author Commented:
48 is the sum of two values for user1; ignore the bracketed stuff.

Commented:
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile

wmp

Commented:
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
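
A quick way to try this command is sketched below. The sample lines are made up for illustration (only the 45 + 3 = 48 sum for user1 echoes the discussion above), and `summarize` is a hypothetical wrapper name.

```shell
# Hypothetical sample data with an empty line in the middle.
sample=$(mktemp)
cat > "$sample" <<'EOF'
user1:a:b     45

user1:a:b     3
user2:a:b     10
EOF

# Sum the last column per key, then report how many distinct keys were printed.
summarize() {
    awk '$0 != "" {A[$1] += $NF}
         END {
             for (N in A) {E++; print N, A[N]}
             print "\nNumber of Lines:", E
         }' "$1"
}

summarize "$sample"
```

For this sample the output contains `user1:a:b 48` and `user2:a:b 10` (awk does not guarantee the order of `for (N in A)`), followed by `Number of Lines: 2`. Testing `NF` instead of `$0 != ""` would also skip lines that contain only blanks.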

Commented:
Is Perl an absolute necessity/requirement?

Here is a code example that builds a hash reference keyed by name1 and then counts each occurrence.

You can have a hash reference keyed by name1 that refers to another hash reference keyed by name2, which in turn refers to another keyed by name3, and so on.
Then, when you dereference the structure to extract the data, you can sum the counts at each level to get the total per instance:
name1 total
name1:name2 total
name1:name2:name3 total etc.

```
#!/usr/bin/perl
use strict;
use warnings;

my $test = "count";
my $hash_reference = {};
# Reads from standard input (or files named on the command line) and expects
# one column of data (name1) per line; each whole line is used as the key.
# With my @array = split(/:/, $_); $array[0] holds name1 and $#array is the
# index of the last element (element count minus one).
while (<>) {
    chomp;
    $hash_reference->{$_}->{$test} += 1;
}
# Walk both levels of the structure and print each counter.
foreach my $key_hash (keys %$hash_reference) {
    foreach my $second_key (keys %{ $hash_reference->{$key_hash} }) {
        print "$second_key\n";
        print "$key_hash: " . $hash_reference->{$key_hash}->{$test} . "\n";
    }
}
```
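
The per-level totals described above (name1, name1:name2, name1:name2:name3) can also be sketched in awk by adding each line's number to every prefix of its key. This is only an illustration under the assumption that the key is the first whitespace-separated field and the number is the last; the sample data and the function name `prefix_totals` are hypothetical.

```shell
# Hypothetical sample data, not from the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaaa:cccc:bbbb     1
aaaa:cccc:dddd     4
aaaa:eeee:bbbb     2
EOF

# Add the last column to the running total of every key prefix.
prefix_totals() {
    awk '{
             n = split($1, part, ":")
             prefix = ""
             for (i = 1; i <= n; i++) {
                 # Build name1, then name1:name2, then name1:name2:name3.
                 prefix = (i == 1) ? part[i] : (prefix ":" part[i])
                 total[prefix] += $NF
             }
         }
         END {for (p in total) print p, total[p]}' "$1" | LC_ALL=C sort
}

prefix_totals "$sample"
```

For this sample the output includes `aaaa 7`, `aaaa:cccc 5`, and `aaaa:eeee 2`, plus one line per full key.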