• Status: Solved

Counting lines in a file

Hi,

I have a file that contains many lines of the form

name1:name2:                          23

that is, basically of the form

name1:name2:                          (some positive number)

Can you help me do the following?

1) Count the number of lines with name1.
2) Count the unique lines with "name1:name2".
3) Sum the totals (3rd column) for each of the two cases above, 1) and 2).


Vlearns
Asked:
Vlearns (Author) commented:
Actually, the format is

name1:name2:name3                          (some positive number)


name1:name2:name3                          23

woolmilkporc commented:
1) grep -c "^name1:" inputfile

2) grep -c "^name1:name2:" inputfile

3a)

awk  '/^name1:/ {count=count+$NF} END {print count+0}' inputfile

3b)

awk  '/^name1:name2:/ {count=count+$NF} END {print count+0}' inputfile

wmp

Vlearns (Author) commented:
Hi, thanks for the response. I will not know all the name1 values ahead of time; name1 is a field that can hold different names,


like
aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

I want to group by the name1 field,


like (total)aaaa:dontcare   total count = 13
      (total)bbbbbbb:dontcare   total count = 14

so on...


   
Vlearns (Author) commented:
I guess the algorithm is


Total count for "name1:name2:name3":

1) Look at each line of the file.
2) If name1:name2:name3 is new and unseen, add it to the set of seen entries and increment the count by the last column (some positive number).
3) If name1:name2:name3 has already been seen (exists in the set), increment the count of that entry in the set by 1.

Total number of unique name1:name2:name3:

1) Look at every line.
2) If name1:name2:name3 is new, then increment the count by 1.
3) If name1:name2:name3 has been seen before, don't increment the count.
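
The two procedures above can probably be done in a single awk pass. What follows is only a rough sketch under a few assumptions that are not stated in the thread: the key is everything before the first run of blanks, the number is the last field, and the number is added for every occurrence of a key (rather than adding 1 for repeats), since that matches the sums asked for in the original question. The function name summarize is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: per-key totals plus the number of unique keys.
# Assumes each non-blank line looks like "name1:name2:name3   <number>".
summarize() {
    awk 'NF >= 2 {
             if (!($1 in sum)) uniq++   # first time this key is seen
             sum[$1] += $NF             # add the last column every time
         }
         END {
             for (k in sum) print k, sum[k]
             print "unique keys:", uniq + 0
         }' "$@"
}
```

It could be called as "summarize inputfile" or fed from a pipe. Note that awk's "for (k in sum)" visits keys in no particular order, so pipe the result through sort if the order matters.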





woolmilkporc commented:
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp

woolmilkporc commented:
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

woolmilkporc commented:
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

woolmilkporc commented:
Add "inputfile"  at the end of each awk line!

Vlearns (Author) commented:
Hi, thanks.

How can I change your code to calculate unique entries, given input like

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL :            3
user3: MAP:AD:TOTAL:            10



such that my output file is


user1:MAP:AD:TOTAL:    48 (45+3)
user2:MAP:AD:TOTAL      23
user3: MAP:AD:TOTAL:            10


Then I can run wc -l on the new file to count the number of lines, right?




Vlearns (Author) commented:
maybe start with
 grep "MAP:AD:TOTAL:" 23.map  > /home/u


woolmilkporc commented:
awk does not have true multidimensional arrays (it only emulates them with string keys joined by SUBSEP), so your new requirement would imply much more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
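
For reference, the emulation that POSIX awk does offer is a comma inside the subscript, which joins the parts into one string key using the built-in SUBSEP variable. A minimal sketch of grouping on two fields that way, assuming colon-separated fields with the number after the last blank of the final field; the function name sum_by_pair is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sum the trailing number per (field1, field2) pair,
# using the comma subscript that awk joins with SUBSEP.
sum_by_pair() {
    awk -F: 'NF >= 3 {
                 n = $NF
                 sub(/^.*[ \t]/, "", n)        # keep only the text after the last blank
                 total[$1, $2] += n
             }
             END {
                 for (key in total) {
                     split(key, part, SUBSEP)  # recover the two subscripts
                     print part[1] ":" part[2], total[key]
                 }
             }' "$@"
}
```

Usage would be "sum_by_pair inputfile". The keys are still plain strings underneath, which is why split on SUBSEP is needed to get the two parts back.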

woolmilkporc commented:
... or what do you mean by "48 (45+3)"?

Vlearns (Author) commented:
48 is the sum of the two values for user1; ignore the bracketed stuff.

woolmilkporc commented:
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL:            3
user3:MAP:AD:TOTAL:            10

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile

wmp

woolmilkporc commented:
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
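
If cleaning up the input by hand is not practical, the blanks around the colons could be stripped inside awk before summing. The following is only a sketch, assuming the number is always the last whitespace-separated field and everything before it is the key; the function name sum_normalized is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: tolerate blanks around the colons before summing.
# Assumes the number is the last whitespace-separated field on the line.
sum_normalized() {
    awk 'NF >= 2 {
             n = $NF                               # the trailing number
             key = $0
             sub(/[ \t]+[^ \t]+[ \t]*$/, "", key)  # drop the number and the blanks before it
             gsub(/[ \t]*:[ \t]*/, ":", key)       # "TOTAL :" and "user3: MAP" become tight
             sub(/^[ \t]+/, "", key)               # drop leading blanks, if any
             sum[key] += n
         }
         END {
             for (k in sum) { lines++; print k, sum[k] }
             print "Number of Lines:", lines + 0
         }' "$@"
}
```

Call it as "sum_normalized inputfile". Blank lines are skipped by the NF test, and the output order of the keys is not guaranteed, so sort it if needed.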

arnold commented:
Is perl an absolute necessity/requirement?

Here is a code example that builds a hash table reference keyed on name1 and then counts each occurrence.

You can have a hash reference in which
    name1 refers to another reference, in which
        name2 refers to another reference, etc.
Then, when you dereference to extract the data, you can sum the counts at each level to get the totals per instance:
name1 total
name1:name2 total
name1:name2:name3 total etc.

                                                   
#!/usr/bin/perl
use strict;
use warnings;

my $test = "count";
my $hash_reference = {};
# Reads data from standard input and treats each whole line as a single key (one column, name1).
# To work on name1 only, use my @array = split(/:/, $_); then $array[0] holds name1
# and $#array is the index of the last element (number of elements minus one).
while (<>) {
       chomp;
       $hash_reference->{$_}->{$test} += 1;
}
foreach my $key_hash (keys %$hash_reference) {
        foreach my $second_key (keys %{ $hash_reference->{$key_hash} }) {
                print "$second_key\n";
                print "$key_hash: " . $hash_reference->{$key_hash}->{$test} . "\n";
        }
}
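
The per-level totals described above (name1, name1:name2, name1:name2:name3) can also be approximated in awk, in keeping with the earlier answers, by adding each line's number to every prefix of its key. This is a sketch only, assuming the key contains no blanks and the number is the last field; prefix_totals is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: totals for every key prefix
# (name1, name1:name2, name1:name2:name3, ...).
prefix_totals() {
    awk 'NF >= 2 {
             n = $NF
             count = split($1, part, ":")
             prefix = ""
             for (i = 1; i <= count; i++) {
                 if (part[i] == "") continue   # skip the empty piece after a trailing colon
                 prefix = (prefix == "" ? part[i] : prefix ":" part[i])
                 total[prefix] += n
             }
         }
         END { for (p in total) print p, total[p] }' "$@"
}
```

Running "prefix_totals inputfile | sort" would list every level with its sum, which plays the same role as walking the nested hash references level by level.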

Question has a verified solution.
