Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 197
  • Last Modified:

counting lines in a file

HI

i have a file that many strings of the form

name1:name2:                          23

basically of the form


name1:name2:                          (some positive number)

can you help me do the following

1) count number of lines with name1
2) count uniques lines  with "name1:name2"
3) count the total (3rd colum) per unique 1) and 2) {above two caess}


0
Vlearns
Asked:
Vlearns
  • 9
  • 6
1 Solution
 
VlearnsAuthor Commented:
actually the format is

name1:name2:name3                          (some positive number)


name1:name2:name3                          23
0
 
woolmilkporcCommented:
1) grep -c "name1" inputfile

2) grep -c "name1:name2" inputfile

3a)

awk  '/name1/ {count=count+$NF} END {print count}' inputfile

3b)

awk  '/name1:name2/ {count=count+$NF} END {print count}' inputfile

wmp
0
 
VlearnsAuthor Commented:
Hi, thanks for the resposne, i will not know all the name1 ahead of times....name1 is a field that can have different names..


like
aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

i want to group by name1 field....


like (total)aaaa:dontcare   total count = 13
      (total)bbbbbbb:dontcare   total count = 14

so on...


   
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
VlearnsAuthor Commented:
i gues the algoritm is


total number of "name1:name2:name3"

1) look at each line of the file.
2) if name 1:name2:name3  is new and unseen, add it to the set of Seen, increment the count  by the (some positive number, last column)
3) if  name1:name2:name3 is already been seen(exists in the set), increment the count of that entry in the set by 1.

total number of  unique name1:name2:name3

1) look at every line.
2) if  name1:name2:name3 is new than increment count by 1
3) if name1:name2:name:3 is seen before, don't increment the count





0
 
woolmilkporcCommented:
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp
0
 
woolmilkporcCommented:
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile
0
 
woolmilkporcCommented:
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'
0
 
woolmilkporcCommented:
Add "inputfile"  at the end of each awk line!
0
 
VlearnsAuthor Commented:
hi thanks,


how can i change your code

to calculate  unique entries

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL :            3
user3: MAP:AD:TOTAL:            10



such that my output file is


user1:MAP:AD:TOTAL:    48 (45+3)
user2:MAP:AD:TOTAL      23
user3: MAP:AD:TOTAL:            10


then i can a wc -l on the new file and count the number of lines right?




0
 
VlearnsAuthor Commented:
maybe start with
 grep "MAP:AD:TOTAL:" 23.map  > /home/u


0
 
woolmilkporcCommented:
awk does not have multidimensional arrays, so your new requirement would imply doing a much more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
0
 
woolmilkporcCommented:
... or what do you mean with "48 (45+3) ?
0
 
VlearnsAuthor Commented:
48 is the sum of ttwo values for user1, ignore the bracketed stuff
0
 
woolmilkporcCommented:
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL:            3
user3:MAP:AD:TOTAL:            10

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile

wmp
0
 
woolmilkporcCommented:
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
0
 
arnoldCommented:
Is perl an absolute necessacity/requirement?

Here is a code example that will build a hash table reference to name1 and then count each occurrence.

you can have
a hash reference
                           name1 that refers to another reference
                                                         name2 that refers to another reference etc.
Then when you are going through the DE-referencing to extract the data, you can sum the counts in each sub to answer the unique count per instance
name1 total
name1:name2 total
name1:name2:name3 total etc.

                                                   
#!/usr/bin/perl

$test="count";
#the below will read in data from standard input and deals with only one column of data (name1). Using @array=split(/:/,$_); $array[0] will have the value of name1.$#array will report the number of elements in the array.
while (<>) {
       chomp();
       $hash_reference->{$_}->{$test}+=1;
}
foreach $key_hash (keys %$hash_reference) { 
        foreach $second_key (keys %{$hash_reference->{$key_hash}} )  { 
                print "$second_key\n";
                print "$key_hash: " . $hash_reference->{$key_hash}->{$test} . "\n";
         } 
}

Open in new window

0

Featured Post

Become an Android App Developer

Ready to kick start your career in 2018? Learn how to build an Android app in January’s Course of the Month and open the door to new opportunities.

  • 9
  • 6
Tackle projects and never again get stuck behind a technical roadblock.
Join Now