Solved

counting lines in a file

Posted on 2011-09-09
Last Modified: 2012-05-12
Hi,

I have a file that contains many lines of the form

name1:name2:                          23

that is, basically of the form


name1:name2:                          (some positive number)

Can you help me do the following?

1) Count the number of lines with name1
2) Count the unique lines with "name1:name2"
3) Compute the total (3rd column) for each unique key in 1) and 2) (the two cases above)


Question by:Vlearns
16 Comments
 

Author Comment

by:Vlearns
ID: 36513417
Actually, the format is

name1:name2:name3                          (some positive number)


name1:name2:name3                          23
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513465
1) grep -c "name1" inputfile

2) grep -c "name1:name2" inputfile

3a)

awk  '/name1/ {count=count+$NF} END {print count}' inputfile

3b)

awk  '/name1:name2/ {count=count+$NF} END {print count}' inputfile
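One caveat with the patterns above: a bare pattern such as "name1" matches anywhere on the line, so it would also count a line starting with name10, or one where name1 only appears in a later field. A minimal sketch that compares the first colon-separated field exactly, assuming the number is the last blank-separated field on the line (the function name count_first_field is made up for illustration):

```shell
# Hypothetical helper: count the lines whose first colon-separated field is
# exactly NAME, and sum their trailing numbers.
# Usage: count_first_field NAME [FILE...]
count_first_field() {
    name=$1
    shift
    awk -v name="$name" '
        NF == 0 { next }                      # skip empty lines
        {
            split($0, f, ":")                 # f[1] is name1
            if (f[1] == name) { lines++; total += $NF }
        }
        END { print lines + 0, total + 0 }    # "<matching lines> <sum>"
    ' "$@"
}
```

Called as count_first_field name1 inputfile, it prints the line count and the sum on one line.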

wmp
 

Author Comment

by:Vlearns
ID: 36513565
Hi, thanks for the response. I will not know all the name1 values ahead of time; name1 is a field that can hold different names,


like
aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

I want to group by the name1 field,


like (total)aaaa:dontcare   total count = 13
      (total)bbbbbbb:dontcare   total count = 14

so on...


   

 

Author Comment

by:Vlearns
ID: 36513762
I guess the algorithm is:


Total number of "name1:name2:name3"

1) Look at each line of the file.
2) If name1:name2:name3 is new and unseen, add it to the set of seen entries and increment the count by the last column (some positive number).
3) If name1:name2:name3 has already been seen (exists in the set), increment the count of that entry in the set by 1.

Total number of unique name1:name2:name3

1) Look at every line.
2) If name1:name2:name3 is new, then increment the count by 1.
3) If name1:name2:name3 has been seen before, don't increment the count.
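The two loops described above can be folded into a single awk pass over the file. This is only a sketch under some assumptions: the key (name1:name2:name3) contains no whitespace, it is separated from the number by blanks, and the wanted per-key total is the sum of the last column over every occurrence of the key. The name summarize_keys is made up for illustration.

```shell
# Hypothetical helper: print "<key> <sum>" for every distinct key,
# followed by the number of distinct keys.
# Usage: summarize_keys [FILE...]
summarize_keys() {
    awk '
        NF == 0 { next }                 # ignore empty lines
        {
            key = $1                     # "name1:name2:name3"
            if (!(key in sum)) unique++  # first time this key is seen
            sum[key] += $NF              # add the trailing number
        }
        END {
            for (key in sum) print key, sum[key]
            print "unique keys:", unique + 0
        }
    ' "$@"
}
```

The order in which awk walks the array in the END block is unspecified, so pipe the output through sort if a stable order matters.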





 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513766
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513777
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513899
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513910
Add "inputfile"  at the end of each awk line!
 

Author Comment

by:Vlearns
ID: 36514194
Hi, thanks.


How can I change your code

to calculate unique entries? Given the input

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL :            3
user3: MAP:AD:TOTAL:            10



such that my output file is


user1:MAP:AD:TOTAL:    48 (45+3)
user2:MAP:AD:TOTAL      23
user3: MAP:AD:TOTAL:            10


Then I can run wc -l on the new file and count the number of lines, right?




 

Author Comment

by:Vlearns
ID: 36514202
maybe start with
 grep "MAP:AD:TOTAL:" 23.map  > /home/u


 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514219
awk does not have multidimensional arrays, so your new requirement would imply much more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
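For what it's worth, POSIX awk can simulate multidimensional arrays: a subscript list such as A[i, j] is stored under a single string key in which the subscripts are joined by the built-in SUBSEP variable. A rough sketch that groups on name1 and name2 together, assuming colon-separated names without embedded blanks and the number as the last blank-separated field (sum_by_two_names is a hypothetical name):

```shell
# Hypothetical helper: sum the trailing number per (name1, name2) pair.
# Usage: sum_by_two_names [FILE...]
sum_by_two_names() {
    awk '
        NF == 0 { next }                      # skip empty lines
        {
            split($1, name, ":")              # name[1] = name1, name[2] = name2
            sum[name[1], name[2]] += $NF      # key is name1 SUBSEP name2
        }
        END {
            for (k in sum) {
                split(k, part, SUBSEP)        # recover the two subscripts
                print part[1] ":" part[2], sum[k]
            }
        }
    ' "$@"
}
```

The same idea extends to three or more subscripts, which is one way to get per-name1, per-name1:name2 and per-name1:name2:name3 totals without nested data structures.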
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514236
... or what do you mean by "48 (45+3)"?
 

Author Comment

by:Vlearns
ID: 36514242
48 is the sum of the two values for user1; ignore the bracketed stuff.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514283
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL:            3
user3:MAP:AD:TOTAL:            10

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile
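If the stray blanks around the colons cannot be cleaned up in the file itself, the key can be normalized inside awk before it is used as the array index. A sketch only, assuming the number is a positive integer at the end of the line and that blanks never belong inside a key (sum_normalized is a made-up name):

```shell
# Hypothetical helper: strip blanks from the key part of each line, then
# sum the trailing number per normalized key.
# Usage: sum_normalized [FILE...]
sum_normalized() {
    awk '
        NF == 0 { next }                          # skip empty lines
        {
            n = $NF                               # trailing number
            key = $0
            sub(/[ \t]*[0-9]+[ \t]*$/, "", key)   # drop the number and its padding
            gsub(/[ \t]+/, "", key)               # remove blanks inside the key
            sum[key] += n
        }
        END { for (key in sum) print key, sum[key] }
    ' "$@"
}
```

With this, "user1:MAP:AD:TOTAL :" and "user1:MAP:AD:TOTAL:" end up under the same key.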

wmp
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 2000 total points
ID: 36514346
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
 
LVL 79

Expert Comment

by:arnold
ID: 36516749
Is Perl an absolute necessity/requirement?

Here is a code example that builds a hash reference keyed on name1 and then counts each occurrence.

You can have
a hash reference
    name1 that refers to another reference
        name2 that refers to another reference, etc.
Then, when you go through the dereferencing to extract the data, you can sum the counts at each level to answer the unique count per instance:
name1 total
name1:name2 total
name1:name2:name3 total, etc.

                                                   
#!/usr/bin/perl
use strict;
use warnings;

my $test = "count";
my $hash_reference = {};
# Read lines from standard input (or from files named on the command line).
# split(/:/, $_) puts name1 into $array[0]; $#array is the last index of the array.
while (<>) {
    chomp;
    next unless length;                 # skip empty lines
    my @array = split(/:/, $_);
    $hash_reference->{ $array[0] }->{$test} += 1;   # one more line for this name1
}
# Walk both levels of the structure: name1, then the counters stored under it.
foreach my $key_hash (sort keys %$hash_reference) {
    foreach my $second_key (sort keys %{ $hash_reference->{$key_hash} }) {
        print "$key_hash $second_key: " . $hash_reference->{$key_hash}->{$second_key} . "\n";
    }
}
