Solved

counting lines in a file

Posted on 2011-09-09
Last Modified: 2012-05-12
Hi,

I have a file that contains many strings of the form

name1:name2:                          23

basically of the form


name1:name2:                          (some positive number)

Can you help me do the following?

1) Count the number of lines with name1
2) Count the unique lines with "name1:name2"
3) Compute the total (3rd column) for each unique value in 1) and 2) (the above two cases)


Question by:Vlearns
16 Comments
 

Author Comment

by:Vlearns
ID: 36513417
Actually, the format is

name1:name2:name3                          (some positive number)


name1:name2:name3                          23
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513465
1) grep -c "name1" inputfile

2) grep -c "name1:name2" inputfile

3a)

awk  '/name1/ {count=count+$NF} END {print count}' inputfile

3b)

awk  '/name1:name2/ {count=count+$NF} END {print count}' inputfile

wmp
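
One caveat about the patterns above: grep -c "name1" and /name1/ match the text anywhere on the line, so a line whose second or third field contains name1 is counted as well. A minimal sketch of a stricter variant that compares whole fields, assuming colon-separated names followed by a whitespace-separated number as in the question. The function names and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: compare whole fields instead of matching a substring.

# Count lines whose first colon-separated field is exactly $1.
count_first_field() {
    awk -F: -v key="$1" '$1 == key {n++} END {print n+0}' "$2"
}

# Sum the trailing number on lines whose first field is exactly $1.
# The number is taken as the last whitespace-separated word of the line.
sum_first_field() {
    awk -v key="$1" '{split($1, f, ":")} f[1] == key {s += $NF} END {print s+0}' "$2"
}

# Made-up sample data in the format from the question.
sample=$(mktemp)
printf '%s\n' \
    'aaaa:bbbb:cccc     5' \
    'aaaa:dddd:eeee     2' \
    'xxxx:aaaa:cccc     7' > "$sample"

grep -c 'aaaa' "$sample"             # substring match: counts all 3 lines
count_first_field aaaa "$sample"     # field match: counts 2 lines
sum_first_field aaaa "$sample"       # 5 + 2 = 7
rm -f "$sample"
```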
 

Author Comment

by:Vlearns
ID: 36513565
Hi, thanks for the response. I will not know all the name1 values ahead of time; name1 is a field that can have different names.


like
aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

I want to group by the name1 field...


like (total)aaaa:dontcare   total count = 13
      (total)bbbbbbb:dontcare   total count = 14

so on...


   
 

Author Comment

by:Vlearns
ID: 36513762
I guess the algorithm is:


Total number of "name1:name2:name3"

1) Look at each line of the file.
2) If name1:name2:name3 is new and unseen, add it to the set of seen entries and increment its count by the positive number in the last column.
3) If name1:name2:name3 has already been seen (it exists in the set), increment the count of that entry in the set by 1.

Total number of unique name1:name2:name3

1) Look at every line.
2) If name1:name2:name3 is new, then increment the count by 1.
3) If name1:name2:name3 has been seen before, don't increment the count.
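
The two procedures above can be sketched in awk, with an associative array playing the role of the "seen" set. This is only a sketch under assumptions: the key is the first whitespace-separated word of the line, the number is the last word, and a repeated key adds its last-column value rather than 1, which is what the 45 + 3 = 48 example further down the thread asks for. The function names and sample data are made up.

```shell
#!/bin/sh
# Sum of the last column for each distinct key (sorted for stable output).
sum_per_key() {
    awk 'NF { total[$1] += $NF } END { for (k in total) print k, total[k] }' "$1" | sort
}

# Number of distinct keys: a key is counted only the first time it is seen.
count_unique_keys() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Made-up sample data.
sample=$(mktemp)
printf '%s\n' 'a:b:c   23' 'a:b:c   1' 'x:y:z   4' > "$sample"
sum_per_key "$sample"          # a:b:c 24, then x:y:z 4
count_unique_keys "$sample"    # 2
rm -f "$sample"
```

The NF condition skips empty lines, so blank separators in the input do not create an empty key.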





 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513766
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513777
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513899
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513910
Add "inputfile"  at the end of each awk line!
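
With -F: the last field still holds name3, the padding and the number, which is why the commands above cut it at the first blank with substr and match. An alternative sketch lets awk split on whitespace, so the number is simply the last word, and uses split() to pull the names out of the first word. It assumes there are no spaces inside the name part; the function name and its depth parameter are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum the trailing number per group, where the group
# is made of the first $1 colon-separated names of each line.
sum_by_prefix() {
    awk -v depth="$1" '
        NF < 2 { next }                     # need a key and a number
        {
            n = split($1, name, ":")        # names from the first word
            key = name[1]
            for (i = 2; i <= depth && i <= n; i++)
                key = key ":" name[i]
            sum[key] += $NF                 # last word is the number
        }
        END { for (k in sum) print "Sum:", k, "=", sum[k] }
    ' "$2" | sort
}

# Made-up sample data.
sample=$(mktemp)
printf '%s\n' \
    'aaaa:bbbb:cccc   1' \
    'aaaa:bbbb:dddd   2' \
    'aaaa:eeee:cccc   4' \
    'ffff:bbbb:cccc   8' > "$sample"
sum_by_prefix 1 "$sample"    # groups by name1
sum_by_prefix 2 "$sample"    # groups by name1:name2
rm -f "$sample"
```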
 

Author Comment

by:Vlearns
ID: 36514194
Hi, thanks.


How can I change your code

to calculate unique entries from input like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL :            3
user3: MAP:AD:TOTAL:            10



such that my output file is


user1:MAP:AD:TOTAL:    48 (45+3)
user2:MAP:AD:TOTAL      23
user3: MAP:AD:TOTAL:            10


Then I can run wc -l on the new file to count the number of lines, right?




 

Author Comment

by:Vlearns
ID: 36514202
maybe start with
 grep "MAP:AD:TOTAL:" 23.map  > /home/u


 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514219
awk does not have true multidimensional arrays (it only emulates them by joining the subscripts with SUBSEP), so your new requirement would mean considerably more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
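
For what it's worth, POSIX awk does accept the A[i, j] notation: the subscripts are joined with the built-in SUBSEP string into a single key, which is often enough for grouping on two fields. A minimal sketch, assuming the names sit in the first whitespace-separated word and the number in the last one; the function name and sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total per (name1, name2) pair using awk's
# comma-subscript syntax, which builds the key name1 SUBSEP name2.
sum_by_pair() {
    awk '
        NF < 2 { next }                     # skip empty or incomplete lines
        {
            split($1, name, ":")
            sum[name[1], name[2]] += $NF
        }
        END {
            for (key in sum) {
                split(key, part, SUBSEP)    # recover the two subscripts
                print part[1], part[2], sum[key]
            }
        }
    ' "$1" | sort
}

# Made-up sample data.
sample=$(mktemp)
printf '%s\n' \
    'user1:MAP:AD    45' \
    'user1:MAP:AD     3' \
    'user1:IMAP:AD    2' \
    'user2:MAP:AD    23' > "$sample"
sum_by_pair "$sample"
rm -f "$sample"
```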
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514236
... or what do you mean by "48 (45+3)"?
 

Author Comment

by:Vlearns
ID: 36514242
48 is the sum of the two values for user1; ignore the bracketed stuff.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514283
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL:            3
user3:MAP:AD:TOTAL:            10

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile

wmp
 
LVL 68

Accepted Solution

by:woolmilkporc (earned 2000 total points)
ID: 36514346
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
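
Since awk does not specify the order in which "for (N in A)" visits the keys, the totals may come out in any order. A sketch of the same idea with sorted output, wrapped in a made-up function name and run on the cleaned-up sample input from the previous comment. It tests NF instead of $0!="" so that lines containing only blanks are skipped too.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner: sorted totals, then the count.
summarize() {
    awk 'NF { A[$1] += $NF }
         END {
             for (N in A) { E++; print N, A[N] | "sort" }
             close("sort")              # let sort finish before the count
             print "\nNumber of Lines:", E + 0
         }' "$1"
}

# Sample input copied from the thread.
sample=$(mktemp)
printf '%s\n' \
    'user1:MAP:AD:TOTAL:           45' \
    'user2:MAPAD:TOTAL:            23' \
    '' \
    'user1:MAP:AD:TOTAL:            3' \
    'user3:MAP:AD:TOTAL:            10' > "$sample"
summarize "$sample"
rm -f "$sample"
```

On that input the totals are 48 for user1, 23 for user2 and 10 for user3, followed by "Number of Lines: 3".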
 
LVL 80

Expert Comment

by:arnold
ID: 36516749
Is Perl an absolute necessity/requirement?

Here is a code example that builds a hash reference keyed on name1 and then counts each occurrence.

You can have a hash reference in which name1 refers to another reference, name2 under it refers to another reference, and so on.
Then, when you go through the dereferencing to extract the data, you can sum the counts at each level to answer the unique count per instance:
name1 total
name1:name2 total
name1:name2:name3 total etc.

                                                   
#!/usr/bin/perl
use strict;
use warnings;

my $test = "count";
my $hash_reference = {};
# Read data from standard input (or from files named on the command line).
# split(/:/, $_) puts name1 in $array[0]; $#array is the index of the last
# element, so the number of elements is $#array + 1.
while (<>) {
    chomp;
    next unless length;                  # skip empty lines
    my @array = split(/:/, $_);
    $hash_reference->{ $array[0] }->{$test} += 1;
}
# Walk both levels of the structure and print the count for each name1.
foreach my $key_hash (sort keys %$hash_reference) {
    foreach my $second_key (keys %{ $hash_reference->{$key_hash} }) {
        print "$key_hash $second_key: " . $hash_reference->{$key_hash}->{$second_key} . "\n";
    }
}
