Solved

counting lines in a file

Posted on 2011-09-09
16
178 Views
Last Modified: 2012-05-12
Hi,

I have a file that contains many lines of the form

name1:name2:                          23

that is, basically of the form

name1:name2:                          (some positive number)

Can you help me do the following?

1) Count the number of lines with name1.
2) Count the unique lines with "name1:name2".
3) Compute the total (3rd column) per unique value for 1) and 2) (the two cases above).


Question by:Vlearns
16 Comments
 

Author Comment

by:Vlearns
ID: 36513417
Actually, the format is

name1:name2:name3                          (some positive number)


name1:name2:name3                          23
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513465
1) grep -c "name1" inputfile

2) grep -c "name1:name2" inputfile

3a)

awk  '/name1/ {count=count+$NF} END {print count}' inputfile

3b)

awk  '/name1:name2/ {count=count+$NF} END {print count}' inputfile

wmp
0
 

Author Comment

by:Vlearns
ID: 36513565
Hi, thanks for the response. I will not know all the name1 values ahead of time; name1 is a field that can take different values, like

aaaa:cccccc:bbbbbb   1
bbbb.iiiiiiiii:nnnnnnn       2

I want to group by the name1 field, like

(total)aaaa:dontcare   total count = 13
(total)bbbbbbb:dontcare   total count = 14

and so on.


   
0
 

Author Comment

by:Vlearns
ID: 36513762
I guess the algorithm is:

Total number of "name1:name2:name3"

1) Look at each line of the file.
2) If name1:name2:name3 is new and unseen, add it to the set of seen entries and increment its count by the last column (some positive number).
3) If name1:name2:name3 has already been seen (exists in the set), increment the count of that entry in the set by 1.

Total number of unique name1:name2:name3

1) Look at every line.
2) If name1:name2:name3 is new, then increment the count by 1.
3) If name1:name2:name3 has been seen before, don't increment the count.
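The two counts described above could be sketched in awk roughly as follows. This is only a sketch under assumptions: the key is the first whitespace-separated field, the number is the last field, and the last column is added for every occurrence of a key (not just the first). The function names and sample rows are made up for illustration.

```shell
# Hypothetical sample data in the "name1:name2:name3   <number>" format.
sample_data() {
    printf '%s\n' \
        'aaaa:cccc:bbbb   1' \
        'bbbb:iiii:nnnn   2' \
        'aaaa:cccc:bbbb   5'
}

# Print the per-key sum of the last column, then the number of distinct keys.
sum_per_key() {
    awk 'NF >= 2 { total[$1] += $NF }   # $1 is the key, $NF the number
         END {
             # for-in order is unspecified in awk, so sort the entries
             sorter = "LC_ALL=C sort"
             for (key in total) { print key, total[key] | sorter; unique++ }
             close(sorter)
             print "unique keys:", unique + 0
         }' "$@"
}

sample_data | sum_per_key
```

The function also accepts a file name, for example `sum_per_key inputfile`.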





0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513766
1)

awk -F: '{A[$1]++} END {for (N in A) print "Total". N, "=", A[N]}' inputfile

2)

awk -F: '{A[$1":"$2]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile

Could you explain (3) a bit more (output example)?

wmp
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513777
Typo in (1), sorry:

awk -F: '{A[$1]++} END {for (N in A) print "Total:", N, "=", A[N]}' inputfile
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513899
For (3), did you think of something like this?

a)

awk -F: '{A[$1]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'

b)

awk -F: '{A[$1":"$2]+=substr($NF,match($NF,"[\t ]"))} END {for (N in A) print "Sum:", N, "=", A[N]}'
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36513910
Add "inputfile"  at the end of each awk line!
0

 

Author Comment

by:Vlearns
ID: 36514194
Hi, thanks.

How can I change your code to calculate unique entries for input like

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL :            3
user3: MAP:AD:TOTAL:            10



such that my output file is


user1:MAP:AD:TOTAL:    48 (45+3)
user2:MAP:AD:TOTAL      23
user3: MAP:AD:TOTAL:            10


Then I can run wc -l on the new file and count the number of lines, right?




0
 

Author Comment

by:Vlearns
ID: 36514202
maybe start with
 grep "MAP:AD:TOTAL:" 23.map  > /home/u


0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514219
awk does not have true multidimensional arrays, so your new requirement would call for much more elaborate programming.

Sorry, I don't have the time for such a big thing right now.
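
For reference, awk arrays are indeed one-dimensional, but POSIX awk accepts comma-separated subscripts such as A[i, j] and joins them into a single string key with the built-in SUBSEP variable, which is often enough to group by two fields. A minimal sketch, assuming colon-separated names followed by whitespace and a number; the function names and the POP row are hypothetical:

```shell
# Hypothetical sample rows: colon-separated names, whitespace, then a number.
sample_rows() {
    printf '%s\n' \
        'user1:MAP:AD:TOTAL:   45' \
        'user2:MAP:AD:TOTAL:   23' \
        'user1:MAP:AD:TOTAL:    3' \
        'user1:POP:AD:TOTAL:   10'
}

# Group by (name1, name2); awk stores the subscript as name1 SUBSEP name2.
sum_by_pair() {
    awk 'NF >= 2 {
             split($1, part, ":")
             total[part[1], part[2]] += $NF
         }
         END {
             for (key in total) {
                 split(key, idx, SUBSEP)   # recover the two key parts
                 print idx[1], idx[2], total[key]
             }
         }' "$@" | LC_ALL=C sort
}

sample_rows | sum_by_pair
```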
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514236
... or what do you mean by "48 (45+3)"?
0
 

Author Comment

by:Vlearns
ID: 36514242
48 is the sum of the two values for user1; ignore the bracketed part.
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36514283
OK.

For the following to work you must take care not to put spaces around the colons (empty lines are dealt with, however), so your inputfile should look like this:

user1:MAP:AD:TOTAL:           45
user2:MAPAD:TOTAL:            23

user1:MAP:AD:TOTAL:            3
user3:MAP:AD:TOTAL:            10

If you can accomplish this just do:

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) print N, A[N]}' inputfile

wmp
0
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 36514346
To add a linecount avoiding a subsequent "wc -l":

awk  '$0!="" {A[$1]+=$NF} END {for (N in A) {E++; print N, A[N]} print "\nNumber of Lines:",E}' inputfile
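
If the input cannot be cleaned up first, the spaces around the colons could instead be stripped inside awk before grouping. The following is a sketch, not a drop-in replacement: it assumes the number is always the last whitespace-separated field, and the function names are hypothetical. The sample rows are the ones posted earlier in the thread, stray spaces included.

```shell
# Sample input from the thread, including stray spaces around colons.
messy_input() {
    printf '%s\n' \
        'user1:MAP:AD:TOTAL:           45' \
        'user2:MAPAD:TOTAL:            23' \
        '' \
        'user1:MAP:AD:TOTAL :            3' \
        'user3: MAP:AD:TOTAL:            10'
}

# Normalize each key, then sum the last column per normalized key.
sum_normalized() {
    awk 'NF >= 2 {
             value = $NF
             key = $0
             sub(/[ \t]+[0-9]+[ \t]*$/, "", key)   # drop the trailing number
             sub(/^[ \t]+/, "", key)               # drop leading blanks
             gsub(/[ \t]*:[ \t]*/, ":", key)       # no blanks around colons
             sum[key] += value
         }
         END { for (key in sum) print key, sum[key] }' "$@" | LC_ALL=C sort
}

messy_input | sum_normalized
```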
0
 
LVL 76

Expert Comment

by:arnold
ID: 36516749
Is Perl an absolute necessity/requirement?

Here is a code example that builds a hash reference keyed on name1 and then counts each occurrence.

You can have a hash reference in which
    name1 refers to another reference, in which
        name2 refers to another reference, etc.
Then, when you go through the dereferencing to extract the data, you can sum the counts in each sub-hash to answer the unique count per instance:
name1 total
name1:name2 total
name1:name2:name3 total etc.

                                                   
#!/usr/bin/perl

$test="count";
# The loop below reads data from standard input and treats each whole line as
# the key (a single column of data, name1). To split a line into fields, use
# @array=split(/:/,$_); $array[0] then holds name1, and $#array is the index
# of the last element (one less than the number of elements).
while (<>) {
       chomp();
       $hash_reference->{$_}->{$test}+=1;
}
foreach $key_hash (keys %$hash_reference) { 
        foreach $second_key (keys %{$hash_reference->{$key_hash}} )  { 
                print "$second_key\n";
                print "$key_hash: " . $hash_reference->{$key_hash}->{$test} . "\n";
         } 
}
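
The per-level totals listed above (name1, name1:name2, name1:name2:name3) could also be approximated in awk without nested references by adding each line's number to every prefix of its key. A sketch under assumptions: keys have no trailing colon and no spaces, the number is the last field, and the function names and sample rows are invented for illustration.

```shell
# Hypothetical sample data: key, whitespace, number.
prefix_sample() {
    printf '%s\n' \
        'a:b:c   1' \
        'a:b:d   2' \
        'a:e:f   4'
}

# Add each number to the running total of every key prefix.
prefix_totals() {
    awk 'NF >= 2 {
             n = split($1, part, ":")
             prefix = ""
             for (i = 1; i <= n; i++) {
                 # build name1, then name1:name2, and so on
                 prefix = (i == 1) ? part[i] : prefix ":" part[i]
                 total[prefix] += $NF
             }
         }
         END { for (p in total) print p, total[p] }' "$@" | LC_ALL=C sort
}

prefix_sample | prefix_totals
```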
