Sort by one column, total in another column

seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200
..
...


zcat myfile.txt.gz | awk '{print $2}' | sort | uniq -c | sort -rn | more

600000 content=xxx
500000 content=yyy
400000 content=zzz
300000 content=aaa
4000 content=bbb
2000 content=ccc
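Here is a quick check of what `$2` holds in the pipeline above. It uses two of the sample rows piped in with `printf` instead of `zcat`; the inline data is only for illustration. With awk's default whitespace field splitting, the comma after each value stays attached to `$2`. That means `uniq -c` really counts strings like `content=xxx,`. The `-F',|='` separator used in the answer below avoids this.

```shell
# Two sample rows from the question, fed in directly instead of via zcat.
fields=$(printf '%s\n' 'seq1, content=xxx, size=400' 'seq3, content=aaa, size=300' |
  awk '{ print $2 }')   # default FS: split on runs of blanks

# Each value keeps its trailing comma, e.g. "content=xxx,".
printf '%s\n' "$fields"
```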



Now what I want is to get the count of each content value ($2) together with the total of their sizes ($3), like this:

600000 content=xxx, 500000
500000 content=yyy, 444444
400000 content=zzz, 42344
300000 content=aaa, 234234
4000 content=bbb, 3252345
2000 content=ccc, 2345234

Or something like that.

Thanks.


Asked by williamwlk.

woolmilkporc commented:
zcat myfile.txt.gz | awk -F',|='  '{c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn
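This shows why the answer uses `$3` and `$5`. The first command runs here on the sample rows from the question, piped in with `printf` instead of `zcat`. The inline data stands in for `myfile.txt.gz` and is only for illustration. With `-F',|='`, awk splits on either a comma or an equals sign, so each line yields five fields.

```shell
# Sample rows from the question, used in place of the gzipped file.
sample='seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200'

# With -F',|=' the line "seq1, content=xxx, size=400" splits into
#   $1="seq1"  $2=" content"  $3="xxx"  $4=" size"  $5="400"
# c[] counts rows per content value; s[] sums their sizes.
result=$(printf '%s\n' "$sample" |
  awk -F',|=' '{c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' |
  sort -rn)
printf '%s\n' "$result"
```

Both `xxx` and `aaa` have a count of 2 here, so their relative order depends on how `sort` breaks ties. Only the ordering by the numeric count is guaranteed.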

The first command above ignores the "content=" part and does not display it. If you need to keep this string, use this instead:

zcat myfile.txt.gz | awk -F',|=' '{c[$2"="$3]+=1; s[$2"="$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn
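With the command above, `$2` is `" content"` including the space that follows each comma. The keys therefore start with a space, and the output shows two spaces after the count. The sketch below is one possible tweak and not part of the original answer. It strips that leading space and prints `, ` before the total to match the layout requested in the question. The inline sample data is only for illustration.

```shell
# Sample rows from the question, used in place of the gzipped file.
sample='seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200'

result=$(printf '%s\n' "$sample" |
  awk -F',|=' '{
    k = $2 "=" $3        # e.g. " content=xxx" (leading space left by ", ")
    sub(/^ +/, "", k)    # strip that leading space
    c[k] += 1; s[k] += $5
  } END { for (n in c) print c[n], n ", " s[n] }' |
  sort -rn)
printf '%s\n' "$result"
```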

To skip empty lines, if needed, add this pattern in front of the action:

zcat myfile.txt.gz | awk -F',|='  '!/^$/ { .......

(The rest of the command is the same as above.)
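For example, here is the first command written out in full with the `!/^$/` pattern added. It runs on a few sample rows with blank lines mixed in, and the inline data is only for illustration. Without the pattern, each empty line would be counted under an empty key and print an extra line such as `2 ,0`.

```shell
# Sample rows with an empty line in the middle and one at the end.
sample='seq1, content=xxx, size=400

seq2, content=xxx, size=500
seq3, content=bbb, size=200
'

# !/^$/ makes awk skip lines that are completely empty.
result=$(printf '%s\n' "$sample" |
  awk -F',|=' '!/^$/ {c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' |
  sort -rn)
printf '%s\n' "$result"
```

Note that `/^$/` only matches truly empty lines. A line containing just spaces would still be processed.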
williamwlk (author) commented:
Sorry for my late response! Thank you so much for the code; I appreciate it.
Question has a verified solution.