# Sort by one column, total in another column

I have a gzipped file, myfile.txt.gz, with lines like these:
seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200
..
...

zcat myfile.txt.gz | awk '{print $2}' | sort | uniq -c | sort -rn | more

600000 content=xxx
500000 content=yyy
400000 content=zzz
300000 content=aaa
4000 content=bbb
2000 content=ccc

What I want now is the count for each content value ($2) together with the total of its sizes ($3):

600000 content=xxx, 500000
500000 content=yyy, 444444
400000 content=zzz, 42344
300000 content=aaa, 234234
4000 content=bbb, 3252345
2000 content=ccc, 2345234

or something like that.

thanks.

Commented:
zcat myfile.txt.gz | awk -F',|=' '{c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn

The above neither considers nor displays the "content=" part; it groups on the value alone. If you need to take this string into account, use this:

zcat myfile.txt.gz | awk -F',|=' '{c[$2"="$3]+=1; s[$2"="$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn
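
Both commands depend on how awk splits each line when the separator is "a comma or an equals sign", which is why the value is in $3 and the size in $5. A minimal sketch to make that visible, using one sample line from the question; the show_fields helper and the ", *" variant are illustrations of mine, not part of the answer:

```shell
# Print every field awk produces for the field separator given as $1.
show_fields() {
  awk -F"$1" '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

line='seq1, content=xxx, size=400'

# Separator used in the commands above: $2 keeps the blank after the comma.
printf '%s\n' "$line" | show_fields ',|='

# Variant (an assumption, not from the answer): ", *" also swallows that blank.
printf '%s\n' "$line" | show_fields ', *|='
```

With ',|=' the fields come out as [seq1], [ content], [xxx], [ size], [400]. The leading blank in $2 is carried into the key of the second command, so its output has two spaces between the count and "content=". If that matters, using -F', *|=' in its place should remove it.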

To avoid working on empty lines add this, if needed:

zcat myfile.txt.gz | awk -F',|=' '!/^$/ { .......

(The remainder of the command is the same as above.)
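
For reference, here is a sketch of the guarded command written out in full, assuming "same as above" means the body of the first command. The rows_with_blanks function is a hypothetical stand-in for zcat myfile.txt.gz so the example can be run without the real file:

```shell
# Hypothetical sample input with blank lines mixed in (stands in for zcat output).
rows_with_blanks() {
  printf '%s\n' \
    'seq1, content=xxx, size=400' \
    '' \
    'seq2, content=xxx, size=500' \
    '' \
    'seq3, content=bbb, size=200'
}

# !/^$/ runs the action only on non-empty lines; the action itself is
# the body of the first command (count and size total per content value).
count_and_total() {
  awk -F',|=' '!/^$/ {c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn
}

rows_with_blanks | count_and_total
```

On this input the result should be "2 xxx,900" followed by "1 bbb,200". Without the guard, the two blank lines would be counted under an empty key and show up as an extra "2 ,0" line.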

Commented:
Sorry about my late response, Dear Expert! Thank you so much for the code! Appreciate it.