• Status: Solved

Sort by one column, total in another column

seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200
..
...


zcat myfile.txt.gz | awk '{print $2}' |  sort | uniq -c | sort -rn |more

600000 content=xxx
500000 content=yyy
400000 content=zzz
300000 content=aaa
4000 content=bbb
2000 content=ccc



Now what I want is the count for each content value ($2) together with the total of its sizes ($3).

600000 content=xxx, 500000
500000 content=yyy, 444444
400000 content=zzz, 42344
300000 content=aaa, 234234
4000 content=bbb, 3252345
2000 content=ccc, 2345234

or something like that.

thanks.


Asked:
williamwlk
1 Solution
 
woolmilkporc Commented:
zcat myfile.txt.gz | awk -F',|='  '{c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn

The above neither considers nor displays the "content=" part; it groups on the value alone. If you need that string to be part of the key, use this:

zcat myfile.txt.gz | awk -F',|=' '{c[$2"="$3]+=1; s[$2"="$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn
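To see why the commands above use $3 and $5, here is a small sketch that prints how awk splits one sample line from the question when the separator is "," or "=". The variable names line and fields are only for illustration.

```shell
# One sample line from the question, split on "," or "=" as in the commands above.
line='seq3, content=aaa, size=300'

# Print every field with its index; brackets make leading spaces visible.
fields=$(printf '%s\n' "$line" |
  awk -F',|=' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')

printf '%s\n' "$fields"
# 1:[seq3]
# 2:[ content]
# 3:[aaa]
# 4:[ size]
# 5:[300]
```

Note that $2 keeps the space that follows the comma, so the key built from $2"="$3 starts with a space as well.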

To avoid working on empty lines add this, if needed:

zcat myfile.txt.gz | awk -F',|='  '!/^$/ { .......

(remainder of the commands same as above).
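As a rough end-to-end sketch, the following runs the first variant with the empty-line guard on the sample rows from the question. It assumes the elided remainder is the same as the first command, as stated above. The inline sample (instead of zcat on the real file), the literal "content=" in the print statement, and the second sort key (added only to make ties deterministic) are my own assumptions.

```shell
# Sample rows from the question, plus one empty line to exercise the guard.
sample='seq1, content=xxx, size=400
seq2, content=xxx, size=500

seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200'

# c[] counts rows per content value, s[] sums the sizes per content value.
# !/^$/ skips empty lines so they do not create an empty key.
result=$(printf '%s\n' "$sample" |
  awk -F',|=' '!/^$/ { c[$3] += 1; s[$3] += $5 }
       END { for (n in c) print c[n], "content=" n ",", s[n] }' |
  sort -k1,1rn -k2,2)

printf '%s\n' "$result"
# 2 content=aaa, 500
# 2 content=xxx, 900
# 1 content=bbb, 200
```

For the real data, replace the printf line with zcat myfile.txt.gz as in the commands above.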
 
williamwlk Author Commented:
Sorry about my late response, Dear Expert! Thank you so much for the code! Appreciate it.
