Solved

Sort by one column, total in another column

Posted on 2012-03-11
Last Modified: 2012-04-02
Sample lines from myfile.txt.gz:

seq1, content=xxx, size=400
seq2, content=xxx, size=500
seq3, content=aaa, size=300
seq3, content=aaa, size=200
seq3, content=bbb, size=200
...

I count the occurrences of each content value ($2) like this:

zcat myfile.txt.gz | awk '{print $2}' | sort | uniq -c | sort -rn | more

600000 content=xxx
500000 content=yyy
400000 content=zzz
300000 content=aaa
4000 content=bbb
2000 content=ccc

Now what I want is the count for each content value ($2) together with the total of its sizes ($3):

600000 content=xxx, 500000
500000 content=yyy, 444444
400000 content=zzz, 42344
300000 content=aaa, 234234
4000 content=bbb, 3252345
2000 content=ccc, 2345234

or something like that.

thanks.

Question by: williamwlk

Accepted Solution

by: woolmilkporc
zcat myfile.txt.gz | awk -F',|=' '{c[$3]+=1; s[$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn

The command above ignores the "content=" label: it groups by the value alone (xxx, aaa, ...) and prints only that value. If you need the label to be part of the key and of the output, use this:

zcat myfile.txt.gz | awk -F',|=' '{c[$2"="$3]+=1; s[$2"="$3]+=$5} END {for(n in c) print c[n], n "," s[n]}' | sort -rn

To skip empty lines, put the pattern !/^$/ in front of the action, if needed:

zcat myfile.txt.gz | awk -F',|=' '!/^$/ { .......

(The remainder of the command is the same as above.)
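
For anyone who wants to try this without the real file, here is a minimal sketch that runs the same idea on the five sample rows from the question. The function name summarize is made up for illustration, and two details are my own assumptions rather than part of the accepted commands: the field separator ', *|=' also swallows the space after each comma (so the key has no leading blank), and NF >= 5 is used to skip blank or short lines.

```shell
# Hypothetical helper: reads lines like "seq1, content=xxx, size=400" on stdin
# and prints "<count> content=<value>, <total size>", highest count first.
summarize() {
  awk -F', *|=' '
    NF >= 5 {              # assumption: ignore blank or incomplete lines
      key = $2 "=" $3      # for example "content=xxx"
      count[key]++         # number of lines seen for this key
      total[key] += $5     # running sum of the size field
    }
    END {
      for (k in count) print count[k], k ", " total[k]
    }
  ' | sort -rn
}

# Sample rows taken from the question.
printf '%s\n' \
  'seq1, content=xxx, size=400' \
  'seq2, content=xxx, size=500' \
  'seq3, content=aaa, size=300' \
  'seq3, content=aaa, size=200' \
  'seq3, content=bbb, size=200' | summarize
```

With the real data you would feed it from zcat instead: zcat myfile.txt.gz | summarize. Both arrays are filled in a single pass, so the file is read only once and no sort | uniq -c step is needed before the final sort.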
Author Closing Comment

by: williamwlk
Sorry about my late response, Dear Expert! Thank you so much for the code! Appreciate it.