We help IT Professionals succeed at work.
Get Started

Windows batch script to count top 5 recurrent values for a column in csv file

704 Views
Last Modified: 2014-09-16
Pls unzip the attached using the pin number 1234554321 :
it's a sanitized file but just in case, I've zip it with a pin so
that its content is not easily searched

I'll need scripts (ideally Windows Batch script or if it's going to
run more than 15 minutes using Windows Batch on my 32bit
Win XP that runs on SSD, then I can accept PowerShell or even
Linux Shell script but there's this hassle of  uploading files into
the servers instead of doing on my laptop) that will process
the unzipped csv file (the number of lines could run from
200,000 to 800,000 & I do this weekly)

a) sort by the "Source IP" (ie column N) as primary key & the
    Reason (ie column D) as secondary key & then make a count
    of the occurrences of top 5 "Source IP" : need to know the
    top 5 Source IPs recur how many times each.  The output
    should have 5 lines of IPs with the count for each IP

b) then make a count of the occurrences of top 25 events for
    each of the top 5"Source IP" ie for each of the top 5
    recurring Source IP, what is the top 25 events (or if there
    is less then stop at whatever is the number of events that
    are available).    The output the top 5  IPs with each IP &
    counts of each of its top 25 events  (or whatever available)  

c) sort by the "Destination IP" (ie column Q) as primary key & the
    Reason (ie column D) as secondary key & then make a count
    of the occurrences of top 5 "Source IP" : need to know the
    top 5 Source IPs recur how many times each.  The output
    should have 5 lines of IPs with the count for each IP

d) then make a count of the occurrences of top 25 events for
    each of the top 5 "Destination IP" ie for each of the top 5
    recurring Destination IP, what is the top 25 events (or if
    there is less then stop at whatever is the number of events
    that are available).    The output the top 5  IPs with each IP
    & counts of each of its top 25 events  (or whatever available)  

Leave the sorted output filename in the same folder as
original_filename_sorted.csv  as I would like to browse
through
Comment
Watch Question
"Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015
Commented:
This problem has been solved!
Unlock 4 Answers and 31 Comments.
See Answers
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE