Need a bash script to extract the 4th field (syslog host IP address) from a CSV file

Hi team,

I have a dissected syslog file in CSV format which contains the following fields:

Month, date, time, IP address, syslog message. The CSV file is tens of thousands of lines long, and I only need to extract the unique IPs from the entire file and save them in a separate text file.

A snippet of the syslog.csv file is here:



"Apr","17","06:51:01","10.8.236.138","syslog T /emupdate/subscription?uid=3 HTTP/1.1' 200 492 "
"Apr","17","06:51:01","10.25.236.138","local/testmachine info logger: [ssl_req][17/Apr/2011:06:51:01 +1000] 10.8.8.8 TLSv1 DHE-RSA-AES256-SHA 'POST /emupdate/subscription?uid=3 HTTP/1.1' 492 "
"Apr","17","06:51:02","10.15.100.138","test info logger: [ssl_acc] 10.25.6.11 - - [17/Apr/2011:06:51:02 +1000] 'POST /emupdate/subscription?uid=3 HTTP/1.1' 200 492 "
"Apr","17","06:51:02","10.9.10.138","test info logger: [ssl_req][17/Apr/2011:06:51:02 +1000] 10.2.2.2 TLSv1 DHE-RSA-AES256-SHA 'POST /emupdate/subscription?uid=3 HTTP/1.1' 492 "


May I just request a simple bash script that can do the above?

Finally, does someone have a ready-made "diff" script that can quickly compare two text files and list the lines (host IP addresses in this case) that are present in one text file (let's call it master) but not in the extracted file above (let's call it extract)?

Thanks for any help
rleyba828 Asked:
 
point_pleasant Commented:
cat syslog | cut -f4 -d',' | tr -d '"' | sort -u > new_file
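For anyone who wants this as a small script rather than a one-liner, here is a minimal sketch of the same cut/tr/sort pipeline. The variable names, the temporary directory, and the sample rows (shortened from the snippet in the question) are illustrative assumptions, not part of the original post. It relies on the IP being the 4th field and sitting before the free-text message, so commas inside the message do not shift it.

```shell
#!/usr/bin/env bash
# Sketch only: paths and sample rows below are made up for the demo.
demo_dir=$(mktemp -d)
csv_file="$demo_dir/syslog.csv"
out_file="$demo_dir/extract.txt"

# Three sample rows; the first and third share the same host IP.
cat > "$csv_file" <<'EOF'
"Apr","17","06:51:01","10.8.236.138","syslog message one"
"Apr","17","06:51:01","10.25.236.138","logger: 10.8.8.8 TLSv1, note the comma"
"Apr","17","06:51:02","10.8.236.138","syslog message three"
EOF

# Take field 4, strip the double quotes, keep one copy of each IP.
cut -d',' -f4 "$csv_file" | tr -d '"' | sort -u > "$out_file"

cat "$out_file"
```

To run it against the real file, point csv_file at syslog.csv and out_file at wherever the extract should go.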
 
woolmilkporc Commented:
1)

awk -F',|"' '{print $11}' csvfile | sort -nu > textfile

2)

comm -23 master extract

wmp

 
woolmilkporc Commented:
Re 2)

If "master" is not already sorted, use this:

sort -u master > master.sorted; comm -23 master.sorted extract; rm master.sorted
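As a worked illustration of what comm -23 reports, here is a small sketch with invented addresses. The temporary directory and file contents are assumptions for the demo. It uses sort -o to sort each file in place, which is one alternative to a separate master.sorted file.

```shell
#!/usr/bin/env bash
# Sketch only: file names and addresses are invented for the demo.
work_dir=$(mktemp -d)
printf '%s\n' 10.9.10.138 10.8.236.138 10.1.1.1 > "$work_dir/master"
printf '%s\n' 10.8.236.138 10.25.236.138 > "$work_dir/extract"

# comm needs both inputs sorted the same way, so sort each file in place.
sort -u -o "$work_dir/master" "$work_dir/master"
sort -u -o "$work_dir/extract" "$work_dir/extract"

# -2 hides lines found only in extract, -3 hides lines found in both,
# leaving the lines that appear only in master.
missing=$(comm -23 "$work_dir/master" "$work_dir/extract")
printf '%s\n' "$missing"
```

Here 10.1.1.1 and 10.9.10.138 are listed, since they appear in master but not in extract.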
 
rleyba828 (Author) Commented:
Hi team,

Sincere apologies for the late reply. For some reason, the first reply from woolmilkporc (the one using awk) did not print the full list, but the one from point_pleasant (using cut) seems to print everything. I am not sure why these two approaches would yield different results. Anyway, I have awarded the points. Thanks to the contributors for the big help.
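A likely explanation for the difference, assuming standard sort behavior, is the -n flag in the awk answer's "sort -nu" rather than awk itself. With -n, sort compares only the leading number on each line, and -u then keeps a single line per equal key, so distinct hosts that share their first two octets can collapse into one. The addresses below are invented to show the effect; whether the real file contains such pairs is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: invented addresses; two hosts share the first two octets.
tmp_file=$(mktemp)
printf '%s\n' 10.8.236.138 10.8.1.1 10.25.6.11 > "$tmp_file"

# -n compares only the leading number (10.8, 10.8, 10.25 in a locale
# where '.' is the decimal point), and -u keeps one line per equal key,
# so one of the 10.8.x.x hosts disappears.
numeric_unique=$(sort -nu "$tmp_file")

# Without -n the whole line is the key, so all three hosts survive.
plain_unique=$(sort -u "$tmp_file")

printf 'sort -nu kept %s lines\n' "$(printf '%s\n' "$numeric_unique" | grep -c .)"
printf 'sort -u  kept %s lines\n' "$(printf '%s\n' "$plain_unique" | grep -c .)"
```

If that is the cause, dropping the n (plain "sort -u") on the awk pipeline should give the same list as the cut version.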
Question has a verified solution.
