• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 664

bash log filter

Hello

I need help with a bash script that will concatenate 3 log files and pull back some useful information.

Here is a sample of the data that is in the files

2006-03-28 13:47:07,669: user login: test
2006-03-28 14:03:06,156: user timeout: johnsonr4
2006-03-28 14:03:06,314: user logout: johnsonr2
2006-03-28 14:10:53,206: user login: jonesg4
2006-03-28 14:10:57,817: user login: smithf3

Anything after the comma in the time is milliseconds and can be ignored.

I want to get the total number of logins for a day, and also the total number of unique logins for a day.

Any tips on how to do this or get started would be appreciated.
The first thing I need to find out is how to read this data in from the .log files and how to go about manipulating it.

Thanks
jculkincys

1 Solution
 
XoF commented:
Below is a code snippet that does what you want. Just call the script with the date as the first parameter and the filename to parse as the second.
The sed expression is more complex than strictly needed for your requirements, so it can be used for further investigation. For example, just drop the "| wc -l" at the end and you will get a complete listing of the matched lines.

<code>
#!/bin/sh
day=$1       # date to report on, e.g. 2006-03-28
logfile=$2   # log file to parse

# match the "user login" lines for $day, drop the milliseconds, and count them
logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' "$logfile" | sort -k 3,3 | wc -l`
# same again, but let sort drop duplicate usernames first (-u = unique)
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' "$logfile" | sort -k 3,3 -u | wc -l`

cat << EOF
Statistics for $day:
----------------------

Total logins:   $logins
Unique logins: $ulogins
EOF

</code>
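
For example, run against a single file holding the sample data from the question (the script name and output here are reconstructed from the snippet above, so treat them as illustrative):

<code>
$ ./script.sh 2006-03-28 session.log
Statistics for 2006-03-28:
----------------------

Total logins:   3
Unique logins: 3
</code>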

HTH,
-XoF-
 
XoF commented:
Oops, I forgot the concatenation:

<code>
#!/bin/sh
day=$1       # date to report on
shift        # drop the date from the argument list
logfiles=$@  # all remaining arguments are the log files (names must not contain spaces)

# same sed/sort/wc pipeline as before, now run over every file in $logfiles
logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`

cat << EOF
Statistics for $day:
----------------------

Total logins:   $logins
Unique logins: $ulogins
EOF

</code>

Call the script as:
./script <day> <file1> <file2> ... <fileN>
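
For instance, to report on three log files at once (the filenames are just an assumed example):

<code>
./script.sh 2006-03-28 session.log session.log.1 session.log.2
</code>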
 
jculkincys (Author) commented:
That looks like it should work.

Could you explain it a little for me?

Sorry, I am a little new to bash programming.

A line-by-line description would be most helpful.

 
jculkincys (Author) commented:
I forgot to tell you that I would also want the total number of timeouts for a day, but I should be able to implement that once I understand your code.

How are the concatenated files stored? In a bash variable?

I was thinking that if we were dealing with very large logs, it might be better to concatenate all the logs into a temporary file and read from there, so we would not have to load so much into memory.

Your thoughts?

jculkincys
 
XoF commented:

#!/bin/sh
## the so-called shebang, which defines the interpreter to be used

day=$1
# store the first argument into variable "day"

shift # delete the first argument from the arg-list
logfiles=$@ # store all remaining arguments into the variable "logfiles"

# run sed on each logfile specified. the logfiles will be processed one after another, so you don't have to fear large memory consumption.
# concatenation of the files does not occur
logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`
timeouts=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user timeout: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
utimeouts=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user timeout: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`

## explanation of sed:
# sed -n [...] file1 file2 fileN # load each line of each file into the buffer (one after another), do not output buffer contents (-n)
# 's/pattern/replacement/p' # process the buffer: replace each occurrence of "pattern" with "replacement" (s/); print the new buffer content (/p)
# if parts of "pattern" are enclosed in \( \), each part is later addressable as \1, \2, and so on
# EXAMPLE: s/\(15\) \(men and \)a \(bottle of rum\)/one \2 \1 \3/p
# will transform "15 men and a bottle of rum" into "one men and 15 bottle of rum" (OK, orthographically wrong, but a nice example, isn't it? ;-)
#
# | sort -k 3,3 # pass the output to "sort"; sort on columns 3 through 3 (so only on column 3)
# | wc -l # pass the output of sort to "wc", which normally does a word count; -l makes it count lines instead
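#
# EXAMPLE (an assumed run over the sample lines from the original question):
# after the sed step, "sort -k 3,3" orders the matched login lines by username:
#   2006-03-28 14:10:53 jonesg4
#   2006-03-28 14:10:57 smithf3
#   2006-03-28 13:47:07 test
# and "wc -l" then reports 3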

# a so-called here-document:
# write everything up to a line with the leading marker ("EOF") to stdout
cat << EOF
Statistics for $day:
----------------------

Total logins:       $logins
Unique logins:     $ulogins
Total timeouts:   $timeouts
Unique timeouts: $utimeouts
EOF



HTH,

-XoF-
 
jculkincys (Author) commented:
I do appreciate the rum example

Just a few more questions (thanks, you have been very helpful so far):

1.) Currently I don't know exactly how many session logs will be present when I run this program, so it might work better if I build $logfiles a different way. I know there will be a session.log, but there may also be session.log.1, session.log.2 - there will never be more than 9 total session logs, so we don't have to worry about session.log.10. After we combine the session logs, the result may be quite large, > 50 megs. Knowing this, would you continue to load them the current way, or create a sessionstemp.log? I am just trying to figure out how large the total size would have to be before you would consider a different method.

2.) How would sed's "-u" parameter affect this operation? Would it run slower and take less memory?

3.) Can you give me a quick example of a line or two that is being piped into sort? All the \'s have me a little confused, but I get the general idea. I can't get a feel for the \2 \1 \3.

Thanks again
 
jculkincys (Author) commented:
Also, any ideas for making it more robust would be helpful.

You don't have to go through the trouble of doing it. I just want to learn more about bash programming.
- error handling
- checking to see if the script executes from the "logs" folder; if it doesn't, then I want to cd to there
etc. etc.
 
XoF commented:
> 1.)  Currently I don't know exactly how many session logs will be present when I run this program - so it might work better if I build $logfiles a different way

Once again:
The number of logfiles to be processed does _not_ matter in any way! Let's say you call the script like this:
/usr/local/bin/loganalyzer.sh 2006-03-28 /var/log/session.log.?

The shell expands the wildcard before the script even starts, so after the shift operator strips off the first argument ("2006-03-28"), $logfiles will contain the list of matching filenames. (If you also want the bare session.log included, use /var/log/session.log* instead.)
sed itself will then process session.log.[0-9] line by line - it's really as simple as it looks.
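
You can check the expansion for yourself before running the script (the filenames printed here are just an assumed example):

<code>
$ echo /var/log/session.log.?
/var/log/session.log.1 /var/log/session.log.2 /var/log/session.log.3
</code>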

> -u param

I don't know this param for sed. Note that the -u in the script belongs to sort, not sed: "sort -u" suppresses duplicate lines (duplicates judged by the sort key, here column 3), which is what produces the unique counts.

> 3.) quick example

Original data:
2006-03-28 13:47:07,669: user login: test
2006-03-28 14:03:06,156: user timeout: johnsonr4
2006-03-28 14:03:06,314: user logout: johnsonr2
2006-03-28 14:10:53,206: user login: jonesg4
2006-03-28 14:10:57,817: user login: smithf3

After sed-processing (with a match for "user login"):
2006-03-28 13:47:07 test
2006-03-28 14:10:53 jonesg4
2006-03-28 14:10:57 smithf3

> error handling

Can be achieved by:
- return values (see the sketch below):
<command> || rc=1
if [ $rc -eq 1 ]; then ...; fi

- signal handling --> man trap
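
A minimal sketch pulling these ideas together, plus the cd-to-"logs" check you asked about (the directory name and the messages are assumptions, not tested against your setup):

<code>
#!/bin/sh

# bail out with a message if the script is interrupted (see man trap)
trap 'echo "interrupted" >&2; exit 1' INT TERM

# basic argument check: a date plus at least one log file
if [ $# -lt 2 ]; then
    echo "usage: $0 <day> <logfile> [logfile ...]" >&2
    exit 1
fi

# cd into the "logs" folder if we are not already there
dir=`basename "$PWD"`
if [ "$dir" != "logs" ]; then
    cd logs || { echo "cannot cd to logs" >&2; exit 1; }
fi
</code>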



HTH,

-XoF-
 
jculkincys (Author) commented:
Alrighty, sorry for my ignorance.

I think I now understand that the files never get completely loaded into memory, because sed goes through them one line at a time.
 
XoF commented:
Alright. Perhaps it would have been helpful for you to know what the name "sed" stands for: stream editor ;-)
