bash log filter

Hello

I need help with a bash script that will concatenate 3 log files and extract some useful information.

Here is a sample of the data that is in the files

2006-03-28 13:47:07,669: user login: test
2006-03-28 14:03:06,156: user timeout: johnsonr4
2006-03-28 14:03:06,314: user logout: johnsonr2
2006-03-28 14:10:53,206: user login: jonesg4
2006-03-28 14:10:57,817: user login: smithf3

Anything after the comma in the time is milliseconds and can be ignored.

I want to get the total number of logins for a day and also the total number of unique logins for a day.

Any tips on how to do this or get started would be appreciated.
The first thing I need to find out is how to read this data from .log files and how to go about manipulating it.

Thanks
jculkincys

XoFCommented:
Below is a code snippet that does what you want. Just call the script with the date as the first parameter and the filename to parse as the second.
The sed expression is more complex than needed for your requirements and can be used for further investigation. For example, just drop the "| wc -l" at the end and you will get a complete listing.

<code>
#!/bin/sh
day=$1
logfile=$2

logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfile | sort -k 3,3 | wc -l`
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfile | sort -k 3,3 -u | wc -l`

cat << EOF
Statistics for $day:
----------------------

Total logins:   $logins
Unique logins: $ulogins
EOF

</code>
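To see what the sed expression extracts before anything is counted, you can run it on the sample data from the question (the scratch file name below is just for illustration):

```shell
# Recreate the sample lines from the question in a scratch file
cat > sample.log <<'EOF'
2006-03-28 13:47:07,669: user login: test
2006-03-28 14:03:06,156: user timeout: johnsonr4
2006-03-28 14:03:06,314: user logout: johnsonr2
2006-03-28 14:10:53,206: user login: jonesg4
2006-03-28 14:10:57,817: user login: smithf3
EOF

day=2006-03-28
# Same expression as the script, without the trailing "| wc -l":
# keeps the date, the time (milliseconds stripped), and the user of each login line
sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' sample.log
```

This prints only the three "user login" lines, reformatted as `2006-03-28 13:47:07 test` and so on; the timeout and logout lines do not match and are suppressed by `-n`.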

HTH,
-XoF-
XoFCommented:
Oops, I forgot the concatenation:

<code>
#!/bin/sh
day=$1
shift
logfiles=$@

logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`

cat << EOF
Statistics for $day:
----------------------

Total logins:   $logins
Unique logins: $ulogins
EOF

</code>

Call the script as:
./script <day> <file1> <file2> ... <file n>
jculkincysAuthor Commented:
That looks like it should work.

Could you explain it a little for me?

Sorry, I'm a little new to bash programming.

A line-by-line description would be most helpful.

jculkincysAuthor Commented:
I forgot to tell you that I would also want the total number of timeouts for a day, but I should be able to implement that once I understand your code.

How are the concatenated files stored? In a bash variable?

I was thinking that if we were dealing with very large logs, it might be better to concatenate all the logs into a temporary file and read from there, so we would not have to load so much into memory.

Your thoughts?

jculkincys
XoFCommented:

#!/bin/sh
## the so-called shebang, which defines the interpreter to be used

day=$1
# store the first argument into variable "day"

shift # delete the first argument from the arg-list
logfiles=$@ # store all remaining arguments into the variable "logfiles"

# run sed on each logfile specified. The logfiles are processed one after another, so you don't have to fear high memory consumption.
# concatenation of the files does not occur
logins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
ulogins=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user login: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`
timeouts=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user timeout: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 | wc -l`
utimeouts=`sed -n 's/^\('$day'\)[ ]*\([0-9]*:[0-9]*:[0-9]*\),[0-9]*: user timeout: \(.*\)/\1 \2 \3/p' $logfiles | sort -k 3,3 -u | wc -l`

## explanation of sed:
# sed -n [...] file1 file2 fileN  # load each line of each file into the buffer (one after another); do not output the buffer contents (-n)
# 's/pattern/replacement/p'  # process the buffer: replace each occurrence of "pattern" with "replacement" (s/); print the new buffer content (/p)
# if parts of "pattern" are enclosed in \( \), each part is later addressable as \1, \2, and so on
# EXAMPLE: s/\(15\) \(men and \)a \(bottle of rum\)/one \2 \1 \3/p
# will transform "15 men and a bottle of rum" into "one men and 15 bottle of rum" (OK, grammatically wrong, but a nice example, isn't it? ;-)
#
# | sort -k 3,3  # pass the output to "sort"; sort on columns 3 through 3, i.e. only on column 3 (the user name)
# | wc -l  # pass the output of sort to "wc", which normally counts words; -l makes it count lines instead

# a so-called here-document:
# write everything up to the terminating marker line ("EOF") to stdout
cat << EOF
Statistics for $day:
----------------------

Total logins:       $logins
Unique logins:     $ulogins
Total timeouts:   $timeouts
Unique timeouts: $utimeouts
EOF
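The difference between the total and unique counts comes entirely from sort's -u flag, which you can see in isolation (the input lines mimic the sed output above; the file name is arbitrary):

```shell
# Three logins, but only two distinct users ("test" appears twice)
printf '%s\n' \
  '2006-03-28 13:47:07 test' \
  '2006-03-28 14:10:53 jonesg4' \
  '2006-03-28 15:02:11 test' > logins.txt

sort -k 3,3 logins.txt | wc -l      # counts every login line: 3
sort -k 3,3 -u logins.txt | wc -l   # keeps one line per sort key (column 3, the user): 2
```

With -k 3,3 the sort key is only the user column, so -u discards later lines whose user was already seen, regardless of their timestamps.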



HTH,

-XoF-
jculkincysAuthor Commented:
I do appreciate the rum example.

Just a few more questions (thank you, you have been very helpful so far):

1.) Currently I don't know exactly how many session logs will be present when I run this program, so it might work better if I build $logfiles a different way. I know there will be a session.log, but there may also be session.log.1, session.log.2 - there will never be more than 9 total session logs, so we don't have to worry about session.log.10. After we combine the session logs, the result may be quite large, > 50 MB. Knowing this, would you continue to load them the current way, or create a sessionstemp.log? I am just trying to figure out how large the total size would have to be before you would consider a different method.

2.) How would sed's "-u" parameter affect this operation? Would it run slower and take less memory?

3.) Can you give me a quick example of a line or two being piped into sort? All the \'s have me a little confused, but I get the general idea. I can't get a feel for the \2 \1 \3.

Thanks again
jculkincysAuthor Commented:
Also - any ideas for making it more robust would be helpful.

You don't have to go through the trouble of doing it; I just want to learn more about bash programming.
- error handling
- checking whether the script executes from the "logs" folder; if it doesn't, then I want to cd there
etc. etc.
XoFCommented:
> 1.)  Currently I don't know exactly how many session logs will be present when I run this program - so it might work better if I build $logfiles a different way

Once again:
The number of logfiles to be processed does _not_ matter in any way! Let's say you call the script like this:
/usr/local/bin/loganalyzer.sh 2006-03-28 /var/log/session.log.?

Now the shift-operator will strip off the first argument ("2006-03-28"). $logfiles will now contain "/var/log/session.log.?".
sed itself now will process session.log.[0-9] line by line - it's really as simple as it looks.
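A minimal sketch of that argument handling (the `set --` line just simulates the command line here, and the filenames are illustrative):

```shell
# Simulate: ./loganalyzer.sh 2006-03-28 session.log.1 session.log.2
set -- 2006-03-28 session.log.1 session.log.2

day=$1       # first argument: the day
shift        # drop it from the argument list
logfiles=$@  # everything that is left: the logfiles

echo "$day"       # 2006-03-28
echo "$logfiles"  # session.log.1 session.log.2
```

The shell (not the script) expands a glob like session.log.? into separate arguments before the script even starts, which is why any number of files up to 9 works unchanged.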

> -u param

The -u in the script belongs to sort, not sed: sort -u keeps only the first line for each sort key (here column 3, the user name), which is how the unique counts are produced. It has no meaningful effect on speed or memory here.

> 3.) quick example

Original data:
2006-03-28 13:47:07,669: user login: test
2006-03-28 14:03:06,156: user timeout: johnsonr4
2006-03-28 14:03:06,314: user logout: johnsonr2
2006-03-28 14:10:53,206: user login: jonesg4
2006-03-28 14:10:57,817: user login: smithf3

After sed-processing (with match for "user login"):
2006-03-28 13:47:07 test
2006-03-28 14:10:53 jonesg4
2006-03-28 14:10:57 smithf3

> error handling

Can be achieved by:
- return values (initialize rc first, or the test fails when the command succeeds):
rc=0
<command> || rc=1
if [ $rc -eq 1 ]; then ...; fi

- signal handling --> man trap
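A rough sketch of the kinds of checks you asked about; the function names, usage message, and the "logs" directory are my own assumptions, not part of the original script:

```shell
#!/bin/sh
# Hypothetical robustness additions (all names are illustrative only)

check_args() {
    # require at least a day plus one logfile
    if [ $# -lt 2 ]; then
        echo "usage: loganalyzer.sh <day> <logfile> ..." >&2
        return 1
    fi
}

enter_logs_dir() {
    # cd into ./logs unless the script is already run from a "logs" directory
    case "$(pwd)" in
        */logs) ;;                # already inside a "logs" directory
        *) cd logs || return 1 ;; # fail cleanly if it is missing
    esac
}

# remove the scratch file even if the script is interrupted (Ctrl-C, kill)
tmpfile="/tmp/loganalyzer.$$"
trap 'rm -f "$tmpfile"' EXIT INT TERM
```

The trap line ties into the "man trap" pointer above: on normal exit or on SIGINT/SIGTERM the temp file is cleaned up automatically.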



HTH,

-XoF-
jculkincysAuthor Commented:
Alrighty - sorry for my ignorance.

I think I now understand that the files never get completely loaded into memory, because sed goes through them one line at a time.
XoFCommented:
Alright. Perhaps it would have been helpful for you to know what the name "sed" stands for: stream editor ;-)
Question has a verified solution.
