Aggregate Count of 24 hourly Apache Logs

In my daily scripts, it is running the following which seems to work, but does take a very long time..
gawk '/searchString/ { count++ } END { print count } log1 log2 logx

Open in new window


I also know that you can do a grep -hc log1 log2 logx

Just trying to understand the most efficient way to do this.

Thanks,

Kent
LVL 17
Kent DyerIT Security Analyst SeniorAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

pwustCommented:
generally speaking: for this job grep is MUCH more efficient than awk/gawk.

If you want to have one sum only while using grep, try the following:
SUM=0;for TERM in `grep -hc "searchString" access_log*` ; do SUM=$((SUM+TERM)); done; echo $SUM

Open in new window

0
Kent DyerIT Security Analyst SeniorAuthor Commented:
This does not seem to work..  I even tried to put this into a bash shell in Windows..

D:\scripts>SUM=0;for TERM in `grep -hc "signin xmlns" /logs/access.log.2011-10-2
5-*` ; do SUM=$((SUM+TERM)); done; echo $SUM
SUM: unknown option -- h
Try `SUM --help' for more information.

Tried a couple variations of this with no success.

D:\scripts>grep -hc "signin xmlns" /logs/access.log.2011-10-25-*;
4970
This seems to work fine..

Thanks,

Kent
0
pwustCommented:
ok, the snippet was for Bash shell. I did not realize that you are using Windows here.
0
Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

pwustCommented:
How do you have installed grep on your Windows box? is it part of a Cygwin installation, or of GnuWin32, or something else?
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
I originally started with the Unxtools

http://unxutils.sourceforge.net/

but then I wound up going with CygWin..

http://www.cygwin.com

I kind of cheated with Cygwiin as I have not installed the binaries to servers, but just copied them to the server and the seem to work fine.

The hourly files seem to work fine.  However, the ones where I am counting all of the records for the entire day, seem to process.  But it never lets me know that this has completed.

Here is the VBS Code that I am "calling" to the UNIX tools:
Option Explicit
Const ForReading = 1, ForWriting = 2
Dim oFSO, objWS
Dim yMonth, yDay, yYear, Yesterday
Dim nMonth, nDay, nYear, nToday
Dim Log4Path, FL, f1, line, LogPath, Separator
Dim FullApacheLogPath, ApacheLog, APacheLogPath, DataCenter
Dim ScriptDir, LogDir, WshNetwork, strMachineName
Dim ResultFile, ResultFile2, fileoutput, fileoutput2
Dim objFolder, colFiles, objFile, FLPath, FLName, FLlist
Dim YearVar, MonthVar, DayVar, DateVar, HourVar
Dim String, searchstr, t, Str, objExecObject, Results

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set WshNetwork = CreateObject("Wscript.Network")
Set objWS = CreateObject("WScript.Shell")

strMachineName = WshNetwork.ComputerName 

log4j

' -- Calculate yesterday "y" for yesterday
yMonth = Right(100 + Month(Date() - 1), 2)
yDay = Right(100 + Day(Date() - 1), 2)
yYear = Year(Date() - 1)
Yesterday = yMonth & "-" & yDay & "-" & yYear

' -- Calculate Today "n" now or today
nMonth = Right(100 + Month(Date()), 2)
nDay = Right(100 + Day(Date()), 2)
nYear = Year(Date())
nToday = nMonth & "-" & nDay & "-" & nYear

' -- Set some additional variables to be re-used
ScriptDir = Replace(WScript.ScriptFullName, WScript.ScriptName, "")
LogDir = ScriptDir & "logs\"
If Not oFSO.FolderExists(LogDir) Then
	oFSO.CreateFolder(LogDir)
End If
If Not oFSO.FolderExists("D:\logs\ArchiveTemp\stats\") Then
	oFSO.CreateFolder("D:\logs\ArchiveTemp\stats\")
End If
ResultFile = "D:\logs\ArchiveTemp\stats\" & strMachineName & "." & yYear & "-" & yMonth & "-" & yDay & ".csv"
ResultFile2 = LogDir & "Aggregate_stats_" & strMachineName & "." & yYear & "-" & yMonth & "-" & yDay & ".csv"

' -- For testing, we want to clear our previously generated result data
If oFSO.FileExists(ResultFile) Then
	oFSO.DeleteFile(ResultFile)
End If
If oFSO.FileExists(ResultFile2) Then
	oFSO.DeleteFile(ResultFile2)
End If

' -- We need to create the text files to work with
Set fileOutput = oFSO.CreateTextFile(ResultFile, True)
Set fileOutput2 = oFSO.CreateTextFile(ResultFile2, True)

'Wscript.echo APacheLogPath
LogLoop APacheLogPath
SumAggregate
Mailer

Sub LogLoop(Path)
	Set objFolder = oFSO.GetFolder(Path)
	Set colFiles = objFolder.Files
	For Each objFile In colFiles
		FLPath = objFile.Path
		FLName = objFile.Name
		If InStr(FLName, "alarm") Or InStr(FLName, "failed") Or InStr(FLName, "audit") Or InStr(FLName, "foren") Then
		Else
			If InStr(FLName, ".log.") Then
				If InStr(FLName, "-") Then '-- We want to be sure that we have file names that have a date within them
					'WScript.Echo FLName
					If InStr(FLName, ApacheLog) Then
						YearVar = Right(Split(FLName, "-") (0), 4)
						'Wscript.Echo YearVar
						MonthVar = Right(Split(FLName, "-") (1), 2)
						'Wscript.Echo MonthVar
						DayVar = Left(Split(FLName, "-") (2), 2)
						'Wscript.Echo DayVar
						HourVar = Split(FLName, "-") (3)
						'Wscript.Echo HourVar
						DateVar = MonthVar & "-" & DayVar & "-" & YearVar
						'If DateVar = Yesterday Then ' -- We only want yesterday's logs
						If DateVar = nToday Then ' -- We only want yesterday's logs
							'WScript.Echo FLName
							'WScript.Echo DateVar
							' 6.x servers use the following
							''If InStr(FLName, "aa_server.log") Then
							'WScript.Echo "We Made it"
							HourVar = Split(FLName, "-") (3)
							'WScript.Echo HourVar
							'fileOutput.WriteLine FLName & "," & DateVar & "," & HourVar
							FLlist = FLlist & " " & Chr(34) & APacheLogPath & FLName & Chr(34)
							'Wscript.Echo FLlist
							String = "javax.crypto.BadPaddingException: Given final block not properly padded,"
							String = String & "Unable to get DB connection,java.lang.IllegalArgumentException: userName is required,"
							String = String & "NullPointerException,signin xmlns,signinResponse xmlns,"
							String = String & "some xmlns,someResponse xmlns,someother xmlns,"
							String = String & "someotherResponse xmlns,somethinelse xmlns"
							searchstr = Split(String, ",")
							For Each t In searchstr
								'fileOutput.WriteLine vbcrlf & "counts for: " & t
								Str = "%comspec% /k " & ScriptDir & "grep -c  """ & t & """ " & APacheLogPath & FLName & ""
								'WScript.Echo Str
								''objWS.Run(Str)
								Set objExecObject = objWS.Exec(Str)
								Results = objExecObject.StdOut.ReadLine()
								fileOutput.WriteLine HourVar & ":00," & Results & "," & t
							Next
						End If
					End If
				End If
			End If
		End If
	Next
End Sub

Sub Mailer()
	Dim recipient, server, from, subject, body, strCommand, oExec
	recipient = "recipient1@company.com,recipient2@company.com"
	'recipient = "recipient1@company.com"
	If InStr(strMachineName, "XXX")  Then
		server = "1.2.3.4" ' -- DataCenter 1
	End If
	If InStr(strMachineName, "YYY") Then
		server = "5.6.7.8" ' -- DataCenter2
	End If
	from = "recipient1@company.com"
	subject = "Statistical Counts from: " & strMachineName
	strCommand = "%comspec% /c " & ScriptDir & "Blat.exe"
	strCommand = strCommand & " -log ""D:\Scripts\Blat.log"""
	strCommand = strCommand & " -f " & Chr(34) & from & Chr(34) & ""
	strCommand = strCommand & " -to " & Chr(34) & recipient & Chr(34) & ""
	strCommand = strCommand & " -server " & Chr(34) & server & Chr(34) & ""
	strCommand = strCommand & " -subject " & Chr(34) & subject & Chr(34) & ""
	strCommand = strCommand & " -body ""Attached are the Statistical Counts and are also on the server at: """ & ResultFile
	'strCommand = strCommand & " -bodyF " & Chr(34) & ResultFile & Chr(34) & ""
	strCommand = strCommand & " -attach " & Chr(34) & ResultFile & Chr(34) & "," & Chr(34) & ResultFile2 & Chr(34) & ""
	'strCommand = strCommand & " -attach " & Chr(34) & ResultFile & Chr(34) & ""
	strCommand = strCommand & " -q"
	'WScript.Echo strCommand
	Set oExec = objWS.Exec(strCommand)
	While Not oExec.Status = 0
		WScript.Sleep 10
	Wend
	Set objWS = Nothing
End Sub

Sub log4j
	'C:\WEB-APPS\webapps1\pmws_server\WEB-INF\classes
	'log4j.appender.LOGFILE.File=D:/Logs/SERVER@DATACENTER.log ' -- current log
	'log4j.appender.forensic.File=D:/Logs/SERVER@DATACENTER_forensic.log ' --current forensic log
	' -- With Apache Tomcat under Windows, you can use "/" or "\\" for the folder file paths
						
	Log4Path = "C:\WEB-APPS\webapps1\pmws_server\WEB-INF\classes\"
	FL = Log4Path & "log4j.properties"
						
	Set f1 = oFSO.OpenTextFile(FL, ForReading, True)
						
	Do Until f1.AtEndOfStream
		line = f1.Readline
		If InStr(line, "log4j.appender.LOGFILE.File") Then
			LogPath = Split(line, "=") (1)
			Select Case True
				Case InStrRev(LogPath, "\\") > 0
					Separator = "\\"
				Case InStrRev(LogPath, "/") > 0
					Separator = "/"
			End Select
			FullApacheLogPath = Replace(LogPath, Separator, "\") ' -- e.g.: D:\Logs\SERVER@DATACENTER.log
			'WScript.Echo FullApacheLogPath
			ApacheLog = Right(LogPath, Len(logPath) - InStrRev(LogPath, Separator)) ' -- e.g.: SERVER@DATACENTER.log
			'WScript.Echo ApacheLog
			APacheLogPath = Replace(Left(LogPath, InStrRev(LogPath, Separator)), Separator, "\") ' -- e.g.: D:\Logs\
			'WScript.Echo APacheLogPath
			'If Not Instr(Log4Path,"tomcat") then
			'	Server = Split(ApacheLog, "@") (0) ' -- e.g.: SERVER
			'	'WScript.Echo Server
			'End If
			If Not InStr(Log4Path, "tomcat") Then
				DataCenter = Left(Split(ApacheLog, "@") (1), Len(Split(ApacheLog, "@") (1)) - 4) ' -- e.g.: DATACENTER
				'WScript.Echo DataCenter
			End If
		End If
		'If InStr(line, "log4j.appender.forensic.File") Then
		'	ForensicLogPath = Split(line, "=") (1)
		'	Select Case True
		'		Case InStrRev(ForensicLogPath, "\\") > 0
		'			Separator = "\\"
		'		Case InStrRev(ForensicLogPath, "/") > 0
		'			Separator = "/"
		'	End Select
		'	ApacheForensicLogPath = Replace(ForensicLogPath, Separator, "\")
		'	ApacheForensicLog = Right(ForensicLogPath, Len(ForensicLogPath) - InStrRev(ForensicLogPath, Separator))
		'	'WScript.Echo ApacheForensicLog
		'End If
	Loop
End Sub

Sub SumAggregate
	Dim UnxPath
	UnxPath = Right(Replace(APacheLogPath, "\", "/"), Len(APacheLogPath) - 2)
	'Wscript.Echo UnxPath
	String = "javax.crypto.BadPaddingException: Given final block not properly padded,"
	String = String & "Unable to get DB connection,java.lang.IllegalArgumentException: userName is required,"
	String = String & "NullPointerException,signin xmlns,signinResponse xmlns,"
	String = String & "some xmlns,someResponse xmlns,someother xmlns,"
	String = String & "someotherResponse xmlns,somethinelse xmlns"
	searchstr = Split(String, ",")
	For Each t In searchstr
		Str = "%comspec% /k " & ScriptDir & "grep -hc """ & t & """ " & UnxPath & strMachineName & "@" & DataCenter & ".log." & nYear & "-" & nMonth & "-" & nDay & "-*"
		'WScript.Echo Str
		'objWS.Run(Str)
		Set objExecObject = objWS.Exec(Str)
		Results = objExecObject.StdOut.ReadLine()
		fileOutput2.WriteLine Results & "," & t
	Next
End Sub

Open in new window


Thanks,

Kent
0
pwustCommented:
Hm. If you would enumerate all log files matching your pattern within VBScript, you then could loop throught these files with "grep -hc ..." and add together all output figures within VBScript, like (pseudo code)
for each finding in (enumeration of needed log files)
  do
    call "grep -hc" with the current file
    add number to total sum
  done
(use aggregated sum for further processing)
 

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Kent DyerIT Security Analyst SeniorAuthor Commented:
Decided to back up and take a different approach on this..  Everything seems to be working fine with the exception of totalling the hourly counts..

gawk -F ',' 'BEGIN { print $3 };{ x = x + $2 } END { print x }' resultfile.csv | sort | uniq > aggregate.csv

Open in new window


I have even tried to remove the comma to have uniq -c sum things up, but does not seem to work..  I am so close.

gawk '{ print substr($1,7,length($1)) }' resultfile.csv | sort| uniq -c | sort -r -n | sed "s/\([0-9]\),/\1 /" | uniq | sort -t "","" -k 2,2 -n -r > aggregate.csv

Open in new window


Thanks,

Kent
0
pwustCommented:
how does this resultfile.csv look like, where you wan to total values on?
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
Here is what I have in the hourly logs:

Hour,Count,SearchString
00:00,48,javax.crypto.BadPaddingException: Given final block not properly padded
00:00,0,Unable to get DB connection
00:00,0,java.lang.IllegalArgumentException: userName is required
00:00,0,NullPointerException
00:00,6150,signin xmlns
00:00,6150,signinResponse xmlns
00:00,2360,challenge xmlns
00:00,2360,challengeResponse xmlns
00:00,4548,notify xmlns
00:00,4548,notifyResponse xmlns
00:00,20,enroll xmlns
01:00,28,javax.crypto.BadPaddingException: Given final block not properly padded
01:00,0,Unable to get DB connection
01:00,0,java.lang.IllegalArgumentException: userName is required
01:00,0,NullPointerException
01:00,6155,signin xmlns
01:00,6155,signinResponse xmlns
01:00,2490,challenge xmlns
01:00,2490,challengeResponse xmlns
01:00,4484,notify xmlns
01:00,4484,notifyResponse xmlns
01:00,9,enroll xmlns

What I would like to see is the hourly file rolled up:

Hour,Count,SearchString
76,javax.crypto.BadPaddingException: Given final block not properly padded
0,Unable to get DB connection
0,java.lang.IllegalArgumentException: userName is required
0,NullPointerException
12305,signin xmlns
12305,signinResponse xmlns
4850,challenge xmlns
4850,challengeResponse xmlns
9032,notify xmlns
9032,notifyResponse xmlns
29,enroll xmlns

Thanks,

Kent
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
I did get from one of my colleagues the following:
gawk -F ',' '{print $3}' "results.csv" |sort |uniq -c |gawk '{print $1","$2}' |sort -t "","" -nr -k 2,2

Open in new window


And results in the following:
24,signinResponse
24,signin
24,notifyResponse
24,notify
24,javax.crypto.BadPaddingExcep
24,java.lang.IllegalArgumentExc
24,enroll
24,challengeResponse
24,challenge
24,Unable
24,NullPointerException

The problem is that it does count how many times a string appears but does not sum the counts..

I can do these individually:
gawk -F ',' '{print $3}' "results.csv" |sort|uniq

Open in new window


NullPointerException
Unable to get DB connection
challenge xmlns
challengeResponse xmlns
enroll xmlns
java.lang.IllegalArgumentException: userName is required
javax.crypto.BadPaddingException: Given final block not properly padded
notify xmlns
notifyResponse xmlns
signin xmlns
signinResponse xmlns

but this shows how many times a number appears..  Argh..
gawk -F ',' '{print $2}' "results.csv" | sort -t "","" -nr -k 2 |uniq -c

Open in new window


     1 9958
      1 9955
      2 987
      2 980
      2 971
      2 957
      1 9125
      1 9117
      1 9

Let's go back and look at the information that my colleague sent and make a couple tweaks..
gawk -F ',' '{print $3}' "results.csv" |sort |uniq -c |gawk '{print $1};{print $2","$3","$4","$5","$6","$7","$8}'

Open in new window


Still is not right..
24
NullPointerException,,,,,,
24
Unable,to,get,DB,connection,,
24
challenge,xmlns,,,,,

Thanks,

Kent
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
I am very, very close..

http://www.commandlinefu.com/commands/view/2967/sum-columns-from-csv-column-col

gawk -F ',' '{ x = x + $3 } END { print x }' resultsfile.csv

Open in new window

and it does a summation of all entries..  Now, just need to do it by category..

Thanks,

Kent
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
Got it!  It is now working correctly..

gawk 'BEGIN {FS=","}{Msg[$3]++;count[$3]+=$2;}END{for (var in Msg)print count[var]","var;}' "results.csv" | sort -nr > aggregate.csv

Open in new window


Thanks,

Kent
0
Kent DyerIT Security Analyst SeniorAuthor Commented:
This got me to thinking about what we need to do for this.

Thanks!

Kent
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Shell Scripting

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.