[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Aggregate Count of 24 hourly Apache Logs

Posted on 2011-10-24
13
Medium Priority
?
668 Views
Last Modified: 2012-06-27
In my daily scripts, it is running the following which seems to work, but does take a very long time..
gawk '/searchString/ { count++ } END { print count } log1 log2 logx

Open in new window


I also know that you can do a grep -hc log1 log2 logx

Just trying to understand the most efficient way to do this.

Thanks,

Kent
0
Comment
Question by:Kent Dyer
  • 8
  • 5
13 Comments
 
LVL 8

Expert Comment

by:pwust
ID: 37023212
generally speaking: for this job grep is MUCH more efficient than awk/gawk.

If you want to have one sum only while using grep, try the following:
SUM=0;for TERM in `grep -hc "searchString" access_log*` ; do SUM=$((SUM+TERM)); done; echo $SUM

Open in new window

0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37028269
This does not seem to work..  I even tried to put this into a bash shell in Windows..

D:\scripts>SUM=0;for TERM in `grep -hc "signin xmlns" /logs/access.log.2011-10-2
5-*` ; do SUM=$((SUM+TERM)); done; echo $SUM
SUM: unknown option -- h
Try `SUM --help' for more information.

Tried a couple variations of this with no success.

D:\scripts>grep -hc "signin xmlns" /logs/access.log.2011-10-25-*;
4970
This seems to work fine..

Thanks,

Kent
0
 
LVL 8

Expert Comment

by:pwust
ID: 37029490
ok, the snippet was for Bash shell. I did not realize that you are using Windows here.
0
Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

 
LVL 8

Expert Comment

by:pwust
ID: 37029491
How do you have installed grep on your Windows box? is it part of a Cygwin installation, or of GnuWin32, or something else?
0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37030392
I originally started with the Unxtools

http://unxutils.sourceforge.net/

but then I wound up going with CygWin..

http://www.cygwin.com

I kind of cheated with Cygwiin as I have not installed the binaries to servers, but just copied them to the server and the seem to work fine.

The hourly files seem to work fine.  However, the ones where I am counting all of the records for the entire day, seem to process.  But it never lets me know that this has completed.

Here is the VBS Code that I am "calling" to the UNIX tools:
Option Explicit
Const ForReading = 1, ForWriting = 2
Dim oFSO, objWS
Dim yMonth, yDay, yYear, Yesterday
Dim nMonth, nDay, nYear, nToday
Dim Log4Path, FL, f1, line, LogPath, Separator
Dim FullApacheLogPath, ApacheLog, APacheLogPath, DataCenter
Dim ScriptDir, LogDir, WshNetwork, strMachineName
Dim ResultFile, ResultFile2, fileoutput, fileoutput2
Dim objFolder, colFiles, objFile, FLPath, FLName, FLlist
Dim YearVar, MonthVar, DayVar, DateVar, HourVar
Dim String, searchstr, t, Str, objExecObject, Results

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set WshNetwork = CreateObject("Wscript.Network")
Set objWS = CreateObject("WScript.Shell")

strMachineName = WshNetwork.ComputerName 

log4j

' -- Calculate yesterday "y" for yesterday
yMonth = Right(100 + Month(Date() - 1), 2)
yDay = Right(100 + Day(Date() - 1), 2)
yYear = Year(Date() - 1)
Yesterday = yMonth & "-" & yDay & "-" & yYear

' -- Calculate Today "n" now or today
nMonth = Right(100 + Month(Date()), 2)
nDay = Right(100 + Day(Date()), 2)
nYear = Year(Date())
nToday = nMonth & "-" & nDay & "-" & nYear

' -- Set some additional variables to be re-used
ScriptDir = Replace(WScript.ScriptFullName, WScript.ScriptName, "")
LogDir = ScriptDir & "logs\"
If Not oFSO.FolderExists(LogDir) Then
	oFSO.CreateFolder(LogDir)
End If
If Not oFSO.FolderExists("D:\logs\ArchiveTemp\stats\") Then
	oFSO.CreateFolder("D:\logs\ArchiveTemp\stats\")
End If
ResultFile = "D:\logs\ArchiveTemp\stats\" & strMachineName & "." & yYear & "-" & yMonth & "-" & yDay & ".csv"
ResultFile2 = LogDir & "Aggregate_stats_" & strMachineName & "." & yYear & "-" & yMonth & "-" & yDay & ".csv"

' -- For testing, we want to clear our previously generated result data
If oFSO.FileExists(ResultFile) Then
	oFSO.DeleteFile(ResultFile)
End If
If oFSO.FileExists(ResultFile2) Then
	oFSO.DeleteFile(ResultFile2)
End If

' -- We need to create the text files to work with
Set fileOutput = oFSO.CreateTextFile(ResultFile, True)
Set fileOutput2 = oFSO.CreateTextFile(ResultFile2, True)

'Wscript.echo APacheLogPath
LogLoop APacheLogPath
SumAggregate
Mailer

Sub LogLoop(Path)
	Set objFolder = oFSO.GetFolder(Path)
	Set colFiles = objFolder.Files
	For Each objFile In colFiles
		FLPath = objFile.Path
		FLName = objFile.Name
		If InStr(FLName, "alarm") Or InStr(FLName, "failed") Or InStr(FLName, "audit") Or InStr(FLName, "foren") Then
		Else
			If InStr(FLName, ".log.") Then
				If InStr(FLName, "-") Then '-- We want to be sure that we have file names that have a date within them
					'WScript.Echo FLName
					If InStr(FLName, ApacheLog) Then
						YearVar = Right(Split(FLName, "-") (0), 4)
						'Wscript.Echo YearVar
						MonthVar = Right(Split(FLName, "-") (1), 2)
						'Wscript.Echo MonthVar
						DayVar = Left(Split(FLName, "-") (2), 2)
						'Wscript.Echo DayVar
						HourVar = Split(FLName, "-") (3)
						'Wscript.Echo HourVar
						DateVar = MonthVar & "-" & DayVar & "-" & YearVar
						'If DateVar = Yesterday Then ' -- We only want yesterday's logs
						If DateVar = nToday Then ' -- We only want yesterday's logs
							'WScript.Echo FLName
							'WScript.Echo DateVar
							' 6.x servers use the following
							''If InStr(FLName, "aa_server.log") Then
							'WScript.Echo "We Made it"
							HourVar = Split(FLName, "-") (3)
							'WScript.Echo HourVar
							'fileOutput.WriteLine FLName & "," & DateVar & "," & HourVar
							FLlist = FLlist & " " & Chr(34) & APacheLogPath & FLName & Chr(34)
							'Wscript.Echo FLlist
							String = "javax.crypto.BadPaddingException: Given final block not properly padded,"
							String = String & "Unable to get DB connection,java.lang.IllegalArgumentException: userName is required,"
							String = String & "NullPointerException,signin xmlns,signinResponse xmlns,"
							String = String & "some xmlns,someResponse xmlns,someother xmlns,"
							String = String & "someotherResponse xmlns,somethinelse xmlns"
							searchstr = Split(String, ",")
							For Each t In searchstr
								'fileOutput.WriteLine vbcrlf & "counts for: " & t
								Str = "%comspec% /k " & ScriptDir & "grep -c  """ & t & """ " & APacheLogPath & FLName & ""
								'WScript.Echo Str
								''objWS.Run(Str)
								Set objExecObject = objWS.Exec(Str)
								Results = objExecObject.StdOut.ReadLine()
								fileOutput.WriteLine HourVar & ":00," & Results & "," & t
							Next
						End If
					End If
				End If
			End If
		End If
	Next
End Sub

Sub Mailer()
	Dim recipient, server, from, subject, body, strCommand, oExec
	recipient = "recipient1@company.com,recipient2@company.com"
	'recipient = "recipient1@company.com"
	If InStr(strMachineName, "XXX")  Then
		server = "1.2.3.4" ' -- DataCenter 1
	End If
	If InStr(strMachineName, "YYY") Then
		server = "5.6.7.8" ' -- DataCenter2
	End If
	from = "recipient1@company.com"
	subject = "Statistical Counts from: " & strMachineName
	strCommand = "%comspec% /c " & ScriptDir & "Blat.exe"
	strCommand = strCommand & " -log ""D:\Scripts\Blat.log"""
	strCommand = strCommand & " -f " & Chr(34) & from & Chr(34) & ""
	strCommand = strCommand & " -to " & Chr(34) & recipient & Chr(34) & ""
	strCommand = strCommand & " -server " & Chr(34) & server & Chr(34) & ""
	strCommand = strCommand & " -subject " & Chr(34) & subject & Chr(34) & ""
	strCommand = strCommand & " -body ""Attached are the Statistical Counts and are also on the server at: """ & ResultFile
	'strCommand = strCommand & " -bodyF " & Chr(34) & ResultFile & Chr(34) & ""
	strCommand = strCommand & " -attach " & Chr(34) & ResultFile & Chr(34) & "," & Chr(34) & ResultFile2 & Chr(34) & ""
	'strCommand = strCommand & " -attach " & Chr(34) & ResultFile & Chr(34) & ""
	strCommand = strCommand & " -q"
	'WScript.Echo strCommand
	Set oExec = objWS.Exec(strCommand)
	While Not oExec.Status = 0
		WScript.Sleep 10
	Wend
	Set objWS = Nothing
End Sub

Sub log4j
	'C:\WEB-APPS\webapps1\pmws_server\WEB-INF\classes
	'log4j.appender.LOGFILE.File=D:/Logs/SERVER@DATACENTER.log ' -- current log
	'log4j.appender.forensic.File=D:/Logs/SERVER@DATACENTER_forensic.log ' --current forensic log
	' -- With Apache Tomcat under Windows, you can use "/" or "\\" for the folder file paths
						
	Log4Path = "C:\WEB-APPS\webapps1\pmws_server\WEB-INF\classes\"
	FL = Log4Path & "log4j.properties"
						
	Set f1 = oFSO.OpenTextFile(FL, ForReading, True)
						
	Do Until f1.AtEndOfStream
		line = f1.Readline
		If InStr(line, "log4j.appender.LOGFILE.File") Then
			LogPath = Split(line, "=") (1)
			Select Case True
				Case InStrRev(LogPath, "\\") > 0
					Separator = "\\"
				Case InStrRev(LogPath, "/") > 0
					Separator = "/"
			End Select
			FullApacheLogPath = Replace(LogPath, Separator, "\") ' -- e.g.: D:\Logs\SERVER@DATACENTER.log
			'WScript.Echo FullApacheLogPath
			ApacheLog = Right(LogPath, Len(logPath) - InStrRev(LogPath, Separator)) ' -- e.g.: SERVER@DATACENTER.log
			'WScript.Echo ApacheLog
			APacheLogPath = Replace(Left(LogPath, InStrRev(LogPath, Separator)), Separator, "\") ' -- e.g.: D:\Logs\
			'WScript.Echo APacheLogPath
			'If Not Instr(Log4Path,"tomcat") then
			'	Server = Split(ApacheLog, "@") (0) ' -- e.g.: SERVER
			'	'WScript.Echo Server
			'End If
			If Not InStr(Log4Path, "tomcat") Then
				DataCenter = Left(Split(ApacheLog, "@") (1), Len(Split(ApacheLog, "@") (1)) - 4) ' -- e.g.: DATACENTER
				'WScript.Echo DataCenter
			End If
		End If
		'If InStr(line, "log4j.appender.forensic.File") Then
		'	ForensicLogPath = Split(line, "=") (1)
		'	Select Case True
		'		Case InStrRev(ForensicLogPath, "\\") > 0
		'			Separator = "\\"
		'		Case InStrRev(ForensicLogPath, "/") > 0
		'			Separator = "/"
		'	End Select
		'	ApacheForensicLogPath = Replace(ForensicLogPath, Separator, "\")
		'	ApacheForensicLog = Right(ForensicLogPath, Len(ForensicLogPath) - InStrRev(ForensicLogPath, Separator))
		'	'WScript.Echo ApacheForensicLog
		'End If
	Loop
End Sub

Sub SumAggregate
	Dim UnxPath
	UnxPath = Right(Replace(APacheLogPath, "\", "/"), Len(APacheLogPath) - 2)
	'Wscript.Echo UnxPath
	String = "javax.crypto.BadPaddingException: Given final block not properly padded,"
	String = String & "Unable to get DB connection,java.lang.IllegalArgumentException: userName is required,"
	String = String & "NullPointerException,signin xmlns,signinResponse xmlns,"
	String = String & "some xmlns,someResponse xmlns,someother xmlns,"
	String = String & "someotherResponse xmlns,somethinelse xmlns"
	searchstr = Split(String, ",")
	For Each t In searchstr
		Str = "%comspec% /k " & ScriptDir & "grep -hc """ & t & """ " & UnxPath & strMachineName & "@" & DataCenter & ".log." & nYear & "-" & nMonth & "-" & nDay & "-*"
		'WScript.Echo Str
		'objWS.Run(Str)
		Set objExecObject = objWS.Exec(Str)
		Results = objExecObject.StdOut.ReadLine()
		fileOutput2.WriteLine Results & "," & t
	Next
End Sub

Open in new window


Thanks,

Kent
0
 
LVL 8

Accepted Solution

by:
pwust earned 2000 total points
ID: 37034277
Hm. If you would enumerate all log files matching your pattern within VBScript, you then could loop throught these files with "grep -hc ..." and add together all output figures within VBScript, like (pseudo code)
for each finding in (enumeration of needed log files)
  do
    call "grep -hc" with the current file
    add number to total sum
  done
(use aggregated sum for further processing)
 

Open in new window

0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37042126
Decided to back up and take a different approach on this..  Everything seems to be working fine with the exception of totalling the hourly counts..

gawk -F ',' 'BEGIN { print $3 };{ x = x + $2 } END { print x }' resultfile.csv | sort | uniq > aggregate.csv

Open in new window


I have even tried to remove the comma to have uniq -c sum things up, but does not seem to work..  I am so close.

gawk '{ print substr($1,7,length($1)) }' resultfile.csv | sort| uniq -c | sort -r -n | sed "s/\([0-9]\),/\1 /" | uniq | sort -t "","" -k 2,2 -n -r > aggregate.csv

Open in new window


Thanks,

Kent
0
 
LVL 8

Expert Comment

by:pwust
ID: 37043692
how does this resultfile.csv look like, where you wan to total values on?
0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37046706
Here is what I have in the hourly logs:

Hour,Count,SearchString
00:00,48,javax.crypto.BadPaddingException: Given final block not properly padded
00:00,0,Unable to get DB connection
00:00,0,java.lang.IllegalArgumentException: userName is required
00:00,0,NullPointerException
00:00,6150,signin xmlns
00:00,6150,signinResponse xmlns
00:00,2360,challenge xmlns
00:00,2360,challengeResponse xmlns
00:00,4548,notify xmlns
00:00,4548,notifyResponse xmlns
00:00,20,enroll xmlns
01:00,28,javax.crypto.BadPaddingException: Given final block not properly padded
01:00,0,Unable to get DB connection
01:00,0,java.lang.IllegalArgumentException: userName is required
01:00,0,NullPointerException
01:00,6155,signin xmlns
01:00,6155,signinResponse xmlns
01:00,2490,challenge xmlns
01:00,2490,challengeResponse xmlns
01:00,4484,notify xmlns
01:00,4484,notifyResponse xmlns
01:00,9,enroll xmlns

What I would like to see is the hourly file rolled up:

Hour,Count,SearchString
76,javax.crypto.BadPaddingException: Given final block not properly padded
0,Unable to get DB connection
0,java.lang.IllegalArgumentException: userName is required
0,NullPointerException
12305,signin xmlns
12305,signinResponse xmlns
4850,challenge xmlns
4850,challengeResponse xmlns
9032,notify xmlns
9032,notifyResponse xmlns
29,enroll xmlns

Thanks,

Kent
0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37050065
I did get from one of my colleagues the following:
gawk -F ',' '{print $3}' "results.csv" |sort |uniq -c |gawk '{print $1","$2}' |sort -t "","" -nr -k 2,2

Open in new window


And results in the following:
24,signinResponse
24,signin
24,notifyResponse
24,notify
24,javax.crypto.BadPaddingExcep
24,java.lang.IllegalArgumentExc
24,enroll
24,challengeResponse
24,challenge
24,Unable
24,NullPointerException

The problem is that it does count how many times a string appears but does not sum the counts..

I can do these individually:
gawk -F ',' '{print $3}' "results.csv" |sort|uniq

Open in new window


NullPointerException
Unable to get DB connection
challenge xmlns
challengeResponse xmlns
enroll xmlns
java.lang.IllegalArgumentException: userName is required
javax.crypto.BadPaddingException: Given final block not properly padded
notify xmlns
notifyResponse xmlns
signin xmlns
signinResponse xmlns

but this shows how many times a number appears..  Argh..
gawk -F ',' '{print $2}' "results.csv" | sort -t "","" -nr -k 2 |uniq -c

Open in new window


     1 9958
      1 9955
      2 987
      2 980
      2 971
      2 957
      1 9125
      1 9117
      1 9

Let's go back and look at the information that my colleague sent and make a couple tweaks..
gawk -F ',' '{print $3}' "results.csv" |sort |uniq -c |gawk '{print $1};{print $2","$3","$4","$5","$6","$7","$8}'

Open in new window


Still is not right..
24
NullPointerException,,,,,,
24
Unable,to,get,DB,connection,,
24
challenge,xmlns,,,,,

Thanks,

Kent
0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37050809
I am very, very close..

http://www.commandlinefu.com/commands/view/2967/sum-columns-from-csv-column-col

gawk -F ',' '{ x = x + $3 } END { print x }' resultsfile.csv

Open in new window

and it does a summation of all entries..  Now, just need to do it by category..

Thanks,

Kent
0
 
LVL 17

Author Comment

by:Kent Dyer
ID: 37057468
Got it!  It is now working correctly..

gawk 'BEGIN {FS=","}{Msg[$3]++;count[$3]+=$2;}END{for (var in Msg)print count[var]","var;}' "results.csv" | sort -nr > aggregate.csv

Open in new window


Thanks,

Kent
0
 
LVL 17

Author Closing Comment

by:Kent Dyer
ID: 37057476
This got me to thinking about what we need to do for this.

Thanks!

Kent
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This Windows batch file is useful for organizing image files from a digital camera or other source, but can have many other uses.  It simply renames the file(s) to match their create date.  For example, if you took a picture today at 1:40pm and the …
How to remove superseded packages in windows w60 or w61 installation media (.wim) or online system to prevent unnecessary space. w60 means Windows Vista or Windows Server 2008. w61 means Windows 7 or Windows Server 2008 R2. There are various …
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
Suggested Courses
Course of the Month17 days, 21 hours left to enroll

829 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question