S-a-t

asked on

Shell Script to Search for ID in log file, skip searched files in next search

Hi Experts,

I need your help to create a shell script.

The requirement below may contain some repeated points; please excuse that. I have tried to explain it as it came to mind.

Requirement is:

Search for data by parent ID in old.log files, each of which gets a sequence number appended as a suffix when it is created.
Each old.log is created with an incremental sequence number after a particular size is reached, like a log rotation.
I have to search through each old.log file for some data and capture it in an Oracle database.

For example, many lines will contain a parent ID, and the same parent ID can also occur in the next sequenced old.log file with an incremented suffix.
Once I have searched for a parent ID in a file and collected all its data, I have to parse that data and insert it into the database.
By parsing I mean capturing a few fields such as CHILDId, StartTime, FinishTime, etc.

The challenge is that a parent ID can appear in the next file too, with its relevant data, so I have to search for the parent ID across many files.
Data could be scattered across the log files; usually, once a parent ID appears, the same parent ID continues into the next log file, not into older ones. If a parent ID starts in old.log.12111, it can continue in the next file, old.log.12112.
old.log files are created every few minutes, so the script will have to remember which files it has already searched and skip them in the next run.
I do not want to search the same log file twice, but I do have to search for a parent ID in multiple log files to make sure I am not missing any data.
There will be multiple parent IDs in every log file; each can be identified by a "PARENT" line. In the example below, the parent ID is 1849312.
I want to run this script every hour.

Below is the ls -l output:

-rw-rw-r-- 1 user user1 121611 Apr  1 14:48 old.log.12111
-rw-rw-r-- 1 user user1 139872 Apr  1 14:48 old.log.12112
-rw-rw-r-- 1 user user1 147591 Apr  1 14:48 old.log.12113

Sample data from old.log after a grep:

[user@server directory]$ grep 1849312 old.log.12111
"PARENT" "user" "1522849159" "1849312:user:NAME" "Triggered PARENT" " "
"PARENTDEF" "user" "1522849159" "user:NAME" "Instantiated PARENT definition" "PARENTId=1849312|PARENTUser=user"
"PARENT" "user" "1522849159" "1849312:user:NAME" "Start PARENT" " "
"CHILD" "user" "1522849159" "1849312:user:NAME:common_name" "Start CHILD" " "
"CHILD" "user" "1522849165" "1849312:user:NAME:common_name" "Started CHILD" "CHILDId=394330"
"CHILD" "user" "1522849165" "1849312:user:NAME:common_name" "Execute CHILD" "CHILDId=394330|Host=hostname"
"CHILD" "user" "1522849165" "1849312:user:NAME:common_name" "Finished CHILD" "CHILDId=394330|State=Exit|Status=5|StartTime=1522849165|FinishTime=1522849165|CPUUsage=0.890900 sec"
"PARENT" "user" "1522849165" "1849312:user:NAME" "Finished PARENT" "State=Exit|Status=5|StartTime=1522849159|FinishTime=1522849165"
[user@server directory]$

Below is the script I came up with, which needs modification with your expert help.

------------------
#!/bin/bash

touch parent1.txt

# Skip files searched earlier, listed one per line in parent1.txt
# (-x -F: match whole file names literally, so old.log.1211 does not
# also exclude old.log.12111)
file=$(ls old.log.* 2>/dev/null | grep -v -x -F -f parent1.txt)

# Check whether the variable has file names; if empty, don't do anything
if [ -z "$file" ]
then
    echo "Var is empty"
else
    # Extract each parent ID: field 4 of the "Start PARENT" lines (the logs
    # use upper case), up to the first colon; -h suppresses file name prefixes
    grep -h "Start PARENT" $file | awk '{print $4}' | sed 's/"//' | sed 's/:.*//' | while read par
    # Search for the parent ID in each file
    do
        grep "$par" $file
    # Log all grep results in a file for later processing
    done > output.txt

    # Record the file names so they are skipped in the next search
    ls old.log.* >> parent.txt

    # Keep only unique file names (all are unique because of the sequence suffix)
    sort parent.txt | uniq > parent1.txt

    # Get the numeric suffix of the last file searched
    parent_id=$(tail -1 parent1.txt | sed 's/old\.log\.//')
    # File name prefix
    old_new="old.log."
    # Experimental: increment the suffix to guess the next file to search
    new_file=$((parent_id + 1))
    # Experimental: build the next file name (string concatenation,
    # not command substitution)
    old_new_file="${old_new}${new_file}"
    echo "$new_file"
fi



------------------

Thanks in Advance!
ASKER CERTIFIED SOLUTION
johnsone

This solution is only available to Experts Exchange members.
S-a-t

ASKER

Hi Johnsone,

Thank you for spending your valuable time, I really appreciate it.

I want to say, "What wonderful logic!" I hadn't thought of it.

Let me try to implement this logic in the script and see if that resolves this issue.

Below, you have mentioned timing.dat, which we are moving at the end of the script.
I am wondering: if we run find against timing.dat for the first time, it may throw an error because the file has not been created yet. What do you say?

file=`find . -name 'old.log.*' -newer timing.dat -print`

I'm not sure what you are trying to do with looking at other files for the same id.

Comment: A lot of detail gets written into the old.log.xxxxxx file continuously. Since old.log.xxxxxx gets renamed after every 500 KB, a new log file with the next sequence number is generated, and that new file becomes the current one for the next data.
The parent ID is the key for finding the relevant data, as far as I can see, and when I search by parent ID I get all the details needed to extract the information.

The application logs data to the current log file and writes nothing to old log files, so half of the information will be in an old log file and the rest in the current old.log file. Either I have to remember which file name I scanned last and which file to scan next, or search all old.log files. Another challenge is that I can get a list of all parent IDs, but I don't know when to stop searching for a particular ID in the next old.log file when the data for that ID was already found in a previous log.

Let's say I have 10 IDs which I searched for in all the logs, and only the last two didn't give me all the information, because they hadn't finished their processing yet; they will do that over the next couple of logs, maybe after 4-5 logs. Generally speaking, we should exclude those 8 IDs from the next search and only use the remaining two IDs. That way it would be faster and more organized. I am not sure how to achieve this yet.
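Something like this rough sketch is what I imagine (the file names all_ids.txt and pending.txt are made up, and I am assuming a "Finished PARENT" line means all data for that ID has been logged):

# Rough sketch: all_ids.txt holds every parent ID searched this run,
# output.txt holds the lines collected for them.
> pending.txt
while read par
do
    if grep "$par" output.txt | grep -q "Finished PARENT"
    then
        :   # ID is complete; exclude it from the next search
    else
        echo "$par" >> pending.txt   # carry the ID into the next run
    fi
done < all_ids.txt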

Initially I was trying to do this in real time, but the logic was not coming together, so I have decided to keep the data in a file and process it later to extract the information.

Again, Thank you very much!
Yes, you would need to artificially create the file the first time. You can use the touch command with -d to create it the first time with whatever date you want.

Like I said, you can use touch and find to get the older files.
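For example, a minimal sketch of that pattern (the marker name timing.dat comes from the snippet above; the seed date is arbitrary, and the -d syntax assumes GNU touch):

# One-time setup: seed the marker file with an old date so the first
# run picks up every existing log
[ -f timing.dat ] || touch -d '2000-01-01' timing.dat

# Each run: collect only the files newer than the marker
file=$(find . -name 'old.log.*' -newer timing.dat -print)

# ... search $file here ...

# Move the marker forward so these files are skipped next time
touch timing.dat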

What about loading the whole files into a database table and then querying from there?  Might be a lot easier to make your connections with a table.
S-a-t

ASKER

I haven't done that before (loading data into a database).

How do I do it?

I would prefer to load the data into the database, because in the end I have to load the extracted information into the database anyway.

Thanks
I would think that I would use SQL*Loader to load the data into a table, then a PL/SQL procedure to process each file.

You could use external tables and skip SQL*Loader (they would still use it internally), but with changing file names, SQL*Loader is easier.

There are plenty of sample control files out there. Load everything as text and then deal with it one file at a time. You only load each file once and just leave the rows there.
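As a rough illustration only (the table raw_log and its single VARCHAR2(4000) column are made up; adjust names and credentials for your schema):

# Hypothetical staging table:  CREATE TABLE raw_log (log_line VARCHAR2(4000));
# Write a minimal control file that loads each line as plain text.
cat > load_raw.ctl <<'EOF'
LOAD DATA
INFILE 'old.log.12111'
APPEND
INTO TABLE raw_log
(log_line POSITION(1:4000) CHAR)
EOF

# As the sequence numbers change, pass the data file on the command
# line; the data= parameter should take precedence over INFILE here.
sqlldr userid=user/password control=load_raw.ctl data=old.log.12112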
S-a-t

ASKER

The way we use grep, find, etc. in shell scripts to get the required output: can we do that in SQL once the data is loaded as text?
What you do is load it as plain text; that way nothing gets rejected. Then you have a stored procedure parse it out into a relational model, and then you can query it pretty easily. You can query the raw text, but it isn't very effective.
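For instance, the grep from the question would become something like this against the raw table (raw_log as in the earlier sketch):

# Hypothetical: the equivalent of "grep 1849312 old.log.12111" once the
# lines are sitting in the staging table.
sqlplus -s user/password <<'EOF'
SELECT log_line
FROM   raw_log
WHERE  log_line LIKE '%1849312%';
EOF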
S-a-t

ASKER

Good idea, I can try that on a test server.

It would be my first time trying this; any good site or YouTube video to start with?

Thanks
S-a-t

ASKER

From my script above (part of it copied below).

Any idea how to get the $par values stored in a text file while the loop is running? I mean, I want each $par value to be stored in a file as the loop runs.

I also want to add "| wc -l" to the line below to get a count, and then a condition: if "wc -l" = 2, don't store that $par value in the file; otherwise store it.

do grep $par $file | wc -l



# Check whether the variable has file names; if empty, don't do anything
if [ -z "$file" ]
then
    echo "Var is empty"
else
    # Extract each parent ID from the "Start PARENT" lines
    grep -h "Start PARENT" $file | awk '{print $4}' | sed 's/"//' | sed 's/:.*//' | while read par
    # Search for the parent ID in each file
    do
        grep "$par" $file
    # Log all grep results in a file for later processing
    done > output.txt
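Something like this is what I am after (a rough sketch; par_values.txt is just a name I made up):

# Rough sketch: count the matches for each $par first, then decide
# whether to record the value.
grep -h "Start PARENT" $file | awk '{print $4}' | sed 's/"//' | sed 's/:.*//' | while read par
do
    count=$(grep "$par" $file | wc -l)
    if [ "$count" -eq 2 ]
    then
        :   # only 2 lines found: do not store this $par value
    else
        echo "$par" >> par_values.txt
    fi
    # still collect the data itself for later processing
    grep "$par" $file
done > output.txt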


I would use Perl, as it makes it fairly simple to organize the data as it is generated: process the log as it is generated and add it into a DB.

What is missing is the end result you expect, based on the sample lines you posted.
In terms of references, try the documentation. There are lots of good examples in there. Not sure what version of Oracle you are working with, but SQL*Loader hasn't changed a whole lot over the years.

If you do get stuck, post the control file you are using, the command line, and some sample data in a new question. There are plenty of people here who can help with it.
S-a-t

ASKER

Thanks for your comment, arnold. At the moment the script is needed as a Bash shell script, but I will look at moving it to Perl in the future.
S-a-t

ASKER

Thanks Johnsone, I will go through it.
S-a-t

ASKER

Hi Johnsone,

Thanks for your detailed answer; it helped me resolve my issue, and I used it in the script.
It was exactly what I wanted.

Sat