Solved

Bash script / while loop extremely slow read file

Posted on 2014-03-13
Last Modified: 2014-03-25
I have a while loop that reads an FTP log file into a set of arrays, so that I can search the arrays and match up the lines that belong to one flow. Unfortunately, the loop takes forever to get through the file. It is a very large file, but there must be a faster way of doing this.

# read file into array for original search results
while read FTP_SEARCH
do
ogl_date[count]=`echo $FTP_SEARCH | awk '{print $1, $2}'`
ogl_time[count]=`echo $FTP_SEARCH | awk '{print $3}'`
ogl_server[count]=`echo $FTP_SEARCH | awk '{print $4}'`
ogl_id[count]=`echo $FTP_SEARCH | awk '{print $5}'`
ogl_type[count]=`echo $FTP_SEARCH | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_SEARCH | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=`echo $FTP_SEARCH | awk '{
    for(i = 6; i <= NF; i++) 
        print $i;
    }'`

let "count += 1"

done < /tmp/ftp_search.14-12-02

Question by:dloszewski
 

Author Comment

by:dloszewski
ID: 39926825
Some sample lines from ftp_search:

Dec  1 23:59:03 sslmftp1 ftpd[4152]: USER xxxxxx  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PASS password  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: FTP LOGIN FROM 172.19.x.xx [172.19.x.xx], xxxxxx  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PWD  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: CWD /test/data/872507/  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: TYPE Image
 
LVL 84

Expert Comment

by:ozo
ID: 39926948
What are you doing with ogl_date, ogl_time, ogl_server, ogl_id, ogl_type, ogl_pid, and ogl_commands?
What do the lines in /tmp/ftp_search.14-12-02 look like?
This should be a little faster, because it no longer starts an awk process for most of the fields. Knowing more about the format of each line, or what you want to do with the arrays, would probably allow further improvements.

while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 39926960
. . .  puts it into an array so I'll be able to search through the array and match up/search for a flow. . .
There may be other alternatives, but you need to provide the requirements / expected results for the above.

 
LVL 84

Expert Comment

by:ozo
ID: 39926973
Given the line format shown in comment ID 39926825, this should be equivalent, and it uses only shell parameter expansion instead of awk:

# read file into array for original search results
while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=${FTP_5%[*}
FTP_5=${FTP_5%]*}
ogl_pid[count]=${FTP_5#*[}
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
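
To see what those expansions do, here is a minimal sketch that applies the same idea to a single sample line from the log excerpt above. The scalar variable names are only for illustration, and the brackets are backslash-escaped in the patterns for clarity. It assumes every line has the `name[pid]:` form in its fifth field.

```shell
# One sample line, taken from the log excerpt posted earlier in the thread.
line='Dec  1 23:59:03 sslmftp1 ftpd[4152]: CWD /test/data/872507/'

# read splits on whitespace; the last variable receives the rest of the line.
read -r mon day clock server id rest <<EOF
$line
EOF

ogl_type=${id%%\[*}      # drop everything from the first '[' onward: ftpd
ogl_pid=${id#*\[}        # drop everything up to and including '[':  4152]:
ogl_pid=${ogl_pid%%\]*}  # drop everything from ']' onward:           4152

printf '%s|%s|%s|%s|%s|%s\n' "$mon $day" "$clock" "$server" "$ogl_type" "$ogl_pid" "$rest"
```

No external process is started per line, which is where the original loop was spending most of its time.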
 

Author Comment

by:dloszewski
ID: 39926991
Basically, I have an FTP log file with the data shown above, and I want to show the entire flow for a session by searching on a username or IP address. So I figured I would read the data into arrays, search for the criteria, and then match the process ID of each hit against the other lines to get the entire flow.
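
That requirement can be sketched as a single pass in awk, which keeps the arrays inside one process instead of a shell loop. This is only an illustration under assumptions: the function name `show_flow` is hypothetical, the search term is treated as a fixed string, the fifth field is assumed to look like `ftpd[4152]:`, and a PID reused by a later session in the same file would be merged into the same flow.

```shell
# show_flow PATTERN FILE  (hypothetical helper, not from the thread)
# Prints every line whose bracketed PID also occurs on a line containing PATTERN.
show_flow() {
    awk -v pat="$1" '
        {
            # Field 5 is assumed to look like "ftpd[4152]:".
            lb = index($5, "[")
            rb = index($5, "]")
            if (lb == 0 || rb <= lb) next           # skip lines without a PID
            pid = substr($5, lb + 1, rb - lb - 1)
            lines[pid] = lines[pid] $0 "\n"         # collect the line under its PID
            if (index($0, pat)) wanted[pid] = 1     # this PID matched the search
        }
        END {
            # Flows are printed in unspecified PID order;
            # lines within one flow keep their file order.
            for (pid in wanted) printf "%s", lines[pid]
        }
    ' "$2"
}
```

It would be called as, for example, `show_flow 172.19.x.xx /tmp/ftp_search.14-12-02`. Note that this holds the whole file in memory, just as the original arrays did.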
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 500 total points
ID: 39927758
Perhaps it would be quicker to load the data into a database (Access/ MySQL/ Oracle/ ...); you could then analyze it using SQL queries.

Loaded into M$ Access
 
LVL 62

Expert Comment

by:gheist
ID: 39941055
Popular web statistics packages have recipes for handling FTP xferlogs from popular FTP servers.
 
LVL 84

Expert Comment

by:ozo
ID: 39941079
> read data into array, search for criteria, and then match that process id
Depending on how you are doing this, I would think it could be faster to
search for criteria, match that process id, and then read data into array
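
One possible reading of that order, sketched with standard tools: `grep` finds the matching lines, `sed` reduces them to their PIDs, and `awk` then keeps only the lines carrying one of those PIDs. The function name `flow_search_first` is hypothetical, and the same `ftpd[pid]:` line format is assumed.

```shell
# flow_search_first PATTERN FILE  (hypothetical helper, not from the thread)
flow_search_first() {
    # Step 1: search for the criteria (fixed string) and reduce hits to PIDs.
    grep -F -- "$1" "$2" |
        sed -n 's/^[^[]*\[\([0-9][0-9]*\)\].*/\1/p' |
        sort -u |
        # Step 2: read the PID list from stdin first, then filter the log itself.
        awk 'NR == FNR { want[$1] = 1; next }
             {
                 lb = index($5, "[")
                 rb = index($5, "]")
                 if (lb && rb > lb && (substr($5, lb + 1, rb - lb - 1) in want))
                     print
             }' - "$2"
}
```

The output keeps the original file order and is usually a small fraction of the log, so feeding it to the `while read` loop to fill the arrays is then cheap. The file is read twice, but only the PID list is held in memory.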