Solved

Bash script / while loop extremely slow read file

Posted on 2014-03-13
Last Modified: 2014-03-25
I have a while loop that reads in an FTP log file and puts it into an array so I'll be able to search through the array and match up/search for a flow. Unfortunately, the while loop is taking forever to get through the file. It is a very large file, but there must be a faster way of doing this.

# read file into array for original search results
while read FTP_SEARCH
do
ogl_date[count]=`echo $FTP_SEARCH | awk '{print $1, $2}'`
ogl_time[count]=`echo $FTP_SEARCH | awk '{print $3}'`
ogl_server[count]=`echo $FTP_SEARCH | awk '{print $4}'`
ogl_id[count]=`echo $FTP_SEARCH | awk '{print $5}'`
ogl_type[count]=`echo $FTP_SEARCH | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_SEARCH | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=`echo $FTP_SEARCH | awk '{
    for(i = 6; i <= NF; i++) 
        print $i;
    }'`

let "count += 1"

done < /tmp/ftp_search.14-12-02

Question by:dloszewski
8 Comments
 

Author Comment

by:dloszewski
ID: 39926825
Some sample lines from ftp_search:

Dec  1 23:59:03 sslmftp1 ftpd[4152]: USER xxxxxx
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PASS password
Dec  1 23:59:03 sslmftp1 ftpd[4152]: FTP LOGIN FROM 172.19.x.xx [172.19.x.xx], xxxxxx
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PWD
Dec  1 23:59:03 sslmftp1 ftpd[4152]: CWD /test/data/872507/
Dec  1 23:59:03 sslmftp1 ftpd[4152]: TYPE Image
 
LVL 84

Expert Comment

by:ozo
ID: 39926948
What are you doing with ogl_date, ogl_time, ogl_server, ogl_id, ogl_type, ogl_pid, ogl_commands?
What do the lines in /tmp/ftp_search.14-12-02 look like?
This should be a little faster, but knowing more about the format of each line, or what you want to do with the arrays, would probably allow further improvements:

# read splits each line on whitespace; FTP_6 receives the rest of the line
while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 39926960
. . .  puts it into an array so I'll be able to search through the array and match up/search for a flow. . .
There may be other alternatives, but you need to provide the requirements / expected results for the above.

 
LVL 84

Expert Comment

by:ozo
ID: 39926973
Given the format shown in comment ID: 39926825, this should be equivalent:

# read file into array for original search results
# no awk or echo subprocesses: only shell built-ins run inside the loop
while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=${FTP_5%\[*}   # ftpd[4152]: -> ftpd
FTP_5=${FTP_5%\]*}             # ftpd[4152]: -> ftpd[4152
ogl_pid[count]=${FTP_5#*\[}    # ftpd[4152  -> 4152
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
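
For anyone following along, here is a minimal sketch of what the field splitting and parameter expansions above do to a single line from the posted sample. The variable names (mon, day, clock, prog and so on) are only illustrative and are not from the original script, and the brackets are backslash-escaped purely as a precaution.

```shell
#!/bin/sh
# One line taken from the sample log posted earlier in the thread.
line='Dec  1 23:59:03 sslmftp1 ftpd[4152]: CWD /test/data/872507/'

# read splits on runs of whitespace; the last variable gets the rest of the line.
read -r mon day clock server id commands <<EOF
$line
EOF

prog=${id%%\[*}    # remove everything from the first "[" onward: ftpd
pid=${id#*\[}      # remove everything up to and including "[": 4152]:
pid=${pid%%\]*}    # remove everything from "]" onward: 4152

echo "date=$mon $day time=$clock server=$server type=$prog pid=$pid commands=$commands"
# prints: date=Dec 1 time=23:59:03 server=sslmftp1 type=ftpd pid=4152 commands=CWD /test/data/872507/
```

The speedup comes from the fact that none of these steps starts a new process, whereas each backtick pipeline in the original loop starts several per input line.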
 

Author Comment

by:dloszewski
ID: 39926991
Basically, I have an FTP log file with the above data, and I want to show the entire flow by searching for a username or IP. So I figured I'd read the data into an array, search for the criteria, and then match that process ID with the others so I would get the entire flow.
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 500 total points
ID: 39927758
Perhaps if you load the data into some database (Access/ MySQL/ Oracle/ ...) it would be quicker, and then you can analyze it using SQL queries!

Loaded into M$ Access
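
If you go the database route, one possible first step is to turn the log into CSV, which Access, MySQL and most other databases can import. The sketch below is only an assumption about how that could look: the function name ftp_log_to_csv and the column order (date, time, server, type, pid, command) are made up for illustration, and it assumes every line has the syslog-style layout shown in the sample.

```shell
#!/bin/sh
# Hypothetical helper: convert syslog-style ftpd lines to CSV
# (columns: date,time,server,type,pid,command) for a database import.
# Reads the files given as arguments, or standard input if there are none.
ftp_log_to_csv() {
    awk '{
        lb = index($5, "[")                      # position of "[" in e.g. ftpd[4152]:
        rb = index($5, "]")                      # position of "]"
        prog = substr($5, 1, lb - 1)             # ftpd
        pid  = substr($5, lb + 1, rb - lb - 1)   # 4152

        cmd = $0
        for (i = 1; i <= 5; i++)                 # drop the five leading fields
            sub(/^[ \t]*[^ \t]+/, "", cmd)
        gsub(/^[ \t]+|[ \t]+$/, "", cmd)         # trim surrounding whitespace
        gsub(/"/, "\"\"", cmd)                   # CSV-escape embedded double quotes

        printf "%s %s,%s,%s,%s,%s,\"%s\"\n", $1, $2, $3, $4, prog, pid, cmd
    }' "$@"
}

# Demo with one line from the sample log posted earlier in the thread.
printf '%s\n' 'Dec  1 23:59:03 sslmftp1 ftpd[4152]: FTP LOGIN FROM 172.19.x.xx [172.19.x.xx], xxxxxx' |
    ftp_log_to_csv
# prints: Dec 1,23:59:03,sslmftp1,ftpd,4152,"FTP LOGIN FROM 172.19.x.xx [172.19.x.xx], xxxxxx"
```

The command column is quoted because login lines contain commas. Once the data is in a table, finding a flow becomes a query on the pid column.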
 
LVL 62

Expert Comment

by:gheist
ID: 39941055
Popular web statistics packages have recipes for handling FTP xferlogs from popular FTP servers.
 
LVL 84

Expert Comment

by:ozo
ID: 39941079
> read data into array, search for criteria, and then match that process id
Depending on how you are doing this, I would think it could be faster to
search for the criteria, match that process ID, and then read the data into an array.
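
As a rough sketch of that order of operations, assuming the log format from the posted sample: a single awk run can read the file twice, first collecting the ftpd[PID] tag of every line that mentions the search term, then printing every line carrying one of those tags. The function name ftp_flow and the sample data (users alice and bob) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every log line that belongs to an ftpd process
# whose session mentions the search term (a user name or an IP address).
# Usage: ftp_flow TERM LOGFILE
ftp_flow() {
    awk -v term="$1" '
        # Pass 1 (first read of the file): remember the ftpd[PID]: tag of
        # each line that contains the term as a plain substring.
        NR == FNR { if (index($0, term)) want[$5] = 1; next }
        # Pass 2 (second read): print lines whose tag was remembered.
        $5 in want
    ' "$2" "$2"
}

# Made-up sample data in the same format as the posted log.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
Dec  1 23:59:03 sslmftp1 ftpd[4152]: USER alice
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PASS password
Dec  1 23:59:04 sslmftp1 ftpd[4200]: USER bob
Dec  1 23:59:04 sslmftp1 ftpd[4152]: CWD /test/data/872507/
Dec  1 23:59:05 sslmftp1 ftpd[4200]: PWD
EOF

ftp_flow alice "$log"    # prints the three ftpd[4152] lines only
```

Only the matching flow comes out, so if arrays are still needed, the read loop has far fewer lines to process. Two caveats: the match is a plain substring test, so "bob" would also match "bobby", and process IDs can be reused over time, so unrelated sessions with the same PID would be printed together.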