Solved

Bash script / while loop extremely slow read file

Posted on 2014-03-13
Last Modified: 2014-03-25
I have a while loop that reads an FTP log file into arrays so that I can search them and match up a flow. Unfortunately, the loop takes forever to get through the file. It is a very large file, but there must be a faster way of doing this.

# read file into array for original search results
while read FTP_SEARCH
do
ogl_date[count]=`echo $FTP_SEARCH | awk '{print $1, $2}'`
ogl_time[count]=`echo $FTP_SEARCH | awk '{print $3}'`
ogl_server[count]=`echo $FTP_SEARCH | awk '{print $4}'`
ogl_id[count]=`echo $FTP_SEARCH | awk '{print $5}'`
ogl_type[count]=`echo $FTP_SEARCH | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_SEARCH | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=`echo $FTP_SEARCH | awk '{
    for(i = 6; i <= NF; i++) 
        print $i;
    }'`

let "count += 1"

done < /tmp/ftp_search.14-12-02

Question by:dloszewski
8 Comments
 

Author Comment

by:dloszewski
ID: 39926825
Some sample lines from ftp_search:

Dec  1 23:59:03 sslmftp1 ftpd[4152]: USER xxxxxx  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PASS password  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: FTP LOGIN FROM 172.19.x.xx [172.19.x.xx], xxxxxx  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: PWD  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: CWD /test/data/872507/  
Dec  1 23:59:03 sslmftp1 ftpd[4152]: TYPE Image
 
LVL 84

Expert Comment

by:ozo
ID: 39926948
What are you doing with ogl_date, ogl_time, ogl_server, ogl_id, ogl_type, ogl_pid, and ogl_commands?
What do the lines in /tmp/ftp_search.14-12-02 look like?
This should be a little faster, but knowing more about the format of each line, or what you want to do with the arrays, would probably allow further improvements:

while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F '[' '{print $1}' | awk '{print $5}'`
ogl_pid[count]=`echo $FTP_1 $FTP_2 $FTP_3 $FTP_4 $FTP_5 $FTP_6 | awk -F'[' '{print $2}' | awk -F']' '{print $1}'`
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 39926960
> ... puts it into an array so I'll be able to search through the array and match up/search for a flow ...
There may be other alternatives, but you need to provide the requirements / expected results for the above.
 
LVL 84

Expert Comment

by:ozo
ID: 39926973
Given the format shown in comment ID 39926825, this should be equivalent:

# read file into array for original search results
while read FTP_1 FTP_2 FTP_3 FTP_4 FTP_5 FTP_6
do
ogl_date[count]="$FTP_1 $FTP_2"
ogl_time[count]=$FTP_3
ogl_server[count]=$FTP_4
ogl_id[count]=$FTP_5
ogl_type[count]=${FTP_5%[*}
FTP_5=${FTP_5%]*}
ogl_pid[count]=${FTP_5#*[}
ogl_commands[count]=$FTP_6
let count+=1
done < /tmp/ftp_search.14-12-02
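
A minimal sketch of why this version avoids the slowdown: the loop in the question starts several `echo | awk` pipelines for every line, while `read` and the `${var%pattern}` / `${var#pattern}` expansions are shell builtins that start no external process. The function below applies the same splitting to lines on standard input. The name `parse_ftp_log`, the variable names, and the `|`-separated output are assumptions for illustration and are not part of the thread; the brackets are escaped so that they are always matched literally.

```shell
# Hypothetical helper: parse syslog-style ftpd lines from stdin using only
# shell builtins, so no external process is started per line.
parse_ftp_log() {
    local mon day hms server id rest daemon pid
    while read -r mon day hms server id rest; do
        daemon=${id%%\[*}   # "ftpd[4152]:" -> "ftpd"
        pid=${id#*\[}       # "ftpd[4152]:" -> "4152]:"
        pid=${pid%%\]*}     # "4152]:"      -> "4152"
        printf '%s|%s|%s|%s|%s|%s\n' \
            "$mon $day" "$hms" "$server" "$daemon" "$pid" "$rest"
    done
}

# Usage (path taken from the question):
#   parse_ftp_log < /tmp/ftp_search.14-12-02
```

To fill the arrays from the question instead of printing, the `printf` could be replaced by the same `ogl_...[count]=...` assignments used above.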
 

Author Comment

by:dloszewski
ID: 39926991
Basically, I have an FTP log file with the above data, and I want to show the entire flow by searching for a username or IP. So I figured I'd read the data into an array, search for the criteria, and then match that process ID with the others so I would get the entire flow.
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 500 total points
ID: 39927758
Perhaps if you load the data into a database (Access, MySQL, Oracle, ...), it would be quicker, and you could then analyze it using SQL queries!

Loaded into M$ Access
 
LVL 62

Expert Comment

by:gheist
ID: 39941055
Popular web statistics packages have recipes for handling FTP xferlogs from popular FTP servers.
 
LVL 84

Expert Comment

by:ozo
ID: 39941079
> read data into array, search for criteria, and then match that process id
Depending on how you are doing this, I would think it could be faster to
search for the criteria, match that process ID, and then read the data into the array.
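
One way to sketch that filter-first order is a two-pass `awk` over the log: the first pass notes the `ftpd[pid]:` tag of every line containing the search text, and the second pass prints every line carrying one of those tags. The helper name `show_flow` and its argument order are assumptions for illustration. The sketch also assumes the tag is always the fifth whitespace-separated field, as in the sample lines, and that a process ID is not reused by a later session within the same file; if it is, the two sessions would be printed together.

```shell
# Hypothetical helper: show_flow SEARCH_TEXT LOGFILE
# Prints every log line that shares an "ftpd[pid]:" tag with a line
# containing SEARCH_TEXT (matched as a fixed string, not a regex).
show_flow() {
    awk -v pat="$1" '
        # Pass 1 (first read of the file): remember the tag in field 5
        # of each line that contains the search text.
        NR == FNR { if (index($0, pat)) wanted[$5] = 1; next }
        # Pass 2 (second read): print lines whose tag was remembered.
        $5 in wanted
    ' "$2" "$2"
}

# Usage (user name and path taken from the thread):
#   show_flow 'USER xxxxxx' /tmp/ftp_search.14-12-02
```

Only the matching flow then needs to be read into arrays, which is typically a small fraction of the file.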