Solved

Performance Challenge... converting multi lines sql output into single lines csv...

Posted on 2006-11-06
11
418 Views
Last Modified: 2013-12-26

Hi Unix experts,  I have a small script that takes an input from a oracle select statement that looks like this:
(Its sorted by email so they are grouped, and all distinct rows)

usereamail1     user name     date logged in     role1
usereamail1     user name     date logged in     role2
usereamail2     user name     date logged in     role1
usereamail2     user name     date logged in     role2
...


and converts it to:

usereamail1, user name, date logged in, role1, role2
usereamail2, user name, date logged in, role1, role2
usereamail3, user name, date logged in, role6
usereamail4, user name, date logged in, role1, role3, role5, role6, role7
...

This is then emailed to the system owner as a csv file which works ok.

It has to work through 53,000+ records brings it down to about 20K, but it is taking 1/2 hr to run... How can I speed up the code (below) that does the main processing as my ksh (this is on a HP box) skills are pretty basic.


while read inputLine
do
  if [ "${inputLine}" !=  "" ] ; then

    email=`echo "$inputLine" | cut -d"," -f1 `
    name=`echo "$inputLine" |  cut -d"," -f2`
    logondate=`echo "$inputLine" | cut -d"," -f3`
    role=`echo "$inputLine" |  cut -d"," -f4`

    if [ "$email" != "$lastEmail" ] ; then
      echo "$lastEmail,$oldname,$oldlogon,$outputRoles" >>$outFile
      outputRoles=""
    fi

    outstr=`echo "$email,$name,$logondate"`
    oldname=$name
    oldlogon=$logondate
    lastEmail=$email
    outputRoles=`echo "${outputRoles}${role},"`
  fi
done < ${tmpFile}

Thanks,
S.
0
Comment
Question by:SimonBlake
  • 5
  • 3
  • 3
11 Comments
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882118

You could try using awk:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e]","role[e];}
' ${tmpFile}
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882170

PS: add this to last line:
...etc...
' ${tmpFile} >$outFile
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882194

Ooops, to remove extra comma use:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' ${tmpFile}  >$outFile

0
Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LVL 7

Author Comment

by:SimonBlake
ID: 17882392

It's damn close... couple of things tho - it's concatinating the dates instead of the roles and putting extra comma's where spaces exist, eg between the first and second names  - the separator is tab in the sql output I think.

I can sed the line to remove the tabs with a different delimeter easily tho - I struggle with awk quite a bit tho, altho I've just noticed

The target is less than 45 secs to output the whole lot ( laptop with vbscript time ;)  )

eg. (I got - names deleted to protect the innocent)
some.one@somewhere.com,First,Second,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006
0
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 500 total points
ID: 17883016

Better post a sample of the original data.

Use this for tab delimiters:
IFS="\009"
awk '
{ name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4; }
END {for (e in name) print e","name[e]","logondt[e]role[e];}
' tmpFile >outFile

PS: The extra comma is resolved on my last post.
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17884312
sed might be faster:
    sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/"
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17884354
Just tried it with sed - took 5 seconds.
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17888480

Thanks Mike, had to make a few changes, modified the sql statement so that it didn't output random tabs/spaces between columns, and also added a specified delimiter to tighten up the output... then in the awk line used -F to specify the field sep.

sorry bpmurray, your sed line gave me a 0 byte file, I tried myself just using sed lines but couldn't get it to work.

S.
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17890752

And the final timming is?
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17892630
Just realised - there's more to this than simple extracting the columns as sed does. Anyway that was just the script, the whole thing is:

     sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/" input.file >output.file
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17892786

Down from 30 mins (original ksh script) to about 32 seconds so as expected (for the whole process - select statement, generate report and mail), faster than, my low on memory, too many processes/threads loaded, windows 2000 laptop with the latest pre-vista-launch go slow patches by 11 secs (altho that was non compiled vb script to be fair, but included all the same work)...

Thanks.
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: Dynamic window placements and drawing on a form, simple usage of windows registry as a storage place for information. Continuing from the first article about sudoku.  There we have designed the application and put a lot of user int…
Introduction: Hints for the grid button.  Nested classes, templated collections.  Squash that darned bug! Continuing from the sixth article about sudoku.   Open the project in visual studio. First we will finish with the SUD_SETVALUE messa…
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question