Performance challenge: converting multi-line SQL output into single-line CSV


Hi Unix experts, I have a small script that takes its input from an Oracle SELECT statement that looks like this
(it's sorted by email so the rows are grouped, and all rows are distinct):

usereamail1     user name     date logged in     role1
usereamail1     user name     date logged in     role2
usereamail2     user name     date logged in     role1
usereamail2     user name     date logged in     role2
...


and converts it to:

usereamail1, user name, date logged in, role1, role2
usereamail2, user name, date logged in, role1, role2
usereamail3, user name, date logged in, role6
usereamail4, user name, date logged in, role1, role3, role5, role6, role7
...

This is then emailed to the system owner as a CSV file, which works OK.

It works through 53,000+ records and brings them down to about 20K, but it takes half an hour to run. How can I speed up the code (below) that does the main processing? My ksh skills are pretty basic (this is on an HP box).


while read inputLine
do
  if [ "${inputLine}" != "" ] ; then

    # split the line into its four comma-separated fields
    email=`echo "$inputLine" | cut -d"," -f1`
    name=`echo "$inputLine" | cut -d"," -f2`
    logondate=`echo "$inputLine" | cut -d"," -f3`
    role=`echo "$inputLine" | cut -d"," -f4`

    # on a change of email, flush the previous email's merged line
    if [ "$email" != "$lastEmail" ] ; then
      echo "$lastEmail,$oldname,$oldlogon,$outputRoles" >>$outFile
      outputRoles=""
    fi

    oldname=$name
    oldlogon=$logondate
    lastEmail=$email
    outputRoles="${outputRoles}${role},"
  fi
done < ${tmpFile}

Thanks,
S.
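For scale: the loop above forks four `echo | cut` pipelines per input line, i.e. a couple of hundred thousand processes over 53,000 rows, which is where most of the half hour goes. A sketch of the same merge using the shell's own field splitting (`IFS=, read`) instead; the file names and sample data here are made up for illustration:

```shell
tmpFile=/tmp/roles.in
outFile=/tmp/roles.csv

# made-up sample input: comma-separated, sorted by email
cat > "$tmpFile" <<'EOF'
user1@x.com,User One,01/01/2006,role1
user1@x.com,User One,01/01/2006,role2
user2@x.com,User Two,02/01/2006,role1
EOF

lastEmail= oldname= oldlogon= outputRoles=
: > "$outFile"
while IFS=, read email name logondate role
do
  [ -z "$email" ] && continue
  if [ "$email" != "$lastEmail" ] ; then
    # flush the previous email's merged line (skip the very first iteration)
    [ -n "$lastEmail" ] && echo "$lastEmail,$oldname,$oldlogon,${outputRoles%,}" >> "$outFile"
    outputRoles=""
  fi
  oldname=$name
  oldlogon=$logondate
  lastEmail=$email
  outputRoles="${outputRoles}${role},"
done < "$tmpFile"
# the original loop never flushes the final email's group; do it here
[ -n "$lastEmail" ] && echo "$lastEmail,$oldname,$oldlogon,${outputRoles%,}" >> "$outFile"
```

`${outputRoles%,}` also strips the trailing comma the original left in place. No external commands run inside the loop, so the per-line cost collapses, though a one-pass awk will still be faster.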
SimonBlake Asked:
 
MikeOM_DBACommented:

You could try using awk:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e]","role[e];}
' ${tmpFile}
 
MikeOM_DBACommented:

PS: add this to the last line:
...etc...
' ${tmpFile} >$outFile
 
MikeOM_DBACommented:

Oops, to remove the extra comma use:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' ${tmpFile}  >$outFile
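One caveat not raised in the thread: awk's `for (e in name)` iterates in an unspecified order, so the merged lines may come out unsorted even though the input was sorted by email. Since the rows are already grouped, a hypothetical one-pass variant (same four-column layout, made-up sample data) can emit each group as soon as the email changes and preserve the input order:

```shell
tmpFile=/tmp/roles2.in
# made-up sample input: whitespace-separated, grouped by email
cat > "$tmpFile" <<'EOF'
user1@x.com User1 01/01/2006 role1
user1@x.com User1 01/01/2006 role2
user2@x.com User2 02/01/2006 role1
EOF

awk '
$1 != last { if (last != "") print out            # flush the previous group
             out = $1 "," $2 "," $3; last = $1 }
{ out = out "," $4 }                              # append this row role
END { if (last != "") print out }                 # flush the final group
' "$tmpFile"
```

This also avoids building three big arrays in memory, since only the current group's state is held at any time.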


 
SimonBlakeAuthor Commented:

It's damn close... a couple of things though: it's concatenating the dates instead of the roles, and putting extra commas where spaces exist, e.g. between the first and second names. The separator is a tab in the SQL output, I think.

I can sed the line to replace the tabs with a different delimiter easily enough, though I struggle with awk quite a bit, although I've just noticed

The target is less than 45 secs to output the whole lot (laptop-with-VBScript time ;) )

e.g. (what I got; names deleted to protect the innocent):
some.one@somewhere.com,First,Second,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006
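That symptom fits awk's default field splitting: FS defaults to any run of whitespace, so in a tab-separated row whose name column contains a space ("First Second") the name becomes two fields and the date shifts into $4, which is why dates pile up in the role slot. A minimal reproduction with a made-up row:

```shell
# made-up tab-separated row; the name column contains a space
row() { printf 'a@b.com\tFirst Second\t18/09/2006\trole1\n'; }

# default FS splits on ANY whitespace: 5 fields, and $4 is the date
row | awk '{ print NF, $4 }'        # -> 5 18/09/2006
# explicit tab separator keeps the name intact: 4 fields, $4 is the role
row | awk -F'\t' '{ print NF, $4 }' # -> 4 role1
```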
 
MikeOM_DBACommented:

Better post a sample of the original data.

Use awk's -F option for tab delimiters (the shell's IFS doesn't affect how awk splits its input):
awk -F'\t' '
{ name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4; }
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' tmpFile >outFile

PS: The extra comma is resolved in my last post.
 
bpmurrayCommented:
sed might be faster:
    sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/"
 
bpmurrayCommented:
Just tried it with sed - took 5 seconds.
 
SimonBlakeAuthor Commented:

Thanks Mike. I had to make a few changes: I modified the SQL statement so that it didn't output random tabs/spaces between columns, and added an explicit delimiter to tighten up the output; then in the awk line I used -F to specify the field separator.

Sorry bpmurray, your sed line gave me a 0-byte file. I tried some sed lines myself but couldn't get them to work.

S.
 
MikeOM_DBACommented:

And the final timing is?
 
bpmurrayCommented:
Just realised there's more to this than simply extracting the columns, which is all sed does. Anyway, that was just the script; the whole thing is:

     sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/" input.file >output.file
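The sed line is pure per-line reformatting: with exactly three tabs per row the four greedy capture groups split unambiguously, and the same effect can be had with a plain global substitution. Either way it only swaps delimiters; collapsing the several role rows per email still needs stateful processing like the awk answers above. A quick equivalent delimiter swap on a made-up row, using `tr`:

```shell
# delimiter swap only: every tab becomes a comma, one line in, one line out
printf 'a@b.com\tFirst\t18/09/2006\trole1\n' | tr '\t' ','
# -> a@b.com,First,18/09/2006,role1
```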
 
SimonBlakeAuthor Commented:

Down from 30 mins (original ksh script) to about 32 seconds for the whole process (select statement, generate report, and mail), so as expected. That's 11 seconds faster than my low-on-memory, too-many-processes-loaded Windows 2000 laptop with the latest pre-Vista-launch go-slow patches (although, to be fair, that was uncompiled VBScript, but it included all the same work)...

Thanks.
