We help IT Professionals succeed at work.

We've partnered with Certified Experts, Carl Webster and Richard Faulkner, to bring you a podcast all about Citrix Workspace, moving to the cloud, and analytics & intelligence. Episode 2 coming soon!Listen Now

x

Performance Challenge... converting multi lines sql output into single lines csv...

SimonBlake
SimonBlake asked
on
Medium Priority
473 Views
Last Modified: 2013-12-26

Hi Unix experts,  I have a small script that takes an input from a oracle select statement that looks like this:
(Its sorted by email so they are grouped, and all distinct rows)

usereamail1     user name     date logged in     role1
usereamail1     user name     date logged in     role2
usereamail2     user name     date logged in     role1
usereamail2     user name     date logged in     role2
...


and converts it to:

usereamail1, user name, date logged in, role1, role2
usereamail2, user name, date logged in, role1, role2
usereamail3, user name, date logged in, role6
usereamail4, user name, date logged in, role1, role3, role5, role6, role7
...

This is then emailed to the system owner as a csv file which works ok.

It has to work through 53,000+ records brings it down to about 20K, but it is taking 1/2 hr to run... How can I speed up the code (below) that does the main processing as my ksh (this is on a HP box) skills are pretty basic.


while read inputLine
do
  if [ "${inputLine}" !=  "" ] ; then

    email=`echo "$inputLine" | cut -d"," -f1 `
    name=`echo "$inputLine" |  cut -d"," -f2`
    logondate=`echo "$inputLine" | cut -d"," -f3`
    role=`echo "$inputLine" |  cut -d"," -f4`

    if [ "$email" != "$lastEmail" ] ; then
      echo "$lastEmail,$oldname,$oldlogon,$outputRoles" >>$outFile
      outputRoles=""
    fi

    outstr=`echo "$email,$name,$logondate"`
    oldname=$name
    oldlogon=$logondate
    lastEmail=$email
    outputRoles=`echo "${outputRoles}${role},"`
  fi
done < ${tmpFile}

Thanks,
S.
Comment
Watch Question


You could try using awk:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e]","role[e];}
' ${tmpFile}

PS: add this to last line:
...etc...
' ${tmpFile} >$outFile

Ooops, to remove extra comma use:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' ${tmpFile}  >$outFile

Author

Commented:

It's damn close... couple of things tho - it's concatinating the dates instead of the roles and putting extra comma's where spaces exist, eg between the first and second names  - the separator is tab in the sql output I think.

I can sed the line to remove the tabs with a different delimeter easily tho - I struggle with awk quite a bit tho, altho I've just noticed

The target is less than 45 secs to output the whole lot ( laptop with vbscript time ;)  )

eg. (I got - names deleted to protect the innocent)
some.one@somewhere.com,First,Second,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006

Better post a sample of the original data.

Use this for tab delimiters:
IFS="\009"
awk '
{ name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4; }
END {for (e in name) print e","name[e]","logondt[e]role[e];}
' tmpFile >outFile

PS: The extra comma is resolved on my last post.

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts

Commented:
sed might be faster:
    sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/"

Commented:
Just tried it with sed - took 5 seconds.

Author

Commented:

Thanks Mike, had to make a few changes, modified the sql statement so that it didn't output random tabs/spaces between columns, and also added a specified delimiter to tighten up the output... then in the awk line used -F to specify the field sep.

sorry bpmurray, your sed line gave me a 0 byte file, I tried myself just using sed lines but couldn't get it to work.

S.

And the final timming is?

Commented:
Just realised - there's more to this than simple extracting the columns as sed does. Anyway that was just the script, the whole thing is:

     sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/" input.file >output.file

Author

Commented:

Down from 30 mins (original ksh script) to about 32 seconds so as expected (for the whole process - select statement, generate report and mail), faster than, my low on memory, too many processes/threads loaded, windows 2000 laptop with the latest pre-vista-launch go slow patches by 11 secs (altho that was non compiled vb script to be fair, but included all the same work)...

Thanks.
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.