Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Performance Challenge... converting multi lines sql output into single lines csv...

Posted on 2006-11-06
11
Medium Priority
?
435 Views
Last Modified: 2013-12-26

Hi Unix experts,  I have a small script that takes an input from a oracle select statement that looks like this:
(Its sorted by email so they are grouped, and all distinct rows)

usereamail1     user name     date logged in     role1
usereamail1     user name     date logged in     role2
usereamail2     user name     date logged in     role1
usereamail2     user name     date logged in     role2
...


and converts it to:

usereamail1, user name, date logged in, role1, role2
usereamail2, user name, date logged in, role1, role2
usereamail3, user name, date logged in, role6
usereamail4, user name, date logged in, role1, role3, role5, role6, role7
...

This is then emailed to the system owner as a csv file which works ok.

It has to work through 53,000+ records brings it down to about 20K, but it is taking 1/2 hr to run... How can I speed up the code (below) that does the main processing as my ksh (this is on a HP box) skills are pretty basic.


while read inputLine
do
  if [ "${inputLine}" !=  "" ] ; then

    email=`echo "$inputLine" | cut -d"," -f1 `
    name=`echo "$inputLine" |  cut -d"," -f2`
    logondate=`echo "$inputLine" | cut -d"," -f3`
    role=`echo "$inputLine" |  cut -d"," -f4`

    if [ "$email" != "$lastEmail" ] ; then
      echo "$lastEmail,$oldname,$oldlogon,$outputRoles" >>$outFile
      outputRoles=""
    fi

    outstr=`echo "$email,$name,$logondate"`
    oldname=$name
    oldlogon=$logondate
    lastEmail=$email
    outputRoles=`echo "${outputRoles}${role},"`
  fi
done < ${tmpFile}

Thanks,
S.
0
Comment
Question by:SimonBlake
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
  • 3
11 Comments
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882118

You could try using awk:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e]","role[e];}
' ${tmpFile}
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882170

PS: add this to last line:
...etc...
' ${tmpFile} >$outFile
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882194

Ooops, to remove extra comma use:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' ${tmpFile}  >$outFile

0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 7

Author Comment

by:SimonBlake
ID: 17882392

It's damn close... couple of things tho - it's concatinating the dates instead of the roles and putting extra comma's where spaces exist, eg between the first and second names  - the separator is tab in the sql output I think.

I can sed the line to remove the tabs with a different delimeter easily tho - I struggle with awk quite a bit tho, altho I've just noticed

The target is less than 45 secs to output the whole lot ( laptop with vbscript time ;)  )

eg. (I got - names deleted to protect the innocent)
some.one@somewhere.com,First,Second,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006
0
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 2000 total points
ID: 17883016

Better post a sample of the original data.

Use this for tab delimiters:
IFS="\009"
awk '
{ name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4; }
END {for (e in name) print e","name[e]","logondt[e]role[e];}
' tmpFile >outFile

PS: The extra comma is resolved on my last post.
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17884312
sed might be faster:
    sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/"
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17884354
Just tried it with sed - took 5 seconds.
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17888480

Thanks Mike, had to make a few changes, modified the sql statement so that it didn't output random tabs/spaces between columns, and also added a specified delimiter to tighten up the output... then in the awk line used -F to specify the field sep.

sorry bpmurray, your sed line gave me a 0 byte file, I tried myself just using sed lines but couldn't get it to work.

S.
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17890752

And the final timming is?
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17892630
Just realised - there's more to this than simple extracting the columns as sed does. Anyway that was just the script, the whole thing is:

     sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/" input.file >output.file
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17892786

Down from 30 mins (original ksh script) to about 32 seconds so as expected (for the whole process - select statement, generate report and mail), faster than, my low on memory, too many processes/threads loaded, windows 2000 laptop with the latest pre-vista-launch go slow patches by 11 secs (altho that was non compiled vb script to be fair, but included all the same work)...

Thanks.
0

Featured Post

Learn how to optimize MySQL for your business need

With the increasing importance of apps & networks in both business & personal interconnections, perfor. has become one of the key metrics of successful communication. This ebook is a hands-on business-case-driven guide to understanding MySQL query parameter tuning & database perf

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: Finishing the grid – keyboard support for arrow keys to manoeuvre, entering the numbers.  The PreTranslateMessage function is to be used to intercept and respond to keyboard events. Continuing from the fourth article about sudoku. …
If you use Adobe Reader X it is possible you can't open OLE PDF documents in the standard. The reason is the 'save box mode' in adobe reader X. Many people think the protected Mode of adobe reader x is only to stop the write access. But this fe…
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…
Suggested Courses

704 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question