Solved

Performance Challenge... converting multi lines sql output into single lines csv...

Posted on 2006-11-06
11
415 Views
Last Modified: 2013-12-26

Hi Unix experts,  I have a small script that takes an input from a oracle select statement that looks like this:
(Its sorted by email so they are grouped, and all distinct rows)

usereamail1     user name     date logged in     role1
usereamail1     user name     date logged in     role2
usereamail2     user name     date logged in     role1
usereamail2     user name     date logged in     role2
...


and converts it to:

usereamail1, user name, date logged in, role1, role2
usereamail2, user name, date logged in, role1, role2
usereamail3, user name, date logged in, role6
usereamail4, user name, date logged in, role1, role3, role5, role6, role7
...

This is then emailed to the system owner as a csv file which works ok.

It has to work through 53,000+ records brings it down to about 20K, but it is taking 1/2 hr to run... How can I speed up the code (below) that does the main processing as my ksh (this is on a HP box) skills are pretty basic.


while read inputLine
do
  if [ "${inputLine}" !=  "" ] ; then

    email=`echo "$inputLine" | cut -d"," -f1 `
    name=`echo "$inputLine" |  cut -d"," -f2`
    logondate=`echo "$inputLine" | cut -d"," -f3`
    role=`echo "$inputLine" |  cut -d"," -f4`

    if [ "$email" != "$lastEmail" ] ; then
      echo "$lastEmail,$oldname,$oldlogon,$outputRoles" >>$outFile
      outputRoles=""
    fi

    outstr=`echo "$email,$name,$logondate"`
    oldname=$name
    oldlogon=$logondate
    lastEmail=$email
    outputRoles=`echo "${outputRoles}${role},"`
  fi
done < ${tmpFile}

Thanks,
S.
0
Comment
Question by:SimonBlake
  • 5
  • 3
  • 3
11 Comments
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882118

You could try using awk:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e]","role[e];}
' ${tmpFile}
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882170

PS: add this to last line:
...etc...
' ${tmpFile} >$outFile
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17882194

Ooops, to remove extra comma use:

awk '
{name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4;}
END {for (e in name) print e","name[e]","logondt[e] role[e];}
' ${tmpFile}  >$outFile

0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17882392

It's damn close... couple of things tho - it's concatinating the dates instead of the roles and putting extra comma's where spaces exist, eg between the first and second names  - the separator is tab in the sql output I think.

I can sed the line to remove the tabs with a different delimeter easily tho - I struggle with awk quite a bit tho, altho I've just noticed

The target is less than 45 secs to output the whole lot ( laptop with vbscript time ;)  )

eg. (I got - names deleted to protect the innocent)
some.one@somewhere.com,First,Second,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006,18/09/2006
0
 
LVL 29

Accepted Solution

by:
MikeOM_DBA earned 500 total points
ID: 17883016

Better post a sample of the original data.

Use this for tab delimiters:
IFS="\009"
awk '
{ name[$1]=$2; logondt[$1]=$3; role[$1]=role[$1]","$4; }
END {for (e in name) print e","name[e]","logondt[e]role[e];}
' tmpFile >outFile

PS: The extra comma is resolved on my last post.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 15

Expert Comment

by:bpmurray
ID: 17884312
sed might be faster:
    sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/"
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17884354
Just tried it with sed - took 5 seconds.
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17888480

Thanks Mike, had to make a few changes, modified the sql statement so that it didn't output random tabs/spaces between columns, and also added a specified delimiter to tighten up the output... then in the awk line used -F to specify the field sep.

sorry bpmurray, your sed line gave me a 0 byte file, I tried myself just using sed lines but couldn't get it to work.

S.
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 17890752

And the final timming is?
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17892630
Just realised - there's more to this than simple extracting the columns as sed does. Anyway that was just the script, the whole thing is:

     sed -e "s/^\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)$/\1,\2,\3,\4/" input.file >output.file
0
 
LVL 7

Author Comment

by:SimonBlake
ID: 17892786

Down from 30 mins (original ksh script) to about 32 seconds so as expected (for the whole process - select statement, generate report and mail), faster than, my low on memory, too many processes/threads loaded, windows 2000 laptop with the latest pre-vista-launch go slow patches by 11 secs (altho that was non compiled vb script to be fair, but included all the same work)...

Thanks.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Windows Drag & Drop Location 2 85
ORA-01403: no data found 43 83
Define unique primary key 9 81
Detect CR LF to each line 12 137
Here is how to use MFC's automatic Radio Button handling in your dialog boxes and forms.  Beginner programmers usually start with a OnClick handler for each radio button and that's just not the right way to go.  MFC has a very cool system for handli…
This is to be the first in a series of articles demonstrating the development of a complete windows based application using the MFC classes.  I’ll try to keep each article focused on one (or a couple) of the tasks that one may meet.   Introductio…
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now