vmoore99 asked:

Reading a logfile

I have a client with an application that in one week can write over a million lines to a tab-delimited text file, where a single record can exceed 455 columns.

I need to create a solution where I can select the text logfile, search for a term, and write the matching records to a report.

Some of the text files exceed 100 MB. Even to read them into Excel, I have had to use a tool to slice up the text files.

The logfile, for legal reasons, cannot leave the network it is in. The creator of the application knows there is a problem but has not provided a solution.

I am open to any solutions someone might have to offer. Right now I do not have access to a server inside the network running a scripting language that I could use to write a web-based app. I have asked the IT person to see if they can get that for me... I'm not holding my breath.

Thank you!
Darrell Porter

This solution is only available to members of Experts Exchange.

Bill Prew

You might want to take a look at the Microsoft LogParser utility. I'm not sure how well it will perform on a 100 MB delimited file, but it can give you a slick way to pull the data out. Basically it provides a SQL-based command set for querying non-database data, in your case a tab-delimited text file. For certain tasks it's a killer utility, but you will need to experiment with it a bit to determine whether it could work for you.

Here's a wiki page about it, and then you can link to the Microsoft download page.

http://en.wikipedia.org/wiki/Logparser

~bp
billprew: excellent point. That parser is awesome if you need to actually do work on the data at the same time. It proved very useful to me in dealing with IIS logs.

If you're just after a quick extract, though, the grep tool is crazy fast.  Not sure the native Windows tool was so much slower.
There's also grepWin at
http://stefanstools.sourceforge.net/grepWin.html
which can search using a string, phrase, or regular expression.
@vmoore99,

Can you tell us a little more about the use case? Specifically, how do you need to query the data, and what do you want the output to be - full original records, or just selected columns? Is there a column title header row as part of the data? Are you just doing simple string searches across all the columns, or are they limited to specific columns? Any need for regex-type searches? How often do you run these searches, frequently or rarely? Etc...

~bp
vmoore99 (ASKER)

There is no header record.  There is no definition of the columns.  I do have some idea of what the first 12 columns are but the rest of it is just a recording of every movement a person makes in this application.  The data they see, the records they access.  

The query I would need would be to see all records that have a particular string anywhere in the record.  It could be column 9 or column 125.  I would then need to print out the entire record to see the movement across the system based on that string search.

The searches would be needed frequently enough (a couple times a week) to monitor employee movement and legal searches.  There's a whole body of law that must be adhered to when accessing this application.  

I am impressed at the ready response.  Thank you.  It will take me a couple of days to wander through the respective solutions.
Why don't you introduce log rotation on a daily basis, so that each file you have to handle is an order of magnitude smaller?
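If the application itself cannot be configured to rotate its log, a similar effect can be had after the fact by cutting an existing logfile into smaller pieces at record boundaries. The following is only a rough sketch in Python, which is not something mentioned in this thread; the function name `split_log`, the piece naming scheme, the default of 100,000 records per piece, and the UTF-8 encoding are all assumptions chosen for illustration.

```python
from pathlib import Path


def split_log(log_path, out_dir, lines_per_file=100_000, encoding="utf-8"):
    """Split log_path into numbered pieces of at most lines_per_file records.

    Pieces are cut only at line boundaries, so no record is broken in two.
    Returns the list of piece paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(log_path).stem
    pieces = []
    out = None
    try:
        # Read one line at a time so the whole file is never held in memory.
        with open(log_path, "r", encoding=encoding, errors="replace") as src:
            for i, line in enumerate(src):
                if i % lines_per_file == 0:
                    # Start a new piece every lines_per_file records.
                    if out is not None:
                        out.close()
                    piece = out_dir / f"{stem}.part{len(pieces) + 1:04d}.txt"
                    out = open(piece, "w", encoding=encoding)
                    pieces.append(piece)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return pieces
```

A call such as `split_log("app.log", "parts")` (hypothetical file and folder names) would leave the original untouched and write the pieces into the `parts` folder, all inside the same network.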

That said, grep is indeed your best bet if you know exactly what to look for, or can at least build a decent regular expression for it.
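For anyone who cannot install grep on the machine, the same kind of search (find a string anywhere in a record, then write out the entire record) can be approximated with a short script. This is a minimal sketch in Python, which nobody in this thread actually used; the function name `search_log`, the report layout, and the UTF-8 encoding are assumptions. It reads the file one line at a time, so a 100 MB log never has to fit in memory.

```python
import re


def search_log(log_path, term, report_path, use_regex=False, encoding="utf-8"):
    """Copy every record of log_path that contains term into report_path.

    term is treated as a literal string unless use_regex is True.
    Returns the number of matching records.
    """
    pattern = re.compile(term if use_regex else re.escape(term))
    matches = 0
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(log_path, "r", encoding=encoding, errors="replace") as src, \
         open(report_path, "w", encoding=encoding) as out:
        for line_no, line in enumerate(src, start=1):
            if not pattern.search(line):
                continue
            record = line.rstrip("\n")
            # 1-based numbers of the tab-delimited columns containing a match.
            # The list is empty if a match spans a tab between two columns.
            cols = [i for i, field in enumerate(record.split("\t"), start=1)
                    if pattern.search(field)]
            out.write(f"line {line_no}, column(s) {cols}\n{record}\n")
            matches += 1
    return matches
```

For example, `search_log("app.log", "JSMITH", "report.txt")` (hypothetical file names and search term) writes each matching record in full, preceded by its line number and the columns where the term was found.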
The beta version of the FileSeek application worked great, and I can turn queries over to non-technical staff. I appreciate all the responses.