Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Need a suggestion for indexing many delimited files

Posted on 2013-01-15
2
Medium Priority
?
314 Views
Last Modified: 2013-02-08
Hello,

I am in the process of dealing with an extremely large amount of specific delimited files that I need to index. Each file consists of a file name (and path), and the file contains about 30 pieces  of information, each separated by a pipe ( | ) that I would like to index into a database and be able to report on, or search for specific pieces of information and find the file name (and path) that the specific piece of information comes from.

I was hoping to turn this into a php app / web page that is able to query the mysql database that holds all of the imported data ... An update process for incoming new files as well somehow needs to happen as well.

Can anyone point me in a direction?
0
Comment
Question by:jkeegan123
2 Comments
 
LVL 18

Expert Comment

by:Mark Gilbert
ID: 38781653
Hi, this might sound crazy right now but I have a feeling you'll be amazed by the results. The solution: Splunk. For some reason Splunk have nailed data structures. Pump almost anything in and it will store the data, and then allow you to query against it, or filter on specific values just by clicking on them. For more custom reporting you can use a simple query language that they have. I do recommend it, especially in this situation.
0
 
LVL 1

Accepted Solution

by:
rzorn earned 2000 total points
ID: 38781894
If the position of the pipe-separated fields is consistent (field 7 will always contain the same kind of information), you can create a sql table with the fields: id (auto_inc), path, filename, and all fields in order;
Then you want to convert the delimited files with an automated script, something like:

You can use the command-line linux utils find and sed to achieve this with a one-line script which you can cron:

find *.csv | sed -r "s/""/'/g" | sed -r "s/^/INSERT INTO tmpt VALUES (""/g" | sed -r "s/\|/"", ""/g" | sed -r "s/$/"");/g" > main_sql_insert.sql

If you need to filter out some fields, based on specific conditions, you can use cut to extract only the fields you want:

cut -f "1,2,4,5,10,12-19,23,24" filename_to_parse.csv

Once you have your .sql file, you can import it with
 mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql

or, you can write a php script, but it'll take longer.
0

Featured Post

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Q&A with Course Creator, Mark Lassoff, on the importance of HTML5 in the career of a modern-day developer.
If you are a mobile app developer and especially develop hybrid mobile apps then these 4 mistakes you must avoid for hybrid app development to be the more genuine app developer.
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

783 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question