• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 319
  • Last Modified:

Need a suggestion for indexing many delimited files

Hello,

I am in the process of dealing with an extremely large amount of specific delimited files that I need to index. Each file consists of a file name (and path), and the file contains about 30 pieces  of information, each separated by a pipe ( | ) that I would like to index into a database and be able to report on, or search for specific pieces of information and find the file name (and path) that the specific piece of information comes from.

I was hoping to turn this into a php app / web page that is able to query the mysql database that holds all of the imported data ... An update process for incoming new files as well somehow needs to happen as well.

Can anyone point me in a direction?
0
jkeegan123
Asked:
jkeegan123
1 Solution
 
Mark GilbertSenior Performance EngineerCommented:
Hi, this might sound crazy right now but I have a feeling you'll be amazed by the results. The solution: Splunk. For some reason Splunk have nailed data structures. Pump almost anything in and it will store the data, and then allow you to query against it, or filter on specific values just by clicking on them. For more custom reporting you can use a simple query language that they have. I do recommend it, especially in this situation.
0
 
rzornCommented:
If the position of the pipe-separated fields is consistent (field 7 will always contain the same kind of information), you can create a sql table with the fields: id (auto_inc), path, filename, and all fields in order;
Then you want to convert the delimited files with an automated script, something like:

You can use the command-line linux utils find and sed to achieve this with a one-line script which you can cron:

find *.csv | sed -r "s/""/'/g" | sed -r "s/^/INSERT INTO tmpt VALUES (""/g" | sed -r "s/\|/"", ""/g" | sed -r "s/$/"");/g" > main_sql_insert.sql

If you need to filter out some fields, based on specific conditions, you can use cut to extract only the fields you want:

cut -f "1,2,4,5,10,12-19,23,24" filename_to_parse.csv

Once you have your .sql file, you can import it with
 mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql

or, you can write a php script, but it'll take longer.
0

Featured Post

Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now