Solved

Need a suggestion for indexing many delimited files

Posted on 2013-01-15
2
298 Views
Last Modified: 2013-02-08
Hello,

I am in the process of dealing with an extremely large amount of specific delimited files that I need to index. Each file consists of a file name (and path), and the file contains about 30 pieces  of information, each separated by a pipe ( | ) that I would like to index into a database and be able to report on, or search for specific pieces of information and find the file name (and path) that the specific piece of information comes from.

I was hoping to turn this into a php app / web page that is able to query the mysql database that holds all of the imported data ... An update process for incoming new files as well somehow needs to happen as well.

Can anyone point me in a direction?
0
Comment
Question by:jkeegan123
2 Comments
 
LVL 18

Expert Comment

by:Mark Gilbert
ID: 38781653
Hi, this might sound crazy right now but I have a feeling you'll be amazed by the results. The solution: Splunk. For some reason Splunk have nailed data structures. Pump almost anything in and it will store the data, and then allow you to query against it, or filter on specific values just by clicking on them. For more custom reporting you can use a simple query language that they have. I do recommend it, especially in this situation.
0
 
LVL 1

Accepted Solution

by:
rzorn earned 500 total points
ID: 38781894
If the position of the pipe-separated fields is consistent (field 7 will always contain the same kind of information), you can create a sql table with the fields: id (auto_inc), path, filename, and all fields in order;
Then you want to convert the delimited files with an automated script, something like:

You can use the command-line linux utils find and sed to achieve this with a one-line script which you can cron:

find *.csv | sed -r "s/""/'/g" | sed -r "s/^/INSERT INTO tmpt VALUES (""/g" | sed -r "s/\|/"", ""/g" | sed -r "s/$/"");/g" > main_sql_insert.sql

If you need to filter out some fields, based on specific conditions, you can use cut to extract only the fields you want:

cut -f "1,2,4,5,10,12-19,23,24" filename_to_parse.csv

Once you have your .sql file, you can import it with
 mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql

or, you can write a php script, but it'll take longer.
0

Featured Post

NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SubQuery link 4 43
backup program with robocopy 6 69
Adding a countdown to HTA 12 92
Hide file extensions in Notepad++ tabs 2 44
This is about my first experience with programming Arduino.
In this post we will learn different types of Android Layout and some basics of an Android App.
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…

679 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question