Solved

Need a suggestion for indexing many delimited files

Posted on 2013-01-15
301 Views
Last Modified: 2013-02-08
Hello,

I am dealing with an extremely large number of delimited files that I need to index. Each file has a name (and path), and each file contains about 30 pieces of information separated by pipes ( | ). I would like to import these into a database so that I can report on them, or search for a specific piece of information and find the file name (and path) it came from.

I was hoping to turn this into a PHP app / web page that can query the MySQL database holding all of the imported data. An update process for incoming new files also needs to happen somehow.

Can anyone point me in a direction?
Question by:jkeegan123
2 Comments
 
LVL 18

Expert Comment

by:Mark Gilbert
ID: 38781653
Hi, this might sound crazy right now, but I have a feeling you'll be amazed by the results. The solution: Splunk. Splunk has nailed data structures: pump almost anything in and it will store the data, then let you query against it or filter on specific values just by clicking on them. For more custom reporting there is a simple query language. I do recommend it, especially in this situation.
 
LVL 1

Accepted Solution

by:rzorn (earned 500 total points)
ID: 38781894
If the position of the pipe-separated fields is consistent (field 7 always contains the same kind of information), you can create a SQL table with the columns: id (auto_increment), path, filename, and then the 30 data fields in order.
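For illustration, here is a minimal sketch of that schema, generated from the shell. The column names f01..f30, the VARCHAR types, and the index choice are all assumptions to adjust to your real data; the table name tmpt and the credentials match the ones used later in this answer:

```shell
# Write a hypothetical schema to schema.sql: id, path, filename, then
# columns f01..f30 for the 30 pipe-delimited fields.  Names and types
# are placeholders -- adjust them to your real data.
{
  echo 'CREATE TABLE IF NOT EXISTS tmpt ('
  echo '  id INT AUTO_INCREMENT PRIMARY KEY,'
  echo '  path VARCHAR(255),'
  echo '  filename VARCHAR(255),'
  for i in $(seq -w 1 30); do echo "  f$i VARCHAR(255),"; done
  echo '  INDEX (f01)'
  echo ');'
} > schema.sql
# Load it with:  mysql -u testuser --password="testpassw" testdb < schema.sql
```

Add an INDEX on each column you expect to search on most (f01 here is just an example); that is what makes the lookups fast.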
Then you can convert the delimited files into SQL with an automated script.

You can use the Linux command-line utilities find and sed to achieve this with a one-liner that you can run from cron:

find . -name '*.csv' -exec sed -r "s/\"/'/g; s/\|/\", \"/g; s/^/INSERT INTO tmpt VALUES (\"/; s/$/\");/" {} + > main_sql_insert.sql

Note that find must hand the files to sed via -exec; piping find's output into sed would rewrite the file names, not the file contents. The first expression swaps any embedded double quotes for single quotes so they cannot break the SQL string delimiters, and the remaining three wrap each pipe-separated value in quotes inside an INSERT statement.
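To sanity-check the sed program, here it is applied to a single made-up sample line (the field values are just for illustration):

```shell
# Apply the same sed program to one sample pipe-delimited line.
echo 'alpha|beta "quoted"|gamma' \
  | sed -r "s/\"/'/g; s/\|/\", \"/g; s/^/INSERT INTO tmpt VALUES (\"/; s/$/\");/"
# -> INSERT INTO tmpt VALUES ("alpha", "beta 'quoted'", "gamma");
```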

If you need to filter out some fields, you can use cut to extract only the ones you want; note the -d '|' to set the delimiter, since cut defaults to tabs:

cut -d '|' -f 1,2,4,5,10,12-19,23,24 filename_to_parse.csv

Once you have your .sql file, you can import it with
 mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql

Alternatively you can write a PHP script, but that will take longer.
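One thing the one-liner above loses is the file name and path, which the question needs in the table. As a sketch only, here is one way to put the pieces together into a cron-able script, using awk to prepend each file's directory and base name as the first two column values. The table name tmpt and credentials come from this answer; everything else (the function name, the sample file in the demo) is a placeholder:

```shell
#!/bin/sh
# Sketch of a cron-able importer: convert_dir walks a directory for .csv
# files and turns every pipe-delimited line into an INSERT whose first
# two values are the file's path and base name.
convert_dir() {
  find "$1" -name '*.csv' | while IFS= read -r f; do
    awk -v path="$(dirname "$f")" -v file="$(basename "$f")" -F'|' '{
      out = "INSERT INTO tmpt VALUES (\"" path "\", \"" file "\""
      for (i = 1; i <= NF; i++) {
        v = $i
        gsub(/"/, "\047", v)   # embedded double quotes -> single quotes
        out = out ", \"" v "\""
      }
      print out ");"
    }' "$f"
  done
}

# Demo on a throwaway directory with one sample file:
tmp=$(mktemp -d)
printf 'alpha|beta "quoted"|gamma\n' > "$tmp/sample.csv"
convert_dir "$tmp" > main_sql_insert.sql
cat main_sql_insert.sql
# Afterwards:  mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql
```

For incremental updates of incoming files, the same script could be pointed at a drop directory from cron, moving each file out of the way once its statements have been loaded.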
