Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Need a suggestion for indexing many delimited files

Posted on 2013-01-15
2
Medium Priority
?
303 Views
Last Modified: 2013-02-08
Hello,

I am in the process of dealing with an extremely large amount of specific delimited files that I need to index. Each file consists of a file name (and path), and the file contains about 30 pieces  of information, each separated by a pipe ( | ) that I would like to index into a database and be able to report on, or search for specific pieces of information and find the file name (and path) that the specific piece of information comes from.

I was hoping to turn this into a php app / web page that is able to query the mysql database that holds all of the imported data ... An update process for incoming new files as well somehow needs to happen as well.

Can anyone point me in a direction?
0
Comment
Question by:jkeegan123
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 18

Expert Comment

by:Mark Gilbert
ID: 38781653
Hi, this might sound crazy right now but I have a feeling you'll be amazed by the results. The solution: Splunk. For some reason Splunk have nailed data structures. Pump almost anything in and it will store the data, and then allow you to query against it, or filter on specific values just by clicking on them. For more custom reporting you can use a simple query language that they have. I do recommend it, especially in this situation.
0
 
LVL 1

Accepted Solution

by:
rzorn earned 2000 total points
ID: 38781894
If the position of the pipe-separated fields is consistent (field 7 will always contain the same kind of information), you can create a sql table with the fields: id (auto_inc), path, filename, and all fields in order;
Then you want to convert the delimited files with an automated script, something like:

You can use the command-line linux utils find and sed to achieve this with a one-line script which you can cron:

find *.csv | sed -r "s/""/'/g" | sed -r "s/^/INSERT INTO tmpt VALUES (""/g" | sed -r "s/\|/"", ""/g" | sed -r "s/$/"");/g" > main_sql_insert.sql

If you need to filter out some fields, based on specific conditions, you can use cut to extract only the fields you want:

cut -f "1,2,4,5,10,12-19,23,24" filename_to_parse.csv

Once you have your .sql file, you can import it with
 mysql -u testuser --password="testpassw" testdb < main_sql_insert.sql

or, you can write a php script, but it'll take longer.
0

Featured Post

10 Questions to Ask when Buying Backup Software

Choosing the right backup solution for your organization can be a daunting task. To make the selection process easier, ask solution providers these 10 key questions.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.
If you are a mobile app developer and especially develop hybrid mobile apps then these 4 mistakes you must avoid for hybrid app development to be the more genuine app developer.
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question