Quicker to use the command line than the database?

andieje
Hello

One of the tasks I regularly need to do is read through the rows of data in a text file, check whether an entry already exists in the database for each row, and create it if not. This can be quite slow. Having gained a bit more experience with Linux commands, I'm now thinking it is actually quicker to export all the records from the MySQL table in question into a tab file, compare this tab file to my input file using Linux join, and then mysqlimport the records from the input file that 'fail' the join (i.e. they fail the join because they don't exist in the table).
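
A rough sketch of that pipeline, purely for illustration - the database name mydb, table items and key column item_key are placeholders, and the input file is assumed to be tab-delimited with the key in the first column:

export LC_ALL=C   # keep sort and join collation consistent

# dump the keys that already exist in the table
mysql -N -e "SELECT item_key FROM items" mydb | sort -t $'\t' -k1,1 > existing.tab

# sort the input file on its key column
sort -t $'\t' -k1,1 input.tab > input.sorted.tab

# keep only the input rows whose key is NOT already in the table (-v 1 = unpairable lines from file 1)
join -t $'\t' -v 1 -1 1 -2 1 input.sorted.tab existing.tab > items.tab

# bulk-load them; mysqlimport derives the table name ("items") from the file name
mysqlimport --local mydb items.tab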

This is significantly less coding than writing a Perl script to query whether the records exist and to build INSERT statements for the data.

Another thing I often need to do is, given an input file of data, get the database IDs for those input records. I usually do this by iterating over the file and querying the database table to get the ID for each input row. However, for thousands of records I'm thinking again that it might be easier to export the table into a tab file and use Linux join to join the exported file to the input file. That way the resulting file has the original input data plus the database ID.
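
For the second task, a similar sketch (same placeholder names; join prints the key, then the remaining input columns, then the database id):

export LC_ALL=C

# dump key -> id pairs from the table, sorted on the key
mysql -N -e "SELECT item_key, id FROM items" mydb | sort -t $'\t' -k1,1 > keys_with_ids.tab

# attach the database id to every input row by joining on the key column
sort -t $'\t' -k1,1 input.tab > input.sorted.tab
join -t $'\t' -1 1 -2 1 input.sorted.tab keys_with_ids.tab > input_with_ids.tab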

I was wondering if anyone has any comments, or reasons not to use this approach. I know one possible caveat is the lack of error checking for correct input format (though you can do a fair bit of that in mysqlimport/LOAD DATA INFILE), but we don't do that in the Perl script anyhow as we know the data is OK.

Exporting and importing data in MySQL is so quick, and the Linux command line is so fast, that I think this approach is fine.

thanks
Top Expert 2011
Commented:
1. Export (SELECT ... INTO OUTFILE) your database table as table.csv.
2. Sort table.csv and the text file on the chosen column.
3. Compare (or join) the two sorted files and output the difference.
4. Based on the difference, insert or update into the database.

or

3. Combine/join the two files into one file.
4. Upload/import the result into the database.

Sounds ok to me.

I would add:
4. Import into the database as table_name_new, then rename the old table to table_name_old and the new table to the original name. If everything is fine after a certain period, drop the _old table.
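
Something like this, with the same placeholder names, just to show the swap mechanics (MySQL's RENAME TABLE changes both names in a single atomic statement):

# load into a fresh copy of the table, then swap the names atomically
mysql -e "CREATE TABLE items_new LIKE items" mydb
mysqlimport --local mydb items_new.tab
mysql -e "RENAME TABLE items TO items_old, items_new TO items" mydb

# later, once the new data has proven itself
mysql -e "DROP TABLE items_old" mydb
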
Importing and exporting a lot of data is overhead if you ask me. If your queries are slow then perhaps you forgot to add indexes on the fields you are searching on. That would make the lookups much faster.

Author

Commented:
Everything is indexed; it is just really slow to check whether a record exists and then add new ones when you have a lot of records split across a lot of tables. I could combine the inserts into a multi-row INSERT statement for speed, but I've been doing command-line stuff recently and it seems so much faster.
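
For reference, a multi-row INSERT batches many rows into one statement (hypothetical table and column names):

mysql mydb <<'SQL'
-- one statement, many rows: far fewer round trips than inserting row by row
INSERT INTO items (item_key, value)
VALUES ('key1', 'value1'),
       ('key2', 'value2'),
       ('key3', 'value3');
SQL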

Author

Commented:
I admit it does seem counter-intuitive to use the command line rather than the native database features, which are designed specifically for this purpose - that was why I asked the question really.
Top Expert 2015

Commented:
What would you do with a 1 TB database?

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

No need to reinvent the wheel....
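
The linked page covers INSERT ... ON DUPLICATE KEY UPDATE; a minimal example against a hypothetical items table with a unique key on item_key:

mysql mydb <<'SQL'
-- inserts the row, or updates it in place when the unique key already exists
INSERT INTO items (item_key, value)
VALUES ('key1', 'value1')
ON DUPLICATE KEY UPDATE value = VALUES(value);
SQL
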
Top Expert 2015

Commented:
You can batch all (or at least more than one) of the inserts into a single transaction.

Author

Commented:
@gheist - could you please elaborate on your comments?
Top Expert 2015

Commented:
Approximate SQL load code (table and column names are placeholders; see http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html):

BEGIN;
INSERT INTO items (item_key, value) VALUES ('key1', 'value1') ON DUPLICATE KEY UPDATE value = VALUES(value);
INSERT INTO items (item_key, value) VALUES ('key2', 'value2') ON DUPLICATE KEY UPDATE value = VALUES(value);
-- ...repeat for 1000 rows or so per transaction...
COMMIT;

You can generate this SQL in Perl, for example, and feed it to the mysql command-line client; the syntax is quite similar to the shell scripting you are used to.

Top Expert 2014

Commented:
Create a unique index on your table.
Append all the data with the IGNORE option. Only rows whose key values aren't already in the table will be inserted; duplicates are skipped instead of raising errors.
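
A sketch of that, again with the placeholder names used above:

# one-off: enforce uniqueness on the key column
mysql -e "ALTER TABLE items ADD UNIQUE KEY uq_item_key (item_key)" mydb

# append everything; rows whose key already exists are silently skipped
mysqlimport --local --ignore mydb items.tab
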
Top Expert 2015

Commented:
There should be some balance - the more indexes you have, the heavier each eventual update becomes. Yes, a unique key is good for making sure a particular transaction works correctly and quickly.
Actually, one should benchmark whether a prepared statement, a transaction, or both works better in the end.
