andieje

asked on 

Quicker to use the command line than the database?

Hello

One of the tasks I regularly need to do is read through the rows of data in a text file, check whether an entry already exists in the database for each row, and create it if not. This can be quite slow. Having gained a bit more experience with Linux commands, I'm now thinking it would be quicker to export all the records from the MySQL table in question into a tab-delimited file, compare this file to my input file using the Linux join command, and then mysqlimport the records from the input file that 'fail' the join (i.e. they fail the join because they don't exist in the table).
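
Roughly what I have in mind (table, column and file names here are just placeholders, not my real schema):

    # dump the existing key column, sorted (join needs both inputs sorted the same way)
    mysql -N -e 'SELECT name FROM records' mydb | LC_ALL=C sort > existing.tab

    # keep only the input rows whose first field is not already in the table
    LC_ALL=C sort -t $'\t' -k1,1 input.tab > input.sorted.tab
    join -t $'\t' -v 1 -1 1 -2 1 input.sorted.tab existing.tab > records.tab

    # bulk-load just the new rows (mysqlimport derives the target table from the file name)
    mysqlimport --local mydb records.tab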

This is significantly less coding than writing a Perl script to check whether each record exists and to build INSERT statements for the data.

Another thing I often need to do is, given an input file of data, get the database IDs for those records. I usually do this by iterating over the file and querying the database table to get the ID for each input row. However, for thousands of records I'm again thinking it might be easier to export the table into a tab-delimited file and join the exported file to the input file. That way the resulting file has the original input data plus the database ID.
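
For the second task, something along these lines (again with placeholder names):

    # dump key plus id, sorted on the key
    mysql -N -e 'SELECT name, id FROM records' mydb | LC_ALL=C sort -t $'\t' -k1,1 > keyed.tab

    # inner join on the first field: each output row is the input row with the database id appended
    LC_ALL=C sort -t $'\t' -k1,1 input.tab > input.sorted.tab
    join -t $'\t' -1 1 -2 1 input.sorted.tab keyed.tab > input_with_ids.tab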

I was wondering if anyone has any comments, or reasons not to use this approach. I know one possible caveat is the lack of error checking for correct input format (though you can do a fair bit of that with mysqlimport/LOAD DATA INFILE), but we don't do that in the Perl script anyhow, as we know the data is OK.

Exporting and importing data in MySQL is so quick, and the Linux command line is so quick, that I think this approach is fine.

thanks
Topics: Linux, MySQL Server, Databases

ASKER CERTIFIED SOLUTION
wesly_chen

PimOnline

Importing and exporting a lot of data is overhead, if you ask me. If your data queries are slow, then perhaps you forgot to add indexes on the fields you are searching on. That would make the search much faster.
andieje

ASKER

Everything is indexed; it is just really slow to check whether a record exists and then add the new ones when you have a lot of records split across a lot of tables. I could combine the inserts into a multi-row INSERT statement for speed, but I've been doing command-line work recently and it seems so much faster.
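
For reference, the multi-row insert I mean would be something like this (with made-up table and column names):

    mysql -e "INSERT INTO records (name, value) VALUES
      ('a', 1),
      ('b', 2),
      ('c', 3)" mydb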
andieje

ASKER

I admit it does seem counter-intuitive to use the command line rather than the native database features that are designed specifically for this purpose - that was really why I asked the question.
gheist

What would you do with a 1 TB database?

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

No need to reinvent the wheel....
gheist

You can batch all (or at least more than one) of the inserts in a single transaction.
andieje

ASKER

@gheist - please could you elaborate on your comments?
gheist

Approximate SQL load code (table and column names are placeholders):

START TRANSACTION;
INSERT INTO your_table (key_col, val_col) VALUES ('abc', 1)
  ON DUPLICATE KEY UPDATE val_col = VALUES(val_col);
-- ...repeat for 1000 rows or so per transaction; see
-- http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
COMMIT;

You can generate the SQL in Perl, for example, and feed it to the mysql command-line utility - it is quite similar to the shell syntax you are used to.

aikimark

Create a unique index on your table, then append all the data. Only those key values that aren't already in the table will be inserted.
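
For example, something like this (placeholder table, column, and file names):

    mysql -e 'ALTER TABLE records ADD UNIQUE KEY (name)' mydb
    # --ignore makes the load skip rows that would violate the unique key instead of aborting
    mysqlimport --local --ignore mydb records.tab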
gheist

There should be some balance - the more indices you have, the heavier each eventual update. Yes, a unique key is good for making sure a particular transaction works correctly and quickly.
Actually, one should benchmark whether a prepared statement, a transaction, or both works better in the end.