MRS asked:

Update otherwise insert (a.k.a. upsert) vs. truncating data: which operation will be more efficient?

I'm writing a nightly process that will query all inventory in our current CRM and dump the data into a PostgreSQL 9.2 schema. The goal is to create a single system snapshot containing the latest records for reporting every night. The source data for some of the records may change during the course of the day. Since this is a complete data dump, it will be fairly large. I am deliberating over which approach will be more efficient. Please also note that I am using the Npgsql data provider from Visual Studio 2013.

Do I:

A.) Truncate all records in the schema and reload all records every night?

B.) Use the upsert technique described at http://www.the-art-of-web.com/sql/upsert/ (sketched below)?

Any other suggestions are also welcome...
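For reference, option B boils down to the classic pre-9.5 pattern (PostgreSQL 9.2 has no INSERT ... ON CONFLICT): an UPDATE for the rows that already exist, then an INSERT for the rows the UPDATE did not touch, inside one transaction. A minimal sketch, assuming a hypothetical inventory_snapshot table refreshed from a hypothetical inventory_staging table:

BEGIN;

-- Refresh rows that already exist in the snapshot.
UPDATE inventory_snapshot AS s
SET    description = st.description,
       quantity    = st.quantity
FROM   inventory_staging AS st
WHERE  s.item_id = st.item_id;

-- Add rows that are new to the snapshot.
INSERT INTO inventory_snapshot (item_id, description, quantity)
SELECT st.item_id, st.description, st.quantity
FROM   inventory_staging AS st
WHERE  NOT EXISTS (
    SELECT 1 FROM inventory_snapshot AS s
    WHERE  s.item_id = st.item_id
);

COMMIT;

For a single nightly batch writer this is sufficient; with concurrent writers the pattern needs locking or a retry loop.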
ASKER CERTIFIED SOLUTION
chaau

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
MRS (asker):

Excellent example. Thank you! I did fail to mention that records will never be deleted from the source. Assume we will be inserting 5 million records, but only 2 million of these may have changed in some way. Is it still better to TRUNCATE?
chaau:

I think that if the new set represents the whole table, truncating is better. Do not forget that PostgreSQL has the COPY command for bulk inserting, which is very effective.

When you say "changed in some way", you will have to find that change, e.g. a customer's phone number has changed. Your upsert will have to check, for matching customer IDs, whether any of the columns differ, and update only when they do; otherwise it will end up updating all 5 million rows. My point is that an upsert is only more effective if it manages to correctly identify those 2 million changed records and update just them, plus insert any new records. If you have a LastUpdated column and it is reliable, you can use it; otherwise you will have to locate the particular columns that changed, or blindly update all rows, which is quite inefficient.
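To make that concrete, one way to touch only the changed rows, assuming the nightly extract is first bulk-loaded into a hypothetical inventory_staging table (all table and column names here are made up for illustration), is to compare the columns with IS DISTINCT FROM, which also handles NULLs, so unchanged rows are skipped:

-- Update only rows whose data actually differs from tonight's extract.
UPDATE inventory_snapshot AS s
SET    description = st.description,
       quantity    = st.quantity,
       phone       = st.phone
FROM   inventory_staging AS st
WHERE  s.item_id = st.item_id
AND   (s.description, s.quantity, s.phone)
      IS DISTINCT FROM
      (st.description, st.quantity, st.phone);

-- Insert records that do not exist in the snapshot yet.
INSERT INTO inventory_snapshot (item_id, description, quantity, phone)
SELECT st.item_id, st.description, st.quantity, st.phone
FROM   inventory_staging AS st
WHERE  NOT EXISTS (
    SELECT 1 FROM inventory_snapshot AS s
    WHERE  s.item_id = st.item_id
);

Even then, all 5 million staged rows still have to be read and compared, which is why a plain TRUNCATE followed by a bulk COPY of the full extract is often simpler and at least as fast.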
MRS (asker):

I understand... that makes sense. Do you happen to know whether Npgsql supports the COPY command?
chaau:

COPY is an SQL command, not an external utility, so I think it can be used from Npgsql. Please note that the file you are copying from has to be accessible by the server. It is best to put the source file somewhere on the server or on a network share and specify it as a UNC path.
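To illustrate the two forms of COPY being discussed (table name and path are hypothetical): a server-side COPY reads a file that the PostgreSQL server process itself can open, whereas COPY ... FROM STDIN receives the rows over the client connection, which is the form a driver can feed programmatically.

-- Server-side COPY: the path is resolved on the database server,
-- so it must be local to, or a UNC path reachable by, the postgres service.
COPY inventory_snapshot
FROM '\\fileserver\nightly\inventory.csv'
WITH (FORMAT csv, HEADER true);

-- Client-side COPY: no file on the server at all; the client streams
-- the rows over the connection (psql's \copy and driver copy-in APIs use this form).
COPY inventory_snapshot FROM STDIN WITH (FORMAT csv);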
gheist:

Why not establish PostgreSQL replication, hold it back during working hours, and let it kick in at night?
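That approach would only apply if the source system were itself a PostgreSQL server, but for the record, the "hold it back, then catch up" part can be done on a 9.2 hot standby with the built-in replay-control functions; a minimal sketch (run on the standby as superuser):

-- During working hours: stop applying incoming WAL on the standby.
SELECT pg_xlog_replay_pause();

-- At night: let the standby catch up with the primary.
SELECT pg_xlog_replay_resume();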
MRS (asker):

gheist - The source data comes from a non-ODBC data source. Otherwise that's a good suggestion!

chaau - I did get this working with the COPY command. Apparently COPY can take a direct stream; I was able to query my source data and feed the results straight to the COPY command as a stream.
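On the SQL side, the nightly reload then amounts to roughly the following sketch (table name hypothetical; the rows for COPY FROM STDIN are pushed through the driver's copy stream rather than read from a file on the server):

BEGIN;

-- TRUNCATE is transactional in PostgreSQL, so if the load fails and the
-- transaction rolls back, the previous night's snapshot is preserved.
TRUNCATE inventory_snapshot;

-- Bulk-load tonight's extract; the client streams the rows over the wire.
COPY inventory_snapshot FROM STDIN WITH (FORMAT csv);

COMMIT;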

Thanks for all the good suggestions...
gheist:

It has nothing to do with ODBC.