Update otherwise insert (A.K.A Upsert), vs Truncating data. Which operation will be more efficient?

Writing a nightly process, which will query all inventory in our current CRM and dump all the data to Postgress 9.2 schema. The goal is to create a single system snapshot containing the latest records for reporting every night. The source data for some of the records may change during the course of the day. Since this is a complete data dump, it will be fairly large. I am deliberating which approach will be more efficient. Please also not that I am using the npgSQL data provider from Visual Studio 2013.

Do I:

A.) Truncate all records in the schema and reload all records every night?

B.) Use the following technique to perform an Upsert? (http://www.the-art-of-web.com/sql/upsert/)

Any other suggestions are also welcome...
MRSAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

chaauCommented:
Upserting only makes sense when you are adding a set of records to the existing set. Imagine this situation: you have a library of books, say there are 10,000 books. Another library nearby is closing for renovation and bringing their books to your library. You can't truncate data from your tables. Obviously, you would have some titles already in your library: you would not want to create duplicate entries. For the existing entries you would have to just update the quantity. There would be also books in the new collection that do not exist in your database. they will need to be inserted as new records. The upsert will work for this situation.

In your case, however, the new collection is that is ultimately should become the new database. Not only Upsert will be a complete waste of time, but it can potentially lead to the incorrect data. Imagine that during the day some data from your new collection has been deleted. Upsert will not catch this situation: it is designed for Insert / Update. When you run the statement the records in the target that do not exist in the source will not be affected. They will stay as they are. MS SQL MERGE have a built-in support for this, you can specify "WHEN NOT MATCHED BY SOURCE THEN DELETE". The technique described in the article provided by you does not cover this scenario.

Bottomline: TRUNCATING then INSERTING is not only more performant, but better suits in some cases
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
MRSAuthor Commented:
Excellent example. Thank you! I did fail to mention that records will never be deleted from the source. Assume we will be inserting 5 million records, but only 2 million of these may have changed in some way. Is it still better to TRUNCATE?
0
chaauCommented:
I think if the new set represents the whole table truncate is better. Do not forget, Postgresql has the COPY command for bulk inserting which very effective. When you say "changed in some way" you will have to find this change. E.g. Customer phone has changed. Your upsert syntax will have to check that for matching customer IDs it needs to check if any of the columns do not match. And update only when not matched. Otherwise, it will have to update all 5million rows. My point is that upsert will be more effective if it manages to correctly identify these 2million records and update only them. Plus insert any new records. If you have a LastUpdated column and ot os reliable you can use it, otherwise you will have to locate this particular column that has been changed, or blindly update all rows, which is quite ineffective
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

MRSAuthor Commented:
I understand.... makes sense. Do you happen to know if npgSQL supports the COPY command?
0
chaauCommented:
COPY is an SQL command, it is not an external utility. So, I think it can be used by npgSQL. Please note that the file that you are copying from should be accessible by the server. It is better to put the source file somewhere on the server or a network path and specify the path as a UNC path
0
gheistCommented:
Why not establish Postgresql replication and just keep it back for working hours and kick in at night?
0
MRSAuthor Commented:
gheist - The source data comes from non ODBC data source. Otherwise that's a good suggestion!

chaau - I did get this working with the COPY command. Apparently the copy command can take a direct stream. I was able to query my source data and feed it directly to the COPY command as a stream.

Thanks for all the good suggestions...
0
gheistCommented:
It has nothing to do with ODBC.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
PostgreSQL

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.