EasyAim
asked on
Fastest method to INSERT and UPDATE
I have a back office application that needs to insert/update thousands of records in a JET database using ADO in a VBA Excel application. There is about an even chance that a new record will be an existing record to be updated or a new record to be inserted.
QUESTION: What is the fastest method of determining if the record exists and if so, update it or if not, insert it?
The choices to me appear to be....
1. Perform a SELECT to see if the record exists. If yes, UPDATE; if not, INSERT.
2. Perform an INSERT. If no error, it worked; if "duplicate error" use UPDATE.
Example VB or VBA code would be appreciated. I'll perform some timing tests of my own if I have to but your input would be appreciated.
QUESTION: What is the fastest method of determining if the record exists and if so, update it or if not, insert it?
The choices to me appear to be....
1. Perform a SELECT to see if the record exists. If yes, UPDATE; if not, INSERT.
2. Perform an INSERT. If no error, it worked; if "duplicate error" use UPDATE.
Example VB or VBA code would be appreciated. I'll perform some timing tests of my own if I have to but your input would be appreciated.
ASKER
I'll check in to it.
Given these 50/50 odds, my advice would be to try the UPDATE first and then check the RecordsAffected property. If no records affected, then execute the INSERT.
It would be best to have your UPDATE and INSERT statements prepackaged into two queries.
Performance of either function will depend on the structure and health of your tables and the indexes on these tables.
It would be best to have your UPDATE and INSERT statements prepackaged into two queries.
Performance of either function will depend on the structure and health of your tables and the indexes on these tables.
ASKER
I'll run some timing tests but since it takes a SQL statement to pull the original record over to UPDATE it, I would bet that it would be better to:
* create a unique key in the table that does not allow duplicates
* perform an insert
* if "duplicate error", then create update
However, the point of my question was in the hope that someone had already tested all this out...
* create a unique key in the table that does not allow duplicates
* perform an insert
* if "duplicate error", then create update
However, the point of my question was in the hope that someone had already tested all this out...
@EasyAim
You have asked us a question that provided scant details on the table(s) and no details on the context of the application's actions.
Therefore, your central question:
<<QUESTION: What is the fastest method of determining if the record exists and if so, update it or if not, insert it?>>
Has produced two good recommendations, based on what we know. We can't test your database or application.
========================== =
It is doubtful that your latest plan of action would produce faster results for two reasons:
1. you are making an extra database call
2. your Insert statement will now require an extra operation to update the unique index.
========================== =
If you supply us with more information, we might ammend our recommendations. Only you can evaluate the best solution for your environment.
You have asked us a question that provided scant details on the table(s) and no details on the context of the application's actions.
Therefore, your central question:
<<QUESTION: What is the fastest method of determining if the record exists and if so, update it or if not, insert it?>>
Has produced two good recommendations, based on what we know. We can't test your database or application.
==========================
It is doubtful that your latest plan of action would produce faster results for two reasons:
1. you are making an extra database call
2. your Insert statement will now require an extra operation to update the unique index.
==========================
If you supply us with more information, we might ammend our recommendations. Only you can evaluate the best solution for your environment.
ASKER
Regardless of whether DAO or ADO is used, it seems to me that a query to first determine if a record exists is a wasted hit to the database if you can create a unique key to the table:
* querying a database to determine if a record exists (1 hit)
* INSERT if it does not exists or UPDATE if it does exists (1 hit)
TOTAL = 2 calls to the database for every record
versus.....
* INSERT record (1 hit)
(if successful: TOTAL = 1 call to the database for some records)
or
(if not successful with dupllicate detected: )
* UPDATE record (1 hit)
TOTAL = 2 calls to the database for some records
Say we had four new changes to be implemented to a database: two are UPDATES and two are new records.
Using the SELECT + INSERT/UPDATE method would require 8 hits to the database.
Using the INSERT + (potential failure + UPDATE) method would require only 6 hits to the database.
It seems to me this would be the fastest method to update a table for any and all database engines yet you rarely see code written to take advantage of this.
It makes me wonder if there is some overhead that I am missing. Does an INSERT attempt take longer than a simple SELECT? I would think that reducing the across-the-ethernet hits to the database would be the most efficient solution but perhaps aikimark is correct that it may be more dependent upon the database engine and schema than I originally thought.
Keep in mind I'm talking about a backoffice massize processing of 10,000+ records. I can see where the SELECT + UPDATE/INSERT is more logical for those applications that require user interface and user input to review, etc...
* querying a database to determine if a record exists (1 hit)
* INSERT if it does not exists or UPDATE if it does exists (1 hit)
TOTAL = 2 calls to the database for every record
versus.....
* INSERT record (1 hit)
(if successful: TOTAL = 1 call to the database for some records)
or
(if not successful with dupllicate detected: )
* UPDATE record (1 hit)
TOTAL = 2 calls to the database for some records
Say we had four new changes to be implemented to a database: two are UPDATES and two are new records.
Using the SELECT + INSERT/UPDATE method would require 8 hits to the database.
Using the INSERT + (potential failure + UPDATE) method would require only 6 hits to the database.
It seems to me this would be the fastest method to update a table for any and all database engines yet you rarely see code written to take advantage of this.
It makes me wonder if there is some overhead that I am missing. Does an INSERT attempt take longer than a simple SELECT? I would think that reducing the across-the-ethernet hits to the database would be the most efficient solution but perhaps aikimark is correct that it may be more dependent upon the database engine and schema than I originally thought.
Keep in mind I'm talking about a backoffice massize processing of 10,000+ records. I can see where the SELECT + UPDATE/INSERT is more logical for those applications that require user interface and user input to review, etc...
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
@EasyAim
Thanks for the points. Were you able to run some tests to compare performance profiles?
Thanks for the points. Were you able to run some tests to compare performance profiles?
ASKER
Honestly... I haven't had time. But I intend to do so and I'll try to get it documented and posted somewhere.
FWIW You should really use DAO with a seek on on index.
It would beat anything else you could do by far.
JimD
@JimD
I disagree. While your method does provide the fastest method for finding if a row exists, this architecture will ALWAYS require two SQL calls. My solution will only make two calls 50% of the time.
======================
@EasyAim
When you get around to testing, you might want to consider wrapping groups of these calls in transactions. A transaction doesn't commit the changes until you invoke the .COMMIT method. Otherwise, a commit happens with every SQL call.
Moreover, it might be possible to queue the INSERTS until you are finished with the updates or have accumulated a 'group'. You will have to decide on the allowable INSERT latency if you adopt this architecture.
It is also possible to have a separate process to perform the INSERT calls during periods of low activity.
I disagree. While your method does provide the fastest method for finding if a row exists, this architecture will ALWAYS require two SQL calls. My solution will only make two calls 50% of the time.
======================
@EasyAim
When you get around to testing, you might want to consider wrapping groups of these calls in transactions. A transaction doesn't commit the changes until you invoke the .COMMIT method. Otherwise, a commit happens with every SQL call.
Moreover, it might be possible to queue the INSERTS until you are finished with the updates or have accumulated a 'group'. You will have to decide on the allowable INSERT latency if you adopt this architecture.
It is also possible to have a separate process to perform the INSERT calls during periods of low activity.
<<I disagree. While your method does provide the fastest method for finding if a row exists, this architecture will ALWAYS require two SQL calls. My solution will only make two calls 50% of the time.>>
Sorry, but no as the setup is totally different. Your not using SQL statements at all and the execution is totally different. Your making calls directly to the database engine, not specifying a SQL statement which must be analyzed, costed, and then executed. There is a lot of overhead associated with processing a SQL statement, all of which is bypassed.
The index search is extremely fast and when working with a JET database is the fastest method by far. The downside of course is that you need an index to search on.
JimD.
Sorry, but no as the setup is totally different. Your not using SQL statements at all and the execution is totally different. Your making calls directly to the database engine, not specifying a SQL statement which must be analyzed, costed, and then executed. There is a lot of overhead associated with processing a SQL statement, all of which is bypassed.
The index search is extremely fast and when working with a JET database is the fastest method by far. The downside of course is that you need an index to search on.
JimD.
@JimD
I meant "two database calls". I stand corrected. When performance is a high requirement, I agree that DAO is faster than ADO. But fewer calls should result in overall better performance.
In no way should any of these calls be made with dynamic SQL. QueryDefs are the way to go. (IMHO)
This will certainly be an interesting performace test.
I meant "two database calls". I stand corrected. When performance is a high requirement, I agree that DAO is faster than ADO. But fewer calls should result in overall better performance.
In no way should any of these calls be made with dynamic SQL. QueryDefs are the way to go. (IMHO)
This will certainly be an interesting performace test.
hi jim! i really like your code and i want to use it in my project, however i need to search for multi column indexes. i've created a question and cited your guidance there! could you help me check it out? thanks jim!! = ))
question name: creating and searching for multi column indexes
question name: creating and searching for multi column indexes
First problem; DAO is faster then ADO when a JET database is involved.
<<QUESTION: What is the fastest method of determining if the record exists and if so, update it or if not, insert it?>>
Using DAO, open the database, then the table as a table. Use SEEK on an existing index. If the seek fails, it's .Addnew. If it suceeds, it's .Edit
This would open the table:
' Open the table for update with seek
Set wrk = DBEngine(0)
Set dbRemote = wrk.OpenDatabase("C:\MyMDB
Set rstPTInfo = dbRemote.OpenRecordset("PT
rstPTInfo.index = "PTID"
Then you'd loop and in your loop do:
' Find the patient record
rstPTInfo.Seek "=", <key>
If rstPTInfo.NoMatch Then
rstPTInfo.Addnew
Else
rstPTInfo.Update
End if
' set all your fields here
rstPTInfo![PK] = <key>
' Update record
rstPTInfo.Update
end loop.
That would be the fastest.
JimD