Link to home
Start Free TrialLog in
Avatar of Cory Miller
Cory MillerFlag for United States of America

asked on

How can I update a database table with a .Net DataTable and ignore existing records?

I have a .Net DataTable that contains records, all of which are "added" records. The corresponding table in the database may contain millions of rows. If I attempt to simply call the "Update" method on my SqlDataAdapter, any existing records cause an exception to be raised due to a violation of the primary key constraint. I considered loading all of the physical table's records into a second DataTable instance, merging the two, and then calling the Update method on the second DataTable. This actually works exactly like I want. However, my concern is that if there are 30 billion records in the physical table, loading all of that data into a DataTable in memory could be an issue.

I considered selecting a sub-set of data from the physical table and proceeding as described above, but the construction of the sub-query has proved to be very involved and very tedious. You see, I am not working with a single known table. I am working with a DataSet that contains several hundred DataTables. Each of the DataTables maps to its own physical table. The name and schema of the tables are not known at compile time. This has to all be done at run time.

I have played with the SqlBulkCopy class but have the same issue - duplicate records raise an exception.

I don't want to have to dynamically construct queries for each table at run time. If that is the only way, so be it, but I just can't help but think that there must be a simpler solution using what Ado.Net provides.
Avatar of Gustav Brock
Gustav Brock
Flag of Denmark image

I would first retrieve those records from the physical table that match your datatable into a (temp) datatable.
The records of the datatable that don't match the records of the temp datatable are new, and these you can append to the physical table.
With potentially 30 bil. records, I see no other method.

/gustav
Update command should never produce primary key violations because it does not insert new records, it updates the existing ones. The only scenario where it's possible is if the table has unique key (in addition to the primary key), and updated value in your existing record violates it.

In any case, in order to bypass existing records, you need to modify Update command the adapter is using, for example:

instead of

update mytable set col1=newvalue where pk=mykey
you use
if not exists(select 1 from mytable where col1=newvalue) update mytable set col1=newvalue where pk=mykey
Can you post the table structure and tell us which field or fields are the primary key?
For a large table I would define an IDENTITY column as primary key so SQL Server engine will take care of not repeating values and you avoid the error without any extra work.
Avatar of Cory Miller

ASKER

I appreciate everyone's quick responses. Thank you. Please see my responses to each individual comment below:

Gustav Brock - I cannot retrieve the physical table's records into a temp DataTable because I could potentially have in excess of 30 billion records. The DataTable is a .NET in-memory data structure. That was to be my original solution until I realized the amount of data that I could potentially be dealing with.

vadimrapp1 - When I talked about "Update", I was referring to the "SqlDataAdapter.Update()" method. This method will perform INSERT, UPDATE, and DELETE operations on its corresponding physical table depending on the state of the data in the DataTable. Additionally, business requirements dictate that existing physical records not be updated. I am only allowed to insert new records.

Vitor Montalvão - Unfortunately I do not have the latitude to change the structure of the physical table. An IDENTITY column is not an option for me.
What about my first question?
Can you post the table structure and tell us which field or fields are the primary key?
> I was referring to the "SqlDataAdapter.Update()" method.

"Update" method runs Update command defined in the Properties of the adapter. Review that command.

>  Additionally, business requirements dictate that existing physical records not be updated. I am only allowed to insert new records.

While updating, the only way to hit  "violation of the primary key constraint" is if you actually try to change primary key.
> I cannot retrieve the physical table's records into a temp DataTable because I could
> potentially have in excess of 30 billion records.

That's not what I wrote:

> I would first retrieve those records from the physical table that match your datatable
> into a (temp) datatable.

Could be done with a: select id from table where id in (<list of ids>)

/gustav
Gustav Brock - Thank you for clarifying your response. However, I had already considered a similar solution. But I really want to avoid going that direction.

From my original post:

I considered selecting a sub-set of data from the physical table and proceeding as described above, but the construction of the sub-query has proved to be very involved and very tedious. You see, I am not working with a single known table. I am working with a DataSet that contains several hundred DataTables. Each of the DataTables maps to its own physical table. The name and schema of the tables are not known at compile time. This has to all be done at run time.

I would have to dynamically construct queries for hundreds of tables each with compound primary keys - something like:

SELECT * FROM table WHERE Id01 in (<list of Id01's> AND Id02 in (<list of Id02's> AND Id03 in (<list of Id03's> AND... 

Open in new window


While not impossible to do, it is not exactly in line with my original question, that is:

I don't want to have to dynamically construct queries for each table at run time. If that is the only way, so be it, but I just can't help but think that there must be a simpler solution using what Ado.Net provides.
Oh, I see. I thought you could identify records by a single unique value.

Then I think your best option would be to copy your datatable to a temp table on the server and then run an update query on the server via a pass-through query or a stored procedure.

/gustav
Usually, the attempts to resolve the problem without understanding what's causing it, are doomed.
Nevertheless, sometimes such attempts may help you understand what's causing the problem.

/gustav
Vitor Montalvão - I don't believe that it would necessarily be helpful to post the table structure. I am working with several hundred tables with compound primary keys that may be 1 to many columns. I have to dynamically handle each table individually at run time. My code will not know the table structure at compile time.

vadimrapp1 -
"Update" method runs Update command defined in the Properties of the adapter. Review that command.
That's not exactly what that command does. I have reviewed it exhaustively and found no solution. That is why I elected to post my original question. The "Update" command on a SqlDataAdapter will attempt to execute a SQL "INSERT" if the record has been added to the source DataTable. Every single record in my DataTable objects are added records. Therefore, the SqlDataAdapter attempts to execute an INSERT for every record in the DataTable. If the record already exists in the physical table, an exception is raised.
While updating, the only way to hit  "violation of the primary key constraint" is if you actually try to change primary key.
The SqlDataAdapter is NOT doing UPDATEs. It is attempting to do INSERTs as I described above.
So that's your answer then: modify INSERT command of the adapter so it does not do anything. For example

INSERT INTO MYTABLE(MYCOLUMN) SELECT 1 WHERE 'USA'='LAND OF THE FREE'

or even

SELECT 1
I appreciate everyone's input and suggestions. Unfortunately, it appears that there really is not answer to my issue (using built in ADO.NET library functions to conditionally insert a record into a physical table). Because I must move on and get this project wrapped up, I am implementing my original solution - iterate the DataRow objects in my DataTable instance, check to see if a corresponding physical record exists, and skip it if it does, otherwise go ahead and insert it. I am implementing this by extending the DbDataAdapter class and overriding the "Update" method to perform the logic I outlined above.
Please explain, why modifying InsertCommand used by the adapter so it does not really insert, is not a solution. I actually tested it by creating a small project, and it worked as expected. I was indeed wrong in the previous comment when I said that Update method only updates - you are right, it also inserts missing and deletes extra; but still, it's doing it by launching respectively specified InsertCommand and DeleteCommand, which are adapter's properties. If you change those commands from what the wizard has created, to something else, like the examples I gave above, then it will launch that command and the command will not do anything.
Vadim Rapp,

If I understand correctly, you are saying that one can set the "InsertCommand" property on the DataAdapter to something that does not perform any INSERTs. That would work if I did not want to INSERT any records at all. But how do I create an "InsertCommand" that INSERTs only when there is no existing physical record. The InsertCommand must insert brand new records but must skip records that are already in the physical table. Maybe I am not understanding your suggestion. Can you attach your small project that you created for me so I can take a look? Thanks!
Will attach tomorrow morning.
ASKER CERTIFIED SOLUTION
Avatar of Vadim Rapp
Vadim Rapp
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial