Link to home
Start Free TrialLog in
Avatar of Bruce Gust
Bruce GustFlag for United States of America

asked on

What is wrong with this SQL INSERT?

Here's my SQL:

select 
   case 
      when 
         (select count(*) from tasks t where t.CUSTID='DPR01' and t.INTID='INT0001'>0)
      then
         UPDATE tasks_master_tracker_idb SET 
            CLIENT_CONTACT = (SELECT t.CLIENT_CONTACT FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            COMPLEXITY = (SELECT t.COMPLEXITY FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            CREATED_BY = (SELECT t.CREATED_BY FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            CUSTID = (SELECT t.CUSTID FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            DIRECTION = (SELECT t.DIRECTION FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            INTEGRATION_ID = (SELECT t.INTID FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            INTEGRATION_NAME = (SELECT t.NAME FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            INTEGRATION_STATUS = (SELECT t.STATUS FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            INTEGRATION_TYPE = (SELECT t.PROJ_TYPE FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001'),
            VENDOR = (SELECT t.VENDOR FROM tasks t WHERE t.CUSTID='DPR01' AND t.INTID='INT0001')
            WHERE
            CUSTID='DPR01'
                AND
                INTEGRATION_ID='INT001'
      else
         INSERT into tasks_master_tracker_idb (
         CLIENT_CONTACT, COMPLEXITY, CREATED_BY, CUSTID, DIRECTION,
         INTEGRATION_ID, INTEGRATION_NAME, INTEGRATION_STATUS, INTEGRATION_TYPE, VENDOR
         )
            SELECT t.CLIENT_CONTACT, t.COMPLEXITY, t.CREATED_BY, t.CUSTID, t.DIRECTION,
            t.INTID, t.NAME, t.STATUS, t.PROJ_TYPE, t.VENDOR 
            FROM
            tasks t
            WHERE 
            t.CUSTID='DPR01'
            AND
            t.INTID='INT0001'
      end

Open in new window

I'm getting an error that says, "Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'UPDATE tasks_master_tracker_idb SET      CLIENT_CONTACT = (SELECT t.CLIENT_CONTA' at line 6

Not sure what's wrong or how to fix it.

Any ideas?
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Bruce Gust

ASKER

OK, well, nuts!

Here's my dilemma: Initially I was going to use an ON DUPLICATE KEY UPDATE dynamic, but I don't think that's going to work because there's not any kind of index within the incoming data. It's only the combination of the CUSTID and the INTID that would produce a "duplicate" as opposed to a single value.

What are my options, Paul?

I need to insert a row if it's brand new or update it if it already exists...
You could use a stored procedure which do support branching
It is commonly easier to help resolve an issue when explained as opposed to,
Here is what I have, why is it not working?
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

>>It's only the combination of the CUSTID and the INTID that would produce a "duplicate" as opposed to a single value.

Many tables have a multiple column natural primary key.  Nowhere in the rules of normalization say a primary key MUST be a single column.

So the solution is to create a primary key on those two columns and use the INSERT ... ON DUPLICATE KEY UPDATE.
Paul, using a stored procedure may be an option, but it's not something I want to pursue if I can solve this using a different approach.

For example, creating a primary key on two columns. How is that done? That smells like the answer to my problem right there!
alter table tasks_master_tracker_idb add primary key(CUSTID , INTID );

Open in new window

Arnold, not sure how much more background or context you would need, but as a courtesy...

A file is being uploaded that's coming from Workday. The idea is to automate the process where the info coming in is automatically inserted into our application's database.

In some instances, the table being targeted is going to have some of the data that would otherwise be inserted. In that instance, the existing data needs to be updated.

The challenge is that neither the Customer ID or the Integration ID is unique in and of themselves. You'll have the same customers more than once and the Integration IDs can be the same when compared from one project to the next.

What is unique is the combination of both the Customer ID and the INTID. That will always be distinctive.

Since the ON DUPLICATE KEY UPDATE approach is only an option if there's an index of some sort that would trigger an error and a CASE can't be used in that you can't incorporate an UPDATE in the context of a SELECT, the challenge is to figure out how to get it done.

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
slight! This smells really good: alter table tasks_master_tracker_idb add primary key(CUSTID , INTID );

Once that's done, is there anything that needs to be added or changed to this:

INSERT into tasks_master_tracker_idb (
CLIENT_CONTACT, COMPLEXITY, CREATED_BY, CREATED_DATE, CUSTID, DIRECTION,
INTEGRATION_ID, INTEGRATION_NAME, INTEGRATION_STATUS, INTEGRATION_TYPE, VENDOR
)
   SELECT t.CLIENT_CONTACT, t.COMPLEXITY, t.CREATED_BY, t.CREATED_DT, t.CUSTID, t.DIRECTION,
   t.INTID, t.NAME, t.STATUS, t.PROJ_TYPE, t.VENDOR 
   FROM
   tasks t
   WHERE 
   t.CUSTID='{$params['custid']}'
   AND
   t.INTID='{$params['intid']}'
ON DUPLICATE KEY UPDATE
CLIENT_CONTACT = (SELECT t.CLIENT_CONTACT FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
COMPLEXITY = (SELECT t.COMPLEXITY FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
CREATED_BY = (SELECT t.CREATED_BY FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
CREATED_DATE = (SELECT t.CREATED_DT FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
CUSTID = (SELECT t.CUSTID FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
DIRECTION = (SELECT t.DIRECTION FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
INTEGRATION_ID = (SELECT t.INTID FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
INTEGRATION_NAME = (SELECT t.NAME FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
INTEGRATION_STATUS = (SELECT t.STATUS FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
INTEGRATION_TYPE = (SELECT t.PROJ_TYPE FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}'),
VENDOR = (SELECT t.VENDOR FROM tasks t WHERE t.CUSTID='{$params['custid']}' AND t.INTID='{$params['intid']}');

Open in new window

...or will the new Primary Key be automatically included in the evaluation of the INSERT and recognize it accordingly?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
It should be something like this:
...
ON DUPLICATE KEY UPDATE
CLIENT_CONTACT = t.CLIENT_CONTACT,
...
slight! This smells really good: alter table tasks_master_tracker_idb add primary key(CUSTID , INTID );
CAVEAT:

This is a serious change in the data model. Not having a primary key must have its reason. E.g. it is common for staging tables.

For further ETL processing, the used tables must match your data model to some extend, thus it is not possible, that there is a primary key missing in a OLTP table.
The reverse view of this point is: Your data model is not properly normalized. Here you MUST normalize the model first and implement it correctly.

Having imported data from a bunch of CRM or timesheet systems, my personal experience:

Every design flaw you allow to pass, e.g. for business reason "this will not happen do to process" etc., will result in reports delivered, where you do not have taken that flaw into account.
And processes change, thus a won't happen due to convention will for sure happen. Then such a flaw will give you the problems relational theory solves: inconsistent data and redundant data.
Now correcting that flaw is much more expensive by time, tests and budget.

Long story, short morale: Review your data model. Normalize it correctly. Implement it completely.

p.s. as @PortletPaul wrote: and implement it as stored procedure.
I agree to an extent.  At times, some times tables don't or can't have a primary key.  That goes to your point about knowing the system.

Yes, if the staging table cannot have a PK for some other business reason, then a stored procedure to process the data is probably necessary.

If the PK was just an oversight in the design, add it and use the simple insert..

Related point:
Adding the PK might uncover other problems with the staged data that might not have been previously known.

You might also want to look at ordering the staged data in the proper order to make sure you don't end up with an incorrect change.  I see there is a created_date in the data.

for example, on 1/1/2021 you get an 'update' (because of the duplicate key) that changes the vendor to 'Fred', then on 1/2/2021 you get an another duplicate that updates to 'Barney'.

If you process that in the wrong order, you might end up with it set to 'Fred', when it should be 'Barney'.
for example, on 1/1/2021 you get an 'update' (because of the duplicate key) that changes the vendor to 'Fred', then on 1/2/2021 you get an another duplicate that updates to 'Barney'.
This is also a typical ETL problem: What does the data to import represent?
E.g. is it dimensional or transactional data from the source system?

Cause the above scenario is common for transferring data from a SCD dimension in a dimensional model or it is the fact data. In the first case, you need to decide, whether you need the old dimensional data or not. When not, then the order of UPDATE would be relevant. But if it is fact data, then you need to store both values.

So in short: Review your model, review the process you implement, audit the source system to some extend. Cause even the most trivial problem maybe just the tip of an iceberg.
Given the breadth and scope of the analysis slightwv and ste5fan imparted.
What is the timeframe liist if any between file creations?

One option is to load the new data into a temporary table the issues raised deal with identifying the columns on whose basis updates should be done and to which columns.
Using update with a join ; then removing the lines from the staging table
Update first then delete based on same croteria

Select into existing table data from the staging data that remain post delete

The approach deals with staggering pulling in new data in part, then new data in whole when there is no matching reference

Scripting that handles the logic sequential queries and apply your process rules.

Outside a stored procedure, the complexity of trying to accommodate all the possible variants scripting might be a viable alternative.
This was a question I submitted as part of a scenario referenced in another question - the resolution was the same. I changed the structure of the table in a way where I combined two columns to form an index. At that, my SQL worked using ON DUPLICATE KEY UPDATE