Link to home
Start Free TrialLog in
Avatar of sqldba2013
sqldba2013

asked on

MySQL query - Merge statment

Hi Experts,

I am new to MySQL scripting and I am facing error while merging Insert and Update query into single query.

Please suggest me how to merge attached query into single query with Merge option (Insert+Update).

Thanks in advance.
For-Insert.sql
For-Update.sql
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

sqldba,
 
  MySQL has implement the "merge" feature as INSERT ... ON DUPLICATE KEY, see the technical reference here:
  http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
  it will require a primary key or unique index on the relevant columns
Avatar of sqldba2013
sqldba2013

ASKER

Thanks for your reply.

I have tried with above URL, still I am not able to write correct query and scripting is very horrible for DBA's :)

TX_TASK_ID column is primary column in table TIMESHEET_UPDATE_HOURS
I am not really sure where the issue is ...

INSERT INTO yourtable T ( columns ...)
SELECT SQ FROM ( <YOUR SELECT FROM INSERT> ) SQ
ON DUPLICATE KEY UPDATE
  SET T.USER = SQ.USER
   ,T.DAYONE = SQ.DAYONE
  , etc ....
The sql you provided, is a select statement.

We expect the insert and update statements, as provided by the link Guy provided.

INSERT statement If record is a new record:
INSERT INTO table (a,b,c) VALUES(1,2,3);

UPDATE statement if a record with key a=1 exists:
UPDATE table SET c=c+1 WHERE a=1;

Merging Insert and Update, to insert the record if key a=1 does not exist, and updates c if exists.
INSERT INTO table (a,b,c) VALUES(1,2,3)
ON DUPLICATE KEY UPDATE SET c=c+1;

Open in new window

You haven't provided the right information for a merge which needs:
1. some inserts
2. an update query

You have only supplied select queries User generated image
Thanks to all for your suggestions/help.

I have created script myself as per my requirement. However, the attached select statement is part of my script and it was  taking very long time (10 min for 80 rows). Please suggest on which line code I have to modify to speed up the attached query.
Select-Query.txt
to speed up is a different question, actually.
in short, you need to use the explain plan for the query (only for the select part), and build up the needed indexes.
note that 80 rows "output" is not the relevant number, but the number of rows that are in the source tables, before filtering, that are relevant
I would say it is the multi-column use of IN()
and (task_id, ENTERED_BY_USER, DATE_ADD(DATE_WORKED, INTERVAL -(DAYOFWEEK(DATE_WORKED) - 2) DAY)) not in
(select TX_TASK_ID, USER, cast(start_date as char) from TIMESHEET_SUMMARY  where ENTERED_BY_USER not like 'eUS_N%')

Open in new window

but it is a complete guess without any ability to test it. I would suggest finding an alternative method to exclude data, perhaps using a join instead.
We have already created and tested given select script with indexes and we didn't get any improvements.

@Portletpaul : Could you please suggest alternative method for IN operator for above snippet
I already did - a join (a left join to TIMESHEET_SUMMARY, then exclude matches)

but I cannot attempt this tonight (it's late evening for me) and also would not try it without all table definitions and sample data.

I cannot be certain this is your performance issue - but it looks like it might be.
Thanks PortletPaul and find herewith attached table definitions.
Table-Defn.txt
and some data?
Here is one thing to try that doesn't alter your existing query too much, see line 12 below. In your original you reference `ENTERED_BY_USER` in that location which is making it into a correlated subquery as well as an IN() comparison.
        AND (
               task_id
             , ENTERED_BY_USER
             , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2) DAY)
             ) NOT IN (
                       SELECT
                              TX_TASK_ID
                            , USER
                            , cast(start_date AS CHAR)
                       FROM TIMESHEET_SUMMARY
          
                       WHERE USER NOT LIKE 'eUS_N%'
/* change in line above      ^^^^                           */    
                       )
       GROUP BY
              ENTERED_BY_USER
            , task_id
            , DATE_WORKED

Open in new window

Here is an approach using NOT EXISTS () instead of NOT IN ()
SELECT
       TX_TASK_ID
     , USER
     , SUM(DAYONE) DAYONE
     , SUM(DAYTWO) DAYTWO
     , SUM(DAYTHREE) DAYTHREE
     , SUM(DAYFOUR) DAYFOUR
     , SUM(DAYFIVE) DAYFIVE
     , SUM(DAYSIX) DAYSIX
     , SUM(DAYSEVEN) DAYSEVEN
     , NETWORK_CODE
     , ACTIVITY_CODE
     , ACTIVE
     , SUM(TOTAL)
     , START_DATE
     , START_DAY
     , END_DATE
FROM (
       SELECT
              task_id TX_TASK_ID
            , ENTERED_BY_USER USER
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 1 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYONE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 2 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYTWO
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 3 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYTHREE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 4 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYFOUR
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 5 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYFIVE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 6 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYSIX
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 0 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYSEVEN
            , NULL NETWORK_CODE
            , NULL ACTIVITY_CODE
            , 1 ACTIVE
            , sum(HOURS_WORKED) TOTAL
            , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2) DAY) AS START_DATE
            , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2 + 7) DAY) AS END_DATE
            , NULL START_DAY
       FROM TIMESHEET_UPDATE_HOURS tuh
       WHERE ACTUAL_DATE_ENTERED >= '2014-07-31'
       AND ENTERED_BY_USER NOT LIKE 'eUS_N%'
       AND NOT EXISTS (
                        SELECT NULL
                        FROM TIMESHEET_SUMMARY ts
                        where ts.TX_TASK_ID = tuh.task_id
                        AND ts.USER = tuh.ENTERED_BY_USER
                        AND cast(ts.start_date AS CHAR) = DATE_ADD(tuh.DATE_WORKED, INTERVAL - (DAYOFWEEK(tuh.DATE_WORKED) - 2) DAY)
                       )
  
       GROUP BY
              ENTERED_BY_USER
            , task_id
            , DATE_WORKED
     ) f
GROUP BY
     USER
   , TX_TASK_ID
   , NETWORK_CODE
   , ACTIVITY_CODE
   , ACTIVE
   , START_DATE
   , END_DATE
   , START_DAY
;

Open in new window

note you should propagate use of the alias tuh which is needed in the correlated subquery
I should point out that that this will work against performance here:

comparing

         cast(ts.start_date AS CHAR)
to
         DATE_ADD(tuh.DATE_WORKED, INTERVAL - (DAYOFWEEK(tuh.DATE_WORKED) - 2) DAY)

Is it absolutely necessary to  cast(ts.start_date AS CHAR)  ????

---

and don't get me started on storing dates as varchars!
never a good thing to do IMO
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Its working perfect...Thanks a lot for your help.
No problem, glad I could help. Cheers, Paul