We help IT Professionals succeed at work.

MySQL query - Merge statment

sqldba2013
sqldba2013 asked
on
312 Views
Last Modified: 2014-08-26
Hi Experts,

I am new to MySQL scripting and I am facing error while merging Insert and Update query into single query.

Please suggest me how to merge attached query into single query with Merge option (Insert+Update).

Thanks in advance.
For-Insert.sql
For-Update.sql
Comment
Watch Question

Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Most Valuable Expert 2014
Top Expert 2009

Commented:
sqldba,
 
  MySQL has implement the "merge" feature as INSERT ... ON DUPLICATE KEY, see the technical reference here:
  http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
  it will require a primary key or unique index on the relevant columns

Author

Commented:
Thanks for your reply.

I have tried with above URL, still I am not able to write correct query and scripting is very horrible for DBA's :)

TX_TASK_ID column is primary column in table TIMESHEET_UPDATE_HOURS
Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Most Valuable Expert 2014
Top Expert 2009

Commented:
I am not really sure where the issue is ...

INSERT INTO yourtable T ( columns ...)
SELECT SQ FROM ( <YOUR SELECT FROM INSERT> ) SQ
ON DUPLICATE KEY UPDATE
  SET T.USER = SQ.USER
   ,T.DAYONE = SQ.DAYONE
  , etc ....
Hamed NasrRetired IT Professional
CERTIFIED EXPERT

Commented:
The sql you provided, is a select statement.

We expect the insert and update statements, as provided by the link Guy provided.

INSERT statement If record is a new record:
INSERT INTO table (a,b,c) VALUES(1,2,3);

UPDATE statement if a record with key a=1 exists:
UPDATE table SET c=c+1 WHERE a=1;

Merging Insert and Update, to insert the record if key a=1 does not exist, and updates c if exists.
INSERT INTO table (a,b,c) VALUES(1,2,3)
ON DUPLICATE KEY UPDATE SET c=c+1;

Open in new window

PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
You haven't provided the right information for a merge which needs:
1. some inserts
2. an update query

You have only supplied select queries for-update-for-insert.png

Author

Commented:
Thanks to all for your suggestions/help.

I have created script myself as per my requirement. However, the attached select statement is part of my script and it was  taking very long time (10 min for 80 rows). Please suggest on which line code I have to modify to speed up the attached query.
Select-Query.txt
Guy Hengel [angelIII / a3]Billing Engineer
CERTIFIED EXPERT
Most Valuable Expert 2014
Top Expert 2009

Commented:
to speed up is a different question, actually.
in short, you need to use the explain plan for the query (only for the select part), and build up the needed indexes.
note that 80 rows "output" is not the relevant number, but the number of rows that are in the source tables, before filtering, that are relevant
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
I would say it is the multi-column use of IN()
and (task_id, ENTERED_BY_USER, DATE_ADD(DATE_WORKED, INTERVAL -(DAYOFWEEK(DATE_WORKED) - 2) DAY)) not in
(select TX_TASK_ID, USER, cast(start_date as char) from TIMESHEET_SUMMARY  where ENTERED_BY_USER not like 'eUS_N%')

Open in new window

but it is a complete guess without any ability to test it. I would suggest finding an alternative method to exclude data, perhaps using a join instead.

Author

Commented:
We have already created and tested given select script with indexes and we didn't get any improvements.

@Portletpaul : Could you please suggest alternative method for IN operator for above snippet
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
I already did - a join (a left join to TIMESHEET_SUMMARY, then exclude matches)

but I cannot attempt this tonight (it's late evening for me) and also would not try it without all table definitions and sample data.

I cannot be certain this is your performance issue - but it looks like it might be.

Author

Commented:
Thanks PortletPaul and find herewith attached table definitions.
Table-Defn.txt
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
and some data?
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
Here is one thing to try that doesn't alter your existing query too much, see line 12 below. In your original you reference `ENTERED_BY_USER` in that location which is making it into a correlated subquery as well as an IN() comparison.
        AND (
               task_id
             , ENTERED_BY_USER
             , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2) DAY)
             ) NOT IN (
                       SELECT
                              TX_TASK_ID
                            , USER
                            , cast(start_date AS CHAR)
                       FROM TIMESHEET_SUMMARY
          
                       WHERE USER NOT LIKE 'eUS_N%'
/* change in line above      ^^^^                           */    
                       )
       GROUP BY
              ENTERED_BY_USER
            , task_id
            , DATE_WORKED

Open in new window

PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
Here is an approach using NOT EXISTS () instead of NOT IN ()
SELECT
       TX_TASK_ID
     , USER
     , SUM(DAYONE) DAYONE
     , SUM(DAYTWO) DAYTWO
     , SUM(DAYTHREE) DAYTHREE
     , SUM(DAYFOUR) DAYFOUR
     , SUM(DAYFIVE) DAYFIVE
     , SUM(DAYSIX) DAYSIX
     , SUM(DAYSEVEN) DAYSEVEN
     , NETWORK_CODE
     , ACTIVITY_CODE
     , ACTIVE
     , SUM(TOTAL)
     , START_DATE
     , START_DAY
     , END_DATE
FROM (
       SELECT
              task_id TX_TASK_ID
            , ENTERED_BY_USER USER
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 1 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYONE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 2 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYTWO
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 3 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYTHREE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 4 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYFOUR
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 5 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYFIVE
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 6 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYSIX
            , CASE WHEN (DAYOFWEEK(DATE_WORKED) - 1) = 0 THEN sum(HOURS_WORKED) ELSE 0.0 END DAYSEVEN
            , NULL NETWORK_CODE
            , NULL ACTIVITY_CODE
            , 1 ACTIVE
            , sum(HOURS_WORKED) TOTAL
            , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2) DAY) AS START_DATE
            , DATE_ADD(DATE_WORKED, INTERVAL - (DAYOFWEEK(DATE_WORKED) - 2 + 7) DAY) AS END_DATE
            , NULL START_DAY
       FROM TIMESHEET_UPDATE_HOURS tuh
       WHERE ACTUAL_DATE_ENTERED >= '2014-07-31'
       AND ENTERED_BY_USER NOT LIKE 'eUS_N%'
       AND NOT EXISTS (
                        SELECT NULL
                        FROM TIMESHEET_SUMMARY ts
                        where ts.TX_TASK_ID = tuh.task_id
                        AND ts.USER = tuh.ENTERED_BY_USER
                        AND cast(ts.start_date AS CHAR) = DATE_ADD(tuh.DATE_WORKED, INTERVAL - (DAYOFWEEK(tuh.DATE_WORKED) - 2) DAY)
                       )
  
       GROUP BY
              ENTERED_BY_USER
            , task_id
            , DATE_WORKED
     ) f
GROUP BY
     USER
   , TX_TASK_ID
   , NETWORK_CODE
   , ACTIVITY_CODE
   , ACTIVE
   , START_DATE
   , END_DATE
   , START_DAY
;

Open in new window

note you should propagate use of the alias tuh which is needed in the correlated subquery
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
I should point out that that this will work against performance here:

comparing

         cast(ts.start_date AS CHAR)
to
         DATE_ADD(tuh.DATE_WORKED, INTERVAL - (DAYOFWEEK(tuh.DATE_WORKED) - 2) DAY)

Is it absolutely necessary to  cast(ts.start_date AS CHAR)  ????

---

and don't get me started on storing dates as varchars!
never a good thing to do IMO
EE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Author

Commented:
Its working perfect...Thanks a lot for your help.
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
No problem, glad I could help. Cheers, Paul

Gain unlimited access to on-demand training courses with an Experts Exchange subscription.

Get Access
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Empower Your Career
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.