mySQL - unions, subqueries and joins

tomaugerdotcom
tomaugerdotcom used Ask the Experts™
on
Hi again, experts!

I'm struggling with the syntax in the case of a union that must then join to another union.

First the mySQL that does not work:

select N.*, M.* from
(      
            select * from notification N where N.row_id = ?
      union
            select * from arc_notification N where N.row_id = ?

) as N

join
(
            select * from message M where M.row_id = N.message_id
      union
            select * from arc_message M where M.row_id = N.message_id

) M

Here the N.message_id in the where clause of the two last subqueries is an "unknown column". Clearly the alias of the previous union, while understood in the SELECT clause is not carried over into the next JOIN clause.

Before you suggest:
join ( select * from message M union select * from arc_message M ) as M on M.row_id=N.message_id
which does work, consider the nasty cross result that cannot take advantage of the pimrary key index on row_id. This is precisely what I'm trying to avoid.

If anyone is interested, here are some sample data that illustrate the point:
TABLE message
row_id | message
1 | Testing 1-2-3
2 | Hello, world
3 | Foo is bar!

TABLE arc_message
row_id | message
5 | What up doc
6 | Yo yo ma
7 | Undsoweiter

TABLE notification
row_id | message_id
1 | 1
2 | 5
3 | 7

TABLE arc_notification
row_id | message_id
6 | 2
7 | 3
8 | 6

So in the previous statement, if we substitute ? with an actual notification OR arc_notification row_id, we should get back the corresponding message from either the message OR the arc_message tables, depending on which one works...

Thanks in advance!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
SharathData Engineer

Commented:
Can you post the expected result?
Why not
 select N.*, M.* from message M JOIN Notification N ON M.row_id = N.message_id WHERE N.row_id=? UNION ALL select N.*, M.* from arc_message M JOIN arc_Notification N ON M.row_id = N.message_id WHERE N.row_id=?
 
 Of course this assumes the data in your current tables and archive tables correspond within themselves (message <-> notification, arc_message <-> arc_notification).

Author

Commented:
@ Sharath - I don't understand what you are asking me to post. The sample data is all there in my OP.

@ techhealth - thanks for the attempt, but the point is that they do NOT correspond. Hence the need for 2 separate UNIONS.

Thanks for your continued help!

T
Acronis in Gartner 2019 MQ for datacenter backup

It is an honor to be featured in Gartner 2019 Magic Quadrant for Datacenter Backup and Recovery Solutions. Gartner’s MQ sets a high standard and earning a place on their grid is a great affirmation that Acronis is delivering on our mission to protect all data, apps, and systems.

SharathData Engineer

Commented:
I asked you to post the expected result for the sample set posted.

Author

Commented:
given 1
1, 1, 1, Testing 1-2-3
2, 5, 5, What up doc
3, 7, 7, Undsoweiter
6, 2, 2, Hello World
7, 3, 3, Foo is bar
8, 6, 6, Yo yo ma
SharathData Engineer

Commented:
try this.
select t1.row_id,t1.message_id,t2.message 
  from (select row_id,message_id from notification union
        select row_id,message_id from arc_notification) as t1 
 inner join (select row_id,message from message union all
             select row_id,message from arc_message) as t2 
    on t1.message_id = t2.row_id

Open in new window

SharathData Engineer

Commented:
if you want all the columns, use *
select *
  from (select row_id,message_id from notification union
        select row_id,message_id from arc_notification) as t1 
 inner join (select row_id,message from message union all
             select row_id,message from arc_message) as t2 
    on t1.message_id = t2.row_id

Open in new window

Author

Commented:
@ Sharath 123 - thanks for the follow-up. Unfortunately, as I did mention in my original post, the cross-product that that solution generates is totally inefficient and is precisely what I want to avoid. See, what you're doing is selecting EVERY row from message and EVERY row from arc_message and unioning them (why union all instead of union, anyway?) and only THEN do you then select on row_id, which of course at that point is not using any indexing or optimization whatsoever.

What we need to do is be able to restrict the resultset from message and arc_message PRIOR to the union, to make use of the optimizer.

Thanks though. Any other ideas?

T
select N.*, M.* from message M JOIN Notification N ON M.row_id = N.message_id WHERE N.row_id=?
 UNION
 select N.*, M.* from arc_message M JOIN arc_Notification N ON M.row_id = N.message_id WHERE N.row_id=?
UNION
select N.*, M.* from message M JOIN arc_Notification N ON M.row_id = N.message_id WHERE N.row_id=?
UNION
select N.*, M.* from arc_message M JOIN Notification N ON M.row_id = N.message_id WHERE N.row_id=?

You are probably better off that way which will let all optimization to be available.  Syntax should work but not an expert on MySQL specifics...

Author

Commented:
Arrrgggggh. @ techhealth. Yeah, this is ugly and works but I was _really_ hoping to avoid it because my example SQL is a gross oversimplification of the issue.

I'll accept this if nothing else turns up because it _is_ a workable solution, but I would rather find something that allows me to do something closer to what I posted in the OP.

Can anyone advise me on why the second join in my original example (the union on table `message`) does not recognize the column N.`message_id` ? Please ?

Author

Commented:
Or rather, can someone suggest a way to make that aliasing work?
Try making a column alias for N.Message_id, then use it in your JOIN.  The table_name.column_name reference doesn't work for conditions after UNION.  I suspect that's why it throws "unknown column".  So

select N.*, M.* from
(      
            select *, N.message_id AS mID from notification N where N.row_id = ?
      union
            select *, N.message_id AS mID from arc_notification N where N.row_id = ?

) as N

join
(
            select * from message M where M.row_id = mID
      union
            select * from arc_message M where M.row_id = mID

) M



Author

Commented:
Well, that was certainly worth a try. Unfortunately it throws the same error.

Unknown column 'mID' in where clause.

Any other suggestions? That last one felt close...

T
SharathData Engineer

Commented:
>> @ Sharath 123 - thanks for the follow-up. Unfortunately, as I did mention in my original post, the cross-product that that solution generates is totally inefficient and is precisely what I want to avoid.

Did you check the execution plan of all the queries suggested?

Author

Commented:
I will do this.

Author

Commented:
@angeliii - thanks for reminding me about this open question. Unfortunately I am unable to award any points based on a correct answer because none were given. I _could_ assign points based on effort, but that's not really what EE is all about.

The execution plan would only give me a hint about what is slowing the query down, but would not reveal the answer I'm looking for. We can either delete this question, I can post my workaround and accept my own answer, or we can leave it open.

Or, we can get fresh eyes on the question - perhaps there is another Expert out there who has seen this precise problem before and knows the answer.

What do you recommend to be the best course of action?
Billing Engineer
Most Valuable Expert 2014
Top Expert 2009
Commented:
several things here, as an expert:
* UNION ALL does NOT do a implicit DISTINCT. UNION does a implicit DISTINCT.
  => this alone might explain some performance issues. I see you asked about why to use union all, but have not got a reply on that, visibly

*  Clearly the alias of the previous union, while understood in the SELECT clause is not carried over into the next JOIN clause.
this is normal. the subquery has another scope, and is "outside" the scope of knowing about the N alias.

* how to improve ?
well, it might depend on how much you simplified the situation, but I see 2 approaches:
 
#1: store  the results of N in a temp table, and then continue from there
#2: "duplicate" the N inside the M alias without using the temp table above


select N.*, M.* from
(      
            select * from notification N where N.row_id = ?
      union ALL
            select * from arc_notification N where N.row_id = ?

) as N

join
(
            select M.* from message M 
 JOIN (      
            select * from notification N where N.row_id = ?
      union
            select * from arc_notification N where N.row_id = ?

) as N ON M.row_id = N.message_id

      union ALL
            select M.* from arc_message M 
 JOIN (      
            select * from notification N where N.row_id = ?
      union
            select * from arc_notification N where N.row_id = ?

) as N ON M.row_id = N.message_id

) M

Open in new window

Author

Commented:
I apologize for leaving this question open so long.

Thanks to everyone for their efforts, and AngelIII for follow-up. I'm splitting the points because these answers are both correct. I had hoped there was some nuance of SQL that I had missed, but it looks like there's no "elegant" solution.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial