sub select in 'in clause'

Turkish27
Turkish27 used Ask the Experts™
on
Hi All,

I have just started using mySQL after a few years using MSSQL and I have hit a road block. For some reason the performance on a subselect within an 'in' clause in mySQL is abysmal.
I have a fairly simple query:

SELECT      SYMBOL, PRICE, SELLBROKER, TRADETIMESTAMP
FROM        dbname.TRADES
WHERE       SYMBOL IN ( SELECT      DISTINCT SYMBOL
                        FROM        dbname.TRADES
                        WHERE       PREVIOUSTICKDIRECTION < 1
                        GROUP BY    SYMBOL
                        HAVING      COUNT(SELLBROKER) > 3)
ORDER BY    SYMBOL, timestamp(TRADETIMESTAMP) ASC;

Open in new window


As it stands this takes 35 SEC (omg!) to run. If i run the subselect separately  it only takes .03s to run! If I grab the 200 records from it in a comma delimited list (i.e. 'AAA', 'BBB', 'CCC') and throw it in the IN clause manually the query runs super fast. So it is the subquery within the in clause itself that it does not like. MS SQL would handle this no problem. The only article on the net that I could actually find to explain this problem is:

http://spin.atomicobject.com/2011/03/25/mysql-in-query-performance/

I don't have the option (from what I know about the nature of this query) to do any type of join. I mean, it seems pretty straight forward, and I am surprised on the lack of documentation on this. Can anyone help please!!

The SYMBOL column is indexed as well. Originally I thought it was the problem but the fact that I can enter a comma delimited string explicitly and it runs fine shows that it isnt an index problem.

My table currently has 10k records in it but will have many more.

Thank you!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2010
Commented:
Try this:


SELECT      t.SYMBOL, t.PRICE, t.SELLBROKER, t.TRADETIMESTAMP
FROM        dbname.TRADES t
WHERE       EXISTS ( SELECT      t2.SYMBOL
                        FROM        dbname.TRADES t2
                        WHERE       t2.SYMBOL = t.SYMBOL AND t2.PREVIOUSTICKDIRECTION < 1
                        GROUP BY    t2.SYMBOL
                        HAVING      COUNT(t2.SELLBROKER) > 3)
ORDER BY    t.SYMBOL, timestamp(t.TRADETIMESTAMP) ASC;

Open in new window

Author

Commented:
Thanks for the quick reply matthew. This actually did work a lot faster. The query completed in 0.4 seconds. However this is a bit too slow still. If I manually enter in the comma delimited string, it comes back in 0.016 seconds.

I will be running all this in a stored procedure. So I DO have the option of firing the subselect query, parsing it out into a comma delimited list and putting it into the other query. However, this seems ridiculous given the fact of how mature of a product mySQL is. Do you have any other suggestions? (I will give you partial points when all is said and done, since yours indeed does work better)
The reason a manually entered comma delimited string is faster is that, because you are referencing the outer table in your subselect, the subselect's query is being executed once for every row that the main select finds.  In other words, if there are 50,000 rows in your TRADES table and you only want 1200 of them, you are still executing your subselect 50,000 times.

Try the following:
SELECT  T.SYMBOL
       ,T.PRICE
       ,T.SELLBROKER
       ,T.TRADETIMESTAMP
FROM dbname.TRADES T
INNER JOIN
(
 SELECT DISTINCT SYMBOL
 FROM dbname.TRADES
 WHERE PREVIOUSTICKDIRECTION < 1
 GROUP BY SYMBOL
 HAVING COUNT(SELLBROKER) > 3
)
ORDER BY T.SYMBOL
        ,TIMESTAMP(T.TRADETIMESTAMP) ASC;

Open in new window

Author

Commented:
Thanks Diver.

This worked with a final time of .03ms which is very acceptable. There was just a syntax issue where you were missing the join column and second table alias:

SELECT  T.SYMBOL
       ,T.PRICE
       ,T.SELLBROKER
       ,T.TRADETIMESTAMP
FROM dbname.TRADES T
INNER JOIN
(
 SELECT DISTINCT SYMBOL
 FROM dbname.TRADES
 WHERE PREVIOUSTICKDIRECTION < 1
 GROUP BY SYMBOL
 HAVING COUNT(SELLBROKER) > 3
) T2 ON T.SYMBOL = T2.SYMBOL
ORDER BY T.SYMBOL
        ,TIMESTAMP(T.TRADETIMESTAMP) ASC;

Open in new window


Glad I could help! ;-)

Sorry about the omissions . . . glad you could debug them (that's not always the case . . . which, IMHO, is a little frustrating for those providing the answers without anything to test against ;-).

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial