Avatar of Turkish27
Turkish27

asked on 

sub select in 'in clause'

Hi All,

I have just started using mySQL after a few years using MSSQL and I have hit a road block. For some reason the performance on a subselect within an 'in' clause in mySQL is abysmal.
I have a fairly simple query:

SELECT      SYMBOL, PRICE, SELLBROKER, TRADETIMESTAMP
FROM        dbname.TRADES
WHERE       SYMBOL IN ( SELECT      DISTINCT SYMBOL
                        FROM        dbname.TRADES
                        WHERE       PREVIOUSTICKDIRECTION < 1
                        GROUP BY    SYMBOL
                        HAVING      COUNT(SELLBROKER) > 3)
ORDER BY    SYMBOL, timestamp(TRADETIMESTAMP) ASC;

Open in new window


As it stands this takes 35 SEC (omg!) to run. If i run the subselect separately  it only takes .03s to run! If I grab the 200 records from it in a comma delimited list (i.e. 'AAA', 'BBB', 'CCC') and throw it in the IN clause manually the query runs super fast. So it is the subquery within the in clause itself that it does not like. MS SQL would handle this no problem. The only article on the net that I could actually find to explain this problem is:

http://spin.atomicobject.com/2011/03/25/mysql-in-query-performance/

I don't have the option (from what I know about the nature of this query) to do any type of join. I mean, it seems pretty straight forward, and I am surprised on the lack of documentation on this. Can anyone help please!!

The SYMBOL column is indexed as well. Originally I thought it was the problem but the fact that I can enter a comma delimited string explicitly and it runs fine shows that it isnt an index problem.

My table currently has 10k records in it but will have many more.

Thank you!
MySQL ServerSQL

Avatar of undefined
Last Comment
8080_Diver
SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of Turkish27
Turkish27

ASKER

Thanks for the quick reply matthew. This actually did work a lot faster. The query completed in 0.4 seconds. However this is a bit too slow still. If I manually enter in the comma delimited string, it comes back in 0.016 seconds.

I will be running all this in a stored procedure. So I DO have the option of firing the subselect query, parsing it out into a comma delimited list and putting it into the other query. However, this seems ridiculous given the fact of how mature of a product mySQL is. Do you have any other suggestions? (I will give you partial points when all is said and done, since yours indeed does work better)
ASKER CERTIFIED SOLUTION
Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
Avatar of Turkish27
Turkish27

ASKER

Thanks Diver.

This worked with a final time of .03ms which is very acceptable. There was just a syntax issue where you were missing the join column and second table alias:

SELECT  T.SYMBOL
       ,T.PRICE
       ,T.SELLBROKER
       ,T.TRADETIMESTAMP
FROM dbname.TRADES T
INNER JOIN
(
 SELECT DISTINCT SYMBOL
 FROM dbname.TRADES
 WHERE PREVIOUSTICKDIRECTION < 1
 GROUP BY SYMBOL
 HAVING COUNT(SELLBROKER) > 3
) T2 ON T.SYMBOL = T2.SYMBOL
ORDER BY T.SYMBOL
        ,TIMESTAMP(T.TRADETIMESTAMP) ASC;

Open in new window


Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

Glad I could help! ;-)

Sorry about the omissions . . . glad you could debug them (that's not always the case . . . which, IMHO, is a little frustrating for those providing the answers without anything to test against ;-).
SQL
SQL

SQL (Structured Query Language) is designed to be used in conjunction with relational database products as of a means of working with sets of data. SQL consists of data definition, data manipulation, and procedural elements. Its scope includes data insert, query, update and delete, schema creation and modification, and data access control.

61K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo