java preparedstatement with in clause for lists - SQL server

Hi Experts,

I'm doing a SELECT with a fairly large IN clause.  I've seen comments about batching where people tell you to use something like batch sizes of 51, 11, 4, then 1.  I know you can't bind a '?' to a list of items, but do I need to create 4 preparedstatements and have the respective number of '?' in my list to properly take advantage of the speed up of a prepared statement, or is it enough to build the list myself without the use of '?'   ?

Thanks!
Mike
LVL 1
threadyAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

CEHJCommented:
Yes, afaik you can't get optimization with a dynamic list. It would have to be static. IOW, you might as well use an ordinary Statement with a format string
0
threadyAuthor Commented:
Much better to have 4 preparedstatements no?
0
CEHJCommented:
Yes. afaik the contents of the list in each case would have to be literals. But i could be wrong ;)
0
Acronis True Image 2019 just released!

Create a reliable backup. Make sure you always have dependable copies of your data so you can restore your entire system or individual files.

mccarlIT Business Systems Analyst / Software DeveloperCommented:
The issue is not so much around the fact that you IN clause would be large, or about the performance of any single SELECT query, it is about how many iterations of this query and how often it would be run, etc. It comes down to the time it takes SQL Server to "parse" the SQL query. When you send a SQL query string to the server to process, it needs to convert those characters in the string into an internal representation which is easier to use to fetch the data, etc. Also, the server contains a cache of this information which is keyed by the actual SQL string representation. So if you send a SQL string which exactly matches a previous one, it can reuse the internal representation and save time.

I think you probably understand the above in principle, but I state to highlight the first point I wanted to make. It is not the Java PreparedStatement that caches this information and provides the optimization, ie. you don't need to create those 4 PreparedStatement objects and always use those to get the optimization, you can create a new PreparedStatement and as long as it has the exact same SQL query string, it will still be optimzed. Now if you can, it can also help a little bit to reuse the PreparedStatements too but that is less critical.

do I need to create 4 preparedstatements and have the respective number of '?' in my list to properly take advantage of the speed up
So yes, but you don't have to have all 4, you can just create the appropriate statements but keeping those batch sizes.

For example, say on one query run you have 106 values to go in your IN clause, then you create a PreparedStatement with the 51 "?" placeholders and provide the first 51 values to the query. So you have 55 values remaining, so you can reuse the PreparedStatement and send the next 51 values. Now you have 4 values remaining, so you create a PreparedStatement with 4 "?" placeholders and send the last values in. Notice that you haven't needed to create the other 2 statements yet.

Much better to have 4 preparedstatements no?
So from the above, yes, if you can keep the 4 PreparedStatements around, that's great. Otherwise just make sure that if you have to create the PreparedStatements each time the query is run, that the SQL query string has the same text in it. The PreparedStatement creation is not a big operation but ensuring that the SQL query string is one that the server has in it's cache is the important part.


Also remember that the server SQL cache is limited and that is why you pick 4 batch sizes and use them. This is better than using a variable number of ? placeholders each time because the cache will fill up with an entry for each different number of ?'s. Then the worst of the options is to replace the ?'s with the actual values in the actual query string, as this pretty much ensures that there WON'T be an entry in the cache for that particular query, so the query needs to be parsed, and then that one off query will waste space in the cache as it's unlikely to be used again.


Hope all that makes some sense, feel free to ask more questions if it doesn't.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
CEHJCommented:
Thanks, mccarl for that explanation. I've found in the past that information on this issue is hard to come by. I'm sure you're correct but if you could cite some authoritative sources (say the docs of a major RDBMS) that would be great.
0
threadyAuthor Commented:
Awesome answer mccarl, thank you very much for that!  I too would love to have something to back up this stuff so I could prove my arguments to my boss!
0
mccarlIT Business Systems Analyst / Software DeveloperCommented:
You're welcome. I guess something like the following explains SQL Servers functionality (I'll admit that I am coming more from an Oracle perspective, but it appears to work in a similar fashion)

https://technet.microsoft.com/en-us/library/ms181055(v=sql.105).aspx
0
CEHJCommented:
Great, thanks. And great documentation too. Here's a little gotcha that it would be easy to fall foul of:

The algorithms to match new SQL statements to existing, unused execution plans in the cache require that all object references be fully qualified. For example, the first of these SELECT statements is not matched with an existing plan, and the second is matched:
0
threadyAuthor Commented:
Awesome help!  Thank you again!  Very much appreciated.

Mike
0
threadyAuthor Commented:
And CEHJ - good point there, but I can't wrap my head around the example they used... So I doubt I'm understanding what this caveat is...

SELECT * FROM Person;

SELECT * FROM Person.Person;

Would you mind explaining this ?

Thanks!
Mike
0
CEHJCommented:
Would you mind explaining this ?
They are simply saying that only the second statement will be cacheable as it's the only one fully-qualified in object naming terms. Let's hope that doesn't apply to other RDBMSs that i'm more likely to use ;) It's enough to make anyone paranoid enough to use fq-naming all the time
0
threadyAuthor Commented:
Ok I did understand correctly.  That's insane!

I can't believe they would transform a query before they store its hash in a map (or whatever they're doing)...
0
CEHJCommented:
Well i suppose the 'cache' could be global to the RDBMS, meaning that the first form could apply to several databases and hence would  not be usable as a key
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.