java preparedstatement with in clause for lists - SQL server

Hi Experts,

I'm doing a SELECT with a fairly large IN clause.  I've seen comments about batching where people tell you to use something like batch sizes of 51, 11, 4, then 1.  I know you can't bind a '?' to a list of items, but do I need to create 4 preparedstatements and have the respective number of '?' in my list to properly take advantage of the speed up of a prepared statement, or is it enough to build the list myself without the use of '?'   ?

Thanks!
Mike
LVL 1
threadyAsked:
Who is Participating?
 
mccarlIT Business Systems Analyst / Software DeveloperCommented:
The issue is not so much around the fact that you IN clause would be large, or about the performance of any single SELECT query, it is about how many iterations of this query and how often it would be run, etc. It comes down to the time it takes SQL Server to "parse" the SQL query. When you send a SQL query string to the server to process, it needs to convert those characters in the string into an internal representation which is easier to use to fetch the data, etc. Also, the server contains a cache of this information which is keyed by the actual SQL string representation. So if you send a SQL string which exactly matches a previous one, it can reuse the internal representation and save time.

I think you probably understand the above in principle, but I state to highlight the first point I wanted to make. It is not the Java PreparedStatement that caches this information and provides the optimization, ie. you don't need to create those 4 PreparedStatement objects and always use those to get the optimization, you can create a new PreparedStatement and as long as it has the exact same SQL query string, it will still be optimzed. Now if you can, it can also help a little bit to reuse the PreparedStatements too but that is less critical.

do I need to create 4 preparedstatements and have the respective number of '?' in my list to properly take advantage of the speed up
So yes, but you don't have to have all 4, you can just create the appropriate statements but keeping those batch sizes.

For example, say on one query run you have 106 values to go in your IN clause, then you create a PreparedStatement with the 51 "?" placeholders and provide the first 51 values to the query. So you have 55 values remaining, so you can reuse the PreparedStatement and send the next 51 values. Now you have 4 values remaining, so you create a PreparedStatement with 4 "?" placeholders and send the last values in. Notice that you haven't needed to create the other 2 statements yet.

Much better to have 4 preparedstatements no?
So from the above, yes, if you can keep the 4 PreparedStatements around, that's great. Otherwise just make sure that if you have to create the PreparedStatements each time the query is run, that the SQL query string has the same text in it. The PreparedStatement creation is not a big operation but ensuring that the SQL query string is one that the server has in it's cache is the important part.


Also remember that the server SQL cache is limited and that is why you pick 4 batch sizes and use them. This is better than using a variable number of ? placeholders each time because the cache will fill up with an entry for each different number of ?'s. Then the worst of the options is to replace the ?'s with the actual values in the actual query string, as this pretty much ensures that there WON'T be an entry in the cache for that particular query, so the query needs to be parsed, and then that one off query will waste space in the cache as it's unlikely to be used again.


Hope all that makes some sense, feel free to ask more questions if it doesn't.
0
 
CEHJCommented:
Yes, afaik you can't get optimization with a dynamic list. It would have to be static. IOW, you might as well use an ordinary Statement with a format string
0
 
threadyAuthor Commented:
Much better to have 4 preparedstatements no?
0
Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
CEHJCommented:
Yes. afaik the contents of the list in each case would have to be literals. But i could be wrong ;)
0
 
CEHJCommented:
Thanks, mccarl for that explanation. I've found in the past that information on this issue is hard to come by. I'm sure you're correct but if you could cite some authoritative sources (say the docs of a major RDBMS) that would be great.
0
 
threadyAuthor Commented:
Awesome answer mccarl, thank you very much for that!  I too would love to have something to back up this stuff so I could prove my arguments to my boss!
0
 
mccarlIT Business Systems Analyst / Software DeveloperCommented:
You're welcome. I guess something like the following explains SQL Servers functionality (I'll admit that I am coming more from an Oracle perspective, but it appears to work in a similar fashion)

https://technet.microsoft.com/en-us/library/ms181055(v=sql.105).aspx
0
 
CEHJCommented:
Great, thanks. And great documentation too. Here's a little gotcha that it would be easy to fall foul of:

The algorithms to match new SQL statements to existing, unused execution plans in the cache require that all object references be fully qualified. For example, the first of these SELECT statements is not matched with an existing plan, and the second is matched:
0
 
threadyAuthor Commented:
Awesome help!  Thank you again!  Very much appreciated.

Mike
0
 
threadyAuthor Commented:
And CEHJ - good point there, but I can't wrap my head around the example they used... So I doubt I'm understanding what this caveat is...

SELECT * FROM Person;

SELECT * FROM Person.Person;

Would you mind explaining this ?

Thanks!
Mike
0
 
CEHJCommented:
Would you mind explaining this ?
They are simply saying that only the second statement will be cacheable as it's the only one fully-qualified in object naming terms. Let's hope that doesn't apply to other RDBMSs that i'm more likely to use ;) It's enough to make anyone paranoid enough to use fq-naming all the time
0
 
threadyAuthor Commented:
Ok I did understand correctly.  That's insane!

I can't believe they would transform a query before they store its hash in a map (or whatever they're doing)...
0
 
CEHJCommented:
Well i suppose the 'cache' could be global to the RDBMS, meaning that the first form could apply to several databases and hence would  not be usable as a key
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.