Strategy to solve a large table problem

I have a table which is a many-to-many child of two tables in my data model. It is very large: 21 million rows occupying about 14GB, roughly 7GB of index and 7GB of data.

The table could be split into 14 tables representing the 14 types of data stored in it. They all have the same data model, but splitting this way would let me drop one int field and a lot of the indexing, and hence query plans would be similar. The split would cost me about one week of work, and the biggest resulting table would have 8 million records. The indexing would be much smaller because each of the 14 resulting tables only needs a subset of the indexing currently used.

I've tried running it through the Database Engine Tuning Advisor. I ended up with 12GB of indexes, about 40 separate CREATE STATISTICS statements, and some queries were still slow.

So then I resorted to my own query optimisation work. Being an old hand at query optimisation, I can make any query run in a few seconds with the right index, but when I've done that across all queries I end up with 8GB of indexes, and SQL Server gets confused about which index to use and will often choose the wrong one. It even seems necessary to repeat the same indexes with a different column order; it's hard to understand why SQL Server can't adapt its query plans to use an existing index.

The queries are generated dynamically by the product, so I don't want to get into only allowing certain "ways in".

One thing I've not tried yet is indexed views, so I would be interested to know if anyone has had success with these.

Any general thoughts on other strategies, approaches, or experience in dealing with large tables like this would be appreciated.

Guy Hengel [angelIII / a3], Billing Engineer, commented:
well, indexing can be hard sometimes :)

now, let's see: do you know about filtered indexes? These let you create smaller indexes based on the "type of data" you spoke of.
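As a sketch of the idea (the table and column names here are hypothetical, standing in for the link table and its type-discriminator int column): a filtered index only contains rows matching its WHERE predicate, so each of the 14 data types can get its own small index with exactly the key columns that type's queries need.

```sql
-- Hypothetical names: ChildLink is the large many-to-many table,
-- DataTypeID is the int column distinguishing the 14 types of data.
-- This index only contains the rows for type 3, so it is a fraction
-- of the size of an equivalent unfiltered index.
CREATE NONCLUSTERED INDEX IX_ChildLink_Type3
ON dbo.ChildLink (ParentAID, ParentBID)
WHERE DataTypeID = 3;
```

For the optimiser to use it, the query's WHERE clause must match the filter predicate (e.g. `WHERE DataTypeID = 3 AND ParentAID = ...`).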

apart from that, I presume you know that the order of the columns in an index is important, while the order of the columns mentioned in the WHERE / JOIN clauses is not.
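To illustrate why column order matters (again with hypothetical names): an index can only seek on a predicate that constrains its leading column(s), so two indexes on the same columns in different orders serve different query shapes.

```sql
-- (ParentAID, DataTypeID) supports a seek for:
--   WHERE ParentAID = 42
--   WHERE ParentAID = 42 AND DataTypeID = 3
-- but NOT for:
--   WHERE DataTypeID = 3          -- DataTypeID is not the leading column,
--                                 -- so this would need a scan
CREATE NONCLUSTERED INDEX IX_ChildLink_A_Type
ON dbo.ChildLink (ParentAID, DataTypeID);
```

This is why queries that "go in" via different key columns can each demand a differently ordered index.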
>> "queries are generated dynamically" — with lots of indexes available and changing statistics, there will always be a small difference that makes one index more suited than another at any given time.

What about partitioning?
plq (Author) commented:
Gosh, I didn't know about that, but sadly the product runs on lots of servers using SQL Server Web Edition, and partitioning seems to be an Enterprise Edition feature only. Still, I'll read up on it as it may generate further ideas. Thanks!
plq (Author) commented:
Regarding your first comment: I'm saying that I create an optimal index that SQL Server uses to get maybe a 0.01-second response, but with that index still in existence alongside 10 others, SQL Server will choose another one and take perhaps 2 minutes to run the query. When I force it to use the first index, it still takes 0.01 seconds, but going through all the dynamic SQL generation code to work out which index is best would be a nightmare (well, impossible).
plq (Author) commented:
Wow, I actually thought when I started this a few days ago: why doesn't MS let us do CREATE INDEX ... WHERE fieldvalue = NN? That, potentially, is exactly what's needed.

Regarding column ordering: yes, I've had to create several indexes with the same columns in different orders to make SQL Server use them. A pain in the neck! I have to admit I'm not impressed with having to do this.

OK this moves the goalposts hugely... let me have a play with this idea before diving into splitting up the table.


Are the table statistics up to date?
In most cases, the reason the query optimiser chooses the wrong index is that the statistics are not up to date.

How do you update statistics?
When SQL Server automatically creates statistics for a large table, it does not examine every row; the statistics are built from only a sample of the table, and the same applies to automatic updates.
This can be another reason why the query plan is suboptimal.
Try UPDATE STATISTICS table_name WITH FULLSCAN (be careful on a production DB: it locks the table and can take some time),
then execute the slow query again and see if it helps
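Before running a full scan, it can be worth checking how stale the statistics actually are. A small sketch, assuming a hypothetical table name `dbo.ChildLink`:

```sql
-- When were the statistics on this table last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.ChildLink');

-- Full-scan update: every row is examined, so the histogram is exact.
-- On a 21-million-row table this can take a while and lock the table.
UPDATE STATISTICS dbo.ChildLink WITH FULLSCAN;
```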

>> "queries are generated dynamically"
How do you execute them?
EXEC sp_executesql @SQL is better for execution plan reuse.
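For dynamically built SQL, passing the variable parts as parameters (rather than concatenating literal values into the string) lets the cached plan be reused across calls. A minimal sketch with hypothetical table and column names:

```sql
-- Parameterised dynamic SQL: the plan is cached against the statement
-- text, so repeated calls with different values reuse a single plan
-- instead of compiling a new one per literal value.
DECLARE @sql nvarchar(max) =
    N'SELECT ParentAID, ParentBID
      FROM dbo.ChildLink
      WHERE ParentAID = @a AND DataTypeID = @t;';

EXEC sp_executesql @sql,
     N'@a int, @t int',
     @a = 42, @t = 3;
```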

Can you send us the execution plan and statistics for the query?

plq (Author) commented:
The SQL is constructed by a .NET 4 application and is executed through […]. It will be different depending on each user's permission levels and the query being constructed at the time. So I cannot reliably predict it, except to say that the SQL will generally "go in" via 4 or 5 different key values in joins and wheres, in any combination.

Potentially there are thousands of different SQL statements that could be generated. I have built up a workload file.

To date we've just created indexes and left SQL Server to sort out its own stats. Obviously that's not great, but as I said in the question, even when we end up with 60 CREATE STATISTICS statements, SQL Server is choosing the wrong indexes and taking ages to run simple queries that have indexes specifically designed for them (and if we force those indexes, the queries still run in under a second). So the problem is the broad variety of queries that hit this table and SQL Server's failure to automatically choose the right indexes. I may end up forcing indexes as a solution, but for now I am staying with filtered indexing as a possible path to resolution.

Filtered indexing also promises much smaller index sizes, which will be another benefit: a smaller database.

So I see what you're saying, and I'll certainly investigate creating stats, but for now I'll stay with filtered indexing.


You wrote:
>> "...sql is choosing the wrong indexes..."
But in most cases that is not a failure of SQL Server; it is the result of poor database maintenance.

I agree with the filtered indexes, partitioning, covering indexes, etc., but in this case, when one query runs in less than a second and another, almost identical one, takes two minutes, the problem really isn't in SQL Server.

Forcing SQL Server to use the right index (with hints) isn't really a good solution...

Updating statistics and defragmenting indexes periodically IS A MUST; it can solve many query performance problems, and it should be the first thing you examine when an execution plan uses the wrong index.
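A typical periodic maintenance job, sketched here with a hypothetical table name and the commonly cited fragmentation thresholds (which are rules of thumb, not hard limits):

```sql
-- Reorganise lightly fragmented indexes (roughly 5-30% fragmentation):
-- online, low-impact, compacts pages in place.
ALTER INDEX ALL ON dbo.ChildLink REORGANIZE;

-- Rebuild heavily fragmented indexes (roughly >30%):
-- rebuilds from scratch and refreshes the index's statistics as a side effect.
-- ALTER INDEX ALL ON dbo.ChildLink REBUILD;

-- Refresh all statistics on the table with a full scan.
UPDATE STATISTICS dbo.ChildLink WITH FULLSCAN;
```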
plq (Author) commented:
>> "I agree with the filtered indexes, partitioning, covering indexes, etc., but in this case, when one query runs in less than a second and another, almost identical one, takes two minutes, the problem really isn't in SQL Server"

Well, if that index is the ONLY index on the table, SQL Server uses it and the query takes 0.01 seconds.

Add 20 more indexes and SQL Server no longer uses the original; it picks another, wrong index instead and the query takes 20 seconds rather than 0.01. Force the index and it's fast again.
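For reference, forcing a specific index is done with a table hint. A sketch with hypothetical names; the trade-off is that the query breaks outright if the named index is ever dropped or renamed, which is why hints are usually a last resort:

```sql
-- Force the known-good index via a table hint.
SELECT ParentAID, ParentBID
FROM dbo.ChildLink WITH (INDEX (IX_ChildLink_A_Type))
WHERE ParentAID = 42 AND DataTypeID = 3;
```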

Perhaps you're right that it needs CREATE STATISTICS, but I would have thought SQL Server would do better than that, and I've already tried creating stats without solving this problem.

So far, I am having huge success with filtered indexing... working on it now...

plq (Author) commented:
Filtered indexing provided an excellent solution to the problem. The index size has gone down to about 3GB, all queries now use an index, and responses are mostly sub-second. Not bad for 14GB and 21 million rows!

I've asked another question here about CREATE STATISTICS.

Thanks for helping.
plq (Author) commented:
Also, I didn't need to split the tables up, which saved me a week of nasty programming and bad language! Thank you!!
Question has a verified solution.
