Strategy to solve a large table problem

plq (Asker) asked:
I have a table which is a many-to-many child of two tables in my data model. It is very large: 21 million rows occupying about 14GB, roughly 7GB of data and 7GB of indexes.

The table could be split into 14 tables representing the 14 types of data stored in it. They all have the same data model, but splitting this way would allow me to drop one int field and a lot of the indexing, and hence query plans would be similar across the resulting tables. If I did this split it would cost me about one week of work, and the biggest resulting table would have 8 million records. The indexing would be much smaller because each of the 14 resulting tables only needs a subset of the indexing currently used.

I've tried running it through the Database Engine Tuning Advisor. I ended up with 12GB of indexes, about 40 separate CREATE STATISTICS statements, and some queries were still slow.

So then I resorted to my own query optimisation work. Being an old hand at query optimisation, I can make any query run in a few seconds with the right index, but when I've done that across all queries I end up with 8GB of indexes, and SQL Server gets confused about which index to use and will often choose the wrong one. It even seems necessary to repeat the same indexes with a different column order - it's hard to understand why SQL Server can't adapt its query plans to use an existing index. To illustrate the duplication, the near-identical indexes end up looking roughly like this (table and column names invented for the example):
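-- Key order (CustomerId, TypeId): good for queries filtering on CustomerId first
CREATE NONCLUSTERED INDEX IX_Link_Customer_Type
    ON dbo.Link (CustomerId, TypeId) INCLUDE (ItemId);

-- Same columns, reversed key order, so a seek on TypeId alone is possible
CREATE NONCLUSTERED INDEX IX_Link_Type_Customer
    ON dbo.Link (TypeId, CustomerId) INCLUDE (ItemId);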

The queries are generated dynamically by the product, so I don't want to get into only allowing certain "ways in".

One thing I've not tried yet is indexed ("clustered") views, so I would be interested to know if anyone has had success with this. Something like this is what I have in mind - a rough sketch with invented names, since I haven't tried it yet:
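-- An indexed view: SCHEMABINDING is required, and the view is only
-- materialised once it gets a unique clustered index
CREATE VIEW dbo.vLinkTypeA
WITH SCHEMABINDING
AS
SELECT LinkId, ParentAId, ParentBId
FROM dbo.Link
WHERE TypeId = 1;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vLinkTypeA ON dbo.vLinkTypeA (LinkId);

(I gather that outside Enterprise edition the optimiser only uses an indexed view when the query references it WITH (NOEXPAND), which might matter for us.)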

Any general thoughts on other strategies, approaches, or experience in dealing with large tables like this would be appreciated.

thanks
Paul
jogos replied:

'queries are generated dynamically' - if there are lots of indexes available and the statistics change, there will always be a small difference that makes one index more suited than another at that particular time.

What about partitioning? http://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths/
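Roughly, partitioning the big table by its type column would look like this (a sketch with invented names; see the link for the restrictions):

-- One partition per type value: 13 boundaries give 14 partitions
CREATE PARTITION FUNCTION pfLinkType (int)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13);

CREATE PARTITION SCHEME psLinkType
AS PARTITION pfLinkType ALL TO ([PRIMARY]);

-- Building the clustered index on the scheme spreads the rows across partitions
CREATE CLUSTERED INDEX IX_Link ON dbo.Link (TypeId, LinkId)
ON psLinkType (TypeId);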
plq (Asker):
gosh I didn't know about that, but sadly the product runs on lots of servers using SQL Server Web edition - partitioning seems to be an Enterprise edition feature only. Still, I'll read up on it as it may generate further ideas -- thanks
plq (Asker):
regarding your first comment - I'm saying that I create an optimal index that SQL Server uses to get maybe a 0.01 second response, but with that index still in existence alongside 10 others, SQL Server will choose another one and take perhaps 2 minutes to run the query. When I force it to use the first index, it still takes 0.01 seconds, but going through all the dynamic SQL generation code to work out which index is best would be a nightmare (well, impossible). For clarity, by "force" I mean an index hint along these lines (names invented):
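SELECT l.LinkId, l.ItemId
FROM dbo.Link AS l WITH (INDEX (IX_Link_Customer_Type))
WHERE l.CustomerId = 42 AND l.TypeId = 3;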
ASKER CERTIFIED SOLUTION by Guy Hengel [angelIII / a3] (available to members only)
plq (Asker):
Wow, I actually thought when I started this a few days ago: why doesn't MS let us do CREATE INDEX .. WHERE fieldvalue = NN? That, potentially, is exactly what's needed. In other words, a filtered index - something like this sketch (invented names; I understand this needs SQL Server 2008 or later):
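-- One narrow index per type value instead of one wide index covering all 14
CREATE NONCLUSTERED INDEX IX_Link_Type3_Customer
    ON dbo.Link (CustomerId)
    INCLUDE (ItemId)
    WHERE TypeId = 3;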

Regarding column ordering, yes, I've had to create several indexes with the same columns in different orders to make SQL Server use them - a pain in the neck! I have to admit I'm not impressed at having to do this.

OK this moves the goalposts hugely... let me have a play with this idea before diving into splitting up the table.

SOLUTION (available to members only)
plq (Asker):
The SQL is constructed by a .NET 4 application and is executed through ADO.NET. It will be different depending on each user's permission level and the query being constructed at the time. So I cannot reliably predict it, except to say that the SQL will generally "go in" via 4 or 5 different key values in joins and WHEREs, in any combination.

Potentially there are thousands of different SQL statements that could be generated. I have built up a workload file.

To date we've just created indexes and left SQL Server to sort out its own stats. Obviously that's not great, but as I said in the question, even when we end up with 60 lots of CREATE STATISTICS statements, SQL Server is choosing the wrong indexes and taking ages to run simple queries that have indexes specifically designed for them (and if we force those indexes the query can still run in under 1 second). So the problem is the broad variety of queries that hit this table and SQL Server's failure to automatically choose the right indexes. I may end up forcing indexes as a solution, but at the moment I am staying with filtered indexing as a possible path to resolution.

Filtered indexing also promises much smaller index sizes, which will be another benefit - a smaller database.

So I see what you're saying and I'll certainly investigate creating stats, but for now I'll stay with filtered indexing. When I do get to the stats side, I understand it would look something like this (invented names; the filtered form, like filtered indexes, needs SQL Server 2008 or later):
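-- Plain multi-column statistics
CREATE STATISTICS st_Link_Customer_Type
    ON dbo.Link (CustomerId, TypeId)
    WITH FULLSCAN;

-- Filtered statistics, matching the slice of data a filtered index covers
CREATE STATISTICS st_Link_Type3_Customer
    ON dbo.Link (CustomerId)
    WHERE TypeId = 3
    WITH FULLSCAN;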

thanks

You wrote
"...sql is choosing the wrong indexes..."
But in most cases it is not a failure of SQL Server; it is the result of poor database maintenance.

I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs in less than a second and another, almost identical one, takes two minutes, the problem really isn't in SQL Server itself.

Forcing SQL Server to use the right index (with hints) isn't really a good solution...

Updating statistics and defragmenting indexes periodically IS A MUST; it can solve many query performance problems, and it should be the first thing you examine when an execution plan uses the wrong index. For example, the periodic maintenance can be as simple as this (table and index names are placeholders):
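-- Refresh statistics on the big table (or run sp_updatestats database-wide)
UPDATE STATISTICS dbo.Link WITH FULLSCAN;

-- Defragment: REORGANIZE for light fragmentation, REBUILD for heavy
ALTER INDEX ALL ON dbo.Link REORGANIZE;
-- ALTER INDEX ALL ON dbo.Link REBUILD;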
plq (Asker):
>>"I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs less then a second and other almost similar two minutes, the problem really isn't in the sql server"

Well, if that index is the ONLY index on the table, SQL Server uses it and the query takes 0.01 seconds.

Add 20 more indexes and SQL Server no longer uses the original; the query takes 20 seconds using another, wrong index instead of 0.01 seconds. Force the index and it's fast again.

Perhaps you're right that it needs CREATE STATISTICS, but I would have thought SQL Server could do better than that, and I've already tried creating stats without solving this problem.

So far I am having huge success with filtered indexing... working on it now...

plq (Asker):
Filtered indexing provided an excellent solution to the problem. The index size has gone down to about 3GB, and all queries now use an index with mostly subsecond responses - not bad for 14GB and 21 million rows!

I've asked another question here about create stats

https://www.experts-exchange.com/questions/27383064/Create-Statistics-how-to-decide-what-to-create.html?cid=239

thanks for helping
plq (Asker):
Also, I didn't need to split the tables up, which saved me a week of nasty programming and bad language! Thank you!!