plq asked:
Strategy to solve a large table problem
I have a table which is a many to many child of two tables in my data model. It is very large, 21 million rows and occupies about 14GB. 7GB is index, 7GB is data roughly.
The table could be split into 14 tables representing the 14 types of data stored in it. They all have the same data model, but splitting this way would let me drop one int field and a lot of the indexing, and hence query plans would be similar. If I did this split it would cost me about one week of work, and the biggest resulting table would have 8 million records. The indexing would be much smaller because each of the 14 resulting tables only needs a subset of the indexing currently used.
I've tried running it through the database tuning wizard. I ended up with 12GB of indexes, about 40 separate Create Stats statements, and some queries were still slow.
So then I resorted to my own query optimisation work. Being an old hand at query optimisation I can make any query run in a few seconds with the right index, but when I've done that across all queries, I end up with 8GB of indexes and SQL gets confused about which indexes to use and will often choose the wrong one. It even seems necessary to repeat the same indexes with a different column order - hard to understand why SQL can't adapt its query plans to use an existing index.
The queries are generated dynamically by the product, so I don't want to get into only allowing certain "ways in".
One thing I've not tried yet is clustered (indexed) views, so I would be interested to know if anyone has had success with this.
Any general thoughts on other strategies, approaches, or experience in dealing with large tables like this would be appreciated
thanks
Paul
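For readers unfamiliar with the "clustered views" mentioned above: in SQL Server these are indexed views, a view materialised on disk by a unique clustered index. A minimal sketch, with table and column names assumed purely for illustration:

```sql
-- Hypothetical indexed ("clustered") view over one slice of the
-- many-to-many child table; all object names here are assumptions.
CREATE VIEW dbo.vChildType7
WITH SCHEMABINDING
AS
SELECT ParentAId, ParentBId, DataValue
FROM dbo.ChildTable
WHERE DataTypeId = 7;
GO
-- The unique clustered index is what materialises the view on disk.
CREATE UNIQUE CLUSTERED INDEX IX_vChildType7
ON dbo.vChildType7 (ParentAId, ParentBId);
```

Note that schema binding and two-part object names are required, and on non-Enterprise editions queries must reference the view directly (optionally with the NOEXPAND hint) for the index to be used.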
ASKER
gosh I didn't know about that, but sadly the product runs on lots of servers using SQL Server Web Edition - partitioning seems to be an Enterprise Edition feature only. Still, I'll read up on it as it may generate further ideas -- thanks
ASKER
regarding your first comment - I'm saying that I create an optimal index that SQL uses to get maybe a 0.01 second response, but with that index still in existence alongside 10 others, SQL will choose another one and take perhaps 2 minutes to run the query. When I force it to use the first index, it still takes 0.01 seconds, but going through all the dynamic SQL generation code to work out which index is best would be a nightmare (well, impossible).
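The "force it to use the first index" above refers to a table hint. A sketch of the syntax, with index, table, and column names assumed for illustration:

```sql
-- Forcing a specific index via a table hint; all names are illustrative.
SELECT c.ParentAId, c.DataValue
FROM dbo.ChildTable AS c WITH (INDEX (IX_ChildTable_ParentA))
WHERE c.ParentAId = @ParentAId
  AND c.DataTypeId = 7;
```

The downside, as noted, is that the hint must be baked into each generated statement, which is impractical when the SQL is built dynamically.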
ASKER CERTIFIED SOLUTION
ASKER
Wow, I actually thought when I started this a few days ago: why doesn't MS let us do CREATE INDEX ... WHERE fieldvalue = NN? That, potentially, is exactly what's needed.
Regarding column ordering, yes, I've had to create several indexes with the same columns in different orders to make SQL use them - a pain in the neck! I have to admit I'm not impressed with having to do this.
OK, this moves the goalposts hugely... let me have a play with this idea before diving into splitting up the table.
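The feature being described is a filtered index, available from SQL Server 2008 onward, and it is literally the CREATE INDEX ... WHERE syntax wished for above. A sketch, one index per type value, with names assumed for illustration:

```sql
-- Filtered index: only rows with DataTypeId = 7 are indexed, so the
-- index is small and the optimiser matches it to queries carrying
-- that predicate. Object names are illustrative assumptions.
CREATE NONCLUSTERED INDEX IX_ChildTable_Type7_ParentA
ON dbo.ChildTable (ParentAId)
INCLUDE (DataValue)
WHERE DataTypeId = 7;
```

In effect this gives the benefits of the proposed 14-way table split (per-type indexes, smaller index footprint) without touching the schema or the application.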
SOLUTION
ASKER
The SQL is constructed by a .NET 4 application and is executed through ADO.NET. It will be different depending on each user's permission level and the query being constructed at the time. So I cannot reliably predict it, except to say that the SQL will generally "go in" via 4 or 5 different key values in joins and wheres, in any combination.
Potentially there are thousands of different SQL statements that could be generated. I have built up a workfile.
To date we've just created indexes and left SQL to sort out its own stats. Obviously that's not great, but as I said in the question, even when we end up with 60 lots of CREATE STATISTICS statements, SQL is choosing the wrong indexes and taking ages to run simple queries that have indexes specifically designed for them (and if we force those indexes the query can still run sub-second). So the problem is the broad variety of queries that hit this table and SQL's failure to automatically choose the right indexes. I may end up forcing indexes as a solution, but at the moment I am staying with filtered indexing as a possible path to resolution.
Filtered indexing also promises much smaller index sizes, which will be another benefit - smaller database size.
So I see what you're saying and I'll certainly investigate creating stats, but for now I'll stay with filtered indexing
thanks
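For reference, the CREATE STATISTICS statements discussed here look like the sketch below; multi-column statistics help the optimiser estimate row counts for the column combinations the dynamic SQL actually filters on. Column names are assumed for illustration:

```sql
-- Manual multi-column statistics on a combination the generated
-- queries commonly filter by; names are illustrative assumptions.
CREATE STATISTICS st_ChildTable_TypeParentA
ON dbo.ChildTable (DataTypeId, ParentAId)
WITH FULLSCAN;
```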
You wrote
......sql is choosing the wrong indexes.....
But in most cases it is not a failure of SQL Server, but a failure of poor database maintenance.
I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs in less than a second and another, almost similar, takes two minutes, the problem really isn't in SQL Server.
Forcing SQL Server to use the right index (with hints) isn't really a good solution...
Updating statistics and defragmenting indexes periodically IS A MUST; it can solve many problems with query performance and it should be the first thing you examine when an execution plan uses the wrong index.
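A minimal sketch of the maintenance being recommended, with the table name assumed for illustration:

```sql
-- Refresh all statistics on the table with a full scan (more accurate
-- than the default sampled refresh, at the cost of more I/O).
UPDATE STATISTICS dbo.ChildTable WITH FULLSCAN;

-- Reorganise the table's indexes; for heavy fragmentation
-- (commonly > ~30%), ALTER INDEX ... REBUILD is preferred instead.
ALTER INDEX ALL ON dbo.ChildTable REORGANIZE;
```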
ASKER
>>"I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs in less than a second and another, almost similar, takes two minutes, the problem really isn't in SQL Server"
Well, if that index is the ONLY index on the table, SQL uses it and the query takes 0.01 seconds.
Add 20 more indexes and SQL doesn't use the original; instead it picks another, wrong index and takes 20 seconds instead of 0.01 seconds. Force the index and it's fast again.
Perhaps you're right that it needs CREATE STATISTICS, but I would have thought SQL would do better than that, and I've already tried creating stats without solving this problem.
So far, I am having huge success with filtered indexing... working on it now...
ASKER
Filtered indexing provided an excellent solution to the problem. The index size has gone down to about 3GB and I have all queries using an index, with mostly sub-second responses - not bad for 14GB and 21 million rows!
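For anyone wanting to verify index sizes the same way, a standard query over the system views (the table name here is an assumption):

```sql
-- Report per-index size in MB for one table; pages are 8KB each.
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 / 1024 AS size_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.ChildTable')
GROUP BY i.name
ORDER BY size_mb DESC;
```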
I've asked another question here about create stats
https://www.experts-exchange.com/questions/27383064/Create-Statistics-how-to-decide-what-to-create.html?cid=239
thanks for helping
ASKER
also, I didn't need to split the tables up, which saved me a week of nasty programming and bad language! thank you!!
What about partitioning? http://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths/
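For completeness, partitioning the table on the type column would look roughly like the sketch below (an Enterprise Edition feature at the time of this thread, which is why the asker ruled it out). Names and boundary values are illustrative assumptions:

```sql
-- 13 boundary values produce 14 partitions, one per data type.
CREATE PARTITION FUNCTION pfDataType (int)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13);

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME psDataType
AS PARTITION pfDataType ALL TO ([PRIMARY]);

-- The table (and its indexes) would then be created on the scheme:
-- CREATE TABLE dbo.ChildTable (...) ON psDataType (DataTypeId);
```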