Solved

Strategy to solve a large table problem

Posted on 2011-10-05
Medium Priority
236 Views
Last Modified: 2012-05-12
I have a table which is a many-to-many child of two tables in my data model. It is very large: 21 million rows occupying about 14GB, roughly 7GB of index and 7GB of data.

The table could be split into 14 tables representing the 14 types of data stored in it. They all share the same data model, but splitting this way would allow me to drop one int field and a lot of the indexing, and hence query plans would be simpler. The split would cost me about one week of work, and the biggest resulting table would have 8 million records. The indexing would be much smaller because each of the 14 resulting tables only needs a subset of the indexing currently used.

I've tried running it through the Database Engine Tuning Advisor. I ended up with 12GB of indexes and about 40 separate CREATE STATISTICS statements, and some queries were still slow.

So then I resorted to my own query optimisation work. Being an old hand at query optimisation, I can make any query run in a few seconds with the right index, but once I've done that across all queries I end up with 8GB of indexes, and SQL Server gets confused about which index to use and will often choose the wrong one. It even seems necessary to repeat the same indexes with a different column order - hard to understand why SQL Server can't adapt its query plans to use an existing index.

The queries are generated dynamically by the product, so I don't want to get into only allowing certain "ways in".

One thing I've not tried yet is indexed views (views with a clustered index), so I would be interested to know if anyone has had success with those.

Any general thoughts on other strategies, approaches, or experience dealing with large-table problems like this would be appreciated.

thanks
Paul
Question by:plq
11 Comments
 
LVL 25

Expert Comment

by:jogos
ID: 36915917
'queries are generated dynamically' - if there are lots of indexes available and the statistics change, there will always be a small difference that makes one index more suited than another at that moment.

What about partitioning? http://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths/
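To make the suggestion concrete, here is a minimal partitioning sketch: partition the big child table on its type column so each of the 14 types lands in its own partition. All table, column and object names below are hypothetical, since the real schema isn't shown.

```sql
-- Sketch only: partition the many-to-many child table by its type column.
-- pfDataType creates 14 partitions from 13 boundary values (RANGE LEFT).
CREATE PARTITION FUNCTION pfDataType (int)
    AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13);

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME psDataType
    AS PARTITION pfDataType ALL TO ([PRIMARY]);

-- Rebuild the clustered index on the partition scheme; the partitioning
-- column (DataType) leads the key so each type is physically separated.
CREATE CLUSTERED INDEX cix_ChildTable
    ON dbo.ChildTable (DataType, ChildId)
    ON psDataType (DataType);
```

As noted below in the thread, table partitioning was an Enterprise Edition feature in SQL Server 2008/2008 R2, so this may not be available on Web Edition.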
 
LVL 8

Author Comment

by:plq
ID: 36915956
Gosh, I didn't know about that, but sadly the product runs on lots of servers using SQL Server Web Edition - partitioning seems to be an Enterprise Edition feature only. Still, I'll read up on it as it may generate further ideas -- thanks
 
LVL 8

Author Comment

by:plq
ID: 36915971
Regarding your first comment - I'm saying that I create an optimal index that SQL Server uses to get maybe a 0.01-second response, but with that index still in existence alongside 10 others, SQL Server will choose another one and take perhaps 2 minutes to run the query. When I force it to use the first index, it still takes 0.01 seconds, but going through all the dynamic SQL generation code to work out which index is best would be a nightmare (well, impossible).

 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 1800 total points
ID: 36916496
Well, indexing can be hard sometimes :)

Now, let's see: do you know about filtered indexes?
http://technet.microsoft.com/en-us/library/cc280372.aspx

These let you create smaller indexes based on the "type of data" you spoke of.
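For example, instead of one wide index covering all 14 types, you could create one small index per type. A minimal sketch, with hypothetical table and column names standing in for the real schema:

```sql
-- Hypothetical names: one narrow filtered index per "type of data".
-- Only rows where DataType = 3 are stored in this index, so it is a
-- fraction of the size of an unfiltered index on the same columns.
CREATE NONCLUSTERED INDEX ix_Child_Type3
    ON dbo.ChildTable (ParentAId, ParentBId)
    INCLUDE (SomeValue)
    WHERE DataType = 3;
```

For the optimiser to pick a filtered index, the query's WHERE clause must match the filter predicate (e.g. a literal `DataType = 3`), which is worth bearing in mind with parameterised dynamic SQL.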

Apart from that, I presume you know that the order of the columns in the index is important, while the order of the columns mentioned in the WHERE / JOIN clauses is not.
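To illustrate why the column order matters (names again hypothetical): an index supports efficient seeks only on a leading prefix of its key, which is why the thread mentions creating the same columns in both orders.

```sql
-- ix_Child_A_B supports seeks on (ParentAId) or (ParentAId, ParentBId),
-- but NOT an efficient seek on ParentBId alone - hence the reversed copy.
CREATE INDEX ix_Child_A_B ON dbo.ChildTable (ParentAId, ParentBId);
CREATE INDEX ix_Child_B_A ON dbo.ChildTable (ParentBId, ParentAId);
```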
 
LVL 8

Author Comment

by:plq
ID: 36916548
Wow, I actually thought when I started this a few days ago.. why doesn't MS let us do CREATE INDEX .. WHERE fieldvalue = NN? That, potentially, is exactly what's needed.

Regarding column ordering, yes, I've had to create several indexes with the same columns in different orders to make SQL Server use them - a pain in the neck! I have to admit I'm not impressed at having to do this.

OK, this moves the goalposts hugely... let me have a play with this idea before diving into splitting up the table.

 
LVL 3

Assisted Solution

by:jvejskrab
jvejskrab earned 200 total points
ID: 36916839

Are the table statistics up to date?
In most cases, the reason the query optimiser chooses the wrong index is that the statistics are not up to date.

How do you update statistics?
When SQL Server automatically creates statistics for a large table, it does not examine every row; the statistics are built from a sample of the table, and automatic updates work the same way.
This can be another reason the query plan is suboptimal.
Try UPDATE STATISTICS table_name WITH FULLSCAN (be careful on the production DB - table locking - it can take some time),
then execute the slow query again and see if it helps.
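A quick way to check how stale the statistics are before forcing a full scan (table name hypothetical):

```sql
-- When was each statistics object on the table last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.ChildTable');

-- Refresh with a full scan rather than the default sample.
UPDATE STATISTICS dbo.ChildTable WITH FULLSCAN;
```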

'queries are generated dynamically'
How do you execute them?
EXEC(@SQL)
or
EXEC sp_executesql @SQL (better for execution plan reuse)
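The difference matters for plan caching: with sp_executesql and parameters, one plan is cached per statement shape rather than per literal value. A minimal sketch with illustrative names:

```sql
-- Parameterised dynamic SQL: the cached plan is reused across
-- different @type / @pid values, unlike EXEC(@SQL) with inlined literals.
DECLARE @sql nvarchar(max) =
    N'SELECT * FROM dbo.ChildTable
      WHERE DataType = @type AND ParentAId = @pid';

EXEC sp_executesql @sql,
     N'@type int, @pid int',
     @type = 3, @pid = 12345;
```

Note that aggressive plan reuse can also work against you ("parameter sniffing") when the same plan is reused for very different data distributions.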

Can you send us the execution plan and statistics for the query?

 
LVL 8

Author Comment

by:plq
ID: 36916921
The SQL is constructed by a .NET 4 application and executed through ADO.NET. It will differ depending on each user's permission level and the query being constructed at the time. So I cannot reliably predict it, except to say that the SQL will generally "go in" via 4 or 5 different key values in joins and wheres, in any combination.

Potentially there are thousands of different SQL statements that could be generated. I have built up a workfile.

To date we've just created indexes and left SQL Server to sort out its own stats. Obviously that's not great, but as I said in the question, even when we end up with 60 lots of CREATE STATISTICS statements, SQL Server chooses the wrong indexes and takes ages to run simple queries that have indexes specifically designed for them (and if we force those indexes the query still runs in under 1 second). So the problem is the broad variety of queries that hit this table and SQL Server's failure to automatically choose the right indexes. I may end up forcing indexes as a solution, but at the moment I am sticking with filtered indexing as a possible path to resolution.

Filtered indexing also promises much smaller index sizes, which will be another benefit - a smaller database.

So I see what you're saying and I'll certainly investigate creating stats, but for now I'll stay with filtered indexing.

thanks
 
LVL 3

Expert Comment

by:jvejskrab
ID: 36917318

You wrote:
......sql is choosing the wrong indexes.....
But in most cases that is not a failure of SQL Server - it's a failure of poor database maintenance.

I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs in less than a second and another, almost identical, takes two minutes, the problem really isn't in SQL Server.

Forcing SQL Server to use the right index (with hints) isn't really a good solution...

Updating statistics and defragmenting indexes periodically IS A MUST; it can solve many query performance problems, and it should be the first thing you examine when an execution plan uses the wrong index.
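A minimal periodic-maintenance sketch along those lines (table name and fragmentation thresholds are illustrative, not prescriptive):

```sql
-- Reorganise lightly fragmented indexes (roughly 5-30% fragmentation);
-- REORGANIZE is online and can be interrupted safely.
ALTER INDEX ALL ON dbo.ChildTable REORGANIZE;

-- For heavy fragmentation (roughly >30%), rebuild instead.
-- N.B. ONLINE rebuilds are an Enterprise Edition feature.
-- ALTER INDEX ALL ON dbo.ChildTable REBUILD;

-- Refresh statistics with a full scan after the index work.
UPDATE STATISTICS dbo.ChildTable WITH FULLSCAN;
```

Note that a REBUILD already updates the rebuilt index's statistics with a full scan; the explicit UPDATE STATISTICS mainly matters after REORGANIZE and for column statistics.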
 
LVL 8

Author Comment

by:plq
ID: 36917419
>>"I agree with the filtered indexes, partitioning, covering indexes etc., but in this case, when one query runs in less than a second and another, almost identical, takes two minutes, the problem really isn't in SQL Server"

Well, if that index is the ONLY index on the table, SQL Server uses it and the query takes 0.01 seconds.

Add 20 more indexes and SQL Server no longer uses the original; the query takes 20 seconds using another, wrong, index instead of 0.01 seconds. Force the index and it's fast again.
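For reference, "forcing the index" here means a table hint like the following (index and column names hypothetical):

```sql
-- Force the optimiser to use a specific index via a table hint.
-- If the named index is dropped, the query fails, which is one reason
-- hints are usually a last resort rather than a general fix.
SELECT c.ChildId, c.SomeValue
FROM dbo.ChildTable AS c WITH (INDEX (ix_Child_Type3))
WHERE c.DataType = 3 AND c.ParentAId = 12345;
```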

Perhaps you're right that it needs CREATE STATISTICS, but I would have thought SQL Server would do better than that, and I've already tried creating stats without solving this problem.

So far I am having huge success with filtered indexing.. working on it now...

 
LVL 8

Author Closing Comment

by:plq
ID: 36923264
Filtered indexing provided an excellent solution to the problem. The index size has gone down to about 3GB, all queries now use an index, and responses are mostly sub-second - not bad for 14GB and 21 million rows!

I've asked another question here about CREATE STATISTICS:

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_27383064.html?cid=239

thanks for helping
 
LVL 8

Author Comment

by:plq
ID: 36923267
Also, I didn't need to split the tables up, which saved me a week of nasty programming and bad language! Thank you!!