Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

SQL 2008 Indexing issues on a large table with many searchable fields

Posted on 2013-11-22
Last Modified: 2013-12-06
I am working on a project where there is a very large table (28,827,861 rows w/ 32,160.711 MB data space)

90% of the 132 fields in this table are queriable (unfortunately, please do not ask why).  I spend last 3 days trying to get the most out of indexing.  I realize I can't capture verey scenario, but if I can get most I will be happy.  I am not sure how to go about asking for help here as I do not want to put the layout of my table.  And I do not want to rename 132 fields to display here.  But basically I been using non-clustered indexing.  Clustered did not help and takes forever to index.  If I create an optimal index per specific query it's great instant run.  However I can't do that for every query.  I tried creating a non-clustered index where I placed 16 (max allowed) fields to ON clause, and added about 100 fields to INCLUDE clause of the indexing clause.  This took about an hour to create index.  and helped some queries, but not all my test case queries.  In any case, I am wondering from our DB guru's is this an impossible task, or what is recommended for large table with many selectable fields?  If you can respond with plain easy to understand explanations I would appreciate it, or I would be spending a day deciphering your answers as I am not a DB expert, just sufficient knowledge to get by.  Thanks in advance for your help.
Question by:CPOINT2000
LVL 39

Expert Comment

ID: 39670310
In my opinion the best is to have the "key" of the row in an index - no more than 2/3 columns and unfortunately in SQL (unlike ORACLE) case the order of them matters and INCLUDE all rest of the columns. Only the first 1 and 2 (maybe 3rd) columns makes the difference in an index and if you don't have a clustered index on the table this may work for you.

I am faced with similar issue on the SQL backend hit by LLBGEN generated code where clients can search on any column in a particular set of tables (products, orders) and is impossible as you said to cover all possible scenarios. What helped me tremendously and I warmly recommend it is for you to use SQL Own Performance Dashboard reports and that one has a Missing Index report sortable by different criteria and you can quickly see what indexes are missing that can give you the biggest performance improvement.  

LVL 50

Expert Comment

ID: 39672698
to assist we have to understand more of the background to the table.

how is it updated

 frequency of inserts/updates/deletes

what do you mean by queriable?

are the columns code values ... and queries just look for individual values,or ranges of values, or lists of values, or searched using LIKE or via functions  (e.g... substring(x,1,10)='xxx'

in most cases it is preferable to have a clustering index defined on a table

what types of queries are being run...

return a few rows, return many rows,  
return counts, sums, averages , across sets of data...

how many simultaneous users are there...

what is the server configuration

is the table partitioned?
are the columns sparse ?
what in general are the data types for the columns?

have you considered indexed views?

Author Comment

ID: 39672761
updated in batch
weekly updates
majority of the fields can be selected and be in where clause
some fields are range checks, some use wildcard, some exact match, some with "IN" clause
adhoc queries run, can be any combination
returns few to many.  when return over 500 lines, extraction comes back as zipped csv file
currently few simultaneous users 2-5, but later 25-50
Windows server 2008 R2 standard, 12GB ram, 64bit OS, intel Xeon CPU X560 @ 2.67GHz (2 processors)
Table is not partitioned.  I tried but did not get good results and due to time sensitivity did not have excessive time to test it thoroughly
Data types are mixed string,numeric,date.  mostly string and numeric, some of numeric are large format currency
LVL 69

Accepted Solution

Scott Pletcher earned 500 total points
ID: 39675342
You must have a clustered index; SQL handles heaps poorly, so they can perform poorly no matter what you do to them.  The best clustered index is the critical part of tuning queries anyway.  Otherwise, as you've discovered, you must have covering indexes for virtually every query, and that's almost never practical.

Please post the results of these queries, including headings, exactly as SQL returns them:

DECLARE @table_name sysname
SET @table_name = '<very_large_table_name_goes_here>' --<<-- chg to your table name

--missing indexes, according to SQL
        GETDATE() AS capture_date,
        DB_NAME(mid.database_id) AS Db_Name,
        OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
        mid.equality_columns, mid.inequality_columns, mid.included_columns,
        mid.statement, mid.object_id, mid.index_handle
    FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
    LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
        mig.index_handle = mid.index_handle
    LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
        migs.group_handle = mig.index_group_handle
    --order by
        --DB_NAME, Table_Name, equality_columns
        OBJECT_NAME(mid.object_id) = @table_name

-- existing index usage
    ius2.row_num, DB_NAME() AS db_name, OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
    FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
    (SELECT DATEDIFF(DAY, create_date, GETDATE()) FROM sys.databases WHERE name = 'tempdb') AS sql_up_days,
    fk.Reference_Count AS fk_ref_count,
    ius.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
    ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
    ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
    ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
    ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
    o.object_id = i.object_id
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
    dps.object_id = i.object_id AND
    dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
    ius.database_id = DB_ID() AND
    ius.object_id = i.object_id AND
    ius.index_id = i.index_id
        database_id, object_id, MAX(user_scans) AS user_scans,
        ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
    FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
        database_id = DB_ID()
        --AND index_id > 0
        database_id, object_id
) AS ius2 ON
    ius2.database_id = DB_ID() AND
    ius2.object_id = i.object_id
        referenced_object_id, COUNT(*) AS Reference_Count
    FROM sys.foreign_keys
        is_disabled = 0
) AS fk ON
    fk.referenced_object_id = i.object_id
    OBJECT_NAME(i.object_id) = @table_name

Author Comment

ID: 39675358
Thanks I will check these out and let you know

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question