How do I properly index a simple table for fast searches?

I have an audit table with the following schema:

CREATE TABLE [dbo].[T4E_Audit](
	[ActionDateTime] [datetime] NOT NULL
	, [ActionCode] [varchar](50) NOT NULL
	, [ActionText] [varchar](8000)
	, [ActionOldValue] [varchar] (8000)
	, [ActionNewValue] [varchar] (8000)
	
	, [ActorCN] [varchar] (500)
	, [ActorLBACid] [int]
	, [ActorWIUID] [bigint]
	, [ActorUsername] [varchar] (20)
	, [ActorDisplayName] [varchar] (50)
	
	, [SubjectCN] [varchar] (500) NOT NULL
	, [SubjectLBACid] [int]
	, [SubjectWIUID] [bigint] NOT NULL
	, [SubjectUsername] [varchar] (20) NOT NULL
	, [SubjectDisplayName] [varchar] (50) NOT NULL
)

Open in new window


I am running a lot of searches against this table, specifically:
- like '..%' compares against ActorWIUID,ActorUsername, and ActorDisplayName (same for Subject* fields)
- = compares against ActionCode
- date range compares against ActionDateTime

I figured if I add the following index (and rebuild it), everything would run faster, but it runs considerably slower.

CREATE NONCLUSTERED INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit] 
(
	[ActionCode] ASC,
	[ActionDateTime] ASC,
	[ActorWIUID] ASC,
	[ActorUsername] ASC,
	[ActorDisplayName] ASC,
	[SubjectWIUID] ASC,
	[SubjectUsername] ASC,
	[SubjectDisplayName] ASC
)

Open in new window


...What is the right way to go about this?
CleanAsisAsked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
LowfatspreadConnect With a Mentor Commented:
1)  it depends on how data is inserted into the table  
      lets leave it for the moment , as a clustering index will be more beneficial

2) no you aren't allowed to specify the "same" index more than once anyway...
    the preference would be to define it as the clustered index.

3) you defined it as non clustered?

4) -

------------------------

can you try this?

CREATE CLUSTERED INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit]
(
     
      [ActionDateTime] ASC,
       [ActionCode] ASC
)
GO
SELECT top 1000 [ActionDateTime]
                                ,[ActionCode]
                                ,[ActionText]
                                ,[ActionOldValue]
                                ,[ActionNewValue]
                                ,[ActorCN]
                                ,[ActorLBACid]
                                ,[ActorWIUID]
                                ,[ActorUsername]
                                ,[ActorDisplayName]
                                ,[SubjectCN]
                                ,[SubjectLBACid]
                                ,[SubjectWIUID]
                                ,[SubjectUsername]
                                ,[SubjectDisplayName]
                          FROM [dbo].[T4E_Audit]
where  [ActionDateTime]  between '20110901' and '20110914 23:59:59.997'
    and  [ActionCode] in ('Audit Search' ,'Front Page Redirect to Legacy'  )
   and (
               [ActorWIUID] between 1100 and 1199  -- its a number so don't do a character comparison
           or [ActorUsername] like '11%'
           or [ActorDisplayName] like '11%'
           or [SubjectWIUID] between 1100 AND 1199 -- ITS A NUMBER ...
           or [SubjectUsername] like '11%'
           or [SubjectDisplayName] like '11%')
 
ORDER BY [ActionDateTime] DESC
0
 
khairilCommented:
Hi,

You can create multiple indexes for each criteria you are searching, try avoid compound indexes.
0
 
CleanAsisAuthor Commented:
Can you give a best example based on the criteria I described please?
0
Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

 
Anthony PerkinsCommented:
If you are doing LIKE '%something'  no amount of indexes will help you, they will not be used.  Your query will still be a table scan.

Try posting your specific query and we can give you some suggestions.
0
 
khairilCommented:
Same like you did above, but just one field only.

acper said is true, if you using LIKE statement indexes will not help. You can analyze you query using by displaying estimted planned, http://www.slideshare.net/nspyrenet/sql-query-analyzer-maintenance. Over there then you can see what query does tables scan and what doing index scan.

Better yet is you can use database tuning advisor (DTA), this will require some effort on your side to profile your database and do the workload analyzing in DTA. DTA also will give you recommendation on how to tune your database. Hope this link will help you doing profiling dan running DTA, http://www.zimbio.com/SQL/articles/655/How+Tune+Database+Using+Database+Tuning+Advisor 
0
 
LowfatspreadCommented:
what is the tables primary key?
what is the clustering key?

why are you doing a like search against the UID columns they are bigints?

how big are the table rows usually?

how many rows on the table?

can you show us an example select ?

what cardinality do the columns have (how many unique values per column) ?

what proportion of the table do you expect the queries to return?
0
 
CleanAsisAuthor Commented:
Lowfatspread,

what is the tables primary key? doesn't have one

what is the clustering key? I'm not entirely sure what this means but I do not have any indexes on this table besides this one I'm trying to add

why are you doing a like search against the UID columns they are bigints? Because if someone's ID is "1234" I want to be able to find it by searching for "12*"

how big are the table rows usually? The fields I'm searching aren't big; see the schema I posted in the original email

how many rows on the table? Millions

can you show us an example select ? Here's one with all possible filters:
SELECT top 1000 [ActionDateTime]
					  ,[ActionCode]
					  ,[ActionText]
					  ,[ActionOldValue]
					  ,[ActionNewValue]
					  ,[ActorCN]
					  ,[ActorLBACid]
					  ,[ActorWIUID]
					  ,[ActorUsername]
					  ,[ActorDisplayName]
					  ,[SubjectCN]
					  ,[SubjectLBACid]
					  ,[SubjectWIUID]
					  ,[SubjectUsername]
					  ,[SubjectDisplayName]
				  FROM [dbo].[T4E_Audit] where  
([ActorWIUID] like '11%' or [ActorUsername] like '11%' or [ActorDisplayName] like '11%' or [SubjectWIUID] like '11%' or [SubjectUsername] like '11%' or [SubjectDisplayName] like '11%')
  and  [ActionDateTime] > CONVERT( DATETIME, '9/1/2011', 101 )  and  [ActionDateTime] < CONVERT( DATETIME, '9/14/2011', 101 ) 
 and  (  [ActionCode]='Audit Search' or [ActionCode]='Front Page Redirect to Legacy'  ) 
ORDER BY [ActionDateTime] DESC

Open in new window


what cardinality do the columns have (how many unique values per column) ? I'm not entirely sure what this means, but there will be a lot of repetition of data across these columns

what proportion of the table do you expect the queries to return? I'm not sure what you mean by proportion, but 1000 records...

Thank you!


0
 
LowfatspreadCommented:
ok a primary key would help
 is there an obvious set of columns on the table which (in combination) are unique?
 (or could you alter the table and add an identity column  and use that as the primary key?)

a clustering index would also help
  from the look of the table I would assume that  a clustering index of
     Actiondatetime asc,ActionCode asc,subjectWiuid asc
   would probably be beneficial

UID search then its better to do a range search e.g.  UID between 1200 and 1299 which is the correct datatype, and a much more limited range...

the search fields may not be big but you have 3 columns which potentially could be 8000 characters so the rows themselve are potentially large which means more IO and less speed.

Ol it looks like your query is trying to return all data from the row...
given the above  example query the processing probably has to search the whole table anyway (because the table doesn't have a Primary Key or a Clustering Index defined ... so its a Heap ... so data will potentially exist on every page in the table and once the query has to visit more than 15-25% of the tables pages it usually better to just read the whole table as large IO reads are then generally more possible to achieve)

your "candidate" index is probably better as

CREATE NONCLUSTERED(*) INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit]
(
      
      [ActionDateTime] ASC,
       [ActionCode] ASC,
       [SubjectWIUID] ASC,
      [SubjectUsername] ASC,

-- just the above 4 columns may be the "better" solution
      [SubjectDisplayName] ASC
      [ActorWIUID] ASC,
      [ActorUsername] ASC,
      [ActorDisplayName] ASC,
      
)

the inclusion of the nullable columns may not provide much benefit

This is a possible candidate for a CLUSTERED Index ... which would be benficial (appears to be) to your queries


>cardinality
 very important concept for database and index efficiency...
 How many distinct/Unique values does each column contain ...
   ie how many valid Action Codes are there? (presumably only a few < 50?) so cardinality is low-medium
       the ActionDateTime presumeably has a "high" cardinality since there are 365 days per year (even if you don't record      actual times) so each "unique" actiondate is about 0.3% of the Tables data? (divide that further by each year the table holds)

  when designing indexes you want the columns whith the highest cardinality at the start (left) of the index (subject to your queries actually specifying values for those columns)

>proportion
  ol you are limiting the queries output to 1000 rows but it still has to process all the rows matching the query selection in order to determine which 1000 to give to you ... its the number of rows it has to process that is important ... as that percentage of rows against the number of rows on the table determines how the database (generally) processes the query.


0
 
CleanAsisAuthor Commented:
Lowfatspread,

Thanks for all your insight.

1) Primary key - how would it help with my searches? There is no set of columns that would guarantee a unique value.

2) In regards to a clustered index - you're saying it could be beneficial, but in the query you're proposing you're using all the same columns within a non-clustered index.  Are you saying I should have both, clustered and non-clustered index on those columns?

3) The "candidate" index you provided - I set it on my table and clicked "rebuild" on it in MSSQL studio and I'm experiencing the same as I described with my original index - the searches actually get slower

4) You're right on your cardinality assumptions
0
 
Anthony PerkinsConnect With a Mentor Commented:
khairil,

>>acper said is true, if you using LIKE statement indexes will not help.<<
Actually that is not what I said.  Here it is again:
If you are doing LIKE '%something'  no amount of indexes will help you, they will not be used.
Since the author is using LIKE '11%', provided the column is character data, it can make use of an index.
0
 
khairilCommented:
Sorry for that acper, not reading you thoroughly.

Still depend on how you use LIKE operation. The LIKE operator will perform a full table scan if operand string begins with a wildcard character ("LIKE '%foo'"). It will use an index (if available) otherwise.
0
 
Anthony PerkinsCommented:
>>It will use an index (if available) otherwise. <<
Not to be picky, but the correct statement would be:
It may use an index (if available) otherwise.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.