Solved

How do I properly index a simple table for fast searches?

Posted on 2011-09-10
12
286 Views
Last Modified: 2012-05-12
I have an audit table with the following schema:

CREATE TABLE [dbo].[T4E_Audit](
	[ActionDateTime] [datetime] NOT NULL
	, [ActionCode] [varchar](50) NOT NULL
	, [ActionText] [varchar](8000)
	, [ActionOldValue] [varchar] (8000)
	, [ActionNewValue] [varchar] (8000)
	
	, [ActorCN] [varchar] (500)
	, [ActorLBACid] [int]
	, [ActorWIUID] [bigint]
	, [ActorUsername] [varchar] (20)
	, [ActorDisplayName] [varchar] (50)
	
	, [SubjectCN] [varchar] (500) NOT NULL
	, [SubjectLBACid] [int]
	, [SubjectWIUID] [bigint] NOT NULL
	, [SubjectUsername] [varchar] (20) NOT NULL
	, [SubjectDisplayName] [varchar] (50) NOT NULL
)

Open in new window


I am running a lot of searches against this table, specifically:
- like '..%' compares against ActorWIUID,ActorUsername, and ActorDisplayName (same for Subject* fields)
- = compares against ActionCode
- date range compares against ActionDateTime

I figured if I add the following index (and rebuild it), everything would run faster, but it runs considerably slower.

CREATE NONCLUSTERED INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit] 
(
	[ActionCode] ASC,
	[ActionDateTime] ASC,
	[ActorWIUID] ASC,
	[ActorUsername] ASC,
	[ActorDisplayName] ASC,
	[SubjectWIUID] ASC,
	[SubjectUsername] ASC,
	[SubjectDisplayName] ASC
)

Open in new window


...What is the right way to go about this?
0
Comment
Question by:CleanAsis
  • 3
  • 3
  • 3
  • +1
12 Comments
 
LVL 13

Expert Comment

by:khairil
ID: 36517816
Hi,

You can create multiple indexes for each criteria you are searching, try avoid compound indexes.
0
 

Author Comment

by:CleanAsis
ID: 36517825
Can you give a best example based on the criteria I described please?
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 36517872
If you are doing LIKE '%something'  no amount of indexes will help you, they will not be used.  Your query will still be a table scan.

Try posting your specific query and we can give you some suggestions.
0
 
LVL 13

Expert Comment

by:khairil
ID: 36518264
Same like you did above, but just one field only.

acper said is true, if you using LIKE statement indexes will not help. You can analyze you query using by displaying estimted planned, http://www.slideshare.net/nspyrenet/sql-query-analyzer-maintenance. Over there then you can see what query does tables scan and what doing index scan.

Better yet is you can use database tuning advisor (DTA), this will require some effort on your side to profile your database and do the workload analyzing in DTA. DTA also will give you recommendation on how to tune your database. Hope this link will help you doing profiling dan running DTA, http://www.zimbio.com/SQL/articles/655/How+Tune+Database+Using+Database+Tuning+Advisor
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 36519146
what is the tables primary key?
what is the clustering key?

why are you doing a like search against the UID columns they are bigints?

how big are the table rows usually?

how many rows on the table?

can you show us an example select ?

what cardinality do the columns have (how many unique values per column) ?

what proportion of the table do you expect the queries to return?
0
 

Author Comment

by:CleanAsis
ID: 36519612
Lowfatspread,

what is the tables primary key? doesn't have one

what is the clustering key? I'm not entirely sure what this means but I do not have any indexes on this table besides this one I'm trying to add

why are you doing a like search against the UID columns they are bigints? Because if someone's ID is "1234" I want to be able to find it by searching for "12*"

how big are the table rows usually? The fields I'm searching aren't big; see the schema I posted in the original email

how many rows on the table? Millions

can you show us an example select ? Here's one with all possible filters:
SELECT top 1000 [ActionDateTime]
					  ,[ActionCode]
					  ,[ActionText]
					  ,[ActionOldValue]
					  ,[ActionNewValue]
					  ,[ActorCN]
					  ,[ActorLBACid]
					  ,[ActorWIUID]
					  ,[ActorUsername]
					  ,[ActorDisplayName]
					  ,[SubjectCN]
					  ,[SubjectLBACid]
					  ,[SubjectWIUID]
					  ,[SubjectUsername]
					  ,[SubjectDisplayName]
				  FROM [dbo].[T4E_Audit] where  
([ActorWIUID] like '11%' or [ActorUsername] like '11%' or [ActorDisplayName] like '11%' or [SubjectWIUID] like '11%' or [SubjectUsername] like '11%' or [SubjectDisplayName] like '11%')
  and  [ActionDateTime] > CONVERT( DATETIME, '9/1/2011', 101 )  and  [ActionDateTime] < CONVERT( DATETIME, '9/14/2011', 101 ) 
 and  (  [ActionCode]='Audit Search' or [ActionCode]='Front Page Redirect to Legacy'  ) 
ORDER BY [ActionDateTime] DESC

Open in new window


what cardinality do the columns have (how many unique values per column) ? I'm not entirely sure what this means, but there will be a lot of repetition of data across these columns

what proportion of the table do you expect the queries to return? I'm not sure what you mean by proportion, but 1000 records...

Thank you!


0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 36519709
ok a primary key would help
 is there an obvious set of columns on the table which (in combination) are unique?
 (or could you alter the table and add an identity column  and use that as the primary key?)

a clustering index would also help
  from the look of the table I would assume that  a clustering index of
     Actiondatetime asc,ActionCode asc,subjectWiuid asc
   would probably be beneficial

UID search then its better to do a range search e.g.  UID between 1200 and 1299 which is the correct datatype, and a much more limited range...

the search fields may not be big but you have 3 columns which potentially could be 8000 characters so the rows themselve are potentially large which means more IO and less speed.

Ol it looks like your query is trying to return all data from the row...
given the above  example query the processing probably has to search the whole table anyway (because the table doesn't have a Primary Key or a Clustering Index defined ... so its a Heap ... so data will potentially exist on every page in the table and once the query has to visit more than 15-25% of the tables pages it usually better to just read the whole table as large IO reads are then generally more possible to achieve)

your "candidate" index is probably better as

CREATE NONCLUSTERED(*) INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit]
(
      
      [ActionDateTime] ASC,
       [ActionCode] ASC,
       [SubjectWIUID] ASC,
      [SubjectUsername] ASC,

-- just the above 4 columns may be the "better" solution
      [SubjectDisplayName] ASC
      [ActorWIUID] ASC,
      [ActorUsername] ASC,
      [ActorDisplayName] ASC,
      
)

the inclusion of the nullable columns may not provide much benefit

This is a possible candidate for a CLUSTERED Index ... which would be benficial (appears to be) to your queries


>cardinality
 very important concept for database and index efficiency...
 How many distinct/Unique values does each column contain ...
   ie how many valid Action Codes are there? (presumably only a few < 50?) so cardinality is low-medium
       the ActionDateTime presumeably has a "high" cardinality since there are 365 days per year (even if you don't record      actual times) so each "unique" actiondate is about 0.3% of the Tables data? (divide that further by each year the table holds)

  when designing indexes you want the columns whith the highest cardinality at the start (left) of the index (subject to your queries actually specifying values for those columns)

>proportion
  ol you are limiting the queries output to 1000 rows but it still has to process all the rows matching the query selection in order to determine which 1000 to give to you ... its the number of rows it has to process that is important ... as that percentage of rows against the number of rows on the table determines how the database (generally) processes the query.


0
 

Author Comment

by:CleanAsis
ID: 36519793
Lowfatspread,

Thanks for all your insight.

1) Primary key - how would it help with my searches? There is no set of columns that would guarantee a unique value.

2) In regards to a clustered index - you're saying it could be beneficial, but in the query you're proposing you're using all the same columns within a non-clustered index.  Are you saying I should have both, clustered and non-clustered index on those columns?

3) The "candidate" index you provided - I set it on my table and clicked "rebuild" on it in MSSQL studio and I'm experiencing the same as I described with my original index - the searches actually get slower

4) You're right on your cardinality assumptions
0
 
LVL 75

Assisted Solution

by:Anthony Perkins
Anthony Perkins earned 100 total points
ID: 36519846
khairil,

>>acper said is true, if you using LIKE statement indexes will not help.<<
Actually that is not what I said.  Here it is again:
If you are doing LIKE '%something'  no amount of indexes will help you, they will not be used.
Since the author is using LIKE '11%', provided the column is character data, it can make use of an index.
0
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 400 total points
ID: 36519946
1)  it depends on how data is inserted into the table  
      lets leave it for the moment , as a clustering index will be more beneficial

2) no you aren't allowed to specify the "same" index more than once anyway...
    the preference would be to define it as the clustered index.

3) you defined it as non clustered?

4) -

------------------------

can you try this?

CREATE CLUSTERED INDEX [IX_T4E_Audit] ON [dbo].[T4E_Audit]
(
     
      [ActionDateTime] ASC,
       [ActionCode] ASC
)
GO
SELECT top 1000 [ActionDateTime]
                                ,[ActionCode]
                                ,[ActionText]
                                ,[ActionOldValue]
                                ,[ActionNewValue]
                                ,[ActorCN]
                                ,[ActorLBACid]
                                ,[ActorWIUID]
                                ,[ActorUsername]
                                ,[ActorDisplayName]
                                ,[SubjectCN]
                                ,[SubjectLBACid]
                                ,[SubjectWIUID]
                                ,[SubjectUsername]
                                ,[SubjectDisplayName]
                          FROM [dbo].[T4E_Audit]
where  [ActionDateTime]  between '20110901' and '20110914 23:59:59.997'
    and  [ActionCode] in ('Audit Search' ,'Front Page Redirect to Legacy'  )
   and (
               [ActorWIUID] between 1100 and 1199  -- its a number so don't do a character comparison
           or [ActorUsername] like '11%'
           or [ActorDisplayName] like '11%'
           or [SubjectWIUID] between 1100 AND 1199 -- ITS A NUMBER ...
           or [SubjectUsername] like '11%'
           or [SubjectDisplayName] like '11%')
 
ORDER BY [ActionDateTime] DESC
0
 
LVL 13

Expert Comment

by:khairil
ID: 36520827
Sorry for that acper, not reading you thoroughly.

Still depend on how you use LIKE operation. The LIKE operator will perform a full table scan if operand string begins with a wildcard character ("LIKE '%foo'"). It will use an index (if available) otherwise.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 36524418
>>It will use an index (if available) otherwise. <<
Not to be picky, but the correct statement would be:
It may use an index (if available) otherwise.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
SQL 2012 Syntax Error 5 25
Contained Database Collations 6 20
Delete Trigger in SQL Server2008R2 5 13
select over clause 1 10
I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now