Avatar of bfuchs
bfuchs

asked on

Finding missing indexes & adding them to SQL.

Hi Experts,
I found a script that lists all missing indexes on a database, as follows:

-- Missing Index Script
-- Original Author: Pinal Dave (C) 2011
SELECT --TOP 25
dm_mid.database_id AS DatabaseID,
dm_migs.avg_user_impact*(dm_migs.user_seeks+dm_migs.user_scans) Avg_Estimated_Impact,
dm_migs.last_user_seek AS Last_User_Seek,
OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) AS [TableName],
'CREATE INDEX [IX_' + OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) + '_'
+ REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.equality_columns,''),', ','_'),'[',''),']','') +
CASE
WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN '_'
ELSE ''
END
+ REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.inequality_columns,''),', ','_'),'[',''),']','')
+ ']'
+ ' ON ' + dm_mid.statement
+ ' (' + ISNULL (dm_mid.equality_columns,'')
+ CASE WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN ',' ELSE
'' END
+ ISNULL (dm_mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + dm_mid.included_columns + ')', '') AS Create_Statement
FROM sys.dm_db_missing_index_groups dm_mig
INNER JOIN sys.dm_db_missing_index_group_stats dm_migs
ON dm_migs.group_handle = dm_mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dm_mid
ON dm_mig.index_handle = dm_mid.index_handle
WHERE dm_mid.database_ID = DB_ID()
ORDER BY Avg_Estimated_Impact DESC
GO



Would you suggest I implement them in order of highest impact, or do I need to do more analysis first?

Or, if you have a better way of accomplishing this, let me know.
Avatar of Scott Pletcher
Scott Pletcher

You can't rely on SQL's missing index stats fully enough to just automatically create indexes from them.  At a minimum, you really have to analyze the missing index stats in conjunction with index usage stats.
You don't want to add every index suggested, because you may be adding large indexes for a procedure that runs once a week.  You need to start with the queries that run most frequently and evaluate their execution plans individually to determine which indexes will actually help.
Avatar of bfuchs

ASKER

@Scott, Brian,

I thought the avg_estimated_Impact column in that script indicates how heavily the suggested index would be used.

Well, if that is not the case, do you have a more reliable script I can use?

Thanks,
Ben
I agree with the above: adding indexes should never be automatic.
Rather, try to find "slow / intensive" queries and see whether they need some indexes or a rewrite (it may go as far as changing the table structure altogether).
I thought the avg_estimated_Impact column in that script indicates how heavily the suggested index would be used.
No. That's the estimated improvement impact for that single query if you create the suggested index.

Well, if that is not the case, do you have a more reliable script I can use?
No script will do that. The one you have is not bad, but you should use it very carefully. Are you having any performance issues? If so, I would start by capturing the long-running queries and checking those against the results you already have from the script you're using.
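For reference, a minimal sketch (not part of the original thread) of one way to capture the longest-running cached statements; the TOP count and the ordering are arbitrary choices:

-- Sketch only: longest-running statements currently in the plan cache (averages since caching)
SELECT TOP (25)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_elapsed_ms,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_elapsed_time / qs.execution_count DESC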
Please fill in the db name and table name pattern for the tables you want to analyze ('%' if you want to review all tables) in the script below, then run it and post the results.  Hopefully SQL has been up long enough to get some meaningful stats.


USE [<your_db_name_here>] --naturally make sure you are in the right db

SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

DECLARE @list_missing_indexes bit
DECLARE @table_name sysname

--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
SET @list_missing_indexes = 1
SET @table_name = '%' --<<-- !your table name / LIKE pattern goes here!
--SET @table_name = '%' --all tables

SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID(), OBJECT_ID(@table_name), NULL, NULL, 'LIMITED')

PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

IF @list_missing_indexes = 1
BEGIN
    SELECT
        GETDATE() AS capture_date,
        DB_NAME(mid.database_id) AS Db_Name,
        dps.row_count,
        OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
        mid.equality_columns, mid.inequality_columns, mid.included_columns,
        user_seeks, user_scans, ca1.max_days_active, unique_compiles,
        last_user_seek, last_user_scan, avg_total_user_cost, avg_user_impact,
        system_seeks, system_scans, last_system_seek, last_system_scan, avg_total_system_cost, avg_system_impact,
        mid.statement, mid.object_id, mid.index_handle
    FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
    CROSS APPLY (
        SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
    ) AS ca1
    LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
        mig.index_handle = mid.index_handle
    LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
        migs.group_handle = mig.index_group_handle
    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
        dps.object_id = mid.object_id AND
        dps.index_id IN (0, 1)
    --order by
        --DB_NAME, Table_Name, equality_columns
    WHERE
        1 = 1
        AND mid.database_id = DB_ID() --only current db
        AND OBJECT_NAME(mid.object_id) LIKE @table_name
        --AND mid.object_id IN (OBJECT_ID('<table_name_1>'), OBJECT_ID('<table_name_2>'))
    ORDER BY
        --avg_total_user_cost * (user_seeks + user_scans) DESC,
        Db_Name, Table_Name, equality_columns, inequality_columns
END --IF

PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)
-- list index usage stats (seeks, scans, etc.)
SELECT
    ius2.row_num, DB_NAME() AS db_name,
    CASE WHEN i.name LIKE ca2.table_name + '%'
         THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1, 200)
         ELSE i.name END AS index_name,
    CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +
    CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],
    ca2.table_name,
    i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
    dps.row_count,
    SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
    ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
    ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
    fk.Reference_Count AS fk_ref_count,
    DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date
        ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,
    FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
    ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
    ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
    o.object_id = i.object_id
CROSS JOIN (
    SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'
) AS ca1
CROSS APPLY (
    SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name
) AS ca2
OUTER APPLY (
    SELECT
        ', ' + COL_NAME(object_id, ic.column_id)
    FROM sys.index_columns ic
    WHERE
        ic.key_ordinal > 0 AND
        ic.object_id = i.object_id AND
        ic.index_id = i.index_id
    ORDER BY
        ic.key_ordinal
    FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
    SELECT
        ', ' + COL_NAME(object_id, ic.column_id)
    FROM sys.index_columns ic
    WHERE
        ic.key_ordinal = 0 AND
        ic.object_id = i.object_id AND
        ic.index_id = i.index_id
    ORDER BY
        COL_NAME(object_id, ic.column_id)
    FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
    dps.object_id = i.object_id AND
    dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
    ius.database_id = DB_ID() AND
    ius.object_id = i.object_id AND
    ius.index_id = i.index_id
LEFT OUTER JOIN (
    SELECT
        database_id, object_id, MAX(user_scans) AS user_scans,
        ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
    FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
    WHERE
        database_id = DB_ID()
        --AND index_id > 0
    GROUP BY
        database_id, object_id
) AS ius2 ON
    ius2.database_id = DB_ID() AND
    ius2.object_id = i.object_id
LEFT OUTER JOIN (
    SELECT
        referenced_object_id, COUNT(*) AS Reference_Count
    FROM sys.foreign_keys
    WHERE
        is_disabled = 0
    GROUP BY
        referenced_object_id
) AS fk ON
    fk.referenced_object_id = i.object_id
WHERE
    i.object_id > 100 AND
    i.is_hypothetical = 0 AND
    i.type IN (0, 1, 2) AND
    o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
    (
     o.name LIKE @table_name  AND
     o.name NOT LIKE 'dtprop%' AND
     o.name NOT LIKE 'filestream[_]' AND
     o.name NOT LIKE 'MSpeer%' AND
     o.name NOT LIKE 'MSpub%' AND
     --o.name NOT LIKE 'queue[_]%' AND
     o.name NOT LIKE 'sys%'
    )
    --AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
    --row_count DESC,
    --ius.user_scans DESC,
    --ius2.row_num, --user_scans&|user_seeks
    db_name, table_name,
    -- list clustered index first, if any, then other index(es)
    CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,
    key_cols

PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

SET DEADLOCK_PRIORITY NORMAL
Avatar of bfuchs

ASKER

@Guy, Vitor,
How about if I look at Activity Monitor's recent expensive queries, show the execution plan, view the missing index suggestion, and apply what's suggested there?

@Vitor,
No. That's the estimated improvement impact for that single query if you create the suggested index
That script only displays indexes; no queries are shown there. What single query are you referring to?

The top row has an avg_estimated_impact of 116982.84; what does that mean?

@Scott,
Will have the results of that script shortly.

Thanks,
Ben
Avatar of bfuchs

ASKER

@Scott,
I am getting the following error on the line below
SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID(), OBJECT_ID(@table_name), NULL, NULL, 'LIMITED')


Incorrect syntax near '('.
Hmm, the syntax error must be before that, as that statement is OK.  But I don't get any errors when I parse the original script (assuming a valid db name is entered for the "USE" statement).
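One possible cause worth ruling out (an assumption; it was not confirmed in the thread): a database still running at compatibility level 80 will not accept function calls such as DB_ID() or OBJECT_ID() as arguments to dm_db_index_physical_stats and raises exactly this "Incorrect syntax near '('" error. A workaround sketch, assuming that cause, is to pass variables instead:

-- Workaround sketch (assumes the compatibility-level-80 cause noted above; otherwise unnecessary)
DECLARE @db_id int, @obj_id int
SET @db_id = DB_ID()
SET @obj_id = OBJECT_ID(@table_name)  -- @table_name as declared earlier in the script

SELECT * FROM sys.dm_db_index_physical_stats ( @db_id, @obj_id, NULL, NULL, 'LIMITED')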
Avatar of bfuchs

ASKER

Yes, name of DB is correct.
See screenshot.
Untitled.png
Double-click on the error message and it will take you to the bad line (or usually close to it, or where the parser first "noticed" it).
Avatar of bfuchs

ASKER

It's taking me to the line posted above.
That parses fine ... copy it into SSMS, and test it yourself:

DECLARE @table_name varchar(50)
<copy line from above>
Avatar of bfuchs

ASKER

Same thing; see attached.
Untitled.png
Avatar of bfuchs

ASKER

I am using SSMS 2008 if that matters.
Avatar of bfuchs

ASKER

By the way, a while ago you gave me a similar script to inspect this, and it worked for the table I was interested in. Now, however, I am looking at the DB in general; should I use that one?
Avatar of bfuchs

ASKER

@Scott,

I just did a file compare with the other script you had posted and realized that the other version does not include this line (see attached).
So I will run that version & post the results here.

Thanks,
Ben
Untitled.png
Avatar of bfuchs

ASKER

@Scott,

Attached are the results
Let me know what you suggest accordingly.

Thanks,
Ben
first.csv
second.csv
Comma-delimited doesn't work very well on those files because the key column list and include column list can have embedded commas in them.  Could you re-post it as tab delimited?

SQL is recommending 172(!) indexes, which is an extraordinarily large amount for a relatively small number of tables.  Then again, SQL wants an index for everything, which is why you have to review its recommendations carefully :-) .
Avatar of bfuchs

ASKER

Hi,

Uploaded in tab delimited format.

SQL is recommending 172(!) indexes, which is an extraordinarily large amount for a relatively small number of tables.
And this is after I started adding indexes according to what the script (posted in the original question) suggested... After every index I added, the list got somewhat shorter; then I realized that once I apply an index to a table, the remaining suggestions for that table disappear. Bottom line, I decided not to do this kind of thing on my own; I'm better off consulting with you experts first...

PS. Perhaps it would also be useful, after completing this task, to run a script listing all unused indexes & get rid of them (a sketch of such a query appears after this post).

Thanks
Ben
First.txt
Second.txt
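On the PS about unused indexes: a minimal sketch (not part of the original thread) of the kind of query that lists nonclustered indexes with no recorded reads since the last SQL Server restart. The filters are arbitrary starting points; never drop unique or PK indexes that enforce business rules.

-- Sketch only: candidate "unused" nonclustered indexes, current database, since the last restart
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0) AS total_reads,
    ISNULL(ius.user_updates, 0) AS total_writes
FROM sys.indexes i
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius ON
    ius.database_id = DB_ID() AND
    ius.object_id = i.object_id AND
    ius.index_id = i.index_id
WHERE
    i.type = 2 AND                -- nonclustered only
    i.is_primary_key = 0 AND
    i.is_unique = 0 AND
    OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1 AND
    ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0) = 0
ORDER BY total_writes DESC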
Avatar of bfuchs

ASKER

I just ran the script again & saw 185; does that mean the list is growing by the day?
Wow.  Overall it's not that bad though.  Many of them are basically dups of each other.
Concentrating on the larger tables first, I suggest the following new/changed/removed index(es) and/or column(s).  I haven't had time to fully analyze, but I think these are solid recommendations so far.  This is not an easy, or quick, process!


Table: PatientsScheduleDeleted_HC
Add index with:  Keys: ( [PatientID], [EmployeeID] )   Include: ( [Day] )


Table: PatientsEmployeesSchedule

Remove index EmployeeID       --covered by IX_PatientsEmployeesSchedule_EmployeeID

Change index IX_PatientsEmployeesSchedule_EmployeeID to (also)
    INCLUDE ( [From], [To], [TravelYN] )

Remove index PatientID --covered by Unique1

Change index Unique1 to (also)
    INCLUDE ( FromOriginal )

Remove index UniqueFromOriginal      --covered by Unique1 after INCLUDEing FromOriginal in it

Remove index IX_PatientsEmployeesSchedule_PatientID --covered by Unique1 except for PatientsNotes, but since it's only 1400 accesses, I'd just allow SQL to go back to the main table for PatientNotes if it has to
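A minimal sketch of how a couple of the recommendations above might be scripted. The existing key column of IX_PatientsEmployeesSchedule_EmployeeID is assumed to be just EmployeeID, the dbo schema is assumed, and the new index name is made up; verify the actual definitions before running anything like this.

-- Sketch only; confirm the real index definitions first
CREATE INDEX IX_PatientsScheduleDeleted_HC_PatientID_EmployeeID
    ON dbo.PatientsScheduleDeleted_HC ( [PatientID], [EmployeeID] )
    INCLUDE ( [Day] )

-- rebuild the existing index with the added INCLUDE columns
CREATE INDEX IX_PatientsEmployeesSchedule_EmployeeID
    ON dbo.PatientsEmployeesSchedule ( [EmployeeID] )
    INCLUDE ( [From], [To], [TravelYN] )
    WITH ( DROP_EXISTING = ON )

-- drop the indexes that are now redundant
DROP INDEX EmployeeID ON dbo.PatientsEmployeesSchedule
DROP INDEX PatientID ON dbo.PatientsEmployeesSchedule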
I'm still concerned about:

EmployeesEmailMessages

even though SQL isn't showing an index as needed.  Its 2M rows are being scanned ~50 times a day.  That ain't great.
Avatar of bfuchs

ASKER

Really appreciate your work Scott.

Re EmployeesEmailMessages, I wonder how it gets to 50 scans a day, as this table is not something users run reports against.
That's what I'm curious about.  It's showing ~8000 user scans in ~167 days (max).  Maybe there's some application process that reads it for some reason??
Avatar of bfuchs

ASKER

That's right. The app that builds the list of email addresses to receive mass emails on a regular basis checks whether an email address has already received X emails on the current date.
See the following VBA code:
    ' Access/Jet SQL: email addresses that have already received more than 4 messages today
    sSql = "SELECT EmployeesEmailMessages.EmailAddress" _
        & " FROM EmailMessagesTbl INNER JOIN EmployeesEmailMessages ON EmailMessagesTbl.ID = EmployeesEmailMessages.EmailMessagesID" _
        & " WHERE (((EmailMessagesTbl.DateCreated) > DATE()))" _
        & " GROUP BY EmployeesEmailMessages.EmailAddress" _
        & " HAVING (((Count(EmailMessagesTbl.ID))>4));"


Any suggestion to improve that?
Avatar of bfuchs

ASKER

Change index Unique1 to (also)
    INCLUDE ( FromOriginal )

Remove index UniqueFromOriginal      --covered by Unique1 after INCLUDEing FromOriginal in it

These are two unique indexes enforcing two different rules. I need both combinations to be unique: Pat+Day+From and Pat+Day+FromOrig. That means even if the value of the From field changes, as long as FromOrig stays the same, another entry with that combination should not be allowed. To my understanding, if I drop the second unique index and just INCLUDE everything in one index, it will not enforce that; it would only reject a row when all 4 columns match another row.
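To illustrate the point, a sketch of the two separate uniqueness rules (the exact index definitions are not shown in the thread, so the column lists below are assumptions based on the description above):

-- Illustration only: two unique indexes enforce two independent rules
CREATE UNIQUE INDEX Unique1
    ON dbo.PatientsEmployeesSchedule ( [PatientID], [Day], [From] )

CREATE UNIQUE INDEX UniqueFromOriginal
    ON dbo.PatientsEmployeesSchedule ( [PatientID], [Day], [FromOriginal] )

-- A single index keyed on ( PatientID, Day, From ) INCLUDE ( FromOriginal ) would only reject a row
-- when PatientID + Day + From all repeat; INCLUDE columns never take part in the uniqueness check,
-- so the PatientID + Day + FromOriginal rule would no longer be enforced.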
Sorry, quite right: if they are used for uniqueness enforcement, yes, you do need to keep them.  I was looking at it purely from a performance perspective.
>> Any suggestion to improve that? <<

Index modifications.  As always, the real key to good performance is getting the best clustered index on each table.  And, yes, the clustering key is not unique here, but it's still the best clustering key:


-- EmailMessagesTbl: recluster on DateCreated; keep ID as a nonclustered primary key
DROP INDEX IX_EmailMessagesTbl_Initial ON EmailMessagesTbl
ALTER TABLE EmailMessagesTbl DROP CONSTRAINT PK_EmailMessagesTbl
CREATE CLUSTERED INDEX CL_EmailMessagesTbl ON EmailMessagesTbl ( DateCreated ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY]
ALTER TABLE EmailMessagesTbl ADD CONSTRAINT PK_EmailMessagesTbl PRIMARY KEY NONCLUSTERED ( ID ) WITH ( FILLFACTOR = 100 ) ON [PRIMARY]
CREATE INDEX IX_EmailMessagesTbl_Initial ON EmailMessagesTbl ( Initial ) INCLUDE ( ID ) ON [PRIMARY]


-- EmployeesEmailMessages: recluster on EmailMessagesID; keep ID as a nonclustered primary key
DROP INDEX IX_EmployeesEmailMessages_EmailMessagesID ON EmployeesEmailMessages
ALTER TABLE EmployeesEmailMessages DROP CONSTRAINT PK_EmployeesEmailMessages
CREATE CLUSTERED INDEX CL_EmployeesEmailMessages ON EmployeesEmailMessages ( EmailMessagesID ) WITH ( FILLFACTOR = 95 ) ON [PRIMARY]
ALTER TABLE EmployeesEmailMessages ADD CONSTRAINT PK_EmployeesEmailMessages PRIMARY KEY NONCLUSTERED ( ID ) WITH ( FILLFACTOR = 100 ) ON [PRIMARY]
Avatar of bfuchs

ASKER

Hi Scott,

I am a little confused as to what the record counts from this script are referring to.

Last week, after applying your suggestions, it went down from 185 to 145 (the results of the 1st query).

Now when I run it, it's back to 179.

Thanks,
Ben
Review the "missing indexes" carefully.  Even in the one I looked at above, over 40 of the indexes were basically the same key(s), with only different included columns.  SQL Server is terrible at properly determining indexes, but we still need the missing index stats to help us determine the degree of index usage.
Definitely look at the usage stats too (the second result set).  What you want -- and expect -- to see are (far) more seeks and (far) fewer scans.
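A minimal sketch (not part of the original thread) of one way to summarize seeks vs. scans per index from the usage DMV, to spot the indexes that are still mostly being scanned:

-- Sketch only: scan-heavy indexes in the current database, since the last restart
SELECT
    OBJECT_NAME(ius.object_id) AS table_name,
    i.name AS index_name,
    ius.user_seeks, ius.user_scans, ius.user_lookups,
    CAST(ius.user_scans AS float) / NULLIF(ius.user_seeks + ius.user_scans, 0) AS scan_ratio
FROM sys.dm_db_index_usage_stats ius
INNER JOIN sys.indexes i ON
    i.object_id = ius.object_id AND
    i.index_id = ius.index_id
WHERE
    ius.database_id = DB_ID() AND
    OBJECTPROPERTY(ius.object_id, 'IsUserTable') = 1
ORDER BY ius.user_scans DESC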
Avatar of bfuchs

ASKER

Hi Scott,

I have now applied all your suggestions, yet the numbers keep climbing (presently over 200). From my understanding of your description, though, the actual counts are not what matters.

However, at this point I am not sure how to proceed with the rest of the tables. Unless you can point me to some place with clear instructions I can follow, I am still not able to do it myself.

Perhaps we already covered the main bottlenecks and the rest are not that important?

Thanks,
Ben
Perhaps.  You have to check both stats again and see if you still have any significant issues.
Avatar of bfuchs

ASKER

Hi Scott,

Just want to make sure I got it right.

Is the first result set a list of all the missing indexes, and the second the usage of the existing indexes?

What are considered significant issues: rows with large numbers of user_scans and relatively small numbers of user_seeks?

Thanks,
Ben
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Avatar of bfuchs

ASKER

Got you!

Thanks again for your help Scott, I hope to get the rest in shape as well...

In case I get stuck, I will post again (if you don't mind :-)

Ben
Not at all.  Please do.  I'm really busy right now, but I might get some time later to review it all again.
Avatar of bfuchs

ASKER

Hi Scott,

A question for you.

I did the following in order to see if there are any significant indexes missing.

1- opened a sql trace and let it run for a full day.
2- saved the results into a table.
3- sorted the results by duration desc
4- copied the sql text of the top rows
5- pasted them in SSMS and selected display estimated execution plan.
6- didn't see anything suggesting a missing index.
7- then I ran the following sql to give me the most frequent statements used
--count of rows grouped by the SQL text up to the "=" character, since what follows is in most cases just a different parameter value
SELECT Count([dbo_6-9-15].RowNumber) AS CountOfRowNumber, nz(Mid([TextData],1,InStr([textdata],"="))) AS Expr1, Min([dbo_6-9-15].RowNumber) AS MinOfRowNumber
FROM [dbo_6-9-15]
GROUP BY nz(Mid([TextData],1,InStr([textdata],"=")))
ORDER BY Count([dbo_6-9-15].RowNumber) DESC;


8- from the above results I took the MinOfRowNumber to get a full SQL statement for each group.
9- again pasted them individually in SSMS and selected Display estimated execution plan
10- bottom line, there were no missing index suggestions.

What is your opinion: does that all make sense?

Thanks,
Ben
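One additional check, not mentioned in the thread, that complements steps 5-10: search the cached plans themselves for MissingIndexes elements. The showplan namespace below is the standard one; the TOP count is arbitrary.

-- Sketch only: cached plans that carry a MissingIndexes element
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (25)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE qp.query_plan.exist('//MissingIndexes') = 1
ORDER BY qs.execution_count DESC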
Yes, that makes sense.  The missing index stats in SQL Server also indicate the last time that index would have been used for a seek or scan.  Hopefully those dates keep receding as the new clustered indexes take over much more of the SELECT processing.
Avatar of bfuchs

ASKER

Thanks a million Scott!!
You're welcome!  If you'd like, wait a week or two and re-post the results and we'll review them again.  Tuning is definitely an iterative process.
Avatar of bfuchs

ASKER

Thanks for the offer Scott,
I will definitely consider that, as this is something crucial for our business.
Ben
Avatar of bfuchs

ASKER

Hi Scott,

Let me know if you have some time this week; I will start a new post & you can take it from there.

Thanks,
Ben
I will have some time.  I'm off work on Friday, but will be here the rest of the week.