?
Solved

Speed a query

Posted on 2012-08-24
26
Medium Priority
?
696 Views
Last Modified: 2012-08-29
I have this table:
CREATE TABLE [dbo].[tblHorarios](
	[FK_ID_intEmpresa] [int] NOT NULL,
	[FK_ID_intFuncionario] [int] NOT NULL,
	[datHoraEntradaHorario] [smalldatetime] NOT NULL,
	[datHoraSaidaHorario] [smalldatetime] NOT NULL,
	[FK_ID_intTipoHorario] [uniqueidentifier] NOT NULL,
	[datDataAlteracaoHorario] [smalldatetime] NOT NULL,
	[strNumHorario] [varchar](50) NOT NULL,
 CONSTRAINT [PK_tblHorarios] PRIMARY KEY CLUSTERED 
(
	[FK_ID_intEmpresa] ASC,
	[FK_ID_intFuncionario] ASC,
	[FK_ID_intTipoHorario] ASC,
	[datDataAlteracaoHorario] ASC,
	[strNumHorario] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Open in new window

I need to speed up this query:
DELETE h2
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
	AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
	AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
	AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
									FROM tblhorarios h3
									WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
										AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
										AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
										AND DATEPART(dw,h3.datDataAlteracaoHorario)=DATEPART(dw,h1.datDataAlteracaoHorario)
										AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
	AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
	AND h1.strNumHorario=h2.strNumHorario

Open in new window

This takes quite some time, and I'm not sure how to improve on it. Basically, for each row, if the previous row with the same week day, FK_ID_intTipoHorario, FK_ID_intEmpresa and FK_ID_intFuncionario have the same values for datHoraEntradaHorario,datHoraSaidaHorario and strNumHorario, then it's a duplicate and needs to be deleted.

I've attached the execution plan (rename it to sqlplan, as EE uploader doesn't allow sqlplan extension). I've noticed that the last row on the query (strNumHorario) increased the execution time, but I'm not sure why.
ExecPlan.sql
0
Comment
Question by:Cluskitt
  • 13
  • 10
  • 3
26 Comments
 
LVL 7

Expert Comment

by:sqlxl
ID: 38330023
How long does it currently take to run?

How many rows in each of the tables?
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38330049
The table is the same. It's comparing with itself. As of right now, it has about 75k rows. It's taking about 2 mins to run. If it were just the DELETE query, that wouldn't be much of a problem, but we also check that query with a SELECT *.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38330065
That is a long time for a table with only 75,000 rows.

How often do you run this query?
0
Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

 
LVL 18

Author Comment

by:Cluskitt
ID: 38330079
The delete one will be run either every day or every other day. The select will be run at least once an hour.
0
 
LVL 7

Accepted Solution

by:
sqlxl earned 1500 total points
ID: 38330110
According to the exec plan, the 32% is spent in the clustered table seek. The problem I see initially is that the join is based on a derived value which is being forced by this section of the query. AND DATEPART(dw,h3.datDataAlteracaoHorario) =DATEPART(dw,h1.datDataAlteracaoHorario)

This will sound a little weird but if the table already had a column with that value in it then the join could be based on a literal value, not derived. You could also index it so it did not have to seek the clustered index, it could seek the new index which would contain the predicates as well as the returned value. (covering index)

Try this:

1. Create a column called datDataAlteracaoHorarioDW and populate it with DATEPART(dw,h3.datDataAlteracaoHorario)  for each row.

2. Create an index on the following columns.

FK_ID_intEmpresa
FK_ID_intFuncionario
FK_ID_intTipoHorario
datDataAlteracaoHorario

3. Change AND DATEPART(dw,h3.datDataAlteracaoHorario) =DATEPART(dw,h1.datDataAlteracaoHorario)
to
AND h3.datDataAlteracaoHorarioDW =h1.datDataAlteracaoHorarioDW

I'm expecting the clustered index seek to be replaced with a seek of the new index. The new index should be much smaller and not need to join based on a derived value.
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38330164
So, you're saying to basically create a week day column. Yes, I can see that that would be much more effective, however that would require some changes to the site, especially all the parts in the code where I do inserts on that table, which will then require a new column. I won't be able to do such an extensive change today, so I'll have to get back to you on monday.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38330581
however that would require some changes to the site, especially all the parts in the code
where I do inserts on that table, which will then require a new column.

Not necessarily.  Perhaps you can create a computed column and index it.  That would not require any changes.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38330656
You could add the column with a null allowed and have a job update the value. No change to the app itself.

Suppose you could use a trigger too but I personally am not a fan of triggers.

acperkins computed column would also be a good idea. Better than a null column or trigger.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38330681
Just so that I understand are the following rows that should be deleted?
SELECT	FK_ID_intTipoHorario, 
	FK_ID_intEmpresa,
	FK_ID_intFuncionario,
	datHoraEntradaHorario,
	datHoraSaidaHorario,
	strNumHorario,
	MAX(datDataAlteracaoHorario)
FROM    tblHorarios h1
GROUP BY
	FK_ID_intTipoHorario, 
	FK_ID_intEmpresa,
	FK_ID_intFuncionario,
	datHoraEntradaHorario,
	datHoraSaidaHorario,
	strNumHorario,
	DATEPART(dw, datDataAlteracaoHorario)
HAVING	COUNT(*) > 1
	AND MIN(datDataAlteracaoHorario) <> MAX(datDataAlteracaoHorario)

Open in new window

0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38335705
@acperkins: That is a good idea and I'll try that.

@sqlxl: I know I can create with a null column, update, then update to not be a null column. I was pointing out that, if I add a new column, I have to make changes to the app where I insert values, seeing as I use INSERT INTO tblHorarios VALUES (....), without specifying columns, which would mean an error would be thrown due to the number of columns not matching.

@acperkins: (regarding the last query) Not at all. What I want to delete is consecutive dates on the same weekday (for example, two mondays, either one week or one year apart. As long as they're consecutive, that is, no other monday in between) that have the same datHoraEntradaHorario, datHoraSaidaHorario, FK_ID_intTipoHorario and strNumHorario. I'm not exactly sure what your query is doing, but it's returning a little over 14k rows, in a 68k table. Seeing as I ran the delete recently, it shouldn't have more than a few hundred rows at most.
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38335761
I can't seem to create a computed column based on the weekday datepart. MS SQL won't allow the column to be persisted due to being non-deterministic, which won't allow me to create the index. I've looked around and can't seem to find a workaround for it. Even (((@@datefirst-1) + datepart(weekday, @dtDate))) doesn't work.

Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38336275
I guess I was just assuming your were specifying the column names in your INSERT.

This is a perfect example of why it is good practice to do so. (not trying to sound condescending)
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38336302
What I want to delete is consecutive dates on the same weekday (for example, two mondays, either one week or one year apart. As long as they're consecutive, that is, no other monday in between)
I missed that detail.

I can't seem to create a computed column based on the weekday datepart.
Actually strictly speaking you can make it a computed column, you just cannot index it.  Which defeats the whole point of the exercise.

Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.
I suspect you are right.
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38336318
I know that. On the other hand, this was a conscious decision. At the beginning there were many DB changes and we decided that it would be better to have the app throw errors rather than having inconsistent results due to inserting null values where there weren't supposed to be null values.

We have very few fields where we actually allow nulls. Mostly, there's always some default value or, more often, some calculated one. This is due to the main data coming from a different DB. This is sort of a complementary DB. Not quite, but on a few things it is. Also, even for the new things, we get lots of data from the other one, so it's rare to have null fields in this one.

It shouldn't be too complicated to change it anyway. I'll just need to identify where the inserts are being done (I think it's about 3-5 pages. A quick find will identify them fast enough) and add the new column. I believe I may have some time to perform this operation today, especially because I suspect inserting this new column will allow me to speed up other queries where I use the DATEPART function.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38336426
Let us know how it goes.

Thanks!
0
 
LVL 18

Author Closing Comment

by:Cluskitt
ID: 38336734
The query now takes 4 seconds to run. This is very acceptable. Thanks for the help.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38336899
2 minutes to 4 seconds will work.

Mind sending us an updated execution plan? I want to see if it behaved the way I expected.

Thanks!
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38336918
Sure.
Updated query:
--delete h1
SELECT *
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
	AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
	AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
	AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
									FROM tblhorarios h3
									WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
										AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
										AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
										AND h3.intDiaSemana=h1.intDiaSemana
										AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
	AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
	AND h1.strNumHorario=h2.strNumHorario

Open in new window

Exec.sql
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38337003
Actually, after doing a restore on the table (due to a different issue), the select took 14 seconds, the delete took 48 seconds to run. Maybe I created the index wrong? I just created a nonclustered index on the field (I now use it on other queries)
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38337359
Is it the same index from my original post?

FK_ID_intEmpresa
FK_ID_intFuncionario
FK_ID_intTipoHorario
datDataAlteracaoHorario
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38339867
I believe I already had one of those. Although it would also have strNumHorario. After all, those 4 fields along with this one are the table keys.
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38339881
Just checked the DB. I have these two indexes now:
CREATE NONCLUSTERED INDEX [_dta_index_tblHorarios_5_1879677744__K7_K2_K1_K5_K6_3_4] ON [dbo].[tblHorarios] 
(
	[strNumHorario] ASC,
	[FK_ID_intFuncionario] ASC,
	[FK_ID_intEmpresa] ASC,
	[FK_ID_intTipoHorario] ASC,
	[datDataAlteracaoHorario] ASC
)
INCLUDE ( [datHoraEntradaHorario],
[datHoraSaidaHorario]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Open in new window

CREATE NONCLUSTERED INDEX [intDiaSemanaIDX] ON [dbo].[tblHorarios] 
(
	[intDiaSemana] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Open in new window

0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38340726
Are you running the original query of yours or the modified one?

Change AND DATEPART(dw,h3.datDataAlteracaoHorario) =DATEPART(dw,h1.datDataAlteracaoHorario)
to
AND h3.datDataAlteracaoHorarioDW =h1.datDataAlteracaoHorarioDW
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38340772
I'm running the modified one, as posted in #a38336918. It's the one that was used for the execution plan. I'm also running profiler on the server, for tuning purposes. I took the chance to run that query 2 or 3 times. Maybe advisor will suggest something.

Btw, the field you're calling datDataAlteracaoHorarioDW, I've named intDiaSemana.
0
 
LVL 18

Author Comment

by:Cluskitt
ID: 38344493
The advisor created a whole bunch of statistics, and now the query takes 1 second to run. Seems to be a closed case. Thanks once again.
0
 
LVL 7

Expert Comment

by:sqlxl
ID: 38345209
Cool!
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Access has a limit of 255 columns in a single table; SQL Server allows tables with over 255 columns, but reading that data is not necessarily simple.  The final solution for this task involved creating a custom text parser and then reading…
Recursive SQL is one of the most fascinating and powerful and yet dangerous feature offered in many modern databases today using a Common Table Expression (CTE) first introduced in the ANSI SQL 99 standard. The first implementations of CTE began ap…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question