asked on

Speed a query

I have this table:

CREATE TABLE [dbo].[tblHorarios](
	[FK_ID_intEmpresa] [int] NOT NULL,
	[FK_ID_intFuncionario] [int] NOT NULL,
	[datHoraEntradaHorario] [smalldatetime] NOT NULL,
	[datHoraSaidaHorario] [smalldatetime] NOT NULL,
	[FK_ID_intTipoHorario] [uniqueidentifier] NOT NULL,
	[datDataAlteracaoHorario] [smalldatetime] NOT NULL,
	[strNumHorario] [varchar](50) NOT NULL,
 CONSTRAINT [PK_tblHorarios] PRIMARY KEY CLUSTERED 
(
	[FK_ID_intEmpresa] ASC,
	[FK_ID_intFuncionario] ASC,
	[FK_ID_intTipoHorario] ASC,
	[datDataAlteracaoHorario] ASC,
	[strNumHorario] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Open in new window

I need to speed up this query:

DELETE h2
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
	AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
	AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
	AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
									FROM tblhorarios h3
									WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
										AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
										AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
										AND DATEPART(dw,h3.datDataAlteracaoHorario)=DATEPART(dw,h1.datDataAlteracaoHorario)
										AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
	AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
	AND h1.strNumHorario=h2.strNumHorario

Open in new window

This takes quite some time, and I'm not sure how to improve on it. Basically, for each row, if the previous row with the same week day, FK_ID_intTipoHorario, FK_ID_intEmpresa and FK_ID_intFuncionario have the same values for datHoraEntradaHorario,datHoraSaidaHorario and strNumHorario, then it's a duplicate and needs to be deleted.

I've attached the execution plan (rename it to sqlplan, as EE uploader doesn't allow sqlplan extension). I've noticed that the last row on the query (strNumHorario) increased the execution time, but I'm not sure why.
ExecPlan.sql

sqlxl

How long does it currently take to run?

How many rows in each of the tables?

Cluskitt

ASKER

The table is the same. It's comparing with itself. As of right now, it has about 75k rows. It's taking about 2 mins to run. If it were just the DELETE query, that wouldn't be much of a problem, but we also check that query with a SELECT *.

sqlxl

That is a long time for a table with only 75,000 rows.

How often do you run this query?

Cluskitt

ASKER

The delete one will be run either every day or every other day. The select will be run at least once an hour.

ASKER CERTIFIED SOLUTION

sqlxl

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Cluskitt

ASKER

So, you're saying to basically create a week day column. Yes, I can see that that would be much more effective, however that would require some changes to the site, especially all the parts in the code where I do inserts on that table, which will then require a new column. I won't be able to do such an extensive change today, so I'll have to get back to you on monday.

Anthony Perkins

however that would require some changes to the site, especially all the parts in the code
where I do inserts on that table, which will then require a new column.
Not necessarily. Perhaps you can create a computed column and index it. That would not require any changes.

sqlxl

You could add the column with a null allowed and have a job update the value. No change to the app itself.

Suppose you could use a trigger too but I personally am not a fan of triggers.

acperkins computed column would also be a good idea. Better than a null column or trigger.

Anthony Perkins

Just so that I understand are the following rows that should be deleted?

SELECT	FK_ID_intTipoHorario, 
	FK_ID_intEmpresa,
	FK_ID_intFuncionario,
	datHoraEntradaHorario,
	datHoraSaidaHorario,
	strNumHorario,
	MAX(datDataAlteracaoHorario)
FROM    tblHorarios h1
GROUP BY
	FK_ID_intTipoHorario, 
	FK_ID_intEmpresa,
	FK_ID_intFuncionario,
	datHoraEntradaHorario,
	datHoraSaidaHorario,
	strNumHorario,
	DATEPART(dw, datDataAlteracaoHorario)
HAVING	COUNT(*) > 1
	AND MIN(datDataAlteracaoHorario) <> MAX(datDataAlteracaoHorario)

Open in new window

Cluskitt

ASKER

@acperkins: That is a good idea and I'll try that.

@sqlxl: I know I can create with a null column, update, then update to not be a null column. I was pointing out that, if I add a new column, I have to make changes to the app where I insert values, seeing as I use INSERT INTO tblHorarios VALUES (....), without specifying columns, which would mean an error would be thrown due to the number of columns not matching.

@acperkins: (regarding the last query) Not at all. What I want to delete is consecutive dates on the same weekday (for example, two mondays, either one week or one year apart. As long as they're consecutive, that is, no other monday in between) that have the same datHoraEntradaHorario, datHoraSaidaHorario, FK_ID_intTipoHorario and strNumHorario. I'm not exactly sure what your query is doing, but it's returning a little over 14k rows, in a 68k table. Seeing as I ran the delete recently, it shouldn't have more than a few hundred rows at most.

Cluskitt

ASKER

I can't seem to create a computed column based on the weekday datepart. MS SQL won't allow the column to be persisted due to being non-deterministic, which won't allow me to create the index. I've looked around and can't seem to find a workaround for it. Even (((@@datefirst-1) + datepart(weekday, @dtDate))) doesn't work.

Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.

sqlxl

I guess I was just assuming your were specifying the column names in your INSERT.

This is a perfect example of why it is good practice to do so. (not trying to sound condescending)

Anthony Perkins

What I want to delete is consecutive dates on the same weekday (for example, two mondays, either one week or one year apart. As long as they're consecutive, that is, no other monday in between)
I missed that detail.

I can't seem to create a computed column based on the weekday datepart.
Actually strictly speaking you can make it a computed column, you just cannot index it. Which defeats the whole point of the exercise.

Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.
I suspect you are right.

Cluskitt

ASKER

I know that. On the other hand, this was a conscious decision. At the beginning there were many DB changes and we decided that it would be better to have the app throw errors rather than having inconsistent results due to inserting null values where there weren't supposed to be null values.

We have very few fields where we actually allow nulls. Mostly, there's always some default value or, more often, some calculated one. This is due to the main data coming from a different DB. This is sort of a complementary DB. Not quite, but on a few things it is. Also, even for the new things, we get lots of data from the other one, so it's rare to have null fields in this one.

It shouldn't be too complicated to change it anyway. I'll just need to identify where the inserts are being done (I think it's about 3-5 pages. A quick find will identify them fast enough) and add the new column. I believe I may have some time to perform this operation today, especially because I suspect inserting this new column will allow me to speed up other queries where I use the DATEPART function.

sqlxl

Let us know how it goes.

Thanks!

Cluskitt

ASKER

The query now takes 4 seconds to run. This is very acceptable. Thanks for the help.

sqlxl

2 minutes to 4 seconds will work.

Mind sending us an updated execution plan? I want to see if it behaved the way I expected.

Thanks!

Cluskitt

ASKER

Sure.
Updated query:

--delete h1
SELECT *
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
	AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
	AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
	AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
									FROM tblhorarios h3
									WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
										AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
										AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
										AND h3.intDiaSemana=h1.intDiaSemana
										AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
	AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
	AND h1.strNumHorario=h2.strNumHorario

Open in new window

Exec.sql

Cluskitt

ASKER

Actually, after doing a restore on the table (due to a different issue), the select took 14 seconds, the delete took 48 seconds to run. Maybe I created the index wrong? I just created a nonclustered index on the field (I now use it on other queries)

sqlxl

Is it the same index from my original post?

FK_ID_intEmpresa
FK_ID_intFuncionario
FK_ID_intTipoHorario
datDataAlteracaoHorario

Cluskitt

ASKER

I believe I already had one of those. Although it would also have strNumHorario. After all, those 4 fields along with this one are the table keys.

Cluskitt

ASKER

Just checked the DB. I have these two indexes now:

CREATE NONCLUSTERED INDEX [_dta_index_tblHorarios_5_1879677744__K7_K2_K1_K5_K6_3_4] ON [dbo].[tblHorarios] 
(
	[strNumHorario] ASC,
	[FK_ID_intFuncionario] ASC,
	[FK_ID_intEmpresa] ASC,
	[FK_ID_intTipoHorario] ASC,
	[datDataAlteracaoHorario] ASC
)
INCLUDE ( [datHoraEntradaHorario],
[datHoraSaidaHorario]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Open in new window

CREATE NONCLUSTERED INDEX [intDiaSemanaIDX] ON [dbo].[tblHorarios] 
(
	[intDiaSemana] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Open in new window

sqlxl

Are you running the original query of yours or the modified one?

Change AND DATEPART(dw,h3.datDataAlteracaoHorario) =DATEPART(dw,h1.datDataAlteracaoHorario)
to
AND h3.datDataAlteracaoHorarioDW =h1.datDataAlteracaoHorarioDW

Cluskitt

ASKER

I'm running the modified one, as posted in #a38336918. It's the one that was used for the execution plan. I'm also running profiler on the server, for tuning purposes. I took the chance to run that query 2 or 3 times. Maybe advisor will suggest something.

Btw, the field you're calling datDataAlteracaoHorarioDW, I've named intDiaSemana.

Cluskitt

ASKER

The advisor created a whole bunch of statistics, and now the query takes 1 second to run. Seems to be a closed case. Thanks once again.

sqlxl

Cool!