Cluskitt
asked on
Speed a query
I have this table:
CREATE TABLE [dbo].[tblHorarios](
[FK_ID_intEmpresa] [int] NOT NULL,
[FK_ID_intFuncionario] [int] NOT NULL,
[datHoraEntradaHorario] [smalldatetime] NOT NULL,
[datHoraSaidaHorario] [smalldatetime] NOT NULL,
[FK_ID_intTipoHorario] [uniqueidentifier] NOT NULL,
[datDataAlteracaoHorario] [smalldatetime] NOT NULL,
[strNumHorario] [varchar](50) NOT NULL,
CONSTRAINT [PK_tblHorarios] PRIMARY KEY CLUSTERED
(
[FK_ID_intEmpresa] ASC,
[FK_ID_intFuncionario] ASC,
[FK_ID_intTipoHorario] ASC,
[datDataAlteracaoHorario] ASC,
[strNumHorario] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I need to speed up this query:
DELETE h2
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
FROM tblhorarios h3
WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
AND DATEPART(dw,h3.datDataAlteracaoHorario)=DATEPART(dw,h1.datDataAlteracaoHorario)
AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
AND h1.strNumHorario=h2.strNumHorario
This takes quite some time, and I'm not sure how to improve on it. Basically, for each row, if the previous row with the same weekday, FK_ID_intTipoHorario, FK_ID_intEmpresa and FK_ID_intFuncionario has the same values for datHoraEntradaHorario, datHoraSaidaHorario and strNumHorario, then it's a duplicate and needs to be deleted.
I've attached the execution plan (rename it to .sqlplan, as the EE uploader doesn't allow the sqlplan extension). I've noticed that the last line of the query (strNumHorario) increased the execution time, but I'm not sure why.
ExecPlan.sql
ASKER
The table is the same. It's comparing with itself. As of right now, it has about 75k rows. It's taking about 2 mins to run. If it were just the DELETE query, that wouldn't be much of a problem, but we also check that query with a SELECT *.
That is a long time for a table with only 75,000 rows.
How often do you run this query?
ASKER
The delete one will be run either every day or every other day. The select will be run at least once an hour.
ASKER CERTIFIED SOLUTION
ASKER
So, you're saying to basically create a weekday column. Yes, I can see that would be much more effective; however, that would require some changes to the site, especially all the parts of the code where I do inserts on that table, which would then require the new column. I won't be able to make such an extensive change today, so I'll have to get back to you on Monday.
however that would require some changes to the site, especially all the parts in the code
where I do inserts on that table, which will then require a new column.
Not necessarily. Perhaps you can create a computed column and index it. That would not require any changes.
You could add the column with a null allowed and have a job update the value. No change to the app itself.
Suppose you could use a trigger too but I personally am not a fan of triggers.
acperkins computed column would also be a good idea. Better than a null column or trigger.
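A trigger-based version of that idea might look like the sketch below (untested against this schema). It keeps a plain weekday column in sync without touching the app's INSERT statements; the trigger name is made up, and the column name is the one the asker settles on later in the thread. Joining back on the full primary key restricts the UPDATE to just the affected rows.

```sql
-- Sketch only: maintain intDiaSemana on INSERT/UPDATE via a trigger.
CREATE TRIGGER trgHorariosDiaSemana
ON dbo.tblHorarios
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE h
    SET intDiaSemana = DATEPART(dw, h.datDataAlteracaoHorario)
    FROM dbo.tblHorarios AS h
    INNER JOIN inserted AS i  -- join on the full primary key
        ON  h.FK_ID_intEmpresa        = i.FK_ID_intEmpresa
        AND h.FK_ID_intFuncionario    = i.FK_ID_intFuncionario
        AND h.FK_ID_intTipoHorario    = i.FK_ID_intTipoHorario
        AND h.datDataAlteracaoHorario = i.datDataAlteracaoHorario
        AND h.strNumHorario           = i.strNumHorario;
END
```

This relies on the database default of RECURSIVE_TRIGGERS being OFF, so the trigger's own UPDATE does not re-fire it.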
Just so that I understand: are the following the rows that should be deleted?
SELECT FK_ID_intTipoHorario,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaHorario,
datHoraSaidaHorario,
strNumHorario,
MAX(datDataAlteracaoHorario)
FROM tblHorarios h1
GROUP BY
FK_ID_intTipoHorario,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaHorario,
datHoraSaidaHorario,
strNumHorario,
DATEPART(dw, datDataAlteracaoHorario)
HAVING COUNT(*) > 1
AND MIN(datDataAlteracaoHorario) <> MAX(datDataAlteracaoHorario)
ASKER
@acperkins: That is a good idea and I'll try that.
@sqlxl: I know I can create the column as nullable, update it, then alter it to NOT NULL. I was pointing out that, if I add a new column, I have to make changes to the app where I insert values, seeing as I use INSERT INTO tblHorarios VALUES (....) without specifying columns, which means an error would be thrown due to the number of columns not matching.
@acperkins: (regarding the last query) Not at all. What I want to delete is consecutive dates on the same weekday (for example, two Mondays, whether one week or one year apart, as long as they're consecutive, that is, with no other Monday in between) that have the same datHoraEntradaHorario, datHoraSaidaHorario, FK_ID_intTipoHorario and strNumHorario. I'm not exactly sure what your query is doing, but it's returning a little over 14k rows in a 68k-row table. Seeing as I ran the delete recently, it shouldn't return more than a few hundred rows at most.
ASKER
I can't seem to create a computed column based on the weekday datepart. MS SQL won't allow the column to be persisted because the expression is non-deterministic, which in turn won't let me create the index. I've looked around and can't seem to find a workaround for it. Even (((@@datefirst-1) + datepart(weekday, @dtDate))) doesn't work.
Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.
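One possible workaround (a sketch, not verified against this database): DATEPART(dw, ...) is non-deterministic only because it depends on the session's DATEFIRST setting, but DATEDIFF against a fixed anchor has no such dependency, so a computed column built on it can be PERSISTED and indexed. The index name below is made up.

```sql
-- DATEDIFF(DAY, 0, ...) counts days since 1900-01-01 (a Monday), so the
-- modulo gives 0 = Monday ... 6 = Sunday, independent of DATEFIRST.
ALTER TABLE dbo.tblHorarios
    ADD intDiaSemana AS (DATEDIFF(DAY, 0, datDataAlteracaoHorario) % 7) PERSISTED;

CREATE NONCLUSTERED INDEX IX_tblHorarios_intDiaSemana
    ON dbo.tblHorarios (intDiaSemana);
```

The numbering differs from DATEPART(dw), but the dedup query only compares the value between rows, so any consistent weekday encoding works.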
I guess I was just assuming you were specifying the column names in your INSERT.
This is a perfect example of why it is good practice to do so. (not trying to sound condescending)
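For reference, this is the column-list form that keeps working when a column is added later (the values here are made up):

```sql
-- With an explicit column list, adding intDiaSemana later does not
-- break this statement; the new column just takes its default/computed value.
INSERT INTO dbo.tblHorarios
    (FK_ID_intEmpresa, FK_ID_intFuncionario, datHoraEntradaHorario,
     datHoraSaidaHorario, FK_ID_intTipoHorario, datDataAlteracaoHorario,
     strNumHorario)
VALUES
    (1, 42, '2012-08-20 09:00', '2012-08-20 18:00', NEWID(), GETDATE(), '1');
```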
What I want to delete is consecutive dates on the same weekday (for example, two mondays, either one week or one year apart. As long as they're consecutive, that is, no other monday in between)
I missed that detail.
I can't seem to create a computed column based on the weekday datepart.
Actually strictly speaking you can make it a computed column, you just cannot index it. Which defeats the whole point of the exercise.
Seems the only way to get this is to create a normal column and calculate it on UPDATE/INSERT.
I suspect you are right.
ASKER
I know that. On the other hand, this was a conscious decision. At the beginning there were many DB changes, and we decided it would be better to have the app throw errors than to get inconsistent results from inserting NULLs into fields that weren't supposed to allow them.
We have very few fields where we actually allow nulls. Mostly, there's always some default value or, more often, some calculated one. This is due to the main data coming from a different DB. This is sort of a complementary DB. Not quite, but on a few things it is. Also, even for the new things, we get lots of data from the other one, so it's rare to have null fields in this one.
It shouldn't be too complicated to change it anyway. I'll just need to identify where the inserts are being done (I think it's about 3-5 pages. A quick find will identify them fast enough) and add the new column. I believe I may have some time to perform this operation today, especially because I suspect inserting this new column will allow me to speed up other queries where I use the DATEPART function.
Let us know how it goes.
Thanks!
ASKER
The query now takes 4 seconds to run. This is very acceptable. Thanks for the help.
2 minutes to 4 seconds will work.
Mind sending us an updated execution plan? I want to see if it behaved the way I expected.
Thanks!
ASKER
Sure.
Updated query:
--delete h1
SELECT *
FROM tblHorarios h1
INNER JOIN tblHorarios h2
ON h1.FK_ID_intEmpresa=h2.FK_ID_intEmpresa
AND h1.FK_ID_intFuncionario=h2.FK_ID_intFuncionario
AND h1.FK_ID_intTipoHorario=h2.FK_ID_intTipoHorario
AND h2.datDataAlteracaoHorario=(SELECT MAX(datDataAlteracaoHorario)
FROM tblhorarios h3
WHERE h3.FK_ID_intEmpresa=h1.FK_ID_intEmpresa
AND h3.FK_ID_intFuncionario=h1.FK_ID_intFuncionario
AND h3.FK_ID_intTipoHorario=h1.FK_ID_intTipoHorario
AND h3.intDiaSemana=h1.intDiaSemana
AND h3.datDataAlteracaoHorario<h1.datDataAlteracaoHorario)
WHERE h1.datHoraEntradaHorario=h2.datHoraEntradaHorario
AND h1.datHoraSaidaHorario=h2.datHoraSaidaHorario
AND h1.strNumHorario=h2.strNumHorario
Exec.sql
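As an aside, on SQL Server 2012 or later the same "previous row on the same weekday" comparison can be expressed with LAG, avoiding the correlated MAX subquery entirely. This is only a sketch (untested here) and assumes the intDiaSemana column from the updated query:

```sql
-- Each LAG fetches the previous row within the same empresa/funcionario/
-- tipo/weekday group, ordered by the change date; rows matching their
-- predecessor on all three value columns are the duplicates.
;WITH ordered AS (
    SELECT datHoraEntradaHorario, datHoraSaidaHorario, strNumHorario,
           LAG(datHoraEntradaHorario) OVER (
               PARTITION BY FK_ID_intEmpresa, FK_ID_intFuncionario,
                            FK_ID_intTipoHorario, intDiaSemana
               ORDER BY datDataAlteracaoHorario) AS prevEntrada,
           LAG(datHoraSaidaHorario) OVER (
               PARTITION BY FK_ID_intEmpresa, FK_ID_intFuncionario,
                            FK_ID_intTipoHorario, intDiaSemana
               ORDER BY datDataAlteracaoHorario) AS prevSaida,
           LAG(strNumHorario) OVER (
               PARTITION BY FK_ID_intEmpresa, FK_ID_intFuncionario,
                            FK_ID_intTipoHorario, intDiaSemana
               ORDER BY datDataAlteracaoHorario) AS prevNum
    FROM dbo.tblHorarios
)
DELETE FROM ordered
WHERE datHoraEntradaHorario = prevEntrada
  AND datHoraSaidaHorario   = prevSaida
  AND strNumHorario         = prevNum;
```

Deleting through the CTE removes the matching base rows in a single pass over the table, which typically beats a self-join plus a correlated subquery.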
ASKER
Actually, after doing a restore on the table (due to a different issue), the select took 14 seconds and the delete took 48 seconds to run. Maybe I created the index wrong? I just created a nonclustered index on the field (I now use it in other queries).
Is it the same index from my original post?
FK_ID_intEmpresa
FK_ID_intFuncionario
FK_ID_intTipoHorario
datDataAlteracaoHorario
ASKER
I believe I already had one of those, although it also includes strNumHorario. After all, those four fields along with that one are the table's keys.
ASKER
Just checked the DB. I have these two indexes now:
CREATE NONCLUSTERED INDEX [_dta_index_tblHorarios_5_1879677744__K7_K2_K1_K5_K6_3_4] ON [dbo].[tblHorarios]
(
[strNumHorario] ASC,
[FK_ID_intFuncionario] ASC,
[FK_ID_intEmpresa] ASC,
[FK_ID_intTipoHorario] ASC,
[datDataAlteracaoHorario] ASC
)
INCLUDE ( [datHoraEntradaHorario],
[datHoraSaidaHorario]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [intDiaSemanaIDX] ON [dbo].[tblHorarios]
(
[intDiaSemana] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
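For what it's worth, a single-column index on intDiaSemana alone is unlikely to help the correlated subquery much, since the seek needs the empresa/funcionario/tipo keys first. An index shaped roughly like this might serve it better (a sketch only; the name is made up, and the right key order should be confirmed against the actual plan):

```sql
-- Key columns match the subquery's equality predicates, with the date last
-- so the MAX(datDataAlteracaoHorario) becomes a backward range scan;
-- the INCLUDE columns cover the outer query's comparisons.
CREATE NONCLUSTERED INDEX IX_tblHorarios_DupCheck
    ON dbo.tblHorarios
    (FK_ID_intEmpresa, FK_ID_intFuncionario, FK_ID_intTipoHorario,
     intDiaSemana, datDataAlteracaoHorario)
    INCLUDE (datHoraEntradaHorario, datHoraSaidaHorario, strNumHorario);
```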
Are you running the original query of yours or the modified one?
Change
AND DATEPART(dw, h3.datDataAlteracaoHorario) = DATEPART(dw, h1.datDataAlteracaoHorario)
to
AND h3.datDataAlteracaoHorarioDW = h1.datDataAlteracaoHorarioDW
ASKER
I'm running the modified one, as posted in #a38336918. It's the one that was used for the execution plan. I'm also running profiler on the server, for tuning purposes. I took the chance to run that query 2 or 3 times. Maybe advisor will suggest something.
Btw, the field you're calling datDataAlteracaoHorarioDW, I've named intDiaSemana.
ASKER
The advisor created a whole bunch of statistics, and now the query takes 1 second to run. Seems to be a closed case. Thanks once again.
Cool!
How many rows in each of the tables?