SQL Join Query on large table getting slower

Posted on 2010-09-16
Medium Priority
Last Modified: 2012-05-10
We are trying to update a very large SQL table (about 150 million records), by joining it with a smaller table (90,000 records), using SQL Server Workgroup 2008.

The query we are running is:
Update table1
set Bad_Account = table2.Bad_Account
from table1 inner join table2 on table1.LOCNUM = table2.LOCNUM
where table1.EVENTID >= 438651 and table1.EVENTID <= 439183

Table 1 is indexed with a nonclustered index only on the LOCNUM field. Table2 has a primary key on LOCNUM.

The query takes several hours to run. The reason that we have to select certain EVENTID's is because we have limited space on the server, and the resulting log file output would be too large if we tried to run all EVENTIDs at the same time.

What we do is:
-Run the above query for a range of about 500 EVENTID's
-Shrink the database manually in SQL server management studio
- Run the query again with the next 500 EVENTID's.

We have noticed that the query takes longer and longer to run each time. The first time it took about half an hour. By the time we ran the last query it took over 3 hours.

I suspect that there could be an issue with the indexes becoming fragmented. However, the query to initially create the index took 16 hours, so I don't think it would make sense to re-index very frequently.

Do you have any advice as to why the query may be taking longer each time, or how we could optimize the query in general?

Question by:jnelson219
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
LVL 11

Expert Comment

ID: 33696850
if you could, I think you should try create new index on table1 to support this query

create index of EVENTID (and check query plan, if it isn't uses this index, we should give it INDEX HINT)
index by 2 fields  LOCNUM and EVENTID

since current situation, you have index only on LOCNUM,
even it help to narrow down the data to delete.
but in case data of each LOCNUM still large amount,
change the index column or got more index column should provide better result.
LVL 11

Expert Comment

ID: 33696902
and  instead of delete base on eventid,
why can't just delete it?

using TOP or SET ROWCOUNT to help
so you can remove (and ignore) WHERE condition of EVENTID

another note is, should not try to SHRINK database every loop of running, it's waste of time and not provide much help

Author Comment

ID: 33697038
That is a good idea. Let's say we wanted to break the query into segments of 10 million rows.
I understand how we could run the query to only execute for the first 10 million.

Set rowcount = 10000000;
Update table1
set Bad_Account = table2.Bad_Account
from table1 inner join table2 on table1.LOCNUM = table2.LOCNUM

But what would happen the second time that we ran the query -- how would we tell SQL to only update the rows that have not yet been updated?

I agree - I would like to not SHRINK the database each time, but the log file is taking up almost all of the space on our small server, and this is the only way I know to reduce the size of the log file (set to simple recovery mode).
LVL 11

Accepted Solution

JoeNuvo earned 750 total points
ID: 33697930
Oh, I get a point now.
I misconception with my own work on large table.
But that time it's the delete task, so it will never delete the same rows twice :-p

for your case. my suggestion is
1) either new index to support the query for EventID or LOCNUM + EventID (and uses your method of block update by eventid)
    but this way, you need to have enough disk space for index.
2) make a loop, get 1 LOCNUM from table2 for each round, and update that LOCNUM on table1
    this is expecting 1 LOCNUM should not have data more than 10 million (base on your rowcount setting)

Anyway, how much of space left for your log?

extra note : since you are perform update over and over.
LOG will consume back the space after you shrink it.
should be better to finish all the update and shrink after.

Author Closing Comment

ID: 33737934
The response did not solve the slow running of the query, but it was helpful information for optimizing the query execution.

Featured Post

Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question