SQL Join Query on large table getting slower

Posted on 2010-09-16
Last Modified: 2012-05-10
We are trying to update a very large SQL table (about 150 million records), by joining it with a smaller table (90,000 records), using SQL Server Workgroup 2008.

The query we are running is:
Update table1
set Bad_Account = table2.Bad_Account
from table1 inner join table2 on table1.LOCNUM = table2.LOCNUM
where table1.EVENTID >= 438651 and table1.EVENTID <= 439183

Table 1 is indexed with a nonclustered index only on the LOCNUM field. Table2 has a primary key on LOCNUM.

The query takes several hours to run. The reason that we have to select certain EVENTID's is because we have limited space on the server, and the resulting log file output would be too large if we tried to run all EVENTIDs at the same time.

What we do is:
-Run the above query for a range of about 500 EVENTID's
-Shrink the database manually in SQL server management studio
- Run the query again with the next 500 EVENTID's.

We have noticed that the query takes longer and longer to run each time. The first time it took about half an hour. By the time we ran the last query it took over 3 hours.

I suspect that there could be an issue with the indexes becoming fragmented. However, the query to initially create the index took 16 hours, so I don't think it would make sense to re-index very frequently.

Do you have any advice as to why the query may be taking longer each time, or how we could optimize the query in general?

Question by:jnelson219
  • 3
  • 2
LVL 11

Expert Comment

ID: 33696850
if you could, I think you should try create new index on table1 to support this query

create index of EVENTID (and check query plan, if it isn't uses this index, we should give it INDEX HINT)
index by 2 fields  LOCNUM and EVENTID

since current situation, you have index only on LOCNUM,
even it help to narrow down the data to delete.
but in case data of each LOCNUM still large amount,
change the index column or got more index column should provide better result.
LVL 11

Expert Comment

ID: 33696902
and  instead of delete base on eventid,
why can't just delete it?

using TOP or SET ROWCOUNT to help
so you can remove (and ignore) WHERE condition of EVENTID

another note is, should not try to SHRINK database every loop of running, it's waste of time and not provide much help

Author Comment

ID: 33697038
That is a good idea. Let's say we wanted to break the query into segments of 10 million rows.
I understand how we could run the query to only execute for the first 10 million.

Set rowcount = 10000000;
Update table1
set Bad_Account = table2.Bad_Account
from table1 inner join table2 on table1.LOCNUM = table2.LOCNUM

But what would happen the second time that we ran the query -- how would we tell SQL to only update the rows that have not yet been updated?

I agree - I would like to not SHRINK the database each time, but the log file is taking up almost all of the space on our small server, and this is the only way I know to reduce the size of the log file (set to simple recovery mode).
LVL 11

Accepted Solution

JoeNuvo earned 250 total points
ID: 33697930
Oh, I get a point now.
I misconception with my own work on large table.
But that time it's the delete task, so it will never delete the same rows twice :-p

for your case. my suggestion is
1) either new index to support the query for EventID or LOCNUM + EventID (and uses your method of block update by eventid)
    but this way, you need to have enough disk space for index.
2) make a loop, get 1 LOCNUM from table2 for each round, and update that LOCNUM on table1
    this is expecting 1 LOCNUM should not have data more than 10 million (base on your rowcount setting)

Anyway, how much of space left for your log?

extra note : since you are perform update over and over.
LOG will consume back the space after you shrink it.
should be better to finish all the update and shrink after.

Author Closing Comment

ID: 33737934
The response did not solve the slow running of the query, but it was helpful information for optimizing the query execution.

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

     When we have to pass multiple rows of data to SQL Server, the developers either have to send one row at a time or come up with other workarounds to meet requirements like using XML to pass data, which is complex and tedious to use. There is a …
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …
Email security requires an ever evolving service that stays up to date with counter-evolving threats. The Email Laundry perform Research and Development to ensure their email security service evolves faster than cyber criminals. We apply our Threat…

773 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question