Solved

SQL Join Query on large table getting slower

Posted on 2010-09-16
750 Views
Last Modified: 2012-05-10
We are trying to update a very large SQL Server table (about 150 million rows) by joining it to a smaller table (90,000 rows), on SQL Server 2008 Workgroup Edition.

The query we are running is:
UPDATE table1
SET Bad_Account = table2.Bad_Account
FROM table1
INNER JOIN table2 ON table1.LOCNUM = table2.LOCNUM
WHERE table1.EVENTID >= 438651 AND table1.EVENTID <= 439183

table1 has a single nonclustered index, on the LOCNUM column only. table2 has a primary key on LOCNUM.

The query takes several hours to run. We have to restrict it to a range of EVENTIDs because disk space on the server is limited, and the log file would grow too large if we ran all EVENTIDs at once.

What we do is (a sketch of the full loop follows this list):
- Run the above query for a range of about 500 EVENTIDs.
- Shrink the database manually in SQL Server Management Studio.
- Run the query again with the next 500 EVENTIDs.
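
For concreteness, here is a minimal sketch of that loop, folding the steps together. The INT type for EVENTID and the @batch size are illustrative assumptions; the shrink step appears as a comment since we do it manually:

-- sketch of the batching loop; the INT type and @batch size are illustrative
DECLARE @start INT, @maxid INT, @batch INT = 500;
SELECT @start = MIN(EVENTID), @maxid = MAX(EVENTID) FROM table1;

WHILE @start <= @maxid
BEGIN
    UPDATE t1
    SET Bad_Account = t2.Bad_Account
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.LOCNUM = t2.LOCNUM
    WHERE t1.EVENTID >= @start AND t1.EVENTID < @start + @batch;

    SET @start = @start + @batch;
    -- (we shrink the database manually in SSMS here before the next range)
END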

We have noticed that the query takes longer and longer to run each time. The first time it took about half an hour. By the time we ran the last query it took over 3 hours.

I suspect that there could be an issue with the indexes becoming fragmented. However, the query to initially create the index took 16 hours, so I don't think it would make sense to re-index very frequently.

Do you have any advice as to why the query may be taking longer each time, or how we could optimize the query in general?

Question by:jnelson219
5 Comments
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 33696850
If you can, I think you should try creating a new index on table1 to support this query:

- an index on EVENTID (and check the query plan; if it doesn't use this index, add an INDEX hint), OR
- a composite index on the two columns LOCNUM and EVENTID (sketched below).

In the current situation you have an index only on LOCNUM. Even though it helps narrow down the rows to update, if some LOCNUM values still match a large number of rows, changing the index column or adding a second key column should give a better result.
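
A minimal sketch of the composite-index option (the index name is illustrative; EVENTID leads so the range filter can seek, with LOCNUM to support the join):

-- illustrative name; key order chosen for the EVENTID range filter plus the join
CREATE NONCLUSTERED INDEX IX_table1_EVENTID_LOCNUM
ON table1 (EVENTID, LOCNUM);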
 
LVL 11

Expert Comment

by:JoeNuvo
ID: 33696902
Also, instead of deleting based on EVENTID, why not just delete in batches?

Using TOP or SET ROWCOUNT to help, you can remove (and ignore) the WHERE condition on EVENTID entirely (see the sketch below).
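
A minimal sketch of that batching pattern, with an illustrative batch size (note that as written it fits a delete, since a deleted row can never be picked up again):

-- batch size is an illustrative assumption
WHILE 1 = 1
BEGIN
    DELETE TOP (500000) t1
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.LOCNUM = t2.LOCNUM;

    IF @@ROWCOUNT = 0 BREAK;  -- stop once no rows remain to process
END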

One more note: you should not SHRINK the database on every loop. It wastes time and does not help much.
 

Author Comment

by:jnelson219
ID: 33697038
That is a good idea. Let's say we wanted to break the query into segments of 10 million rows.
I understand how we could make the query execute only for the first 10 million:

SET ROWCOUNT 10000000;

UPDATE table1
SET Bad_Account = table2.Bad_Account
FROM table1
INNER JOIN table2 ON table1.LOCNUM = table2.LOCNUM

But what would happen the second time we ran the query -- how would we tell SQL Server to update only the rows that have not yet been updated?

I agree - I would prefer not to SHRINK the database each time, but the log file is taking up almost all of the space on our small server, and this is the only way I know to reduce its size (the database is in simple recovery mode).
 
LVL 11

Accepted Solution

by: JoeNuvo (earned 250 total points)
ID: 33697930
Ah, I see your point now.
I was confusing this with my own work on a large table.
But that was a delete task, so it would never delete the same rows twice :-p

For your case, my suggestions are:
1) Either create a new index to support the query, on EVENTID or on LOCNUM + EVENTID, and keep your method of updating in blocks by EVENTID.
    This way, though, you need enough disk space for the index.
2) Make a loop: get one LOCNUM from table2 each round and update that LOCNUM in table1 (see the sketch below).
    This expects that no single LOCNUM has more than 10 million rows (based on your ROWCOUNT setting).
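
A minimal sketch of option 2, assuming LOCNUM is an INT (the cursor name and type are illustrative):

-- loop over table2 one LOCNUM at a time; the INT type is an assumption
DECLARE @loc INT;
DECLARE loc_cur CURSOR FAST_FORWARD FOR SELECT LOCNUM FROM table2;
OPEN loc_cur;
FETCH NEXT FROM loc_cur INTO @loc;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE t1
    SET Bad_Account = t2.Bad_Account
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.LOCNUM = t2.LOCNUM
    WHERE t1.LOCNUM = @loc;

    FETCH NEXT FROM loc_cur INTO @loc;
END
CLOSE loc_cur;
DEALLOCATE loc_cur;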

By the way, how much space is left for your log?

One extra note: since you are performing updates over and over, the log will consume the space again after you shrink it.
It would be better to finish all the updates and shrink once afterwards (sketched below).
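
If you do that final shrink in T-SQL rather than through SSMS, a minimal sketch (the database name and logical log file name here are placeholders):

USE MyDb;  -- placeholder database name
CHECKPOINT;  -- in SIMPLE recovery, this lets SQL Server mark inactive log space reusable
DBCC SHRINKFILE (MyDb_log, 1024);  -- placeholder logical log name; target size in MB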
 

Author Closing Comment

by:jnelson219
ID: 33737934
The response did not solve the query slowdown itself, but it was helpful information for optimizing the query execution.
