Solved

rebuilding indexes takes a long time, how can I make it faster?

Posted on 2006-11-02
23
2,239 Views
Last Modified: 2012-05-05
I have a query that loops through all the tables in my database and rebuilds the indexes. Is there a way to make this run faster? Is there a way to force it to use the tempDB when rebuilding the indexes?


DECLARE @TableName varchar(255)

DECLARE TableCursor CURSOR FOR
    SELECT name FROM sys.tables WHERE name LIKE 'src%'

OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
    -- rebuild all indexes on the table with a fill factor of 90
    DBCC DBREINDEX(@TableName, ' ', 90)
    PRINT @TableName
    FETCH NEXT FROM TableCursor INTO @TableName
END

CLOSE TableCursor
DEALLOCATE TableCursor
Question by:brokeMyLegBiking
23 Comments
 
LVL 9

Expert Comment

by:dduser
I believe you should not run this reindexing during production hours. What is the schedule for this procedure?

Regards,

dduser
 
LVL 29

Expert Comment

by:Nightman
The only way to make this faster is to put the database into single-user mode and prevent other connections (no locking and blocking means better speed).

This already uses tempdb.

What you COULD do is set the database to simple recovery mode first, rebuild, and then set it back afterwards, although this is not really advisable.
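For what it's worth, on SQL 2005 the ALTER INDEX syntax lets you ask for the sort to run in tempdb explicitly. A minimal sketch, assuming dbo.srcOrders is a placeholder table matching the 'src%' pattern:

ALTER INDEX ALL ON dbo.srcOrders
REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);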
 
LVL 4

Author Comment

by:brokeMyLegBiking
I already have my database in simple recovery mode. How do I put it into single user mode? That would help.

(I run this at design time and then I ship the database out to my customers each month, so I can guarantee that I'm the only one connecting.)

-Joseph
 
LVL 29

Expert Comment

by:Nightman
If you are already the only one connecting, don't worry about single user mode (but if you still want to, you can run the following).

exec sp_dboption 'DatabaseName','single user',true

If there is more than one open connection this will fail.
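On SQL 2005 the equivalent ALTER DATABASE syntax works too. A sketch ('DatabaseName' is a placeholder; WITH ROLLBACK IMMEDIATE kicks off other connections instead of failing):

ALTER DATABASE DatabaseName SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
-- ... run the index rebuilds here ...
ALTER DATABASE DatabaseName SET MULTI_USER;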
 
LVL 13

Expert Comment

by:Wizilling
If you are concerned that reindexing takes a long time and your users suffer because of the downtime, then you may consider reindexing each table without using a loop.
I mean, list all your tables, find out which ones are big and would therefore take the most time, and
DBREINDEX them one by one, starting with the biggest, in portions spread throughout the week.

Or you can go the other way on big tables, which is defragging your indexes - that doesn't lock your tables.
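A sketch of the defrag call, with placeholder database/table/index names:

-- defragments the leaf level of one index while holding only short-lived locks
DBCC INDEXDEFRAG ('MyDatabase', 'dbo.srcOrders', 'IX_srcOrders_CustomerID');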
 
LVL 29

Expert Comment

by:Nightman
Defragging will only re-organise - it does not free up unused pages and extents.

Defragging reduces fragmentation, but does not improve page density. This means that a partially empty page still has to load into memory (8K) even if only half of the page is filled. Reindexing solves this.

Reindexing also updates statistics with a full scan, which makes it better in ALL regards (and since statistics are the basis for the query optimiser's algorithm, this is invaluable).
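If you want to check both figures yourself before deciding, a sketch using the SQL 2005 DMV (avg_page_space_used_in_percent is the page density; 'SAMPLED' mode is assumed):

SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.avg_page_space_used_in_percent,  -- page density
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;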
 
LVL 4

Author Comment

by:brokeMyLegBiking
Nightman, that is a good point about freeing up space.

But I never mentioned anything about defragging. I am only reindexing.
DBCC DBREINDEX()  is the statement I am running

Is there a way to defrag a SQL Server database file? (Does a regular disk defragger program do this?)



 
LVL 4

Author Comment

by:brokeMyLegBiking
One last question: what is the difference between "reorganizing" and "reindexing" an index? These are the two options in SSMS, but when I do the reorganize it doesn't seem to make a difference after I've reindexed. Also, it only shows a fragmentation %; it doesn't show a "disorganization" percent.
 
LVL 29

Assisted Solution

by:Nightman
Nightman earned 400 total points
No - reindex, checkpoint, backup the logs (and then, if you want, you can shrink the database as well).

I imagine that placing your log files and tempdb on separate, dedicated, faster striped disk arrays would make this process faster (but costlier).

Of course, if your tempdb and log disks are already fragmented that will just make things slower (but only if they are significantly fragmented).

More memory would also help, and faster CPUs (preferably with more cache - e.g. going from 1MB to 2MB of L2 cache could give you up to a 20% performance improvement).

How about a bigger server :)
 
LVL 4

Author Comment

by:brokeMyLegBiking
I just bought a brand new Dell Precision 690 with 2 GB of memory, a dual-core 2.9 GHz processor and the fastest SATA drive I could find. That's the problem :)

My processes actually run slower on my new machine than they did on my old single-core 3.2 GHz, 1 GB machine. I'm not sure why, but that's a separate issue. (I'm going to buy a new SCSI hard drive to see if that can help me out.)
 
LVL 29

Accepted Solution

by:Nightman
Nightman earned 400 total points
Well, BEFORE you go buy a bigger box:

1. See if changing MAXDOP (max degree of parallelism) to 1 (only allows 1 processor per process) improves this - a sketch follows this list.
2. Have a look at your performance counters - if the disk counters are fine during this process then you don't need faster disks (http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx for analysing perf counters).
3. SATA is slower than SCSI when you are looking at random seek operations (which this will be to some degree).
4. Consider table partitioning (if you are running on SQL 2005) - indexing across filegroups allows SQL 2005 to parallelise the streams and therefore improve performance - even better when striped across a disk array (have a look at the MSDN whitepaper on partitioning http://www.sqlskills.com/blogs/kimberly/ct.ashx?id=3f2b3136-b702-43c5-8f93-b7671de81c46&url=http%3a%2f%2fmsdn.microsoft.com%2flibrary%2fen-us%2fdnsql90%2fhtml%2fsql2k5partition.asp%3fframe%3dtrue).
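For point 1, a minimal sketch of the server-wide MAXDOP change (sp_configure needs 'show advanced options' first; note this affects every query on the instance, not just the rebuild):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;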
 
LVL 18

Assisted Solution

by:ShogunWade
ShogunWade earned 50 total points
"runs slower on my new machine than they did on  my old single core 3.2 ghtz"

try tweaking the max degree of parallelism.  or the affinity mask so the plans arent parallel.
 
LVL 29

Assisted Solution

by:Nightman
Nightman earned 400 total points
Also, there is a post SP1 hotfix for SQL 2005 (http://support.microsoft.com/kb/918222/en-us) - supposedly they have improved performance on multi-core machines (I will be benchmarking this over the next few weeks)
 
LVL 10

Assisted Solution

by:AaronAbend
AaronAbend earned 50 total points
I have actually never observed a case where reindexing resulted in better performance. In all cases I have seen, as noted in Nightman's posting, it is the updating of the stats with a full scan that is responsible for any improvements. So you may find that updating the stats with a small sample will get you benefits equal to the reindex with much less impact on your system.
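A sketch of what that would look like (the table name is a placeholder):

-- sampled update: cheap, often good enough
UPDATE STATISTICS dbo.srcOrders WITH SAMPLE 10 PERCENT;

-- full scan: what a rebuild effectively gives you, at a higher cost
UPDATE STATISTICS dbo.srcOrders WITH FULLSCAN;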

So approach this as a performance problem (which is where this question seems to have gotten you with some EE guidance). Run a profile - determine which queries are running slow. What is running slow - inserts, updates, deletes, or selects? (Reindexing obviously can't help inserts!)

As for hardware -  I have found that buying more memory is the cheapest way to make a database run faster. Certainly faster than hiring a tuning expert - though both are good!



 
LVL 29

Assisted Solution

by:Nightman
Nightman earned 400 total points
Note: re-indexing (instead of defragging) on heavily fragmented indexes WILL give a significant improvement:

1. It updates the statistics (and a full scan is always great). Reason: the SQL query optimiser uses the statistical information as an indicator of which page the data you are looking for resides on. The more accurate the statistics, the fewer IOs required to retrieve the data, and the better the performance. Statistics are EXTREMELY important.

2. SQL Server stores data in 8K pages. Whenever data is retrieved the page needs to be loaded into memory. Frequently accessed data usually remains in the cache (unless your server is under memory pressure). Fragmented data pages result in inefficient memory use. More memory will help - but as your indexes grow and become more fragmented you are simply reducing the hit ratio of data in the cache, as SQL has to flush pages from memory to make space for other data. This results in more disk access (because if the data isn't in cache it has to go to disk).

If you have an 'off-peak' period where you can do this I would strongly advise it.

Note though that re-indexing is a fully logged operation, and also makes heavy use of tempdb for sorting. I would recommend that you make sure both tempdb and the log files have enough space pre-allocated before doing this, otherwise they will autogrow - which is notoriously slow. Also, before you start, checkpoint and then back up the log. Then reindex. Then checkpoint. Then back up the log again. This ensures that any committed transactions are flushed from the log. While the file size will not shrink, it will flush the committed data from the log and leave the space free for future transactions to use.
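A sketch of that sequence, with placeholder database, table and backup path names:

CHECKPOINT;
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.bak';

DBCC DBREINDEX ('dbo.srcOrders', ' ', 90);  -- or loop over the tables as in the question

CHECKPOINT;
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.bak';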
 
LVL 4

Author Comment

by:brokeMyLegBiking
Avoiding autogrow by preallocating space is a good idea. I'll do that along with everything else.
Lots of good tips. I read the whole "Troubleshooting Performance Problems in SQL Server 2005" article - great article, interesting.
 
LVL 4

Author Comment

by:brokeMyLegBiking
Excellent advice everyone, I very much appreciate the input. SQL Server sure is a deep piece of software.
 
LVL 10

Expert Comment

by:AaronAbend
Adam (Nightman), do you have benchmarks that demonstrate improvement after reindexing? And is there any difference if the table is clustered or heaped?  I have tried to create such a benchmark and have failed to create a scenario where reindexing improved performance, which is why I say there is not much benefit from the reindex.  I agree with your logic, but I think that cases where fragmentation is so great that it significantly degrades performance are rare.  A database with lots of deletions (for archiving purposes) might be such a candidate, but I have not worked with many databases like that. Archiving is such a complex task due to referential integrity that disk and memory are often cheaper solutions.

I have definitely seen cases where a full scan was necessary to create suitable statistics - i.e., a small sample (<5%) is not always sufficient and once you sample more than 25% or so you may as well scan the whole table based on the time-cost/benefit.  
 
LVL 29

Expert Comment

by:Nightman
Aaron - I'm actually busy on a project to optimise a high-volume OLTP system with mixed reporting requirements - busy deriving stats at the moment - and I will post some good benchmarks for you (solid stats for a clustered server processing around 3,000 batches per second, 24/7!).

If you want to create a benchmark, you have to hammer a large table (1 million+ records), significantly fragment it by randomly deleting 10% of the data and randomly updating another 10%, and then create another one and fragment it. Restart your server (or run one of the DBCC commands to free up your cache). Reindex one of the tables, checkpoint, back up the log and clear your cache. Then compare performance differences with the same statements across the two tables. I would back up the sample database after loading the data (or use a snapshot) to allow you to revert and compare performance with different indexing strategies.
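For clearing the cache between runs, a sketch (these affect the whole instance, so only do this on a test box):

CHECKPOINT;             -- flush dirty pages first so the drop is meaningful
DBCC DROPCLEANBUFFERS;  -- empty the data cache
DBCC FREEPROCCACHE;     -- empty the plan cache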

Heap vs clustered really depends on the type of reporting that you are doing. Usually a heap is a temporary store anyway, so it isn't usually used for transactional throughput such as updates and deletes, so the performance implication on a heap is really negligible as it rarely becomes fragmented. The exception for this is when the heap IS used. Each update to a row actually saves new data on a new page, and then updates the old data to become a forward pointer to the new data. In this way, each call to retrieve data from the page (even a 'bookmark lookup') results in an extra IO operation. So heaps are a bad idea (unless used for staging) and will result in a performance cost later

Remember that the ideal candidate for a clustered index follows these rules:
1. Unique
2. Narrow
3. Static!
4. Monotonically increasing

If you don't have updates to the clustered index, and are seldom deleting, the cluster does NOT fragment - continued good performance. But what about your non-clustered indexes? Invariably this data is not static, and as a result these indexes also fragment. Because your non-clustered covering indexes (which should include foreign keys) are critical to good reporting performance, as they become fragmented performance degrades. There is a greater IO cost in retrieving the data, pages are not optimally packed, and there is a memory overhead as well (as I mentioned above). On lower-volume databases this is seldom even seen, but as the throughput increases, performance degrades exponentially.
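For reference, a sketch of a covering non-clustered index on SQL 2005 (table and column names are made up for illustration):

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue);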

And the reality is that while EVERYONE knows that you need a solid professional DBA to administer an Oracle system, it seems that for MSSQL the assumption is made that Bob the accountant is quite capable of managing the server because he has an accounting database - the result being that I've seen more poorly designed databases in my time than I have good ones (such as the company whose indexing strategy was to create an index on every column of a heap, or the poor fools that had a volatile table clustered on a non-sequential uniqueidentifier).

Interestingly enough, if you don't have your own clustered index on a table (i.e. a heap), SQL 2005 actually creates one for you internally - it uses this value as the RID (row identifier) for any lookups that it needs - so is a heap really a heap?
 
LVL 29

Expert Comment

by:Nightman
Ack! Just reread my post - it sounds like I'm giving a lecture, doesn't it?

Sorry, wasn't the intention ;(
 
LVL 10

Expert Comment

by:AaronAbend
No problem! While your response contains much that I am naturally already aware of (freeing up the cache when benchmarking), it is always best to include info like that for other EE users who find the thread. I actually learned a few things from your post.

I agree on 2 issues in particular. 1) Is a heap really a heap? I think it is really better to think of it as the "default cluster". 2) The issue about Oracle vs MS DBAs is one I tell a lot of people about. MS makes it easy for novices to do quite a bit. I think there are plenty of MS experts out there (especially here on EE), but there are certainly a lot more newbies doing work that needs more expertise. Of course that's where all our points come from, so we can't complain too much! OK, this thread is done!
 

Expert Comment

by:gkaffen338
Can rebuilding an index take 48 hours? I have a 210 GB DB that's been unresponsive since the index rebuild started, and it's been almost 48 hours! I can't remember how long it used to take because it's been a long time. Is there a way to check it?
 
LVL 10

Expert Comment

by:AaronAbend
Of course it could take 48 hours. And the size of the database in GB is (almost) totally irrelevant, so I am not sure why you mention it. How about the number of records? A 210 GB database with one table and one record - yes, it is taking too long. A 210 GB database with 80 tables, 100 billion records in total, and 200 indexes - I would not be surprised to see 48 hours.

Read this thread. Why are you reindexing? Do you have a benchmark you will check after you rebuild the index to see if it improved performance? Do you have specific problems you are trying to solve with reindexing, or is this the only thing you know how to do to improve performance? (It is the very last thing I would do to improve performance - in fact, in 25 years of using SQL I have never rebuilt indexes to improve performance; I have improved the indexes instead.)

Anyway if you want additional help I suggest posting it as a new question and you will get more responses.
