Solved

Does a clustered index optimize an update process?

Posted on 2006-10-30
465 Views
Last Modified: 2012-05-05
I'm trying to understand how SQL Server will actually access the data during an update. I've read that clustered indexes are fast for reading ranges but that individual reads get no performance boost over a non-clustered index. In an update, does the "read head" (or whatever does the seeking) move between the source and target for each record regardless of clustering, so that there is no performance gain over a regular index?

Background: I'm developing a system with retiree information. I get an updated Retiree file each month from HR's mainframe with updated addresses etc. HR uses RetireeID as their key field, which is a composite of several numbers and letters. I update my Retiree table, which contains the same fields as the HR file. I'm proposing to use RetireeID as the primary key on my table, as a clustered index, rather than an identity field.
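Roughly what I have in mind (a sketch - every column except RetireeID is a made-up placeholder):

    CREATE TABLE Retiree (
        RetireeID VARCHAR(20)  NOT NULL,   -- HR's composite key of numbers and letters
        LastName  VARCHAR(50)  NULL,       -- placeholder columns mirroring the HR file
        Address1  VARCHAR(100) NULL,
        City      VARCHAR(50)  NULL,
        CONSTRAINT PK_Retiree PRIMARY KEY CLUSTERED (RetireeID)
    )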

Note: my system does not capture data through a GUI.

Thanks!
Question by:FatalErr
6 Comments
 
LVL 10

Expert Comment

by:AaronAbend
ID: 17838820
How many records are you talking about? In general, it is a good idea to use a clustered index to support the most common queries.

There is an update hit, especially on large updates. With a clustered index, the hit is due to the fact that the data is physically organized in index order, and that ordering has to be maintained. With a nonclustered index, SQL inserts into the table and then updates the index - two separate acts, as you surmise.

If you are replacing all of the records in a table, drop all indexes before inserting, then add indexes. That's a general rule.
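Something along these lines (table and index names are placeholders, and the load step is whatever you use - BULK INSERT, DTS, etc.):

    -- Drop indexes before a full reload, then recreate them afterwards.
    DROP INDEX Retiree.IX_Retiree_LastName            -- nonclustered indexes first
    ALTER TABLE Retiree DROP CONSTRAINT PK_Retiree    -- then the clustered PK

    -- ... bulk load the monthly file here ...

    ALTER TABLE Retiree ADD CONSTRAINT PK_Retiree
        PRIMARY KEY CLUSTERED (RetireeID)
    CREATE NONCLUSTERED INDEX IX_Retiree_LastName ON Retiree (LastName)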

But how many records are you talking about? And what's the hardware platform? As I often quote, "premature optimization is the root of all evil" - usually attributed to Donald Knuth.
 
LVL 1

Author Comment

by:FatalErr
ID: 17839202
I've got about 250K records.
I don't know how much RAM - at least a gig. It is a dual-processor box; I don't know more right now.

Inserts are slowed by a clustered index, but I'm focusing just on the update process.

The cluster would really increase performance if multiple records were updated sequentially rather than one at a time. I don't know if that's what happens.
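For context, the monthly update itself is essentially this (RetireeStaging is a made-up name for the loaded HR file):

    -- Assumes the HR extract has already been loaded into a staging table.
    UPDATE r
    SET    r.Address1 = s.Address1,
           r.City     = s.City
    FROM   Retiree r
           JOIN RetireeStaging s ON s.RetireeID = r.RetireeID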
 
LVL 10

Accepted Solution

by:
AaronAbend earned 500 total points
ID: 17839449
I have been tuning SQL Server for 20 years and I could not make that statement about update performance. I have always found that my assumptions about what the database "should" do fast or slow are consistently inaccurate, to the point where I rely only on carefully run benchmarks.

The impact of the cluster on performance has nothing to do with sequentially updating the records, but rather with how much time the optimizer has to spend finding those records prior to the update. So, as long as the columns in the WHERE clause are indexed, you should get great performance, and the type of index will not make much of a difference. Most of my current understanding of SQL is based on actual benchmarks on a stable SQL 2000 system with 30 million to 500 million records. As far as I know, records are updated one at a time regardless of whether there is a cluster or not. Remember that 99% of the operations you do in SQL happen in memory, not on disk. What gets written to disk at commit time is the log record describing the change, not the modified data pages themselves - those are flushed later by checkpoints.

A tremendous amount of update performance comes down to log writing. Get an extra controller to write the log file and updates will fly. If there is disk contention between the log writer and the process that is looking for your records, you will probably see slower performance.
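On SQL 2000, one way to put the log on its own drive/controller after the fact is detach/attach (the database name and paths here are placeholders):

    EXEC sp_detach_db 'RetireeDB'
    -- ...physically copy RetireeDB_log.ldf to the drive on the dedicated controller...
    EXEC sp_attach_db 'RetireeDB',
        'D:\SQLData\RetireeDB.mdf',
        'F:\SQLLogs\RetireeDB_log.ldf'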

I just did a little benchmark: created 200,000 test records on my P4 2GHz duo with 2GB RAM, then ran an update statement that updated 16,000 of the records (about 8%).

newly created table with no indexes at all
   update column1 from 'A' to 'B': 6 seconds
create clustered index on column1
   update column1 from 'A' to 'B': 71 seconds (wow! didn't expect that! - ran the whole thing a second time to make sure - 67 seconds!!)
   update column2 where column1 = 'B': 6 seconds
   rerun for a different column without clearing the buffer pool: unmeasurable (instantaneous)
create nonclustered index on column1 (had to drop the cluster first, of course)
   update column1 from 'B' to 'A': 8 seconds
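If you want to repeat it, the script was along these lines (table and column names re-created from memory, not the exact script):

    CREATE TABLE BenchTest (
        id      INT IDENTITY(1,1) NOT NULL,
        column1 CHAR(1) NOT NULL,
        column2 CHAR(1) NOT NULL DEFAULT 'X'
    )

    -- Populate 200,000 rows, most of them 'A'.
    DECLARE @i INT, @t DATETIME
    SET @i = 0
    WHILE @i < 200000
    BEGIN
        INSERT INTO BenchTest (column1)
        VALUES (CASE WHEN @i % 5 = 0 THEN 'Z' ELSE 'A' END)
        SET @i = @i + 1
    END

    -- 1. Heap, no indexes.
    SET @t = GETDATE()
    UPDATE BenchTest SET column1 = 'B' WHERE column1 = 'A'
    PRINT DATEDIFF(ms, @t, GETDATE())

    -- 2. Clustered index on the updated column: rows physically move.
    CREATE CLUSTERED INDEX ix_bench ON BenchTest (column1)
    SET @t = GETDATE()
    UPDATE BenchTest SET column1 = 'A' WHERE column1 = 'B'
    PRINT DATEDIFF(ms, @t, GETDATE())

    -- 3. Nonclustered index instead.
    DROP INDEX BenchTest.ix_bench
    CREATE NONCLUSTERED INDEX ix_bench ON BenchTest (column1)
    SET @t = GETDATE()
    UPDATE BenchTest SET column1 = 'B' WHERE column1 = 'A'
    PRINT DATEDIFF(ms, @t, GETDATE())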


So you see - it is really hard to predict, even for an expert. Do the benchmarks, and once you have a performance problem, use Query Analyzer to figure out what might help. A great resource, besides EE, is http://www.sql-server-performance.com
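In Query Analyzer, the statistics options are usually the quickest way to see where the time is going before digging into the graphical plan (BenchTest here is the test table from the sketch above):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    UPDATE BenchTest SET column2 = 'Y' WHERE column1 = 'B'
    -- The Messages pane now shows logical reads and CPU/elapsed time per statement.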


 
LVL 10

Expert Comment

by:AaronAbend
ID: 17839469
One error in my post - the queries were updating 160,000 records - almost 80% of the table! Not typical, of course. The "shape" of your queries will be a factor in deciding what is important.

 
 
LVL 29

Expert Comment

by:Nightman
ID: 17839964
1. Indexes will significantly improve seek times when trying to find data.
2. They will slow performance on inserts and updates (SQL has to maintain the index as well as the data).
3. Sequential clustered inserts are OK (e.g. 1,2,3,4,5).
4. Non-sequential inserts on data pages with high fill factors will result in page splits (think of the page as where the data is stored; an insert in the middle forces SQL to shuffle the data around to fit the new row in) - this can be a significant overhead on busy servers.
5. Non-clustered covering indexes (i.e. indexes that contain all the columns you are trying to retrieve) will improve read performance, as SQL only has to go down to the index leaf level and never touches the actual data page.
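For example (index names are placeholders; the INCLUDE syntax for point 5 needs SQL Server 2005 - on 2000 you would simply add the extra columns to the key list):

    -- Point 4: leave free space on the pages to absorb non-sequential inserts
    -- (a table can only have one clustered index, so this assumes Retiree
    -- doesn't already have a clustered PK).
    CREATE CLUSTERED INDEX IX_Retiree_ID ON Retiree (RetireeID)
        WITH FILLFACTOR = 80

    -- Point 5: a covering index for a query that only reads RetireeID and Address1.
    CREATE NONCLUSTERED INDEX IX_Retiree_Cover ON Retiree (RetireeID)
        INCLUDE (Address1)    -- SQL 2005+ only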
 
LVL 1

Author Comment

by:FatalErr
ID: 17843673
Thanks for the good info.  

Aaron - you said to get an extra controller to write the log file. Will SQL Server automatically do that, or do you have to direct it in some way? Thanks!
