Stefan Lennerbrant asked:

VERY LARGE TABLE - Performance - Design - Insert - Delete

Hi Everyone!

We have a very large table (100-200 million rows, or even more) in an MS SQL Server 2000 database.
Each row contains a timestamp, and we have a clustered index on that timestamp. (Please keep this clustered index in mind when answering the questions.)
Our recovery model is set to simple, as we do not care about the transaction log.
Many new records (about 1 million/day) are inserted into the table 24/7.

Question 1: When we delete old data it takes a very long time. What can we do to speed up the deletion of old data? Does the server have to move rows because of the clustered index (in my eyes this should not matter, because we are always deleting from the beginning of the table and inserting at the end)? We cannot TRUNCATE the table, because we do not want to delete all the data, only the oldest!

Question 2: Does the table size matter? Can we grow one table in the database up to 1 TB without affecting performance, or would we gain from splitting the one big table into several smaller ones (currently we only have two disks in RAID-1)? I understand that we would gain from this if we had more disks, but does it matter when we effectively only have one?

Question 3: Is it important (do we need it at all) to rebuild the indexes if we always delete from the beginning and insert new data at the end? In my opinion this should not be necessary: since the clustered index is on the timestamp, 8 KB pages and 64 KB extents should be deallocated at the beginning while new extents are allocated at the end.

Question 4: Do we gain performance by NOT inserting data at the same time as we are doing deletes? I think this should not matter much, because again, new data is inserted at the end of the table and old data is deleted from the beginning.

Question 5: On rare occasions I have seen a lock resource error (native error 1204) when inserting and deleting at the same time. I find this strange, because the need for locks should not be that big when only two threads are running in the database at the same time: one inserting data at the end of the table and one deleting it at the beginning. Any comments on why this error occurs?

PLEASE help me with the above questions!!!
I am very grateful for all the help I can get!

Best Regards
/Stefan
bragn:

Hi stefanlennerbrant

Is the clustered index the only index?
The clustered index should be chosen based on how the data is accessed in SELECT statements, I should think.

A large number of indices slows down deletes, inserts and updates considerably. Consider fewer indices.

You're creating a very large transaction with your deletion, and that degrades performance.
Consider deleting in smaller chunks, i.e. a smaller date/time range in each transaction.
I think the server might simply run out of locks when you do your deletes, and that would explain error 1204.

Best regards,
bragn.
Stefan Lennerbrant (asker):

To bragn,
Thanks for your answer.

The clustered index is not the only index; we also have some nonclustered indexes.
Unfortunately we need all the indexes we have; they have been carefully selected to support searches of the data (which are done rarely, and not at the same time as deletes).

And we are already deleting in small chunks (1,000 rows at a time) to avoid locking out the inserts.
So we are not running one large transaction; the delete thread loops and deletes 1,000 rows per iteration, and I still find it strange that we run out of locks!
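For reference, the loop does roughly this (table and column names here are placeholders for our real schema):

declare @from int, @to int
select @from = 20040101, @to = 20040131   -- dates are stored as integers

set rowcount 1000   -- SQL 2000: cap the rows affected per statement
delete from EventLog where EventDate >= @from and EventDate <= @to
while @@rowcount > 0   -- each delete commits on its own
    delete from EventLog where EventDate >= @from and EventDate <= @to
set rowcount 0   -- reset the session limit afterwards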

Best Regards
/Stefan
Hi Stefan.

You are right.  My theory on the locks does not hold water in that case - unless you're not committing between deletions or the server is badly configured.

These pages might be helpful....
http://doc.ddart.net/mssql/sql2000/html/trblsql/tr_reslsyserr_1_6gxg.htm
http://support.microsoft.com/default.aspx?scid=kb;en-us;224453
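
If the server really has run out of locks, you can check and raise the server-wide 'locks' setting; a minimal sketch (the value is an example only, not a recommendation):

exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'locks'          -- 0 means dynamic lock allocation
exec sp_configure 'locks', 20000   -- example value only
reconfigure with override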


Best regards,
bragn.

To hkamal,

Thanks for your thoughts.

Yes, it is a large table!

1: We do not report on the data much. Inserts go continuously into the end of the table, thanks to the clustered index. Why do you question the use of clustering? What do you mean by "contention on inserts"?

2: OK, I will have to read up on vertical partitioning and what it means! What would we gain by using vertical partitioning?

Ps. We never do any updates! Only inserts and deletes!

Best Regards
/Stefan
I questioned the use of clustering because it has a higher overhead, which is only worth it if your (SELECT) queries benefit from it. You didn't say whether you tend to do range scans (which I assume you do). There is a very good paper on index costings; for range scans, a non-clustered index is better.

Vertical partitioning, I think, would be VERY beneficial for you, especially as you seem to delete regularly from the bottom of the table, which suggests your data ages. Of all the methods described here, I suspect partitioning will have the biggest impact in your case.

I didn't mean updates in the SQL sense; I refer to deletes and inserts collectively as updates.

For updates, try to reduce the round-trips from client to server, possibly by encompassing the entire transaction in a stored procedure so it remains server-bound.

Use explicit transactions rather than implicit transactions. You may have an implicit transaction open without your knowledge, which hurts concurrency (in a looped delete/insert, an implicit transaction can stay open, holding locks and blocking other users from the data). Run the DBCC OPENTRAN command from Query Analyzer to find out more.
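
A sketch of both points together (procedure, table and column names here are made up): keep the loop server-bound in a procedure, commit each chunk explicitly, and use DBCC OPENTRAN to confirm nothing stays open:

create procedure dbo.PurgeOldRows @cutoff int as
    set nocount on
    set implicit_transactions off   -- make sure nothing stays open by accident
    set rowcount 1000
    while 1 = 1
    begin
        begin tran   -- explicit, short-lived transaction per chunk
        delete from dbo.EventLog where EventDate < @cutoff
        if @@rowcount = 0
        begin
            commit tran
            break
        end
        commit tran   -- locks released after every chunk
    end
    set rowcount 0
go

dbcc opentran   -- reports the oldest active transaction, if any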

If you have other indexes, check that they're not splitting, and use PAD_INDEX.
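
For instance, in SQL 2000 syntax (index, table and column names are placeholders):

-- pad the intermediate index pages as well as the leaf level,
-- leaving room that reduces page splits under constant inserts
create nonclustered index IX_EventLog_Source
    on dbo.EventLog (SourceId)
    with pad_index, fillfactor = 80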

Just a few things to check

Thanks hkamal,

where can I find the paper you mention ("There is a very good paper on index costings and for range scans, a non-clustered index is better")?

We are definitely doing range scans when deleting, and sometimes when searching.

/Stefan
Unfortunately I had the PDF but cannot locate the original URL. After a detailed description of each index type, it had a table of various scenarios along with processor costings.

The thinking is this: with a clustered index, the server locates the first matching record and (boosted by SQL Server's read-ahead feature) reads successive records, thus requiring fewer pages. With a range query it cannot benefit from this, as only a few records will match per page, yet it still has to load the leaf level, causing more pages to be read into memory (compared to a tiny one-column non-clustered index, which needs far fewer pages for the same range).

Re my earlier point about contention: O'Reilly's Transact-SQL Optimization and Tuning suggests:
Avoid creating a clustered index on a monotonically increasing key [the timestamp in your case]. It forces all new rows to go to the end of the table... such indexes force all users inserting data to compete for the last page of the table (which is why I said contention; however, if your inserts are done in batch mode rather than multi-user interactive mode, this is less of an issue).

HTH
Hi all

The data is stored around the clustered index. That is, the leaf level of the index contains the data as well as the index key, so when you're scanning the clustered index the data is read as well. Not only that, it's read in page-sized chunks, so if you want surrounding data (that is, the data that sorts next to the surrounding index keys) the server doesn't need to read another page (up to the limit of the page size, of course).
In practice, the clustered index decides how the physical data is ordered.

So if your SELECT statement is like:
select * from table where table.column1 > 1 and table.column1 < 100
then you benefit from a clustered index on column1.

You're dealing with a much larger database and a lot more inserts/deletes than I'm used to, but I think the locks, transactions and index maintenance are much more time-consuming than the seek itself for those inserts/deletes.
A good clustered index, on the other hand, is very effective for SELECTs.

Inside Microsoft SQL Server 2000 by Kalen Delaney takes a good look at these matters.

Best regards,
bragn.

stefanlennerbrant, as I mentioned originally, you may also wish to check your locking strategy. If you have row-level locking, set it to page locking from the outset and avoid escalation straight to a table lock; this is quite common for fighting deadlocks. Or, if you know in advance that a query will exceed the configured page-lock threshold beyond which the optimizer escalates to a table lock, you may wish to pre-empt the page-lock allocations and specify a table lock from the outset.
Check out the PAGLOCK hint as well (which lets you keep reporting on historical data without hindering the users adding to the end of the table).
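A hypothetical example of the hint applied to the chunked delete (table and column names are made up):

declare @cutoff int
set @cutoff = 20040101

set rowcount 1000
delete from dbo.EventLog with (paglock)   -- page locks from the outset
    where EventDate < @cutoff
set rowcount 0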
I would consider horizontal partitioning, not vertical. It all depends on whether you have a date column (or another column) that is a prime candidate for the partitioning column. As for RAID, RAID 10 is your best bet for delete/update operations; the write speed is awesome. Vertical partitioning splits the less frequently used columns into another table; horizontal partitioning splits the rows between tables.

You also want to make sure your log file is on a different RAID group/disks than your data files.
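
In SQL 2000, horizontal partitioning is usually done with a partitioned view over member tables. A minimal sketch, with made-up names:

-- one member table per month, each with a CHECK constraint on the
-- partitioning column (which must be part of the primary key)
create table dbo.EventLog_200401 (
    EventDate int not null check (EventDate between 20040101 and 20040131),
    EventId   int not null,
    Payload   varchar(255) null,
    constraint PK_EventLog_200401 primary key clustered (EventDate, EventId)
)

create table dbo.EventLog_200402 (
    EventDate int not null check (EventDate between 20040201 and 20040229),
    EventId   int not null,
    Payload   varchar(255) null,
    constraint PK_EventLog_200402 primary key clustered (EventDate, EventId)
)
go

-- the view presents the members as one table; the optimizer uses the
-- non-overlapping CHECK constraints to touch only the relevant member
create view dbo.EventLogAll as
    select EventDate, EventId, Payload from dbo.EventLog_200401
    union all
    select EventDate, EventId, Payload from dbo.EventLog_200402
go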
"If you have datarows locking, set it to page-lock from the outset and avoid escalation straight to table-lock. This is quite common for fighting deadlocks."

It's also a good way to create deadlocks.  Personally, I would use Profiler and Perfmon to figure out what's causing the excess locking in the first place.  Lots of disk I/O (which I suspect you have on a large delete) will cause excess locking.
arbert, my comment that you quoted came directly from O'Reilly. We can argue the merits of each locking scheme for some time; ultimately it's down to what works for this application.
stefanlennerbrant, of course I was referring to horizontal partitioning all along (hence my references to archiving and the logic for key pointers to different date-range archives). Vertical partitioning is when you move some columns into a different table, but keep the same number of rows.

Chill, I wasn't attacking...

"we can argue the merits of each locking scheme for some time, ultimately it's down to what works for this application."

I'm not arguing anything--I'm simply saying don't put a band-aid on a broken leg.  Changing the locking methods, for the most part, simply masks a problem...Just because it's in black and white doesn't mean it's always worth printing...
hmm, it also looks like you have multiple accounts, nexus--that's a violation of the EE agreement...
Arbert, do you know if the clustered index has to be rebuilt much when doing the delete as described?
(The clustered index is on a date, stored as an integer, and the delete is done in a range, where we first do:
set rowcount 1000
delete from zz where date >= yy and date <= xx)

/Stefan

Saying "rebuilt" probably isn't the correct term; it has to be maintained.  The overhead isn't all that great--it just depends on the number of rows.  That's where partitioning would really help you a lot.  Partition on the date column.
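
With a partitioned view like the one sketched above, purging old data stops being a row-by-row delete altogether: redefine the view and drop the oldest member (names again are the made-up ones from the sketch).

alter view dbo.EventLogAll as
    select EventDate, EventId, Payload from dbo.EventLog_200402
go
drop table dbo.EventLog_200401   -- instant deallocation, no row-by-row delete
go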