Solved

trigger or double-insert ?

Posted on 2010-11-17
557 Views
Last Modified: 2012-05-10
Hey all,
On SQL 2000, I've got a batch of about 100,000 inserts to a table every 10-15 seconds. There are no performance issues with the writing.

However, I need to make the last few minutes of data available for multiple clients to query in near real time. Querying the table while it is being inserted into would make the selects very slow, and the inserts would be directly affected as well.

I was thinking of making an in-memory temp table and managing the last 30-60 seconds of inserts there. Each row is about 30 bytes, so RAM is a non-issue in terms of the temp-table footprint.

The select can be a VERY DIRTY read, and the data does not have to be committed yet.

Any takes on how to do this?

I can think of 2 possible ways:
1. Parallel double-insert: write to the in-memory temp table and do the real commit in parallel.
2. Insert, with a trigger on the insert doing the additional insert (sketched below).

I'm thinking the trigger in #2 will add large overhead and cause performance issues.

More ways are welcome!

7 columns (all int): C1, C2, C3, C4, C5, C6, C7

A workable example is preferred!
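
To make option 2 concrete, here is the rough shape of what I have in mind (table and trigger names are made up; the real schema differs):

-- main table the bulk inserts go into (hypothetical name/shape)
CREATE TABLE dbo.MainData (C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int)

-- small "recent" copy that clients would query; cleaned up separately
CREATE TABLE dbo.RecentData (C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int)
GO

-- option 2: a trigger copies every inserted row into the recent table
CREATE TRIGGER trg_MainData_CopyRecent ON dbo.MainData
AFTER INSERT
AS
    INSERT INTO dbo.RecentData (C1, C2, C3, C4, C5, C6, C7)
    SELECT C1, C2, C3, C4, C5, C6, C7
    FROM inserted
GO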

TIA
Question by:controlr
23 Comments
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34159411
How about creating a covering index that has the 7 columns in it?

By having the covering index, your users would not actually be hitting the table but would, instead, be hitting the index for their data.

100k inserts every 10 to 15 seconds is a good bit.  When you say "multiple clients", just how multiple are you talking?  (25?  50?  100?  1,000?  10,000?)
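
To illustrate the covering index idea, something along these lines (table and index names are placeholders, since I don't know your schema):

-- a nonclustered index over all 7 columns; readers never touch the base table
CREATE NONCLUSTERED INDEX IX_MainData_Covering
    ON dbo.MainData (C1, C2, C3, C4, C5, C6, C7)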
 

Author Comment

by:controlr
ID: 34159476
About 500 clients at the moment.
The problem with the indexes is that they will slow down the inserts; that was the reasoning for the in-memory approach.
If I index every column, I'm really duplicating the data, not to mention massive disk-surface access issues.
Your idea sounds great in theory, but in practice I think it comes with many penalties.
Unless I'm missing something...?
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34159626
My question would be, are you going to be able to sustain the load of the 500 clients banging the in-memory data, the double inserts including the one into the in-memory data, and the rest of whatever is going on on the server?  

Unless you have some sort of automatic garbage collection (i.e. deletion of "old" data from the in-memory data), you are going to consume memory like it is going out of style, even if you are only holding 7 int columns per row. Do the math:
(7 ints / row) * (4 bytes / int) * (100,000 rows / 10 seconds) = 2,800,000 bytes / 10 seconds
(6 intervals / minute) * (2,800,000 bytes / 10 seconds) = 16,800,000 bytes / minute

The cleanest way to maintain the in-memory table will be to add a DateTimeInserted (default GETDATE()) column to it. That is going to increase the size of the table by at least 28%. Also, the delete is going to take some time to execute, and if it takes more than the 10 seconds between insert batches, that table will grow. (The delete process will also compound the overhead of maintaining the table.)

Unless you have some sort of index on the in-memory table, every access is going to be a table scan.
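
Roughly what I mean (table and column names are placeholders):

-- timestamp each row so "old" rows can be identified and purged
ALTER TABLE dbo.RecentData
    ADD DateTimeInserted datetime NOT NULL DEFAULT GETDATE()

-- periodic cleanup: throw away anything older than a minute
DELETE FROM dbo.RecentData
WHERE DateTimeInserted < DATEADD(MINUTE, -1, GETDATE())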
 

Author Comment

by:controlr
ID: 34159756
1. One of the fields is already stamped.
2. There is garbage collection, and it's very fast. Essentially, there is a new temp table every minute and the tables from the previous 3 minutes are dropped.
That's as fast as it can get.
All the data will be sorted on the client's end, as it needs to see all the data anyhow, so there is no use indexing anything on there.

But I think you just justified my in-memory approach... unless I completely misunderstood you :)

And ~17MB per minute is NOTHING! It's an 8-core box with 16GB; the CPU doesn't even reach 10%.
The main issue here, I think, is disk hits more than anything, and it will be very costly if I do the selects against the actual disk surface.
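
For reference, the rotation is basically this (the real tables carry a minute-based suffix; the names below are simplified):

-- each minute a fresh staging table is created for the incoming rows
CREATE TABLE dbo.Recent_Minute05 (C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int)

-- and the table from a few minutes back is dropped; that is the garbage collection
DROP TABLE dbo.Recent_Minute02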
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34160178
Just watch out for the memory load.  With a bunch of hits being done in memory, you could start getting a lot of CPU Waits.  (What about creating the covering index in memory? ;-)

Good luck and let us know how it works out.
 
LVL 69

Accepted Solution

by:ScottPletcher
ScottPletcher earned 167 total points
ID: 34160288
I assume you've tried SELECT ... WITH (NOLOCK) [allowing dirty reads] from the main table, and that it is not acceptable for whatever reason, such as the INSERTs keeping the page locked even with NOLOCK specified.

What is the clus index on the table?  Maybe you could make the clus index the insert datetime, then specify NOLOCK on the queries.  SQL will use a clus index seek, which should be very fast.  Also, the rows will be inserted in sequential order, so there will be no fragmentation issues, even with the high volume of INSERTs.


>> THinking trigger overhead of #2 will have large overhead and performance issues. <<

Not necessarily; a properly written trigger will almost certainly be less overhead than doing the INSERTs twice yourself.


You could try a global temp table, which should be the least overhead, but INFURIATINGLY, SQL will delete it automatically if it ever, even briefly, has no active connections.
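
As a rough illustration only (column and index names are examples, since I don't know your actual schema):

-- clustered index on the insert datetime: new rows always append at the end
CREATE CLUSTERED INDEX CIX_MainData_InsertDate
    ON dbo.MainData (DateTimeInserted)

-- readers take dirty reads against the tail of the clustered index
SELECT C1, C2, C3, C4, C5, C6, C7
FROM dbo.MainData WITH (NOLOCK)
WHERE DateTimeInserted >= DATEADD(MINUTE, -2, GETDATE())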
 

Author Comment

by:controlr
ID: 34166869
This is new... SQL will drop the global temp table if there are no active connections?
WOW... thanks for the heads up!

Although there are always active connections from other servers, so it should not be an issue.

The only other thing I'm thinking of is to have a table per day,
and then somehow concatenate them for a multi-day select.

How would I go about doing that? That's probably the most efficient way to manage history and keep the hits concentrated on a smaller subset of the data.

Any takes on this?

After this I will award the points, as I'm already doing some performance testing.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34167125
Multiple tables would be more overhead.  Best would still likely be a single table with an ascending key and (NOLOCK) on the SELECTs.
 
LVL 22

Assisted Solution

by:8080_Diver
8080_Diver earned 333 total points
ID: 34167167
Which version of SQL Server are you on?

If 2005/2008, you could partition the data horizontally into Today and History, rolling the Today partition into the History partition each night.

Or you could provide a separate History table that you copy the data to, and then truncate the Current table each night.

Or, you could use the suggestion to set up the PK on the DateTimeStamp column and simply alter the users' search queries so that they specify a date range ending at least, say, 10 seconds in the past.

If you have the history table and a current-day table, you would seem to be violating one of your original constraints ("I need to make the last few minutes of data available for multiple clients to query in near real time").  Perhaps combining the History Table trick with the PK on the DateTimeStamp and the nightly copy/truncate step could provide a reasonable workaround?
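
Roughly, the history-table variant could look like this (all object names are invented):

-- nightly job: copy today's rows into history, then clear the current table
INSERT INTO dbo.MainDataHistory (C1, C2, C3, C4, C5, C6, C7, DateTimeInserted)
SELECT C1, C2, C3, C4, C5, C6, C7, DateTimeInserted
FROM dbo.MainData
TRUNCATE TABLE dbo.MainData
GO

-- a UNION ALL view lets multi-day queries span current and history
CREATE VIEW dbo.MainDataAll
AS
SELECT C1, C2, C3, C4, C5, C6, C7, DateTimeInserted FROM dbo.MainData
UNION ALL
SELECT C1, C2, C3, C4, C5, C6, C7, DateTimeInserted FROM dbo.MainDataHistory
GO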
 

Author Comment

by:controlr
ID: 34172112
Unfortunately this must remain on SQL 2000; no upgrades in the next 2-3 years.

The current working code uses a timestamp + 2 other fields as the PK,
and it's killing the insert processes.
If no selects are made, the insert rate is about 1M records per minute; if selects are made on the table, the max insert rate is about 200K records per minute.

Of course I'm using NOLOCK :)

As long as there are dual operations (inserts/selects), the solution will not work with current client select loads. I don't want to think what will happen when it grows 10- or 100-fold in the next few months.

The growth rates of inserts and selects are immense.

Perhaps I need to create a read-only database by applying snapshots, but the snapshot rate would not provide a real-time solution...
Any ideas on how to expand on this dual-DB solution to make it work?

Will award points after this.
 
LVL 22

Assisted Solution

by:8080_Diver
8080_Diver earned 333 total points
ID: 34173269
Okay, to summarize:

The process is on a SQL Server 2000 database.
The SQL Server version cannot be upgraded in the next 2-3 years.
If there are no SELECT statements being executed, the rate of inserts is about 1,000,000 rows per minute.
If there are SELECT statements being executed, the rate of inserts drops to about 200,000 rows per minute, i.e. about 20% of the previous rate.
There are currently about 500 clients.
In the next few months the client base is expected to grow from 500 to somewhere between 5,000 and 50,000 clients.
Clients expect to be able to query against the last few minutes of data at any time (i.e. data latency is less than 1 minute?).

Okay, assuming the above is true, what happens to the data input rate when you have the 5,000 to 50,000 clients?  Does it go up similarly (or exponentially)?  

Given the situation as described, I would say that you have 3 options:
1. Reduce the client expectation such that you can increase the latency of the data. In other words, get your clients to accept the fact that they can't save data AND query that data in real time.
2. Discuss converting the database to something like a Teradata database that may be able to process the queries a bit faster.
3. Update your resume and quietly start looking NOW!

The database, given the data collection rate, is not capable of handling the clients you already are trying to support, so there is no way it will support the impending increased work load.  
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34174022
What are the index(es) on the table??

With the proper indexes, and NOLOCK on the SELECT, I don't see why SQL wouldn't be able to handle this.
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34174078
@controlr,

Have you tried using the covering index trick?  (Yes, it will duplicate the table's data but that is essentially what you are talking about doing with the #temp table anyway, isn't it?)  Just set up an index on the datetime stamp column and then all the others.  Then have your users make sure that their searches are based on the datetimestamp column first.
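
I.e. something like this (reusing the made-up names from earlier in the thread):

-- datetime stamp leads the key, followed by all the data columns
CREATE NONCLUSTERED INDEX IX_MainData_DateTimeCovering
    ON dbo.MainData (DateTimeInserted, C1, C2, C3, C4, C5, C6, C7)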
 

Author Closing Comment

by:controlr
ID: 34174521
Thanks for the input... I'm just going to have to experiment, bottom line.

I think I'm going to split the table up into separate tables by the secondary index and live with the dilemma of aggregating them at end of day and end of month.

And 8080, the last comment on updating the resume got you a little more points :) although that part is not an option.

Cheers
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34175325
I had a wise mentor, early in my career, who advised me to always include that option when faced with problems like this.  He also told me that my resume should always be up to date . . . one just never knows. ;-)
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34175358
>> Just set up an index on the datetime stamp column and then all the others.  Then have your users make sure that their searches are based on the datetimestamp column first. <<

But if you just make the *clustered* index the datetime, you don't have to duplicate the data at all.
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34175407
Scott,

Keep reading next time. ;-)

"and then all the others" . . . In other words, make the datetimestamp the FIRST column and then include the other columns.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34188619
??

And that duplicates (essentially) all the data in the table in a non-clus index, which is not necessary and not nearly as efficient as making the datetime, by itself and/or with an id column, the clus index, which will automatically include all the other columns.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34188628
Since the datetimes will be duplicated, you might as well add an ident column to the table and include it in the index, *after* the datetime, of course, since SQL would have to add its own anyway to prevent duplicate keys in the clus index.
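
Something like this, for example (column and constraint names are only illustrative):

-- identity column breaks ties between rows sharing the same datetime
ALTER TABLE dbo.MainData
    ADD RowID int IDENTITY(1, 1) NOT NULL

ALTER TABLE dbo.MainData
    ADD CONSTRAINT PK_MainData PRIMARY KEY CLUSTERED (DateTimeInserted, RowID)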
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34190358
Scott,

The rate at which the author indicated there would be inserts will probably preclude using just the datetime column as the PK/clustered index.  There is a limit to the accuracy of the DateTime datatype within SQL Server and/or on a server in general.  If the data input rate is greater than the accuracy can handle, then you will begin to have duplicate entries for the datetime column.  Therefore, as a minimum, if you plan to have the datetime column as your clustered/unique index, you will need to include an Identity column to differentiate between the identical datetime entries.

On the other hand, just making the index (without clustering or uniqueness constraints) does NOT require the additional column.  As you indicate, yes, it does duplicate the data; however, it also means that the users are not actually accessing the table and that they can access just the index to get their data needs fulfilled.  That would be the gain from the trade-off of duplicating the data.  Also, since the author is indicating that he is considering duplicating the data into another table in order to address the problem, the covering index can provide the mechanism without having to set up a trigger or any other process to force the copying of the data.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34192140
Yes, so you include your own identity in the PK, as I stated in my comments immediately prior to yours :-).

The INSERTs are happening *so* fast I really think you can safely use the main index and avoid duplicating all that data (you'll basically double the log size -- ouch!).

Just code the SELECT queries to use GETDATE() - 30 secs (or 15 secs or 1 min or whatever is acceptable).  With NOLOCK included and the predicate written so the clus index is used properly, that could read via the existing clus index -- the fastest way to read any table -- without interfering with new rows being added.

To me that meets the OP's requirements, as stated in the initial question:
"However.. I need to make the last few minutes of data  available for multiple-clients to query in near realtime.. "

To me, 15 secs (or 10 or 20 or whatever works best) *is* "near realtime" and avoids the *huge* overhead of double inserts.

And keeping that non clus index up to date could be a real pain later.  Every time you add a column to the main table that is used in the lookup query, you have to add it to another index.  Just seems too much trouble to me.
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34192715
Hmmm, come to think of it, if you put the Identity as the second column in the PK (i.e. unique, clustered key), you achieve the best of both worlds: the PK is continually ascending, thus not introducing any more fragmentation than is necessary, and the lookup based on the datetimestamp - xx seconds is also optimized.

Good solution!
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 34192992
Thanks!!

Yep, just have to code the SQL properly.  The calculation / adjustment of the time *must not* be done on the table column, but *only* on the GETDATE() side.  For example:

SELECT ...
FROM ... WITH (NOLOCK)
WHERE dateColumn >= DATEADD(MINUTE, -15, GETDATE())
    AND dateColumn <=  DATEADD(SECOND, -15, GETDATE())
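    -- i.e. rows between 15 minutes and 15 seconds old; leaving dateColumn bare keeps the clus index seek possible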
    AND ...
