
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 592

Trigger or double-insert?

controlr asked:
Hey all,
On SQL 2000, I get a batch of about 100,000 inserts into a table every 10-15 seconds. There are no performance issues with the writes.

However, I need to make the last few minutes of data available for multiple clients to query in near real time. Querying the table while it is being inserted into would make the selects very slow, and the inserts would be directly affected as well.

I was thinking of making an in-memory temp table and keeping only the last 30-60 seconds of inserts in it. Each row is about 30 bytes, so RAM is a non-issue in terms of the temp table's footprint.

The select can be a VERY dirty read; the data does not have to be committed yet.

Any takes on how to do this?

I can think of 2 possible ways:
1. Parallel double-insert: write to an in-memory temp table in addition to the real, committed insert.
2. Insert as usual, with a trigger on insert that performs the additional insert (a minimal sketch follows below).

I'm thinking the trigger in #2 will have large overhead and cause performance issues.

More ways are welcome!

7 columns (all int): C1, C2, C3, C4, C5, C6, C7

A workable example is preferred!

TIA
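
For illustration only, here is a minimal sketch of option #2. The table names (dbo.MainData, dbo.RecentData) and the trigger name are assumptions, not anything given in the question.

CREATE TABLE dbo.RecentData
(
    C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int,
    InsertedAt datetime NOT NULL DEFAULT (GETDATE())
)
GO

CREATE TRIGGER trg_MainData_CopyRecent
ON dbo.MainData
AFTER INSERT
AS
    SET NOCOUNT ON
    -- Set-based copy of the whole inserted batch, not row-by-row.
    INSERT INTO dbo.RecentData (C1, C2, C3, C4, C5, C6, C7)
    SELECT C1, C2, C3, C4, C5, C6, C7
    FROM inserted
GO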
3 Solutions
 
8080_Diver commented:
How about creating a covering index that has the 7 columns in it? (A sketch follows below.)

By having the covering index, your users would not actually be hitting the table but would instead be hitting the index for their data.

100k inserts every 10 to 15 seconds is a good bit. When you say "multiple clients", just how multiple are you talking? (25? 50? 100? 1,000? 10,000?)
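
For illustration, such a covering index might look like the sketch below; the table name dbo.MainData is an assumption. Note that SQL 2000 has no INCLUDE clause, so all seven columns have to be key columns.

-- Hypothetical table name; reads that touch only C1..C7 can be answered from
-- this index without visiting the base table's data pages.
CREATE NONCLUSTERED INDEX IX_MainData_Covering
    ON dbo.MainData (C1, C2, C3, C4, C5, C6, C7)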
 
controlr (Author) commented:
About 500 clients at the moment.
The problem with the indexes is that they will slow down the inserts; that was the reasoning for the in-memory approach.
If I index every column, I'm really duplicating the data, not to mention massive disk-access issues.
Your idea sounds great in theory, but in practice I think it comes with many penalties.
Unless I'm missing something...?
 
8080_Diver commented:
My question would be: are you going to be able to sustain the load of the 500 clients banging on the in-memory data, the double inserts (including the one into the in-memory table), and the rest of whatever is going on on the server?

Unless you have some sort of automatic garbage collection (i.e. deletion of "old" data from the in-memory table), you are going to consume memory like it is going out of style, even if you are only holding 7 int columns per row. Do the math:
(7 ints/row) * (4 bytes/int) * (100,000 rows/batch) = 2,800,000 bytes per batch
At one batch every 10-15 seconds, that is 4-6 batches per minute = 11,200,000 to 16,800,000 bytes per minute.

The cleanest way to maintain the in-memory table will be to add a DateTimeInserted column (default GETDATE()) to it. That is going to increase the size of each row, and therefore the table, by about 28%. Also, the delete is going to take some time to execute, and if it takes longer than the 10 seconds between insert batches, the table will grow. (The delete process also compounds the overhead of maintaining the table.)

Unless you have some sort of index on the in-memory table, every access is going to be a table scan.
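
Continuing the earlier dbo.RecentData sketch (names are still assumptions; its InsertedAt column plays the role of the DateTimeInserted column suggested here), these points translate into an index on the timestamp column plus a periodic purge:

-- Clustered on the timestamp so both the purge and the "last N seconds"
-- reads are range seeks rather than table scans.
CREATE CLUSTERED INDEX IX_RecentData_InsertedAt
    ON dbo.RecentData (InsertedAt)
GO

-- Run from a job every few seconds so the table holds only the last minute.
DELETE FROM dbo.RecentData
WHERE InsertedAt < DATEADD(MINUTE, -1, GETDATE())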

 
controlr (Author) commented:
1. One of the fields is already timestamped.
2. There is garbage collection, and it's very fast: essentially, there is a new temp table every minute, and tables older than 3 minutes are dropped (a rough sketch of this rotation follows below).
That's as fast as it can get.
All the data will be sorted on the client's end, as it needs to see all the data anyhow, so there is no use indexing anything there.

But I think you just justified my in-memory approach... unless I completely misunderstood you :)

And 11MB a minute is NOTHING! It's an 8-core box with 16GB of RAM; the CPU doesn't even reach 10%.
The main issue here, I think, is disk hits more than anything, and it will be very costly if I do the selects against the actual disk surface.
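
For what it's worth, the per-minute rotation described above might be scripted roughly as follows; the dbo.Recent_yyyymmddhhmi naming pattern and the cleanup job are assumptions, not the poster's actual code.

DECLARE @suffix varchar(12), @sql nvarchar(4000)

-- Build a yyyymmddhhmi suffix for the current minute.
SET @suffix = CONVERT(varchar(8), GETDATE(), 112)
            + RIGHT('0' + CONVERT(varchar(2), DATEPART(hour,   GETDATE())), 2)
            + RIGHT('0' + CONVERT(varchar(2), DATEPART(minute, GETDATE())), 2)

SET @sql = N'CREATE TABLE dbo.Recent_' + @suffix
         + N' (C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int)'
EXEC (@sql)

-- A companion job (not shown) drops any dbo.Recent_* table more than three
-- minutes old, which is the "garbage collection" described above.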
 
8080_Diver commented:
Just watch out for the memory load.  With a bunch of hits being done in memory, you could start getting a lot of CPU Waits.  (What about creating the covering index in memory? ;-)

Good luck and let us know how it works out.
 
Scott Pletcher (Senior DBA) commented:
I assume you've tried SELECT ... WITH (NOLOCK) [allow dirty reads] against the main table, and that it is not acceptable for whatever reason, such as the INSERT keeping the page locked even with NOLOCK specified.

What is the clustered index on the table? Maybe you could make the clustered index the insert datetime. Then specify NOLOCK on the queries. SQL will use a clustered index seek, which should be very fast. Also, the rows will be inserted in sequential order, so there will be no fragmentation issues, even with the high volume of INSERTs.
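
A minimal sketch of that suggestion; dbo.MainData and InsertDateTime are assumed names for the table and its timestamp column.

CREATE CLUSTERED INDEX IX_MainData_InsertDateTime
    ON dbo.MainData (InsertDateTime)
GO

-- Dirty read of just the last couple of minutes; the range seek on the
-- clustered index keeps readers away from most of the pages currently
-- being inserted into.
SELECT C1, C2, C3, C4, C5, C6, C7
FROM dbo.MainData WITH (NOLOCK)
WHERE InsertDateTime >= DATEADD(MINUTE, -2, GETDATE())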


>> I'm thinking the trigger in #2 will have large overhead and cause performance issues. <<

Not necessarily; a properly written trigger will almost certainly be less overhead than doing the INSERTs twice yourself.


You could try a global temp table, which should be the least overhead, but INFURIATINGLY, SQL will delete it automatically if it ever, even briefly, has no active connections.
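
For reference, a global temp table is just a table created with a ## prefix in tempdb; a minimal sketch (column names from the question, everything else assumed):

-- Visible to every session, unlike a #local temp table, but SQL Server drops
-- it automatically once the creating session ends and no other session is
-- still referencing it (the caveat mentioned above).
CREATE TABLE ##RecentData
(
    C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int,
    InsertedAt datetime NOT NULL DEFAULT (GETDATE())
)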
 
controlr (Author) commented:
This is new... SQL will drop a global temp table if there are no active connections?
WOW... thanks for the heads up!

Although there are always active connections from other servers, so it should not be an issue.

The only other thing I'm thinking of is to have a table per day,
and then somehow concatenate them for a multi-day select.

How would I go about doing that? That's probably the most efficient way to manage history, and it concentrates the hits on a smaller subset of the data. (A sketch follows at the end of this comment.)

Any takes on this?

After this I will award the points, as I'm already doing some performance testing.
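
On SQL 2000, the usual way to "concatenate" per-day tables is a UNION ALL view (a partitioned view, if each member table carries a CHECK constraint on its date range). A rough sketch with hypothetical table names:

-- Hypothetical per-day tables dbo.Data_Day1 .. dbo.Data_Day3; with a CHECK
-- constraint on InsertDateTime in each member table, SQL 2000 can skip the
-- tables a query's date range cannot touch.
CREATE VIEW dbo.DataAllDays
AS
SELECT C1, C2, C3, C4, C5, C6, C7, InsertDateTime FROM dbo.Data_Day1
UNION ALL
SELECT C1, C2, C3, C4, C5, C6, C7, InsertDateTime FROM dbo.Data_Day2
UNION ALL
SELECT C1, C2, C3, C4, C5, C6, C7, InsertDateTime FROM dbo.Data_Day3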
 
Scott Pletcher (Senior DBA) commented:
Multiple tables would be more overhead. Best would still likely be a single table with an ascending key and (NOLOCK) on the SELECTs.
 
8080_Diver commented:
Which version of SQL Server are you on?

If 2005/2008, you could partition the data horizontally into Today and History, rolling the Today partition into the History partition each night.

Or you could provide a separate History table that you copy the data to, and then truncate the Current table each night.

Or, you could use the suggestion to set up the PK on the DateTimeStamp column and simply alter the users' search queries so that they specify a date range that ends at least, say, 10 seconds in the past.

If you have a History table and a current-day table, you would seem to be violating one of your original constraints ("I need to make the last few minutes of data available for multiple clients to query in near real time"). Perhaps combining the History table trick with the PK on the DateTimeStamp and the nightly copy/truncate step could provide a reasonable workaround? (A rough sketch of that nightly step follows below.)
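
A rough sketch of that nightly copy step; all names are assumptions, and it uses a DELETE of the copied range rather than a blind TRUNCATE so that rows inserted while the copy runs are not lost.

DECLARE @midnight datetime
SET @midnight = CONVERT(char(8), GETDATE(), 112)   -- today at 00:00

-- Move everything older than today into the History table.
INSERT INTO dbo.HistoryData (C1, C2, C3, C4, C5, C6, C7, InsertDateTime)
SELECT C1, C2, C3, C4, C5, C6, C7, InsertDateTime
FROM dbo.CurrentData
WHERE InsertDateTime < @midnight

DELETE FROM dbo.CurrentData
WHERE InsertDateTime < @midnight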
 
controlr (Author) commented:
Unfortunately, this must remain on SQL 2000; no upgrades in the next 2-3 years.

The current working code uses a timestamp + 2 other fields as the PK,
and it's killing the insert processes.
If no selects are made, the insert rate is about 1M records per minute; if selects are made on the table, the max insert rate is about 200K records per minute.

Of course I'm using NOLOCK :)

As long as there are dual operations (inserts/selects), the solution will not work with the current client select loads. I don't want to think what will happen when the load grows 10- or 100-fold in the next few months.

The growth rates of inserts and selects are immense.

Perhaps I need to create a read-only database by applying snapshots, but the snapshot rate would not provide a real-time solution...
Any ideas on how to expand on this dual-DB idea to make it work?

Will award points after this.
 
8080_Diver commented:
Okay, to summarize:

  • The process is on a SQL Server 2000 database.
  • The SQL Server version cannot be upgraded in the next 2-3 years.
  • If there are no SELECT statements being executed, the insert rate is about 1,000,000 rows per minute.
  • If there are SELECT statements being executed, the insert rate drops to about 200,000 rows per minute, i.e. about 20% of the previous rate.
  • There are currently about 500 clients.
  • In the next few months the client base is expected to grow from 500 to somewhere between 5,000 and 50,000 clients.
  • Clients expect to be able to query the last few minutes of data at any time (i.e. data latency of less than 1 minute?).

Okay, assuming the above is true, what happens to the data input rate when you have the 5,000 to 50,000 clients? Does it go up similarly (or exponentially)?

Given the situation as described, I would say that you have 3 options:
1. Reduce the client expectation so that you can increase the latency of the data. In other words, get your clients to accept the fact that they can't save data AND query that data in real time.
2. Discuss converting the database to something like a Teradata database that may be able to process the queries a bit faster.
3. Update your resume and quietly start looking NOW!

The database, given the data collection rate, is not capable of handling the clients you are already trying to support, so there is no way it will support the impending increased workload.
 
Scott Pletcher (Senior DBA) commented:
What are the index(es) on the table??

With the proper indexes, and NOLOCK on the SELECT, I don't see why SQL wouldn't be able to handle this.
 
8080_Diver commented:
@controlr,

Have you tried using the covering index trick?  (Yes, it will duplicate the table's data but that is essentially what you are talking about doing with the #temp table anyway, isn't it?)  Just set up an index on the datetime stamp column and then all the others.  Then have your users make sure that their searches are based on the datetimestamp column first.
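
A sketch of that index; again, SQL 2000 has no INCLUDE clause, so every covered column must be a key column (C1..C7 are from the question; the table and datetime column names are assumed).

-- The datetime column leads so "last N seconds" searches can seek on it;
-- the remaining columns make the index covering for the clients' reads.
CREATE NONCLUSTERED INDEX IX_MainData_InsertDateTime_Covering
    ON dbo.MainData (InsertDateTime, C1, C2, C3, C4, C5, C6, C7)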
 
controlr (Author) commented:
Thanks for the input... I'm just going to have to experiment, bottom line.

I think I'm going to split the table up into separate tables by the secondary index and live with the dilemma of aggregating it at end of day and end of month.

And 8080, the last comment on updating the resume got you a few more points :) although that part is not an option.

Cheers
 
8080_Diver commented:
I had a wise mentor, early in my career, who advised me to always include that option when faced with problems like this. He also told me that my resume should always be up to date... one just never knows. ;-)
 
Scott Pletcher (Senior DBA) commented:
>> Just set up an index on the datetime stamp column and then all the others.  Then have your users make sure that their searches are based on the datetimestamp column first. <<

But if you just make the *clustered* index the datetime, you don't have to duplicate the data at all.
 
8080_Diver commented:
Scott,

Keep reading next time. ;-)

"and then all the others" . . . In other words, make the datetimestamp the FIRST column and then include the other columns.
 
Scott Pletcher (Senior DBA) commented:
??

And that duplicates (essentially) all the data in the table in a non-clustered index, which is not necessary and not nearly as efficient as making the datetime, by itself and/or with an id column, the clustered index, which automatically includes all the other columns.
 
Scott Pletcher (Senior DBA) commented:
Since the datetimes will be duplicated, you might as well add an identity column to the table and include it in the index, *after* the datetime, of course, since SQL would have to add its own uniquifier anyway to prevent duplicate keys in the clustered index.
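
Putting the two suggestions together might look like the following sketch; the real table already exists, so this just shows the intended shape (column names from the question, everything else assumed).

CREATE TABLE dbo.MainData
(
    C1 int, C2 int, C3 int, C4 int, C5 int, C6 int, C7 int,
    InsertDateTime datetime NOT NULL DEFAULT (GETDATE()),
    -- datetime values repeat at this insert rate, so an identity column
    -- after the datetime keeps the clustered key unique and ever-ascending.
    RowId int IDENTITY(1, 1) NOT NULL,
    CONSTRAINT PK_MainData PRIMARY KEY CLUSTERED (InsertDateTime, RowId)
)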
 
8080_Diver commented:
Scott,

The rate at which the author indicated there would be inserts will probably preclude using just the datetime column as the PK/clustered index. There is a limit to the accuracy of the DateTime datatype within SQL Server (about 3.33 milliseconds). If the data input rate is greater than that accuracy can handle, then you will begin to have duplicate entries in the datetime column. Therefore, as a minimum, if you plan to have the datetime column as your clustered/unique index, you will need to include an Identity column to differentiate between the identical datetime entries.

On the other hand, just making the index (without clustering or uniqueness constraints) does NOT require the additional column. As you indicate, yes, it does duplicate the data; however, it also means that the users are not actually accessing the table and that they can access just the index to get their data needs fulfilled. That would be the gain from the trade-off of duplicating the data. Also, since the author is indicating that he is considering duplicating the data into another table in order to address the problem, the covering index can provide that mechanism without having to set up a trigger or any other process to force the copying of the data.
 
Scott Pletcher (Senior DBA) commented:
Yes, so you include your own identity in the PK, as I stated in my comments immediately prior to yours :-).

The INSERTs are happening *so* fast that I really think you can safely use the main index and avoid duplicating all that data (you'll basically double the log size -- ouch!).

Just code the SELECT queries to use GETDATE() - 30 secs (or 15 secs, or 1 min, or whatever is acceptable). With NOLOCK, and with queries written to use it properly, that could use the existing clustered index -- the fastest way to read any table -- without interfering with new rows being added.

To me that meets the OP's requirements, as stated in the initial question:
"However, I need to make the last few minutes of data available for multiple clients to query in near real time."

To me, 15 secs (or 10, or 20, or whatever works best) *is* "near real time", and it avoids the *huge* overhead of double inserts.

And keeping that non-clustered index up to date could be a real pain later. Every time you add a column to the main table that is used in the lookup query, you have to add it to the other index as well. Just seems like too much trouble to me.
 
8080_Diver commented:
Hmmm, come to think of it, if you put the Identity as the second column in the PK (i.e. the unique, clustered key), you get the best of both worlds: the PK is continually ascending, thus not introducing any more fragmentation than is necessary, and the lookup based on the datetimestamp minus xx seconds is also optimized.

Good solution!
 
Scott Pletcher (Senior DBA) commented:
Thanks!!

Yep, just have to code the SQL properly.  The calculation / adjustment of the time *must not* be done on the table column, but *only* on the GETDATE() side.  For example:

SELECT ...
FROM ... WITH (NOLOCK)
WHERE dateColumn >= DATEADD(MINUTE, -15, GETDATE())
    AND dateColumn <= DATEADD(SECOND, -15, GETDATE())
    AND ...