Big data: 2 million records per hour

I have a data source generating 2 million inputs per hour (and then there are other inputs too).
How do I manage this?
I'm used to SQL Server.

The data has a time stamp and a value.


rwallacejAuthor Commented:
So the fields are tag, value, and time stamp, for 2 million records per hour.
Vitor MontalvãoMSSQL Senior EngineerCommented:
How are the records being input?
Is there a purge mechanism? If so, how often does it run, and for how long is data permitted to be stored?
Which edition and version of SQL Server are you using?
How much resource does the server have?
ste5anSenior DeveloperCommented:
I guess it strongly depends on your requirements.

How long do you need to store those values? By my estimate you need roughly 1 TB per week, so you need enough storage.

Do you need to store the native values, or would storing aggregates by interval (e.g. per second) be enough? Then maybe StreamInsight is an option.

Do you need to do this on-premises, or is the cloud an alternative?

2,000,000 rows per hour is fewer than 1,000 rows per second, which can be handled, but it already requires a good design that avoids locking.

rwallacejAuthor Commented:
The server has many, many terabytes of space.
Data would be held for a few years and could then be archived.
The time stamp would be a date-time, to the second.
I don't quite know how the data arrives; via a web service, I think.
SQL Server Enterprise.
Purge mechanism - what is this?
The cloud would perhaps be an option - but I'm pretty sure it would need to be on the existing server.
Aggregates by interval - what is this?
Vitor MontalvãoMSSQL Senior EngineerCommented:
"Purge mechanism - what is this" --> It's this: "Data held for few years then could archive"

"SQL server enterprise" --> That's the edition. How about the version (2012, 2014,...)?

"I don't quite know how data arrives"
You should, because the solution depends on that. How can you know the transfer rate if you don't know how the data arrives ("I have data source generating 2 million inputs per hour")?
rwallacejAuthor Commented:
Purge mechanism - yes
I only know it's 2 million per hour. It's the storage of this that I'm interested in; the actual arrival process is unknown at the moment - only that there are 2 million per hour.
ste5anSenior DeveloperCommented:
It seems my first estimate was wrong. A row consists of a date and a value - what kind of value?

For the data alone, see the attached sheet. On top of that you need the index overhead, and you should evaluate the space needed for partitioning.

Could this be correct?

Vitor MontalvãoMSSQL Senior EngineerCommented:
"It's storage of this I'm interested in"
Well, you already said that the "server has many, many terabytes of space", so without knowing the load mechanism there's not much more to say. The SQL Server engine is well able to manage 2M rows per hour, and you have enough storage, so for now I can't see any issue - unless you can provide more information.
1 TB per week seems to be wrong when you don't know the record size...

1 TB / (2 million × 24 × 7) ≈ 3,000 bytes per record.

Are 3,000 bytes really needed for the tag, value, and time stamp? I would say the record consumes 30 bytes at most, which is 100 times less - about 10 GB rather than 1 TB per week - so SQL Server should be OK with a sufficient disk array and a backup/archive plan.

Of course, the answers to the questions posed by the other experts are highly important for finalizing this planning stage.
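The back-of-envelope sizing above can be checked directly in SQL Server. This is just a sketch of the arithmetic, assuming roughly 30 bytes per row (int tag + second-precision datetime + 8-byte float, plus row overhead):

```sql
-- Rough sizing check: 2M rows/hour at an assumed ~30 bytes/row.
-- Indexes, fill factor, and partitioning overhead come on top of this.
SELECT CAST(2000000 AS bigint) * 24 * 365 AS rows_per_year,          -- 17,520,000,000
       CAST(2000000 AS bigint) * 24 * 365 * 30
           / (1024 * 1024 * 1024)         AS approx_raw_GB_per_year; -- ~489 GB
```

At ~30 bytes per row, a year of raw data is on the order of half a terabyte, well within "many, many terabytes" of disk.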

Next steps:
Read something about Big Data.
Note that SQL Server also supports FILESTREAM.

Your case does not seem to belong to either of those two categories, so one big SQL table should be OK for your data - but you have to say what your plans for these data are, which will then help to define indexes, partitioning, etc.
rwallacejAuthor Commented:
Tag would be an int, 0 to 2 million (at the very most).
Date would be a date-time with day, month, year, hour, minute, second.
Value would be a double, say -1 million to +1 million, with decimals.
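The description above maps naturally onto a narrow SQL Server table. A minimal sketch (table and column names are illustrative, not from the thread):

```sql
-- ~18 bytes of data per row, plus row overhead:
CREATE TABLE dbo.Readings
(
    Tag      int          NOT NULL,  -- 4 bytes; 0 .. ~2 million fits easily
    ReadTime datetime2(0) NOT NULL,  -- 6 bytes; second precision (vs 8 for datetime)
    Value    float        NOT NULL   -- 8 bytes; IEEE double
);
```

Using `datetime2(0)` instead of `datetime` saves 2 bytes per row, which matters at 17+ billion rows per year.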
rwallacejAuthor Commented:
The data is for calculations and charts - max/min, averages.
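Max/min/average reporting over a time range is a classic aggregation query. A sketch against the hypothetical `dbo.Readings` table above (`@From`/`@To` are assumed parameters):

```sql
-- Hourly min/max/average per tag over a chosen window.
-- A clustered index leading on ReadTime makes the range scan cheap.
SELECT Tag,
       DATEADD(hour, DATEDIFF(hour, 0, ReadTime), 0) AS HourBucket,
       MIN(Value) AS MinValue,
       MAX(Value) AS MaxValue,
       AVG(Value) AS AvgValue
FROM   dbo.Readings
WHERE  ReadTime >= @From
  AND  ReadTime <  @To
GROUP BY Tag, DATEADD(hour, DATEDIFF(hour, 0, ReadTime), 0);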
Brian CroweDatabase AdministratorCommented:
I would also recommend storing this table in its own filegroup, depending on what else is in the database. I don't expect you want to be backing up and restoring this table in staging (non-production) environments, and it is easier to exclude if it is in a separate filegroup.
Tomas Helgi JohannssonCommented:

You say 2 million records/hour. That is 48 million records every 24 hours and 336 million records per week - almost 17.5 billion records per year. That is quite large. ;)
Is the table holding that data partitioned on the timestamp field?
If not, then I strongly suggest you consider partitioning the table that way, for easy data management and archiving of old data.
At the very least, partition by week (52 partitions with 336 million records per partition) or even by day (365 partitions, each with 48 million records).

This will also drastically speed up SQL queries, inserts/updates/deletes, and data loads.

     Tomas Helgi
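A weekly partitioning scheme of the kind suggested above might be sketched like this (function, scheme, and boundary dates are all illustrative, and assume the hypothetical `dbo.Readings` table):

```sql
-- One partition per week, split on the timestamp column.
CREATE PARTITION FUNCTION pfReadingsWeekly (datetime2(0))
    AS RANGE RIGHT FOR VALUES ('2017-01-02', '2017-01-09', '2017-01-16' /* ... one boundary per week */);

CREATE PARTITION SCHEME psReadingsWeekly
    AS PARTITION pfReadingsWeekly ALL TO ([PRIMARY]);

-- An old week can then be switched out near-instantly for archiving:
-- ALTER TABLE dbo.Readings SWITCH PARTITION 1 TO dbo.ReadingsArchive;
```

The `SWITCH` at the end is a metadata-only operation, which is what makes archiving billions of rows practical.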
Jim HornMicrosoft SQL Server Data DudeCommented:
A quick question not already handled by the above experts: do you really need to ETL all rows?

I once interviewed with a solar energy company that had a TON of devices (think Internet of Things, IoT) creating rows every second. The strategy was to ETL, every minute, only the most recent row - not the 60 rows since the previous ETL. That is what ended up in their data warehouse / decision management system. If anything died / blew up / went off the chart / was sabotaged by crazy environmentalist wackos, there was a side ETL job that could be executed to extract every row from the problem device.
Scott PletcherSenior DBACommented:
Cluster the table first on the datetime. If the tags arrive in order, you could add the tag to the clustering key. You could also add an identity column to the key just to force it to be unique - but the clustering key should not be just the identity, or start with the identity.

Assuming this is Enterprise edition:
If possible, I'd also partition the table; I think partition switching could be very useful on a table this large. On the current/active partition, use just row compression. On the older partitions, you can use page compression with a 100% fill factor.
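A sketch of that indexing and compression advice, again against the hypothetical `dbo.Readings` table (index name is illustrative):

```sql
-- Clustered key leading on the datetime, with the tag as a secondary key column;
-- row compression keeps the hot, insert-heavy data cheap to write.
CREATE CLUSTERED INDEX CIX_Readings
    ON dbo.Readings (ReadTime, Tag)
    WITH (DATA_COMPRESSION = ROW);

-- Older, read-mostly partitions can be rebuilt with page compression:
-- ALTER INDEX CIX_Readings ON dbo.Readings
--     REBUILD PARTITION = 1
--     WITH (DATA_COMPRESSION = PAGE);
```

Row compression is cheap enough for an active insert stream; page compression trades more CPU for better space savings, which suits partitions that are no longer written to.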

Experts Exchange Solution
Scott PletcherSenior DBACommented:
Given the very limited info in the original question, I'm not sure what more you wanted in the answer, but hopefully what I provided helped out at least a little.