Wayne Barron (United States) asked:

SQL Server - Large amounts of data structuring

Hello, All.
I have been thinking about this for a while, and tonight, I thought I would ask for some opinions and advice.

I am designing two websites that both have the potential for huge growth.
The question is this.
When dealing with large data stores, I read a long time ago that it is best to split up the tables.
I assume that means putting the tables that will hold the largest amounts of data into their own database.

What are your thoughts on this?
Should I put the larger tables in a separate database so they can grow independently, without taking space away from the smaller tables' own growth over time?


ste5an (Germany) replied:

When dealing with large data stores, I read a long time ago that it is best to split up the tables.
I assume that means putting the tables that will hold the largest amounts of data into their own database.
There is no such general advice. The data-integrity boundary is the database, so splitting tables across databases would risk a loss of data integrity.

When you need to be prepared for growth, you need a scalable architecture. Splitting up tables is an implementation detail, not architecture.

Here you should read about the Stack Overflow architecture, or about WhatsApp (the PHP-to-Erlang migration).

The most important SQL Server features you need to know about here are:

- Server clustering
- Table partitioning
- Columnstore indexes
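As a sketch of that last item: a nonclustered columnstore index can dramatically speed up analytic scans and aggregations over a large table. The table and column names here are hypothetical, purely for illustration:

```sql
-- Hypothetical fact table; a nonclustered columnstore index stores the
-- listed columns in compressed column segments, which makes large
-- scans/aggregations much cheaper than a rowstore scan.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_PageViews
ON dbo.PageViews (UserId, PageId, ViewedAt);
```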
You can use multiple files for the same database. SQL Server can write in parallel across them: page 1 to file 1, page 2 to file 2, ..., page n to file n, page n+1 back to file 1, and so on. If the files reside on different physical drives, write throughput scales roughly with the number of disks.
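A minimal sketch of adding a second data file on another drive, so SQL Server can stripe writes across both (database name, file name, and path are all hypothetical):

```sql
-- Hypothetical: add a second data file on a different physical drive.
-- SQL Server uses proportional fill across the files in a filegroup,
-- spreading page writes over both disks.
ALTER DATABASE MySite
ADD FILE (
    NAME = MySite_Data2,
    FILENAME = 'E:\SQLData\MySite_Data2.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
);
```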

You can use a tiered-storage system such as ComVault. It has three levels of storage: ultra-fast SSD, fast HDD, and slow HDD. It uses access statistics to move disk blocks between tiers, optimizing I/O speed.

MS clustering does not improve speed: only one node really works at a time (Oracle cluster nodes work simultaneously). An FCI just switches the disks and restarts SQL Server; an AOAG spends additional resources on the constant data flow between nodes. A backup from a passive node is read-only, and it does not free up the active node.

"Read Intent" can improve read speed from AOAG. To make it work an application must understand and use "Read Intent".

Partitioning does not add speed by itself, and moving data between partitions does not improve performance. It can be worthwhile if you have a significant volume of read-only data: you can save time by backing up only the read/write partitions.

Plain performance optimization is what helps: indexes, statistics, a normalized data model, in-memory tables, star schemas/data marts for a data warehouse, etc.
Clustering does not improve speed
As we are talking about architecture, clustering is a basic building block for scale-out. Thus it improves speed, if correctly fitted into the system.

Partitioning does not add speed.
Sure it does: separate hot data from cold data and distribute it over different filegroups on different storage systems, while it remains logically a single table. Aging out data, for example, becomes a metadata-only operation, minimally logged.
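The hot/cold split described above can be sketched as follows. All names (partition function, filegroups, table, archive table) are hypothetical; the pattern is date-range partitioning with old ranges on cheaper storage:

```sql
-- Monthly ranges; RANGE RIGHT with 3 boundaries yields 4 partitions.
CREATE PARTITION FUNCTION pfByMonth (datetime2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

-- Older partitions land on the cold filegroup, recent ones on the hot one.
CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth
TO (FG_Cold, FG_Cold, FG_Hot, FG_Hot);

-- Logically one table, physically spread across filegroups.
-- The partitioning column must be part of the clustered key.
CREATE TABLE dbo.Posts (
    PostId   bigint IDENTITY(1,1) NOT NULL,
    PostedAt datetime2 NOT NULL,
    Body     nvarchar(max) NULL,
    CONSTRAINT PK_Posts PRIMARY KEY CLUSTERED (PostId, PostedAt)
) ON psByMonth (PostedAt);

-- Aging out a month is then a metadata-only switch, assuming
-- dbo.Posts_Archive exists with an identical schema on the
-- same filegroup as partition 1:
ALTER TABLE dbo.Posts SWITCH PARTITION 1 TO dbo.Posts_Archive;
```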
Wayne Barron replied:


Sorry I have not responded to either of your comments.

When dealing with Read and Read/Write.

Is there a way to have two AOAG connections?
One used solely for READ?
One used solely for Write?

When I try to add one, I get this.

TITLE: Microsoft SQL Server Management Studio

Cannot show requested dialog.


Cannot show requested dialog. (SqlMgmt)


The availability group already has a listener. Another listener cannot be added. (SqlManagerUI)



I just found this, so I will follow its information later this week when I have spare time.
Scott Pletcher (United States) replied:

Thanks, Scott.
I will take a look into that later today.
As to the other points you brought up: you generally don't need to split tables into multiple pieces, unless the new rows expand a lot after inserting, in which case it can be helpful to have two partitions, one for active data and one for static data. You can shift the rows themselves from the active partition to the static partition over time, based on the add date (or however else you know when the data becomes less volatile).

You can use separate filegroups / files for different tables if you need to, although each table can also grow individually within the same filegroup without taking space from each other.
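A minimal sketch of the separate-filegroup approach mentioned above, giving one large table its own filegroup and file so it can grow on its own storage (database, filegroup, path, and table names are hypothetical):

```sql
-- Hypothetical: a dedicated filegroup and file for the big table.
ALTER DATABASE MySite ADD FILEGROUP FG_BigTables;
ALTER DATABASE MySite ADD FILE (
    NAME = MySite_Big1,
    FILENAME = 'F:\SQLData\MySite_Big1.ndf',
    SIZE = 20GB,
    FILEGROWTH = 2GB
) TO FILEGROUP FG_BigTables;

-- The ON clause places the table (and its clustered index)
-- on the dedicated filegroup instead of PRIMARY.
CREATE TABLE dbo.ActivityLog (
    LogId    bigint IDENTITY(1,1) NOT NULL,
    LoggedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Detail   nvarchar(400) NULL,
    CONSTRAINT PK_ActivityLog PRIMARY KEY CLUSTERED (LogId)
) ON FG_BigTables;
```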
OK, so it has been a while since I've commented back on this question.
Clustered index: I never knew that was the name for what I've always done.
In every table I create, I set the ID column as the [Primary Key]
with [Identity Specification] set to [Yes].
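For reference, those SSMS designer settings amount to the following T-SQL (the table and column names here are just an illustration): the IDENTITY property auto-numbers the column, and the inline PRIMARY KEY creates a clustered index on it by default.

```sql
-- Hypothetical table showing what "Primary Key" plus
-- "Identity Specification = Yes" produce in T-SQL:
CREATE TABLE dbo.Members (
    MemberId int IDENTITY(1,1) NOT NULL,  -- auto-incrementing ID
    UserName nvarchar(50) NOT NULL,
    CONSTRAINT PK_Members PRIMARY KEY CLUSTERED (MemberId)
);
```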

I just watched a video on Filegroups.
I learned a little from it, will have to dig in a little deeper to see if it will benefit me at all to use it.
I am also looking into [Partitioning].

The project I am working on now is a complete redesign and restructuring of an old website.
That database right now has just over 100 tables.
It runs very nicely, and records load quickly.
The site is designed for large growth.

The other day I deleted all the test data; before deleting everything, I checked the row counts across all tables, and the total was just shy of 50,000 rows.
The site could have upwards of half a million records in it before the new version is released to the world on (hoping) Christmas Eve.

I am going to close this one out and split with everyone who provided information.
Y'all can still comment with any additional information.