SQL Server Storage Basics: Database Files

Yashwant VishwakarmaSQL DBA
CERTIFIED EXPERT
Published:
In this article we will discuss about storage basics in SQL Server. We will get to know how database is physically implemented in SQL Server. We will explore data files and transaction log files in SQL Server through this article.
SQL Server Storage Basics: Database Files
 
Note: All screenshots are applied to SQL Server 2012 Enterprise Evaluation Edition.
 
1. Introduction:
Like every other DBMS software, SQL Server has also its storage engine which we need to know for better understanding of basics. A SQL DBA should know well about these storage basics. In this article we will discuss about SQL Server storage engine. SQL Server storage engine itself has a vast topic one article will not be sufficient for this because it has many things in it that we should know. After thinking a lot how to start this topic, I decided to discuss these basics in bunch of short articles because if I will try to cover all the things in one article it will get lengthy and we will not easily grasp. So let’s move ahead into SQL Server Storage Basics.
 
2. Physical Database Architecture:
Every SQL DBA knows that data in SQL Server is stored and organized into the logical components such as tables, views, procedures in databases, which a user can view easily but from the point of database administration we will talk here about physically implementation of data in databases. A database is physically implemented as two or more files on disk, which we called primary data files (.mdf), secondary data files (.ndf) and log files (.ldf). For more better understanding we can put below points:

compare.PNGfig1_1.png                           Figure1: User view & physical implementation of database
 
3. Storage Basics Explained:
3.1) Primary Data Files:
3.1.1 What is Primary Data File?
Primary data file is a SQL Server database file which contains the startup information and act as starting point for any database. It holds user data and all objects like tables, indexes and stored procedures.

fig2_2.pngfig2-1.png     Figure2: Illustrating physical files, extents, pages and page structure in SQL Server
 
 3.1.2 Extension of Primary Data File:
It is recommended that primary data file should have .mdf extension but really you are free to give any extension to it like .abc, .ian, .jim, .cathy, .xyz , below is the example:
 
USE [master]
                      GO
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      (NAME = N'tempdev_Data06',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data06.ian',
                      SIZE = 100MB, FILEGROWTH = 1024KB)
                      GO
                       
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      (NAME = N'tempdev_Data07',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data07.jim',
                      SIZE = 100MB, FILEGROWTH = 1024KB)
                      GO
                       
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      (NAME = N'tempdev_Data08',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data08.cathy',
                      SIZE = 100MB, FILEGROWTH = 1024KB)
                      GO

Open in new window


fig3_1.png  Figure3: Showing that data files have different extension i.e .ian, .jim and .cathy
 
3.1.3 How many Primary Data Files, a database can have:
Every database can only have ONE primary data file ( mdf ) but do you really agree with this point. Let’s dig it more with some experiments:
 
Step1: I changed the name of primary data file and transaction log file with below query to make it easy to understand:
 
USE [master]
                      GO
                      ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'templog', NEWNAME= N'tempdev_Log')
                      GO
                      ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'tempdev', NEWNAME=N'tempdev_Data01')
                      GO

Open in new window


Step2: Now I added four more data files 'tempdev_Data02', 'tempdev_Data03', 'tempdev_Data04'and  'tempdev_Data05' with below script:
 
USE [master]
                      GO
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      ( NAME = N'tempdev_Data02',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data02.mdf' ,
                      SIZE = 100MB , FILEGROWTH = 1024KB )
                      GO
                       
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      ( NAME = N'tempdev_Data03',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data03.mdf' ,
                      SIZE = 100MB , FILEGROWTH = 1024KB )
                      GO
                       
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      ( NAME = N'tempdev_Data04',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data04.mdf' ,
                      SIZE = 100MB , FILEGROWTH = 1024KB )
                      GO
                       
                      ALTER DATABASE [tempdb]
                      ADD FILE
                      ( NAME = N'tempdev_Data05',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data05.mdf' ,
                      SIZE = 100MB , FILEGROWTH = 1024KB )
                      GO

Open in new window

 
Step3: Let’s see all the files with below query:

Use tempdb;
                      select name, physical_name from sys.database_files

Open in new window


fig4_1.pngFigure4: Showing that data files tempdev_data01, tempdev_data02, tempdev_data03, tempdev_data04, tempdev_data05 having same extension .mdf
 
So still the statement “Every database can only have ONE primary data file ( mdf )” is true or something wrong with this statement or the statement is not correct.
 
The answer lies in the statement which we made in 3.1.2 section:It is recommended that primary data file should have .mdf extension but really you are free to give any extension to it like .abc, .ian, .jim, .cathy, .xyz “ what we discussed earlier and the statement “Every database can only have ONE primary data file ( mdf )” is also true because a primary data file is the data file associated with any database when database is created for the very first time, and YES this can only be ONE. Same is true for transaction log file ( .ldf ) and secondary data file ( .ndf )
 
Here I want to specify one more thing, suppose we have several mdf files for a particular database & we want to know that which is the main; to uncover it let’s take the above example which is in figure4, here we have FIVE data files tempdev_data01, tempdev_data02, tempdev_data03, tempdev_data04, tempdev_data05. Now suppose someone changed the name then how we will identify that the only ONE primary data file. Let’s take an example:
 
Step1: In figure4 I showed you FIVE mdf files for tempdb database and three data files which have .ian, .jim and .cathy extension. Now I am going to change the name of tempdev_data01.mdf to tempdev_data10.mdf with below query:

USE master
                      GO
                      ALTER DATABASE [tempdb] MODIFY FILE (NAME=N'tempdev_data01', NEWNAME= N'tempdev_Data10',
                      FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\tempdev_Data10.mdf')

Open in new window


fig5_1.png  Figure5: Modifying data file

Step2: Check the name and physical_name for files in tempdb database with below query:

use [tempdb]
                      GO
                      select name, physical_name  from sys.database_files
                      GO

Open in new window


fig6_1.png  Figure6: Showing multiple data files with .mdf extension and some different extension
 
Now there are again FIVE data file with .mdf extension and THREE data files with other extension .jim, .ian and .cathy. So many data files how we find that the only ONE primary data file which we are discussing, because till this discussion we know that the primary data file should be one and only ONE. So let’s find out that the only ONE primary data file.
 
Step3: To find out one and only ONE primary data file for a database execute below query:

use [tempdb]
                      GO
                      select name, physical_name  from sys.database_files
                      GO

Open in new window


fig7_1.png            Figure7: Fetching the name and location of only ONE primary data file
 
Now it is clearly visible that the only ONE primary data file is tempdev_Data10.mdf. Someone here can ask one question why we use sysdatabases only for query not other query, the answer is simple because the information about the only ONE primary data file is stored in the filename column of the sysdatabases in the master database.
 
3.1.4 Location of Primary Data Files:
The location of the primary data file is stored in the filename column of the sysdatabases in the master database.
When SQL Server starts up database it looks for this file because in this file the information for all databases exists.
Primary data file contains sys.database_files system table which stores information about all other files in a specific database. Here someone can ask you why sys.database_files not sysfiles [sysfiles also give you the information about all files in the database]. So let’s move ahead to know why I said sys.database_files.
 
sysfiles vs sys.database_files
The compatibility view sysfiles returns  one row for every file that is associated with a database. 

fig8_1.png  Figure8: Showing output from sysfiles

The catalog view sys.database_files is the replacement for sysfiles.  Like sysfiles, sys.database_files also returns one row for every file that is associated to a database. sys.database_files gives more information than the sysfiles and have more readable output. It gives information about state i.e. ONLINE/OFFLINE, lsn, growth, media etc. Below is the output of sys.database_files splitted in 3 parts because of space.
fig9.png                                                Figure9: Showing output from sys.database_files
 
So from the above discussion we can conclude also that primary data file have pointer to all other files in the specific database.
 
3.2) Secondary Data Files:

  • A database can have any number of secondary data files, these files are optional and user-defined. We can use secondary files to spread data across multiple disks.
  • Data files other than the primary data file make secondary data files.
  • It is not necessary to have secondary data files as these are optional we can create it depending on the requirement & database environment.
  • We can put secondary data files in default filegroup or any other filegroup defined by user.
  • .ndf is the recommended extension for secondary data files.
 
3.3) Transaction Log Files:
3.3.1 What is Transaction Log File?
Transaction log file is a sequential record of transactions which holds the log information that is used to help in disaster recovery scenarios to recover the database.
 
3.3.2 Extension of Transaction Log File:
Recommended extension for log file is .ldf, again it is not mandatory you are free to give any extension to it.
 
3.3.3 How many Transaction Log Files, a database can have:
We can create multiple log files for a database but there can be only one log file will be active at a time because transactions are written to first file until it will get full. Once it will get full transactions will get written in second file and so on.
 
3.3.4 Location of Transaction Log Files:
Location of transaction log file is stored in sys.database_files, we can use below query to get the location of transaction log file for a specific database;
 
use TestDB
                      select name, type_desc, physical_name from sys.database_files

Open in new window


fig10_1.png                            Figure10: Getting name and location of transaction log file
 
4. Storage Limitation/Capacity of SQL Server:
SQL Server database engine has following storage capacity for different SQL Server objects:

fig11.png 
5. References:
[]1] Database file and filegroups. Technet Microsoft
[]2] Files and Filegroups Architecture. Technet Microsoft
[]3] Microsoft SQL Server 2000 Unleashed By Ray Rankins, Paul Jensen, Paul Bertucci. Sams Publishing.
[]4] Microsoft SQL Server 2012 Administration: Real-World Skills for MCSA Certification and Beyond
[]5] Maximum Capacity Specifications for SQL Server. Technet Microsoft
[]6] www.sqlserverzest.com

6. Final Words:
This is all about SQL Server database physical files, many more things are still need to uncover which you will see probably in coming articles like files and filegroups, pages, extents, transaction log file in more details. Give your feedback so that in future I can serve you more better till then happy reading and keep sharing your knowledge :-)
 
3
3,158 Views
Yashwant VishwakarmaSQL DBA
CERTIFIED EXPERT

Comments (1)

Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Nicely illustrated and organized.  Voted Yes.

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.