SQL Server Database NVARCHAR Data Type Conversion

I recently came into a job in which the previous developers had built databases using exclusively NVARCHAR, NCHAR and NTEXT data types. These datatypes were used in all table definitions, and all functions and stored procedures used them. All temp tables that got built also used Unicode data types.

This design provided several disadvantages, especially considering that there was no way the database would ever contain anything other than ASCII text.

Considering the amount of textual data in the database, the overhead of storing all text-based data as Unicode almost doubled the size of our databases.
Backups were considerably larger.
The databases were loaded via an ETL process, and implicit conversion from ASCII to Unicode produced additional overhead during laod times.
The front-end application was moving twice the data, thus increasing network traffic and application response time.

You may not know it but Unicode data occupies two bytes (and depending on the encoding, up to four bytes) of storage for each character, even if it is from a simple character set (e.g. abcdefg...). To prove this, type the following query into SSMS: SELECT DATALENGTH(N'ABC'). The result is 6, not 3 as might be expected.

I was tasked with coming up with a way to convert Unicode references to ASCII. This involved column definitions in tables as well as parameter and declared variable definitions in functinos and stored procedures; a rather simple task if you have a dozen tables and a few stored procedures. However, we had 150 tables and dozens of stored procedures and functions.

The first thought was to use ALTER TABLE to redefine the columns. This had several drawbacks. First, you can only alter one column at a time, and the underlying operation of the ALTER statement writes the contents of the table to a new temp table with the new format, then drops the orignal table and renames the temp table. If you have 100 columns in your table that can take some time, especially if your table has 100 million rows.

Secondly, ALTER TABLE will not work with primary keys or indexes in place, or FK constraints in place that reference the table you are trying to modify. Not only that, but if you have any views that reference the table, the ALTER will fail.

Not only did we have lots of tables with lots of NVARCHAR/NCHAR data types in the database, we had dozens of databases! YIKES! As you can see this was starting to become an insurmountable task that could take weeks (months?).

That is when I thought about using PowerShell. Write one script, and run it against multiple databases on multiple servers. Not only was it portable, it was flexible, in that there were no references to specific table names or column names, or any names, just objects like tables, stored procedures, functions, columns, indexes, etc. This article describes the components of the script and looks at specific portions of it for examination. When executed against a server it will generate a SQL script for each database on that server. A filter allows you to specify a pattern for databases to include in the scripting. A separate script is generated for each database.

PowerShell gives you access to Microsoft SMO (Server Management Object), a very powerful object that gives you access to much of the underlying framework of a SQL Server and/or database(s). The following are actual code snippets from the final project. The goal of the project was to create a script that, when executed, created a SQL Server script that would contain all the code necessary to convert NVARCHAR data types to VARCHAR.

In this snippet of code, $server has been previously defined as the name of the server you wish to connect to. Once executed, the connection is established and a whole wealth of information is available.

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null 
                      $ns = 'Microsoft.SqlServer.Management.Smo'
                      $svr = new-object ("$ns.Server") $server
                      $server = $svr.Name

Open in new window

The last line sets $server to the actual server name in case 'localhost' (as in this example) or an IP address was passed. (note: the connection above assumes you have access to the database through your Windows logon. If not, you will need to provide a userid and password to connect).

In the following snippet, a collection of database objects is created. In this example, we want all databases that start with 'DB_PROD_'. The pipe symbol (|) and Where-Object... could have been left off, and a collection of ALL databases would have been returned.

$dbs = $svr.Databases | Where-Object{$_.name -like "PROD_DB_*"}

Open in new window

This next block of code starts the processing for each database. I ran into situations where a database was offline or in a recovery state, which caused the script to fail, so I put in some added protection. The commented line was put there so I could switch between it and the line following it. Handy for testing when I wanted to run against just one database:

foreach ($db in $dbs)
                      {
                            $dbname = $db.Name
                            if ($dbname.ToLower() -eq 'prod_db_test')
                      #      if ($true)
                            {
                                  if (!$db.IsAccessible)
                                  {
                                        Write-Host "*******************************************************************************"
                                        Write-Host "* Unable to access the database $dbname on $server`."
                                        Write-Host "* The database may be offline, in a recovery state or otherwise unavailable."
                                        Write-Host "*******************************************************************************"
                                        continue
                                  }
                                  
                                  Write-Host "  Processing $dbname"

Open in new window

There is a lot of additional code in the script, most of which generates code for logging results, and may not be needed for your application. However, when we were testing this we wanted to be able to monitor database and log as well as tempdb growth.

The ForEach statement can be used against the following collections: StoredProcedures, Views, Functions, Triggers, Indexes, Tables and many other objects. Furthermore, within each of the objects in a collection, additional collections are available, such as the Columns collection within a Table object.

Another property exposed by most of these obects is .Script(). This property will generate a CREATE (or ALTER TABLE in the case of primary keys and constraints) script for the object. Thus the code

 foreach ($proc in $db.StoredProcedures) 
                                  {
                                        $IsSysObj = $proc.IsSystemObject
                                        if ($IsSysObj -eq $false)
                                        {
                                              $procname = $proc.Name
                                              if ($ExcludeProcedures -notcontains $procname)
                                              {
                                                    $schema = $proc.Schema
                                                    $scriptText = $proc.Script()
                                                    $scriptText = $scriptText.Replace("SET ANSI_NULLS ON SET QUOTED_IDENTIFIER ON", "SET ANSI_NULLS ON`nSET QUOTED_IDENTIFIER ON`nGO`n`n")
                                                    $scriptText = $scriptText -replace "NVARCHAR", "VARCHAR" -replace "NCHAR", "CHAR" -replace "NTEXT", "VARCHAR(MAX)" -replace "create +proc(edure)?", "ALTER PROCEDURE"
                                                    $CreateStoredProceduresScript += $scriptText + "`nGO`n`n" 
                                              }
                                        }
                                  }

Open in new window

will populate the variable $scriptText with code to create the procedure, while insuring only user procedures are scripted and procedures in the exclusion list (more on that later) are omitted.

Within the loop that is scripting the stored procedures, I have included code which first, places the SET commands on separate lines, and secondly replaces NVARCHAR, NCHAR and NTEXT with VARCHAR, CHAR and VARCHAR(MAX) respectively. This code will also replace CREATE PROCEDURE with ALTER PROCEDURE. Since the procedure could have been created using CREATE PROC or CREATE PROCEDURE, in either upper, lower or mixed case, the -REPLACE operator will handle this.

The same type of logic is used to script Triggers and Functions.

Tables are a different story. Since we need to modify the schema of the table, some creativity is needed. Since the entire script is attached, I will not include the actual code here, but will describe the technique used.

The Columns collection is scanned to see if the table has an IDENTITY column. If so, SET IDENTITY_INSERT ON is included in the script.
A count of rows is retrieved from the source table.
A temporary table is created named temp___tablename with the same schema as the source table except NVARCHAR, NCHAR and NTEXT are changed to VARCHAR, CHAR and VARCHAR(MAX).
An INSERT INTO temp__tablename statement is created along with a SELECT FROM tablename to move data from the original table.
A count of rows inserted into the temporary table is obtained and compared to the original table row count. If not equal, processing stops.
The original table is dropped.
temp_tablename is renamed to tablename.

Pretty much all obects associated with the database (tables, triggers, indexes, constraints, views, procedures and functions) are scripted as CREATE scripts (with the exception of functions and procedures, which are modified to ALTER. Once all scripting of the database has been accomplished, the scripts are "put together" into a single executable SQL script, complete with console and database logging.

Some things to mention. The database is set to SIMPLE recovery mode at the start of the generated script and set back to FULL at the end. If your database is, by default set to something other than FULL, that will need to be changed.
A function will be created in the master database and is used as part of logging database stats during conversion.

Towards the start of the script are the following three lines of code:

$ExcludeProcedures = @("procedure21")
                      $ExcludeTables = @("table12","table23","table45","table76")
                      $ExcludeFunctions = @("function13","function28")

Open in new window

The object names are phony, but the intent is that you can exclude certain objects from being scripted/converted. The primary purpose for tables is that they do not contain any Unicode data types, thus do not need to be converted. The function exclude list was included because we have some CLR functions and they need to remain intact because the underlying code is looking for a Unicode data type. The procedure exclude list was included in case you came across any procedures that required exclusion for some reason (we have some that call sp_executesql and that system stored procedure MUST have a Unicode string passed to it, thus we could not be converting NVARCHAR to VARCHAR within it.

In our case, all of our databases are pretty much identical, as we host data for clients, and for security purposes, each client has their own database. Your situation might be different and you would need to adjust the exclusion lists (if you use them) accordingly.

During execution of the PowerShell script, SQL code is generated to create objects tied to tables. This includes primary keys, foreign keys, constaints and indexes as well as triggers and are included in the final SQL script. This is required because the original table is dropped and the temporary table is renamed. In dropping the table, all of these objects are dropped as well.

Not scripted are any extended properties on any of the tables. The script can either be revised to collect these (there are ExtendedProperties collections available for tables and columns) or script them separately and add them back after the conversion.

A DBCC SHRINKDATABASE is executed at the end of the script and depending on a number of circumstances, can take as long, or longer to execute than the actual conversion. Althoguh I typically am against executing a DBCC SHRINKDATABASE, this is one time I would recommend it. If nothing else, it will give you a good idea of the amount fo data you've reclaimed by converting your data types.

WARNING: No transaction processing is done. MAKE SURE YOU HAVE BACKED UP YOUR DATABASE BEFORE STARTING A CONVERSION. Since pretty much every row of every table and every object tied to every table, along with every view, function and stored procedure is touched, the database log would grow to an unbelieveable size if the entire conversion process was logged in its entirety. That is one reason the SQL script sets the database to SIMPLE recovery mode at the start.

PLEASE: MAKE SURE YOU'VE BACKED UP YOUR DATABASE(S) BEFORE EXECUTING THE SQL CONVERSION SCRIPT GENERATED BY THIS CODE!!!

Comments (3)

D B

Software Developer

Author

Commented: 2014-10-20

For some reason, all my code snippets showed as 'undefined' and I had to add them back in.
NVARCHAR-ConversionScript.sql

Commented: 2014-10-25

Okay, I tried that and saved and still no snippets. I am going to include the snippets separately and hopefully you can get them into the article.

I just found it. I had to submit the article first :-).
I've attached the file also.
NVARCHAR-Conversion-Article-Code-Snippet