Transform Text copied from Excel Table into Real Table in SQL

Ryan Chong
CERTIFIED EXPERT
The best way to learn is to teach
Published:
Updated:
Edited by: Andrew Leniart
After looking into a question at EE, a thought came to my mind that I considered would make a good article. "Transform Text copied from Excel Table into Real Table in SQL" as it is totally possible to do. Here's the result.
After looking into this question, I decided to write this article to show that transforming text copied from an Excel Table into a Real Table in SQL is totally possible to do. Here's how to do it.

The Concept



This is simple to understand. Let's say we try to copy a piece of data from Excel and would like to transform it into SQL, which eventually generates the output as a Table.

The Initial Analysis


First, let's try to deal with the text in the original question.

PlanID       SSNUM          EFFDATE          ACCTID
Cumb429   555555555    2019-01-01       39506
Cumb429   555555555    2019-01-01        31506
Cumb429   555555555    2021-01-20      31506

It seems that the source data is formatted with a separator. This leads me to think the source data was copied from somewhere, such as from Excel or a Text Delimited file.

a) Assuming we already have the source in Excel like below:


You could download the file here.

Do note:
* We are not going to handle data types at this moment.
* We assume there is no space for the column names.
* I will be demonstrating how to do it in SQL Server version 2017.

If we copied the text from Excel above and paste it to SSMS, it would be obvious that the separator is "Tab" delimited.


b) However, in the case above in the original question, the field's values are actually separated by spaces.


How to do it correctly


First, I would emphasize that if the source data was in Excel or in Text Delimited format, it could be easily imported into SQL Server, without using the methods that will be discussing below.

  • To import Excel into SQL Server, we could use the OPENROWSET method.
  • To import a Text Delimited file into SQL Server, we could use the BULK INSERT method.

To start the discussion within this article context, I think we now have two different types of input strings, one with "Tab" delimited, while another one is "Space" delimited.

a) Split the string with "Tab" delimited

In order to split the string into rows, we need to know its delimiter. In this case, it would be a carriage return or line break as a delimiter. Then in SQL Server, it is equivalent to CHAR(13)+CHAR(10). For Tab, it would be CHAR(9).

To do this, we use the following codes:

DECLARE @ROW_DELIMITER VARCHAR(10) = CHAR(13)
DECLARE @COL_DELIMITER VARCHAR(10) = CHAR(9)
DECLARE @OUTPUT VARCHAR(MAX)

DECLARE @INPUT VARCHAR(MAX) = 'PlanID    SSNUM    EFFDATE    ACCTID
Cumb429    555555555    2019-01-01    39506
Cumb429    555555555    2019-01-01    31506
Cumb429    555555555    2021-01-20    31506
'

;WITH src AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) row_idx, TRIM([value]) [row]
    FROM STRING_SPLIT(REPLACE(@INPUT, @ROW_DELIMITER+CHAR(10), @ROW_DELIMITER), @ROW_DELIMITER) tb
    WHERE TRIM([value]) <> ''
), header AS
(
    SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
    FROM src
    CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
    WHERE src.row_idx = 1
), content AS
(
    SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
    FROM src
    CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
    WHERE src.row_idx > 1
), prep AS
(
    SELECT row_idx, col_idx, [col], tmp.max_row_idx
    FROM content
    CROSS APPLY (SELECT MAX(row_idx) max_row_idx FROM content) tmp
    GROUP BY row_idx, col_idx, [col], tmp.max_row_idx
), sql_list AS
(
    SELECT c.row_idx, 'SELECT '''+STRING_AGG(c.[col]+''' '+h.[col], ', ''') WITHIN GROUP (ORDER BY c.col_idx)+ CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END [sql]
    FROM prep c INNER JOIN header h on c.col_idx = h.col_idx
    GROUP BY c.row_idx, c.max_row_idx, CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END
)
SELECT @OUTPUT = STRING_AGG([sql], CHAR(13)+CHAR(10)) WITHIN GROUP (ORDER BY row_idx)
from sql_list

EXEC(@OUTPUT)

The generated result looks pretty good and is transformed to the expected result in Table form.


The table-value function STRING_SPLIT was being used in this case. In case this function is not supported in your development environment, you could look for alternative ways here.

Similarly, for the STRING_AGG function, in case it's not supported in your development environment, you could find alternative ways here.

b) Split the string with "Space" delimited

The approach to be used for "Space" delimited content should be similar to "Tab" delimited content, but we need to remove the replicate of spaces before doing the transformation.

The sample script below shows how to achieve that.

CREATE OR ALTER FUNCTION dbo.Strip_Chars
(
    @INPUT NVARCHAR(MAX),
    @CHAR NVARCHAR(1) = ' ',
    @TRIMATLAST BIT = 0
)
RETURNS varchar(MAX) AS
BEGIN 
    WHILE CHARINDEX(@CHAR+@CHAR, @INPUT) > 0 
        SET @INPUT = REPLACE(@INPUT, @CHAR+@CHAR, @CHAR)
    
    IF @TRIMATLAST = 1
        SET @INPUT = TRIM(@INPUT)

    RETURN @INPUT
END

GO

DECLARE @ROW_DELIMITER VARCHAR(10) = CHAR(13)
DECLARE @COL_DELIMITER VARCHAR(10) = ' '
DECLARE @OUTPUT VARCHAR(MAX)

DECLARE @INPUT VARCHAR(MAX) = 'PlanID       SSNUM          EFFDATE          ACCTID
Cumb429   555555555    2019-01-01       39506
Cumb429   555555555    2019-01-01        31506
Cumb429   555555555    2021-01-20      31506
'

;WITH src AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) row_idx, TRIM([value]) [row]
    FROM STRING_SPLIT(REPLACE(dbo.Strip_Chars(@INPUT, DEFAULT, DEFAULT), @ROW_DELIMITER+CHAR(10), @ROW_DELIMITER), @ROW_DELIMITER) tb
    WHERE TRIM([value]) <> ''
), header AS
(
    SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
    FROM src
    CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
    WHERE src.row_idx = 1
), content AS
(
    SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
    FROM src
    CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
    WHERE src.row_idx > 1
), prep AS
(
    SELECT row_idx, col_idx, [col], tmp.max_row_idx
    FROM content
    CROSS APPLY (SELECT MAX(row_idx) max_row_idx FROM content) tmp
    GROUP BY row_idx, col_idx, [col], tmp.max_row_idx
), sql_list AS
(
    SELECT c.row_idx, 'SELECT '''+STRING_AGG(c.[col]+''' '+h.[col], ', ''') WITHIN GROUP (ORDER BY c.col_idx)+ CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END [sql]
    FROM prep c INNER JOIN header h on c.col_idx = h.col_idx
    GROUP BY c.row_idx, c.max_row_idx, CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END
)
SELECT @OUTPUT = STRING_AGG([sql], CHAR(13)+CHAR(10)) WITHIN GROUP (ORDER BY row_idx)
FROM sql_list

EXEC(@OUTPUT)

The generated result looks pretty good and is transformed to the expected result in Table form.


The Strip_Chars function was being created to facilitate the process of stripping the extra characters from a string.

Do note the change of declaration on variable @COL_DELIMITER from CHAR(9) to ' '.


What is next?


To make the script robust enough, we probably can embed the error handling and make the whole script in a stored procedure. I did make a couple of amendments to make the script better in handling different scenarios.

CREATE OR ALTER FUNCTION dbo.Strip_Chars
(
    @INPUT NVARCHAR(MAX),
    @CHAR NVARCHAR(1) = ' ',
    @TRIMATLAST BIT = 0
)
RETURNS varchar(MAX) AS
BEGIN 
    WHILE CHARINDEX(@CHAR+@CHAR, @INPUT) > 0 
        SET @INPUT = REPLACE(@INPUT, @CHAR+@CHAR, @CHAR)
    
    IF @TRIMATLAST = 1
        SET @INPUT = TRIM(@INPUT)

    RETURN @INPUT
END

GO

CREATE OR ALTER PROCEDURE dbo.ConverTextToTable
(
    @INPUT VARCHAR(MAX),
    @COL_DELIMITER VARCHAR(10) = ' ',
    @DEBUGMODE BIT = 1
) AS
BEGIN
    BEGIN TRY
        
        DECLARE @ROW_DELIMITER VARCHAR(10) = CHAR(13)
        DECLARE @OUTPUT VARCHAR(MAX)
        DECLARE @tb AS TABLE
        (
            [value] VARCHAR(MAX)
        )

        IF @COL_DELIMITER = ' '
            INSERT INTO @tb
            SELECT * FROM
            STRING_SPLIT(REPLACE(dbo.Strip_Chars(@INPUT, DEFAULT, DEFAULT), @ROW_DELIMITER+CHAR(10), @ROW_DELIMITER), @ROW_DELIMITER)
        ELSE
            INSERT INTO @tb
            SELECT * FROM
            STRING_SPLIT(REPLACE(@INPUT, @ROW_DELIMITER+CHAR(10), @ROW_DELIMITER), @ROW_DELIMITER)

        ;WITH src AS
        (
            SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) row_idx, TRIM([value]) [row]
            FROM @tb
            WHERE TRIM([value]) <> ''
        ), header AS
        (
            SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
            FROM src
            CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
            WHERE src.row_idx = 1
        ), content AS
        (
            SELECT src.row_idx, ROW_NUMBER() OVER (PARTITION BY src.row_idx ORDER BY (SELECT 1)) col_idx, TRIM(c.[value]) [col]
            FROM src
            CROSS APPLY STRING_SPLIT(src.[row], @COL_DELIMITER) c
            WHERE src.row_idx > 1
        ), prep AS
        (
            SELECT row_idx, col_idx, [col], tmp.max_row_idx
            FROM content
            CROSS APPLY (SELECT MAX(row_idx) max_row_idx FROM content) tmp
            GROUP BY row_idx, col_idx, [col], tmp.max_row_idx
        ), sql_list AS
        (
            SELECT c.row_idx, 'SELECT '''+STRING_AGG(c.[col]+''' '+h.[col], ', ''') WITHIN GROUP (ORDER BY c.col_idx)+ CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END [sql]
            FROM prep c INNER JOIN header h on c.col_idx = h.col_idx
            GROUP BY c.row_idx, c.max_row_idx, CASE WHEN c.row_idx = c.max_row_idx THEN '' ELSE ' UNION ALL ' END
        )
        SELECT @OUTPUT = STRING_AGG([sql], CHAR(13)+CHAR(10)) WITHIN GROUP (ORDER BY row_idx)
        FROM sql_list

        EXEC(@OUTPUT)
    END TRY
    BEGIN CATCH
        IF @DEBUGMODE = 1
            SELECT CONCAT('<<ERROR>>Error Number: ',ERROR_NUMBER(),CHAR(13)+CHAR(10),' Line Number: ',ERROR_LINE(),CHAR(13)+CHAR(10),' Error Message: ',ERROR_MESSAGE()) [value]
    END CATCH 

    END

GO

To try it out with the created stored procedure:

a) Split the string with "Tab" delimited

DECLARE @COL_DELIMITER VARCHAR(10) = CHAR(9)
DECLARE @DEBUGMODE BIT = 1
DECLARE @INPUT VARCHAR(MAX) = 'PlanID    SSNUM    EFFDATE    ACCTID
Cumb429    555555555    2019-01-01    39506
Cumb429    555555555    2019-01-01    31506
Cumb429    555555555    2021-01-20    31506
'
EXEC dbo.ConverTextToTable @INPUT, @COL_DELIMITER, @DEBUGMODE

b) Split the string with "Space" delimited

DECLARE @COL_DELIMITER VARCHAR(10) = ' '
DECLARE @DEBUGMODE BIT = 1
DECLARE @INPUT VARCHAR(MAX) = 'PlanID       SSNUM          EFFDATE          ACCTID
Cumb429   555555555    2019-01-01       39506
Cumb429   555555555    2019-01-01        31506
Cumb429   555555555    2021-01-20      31506'
EXEC dbo.ConverTextToTable @INPUT, @COL_DELIMITER, @DEBUGMODE

Both should produce the same result as what mentioned earlier.

More works can be done


So, now the stored procedure created seems good enough to generate the expected result. Having said that, I have mentioned the points below early in the article, which I think can be enhanced:

* We are not going to handle the data types at this moment.
* Assuming there is no space for the column names.

Extra point:
* The UNION ALL approach may not be good in process optimization if the raw data is huge. Hence, an alternative transformation method could be considered.

I would probably discuss the points and further enhance this article, or create the "Part 2" for the article when it's necessary.

Finally, I hope you enjoy this article!
0
1,124 Views
Ryan Chong
CERTIFIED EXPERT
The best way to learn is to teach

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.