SQL server filter and parse

I have a table TableA with field 'ToParse' datatype varchar(8000) with the following value (all in one line - I put line breaks for easier readibility)

False|False|[EmpID]|[EmpID]|int|10|10|0|@@
True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@
False|False|[Gender]|[Gender]|nchar|1||||@@
False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year", [DateOfBirth],getdate())@@
True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@
True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@

I need to parse this (getting rid of unwanted junk data and pipes and @@) to:

[EmpID] as [EmpID],
LOWER([Position]) as [Position],
[Gender] as [Gender],
DATEDIFF("Year", [DateOfBirth],getdate()) as [Age],
Position+Gender as [PositionGender],
SUBSTRING([Position],1,5) as [SubsPos]
spiroseAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

SharathData EngineerCommented:
Do you have line breaks in your actual data or not. If you have pipes, we can seperate the data, otherwise how do you want to seperate the data of each column?
0
spiroseAuthor Commented:
No we don't have line breaks in the actual data. I have pipes and @ symbols. This is one value of a field.
0
SharathData EngineerCommented:
Do you have an identity or any primary key column? If yes,try this code.
;with CTE as ( 
SELECT ID,ltrim(SUBSTRING(ToParse, n, CHARINDEX('|', ToParse + '|',n) - n)) AS ToParse,n,
       ROW_NUMBER() over (partition by ID order by n) RowNum1,
       ROW_NUMBER() over (partition by ID order by n desc) RowNum2 
 FROM #TableA 
CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P')  AS Numbers(n)
WHERE SUBSTRING(',' + ToParse, n, 1) = '|'
  AND n < LEN(ToParse) + 1),
CTE2 as (
select ID,max(case when RowNum2 = 1 then REPLACE(ToParse,'@@','') end) s1,
       max(case when RowNum1 = 3 then ToParse end) s2
  from CTE where RowNum1 = 3 or RowNum2 = 1 group by ID)
select ID,ISNULL(NULLIF(s1,''),s2) + ' as ' + s2
  from CTE2

Open in new window


Tested with your sample date.
 
create table #TableA  (ID int identity,ToParse varchar(100))
insert #TableA 
select 'False|False|[EmpID]|[EmpID]|int|10|10|0|@@' union all
select 'True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@' union all
select 'False|False|[Gender]|[Gender]|nchar|1||||@@' union all
select 'False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year", [DateOfBirth],getdate())@@' union all
select 'True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@' union all
select 'True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@'
 

;with CTE as ( 
SELECT ID,ltrim(SUBSTRING(ToParse, n, CHARINDEX('|', ToParse + '|',n) - n)) AS ToParse,n,
       ROW_NUMBER() over (partition by ID order by n) RowNum1,
       ROW_NUMBER() over (partition by ID order by n desc) RowNum2 
 FROM #TableA 
CROSS JOIN (SELECT number FROM master..spt_values WHERE type = 'P')  AS Numbers(n)
WHERE SUBSTRING(',' + ToParse, n, 1) = '|'
  AND n < LEN(ToParse) + 1),
CTE2 as (
select ID,max(case when RowNum2 = 1 then REPLACE(ToParse,'@@','') end) s1,
       max(case when RowNum1 = 3 then ToParse end) s2
  from CTE where RowNum1 = 3 or RowNum2 = 1 group by ID)
select ID,ISNULL(NULLIF(s1,''),s2) + ' as ' + s2
  from CTE2
/*
ID	(No column name)
1	[EmpID] as [EmpID]
2	LOWER([Position]) as [Position]
3	[Gender] as [Gender]
4	DATEDIFF("Year", [DateOfBirth],getdate()) as [Age]
5	Position+Gender as [PositionGender]
6	SUBSTRING([Position],[1],[5]) as [SubsPos]
*/
 drop table #TableA  

Open in new window

0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

spiroseAuthor Commented:
Hi Sharath,
The field value
"False|False|[EmpID]|[EmpID]|int|10|10|0|@@
True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@
False|False|[Gender]|[Gender]|nchar|1||||@@
False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year", [DateOfBirth],getdate())@@
True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@
True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@"

is actually 1 record value as follows:
"False|False|[EmpID]|[EmpID]|int|10|10|0|@@True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@False|False|[Gender]|[Gender]|nchar|1||||@@False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year",[DateOfBirth],getdate())@@True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@"

Would the logic/query still work?
0
SharathData EngineerCommented:
As you stated in different lines, I assumed each one as one record. If the entire string is in one record, I need to tweak my query a bit. Give me some time.
0
spiroseAuthor Commented:
I was thinking of using the '@@' as a way of separating the various field values.
 
This is going to be the pattern:
 
If the first Value before the first pipe is 'True' or if the Second value after the first pipe (thus before the second pipe) is True,or both the values are True,
then, last value after the last pipe just before the '@@' +'AS' + the third value i.e. after the second pipe (thus before the third pipe).
        ex 1 a: Position+Gender as [PositionGender] BUT we need to put square brackets around ColumnNames if there isn't one already
        i.e. the ex above become [Position] + [Gender] as [PositionGender]
 
        ex 1 b: However, if there are square brackets around numbers ex: SUBSTRING([Position],[1],[5]), then we need to eliminate the brackets around those numbers to make them
        SUBSTRING([Position],1,5) as [SubsPos]
 
        ex 2: DATEDIFF("Year",[DateOfBirth],getdate()) as [Age]
 
If the first value is False and the secondvalue is false,
then we simply do third value after the second pipe (thus before the third pipe) as 4th value
        ex: [EmpID] as [EmpID]
 
Does this help?
 
0
spiroseAuthor Commented:
Please ignore editing the square brackets part , that is bad data and it will be taken care of.
0
spiroseAuthor Commented:
Also, eventually in the final filtered data all the @@ should be replaced by ,
0
SharathData EngineerCommented:
try this.
CREATE TABLE #TableA 
  ( 
     ToParse VARCHAR(MAX) 
  ) 

INSERT #TableA 
SELECT 'False|False|[EmpID]|[EmpID]|int|10|10|0|@@True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@False|False|[Gender]|[Gender]|nchar|1||||@@False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year",[DateOfBirth],getdate())@@True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@'; 

WITH CTE1 
     AS (SELECT LTRIM(SUBSTRING(ToParse, n, CHARINDEX('@', ToParse + '@', n) - n)) AS ToParse,
                ROW_NUMBER() OVER (PARTITION BY ToParse ORDER BY n)                ID 
           FROM (SELECT ToParse = REPLACE(ToParse, '@@', '@') 
                   FROM #TableA) t1 
                CROSS JOIN (SELECT NUMBER 
                              FROM MASTER..spt_values 
                             WHERE TYPE = 'P') AS Numbers(n) 
          WHERE SUBSTRING('@' + ToParse, n, 1) = '@' 
                AND n < LEN(ToParse) + 1), 
     CTE2 
     AS (SELECT ID, 
                ToParse = ToParse + '@@' 
           FROM CTE1), 
     CTE3 
     AS (SELECT ID, 
                LTRIM(SUBSTRING(ToParse, n, CHARINDEX('|', ToParse + '|', n) - n)) AS ToParse,
                n, 
                ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n)                     RowNum1, 
                ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n DESC)                RowNum2 
           FROM CTE2 
                CROSS JOIN (SELECT NUMBER 
                              FROM MASTER..spt_values 
                             WHERE TYPE = 'P') AS Numbers(n) 
          WHERE SUBSTRING(',' + ToParse, n, 1) = '|' 
                AND n < LEN(ToParse) + 1), 
     CTE4 
     AS (SELECT ID, 
                MAX(CASE 
                      WHEN RowNum2 = 1 THEN REPLACE(ToParse, '@@', '') 
                    END) s1, 
                MAX(CASE 
                      WHEN RowNum1 = 3 THEN ToParse 
                    END) s2 
           FROM CTE3 
          WHERE RowNum1 = 3 
                 OR RowNum2 = 1 
          GROUP BY ID) 
SELECT ISNULL(NULLIF(s1, ''), s2) + ' as ' + s2 
  FROM CTE4 

DROP TABLE #TableA 

/*
(No column name)
[EmpID] as [EmpID]
LOWER([Position]) as [Position]
[Gender] as [Gender]
DATEDIFF("Year",[DateOfBirth],getdate()) as [Age]
Position+Gender as [PositionGender]
SUBSTRING([Position],[1],[5]) as [SubsPos]
*/

Open in new window

0
spiroseAuthor Commented:
Hi Sharath,
The output should also be only one record as follows:
[EmpID] as [EmpID],LOWER([Position]) as [Position],
[Gender] as [Gender],DATEDIFF("Year", [DateOfBirth],getdate()) as [Age],Position+Gender as [PositionGender],SUBSTRING([Position],1,5) as [SubsPos]
0
SharathData EngineerCommented:
Here you go.
CREATE TABLE #TableA 
  ( 
     ToParse VARCHAR(MAX) 
  ) 

INSERT #TableA 
SELECT 'False|False|[EmpID]|[EmpID]|int|10|10|0|@@True|False|[Position]|[Position]|nvarchar|50||||LOWER([Position])@@False|False|[Gender]|[Gender]|nchar|1||||@@False|True|[Age]|[Age]|int||2||10|DATEDIFF("Year",[DateOfBirth],getdate())@@True|True|[PositionGender]|[PositionGender]|varchar|200||||Position+Gender@@True|True|[SubsPos]|[SubsPos]|varchar|200||||SUBSTRING([Position],[1],[5])@@'; 

WITH CTE1 
     AS (SELECT LTRIM(SUBSTRING(ToParse, n, CHARINDEX('@', ToParse + '@', n) - n)) AS ToParse,
                ROW_NUMBER() OVER (PARTITION BY ToParse ORDER BY n)                ID 
           FROM (SELECT ToParse = REPLACE(ToParse, '@@', '@') 
                   FROM #TableA) t1 
                CROSS JOIN (SELECT NUMBER 
                              FROM MASTER..spt_values 
                             WHERE TYPE = 'P') AS Numbers(n) 
          WHERE SUBSTRING('@' + ToParse, n, 1) = '@' 
                AND n < LEN(ToParse) + 1), 
     CTE2 
     AS (SELECT ID, 
                ToParse = ToParse + '@@' 
           FROM CTE1), 
     CTE3 
     AS (SELECT ID, 
                LTRIM(SUBSTRING(ToParse, n, CHARINDEX('|', ToParse + '|', n) - n)) AS ToParse,
                n, 
                ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n)                     RowNum1, 
                ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n DESC)                RowNum2 
           FROM CTE2 
                CROSS JOIN (SELECT NUMBER 
                              FROM MASTER..spt_values 
                             WHERE TYPE = 'P') AS Numbers(n) 
          WHERE SUBSTRING(',' + ToParse, n, 1) = '|' 
                AND n < LEN(ToParse) + 1), 
     CTE4 
     AS (SELECT ID, 
                MAX(CASE 
                      WHEN RowNum2 = 1 THEN REPLACE(ToParse, '@@', '') 
                    END) s1, 
                MAX(CASE 
                      WHEN RowNum1 = 3 THEN ToParse 
                    END) s2 
           FROM CTE3 
          WHERE RowNum1 = 3 
                 OR RowNum2 = 1 
          GROUP BY ID), 
     CTE5 
     AS (SELECT ID, 
                ToParse = ISNULL(NULLIF(s1, ''), s2) + ' as ' + s2 
           FROM CTE4) 
SELECT DISTINCT RTRIM(SUBSTRING(ISNULL((SELECT ',' + ToParse 
                                          FROM CTE5 
                                        FOR XML PATH('')), ' '), 2, 2000)) 

DROP TABLE #TableA 
/*
[EmpID] as [EmpID],LOWER([Position]) as [Position],[Gender] as [Gender],DATEDIFF("Year",[DateOfBirth],getdate()) as [Age],Position+Gender as [PositionGender],SUBSTRING([Position],[1],[5]) as [SubsPos]
*/

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
spiroseAuthor Commented:
Thanks!
0
spiroseAuthor Commented:
I have one related question(please let me know if opening a new question is more appropriate). I thought since it is relevant to this question, I could probably seek clarification here.

If ToParse has something like:


I have 2 different types of syntaxes to worry about:
 
1st- convert something like:
 
True|True|[ConvertEmpID]|[ConvertEmpID]|varchar|200||||(DT_STR,200,1252)[EmployeeID]@@
True|True|[ConvertManagerID]|[ConvertManagerID]|int|||||(DT_I4)[ManagerID]@@
 
TO:
convert(varchar(200),[EmployeeID],1252) as ConvertEmpID,
convert(int, ManagerID) as ConvertMgrID
 
 
2nd - convert something like:
 
True|True|[BirthDateAsNull]|[BirthDateAsNull]|datetime|||||NULL(DT_DBTIMESTAMP)@@
True|True|[PositionAsNull]|[PositionAsNull]|varchar|200||||NULL(DT_STR,200,1252)@@
True|True|[EmpIDAsNull]|[EmpIDAsNull]|int|||||NULL(DT_I4)@@
 
TO:
NULL as BirthDateAsNull,
NULL as PositionAsNull,
NULL as EmpIDAsNull
 
How do I modify your code to incorporate this? Thank for your help and I apologize for the hassle
0
SharathData EngineerCommented:
Can you open a new question and post the link for new question here?
0
spiroseAuthor Commented:
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.