I_s
asked on
C# - Parse through DB2 column data and INSERT the data as new rows into SQL
Hi everyone!
I have data stored in a DB2 database which I am attempting to extract and put into a SQL Server 2005 db. My logic is to pull the db2 data into a DataSet so I can manipulate it, and then insert the DataSet into SQL.
Problem is - the data in db2 was stored in such a manner that I have to programatically parse through column data so they can be inserted into SQL as individual rows.
The data in the db2 server looks similare to this:
REQNUM(integer) STATUS(varchar) DATE(date) USERID(varchar) COMMENT(varchar)
-------------------------- ---------- ---------- ---------- ---------- ---------
1 Initial,Approved, 08/08/2010,08/12/2010, BLaher,KReheran, product ready, real...,
2 Rejected, 09/13/2011, KGhracie, product not ready,
45 Initial,Approved,Rejected 08/08/10,03/02/11,04/15/11 BLaher,KReheran,KGhracie, n,n,n...,
Notice that all data related to a REQNUM is stored in the same row separated by ','
REQNUM is not Unique.
In SQL I need it to look like this:
REQNUM(int) STATUS(varchar) DATE(datetime) USERID(varchar) COMMENT(varchar)
-------------------------- ---------- ---------- ---------- ---------- ---------
1 Initial 08/08/2010 BLaher product ready
1 Approved 08/12/2010 KReheran realease approved
2 Rejected 09/13/2011 KGhracie product not ready
45 Initial 08/08/10 ... ...
45 Approved 03/02/11 ... ...
45 Rejected 04/15/11 ... ...
So I have started an application in C# to parse the data, however it is returned in column data and I cannot figure out how to put it back into row data. The code I am using to parse the data is below.
Any help is greatly appreciated!!!
I have data stored in a DB2 database which I am attempting to extract and put into a SQL Server 2005 db. My logic is to pull the db2 data into a DataSet so I can manipulate it, and then insert the DataSet into SQL.
Problem is - the data in db2 was stored in such a manner that I have to programatically parse through column data so they can be inserted into SQL as individual rows.
The data in the db2 server looks similare to this:
REQNUM(integer) STATUS(varchar) DATE(date) USERID(varchar) COMMENT(varchar)
--------------------------
1 Initial,Approved, 08/08/2010,08/12/2010, BLaher,KReheran, product ready, real...,
2 Rejected, 09/13/2011, KGhracie, product not ready,
45 Initial,Approved,Rejected 08/08/10,03/02/11,04/15/11
Notice that all data related to a REQNUM is stored in the same row separated by ','
REQNUM is not Unique.
In SQL I need it to look like this:
REQNUM(int) STATUS(varchar) DATE(datetime) USERID(varchar) COMMENT(varchar)
--------------------------
1 Initial 08/08/2010 BLaher product ready
1 Approved 08/12/2010 KReheran realease approved
2 Rejected 09/13/2011 KGhracie product not ready
45 Initial 08/08/10 ... ...
45 Approved 03/02/11 ... ...
45 Rejected 04/15/11 ... ...
So I have started an application in C# to parse the data, however it is returned in column data and I cannot figure out how to put it back into row data. The code I am using to parse the data is below.
Any help is greatly appreciated!!!
/// <summary>
/// Method parses data and substrings row data on ','
/// </summary>
/// <param name="view">data from DataGridView</param>
/// <returns>output array of CXHistory objects</returns>
public string[] parseData(DataGridView view)
{
int i = 0;
int count = 0;
List<string> cxHistoryObj = new List<string>();
int rowcount = ((view.Rows.Count) - 1);
// iterrate through entire row
for (int row = 0; row <= rowcount; row++)
{
int colCount = ((view.Columns.Count) - 1);
// once all columns are complete in a row, go to next row
for (int col = 2; col <= colCount; col++)
{
// rowStrSub is next row to parse
string rowStrSub = view.Rows[row].Cells[col].Value.ToString();
if (rowStrSub.Length != null)
{
if (!rowStrSub.EndsWith(","))
{
// check if ',' exists at end of string - add one if it does not
// for index purposes
rowStrSub = rowStrSub + ",";
}
// iterrate through the string and find all instances of a ','
for (int k = 0; k < rowStrSub.Length - 1; k = k)
{
count = rowStrSub.Split(',').Length - 1;
i = 0;
while ((i = rowStrSub.IndexOf(',', i)) != -1)
{
// retSubStr is returned sub string after indexed on ','
string retSubStr = (rowStrSub.Substring(k, i - k));
cxHistoryObj.Add(retSubStr);
k = i + 1;
i++;
}
}
}
}
}
string[] output = cxHistoryObj.ToArray();
//string msg = string.Format("Ouput size = {0}", output.Length);
//MessageBox.Show(msg);
return output;
}
why don't you use the String.split method? it will make your life much easier
ASKER
I guess I thought using indexOf was the most efficient, I suppose I could use String.Split, but would that take care of the problem at hand?
I have no trouble separating the strings with the following code, the problem is it puts them all into a single column.
I need it in rows?
I have no trouble separating the strings with the following code, the problem is it puts them all into a single column.
I need it in rows?
Hi I s,
DB2 supports recursive SQL that will convert columns to rows (or rows to columns). But if the maximum number of items on a single line is fixed (known), then you might be able to do this easier with straight SQL. Or split the text withing C#. All are very workable solutions.
Regarding recursive SQL, here's an Experts Exchange article that describes how to do it. Adding the logic to split multiple strings is pretty easy.
https://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html
Kent
DB2 supports recursive SQL that will convert columns to rows (or rows to columns). But if the maximum number of items on a single line is fixed (known), then you might be able to do this easier with straight SQL. Or split the text withing C#. All are very workable solutions.
Regarding recursive SQL, here's an Experts Exchange article that describes how to do it. Adding the logic to split multiple strings is pretty easy.
https://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html
Kent
I'm sorry, but I cannot see hoe the data could be stored in the DB2 table as described. The DATE column is defined as datetime - a field which cannot contain more than one date. Are you SURE that SELECT * FROM TABLE returns columns with multiple comma-separated values???
It sounds like you are querying an xml table, in which case there are xquery methods to parse out the data.
It sounds like you are querying an xml table, in which case there are xquery methods to parse out the data.
i don't understand the problem
you fetch a record from the db2 database and then for each column you do a split by comma
for each column you now have an array of values
you pick the array with most values and that is the number of rows you will insert
you either duplicate the id (or apply what ever logic you need) and add default values if some of the columns are missing some of the values (although, i don't think it is possible according to your description)
you fetch a record from the db2 database and then for each column you do a split by comma
for each column you now have an array of values
you pick the array with most values and that is the number of rows you will insert
you either duplicate the id (or apply what ever logic you need) and add default values if some of the columns are missing some of the values (although, i don't think it is possible according to your description)
ASKER
I apologize for the confusing question, I will post back with a more clear description of my problem. Just a little side tracked right now.
I appreciate your replies!
I appreciate your replies!
ASKER
@gmarino
You are correct about the date col. This is a history (TBLHISTORY) table and the dates are pulled from TBLMAIN and converted to varchar. The DB2 Table which I am working with is structured like this:
CREATE TABLE "XXXXXX"."TBLHISTORY" (
"REQNUM" INTEGER NOT NULL ,
"DATE" VARCHAR(600) ,
"STATUS" VARCHAR(300) ,
"USERID" VARCHAR(300) ,
"COMMENT" VARCHAR(3000) )
IN "XXXXX" ;
I am by no means a DBA in SQL or DB2, I was tasked to port the data from one server to the other by any means possible.
In TBLHISTORY for some reason unknown, someone had the idea to save the data as I had explained above. REQNUM can have many STATUS - For each STATUS change, there is a DATE,USERID, and COMMENT - so they inserted the data in a single row separated by a ',' ...
So, I hope this is a little more clear, in DB2 the data looks as such:
REQNUM DATE STATUS USERID COMMENT
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----
1 08/08/10,09/09/10,..., Initial,Change,Approved, Ken,Bob,Nicki, Comment1,Comment2,...
My code adds the following to List<string> cxHistoryObj; as such:
08/08/10
09/09/10
09/10/10
Initial
Change
Approved
Ken
Bob
Nicki
Comment1
Comment2
Comment3
I then want to put this data back into a DataTable as such:
1 08/08/10 Initial Ken Comment1
1 09/09/10 Change Bob Comment2
1 09/10/10 Approved Nicki Comment3
I hope this clears up my initial question, and thanks in advance for any assistance!
You are correct about the date col. This is a history (TBLHISTORY) table and the dates are pulled from TBLMAIN and converted to varchar. The DB2 Table which I am working with is structured like this:
CREATE TABLE "XXXXXX"."TBLHISTORY" (
"REQNUM" INTEGER NOT NULL ,
"DATE" VARCHAR(600) ,
"STATUS" VARCHAR(300) ,
"USERID" VARCHAR(300) ,
"COMMENT" VARCHAR(3000) )
IN "XXXXX" ;
I am by no means a DBA in SQL or DB2, I was tasked to port the data from one server to the other by any means possible.
In TBLHISTORY for some reason unknown, someone had the idea to save the data as I had explained above. REQNUM can have many STATUS - For each STATUS change, there is a DATE,USERID, and COMMENT - so they inserted the data in a single row separated by a ',' ...
So, I hope this is a little more clear, in DB2 the data looks as such:
REQNUM DATE STATUS USERID COMMENT
--------------------------
1 08/08/10,09/09/10,..., Initial,Change,Approved, Ken,Bob,Nicki, Comment1,Comment2,...
My code adds the following to List<string> cxHistoryObj; as such:
08/08/10
09/09/10
09/10/10
Initial
Change
Approved
Ken
Bob
Nicki
Comment1
Comment2
Comment3
I then want to put this data back into a DataTable as such:
1 08/08/10 Initial Ken Comment1
1 09/09/10 Change Bob Comment2
1 09/10/10 Approved Nicki Comment3
I hope this clears up my initial question, and thanks in advance for any assistance!
ASKER
@kdo
Kent,
Great post, could this be done in a sp? I am trying to wrap my head around how I might split the strings in each column where a ',' exists, and then use recursive sql to put them into row data.
I'll post back soon
Kent,
Great post, could this be done in a sp? I am trying to wrap my head around how I might split the strings in each column where a ',' exists, and then use recursive sql to put them into row data.
I'll post back soon
Hi I_s,
You can certainly wrap a stored procedure around that. You just need to add a couple of columns. Be glad to do that for you.
The challenge may be the data, no matter which approach you use. The comment field could well contain embedded commas. The Query will always split the string on the next comma, not necessarily the original separator.
Kent
You can certainly wrap a stored procedure around that. You just need to add a couple of columns. Be glad to do that for you.
The challenge may be the data, no matter which approach you use. The comment field could well contain embedded commas. The Query will always split the string on the next comma, not necessarily the original separator.
Kent
ASKER
Basically I am attempting to do the opposite,
Where your example looks like this:
SELECT * FROM rec;
SNUM WORDNUM WORD
1 1 This
1 2 is
1 3 a
1 4 fine
1 5 example
Mine looks like this:
SNUM WORDNUM WORD
1 1,2,3,4,5, This,is,a,fine,example,
And I want your example as a result!
Where your example looks like this:
SELECT * FROM rec;
SNUM WORDNUM WORD
1 1 This
1 2 is
1 3 a
1 4 fine
1 5 example
Mine looks like this:
SNUM WORDNUM WORD
1 1,2,3,4,5, This,is,a,fine,example,
And I want your example as a result!
ASKER
kdo,
They were cognizant of embedded commas!
All embedded commas were replaced with the symbol '#@#' - so, that should not present a problem!
They were cognizant of embedded commas!
All embedded commas were replaced with the symbol '#@#' - so, that should not present a problem!
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Again great post! Exactly what I needed,
Im going to except it as the solution, do you mind if I post back with any followup questions? I do not have much sql exposure...
Thanks for your assistance!
Im going to except it as the solution, do you mind if I post back with any followup questions? I do not have much sql exposure...
Thanks for your assistance!
Hi I_s,
Feel free to ask more questions. That's what Experts Exchange is all about. :)
Grasping recursive SQL can be quite a challenge, especially if one is not well schooled in the principals of recursion and at least one recursive language. Still, it's powerful stuff.
I've taken the liberty of expanding the query to generate the data that you want. Just change the table name on line 49 to match your needs. I've not tested it, but it should be pretty close. If it gives you any trouble, just post back and we'll debug it.
Kent
Feel free to ask more questions. That's what Experts Exchange is all about. :)
Grasping recursive SQL can be quite a challenge, especially if one is not well schooled in the principals of recursion and at least one recursive language. Still, it's powerful stuff.
I've taken the liberty of expanding the query to generate the data that you want. Just change the table name on line 49 to match your needs. I've not tested it, but it should be pretty close. If it gives you any trouble, just post back and we'll debug it.
Kent
WITH pquery (reqnum, subrow, status, date, userid, comment, status_string, date_string, userid_string, comment_string )
AS
(
-- Select the first token from the result strings
SELECT reqnum, 1,
CASE WHEN LOCATE (',', status) > 0 THEN
SUBSTR (status, 1, LOCATE (',', status) - 1)
ELSE
status
END status,
CASE WHEN LOCATE (',', date) > 0 THEN
SUBSTR (date, 1, LOCATE (',', date) - 1)
ELSE
date
END date,
CASE WHEN LOCATE (',', userid) > 0 THEN
SUBSTR (userid, 1, LOCATE (',', userid) - 1)
ELSE
userid
END userid,
CASE WHEN LOCATE (',', comment) > 0 THEN
SUBSTR (comment, 1, LOCATE (',', comment) - 1)
ELSE
comment
END comment,
-- Select the remainder strings that will be passed back to the query
CASE WHEN locate (',', status) > 0 THEN
LTRIM (SUBSTR (status, LOCATE (',', status) + 1))
ELSE
NULL
END status_string,
CASE WHEN locate (',', date) > 0 THEN
LTRIM (SUBSTR (sentence, LOCATE (',', date) + 1))
ELSE
NULL
END date_string,
CASE WHEN locate (',', userid) > 0 THEN
LTRIM (SUBSTR (userid, LOCATE (',', userid) + 1))
ELSE
NULL
END userid_string,
CASE WHEN locate (',', comment) > 0 THEN
LTRIM (SUBSTR (comment, LOCATE (',', comment) + 1))
ELSE
NULL
END comment_string
FROM {yourtable} base
UNION ALL
-- Now select the first (next) token in each of the remainder strings
SELECT reqnum, subrow + 1,
CASE WHEN LOCATE (',', status_string) > 0 then
SUBSTR (status_string, 1, LOCATE (',', status_string) - 1)
ELSE
status_string
END status,
CASE WHEN LOCATE (',', userid_string) > 0 then
SUBSTR (userid_string, 1, LOCATE (',', userid_string) - 1)
ELSE
userid_string
END userid,
CASE WHEN LOCATE (',', date_string) > 0 then
SUBSTR (date_string, 1, LOCATE (',', date_string) - 1)
ELSE
date_string
END date,
CASE WHEN LOCATE (',', comment_string) > 0 then
SUBSTR (comment_string, 1, LOCATE (',', comment_string) - 1)
ELSE
comment_string
END comment,
-- And build the new remainder strings
CASE WHEN LOCATE (',', status_string) > 0 THEN
LTRIM (SUBSTR (status_string, LOCATE (',', status_string) + 1))
ELSE
NULL
END status_string
CASE WHEN LOCATE (',', userid_string) > 0 THEN
LTRIM (SUBSTR (userid_string, LOCATE (',', userid_string) + 1))
ELSE
NULL
END userid_string
CASE WHEN LOCATE (',', date_string) > 0 THEN
LTRIM (SUBSTR (date_string, LOCATE (',', date_string) + 1))
ELSE
NULL
END date_string
CASE WHEN LOCATE (',', comment_string) > 0 THEN
LTRIM (SUBSTR (comment_string, LOCATE (',', comment_string) + 1))
ELSE
NULL
END comment_string
FROM pquery t0
WHERE t0.status_string IS NOT NULL
OR t0.userid_string IS NOT NULL
OR t0.date_string IS NOT NULL
OR t0.comment_string IS NOT NULL
)
SELECT *
FROM pquery pq;
ASKER
Thanks for the code, as it stands, I recieve the following errors.
I am using SQL Server 2005, is LOCATE not compatible?
Msg 195, Level 15, State 10, Line 10
'LOCATE' is not a recognized built-in function name.
Msg 195, Level 15, State 10, Line 59
'LOCATE' is not a recognized built-in function name.
Msg 128, Level 15, State 1, Line 86
The name "userid_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 91
The name "date_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 96
The name "comment_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
I am using SQL Server 2005, is LOCATE not compatible?
Msg 195, Level 15, State 10, Line 10
'LOCATE' is not a recognized built-in function name.
Msg 195, Level 15, State 10, Line 59
'LOCATE' is not a recognized built-in function name.
Msg 128, Level 15, State 1, Line 86
The name "userid_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 91
The name "date_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 96
The name "comment_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
ASKER
I just realized that the error is probably becuase I am writing the query in SSMS... and not DB2..
Yep. That's a DB2 query. To run it in SQL Server you'll have to replace the LOCATE function with CHARINDEX.
Kent
ASKER
Works great Kent,
Thanks again for your help! I can definetly use recursive sql in the future!
Also, for anyone who may look at this post, I was able to do this successfully using C# as well with the following code:
Thanks again for your help! I can definetly use recursive sql in the future!
Also, for anyone who may look at this post, I was able to do this successfully using C# as well with the following code:
/// <summary>
/// getCXHistoryData is a method which makes a connection to the DB2 udb and returns a CXHistory Obj.
/// </summary>
/// <returns>cxhObjectsList is List of CXHistory Obj's</returns>
public List<CXHISTORY> getCXHistoryData(){
OleDbCommand oledbCMD;
OleDbConnection oledbCONN;
OleDbDataReader oledbReader;
cxhObjectsList = new List<CXHISTORY>();
errors = new List<string>();
try
{
using (oledbCONN = new OleDbConnection(ApplicationConsts.DB2_CONNECTION_STRING))
{
oledbCMD = new OleDbCommand(ApplicationConsts.HISTORY_QUERY, oledbCONN);
oledbCONN.Open();
using (oledbReader = oledbCMD.ExecuteReader())
{
while (oledbReader.Read())
{
int requestnum = ((int)oledbReader["REQUESTNUMBER"]);
string finstatus = oledbReader["FINALSTATUS"].ToString();
string dates = oledbReader["DATE"].ToString();
string userId = oledbReader["USERID"].ToString().ToUpper();
string status = oledbReader["STATUS"].ToString();
string comment = oledbReader["COMMENT"].ToString().Replace("@#@", ",");
string[] rowDates = dates.Split(',');
string[] rowUserId = userId.Split(',');
string[] rowStatus = status.Split(',');
string[] rowComment = comment.Split(',');
bool equal = rowDates.Length == rowUserId.Length && rowStatus.Length == rowComment.Length && rowDates.Length == rowStatus.Length;
if (!equal)
{
string error = string.Format("Error: REQUESTNUMBER: {0} Inconsistent number of entries found in row data.", requestnum);
errors.Add(error);
//throw new Exception(error);
}
else
{
for (int i = 0; i < rowDates.Length - 1; i++)
{
CXHISTORY cxhObj = new CXHISTORY(requestnum, finstatus, rowDates[i], rowStatus[i], rowUserId[i], rowComment[i]);
cxhObjectsList.Add(cxhObj);
i++;
}
}
}
}
}
}
catch(Exception e)
{
MessageBox.Show(e.ToString());
}
return cxhObjectsList;
}