Link to home
Start Free TrialLog in
Avatar of I_s
I_s

asked on

C# - Parse through DB2 column data and INSERT the data as new rows into SQL

Hi everyone!

I have data stored in a DB2 database which I am attempting to extract and put into a SQL Server 2005 db. My logic is to pull the db2 data into a DataSet so I can manipulate it, and then insert the DataSet into SQL.

Problem is - the data in db2 was stored in such a manner that I have to programatically parse through column data so they can be inserted into SQL as individual rows.

The data in the db2 server looks similare to this:

REQNUM(integer)        STATUS(varchar)        DATE(date)        USERID(varchar)        COMMENT(varchar)  
---------------------------------------------------------------------------
1        Initial,Approved,        08/08/2010,08/12/2010,        BLaher,KReheran,        product ready, real...,      
2        Rejected,        09/13/2011,        KGhracie,        product not ready,
45      Initial,Approved,Rejected        08/08/10,03/02/11,04/15/11         BLaher,KReheran,KGhracie,  n,n,n...,

Notice that all data related to a REQNUM  is stored in the same row separated by ','
REQNUM is not Unique.

In SQL I need it to look like this:

REQNUM(int)        STATUS(varchar)        DATE(datetime)        USERID(varchar)        COMMENT(varchar)  
---------------------------------------------------------------------------
1        Initial        08/08/2010        BLaher        product ready
1        Approved       08/12/2010        KReheran         realease approved      
2        Rejected        09/13/2011        KGhracie        product not ready
45      Initial        08/08/10         ...         ...
45      Approved        03/02/11       ...          ...
45      Rejected        04/15/11       ...          ...                        


So I have started an application in C# to parse the data, however it is returned in column data and I cannot figure out how to put it back into row data. The code I am using to parse the data is below.

Any help is greatly appreciated!!!
/// <summary>
        /// Method parses data and substrings row data on ','
        /// </summary>
        /// <param name="view">data from DataGridView</param>
        /// <returns>output array of CXHistory objects</returns>
        public string[] parseData(DataGridView view)
        {
            int i = 0;
            int count = 0;
            List<string> cxHistoryObj = new List<string>();
            int rowcount = ((view.Rows.Count) - 1);
            // iterrate through entire row
            for (int row = 0; row <= rowcount; row++)
            {
                int colCount = ((view.Columns.Count) - 1);
                // once all columns are complete in a row, go to next row
                for (int col = 2; col <= colCount; col++)
                {
                    // rowStrSub is next row to parse
                    string rowStrSub = view.Rows[row].Cells[col].Value.ToString();
                    if (rowStrSub.Length != null)
                    {
                        if (!rowStrSub.EndsWith(","))
                        {
                            // check if ',' exists at end of string - add one if it does not  
                            // for index purposes
                            rowStrSub = rowStrSub + ",";
                        }
                        // iterrate through the string and find all instances of a ','
                        for (int k = 0; k < rowStrSub.Length - 1; k = k)
                        {
                            count = rowStrSub.Split(',').Length - 1;
                            i = 0;
                            while ((i = rowStrSub.IndexOf(',', i)) != -1)
                            {
                                // retSubStr is returned sub string after indexed on ','
                                string retSubStr = (rowStrSub.Substring(k, i - k));
                                cxHistoryObj.Add(retSubStr);
                                k = i + 1;  
                                i++;
                            }   
                        }
                    }
                }
            }
            string[] output = cxHistoryObj.ToArray();
            //string msg = string.Format("Ouput size = {0}", output.Length);
            //MessageBox.Show(msg);

            return output;
         }

Open in new window

Avatar of momi_sabag
momi_sabag
Flag of United States of America image

why don't you use the String.split method? it will make your life much easier
Avatar of I_s
I_s

ASKER

I guess I thought using indexOf was the most efficient, I suppose I could use String.Split, but would that take care of the problem at hand?

I have no trouble separating the strings with the following code, the problem is it puts them all into a single column.

I need it in rows?
Avatar of Kent Olsen
Hi I s,

DB2 supports recursive SQL that will convert columns to rows (or rows to columns).  But if the maximum number of items on a single line is fixed (known), then you might be able to do this easier with straight SQL.  Or split the text withing C#.  All are very workable solutions.

Regarding recursive SQL, here's an Experts Exchange article that describes how to do it.  Adding the logic to split multiple strings is pretty easy.

  https://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html


Kent
I'm sorry, but I cannot see hoe the data could be stored in the DB2 table as described.  The DATE column is defined as datetime - a field which cannot contain more than one date.  Are you SURE that SELECT * FROM TABLE returns columns with multiple comma-separated values???  

It sounds like you are querying an xml table, in which case there are xquery methods to parse out the data.
i don't understand the problem
you fetch a record from the db2 database and then for each column you do a split by comma
for each column you now have an array of values
you pick the array with most values and that is the number of rows you will insert
you either duplicate the id (or apply what ever logic you need) and add default values if some of the columns are missing some of the values (although, i don't think it is possible according to your description)
Avatar of I_s

ASKER

I apologize for the confusing question, I will post back with a more clear description of my problem. Just a little side tracked right now.

I appreciate your replies!
Avatar of I_s

ASKER

@gmarino

You are correct about the date col. This is a history (TBLHISTORY) table and the dates are pulled from TBLMAIN and converted to varchar. The DB2 Table which I am working with is structured like this:

CREATE TABLE "XXXXXX"."TBLHISTORY"  (
              "REQNUM" INTEGER NOT NULL ,
              "DATE" VARCHAR(600) ,
              "STATUS" VARCHAR(300) ,
              "USERID" VARCHAR(300) ,
              "COMMENT" VARCHAR(3000) )  
             IN "XXXXX" ;

I am by no means a DBA in SQL or DB2, I was tasked to port the data from one server to the other by any means possible.

In TBLHISTORY for some reason unknown, someone had the idea to save the data as I had explained above. REQNUM can have many STATUS - For each STATUS change, there is a DATE,USERID, and COMMENT - so they inserted the data in a single row separated by a ',' ...

So, I hope this is a little more clear, in DB2 the data looks as such:

REQNUM                 DATE                                     STATUS                            USERID                    COMMENT
----------------------------------------------------------------------------------------------------------------------------------
1                   08/08/10,09/09/10,...,       Initial,Change,Approved,       Ken,Bob,Nicki,      Comment1,Comment2,...

My code adds the following to List<string> cxHistoryObj; as such:

08/08/10
09/09/10
09/10/10
Initial
Change
Approved
Ken
Bob
Nicki
Comment1
Comment2
Comment3

I then want to put this data back into a DataTable as such:

1     08/08/10     Initial         Ken     Comment1
1     09/09/10     Change     Bob      Comment2
1     09/10/10     Approved  Nicki     Comment3


I hope this clears up my initial question, and thanks in advance for any assistance!
Avatar of I_s

ASKER

@kdo

Kent,

Great post, could this be done in a sp? I am trying to wrap my head around how I might split the strings in each column where a ',' exists, and then use recursive sql to put them into row data.

I'll post back soon
Hi I_s,

You can certainly wrap a stored procedure around that.  You just need to add a couple of columns.  Be glad to do that for you.

The challenge may be the data, no matter which approach you use.  The comment field could well contain embedded commas.  The Query will always split the string on the next comma, not necessarily the original separator.


Kent
Avatar of I_s

ASKER

Basically I am attempting to do the opposite,

Where your example looks like this:


SELECT * FROM rec;

SNUM  WORDNUM    WORD
1     1          This
1     2          is
1     3          a
1     4          fine
1     5          example

Mine looks like this:

SNUM  WORDNUM    WORD
1     1,2,3,4,5,          This,is,a,fine,example,

And I want your example as a result!

Avatar of I_s

ASKER

kdo,

They were cognizant of embedded commas!

All embedded commas were replaced with the symbol '#@#' - so, that should not present a problem!
ASKER CERTIFIED SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of I_s

ASKER

Again great post! Exactly what I needed,

Im going to except it as the solution, do you mind if I post back with any followup questions? I do not have much sql exposure...

Thanks for your assistance!
Hi I_s,

Feel free to ask more questions.  That's what Experts Exchange is all about.  :)

Grasping recursive SQL can be quite a challenge, especially if one is not well schooled in the principals of recursion and at least one recursive language.  Still, it's powerful stuff.

I've taken the liberty of expanding the query to generate the data that you want.  Just change the table name on line 49 to match your needs.  I've not tested it, but it should be pretty close.  If it gives you any trouble, just post back and we'll debug it.


Kent

WITH pquery (reqnum, subrow, status, date, userid, comment, status_string, date_string, userid_string, comment_string )
AS
(
--  Select the first token from the result strings
  SELECT reqnum, 1, 
    CASE WHEN LOCATE (',', status) > 0 THEN 
      SUBSTR (status, 1, LOCATE (',', status) - 1)
    ELSE
      status
    END status,
    CASE WHEN LOCATE (',', date) > 0 THEN 
      SUBSTR (date, 1, LOCATE (',', date) - 1)
    ELSE
      date
    END date,
    CASE WHEN LOCATE (',', userid) > 0 THEN 
      SUBSTR (userid, 1, LOCATE (',', userid) - 1)
    ELSE
      userid
    END userid,
    CASE WHEN LOCATE (',', comment) > 0 THEN 
      SUBSTR (comment, 1, LOCATE (',', comment) - 1)
    ELSE
      comment
    END comment,
    
--  Select the remainder strings that will be passed back to the query
    CASE WHEN locate (',', status) > 0 THEN
      LTRIM (SUBSTR (status, LOCATE (',', status) + 1))
    ELSE
      NULL
    END status_string,
    CASE WHEN locate (',', date) > 0 THEN
      LTRIM (SUBSTR (sentence, LOCATE (',', date) + 1))
    ELSE
      NULL
    END date_string,
    CASE WHEN locate (',', userid) > 0 THEN
      LTRIM (SUBSTR (userid, LOCATE (',', userid) + 1))
    ELSE
      NULL
    END userid_string,
    CASE WHEN locate (',', comment) > 0 THEN
      LTRIM (SUBSTR (comment, LOCATE (',', comment) + 1))
    ELSE
      NULL
    END comment_string
    
  FROM {yourtable} base 
  
  UNION ALL

--  Now select the first (next) token in each of the remainder strings
  SELECT reqnum, subrow + 1, 
    CASE WHEN LOCATE (',', status_string) > 0 then 
      SUBSTR (status_string, 1, LOCATE (',', status_string) - 1)
    ELSE
      status_string
    END status,
    CASE WHEN LOCATE (',', userid_string) > 0 then 
      SUBSTR (userid_string, 1, LOCATE (',', userid_string) - 1)
    ELSE
      userid_string
    END userid,
    CASE WHEN LOCATE (',', date_string) > 0 then 
      SUBSTR (date_string, 1, LOCATE (',', date_string) - 1)
    ELSE
      date_string
    END date,
    CASE WHEN LOCATE (',', comment_string) > 0 then 
      SUBSTR (comment_string, 1, LOCATE (',', comment_string) - 1)
    ELSE
      comment_string
    END comment,

--  And build the new remainder strings    
    CASE WHEN LOCATE (',', status_string) > 0 THEN
      LTRIM (SUBSTR (status_string, LOCATE (',', status_string) + 1))
    ELSE
      NULL
    END status_string
    CASE WHEN LOCATE (',', userid_string) > 0 THEN
      LTRIM (SUBSTR (userid_string, LOCATE (',', userid_string) + 1))
    ELSE
      NULL
    END userid_string
    CASE WHEN LOCATE (',', date_string) > 0 THEN
      LTRIM (SUBSTR (date_string, LOCATE (',', date_string) + 1))
    ELSE
      NULL
    END date_string
    CASE WHEN LOCATE (',', comment_string) > 0 THEN
      LTRIM (SUBSTR (comment_string, LOCATE (',', comment_string) + 1))
    ELSE
      NULL
    END comment_string
  FROM pquery t0
  WHERE t0.status_string IS NOT NULL
     OR t0.userid_string IS NOT NULL
     OR t0.date_string IS NOT NULL
     OR t0.comment_string IS NOT NULL
)
SELECT * 
FROM pquery pq;

Open in new window

Avatar of I_s

ASKER

Thanks for the code, as it stands, I recieve the following errors.

I am using SQL Server 2005, is LOCATE not compatible?

Msg 195, Level 15, State 10, Line 10
'LOCATE' is not a recognized built-in function name.
Msg 195, Level 15, State 10, Line 59
'LOCATE' is not a recognized built-in function name.
Msg 128, Level 15, State 1, Line 86
The name "userid_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 91
The name "date_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Msg 128, Level 15, State 1, Line 96
The name "comment_string" is not permitted in this context. Valid expressions are constants, constant expressions, and (in some contexts) variables. Column names are not permitted.
Avatar of I_s

ASKER

I just realized that the error is probably becuase I am writing the query in SSMS... and not DB2..

Yep.  That's a DB2 query.  To run it in SQL Server you'll have to replace the LOCATE function with CHARINDEX.


Kent
Avatar of I_s

ASKER

Works great Kent,

Thanks again for your help! I can definetly use recursive sql in the future!


Also, for anyone who may look at this post, I was able to do this successfully using C# as well with the following code:
/// <summary>
        /// getCXHistoryData is a method which makes a connection to the DB2 udb and returns a CXHistory Obj.
        /// </summary>
        /// <returns>cxhObjectsList is List of CXHistory Obj's</returns>
        public List<CXHISTORY> getCXHistoryData(){
            OleDbCommand oledbCMD;
            OleDbConnection oledbCONN;
            OleDbDataReader oledbReader;
            cxhObjectsList = new List<CXHISTORY>();
            errors = new List<string>();
            try
            {
                using (oledbCONN = new OleDbConnection(ApplicationConsts.DB2_CONNECTION_STRING))
                {
                    oledbCMD = new OleDbCommand(ApplicationConsts.HISTORY_QUERY, oledbCONN);
                    oledbCONN.Open();
                    using (oledbReader = oledbCMD.ExecuteReader())
                    {
                        while (oledbReader.Read())
                        {
                            int requestnum = ((int)oledbReader["REQUESTNUMBER"]);
                            string finstatus = oledbReader["FINALSTATUS"].ToString();
                            string dates = oledbReader["DATE"].ToString();
                            string userId = oledbReader["USERID"].ToString().ToUpper();
                            string status = oledbReader["STATUS"].ToString();
                            string comment = oledbReader["COMMENT"].ToString().Replace("@#@", ",");

                            string[] rowDates = dates.Split(',');
                            string[] rowUserId = userId.Split(',');
                            string[] rowStatus = status.Split(',');
                            string[] rowComment = comment.Split(',');

                            bool equal = rowDates.Length == rowUserId.Length && rowStatus.Length == rowComment.Length && rowDates.Length == rowStatus.Length;
                            if (!equal)
                            {
                                string error = string.Format("Error: REQUESTNUMBER: {0} Inconsistent number of entries found in row data.", requestnum);
                                errors.Add(error);
                                //throw new Exception(error);
                            }
                            else
                            {
                                for (int i = 0; i < rowDates.Length - 1; i++)
                                {
                                    CXHISTORY cxhObj = new CXHISTORY(requestnum, finstatus, rowDates[i], rowStatus[i], rowUserId[i], rowComment[i]);
                                    cxhObjectsList.Add(cxhObj);
                                    i++;
                                }
                            }
                        }
                    }
                }
            }
            catch(Exception e)
            {
                MessageBox.Show(e.ToString());
            }
            return cxhObjectsList;
        }

Open in new window