parsing CSV files in C#

I need to parse a couple of CSV files (varying in size from a few KB to around 100 MB) and load the data to a SQL 2000 database.

I am using CsvReader from CodeProject (http://www.codeproject.com/KB/database/CsvReader.aspx) and it seems to work great for reading large CSV files.

Here is the structure of the CSV files I am dealing with:

Field1   FIeld1_CHG   Field2   Field2_CHG

So, I need to read the <fieldname>_CHG column. If I see a "Y", I need to load the data from <fieldname>, otherwise ignore it.

I have around 60+ field names in the CSV files and I need a way to map those to the database fields.

I started by creating a class for the field names.

THen I thought I would be able to map each field in the CSV file to a class member. Something like:
            UpdateElement e = new UpdateElement();
            Hashtable mappingTable = new Hashtable();
            mappingTable.Add("prefix", e.Prefix);
            mappingTable.Add("suffix", e.Suffix);

Then when I read the data, I could check to see if each column from the CSV exists in the keyscollection of the hashtable. Something like:

                                    foreach (string field in fieldHeaders)
                                    {
                                        if (mappingTable.ContainsKey(field.ToLower().Trim()))
                                        {
                                            // check for a "Y" on the <fieldname>_CHG field
                                            if (csv[field + "_CHG"].ToUpper() == "Y")
                                            {
                                                mappingTable[field.ToLower().Trim()] = csv[field];
                                            }
                                        }
                                    }

I don't even know if my logic is correct. I guess I need to know if this is the correct approach. If not, what's the best way to do this?

I don't know how easy it will be to implement Bulk insert in this scenario since the columns from the CSV won't match the columns of the destination table.
IUAATechAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

JimBrandleyCommented:
Can you post some lines from one of your CSV files, and a description of the DB table or tables involved?

Jim
0
IUAATechAuthor Commented:
Here is a sample of the CSV file:
ID	ID_CHG	DATE	DATE_CHG	PREFIX	PREFIX_CHG	FIRST	FIRST_CHG	MIDDLE	MIDDLE_CHG	LAST	LAST_CHG	SUFFIX	SUFFIX_CHG
1234	Y	50407	Y	Dr	Y	John	N	Neil	N	Smith	N		N
2345	Y	50807	Y		N	David	N	Frederick	N	Walker	N	Jr	Y

Open in new window

0
IUAATechAuthor Commented:
The formatting looks a little weird on that one. Let me try again:
ID	ID_CHG	DATE	DATE_CHG	PREFIX	PREFIX_CHG	FIRST	FIRST_CHG	MIDDLE	MIDDLE_CHG	LAST	LAST_CHG	SUFFIX	SUFFIX_CHG
1234	Y	50407	Y	Dr	Y	John	N	Neil	N	Smith	N		N
2345	Y	50807	Y		N	David	N	Frederick	N	Walker	N	Jr	Y

Open in new window

0
Learn SQL Server Core 2016

This course will introduce you to SQL Server Core 2016, as well as teach you about SSMS, data tools, installation, server configuration, using Management Studio, and writing and executing queries.

IUAATechAuthor Commented:
The formatting is still not correct. But I hope you get the idea.

As for the database, the table is pretty big since we use it to load updates from other sources too - xml files, web updates, etc.

Anyways, here is a part of the table:

CREATE TABLE [dbo].[UpdatedPeople] (
      [UpdateId] [int] IDENTITY (1, 1) NOT NULL ,
      [PersonId] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [LastName] [varchar] (65) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [FirstName] [varchar] (65) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [MidName] [varchar] (65) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [Prefix] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [Suffix] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [FullName] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
0
IUAATechAuthor Commented:
And I should mention that the table has lots of extra fields that does not apply to the columns from CSV.
0
JimBrandleyCommented:
Don't worry about the CSV formatting. The first one was fine. A few questions:
1. Does the insert take place if any column is updated - I think so, but want to make sure?
2. Do all lines in the CSV file include at least one updated column?
3. Do you insert values for all columns, or only those that have updates?
4. Can a single PersonId appear more than once in the same file?

Jim
0
IUAATechAuthor Commented:
Here are my answers:
1. Yes, However, let me mention that if any column is updated, then we want to insert ID, Date, First Name, Middle Name and Last Name regardless of whether they were updated or not.
2. ID and Date will always have a "Y", meaning they were updated. However, if there aren't any other fields that were updated, then we should ignore this row.
3. I insert values for columns that only have updates.
4. I am not sure about this. But to be on the safe side, I would say yes.
0
IUAATechAuthor Commented:
one more thing - since we get updates files from lots of sources, we would like to map each column of the CSV file to the corresponding column of the update table. However, if there is a column in CSV that we can't map, then I would like to insert this (along with any other column that we could not map) into a 'miscellaneous' field.

Hope this makes sense :)
0
JimBrandleyCommented:
I understand the part about the Miscellaneous column. I was hoping the answer to question 4 would be a definate No. If you have files of several hundred MB, I am worried about the probability of running out of memory before the file can be processed.

Jim
0
IUAATechAuthor Commented:
Most of the files are less than 5 MB. As for the bigger files (it's a small number), I could always break them up into smaller files.

So that should not be a porblem.
0
JimBrandleyCommented:
Good. Then one last (I think) question: For the miscellaneous column, how would you want deal with a line that contained more than one additional update - say ADDRESS and PHONE?

Jim
0
JimBrandleyCommented:
One more (last) question: Is there header info in the file to identify the columns?

Jim
0
IUAATechAuthor Commented:
I think the best way to deal with this is grab the name of the column from the CSV and write the value next to it. We could then add a link break and add the next column. Somethin like:

ADDRESS: <new address>
PHONE: <new phone>
0
IUAATechAuthor Commented:
if you mean does it have a header row, then yes. If you mean does it have information to id the headers, then no. In this case, I will have to manually map each header column to a database column.
0
JimBrandleyCommented:
OK - I'll play with this a while and see what develops.

Jim
0
IUAATechAuthor Commented:
Thanks a ton. In the meantime, I need to grab a bite.
0
JimBrandleyCommented:
Here's a first pass on how I would attack it. First a class to contain and sort out the data, then a simple roughed-in method to make use of that class.

Jim

public class UpdateElement
{
   private string mPersonID = string.Empty;
   private string mLastName = string.Empty;
   private string mFirstName = string.Empty;
   private string mMidName = string.Empty;
   private string mPrefix = string.Empty;
   private string mSuffix = string.Empty;
   private string mMiscellaneous = string.Empty;
   private string mDate = string.Empty;
 
   public string PersonID
   {
      get { return mPersonID; }
   }
   public string Date
   {
      get { return mDate; }
      set { mDate = value; }
   }
   public string LastName
   {
      get { return mLastName; }
      set { mLastName = value; }
   }
   public string FirstName
   {
      get { return mFirstName; }
      set { mFirstName = value; }
   }
   public string MidName
   {
      get { return mMidName; }
      set { mMidName = value; }
   }
   public string Prefix
   {
      get { return mPrefix; }
      set { mPrefix = value; }
   }
   public string Suffix
   {
      get { return mSuffix; }
      set { mSuffix = value; }
   }
   public string FullName
   {
      get
      {
         bool empty = true;
         StringBuilder sb = new StringBuilder(128);
         if (mPrefix.Length > 0)
         {
            empty = false;
            sb.Append(mPrefix);
         }
         if (mFirstName.Length > 0)
         {
            if (!empty)
               sb.Append(" ");
            empty = false;
            sb.Append(mFirstName);
         }
         if (mMidName.Length > 0)
         {
            if (!empty)
               sb.Append(" ");
            empty = false;
            sb.Append(mMidName);
         }
         if (mLastName.Length > 0)
         {
            if (!empty)
               sb.Append(" ");
            empty = false;
            sb.Append(mLastName);
         }
         if (mSuffix.Length > 0)
         {
            if (!empty)
               sb.Append(" ");
            sb.Append(mSuffix);
         }
         return sb.ToString();
      }
   }
   public string Miscellaneous
   {
      get { return mMiscellaneous; }
      set { mMiscellaneous = value; }
   }
 
   public UpdateElement(string personID)
   {
      mPersonID = personID;
   }
 
   public void AddProperty(string name, string value)
   {
      switch (name.ToUpper())
      {
         case "PREFIX":
            Prefix = value;
            break;
         case "FIRST":
            FirstName = value;
            break;
         case "MIDDLE":
            MidName = value;
            break;
         case "LAST":
            LastName = value;
            break;
         case "SUFFIX":
            Suffix = value;
            break;
         default: // Add to Miscellaneous
            if (Miscellaneous.Length == 0)
               Miscellaneous = name + ": " + value;
            break;
      }
   }
}
 
And here's a method to make use of that class.
 
public void TestPersonHandler()
{
   List<UpdateElement> updates = new List<UpdateElement>(1000);
   // The hashtable contains the ID as a key, and an index into the list above.
   Hashtable entries = new Hashtable();
   bool complete = false;
   UpdateElement current;
   string name = string.Empty;  // This contains the column identifier from the CSV file;
   string value = string.Empty; // This contains the value for that columns from the csv file.
 
   // This is nebulous because I'm not certain what you are getting from the reader,
   // but it should give you the idea.
   while (!complete)
   {
      // Get next person to update from the csv reader.
      string personID = "xxx"; // <-- something frm the reader
      object index = entries[personID];
      if (value == null) //This is a new one
      {
         current = new UpdateElement(personID);
         entries.Add(personID, updates.Count);
      }
      else
      {
         current = updates[(int)index];
      }
      // Loop over the list of values from this line in the file,
      // If updated or required
      {
         name = "yyy"; // column header from csv file for this entry.
         value = "zzz"; // Value for this column from the csv file.
         current.AddProperty( name, value);
      }
 
      // Get the next line fro mthe csv file.
      // If no more,
      {
         complete = true;
      }
   }
 
   // Now, updates contains the organized data for the rows to insert into the DB.
}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
IUAATechAuthor Commented:
Thanks. I will try it out and will let you know the outcome.
0
JimBrandleyCommented:
OK - I just noticed I didn't quite complete the Miscellaneous handler.

default:      // Add to Miscellaneous
if (Miscellaneous.Length == 0)
   Miscellaneous = name + ": " + value;
else
   Miscellaneous = Miscellaneous + "; " + name + ": " + value;
break;

Jim
0
IUAATechAuthor Commented:
Fantastic. Works very well. Thanks a lot.
0
JimBrandleyCommented:
My pleasure. Good luck.

Jim
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
C#

From novice to tech pro — start learning today.