Solved

Reading from a csv file

Posted on 1998-07-29
10
216 Views
Last Modified: 2010-04-02
Hi,
i'm actually confronted to the problem to read data from an exsiting csv file into a memory structure. Has anyone already developed routines which are able to read from a csv file? It also have to deal with quoting like ...,...,"...,..." and "" for a simple double quote mark. I would be very happy if someone else has the source code. Any help would be appreciated.
Many thanks in advance
0
Comment
Question by:trouvain
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
10 Comments
 
LVL 2

Expert Comment

by:rayb
ID: 1168910
Look into creating a text file based ODBC datasource.  This will help you a great deal.  It's very flexible, powerful and it comes in a can!  It will save you much work and headaches.


0
 

Author Comment

by:trouvain
ID: 1168911
Thank you for your answer but unfortunately i didn't get the message where to look for creating text file based ODBC datasources. Are there source code routines to find? Please give me a more specific clue.
Thanx in advance
0
 
LVL 4

Expert Comment

by:erajoj
ID: 1168912
How does/would your memory structures look like?
Exactly what kind of functionality are you looking for?
/// John
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:trouvain
ID: 1168913
The given CSV file is a data matrix which contains a column header and a row header. In the cells of this matrix are the values. I want to read the data into an array of array of valuetype (i.e. float). With the given column header I can determine the the number of columns. The functionality I look for is as stated in my question whether anyone has a link to or the source code by itself how to read in data from a CSV file. This routine should be able to deal with quoted strings ("...,...") and quoted quotation marks (...""...). I hope this is enough to answer your question.
0
 
LVL 1

Expert Comment

by:slinky
ID: 1168914
Use strtok with a comma as the token
0
 
LVL 3

Accepted Solution

by:
stefanr earned 200 total points
ID: 1168915
Try something like this:

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <conio.h>
#include <string.h>
#include <assert.h>

bool ProcessLine(char**& ppColumn, int& nColumnCount, const char* pszLine)
{
   bool bProcessingColumn = false; // true from the start to the next ',' or end of line.
   bool bQuote = false; // true if processing a quoted column.
   bool bSkipToNextColumn = false; // true to skip to the next ',' or end of line during processing.

   char szColumn[1024] = { 0 };
   unsigned nIndex = 0;

   for (unsigned i = 0; i <= ::strlen(pszLine); i++)
   {
      if (bProcessingColumn)
      {
         if (bQuote)
         {
            if ('"' == pszLine[i])
            {
               bQuote = false; // Quoted column string terminated.
               bSkipToNextColumn = true;
            }
            else
            {
               szColumn[nIndex] = pszLine[i];
               nIndex++;
            }
         }
         else if (',' == pszLine[i] || 0 == pszLine[i])
         {
            bProcessingColumn = false;
            bSkipToNextColumn = false;

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if (!bSkipToNextColumn)
         {
            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
      else
      {
         nIndex = 0;

         if (',' == pszLine[i] || 0 == pszLine[i])
         {
            // Column contains empty string.

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if ('"' == pszLine[i])
         {
            bProcessingColumn = true;
            bQuote = true;
         }
         else
         {
            bProcessingColumn = true;

            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
   }

   return true;
}

int main()
{
   try
   {
      fstream fs("Test.CSV", ios::in | ios::nocreate);

      int nLineCount = 0;
      char szLine[1024] = { 0 };
      char** ppHeader = NULL; // Pointer to array of strings that contains the names of the columns.
      char*** pppRecord = NULL; // Pointer to arrays of record strings.
      int nColumnCount = 0; // When reading the first line that is supposed to contain the names of the columns the number of columns is determined.
      int nRecordCount = 0; // Count of records read.

      while (fs.getline(szLine, sizeof(szLine)))
      {
         if (0 == nLineCount)
         {
            ProcessLine(ppHeader, nColumnCount, szLine);
         }
         else
         {
            char** ppRecord = NULL;
            int nTemp = 0;
            ProcessLine(ppRecord, nTemp, szLine);
            assert(nTemp == nColumnCount);
            nRecordCount++;
            pppRecord = (char***) ::realloc(pppRecord, nRecordCount * sizeof(char**));
            pppRecord[nRecordCount-1] = ppRecord;
         }

         nLineCount++;
      }

      fs.close();

      for (int i = 0; i < nColumnCount; i++)
      {
         cout << ppHeader[i] << ";"; // Print column names.
         ::free(ppHeader[i]);
      }
      cout << endl;
      ::free(ppHeader);

      for (i = 0; i < nRecordCount; i++)
      {
         for (int j = 0; j < nColumnCount; j++)
         {
            cout << pppRecord[i][j] << ";"; // Print content of each column.
            ::free(pppRecord[i][j]);
         }
         cout << endl; // Prepare to print the next records column content.
         ::free(pppRecord[i]);
      }
      ::free(pppRecord);
   }
   catch (...)
   {
      return EXIT_FAILURE;
   }

   cout << "Press any key to exit..." << endl;
   ::_getch();
   return EXIT_SUCCESS;
}

0
 
LVL 3

Expert Comment

by:stefanr
ID: 1168916
To achieve the escaped " facility, replace the ProcessLine above with:

bool ProcessLine(char**& ppColumn, int& nColumnCount, const char* pszLine)
{
   bool bProcessingColumn = false; // true from the start to the next ',' or end of line.
   bool bQuote = false; // true if processing a quoted column.
   bool bSkipToNextColumn = false; // true to skip to the next ',' or end of line during processing.
   bool bEscapedQuote = false;

   char szColumn[1024] = { 0 };
   unsigned nIndex = 0;

   for (unsigned i = 0; i <= ::strlen(pszLine); i++)
   {
      if (bProcessingColumn)
      {
         if (bQuote)
         {
            if ('"' == pszLine[i])
            {
               if (bEscapedQuote)
               {
                  bEscapedQuote = false;
                  szColumn[nIndex] = pszLine[i];
                  nIndex++;
               }
               else if (i < ::strlen(pszLine) && '"' == pszLine[i+1])
               {
                  bEscapedQuote = true;
               }
               else
               {
                  bQuote = false; // Quoted column string terminated.
                  bSkipToNextColumn = true;
               }
            }
            else
            {
               szColumn[nIndex] = pszLine[i];
               nIndex++;
            }
         }
         else if (',' == pszLine[i] || 0 == pszLine[i])
         {
            bProcessingColumn = false;
            bSkipToNextColumn = false;

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if (!bSkipToNextColumn)
         {
            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
      else
      {
         nIndex = 0;

         if (',' == pszLine[i] || 0 == pszLine[i])
         {
            // Column contains empty string.

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if ('"' == pszLine[i])
         {
            bProcessingColumn = true;
            bQuote = true;
         }
         else
         {
            bProcessingColumn = true;

            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
   }

   return true;
}

0
 

Author Comment

by:trouvain
ID: 1168917
Hi Stefan,

thank you very much for your efford. I will try it out immediatly.
0
 
LVL 3

Expert Comment

by:stefanr
ID: 1168918
To achieve the escaped " facility, replace the ProcessLine above with:

bool ProcessLine(char**& ppColumn, int& nColumnCount, const char* pszLine)
{
   bool bProcessingColumn = false; // true from the start to the next ',' or end of line.
   bool bQuote = false; // true if processing a quoted column.
   bool bSkipToNextColumn = false; // true to skip to the next ',' or end of line during processing.
   bool bEscapedQuote = false;

   char szColumn[1024] = { 0 };
   unsigned nIndex = 0;

   for (unsigned i = 0; i <= ::strlen(pszLine); i++)
   {
      if (bProcessingColumn)
      {
         if (bQuote)
         {
            if ('"' == pszLine[i])
            {
               if (bEscapedQuote)
               {
                  bEscapedQuote = false;
                  szColumn[nIndex] = pszLine[i];
                  nIndex++;
               }
               else if (i < ::strlen(pszLine) && '"' == pszLine[i+1])
               {
                  bEscapedQuote = true;
               }
               else
               {
                  bQuote = false; // Quoted column string terminated.
                  bSkipToNextColumn = true;
               }
            }
            else
            {
               szColumn[nIndex] = pszLine[i];
               nIndex++;
            }
         }
         else if (',' == pszLine[i] || 0 == pszLine[i])
         {
            bProcessingColumn = false;
            bSkipToNextColumn = false;

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if (!bSkipToNextColumn)
         {
            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
      else
      {
         nIndex = 0;

         if (',' == pszLine[i] || 0 == pszLine[i])
         {
            // Column contains empty string.

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if ('"' == pszLine[i])
         {
            bProcessingColumn = true;
            bQuote = true;
         }
         else
         {
            bProcessingColumn = true;

            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
   }

   return true;
}

0
 

Author Comment

by:trouvain
ID: 1168919
Thank you for your work :-)
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
Container Orchestration platforms empower organizations to scale their apps at an exceptional rate. This is the reason numerous innovation-driven companies are moving apps to an appropriated datacenter wide platform that empowers them to scale at a …
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question