
Reading from a CSV file

Posted on 1998-07-29
Last Modified: 2010-04-02
Hi,
I'm currently faced with the problem of reading data from an existing CSV file into an in-memory structure. Has anyone already developed routines that can read a CSV file? They also have to handle quoting such as ...,...,"...,..." and "" for a literal double-quote character. I would be very happy if someone could share the source code. Any help would be appreciated.
Many thanks in advance
Question by:trouvain

Expert Comment

by:rayb
ID: 1168910
Look into creating a text-file-based ODBC data source. It is very flexible and powerful, and it comes ready-made, which will save you a lot of work and headaches.


 

Author Comment

by:trouvain
ID: 1168911
Thank you for your answer, but unfortunately I don't understand where to look to create a text-file-based ODBC data source. Is there source code available somewhere? Please give me a more specific hint.
Thanks in advance

Expert Comment

by:erajoj
ID: 1168912
What do (or would) your in-memory structures look like?
Exactly what kind of functionality are you looking for?
/// John

Author Comment

by:trouvain
ID: 1168913
The CSV file is a data matrix with a header row (column names) and a header column (row names); the remaining cells contain the values. I want to read the data into an array of arrays of a value type (e.g. float). The header row lets me determine the number of columns. As stated in my question, I'm looking for a link to, or the source code of, a routine that reads data from a CSV file. It must handle quoted strings ("...,...") and escaped quotation marks (...""...). I hope this answers your question.

Expert Comment

by:slinky
ID: 1168914
Use strtok with a comma as the delimiter.

Accepted Solution

by:stefanr (earned 200 total points)
ID: 1168915
Try something like this:

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cstring>

using namespace std;

// Splits one CSV line into columns. Each column is copied with strdup and
// appended to ppColumn, which is grown with realloc; nColumnCount is
// incremented for every column. The caller must free() each string and the array.
// A quoted column ("...,...") may contain commas; "" escapes are not handled here.
bool ProcessLine(char**& ppColumn, int& nColumnCount, const char* pszLine)
{
   bool bProcessingColumn = false; // true from the start to the next ',' or end of line.
   bool bQuote = false; // true if processing a quoted column.
   bool bSkipToNextColumn = false; // true to skip to the next ',' or end of line during processing.

   char szColumn[1024] = { 0 }; // Large enough for any line read into main's 1024-byte buffer.
   unsigned nIndex = 0;
   const size_t nLength = ::strlen(pszLine); // Computed once rather than on every iteration.

   // Include the terminating 0 (i == nLength) so the last column is stored.
   for (size_t i = 0; i <= nLength; i++)
   {
      if (bProcessingColumn)
      {
         if (bQuote)
         {
            if ('"' == pszLine[i])
            {
               bQuote = false; // Quoted column string terminated.
               bSkipToNextColumn = true;
            }
            else
            {
               szColumn[nIndex] = pszLine[i];
               nIndex++;
            }
         }
         else if (',' == pszLine[i] || 0 == pszLine[i])
         {
            bProcessingColumn = false;
            bSkipToNextColumn = false;

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if (!bSkipToNextColumn)
         {
            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
      else
      {
         nIndex = 0;

         if (',' == pszLine[i] || 0 == pszLine[i])
         {
            // Column contains empty string.

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if ('"' == pszLine[i])
         {
            bProcessingColumn = true;
            bQuote = true;
         }
         else
         {
            bProcessingColumn = true;

            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
   }

   return true;
}

int main()
{
   try
   {
      // ios::nocreate is pre-standard; an ifstream never creates the file anyway.
      ifstream fs("Test.CSV");
      if (!fs)
      {
         cerr << "Cannot open Test.CSV" << endl;
         return EXIT_FAILURE;
      }

      int nLineCount = 0;
      char szLine[1024] = { 0 }; // A line longer than 1023 characters sets failbit and ends the loop.
      char** ppHeader = NULL; // Pointer to array of strings that contains the names of the columns.
      char*** pppRecord = NULL; // Pointer to arrays of record strings.
      int nColumnCount = 0; // Determined when reading the first line, which is supposed to contain the column names.
      int nRecordCount = 0; // Count of records read.

      while (fs.getline(szLine, sizeof(szLine)))
      {
         // Remove a trailing '\r' left by Windows line endings when read on other platforms.
         size_t nLen = ::strlen(szLine);
         if (nLen > 0 && '\r' == szLine[nLen-1])
            szLine[nLen-1] = 0;

         if (0 == nLineCount)
         {
            ProcessLine(ppHeader, nColumnCount, szLine);
         }
         else
         {
            char** ppRecord = NULL;
            int nTemp = 0;
            ProcessLine(ppRecord, nTemp, szLine);

            if (nTemp != nColumnCount)
            {
               // Skip malformed records: an assert is compiled out in release builds,
               // and the printing loop below would then read past the end of ppRecord.
               cerr << "Line " << nLineCount + 1 << ": expected " << nColumnCount
                    << " columns, found " << nTemp << endl;
               for (int j = 0; j < nTemp; j++)
                  ::free(ppRecord[j]);
               ::free(ppRecord);
            }
            else
            {
               nRecordCount++;
               pppRecord = (char***) ::realloc(pppRecord, nRecordCount * sizeof(char**));
               pppRecord[nRecordCount-1] = ppRecord;
            }
         }

         nLineCount++;
      }

      fs.close();

      for (int i = 0; i < nColumnCount; i++)
      {
         cout << ppHeader[i] << ";"; // Print column names.
         ::free(ppHeader[i]);
      }
      cout << endl;
      ::free(ppHeader);

      // Standard C++ scopes a for-loop variable to its loop, so i is declared again here.
      for (int i = 0; i < nRecordCount; i++)
      {
         for (int j = 0; j < nColumnCount; j++)
         {
            cout << pppRecord[i][j] << ";"; // Print content of each column.
            ::free(pppRecord[i][j]);
         }
         cout << endl; // Prepare to print the next record's column content.
         ::free(pppRecord[i]);
      }
      ::free(pppRecord);
   }
   catch (...)
   {
      return EXIT_FAILURE;
   }

   cout << "Press Enter to exit..." << endl;
   cin.get(); // Portable replacement for the MSVC-specific _getch() from <conio.h>.
   return EXIT_SUCCESS;
}


Expert Comment

by:stefanr
ID: 1168916
To handle escaped quotes (a "" pair inside a quoted column stands for one " character), replace the ProcessLine function above with:

bool ProcessLine(char**& ppColumn, int& nColumnCount, const char* pszLine)
{
   bool bProcessingColumn = false; // true from the start to the next ',' or end of line.
   bool bQuote = false; // true if processing a quoted column.
   bool bSkipToNextColumn = false; // true to skip to the next ',' or end of line during processing.
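   bool bEscapedQuote = false; // true after the first '"' of a "" pair inside a quoted column.

   char szColumn[1024] = { 0 };
   unsigned nIndex = 0;
   const size_t nLength = ::strlen(pszLine); // Computed once rather than on every iteration.

   // Include the terminating 0 (i == nLength) so the last column is stored.
   for (size_t i = 0; i <= nLength; i++)
   {
      if (bProcessingColumn)
      {
         if (bQuote)
         {
            if ('"' == pszLine[i])
            {
               if (bEscapedQuote)
               {
                  bEscapedQuote = false; // Second '"' of the pair: store one literal quote.
                  szColumn[nIndex] = pszLine[i];
                  nIndex++;
               }
               else if (i < nLength && '"' == pszLine[i+1])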
               {
                  bEscapedQuote = true;
               }
               else
               {
                  bQuote = false; // Quoted column string terminated.
                  bSkipToNextColumn = true;
               }
            }
            else
            {
               szColumn[nIndex] = pszLine[i];
               nIndex++;
            }
         }
         else if (',' == pszLine[i] || 0 == pszLine[i])
         {
            bProcessingColumn = false;
            bSkipToNextColumn = false;

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if (!bSkipToNextColumn)
         {
            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
      else
      {
         nIndex = 0;

         if (',' == pszLine[i] || 0 == pszLine[i])
         {
            // Column contains empty string.

            szColumn[nIndex] = 0; // Terminate temporary column string.

            nColumnCount++;
            ppColumn = (char**) ::realloc(ppColumn, nColumnCount * sizeof(char*)); // Adjust size of array of pointers.
            ppColumn[nColumnCount-1] = ::strdup(szColumn); // Duplicate new string.
         }
         else if ('"' == pszLine[i])
         {
            bProcessingColumn = true;
            bQuote = true;
         }
         else
         {
            bProcessingColumn = true;

            szColumn[nIndex] = pszLine[i];
            nIndex++;
         }
      }
   }

   return true;
}

 

Author Comment

by:trouvain
ID: 1168917
Hi Stefan,

Thank you very much for your effort. I will try it out immediately.

Author Comment

by:trouvain
ID: 1168919
Thank you for your work :-)