tcy08

Use C to parse data from a file ?

In tmp.txt, I have :

1756,"this, that",01/01/01,A
745,there,,B
75,hello,01/01/01,C
....


How do I write a C program that will read each line from tmp.txt and separate each field into
a variable?

For above example:

id = 1756
str = "this, that"                             -- this is complicated, because the field contains a comma.
date = 01/01/01
chr = A

id = 745
str = there
date =
chr = B

id = 75
str = hello
date = 01/01/01
chr = C
ASKER CERTIFIED SOLUTION
fcavalier

tcy08

ASKER

What if I don't want the quotes in str = "this,that"? I want to take out
the ", so I have str = this,that. What should I change?

Just a personal preference, but I would do that outside of parseline, with a separate function.  Here's a function and updated main().

The parseline() function stays unchanged.

What you are asking for is often part of parsing a database export.  In that case, you will also want to convert occurrences of "" inside a string to a single ", because that is how most database programs export fields containing ".  (Annoying to process.)  See the comment inside FixDoubleQuote().

/* EXAMPLE USE of parseline() */

void FixDoubleQuote(char *psz)
{ /* If psz starts with a '\"' then strip the leading and
     trailing double quote, in place.
  */
    char *ptr = psz;
    const char *ptr2;

    if (*ptr != '\"') { /* Not starting with double quote. Don't change */
        return;
    }
    ptr2 = ptr + 1; /* ptr2 starts one ahead of ptr, so in
                       essence we are copying the string
                       down, one character at a time.
                     */

    while (*ptr2) {

/* NOTE: To convert "" to " within the string (common in database export
   processing), uncomment the next line.
        if ((*ptr2 == '\"') && (ptr2[1] == '\"')) ptr2++;
 */
        *ptr++ = *ptr2++;
    }
    *ptr = '\0';

    /* Remove trailing quote. The ptr > psz test avoids reading before
       the start of the buffer when the field is a lone double quote.
     */
    if ((ptr > psz) && (*(ptr - 1) == '\"')) {
        *(ptr - 1) = '\0';
    }
} /* FixDoubleQuote */

#include <stdio.h>
#include <string.h>

/* Defined in the earlier answer; this prototype is inferred from the
   call below. */
int parseline(char *line, char **ppField, int cMaxField);

int main(int argc, char **argv)
{
    char buf[1024];

    char *pField[4];
    char *id;
    char *str;
    char *date;
    char *chr;

    while (fgets(buf, sizeof(buf), stdin)) {
        /* fgets forces a line length limitation! This is just an example!

        Might want to use something like afgets() in production, see:
            http://www.mibsoftware.com/libmib/astring/

        */

        /* fgets keeps the line ending; drop it so chr is not "A\n" */
        buf[strcspn(buf, "\r\n")] = '\0';

        if (parseline(buf, pField, 4) == 4) {
            id = pField[0];
            str = pField[1];
            FixDoubleQuote(str);
            date = pField[2];
            chr = pField[3];
            printf("id = %s\n", id);
            printf("str = %s\n", str);
            printf("date = %s\n", date);
            printf("chr = %s\n", chr);
        } else {
            /* Unexpected number of fields on line! */
        }

    }

    return 0;
}
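
The "" to " conversion mentioned above can also be written as a separate pass. The following is only a sketch: `unquote_field` is a hypothetical name, not something from this thread, and it assumes that the first quote which is not part of a "" pair closes the field, so anything after that closing quote is dropped (FixDoubleQuote() keeps it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): strip one pair of enclosing
   double quotes from field, in place, and collapse each "" inside the
   quotes to a single ". A field that does not start with a quote is left
   unchanged. Returns the length of the resulting string. */
size_t unquote_field(char *field)
{
    char *dst = field;
    const char *src;

    assert(field != NULL);
    if (*field != '"') {
        return strlen(field);       /* not a quoted field */
    }
    src = field + 1;                /* skip the opening quote */
    while (*src) {
        if (*src == '"') {
            if (src[1] != '"') {
                break;              /* lone quote: treat as the closing quote */
            }
            src++;                  /* "" pair: skip one, copy the other */
        }
        *dst++ = *src++;            /* copy down, as FixDoubleQuote() does */
    }
    *dst = '\0';
    return (size_t)(dst - field);
}
```

In main() it would be called in place of FixDoubleQuote(str). Because the result is never longer than the input, writing in place is safe.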

tcy08

ASKER

Can you comment this code?

if (iField < cMaxField) {
    ppField[iField] = ptr;
    iField++;
}

while(*ptr) {
    if (*ptr == ',') {
        *ptr++ = '\0';
        if (*ptr && (iField < cMaxField)) {
            ppField[iField] = ptr;
        }
        iField++;

    } else if (*ptr == '\"') {
        ptr++;
        while(*ptr && (*ptr != '\"')) {
            ptr++;
        }
        if (*ptr) {
            ptr++;
        }
    } else {
        ptr++;
    }
}


I am not good with C
tcy08

ASKER

I have added another 20 points for you.

The comments at the top are adequate for most moderately experienced C programmers.

If you aren't good with C, then it can be improved to make it easier to understand.  I added some explanations.
I also split up some of the autoincrement operators and compound tests to make it easier to follow; apart from the two improvements noted below, the code is functionally the same.

Note also that in replying I made two improvements:

     Now when called with cMaxField==0, you can get an
     accurate count.  (I don't know how useful this is,
     since the line gets modified no matter what.)

     Correct handling when the last field on the line is
     empty.  (Not a test case, but it might happen.)

<PRE>
------------------------------------------
    /* Always test that we aren't making an assignment
       past the end of the array ppField[]
     */
    if (iField < cMaxField) {
        ppField[iField] = ptr;
    }
    iField++;

    while(*ptr) {
        /* This loop has three cases to handle,
           and it does it with if-else tests.
              ptr at a ,
              ptr at a "
              ptr at anything else
         */

        if (*ptr == ',') {
            /* We got a field separator. */
            /* End the previous field */
            *ptr = '\0';
            ptr++;

            /* Start the next field */
            if (iField < cMaxField) {
                ppField[iField] = ptr;
            }
            iField++;

        } else if (*ptr == '\"') {
            /* Process all characters up to
               next '\"' (or end of string.)
             */
            ptr++;
            while(*ptr) {
                if (*ptr == '\"') { /* End of the quoted section */
                    ptr++;
                    break;
                }
                ptr++;
            }
        } else {
            ptr++;
        }
    }
    return iField;
</PRE>
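
Since the accepted answer is not shown in this thread, here is a sketch of how the fragment above might sit inside a complete function. The name and signature are inferred from the call `parseline(buf,pField,4)` in main() and are an assumption, not fcavalier's original code. The point to notice is that each ppField[] entry is just a pointer into the line buffer, and a field "ends" wherever a comma was overwritten with '\0'.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: name and signature are assumed from the call in main().
   Splits line in place at commas that are outside double quotes, stores
   at most cMaxField field pointers in ppField, and returns the number of
   fields found (which may be larger than cMaxField). */
int parseline(char *line, char **ppField, int cMaxField)
{
    char *ptr = line;
    int iField = 0;                 /* counts the fields of this line only */

    assert(line != NULL);

    if (iField < cMaxField) {
        ppField[iField] = ptr;      /* first field starts where the line starts */
    }
    iField++;

    while (*ptr) {
        if (*ptr == ',') {
            *ptr++ = '\0';          /* this '\0' is what ends the previous field */
            if (iField < cMaxField) {
                ppField[iField] = ptr;
            }
            iField++;
        } else if (*ptr == '"') {
            ptr++;                  /* skip over the quoted section */
            while (*ptr && *ptr != '"') {
                ptr++;
            }
            if (*ptr) {
                ptr++;              /* step past the closing quote */
            }
        } else {
            ptr++;
        }
    }
    return iField;
}
```

With cMaxField == 0 nothing is stored, so ppField may be NULL and only the count is returned. The line is still modified.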
tcy08

ASKER

How does this :

     if (iField < cMaxField) {
        ppField[iField] = ptr;
     }

assign the first field to the 1st variable?
How does it know where to end the first field?

tcy08

ASKER


Why will strtok not work correctly?

For example :
 
  745,there,,B

strtok will assign 745 to 1st variable, there to 2nd and
B to third ?

But if I put a space like 745,there, ,B
the strtok will work correctly. Why ?
ozo

strtok(char *s1,const char *s2) searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2.
strtok considers the string s1 to consist of a sequence of zero or more text tokens separated by spans of one or more characters from the separator string s2.
That is why the empty field in 745,there,,B is skipped (the two commas count as one separator span), while the single space in 745,there, ,B counts as a token.
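
To make the difference concrete, here is a small sketch. Both function names are hypothetical. `split_keep_empty` cuts at every single separator, so adjacent commas yield an empty field; it does not handle quoted fields, which is what the parseline() approach is for.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the thread): split line in place at every
   occurrence of sep, keeping empty fields. Stores at most max_fields
   pointers in fields and returns the number of fields found.
   Quoted fields are NOT handled here. */
int split_keep_empty(char *line, char sep, char **fields, int max_fields)
{
    int n = 0;
    char *start = line;
    char *hit;

    assert(line != NULL);
    assert(sep != '\0');
    for (;;) {
        if (n < max_fields) {
            fields[n] = start;
        }
        n++;
        hit = strchr(start, sep);   /* next separator, if any */
        if (hit == NULL) {
            break;
        }
        *hit = '\0';                /* end the current field */
        start = hit + 1;            /* the next field may be empty */
    }
    return n;
}

/* For comparison: how many tokens strtok() reports for the same input. */
int count_strtok_tokens(char *line, const char *seps)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, seps); tok != NULL; tok = strtok(NULL, seps)) {
        n++;
    }
    return n;
}
```

For 745,there,,B the strtok() loop sees three tokens, while split_keep_empty() returns four fields with an empty third one.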
tcy08

ASKER

ozo,

I have just posted a question "Is there a split function in C ?"
Can you tell me how to fix that strtok ?
Hi,

About the first code example by fcavalier:
I think int iField = 0; should be a static variable,
like static int iField = 0;
Otherwise we lose the field number each time we leave the parsing function.
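
Whether static is wanted depends on what the return value is for. main() tests `parseline(buf,pField,4) == 4` for every line, which suggests the count is meant to restart at 0 on each call. The sketch below uses two hypothetical functions that differ only in the storage class of the counter, to show what each choice does across calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: counts comma-separated fields with an automatic counter. */
int count_fields_auto(const char *line)
{
    int iField = 0;             /* set to 0 again on every call */

    assert(line != NULL);
    iField++;                   /* the first field */
    for (; *line; line++) {
        if (*line == ',') {
            iField++;
        }
    }
    return iField;
}

/* Hypothetical: same loop, but the counter is static. */
int count_fields_static(const char *line)
{
    static int iField = 0;      /* initialised once, keeps its value */

    assert(line != NULL);
    iField++;
    for (; *line; line++) {
        if (*line == ',') {
            iField++;
        }
    }
    return iField;
}
```

Called twice on a four-field line, the automatic version returns 4 both times, while the static version returns 4 and then 8, so a per-line test such as `== 4` would fail from the second line on.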