Tokenizing a string without separators.  How to parse a string in C?

Posted on 2006-11-20
Last Modified: 2010-04-15
I am writing a parser to handle an ASCII string representing voltages.  The intent is to have this string separated into seven tokens.  The string format is:


The # tells my program to select a "hex" case.  The <CR> at the end signifies a carriage return, which should be treated as an "end of line" flag.  This last <CR> completes the string.

I want the string tokenized as follows:

# FF FF FF 0 FF 0 <CR>

There are no separators, only the position within the string is indicative of each token.  Remember that these are being sent as ASCII characters over a serial line and I'll need to convert them from ASCII to integers (subtract 48 from the byte.)
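A note on the conversion: subtracting 48 (the ASCII code of '0') works for the digits '0' through '9', but the letters 'A' through 'F' sit at ASCII 65 through 70 and need a different offset. Below is a minimal sketch of a per-character conversion, assuming both uppercase and lowercase hex digits should be accepted. The helper names hex_digit_value and hex_pair_value are hypothetical and not part of the original post.

```c
/* Convert one ASCII hex digit to its value 0..15.
   Returns -1 if the character is not a hex digit.
   (hex_digit_value is an illustrative name, not from the original post.) */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';        /* '0' is ASCII 48 */
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;   /* 'A' is ASCII 65 */
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;   /* 'a' is ASCII 97 */
    return -1;
}

/* Combine two ASCII hex digits into one value 0..255.
   Returns -1 if either character is not a hex digit. */
int hex_pair_value(char hi, char lo)
{
    int h = hex_digit_value(hi);
    int l = hex_digit_value(lo);

    if (h < 0 || l < 0)
        return -1;
    return h * 16 + l;
}
```

With these helpers, the pair "FF" converts to 255 and a single mode digit such as '0' converts to 0, while a stray character such as 'G' is reported as an error instead of silently producing a wrong number.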

The goal is to be able to perform calculations on the individual tokens.  I am going to use the # as a start flag (not stored in a variable) and the <CR> as an end flag.
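The start-flag and end-flag idea can be sketched as a small receive routine that is fed one byte at a time from the serial line. This is only a sketch under assumptions: the payload is taken to be exactly 10 characters (2+2+2+1+2+1, as in the example), the carriage return is taken to be the single byte '\r', and the names frame_rx, frame_feed and FRAME_LEN are made up for illustration.

```c
#include <stddef.h>

#define FRAME_LEN 10   /* assumed payload length between '#' and CR */

typedef struct {
    char   buf[FRAME_LEN + 1];  /* payload plus terminating NUL */
    size_t n;                   /* number of payload bytes stored so far */
    int    active;              /* nonzero once a '#' has been seen */
} frame_rx;

/* Feed one received byte into the receiver.
   Returns 1 when a complete, correctly sized payload is available in
   rx->buf (as a NUL-terminated string), otherwise 0. */
int frame_feed(frame_rx *rx, char c)
{
    if (c == '#') {              /* start flag: restart, do not store it */
        rx->n = 0;
        rx->active = 1;
        return 0;
    }
    if (!rx->active)             /* ignore bytes that arrive outside a frame */
        return 0;
    if (c == '\r') {             /* end flag: accept only a full-length frame */
        int ok = (rx->n == FRAME_LEN);
        rx->buf[rx->n] = '\0';
        rx->active = 0;
        return ok;
    }
    if (rx->n >= FRAME_LEN) {    /* too long: drop the frame, wait for next '#' */
        rx->active = 0;
        return 0;
    }
    rx->buf[rx->n++] = c;
    return 0;
}
```

The receiver would be zero-initialized once and frame_feed called for every byte as it arrives. A frame that is too short or too long is discarded, and a new '#' always restarts reception, so a garbled command does not affect the next one.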

I realize this is a pretty basic issue for most programmers, but to me it's a hobby.  I'm trying to build a microcontroller-driven servo and send it commands with my PC.  I'm going to keep plugging away at the solution myself, but hope you can help.  I would like to get an answer as soon as possible.

Question by:wwward0

Accepted Solution

rajeev_devin earned 200 total points
ID: 17985406
Can be extracted this way:

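char str[] = "#FFFFFF0FF0\n";
char tokens[7][10];
/* Each %Ns reads at most N non-whitespace characters, giving 7 tokens. */
sscanf(str, "%1s%2s%2s%2s%1s%2s%1s", tokens[0], tokens[1], tokens[2],
       tokens[3], tokens[4], tokens[5], tokens[6]);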

Expert Comment

ID: 17985893
>> I want the string tokenized as follows:
>> # FF FF FF 0 FF 0 <CR>

A few questions :

1) is <CR> 1 character (ie. the carriage return character with ASCII value 13), or are these 4 characters (<, C, R and >) ?

2) will the first 3 values always be 2 characters wide, the fourth 1 character wide, the fifth 2 characters wide, and the last 1 character wide ? If not, what are the possible values, and how do you determine where to split (tokenize) them ?

Expert Comment

ID: 17987086

Hi rajeev,

All kinds of algorithms exist to tokenize that string.  I'll be glad to provide some code for you.

But I believe that the algorithm is in need of a tune up.  Given that the last 4 chars of '0FF0' represent 0, FF, and 0, what string would represent 0F F0?


Assisted Solution

itsmeandnobodyelse earned 150 total points
ID: 17987212
The following should do. You could easily add some checks in case the string might contain invalid chars.

Regards, Alex

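#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* Splits input into tokens and returns the token count, or -1 on bad input.
   Note: a '0' is always treated as a one-character token.
   The caller frees (*ppszTokens)[0] (the shared text buffer) and *ppszTokens. */
int tokenize(const char *input, char*** ppszTokens)
{
     int count;
     size_t i;
     char c;
     char* p;
     int toggle;
     size_t len;
     char** pszTokens;

     len = strlen(input);
     if (len == 0 || input[0] != '#')
          return -1;   /* wrong input */

     /* allocate space for max possible tokens */
     *ppszTokens = (char**)malloc(len*sizeof(char*));
     pszTokens = *ppszTokens;
     toggle = 0;
     /* allocate space for max possible output */
     p = (char*) malloc(2*len + 5);
     if (pszTokens == NULL || p == NULL)
     {
          free(pszTokens);
          free(p);
          *ppszTokens = NULL;
          return -1;   /* out of memory */
     }
     memset(p, 0, 2*len + 5);
     count = 0;
     for (i = 0; i < len; i++)
     {
           c = input[i];
           *p++ = c;
           if (c == '#' || c == '0')
           {
               /* single-character token */
               pszTokens[count++] = p-1;
               *p++ = 0;
           }
           else if (c == '\n')
           {
               /* end of line: overwrite it with the text "<CR>" and stop */
               pszTokens[count++] = --p;
               strcpy(p, "<CR>");
               p += 5;
               break;
           }
           else if (toggle)
           {
               /* second character of a two-character token */
               toggle = 0;
               pszTokens[count++] = p-2;
               *p++ = 0;
           }
           else
           {
               /* first character of a two-character token */
               toggle = 1;
           }
     }
     return count;
}

int main()
{
   char** pszTokens;
   char   input[] = "#FFFFFF0FF0\n";
   int    count;
   int    i;

   count = tokenize(input, &pszTokens);
   if (count < 0)
      return 1;
   for (i = 0; i < count; ++i)
      printf("%s ", pszTokens[i]);
   printf("\n");
   free(pszTokens[0]);   /* token 0 points at the start of the text buffer */
   free(pszTokens);
   return 0;
}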

Expert Comment

ID: 17990030
Infinity and Kdo,
I am just guessing, but from the way Bill has tokenised the string, it appears that only two values are allowed: a high (FF) and a low (0). It would not make any sense otherwise.

Expert Comment

ID: 17990059

that's what I thought, but it's best to be sure before offering the best solution.

On the other hand, if it's the case that there are only 2 values (FF and 0), then it's maybe better to use  F and 0 or 1 and 0 ?

Expert Comment

ID: 17990093
>> then it's maybe better to use  F and 0 or 1 and 0 ?
especially because it seems that it's not always gonna be 6 values, but there might be more or less.
And if each token is only one character, it makes decoding a LOT easier.

Expert Comment

ID: 17990182

Well, the original statement "a parser to handle an ASCII string representing voltages" isn't descriptive enough, and I seriously doubt that we've settled on the string's true representation.

If he's got exactly 6 voltages, there are exactly 10 characters between the '#' and the '<', and positions 4 and 6 are always 0, then a lot of methods work.

But we'll need a lot more answers.

-- Are the voltages for positions 1,2, 3, and 5 0/FF or can any value in the range 0/FF work?
-- Can the voltages for positions 4 and 6 be anything other than 0?
-- Does the poster believe that a single digit of zero represents a zero voltage?  If so, how are voltages of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F represented?

Perhaps these are masks.  (All bits on, all bits off.)  Then parsing the string is a snap.  Upon encountering the '#' just switch() on 'F', '0', and '<'.

Input.  Need Input.....   :~}

Author Comment

ID: 17992201
1.  The "<CR>" carriage return is in fact the ASCII value 13.  I am specifically looking for the string to terminate with a carriage return to signal the end of the string.
2.  The values must always be the same length.  This is intended to take ASCII characters representing hex values.  If the length is incorrect, then it's a bad command and should be ignored.

The string is an example.  The first three pairs of bytes will represent an eight-bit value when converted from hex to integer.  The single-digits are basically intended to signal one of 16 possible modes (0-f).  The string 0FF0 could exist if I selected mode 0, value FF, second mode 0.

For double-byte groups, the values can be anything from 00-FF in hex.  It should represent an eight bit integer once converted from hex to int (this is done elsewhere in the program.)

Thanks, I'm checking this out!
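The rules stated in this comment (fixed field widths, hex digits only, '#' at the start, ASCII 13 at the end, anything else ignored) can be sketched as a strict, position-based parser. This is one possible approach under those stated rules, not the code the author ended up with; the struct command, its field names, and the functions hex_val and parse_command are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Decoded command: three 8-bit values, a mode, a fourth value, a second mode.
   (Field names are illustrative.) */
typedef struct {
    unsigned char val1, val2, val3;
    unsigned char mode1;
    unsigned char val4;
    unsigned char mode2;
} command;

/* Value of one ASCII hex digit, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse a string of the form "#VVVVVVMVVM\r" (12 characters).
   Returns 1 and fills *out on success; returns 0 for a bad command,
   in which case *out is left untouched. */
int parse_command(const char *s, command *out)
{
    int d[10];
    size_t i;

    if (strlen(s) != 12 || s[0] != '#' || s[11] != '\r')
        return 0;                       /* wrong length or wrong flags */
    for (i = 0; i < 10; i++) {
        d[i] = hex_val(s[i + 1]);
        if (d[i] < 0)
            return 0;                   /* non-hex character */
    }
    out->val1  = (unsigned char)(d[0] * 16 + d[1]);
    out->val2  = (unsigned char)(d[2] * 16 + d[3]);
    out->val3  = (unsigned char)(d[4] * 16 + d[5]);
    out->mode1 = (unsigned char)d[6];
    out->val4  = (unsigned char)(d[7] * 16 + d[8]);
    out->mode2 = (unsigned char)d[9];
    return 1;
}
```

For the example string "#FFFFFF0FF0\r" this yields 255, 255, 255, mode 0, 255, mode 0. Because every position is checked, a command with a missing character or a non-hex character is rejected rather than partly decoded.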


Assisted Solution

Infinity08 earned 150 total points
ID: 17993905
This is an example of how you could extract all values :

    #include <stdio.h>
    #include <stdlib.h>
    int main(void) {
      const char * test = "#FFFFFF0FF0\r";
      /* %x stores into an unsigned int */
      unsigned int val1 = 0;
      unsigned int val2 = 0;
      unsigned int val3 = 0;
      unsigned int mode1 = 0;
      unsigned int val4 = 0;
      unsigned int mode2 = 0;
      /* The number in each conversion is the maximum field width.
         The trailing \r in the format matches any run of whitespace (even none),
         so it does not by itself verify the terminator. */
      int ret = sscanf(test, "#%2x%2x%2x%1x%2x%1x\r", &val1, &val2, &val3, &mode1, &val4, &mode2);
      if (ret == 6) {
        fprintf(stdout, "val1 : %u\n", val1);
        fprintf(stdout, "val2 : %u\n", val2);
        fprintf(stdout, "val3 : %u\n", val3);
        fprintf(stdout, "mode1 : %u\n", mode1);
        fprintf(stdout, "val4 : %u\n", val4);
        fprintf(stdout, "mode2 : %u\n", mode2);
      }
      else {
        fprintf(stdout, "Invalid input !!\n");
      }
      return 0;
    }

Author Comment

ID: 18042735
Thank you all for your help, you pointed me in the right direction, and I have a working setup!


