Link to home
Start Free TrialLog in
Avatar of wwward0
wwward0

asked on

Tokenizing a string without separators. How to parse a string in C?

I am writing a parser to handle an ASCII string representing voltages.  The intent is to have this string separated into seven tokens.  The string format is:

#FFFFFF0FF0<cr>

The # indicates to my program to select a "hex" case.  The <cr> at the end signifies a carraige return which should be treated as an "end of line" flag.  This last <CR> completes the string.

I want the string tokenized as follows:

# FF FF FF 0 FF 0 <CR>

There are no separators, only the position within the string is indicative of each token.  Remember that these are being sent as ASCII characters over a serial line and I'll need to convert them from ASCII to integers (subtract 48 from the byte.)

The goal is to be able to perform calculations on the individual tokens.  I am going to use the # as a start flag (not stored in a variable) and the <CR> as an end flag.

I realize this is a pretty basic issue for most programmers, but to me it's a hobby.  I'm trying to build a microcontroller-driven servo and send it commands with my PC.  I'm going to keep plugging away at the solution myself, but hope you can help.  I would like to get an answer as soon as possible.

Thanks,
Bill
ASKER CERTIFIED SOLUTION
Avatar of rajeev_devin
rajeev_devin

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Infinity08
>> I want the string tokenized as follows:
>> 
>> # FF FF FF 0 FF 0 <CR>

A few questions :

1) is <CR> 1 character (ie. the carriage return character with ASCII value 13), or are these 4 characters (<, C, R and >) ?

2) will the first 3 values always be 2 characters wide, the fourth 1 character wide, the fifth 2 characters wide, and the last 1 character wide ? If not, what are the possible values, and how do you determine where to split (tokenize) them ?

Hi rajeev,

All kinds of algorithms exist to tokenize that string.  I'll be glad to provide some code for you.

But I believe that the algorithm is in need of a tune up.  Given that the last 4 chars of '0FF0' represent 0, FF, and 0, what string would represent 0F F0?


Kent
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of avsrivastava
avsrivastava

Infinity and Kdo,
I am just guessing, but the way Bill has tokenised the string it appears only 2 values a high(FF) and a low(0) are allowed values. Would not make any sense otherwise.
avsrivastava,

that's what I thought, but it's best to be sure before offering the best solution.

On the other hand, if it's the case that there are only 2 values (FF and 0), then it's maybe better to use  F and 0 or 1 and 0 ?
>> then it's maybe better to use  F and 0 or 1 and 0 ?
especially because it seems that it's not always gonna be 6 values, but there might be more or less.
And if each token is only one character, it makes decoding a LOT easier.

Welll, the original statement "a parser to handle an ASCII string representing voltages" isn't descript enough, though I doubt seriously that we've settled on the string's true representation.

If he's got exactly 6 voltages, the are exactly 10 characters between the '#' and the '<', and positions 4 and 6 are always 0, then a lot of methods work.

But we'll need a lot more answers.

-- Are the voltages for positions 1,2, 3, and 5 0/FF or can any value in the range 0/FF work?
-- Can the voltages for positions 2 and 6 be anything other than 0?
-- Does the poster believe that a single digit of zero represents a zero voltage?  If so, how are voltages of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F represented?

Perhaps these are masks.  (All bits on, all bits off.)  Then parsing the string is a snap.  Upon encountering the '#' just switch() on 'F', '0', and '<'.


Input.  Need Input.....   :~}
Kent
Avatar of wwward0

ASKER

Infinity08:
1.  The "<CR>" carriage return is in fact the ASCII value 13.  I am specifically looking for the string to terminate with a carriage return to signal the end of the string.
2.  The values must always be the same length.  This is intended to take ASCII characters representing hex values.  If the length is incorrect, the it's a bad command and should be ignored.

Kdo:
The string is an example.  The first three pairs of bytes will represent an eight-bit value when converted from hex to integer.  The single-digits are basically intended to signal one of 16 possible modes (0-f).  The string 0FF0 could exist if I selected mode 0, value FF, second mode 0.

avsrivastava:
For double-byte groups, the values can be anything from 00-FF in hex.  It should represent an eight bit integer once converted from hex to int (this is done elsewhere in the program.)

itsmeandnobodyelse:
Thanks, I'm checking this out!



SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of wwward0

ASKER

Thank you all for your help, you pointed me in the right direction, and I have a working setup!

:)

Bill