Strange character mapping when running in a cmd shell
Posted on 2004-03-25
I recently ran into a strange problem.
We had sent an example of a command line invocation of our software to a customer, who cut and pasted it into a cmd shell (Windows XP). Sadly, it didn't work. The command line looked pretty normal, something like:
prog.exe -opt1 -opt2 arg3 -opt3 arg3 etc.
The problem turned out to be that somewhere along the line between us and the customer, one of the dashes had been converted from a regular ASCII dash (0x2D) to an extended ASCII dash (0x96). When the command line was pasted into the cmd shell, the character the program received in argv was 8211 (0x2013). That threw off our string handling completely, so of course nothing worked.
The program is Unicode; the main function is declared as
int wmain(int argc, wchar_t* argv[])
All string manipulation is done with the Unicode versions of the functions.
So the question is:
Why/how did the extended dash character get converted from 0x96 to 0x2013, instead of to 0x0096, the way the regular ASCII characters keep their values?
How do I map back from 0x2013 to 0x002D (or even 0x0096)? Obviously, I could run every extended ASCII character through the command line, see what winds up in the strings, and build a big case statement to convert back to ASCII, but I'm looking for something more algorithmic.
OK, that's 2 questions. :)