RegEx: Split by non-word char, except inside quotes
Posted on 2009-05-20
Been banging my head on this one for a little while. I'm using boost::regex, and I'd like to split a string by non-word characters, except where the non-word character is inside quotes.
An example string:
The gopher's bike wasn't "hot enough" for the judges.
Would get split into:
The really tricky part is where the quoted string has an escaped quote. For example:
Here is a "string with \"a quote\" inside" of it.
That should be split as
"string with \"a quote\" inside"
As far as I know, boost::regex uses Perl-compatible syntax by default, so the same pattern should work whether I'm using boost::regex, PHP's preg_split, or any other Perl-compatible regex engine.
Can anyone offer any suggestions?
P.S. Yes, I'm trying to keep the quotes in the match, as the split examples above show.
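One direction that might work is to match the tokens directly instead of splitting on the separators: try a quoted string first (allowing backslash-escaped characters inside it), and fall back to a run of word characters. The sketch below is only an illustration under a few assumptions. It uses std::regex rather than boost::regex so that it compiles without Boost (boost::regex and boost::sregex_iterator have a very similar interface, but the pattern has not been checked against Boost here), and the function name tokenize is made up for this example. It also assumes that an apostrophe counts as a non-word character, so "gopher's" comes out as "gopher" and "s".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns quoted strings (quotes kept) and word runs.
// Assumption: std::regex (ECMAScript syntax) stands in for boost::regex.
std::vector<std::string> tokenize(const std::string& input) {
    // Alternative 1: a double-quoted string, where the body is made of
    //                either a non-quote, non-backslash character or a
    //                backslash followed by any character (e.g. \").
    // Alternative 2: one or more word characters.
    static const std::regex token_re(R"re("(?:[^"\\]|\\.)*"|\w+)re");

    std::vector<std::string> tokens;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), token_re);
         it != end; ++it) {
        tokens.push_back(it->str());  // whole match, including the quotes
    }
    return tokens;
}
```

Calling tokenize on the second example string should give the quoted part back as one token with its escaped quotes intact. An unterminated quote is simply skipped by this pattern, which may or may not be what is wanted.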