Perl String Split question

Hi,
I have a Perl string and would like to split (or tokenize) it by whitespace, with the following also taken into consideration. The string may contain:
a) Double Strings
b) escaped spaces, i.e. one\ part
c) All the special Characters like
- (Hyphen)
, (Comma)
: (Colon)
; (Semi-colon)
' (apostrophe)
~ (Tilde)
@ (At)
# (Hash)
$ (Dollar)
% (Percentage)
^ (Caret)
! (Exclamation)
( (Open brackets)
) (Close brackets)
{ (Open braces)
} (Close braces)
[ (Open Square brackets)
] (Close Square brackets)
+ (Plus)
. (Dot)
| (Pipe)
\ (Backslash)
? (question mark)
_ (underscore)
etc.

For example, if a string contains
The te\ st "of" string_ reg-ex:
should be parsed into
Token 0: The
Token 1: te st
Token 2: "of"
Token 3: string_
Token 4: reg-ex

NOTE: I would prefer answers that re-use existing libraries to do the splitting / tokenizing / parsing.

K
Purdue_Pete Asked:

Adam314 Commented:

my $str  ='The te\ st "of" string_ reg-ex';
 
my @tokens = split(/(?<!\\) /, $str);
s/\\ / / for (@tokens);

Purdue_Pete (Author) Commented:
Adam314,
Does your code take care of all the considerations above or is just for the example posted? If not, I am looking for a solution that will take care of all the considerations.

BTW, what does this line do?
my @tokens = split(/(?<!\\) /, $str);


Thanks.
Adam314 Commented:
I believe it takes care of all the requirements, unless I misunderstood the requirements.
a) double string - I'm not really sure what this is... is this consecutive spaces?
    If so, add a "+" after the space, like so:
    my @tokens = split(/(?<!\\) +/, $str);

b) escaped spaces - this is handled
c) special characters - these have no effect

This line:
    my @tokens = split(/(?<!\\) /, $str);
will split a string on spaces if the character preceding the space is not a backslash.
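
As a minimal, self-contained sketch of the lookbehind split described above (using the "+" variant so that runs of spaces count once), the whole thing could look like the following. The sample string is the one from the question. Note the assumption that no quoted string contains a space; a token like "of string_" would be cut in two by this approach.

```perl
use strict;
use warnings;

my $str = 'The te\ st "of" string_ reg-ex';

# Split on one or more spaces, but only where the first space
# is NOT preceded by a backslash (negative lookbehind).
my @tokens = split /(?<!\\) +/, $str;

# Turn each escaped space ("\ ") back into a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
# Token 0: The
# Token 1: te st
# Token 2: "of"
# Token 3: string_
# Token 4: reg-ex
```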
ozo Commented:
# Is this what you mean?
$_ = 'The te\ st "of" string_ reg-ex: "double string"';
# A token is a run of escaped characters, double-quoted strings, or non-whitespace characters.
my @tokens = /((?:\\.|"[^"]*"|\S)+)/g;
print "$_\n" for @tokens;
Purdue_Pete (Author) Commented:
Adam314,
Excellent - will try your solution with various strings.
a) I meant a double-quoted token, i.e. "of" in the example should be treated as one token, and the token should include the double quotes.
Regarding the consecutive spaces, do you mean \ \ , i.e. backslash-space-backslash-space?

Adam314 Commented:
If the double string should work like "double string", keeping this as 1 token, you'll need to use what ozo posted.  Otherwise, what I posted should work.

By double space, I meant:
    the   test         string  "of" stuff
the consecutive spaces would be counted only once, so you would not end up with a bunch of empty tokens.
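
To make the difference concrete, here is a rough sketch of the match-based approach (the same pattern ozo posted) with the unescape step added on top. The sample string is made up for illustration. Since nothing in the thread says the trailing colon should be stripped, it is left attached to its token here.

```perl
use strict;
use warnings;

# Illustrative input: an escaped space, two quoted strings with
# spaces inside them, and a run of consecutive spaces.
my $str = 'The te\ st "of string_" reg-ex:   "double string"';

# A token is a run of any of these:
#   \\.      a backslash followed by any character (escaped space)
#   "[^"]*"  a complete double-quoted string, quotes included
#   \S       any other single non-whitespace character
my @tokens = $str =~ /((?:\\.|"[^"]*"|\S)+)/g;

# Turn each escaped space ("\ ") back into a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
# Token 0: The
# Token 1: te st
# Token 2: "of string_"
# Token 3: reg-ex:
# Token 4: "double string"
```

Because this matches tokens rather than splitting on separators, consecutive spaces never produce empty tokens.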
Terry Woods (IT Guru) Commented:
One thing I don't think you've clarified enough - if there is a double quoted string, do you want to keep it together?

For example, if a string contains
The te\ st "of string_" reg-ex:
should it be parsed into
Token 0: The
Token 1: te st
Token 2: "of string_"
Token 3: reg-ex
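
If quoted strings are meant to stay together as in this example, one possible library route (the question asks for one) is the core module Text::ParseWords. The sketch below assumes its quotewords function with the keep flag set, so that quotes and backslashes are retained and only the escaped spaces are cleaned up afterwards. One caveat to check against real data: the module also treats an apostrophe as a quote character, and an unbalanced quote makes it return an empty list, so it may not suit strings containing a lone '.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $str = 'The te\ st "of string_" reg-ex:';

# Arguments: delimiter pattern, keep flag, then the line(s) to parse.
# keep = 1 leaves quotes and backslashes in the returned tokens.
my @tokens = quotewords('\s+', 1, $str);

# Turn each escaped space ("\ ") back into a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
# Token 0: The
# Token 1: te st
# Token 2: "of string_"
# Token 3: reg-ex:
```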