Solved

Perl String Split question

Posted on 2009-07-14
Last Modified: 2013-11-18
Hi,
I have a Perl string and would like to split (tokenize) it on whitespace, with the following also taken into account. The string may contain:
a) Double Strings
b) escaped spaces, i.e. one\ part
c) All the special Characters like
- (Hyphen)
, (Comma)
: (Colon)
; (Semi-colon)
' (apostrophe)
~ (Tilde)
@ (At)
# (Hash)
$ (Dollar)
% (Percentage)
^ (Caret)
! (Exclamation)
( (Open brackets)
) (Close brackets)
{ (Open braces)
} (Close braces)
[ (Open Square brackets)
] (Close Square brackets)
+ (Plus)
. (Dot)
| (Pipe)
\ (Backslash)
? (question mark)
_ (underscore)
etc.

For example, if a string contains
The te\ st "of" string_ reg-ex:
should be parsed into
Token 0: The
Token 1: te st
Token 2: "of"
Token 3: string_
Token 4: reg-ex

NOTE: I would prefer answers that re-use existing libraries for the splitting / tokenizing / parsing.

Question by:Purdue_Pete
7 Comments
 
LVL 39

Accepted Solution

by:
Adam314 earned 200 total points
ID: 24849963

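my $str = 'The te\ st "of" string_ reg-ex';

# Split on each space that is not preceded by a backslash.
my @tokens = split(/(?<!\\) /, $str);

# Unescape every remaining "\ " sequence in each token.
s/\\ / /g for (@tokens);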


Author Comment

by:Purdue_Pete
ID: 24850068
Adam314,
Does your code take care of all the considerations above, or is it just for the example posted? If it is only for the example, I am looking for a solution that handles all of the considerations.

BTW, what does this line do?
my @tokens = split(/(?<!\\) /, $str);


Thanks.
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 200 total points
ID: 24850121
I believe it takes care of all the requirements, unless I misunderstood them.
a) double string - not really sure what this is... is this consecutive spaces?
    If so, add a "+" after the space, like so:
    my @tokens = split(/(?<!\\) +/, $str);

b) escaped spaces - this is handled
c) special characters - these have no effect, since only unescaped spaces act as separators

This line:
    my @tokens = split(/(?<!\\) /, $str);
will split a string on spaces if the character preceding the space is not a backslash.
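
To make the lookbehind concrete, here is a minimal sketch that combines it with the "+" variant and the unescape step. The helper name split_on_unescaped_spaces is made up for illustration and is not from any library; double-quoted tokens that contain spaces are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper, named here only for illustration.
sub split_on_unescaped_spaces {
    my ($str) = @_;
    # (?<!\\) is a negative lookbehind: a run of spaces only counts as a
    # separator when the character just before it is not a backslash.
    my @parts = split /(?<!\\) +/, $str;
    # Turn every escaped space back into a plain space.
    s/\\ / /g for @parts;
    return @parts;
}

my @tokens = split_on_unescaped_spaces('The te\ st   "of" string_ reg-ex:');
print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

Note that the trailing colon stays attached to the last token, because only spaces are treated as separators.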
LVL 84

Assisted Solution

by:ozo
ozo earned 50 total points
ID: 24851455
# Is this what you mean?
$_ = 'The te\ st "of" string_ reg-ex: "double string"';
my @tokens = /((?:\\.|"[^"]*"|\S)+)/g;
print "$_\n" for @tokens;
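
For anyone reading the pattern above for the first time, the sketch below spells out the same alternation with the /x modifier and one comment per branch. The final unescape step is an assumption based on the expected output in the question (te\ st becoming "te st"); the pattern itself leaves the backslash in place.

```perl
use strict;
use warnings;

my $str = 'The te\ st "of" string_ reg-ex: "double string"';

# Same alternation as the one-line pattern, laid out with the x modifier.
my @tokens = $str =~ /
    (                 # capture one whole token
      (?:
          \\.         # a backslash plus the character it escapes
        | "[^"]*"     # a double-quoted run, quotes included
        | \S          # any other non-whitespace character
      )+
    )
/gx;

# Assumption: escaped spaces should lose their backslash, as in the question.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

Because the quoted branch is tried before the plain \S branch, "double string" comes back as a single token with its quotes intact.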

Author Comment

by:Purdue_Pete
ID: 24852868
Adam314,
Excellent - I will try your solution with various strings.
a) I meant a double-quoted token, i.e. "of" in the example should be treated as one token, and the token should include the double quotes.
Regarding the consecutive spaces, do you mean \ \ , i.e. backslash-space-backslash-space?

LVL 39

Expert Comment

by:Adam314
ID: 24853927
If the double string should work like "double string", keeping this as 1 token, you'll need to use what ozo posted.  Otherwise, what I posted should work.

By double space, I meant:
    the   test         string  "of" stuff
the consecutive spaces would be counted only once, so you would not end up with a bunch of empty tokens.
LVL 35

Expert Comment

by:Terry Woods
ID: 24854294
One thing I don't think you've clarified enough - if there is a double quoted string, do you want to keep it together?

For example, if a string contains
The te\ st "of string_" reg-ex:
should it be parsed into
Token 0: The
Token 1: te st
Token 2: "of string_"
Token 3: reg-ex
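
If keeping quoted strings together is what is wanted, and since the question asks for existing libraries, one option is the core module Text::ParseWords. This is only a sketch under the assumption that quotes should stay in the token and that escaped spaces should be unescaped afterwards; nobody in this thread has confirmed it against all of the special characters listed.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $str = 'The te\ st "of string_" reg-ex:';

# A true second argument keeps quotes and backslashes in the returned tokens.
my @tokens = quotewords('\s+', 1, $str);

# quotewords leaves the backslash in place, so unescape the spaces here.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

One caveat to test before relying on this: Text::ParseWords treats single quotes as quote characters too, so as far as I know an unpaired apostrophe in the input makes quotewords return an empty list. The regex approach posted earlier does not have that restriction.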