Solved

Perl String Split question

Posted on 2009-07-14
Medium Priority
424 Views
Last Modified: 2013-11-18
Hi,
I have a Perl string and would like to split (or tokenize) it on whitespace, taking the following into account. The string may contain:
a) Double Strings
b) escaped spaces, i.e. one\ part
c) All the special Characters like
- (Hyphen)
, (Comma)
: (Colon)
; (Semi-colon)
' (apostrophe)
~ (tilde)
@ (At)
# (Hash)
$ (Dollar)
% (Percentage)
^ (Caret)
! (Exclamation)
( (Open brackets)
) (Close brackets)
{ (Open braces)
} (Close braces)
[ (Open Square brackets)
] (Close Square brackets)
+ (Plus)
. (Dot)
| (Pipe)
\ (Backslash)
? (question mark)
_ (underscore)
etc.

For example, if a string contains
The te\ st "of" string_ reg-ex:
should be parsed into
Token 0: The
Token 1: te st
Token 2: "of"
Token 3: string_
Token 4: reg-ex

NOTE: I would prefer answers that re-use existing libraries to do the splitting / tokenizing / parsing.

Question by:Purdue_Pete
7 Comments
 
LVL 39

Accepted Solution

by:
Adam314 earned 800 total points
ID: 24849963

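my $str = 'The te\ st "of" string_ reg-ex';

# Split on spaces that are not preceded by a backslash
my @tokens = split(/(?<!\\) /, $str);
# Turn every escaped space back into a plain space
s/\\ / /g for @tokens;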

 

Author Comment

by:Purdue_Pete
ID: 24850068
Adam314,
Does your code take care of all the considerations above, or is it just for the example posted? If not, I am looking for a solution that will take care of all the considerations.

BTW, what does this line do?
my @tokens = split(/(?<!\\) /, $str);


Thanks.
 
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 800 total points
ID: 24850121
I believe it takes care of all the requirements, unless I misunderstood the requirements.
a) double string - not really sure what this is. Do you mean consecutive spaces?
    If so, add a "+" after the space, like so:
    my @tokens = split(/(?<!\\) +/, $str);

b) escaped spaces - this is handled
c) special characters - these have no effect

This line:
    my @tokens = split(/(?<!\\) /, $str);
will split a string on spaces if the character preceding the space is not a backslash.
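
To make the lookbehind concrete, here is a small sketch of the same split on a made-up input (the string is only an illustration, not from the question). It keeps the escaped space inside one token, but it knows nothing about double quotes, so a quoted phrase containing a space is still cut in two:

```perl
use strict;
use warnings;

# Hypothetical input: one escaped space and one quoted phrase with a space.
my $str = 'one\ part "two words" end';

# (?<!\\) is a zero-width negative lookbehind: the space only counts as a
# separator when the character immediately before it is not a backslash.
my @tokens = split /(?<!\\) /, $str;

# Unescape every backslash-space that survived inside a token.
s/\\ / /g for @tokens;

print "[$_]\n" for @tokens;
```

The bracketed output makes it easy to see where each token starts and ends.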

 
LVL 84

Assisted Solution

by:ozo
ozo earned 200 total points
ID: 24851455
# is this what you mean?
$_ = 'The te\ st "of" string_ reg-ex: "double string"';
my @tokens = /((?:\\.|"[^"]*"|\S)+)/g;
print "$_\n" for @tokens;
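
For readers who want to see how that pattern is put together, the following is the same regex spread out with the /x flag and commented. The final unescape step is an addition borrowed from the split-based answer in this thread, not part of the pattern above; without it the backslash stays in the token.

```perl
use strict;
use warnings;

my $str = 'The te\ st "of" string_ reg-ex: "double string"';

my @tokens = $str =~ m{
    (               # capture one whole token
      (?:
          \\.       # a backslash plus the character it escapes
        | "[^"]*"   # or a complete double-quoted run, spaces included
        | \S        # or any single non-whitespace character
      )+            # repeated until plain whitespace is reached
    )
}gx;

# Assumption: escaped spaces should end up as plain spaces in the result.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

Note that the trailing colon stays attached to reg-ex, since nothing in the pattern treats punctuation specially.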
 

Author Comment

by:Purdue_Pete
ID: 24852868
Adam314,
Excellent - will try your solution with various strings.
a) I meant a double-quoted token, i.e. "of" in the example should be treated as one token, and the token should include the double quotes.
Regarding the consecutive spaces, do you mean \ \ , i.e. backslash-space-backslash-space?

 
LVL 39

Expert Comment

by:Adam314
ID: 24853927
If the double string should work like "double string", keeping this as 1 token, you'll need to use what ozo posted.  Otherwise, what I posted should work.

By double space, I meant:
    the   test         string  "of" stuff
the consecutive spaces would be counted only once, so you would not end up with a bunch of empty tokens.
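
A quick way to see the difference is to run both patterns on a string with runs of spaces. This is only a sketch with a shortened, made-up input:

```perl
use strict;
use warnings;

# Hypothetical input: three spaces, then two spaces.
my $str = 'the   test  string';

# One separator per space: every extra space yields an empty token.
my @single = split /(?<!\\) /, $str;

# One separator per run of spaces: no empty tokens in between.
my @multi = split /(?<!\\) +/, $str;

my $empties = grep { $_ eq '' } @single;
print "single-space split: ", scalar(@single), " tokens, $empties of them empty\n";
print "space-run split:    ", scalar(@multi), " tokens\n";
```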
 
LVL 35

Expert Comment

by:Terry Woods
ID: 24854294
One thing I don't think you've clarified enough - if there is a double quoted string, do you want to keep it together?

For example, if a string contains
The te\ st "of string_" reg-ex:
should it be parsed into
Token 0: The
Token 1: te st
Token 2: "of string_"
Token 3: reg-ex
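
If the answer is yes, and since the question asks for an existing library, one option worth trying is the core module Text::ParseWords. The sketch below assumes its parse_line function with a true "keep" argument, which leaves quotes and backslashes in the tokens. The unescape step afterwards is an extra, and the trailing colon is kept as part of the last token. One caveat, as far as I can tell from the module: it also treats the apostrophe as a quote character, so input with an unbalanced ' may come back as an empty list.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $str = 'The te\ st "of string_" reg-ex:';

# Arguments: delimiter pattern, keep flag, input line.
# A true keep flag leaves the quotes and backslashes inside the tokens.
my @tokens = parse_line('\s+', 1, $str);

# Assumption: escaped spaces should become plain spaces in the result.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```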

Question has a verified solution.