Solved

Perl String Split question

Posted on 2009-07-14
Last Modified: 2013-11-18
Hi,
I have a Perl string and would like to split (tokenize) it on whitespace, with the following also taken into account. The string may contain:
a) Double Strings
b) Escaped spaces, e.g. one\ part
c) Special characters such as
- (Hyphen)
, (Comma)
: (Colon)
; (Semi-colon)
' (apostrophe)
~ (tilde)
@ (At)
# (Hash)
$ (Dollar)
% (Percentage)
^ (Caret)
! (Exclamation)
( (Open brackets)
) (Close brackets)
{ (Open braces)
} (Close braces)
[ (Open Square brackets)
] (Close Square brackets)
+ (Plus)
. (Dot)
| (Pipe)
\ (Backslash)
? (question mark)
_ (underscore)
etc.

For example, if a string contains
The te\ st "of" string_ reg-ex:
should be parsed into
Token 0: The
Token 1: te st
Token 2: "of"
Token 3: string_
Token 4: reg-ex

NOTE: Answers that reuse existing libraries for the splitting / tokenizing / parsing are preferred.

Question by:Purdue_Pete
7 Comments
 
LVL 39

Accepted Solution

by:
Adam314 earned 200 total points
ID: 24849963

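my $str = 'The te\ st "of" string_ reg-ex';

# Split on each space that is not preceded by a backslash.
my @tokens = split(/(?<!\\) /, $str);
# Turn every escaped space (backslash-space) into a plain space.
s/\\ / /g for (@tokens);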


Author Comment

by:Purdue_Pete
ID: 24850068
Adam314,
Does your code take care of all the considerations above, or is it just for the example posted? If not, I am looking for a solution that takes care of all of them.

BTW, what does this line do?
my @tokens = split(/(?<!\\) /, $str);


Thanks.
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 200 total points
ID: 24850121
I believe it takes care of all the requirements, unless I misunderstood the requirements.
a) double string - not really sure what this is. Do you mean consecutive spaces?
    If so, add a "+" after the space, like so:
    my @tokens = split(/(?<!\\) +/, $str);

b) escaped spaces - this is handled
c) special characters - these have no effect

This line:
    my @tokens = split(/(?<!\\) /, $str);
will split a string on spaces if the character preceding the space is not a backslash.
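To illustrate the lookbehind, here is a minimal sketch. The sample string and variable names are my own, not from the question. It uses the " +" variant so that runs of spaces collapse into one separator, and adds /g to the substitution so that every escaped space in a token is converted, not just the first:

```perl
use strict;
use warnings;

# Hypothetical sample input: consecutive spaces plus two escaped spaces.
my $str = 'the   te\ st\ string  reg-ex:';

# Split on runs of spaces whose first space is not preceded by a backslash.
my @tokens = split /(?<!\\) +/, $str;

# Replace every backslash-space pair with a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

This should give three tokens: "the", "te st string" and "reg-ex:". Note that leading spaces in the input would still produce an empty first token.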
LVL 84

Assisted Solution

by:ozo
ozo earned 50 total points
ID: 24851455
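# Is this what you mean?
$_ = 'The te\ st "of" string_ reg-ex: "double string"';
# A token is a run of escaped characters, double-quoted sections, or other non-space characters.
my @tokens = /((?:\\.|"[^"]*"|\S)+)/g;
print "$_\n" for @tokens;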

Author Comment

by:Purdue_Pete
ID: 24852868
Adam314,
Excellent - will try your solution with various strings.
a) I meant a double-quoted token, i.e. "of" in the example should be treated as one token, and the token should include the double quotes.
Regarding the consecutive spaces, do you mean \ \ , i.e. backslash-space-backslash-space?

LVL 39

Expert Comment

by:Adam314
ID: 24853927
If the double string should work like "double string", keeping this as 1 token, you'll need to use what ozo posted.  Otherwise, what I posted should work.

By double space, I meant:
    the   test         string  "of" stuff
The consecutive spaces would be counted only once, so you would not end up with a bunch of empty tokens.
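A small sketch of that difference, assuming a sample string of my own with a quoted phrase that contains a space. The split-based version breaks the phrase apart, while the pattern ozo posted (spread out here with /x) keeps it together:

```perl
use strict;
use warnings;

# Hypothetical sample input with a quoted phrase containing a space.
my $str = 'The te\ st "double string" reg-ex:';

# Split-based approach: double quotes get no special treatment.
my @by_split = split /(?<!\\) +/, $str;

# Match-based approach, using ozo's pattern in extended layout.
my @by_match = $str =~ /(
    (?: \\.        # a backslash plus the character it escapes
      | "[^"]*"    # or a complete double-quoted section
      | \S         # or any other non-whitespace character
    )+
)/gx;

# Both approaches leave escaped spaces as backslash-space; convert them.
s/\\ / /g for @by_split, @by_match;

print "split: ", join(' | ', @by_split), "\n";
print "match: ", join(' | ', @by_match), "\n";
```

The split version should return five tokens, with "double and string" as separate pieces. The match version should return four, with "double string" (quotes included) as a single token.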
LVL 35

Expert Comment

by:Terry Woods
ID: 24854294
One thing I don't think you've clarified enough - if there is a double quoted string, do you want to keep it together?

For example, if a string contains
The te\ st "of string_" reg-ex:
should it be parsed into
Token 0: The
Token 1: te st
Token 2: "of string_"
Token 3: reg-ex
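
Since the question asked for an existing library, one option worth trying is the core Text::ParseWords module. This is only a sketch. It assumes the backslash-space and double-quote rules discussed above. As far as I know, the module also treats apostrophes as quote characters, so input containing a lone apostrophe may need different handling. With a true second argument, parse_line keeps quotes and backslashes in the tokens, so the escaped spaces are converted afterwards:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $str = 'The te\ st "of string_" reg-ex:';

# Split on whitespace; the true "keep" flag retains quotes and backslashes.
my @tokens = parse_line('\s+', 1, $str);

# Convert each escaped space into a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

With this input the quoted phrase stays together as one token, quotes included, and the trailing colon remains attached to reg-ex because nothing in the rules strips punctuation.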