Solved

Perl String Split question

Posted on 2009-07-14
Last Modified: 2013-11-18
Hi,
I have a Perl string and would like to split (or tokenize) it on whitespace, with the following also taken into consideration. The string may contain:
a) Double Strings
b) escaped spaces, i.e. one\ part
c) All the special Characters like
- (Hyphen)
, (Comma)
: (Colon)
; (Semi-colon)
' (apostrophe)
~ (tilde)
@ (At)
# (Hash)
$ (Dollar)
% (Percentage)
^ (Caret)
! (Exclamation)
( (Open brackets)
) (Close brackets)
{ (Open braces)
} (Close braces)
[ (Open Square brackets)
] (Close Square brackets)
+ (Plus)
. (Dot)
| (Pipe)
\ (Backslash)
? (question mark)
_ (underscore)
etc.

For example, if a string contains
The te\ st "of" string_ reg-ex:
should be parsed into
Token 0: The
Token 1: te st
Token 2: "of"
Token 3: string_
Token 4: reg-ex

NOTE: I would prefer answers that re-use existing libraries for the splitting / tokenizing / parsing.

Question by:Purdue_Pete
LVL 39

Accepted Solution

by:
Adam314 earned 200 total points
ID: 24849963

my $str  ='The te\ st "of" string_ reg-ex';
 

my @tokens = split(/(?<!\\) /, $str);

s/\\ / /g for @tokens;   # /g so every escaped space in a token is converted

 

Author Comment

by:Purdue_Pete
ID: 24850068
Adam314,
Does your code take care of all the considerations above, or is it just for the example posted? If not, I am looking for a solution that will take care of all the considerations.

BTW, what does this line do?
my @tokens = split(/(?<!\\) /, $str);


Thanks.
 
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 200 total points
ID: 24850121
I believe it takes care of all the requirements, unless I misunderstood the requirements.
a) double string - not really sure what this is. Is it consecutive spaces?
    If so, add a "+" after the space, like so:
    my @tokens = split(/(?<!\\) +/, $str);

b) escaped spaces - this is handled
c) special characters - these have no effect

This line:
    my @tokens = split(/(?<!\\) /, $str);
will split a string on spaces if the character preceding the space is not a backslash.
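To see the lookbehind at work, here is a minimal sketch that runs the same split over a longer test string. The extra punctuation at the end of $str is made up for illustration and is not from the question. The substitution uses /g so that every escaped space in a token is converted, not just the first.

```perl
use strict;
use warnings;

# Single quotes: no interpolation, and "\ " stays as backslash + space.
my $str = 'The te\ st "of" string_ reg-ex: a,b;c~d@e#f$g%h^i!j(k){l}[m]+n.o|p?q';

# (?<!\\) is a zero-width negative lookbehind: the space only counts as a
# separator when the character immediately before it is not a backslash.
my @tokens = split /(?<!\\) /, $str;

# Turn each escaped space ("\ ") back into a plain space.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

Only unescaped spaces act as separators here, so the punctuation passes through untouched and the colon stays attached to reg-ex: in the fifth token.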
 
LVL 84

Assisted Solution

by:ozo
ozo earned 50 total points
ID: 24851455
# Is this what you mean?
$_ = 'The te\ st "of" string_ reg-ex: "double string"';
my @tokens = /((?:\\.|"[^"]*"|\S)+)/g;
print "$_\n" for @tokens;
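The same pattern may be easier to follow when written out with the /x flag. This is only an annotated sketch of the regex above, not a different technique. The final substitution is borrowed from the accepted solution and is an addition here, since the match on its own leaves the backslash in te\ st.

```perl
use strict;
use warnings;

my $str = 'The te\ st "of" string_ reg-ex: "double string"';

# In list context, a match with the g flag returns every captured token.
my @tokens = $str =~ /
    (                # capture one whole token
      (?:
          \\.        # a backslash plus the character it escapes
        | "[^"]*"    # a double-quoted run, spaces and quotes included
        | \S         # any other non-whitespace character
      )+
    )
/gx;

# Optional: convert escaped spaces back to plain spaces.
s/\\ / /g for @tokens;

print "$_\n" for @tokens;
```

Because the quoted alternative consumes everything up to the closing quote, "double string" comes back as a single token with its quotes intact.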
 

Author Comment

by:Purdue_Pete
ID: 24852868
Adam314,
Excellent - will try your solution with various strings.
a) I meant a double-quoted token, i.e. "of" in the example should be treated as one token, and the token should include the double quotes.
Regarding the consecutive spaces, do you mean \ \ , i.e. backslash-space-backslash-space?

 
LVL 39

Expert Comment

by:Adam314
ID: 24853927
If the double string should work like "double string", keeping this as 1 token, you'll need to use what ozo posted.  Otherwise, what I posted should work.

By double space, I meant:
    the   test         string  "of" stuff
With the "+" added, each run of consecutive spaces is treated as a single separator, so you would not end up with a bunch of empty tokens.
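A small sketch of the difference, assuming a test string built with the x operator so the number of spaces is unambiguous (the string itself is made up for illustration):

```perl
use strict;
use warnings;

# "the", 3 spaces, "test", 2 spaces, "string"
my $str = 'the' . ' ' x 3 . 'test' . ' ' x 2 . 'string';

my @single = split /(?<!\\) /,  $str;   # every single space is a separator
my @plus   = split /(?<!\\) +/, $str;   # a whole run of spaces is one separator

printf "without +: %d tokens (empty ones included)\n", scalar @single;
printf "with +:    %d tokens: %s\n", scalar @plus, join('|', @plus);
```

Without the "+", each extra space produces an empty token between the words; with it, only the three words come back.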
 
LVL 35

Expert Comment

by:Terry Woods
ID: 24854294
One thing I don't think you've clarified enough - if there is a double quoted string, do you want to keep it together?

For example, if a string contains
The te\ st "of string_" reg-ex:
should it be parsed into
Token 0: The
Token 1: te st
Token 2: "of string_"
Token 3: reg-ex
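Since the question asked for existing libraries, one option that might fit this case is the core module Text::ParseWords. The following is a hedged sketch, not something proposed elsewhere in this thread: it assumes quotewords with a true second argument (keep quotes and backslashes), followed by the same unescaping substitution as in the accepted solution.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $str = 'The te\ st "of string_" reg-ex:';

# Delimiter is a run of whitespace; the true second argument asks the
# module to leave quotes and backslashes in the returned tokens.
my @tokens = quotewords('\s+', 1, $str);

# Convert escaped spaces back to plain spaces.
s/\\ / /g for @tokens;

print "Token $_: $tokens[$_]\n" for 0 .. $#tokens;
```

One caveat: Text::ParseWords treats the apostrophe as a quote character too, so input containing an unpaired apostrophe (one of the special characters listed in the question) would not be tokenized this way, and the regex approach may be the safer choice there.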