
Question: a regex for use in split

Posted on 1998-09-23
Last Modified: 2012-05-04
Hi,

Can anyone tell me a regex I can use with split to break
lines into fields separated by a semicolon (;), except where the
semicolon is inside a pair of double quotes?

Example:

Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"

I want to get:

Field 1
Field 2
Field 3a;Field 3b
Field 4a;Field 4b

I have to use split because I don't know in advance how many
fields there are or which fields contain a semicolon inside
double quotes.

Thanks for your help,
Kai.
Question by: kaijen

Expert Comment

by: ozo
ID: 1204955
@fields=/((?:"[^"]*"|[^;])+)/g;
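A minimal sketch of ozo's one-liner applied to the example line from the question. Wrapping it in a sub, and the names `split_fields` and `$line`, are assumptions for illustration; the original one-liner matches against `$_`. Because it matches the fields rather than the separators, quoted fields keep their double quotes, and empty fields (as in `a;;b`) are skipped, since each field must contain at least one character.

```perl
use strict;
use warnings;

# Split on semicolons that are not inside double quotes by matching the
# fields instead of the separators. A field is one or more items, where
# an item is either a whole "quoted run" or any single character
# other than ';'.
sub split_fields {
    my ($line) = @_;
    return $line =~ /((?:"[^"]*"|[^;])+)/g;
}

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';
print "$_\n" for split_fields($line);
```

This prints `Field 1`, `Field 2`, `"Field 3a;Field 3b"` and `"Field 4a;Field 4b"`, one per line, with the quotes still attached to the last two fields.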

Expert Comment

by: ozo
ID: 1204956
perldoc -q split
Found in perlfaq4.pod

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (e.g.,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.
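Adapted to the semicolon-separated data in the question, Friedl's pattern might look like the following sketch. Only the delimiter changes from `,` to `;`; the sub name `split_semicolons` is hypothetical. Unlike ozo's first one-liner, this version drops the surrounding quotes and returns `undef` for empty fields, so `a;;b` yields three elements.

```perl
use strict;
use warnings;

# Friedl's perlfaq4 pattern with ',' replaced by ';' (an adaptation, not
# the FAQ's verbatim code). $+ holds the highest-numbered capture group
# that matched: the unquoted contents of a quoted field, or a bare field.
# When only the lone ';' alternative matches, $+ is undef (empty field).
sub split_semicolons {
    my ($text) = @_;
    my @new;
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)";?   # quoted field, quotes not captured
      | ([^;]+);?                        # unquoted field
      | ;                                # empty field
    }gx;
    # A trailing ';' means one more, empty, field.
    push(@new, undef) if substr($text, -1, 1) eq ';';
    return @new;
}

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';
print "$_\n" for split_semicolons($line);
```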

Alternatively, the Text::ParseWords module (part of the standard perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
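For a single line like the one in the question, `parse_line` from Text::ParseWords can be called directly. A minimal sketch, assuming the example line is held in `$line`. Be aware that Text::ParseWords also treats single quotes as quoting characters, so an unmatched apostrophe in a field (e.g. `O'Brien`) can make it return an empty list.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# parse_line(DELIM, KEEP, LINE): DELIM (';') is treated as a regex;
# KEEP = 0 strips the quotes and any backslash escapes from quoted fields.
my @fields = parse_line(';', 0, $line);
print "$_\n" for @fields;
```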


Author Comment

by: kaijen
ID: 1204957
Thanks a lot!

This works. I only have to work out how to strip the
double quotes, but that sounds like a good homework exercise.

Please state this as an answer and I'll give you the
points!

Best regards,
Kai.
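One way to do the quote stripping mentioned above, sketched under the assumption that a field should lose its quotes only when it both starts and ends with a double quote. The sub name `split_and_unquote` is hypothetical.

```perl
use strict;
use warnings;

# Match fields with ozo's regex (quotes kept), then remove one pair of
# surrounding double quotes from fields that start and end with '"'.
# Quotes elsewhere in a field are left alone.
sub split_and_unquote {
    my ($line) = @_;
    my @fields = $line =~ /((?:"[^"]*"|[^;])+)/g;
    s/^"(.*)"\z/$1/s for @fields;    # edits @fields in place via aliasing
    return @fields;
}

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';
print "$_\n" for split_and_unquote($line);
```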

Accepted Solution

by: ozo (earned 200 total points)
ID: 1204958
# If the quotes are not part of the field, does that mean that you'll never see anything like
# Field 1;;"Field" "3a;" "Field 3b";Field 4a";"Field 4b
# If not, then something like this may work for you:
@fields = grep length,split /;|"([^"]*)"/;
#or
push @fields,$+  while /(?:"([^"]*)|([^;]+))[^;]*;?/g;
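Both accepted one-liners, wrapped as subs so they can be called on any string rather than on `$_`. The names `split_v1` and `split_v2` are hypothetical. The first uses `defined($_) && length($_)` instead of plain `length` so that older Perls do not warn about the `undef` values split returns for the unmatched capture group. Both drop empty fields.

```perl
use strict;
use warnings;

# Variant 1: split on a bare ';' OR on a whole "quoted run". The capture
# group makes split return the quoted text (without its quotes) as an
# extra element; grep then drops the empty strings and undefs that
# split produces around each separator.
sub split_v1 {
    my ($line) = @_;
    return grep { defined($_) && length($_) } split /;|"([^"]*)"/, $line;
}

# Variant 2: match one field per iteration. $+ is the highest-numbered
# capture group that matched: the text after an opening quote, or a bare
# run of non-';' characters. [^;]* then skips the closing quote (and
# anything else before the next ';').
sub split_v2 {
    my ($line) = @_;
    my @fields;
    push @fields, $+ while $line =~ /(?:"([^"]*)|([^;]+))[^;]*;?/g;
    return @fields;
}

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';
print join('|', split_v1($line)), "\n";
print join('|', split_v2($line)), "\n";
```

Both lines print `Field 1|Field 2|Field 3a;Field 3b|Field 4a;Field 4b`.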