Solved

Question for a Regex in split

Posted on 1998-09-23
Last Modified: 2012-05-04
Hi,

Can anyone tell me a regex which I can use to split
lines whose fields are separated by a ; unless the ;
is inside a pair of double quotes?

Example:

Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"

I want to get:

Field 1
Field 2
Field 3a;Field 3b
Field 4a;Field 4b

I have to use split because I don't know in advance how
many fields there are or which of them contain a semicolon
inside double quotes.

Thanks for your help,
Kai.
Question by:kaijen
LVL 84

Expert Comment

by:ozo
ID: 1204955
@fields=/((?:"[^"]*"|[^;])+)/g;
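A minimal sketch of how that one-liner might be used, assuming the line is held in a variable (`$line` is a name introduced here for illustration) rather than in `$_`. In list context a match with `/g` returns every captured substring, so each run of quoted strings and non-semicolon characters becomes one field. Note that the fields come back with their double quotes still attached.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# In list context, m//g returns the capture from every match:
# a field is one or more quoted strings or non-semicolon characters.
my @fields = $line =~ /((?:"[^"]*"|[^;])+)/g;

print "$_\n" for @fields;
```

Because the pattern needs at least one character per field, an empty field such as the one in `a;;b` would simply be skipped.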
LVL 84

Expert Comment

by:ozo
ID: 1204956
perldoc -q split
Found in perlfaq4.pod

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (e.g.,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.

Alternatively, the Text::ParseWords module (part of the standard perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
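Applied to the semicolon-separated line from the question, the same module call might look like the sketch below. The variable name `$line` is an assumption made for illustration. The first argument to `quotewords` is the delimiter pattern, and a false second argument asks it to remove the quotes from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Sample line from the question.
my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# Delimiter is ';' and keep is 0, so the surrounding quotes are dropped.
my @new = quotewords(';', 0, $line);

print "$_\n" for @new;
```

One thing to watch for: Text::ParseWords treats single quotes as quoting characters too, so data containing apostrophes may need different handling.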


Author Comment

by:kaijen
ID: 1204957
Thanks a lot!

This works. I only have to work out how to strip the
double quotes. But that sounds like good homework.

Please post this as an answer and I'll give you the
points!

Best regards,
Kai.
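One possible way to strip the double quotes afterwards, as a sketch only: it assumes each quoted field is wrapped in exactly one pair of quotes and that quotes never belong to the data. The variable name `$line` is introduced here for illustration.

```perl
use strict;
use warnings;

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# Split into fields first; quoted fields keep their quotes at this point.
my @fields = $line =~ /((?:"[^"]*"|[^;])+)/g;

# Then remove one leading and one trailing double quote from each field.
# The loop variable aliases the array elements, so @fields is changed in place.
s/^"(.*)"$/$1/ for @fields;

print "$_\n" for @fields;
```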
LVL 84

Accepted Solution

by:ozo (earned 200 total points)
ID: 1204958
# If the quotes are not part of the field, does that mean that you'll never see anything like
# Field 1;;"Field" "3a;" "Field 3b";Field 4a";"Field 4b
# If you never will, then something like this may work for you
# (both forms operate on the line in $_):
@fields = grep length, split /;|"([^"]*)"/;
# or
push @fields, $+ while /(?:"([^"]*)|([^;]+))[^;]*;?/g;
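A short sketch running both forms against the sample line from the question. It binds the patterns to a variable instead of `$_`; the names `$line`, `@by_split` and `@by_match` are assumptions made for illustration. The grep is written with an explicit `defined` check so that the undefined elements `split` returns for the plain `;` delimiter are dropped without warnings.

```perl
use strict;
use warnings;

my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# Form 1: split on a bare ';' or on a whole quoted string. The capture
# group makes split return the quoted text as an extra list element;
# grep then discards the undefined and empty pieces around the delimiters.
my @by_split = grep { defined($_) && length($_) } split /;|"([^"]*)"/, $line;

# Form 2: walk the line with m//g. $+ holds whichever capture group
# took part in the match, either the quoted text or the bare field.
my @by_match;
push @by_match, $+ while $line =~ /(?:"([^"]*)|([^;]+))[^;]*;?/g;

print join('|', @by_split), "\n";
print join('|', @by_match), "\n";
```

Both forms drop empty fields, so a line such as `a;;b` would yield two fields rather than three. Whether that is acceptable depends on the data.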