Solved

Question for a Regex in split

Posted on 1998-09-23
Last Modified: 2012-05-04
Hi,

Can anyone tell me a regex I can use to split lines
whose fields are separated by a semicolon, except where
the semicolon sits inside a pair of double quotes?

Example:

Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"

I want to get:

Field 1
Field 2
Field 3a;Field 3b
Field 4a;Field 4b

I have to use split because I don't know in advance how
many fields there are or which of them contain a
semicolon inside double quotes.

Thanks for your help,
Kai.
Question by:kaijen

Expert Comment

by:ozo
ID: 1204955
@fields=/((?:"[^"]*"|[^;])+)/g;
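
A minimal sketch of how the one-liner above might be used on the sample line from the question. The variable name `$line` and the final quote-stripping step are assumptions added for illustration, not part of the original answer. Note that this pattern skips empty fields.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# In list context, m//g returns every capture. Each field is a run of
# quoted chunks and/or characters that are not semicolons.
my @fields = $line =~ /((?:"[^"]*"|[^;])+)/g;

# The double quotes are still part of the fields here; remove them.
s/"//g for @fields;

print "$_\n" for @fields;
```

Stripping every double quote is only safe if a quote never appears as data inside a field.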

Expert Comment

by:ozo
ID: 1204956
perldoc -q split
Found in perlfaq4.pod

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated
into its different fields.  (We'll pretend you said comma-separated, not
comma-delimited, which is different and almost never what you mean.) You
can't use C<split(/,/)> because you shouldn't split if the comma is inside
quotes.  For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem.  Thankfully, we have Jeffrey Friedl, author of a highly
recommended book on regular expressions, to handle these for us.  He
suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (e.g.,
C<"like \"this\"">).  Unescaping them is a task addressed earlier in
this section.
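
As a rough sketch, the FAQ pattern can be adapted to the semicolon-separated data in the question by swapping each comma for a semicolon. The input string and the unescaping loop below are assumptions made for illustration; they are not taken from the FAQ.

```perl
use strict;
use warnings;

# Hypothetical input: semicolon-separated, with backslash-escaped quotes.
my $text = 'a;"b \"c\"; d";e';

my @new;
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)";?   # quoted field, backslash escapes allowed
  | ([^;]+);?                        # unquoted field
  | ;                                # empty field
}gx;
push(@new, undef) if substr($text, -1, 1) eq ';';

# Turn \" back into " (and \\ into \) in every defined field.
for (@new) {
    s/\\(.)/$1/g if defined $_;
}

print defined $_ ? "$_\n" : "(undef)\n" for @new;
```

Empty fields come back as undef here, so code that uses the result may need to check for that.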

Alternatively, the Text::ParseWords module (part of the standard perl
distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
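
For the semicolon-separated line in the question, the same call might look like the sketch below. The first argument to quotewords is the delimiter pattern, and a false second argument asks it to drop the quotes from the returned fields. The variable `$line` is an assumption for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Sample line from the question.
my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# Split on semicolons that are not inside quotes; 0 means the quotes
# themselves are removed from the returned fields.
my @new = quotewords(';', 0, $line);

print "$_\n" for @new;
```

Text::ParseWords treats single quotes as quoting characters too, so data containing a stray apostrophe may need different handling.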

Author Comment

by:kaijen
ID: 1204957
Thanks a lot!

This works. I only have to work out how to strip the
double quotes, but that sounds like good homework.

Please state this as an answer and I'll give you the
points!

Best regards,
Kai.

Accepted Solution

by:ozo (earned 200 total points)
ID: 1204958
# If the quotes are not part of the field, does that mean that you'll never see anything like
# Field 1;;"Field" "3a;" "Field 3b";Field 4a";"Field 4b
# If not, then something like this may work for you (both lines operate on $_):
@fields = grep length, split /;|"([^"]*)"/;
# or
push @fields, $+ while /(?:"([^"]*)|([^;]+))[^;]*;?/g;
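
A small self-contained sketch that runs both variants against the sample line from the question. The variable names and the explicit defined check are assumptions added for illustration; the check only keeps older perls from warning about undef values that split returns for the unused capture group.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'Field 1;Field 2;"Field 3a;Field 3b";"Field 4a;Field 4b"';

# Variant 1: split on a semicolon or on a whole quoted chunk. Because the
# pattern has a capture group, split also returns the captured text (or
# undef when the semicolon alternative matched), so filter out undef and
# empty strings.
my @by_split = grep { defined($_) && length($_) } split /;|"([^"]*)"/, $line;

# Variant 2: walk the line with m//g; $+ holds whichever group matched.
my @by_match;
push @by_match, $+ while $line =~ /(?:"([^"]*)|([^;]+))[^;]*;?/g;

print join('|', @by_split), "\n";
print join('|', @by_match), "\n";
```

Both variants return the fields without their surrounding quotes. Neither is a reliable way to preserve empty fields: the first drops them, and the second does not handle them consistently.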