Solved

How do I use regular expressions to check for characters within quotation marks in C#?

Posted on 2009-04-08
Last Modified: 2013-12-17
I'm reading a CSV file and am trying to check for text within quotation marks.  The idea is that:
- if the string has a matching pair of quotation marks (an opening and a closing one), it is valid; skip to the next entry.
- if the string has only one quotation mark, then loop through the rest of the entries until the entry with the closing quotation mark is found.  Combine those strings together as you go along, inserting a comma between them.  This is done because the quoted text contains commas, which should be a part of the collected text.

My problem is that my use of Regex.IsMatch( ) seems to be returning TRUE for the case where only a single quotation mark exists.  How can I make the regular expression return true for text surrounded by quotation marks while returning false for text with only a single quotation mark?

for (int i = (int)EColumns.e_colName; i < lineValues.Length; i++) {
    // We want any single-quotation-mark-set entries
    if (lineValues[i].Contains("\""))
        if (Regex.IsMatch(lineValues[i], (".*")))
            // We don't want entries with a matching set already
            continue;
        else {
            // Search for the entry with the closing quotation marks. Assumed: quotation marks
            // are never meant to be contained within quotation marks.
            int j = i;
            while (!lineValues[++j].Contains("\""))
                ;

            lineValues[j].Insert(0, ",");
            lineValues[i].Insert(lineValues[i].Length, lineValues[j]);
        }
}

Question by:hyliandanny

Expert Comment

by:Stacy Spear
Can you provide an example of a valid string and an invalid string?

Author Comment

by:hyliandanny
Certainly.

Sample 1:
"This is a string that I don't want to match because it's surrounded by quotation marks."

Sample 2:
"This is a string I want to find a match because it starts with quotation marks at the start but not at the end to close it.

Sample 3:
Sometime later might come a string that I don't want matched either since it doesn't have quotation marks at all

Sample 4:
but eventually one entry would close it.  This would be a string I do want matched since it has quotation marks at the end."

Accepted Solution

by: numberkruncher (earned 125 total points)
One way to match quoted literal content like this is to use an expression like one of the following:

// The following regular expression matches a quoted string that contains
// anything but a double quote.
"[^"]*"

// The following regular expression also matches "abc""def",
// where "" is an escaped double quote.
"([^"]|"")*"

// The following regular expression also matches "abc""def
// where no closing double quote was specified in the input.
// Effectively this means that it will consume the rest of the document
// unless a closing quote is detected.
"([^"]|"")*("|$)
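For context on why the original call always returned true: in C#, (".*") is just the string literal .* wrapped in parentheses. The quotation marks there are string delimiters, not part of the pattern, so the regex is .* and matches any input. Below is a minimal sketch of how the first expression above might be wired into C#, assuming it is anchored with ^ and $ so the whole entry must be quoted. The names QuoteCheck, IsFullyQuoted and HasUnmatchedQuote are hypothetical and not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteCheck
{
    // Verbatim string: "" inside @"..." stands for one literal double quote,
    // so the pattern text is ^"[^"]*"$ (opening quote, non-quote characters, closing quote).
    private static readonly Regex FullyQuoted = new Regex(@"^""[^""]*""$");

    // True when the whole entry is wrapped in one matching pair of quotation marks.
    public static bool IsFullyQuoted(string entry)
    {
        return FullyQuoted.IsMatch(entry);
    }

    // True when the entry contains a quotation mark but is not a complete quoted string.
    // Assumption: quotation marks never appear in the middle of an entry.
    public static bool HasUnmatchedQuote(string entry)
    {
        return entry.Contains("\"") && !IsFullyQuoted(entry);
    }
}
```

With this sketch, IsFullyQuoted returns true only for entries such as "not-desired" (with both quotes), while HasUnmatchedQuote picks out entries that open or close a quoted run without doing both.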


Expert Comment

by:Stacy Spear
I'm still not understanding the comma part.

Is each line a separate string?

Author Comment

by:hyliandanny
I still need to try out numberkruncher's proposed solution, but to answer your question, darkstar3d: each line you see below is a separate string, and each is labeled by whether or not I want it to be found as a match.

"desired

"not-desired"

desired"

not-desired

Commas should not exist in any of the strings inspected, since I split the entire line on commas, so I'm getting everything between the commas.  The logic is that if I find a double quotation mark (I'll call it the "open double-quotation"), then the commas between the strings that follow belong inside a single string bounded by the open double-quotation and a following double quotation mark (which I'll call the "closing double-quotation").

It falls to me, then, to reinsert commas between those strings, since the commas are actually enclosed by the open double-quotation and the closing double-quotation.
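The re-joining step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it counts quotation marks per piece instead of using a regular expression, and it treats an odd count as "a quoted field is still open". The names CsvFieldMerger and MergeQuotedFields are hypothetical. Because C# strings are immutable, the sketch builds a new list rather than calling Insert on the existing entries.

```csharp
using System;
using System.Collections.Generic;

public static class CsvFieldMerger
{
    // Re-joins pieces that a plain Split(',') cut apart inside a quoted field.
    public static List<string> MergeQuotedFields(string[] pieces)
    {
        var merged = new List<string>();
        for (int i = 0; i < pieces.Length; i++)
        {
            string current = pieces[i];
            // An odd number of quotation marks means a quoted field is still open,
            // so keep appending the following pieces, restoring the comma each time.
            while (CountQuotes(current) % 2 == 1 && i + 1 < pieces.Length)
            {
                current = current + "," + pieces[++i];
            }
            merged.Add(current);
        }
        return merged;
    }

    // Counts the double-quote characters in the given text.
    private static int CountQuotes(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '"') count++;
        }
        return count;
    }
}
```

For example, splitting the line a,"b,c,d",e on commas gives five pieces, and MergeQuotedFields turns them back into three entries, with the middle one being "b,c,d" including its quotation marks. If the closing quotation mark never appears, the remaining pieces are simply joined into the last entry.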

Expert Comment

by:Stacy Spear
Don't you have the original data set, so that you can properly escape the commas? That seems like a better approach than what you are attempting to do now.