hyliandanny
asked on
How do I use regular expressions to check for characters within quotation marks in C#?
I'm reading a CSV file and am trying to check for text within quotation marks. The idea is that:
- if the string has both an opening and a closing quotation mark, it is valid; skip to the next entry.
- if the string has only a single (opening) quotation mark, then loop through the rest of the entries until the closing quotation mark is found. Combine those strings together as you go along, inserting a comma between them. This is done because the quoted text contains commas, which should be part of the collected text.
My problem is that my use of Regex.IsMatch() seems to return TRUE even when only a single quotation mark exists. How can I make the regular expression return true for text surrounded by quotation marks while returning false for a lone quotation mark?
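// Requires: using System.Text.RegularExpressions;
for (int i = (int)EColumns.e_colName; i < lineValues.Length; i++) {
    // Only entries containing a quotation mark need attention
    if (!lineValues[i].Contains("\""))
        continue;

    // ^"[^"]*"$ matches only when the entry begins AND ends with a quotation mark,
    // so entries with a matching set already are skipped
    if (Regex.IsMatch(lineValues[i], "^\"[^\"]*\"$"))
        continue;

    // This entry opens a quoted field. Search for the entry with the closing
    // quotation mark. Assumed: quotation marks are never meant to be
    // contained within quotation marks.
    int j = i + 1;
    while (j < lineValues.Length && !lineValues[j].Contains("\""))
        j++;
    if (j == lineValues.Length)
        j--; // no closing quotation mark found: merge whatever is left

    // Strings are immutable (string.Insert returns a new string), so assign the
    // combined text back, restoring the comma that the split removed
    for (int k = i + 1; k <= j; k++)
        lineValues[i] = lineValues[i] + "," + lineValues[k];

    i = j; // skip the entries that were merged into lineValues[i]
}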
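Why `Regex.IsMatch` returned true: a pattern of `.*` (which is what `(".*")` evaluates to in C#) matches any input, even an empty string, because nothing anchors it or requires a quotation mark. A minimal sketch of an anchored check, assuming (as the code comment does) that quotation marks never appear inside quoted text; `QuoteChecks` and `IsFullyQuoted` are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class QuoteChecks
{
    // ^ and $ anchor the match to the whole entry, so the entry must begin AND end
    // with a quotation mark, with no other quotation marks in between ([^"]*).
    private static readonly Regex FullyQuoted = new Regex("^\"[^\"]*\"$");

    // True for an entry quoted at both ends; false for an entry with only an
    // opening or only a closing quotation mark, or with none at all.
    public static bool IsFullyQuoted(string entry) => FullyQuoted.IsMatch(entry);
}
```

With this, `IsFullyQuoted("\"not-desired\"")` is true, while `IsFullyQuoted("\"desired")` and `IsFullyQuoted("\"")` are false.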
Can you provide an example of a valid string and an invalid string?
ASKER
Certainly.
Sample 1:
"This is a string that I don't want to match because it's surrounded by quotation marks."
Sample 2:
"This is a string I want to match because it has a quotation mark at the start but not at the end to close it.
Sample 3:
Some time later might come a string that I don't want matched either, since it doesn't have quotation marks at all
Sample 4:
but eventually one entry would close it. This would be a string I do want matched since it has quotation marks at the end."
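Going by these samples, the two cases to detect are an entry that opens a quotation without closing it (Sample 2) and an entry that closes it (Sample 4). A minimal sketch with two anchored patterns, keeping the assumption that quotation marks never appear inside quoted text; `QuoteBoundaries`, `OpensQuote` and `ClosesQuote` are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class QuoteBoundaries
{
    // Starts with a quotation mark and contains no other one: the quote is left open.
    private static readonly Regex Opens = new Regex("^\"[^\"]*$");

    // Contains no quotation mark except a final one: this entry closes an open quote.
    private static readonly Regex Closes = new Regex("^[^\"]*\"$");

    public static bool OpensQuote(string entry) => Opens.IsMatch(entry);

    public static bool ClosesQuote(string entry) => Closes.IsMatch(entry);
}
```

Sample 1 matches neither pattern (it has quotation marks at both ends), Sample 2 matches only `OpensQuote`, Sample 3 matches neither, and Sample 4 matches only `ClosesQuote`.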
ASKER CERTIFIED SOLUTION (available only to Experts Exchange members)
I'm still not understanding the comma part.
Is each line a separate string?
ASKER
I still need to try out numberkruncher's proposed solution, but to answer your question, darkstar3d: each line below is a separate string, and each is either one I want matched ("desired") or one I don't ("not-desired").
"desired
"not-desired"
desired"
not-desired
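For this list, "desired" comes down to an entry containing exactly one quotation mark, at whichever end. A minimal sketch of that single check, with `SingleQuoteCheck` and `HasUnmatchedQuote` as hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class SingleQuoteCheck
{
    // Exactly one quotation mark anywhere in the entry: any non-quote characters,
    // one ", then any non-quote characters up to the end.
    private static readonly Regex ExactlyOneQuote = new Regex("^[^\"]*\"[^\"]*$");

    public static bool HasUnmatchedQuote(string entry) => ExactlyOneQuote.IsMatch(entry);
}
```

This returns true for `"desired` and `desired"`, and false for `"not-desired"` and `not-desired`.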
None of the strings I inspect should contain commas, since I split each full line on commas; I'm getting everything between the commas. The logic is that if I find a double quotation mark (I'll call it the "open double-quotation"), then the commas between it and the next double quotation mark (which I'll call the "closing double-quotation") belong to the string those two marks enclose.
It falls to me, then, to insert a comma between each of those strings, since the commas were actually enclosed by the open and closing double-quotations.
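The merge step described here can be sketched as a standalone method that takes the comma-split entries and returns the reassembled fields. This sketch swaps the regex for a parity check (an odd number of quotation marks means the entry opens or closes a quote), keeps the thread's assumption that quotation marks never appear inside quoted text, and lets an unterminated quote absorb the remaining entries; `QuotedFieldMerger` and `MergeQuotedFields` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedFieldMerger
{
    // Rejoins entries produced by splitting a CSV line on commas, so that text
    // between an opening and a closing quotation mark becomes one field again.
    public static List<string> MergeQuotedFields(string[] entries)
    {
        var fields = new List<string>();
        for (int i = 0; i < entries.Length; i++)
        {
            // An even count (zero, or an opening and closing pair) means the
            // entry is already self-contained.
            if (CountQuotes(entries[i]) % 2 == 0)
            {
                fields.Add(entries[i]);
                continue;
            }

            // Odd count: this entry opens a quote. Append the following entries,
            // restoring the comma removed by the split, until one closes the quote.
            var merged = new StringBuilder(entries[i]);
            while (i + 1 < entries.Length)
            {
                i++;
                merged.Append(',').Append(entries[i]);
                if (CountQuotes(entries[i]) % 2 == 1)
                    break;
            }
            fields.Add(merged.ToString());
        }
        return fields;
    }

    private static int CountQuotes(string s)
    {
        int count = 0;
        foreach (char c in s)
            if (c == '"') count++;
        return count;
    }
}
```

For example, `MergeQuotedFields("a,\"b,c,d\",e".Split(','))` yields three fields: `a`, `"b,c,d"` and `e`.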
Don't you have the original data set, so that you can properly escape the commas? That seems like a better approach than what you are attempting now.
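If re-exporting the data with escaped commas isn't an option, a different technique from split-then-merge is to split each line with a small state machine that ignores commas between quotation marks, so nothing needs to be re-joined afterwards. A minimal sketch, assuming quoted fields contain no embedded quotation marks or line breaks (the doubled `""` escape from RFC 4180 is not handled); `QuoteAwareSplitter` and `SplitCsvLine` are hypothetical names. For full CSV support, .NET's `Microsoft.VisualBasic.FileIO.TextFieldParser` with `HasFieldsEnclosedInQuotes = true` is worth considering instead.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Splits one CSV line on commas, except commas that fall between quotation marks.
    // Quotation marks are kept in the output, matching the entries in this thread.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                insideQuotes = !insideQuotes; // toggle on every quotation mark
                current.Append(c);
            }
            else if (c == ',' && !insideQuotes)
            {
                fields.Add(current.ToString()); // comma outside quotes: field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field (possibly empty)
        return fields;
    }
}
```

For example, `SplitCsvLine("a,\"b,c,d\",e")` yields `a`, `"b,c,d"` and `e`, the same result the split-then-merge approach aims for.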