Avatar of rwheeler23
rwheeler23
Flag for United States of America asked on

VS C# Strings with quote help

I have a troublesome text file I am trying to import into SQL using SSIS. I may try to write a VS C# application to correct the fields before I insert them in to SQL but here is the problem.

"The rain in Spain stays mainly in the "Plains"" or
"The rain in Spain stays mainly in the "Plains" instead of just
"The rain in Spain stays mainly in the Plains"

Out of 50,000 + lines there are about 100 where the number of quotation marks is messing up the import. Can anyone point to some code or provide hints as to how I can insure that every string field starts and ends with a quotation mark and has no other quotation marks in between the beginning and the end?
C#

Avatar of undefined
Last Comment
rwheeler23

8/22/2022 - Mon
Dmitry G

Something like this:

        private void button1_Click(object sender, EventArgs e)
        {
            string s1 = "The rain in Spain stays mainly in the \"Plains\"";
            string s2 = "The rain in Spain stays mainly in the \"Plains";

            string s = "";

            s = "\"" + s1.Replace("\"", string.Empty) + "\"";
            System.Diagnostics.Debug.WriteLine(s);
            s = "\"" + s2.Replace("\"", string.Empty) + "\"";
            System.Diagnostics.Debug.WriteLine(s);

        }

Open in new window


The problem is that performance could be not too good . Some tricks might be required to speed up the algorithm.

E.g., first - check if quotation character exists starting from index 1 to length-2 including. If yes - do above. But still  might be slow.
rwheeler23

ASKER
Performance is not going to be an issue here. I am doing a one time clean up of old data and have to mop it up before transfer into a new database. I thing my biggest challenge will be setting up the streamreader so that it can read each field separately so your code has the correct string value upon which to work.
Dmitry G

Again, it's not clear what the source is? Is it a plain text file. Is each "field" a separate line? Etc. For example, when I deal with my import csv files, I read and process text line by line. But I don't know what structure your source file has...


BTW, I usually use text reader class.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
rwheeler23

ASKER
What comes out of this old software is a comma delimited text. The problem is the embedded quotation marks. There will be values like:

"Mr. Mark O"Reilly's rental care fee" or
"Move office to new "Location""

Where the number of quotations marks within the quotation marks varies. The CR/LF combination is at the end of each line of this plain text file.

Here is an actual sample line. This line is fine. It is the ones with the embedded quotation marks that cause grief.

Customer ID,Customer Name,Invoice/CM #,Apply to Invoice Number,Date,Customer PO,Ship Via,Date Due,Discount Amount,Discount Date,Sales Tax ID,Quantity,Item ID,Description,Unit Price,Amount
5013,"UltraGroup Consutling, LLC",111,,6/15/99,,Courier,7/15/99,0.00,6/25/99,,1.00,3700,Merger and Acquistion Service,150.00,-150.00
Dmitry G

There are no big challenges  in splitting lines to fields if you have right number of quotation marks. Should be pretty straightforward. But there are some issues.
This is csv. Some fields may contain commas that are not field separators. As a rule such commas sit between quotation marks, as in your case:
"UltraGroup Consutling, LLC"
These commas are to be excluded and we can use string.Split.

On the other hand, if your field has not even number of quotation marks it is pretty distinguish between such commas.
If we have
... ,"UltraGroup Consutling, LLC,

we are in trouble. In our application we discard such records as not valid - just "bad data".
rwheeler23

ASKER
Well I have seen things like:

"Bill O"Reilly prefers you not bloviate" - Would this be a problem?
"Fee for tent "assembly"" - Would this be a problem?
"Fee for tent "assembly" - This looks like a problem?
"Los Angeles, CA 91245"  -  This should be fine

In the file I have never seen just one quotation mark. It always begins and ends with one. It is what happens in between that is the problem.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER CERTIFIED SOLUTION
Dmitry G

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
rwheeler23

ASKER
Is there anyway to adjust the code I have attached to conquer this problem? Would switching to string.split make any difference? If not, does this mean I need to resort to reading each line in in its entirety and then parse each line character by character and proceed as previously mentioned?
TextFieldParser.txt
Dmitry G

Hmmm... Honestly, I never used this parser. I don't like to use VB namespace despite I work in VB! This parser is quite primitive. At least, it does not handle cases like:

"Fee for tent "assembly""


For me it is valid case. I'd implement parser myself, it's not too hard.

String.split does not work here...
rwheeler23

ASKER
I made a minor change to the code by inserting a try/catch combination. I just ran this on the largest csv file I have and the one with the most embedded quotation marks. Out of almost 100,000 records there were less than two dozen of these. For now, what I will do is just log the record numbers of the bad ones and then use NOTEPAD to fix then until they all pass. Then when time permitting I will follow your suggestion to just read the entire line and parse it looking for errant quotation marks. The catch caught all the ones with the emdedded quotation marks.
TextFieldParser.txt
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
Dmitry G

This makes sense, well done!
rwheeler23

ASKER
Thank you for all your help!