Trouble with character encoding in a table

I am having the most frustrating problem.  Each day I receive an export from our mainframe which contains data about courses for our course catalog.  The data in the mainframe contains quotation marks.  The CSV file that is pushed over contains quotation marks. The data after it is imported into the table, contains “.

EXAMPLES

HOW THE DATA APPEARS IN THE TABLE:

Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included “Non-Shakespearean Early Modern British Drama,” “Crime and Victorian Literature,” “The Rise of the English Novelâ€

Open in new window


HOW THE DATA APPEARS IN THE CSV FILE:

Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included "Non-Shakespearean Early Modern British Drama," "Crime and Victorian Literature," "The Rise of the English Novel".

Open in new window


I have been researching this and other articles say maybe the table encoding is wrong. I tracked down the encoding on the table, which is SQL_Latin1_General_CP1_CI_AS and the articles suggest it should be UTF-8.  Any idea how to change this? I found a lot about how to change it in mySQL, but not SQL Server.

Also, can I just change it at the table level or does it have to be applied to the entire database? I'd rather not break other things that seem to be working at the moment.
vcbertiniAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

lcohanDatabase AnalystCommented:
You cannot change it at the server level other than re-installing SQL and is not trivial to change it at database level so I suggest use a staging table to import your data having the desired collation/code page matching your mainframe export then import form that staging table into your permanent database table via a simple INSERT/SELECT where you have lot more control.
0
vcbertiniAuthor Commented:
MS Says you can change it on the column level using the COLLATE statement, which would be good enough but (1) who knows if that will work (2) who knows if that's actually the problem, and (3) I have no idea what to change the encoding to.  The choices are here:  http://technet.microsoft.com/en-us/library/ms144250(v=sql.105).aspx but I don't know what will change a “ to a ".
0
ZberteocCommented:
The quotes in the CSV file are not the same with the "classic" double quotes from SQL server. I bet they are some special character double quotes the ones that are upside down and their pair.  Collating will not solve the problem as SQL does not have support for UTF-8.

I quess the only solution is to replace those characters with quotes:

update your_table  set your_col=replace(your_col,'”', '"'), '“','"')

where '"' is actually single_quote+double_quote+single_quote. The problem is if you have other special characters like that, you will have to identify them all and replace with readable counterparts.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

lcohanDatabase AnalystCommented:
"MS Says you can change it on the column level using the COLLATE statement, which would be good enough"
                Indeed - it would be enough.

"...but (1) who knows if that will work"
                If you know the Code page and collation of course it will work but...are you willing to change the receiving table column as that may be dangerous OR rather write the transformation

" (2) who knows if that's actually the problem,"
            << Each day I receive an export from our mainframe which contains data about courses for our course catalog.>> you need to know what is the code page used in your mainframe and what translation takes place when you export from mainframe to windows I suppose. Is it EBDIDIC to ASCII? I had lost of fun myself with that kind of data transfers then try figure send it to UNIX which means yet another data transformation.  Just think the EXACT same character you see on screen will have different codex and most important MAY have even different HEX-adecimal representation just because of different code pages on different systems and implicit translations during transfers and that's exactly where MAPPING helps - like described below:

http://support.microsoft.com/kb/216399

As far as SQL not supporting UTF-8 that's not 100% accurate because all UTF 8 characters can fit in SQL UNICODE data types - is just that they are stored on 16 not 8 bytes like UTF8

http://support.microsoft.com/kb/232580
0
PortletPaulfreelancerCommented:
“Crime and Victorian Literature,â€

The first double quote becomes:  â€œ
and the trailing double quote becomes: â€

which appears quite consistently in the tiny sample.
I think Zberteoc's point is valid in that they are quite distinct, so it's not just a one-to-one mapping.

This MS support page might be of some help
How to convert between ASCII and EBCDIC character codes
a topic which by the way isn't helped by having different flavours of EBCDIC
0
vcbertiniAuthor Commented:
I ended up writing a stored procedure containing multiple update statements to address and replace the bad characters in the data file after it is updated each day. What a pain. Microsoft in it's usual, aggravating glory. I was hoping there was some magic answer, but it is never as easy as that. Thanks!
0
PortletPaulfreelancerCommented:
I'd lay the blame on the M/F download and the probable existence of antique EBCDIC :)
happy you have the result though.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.