Link to home
Start Free TrialLog in
Avatar of vcbertini
vcbertiniFlag for United States of America

asked on

Trouble with character encoding in a table

I am having the most frustrating problem.  Each day I receive an export from our mainframe which contains data about courses for our course catalog.  The data in the mainframe contains quotation marks.  The CSV file that is pushed over contains quotation marks. The data after it is imported into the table, contains “.

EXAMPLES

HOW THE DATA APPEARS IN THE TABLE:

Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included “Non-Shakespearean Early Modern British Drama,” “Crime and Victorian Literature,” “The Rise of the English Novelâ€

Open in new window


HOW THE DATA APPEARS IN THE CSV FILE:

Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included "Non-Shakespearean Early Modern British Drama," "Crime and Victorian Literature," "The Rise of the English Novel".

Open in new window


I have been researching this and other articles say maybe the table encoding is wrong. I tracked down the encoding on the table, which is SQL_Latin1_General_CP1_CI_AS and the articles suggest it should be UTF-8.  Any idea how to change this? I found a lot about how to change it in mySQL, but not SQL Server.

Also, can I just change it at the table level or does it have to be applied to the entire database? I'd rather not break other things that seem to be working at the moment.
Avatar of lcohan
lcohan
Flag of Canada image

You cannot change it at the server level other than re-installing SQL and is not trivial to change it at database level so I suggest use a staging table to import your data having the desired collation/code page matching your mainframe export then import form that staging table into your permanent database table via a simple INSERT/SELECT where you have lot more control.
Avatar of vcbertini

ASKER

MS Says you can change it on the column level using the COLLATE statement, which would be good enough but (1) who knows if that will work (2) who knows if that's actually the problem, and (3) I have no idea what to change the encoding to.  The choices are here:  http://technet.microsoft.com/en-us/library/ms144250(v=sql.105).aspx but I don't know what will change a “ to a ".
ASKER CERTIFIED SOLUTION
Avatar of Zberteoc
Zberteoc
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
“Crime and Victorian Literature,â€

The first double quote becomes:  â€œ
and the trailing double quote becomes: â€

which appears quite consistently in the tiny sample.
I think Zberteoc's point is valid in that they are quite distinct, so it's not just a one-to-one mapping.

This MS support page might be of some help
How to convert between ASCII and EBCDIC character codes
a topic which by the way isn't helped by having different flavours of EBCDIC
I ended up writing a stored procedure containing multiple update statements to address and replace the bad characters in the data file after it is updated each day. What a pain. Microsoft in it's usual, aggravating glory. I was hoping there was some magic answer, but it is never as easy as that. Thanks!
I'd lay the blame on the M/F download and the probable existence of antique EBCDIC :)
happy you have the result though.