vcbertini
asked on
Trouble with character encoding in a table
I am having the most frustrating problem. Each day I receive an export from our mainframe which contains data about courses for our course catalog. The data in the mainframe contains quotation marks. The CSV file that is pushed over contains quotation marks. The data after it is imported into the table, contains “.
EXAMPLES
HOW THE DATA APPEARS IN THE TABLE:
HOW THE DATA APPEARS IN THE CSV FILE:
I have been researching this and other articles say maybe the table encoding is wrong. I tracked down the encoding on the table, which is SQL_Latin1_General_CP1_CI_ AS and the articles suggest it should be UTF-8. Any idea how to change this? I found a lot about how to change it in mySQL, but not SQL Server.
Also, can I just change it at the table level or does it have to be applied to the entire database? I'd rather not break other things that seem to be working at the moment.
EXAMPLES
HOW THE DATA APPEARS IN THE TABLE:
Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included “Non-Shakespearean Early Modern British Drama,†“Crime and Victorian Literature,†“The Rise of the English Novelâ€
HOW THE DATA APPEARS IN THE CSV FILE:
Advanced literature seminars that focus on a period or topic in British literature. Recent topics have included "Non-Shakespearean Early Modern British Drama," "Crime and Victorian Literature," "The Rise of the English Novel".
I have been researching this and other articles say maybe the table encoding is wrong. I tracked down the encoding on the table, which is SQL_Latin1_General_CP1_CI_
Also, can I just change it at the table level or does it have to be applied to the entire database? I'd rather not break other things that seem to be working at the moment.
You cannot change it at the server level other than re-installing SQL and is not trivial to change it at database level so I suggest use a staging table to import your data having the desired collation/code page matching your mainframe export then import form that staging table into your permanent database table via a simple INSERT/SELECT where you have lot more control.
ASKER
MS Says you can change it on the column level using the COLLATE statement, which would be good enough but (1) who knows if that will work (2) who knows if that's actually the problem, and (3) I have no idea what to change the encoding to. The choices are here: http://technet.microsoft.com/en-us/library/ms144250(v=sql.105).aspx but I don't know what will change a “ to a ".
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
“Crime and Victorian Literature,â€
The first double quote becomes: “
and the trailing double quote becomes: â€
which appears quite consistently in the tiny sample.
I think Zberteoc's point is valid in that they are quite distinct, so it's not just a one-to-one mapping.
This MS support page might be of some help
How to convert between ASCII and EBCDIC character codes
a topic which by the way isn't helped by having different flavours of EBCDIC
The first double quote becomes: “
and the trailing double quote becomes: â€
which appears quite consistently in the tiny sample.
I think Zberteoc's point is valid in that they are quite distinct, so it's not just a one-to-one mapping.
This MS support page might be of some help
How to convert between ASCII and EBCDIC character codes
a topic which by the way isn't helped by having different flavours of EBCDIC
ASKER
I ended up writing a stored procedure containing multiple update statements to address and replace the bad characters in the data file after it is updated each day. What a pain. Microsoft in it's usual, aggravating glory. I was hoping there was some magic answer, but it is never as easy as that. Thanks!
I'd lay the blame on the M/F download and the probable existence of antique EBCDIC :)
happy you have the result though.
happy you have the result though.