Solved

MS SQL Bulk Import UTF8 CSV data

Posted on 2014-03-11
5
3,466 Views
Last Modified: 2014-04-05
We have to import UTF8 formated CSV data into a MS SQL 2008R2 DB table.

MS SQL don't support the codepage 65001.

Also the datafiletypes "widechar" and "widenative" stops with the error message that the first column is too long in the first line.

For convert the data file without lost information we try convert the file from UFT8 to UTF16, but UTF16 is not a known coding in .NET.  Only UTF7, UTF8 and UTF32 are knowing codings.

What is good working solution for this problem?
0
Comment
Question by:hpnix4ever
  • 3
5 Comments
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 500 total points
Comment Utility
for the time being, here some MS official statement about why the UTF-8 is not supported (yet):
https://connect.microsoft.com/SQLServer/feedback/details/321839/msft-edw-tapblock-bcp-doesnt-recognize-field-terminator-on-code-page-65001

and .net does support utf-16, namely unicode encoding:
http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding%28v=vs.110%29.aspx
0
 

Accepted Solution

by:
hpnix4ever earned 0 total points
Comment Utility
Thanks for answer. I will try to convert from UTF8 to UTF16 before SQL bulk import.
------------------------------
Now i had test some convert functions, but nothing of the results can be imported in SQL.  

For read the UTF8 file i had used a streamreader with encoding UFT8. For write as UTF16 i had used a streamwriter with unicode encoding.

For verification i had convert from UTF8 to UTF16 and back to UFT8 that i can load with Excel. The UTF16 file has double size as UTF8, but i can't show with my programs. The reconvert file i can load to Excel and it looks fine. Only the SQL import don't work. Anything by the UTF16 file looks wrong.

For the string conversation i had used all known .NET functions and also some native conversations, but nothing is ready for the bulk import.

The SQL Part is like this:

BULK INSERT tbl_Import
FROM 'C:\temp\test.csv'
WITH
(
  FIELDTERMINATOR=';',
  ROWTERMINATOR='\r\n',
  -- DATAFILETYPE = 'widechar',
  -- DATAFILETYPE= 'widenative',
  CODEPAGE = 1200
 -- CODEPAGE = 1201
)

See the other versions as SQL comments
0
 

Author Comment

by:hpnix4ever
Comment Utility
The solution was to convert the data before import from UTF-8 to Unicode-BigEndian with a .NET-Programm. Unicode-BigEndian can be imported with BULK-Import.
0
 

Author Closing Comment

by:hpnix4ever
Comment Utility
It's no solution posible direct inside MS SQL. Only a converison to UnicodeBigEndian with a external Script is working.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now