[Last Call] Learn how to a build a cloud-first strategyRegister Now


User can't read UTF-8 encoded text file

Posted on 2009-02-09
Medium Priority
Last Modified: 2012-05-06
I run an open source project that distributes SQL scripts that are run during installation. One user reported that when he opened the SQL script (which is just a text file ending in .sql), it looked corrupted (see attached snippet). After a little back and forth (read thread at http://www.galleryserverpro.com/forum/yaf_postsm1894.aspx) I was able to determine that if I changed the text encoding to ASCII (it was UTF-8 little endian), the user was able to view and run it.

I don't understand encoding issues very well and I don't understand how this guy could not open the SQL file. He reported the issue occurred on Microsoft Windows Server 2003 RC2, Standard x64 Edition, Service Pack 2. He *was* able to open the UTF-8 encoded version on an XP machine.

I have two questions:
1. Do you know why he could not open a file encoded in UTF-8 little endian? What would you have to do to a WinServer 2003 setup to cause this behavior?

2. Should I distribute my SQL scripts under a different encoding? All the characters can be encoded as ASCII, but I don't want to unintentionally introduce new issues by using an "old-fashioned" encoding.

Roger Martin
Gallery Server Pro - Open source web gallery for photos, video, audio, and documents


core Gallery Server Pro operation. This script runs on SQL Server 2005 and later.

Open in new window

Question by:rdogmartin
  • 4
  • 3
LVL 41

Expert Comment

ID: 23592729
I wonder what would happen if the guy had opened the file in NotePad and changed the encoding...  I presume that would fix it on his PC?
You might wanna read up on the  Unicode Byte Order Mask (BOM) "tag" that gets added to text files:  http://en.wikipedia.org/wiki/Byte-order_mark

Author Comment

ID: 23593453
Yes, I presume it would have, since that is essentially what I did to the file before I gave it back to him.

Interestingly, the code snippet in my original post is not showing the strange characters that I pasted into it. So I attached a screen shot that represents the "corrupt" file the user saw when he opened it. When I copied all these strange characters into this post, Experts Exchange filtered them out.

My core question still stands about which is the best encoding to use for widely distributed SQL script files...

LVL 41

Accepted Solution

graye earned 1000 total points
ID: 23594453
Well, I was wondering about the encoding of SQL Server itself... and whether or not that had to do with anything.   (The thinking was... if the encoding of the text file matched the default encoding of SQL Server)
UTF-8 is the default for Visual Studio and the SQL Server Management Studio...   so I'd stick with that.
So, tell us how/where the *.sql files were created... and specifically (if created via SQL Server), the "Language" and "Server Collation" values of SQL Server
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.


Author Comment

ID: 23594949
Not sure what you mean by "encoding of SQL Server itself". You aren't confusing encoding with collation, are you?

I believe I created the original SQL script by copying the output from the scripting tool built in to Visual Studio 2005 Database Edition into a blank Notepad file. But that was long ago and I may have moved things around.

I based my original post on the conversation I had with that user many months ago. When I looked just now, I see that Notepad++ reports the file encoding as "UCS-2 Little Endian" - It doesn't even have an option for UTF-8 Little Endian. To add to the confusion, Visual Studio 2008 reports the same file to be in "Unicode - CodePage 1200". I don't understand why the two programs report different values for the same file - maybe those are two names that refer to the same thing?

If I use Visual Studio to create a blank text file, it wants to use "Unicode (UTF-8 with signature) - Codepage 65001".

I just don't understand enough to decide whether to move to the encoding VS wants to use for txt files or stay with the current ("UCS-2 Little Endian" or "Unicode - CodePage 1200", depending on which program I use). I just want these SQL scripts to be readable by my web installer around the world.
LVL 41

Expert Comment

ID: 23595123
So, the "Language" and "Server Collation" values of SQL Server are "English" and "SQL_Latin1_General_CP1_CI_AS"?

Author Comment

ID: 23595516
Yes, on my PC they are. I never found out what the user had.

Author Closing Comment

ID: 31544570
Core questions were not fully addressed; user never followed up on my answer to his/her question...

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Suggested Courses

829 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question