User can't read UTF-8 encoded text file

Posted on 2009-02-09
Last Modified: 2012-05-06
I run an open source project that distributes SQL scripts that are run during installation. One user reported that when he opened the SQL script (which is just a text file ending in .sql), it looked corrupted (see attached snippet). After a little back and forth, I was able to determine that if I changed the text encoding to ASCII (it was UTF-8 little endian), the user was able to view and run it.

I don't understand encoding issues very well and I don't understand how this guy could not open the SQL file. He reported the issue occurred on Microsoft Windows Server 2003 R2, Standard x64 Edition, Service Pack 2. He *was* able to open the UTF-8 encoded version on an XP machine.

I have two questions:
1. Do you know why he could not open a file encoded in UTF-8 little endian? What would you have to do to a WinServer 2003 setup to cause this behavior?

2. Should I distribute my SQL scripts under a different encoding? All the characters can be encoded as ASCII, but I don't want to unintentionally introduce new issues by using an "old-fashioned" encoding.

Roger Martin
Gallery Server Pro - Open source web gallery for photos, video, audio, and documents


core Gallery Server Pro operation. This script runs on SQL Server 2005 and later.


Question by:rdogmartin
    LVL 41

    Expert Comment

    I wonder what would happen if the guy had opened the file in NotePad and changed the encoding...  I presume that would fix it on his PC?
    You might wanna read up on the Unicode Byte Order Mark (BOM) "tag" that gets added to text files.
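For anyone following along, here is a minimal Python sketch of what checking for a BOM looks like. The function name `sniff_bom` and the table `BOMS` are made up for illustration; the byte signatures come from Python's standard `codecs` module. It only inspects the leading bytes, so a file without a BOM returns `None` and its encoding stays unknown.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the UTF-32 LE
# signature (FF FE 00 00) begins with the UTF-16 LE signature (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

To try it on a script, pass in the first few bytes read in binary mode, for example `sniff_bom(open(path, "rb").read(4))`, where `path` is whatever file you want to inspect.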
    LVL 6

    Author Comment

    Yes, I presume it would have, since that is essentially what I did to the file before I gave it back to him.

    Interestingly, the code snippet in my original post is not showing the strange characters that I pasted into it. So I attached a screen shot that represents the "corrupt" file the user saw when he opened it. When I copied all these strange characters into this post, Experts Exchange filtered them out.

    My core question still stands about which is the best encoding to use for widely distributed SQL script files...

    LVL 41

    Accepted Solution

    Well, I was wondering about the encoding of SQL Server itself... and whether or not that had anything to do with it. (The thinking was that it might matter whether the encoding of the text file matched the default encoding of SQL Server.)
    UTF-8 is the default for Visual Studio and the SQL Server Management Studio...   so I'd stick with that.
    So, tell us how/where the *.sql files were created... and specifically (if created via SQL Server), the "Language" and "Server Collation" values of SQL Server
    LVL 6

    Author Comment

    Not sure what you mean by "encoding of SQL Server itself". You aren't confusing encoding with collation, are you?

    I believe I created the original SQL script by copying the output from the scripting tool built in to Visual Studio 2005 Database Edition into a blank Notepad file. But that was long ago and I may have moved things around.

    I based my original post on the conversation I had with that user many months ago. When I looked just now, I see that Notepad++ reports the file encoding as "UCS-2 Little Endian" - It doesn't even have an option for UTF-8 Little Endian. To add to the confusion, Visual Studio 2008 reports the same file to be in "Unicode - CodePage 1200". I don't understand why the two programs report different values for the same file - maybe those are two names that refer to the same thing?
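On the two names: Windows code page 1200 is UTF-16 little endian, which is the same layout Notepad++ labels "UCS-2 Little Endian". The Python sketch below shows what those bytes look like, and what a reader would see if it treated them as a single-byte code page. The sample text and the choice of `cp1252` as the wrong code page are assumptions for illustration; this is not a reconstruction of what actually happened on the user's server.

```python
import codecs

text = "SELECT 1"

# A BOM (FF FE) followed by two bytes per character, low byte first.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# A reader that assumes a single-byte code page turns the BOM into two stray
# characters and shows a NUL after every letter.
misread = utf16_bytes.decode("cp1252")

print(utf16_bytes.hex(" "))
print(repr(misread))
```

Decoding the same bytes with `"utf-16"` gives back the original text, since that codec reads the BOM to pick the byte order and then drops it.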

    If I use Visual Studio to create a blank text file, it wants to use "Unicode (UTF-8 with signature) - Codepage 65001".

    I just don't understand enough to decide whether to move to the encoding VS wants to use for txt files or stay with the current ("UCS-2 Little Endian" or "Unicode - CodePage 1200", depending on which program I use). I just want these SQL scripts to be readable by my web installer around the world.
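If I did decide to move to what Visual Studio suggests, the conversion itself would be small. Below is a rough Python sketch, assuming the source file is UTF-16 with a BOM; the function names are hypothetical. It does not settle which encoding is the better choice, it only shows the mechanics and a way to confirm the "everything is ASCII" assumption from my original question.

```python
from pathlib import Path


def utf16_to_utf8_sig(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with BOM) as UTF-8 with signature."""
    # "utf-16" uses the BOM to pick the byte order and strips it.
    text = data.decode("utf-16")
    # "utf-8-sig" prepends the three-byte signature EF BB BF.
    return text.encode("utf-8-sig")


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte after an optional UTF-8 signature is ASCII."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    return data.isascii()


def convert_file(src: Path, dst: Path) -> None:
    """Convert one script on disk; works on raw bytes so line endings are kept."""
    dst.write_bytes(utf16_to_utf8_sig(src.read_bytes()))
```

One thing worth noting: when a script contains only ASCII characters, saving it as UTF-8 without the signature produces exactly the same bytes as saving it as ASCII, so in that case the two options are the same file.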
    LVL 41

    Expert Comment

    So, the "Language" and "Server Collation" values of SQL Server are "English" and "SQL_Latin1_General_CP1_CI_AS"?
    LVL 6

    Author Comment

    Yes, on my PC they are. I never found out what the user had.
    LVL 6

    Author Closing Comment

    Core questions were not fully addressed; user never followed up on my answer to his/her question...
