ASCII to UTF8 Conversion of flat files

Posted on 2006-03-23
Last Modified: 2008-02-01

I have a flat file that contains ASCII characters mixed with Japanese Kanji. I am trying to get an application to read this flat file, but it cannot read the Kanji unless I convert the file to UTF-8.

The file is on a Windows 2003 server machine. Does anyone know of, or have, a working program that will convert ASCII to UTF-8?

Question by:Tsirapi28
    LVL 2

    Expert Comment


    Author Comment

    Thanks, but are there any sources or recommendations for writing C / Java / VB code to convert ASCII to UTF-8?
    LVL 25

    Expert Comment

    How can an ASCII file contain Kanji?
    LVL 25

    Expert Comment

    Such characters require a Unicode variant, such as UTF-8. It's more likely that the file is encoded in ISO-10646 or something.

    Are you able to identify which one?
    LVL 2

    Expert Comment

    There's also a program called 'iconv'. It converts between many character sets. But there are sure to be specific Japanese character-code converters around.

    On Windows, the plain file is probably in Shift-JIS (SJIS) format. If not, try JIS or EUC-JP.

    The great and free JWPCE editor will probably be able to open it anyway; from there you can save to many formats.
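
    The "try one encoding, then the next" advice above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name convert_to_utf8 is hypothetical, "JIS" is taken to mean ISO-2022-JP, and the candidates are tried strictest first (not in the order listed above), because Shift-JIS decoding can accept ISO-2022-JP bytes and silently produce garbage.

```python
from pathlib import Path

# Assumed candidate list: ISO-2022-JP rejects every byte above 0x7F and
# EUC-JP rejects most Shift-JIS lead bytes, so the permissive one goes last.
CANDIDATES = ["iso2022_jp", "euc_jp", "shift_jis"]


def convert_to_utf8(src, dst, candidates=CANDIDATES):
    """Decode src with the first candidate that works and write dst as UTF-8.

    Returns the name of the encoding that decoded without errors.
    (Hypothetical helper, not part of any library.)
    """
    raw = Path(src).read_bytes()
    for name in candidates:
        try:
            text = raw.decode(name)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        Path(dst).write_bytes(text.encode("utf-8"))
        return name
    raise ValueError("no candidate encoding could decode " + str(src))
```

    A clean decode is only a guess, not proof: check a few lines of the output by eye before trusting the result.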

    Author Comment

    Thanks to all for your input.

    How can ASCII contain Kanji?

    Here is an extract of the file I'm working with. The code characters are Kanji.


    This file comes in .txt format. I guess my first question is how one can tell whether this file is encoded in ASCII, Unicode, or something else, and then how I can feed it through a program that will produce UTF-8 output. I do not need to see printable characters as long as the output is UTF-8. My goal is to then feed the file into an application to process the data.
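
    One rough way to approach the "ASCII, Unicode or what" question is to look at the raw bytes. The sketch below is a heuristic only, and sniff_encoding and its return labels are hypothetical names: it checks for a byte-order mark, then for a purely 7-bit file (with or without ISO-2022-JP escape sequences), then for valid UTF-8. It cannot tell legacy 8-bit encodings such as Shift-JIS and EUC-JP apart.

```python
import codecs


def sniff_encoding(raw):
    """Return a best-guess label for the bytes in raw (hypothetical helper)."""
    # 1. A byte-order mark identifies the common Unicode flavours.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. No byte above 0x7F: plain ASCII, or ISO-2022-JP if the file
    #    contains the ESC $ sequence that switches to a Kanji set.
    if all(b < 0x80 for b in raw):
        return "iso2022_jp" if b"\x1b$" in raw else "ascii"
    # 3. High bytes that form valid UTF-8: probably already UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Some legacy 8-bit encoding; the bytes alone do not say which.
        return "unknown-8bit"
```

    Calling it as sniff_encoding(open("data.txt", "rb").read()) (the file name is a placeholder) gives a starting point for choosing a converter.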

    LVL 27

    Accepted Solution

    The Japanese "ASCII" is actually ISO-2022-JP, a 7-bit code that uses escape sequences to shift between character sets. To convert such code, use the Windows API MultiByteToWideChar() (see

    You can test the conversion here (scroll the page down to miss the adverts) :-
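
    Separately from the online tester, the two-step idea behind this answer (decode the legacy bytes to Unicode, then encode the Unicode text as UTF-8) can be sketched in Python. This is not the Windows API itself: it uses Python's built-in iso2022_jp codec, and the helper name to_utf8 and the sample text are assumptions made for illustration.

```python
def to_utf8(raw, source_encoding="iso2022_jp"):
    """Transcode legacy bytes to UTF-8 in two steps (hypothetical helper)."""
    text = raw.decode(source_encoding)  # legacy bytes -> Unicode string
    return text.encode("utf-8")         # Unicode string -> UTF-8 bytes


# Sample text "nihongo" (three Kanji), written with escapes so that this
# source file itself stays pure ASCII.
sample = "\u65e5\u672c\u8a9e".encode("iso2022_jp")

print(sample)           # ESC sequences plus 7-bit bytes
print(to_utf8(sample))  # the same text as UTF-8 bytes
```

    With Python's codec, the ISO-2022-JP bytes all stay below 0x80 and the Kanji are introduced by an ESC $ B sequence, which is why such a file can pass for "ASCII" at first glance.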

    LVL 1

    Expert Comment

    I am using Delphi, where you can use the UTF8toAnsi function. Hope this helps you.
    LVL 2

    Assisted Solution

    Bits are bits and nothing else. An ASCII file is still a file full of bits, in frames of 8. Only some of the 256 possible bit patterns correspond to approved ASCII characters, the ones that appear on a keyboard. Other bit patterns are unused (or misused) or used in certain special ways, such as the first 32 and those past 127.
    Your file should be following some convention. For instance, patterns beyond 127 may signify certain symbols, but different conventions represent symbols in different ways. Thus the same file can be shown with different glyphs depending on whether it is displayed with Notepad (and different versions of Notepad exist), WordPad, or Word. Another ploy is a two-byte encoding, perhaps ESC-x, or a code that states that subsequently all symbols are represented in 16 bits, not 8.
    You need to find out what convention is being used, and then find a program that operates according to that convention, or write one yourself. If you don't know the name of the convention but have a number of interpreting programs available, you could try them all and see which gives sensible results; the wrong one will get you no further than displaying a .doc file with Notepad.
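
    The point that the same bytes show different glyphs under different conventions can be demonstrated in a few lines of Python. This is a sketch with assumed sample data (two Kanji stored as Shift-JIS) and a hypothetical helper name, decode_under:

```python
def decode_under(raw, conventions):
    """Decode the same bytes under several encodings (hypothetical helper).

    Returns a dict mapping each encoding name to the text it produces;
    bytes that are invalid under an encoding become replacement characters.
    """
    return {name: raw.decode(name, errors="replace") for name in conventions}


# Two Kanji, written with escapes to keep this source ASCII, stored as
# four Shift-JIS bytes.
raw = "\u65e5\u672c".encode("shift_jis")

for name, text in decode_under(raw, ["shift_jis", "cp1252", "latin-1"]).items():
    print(name, ascii(text))
```

    Only the Shift-JIS reading yields two Kanji; the Western code pages turn the same four bytes into four unrelated characters.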
