
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 828

ASCII to UTF8 Conversion of flat files


I have a flat file that contains ASCII characters mixed with Japanese Kanji. I am trying to get an application to read this flat file, but it is unable to read the Kanji unless I convert the flat file into UTF8 format.

The file is on a Windows 2003 server machine. Does anyone know of, or have, a working program that will convert ASCII to UTF8?

2 Solutions
Tsirapi28 (Author) Commented:
Thanks, but are there any source recommendations for writing C / Java / VB code to convert ASCII to UTF8?
How can an ASCII file contain Kanji?

Such characters require a Unicode encoding, such as UTF-8. It's more likely that the file is encoded in ISO-10646 or something similar.

Are you able to identify which one?
There's also a program called 'iconv'. It converts between many character sets. But there are sure to be specific Japanese character code converters around, too.

On Windows, the plain file is probably in Shift-JIS (SJIS) format. If not, try JIS or EUC-JP.

The great and free JWPCE editor will probably be able to open it anyway. From there you can save to many formats.
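The "try Shift-JIS, then JIS, then EUC-JP" advice can be sketched as a small trial-decoding routine. This is only a rough illustration in Python, not something from the thread: the function name `guess_encoding` is hypothetical, and the codec names `iso2022_jp`, `euc_jp` and `shift_jis` are assumed to be the Python equivalents of JIS, EUC-JP and SJIS. The candidates are tried from strictest to most permissive, which differs from the order suggested above, because Shift-JIS decoding tends to accept bytes that are really in one of the other encodings. A successful decode is a hint, not proof.

```python
# Python codec names assumed to match JIS, EUC-JP and Shift-JIS.
# Ordered from strictest to most permissive (an assumption made here
# to reduce false positives, not advice taken from the thread).
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Pure ASCII input decodes under every candidate, so the first name is returned; that is harmless, since all three agree on ASCII bytes. It is still worth eyeballing the decoded text before trusting the guess.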
Tsirapi28 (Author) Commented:
Thanks to all for your input.

How can ASCII contain Kanji?

Here is an extract of the file I'm working with. The code characters are Kanji.


This file comes in .txt format. I guess my first question is: how does one tell whether this file is encoded in ASCII, Unicode, or something else? And then, how can I feed this file through a program that will produce UTF8 output? I do not care about seeing printable characters as long as the output is in UTF8. My goal is to then feed the file into an application to process the data.

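The Japanese "ASCII" is actually ISO-2022-JP, a 7-bit code that switches character sets with escape sequences (the 8-bit shift code is Shift-JIS). To convert such a file on Windows, use the API function MultiByteToWideChar() to get UTF-16, then WideCharToMultiByte() with CP_UTF8 to get UTF-8 (see msdn.com).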
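The same two-step idea (source bytes to Unicode text, then Unicode text to UTF-8 bytes) can be sketched outside the Windows API. The following is a minimal Python illustration, assuming the source really is ISO-2022-JP. The names `to_utf8` and `convert_file` are hypothetical helpers, not part of any library.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` from `source_encoding` to UTF-8."""
    text = data.decode(source_encoding)  # step 1: source bytes -> Unicode text
    return text.encode("utf-8")          # step 2: Unicode text -> UTF-8 bytes


def convert_file(src_path: str, dst_path: str,
                 source_encoding: str = "iso2022_jp") -> None:
    """Read the whole source file, convert it, and write the UTF-8 result."""
    raw = Path(src_path).read_bytes()
    Path(dst_path).write_bytes(to_utf8(raw, source_encoding))
```

A call such as `convert_file("input.txt", "output.txt")` (file names made up for illustration) would write a UTF-8 copy. If the guess about the source encoding is wrong, `decode` raises `UnicodeDecodeError` rather than silently producing garbage, which is usually what you want here.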

You can test the conversion here (scroll the page down to miss the adverts) :-


I am using Delphi, where you can use the Utf8ToAnsi function (AnsiToUtf8 converts in the other direction). Hope this helps.
  Bits are bits and nothing else. An ASCII file is still a file full of bits, in frames of 8. Only 128 of the 256 possible bit patterns (0 to 127) are defined by ASCII, and the first 32 of those are control codes rather than the printable characters that appear on a keyboard. The patterns past 127 are unused (or misused), or used in certain special ways.
   Your file should be following some convention. For instance, patterns beyond 127 may signify certain symbols, but different conventions represent symbols in different ways. Thus the same file can be shown with different glyphs depending on whether it is displayed with Notepad (and different versions of Notepad exist), Wordpad, or Word. Another ploy is a two-byte encoding, perhaps ESC-x, or a code that states that all subsequent symbols are represented in 16 bits, not 8.
   You need to find out what convention is being used, and then find a program that operates according to that convention, or write one yourself. If you don't know the name of the convention but have a number of interpreting programs available, you could try them all and see which gives sensible results. Using the wrong one gets you no further than displaying a .doc file with Notepad.
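To make the "same bits, different convention" point concrete, here is a small Python sketch that decodes one byte string under several conventions. The function name `interpretations` and the particular list of conventions are assumptions chosen for illustration.

```python
def interpretations(data: bytes, conventions=("shift_jis", "euc_jp", "cp1252")):
    """Decode the same bytes under each convention.

    Returns a dict mapping convention name to the decoded text, or to None
    when the bytes are not valid under that convention.
    """
    results = {}
    for name in conventions:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    sample = "日本".encode("shift_jis")  # four bytes, all but one above 127
    print("pure ASCII?", sample.isascii())
    for name, text in interpretations(sample).items():
        print(name, repr(text))
```

Only one of the readings gives the intended Kanji. The others either fail outright or produce unrelated Western glyphs from exactly the same bytes, which is why the convention has to be identified before converting.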
