funkyfinger


Windows IME .dic file format

I would like to understand the encoding of the Windows .dic file.
I would like to be able to read, write, and understand the data these files contain. I am assuming that these files are some sort of database and that, regardless of language, they share a similar format.

If you have Japanese installed on your computer, one such file will exist in your Windows directory:

%winroot%\IME\IMJP8_1\Dicts\IMJPTK.DIC
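Since the layout of this file is not documented here, a natural first step is simply to look at the raw bytes. Below is a minimal Python sketch of a hex dump; `hexdump` and `dump_file_head` are hypothetical helper names, not part of any IME tooling. It does not decode the .dic format. It only prints offsets, hex bytes, and printable ASCII so you can start looking for headers, counts, and string tables by eye.

```python
from pathlib import Path


def hexdump(data: bytes, width: int = 16, offset: int = 0) -> list[str]:
    """Format bytes as offset / hex / ASCII columns, similar to `hexdump -C`."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte is shown as '.'.
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short final row.
        lines.append(f"{offset + start:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines


def dump_file_head(path: str, length: int = 256) -> list[str]:
    """Read only the first `length` bytes of a (possibly large) binary file and hex-dump them."""
    with Path(path).open("rb") as f:
        return hexdump(f.read(length))
```

For example, `print("\n".join(dump_file_head(path_to_dic)))`, with `path_to_dic` set to the expanded path above, shows the first 256 bytes. Only reading the head keeps the output manageable for a large dictionary file.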


Thanks
ff


(I have already seen the following pages and they do not help me much:
http://msdn.microsoft.com/library/en-us/wcemain4/html/cmtskCreatingDictionaryFile.asp?frame=true

http://msdn.microsoft.com/library/en-us/wcemain4/html/cmrefJapaneseIME30Part-of-SpeechCodes.asp?frame=true
)

============================================================
Deleted, with no points refunded
12/25/2004 12:35AM PST

modulo
Community Support Moderator
============================================================
funkyfinger

ASKER

Btw,

As far as I can tell, the correct name for this file is a "binary IME dictionary file".

I believe these files are not used by the Windows spell checker.

I'm not sure what the format is, but if I knew it, I would try to port Jdict over, since that would be awesome when I'm typing in kana.
This is a roundabout way of doing it, but it will get you (me) the information you (I) want. It will not contain all the data, however. (OK, I'm going to stop talking to myself in the first person from this point on, because obviously I'm not reading this; hopefully you find this information valuable.) You will not get the word type associated with each Japanese character (the dictionary also records whether a word is a noun, verb, or place name), but you will get everything else, even the radical encoding.
Here's how:
Start Character Map, select Advanced view, and start Spy++. Then use Visual Basic 6.0 (because I know how, but you hate .NET) and the SendMessage API to get the text contained within the list boxes.
Use the Group By select box to select radicals, kana, etc.
This will pop up another window titled "Group By".
Write a program that selects each item listed in this window. This control might not be a select list, so using an API that simulates a mouse click might be an easier (but longer) process. Next, use the WM_GETTEXT message to get the sub-grouping of data from the Character Map window. Remember that each character is a wide character and that not every Unicode character fits in 2 bytes (characters outside the Basic Multilingual Plane need a surrogate pair).
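The wide-character caveat can be illustrated with a short Python sketch. It assumes you already have the raw bytes of a buffer filled by a wide-character call (for example WM_GETTEXT sent with SendMessageW); how you obtain those bytes is outside the sketch, and `utf16le_to_chars` and `utf16_code_units` are hypothetical helper names. Windows wide characters are UTF-16LE code units, so a rare kanji such as 𠮷 (U+20BB7) occupies two code units, i.e. 4 bytes.

```python
def utf16le_to_chars(buf: bytes) -> list[str]:
    """Decode a NUL-terminated UTF-16LE buffer into a list of characters.

    Python's UTF-16 codec joins surrogate pairs, so a character outside the
    BMP comes back as ONE list element even though it used two code units.
    """
    # Win32 text buffers end at the first NUL code unit (two zero bytes at an
    # even offset); cut there before decoding so trailing garbage is ignored.
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            buf = buf[:i]
            break
    return list(buf.decode("utf-16-le"))


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units (WCHARs) needed to store `ch` in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2
```

For a buffer holding 漢字𠮷, `utf16le_to_chars` returns three characters even though the buffer contains four WCHARs, which is exactly why counting characters as "2 bytes each" goes wrong.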
Also be aware that each window might have scroll bars, so that is a nasty little problem as well.
The rest (storing the data in a DB) you will have to do on your own.
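For the storage step, here is one possible sketch using Python's built-in `sqlite3` module. The table name `chars` and its columns are a hypothetical schema, assuming you record each character together with the Group By choice and the subgroup it was found under; adapt it to whatever you actually scrape.

```python
import sqlite3


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with a hypothetical `chars` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chars (
               ch       TEXT NOT NULL,  -- the character itself
               group_by TEXT NOT NULL,  -- value chosen in the Group By box
               subgroup TEXT NOT NULL,  -- item selected in the Group By window
               PRIMARY KEY (ch, group_by, subgroup)
           )"""
    )
    return conn


def save_chars(conn: sqlite3.Connection, group_by: str, subgroup: str, chars: list[str]) -> None:
    """Insert one scraped subgroup; re-running the scraper does not create duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO chars (ch, group_by, subgroup) VALUES (?, ?, ?)",
            [(c, group_by, subgroup) for c in chars],
        )


def chars_in(conn: sqlite3.Connection, group_by: str, subgroup: str) -> list[str]:
    """Return the characters stored for one subgroup, in insertion order."""
    rows = conn.execute(
        "SELECT ch FROM chars WHERE group_by = ? AND subgroup = ? ORDER BY rowid",
        (group_by, subgroup),
    )
    return [row[0] for row in rows]
```

The composite primary key plus `INSERT OR IGNORE` makes the scraper safe to re-run after it gets stuck on a scrolling window partway through.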
Good luck.
ASKER CERTIFIED SOLUTION
modulo
