
Changing the Character symbol mapping.

Posted on 2001-07-04
Last Modified: 2013-12-02
Can characters with ANSI values greater than 127 be mapped to different character symbols? If yes, how does one do it?
Is there something called character tables, and is more than one character table defined by the Windows OS / hardware? If yes, how does one switch to a different character table?
Question by:amitd

Expert Comment

by:CJ_S
ID: 6251807
You call it a character table, while it is in fact a code page.

A system can have several code pages installed. Whether you can set them depends on whether your programming language supports it. What programming language are you using?
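To make the code page idea concrete: a byte above 127 has no meaning of its own, and the code page decides which character it stands for. On Windows the real conversion is done with MultiByteToWideChar, which takes a code page identifier (GetACP returns the active ANSI code page). The portable sketch below only illustrates the concept with a two-entry excerpt of code pages 1252 and 437; the tables and the decodeByte helper are illustrative, not a complete converter.

```cpp
#include <cstdint>
#include <map>

// Illustrative excerpts only: a real code page defines all 256 byte values.
// Code page 1252 (Windows "ANSI", Western European).
const std::map<std::uint8_t, char32_t> kCp1252Excerpt = {
    {0x80, U'\u20AC'},  // EURO SIGN
    {0xE9, U'\u00E9'},  // LATIN SMALL LETTER E WITH ACUTE
};

// Code page 437 (original IBM PC / DOS code page).
const std::map<std::uint8_t, char32_t> kCp437Excerpt = {
    {0x80, U'\u00C7'},  // LATIN CAPITAL LETTER C WITH CEDILLA
    {0xE9, U'\u0398'},  // GREEK CAPITAL LETTER THETA
};

// Bytes below 0x80 decode to the same code point in both code pages; above
// that, the result depends entirely on which table is used.
// Returns 0 for bytes that are not in the excerpt.
char32_t decodeByte(std::uint8_t b, const std::map<std::uint8_t, char32_t>& codePage) {
    if (b < 0x80) return b;
    auto it = codePage.find(b);
    return it != codePage.end() ? it->second : 0;
}
```

The same byte 0xE9 decodes to é under code page 1252 but to Θ under code page 437.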

regards,
CJ

Expert Comment

by:Neutron
ID: 6252479
Can you give more details on what you're trying to do?

Greetings,
    Ntr:)

Author Comment

by:amitd
ID: 6252603
CJ,

I am using VC++.

How do I set the code page?

Regards,

amit

Expert Comment

by:CJ_S
ID: 6252703
I have an MSDN article here; if you want, I can send it to you. It's too long to post here, but it contains exactly the information you need. One thing in advance: the code page / character set applies to the whole system. The only thing you can actually do is map the characters entered to another code page.
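Mapping characters to another code page usually means decoding with the source code page and re-encoding with the target one, going through Unicode. A minimal sketch, assuming hand-written excerpts of code page 437 (DOS) and code page 1252 (Windows ANSI); cp437ToCp1252 is a hypothetical helper, and characters with no equivalent in the target code page are replaced with '?'.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Excerpts only (assumption): a real converter would cover all 256 bytes.
const std::map<std::uint8_t, char32_t> kCp437ToUnicode = {
    {0x80, U'\u00C7'}, {0x81, U'\u00FC'}, {0x82, U'\u00E9'}, {0xE9, U'\u0398'}};
const std::map<char32_t, std::uint8_t> kUnicodeToCp1252 = {
    {U'\u00C7', 0xC7}, {U'\u00FC', 0xFC}, {U'\u00E9', 0xE9}};

// Re-map CP437 text to CP1252 by decoding each byte to Unicode and encoding
// it again. ASCII bytes pass through unchanged; anything that cannot be
// represented becomes '?'.
std::string cp437ToCp1252(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) { out += static_cast<char>(c); continue; }
        auto u = kCp437ToUnicode.find(c);
        if (u == kCp437ToUnicode.end()) { out += '?'; continue; }
        auto b = kUnicodeToCp1252.find(u->second);
        out += (b != kUnicodeToCp1252.end()) ? static_cast<char>(b->second) : '?';
    }
    return out;
}
```

For example, the CP437 byte 0x82 (é) becomes the CP1252 byte 0xE9, while 0xE9 in CP437 (Greek capital theta) has no CP1252 equivalent and turns into '?'.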

What is your email?

(I couldn't find the article on MSDN online, so I'd have to email it.)

regards,
CJ

Author Comment

by:amitd
ID: 6254426
My email Id is shuklas@mahindrabt.com.

Regards,

amit

Expert Comment

by:CJ_S
ID: 6254556
Email sent

Expert Comment

by:SunBow
ID: 6255994
Note that "code page" is a Microsoft term, and its interpretation varies (for example, by OS).

For a character set, whether extended or expanded, the implementation really depends on the receiving platform. For example, modems, monitors, and printers each use it differently.

For VC++, the MSDN article is likely to be enough, but I have no details on it, although I did try to access it through the latest VS as well.

Author Comment

by:amitd
ID: 6276608
Below are the details of what I am doing; I hope this makes my requirement clear.

I have taken the HP DeskJet printer driver sample from the Win 98 DDK. This sample has a file Minidrv.c in which the function ExtTextOut (export ordinal 14) has various parameters. One of them, "lpStr" (the fifth parameter), gives me the character string to be printed. Each character is represented by a two-byte code. The "Graphic Device Interface Reference" document provided with the 98 DDK says that this two-byte code is a "glyph index" into the character-offset table.

Interestingly, for characters 32 to 126 inclusive, the difference between the glyph index and the ANSI code is always 29.
Example: for SPACE the glyph index is 3, whereas its ANSI code is 32. This does not hold for characters greater than 126; there is no consistent pattern above 126.
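The observation above can be captured as a small helper. This is only a sketch of the empirical rule seen for this particular font (ANSI code = glyph index + 29 for codes 32 to 126); glyphToAnsiPrintable is a hypothetical name, and it deliberately returns -1 outside that range, since no fixed offset applies there and other fonts may not follow the rule at all.

```cpp
// Empirical rule for the font in question only: glyph indexes 3..97 map to
// ANSI codes 32..126 (code = glyph + 29). Anything else is "unknown" (-1).
int glyphToAnsiPrintable(unsigned glyph) {
    const unsigned kOffset = 29;
    if (glyph >= 32 - kOffset && glyph <= 126 - kOffset)
        return static_cast<int>(glyph + kOffset);
    return -1;
}
```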

I want to get the ANSI / UNICODE codes from the glyph indexes. Please help me out with this. It's really urgent.

Expert Comment

by:Neutron
ID: 6276884

Is one of the parameters you mentioned the name of the font being used?

Some comments:
- one font can contain several character tables
- for each character in a table, the font selects a glyph that represents it
- some of these character tables can (and usually do) contain characters that look exactly the same
- characters from one or different tables can share the same glyph
- a font can contain glyphs that are not used in any character table

This means that the font designer can do whatever he wishes: he can put glyphs in any order, and he can have duplicates, unused glyphs, or even missing glyphs.

Then, when defining each character table, he selects glyphs in the correct order, forming a table in which the glyph index for character n is placed at the n-th position.

For the TrueType font format, those tables are stored inside the font file.

If you know the font file name, or at least the font name (that's why I was asking), you can examine the TTF header and locate the standard character table, which contains a glyph index for each character.
Once you have the table, look for the glyph index in it. The position where you find it is the character code you're looking for.
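The table and the reverse lookup described above can be sketched like this, assuming the character table has already been read into an array where position n holds the glyph index for character code n. CharTable and charCodeForGlyph are hypothetical names. Because several codes can share one glyph, the search returns the first match.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Position n holds the glyph index used for character code n. Duplicates are
// allowed (two codes may share a glyph). In TrueType, glyph 0 is the
// "missing glyph", so a 0 entry means the character has no real glyph.
using CharTable = std::vector<std::uint16_t>;

// Reverse lookup: the first position whose entry equals `glyph` is the
// character code; -1 means the glyph is not used by this table.
int charCodeForGlyph(const CharTable& table, std::uint16_t glyph) {
    for (std::size_t code = 0; code < table.size(); ++code)
        if (table[code] == glyph) return static_cast<int>(code);
    return -1;
}
```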

The TrueType font specification (almost 500 KB) is available somewhere on the Microsoft site, but you don't need to analyze the file structure in depth; just look for the part about tables.


If you don't have the font file name but only the full font name (which is different), you have to scan your whole Fonts directory and extract the font name field from every font (also described in the TTF specification). Even then you can run into problems, because not all fonts are copied into the Fonts folder.
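As an alternative to scanning every font file (not something suggested in the thread), Windows also records installed fonts in the registry, typically under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Fonts on Windows 9x). Each value name is the font name followed by " (TrueType)" and the value data is the file name. Reading the registry itself is Windows-specific (RegOpenKeyEx / RegEnumValue), so this sketch assumes the name/data pairs have already been read into a map; fontFileFromRegistryEntries is a hypothetical helper, and it does not handle font collections that list several names in one value.

```cpp
#include <map>
#include <string>

// Hypothetical helper. `entries` holds value name -> value data pairs as read
// from the Fonts registry key, e.g. "Arial (TrueType)" -> "arial.ttf".
// Returns the file name registered for `fontName`, or "" if there is none.
std::string fontFileFromRegistryEntries(const std::map<std::string, std::string>& entries,
                                        const std::string& fontName) {
    const std::string suffix = " (TrueType)";
    for (const auto& entry : entries) {
        std::string name = entry.first;
        // Strip the " (TrueType)" suffix appended to TrueType font names.
        if (name.size() >= suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
            name.erase(name.size() - suffix.size());
        if (name == fontName) return entry.second;
    }
    return "";
}
```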

One more problem can occur:
If the original text contains characters that have no glyph in the font that was used, all of those characters get one and the same "missing glyph" index (in TrueType, glyph index 0). It is typically drawn as a small square; you've probably seen it on the Web on some non-English sites, where the text shows up as a lot of small squares.

- - -

If you want to convert codes for one specific font (some standard font like Courier New or Times New Roman), it is cheaper to send your own input text containing the full character set, see which glyphs you get, and make a correspondence table between glyphs and characters yourself.
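The experiment described above can be turned into a lookup table. A minimal sketch, assuming you have captured, in order, the glyph index the driver received for each character of a known sample string; buildGlyphTable is a hypothetical helper, and when several characters produce the same glyph it keeps the first one.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// `sample` is the text you printed (e.g. every character 32..255) and
// `glyphs[i]` is the glyph index observed for sample[i].
// Builds glyph index -> character, keeping the first character per glyph.
std::map<std::uint16_t, unsigned char> buildGlyphTable(const std::string& sample,
                                                       const std::vector<std::uint16_t>& glyphs) {
    std::map<std::uint16_t, unsigned char> table;
    for (std::size_t i = 0; i < sample.size() && i < glyphs.size(); ++i)
        table.emplace(glyphs[i], static_cast<unsigned char>(sample[i]));  // emplace never overwrites
    return table;
}
```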

- - -

To illustrate why this is so complicated, I will only say this:
Your font doesn't have to be a text font; it can contain symbols, decoration elements or clipart.


If you get stuck with TTF format, just scream :o)

Greetings,
    Ntr:)

Accepted Solution

by:Neutron
ID: 6277369
Here is the link to the complete TTF specification:

http://www.microsoft.com/typography/tt/ttf_spec/ttspec.zip

From this archive you only need the file TTCH02.DOC.

Read the first two chapters (about the tables in a TTF file).

It says something like this (this is only a condensed review, so read the file anyway):

-

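At the beginning of the TTF file, skip 4 bytes (the version) and read one 16bit integer - NUMBER OF TABLES. (All integers in a TTF file are stored big-endian, most significant byte first.)

Skip 6 bytes to get to the beginning of the TABLE DIRECTORY.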

-

For each entry in the TABLE DIRECTORY you have this:

4 characters - TAG (4-character names, mostly lowercase, such as cmap, name, head...)

Skip 4 bytes (CHECKSUM).

32bit integer - TABLE OFFSET (from beginning of TTF file)

32bit integer - TABLE LENGTH

-

So, get the number of tables and search the TABLE DIRECTORY for the entry with tag cmap.
The 'cmap' table contains the character-to-glyph mapping.

When you find that entry, go to the specified OFFSET in the file.
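The table directory walk above, sketched in C++ under the assumption that the whole .ttf file has been read into a byte vector (for example with std::ifstream in binary mode). readBE and findTableOffset are hypothetical helpers. The detail that is easy to miss on x86 is that TrueType integers are big-endian, so they are assembled byte by byte instead of being read through a pointer cast.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// TrueType integers are big-endian: combine n bytes starting at pos.
std::uint32_t readBE(const std::vector<std::uint8_t>& f, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | f.at(pos + i);
    return v;
}

// Return the file offset of the table with the given 4-character tag
// (e.g. "cmap"), or 0 if it is not in the table directory.
// Header: version (4 bytes), number of tables (2), 6 more bytes; then one
// 16-byte record per table: tag (4), checksum (4), offset (4), length (4).
std::uint32_t findTableOffset(const std::vector<std::uint8_t>& f, const char* tag) {
    const std::uint32_t numTables = readBE(f, 4, 2);
    for (std::uint32_t i = 0; i < numTables; ++i) {
        const std::size_t rec = 12 + 16 * std::size_t(i);
        if (rec + 16 > f.size()) break;  // truncated directory
        if (std::memcmp(f.data() + rec, tag, 4) == 0) return readBE(f, rec + 8, 4);
    }
    return 0;
}
```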

-
Reading the cmap table:

First read the subtable directory header.
Skip 2 bytes (version).
16bit integer - NUMBER OF SUBTABLES.

There can be multiple subtables there, so once again you search through a structure like this:
(all entries have this structure)

16bit integer - PLATFORM ID
16bit integer - ENCODING
32bit integer - SUBTABLE OFFSET (from the beginning of the cmap table)

You are looking for PLATFORM ID==3 and ENCODING==1.
There can be only one!
(Each platform ID and encoding combination is unique; for Microsoft Unicode fonts it is 3 and 1, while Microsoft symbol fonts use 3 and 0.)

When you find it, seek to the cmap table offset plus the subtable offset.
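A sketch of the subtable search, with the same assumptions as before (whole file in a byte vector, hypothetical helper names). The offset stored in each record is relative to the start of the cmap table, so the function adds cmapOffset before returning.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// TrueType integers are big-endian: combine n bytes starting at pos.
std::uint32_t readBE(const std::vector<std::uint8_t>& f, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | f.at(pos + i);
    return v;
}

// Return the file offset of the cmap subtable for (platformId, encodingId),
// or 0 if the font has none. `cmapOffset` is the file offset of the cmap table.
// Header: version (2 bytes), number of subtables (2); then one 8-byte record
// per subtable: platform ID (2), encoding ID (2), offset from cmap start (4).
std::uint32_t findCmapSubtable(const std::vector<std::uint8_t>& f, std::uint32_t cmapOffset,
                               std::uint16_t platformId = 3, std::uint16_t encodingId = 1) {
    const std::uint32_t count = readBE(f, cmapOffset + 2, 2);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::size_t rec = cmapOffset + 4 + 8 * std::size_t(i);
        if (readBE(f, rec, 2) == platformId && readBE(f, rec + 2, 2) == encodingId)
            return cmapOffset + readBE(f, rec + 4, 4);
    }
    return 0;
}
```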

-

There you will find a subtable in format 4 (the Microsoft standard format).

In short:
Skip 2 bytes (FORMAT, should be 4)
16bit integer - SUBTABLE LENGTH
Skip 2 bytes (language)
16bit integer - 2 x SEGMENT COUNT
Skip 6 bytes
Then come four arrays with one 16bit entry per segment: END CODE, then 2 padding bytes, START CODE, ID DELTA and ID RANGE OFFSET, followed by the GLYPH ID ARRAY.

This structure is designed for the case when you have the UNICODE code and want to find a Glyph index, and it is not directly reversible.
When you convert from Glyph index to UNICODE code, the relation is (generally) one to many, so you map the codes to Glyph indexes and take the first occurrence of the Glyph index you are looking for.
If you find the Glyph index at one of the first 128 codes, it is (likely) a Basic Latin character; codes 128 to 255 are the Latin-1 Supplement range.
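For completeness, a sketch of the format 4 lookup and the reverse search, based on my reading of the format 4 layout in the TrueType specification (segments with end code, start code, ID delta and ID range offset). glyphForChar and charForGlyph are hypothetical helpers; the code handles only format 4 and does no validation beyond the bounds checks of std::vector::at.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// TrueType integers are big-endian: combine n bytes starting at pos.
std::uint32_t readBE(const std::vector<std::uint8_t>& f, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | f.at(pos + i);
    return v;
}

// Unicode code -> glyph index using the format 4 subtable at file offset `sub`.
// Returns 0 (the missing glyph) for unmapped codes.
// Header: format, length, language, segCountX2, searchRange, entrySelector,
// rangeShift (7 x 16 bits = 14 bytes), then endCode[segCount], 2 pad bytes,
// startCode[segCount], idDelta[segCount], idRangeOffset[segCount], glyphIdArray.
std::uint16_t glyphForChar(const std::vector<std::uint8_t>& f, std::size_t sub, std::uint16_t ch) {
    const std::size_t segCount = readBE(f, sub + 6, 2) / 2;
    const std::size_t endCodes = sub + 14;
    const std::size_t startCodes = endCodes + 2 * segCount + 2;
    const std::size_t idDeltas = startCodes + 2 * segCount;
    const std::size_t idRangeOffsets = idDeltas + 2 * segCount;
    for (std::size_t i = 0; i < segCount; ++i) {
        // Segments are sorted by end code: the first one with end >= ch is the
        // only candidate.
        if (ch > readBE(f, endCodes + 2 * i, 2)) continue;
        const std::uint32_t start = readBE(f, startCodes + 2 * i, 2);
        if (ch < start) return 0;
        const std::uint32_t delta = readBE(f, idDeltas + 2 * i, 2);
        const std::uint32_t rangeOffset = readBE(f, idRangeOffsets + 2 * i, 2);
        if (rangeOffset == 0) return static_cast<std::uint16_t>((ch + delta) & 0xFFFF);
        // Otherwise rangeOffset is a byte offset, measured from its own
        // position, into the glyph ID array.
        const std::size_t addr = idRangeOffsets + 2 * i + rangeOffset + 2 * (ch - start);
        const std::uint32_t g = readBE(f, addr, 2);
        return g == 0 ? 0 : static_cast<std::uint16_t>((g + delta) & 0xFFFF);
    }
    return 0;
}

// Glyph index -> first Unicode code that maps to it, or -1 if none does.
// Glyph 0 is the missing glyph, so it is not searched.
long charForGlyph(const std::vector<std::uint8_t>& f, std::size_t sub, std::uint16_t glyph) {
    if (glyph == 0) return -1;
    for (std::uint32_t ch = 0; ch <= 0xFFFF; ++ch)
        if (glyphForChar(f, sub, static_cast<std::uint16_t>(ch)) == glyph)
            return static_cast<long>(ch);
    return -1;
}
```

A result below 0x80 from charForGlyph is a Basic Latin character, and 0x80 to 0xFF is the Latin-1 Supplement range.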

Good luck,
    Ntr:)

Author Comment

by:amitd
ID: 6316813
Ntr:

How do I get the font name / font file name from the parameters of "ExtTextOut"? Please help; I was not able to get this information from the "ExtTextOut" parameters.

Regards,

amitd

Author Comment

by:amitd
ID: 6321018
Hi,

I got the font name. Can I get the file name for the same font without opening every file to check which file the font belongs to?

Thanks,

amitd