UNIX or mainframe to Windows character conversion

Hi,

In a Connect:Direct data transfer between UNIX and Windows systems, a character conversion table is needed to translate between the hexadecimal codes used by the two systems.

2 questions:
1. what determines what the translation table will be - the codepage of each respective system?
2. can a given translation table be obtained/created, other than by one-by-one analysis of the characters?

Example translate for one character might be: Ò  Hex: ED -> D2
Answer to 2 would be to obtain such a listing for all characters.

Thanks!
xeniumAsked:

Hanno P.S.IT Consultant and Infrastructure ArchitectCommented:
Do you need a table to be set up or a Unix command for translation?


A command for character translation in Unix would be:

a) using sed (all Unix variants support this -- even very (!) old ones):
        sed -e 'y/ABCabc/XYZxyz/'   sourcefile  > targetfile  
    This will translate all 'A's into 'X's, 'B's into 'Y's, 'C's into 'Z's and the lower
    case characters as well
b) using tr (all the more recent variants):
        tr 'A-Ca-c' 'X-Zx-z'  < sourcefile  > targetfile
    This is the same translation; note that tr reads only standard input, so the
    source file has to be redirected into it.

The only difference is the way you specify the translation (tables) for sed
and tr, respectively.
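
For comparison, the same byte-for-byte translation can be sketched in Python. This is only an illustration of the idea behind the sed and tr commands above; the function name translate is made up for the example.

```python
# Build a 256-entry translation table: A->X, B->Y, C->Z, a->x, b->y, c->z.
# Bytes that are not listed map to themselves.
TABLE = bytes.maketrans(b"ABCabc", b"XYZxyz")


def translate(data: bytes) -> bytes:
    """Pass every byte of data through TABLE (hypothetical helper name)."""
    return data.translate(TABLE)


if __name__ == "__main__":
    print(translate(b"A Cab"))  # b'X Zxy'
```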
omarfaridCommented:
You may use the dd command on Unix to convert from one code to another, if the conversion is a standard one. Please see

http://www.hmug.org/man/1/dd.php

look at the conv option
svsCommented:
It sounds like you need to generate a translation table for the Connect:Direct software, probably in its own format. Right?
giltjrCommented:
I am assuming you are talking about code page conversion, since Unix and Windows use the same hex value for the same character when both use the same code page; those code pages are all based on 7-bit ASCII, with extensions for 8-bit characters.

What do you mean by mainframe?  If you mean IBM's zSeries, which OS?  z/VM, z/VSE, and z/OS are all EBCDIC based.  In that case, even if both sides nominally use the same code page, there must be a conversion to/from EBCDIC, because the mainframe is based on EBCDIC and the other side (Unix or Windows) is based on ASCII.
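
To make the EBCDIC/ASCII difference concrete, here is a small Python sketch that prints the byte value of a few characters in both encodings. It assumes code page 037 as the EBCDIC variant (Python's "cp037" codec); the helper name byte_values is made up for the example.

```python
def byte_values(ch: str) -> tuple:
    """Return (EBCDIC 037 byte, ASCII byte) for one character (hypothetical helper)."""
    return ch.encode("cp037")[0], ch.encode("ascii")[0]


if __name__ == "__main__":
    for ch in ("A", "a", "0"):
        ebcdic, ascii_ = byte_values(ch)
        # Same character, different byte value on each side.
        print(f"{ch!r}: EBCDIC 037 = {ebcdic:02X}, ASCII = {ascii_:02X}")
```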

When connecting to a "z" based os, IIRC the mainframe side does the conversion no matter what, meaning both sending and receiving.

I believe that when doing distributed to distributed, the receiving side is responsible for doing the conversion.  I have never used Connect:Direct in a distributed-to-distributed environment.

What code pages are you trying to convert to/from?
xeniumAuthor Commented:
JustUNIX, need table, not command.

svs, yes, and in own format. But a text list like in my example is what is needed as interim.

giltjr, yes mainframe would do the conversion, but i understand the transmission process refers to the translate table during transfer.

What code pages? n/a. I want to know how to do this in general. Is there some online library available for this kind of stuff?

Thanks all for your feedback so far.
ahoffmannCommented:
> need table, not command.
and in question
> conversion table is needed to translate the hexadecimal codes of the two respective systems.

In Unix this table is obsolete, because any hex representation can be converted to its character representation on the fly, without any table; Unix does not waste resources this way ;-)
Example (for ASCII):
  perl -le 'printf"%c",0x42'

So the question remains (at least to me) what you need this table for. If it is required for conversion between different representations, like ASCII, EBCDIC, IBM, etc., then see dd (as already suggested) or provide more information about what you want to achieve.
giltjrCommented:
For Connect:Direct to do the conversion as part of the transfer, you would need to refer to their documentation on how to set up a mapping table and how to specify which table to use.

Each product has its own way of setting up the table: some products use a text file in a specific format, while others require you to "assemble" the table using some programming language.  Some products use a unique file for each conversion table, while others use a single table with multiple entries.

Each product has its own parameter for specifying which table, or table entry, to use.

As for a generic way of doing conversions, it really depends.  Some programs have a mapping table where the current and new values are specified in two columns.  So for that you would:

     read byte
     find byte in table
     look at what it is to be converted to
     convert

In your example the table would have something like

   EA,EA
   EB,EB
   EC,EC
   ED,D2
   EF,EF

So you would read until you matched on ED, check to see that it is changed to D2, and do the change.
Other programs have a "single" column, and the replacement value is located at the offset within the table of the character it is replacing.  Using your example again, you would have D2 located at offset x'ED' (237 bytes) into the table.  Your program would then look 237 bytes into the table and use whatever character is located there.

I will say that programs on the mainframe use the second way of setting up the table.  I would assume that most distributed platforms also use the second method.  I have only seen a few programs use the first method, as it takes up more storage: you would have 512 bytes per table using the first method and only 256 bytes using the second.  Although this does not seem like much, if you have 100 tables it starts to add up.  Also, it is generally easier and requires less CPU to go to an offset within a table than to do compares of storage locations.
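
The two table layouts described above can be sketched in Python as follows. This is only an illustration, not how Connect:Direct or any particular product stores its tables; the names PAIRS, OFFSET_TABLE, convert_by_search and convert_by_offset are made up, and the sketch assumes that bytes missing from the two-column table pass through unchanged.

```python
# Method 1: two-column table of (from, to) pairs, searched entry by entry.
PAIRS = [(0xEA, 0xEA), (0xEB, 0xEB), (0xEC, 0xEC), (0xED, 0xD2), (0xEF, 0xEF)]


def convert_by_search(byte: int) -> int:
    for src, dst in PAIRS:
        if src == byte:          # compare until the source value matches
            return dst
    return byte                  # assumption: unlisted bytes are left unchanged


# Method 2: single-column 256-byte table; the replacement for byte N
# is stored at offset N.
OFFSET_TABLE = bytearray(range(256))   # start from the identity mapping
OFFSET_TABLE[0xED] = 0xD2              # D2 sits 237 bytes into the table


def convert_by_offset(byte: int) -> int:
    return OFFSET_TABLE[byte]          # one index operation, no compares


if __name__ == "__main__":
    data = bytes([0xEA, 0xED, 0x41])
    print(bytes(convert_by_search(b) for b in data).hex())  # ead241
    print(data.translate(bytes(OFFSET_TABLE)).hex())        # ead241
```

The second form is also what bytes.translate expects, which is why the whole buffer can be converted in a single call.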

I can't really tell, are you looking for something specific, or are you just attempting to learn in general how this is done?  You have mentioned some specific platforms and products, but then you also state you want general information.

Here is a tutorial on some of this:

http://www.cs.tut.fi/~jkorpela/chars.html
xeniumAuthor Commented:
giltjr, yes thanks, as you mention,  my table would look something like:
"   EA,EA
   EB,EB
   EC,EC
   ED,D2
   EF,EF
"
etc (only in my question, i added a column showing the display character, for reference only)

This is what i want - how do i get it? Assuming it depends on things like codepage and op system, i imagine there may be websites where i can choose these from a drop-down list, and get back a table?

btw thanks for the tutorial, looks like it may be useful.
xeniumAuthor Commented:
(points will be 500 if the solution is as simple as i hope! eg website providing all such tables for free)
giltjrCommented:
What code pages do you want to convert between?  You can always go to Wikipedia:

     http://en.wikipedia.org/wiki/Code_page

and they have links to various code pages.  Each OS supports its own code pages, and in some cases individual products may have their own code page tables and conversion routines.
svsCommented:
I don't think there's any web site that will generate a translation table for you, but you can try the attached Perl one-liner.

Substitute FROM and TO with codepage names.  You'll get a translation table on standard output; it could be incomplete if some characters do not exist in the target codepage.
perl -MEncode -e 'for (0..255) { $u = Encode::decode("FROM", chr($_)); $t = eval { Encode::encode("TO", $u, 1) }; if (defined $t && length($t) == 1) { printf("%02X,%02X\n", $_, ord($t)); }}'
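
For readers without Perl, roughly the same table can be produced with a short Python script. This is a sketch under assumptions: the codec names "cp037" and "latin-1" are only example choices for the two code pages, and build_table is a made-up helper name. Like the Perl version, it only handles pairs of single-byte code pages and skips characters that the target cannot represent.

```python
def build_table(src: str, dst: str) -> list:
    """Return (source byte, target byte) pairs for two single-byte code pages."""
    pairs = []
    for value in range(256):
        try:
            char = bytes([value]).decode(src)   # source byte -> character
            target = char.encode(dst)           # character -> target byte(s)
        except UnicodeError:
            continue  # byte undefined in src, or character missing from dst
        if len(target) == 1:                    # keep single-byte results only
            pairs.append((value, target[0]))
    return pairs


if __name__ == "__main__":
    # Example code pages (assumptions): EBCDIC 037 to ISO 8859-1.
    for source, target in build_table("cp037", "latin-1"):
        print(f"{source:02X},{target:02X}")
```

Save it to a file and run it with python3; the output uses the same FROM,TO hex layout as the listing discussed earlier in the thread.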

xeniumAuthor Commented:
giltjr, thanks for the link

svs, i assume that op system should also be a variable? If not, why not? PS also i don't know how to run perl ! :-(
svsCommented:
Sorry, I don't understand why you think that 'op system' should be a variable.

If you have perl on your system, just copy/paste this code to command line and run it.
giltjrCommented:
I'm still confused as to what he is attempting to do.  Could you try to let us know what exactly you are doing?
xeniumAuthor Commented:
Ok, i've read some of the useful wiki link and it's starting to make some sense.

You are right, operating system is not a variable, only codepage is. My confusion arose because some codepages exist only on some operating systems.

Now, an example translation i need is EBCDIC 037 to Unicode (if i understand correctly now).  My question gave a clue by example: Ò  Hex: ED -> D2 (this translation i found by inspection - ie viewing the hex values on mainframe and then windows)

The full listing is here http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT
(via the wiki link http://en.wikipedia.org/wiki/EBCDIC_037 - but wiki seems to have at least one typing error eg FB-D8, correct is FB-DB)
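
A listing in that layout (byte value, Unicode value, then a "#" comment, separated by tabs) can be turned into a FROM,TO table with a few lines of Python. This is a sketch only: the SAMPLE text below is a hand-typed excerpt in that layout rather than the downloaded file, and parse_mapping is a made-up helper name.

```python
SAMPLE = """\
# hand-typed sample in the layout of the unicode.org mapping files
0xC1\t0x0041\t#LATIN CAPITAL LETTER A
0xED\t0x00D2\t#LATIN CAPITAL LETTER O WITH GRAVE
0xFB\t0x00DB\t#LATIN CAPITAL LETTER U WITH CIRCUMFLEX
"""


def parse_mapping(text: str) -> dict:
    """Map each code page byte to its Unicode code point."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue                           # byte with no Unicode assignment
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


if __name__ == "__main__":
    for byte, code_point in sorted(parse_mapping(SAMPLE).items()):
        print(f"{byte:02X},{code_point:04X}")
```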

In general, it seems i just need to identify the codepages to convert from/to, and search wiki for the relevant translation table. Pls let me know if any different. Other useful website links are welcome.

As for implementing the translation, this is done by Connect:Direct. Just i need to type in the translate table to the Connect:Direct system, or maybe obtain from the vendor (Sterling Commerce).
giltjrCommented:
You may want to check the Connect:Direct doc.  How it USED to work (it's been 10-15 years since I have used Connect:Direct) was that the Unix/Windows side told the z/OS side what code page it was using, and the z/OS side would convert from that to whatever the z/OS side was configured to use by default.  The only issue that would arise was if the z/OS side did not have a conversion table for whatever the Unix/Windows side was using.
svsCommented:
This discussion strongly reminds me of classic "Unix Consultant" text: http://fgouget.free.fr/fun/UnixConsulting.shtml...

xenium: Unicode is not a single-byte encoding; do you really need to convert from CP037 to Unicode?
giltjrCommented:
That is hilarious.
ahoffmannCommented:
> .. translation i need is EBCDIC 037 to Unicode
this is what the suggested perl solution with the Encode module does
Please get used to perl and Encode.pm and your conversion most likely will be a one-liner.

> .. Unicode is not a single-byte encoding
maybe someone is talking about Unicode while UTF8 is meant ;-)
svsCommented:
My perl code will work only for a pair of single-byte encodings; UTF-8 isn't one (it's variable-length multibyte) and neither is Unicode (it's wide characters)...
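
The single-byte versus multibyte distinction is easy to see by encoding one character several ways. The Python sketch below uses the Ò from the question; the choice of codecs and the helper name encoded_hex are just for illustration.

```python
def encoded_hex(ch: str, encoding: str) -> str:
    """Hex dump of one character in the given encoding (hypothetical helper)."""
    return ch.encode(encoding).hex()


if __name__ == "__main__":
    ch = "\u00d2"  # the character Ò from the question
    for encoding in ("cp037", "latin-1", "utf-8", "utf-16-be"):
        # cp037 and latin-1 yield one byte; utf-8 and utf-16-be yield two.
        print(f"{encoding:10} {encoded_hex(ch, encoding)}")
```

A one-byte-to-one-byte table can only express the first two cases, which is why the table generator is limited to single-byte code pages.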
ahoffmannCommented:
in perl there're various ways for encoding (as usual: there is more than one way to do it in perl:)
  use utf8;
  use Convert;
  use Encode;
  use Unicode;
  use Unicode::String;
  # man perlpacktut perlunicode perluniintro perlebcdic
  # perldoc -f pack
  # perldoc -f unpack
# .. and probably many more ..
xeniumAuthor Commented:
Thanks all for your input. My previous comment summarises the info needed. Sorry to confuse some of you, but glad to have led it to some entertainment!

(btw, perl is not relevant here, since the requirement is to use the proprietary system "Connect:Direct", as mentioned in the question)
svsCommented:
...so, you have misled all of us for entertainment?
xeniumAuthor Commented:
Entertainment by you for you.. i didn't get the jokes!