Alexey Fedorov asked:

Storing Unicode strings in DBF

Hello!

I need to exchange data in DBF format that contains localized strings (Korean, Chinese, ...). Is there a way to store UTF-8 inside a DBF? For example, inside a Varchar (binary), Memo (binary) or Character (binary) field.

Ideally, a single DBF should hold strings in several languages at the same time.

Thanks and good luck!
ASKER CERTIFIED SOLUTION
suhashegde

This solution is only available to members.
Cyril Joudieh
I tried it this way: if you have the Unicode text in Excel or XML, you can append it to a DBF using APPEND FROM ... TYPE XLS. The only condition is that the locale ID of the operating system must match that language (Chinese, Korean).

Check CPCURRENT(1), which returns the current Windows code page.
Alexey Fedorov

ASKER

Thanks!
This is a serious limitation of the DBF format: strings are stored as code-page-encoded bytes ("ASCII") rather than Unicode. As a result, only one code page (CP) can be used per DBF, and I have to know which one.
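
One place to look for that code page is the DBF header itself: in the dBASE/FoxPro layouts, byte 29 holds a language-driver (code page mark) ID. The Python sketch below reads it. The function name `dbf_codec` and the table `CODE_PAGE_MARKS` are illustrative assumptions, the table is deliberately partial, and many writers leave the byte as 0, in which case the code page still has to be known from elsewhere.

```python
# Partial map of DBF language-driver IDs (header byte 29) to Python codec names.
# Only a handful of well-known IDs are listed; this is not a complete table.
CODE_PAGE_MARKS = {
    0x01: "cp437",   # U.S. MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x78: "cp950",   # Traditional Chinese
    0x79: "cp949",   # Korean
    0x7A: "gbk",     # Simplified Chinese (code page 936)
    0x7B: "cp932",   # Japanese
    0xC8: "cp1250",  # Eastern European Windows
    0xC9: "cp1251",  # Russian Windows
}


def dbf_codec(header: bytes):
    """Return the codec name recorded in a DBF header, or None if it is unset or unknown."""
    if len(header) < 32:
        raise ValueError("a DBF header is at least 32 bytes long")
    return CODE_PAGE_MARKS.get(header[29])
```

A `None` result means the file does not say which code page it uses, which is exactly the situation described above.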

I solved the problem with a decoder. I read the DBF through Jet.OLEDB.4.0:
Data Source=C:\Work\CATALOG_TEXT\;Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=dBASE 5.0;

And I wrote the following code for an SSIS Script Component (it is a helper assembly in C#):

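// Requires: using System; using System.Globalization; using System.Text; using System.Threading; using Microsoft.Win32;
private Encoding _srcEnc, _dstEnc;

// Initialization (run once, e.g. in the constructor).
// _srcEnc: the code page Jet decoded the DBF bytes with.
// _dstEnc: the code page the text was actually written in (949 = Korean).
_srcEnc = Encoding.GetEncoding( GetActiveCodePage(), new EncoderExceptionFallback(), new DecoderExceptionFallback() );
_dstEnc = Encoding.GetEncoding( 949, new EncoderExceptionFallback(), new DecoderExceptionFallback() );

// Re-encodes the mis-decoded string to recover the raw DBF bytes, then decodes them as CP 949.
private String ConvertAsciiToUnicode( String str )
{
    Byte[] mbcs = _srcEnc.GetBytes( str );
    return _dstEnc.GetString( mbcs );
}

// Returns the code page Jet's Xbase engine uses (ANSI or OEM), as set by the DataCodePage registry value.
private static Int32 GetActiveCodePage()
{
    // Registry.GetValue returns null when the key itself is missing, so fall back to "OEM" explicitly.
    Object kVal = Registry.GetValue( @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase", "DataCodePage", "OEM" );
    String v = ( kVal as String ?? "OEM" ).ToLower();
    TextInfo textInfo = Thread.CurrentThread.CurrentCulture.TextInfo;
    return v == "ansi" ? textInfo.ANSICodePage : textInfo.OEMCodePage;
}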

I use ConvertAsciiToUnicode to decode the Korean (CP 949) text coming from the DBF.
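
The same round trip can be sketched in Python (used here instead of C# only for brevity). The helper name `redecode` is hypothetical, and CP 437 stands in for whatever OEM page the driver actually applied. The trick is lossless only when the intermediate code page maps every byte value to a distinct character, which CP 437 does.

```python
def redecode(garbled: str, wrong_codec: str, real_codec: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    # Strict error handling: raises UnicodeEncodeError if a character is not in wrong_codec.
    raw = garbled.encode(wrong_codec)
    # Raises UnicodeDecodeError if the recovered bytes are not valid in real_codec.
    return raw.decode(real_codec)


# Simulate what the driver hands over: CP 949 bytes decoded with OEM page 437.
original = "한글"
garbled = original.encode("cp949").decode("cp437")

print(repr(garbled))                          # box-drawing mojibake
print(redecode(garbled, "cp437", "cp949"))    # the Korean text again
```

Like the exception fallbacks in the C# version, the strict defaults make a wrong guess about either code page fail loudly instead of silently corrupting text.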
suhashegde

BTW,
If you have good expertise in C# or C++, you could have gone for an FLL that reads the tables directly and returns the converted encoding. That would have been faster, with no need for the Jet driver.
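
To illustrate what reading the tables directly involves, here is a rough Python sketch (not an FLL, and not the code from this thread). It assumes the classic dBASE/FoxPro layout: a 32-byte header, 32-byte field descriptors ended by 0x0D, then fixed-width records. The function name `read_dbf_strings` is hypothetical, only Character fields are handled, and the caller must supply the code page.

```python
import struct


def read_dbf_strings(data: bytes, codec: str):
    """Yield each live record of an in-memory DBF as a dict of decoded 'C' fields."""
    # Header: record count (uint32 at 4), header size (uint16 at 8), record size (uint16 at 10).
    n_records, header_size, record_size = struct.unpack_from("<IHH", data, 4)

    # Field descriptors: name in bytes 0-10, type in byte 11, length in byte 16.
    fields = []   # (name, type, offset within record, length)
    offset = 1    # byte 0 of every record is the deletion flag
    pos = 32
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(data[pos + 11])
        length = data[pos + 16]
        fields.append((name, ftype, offset, length))
        offset += length
        pos += 32

    for i in range(n_records):
        start = header_size + i * record_size
        record = data[start:start + record_size]
        if record[:1] == b"*":   # '*' marks a deleted record
            continue
        # Strip the space padding from the raw bytes, then decode with the given code page.
        yield {
            name: record[off:off + length].rstrip(b" \x00").decode(codec)
            for name, ftype, off, length in fields
            if ftype == "C"
        }
```

Because the bytes never pass through a driver's own decoding, no re-encoding step is needed: each field is decoded once with the right code page.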