Alexey Fedorov
asked on
Storing Unicode strings in DBF
Hello!
There is a requirement to interchange data in DBF format containing localized strings (Korean, Chinese...). Are there ways of storing UTF-8 inside a DBF? For example, inside Varchar (binary), Memo (binary) or Character (binary) fields.
It is desirable to store strings in several languages in one DBF at one time.
Thanks and good luck!
ASKER
Thanks!
This is a serious limitation of the DBF format: strings are stored as bytes in a single code page (loosely, "ASCII") rather than as Unicode. As a result, only one code page can be used in one DBF, and I have to know which one it is.
I solved this problem with a decoder. I read the DBF through Jet.OLEDB.4.0:
Data Source=C:\Work\CATALOG_TEXT\;Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=dBASE 5.0;
And I wrote the following code for an SSIS Script Component (it is a helper assembly in C#):
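using System;
using System.Text;
using System.Threading;
using Microsoft.Win32;

private Encoding _srcEnc, _dstEnc;

// Initialization, run once before the first conversion.
// The exception fallbacks make unmappable bytes fail loudly instead of turning into '?'.
_srcEnc = Encoding.GetEncoding( GetActiveCodePage(), new EncoderExceptionFallback(), new DecoderExceptionFallback() );
_dstEnc = Encoding.GetEncoding( 949, new EncoderExceptionFallback(), new DecoderExceptionFallback() );

// Re-encode the string Jet returned to recover the raw DBF bytes, then decode them as CP 949.
private String ConvertAsciiToUnicode( String str )
{
    Byte[] mbcs = _srcEnc.GetBytes( str );
    return _dstEnc.GetString( mbcs );
}

// Code page selected by the DataCodePage setting of Jet's Xbase engine ("OEM" if it is not set).
private static Int32 GetActiveCodePage()
{
    Object kVal = Registry.GetValue( @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase", "DataCodePage", "OEM" );
    // Registry.GetValue returns null when the key itself is missing, so fall back to "OEM".
    String v = ((kVal as String) ?? "OEM").ToLower();
    return v == "ansi" ? Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage : Thread.CurrentThread.CurrentCulture.TextInfo.OEMCodePage;
}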
I use ConvertAsciiToUnicode to decode the Korean (CP 949) text coming from the DBF.
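The re-encode/decode trick above only works if the code page the driver decoded with gives every stored byte back unchanged. Here is a minimal sketch for checking that on a sample string before relying on it. The class and method names (CodePageRoundTrip, Redecode, SurvivesRoundTrip) are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Text;

public static class CodePageRoundTrip
{
    // Re-encode the mis-decoded string with the code page the driver used,
    // then decode the recovered bytes with the code page the data is really in.
    public static string Redecode(string misdecoded, Encoding driverEnc, Encoding actualEnc)
    {
        byte[] raw = driverEnc.GetBytes(misdecoded);
        return actualEnc.GetString(raw);
    }

    // Simulates the whole path for a sample string: store it as bytes in the
    // actual code page, let the "driver" decode those bytes with the wrong
    // code page, then try to recover the original text.
    public static bool SurvivesRoundTrip(string text, Encoding driverEnc, Encoding actualEnc)
    {
        byte[] stored = actualEnc.GetBytes(text);          // bytes as they sit in the DBF
        string misdecoded = driverEnc.GetString(stored);   // what the driver hands back
        return Redecode(misdecoded, driverEnc, actualEnc) == text;
    }
}
```

For the case in this thread the call would be something like SurvivesRoundTrip("한국어", Encoding.GetEncoding(437), Encoding.GetEncoding(949)), where 437 is only an assumed OEM code page. On .NET Core and later, code pages such as 437 and 949 are available only after registering CodePagesEncodingProvider; on the .NET Framework used by SSIS they are built in. If the check returns false, the driver's code page is dropping or merging bytes and the decoder cannot restore the text.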
BTW,
If you have good expertise in C# or C++, you could have gone for an FLL to read the tables directly and get the converted encoding. That would have been faster and would not need the Jet driver.
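As a rough illustration of reading the table directly, here is a sketch in plain C# (not an FLL) that parses the standard dBASE header and field descriptors and decodes Character fields with whatever encoding the caller supplies. RawDbfReader and ReadCharFields are hypothetical names. The sketch assumes a little-endian host, does no validation, and ignores memo fields and the code page mark in header byte 29.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RawDbfReader
{
    // Returns the Character ('C') fields of every non-deleted record,
    // decoded with the caller-supplied encoding.
    public static List<Dictionary<string, string>> ReadCharFields(Stream dbf, Encoding enc)
    {
        BinaryReader reader = new BinaryReader(dbf);

        // Fixed 32-byte file header; multi-byte values are little-endian.
        byte[] header = reader.ReadBytes(32);
        int recordCount = BitConverter.ToInt32(header, 4);     // bytes 4-7
        int headerLength = BitConverter.ToUInt16(header, 8);   // bytes 8-9
        int recordLength = BitConverter.ToUInt16(header, 10);  // bytes 10-11

        // Rest of the header: 32-byte field descriptors, terminated by 0x0D.
        byte[] desc = reader.ReadBytes(headerLength - 32);
        List<string> names = new List<string>();
        List<char> types = new List<char>();
        List<int> lengths = new List<int>();
        for (int p = 0; p + 32 <= desc.Length && desc[p] != 0x0D; p += 32)
        {
            // Field name: up to 11 bytes, zero-padded.
            int zero = Array.IndexOf(desc, (byte)0, p, 11);
            int nameLength = zero < 0 ? 11 : zero - p;
            names.Add(Encoding.ASCII.GetString(desc, p, nameLength));
            types.Add((char)desc[p + 11]);   // field type, e.g. 'C'
            lengths.Add(desc[p + 16]);       // field length in bytes
        }

        // Records follow the header; byte 0 of each record is the deletion flag.
        List<Dictionary<string, string>> rows = new List<Dictionary<string, string>>();
        for (int r = 0; r < recordCount; r++)
        {
            byte[] rec = reader.ReadBytes(recordLength);
            if (rec.Length == 0 || rec.Length < recordLength) break;  // truncated file
            if (rec[0] == (byte)'*') continue;                        // deleted record

            Dictionary<string, string> row = new Dictionary<string, string>();
            int offset = 1;
            for (int f = 0; f < names.Count; f++)
            {
                if (types[f] == 'C')
                    row[names[f]] = enc.GetString(rec, offset, lengths[f]).TrimEnd(' ', '\0');
                offset += lengths[f];
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

Because the bytes never pass through a driver's code page, the same reader could decode one file as CP 949 (Encoding.GetEncoding(949)) and another as UTF-8 (Encoding.UTF8), as long as the producer and consumer agree on which encoding each file or field uses.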
Check CPCURRENT(1)