Question

Character encoding (Unicode to UTF-8) conversion problem

Asked by: cheekycj

I have run into a problem that I can't seem to find a solution to.

My users are copying and pasting from MS Word.  My DB is Oracle with its encoding set to "UTF-8".

Oracle's thin driver automatically converts strings to the DB's default character set.

When Java tries to encode Unicode to UTF-8 and runs into an unknown character (typically a character in the high-ASCII range), it substitutes '?' or some other weird character.

How do I prevent this?

I tried different encodings using a simple driver like:
class UnicodeConversionTest
{
    public static void main(String[] args)
    {
        try {
            String str = "`test3`";
            // Encoding to UTF-8 and decoding again is a lossless round trip for
            // well-formed text, so utfStr equals str here.
            String utfStr = new String(str.getBytes("UTF-8"), "UTF-8");
            System.out.println("Converted:" + str + " to:" + utfStr);
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}

But that didn't work.  Then I tried a more elaborate conversion:
// Note: sun.io is an internal, unsupported API.
import sun.io.ByteToCharConverter;

public class UnicodeTest {

    public static void main(String[] args) {
        try {
            // A ByteToCharConverter decodes bytes into chars; it does not encode.
            ByteToCharConverter fromUnicode = ByteToCharConverter.getConverter("US-ASCII");
            char[] subChars = { ' ' };
            fromUnicode.setSubstitutionMode(true);
            fromUnicode.setSubstitutionChars(subChars);
            String originalStr = "test3";
            // getBytes() with no argument uses the platform default charset.
            char[] convertedChars = fromUnicode.convertAll(originalStr.getBytes());
            String convertedStr = new String(convertedChars);
            //String convertedStr = new String(originalStr.getBytes("US-ASCII"), "US-ASCII");
            System.out.println("String:" + originalStr + " converted to:" + convertedStr);
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}

I tried a variation of the second code snippet that inserts into the DB - just to see the results and it was a no go.

I don't want '?' replacing the unknown chars.  I would rather strip them or replace them with ' ', but I haven't been able to get that to work (using the second bit of code).

Any ideas on what I am doing wrong?

Thanx,
CJ

This question has been solved and verified by the asker.

Asked on: 2003-02-13 at 10:47:31 (ID 20512969)
Tags: java, encoding
Topic: Java Programming Language
Participating experts: 9
Points: 200
Comments: 41

Answers

 

by: CEHJ Posted on 2003-02-13 at 10:51:44 ID: 7944022

>>System.out.println("Converted:" + str + " to:" + utfStr);

You're talking about '?' getting printed out unexpectedly by the above code, I take it?

 

by: cheekycj Posted on 2003-02-13 at 11:15:53 ID: 7944214

yes.

This works:
convertedStr = convertedStr.replace('\ufffd', ' ');

But I was hoping for a solution that wouldn't require me to replace the chars manually.

CJ
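For what it's worth, the "replace with a space instead of '?'" behaviour can be had from the encoder itself rather than from a manual replace() afterwards. Below is a minimal sketch using java.nio.charset (available since Java 1.4; StandardCharsets assumes Java 7 or later). The class and method names are made up for illustration. Note that UTF-8 can encode every Unicode character, so a substitution only happens when the target charset is narrower, such as US-ASCII or ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class UnmappableReplaceSketch {

    /**
     * Encodes text into the target charset and decodes it again, writing a
     * space wherever a character cannot be represented in that charset.
     * Assumes a single space byte is a legal replacement for the target
     * charset (true for US-ASCII and ISO-8859-1, not for e.g. UTF-16).
     */
    static String replaceUnmappable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { (byte) ' ' });
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            return target.decode(encoded).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked for simplicity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Curly quotes as pasted from a word processor.
        String pasted = "\u2018test3\u2019";
        System.out.println("[" + replaceUnmappable(pasted, StandardCharsets.US_ASCII) + "]");
    }
}
```

Swapping CodingErrorAction.REPLACE for CodingErrorAction.IGNORE would strip the characters instead of substituting a space.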

 

by: orangehead911 Posted on 2003-02-13 at 11:16:56 ID: 7944220

To me it sounds like the problem you're having is that the output device can't handle the odd characters! I have had the same result in the past.

Have you tried round-trip, meaning inserting from Java and then selecting from Java, showing the final result in a JTextField?

I am pretty sure that the driver works just fine, and that a call to Statement.execute or Statement.executeUpdate would encode your strings correctly.


 

by: cheekycj Posted on 2003-02-13 at 11:21:50 ID: 7944255

When I use the following (for testing purposes):

      String insertSql = "insert into unicode_test (string_id, string_value) values (?,?)";
      PreparedStatement ps = conn.prepareStatement(insertSql);
      ps.setInt(1, maxID);
      ps.setString(2, convertedStr);

      int rowsInserted = ps.executeUpdate();

The DB gets an inverted '?' stored as the character.

So the conversion that the JDBC driver is doing uses a weird character, and I don't want that character to be stored or displayed in my tools or site.

Retrieving the value from the DB and displaying it returns the string with the '?' or inverted '?'.

CJ

 

by: CEHJ Posted on 2003-02-13 at 11:24:36 ID: 7944282

>>yes.

You often can't print Unicode characters to the console, because it uses the platform's default encoding. Just show them in a GUI component and you'll probably find they're OK.

Another (rough) way to test:

String s = "\u20AC";
System.out.println(s.getBytes("UTF8").length);

You'll see that the length > 1
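A self-contained version of that check, as a rough sketch (StandardCharsets assumes Java 7 or later; the class and method names are illustrative). It also wraps System.out in a PrintStream with an explicit encoding. Whether the glyph then actually displays still depends on the terminal being set to UTF-8, which is outside the code's control.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingSketch {

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String euro = "\u20AC";

        // The euro sign encodes as E2 82 AC, so this prints 3.
        System.out.println(utf8Length(euro));

        // System.out encodes with the platform default charset; this stream
        // writes UTF-8 bytes regardless of the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(euro);
    }
}
```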

 

by: CEHJ Posted on 2003-02-13 at 11:27:03 ID: 7944300

OK - I'm not getting updates from this thread. You're ahead of me!

 

by: cheekycj Posted on 2003-02-13 at 11:33:08 ID: 7944353

:-)

It's not the console I am worried about; it is the data being stored in the DB, which contains this ugly character that other languages querying that data are displaying.

CJ

 

by: orangehead911 Posted on 2003-02-13 at 11:33:26 ID: 7944357

Why are you converting the string before you send it to the DB? The driver should be handling all necessary conversion!


 

by: cheekycj Posted on 2003-02-13 at 11:39:38 ID: 7944406

The driver's conversion is resulting in the string being stored with those inverted '?' or '\ufffd' chars in the DB.  I don't want those in the DB.  Since the JDBC driver's conversion is doing this, I want to convert before the driver does, so that unsupported chars are not replaced with ugly characters in the DB.

CJ

 

by: CEHJ Posted on 2003-02-13 at 11:41:25 ID: 7944423

That's what I was about to ask too.

>>that other languages querying that data are displaying.

They're all displaying the same thing, are they?

 

by: cheekycj Posted on 2003-02-13 at 11:48:32 ID: 7944481

Well, the other languages querying the DB (besides Java) are Perl, ColdFusion and C.

They display the inverted '?' b/c that is what they get back from the DB.

ColdFusion and Perl also insert into the DB.  ColdFusion is having the same problem as Java.  Perl reads an environment variable called 'NLS_LANG', which fixes this issue because we set the encoding to 'WE8ISO8859P1'.

The OCI Oracle driver supports the 'NLS_LANG' setting, and I set it for that, but the thin driver (which is used 100% of the time) ignores all client settings.

CJ

 

by: orangehead911 Posted on 2003-02-13 at 11:50:35 ID: 7944496

Is the DB driver set up correctly, or is it auto-configured?

Regarding Unicode and UTF-8: either the DB supports UTF-8 or it doesn't; Unicode itself never enters the stage at that level. That is what UTF-8 is for: encoding Unicode characters as sequences of discrete 8-bit values. It sounds more like there is an 8-bit to 7-bit conversion problem.

Have you tried printing out the actual byte values? Have you tried to store the UTF-8 byte buffer as binary in the DB?


 

by: CEHJ Posted on 2003-02-13 at 12:02:19 ID: 7944578

>>I want to convert before the driver does it

I wouldn't increase complexity. All you need to do is ascertain

a. if the db actually supports UTF-8
b. if the driver supports it

 

by: cheekycj Posted on 2003-02-13 at 12:36:03 ID: 7944797

The DB is Oracle 8.1.7 and the driver is Oracle's JDBC thin driver (version 8.1.7 too).

I believe both support UTF-8.

CJ

 

by: CEHJ Posted on 2003-02-13 at 12:43:27 ID: 7944845

OK. As far as the DB is concerned, are the non-Java clients able to both read and write their own UTF-8?

 

by: objects Posted on 2003-02-14 at 01:36:33 ID: 7948424

Reading the following, it appears to me that a mapping exists for all characters:
http://www.sun.com/developers/gadc/technicalpublications/articles/utf8.html

Or am I missing something?

There's also a bit of code to encode UTF-8 that may (or may not) help you.

 

by: cheekycj Posted on 2003-02-14 at 13:09:24 ID: 7952761

I am looking into some things with the DBAs.  I will give an update on Tuesday (at the latest).

CJ

 

by: CEHJ Posted on 2003-02-14 at 13:18:46 ID: 7952829

OK. If they answer my last question in the affirmative, it will eliminate the database from the problem.

 

by: cheekycj Posted on 2003-02-19 at 11:41:23 ID: 7983201

Still waiting on verification that non-Java clients can read and write UTF-8 to the DB.  Supposedly in Perl they set the environment variable NLS_LANG to a Western charset (some "WEP..." string) and it works, but I want proof :-)

objects: the encode method (given in the URL) didn't work.
I used the following code:
      char[] charArray = originalStr.toCharArray();
      int[] scalarArray = new int[charArray.length];
      for (int i = 0; i < charArray.length; i++)
        scalarArray[i] = Character.getNumericValue(charArray[i]);
      System.out.println("Conversion using encode results in:" + new String(encode(scalarArray), "UTF-8"));

I think that should be the way to convert Unicode to scalar values.

How can I verify that the driver supports UTF-8?  All the documentation seems to point that way.

CJ
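One possible reason the snippet in this post misbehaves: Character.getNumericValue() does not return a character's Unicode scalar value. It returns the number a digit-like character stands for (so 't' gives 29, and a curly quote gives -1). A hedged sketch of getting actual scalar values, assuming Java 5 or later for codePointAt(); the class and method names are illustrative, and the encode() method from the linked article is not reproduced here.

```java
public class ScalarValuesSketch {

    /** Returns the Unicode scalar value (code point) of each character in the text. */
    static int[] toScalarValues(String text) {
        int[] scalars = new int[text.codePointCount(0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            scalars[n++] = codePoint;
            // Characters outside the BMP take two chars (a surrogate pair).
            i += Character.charCount(codePoint);
        }
        return scalars;
    }

    public static void main(String[] args) {
        // getNumericValue treats letters as digits in bases up to 36.
        System.out.println(Character.getNumericValue('t'));

        for (int codePoint : toScalarValues("\u2018t\u2019")) {
            System.out.printf("U+%04X%n", codePoint);
        }
    }
}
```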

 

by: CEHJ Posted on 2003-02-20 at 14:03:16 ID: 7989825

You might try


SELECT DUMP(some_column, 1016) FROM some_table

then you compare the value with a UTF-8-encoded string holding the column value.

What does this prove? Good question! Possibly that the column value can be de/encoded as UTF-8
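To have something to compare the DUMP output against, the Java side can print the UTF-8 bytes of the expected string as comma-separated hex. This is only a sketch: the class and method names are invented, and the unpadded lowercase hex layout is an assumption about what the dump prints.

```java
import java.nio.charset.StandardCharsets;

public class DumpCompareSketch {

    /** Formats the UTF-8 bytes of the text as comma-separated lowercase hex. */
    static String utf8Hex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(Integer.toHexString(bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Curly single quotes around "test".
        System.out.println(utf8Hex("\u2018test\u2019"));
        // prints e2,80,98,74,65,73,74,e2,80,99
    }
}
```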

 

by: cheekycj Posted on 2003-03-03 at 12:44:36 ID: 8059863

CEHJ: that just dumps out the numeric representation, which is Oracle-specific, right?

I am sorry that this question has to remain open this long, but we are busy, and fixing this seems to be low on the list yet significant enough to devote time to, if you know what I mean :-)

Please be patient.

CJ

 

by: CEHJ Posted on 2003-03-03 at 13:23:40 ID: 8060123

>>that just dumps out the numeric representation that is oracle specific right?

Well it should dump a hex representation of what's in there, so if it dumps utf-8, then it's a good sign!

 

by: kennethxu Posted on 2003-03-05 at 19:08:17 ID: 8077031

>> Perl reads a environment setting var called 'NLS_LANG' that fixes this issue as we set the encoding to 'WE8ISO8859P1'.
Check with your DBA. I remember that you must specify multi-byte support when the Oracle database is created. WE8ISO8859P1 is a single-byte character set.

Another point is the data source: how do you get the original string? Is it entered by the user in a Java GUI?

 

by: cheekycj Posted on 2003-03-06 at 09:02:05 ID: 8081612

Well, the string is entered into a Java GUI and then inserted into the DB.

It is retrieved in various places using Perl, C++, and Java.

CJ

 

by: kennethxu Posted on 2003-03-06 at 10:49:37 ID: 8082516

I would try this:
Get the data from the GUI, say into a string "original", and display it back in the GUI. Then save it to the database without any conversion, using the prepared statement's setString() method. Retrieve it from the database immediately, without conversion, into a string "fromdb", and display "fromdb" in the GUI again.

I guess you might have already done this. What was the result?

 

by: CEHJ Posted on 2003-03-06 at 10:52:14 ID: 8082533

So what was the result of the dump?

 

by: cheekycj Posted on 2003-03-06 at 11:32:32 ID: 8082871

We don't do any conversion in the GUI.  We get the string and use a prepared statement to insert it.

But high-ASCII characters like “ (ALT-0147) and ” (ALT-0148)
show up as inverted chars in the DB and are then retrieved as such.  When you paste them into the GUI they show up as small squares.


the results of the dump:
Typ=1 Len=6 CharacterSet=UTF8: 60,74,65,73,74,60
Typ=1 Len=6 CharacterSet=UTF8: 60,74,65,73,74,60
Typ=1 Len=6 CharacterSet=UTF8: 60,74,65,73,74,60
Typ=1 Len=6 CharacterSet=UTF8: 60,74,65,73,74,60
Typ=1 Len=6 CharacterSet=UTF8: 60,74,65,73,74,60
Typ=1 Len=10 CharacterSet=UTF8: e2,80,98,74,65,73,74,e2,80,99
Typ=1 Len=10 CharacterSet=UTF8: e2,80,98,74,65,73,74,e2,80,99
Typ=1 Len=10 CharacterSet=UTF8: e2,80,98,74,65,73,74,e2,80,99

CJ

 

by: CEHJ Posted on 2003-03-06 at 11:58:07 ID: 8083103

OK - both of those Strings are valid UTF-8. Can you recap what you think the problem is?
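That claim can be checked mechanically by feeding the hex values from the dump to a strict UTF-8 decoder, one that reports malformed input instead of quietly substituting U+FFFD. A minimal sketch, with illustrative class and method names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DumpDecodeSketch {

    /**
     * Parses comma-separated hex byte values and decodes them as UTF-8.
     * Throws IllegalArgumentException if the bytes are not valid UTF-8.
     */
    static String decodeDump(String hexList) {
        String[] parts = hexList.split(",");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i].trim(), 16);
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-8: " + hexList, e);
        }
    }

    public static void main(String[] args) {
        // The two distinct rows from the dump output in this thread.
        System.out.println(decodeDump("60,74,65,73,74,60"));
        String curly = decodeDump("e2,80,98,74,65,73,74,e2,80,99");
        // The second row decodes to U+2018 t e s t U+2019.
        System.out.println(curly.equals("\u2018test\u2019"));
    }
}
```

So the second row holds real curly quotes (U+2018 and U+2019), correctly encoded as three bytes each.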

 

by: kennethxu Posted on 2003-03-06 at 16:59:11 ID: 8085006

>> show up as inverted chars in the DB and then are retrieved as such.
When we say "show up", it must be through some front end. So what you see is the presentation of your data by that particular front-end application (be it the DB, third-party tools or utilities).

>> When you paste them into the gui they show up as small squares.
Copy and paste passes the data through the Windows clipboard, which does its own conversion.

Why not try to retrieve it directly using Java code and display it in a Java GUI?

 

by: cheekycj Posted on 2003-03-10 at 14:56:19 ID: 8106658

I mean on the front end (using Perl etc.) and when we use PL/SQL Developer to query the data from the DB.

>>why not try to retrieve it back directly using java code and display it on java GUI

The data is used on a website driven by Perl and Java.  But the data is entered in a Java GUI tool.

CJ

 

by: kennethxu Posted on 2003-03-10 at 15:21:12 ID: 8106802

I mean display it back in the Java GUI to verify the data, for debugging purposes.

It could be that Java inserted the right data but Perl and/or PL/SQL Developer cannot display it correctly. Displaying it back in the Java GUI can isolate the problem.

 

by: cheekycj Posted on 2003-03-11 at 07:48:48 ID: 8111916

ok, I will try that.

CJ

 

by: kfahrut Posted on 2003-04-02 at 12:49:14 ID: 8256512

I have some experience saving and retrieving encoded data. Here are my comments.

First, you need to know exactly what the NLS_CHARACTERSET setting for the database is. Use something like:
  select * from NLS_DATABASE_PARAMETERS;
You will probably have either WE8ISO8859P1 or UTF8.

1. If your database encoding is UTF8, you should NOT encode characters yourself! The reason is that the driver encodes the output string automatically, so your string would be UTF-8 encoded twice.

Every high-ASCII character is UTF-8 encoded as two or three bytes. For example, "é" (hE9) is encoded as the two bytes hC3 hA9, which look like "Ã©" when read as ISO-8859-1. If you encode it yourself as well, you end up with four or more bytes, because the "Ã" (hC3) and "©" (hA9) characters are each UTF-8 encoded again.

To check what you have, save a fixed string such as "==é==" and read it back. See how many characters were saved in the database: you should see either "==é==" (5 characters) or, if it was encoded twice, "==Ã©==" (6 characters). You should retrieve exactly your 5 characters, because the driver should decode UTF-8 back to Unicode. Check that the hexadecimal values you retrieve match those you tried to save.

Win CP-1252 glyphs (e.g. TM or mdash) in the range 128 (80h) to 159 (9Fh) should be treated differently - they should not be UTF-8 encoded with the others. I can explain why and how if you are interested.

2. If your database encoding is WE8ISO8859P1, you can use your own application-level UTF-8 encoding. But in this case every other client reading from the DB has to be aware of it, so that it can decode the data "manually" as well.
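The double-encoding effect from point 1 can be reproduced without a database. The sketch below simulates an application that encodes to UTF-8 itself and then hands the result over as if each byte were a character, after which a second UTF-8 encoding is applied. The class and method names are illustrative, and the driver's behaviour is only simulated here, not called.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingSketch {

    /** Encodes to UTF-8, then misreads each resulting byte as one ISO-8859-1 character. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "==\u00E9==";                    // "==é==", 5 characters
        String mangled = misreadUtf8AsLatin1(original);    // "==Ã©==", 6 characters

        // Encoding once gives 6 bytes (é becomes C3 A9).
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);

        // Encoding the mangled string again gives 8 bytes (C3 83 C2 A9 for é).
        System.out.println(mangled.getBytes(StandardCharsets.UTF_8).length);
    }
}
```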


 

by: cheekycj Posted on 2003-05-26 at 11:19:10 ID: 8585576

time to close this out guys.

I am splitting the pts btw various experts since I believe it was a combination of GUI issues and DB.  I will award pts to those who gave me the best resources and tips to resolve the problem.

Thanx everyone!!

CJ

 

by: objects Posted on 2003-05-26 at 16:36:03 ID: 8586675

Thanks CJ :)

http://www.evalu8.com.au
"Giving everyone a voice"

 

by: plsql Posted on 2003-06-05 at 06:11:40 ID: 8656783

>>Comment from kfahrut
>>Win CP-1252 glyphs (e.g. TM or mdash) in the range 128 (80h) to 159 (9Fh) should be treated differently - they should not be UTF-8 encoded with others. I can explain why and how if you are interested.

I am experiencing the same problems. Would you please explain this encoding matter to me?

Thx plsql

 

by: kfahrut Posted on 2003-06-05 at 08:35:52 ID: 8658188

OK, here's my offline addition. Hope it helps.
Here's the situation we have: our editors edit texts and submit them as ISO-8859-1 to an Oracle database with ISO-8859-1 encoding. The issue is that when they enter characters like mdash or TM (as Unicode characters), Microsoft tools convert them to CP-1252, so mdash becomes 97h and TM becomes 99h. Although CP-1252 characters in the range 128 (80h) to 159 (9Fh) have no printable meaning in ISO-8859-1 (that range is reserved for control codes), both the web browser and the Oracle database accept and save them.

So now we have CP-1252 characters in an ISO-8859-1 database. If we read them back as if they were ISO-8859-1 and send them to the browser as ISO-8859-1, it works! We even tried it with a Netscape browser on an Apple machine, and all the Windows CP-1252 characters displayed properly on the ISO-8859-1 HTML page. I would never have guessed that Netscape on Apple would render CP-1252 characters as if they were true ISO-8859-1 chars!

Now comes the problem. If we read those characters from the database and send the response back UTF-8 encoded, the CP-1252 characters are UTF-8 encoded simply by prefixing C2h, so mdash (97h) becomes C2h 97h. Now we can't see those UTF-8 encoded CP-1252 characters in either MS IE or Netscape (with UTF-8 encoding for the HTML page). For example, if we send them back through web services (which use UTF-8), the client can't see mdashes, TMs, etc.

To solve the issue we have (more than) two approaches.
1. Before saving supposedly ISO-8859-1 characters to the database, scan the string for the CP-1252 codes that are not valid ISO-8859-1 text: 128 (80h) to 159 (9Fh).
Substitute them with HTML or XML named or numeric entities, so that instead of saving the one-byte mdash character 97h you save "&mdash;".
2. If you already have those parasitic CP-1252 chars in your ISO-8859-1 database, then before UTF-8 encoding the string and sending it to the client, either do the same (substitute the CP-1252 characters with HTML or XML named or numeric entities) or, which is what I am currently doing, have the Java program substitute the Unicode chars 0080h to 009Fh with their true Unicode equivalents, so that the mdash character 0097h is replaced with the true Unicode 2014h, TM 0099h with 2122h, etc.
Now if this Java Unicode string, free of CP-1252 chars, is encoded as UTF-8, those characters will be properly UTF-8 encoded as three bytes: E2 80 94h (mdash) and E2 84 A2h (TM).
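The substitution described in approach 2 can be sketched as a small lookup table. This is not the poster's actual code; the class and method names are made up, and the table is the standard Windows-1252 assignment for bytes 80h to 9Fh, with the five unassigned positions (81h, 8Dh, 8Fh, 90h, 9Dh) left untouched.

```java
public class Cp1252RepairSketch {

    // Windows-1252 characters for 0x80..0x9F; 0 marks the five unassigned bytes.
    private static final char[] CP1252_C1 = {
        '\u20AC', 0,        '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', 0,        '\u017D', 0,
        0,        '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', 0,        '\u017E', '\u0178'
    };

    /** Replaces chars U+0080..U+009F with the characters CP-1252 assigns to those bytes. */
    static String repair(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x80 && c <= 0x9F && CP1252_C1[c - 0x80] != 0) {
                sb.append(CP1252_C1[c - 0x80]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0097h and 0099h as read from an ISO-8859-1 column.
        String fromDb = "a\u0097b\u0099";
        String fixed = repair(fromDb);
        System.out.println(fixed.equals("a\u2014b\u2122"));
    }
}
```

After repair(), encoding the string as UTF-8 yields E2 80 94 for the mdash and E2 84 A2 for TM, as described above.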







 

by: wikiz Posted on 2004-05-24 at 02:22:28 ID: 11141440

Though this might be too late to answer, I've had no problems just using String.getBytes("UTF-8") and, with a parameterized query, setBytes(). The point is that you _cannot_ output a string once you have it as bytes, because printing needs a conversion back to some charset, and this last step produces these "unrecognized character" question marks.
This worked for me using Interbase, but I believe it would work with any DB that does not perform any character collation and stores strings as it gets them.
When you retrieve the results from a query, just perform the steps in reverse: call getBytes() on the result set and construct a string from those bytes with UTF-8 encoding.
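Roughly, that approach looks like the helpers below. This is only a sketch under assumptions: the column must be of a type that stores the bytes untouched, the class and method names are invented for illustration, and no particular driver or table is implied. PreparedStatement.setBytes() and ResultSet.getBytes() are standard JDBC calls.

```java
import java.nio.charset.StandardCharsets;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Utf8BytesColumnSketch {

    /** Binds the string as raw UTF-8 bytes so the driver does no charset conversion. */
    static void bindUtf8(PreparedStatement ps, int index, String value) throws SQLException {
        ps.setBytes(index, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads raw bytes from the column and decodes them as UTF-8. */
    static String readUtf8(ResultSet rs, int index) throws SQLException {
        byte[] raw = rs.getBytes(index);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    /** The same encode and decode steps without a database, for a quick sanity check. */
    static String roundTrip(String value) {
        return new String(value.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u2018test\u2019";
        System.out.println(roundTrip(original).equals(original));
    }
}
```

The trade-off is the one kfahrut mentions earlier in the thread: every other client reading that column has to know the bytes are UTF-8 and decode them itself.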

 

by: vinodh-tk Posted on 2008-08-19 at 06:21:40 ID: 22259471

Is there any way to validate the encoding declared in the XML?

 

by: pelau Posted on 2010-04-09 at 12:27:07 ID: 30228879

String before = "ChÃ¨que reÃ§u";
byte[] isoBytes = before.getBytes("iso-8859-1");
String after = new String(isoBytes, "utf-8");

The console shows after as "Chèque reçu".
