Character set problem

fcanepa asked
Last Modified: 2008-02-01
Hello,
while porting an existing application to a new server with MySQL 5, I have encountered a weird problem when retrieving data from MySQL through Connector/J: the character 0x92 (closing quote), which is correctly stored in the db, is retrieved by the JDBC driver as three characters: "â€™" (0xe2, 0x80, 0x99). The other characters, such as accented letters, are retrieved correctly and rendered in HTML as ISO-8859-1.

I couldn't figure out how to solve this problem, so I decided to do a text replacement:

ret = ret.replaceAll("â€™", "’");

but it does not seem to match the three-character sequence correctly.

Can anybody help me with this or, better, figure out how to solve the problem at its source?

Thanks,
Fabio

CERTIFIED EXPERT
Top Expert 2016

Commented:
>>the character 0x92 (closing quote)

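That's not part of the ISO-8859-1 character set. Byte 0x92 is the right single quotation mark in Windows-1252 (cp1252); in ISO-8859-1 that byte is a C1 control character.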
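
To make the point above concrete, here is a minimal sketch (not taken from the original application) that decodes the single byte 0x92 with two charsets. The class and method names are made up for illustration, and it assumes the JVM ships the windows-1252 charset, which standard JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Byte92Demo {
    /** Decodes one byte with the given charset and returns the resulting code point. */
    static int decode(int b, Charset cs) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // In Windows-1252, 0x92 is the right single quotation mark.
        System.out.printf("windows-1252: U+%04X%n", decode(0x92, cp1252));   // U+2019
        // In ISO-8859-1, 0x92 maps to a C1 control character.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decode(0x92, StandardCharsets.ISO_8859_1));                  // U+0092
    }
}
```

So data that "worked" as latin1 with a 0x92 quote was most likely Windows-1252 data all along.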

Author

Commented:
Yes, you are right. Anyway, in the old setup I didn't have to bother about character sets at all, and the character 0x92 (closing quote) was fetched correctly by JDBC and rendered correctly by Apache.

But now I can't work out how to solve this strange problem. The character seems to be transformed into this UTF-8-like three-character sequence by the JDBC driver. I have logged the strings just as they are fetched from the db. If I UTF-8 encode all the strings sent to the web page and set the web page's encoding to UTF-8, this character will continue to give me problems, I think...
CERTIFIED EXPERT
Top Expert 2016

Commented:
Its Unicode code point is U+2019, and your JDBC driver is reading in UTF-8
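
A small sketch of where the three bytes come from, assuming nothing about the original code (the class and helper names are illustrative): U+2019 encodes to e2 80 99 in UTF-8, and reading those bytes back as Windows-1252 yields three separate characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuoteBytesDemo {
    /** Formats bytes as space-separated lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** UTF-8 encoding of the string, as hex. */
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Simulates the bug: UTF-8 bytes wrongly decoded as Windows-1252. */
    static String misreadAsCp1252(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u2019"));            // e2 80 99
        for (char c : misreadAsCp1252("\u2019").toCharArray()) {
            System.out.printf("U+%04X ", (int) c);        // U+00E2 U+20AC U+2122
        }
        System.out.println();
    }
}
```

If a log viewer or page shows three characters instead of one quote, the string itself may be fine and only the viewer's decoding wrong.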
CERTIFIED EXPERT
Top Expert 2016

Commented:
The character is storable as UTF-8

Author

Commented:
You are right. If I log the strings immediately after fetching them from the db, accented letters are also in UTF-8. The problem is in how they are processed afterwards. I use the Velocity template engine. I'll check if it has some character set-related options. After merging the template, the accented characters seem to be rendered as normal latin1 characters, while 0x92 is rendered as a '?'.
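
The '?' symptom is consistent with the output being encoded as ISO-8859-1 somewhere after the fetch. This is a sketch of that effect in plain Java, not a reproduction of the actual Velocity pipeline; the names are illustrative. Accented letters exist in ISO-8859-1 and survive, while U+2019 has no mapping and is replaced by '?' (0x3f).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UnmappableDemo {
    /** Encodes the string with the given charset and returns the bytes as hex. */
    static String encodeHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9\u2019"; // e-acute followed by right single quote
        // ISO-8859-1 can encode e-acute but not U+2019, which becomes '?'.
        System.out.println(encodeHex(text, StandardCharsets.ISO_8859_1)); // e9 3f
        // UTF-8 can encode both characters.
        System.out.println(encodeHex(text, StandardCharsets.UTF_8));      // c3 a9 e2 80 99
    }
}
```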
CERTIFIED EXPERT
Top Expert 2016
Commented:
You need to get Velocity using UTF-8. You can see from the 3-byte encoding that the right quote is treated correctly as UTF-8. Now you need Velocity to do so as well.
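
One possible way to do that, sketched under assumptions: Velocity 1.x reads the `input.encoding` and `output.encoding` properties (both default to ISO-8859-1), and the merged template is written through a `Writer` you control. The class and method names below are hypothetical, and the Velocity call is left as a comment so the snippet runs without the library; check the property names against the Velocity version in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class VelocityUtf8Sketch {
    /** Properties asking Velocity 1.x to read templates and write output as UTF-8. */
    static Properties utf8Properties() {
        Properties p = new Properties();
        p.setProperty("input.encoding", "UTF-8");   // encoding of template files
        p.setProperty("output.encoding", "UTF-8");  // encoding used for output
        return p;
    }

    /** Writes already-merged text through a UTF-8 Writer and returns the raw bytes. */
    static byte[] render(String merged) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // With Velocity on the classpath, template.merge(context, w) would go here.
            w.write(merged);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Properties p = utf8Properties();
        // Assumed usage with Velocity 1.x: org.apache.velocity.app.Velocity.init(p);
        System.out.println(p.getProperty("output.encoding")); // UTF-8
        System.out.println(render("\u2019").length);          // 3
    }
}
```

The page itself also has to declare UTF-8 (for example in the Content-Type header), otherwise the browser will decode the bytes as latin1 again.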

CERTIFIED EXPERT
Top Expert 2016

Commented:
:-)