PHP displaying UTF-8 encoded characters

So, this is probably a simple question, but I must be missing something. I successfully save UTF-8 encoded Chinese characters to a MySQL database.

For example, they end up looking like this in the field (this is random text taken from a Google search, so I do not know what it means):
&#27721;&#35821;/&#28450;&#35486;

If I simply display it, it works fine. However, all my form values get htmlspecialchars treatment, and when this is done it ends up changing the & to &amp; and displays the text as above and not as its corresponding Chinese character. There doesn't seem to be an additional step in any of the instructions I can find on dealing with these characters, so I'm curious whether I'm missing something simple.

I can of course "fix" it by replacing &amp;# with &# after the htmlspecialchars call, but I'd prefer to just know what I'm doing wrong. Thanks!
WhistlingMtn Asked:
 
Ray Paseur Commented:
I am wondering about this part: "all my form values get htmlspecialchars treatment" -- why? The usual place one might use htmlspecialchars() is to prevent user-supplied text from containing HTML markup in a message board or guest book. Thus it would not apply to all form values, but would be used on external text just before that text is output to the browser. In any case, there are only five translations performed by the function (&, ", ', < and >), so you might try performing four of them yourself in a local function, leaving the & alone.
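
A minimal sketch of that "local function" idea, assuming the goal is to escape markup while leaving stored numeric character references such as &#27721; intact. The name escape_except_ampersand is hypothetical and not part of PHP; the function performs four of the five htmlspecialchars() translations and skips the ampersand.

```php
<?php
// Hypothetical helper (not a PHP built-in): escapes <, >, " and '
// but leaves & untouched, so &#27721; survives unchanged.
function escape_except_ampersand(string $text): string
{
    // strtr() with an array does not rescan text it has already replaced.
    return strtr($text, [
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#039;',
    ]);
}

echo escape_except_ampersand('<b>&#27721;</b>'), "\n";
```

Because the ampersand is not escaped, any entity typed by a user passes through as an entity. That may be acceptable for a text field value, but it is a trade-off rather than a general-purpose replacement for htmlspecialchars().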
 
Andrew Derse (IT Manager) Commented:
I believe the &amp; is actually the '&' itself...

Have you tried just using: &27721;&35821;

?
 
Andrew Derse (IT Manager) Commented:
Yeah, I just tried that within my Joomla installation. The text editor is filtering the & and changing it to &amp;.

The way you supplied the &#27721 into the content is how you can trick the system...

This is what I got using &27721;

&27721;

This is what I got using &#27721:
¿

The issue here is if you are using a text editor or not...as it's filtering your code and changing it on you...you can try turning it off and see what happens.

Looks like you are doing it right.
 
WhistlingMtn (Author) Commented:
Well, &amp; is the encoded version of &.

I don't have a choice on what they're ending up as; they're getting encoded by MySQL to UTF-8. The problem would still be the same though:
&#27721; and &amp;#27721; are not the same thing

<input type="text" value="&#27721;" /> Displays the Chinese Character
<input type="text" value="&amp;#27721;" /> Displays the literal "&#27721;" text


I can pick out the &amp;# and convert it back to &#, but in the examples I viewed online I didn't see anyone else needing this; they just took their encoded text, ran htmlspecialchars, and displayed it. Maybe I just misunderstood them.
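
One possible alternative to picking the &amp;# back out afterwards is the fourth parameter of htmlspecialchars(), $double_encode. The sketch below assumes the stored value already contains numeric character references; the variable names and the sample string are made up for illustration.

```php
<?php
// Sample value as it might come back from the database (assumed for illustration).
$stored = '&#27721;&#35821; <b>"x"</b> & more';

// Default behaviour: every & is escaped, so &#27721; becomes &amp;#27721;.
$double = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Fourth argument false: existing entities are left as they are,
// while a bare & and the markup characters are still escaped.
$single = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n";
echo $single, "\n";
```

With $double_encode set to false the references reach the browser as &#27721;&#35821; and render as characters, while the <b> tag and the bare ampersand are still neutralised.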
 
Andrew Derse (IT Manager) Commented:
Whoa...even here they are using a text filter...it changed the character to an upside down question mark...here's a screen shot of what it looks like:

[Screenshot: char]
 
Andrew Derse (IT Manager) Commented:
Ah, I see what you mean...
 
WhistlingMtn (Author) Commented:
I may just replace all &amp; back to &, since it's not a dangerous character in a text field anyway. Just perplexed as to why I'm having to do this when the dozens of threads online make no mention of it.
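
A narrower variant of that replacement, sketched under the assumption that only numeric character references need to survive: escape everything first, then restore just the &amp;#NNNN; sequences instead of every &amp;. The function name escape_keep_numeric_refs is hypothetical.

```php
<?php
// Hypothetical helper (not a PHP built-in): escape everything, then restore
// only decimal or hex numeric references (&amp;#27721; -> &#27721;).
function escape_keep_numeric_refs(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Match "&amp;" only when it is followed by "#digits;" or "#xHEX;".
    return preg_replace('/&amp;(#(?:\d+|x[0-9a-fA-F]+);)/', '&$1', $escaped);
}

echo escape_keep_numeric_refs('&#27721; & <i>'), "\n";
```

Compared with replacing every &amp;, this leaves a bare ampersand escaped, so the output stays valid HTML even when the text contains an & that is not part of a reference.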
 
WhistlingMtn (Author) Commented:
Yeah, I should have closed the question; this was basically my solution.
Question has a verified solution.
