
Changing UTF8 characters to HTML entities in PHP?

I have user data with valid characters (like the long dash, see attached screenshot) that I need to put on a web page.  Of course, it's showing up funny.  The web page's current encoding is ISO-8859-1, but I suppose I can change it to UTF-8.

The options for getting HTML entities out of UTF-8 characters, like mb_check_encoding(), utf8_decode(), etc., baffle me.  Some of them require compiled-in support, and I cannot recompile PHP since I'm on Red Hat Linux, so please see the attached file for what's available in phpinfo().

Basically I need a function that makes the funny characters go away, replaced with the correct expression - the long dash, etc.
ScreenShot001.jpg
php-info.txt
Geoff Millikan
Geoff Millikan (Author) commented:
Come on guys!

MunterMan: I want to encode, not decode!

shobinsun:

http://www.jainsachin.com/php_manual_tutorial/function.htmlspecialchars.html
http://in.php.net/htmlentities

htmlspecialchars() encodes normal characters, not weird multibyte ones.

http://www.seopher.com/articles/html_character_encoding_with_php_made_easy

I mentioned this one in my question, mb_check_encoding().  This uses mb_convert_encoding(), which is great except that it requires compiled-in support, per the link below:

http://www.php.net/manual/en/mbstring.installation.php

http://htmlpurifier.org/docs/enduser-utf8.html

Huh?
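
For reference, a minimal sketch of the difference between the two functions named above, assuming the input really is UTF-8. The sample string is made up for illustration. Both functions are built into PHP and take the charset as their third argument, so no mbstring support is needed for this part.

```php
<?php
// Hypothetical sample text: the em dash is written as its UTF-8 bytes (E2 80 94).
$text = "Business card\xE2\x80\x94Electronic <b>";

// htmlspecialchars() only escapes &, <, > and quotes; the dash stays as raw bytes.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also replaces characters that have a named entity, such as the em dash.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";
echo $entities, "\n";   // Business card&mdash;Electronic &lt;b&gt;
```

If the input is not valid UTF-8, the result may come back empty or with characters missing, so this only helps once the source encoding is known.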


Geoff Millikan (Author) commented:

shell> php -r "echo mb_convert_encoding();"
 
PHP Fatal error:  Call to undefined function mb_convert_encoding() in Command line code on line 1

Chris Harte (Thaumaturge) commented:
U+FFFD is not an HTML special character. It is the Unicode replacement character, which is displayed in place of an unknown or unprintable character.
Red Hat uses UTF-8 as its default character encoding, so I would try setting the web page encoding to that first. If that does not work, then you can worry about converting strings.
shobinsun commented:
Hi,

Try the urlencode() function.

http://in.php.net/urlencode

Hope this will help you.

Regards
Geoff Millikan (Author) commented:
shobinsun: urlencode() turns it into more of a mess, so it now looks like this in the web browser: Business+card%97Electronic

MunterMan: The web page HTML is set to UTF-8 now, with no change.  The output below should show that everything is UTF-8.

Would love to have a solution.
==== HTML page ===
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
....
 
=====HTML Response Header ====
HTTP/1.x 200 OK
Date: Fri, 24 Apr 2009 00:50:53 GMT
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.1.6
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
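
A note on the urlencode() output above: `%97` is a single byte, 0x97. A UTF-8 em dash would be the three bytes E2 80 94, whereas 0x97 is the em dash in Windows-1252, so the stored data may not be UTF-8 at all. The sketch below is one way to check this without mbstring; the sample string is an assumption rebuilt from the urlencode() output, not the real data.

```php
<?php
// Hypothetical sample: the text as urlencode() reported it, with a lone 0x97 byte.
$data = "Business card\x97Electronic";

// Dump the raw bytes; a UTF-8 em dash would show up as e28094, not 97.
echo bin2hex($data), "\n";

// With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
// so an empty pattern works as a simple validity check.
$isUtf8 = (preg_match('//u', $data) === 1);

echo $isUtf8 ? "valid UTF-8\n" : "not valid UTF-8\n";
```

If the check fails on the real data, changing the page charset alone would not be expected to fix the display.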

shobinsun commented:
Hi,

Then you can also use the str_replace() function along with urlencode().

http://in2.php.net/preg_replace

http://php.about.com/od/advancedphp/ss/php_preg_4.htm


Hope this will help you.

Thanks And Regards
Geoff Millikan (Author) commented:
Turns out RHEL5 can have the multibyte package installed!  So I have it up and running, but converting with either of the calls below just ends up stripping the special characters out entirely. It doesn't fill in with a space; it strips the whole character out.

mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8')

mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8')

Anyone, thoughts?
#Redhat support says:
Please install the "php-mbstring" package using "yum install php-mbstring" command to use mb_convert_encoding() function.
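
One plausible explanation for the stripping is that both calls declare the input as 'UTF-8', and a lone 0x97 byte is not valid UTF-8, so the converter has nothing to map it to. A minimal sketch, assuming the data is actually Windows-1252 and that the iconv extension shows up in phpinfo(); the sample string is hypothetical.

```php
<?php
// Assumption: the stored bytes are Windows-1252, where 0x97 is the em dash.
$data = "Business card\x97Electronic";

// Convert to UTF-8 first; iconv() returns false if the conversion fails.
$utf8 = iconv('Windows-1252', 'UTF-8', $data);

if ($utf8 !== false) {
    // On a UTF-8 page the converted text can be escaped and printed directly.
    echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n";

    // On an ISO-8859-1 page, named entities keep the dash intact instead.
    echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n";
}
```

With mbstring installed, mb_convert_encoding($data, 'UTF-8', 'Windows-1252') should be the equivalent first step.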

Geoff Millikan (Author) commented:
Currently using preg_replace() below to take out any special characters and replace them with a space.  But that's not what I want to do.  I want to show the UTF-8 characters.

Why can't I show UTF-8 characters on a web page without them showing up as boxes?  Seems like a simple question.
// Replace anything other than letters, digits, parentheses, hyphens and whitespace with a space.
$data = preg_replace('/[^a-zA-Z0-9()\s-]/', ' ', $data);

shobinsun commented:
Hi,

Make sure you have the following in your html: <meta http-equiv="content-type" content="text/html; charset=utf-8">

Also make sure you have

default_charset = "utf-8"

in your php.ini.

Please go through:

http://blogs.sun.com/shankar/entry/how_to_handle_utf_8

shobinsun commented:
Hi,

Also look at this:
http://www.penguin-soft.com/penguin/developer/php/ref.mbstring.html

Hope this gives you a good idea.

Regards
abolinhas commented:
Hi

You can use one of the following options:


1º  Inside html tag
<html>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<body>
</body>
</html>

2º inside php code
<?php
header('Content-Type: text/html; charset=ISO-8859-1');
?>

Regards
Geoff Millikan (Author) commented:
Yes, please close as unsolved. Changing the page encoding to UTF-8 or ISO-8859-1 doesn't resolve the issue.  I implemented my own approach of stripping out special characters as a short-term solution.