Solved

htmlspecialchars and htmlentities are not working

Posted on 2013-05-13
Last Modified: 2013-06-21
Hi

This code sample is not working. Can you please help me out?

<?php
$str = "A 'quote' is <b>bold</b>";

echo htmlspecialchars($str);
echo htmlentities($str, ENT_QUOTES);
?>

Question by:KaranGupta
10 Comments
 
LVL 4

Expert Comment

by:ramyajanarthanan
May I know what you mean by "not working"?

htmlspecialchars() is encoding the < and > characters properly. It is just that when you echo the encoded string to your computer screen, your browser helpfully decodes the characters again. If you view the page source you will see the encoded string.

ON BROWSER
A 'quote' is <b>bold</b>

View page source

A 'quote' is &lt;b&gt;bold&lt;/b&gt;
 
LVL 108

Expert Comment

by:Ray Paseur
This appears to be working perfectly.

Browser display:
A 'quote' is <b>bold</b>A 'quote' is <b>bold</b>

View source:
A 'quote' is &lt;b&gt;bold&lt;/b&gt;A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;

Man page references:
http://www.php.net/manual/en/function.htmlspecialchars.php
http://www.php.net/manual/en/function.htmlentities.php
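
To reproduce the view-source output quoted above, here is a minimal sketch of the question's code. The explicit flags and the 'UTF-8' encoding argument are assumptions added here, because the defaults differ between PHP versions; the newline between the two results is only for readability.

```php
<?php
// The string from the question.
$str = "A 'quote' is <b>bold</b>";

// Pass flags and encoding explicitly instead of relying on version defaults.
$special  = htmlspecialchars($str, ENT_COMPAT, 'UTF-8'); // converts & " < >
$entities = htmlentities($str, ENT_QUOTES, 'UTF-8');     // also converts '

// PHP_EOL keeps the two results on separate lines in view source.
echo $special, PHP_EOL;
echo $entities, PHP_EOL;
```

The only difference between the two results for this string comes from the ENT_QUOTES flag, not from the choice of function.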
 

Author Comment

by:KaranGupta
Hi

In the sample code above I am using both functions, htmlspecialchars() and htmlentities(). In both cases the output is the same. What is the difference between the two?
 
LVL 4

Assisted Solution

by:ramyajanarthanan
ramyajanarthanan earned 250 total points
htmlspecialchars() converts ONLY these characters:

* '&' (ampersand) becomes '&amp;'
* '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
* ''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
* '<' (less than) becomes '&lt;'
* '>' (greater than) becomes '&gt;'

That's why it is "special". htmlentities(), by contrast, converts all applicable characters to HTML entities.

EXAMPLE

Using htmlentities()
<?php echo htmlentities('ñ'); ?>

OUTPUT (in view source): &Atilde;&plusmn;


Using htmlspecialchars()

<?php echo htmlspecialchars('ñ'); ?>

OUTPUT (in view source): ñ

As you can see, htmlentities() converts the ñ to HTML entities, whereas htmlspecialchars() converts only the characters that are special in HTML markup, for example < and >.
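
A minimal sketch of the same comparison with the encoding stated explicitly. Writing the ñ as the byte escape "\xC3\xB1" and passing 'UTF-8' as the third argument are assumptions made here so the result does not depend on how the source file was saved. Under those assumptions htmlentities() should return the single named entity rather than the two-entity output shown above.

```php
<?php
// "\xC3\xB1" is the letter ñ spelled out as its two UTF-8 bytes.
$n = "\xC3\xB1";

$entities = htmlentities($n, ENT_QUOTES, 'UTF-8');     // has a named entity
$special  = htmlspecialchars($n, ENT_QUOTES, 'UTF-8'); // not a special char

echo $entities, PHP_EOL; // &ntilde;
echo $special, PHP_EOL;  // the two original bytes, unchanged
```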

Hope you can understand the difference.
Look at these links too

http://www.w3schools.com/php/func_string_htmlspecialchars.asp

http://php.net/manual/en/function.htmlspecialchars.php

http://www.php.net/manual/en/function.htmlentities.php
 
LVL 108

Expert Comment

by:Ray Paseur
"What is the difference between the two?"
You're in luck!  PHP is documented online, complete with an online manual and infused with user-contributed notes!  You never need to wonder about any PHP function again.  Simply go to the online man page and read the descriptions.  Look at the examples.  See how others have used the language, and what interesting problems they have solved.

IMHO, the popularity of PHP is not rooted in its "ease of use" which has an embarrassing legacy of sloppy code and security problems, but in its online documentation.  You can find the links to the documentation for these functions here.
 

Author Comment

by:KaranGupta
I now have a fair understanding of htmlspecialchars(), but I am still confused by htmlentities().

I have tried the following code
htmlentities("ñ");
But I can't see anything
 
LVL 4

Expert Comment

by:ramyajanarthanan
What is it that you can't see?

Did you see the difference in output between the two?

As you can see, htmlentities() converts the ñ to HTML entities, whereas htmlspecialchars() converts only the characters used in HTML tags.

In view source, htmlspecialchars() outputs the ñ unchanged, but htmlentities() does not, because it converts every character that has an HTML entity.

Please look at the difference:


Using htmlentities()

OUTPUT (in view source): &Atilde;&plusmn;


Using htmlspecialchars()


OUTPUT (in view source): ñ
 

Author Comment

by:KaranGupta
Hi

I tried the following samples yesterday:


echo htmlspecialchars("ñ");
echo htmlentities("ñ");

If you check view source, you won't find anything. Secondly, I tried the following code sample:


echo htmlspecialchars("<");
echo htmlentities("<");

According to your description, the former should be displayed as '<' and the latter as '&lt;' when I view the source. But both show as '&lt;'.

Please correct me if my understanding is wrong.
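
One way to check why both functions agree on '<' is to look at the translation tables PHP exposes through get_html_translation_table(). This is only a sketch: the flags and encoding are passed explicitly as an assumption, and the size of the HTML_ENTITIES table depends on the PHP version and document type, so no exact count is claimed for it.

```php
<?php
// Translation tables behind the two functions (flags and encoding explicit).
$special  = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES, 'UTF-8');
$entities = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

// The characters htmlspecialchars() converts.
print_r($special);

// htmlentities() covers those plus every character that has a named entity.
echo count($special), ' vs ', count($entities), PHP_EOL;

// "<" maps to the same entity in both tables, so both functions agree on it.
var_dump($special['<'] === $entities['<']);
```

The two functions only disagree on characters outside the short htmlspecialchars() table, such as accented letters.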
 
LVL 4

Expert Comment

by:ramyajanarthanan
I tried your example and it displays correctly for me. I don't know how you alone could be seeing the same thing on the output screen and in view source. Please check it, or post a screenshot.
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 250 total points
You may have more than one issue at work here.  One of the issues has been run to ground, and that is the use of PHP functions htmlentities() and htmlspecialchars().  These work correctly and predictably.  You must use the browser view source to determine how the entities were rendered.

The other issue appears to be the character-encoding issue.  It shows up in the form of the "ñ" being rendered in the form of &Atilde;&plusmn;.  The A-Tilde character is one of the "signature" characters of botched UTF-8 encoding (another common signature character is the A-Ring).  To get something of a background in character encoding, please read this article.  You may find, as I did, that you need to read it more than once to absorb the information -- it's concentrated material and very worthwhile knowledge!
http://www.joelonsoftware.com/articles/Unicode.html

You may also want to read this.
https://en.wikipedia.org/wiki/UTF-8

Strictly speaking, ASCII is a 7-bit set of 128 characters.  The one-byte "extended ASCII" sets such as ISO-8859-1 use all 256 possible values of a byte (0 to 255 decimal), so a one-byte character set cannot represent more than 256 characters.  UTF-8 is a multi-byte encoding.  It can have one, two, three or four bytes in each character, and it can represent more than a million code points.

All of the characters below code point 128 decimal are the same in UTF-8 and ASCII. This covers the American English alphabet, most western punctuation and all the Arabic numerals.  Up to code point 127 you can "get away" with mixing UTF-8 and ASCII.

You can render the n-tilde in a one-byte set such as ISO-8859-1 (often loosely called ASCII), where it resides at code point 241 decimal.  You can also render the n-tilde in UTF-8.  In UTF-8 it takes up two bytes, 195 and 177 decimal.
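
The byte values quoted above can be checked with a short sketch. It assumes ISO-8859-1 as the one-byte encoding and writes the bytes as explicit escapes so the file's own encoding does not matter; the last two lines show how the two UTF-8 bytes follow from code point 241.

```php
<?php
// ñ as a single ISO-8859-1 byte.
$latin1 = chr(241);

// The same letter as UTF-8, written as explicit bytes.
$utf8 = "\xC3\xB1";

// unpack('C*', ...) lists the unsigned byte values of a string.
$latin1Bytes = array_values(unpack('C*', $latin1));
$utf8Bytes   = array_values(unpack('C*', $utf8));

echo implode(' ', $latin1Bytes), PHP_EOL; // 241
echo implode(' ', $utf8Bytes), PHP_EOL;   // 195 177

// Deriving the two UTF-8 bytes from code point 241:
$lead  = 0xC0 | (241 >> 6);   // 110xxxxx lead byte carries the top bits
$trail = 0x80 | (241 & 0x3F); // 10xxxxxx continuation byte carries the low 6 bits
echo $lead, ' ', $trail, PHP_EOL; // 195 177
```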

You have to choose between ASCII and UTF-8; you cannot intermix them reliably if you ever want to use any of the accented characters.  The appropriate meta charset declaration will tell the browser which encoding you want to use.
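
As an illustration of keeping the declaration and the data in step, here is a small sketch. The function name render_page() is hypothetical, and UTF-8 is assumed for both the escaping call and the meta charset declaration.

```php
<?php
// Hypothetical helper: build a small page whose declared encoding matches
// the encoding used when escaping the text.
function render_page(string $text): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n<meta charset=\"UTF-8\">\n</head>\n"
         . "<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

// In a web request, the matching HTTP header would be sent before any output:
// header('Content-Type: text/html; charset=UTF-8');

// "\xC3\xB1" is ñ in UTF-8, so the data agrees with the declaration.
$page = render_page("Espa\xC3\xB1a <b>bold</b>");
echo $page;
```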

The decimal number 128 is hexadecimal 80 and is binary 1000 0000.  It is this leftmost bit that signals the UTF-8 multi-byte characters.  If the browser expects UTF-8 and sees this bit it will look at one, two or three of the following bytes to determine what character to render.  If you use a one-byte (extended ASCII) character above code point 127 (hex 7F and binary 0111 1111) but you have told the browser that it's getting UTF-8 characters, a collision will occur and the output will not be rendered correctly.  In some instances (JSON comes to mind) only UTF-8 is supported and a character encoding error will cause data loss.  If you have told the browser that it's getting one-byte characters and you give it UTF-8 multi-byte characters, the browser will render things like the A-tilde followed by one, two or three additional "goofy looking" characters.
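
The same two kinds of mismatch can be reproduced inside PHP itself. This is a sketch under stated assumptions: the bytes are written as explicit escapes, ENT_QUOTES is passed without ENT_SUBSTITUTE or ENT_IGNORE, and the encoding is given as the third argument. It may also explain the earlier report of empty output, though the thread does not confirm how the asker's file was saved or which PHP version was in use.

```php
<?php
$utf8   = "\xC3\xB1"; // ñ as UTF-8 (two bytes)
$latin1 = "\xF1";     // ñ as ISO-8859-1 (one byte)

// UTF-8 bytes read as ISO-8859-1: each byte becomes its own character.
$mojibake = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

// Bytes and declared encoding agree.
$correct = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

// An ISO-8859-1 byte read as UTF-8 is an invalid sequence, so the function
// returns an empty string (ENT_SUBSTITUTE or ENT_IGNORE would change that).
$empty = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

var_dump($mojibake, $correct, $empty);
```

The first result matches the &Atilde;&plusmn; output reported earlier in the thread, and the third shows one way a call can produce nothing at all in view source.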

Executive summary: Use consistent encoding throughout the application.  Lean into the direction of UTF-8, since it is becoming the overarching standard.
