Solved

HTML - special characters not being displayed correctly?

Posted on 2011-09-21
Last Modified: 2012-05-12
Hi,
If I insert the following HTML into a page, the first 3 lines display a box where each special character should be, but the last 3 lines display OK.

Is there a way of fixing this?

<h1>      – when</h1>
<h1>      “Special”</h1>
<h1>      hour’s</h1>
<h1>      - when</h1>
<h1>      "Special"</h1>
<h1>      hour's</h1>


chars.jpg
chars-2.jpg
Question by:sabecs
 
LVL 23

Expert Comment

by:Brian Gee
It looks like your HTML editor was MS Word, or the code was created in MS Word and then copied over to the HTML editor (at least for the first portion of the data). Lines two and three in your example use MS Word's smart quotes, while lines five and six use standard straight quotes, which all Web browsers interpret easily. Change the smart quotes to straight quotes accordingly.

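The dash in the first line is an MS Word symbol as well (it is actually an en dash). Replace it with a character reference instead: &#8211; for an en dash or &#8212; for an em dash. &#151; also renders as an em dash in most browsers, but it is a Windows-1252 code rather than a Unicode code point.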
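
If there is a lot of pasted text, the replacement can be scripted rather than done by hand. Here is a minimal sketch in Python, assuming the text is already available as a Unicode string. The function name to_ascii_html is hypothetical and not from this thread. It relies on the standard xmlcharrefreplace error handler, which turns every non-ASCII character into a numeric character reference:

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper for illustration; ASCII characters pass through unchanged.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# The three problem strings from the question, written with Unicode escapes.
for line in ["\u2013 when", "\u201cSpecial\u201d", "hour\u2019s"]:
    print(to_ascii_html(line))
```

The result is plain ASCII, so it should display the same under any ASCII-compatible charset the page declares, such as UTF-8 or ISO-8859-1.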
 
LVL 82

Expert Comment

by:Dave Baldwin
Your web page should specify a character set that matches the character encoding that you are actually using. Your bottom 3 lines use plain ASCII (7-bit) characters, and the top 3 are most likely Windows-1252, as you would find in Word, or possibly ISO-8859-1 (Latin-1). If you specify UTF-8, the 'special characters' don't match up, so you don't get what you wanted.

Character set encodings can be very confusing and irritating.
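
The mismatch can be reproduced outside the browser. Here is a minimal sketch in Python, assuming the top three lines were saved as Windows-1252 (the comment above only says this is most likely). The variable names are illustrative:

```python
text = "\u2013 when"                 # en dash followed by plain ASCII

raw = text.encode("cp1252")          # bytes as a Windows-1252 editor would save them

# Declared charset matches the real encoding: the dash survives.
matched = raw.decode("cp1252")

# Declared charset is UTF-8: the dash byte (0x96) is not valid UTF-8,
# so the decoder substitutes U+FFFD, the replacement character.
mismatched = raw.decode("utf-8", errors="replace")

print(raw)
print(matched == text)
print(mismatched == "\ufffd when")
```

The plain hyphen, straight quotes and apostrophe in the bottom three lines are ASCII, which has the same byte values in both encodings. That is why those lines display correctly either way.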
 

Author Comment

by:sabecs
Thanks for your help. The HTML is saved in my MySQL table with a collation of utf8_general_ci.
Should this be changed to something else?
 
LVL 82

Expert Comment

by:Dave Baldwin
There are two different subjects here. One is the character set you're telling the software that you're using, and the other is the actual encoding of the characters. If you are going to be using the characters you've shown above, then you might want to change both the collation in the database and the 'charset' in the web pages so they match the actual characters that you are using, which appear to be ISO-8859-1 (Latin-1) or Windows-1252, perhaps labelled 'ANSI'. This page http://www.alanwood.net/ can tell you a lot more about the details. Note that changing the collation or 'charset' does not change the data, the text, that you have entered. There is no conversion; you're just telling the software to interpret what's there in a specific way.
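
The difference between relabelling and converting can be sketched in Python. This assumes the stored bytes are Windows-1252, which is a guess based on the comment above. MySQL and the browser are not involved here, only raw bytes:

```python
stored = "hour\u2019s".encode("cp1252")       # b'hour\x92s', the bytes as saved

# Relabelling: the same bytes read under two different declared charsets.
as_cp1252 = stored.decode("cp1252")                   # right single quote, U+2019
as_utf8 = stored.decode("utf-8", errors="replace")    # U+FFFD in place of the quote

# Either way, the stored bytes themselves are untouched.
unchanged = stored == b"hour\x92s"

# Converting: decode with the real encoding, then re-encode as UTF-8.
converted = as_cp1252.encode("utf-8")                 # b'hour\xe2\x80\x99s'

print(unchanged, stored, converted)
```

Only the last step produces different bytes. After a conversion like that, UTF-8 would be the matching label. Note also that browsers treat a declared charset of iso-8859-1 as windows-1252, so that label can still display Word's quotes and dashes.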

 

Author Comment

by:sabecs
Thanks Dave, are you saying I need to change the DOCTYPE as well?
I currently have the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Also, the collation options I have are shown below. Which one should I choose?

Thanks for your help, it is greatly appreciated.

armscii8_bin        
armscii8_general_ci  
ascii_bin            
ascii_general_ci    
big5_bin            
big5_chinese_ci      
binary              
cp1250_bin          
cp1250_croatian_ci  
cp1250_czech_cs      
cp1250_general_ci    
cp1251_bin          
cp1251_bulgarian_ci  
cp1251_general_ci    
cp1251_general_cs    
cp1251_ukrainian_ci  
cp1256_bin          
cp1256_general_ci    
cp1257_bin          
cp1257_general_ci    
cp1257_lithuanian_ci
cp850_bin            
cp850_general_ci    
cp852_bin            
cp852_general_ci    
cp866_bin            
cp866_general_ci    
cp932_bin            
cp932_japanese_ci    
dec8_bin            
dec8_swedish_ci      
eucjpms_bin          
eucjpms_japanese_ci  
euckr_bin            
euckr_korean_ci      
gb2312_bin          
gb2312_chinese_ci    
gbk_bin              
gbk_chinese_ci      
geostd8_bin          
geostd8_general_ci  
greek_bin            
greek_general_ci    
hebrew_bin          
hebrew_general_ci    
hp8_bin              
hp8_english_ci      
keybcs2_bin          
keybcs2_general_ci  
koi8r_bin            
koi8r_general_ci    
koi8u_bin            
koi8u_general_ci    
latin1_bin          
latin1_danish_ci    
latin1_general_ci    
latin1_general_cs    
latin1_german1_ci    
latin1_german2_ci    
latin1_spanish_ci    
latin1_swedish_ci    
latin2_bin          
latin2_croatian_ci  
latin2_czech_cs      
latin2_general_ci    
latin2_hungarian_ci  
latin5_bin          
latin5_turkish_ci    
latin7_bin          
latin7_estonian_cs  
latin7_general_ci    
latin7_general_cs    
macce_bin            
macce_general_ci    
macroman_bin        
macroman_general_ci  
sjis_bin            
sjis_japanese_ci    
swe7_bin            
swe7_swedish_ci      
tis620_bin          
tis620_thai_ci      
ucs2_bin            
ucs2_czech_ci        
ucs2_danish_ci      
ucs2_esperanto_ci    
ucs2_estonian_ci    
ucs2_general_ci      
ucs2_hungarian_ci    
ucs2_icelandic_ci    
ucs2_latvian_ci      
ucs2_lithuanian_ci  
ucs2_persian_ci      
ucs2_polish_ci      
ucs2_roman_ci        
ucs2_romanian_ci    
ucs2_slovak_ci      
ucs2_slovenian_ci    
ucs2_spanish2_ci    
ucs2_spanish_ci      
ucs2_swedish_ci      
ucs2_turkish_ci      
ucs2_unicode_ci      
ujis_bin            
ujis_japanese_ci    
utf8_bin            
utf8_czech_ci        
utf8_danish_ci      
utf8_esperanto_ci    
utf8_estonian_ci    
utf8_general_ci      
utf8_hungarian_ci    
utf8_icelandic_ci    
utf8_latvian_ci      
utf8_lithuanian_ci  
utf8_persian_ci      
utf8_polish_ci      
utf8_roman_ci        
utf8_romanian_ci    
utf8_slovak_ci      
utf8_slovenian_ci    
utf8_spanish2_ci    
utf8_spanish_ci      
utf8_swedish_ci      
utf8_turkish_ci      
utf8_unicode_ci
 
LVL 82

Accepted Solution

by:
Dave Baldwin earned 500 total points
Not the DOCTYPE but the charset value in this:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

That works when I put your text above in a page.  It doesn't work if I put UTF-8.

The matching collation is latin1_general_ci.
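
One rough way to decide which charset to declare is to test whether the saved bytes are valid UTF-8. This is a heuristic sketch in Python, not something from this thread. pick_charset is a hypothetical name, and falling back to windows-1252 is an assumption that fits text pasted from Word but not every file:

```python
def pick_charset(raw: bytes) -> str:
    """Guess a charset label for the meta tag from the raw page bytes."""
    try:
        raw.decode("utf-8")      # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return "windows-1252"    # assumed fallback for Word-style text
    return "utf-8"               # pure ASCII also lands here, which is harmless


# Bytes as a Windows-1252 editor would save the smart-quoted heading.
print(pick_charset(b"<h1>\x93Special\x94</h1>"))

# The same heading saved as UTF-8.
print(pick_charset("<h1>\u201cSpecial\u201d</h1>".encode("utf-8")))
```

Whatever label is chosen, the point is the same as above: the declared charset has to match the bytes that are actually stored.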
 

Author Closing Comment

by:sabecs
Thanks Dave, that did the trick
 
LVL 82

Expert Comment

by:Dave Baldwin
You're welcome, glad to help.