Solved

Malformed Spanish characters when UTF-8 output is wanted

Posted on 2008-10-17
Last Modified: 2012-06-27
Hi

I'm having problems displaying UTF-8 characters on ASP webpages.

http://staging-www.bedandbreakfasts.es/ is an example: look at the body text, where lots of boxes are appearing.

I've added <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> meta tag

and

<%@codepage=65001 %>
<%Response.Charset="UTF-8"%>

at the top of the ASP code, but I'm still getting these boxes.

What gives?
Question by:bendecko
9 Comments
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22746808
By the time I see the page, the characters are already corrupted. They are of course multi-byte characters, and I would suggest you need:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" lang="es">
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22746813
It could also be the data source, or the string used to retrieve that data from the database; maybe it is not a Unicode data type...
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22747102
Well, it all appears to be part of the ANSI character set, so it should be OK. It is interesting to see that the body does have the correct letter representation, and the hotkeys in the otherwise wrongly formatted text are also OK, but they do have substitution happening, e.g. funci&oacute;n de b&uacute;squeda (o acute and u acute respectively), and the rest does not. Whereas if you choose a different language, we do see the substitution... interesting...
 
LVL 1

Author Comment

by:bendecko
ID: 22747543
The page is a mix of database content and static HTML.

I've added the line

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" lang="es">

in the meta but the body is still junk.

When I look in the HTML I can see some &codes for characters and some not.  Should they all be &codes or theoretically should the browser display the correct characters since the Content-Type is set correctly?
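
On the question above: a character reference such as &oacute; is plain ASCII, so it reads the same whether the page is declared as ISO-8859-1 or UTF-8, whereas a raw ñ only displays correctly when the bytes in the file match the declared charset. A small sketch in Python (not the ASP used on the site) showing that the two forms stand for the same characters:

```python
import html

# A named entity decodes to the same character no matter how the page is encoded.
decoded = html.unescape("funci&oacute;n de b&uacute;squeda")

# The reverse direction: replace every non-ASCII character with a numeric
# character reference so that the markup is pure ASCII.
escaped = "España".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decoded)
print(escaped)
```

So the page does not need to use references everywhere; raw characters are fine as long as the stored bytes really are in the encoding the page declares.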

In the SQL database I can see the Spanish characters via Enterprise Manager, and they do render correctly, e.g. in the left-hand side menu bar.

Thanks for the help

Bendecko

 
LVL 51

Expert Comment

by:Mark Wills
ID: 22747575
Yes, I can see some of that. The big question is why it is just the top part (excluding hyperlinks); so, how is that part different from the next paragraph?
 
LVL 51

Accepted Solution

by:
Mark Wills earned 250 total points
ID: 22748177
There are definitely non-ANSI characters where the boxes appear: when I save the page and open it in TextPad, TextPad complains about non-ANSI characters.

For example, on the destination navigation indicator (i.e. El mundo > Europa > España) we find España is spelled (in hex): 45 73 70 61 C3 B1 61

meaning two bytes, C3 B1, for ñ.

On the next line (i.e. <H1>) it is spelled: 45 73 70 61 EF BF BD 61

meaning three bytes, EF BF BD, for I do not know what, other than that it shows as a square (maybe that is what it is)...
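
For what it is worth, both byte sequences can be checked outside the browser. The sketch below uses Python rather than ASP, and the variable names are only for illustration. It decodes the two hex strings quoted above as UTF-8, and then shows one way (an assumption about the cause, not something confirmed for this site) that the second sequence can arise: text saved as ISO-8859-1 being read as UTF-8 with invalid bytes replaced.

```python
# The two spellings of "España" quoted above, as raw bytes.
good = bytes.fromhex("45 73 70 61 C3 B1 61")
bad = bytes.fromhex("45 73 70 61 EF BF BD 61")

good_text = good.decode("utf-8")   # C3 B1 is the UTF-8 encoding of ñ (U+00F1)
bad_text = bad.decode("utf-8")     # EF BF BD is the UTF-8 encoding of U+FFFD

# One possible origin of EF BF BD: the ñ was stored as the single
# ISO-8859-1 byte F1, which is not valid UTF-8 on its own, so a
# lenient decoder substitutes the replacement character U+FFFD.
latin1_bytes = "España".encode("iso-8859-1")
lossy_text = latin1_bytes.decode("utf-8", errors="replace")
lossy_bytes = lossy_text.encode("utf-8")

print(good_text)
print(hex(ord(bad_text[4])))
print(lossy_bytes == bad)
```

U+FFFD is the Unicode replacement character, which browsers usually draw as a box or a question mark. Once it has been written into the page the original ñ is gone, so changing the meta tag cannot bring it back.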

Now, if I manually go in and change those characters and make sure the meta tag is:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
then it does work. (I did end up removing the lang="es"; it did not seem to matter, just yet maybe.) Changing the above line to utf-8 really screws it up again...

So, if you focus on getting <H1> correct, then the rest should likely follow on (in terms of a fix). At this stage, it looks a bit like a problem with how the page is being generated, because everything from the next title down (i.e. Costas fabulosas y escapadas a las Islas) appears OK. Similarly, <H2> seems compromised as well. Maybe it has something to do with the embedded tables, though that doesn't really explain why it comes good again after that next bold heading...

When I click on any of the top few links, then the following page is similarly compromised... But, the other languages appear to be OK.

So, definitely look at how you are retrieving the data and what the datatypes are in the database. Because the other languages are OK, I would say there is some default handling behaviour or database content that is fundamentally different.

Not sure if I can help any more at this stage... Maybe show how you retrieve Italian versus Spanish content, what you are retrieving, and where you are retrieving it from.

Hope that helps...
 
LVL 1

Author Comment

by:bendecko
ID: 22755605
OK, great, I'm getting somewhere. It looks like that file was saved in the wrong encoding, so I loaded it in TextPad and saved it out as UTF-8, and now the characters appear. However, further down the page I'm still getting the boxes.

The text below is generated from a database and an <!--Include--> file. I loaded that file and saved it out as before, but this time it didn't work. The database content came from a FORM post by a translator writing the Spanish. I don't know the encoding of that form; it probably was not UTF-8, so maybe the data in that part of the database is not compatible with the encoding of the page.

Don't worry about French etc being different.  The Spanish staging site is the first one to specify the encodings and all the other languages will have the same problems later!

How do I see the byte sequences you mention above in TextPad?

Thanks

Bendecko
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22755636
You have to open TextPad first, then do a File > Open and select Binary as the file type. You cannot edit (the file becomes read-only), but it gives you the hex view, like the old-fashioned DUMP command.
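
If TextPad is not to hand, a similar hex view can be produced with a few lines of script. This is a minimal sketch in Python (the site itself is ASP), and hex_dump is a made-up helper name, not part of any library:

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per line."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Bytes outside printable ASCII are shown as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


dump = hex_dump("España".encode("utf-8"))
print(dump)
```

To inspect a saved page, read it in binary mode, e.g. open(path, "rb").read(), and pass the result to hex_dump; opening it in text mode would decode the bytes and hide the very thing you are looking for.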

Yes, it does sound like a database / data problem... It is a pity about French et al.; it looked like that was working well.
 
LVL 1

Author Closing Comment

by:bendecko
ID: 31508175
Thank you.

It turned out to be a routine that copied the translated HTML sections from their initial location to the staging areas that was corrupting the characters. The routine used the FSO (FileSystemObject), and this messed up the UTF-8 characters. You have to use ADO streams (ADODB.Stream) instead to preserve the encoding.
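
The routine itself is not shown in this thread, so the following is only an illustration of the general failure mode, written in Python rather than ASP: a copy step that decodes text with one encoding and writes it with another changes the bytes, while a copy that names UTF-8 on both sides leaves them intact. The helper copy_text and the file names are made up for the example.

```python
import os
import tempfile


def copy_text(src: str, dst: str, read_as: str, write_as: str) -> None:
    """Copy a text file, decoding with read_as and re-encoding with write_as."""
    with open(src, "r", encoding=read_as, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=write_as, newline="") as f:
        f.write(text)


def read_bytes(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "section.html")
with open(src, "wb") as f:
    f.write("<h1>España</h1>".encode("utf-8"))

# Encoding stated explicitly on both sides: the bytes survive the copy.
ok = os.path.join(workdir, "ok.html")
copy_text(src, ok, read_as="utf-8", write_as="utf-8")

# Mismatched encodings (a hypothetical mistake): the ñ turns into mojibake.
broken = os.path.join(workdir, "broken.html")
copy_text(src, broken, read_as="iso-8859-1", write_as="utf-8")

original = read_bytes(src)
ok_bytes = read_bytes(ok)
broken_bytes = read_bytes(broken)
print(ok_bytes == original)
print(broken_bytes)
```

The ADO Stream object has a Charset property for the same purpose: setting it to "utf-8" before reading and writing tells the stream which encoding to use instead of leaving it to a default.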

Any other EE user embarking on internationalisation should definitely read Joel's article, even if just for a good laugh: http://www.joelonsoftware.com/articles/Unicode.html

Thanks again, Mark, for helping me on this one.

Bendecko