Solved

Malformed Spanish Characters when want UTF-8 output

Posted on 2008-10-17
Last Modified: 2012-06-27
Hi

I'm having problems displaying UTF-8 characters on ASP webpages.

http://staging-www.bedandbreakfasts.es/ is an example: look at the body text, where lots of boxes are appearing.

I've added the <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> meta tag

and

<%@codepage=65001 %>
<%Response.Charset="UTF-8"%>

at the top of the ASP code, but I'm still getting these boxes.

What gives?
Question by:bendecko
9 Comments
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22746808
By the time I see the page, the characters are already corrupted. They are of course multi-byte characters, so I would suggest you need:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" lang="es">
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22746813
It could also be the data source, or the string used to retrieve the data from the database: maybe it is not a Unicode data type...
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22747102
Well, it all appears to be part of the ANSI character set, so it should be OK. Interesting to see that the body does have the correct letter representation, and the hotkeys in the otherwise wrongly formatted text are also OK, but they do have entity substitution happening, e.g. funci&oacute;n de b&uacute;squeda (o acute and u acute respectively), while the rest does not. Whereas if I choose a different language, we do see the substitution... interesting...
 
LVL 1

Author Comment

by:bendecko
ID: 22747543
The page is a mix of database content and static HTML.

I've added the line

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" lang="es">

in the meta but the body is still junk.

When I look in the HTML I can see some &codes for characters and some not.  Should they all be &codes or theoretically should the browser display the correct characters since the Content-Type is set correctly?
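
On the question of &codes versus raw characters, here is a minimal sketch in Python (not the ASP/VBScript used on the site) of why the entity-encoded words survived while the raw accented characters did not. An entity such as &oacute; is plain ASCII, so its bytes are the same whichever charset the page declares. The sample phrase is the one quoted earlier in the thread; the variable names are only illustrative.

```python
import html

accented = "función de búsqueda"

# Replace every non-ASCII character with a numeric character reference.
as_entities = accented.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)

# Named entities decode back to the same text.
decoded = html.unescape("funci&oacute;n de b&uacute;squeda")
print(decoded)

# The entity form is byte-identical in ISO-8859-1 and UTF-8.
same_bytes = as_entities.encode("iso-8859-1") == as_entities.encode("utf-8")
print(same_bytes)
```

Raw characters are fine too, provided the bytes actually sent match the declared charset; entities simply sidestep the question.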

In the SQL database I can see the Spanish characters via Enterprise Manager, and they do render correctly, e.g. in the left-hand side menu bar.

Thanks for the help

Bendecko
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22747575
Yes, I can see some of that. The big question is why just the top part (excluding hyperlinks) is affected: how is that part different from the next paragraph?
 
LVL 51

Accepted Solution

by:
Mark Wills earned 1000 total points
ID: 22748177
There are definitely non-ANSI characters where the boxes appear: when I save the page and open it in TextPad, TextPad complains about non-ANSI characters.

For example, on the destination navigation indicator (ie El mundo > Europa > España) we find España is spelled (in Hex) : 45 73 70 61 C3 B1 61      

meaning two bytes, C3 B1, for ñ

on the next line (ie <H1> ) it is spelled : 45 73 70 61 EF BF BD 61

meaning three bytes, EF BF BD, for I do not know what... other than it shows as a square (maybe that is what it is)...
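
The two byte sequences can be decoded to see what they stand for. This is a small sketch in Python (chosen for illustration; the site itself is ASP), and the helper name `describe` is made up for this example. Decoding as UTF-8 shows that C3 B1 is ñ, while EF BF BD is U+FFFD, the Unicode replacement character that a decoder substitutes for bytes it could not interpret. In other words, the original ñ was already lost before the page was sent.

```python
import unicodedata

# Byte sequences quoted in the post above.
nav_bytes = bytes.fromhex("45 73 70 61 C3 B1 61")
h1_bytes = bytes.fromhex("45 73 70 61 EF BF BD 61")

def describe(raw):
    """Decode bytes as UTF-8 and list each character's code point and name."""
    text = raw.decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe(nav_bytes):
    print(line)
print("---")
for line in describe(h1_bytes):
    print(line)
```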

Now, if I manually go in and change those characters and make sure the meta tag is:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
then it does work. (I did end up removing the lang="es" attribute; it did not seem to matter, at least not yet.) Changing the charset in the line above to utf-8 really screws it up again...

So, if you focus on getting <H1> correct, then the rest should likely follow on (in terms of a fix). At this stage, it looks a bit like a problem with how the page is being generated, because everything from the next title down (ie Costas fabulosas y escapadas a las Islas) is fine. Similarly, <H2> seems compromised as well. Maybe it has something to do with the embedded tables, though that doesn't really explain why it comes good again after that next bold heading...

When I click on any of the top few links, then the following page is similarly compromised... But, the other languages appear to be OK.

So, definitely look at how you are retrieving the data and what the datatypes are in the database. Because the other languages are OK, I would say there is some default handling behaviour or database content that is fundamentally different.

Not sure if I can help any more at this stage... Maybe show how you retrieve the Italian versus the Spanish content, what you are retrieving, and where you are retrieving it from.

Hope that helps...
 
LVL 1

Author Comment

by:bendecko
ID: 22755605
OK, great, I'm getting somewhere. It looks like that file was saved in the wrong encoding, so I loaded it with TextPad and saved it out as UTF-8, and now the characters appear. However, further down the page I'm still getting the boxes.
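
A minimal Python sketch of what re-saving the file changes, assuming the original file was saved as ISO-8859-1 (the thread does not say which encoding it actually was). Under that assumption the ñ is the single byte F1, which is not valid UTF-8 on its own; a lenient UTF-8 decoder substitutes U+FFFD, and writing that back out as UTF-8 gives exactly the EF BF BD bytes seen in the <H1>. Whether ASP takes precisely this path internally is not confirmed by the thread, only that the bytes match.

```python
# "España" as an editor would store it in ISO-8859-1 (assumed encoding).
latin1_bytes = "España".encode("iso-8859-1")
print(latin1_bytes.hex(" "))

# Treat those bytes as UTF-8, replacing anything invalid, then emit UTF-8.
served = latin1_bytes.decode("utf-8", errors="replace").encode("utf-8")
print(served.hex(" "))

# Re-saving as UTF-8: decode with the encoding the file really uses,
# then encode as UTF-8.
fixed = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(fixed.hex(" "))
```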

The text below is generated from a database and an <!--Include--> file. I loaded that file and saved it out as before, but this time it didn't work. The database was populated by a FORM post from a translator writing the Spanish. I don't know the encoding of that form; it might not have been UTF-8 (probably not), so maybe the data in that part of the database is not compatible with the encoding of the page.

Don't worry about French etc being different.  The Spanish staging site is the first one to specify the encodings and all the other languages will have the same problems later!

How do I see the byte sequences you mention above in TextPad?

Thanks

Bendecko
 
LVL 51

Expert Comment

by:Mark Wills
ID: 22755636
You have to open TextPad first, then do a File > Open and select binary as the file type. You cannot edit, as it becomes read-only, but it gives you the hex view, like the old-fashioned DUMP command.

Yes, it does sound like a database / data problem... It is a pity about French et al.; it looks like that was working well.
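
For anyone without TextPad, a comparable hex view can be sketched in a few lines of Python. The function name `hex_dump` and the 16-byte row width are choices made for this example, not anything from the thread. To inspect a real file, read it in binary mode (`open(path, "rb").read()`, with your own path) and pass the bytes in.

```python
def hex_dump(data, width=16):
    """Return an offset / hex / printable-ASCII view of the given bytes."""
    pad = width * 3 - 1  # width of the hex column, including spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)

sample = "España".encode("utf-8")
print(hex_dump(sample))
```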
 
LVL 1

Author Closing Comment

by:bendecko
ID: 31508175
Thank you.

It turned out that a routine that copied the translated HTML sections from their initial location to the staging areas was corrupting the characters. The routine used the FSO (FileSystemObject), and this messed up the UTF-8 characters. You have to use ADO streams (ADODB.Stream) instead to preserve the encoding.

For any other EE user embarking on internationalisation, you should definitely read Joel's article, even if just for a good laugh: http://www.joelonsoftware.com/articles/Unicode.html

Thanks again, Mark, for helping me on this one.
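
The principle behind that fix is to name the encoding explicitly when reading and writing, rather than relying on a default. The sketch below shows it in Python as an analogue only; it is not the ASP routine from the site, and the function name `copy_utf8_text` and the file names are hypothetical. The last lines show what the same bytes look like if they are decoded with a Windows code page (cp1252 is assumed here) instead of UTF-8.

```python
import os
import tempfile

def copy_utf8_text(src_path, dst_path):
    """Copy a text file, decoding and re-encoding explicitly as UTF-8."""
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "section_es.html")
    dst = os.path.join(tmp, "staging_es.html")
    with open(src, "wb") as f:
        f.write("<h1>España</h1>".encode("utf-8"))
    copy_utf8_text(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(copied)

# The same UTF-8 bytes decoded with the wrong charset (assumed cp1252).
mojibake = "España".encode("utf-8").decode("cp1252")
print(mojibake)
```

If the copy does not need to touch the text at all, copying the raw bytes avoids the encoding question entirely.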

Bendecko