In this example, a number of characters get "weirded up" when tidy handles the UTF-8 file.
Or do I misunderstand how this should work?
For instance, the Ã‹ entity is of course the same "characters" (by the look of them) as the correct entity Ã« which it should have been converted to.
But even though both still look like roughly the "characters Ã«", the "‹" in the former cannot stand in for "«" as far as UTF-8 is concerned: it must be « for the UTF-8 bit calculations to come out right.
The same goes for „ and – and ‰ and œ and …
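To make the byte arithmetic concrete, here is a minimal Python sketch. It rests on an assumption that is not confirmed tidy behaviour: that the odd characters appear because raw UTF-8 bytes were read as Windows-1252. The helper names `cp1252_byte` and `undo_mojibake` are made up for illustration.

```python
# Assumption: the odd characters come from UTF-8 bytes read as Windows-1252.
# The characters from the post: ‹ „ – ‰ œ …
SUSPECTS = "\u2039\u201e\u2013\u2030\u0153\u2026"


def cp1252_byte(ch: str) -> int:
    """Return the single byte Windows-1252 uses for one character."""
    return ch.encode("cp1252")[0]


def undo_mojibake(text: str) -> str:
    """Turn the text back into bytes via Windows-1252, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


for ch in SUSPECTS:
    # Each of these maps to a byte in 0x80-0x9F, the range where
    # Windows-1252 and ISO-8859-1 disagree.
    print(f"U+{ord(ch):04X} -> byte 0x{cp1252_byte(ch):02X}")

# "Ã«" is the byte pair C3 AB, which is UTF-8 for U+00EB.
print(undo_mojibake("\u00c3\u00ab"))  # ë
# "Ã‹" is the byte pair C3 8B, which is UTF-8 for U+00CB.
print(undo_mojibake("\u00c3\u2039"))  # Ë
```

If that assumption holds, Ã‹ and Ã« are not two spellings of the same character: the first is the byte pair for Ë and the second the byte pair for ë.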
Is this by design, or is it a bug? Is tidy unable to handle UTF-8 files while still translating characters to &#xxx; entities?