I am tidying up some old HTML files and storing parts of them in a MySQL database. PHP Tidy does a great job, but it converts non-English characters to their HTML entity equivalents, e.g. í becomes &iacute;. I know that is valid HTML, but I want the database to hold clean UTF-8 text.
(I can easily convert it to HTML entities on the way back out.)
I have tried to configure Tidy to use UTF-8 encoding:
$config = array( "char-encoding" => "utf8");
Alas, to no avail.
Is it possible to get Tidy to leave these special characters untouched instead of converting them to entities?
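For context, here is a minimal sketch of how the config is passed to Tidy. The sample input and the variable names are made up for illustration, and it assumes the PHP tidy extension is installed. The third "utf8" argument to tidy_parse_string() is an assumption about what might matter here, not a confirmed fix.

```php
<?php
// Illustrative input only; the real files are old HTML pages.
$html = "<p>Dublín</p>";

// The config from above, asking Tidy to treat input and output as UTF-8.
$config = array("char-encoding" => "utf8");

// tidy_parse_string() accepts an optional encoding as its third argument;
// passing "utf8" here is an assumption, not a confirmed fix.
$tidy = tidy_parse_string($html, $config, "utf8");
tidy_clean_repair($tidy);

// Cleaned markup as a string, which is what would be stored in MySQL.
$clean = tidy_get_output($tidy);
echo $clean;
```

The goal is for $clean to still contain the literal í and not an entity.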