Reversing corrupted UTF8 Strings
Posted on 2004-11-03
Quite likely I can't find anything because I'm using the wrong keywords, but I'm looking for a tool or a piece of Java code to recover information from corrupted UTF-8 strings.
In an internationalised application that I've become responsible for, every now and again a new form gets added. Quite often people don't realise the implications of international forms and just assume that a form will work for whatever language it is filled in with. Of course this is not so: for Polish and Russian, for instance, some characters, if not all, have to be stored as UTF-8. This does not always happen, but the resulting corruption of the string follows a predictable pattern. As far as I know, that means it should be a reversible process. I am, however, at a loss and short of time to figure it out myself, and was hoping somebody, somewhere might have a simple piece of code to 'fix' a corrupted string.
I was hoping it'd be as simple as taking the characters in pairs and creating a new UTF-8 character by merging the char codes of the two characters into one char code; however, my attempts at doing so fail horribly...
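For illustration, here is a minimal sketch of what such a fix might look like. It assumes the corruption comes from UTF-8 bytes having been read as ISO-8859-1, which the post does not confirm; the class and method names (`Utf8Repair`, `repair`, `mergePair`) are hypothetical. It also shows one likely reason a plain merge of two char codes fails: a two-byte UTF-8 sequence carries marker bits that have to be masked off before the remaining bits are combined.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the text was UTF-8 that got decoded as ISO-8859-1.
final class Utf8Repair {

    // Turn each corrupted char back into the single byte it came from,
    // then decode the resulting byte sequence as UTF-8.
    // Chars above U+00FF cannot be mapped back and become '?'.
    static String repair(String corrupted) {
        byte[] raw = corrupted.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Manual merge of one two-byte pair (110xxxxx 10yyyyyy):
    // keep the low 5 bits of the lead byte and the low 6 bits of the trail byte.
    // No validation is done; both inputs are assumed to be a well-formed pair.
    static char mergePair(char lead, char trail) {
        return (char) (((lead & 0x1F) << 6) | (trail & 0x3F));
    }

    public static void main(String[] args) {
        // Polish 'l with stroke' (U+0142) is C5 82 in UTF-8.
        String corrupted = "\u00C5\u0082";
        System.out.println(repair(corrupted).equals("\u0142"));
        System.out.println(mergePair('\u00C5', '\u0082') == '\u0142');
    }
}
```

If the strings were instead misread as Windows-1252, some bytes (such as 0x82) map to characters outside the Latin-1 range, so the charset passed to `getBytes` in `repair` would need to be changed to match. Text that was never corrupted should not be passed through `repair`, since non-ASCII characters in it would be damaged.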
Hoping you guys can come up with something,