jackjeckyl
asked on
Anyone have a good UNICODE filter class?
I'm having to convert an AFP (mainframe) file to a PDF document. I can do all that just fine, but there are plenty of garbage characters sprinkled throughout the file that I want to get rid of. I want a method that removes all invalid characters. The only characters I want are the alphanumerics, CR and LF, and the normal keyboard characters. I know how to do this painstakingly, but I bet one of you guys already has code to do this. The method should only ALLOW characters from an approved list. Unless there's a better way, of course.
Can you post (attach) the unclean file here?
ASKER
Forgot to add: the characters are so off the wall that I've been using Unicode escapes with replaceAll to strip them. Some aren't even visible.
ASKER
I can't post the file. It won't always be the same file; it'll be something different each time.
The best thing to do would be to implement a FilterReader that cleans out anything outside ISO8859-1:
http://www.technojeeves.com/joomla/index.php/free/48-iso8859-1
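For readers who can't follow the link, here is a minimal sketch of what such a FilterReader might look like. It is not the code behind the URL above; the class name `Latin1FilterReader` and the helper `clean` are made up for illustration. It relies only on the fact that ISO-8859-1 maps exactly the code points U+0000 to U+00FF, so anything above 0xFF is dropped.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical class: drops every char that ISO-8859-1 cannot represent.
class Latin1FilterReader extends FilterReader {

    Latin1FilterReader(Reader in) {
        super(in);
    }

    // ISO-8859-1 covers exactly U+0000..U+00FF.
    private static boolean isLatin1(int c) {
        return c >= 0 && c <= 0xFF;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Keep reading until a char is in range or the stream ends (-1).
        while ((c = super.read()) != -1 && !isLatin1(c)) {
            // out-of-range char is skipped
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the kept chars to the front of the region just read.
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (isLatin1(buf[i])) {
                    buf[kept++] = buf[i];
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // Everything in this chunk was dropped: read again.
        }
    }

    // Convenience helper (also hypothetical) for cleaning a String.
    static String clean(String s) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new Latin1FilterReader(new StringReader(s))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Note that ISO-8859-1 still includes the C0 and C1 control characters (U+0000 to U+001F and U+007F to U+009F), so invisible characters in those ranges would pass through this filter unchanged.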
ASKER
I ended up just doing an approved list of characters and ignored UNICODE. Thanks for your responses.
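The asker's actual code isn't shown, but an approved-list filter along these lines could be sketched as follows. The class name `ApprovedChars` and the exact list are assumptions: printable ASCII (space through tilde, which covers the alphanumerics and US-keyboard punctuation) plus CR and LF, as described in the question.

```java
// Hypothetical helper: keeps only characters on an approved list.
class ApprovedChars {

    // Assumed approved list: printable ASCII (0x20..0x7E) plus CR and LF.
    static boolean isApproved(char c) {
        return (c >= 0x20 && c <= 0x7E) || c == '\r' || c == '\n';
    }

    // Copies approved characters and silently drops everything else.
    static String clean(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isApproved(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Same list expressed as a negated character class for replaceAll.
    static String cleanWithRegex(String input) {
        return input.replaceAll("[^\\x20-\\x7E\\r\\n]", "");
    }
}
```

With this list, tabs and accented letters are removed too; if the source files need them, they would have to be added to `isApproved` and to the regex.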
:-)
I would just use the ISO8859-1 charset
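One possible reading of this suggestion is to round-trip the text through the ISO-8859-1 charset. The sketch below (class and method names are hypothetical) shows what that does in Java: `String.getBytes(Charset)` does not remove unmappable characters, it substitutes the charset's replacement byte, which for ISO-8859-1 is `?`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating an ISO-8859-1 round trip.
class Latin1RoundTrip {

    // Encodes to ISO-8859-1 and decodes back. Characters above U+00FF
    // cannot be encoded and come back as '?'.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

So `roundTrip("caf\u00e9 \u2603")` keeps the `é` but turns the snowman into a question mark, which may or may not be acceptable in the generated PDF.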