jackjeckyl


Anyone have a good UNICODE filter class?

I'm having to convert an AFP (mainframe) file to a PDF document. I can do all that just fine, but there are plenty of garbage characters sprinkled throughout the file that I want to get rid of. I want a method that removes all invalid characters. The only characters I want are the alphanumerics, CR and LF, and the normal keyboard characters. I know how to do this painstakingly, but I bet one of you guys already has code to do this. The method should only ALLOW characters from an approved list, unless there's a better way, of course.
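A minimal sketch of the approved-list approach in Java. It assumes that "normal keyboard characters" means printable ASCII (space through `~`), which already covers the letters and digits; CR and LF are added explicitly. The class name `CharWhitelist` is hypothetical. Tab or any other character can be approved with another `ALLOWED.set(...)` call.

```java
import java.util.BitSet;

/** Hypothetical helper: keeps only characters on an approved list. */
public class CharWhitelist {

    // Assumed approved list: printable ASCII (' ' through '~'), i.e. letters,
    // digits and US keyboard punctuation, plus CR and LF.
    private static final BitSet ALLOWED = new BitSet(128);

    static {
        ALLOWED.set(0x20, 0x7F); // sets bits 0x20..0x7E; the upper bound is exclusive
        ALLOWED.set('\r');
        ALLOWED.set('\n');
    }

    /** Returns a copy of input with every non-approved character removed. */
    public static String filter(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // BitSet.get returns false for indices past the bits that were set,
            // so anything above 0x7F is rejected without a separate range check.
            if (ALLOWED.get(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(filter("Invoice\u0007 #42\u00A0total:\uFFFD $10\r\n"));
    }
}
```

A `BitSet` lookup keeps the check to one operation per character, so the approved list can grow without slowing the loop down.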
CEHJ

Can you post (attach) the unclean file here?
jackjeckyl

ASKER

Forgot to add: the characters are so off the wall that I've been using Unicode escapes with replaceAll to remove them. Some aren't even visible.
I can't post the file. It won't always be the same file; it will be something different every time.
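Rather than one replaceAll call per garbage character, a single negated character class can remove everything outside the approved set in one pass. A minimal sketch, under the same assumption that the approved set is printable ASCII plus CR and LF; `RegexCleaner` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: one regex pass instead of many replaceAll calls. */
public class RegexCleaner {

    // Matches any character that is NOT printable ASCII (hex 20 to 7E), CR or LF.
    // The approved set is an assumption; widen the class as needed.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\x20-\\x7E\\r\\n]");

    public static String clean(String input) {
        return NOT_ALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The BEL character and the accented e are dropped.
        System.out.print(clean("A\u0007B\u00E9C\r\n"));
    }
}
```

Precompiling the `Pattern` avoids recompiling the regex on every call, which `String.replaceAll` does internally.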
The best thing to do would be to implement a FilterReader that cleans out anything not in ISO-8859-1.

http://www.technojeeves.com/joomla/index.php/free/48-iso8859-1
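A minimal sketch of the FilterReader idea, assuming "not ISO-8859-1" means any UTF-16 code unit above U+00FF. `Latin1FilterReader` and `readAll` are hypothetical names; this is not the code at the link above. Note that ISO-8859-1 includes the C0 and C1 control ranges (U+0000 to U+001F and U+0080 to U+009F), so this filter alone will not remove invisible control characters.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: drops every char above U+00FF (outside ISO-8859-1). */
public class Latin1FilterReader extends FilterReader {

    public Latin1FilterReader(Reader in) {
        super(in);
    }

    private static boolean keep(int c) {
        return c <= 0xFF;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip rejected chars until one is kept or the stream ends (-1).
        do {
            c = in.read();
        } while (c != -1 && !keep(c));
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        while (true) {
            int n = in.read(cbuf, off, len);
            if (n <= 0) {
                return n; // -1 at end of stream, 0 only when len == 0
            }
            // Compact the kept chars to the front of the region just filled.
            int kept = 0;
            for (int i = off; i < off + n; i++) {
                if (keep(cbuf[i])) {
                    cbuf[off + kept++] = cbuf[i];
                }
            }
            if (kept > 0) {
                return kept;
            }
            // Everything in this chunk was rejected: read again instead of
            // returning 0, which callers would not expect for len > 0.
        }
    }

    /** Drains a reader into a String. */
    public static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new Latin1FilterReader(new StringReader("caf\u00E9 \u2603 ok"))));
    }
}
```

Only the two read methods are overridden; `skip()` still delegates to the underlying reader, so it counts characters that the filter would otherwise drop.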
ASKER CERTIFIED SOLUTION
CEHJ
I ended up just using an approved list of characters and ignoring Unicode. Thanks for your responses.
:-)

I would just use the ISO-8859-1 charset.
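One way to use the charset directly (a sketch, not necessarily what was meant above) is a `CharsetEncoder` for ISO-8859-1 configured to ignore unmappable characters, followed by decoding the bytes back. `String.getBytes(StandardCharsets.ISO_8859_1)` would not do here, because it substitutes `?` for unmappable characters instead of dropping them. As with the FilterReader approach, control characters inside ISO-8859-1 pass through unchanged. `Latin1Strip` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: strips characters that ISO-8859-1 cannot represent. */
public class Latin1Strip {

    public static String stripNonLatin1(String input) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)       // e.g. lone surrogates
                .onUnmappableCharacter(CodingErrorAction.IGNORE); // anything above U+00FF
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
            // Each ISO-8859-1 byte decodes to exactly one char, so this step loses nothing.
            return StandardCharsets.ISO_8859_1.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE actions, but encode() declares the checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripNonLatin1("na\u00EFve \u2603!"));
    }
}
```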