hadrons
Substitutions involving non-ASCII characters
I have a file with some binary characters that are unrecognizable in any of my text editors.
For example, I would see something like this:
\xC2Freien - with \xC2 being some binary character. I attempted to make this substitution in my processing script to clean these out:
s/[^[:ascii:]]Freien/Freien/g;
However, it failed to do so. I know I can do something like this:
s/[^[:ascii:]]//g;
While this gets rid of those binary characters, it also strips out non-ASCII characters I do want to keep.
I know I can also do hexadecimal substitution like this:
s/\xC2Freien/Freien/g;
However, in future cases I won't always know the hexadecimal value.
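One way this could work without knowing the hexadecimal value is to match any run of non-ASCII characters, but only when it sits directly in front of a known word. The sketch below is only an illustration: `strip_before_words` is a made-up helper name, and it assumes the data is handled as raw bytes (no decoding layer on the filehandle) and that the list of affected words is known in advance.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any library): delete a run of non-ASCII
# characters only when it is immediately followed by one of the given words.
# Non-ASCII characters elsewhere in the text are left untouched.
sub strip_before_words {
    my ($text, @words) = @_;
    # Escape the words so they are matched literally, then build an alternation.
    my $alt = join '|', map { quotemeta } @words;
    # The lookahead (?=...) checks for the word without consuming it.
    $text =~ s/[^[:ascii:]]+(?=(?:$alt))//g;
    return $text;
}

# Assumed sample data: \x{C2} is the stray character, \x{E9} is one to keep.
my $line  = "\x{C2}Freien and caf\x{E9}";
my $clean = strip_before_words($line, 'Freien');
```

Using `+` rather than a single character class is a guess that the stray character may be more than one byte long; whether that applies depends on the actual file.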
I can't cut & paste binary characters like this, because \xC2 turns into ? in the cut & paste process.
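Since the characters do not survive cut & paste, one option for finding out what they are is to print them as visible escapes. This is a minimal sketch, assuming the text is read as raw bytes; `show_bytes` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a visible
# \xNN escape so its value can be read and copied as plain ASCII text.
sub show_bytes {
    my ($text) = @_;
    # /e evaluates the replacement as code; ord gives the character's number.
    $text =~ s/([^[:ascii:]])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Assumed sample input containing one stray character before the word.
print show_bytes("\x{C2}Freien"), "\n";   # prints \xC2Freien
```

The printed form can then be pasted into a substitution such as the hexadecimal one above.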