• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2324
  • Last Modified:

OSX grep (or sed? awk?) to find/replace non-ASCII hex values

I need to use OSX shell script (bash is current shell, would rather not re-write for others, but could if needed) to find and replace specific non-ASCII hex values (specifically Unicode #65533) from text files.

This appears to work, but wonder if there is something more elegant.
grep `echo -e 's/\xEF\xBF\xBD//'` fileName.txt

Have not otherwise been able to get the hex understood by grep or found. Any ideas?
0
michaellanham
Asked:
michaellanham
  • 3
  • 2
1 Solution
 
ozoCommented:
awk '/s\/\xEF\xBF\xBD\/\//' fileName.txt
perl -ne 'print if m{s/\xEF\xBF\xBD//}' fileName.txt
0
 
michaellanhamAuthor Commented:
Oddly, I'm unable to diagnose why I can't get my OS X 10.9.1 to play nice. I've attached a copy of a test file with this character sequence in it. I tried both the suggested solutions, as well as returning to my own example. And I'll be darned that all three do not have any discernable affect on the source file. I can close it and reopen it in the hex editor and sure enough, bad Hex Symbols still there.
The grep --version output is: grep (BSD grep) 2.5.1-FreeBSD, and that might be useful.

Diagnosis assistance would be great!
Screen-Shot-2014-01-07-at-8.34.0.png
0
 
michaellanhamAuthor Commented:
Well...weirdness..a minor modification to suggestion #2 seems to be working, but I'm not clear what the difference is...I concede I'm doing a replacement with 'foo' instead of deleting, but the 'm' in front of the first brace seemed to be interfering with proper execution.

perl -ane '{if(s/[\xEF\xBF\xBD]+/foo/) { print } }' foo.csv

but
perl -e s/[\xEF\xBF\xBD]+/foo/ foo.csv

does not work. Argh! Why not?
0
 
ozoCommented:
In the screen shot, I see the  character sequence "\xEF\xBF\xBD", but I don't see the character sequence "s/\xEF\xBF\xBD//", which is what your grep command would have been searching for
If you just want to replace all instances of those characters in any sequence with "foo" then you can do
perl -i.bak -pe 's/[\xEF\xBF\xBD]+/foo/' foo.csv
0
 
michaellanhamAuthor Commented:
Zoinks, you are of course correct I was searching with grep for more characters than existed--hence no match.

I also noticed that I had not used the -i (to edit <> in place, with backup) nor quotes around the perl segment. Grrr.....

When using grep, this worked...
grep -e `echo -e $'\xEF\xBF\xBD'` foo.csv

Notice I had to have bash interpret the Hex characters before passing to grep. found an example after much searching and mostly-blind modifications to see if they would work as expected. Other than painful discovery learning, any suggestions on how to ID the actual problem with grep? I've read multiple conflicting posts that the version on Mac does/does not handle unicode characters, and my exposure thus far goes with the 'does not' camp.
Thank you!
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now