Solved

OSX grep (or sed? awk?) to find/replace non-ASCII hex values

Posted on 2014-01-06
Last Modified: 2014-01-08
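I need to use an OS X shell script (bash is the current shell; I would rather not rewrite it for other shells, but could if needed) to find and replace specific non-ASCII byte sequences, specifically Unicode character 65533 (U+FFFD, the replacement character, encoded in UTF-8 as the bytes EF BF BD), in text files.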

This appears to work, but I wonder if there is something more elegant:
grep `echo -e 's/\xEF\xBF\xBD//'` fileName.txt

Otherwise I have not been able to get grep to understand the hex values, or to find the characters at all. Any ideas?
Question by: michaellanham

Expert Comment

by: ozo
ID: 39761210
awk '/s\/\xEF\xBF\xBD\/\//' fileName.txt
perl -ne 'print if m{s/\xEF\xBF\xBD//}' fileName.txt
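Both commands above only print matching lines; neither rewrites fileName.txt. Since the question also mentions sed, here is one way to delete the sequence in place with it. This is a minimal sketch assuming bash (for the $'...' quoting) and a sed that accepts -i with an attached backup suffix, which both the OS X (BSD) sed and GNU sed do; strip_fffd_sed is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: delete every U+FFFD (UTF-8 bytes EF BF BD) from a file, in place.
# strip_fffd_sed is a hypothetical helper name, not a standard command.
strip_fffd_sed() {
  local file=$1
  # $'...' makes bash turn the \x escapes into the three raw bytes before sed runs.
  # LC_ALL=C makes sed treat the file as plain bytes, independent of the locale.
  # -i.bak edits the file in place and keeps the original as "$file.bak".
  LC_ALL=C sed -i.bak $'s/\xEF\xBF\xBD//g' "$file"
}

# Usage: strip_fffd_sed fileName.txt    # original kept as fileName.txt.bak
```

Setting LC_ALL=C matters on OS X: under a UTF-8 locale, BSD sed can stop with an "illegal byte sequence" error if the file also contains bytes that are not valid UTF-8.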

Author Comment

by: michaellanham
ID: 39763953
Oddly, I can't work out why my OS X 10.9.1 system won't cooperate. I've attached a screenshot of a test file with this character sequence in it. I tried both suggested solutions, as well as my own example again, and I'll be darned if any of the three has a discernible effect on the source file. When I close the file and reopen it in the hex editor, sure enough, the bad hex bytes are still there.
In case it is useful, the grep --version output is: grep (BSD grep) 2.5.1-FreeBSD.

Diagnosis assistance would be great!
Screen-Shot-2014-01-07-at-8.34.0.png

Author Comment

by: michaellanham
ID: 39763955
Well, weirdness... a minor modification to suggestion #2 seems to be working, but I'm not clear what the difference is. I concede I'm replacing with 'foo' instead of deleting, but the 'm' in front of the first brace seemed to be interfering with proper execution.

perl -ane '{if(s/[\xEF\xBF\xBD]+/foo/) { print } }' foo.csv

but
perl -e s/[\xEF\xBF\xBD]+/foo/ foo.csv

does not work. Argh! Why not?

Accepted Solution

by: ozo
ID: 39764012
In the screenshot I see the byte sequence "\xEF\xBF\xBD", but I don't see the character sequence "s/\xEF\xBF\xBD//", which is what your grep command would have been searching for.
If you just want to replace every run of those characters, in any order, with "foo" (editing in place and keeping foo.csv.bak as a backup), you can do
perl -i.bak -pe 's/[\xEF\xBF\xBD]+/foo/g' foo.csv

Author Comment

by: michaellanham
ID: 39764974
Zoinks, you are of course correct: I was searching with grep for more characters than existed, hence no match.

I also noticed that I had used neither -i (to edit the <> files in place, with a backup) nor quotes around the perl code. Grrr.....

With grep, this worked:
grep -e `echo -e $'\xEF\xBF\xBD'` foo.csv

Notice that I had to have bash interpret the hex characters before passing them to grep. I found this approach after much searching and mostly blind modifications to see whether they would work as expected. Other than painful trial-and-error learning, any suggestions on how to identify the actual problem with grep? I've read multiple conflicting posts claiming that the Mac version does or does not handle Unicode characters, and my experience so far puts me in the 'does not' camp.
Thank you!
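On the grep question: bash's $'...' quoting already turns the \x escapes into raw bytes, so the backticks and echo -e are not needed, and piping a pattern through od is a quick way to see exactly which bytes grep will receive. A minimal sketch assuming bash; FFFD and find_fffd are hypothetical names, and LC_ALL=C is added so grep compares raw bytes regardless of the locale (I have not verified how BSD grep 2.5.1 behaves without it).

```shell
#!/bin/bash
# Sketch: find lines containing U+FFFD (UTF-8 bytes EF BF BD) with grep.
# $'...' is enough for bash to produce the three raw bytes.
FFFD=$'\xEF\xBF\xBD'

# find_fffd is a hypothetical helper name, not a standard command.
find_fffd() {
  # LC_ALL=C: compare raw bytes; -F: fixed string, not a regex;
  # -n: prefix each matching line with its line number.
  LC_ALL=C grep -n -F -e "$FFFD" "$1"
}

# To see exactly which bytes a pattern contains before handing it to grep:
#   printf '%s' "$FFFD" | od -An -tx1     # shows ef bf bd
# Usage: find_fffd foo.csv
```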