I have an Apple file, InfoPlist.strings, that is in UTF-16 format. I am using a sed command, which I got help with here on this forum, to search the file and replace a version number:
number="0.9.8.7"
sed "s/\(#define VERSION.*\"\)[0-9]\.[0-9]\.[0-9]\.[0-9]/\1V$number\"/" file >/tmp/$$
mv /tmp/$$ /mydir/file
This works great on non-UTF-16 files, but now I need to use it on the UTF-16 file. I read that sed supports \xHH as an escape sequence for hex notation, but I am not sure how to use this in my script.
Looking for some help on how I can manipulate text in my UTF-16 file.
Quick note: I am searching for the string #define VERSION "1.2.3.4" and replacing the 1.2.3.4 with whatever number the user/script passes in as an argument.
> .. I have an Apple file, InfoPlist.strings ..
So you're on a Mac? Then I'd use the defaults command and pipe its output through sed. As for the sed command line you posted: whether it does what you want depends on the shell you run it in, because you use escaped double quotes (\"). I'd write it as follows:
sed -e 's/\(#define VERSION.*"\)[0-9]\.[0-9]\.[0-9]\.[0-9]/\1'$number'"/' file >/tmp/$$
Also keep in mind that sed is not UTF compliant, though I'm not sure how BSD's sed on Mac behaves ..
Yes, this is running on a Mac and the shell is bash. If sed is not UTF compliant, then I need a way to find the version number in the UTF-16 file and replace it with the number I pass in as an argument. sed will NOT find the version number in the UTF-16 file the way it does in the other, non-UTF files. I thought there was some setting that would force sed to become UTF compliant. If there is no such setting, then I need another solution for a UTF-16 file. Thanks
Even though OS X is Unix-based, the Mac's Unix support is sometimes a pain ... You may try awk instead, but I guess that it's as old-fashioned as sed. Then your friend will be perl. Can you please check if the file contains a BOM? Use od -x for that:
od -x file
will show the content of your file in hex notation. The BOM (byte order mark) is a special 2- or 3-byte code used in UTF files. For UTF-16 it must be FEFF or FFFE as the very first 2 bytes in the file. The above od command should prove this.
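One caveat with od -x: it prints 16-bit words in the byte order of the machine it runs on, so on a little-endian machine the bytes FF FE show up as "feff". A rough sketch of a byte-by-byte check follows. The sample file path is made up for illustration, and it assumes your head accepts -c (both the macOS and GNU versions do):

```shell
#!/bin/sh
# Hypothetical sample file, created here so the check can be run as-is.
sample="/tmp/bom-sample.$$"

# Write the bytes FF FE (a little-endian BOM) followed by "A" in UTF-16LE.
printf '\377\376A\000' > "$sample"

# Dump only the first two bytes, one byte at a time, so the result
# does not depend on the byte order of the machine running od.
bom=$(head -c 2 "$sample" | od -An -tx1 | tr -d ' \n')

case "$bom" in
    fffe) echo "UTF-16 little-endian BOM" ;;
    feff) echo "UTF-16 big-endian BOM" ;;
    *)    echo "no UTF-16 BOM" ;;
esac

rm -f "$sample"
```

To check a real file, point head at that file instead of the sample.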
Thanks Ahoffmann: I am getting output like: feff 002f 002a 0020 004c 006f .... All the characters have 00 in front of them except for the first feff. Thanks for the explanation of the BOM notation. I remember reading on the web about using \x00 in front of the character(s) that one is trying to search for in the sed call, but again I don't see how to incorporate that into what I am trying to do.
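One way around matching the 00 bytes by hand is to convert the file to UTF-8 with iconv, run sed on that, and convert the result back. This is only a sketch under some assumptions: iconv is installed (it ships with OS X), the sample file path and contents are made up for illustration, and the replacement writes just the number, because the closing quote is already in the file.

```shell
#!/bin/sh
# Hypothetical sample path and version number, for illustration only.
file="/tmp/InfoPlist.sample.$$.strings"
tmp="/tmp/version.$$"
number="0.9.8.7"

# Build a small UTF-16 sample so the sketch can be run as-is.
printf '/* sample strings file */\n#define VERSION "1.2.3.4"\n' |
    iconv -f UTF-8 -t UTF-16 > "$file"

# Decode to UTF-8, edit with sed, encode back to UTF-16.
# The file is only replaced if the last stage of the pipeline succeeds.
iconv -f UTF-16 -t UTF-8 "$file" |
    sed "s/\(#define VERSION.*\"\)[0-9]\.[0-9]\.[0-9]\.[0-9]/\1$number/" |
    iconv -f UTF-8 -t UTF-16 > "$tmp" &&
    mv "$tmp" "$file"

# Show the edited file as UTF-8 text.
iconv -f UTF-16 -t UTF-8 "$file"
```

With -t UTF-16, iconv picks the output byte order itself and writes a BOM, so the order may differ from the original file. If it has to stay the same, name it explicitly (UTF-16LE or UTF-16BE) on both conversions; in that case iconv may not write the BOM for you.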