Solved

Vim: getting rid of non printable characters

Posted on 2007-11-19
Last Modified: 2012-08-14
I have a number of text files, generated on Windows machines, that look strange in Vim because of a massive number of non-printable characters, and I'm not sure how to remove them. For example, on Windows, a particular XML file looks like this in Notepad:

<?xml version="1.0" encoding="Unicode" ?>
<SYSTEMINFO>
<SYSTEM>
      <OSNAME>Microsoft Windows Server 2003 Advanced Server</OSNAME>
      <OSVER>5.2.3790 1.0</OSVER>
      <OSLANGUAGE>1033</OSLANGUAGE>
</SYSTEM>


But in Vim, the same file looks like this, with non-printable characters after every character:

ÿþ<?^@x^@m^@l^@ ^@v^@e^@r^@s^@i^@o^@n^@=^@"^@1^@.^@0^@"^@ ^@e^@n^@c^@o^@d^@i^@n^@g^@=^@"^@U^@n^@i^@c^@o^@d^@e^@"^@ ^@?^@>^@^M
<^@S^@Y^@S^@T^@E^@M^@I^@N^@F^@O^@>^@^M
^@<^@S^@Y^@S^@T^@E^@M^@>^@^M
^@      ^@<^@O^@S^@N^@A^@M^@E^@>^@M^@i^@c^@r^@o^@s^@o^@f^@t^@ ^@W^@i^@n^@d^@o^@w^@s^@ ^@S^@e^@r^@v^@e^@r^@ ^@2^@0^@0^@3^@ ^@A^@d^@v^@a^@n^@c^@e^@d^@ ^@S^@e^@r^@v^@e^@r^@<^@/^@O^@S^@N^@A^@M^@E^@>^@^M
^@      ^@<^@O^@S^@V^@E^@R^@>^@5^@.^@2^@.^@3^@7^@9^@0^@ ^@1^@.^@0^@<^@/^@O^@S^@V^@E^@R^@>^@^M
^@      ^@<^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@1^@0^@3^@3^@<^@/^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@^M
^@<^@/^@S^@Y^@S^@T^@E^@M^@>^@^M


This doesn't affect all files, mostly just .xml files generated by a Microsoft program and .reg registry files.
Question by:Marketing_Insists
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 20317307
:%s/[^ -~]//g
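
For anyone who wants to do the same cleanup outside Vim, here is a minimal Python sketch of the idea behind that substitution. It is only an approximation: the function name and the sample bytes are made up for illustration, and the newline is kept explicitly because Vim's per-line substitute never touches line breaks. Like the Vim command, it also drops tabs.

```python
import re

# Bytes outside printable ASCII (space through tilde), except the line feed,
# which Vim's :%s never sees because it works line by line.
NON_PRINTABLE = re.compile(rb"[^ -~\n]")


def strip_non_printable(data: bytes) -> bytes:
    """Drop every byte that is not printable ASCII or a line feed."""
    return NON_PRINTABLE.sub(b"", data)


# Hypothetical sample: a byte order mark, then "<A>" and CRLF as 2-byte units.
sample = b"\xff\xfe<\x00A\x00>\x00\r\x00\n\x00"
cleaned = strip_non_printable(sample)
print(cleaned)
```

Note that this throws away any genuinely non-ASCII text in the file, exactly as the Vim command does, so it only suits files whose real content is plain ASCII.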
 
LVL 3

Expert Comment

by:amirs80
ID: 20317837
Before opening the file, run this command:
# dos2unix filename
Then check the file again.
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 20318283
Windows "Unicode" files are UTF-16, so characters are 2 bytes wide. Each original ASCII character occupies one byte of the pair and the other byte is zero, which Vim shows as ^@. Microsoft XML files also often seem to have some weird binary garbage at the front (the ÿþ above). That is the byte order mark, and you can remove it with no ill effect in my limited experience.
So ozo's advice is sound: remove all characters that aren't printable. I don't understand the :% at the front, but the rest is a sed command that would do what you want.
dos2unix will convert CRLF pairs to LF, but the sed command will effectively do that for you anyway, because it also removes the ^M.
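
As a rough illustration of the 2-byte layout, here is a small Python sketch that builds a sample the way Windows typically writes it (byte order mark plus little-endian units) and then converts it to UTF-8 with LF line endings instead of stripping bytes. The function name and the sample string are assumptions made for the example, not anything from the files in the question.

```python
import codecs


def utf16_to_utf8(raw: bytes) -> bytes:
    """Decode BOM-prefixed UTF-16 and re-encode as UTF-8 with LF line endings."""
    text = raw.decode("utf-16")        # the BOM selects the byte order and is dropped
    text = text.replace("\r\n", "\n")  # same effect as dos2unix on the decoded text
    return text.encode("utf-8")


# Hypothetical sample: byte order mark followed by little-endian 2-byte units.
raw = codecs.BOM_UTF16_LE + "<SYSTEM>\r\n".encode("utf-16-le")
print(raw[:6])   # b'\xff\xfe<\x00S\x00'

converted = utf16_to_utf8(raw)
print(converted)  # b'<SYSTEM>\n'
```

Decoding rather than deleting bytes has the advantage that any non-ASCII characters in the file survive the conversion.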
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 20318296
One other thing: for the benefit of XML parsers, you need to change "Unicode" in line 1 to "utf-8", so the parser will know characters are 1 byte wide.
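
If there are many files, that edit to line 1 could be scripted. The following is a minimal Python sketch, assuming the declaration sits at the very start of the already-cleaned text; the function name is hypothetical and the regular expression only handles a simple, well-formed declaration.

```python
import re


def declare_utf8(xml_text: str) -> str:
    """Rewrite the encoding attribute of a leading XML declaration to utf-8."""
    return re.sub(
        r'^(<\?xml\b[^>]*?\bencoding=)(["\'])[^"\']*\2',
        r"\1\2utf-8\2",
        xml_text,
        count=1,
    )


header = '<?xml version="1.0" encoding="Unicode" ?>\n<SYSTEMINFO>'
fixed = declare_utf8(header)
print(fixed)
```

Text without a declaration at the start is returned unchanged, so running it on a file that was never affected should be harmless.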
 
LVL 84

Expert Comment

by:ozo
ID: 20318367
The : starts a Vim command and the % is a range meaning every line in the buffer, so :%s tells Vim to execute the sed-style substitution on every line.