Solved

Vim: getting rid of non-printable characters

Posted on 2007-11-19
Last Modified: 2012-08-14
I have a number of text files, generated on Windows machines, that appear strangely in Vim because of a massive number of non-printable characters, and I'm not sure how to remove them. For example, a particular XML file looks like this in Notepad on Windows:

<?xml version="1.0" encoding="Unicode" ?>
<SYSTEMINFO>
<SYSTEM>
      <OSNAME>Microsoft Windows Server 2003 Advanced Server</OSNAME>
      <OSVER>5.2.3790 1.0</OSVER>
      <OSLANGUAGE>1033</OSLANGUAGE>
</SYSTEM>


But in Vim, the same file looks like this, with non-printable characters after every character:

ÿþ<?^@x^@m^@l^@ ^@v^@e^@r^@s^@i^@o^@n^@=^@"^@1^@.^@0^@"^@ ^@e^@n^@c^@o^@d^@i^@n^@g^@=^@"^@U^@n^@i^@c^@o^@d^@e^@"^@ ^@?^@>^@^M
<^@S^@Y^@S^@T^@E^@M^@I^@N^@F^@O^@>^@^M
^@<^@S^@Y^@S^@T^@E^@M^@>^@^M
^@      ^@<^@O^@S^@N^@A^@M^@E^@>^@M^@i^@c^@r^@o^@s^@o^@f^@t^@ ^@W^@i^@n^@d^@o^@w^@s^@ ^@S^@e^@r^@v^@e^@r^@ ^@2^@0^@0^@3^@ ^@A^@d^@v^@a^@n^@c^@e^@d^@ ^@S^@e^@r^@v^@e^@r^@<^@/^@O^@S^@N^@A^@M^@E^@>^@^M
^@      ^@<^@O^@S^@V^@E^@R^@>^@5^@.^@2^@.^@3^@7^@9^@0^@ ^@1^@.^@0^@<^@/^@O^@S^@V^@E^@R^@>^@^M
^@      ^@<^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@1^@0^@3^@3^@<^@/^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@^M
^@<^@/^@S^@Y^@S^@T^@E^@M^@>^@^M


This doesn't happen with all files, mostly just .xml files generated by a Microsoft program, or .reg registry files.
Question by:Marketing_Insists
5 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 20317307
:%s/[^ -~]//g
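
For readers less familiar with the pattern, here is a rough Python sketch of what this substitution does to one line. The sample bytes and the name `strip_non_printable` are illustrative assumptions, not part of the original answer; the bytes are decoded as Latin-1 to mimic the one-byte-per-character view Vim showed.

```python
import re

# Same character class as the Vim command: anything outside the
# printable ASCII range from space (0x20) to tilde (0x7E).
NON_PRINTABLE = re.compile(r"[^ -~]")


def strip_non_printable(line: str) -> str:
    """Delete every character that is not printable ASCII."""
    return NON_PRINTABLE.sub("", line)


# Hypothetical sample: the start of the first line as UTF-16 bytes with
# a byte order mark, viewed one byte per character (Latin-1).
raw = b"\xff\xfe<\x00?\x00x\x00m\x00l\x00\r\x00"
shown = raw.decode("latin-1")

print(strip_non_printable(shown))  # <?xml
```

One thing to keep in mind: the class also matches tabs and any genuinely non-ASCII text (accented letters, for example), so those are deleted along with the NUL bytes.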
0
 
LVL 3

Expert Comment

by:amirs80
ID: 20317837
Before opening the file, run this command:
#dos2unix filename
Then check the file again.
0
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 20318283
Windows "Unicode" files like this one are UTF-16, so each of these characters is 2 bytes wide. The original ASCII characters occupy the first byte of each pair, leaving the second byte zero (the file is little-endian); Vim shows those zero bytes as ^@. Microsoft XML files also often seem to have some weird binary garbage at the front (the ÿþ above, which is the UTF-16 byte order mark). You can remove that with no ill effect in my limited experience.
So ozo's advice is sound: remove all characters that aren't printable. I don't understand the :% at the front, but the rest is a sed command that would do what you want.
dos2unix will convert CR LF pairs to LF, but the sed command will effectively do that for you anyway, because it also deletes the ^M characters.
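
The byte layout described here can be checked with a short Python sketch. It assumes the file is little-endian UTF-16 with a byte order mark, which is what the leading ÿþ in the question suggests; the sample string is illustrative only.

```python
import codecs

text = "<?xml"

# Little-endian UTF-16: each ASCII character is followed by a zero byte.
body = text.encode("utf-16-le")
# The byte order mark FF FE is what appears as "ÿþ" in a Latin-1 view.
data = codecs.BOM_UTF16_LE + body

print(data.hex(" "))  # ff fe 3c 00 3f 00 78 00 6d 00 6c 00

# Decoding with the "utf-16" codec consumes the mark and the zero bytes,
# so nothing has to be stripped by hand.
print(data.decode("utf-16"))  # <?xml
```

Decoding instead of deleting bytes has the advantage that characters outside ASCII survive the conversion.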
0
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 20318296
One other thing: for the benefit of XML parsers, you need to change "Unicode" in line 1 to "utf-8", so the parser knows the characters are now 1 byte wide.
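
As an alternative to stripping bytes and then editing the declaration by hand, the file can be re-encoded in one pass. This is only a minimal Python sketch under the same UTF-16 assumption; `convert_to_utf8` and the file names are hypothetical, and the sample content is made up for the demonstration.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 file as UTF-8 with LF line endings."""
    # The "utf-16" codec reads the byte order mark to pick the byte order.
    text = src.read_bytes().decode("utf-16")
    # Normalize CR LF pairs to LF, the job dos2unix is normally used for.
    text = text.replace("\r\n", "\n")
    # Update the declaration so XML parsers expect UTF-8.
    text = text.replace('encoding="Unicode"', 'encoding="utf-8"', 1)
    dst.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sysinfo.xml"
    dst = Path(tmp) / "sysinfo-utf8.xml"
    sample = '<?xml version="1.0" encoding="Unicode" ?>\r\n<SYSTEMINFO>\r\n'
    src.write_bytes(sample.encode("utf-16"))
    convert_to_utf8(src, dst)
    print(dst.read_bytes())
```

Unlike the substitution, this keeps tabs and non-ASCII characters intact, at the cost of running a script outside Vim.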
0
 
LVL 84

Expert Comment

by:ozo
ID: 20318367
:% tells Vim to execute the sed-style command that follows on every line in the buffer
0
