Solved

Vim: getting rid of non-printable characters

Posted on 2007-11-19
Last Modified: 2012-08-14
I have a number of text files, generated on Windows machines, that appear strangely in Vim because of a massive number of non-printable characters, and I'm not sure how to remove them. For example, in Windows, a particular XML file looks like this in Notepad:

<?xml version="1.0" encoding="Unicode" ?>
<SYSTEMINFO>
<SYSTEM>
      <OSNAME>Microsoft Windows Server 2003 Advanced Server</OSNAME>
      <OSVER>5.2.3790 1.0</OSVER>
      <OSLANGUAGE>1033</OSLANGUAGE>
</SYSTEM>


But in Vim, the same file looks like this, with non-printable characters after every character:

ÿþ<?^@x^@m^@l^@ ^@v^@e^@r^@s^@i^@o^@n^@=^@"^@1^@.^@0^@"^@ ^@e^@n^@c^@o^@d^@i^@n^@g^@=^@"^@U^@n^@i^@c^@o^@d^@e^@"^@ ^@?^@>^@^M
<^@S^@Y^@S^@T^@E^@M^@I^@N^@F^@O^@>^@^M
^@<^@S^@Y^@S^@T^@E^@M^@>^@^M
^@      ^@<^@O^@S^@N^@A^@M^@E^@>^@M^@i^@c^@r^@o^@s^@o^@f^@t^@ ^@W^@i^@n^@d^@o^@w^@s^@ ^@S^@e^@r^@v^@e^@r^@ ^@2^@0^@0^@3^@ ^@A^@d^@v^@a^@n^@c^@e^@d^@ ^@S^@e^@r^@v^@e^@r^@<^@/^@O^@S^@N^@A^@M^@E^@>^@^M
^@      ^@<^@O^@S^@V^@E^@R^@>^@5^@.^@2^@.^@3^@7^@9^@0^@ ^@1^@.^@0^@<^@/^@O^@S^@V^@E^@R^@>^@^M
^@      ^@<^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@1^@0^@3^@3^@<^@/^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@^M
^@<^@/^@S^@Y^@S^@T^@E^@M^@>^@^M


This does not affect all files, mostly just .xml files generated by a Microsoft program and .reg registry files.
Question by:Marketing_Insists
5 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 20317307
:%s/[^ -~]//g
 
LVL 3

Expert Comment

by:amirs80
ID: 20317837
Before opening the file, run the command
#dos2unix filename
Then check it again.
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 20318283
In this encoding (UTF-16, which Windows labels "Unicode"), these characters are 2 bytes wide. Each original ASCII character occupies one byte of the pair and the other byte is zero; in this little-endian file the zero byte follows the character, which is why Vim shows ^@ after every character. Also, Microsoft XML files often have some odd-looking binary bytes at the front (the ÿþ above, which is the UTF-16 byte-order mark) - you can remove them with no ill effect in my limited experience.
So ozo's advice is sound - remove all characters that aren't printable. I don't understand the :% at the front, but the rest is a sed command that would do what you want.
dos2unix will convert CRLF pairs to LF, but the sed command will effectively do that for you anyway.
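
As a rough illustration of the byte layout described above, here is a minimal Python sketch. It is not from the thread: the sample string and variable names are assumptions made up for the example. It shows the byte-order mark and the zero bytes, and that decoding the data as UTF-16 (rather than deleting bytes) recovers the text without losing anything.

```python
# Sketch only: the sample text and all names here are illustrative assumptions.
sample = '<?xml version="1.0" encoding="Unicode" ?>\r\n<SYSTEMINFO>\r\n'

# What Windows calls "Unicode" here is UTF-16 little-endian with a BOM.
raw = b"\xff\xfe" + sample.encode("utf-16-le")

# The BOM is the two bytes Vim shows as "ÿþ"; each ASCII character is
# followed by a zero byte, which Vim shows as "^@".
bom, first_char = raw[:2], raw[2:4]

# Decoding as UTF-16 uses the BOM to pick the byte order and drops it.
text = raw.decode("utf-16")

# Normalise CRLF line endings to LF, as dos2unix would on a plain text file.
unix_text = text.replace("\r\n", "\n")
utf8_bytes = unix_text.encode("utf-8")
```

Unlike stripping everything outside the printable ASCII range, this route would also preserve tabs and any non-ASCII characters the file happens to contain.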
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 20318296
One other thing - for the benefit of XML parsers, once the zero bytes are gone you need to change "Unicode" in line 1 to "utf-8", so the parser will know the characters are 1 byte wide.
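
A small Python sketch of that declaration change, under the assumption that the file has already been converted to single-byte text. The function name `fix_declaration` is hypothetical and only for illustration; it touches the `encoding` attribute in the XML declaration at the very start of the text and nothing else.

```python
import re


def fix_declaration(xml_text: str) -> str:
    """Rewrite encoding="Unicode" to encoding="utf-8" in the XML declaration.

    Hypothetical helper for illustration; only a declaration at the very
    start of the text is changed, and the original quote style is kept.
    """
    return re.sub(
        r'^(<\?xml[^>]*\bencoding=)(["\'])Unicode\2',
        r"\g<1>\g<2>utf-8\g<2>",
        xml_text,
        count=1,
    )


fixed = fix_declaration('<?xml version="1.0" encoding="Unicode" ?>\n<SYSTEMINFO>')
```

Text elsewhere in the document that happens to contain the word "Unicode" is left alone, because the pattern is anchored to the start of the input.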
 
LVL 84

Expert Comment

by:ozo
ID: 20318367
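:% tells Vim to execute a sed-style command on every line in the buffer: the colon starts an Ex command, % is the range covering all lines, and s is the substitute command.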
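
For readers more familiar with scripting than with Ex commands, the following Python sketch is a loose analogy for what `:%s/[^ -~]//g` does. It is not how Vim implements it. The character class `[^ -~]` is taken from the accepted solution; the function name and sample lines are assumptions for the example.

```python
import re

# Same character class as the accepted solution: anything outside the
# printable ASCII range from space (0x20) to tilde (0x7E).
NON_PRINTABLE = re.compile(r"[^ -~]")


def strip_non_printable(lines):
    """Apply the substitution to every line, like the % range in Vim."""
    return [NON_PRINTABLE.sub("", line) for line in lines]


# Two lines roughly as Vim would hold them: BOM characters, NULs, trailing CR.
buffer_lines = [
    "\u00ff\u00fe<\x00?\x00x\x00m\x00l\x00\r",
    "<\x00S\x00Y\x00S\x00\r",
]
cleaned = strip_non_printable(buffer_lines)
```

Note that the same pattern also deletes tabs and every non-ASCII character, so it is only safe on files whose real content is plain ASCII.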
