Solved

Vim: getting rid of non printable characters

Posted on 2007-11-19
Last Modified: 2012-08-14
I have a number of text files, generated on Windows machines, that appear strangely in Vim due to a massive number of non-printable characters, and I'm not sure how to remove them.  For example, a particular XML file looks like this in Notepad on Windows:

<?xml version="1.0" encoding="Unicode" ?>
<SYSTEMINFO>
<SYSTEM>
      <OSNAME>Microsoft Windows Server 2003 Advanced Server</OSNAME>
      <OSVER>5.2.3790 1.0</OSVER>
      <OSLANGUAGE>1033</OSLANGUAGE>
</SYSTEM>


But in Vim, the same file looks like this, with non-printable characters after every character:

ÿþ<?^@x^@m^@l^@ ^@v^@e^@r^@s^@i^@o^@n^@=^@"^@1^@.^@0^@"^@ ^@e^@n^@c^@o^@d^@i^@n^@g^@=^@"^@U^@n^@i^@c^@o^@d^@e^@"^@ ^@?^@>^@^M
<^@S^@Y^@S^@T^@E^@M^@I^@N^@F^@O^@>^@^M
^@<^@S^@Y^@S^@T^@E^@M^@>^@^M
^@      ^@<^@O^@S^@N^@A^@M^@E^@>^@M^@i^@c^@r^@o^@s^@o^@f^@t^@ ^@W^@i^@n^@d^@o^@w^@s^@ ^@S^@e^@r^@v^@e^@r^@ ^@2^@0^@0^@3^@ ^@A^@d^@v^@a^@n^@c^@e^@d^@ ^@S^@e^@r^@v^@e^@r^@<^@/^@O^@S^@N^@A^@M^@E^@>^@^M
^@      ^@<^@O^@S^@V^@E^@R^@>^@5^@.^@2^@.^@3^@7^@9^@0^@ ^@1^@.^@0^@<^@/^@O^@S^@V^@E^@R^@>^@^M
^@      ^@<^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@1^@0^@3^@3^@<^@/^@O^@S^@L^@A^@N^@G^@U^@A^@G^@E^@>^@^M
^@<^@/^@S^@Y^@S^@T^@E^@M^@>^@^M


This doesn't happen with all files, mostly just .xml files generated by a Microsoft program or .reg registry files.
Question by:Marketing_Insists
5 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 20317307
:%s/[^ -~]//g
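For anyone wondering what that pattern actually keeps: [^ -~] matches every character outside the range space (0x20) to tilde (0x7E). Here is a rough Python sketch of the same substitution. The helper name strip_non_printable is made up for illustration, and decoding the bytes as Latin-1 is only an assumption meant to mimic how Vim is displaying the file in the question.

```python
import re

def strip_non_printable(line: str) -> str:
    """Delete every character outside the range space (0x20) to tilde (0x7E)."""
    return re.sub(r"[^ -~]", "", line)

# One line of the sample, built the way the file appears to be stored:
# a byte order mark (FF FE), two bytes per character, then CR LF.
raw = b"\xff\xfe" + "<OSVER>5.2.3790 1.0</OSVER>\r\n".encode("utf-16-le")

# Assumption: reading the bytes as Latin-1 reproduces the "ÿþ<^@O^@S^@..." view.
as_vim_shows_it = raw.decode("latin-1")

cleaned = strip_non_printable(as_vim_shows_it)
print(cleaned)  # <OSVER>5.2.3790 1.0</OSVER>

# The pattern is blunt: tabs and real non-ASCII letters are deleted as well.
print(strip_non_printable("caf\u00e9\tbar"))  # cafbar
```

So this works well when the content is plain ASCII, as in the sample, but it would also throw away tabs and any genuine non-ASCII text. Unlike Vim, which applies the substitution line by line, this sketch also removes the trailing newline because it is handed the whole string.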
 
LVL 3

Expert Comment

by:amirs80
ID: 20317837
Before opening the file, run the command
#dos2unix filename
Now check it.
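One caveat worth checking first, sketched in Python below: in a file stored two bytes per character, CR and LF are not adjacent bytes, so a plain byte-level CRLF to LF replacement finds nothing to change and leaves the NUL bytes alone. This only illustrates the byte layout on a made-up sample line; it is not a claim about what any particular dos2unix version does with such a file.

```python
# Hypothetical sample line, encoded the way the file in the question appears
# to be stored (UTF-16, little-endian).
data = "<SYSTEM>\r\n".encode("utf-16-le")

# CR and LF are each followed by a NUL byte, so the two-byte sequence CR LF
# never occurs in the raw data.
print(b"\r\n" in data)          # False
print(b"\r\x00\n\x00" in data)  # True

# A naive byte-level CRLF -> LF replacement therefore changes nothing.
naive = data.replace(b"\r\n", b"\n")
print(naive == data)            # True
```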
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 20318283
Unicode characters in this encoding (UTF-16) are 2 bytes wide. The original ASCII characters map into one byte of each pair, leaving the other byte zero; in your sample the ASCII byte comes first and the zero byte (shown as ^@) second. Also, Microsoft XML files often seem to have some weird binary garbage at the front (the ÿþ in your sample, which is the UTF-16 byte order mark) - you can remove that with no ill effect in my limited experience.
So ozo's advice is sound - remove all characters that aren't printable. I don't understand the :% at the front but the rest is a sed command that would do what you want.
dos2unix will convert CrLf pairs to Lf, but the sed command will effectively do that for you anyway, since it also deletes the ^M characters.
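To make the byte layout concrete, here is a small Python sketch that looks at the first few bytes of the sample. The byte string is typed in by hand from the Vim output in the question (ÿþ is FF FE, ^@ is 00), so treat it as an assumption about the real file.

```python
import codecs

# Start of the file as shown in the question: ÿþ<^@?^@x^@m^@l^@
raw = b"\xff\xfe<\x00?\x00x\x00m\x00l\x00"

# The two leading bytes are the little-endian UTF-16 byte order mark.
print(raw[:2] == codecs.BOM_UTF16_LE)  # True

# After the mark, each ASCII byte is followed by a zero byte.
print(list(raw[2:6]))                  # [60, 0, 63, 0]

# Decoding as UTF-16 consumes the mark and recovers the text.
print(raw.decode("utf-16"))            # <?xml
```

Decoding like this keeps any non-ASCII characters the file might contain, which deleting bytes cannot do.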
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 20318296
One other thing - for the benefit of xml parsers, you need to change "Unicode"  in line 1 to "utf-8", so the parser will know characters are 1 byte wide.
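If you would rather do the conversion and the declaration change in one step outside Vim, something along these lines might work. It is only a sketch: the function name utf16_xml_to_utf8 is invented here, it assumes the input starts with a byte order mark, and the regular expression only handles a declaration that uses double quotes, like the one in the question.

```python
import re

def utf16_xml_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 XML, fix the declared encoding, and re-encode as UTF-8."""
    text = raw.decode("utf-16")        # assumes a leading BOM; it is dropped
    text = text.replace("\r\n", "\n")  # optional: CRLF -> LF
    # Rewrite only the encoding value inside the XML declaration.
    text = re.sub(r'(<\?xml[^>]*encoding=")[^"]*(")', r"\1utf-8\2", text, count=1)
    return text.encode("utf-8")

sample = b"\xff\xfe" + (
    '<?xml version="1.0" encoding="Unicode" ?>\r\n<SYSTEMINFO>\r\n'
).encode("utf-16-le")

print(utf16_xml_to_utf8(sample).decode("utf-8"))
```

In real use you would read the bytes from the file in binary mode and write the result back out the same way.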
 
LVL 84

Expert Comment

by:ozo
ID: 20318367
:% tells Vim to execute a sed command on every line in the buffer

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How many times have you wanted to quickly do the same thing to a list but found yourself typing it again and again? I first figured out a small time saver with the up arrow to recall the last command but that can only get you so far if you have a bi…
Setting up Secure Ubuntu server on VMware 1.      Insert the Ubuntu Server distribution CD or attach the ISO of the CD which is in the “Datastore”. Note that it is important to install the x64 edition on servers, not the X86 editions. 2.      Power on th…
Learn how to navigate the file tree with the shell. Use pwd to print the current working directory: Use ls to list a directory's contents: Use cd to change to a new directory: Use wildcards instead of typing out long directory names: Use ../ to move…
How to Install VMware Tools in Red Hat Enterprise Linux 6.4 (RHEL 6.4) Step-by-Step Tutorial
Suggested Courses

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question