Remove Unicode Character 'ÿ' from text files using a script

Hi!

I have a bunch of files that include the Unicode character ÿ.

I would like to replace it with nothing (remove it) and rewrite the file.

I've looked at a few VBScripts as well as Python scripts, but none of them really nails it.

Preferably, it should work on all text files (*.txt) in a directory.

VBS, Python or batch would all be fine :)

Thanks!

m0tek asked:
pepr commented:
My guess is that ÿ is the first or second character of the file. My second guess is that your files are stored as UTF-16 with a BOM (byte-order mark), little endian or big endian, or possibly even UTF-32. If I am right, you are interpreting the BOM bytes as characters in some single-byte encoding (based on my own recent observation): the UTF-16 little-endian BOM is the byte pair FF FE, which reads as "ÿþ" in Latin-1 or cp1252, and the big-endian BOM FE FF reads as "þÿ". If this is true, you should skip the first two (or four) bytes and read the rest as UTF-16 (or UTF-32), or simply open the file with the UTF-16 codec, which handles the BOM itself. Try the following snippet with the attached files:
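import codecs

# Raw bytes, no decoding: each file starts with its BOM
# (FE FF for big endian, FF FE for little endian).
with open('utf16be.txt', 'rb') as f:
    print(repr(f.read()))

with open('utf16le.txt', 'rb') as f:
    print(repr(f.read()))

# Decoded as UTF-16: the codec uses the BOM to pick the byte order
# and does not include it in the returned text.
with codecs.open('utf16be.txt', encoding='UTF-16') as f:
    print(f.read())

with codecs.open('utf16le.txt', encoding='UTF-16') as f:
    print(f.read())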


Attachments: utf16le.txt, utf16be.txt
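To illustrate pepr's guess, here is a small Python 3 sketch that checks the first bytes of a file's contents for a byte-order mark and shows what those bytes turn into when they are wrongly decoded as Latin-1. The helper name `sniff_bom` is hypothetical, and the sketch assumes UTF-16/UTF-32 files actually carry a BOM; files without one are simply reported as having none.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if there is none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# "hi" stored as UTF-16 little endian with a BOM
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

encoding, skip = sniff_bom(data)
print(encoding, skip)                 # utf-16-le 2
print(repr(data.decode("latin-1")))   # the BOM appears as the two characters ÿþ
print(data[skip:].decode(encoding))   # hi
```

Skipping `skip` bytes and decoding the rest with the detected encoding is exactly the "skip the first two (four) bytes" advice above.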
gelonida commented:
Do you want to replace all non-representable Unicode characters, or only the character ÿ?


Is your file encoded in UTF-8? If not, please tell us the file encoding.
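The encoding matters because the same character is stored differently: in cp1252 or Latin-1, ÿ is the single byte 0xFF, while in UTF-8 it is the two bytes C3 BF, so a byte-level replace that works on one file silently misses the character in another. A minimal Python 3 sketch, assuming the encoding is known; `remove_char` is a hypothetical helper name. It decodes the bytes, removes the character as text and re-encodes in the same encoding.

```python
def remove_char(data, encoding, char="\u00ff"):
    """Decode bytes, drop every occurrence of `char` (default U+00FF, 'ÿ'), re-encode."""
    text = data.decode(encoding)
    return text.replace(char, "").encode(encoding)


utf8 = "caf\u00ff".encode("utf-8")
print(utf8)                               # b'caf\xc3\xbf' (no 0xFF byte at all)
print(utf8.replace(b"\xff", b""))         # b'caf\xc3\xbf' (byte-level replace misses it)
print(remove_char(utf8, "utf-8"))         # b'caf'
print(remove_char(b"caf\xff", "cp1252"))  # b'caf'
```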
gelonida commented:
To be 100% sure that the script works on correctly encoded .txt files, could you perhaps upload a small example .txt file?
Tony Massa commented:
Here's a simple script to replace the character and create a new copy of your file with the character removed:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")

' Read the whole input file into a string
Set objFile = objFSO.OpenTextFile("C:\file1.txt", ForReading)
strText = objFile.ReadAll
objFile.Close

' Remove every occurrence of the character
strNewText = Replace(strText, "ÿ", "")

' Write the result to a new file; True creates file2.txt if it does not exist.
' Write (rather than WriteLine) avoids appending an extra line break.
Set objFile = objFSO.OpenTextFile("C:\file2.txt", ForWriting, True)
objFile.Write strNewText
objFile.Close

asawatzki commented:
Try specifying whether to open the file as Unicode or ANSI. If the code below doesn't work, change FormatUnicode to FormatANSI in both OpenTextFile calls.


Const ForReading = 1
Const ForWriting = 2
Const FormatUnicode = -1   ' TristateTrue: read/write the file as Unicode
Const FormatANSI = 0       ' TristateFalse: read/write the file as ANSI

Set objFSO = CreateObject("Scripting.FileSystemObject")

' OpenTextFile(path, iomode, create, format)
Set objFile = objFSO.OpenTextFile("C:\file1.txt", ForReading, False, FormatUnicode)
strText = objFile.ReadAll
objFile.Close

strNewText = Replace(strText, "ÿ", "")

' create = True so file2.txt is created if it does not exist yet
Set objFile = objFSO.OpenTextFile("C:\file2.txt", ForWriting, True, FormatUnicode)
objFile.Write strNewText
objFile.Close
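Why switching between FormatUnicode and FormatANSI changes the result can be seen by decoding the same bytes both ways. This is a Python 3 sketch rather than VBScript, and it only assumes that ANSI mode behaves like a single-byte code page such as cp1252 and Unicode mode like UTF-16; it approximates what FileSystemObject does rather than reproducing it exactly.

```python
import codecs

# "hi" as a UTF-16 little-endian text file with a BOM
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

as_ansi = data.decode("cp1252")     # every byte becomes one character
as_unicode = data.decode("utf-16")  # the BOM selects the byte order and is dropped

print(repr(as_ansi))     # 'ÿþh\x00i\x00'
print(repr(as_unicode))  # 'hi'
```

Read as ANSI, the BOM becomes ÿþ and a NUL follows every letter, so removing ÿ alone would still leave a damaged text; reading the file as Unicode avoids both problems.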
Mytix commented:
I think you can do that in Python like this:
# Python 2.6+ / 3: work on raw bytes, so no decoding is needed.
input_filepath = "C:\\temp\\input.txt"
output_filepath = "C:\\temp\\output.txt"

with open(input_filepath, "rb") as fip:
    data = fip.read()

# Remove every 0xFF byte, which is "ÿ" in cp1252/Latin-1
# (in UTF-8 the character is stored as C3 BF instead).
with open(output_filepath, "wb") as fop:
    fop.write(data.replace(b"\xff", b""))

Mytix commented:
Or, if you want to change all files ending in .txt in a folder and its subfolders, you can try something like this:
import os

foldername = "C:\\temp\\"

# os.walk visits the folder and all of its subfolders
for root, dirs, files in os.walk(foldername):
    for name in files:
        if name.lower().endswith(".txt"):
            filename = os.path.join(root, name)

            with open(filename, "rb") as fip:
                data = fip.read()

            # Overwrite the file in place with every 0xFF byte ("ÿ" in cp1252) removed
            with open(filename, "wb") as fop:
                fop.write(data.replace(b"\xff", b""))
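Removing raw 0xFF bytes also deletes bytes that belong to a UTF-16 BOM or to other UTF-16 characters, which corrupts exactly the kind of file pepr suspects. A more cautious Python 3 sketch, assuming that files with a BOM should keep their encoding and BOM, that BOM-less files are UTF-8 if they decode as such, and otherwise a single-byte encoding (`FALLBACK`, Latin-1 here, in which 0xFF is ÿ just as in cp1252). The names `clean_bytes`, `clean_folder` and `FALLBACK` are hypothetical.

```python
import codecs
import os

# Assumed encoding for BOM-less files that are not valid UTF-8.
# Latin-1 maps every byte to a character, and 0xFF is "ÿ" in Latin-1 and cp1252.
FALLBACK = "latin-1"

# BOMs, longest first (the UTF-32 LE BOM starts with the UTF-16 LE BOM)
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def clean_bytes(data, char="\u00ff"):
    """Remove `char` from the decoded text, keeping the original encoding and BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            text = data[len(bom):].decode(encoding)
            return bom + text.replace(char, "").encode(encoding)
    try:
        text, encoding = data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A raw 0xFF byte is never valid UTF-8, so single-byte files end up here
        text, encoding = data.decode(FALLBACK), FALLBACK
    return text.replace(char, "").encode(encoding)


def clean_folder(folder):
    """Clean every *.txt file under `folder` (including subfolders) in place."""
    for root, dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".txt"):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    data = f.read()
                cleaned = clean_bytes(data)
                if cleaned != data:  # rewrite only files that actually changed
                    with open(path, "wb") as f:
                        f.write(cleaned)
```

For example, `clean_folder("C:\\temp\\")` would process the same folder as the script above; since it rewrites files in place, try it on a copy first.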

pepr commented:
m0tek: Each question should be closed. If you know the right answer, post it here and accept your own comment. If there is no correct answer, just ask for the question to be deleted with a points refund.

Or you can attach a sample file here that shows the problem, so that a solution can be found. Right now it is not clear what the problem is, whether it persists, whether you died, or what.
Question has a verified solution.
