Link to home
Start Free TrialLog in
Avatar of Rich Rumble
Rich RumbleFlag for United States of America

asked on

Best way to remove non-ascii characters

I'd like to drop any character in a string that is less than or equal to 32 or greater than 122 (html/decimal) and some in between: / \ |  ? : " < >
Below is what I've used so far, but I'd like to add all the other chars and was hoping there was a more efficient way of doing so. I found what looks like an efficient way of doing this, but I don't know how to fit it in the script below:
http://www.mombu.com/microsoft/microsoft/t-removing-non-ascii-characters-from-a-text-file-1681093.html#postmenu_6449496
Let me know if there are any questions.
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
 
For Each objMessage In colFilteredItems
    strName = StripChars(objMessage.Subject)
    If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
       objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
       tmpData = objFile.ReadAll
       objFile.Close
       objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
       objFile.WriteLine tmpData
       objFile.Close
   Else
     objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
     Wscript.Echo strName & ".txt"
   End If
Next
 
Function StripChars(txtData)
    txtData = Replace(Replace(Replace(UCase(txtData), "FW:", ""), "RE:", ""), "FWD:", "")
    txtData = Replace(Replace(Replace(Replace(UCase(txtData), "AW:", "An:", ""), ":", ""), "/","")
    txtData = Replace(Replace(Replace(txtData,"\",""), ",",""), "?","")
    txtData = Replace(Replace(Replace(txtData,"|",""), "<",""), ">","")
    txtData = Replace(Replace(Replace(Replace(Replace(txtData, Chr(9),""), Chr(25),""), Chr(34),""),  Chr(39),""), Chr(42),"")
    StripChars = Trim(txtData)
End Function

Open in new window

Avatar of Chris Bottomley
Chris Bottomley
Flag of United Kingdom of Great Britain and Northern Ireland image

Hello richrumble,

I use a construct of the form:

    str = "E$^*&IFTUVYBJKOP:UI" & Chr(34) & "(){*_%& *VCEFYOILUO:J > MKJNVC $%RT*Y(JKLJBNKuy&i*ty buiojlknmbjhg"
    For intCount = 1 To Len(str)
        If Mid(str, intCount, 1) Like "[a-zA-Z0-9""/ \|?:<>]" Then
            NEWSTR = NEWSTR & Mid(str, intCount, 1)
        End If
    Next

Accepted chars are in this case:
"[a-zA-Z0-9""/ \|?:<>]

Where "" represents a single instance of double quote.

Regards,
Chris
Avatar of Rich Rumble

ASKER

I'm removing characters that aren't valid for file names, so  <  > \ | / * ? " : have to go as well.
Is it easier to do the oppisite? Replace anything Not Equal To  A-Z a-z 0-9 ) ( * & ^ % $ # @ ! ~ + _ - ] [ { } , .
Thanks for the post though, I really have limited knowledge of scripting so it'd take me a while to figure out how to use the code you supplied.
-rich
I don't understand your problem

Replace anything Not Equal, i.e. as below anything that is not one of those characters is deleted from newstr.

    str = "E$^*&IFTUVYBJKOP:UI" & Chr(34) & "(){*_%& *VCEFYOILUO:J > MKJNVC $%RT*Y(JKLJBNKuy&i*ty buiojlknmbjhg"
    For intCount = 1 To Len(str)
        If Mid(str, intCount, 1) Like "A-Za-z0-9)(*&^%$#@!~+_-][{}, " Then
            NEWSTR = NEWSTR & Mid(str, intCount, 1)
        End If
    Next

Chris
I don't know integrate the code you provided basically, how do I use "str" to replace StripChars function I currently use. I would need it written for me, sorry I my mind just can't seem to make much sense of programming.
-rich
As in the following perhaps:

Chris
Function StripChars(txtData)
 Dim intCount As Integer
    For intCount = 1 To Len(txtData)
        If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9/ \|?:<>]" Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

email.vbs (31, 15) Microsoft VBScript compilation error: Expected end of statement
Script is below, errors on "As" ?
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
 
For Each objMessage In colFilteredItems
    strName = StripChars(objMessage.Subject)
    If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
       objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
       tmpData = objFile.ReadAll
       objFile.Close
       objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
       objFile.WriteLine tmpData
       objFile.Close
   Else
     objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
     Wscript.Echo strName & ".txt"
   End If
Next
 
Function StripChars(txtData)
 Dim intCount As Integer
    For intCount = 1 To Len(txtData)
        If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

Doh

Sorry VBS, it's sometimes easier to test in the application and then convert ... as long as I do the conversion!  HAven't time to test as VBS yet so i'll edit as is for you to try.

Chris
Function StripChars(txtData)
 Dim intCount
    For intCount = 1 To Len(txtData)
        If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

mail.vbs(33, 9) Microsoft VBScript runtime error: Sub or Function not defined
Nothing seems to be misspelled, sorry I'm not much more help at troubleshooting.
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
 
For Each objMessage In colFilteredItems
    strName = StripChars(objMessage.Subject)
    If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
       objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
       tmpData = objFile.ReadAll
       objFile.Close
       objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
       objFile.WriteLine tmpData
       objFile.Close
   Else
     objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
     Wscript.Echo strName & ".txt"
   End If
Next
 
Function StripChars(txtData)
 Dim intCount
    For intCount = 1 To Len(txtData)
        If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

Doesn't look as though VBS likes the 'like' construct.  Slightly longer winded but rewritten as below.  See how it looks.

Chris
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
 
For Each objMessage In colFilteredItems
    strName = StripChars(objMessage.Subject)
    If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
       objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
       tmpData = objFile.ReadAll
       objFile.Close
       objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
       objFile.WriteLine tmpData
       objFile.Close
   Else
     objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
     Wscript.Echo strName & ".txt"
   End If
Next
 
Function StripChars(txtData)
 Dim intCount
    For intCount = 1 To Len(txtData)
        If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(*&^%$#@!~", Mid(lcase(txtData), intCount, 1)) > 0 Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

So far working well, I do need to add combination's like:
Re: Fw: Fwd: An: Aw: (also case insensitive) and add Trim() too?
I'm not sure how to group these together in the instr and how to add trim for the extra spaces.
Thanks again!
Function StripChars(txtData)
 Dim intCount
    For intCount = 1 To Len(txtData)
        If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(&^%$#@!~Chr(34))", Mid(lcase(txtData), intCount, 1)) > 0 Then
            StripChars = StripChars & Mid(txtData, intCount, 1)
        End If
    Next
End Function

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Chris Bottomley
Chris Bottomley
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Ahh I see, I think the vbtextcompare is a visual basic command, I went back to the original code I had and used that as well as the previous code you've supplied. Thanks again for your patience and time!
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
 
For Each objMessage In colFilteredItems
    strName = StripChars(objMessage.Subject)
    If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
       objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
       tmpData = objFile.ReadAll
       objFile.Close
       objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
       Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
       objFile.WriteLine tmpData
       objFile.Close
   Else
     objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
     Wscript.Echo strName & ".txt"
   End If
Next
 
Function StripChars(Str)
Dim intCount
Dim txtData
	txtData = str
	txtData = Trim(Replace(Replace(Replace(Replace(Replace(UCase(txtData), "AW:", ""), "AN:", ""), "FW:", ""), "RE:", ""), "FWD:", ""))
	For intCount = 1 To Len(txtData)
		If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(&^%$#@!~Chr(34))", Mid(lcase(txtData), intCount, 1)) > 0 Then
			StripChars = StripChars & Mid(txtData, intCount, 1)
		End If
	Next
End Function

Open in new window

Very good, this works well.
How would I limit the character length of the file name to 200 or less characters? I haven't found a good example using len to "chomp" or split the length so i can remove any characters after the 200 mark.
-rich