Rich Rumble
asked on
Best way to remove non-ascii characters
I'd like to drop any character in a string that is less than or equal to 32 or greater than 122 (html/decimal) and some in between: / \ | ? : " < >
Below is what I've used so far, but I'd like to add all the other chars and was hoping there was a more efficient way of doing so. I found what looks like an efficient way of doing this, but I don't know how to fit it in the script below:
http://www.mombu.com/microsoft/microsoft/t-removing-non-ascii-characters-from-a-text-file-1681093.html#postmenu_6449496
Let me know if there are any questions.
-rich
Below is what I've used so far, but I'd like to add all the other chars and was hoping there was a more efficient way of doing so. I found what looks like an efficient way of doing this, but I don't know how to fit it in the script below:
http://www.mombu.com/microsoft/microsoft/t-removing-non-ascii-characters-from-a-text-file-1681093.html#postmenu_6449496
Let me know if there are any questions.
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
For Each objMessage In colFilteredItems
strName = StripChars(objMessage.Subject)
If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
tmpData = objFile.ReadAll
objFile.Close
objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
objFile.WriteLine tmpData
objFile.Close
Else
objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
Wscript.Echo strName & ".txt"
End If
Next
Function StripChars(txtData)
txtData = Replace(Replace(Replace(UCase(txtData), "FW:", ""), "RE:", ""), "FWD:", "")
txtData = Replace(Replace(Replace(Replace(UCase(txtData), "AW:", "An:", ""), ":", ""), "/","")
txtData = Replace(Replace(Replace(txtData,"\",""), ",",""), "?","")
txtData = Replace(Replace(Replace(txtData,"|",""), "<",""), ">","")
txtData = Replace(Replace(Replace(Replace(Replace(txtData, Chr(9),""), Chr(25),""), Chr(34),""), Chr(39),""), Chr(42),"")
StripChars = Trim(txtData)
End Function
ASKER
I'm removing characters that aren't valid for file names, so < > \ | / * ? " : have to go as well.
Is it easier to do the oppisite? Replace anything Not Equal To A-Z a-z 0-9 ) ( * & ^ % $ # @ ! ~ + _ - ] [ { } , .
Thanks for the post though, I really have limited knowledge of scripting so it'd take me a while to figure out how to use the code you supplied.
-rich
Is it easier to do the oppisite? Replace anything Not Equal To A-Z a-z 0-9 ) ( * & ^ % $ # @ ! ~ + _ - ] [ { } , .
Thanks for the post though, I really have limited knowledge of scripting so it'd take me a while to figure out how to use the code you supplied.
-rich
I don't understand your problem
Replace anything Not Equal, i.e. as below anything that is not one of those characters is deleted from newstr.
str = "E$^*&IFTUVYBJKOP:UI" & Chr(34) & "(){*_%& *VCEFYOILUO:J > MKJNVC $%RT*Y(JKLJBNKuy&i*ty buiojlknmbjhg"
For intCount = 1 To Len(str)
If Mid(str, intCount, 1) Like "A-Za-z0-9)(*&^%$#@!~+_-][ {}, " Then
NEWSTR = NEWSTR & Mid(str, intCount, 1)
End If
Next
Chris
Replace anything Not Equal, i.e. as below anything that is not one of those characters is deleted from newstr.
str = "E$^*&IFTUVYBJKOP:UI" & Chr(34) & "(){*_%& *VCEFYOILUO:J > MKJNVC $%RT*Y(JKLJBNKuy&i*ty buiojlknmbjhg"
For intCount = 1 To Len(str)
If Mid(str, intCount, 1) Like "A-Za-z0-9)(*&^%$#@!~+_-][
NEWSTR = NEWSTR & Mid(str, intCount, 1)
End If
Next
Chris
ASKER
I don't know integrate the code you provided basically, how do I use "str" to replace StripChars function I currently use. I would need it written for me, sorry I my mind just can't seem to make much sense of programming.
-rich
-rich
As in the following perhaps:
Chris
Chris
Function StripChars(txtData)
Dim intCount As Integer
For intCount = 1 To Len(txtData)
If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9/ \|?:<>]" Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
ASKER
email.vbs (31, 15) Microsoft VBScript compilation error: Expected end of statement
Script is below, errors on "As" ?
-rich
Script is below, errors on "As" ?
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
For Each objMessage In colFilteredItems
strName = StripChars(objMessage.Subject)
If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
tmpData = objFile.ReadAll
objFile.Close
objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
objFile.WriteLine tmpData
objFile.Close
Else
objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
Wscript.Echo strName & ".txt"
End If
Next
Function StripChars(txtData)
Dim intCount As Integer
For intCount = 1 To Len(txtData)
If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
Doh
Sorry VBS, it's sometimes easier to test in the application and then convert ... as long as I do the conversion! HAven't time to test as VBS yet so i'll edit as is for you to try.
Chris
Sorry VBS, it's sometimes easier to test in the application and then convert ... as long as I do the conversion! HAven't time to test as VBS yet so i'll edit as is for you to try.
Chris
Function StripChars(txtData)
Dim intCount
For intCount = 1 To Len(txtData)
If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
ASKER
mail.vbs(33, 9) Microsoft VBScript runtime error: Sub or Function not defined
Nothing seems to be misspelled, sorry I'm not much more help at troubleshooting.
Nothing seems to be misspelled, sorry I'm not much more help at troubleshooting.
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
For Each objMessage In colFilteredItems
strName = StripChars(objMessage.Subject)
If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
tmpData = objFile.ReadAll
objFile.Close
objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
objFile.WriteLine tmpData
objFile.Close
Else
objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
Wscript.Echo strName & ".txt"
End If
Next
Function StripChars(txtData)
Dim intCount
For intCount = 1 To Len(txtData)
If Mid(txtData, intCount, 1) Like "[a-zA-Z0-9+=_-)(*&^%$#@!~]" Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
Doesn't look as though VBS likes the 'like' construct. Slightly longer winded but rewritten as below. See how it looks.
Chris
Chris
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olMSG = 3
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
For Each objMessage In colFilteredItems
strName = StripChars(objMessage.Subject)
If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
tmpData = objFile.ReadAll
objFile.Close
objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
objFile.WriteLine tmpData
objFile.Close
Else
objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
Wscript.Echo strName & ".txt"
End If
Next
Function StripChars(txtData)
Dim intCount
For intCount = 1 To Len(txtData)
If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(*&^%$#@!~", Mid(lcase(txtData), intCount, 1)) > 0 Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
ASKER
So far working well, I do need to add combination's like:
Re: Fw: Fwd: An: Aw: (also case insensitive) and add Trim() too?
I'm not sure how to group these together in the instr and how to add trim for the extra spaces.
Thanks again!
Re: Fw: Fwd: An: Aw: (also case insensitive) and add Trim() too?
I'm not sure how to group these together in the instr and how to add trim for the extra spaces.
Thanks again!
Function StripChars(txtData)
Dim intCount
For intCount = 1 To Len(txtData)
If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(&^%$#@!~Chr(34))", Mid(lcase(txtData), intCount, 1)) > 0 Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ahh I see, I think the vbtextcompare is a visual basic command, I went back to the original code I had and used that as well as the previous code you've supplied. Thanks again for your patience and time!
-rich
-rich
Const ForAppending = 8
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\scripto\mail\")
Const olTXT = 0
Const olFolderInbox = 6
Set objOutlook = CreateObject("Outlook.Application")
Set objNamespace = objOutlook.GetNamespace("MAPI")
Set objFolder = objNamespace.GetDefaultFolder(olFolderInbox)
Set colItems = objFolder.Items
Set colFilteredItems = colItems.Restrict("[ReceivedTime] < 'October 29, 2008 2:52 PM'")
For Each objMessage In colFilteredItems
strName = StripChars(objMessage.Subject)
If (objfso.FileExists("C:\scripto\mail\" & strName & ".txt")) Then
objMessage.SaveAs "C:\scripto\mail\" & strName & ".tmp", olTXT
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".tmp")
tmpData = objFile.ReadAll
objFile.Close
objFSO.DeleteFile "C:\scripto\mail\" & strName & ".tmp"
Set objFile = objFSO.OpenTextFile("C:\scripto\mail\" & strName & ".txt", ForAppending, True)
objFile.WriteLine tmpData
objFile.Close
Else
objMessage.SaveAs "C:\scripto\mail\" & strName & ".txt", olTXT
Wscript.Echo strName & ".txt"
End If
Next
Function StripChars(Str)
Dim intCount
Dim txtData
txtData = str
txtData = Trim(Replace(Replace(Replace(Replace(Replace(UCase(txtData), "AW:", ""), "AN:", ""), "FW:", ""), "RE:", ""), "FWD:", ""))
For intCount = 1 To Len(txtData)
If instr(" abcdefghijklmnopqrstuvwxyz0123456789+=_-)(&^%$#@!~Chr(34))", Mid(lcase(txtData), intCount, 1)) > 0 Then
StripChars = StripChars & Mid(txtData, intCount, 1)
End If
Next
End Function
ASKER
Very good, this works well.
ASKER
How would I limit the character length of the file name to 200 or less characters? I haven't found a good example using len to "chomp" or split the length so i can remove any characters after the 200 mark.
-rich
-rich
I use a construct of the form:
str = "E$^*&IFTUVYBJKOP:UI" & Chr(34) & "(){*_%& *VCEFYOILUO:J > MKJNVC $%RT*Y(JKLJBNKuy&i*ty buiojlknmbjhg"
For intCount = 1 To Len(str)
If Mid(str, intCount, 1) Like "[a-zA-Z0-9""/ \|?:<>]" Then
NEWSTR = NEWSTR & Mid(str, intCount, 1)
End If
Next
Accepted chars are in this case:
"[a-zA-Z0-9""/ \|?:<>]
Where "" represents a single instance of double quote.
Regards,
Chris