asgarcymed asked:
Script to Kill All Unicode Characters from File Names Using a Regular Expression
I have many downloaded files whose names contain Unicode characters (such as Chinese, Japanese, Korean, Arabic, Hebrew or Russian) that the English version of Windows XP's explorer.exe treats as "illegal characters". This causes serious trouble when managing those files.
So I would like a script that automatically deletes such characters from file names, to avoid problems when trying to access the files.
PS.1 - Even when I download, for example, an eBook (or application) written entirely in English, the file names can needlessly contain such Unicode/illegal characters (I think because they are hosted on foreign servers).
PS.2 - This question is related to:
https://www.experts-exchange.com/questions/23005080/Script-to-Remove-UniCode-Illegal-Characters-from-Files'-Name.html
Although "ReNamer" is a superb renaming tool, I still want/need a script as a lightweight, quick "on-the-fly" solution.
Someone outside EE told me about REGULAR EXPRESSIONS. Please take a look at:
http://www.autoitscript.com/forum/index.php?showtopic=58848&st=0&gopid=444082&#entry444082
and
http://www.isthisthingon.org/unicode/allchars1.php
My problem is that I do not know how to write the RegExp.
I can use VBScript, AutoIt, Perl, Python, Ruby, Tcl/Tk, or whatever, but I need some help.
Thanks.
Regards.
ASKER
I need to allow all English/German and Latin-language (Portuguese/Spanish/French/Italian) letters, lower and upper case [A..Z; À; Ã; É; Ê; Í; Ì; Ó; Ò; Õ; Ñ; Ç]
AND
!; ""; #; $; %; &; @; £; §; {; }; '; «; »; [American and European Keyboard]
I urgently need to kill ALL Chinese, Japanese, Korean, Arabic, Hebrew and Russian characters (they all show up as "crazy" letters).
Could you please help?
You "must" look at:
http://www.isthisthingon.org/unicode/allchars1.php
Thank you very much
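A whitelist is the most literal reading of the requirements above: keep only what is listed and delete everything else. Below is a minimal Python sketch under several assumptions that are not in the request: the lower-case forms of the listed accented letters are allowed, digits, space, dot, underscore and hyphen are also kept (file names need them), and the double quote is dropped because Windows does not allow it in file names. The function name `keep_allowed` is hypothetical.

```python
import re

# Accented capitals the asker listed, plus their lower-case forms
# (assumption: "lower and upper case" applies to these letters as well).
ACCENTED = "ÀÃÉÊÍÌÓÒÕÑÇ"
ACCENTED += ACCENTED.lower()

# Punctuation from the asker's list. The double quote is left out on purpose:
# Windows does not allow it in file names anyway.
PUNCTUATION = "!#$%&@£§{}'«»"

# Assumption: ASCII letters, digits, space, dot, underscore and hyphen must also
# survive, or ordinary names such as "my-book v2.pdf" would be mangled.
ALLOWED = "A-Za-z0-9 ._\\-" + re.escape(ACCENTED + PUNCTUATION)

# Any run of characters that is NOT in the whitelist.
NOT_ALLOWED = re.compile("[^" + ALLOWED + "]+")


def keep_allowed(name: str) -> str:
    """Delete every character of name that is not explicitly whitelisted."""
    return NOT_ALLOWED.sub("", name)


print(keep_allowed("Livro_書籍 «Vol. 2».pdf"))  # Livro_ «Vol. 2».pdf
print(keep_allowed("Über.txt"))                 # ber.txt
```

This is stricter than a Unicode-range pattern: any Latin letter that is not in the list (Ü, ä, ß, ú, ...) is deleted too, as the second call shows, so `ACCENTED` would need extending for German and other accented letters.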
Replace:
[^\u0000-\u024F]+
with an empty string.
That keeps only the characters in these Unicode blocks (everything else is removed):
\p{InBasic_Latin}: U+0000..U+007F
\p{InLatin-1_Supplement}: U+0080..U+00FF
\p{InLatin_Extended-A}: U+0100..U+017F
\p{InLatin_Extended-B}: U+0180..U+024F
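A minimal Python sketch of the range-based answer above, applied to every entry (files and subfolders) directly inside one folder. The function names `clean_name` and `rename_folder`, the dry-run default, and the skip rules (a name whose stem would become empty, or whose cleaned name already exists) are assumptions added for illustration. Note that «, », £ and § sit in Latin-1 Supplement, so they survive along with the accented letters.

```python
import os
import re
from typing import List, Tuple

# Any run of characters outside U+0000..U+024F, i.e. outside Basic Latin,
# Latin-1 Supplement, Latin Extended-A and Latin Extended-B.
NON_LATIN = re.compile(r"[^\u0000-\u024F]+")


def clean_name(name: str) -> str:
    """Return name with every character outside U+0000..U+024F removed."""
    return NON_LATIN.sub("", name)


def rename_folder(folder: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Strip non-Latin characters from the names of entries directly in folder.

    Returns the (old, new) pairs. Entries are renamed only when dry_run is False.
    Assumption: a name whose stem becomes empty, or whose cleaned name already
    exists, is skipped rather than renamed.
    """
    changes = []
    for old in os.listdir(folder):
        stem, ext = os.path.splitext(old)
        new_stem = clean_name(stem)
        if not new_stem.strip():
            continue  # e.g. "中文.txt" would shrink to ".txt"
        new = new_stem + clean_name(ext)
        if new == old or os.path.exists(os.path.join(folder, new)):
            continue  # nothing to change, or it would overwrite another entry
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
    return changes
```

Usage (the folder path is hypothetical): call `rename_folder(r"C:\Downloads")` first to preview the (old, new) pairs, then `rename_folder(r"C:\Downloads", dry_run=False)` to actually rename them.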
ASKER CERTIFIED SOLUTION
ASKER
Thank you very much!! You made my day shine!!
Regards.
Thanks for the question and the points. Glad I could help.
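Replace:
[^\w.]+
with an empty string.
In an ASCII-only engine such as VBScript's RegExp, that allows only:
a-zA-Z0-9_.
Add any other characters you want to allow inside the brackets. Note that this also strips spaces, hyphens and accented letters such as À or Ç, and that in Unicode-aware engines (e.g. Python 3 without re.ASCII) \w also matches Chinese, Arabic, Cyrillic and other letters, so they would not be removed.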
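A short Python sketch of why the [^\w.]+ pattern behaves differently from engine to engine. With the `re.ASCII` flag, Python's \w is [A-Za-z0-9_], the same set VBScript's RegExp documents; without it, Python 3 str patterns let \w match any Unicode letter or digit. The sample file name is made up for illustration.

```python
import re

# The pattern from the answer above: strip every run of characters that is
# neither a "word" character nor a dot.
PATTERN = r"[^\w.]+"

# ASCII semantics: \w is [A-Za-z0-9_], so accented and CJK letters are stripped.
ascii_only = re.compile(PATTERN, re.ASCII)

# Python 3's default for str patterns: \w matches Unicode letters and digits,
# including CJK, Arabic, Hebrew and Cyrillic, so those are kept.
unicode_aware = re.compile(PATTERN)

name = "Livro Ação 书.pdf"
print(ascii_only.sub("", name))     # LivroAo.pdf
print(unicode_aware.sub("", name))  # LivroAção书.pdf
```

Neither result matches the request exactly: the ASCII version loses the accented letters the asker wants to keep, and the Unicode version keeps the Chinese character, which is why the range-based pattern or an explicit whitelist fits better here.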