Solved

Script to Remove All Non-Latin Unicode Characters from File Names Using a Regular Expression

Posted on 2007-12-06
Last Modified: 2008-02-01
I have many downloaded files whose names contain Unicode characters (Chinese, Japanese, Korean, Arabic, Hebrew, Russian, etc.). The English version of Windows XP's explorer.exe treats these as "illegal characters", which causes serious trouble when managing the files.
So I would like a script that automatically deletes such characters from file names, to avoid problems when accessing the files.

PS.1 - Even when I download, for example, an eBook (or application) written entirely in English, the file names can contain such Unicode/"illegal" characters. I think that's because the files are hosted on foreign servers.



PS.2 - This question is related to:

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_23005080.html

Although "ReNamer" is a superb renaming tool, I still want a script as a lightweight, quick, on-the-fly solution.

Someone outside EE told me about regular expressions. Please take a look at:

http://www.autoitscript.com/forum/index.php?showtopic=58848&st=0&gopid=444082&#entry444082

and

http://www.isthisthingon.org/unicode/allchars1.php

My problem is that I do not know how to write the regular expression.

I can use VBScript, AutoIt, Perl, Python, Ruby, Tcl/Tk, or anything else, but I need some help.


Thanks.

Regards.
Question by: asgarcymed
6 Comments
 
LVL 27

Expert Comment

by:ddrudik
ID: 20420709
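Replace:
[^\w.]+
with an empty string.

This keeps only:
a-z A-Z 0-9 _ .
This assumes an engine where \w is ASCII-only, as in VBScript. Spaces, hyphens and every other character are removed.

Add any other characters you want to allow inside the brackets.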
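Here is a minimal Python sketch of this first pattern. Python was chosen only because it is on the asker's list. One caveat is that `\w` is not portable across engines. VBScript treats it as `[A-Za-z0-9_]`, but Python 3 makes it Unicode-aware unless the `re.ASCII` flag is passed. Without that flag, Chinese or Russian letters would survive the replacement. The function name `strip_to_ascii_word` is illustrative only.

```python
import re

# [^\w.]+ : one or more characters that are neither word characters nor a period.
# re.ASCII restricts \w to [A-Za-z0-9_], matching VBScript's behaviour.
ASCII_WORD_ONLY = re.compile(r"[^\w.]+", re.ASCII)


def strip_to_ascii_word(name: str) -> str:
    """Delete every run of characters outside [A-Za-z0-9_.] (hypothetical helper)."""
    return ASCII_WORD_ONLY.sub("", name)


print(strip_to_ascii_word("My eBook 第一章.pdf"))  # MyeBook.pdf (spaces are removed too)

# Without re.ASCII, Python 3 counts CJK characters as \w and keeps them:
print(re.sub(r"[^\w.]+", "", "日本.txt"))          # 日本.txt
```

The second call shows why the flag matters. The same pattern text behaves differently depending on the engine's definition of `\w`.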
 

Author Comment

by:asgarcymed
ID: 20421038
I need to allow all English, German and Romance-language (Portuguese/Spanish/French/Italian) letters, in lower and upper case [A..Z; À; Ã; É; Ê; Í; Ì; Ó; Ò; Õ; Ñ; Ç]
AND
the symbols !; "; #; $; %; &; @; £; §; {; }; '; «; » (American and European keyboards).

I very urgently need to remove ALL Chinese, Japanese, Korean, Arabic, Hebrew and Russian characters (they all show up as garbage).

Could you please help?

Please do take a look at:

http://www.isthisthingon.org/unicode/allchars1.php

Thank you very much
 
LVL 27

Expert Comment

by:ddrudik
ID: 20421236
Replace:
[^\u0000-\u024F]+
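with an empty string.

This keeps every character from U+0000 to U+024F, which covers these Unicode blocks:
\p{InBasic_Latin}: U+0000..U+007F
\p{InLatin-1_Supplement}: U+0080..U+00FF
\p{InLatin_Extended-A}: U+0100..U+017F
\p{InLatin_Extended-B}: U+0180..U+024F
Every letter and symbol you listed (including £, §, « and ») lies in U+0000..U+00FF, so all of them are kept. CJK, Arabic, Hebrew and Cyrillic characters all lie above U+024F, so they are removed.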
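One quick way to check that this range keeps everything the asker listed is to run the same pattern in Python. Python 3's `re` module accepts `\uXXXX` escapes in `str` patterns. The helper `strip_non_latin` and the sample strings are illustrative only.

```python
import re

# Delete every run of characters outside U+0000..U+024F, i.e. outside
# Basic Latin, Latin-1 Supplement, Latin Extended-A and Latin Extended-B.
NON_LATIN = re.compile(r"[^\u0000-\u024F]+")


def strip_non_latin(name: str) -> str:
    """Return `name` with all characters above U+024F removed (hypothetical helper)."""
    return NON_LATIN.sub("", name)


# Letters and symbols from the asker's list; all lie in U+0000..U+00FF.
KEPT = "AZaz ÀÃÉÊÍÌÓÒÕÑÇ !\"#$%&@£§{}'«»"

print(strip_non_latin(KEPT) == KEPT)        # True: nothing is removed
print(strip_non_latin("Report_報告書.txt"))  # Report_.txt
print(strip_non_latin("Книга.pdf"))         # .pdf
```

Note the last case. A name made only of non-Latin characters collapses to just its extension, so anything that renames real files needs a fallback name.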
LVL 27

Accepted Solution

by: ddrudik (earned 500 total points)
ID: 20421521
Did you need any additional help implementing the regex pattern?

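A VBScript example for a standalone .vbs file run with cscript or wscript. The <% %> delimiters are only needed inside a Classic ASP page.
Dim regEx, testString
Set regEx = New RegExp
regEx.Global = True                   ' replace every match, not only the first one
regEx.Pattern = "[^\u0000-\u024F]+"   ' any run of characters above U+024F
testString = "<your string>"
testString = regEx.Replace(testString, "")
WScript.Echo testString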
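The answers above clean a single string, but the original question was about renaming files. The following is a possible Python sketch of that step. It assumes Python 3 and uses the same `[^\u0000-\u024F]+` pattern. Several details are choices made for this sketch, not part of the thread:
- the fallback name `renamed`
- the ` (1)` suffix for name collisions
- the trimming of leftover spaces
- the `--apply` switch

By default the script only prints the planned renames.

```python
import os
import re
import sys

# Same character class as the accepted answer: delete every run of
# characters outside U+0000..U+024F (Basic Latin through Latin Extended-B).
NON_LATIN = re.compile(r"[^\u0000-\u024F]+")


def split_ext(name: str):
    """Split 'a.b.txt' into ('a.b', '.txt'); '.pdf' becomes ('', '.pdf')."""
    i = name.rfind(".")
    return (name, "") if i == -1 else (name[:i], name[i:])


def rename_in_folder(folder: str, dry_run: bool = True) -> list:
    """Clean every entry name in `folder`; return (old, new) pairs.

    With dry_run=True nothing on disk is changed.
    """
    changes = []
    taken = set(os.listdir(folder))
    for old in sorted(taken):            # sorted() iterates over a snapshot copy
        new = NON_LATIN.sub("", old)
        if new == old:
            continue
        stem, ext = split_ext(new)
        stem = stem.strip()              # sketch's choice: trim spaces left behind
        if not stem:
            stem = "renamed"             # hypothetical fallback for emptied names
        candidate, n = stem + ext, 1
        while candidate in taken:        # never overwrite an existing entry
            candidate = f"{stem} ({n}){ext}"
            n += 1
        taken.add(candidate)
        changes.append((old, candidate))
        if not dry_run:
            os.rename(os.path.join(folder, old), os.path.join(folder, candidate))
    return changes


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--apply"]
    target = args[0] if args else "."
    for old, new in rename_in_folder(target, dry_run="--apply" not in sys.argv):
        print(f"{old!r} -> {new!r}")
```

Suppose the script is saved as, say, clean_names.py. Then `python clean_names.py "C:\Downloads"` lists what would change, and adding `--apply` performs the renames. Running the dry run first is worthwhile, because stripping characters can make two different names collide.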
 

Author Comment

by:asgarcymed
ID: 20421870
Thank you very much! You made my day!
Regards.
 
LVL 27

Expert Comment

by:ddrudik
ID: 20422052
Thanks for the question and the points.  Glad I could help.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…

862 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

29 Experts available now in Live!

Get 1:1 Help Now