Solved

Script to Remove All Unicode Characters from File Names Using a Regular Expression

Posted on 2007-12-06
Last Modified: 2008-02-01
I have many downloaded files whose names contain Unicode characters (such as Chinese, Japanese, Korean, Arabic, Hebrew, or Russian) that the English version of Windows XP's explorer.exe treats as "illegal characters". This causes serious trouble when managing such files.
I would therefore like a script that automatically deletes such characters from file names, to avoid problems when trying to access the files.

PS.1 - Even when I download, for example, an eBook (or application) written entirely in English, the file names can contain such Unicode/illegal characters (I think that is because the files are hosted on foreign servers).



PS.2 - This question is related to:

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_23005080.html

Although "ReNamer" is a superb renaming tool, I still want a script as a lightweight, quick, on-the-fly solution.

Someone outside EE told me about regular expressions. Please take a look at:

http://www.autoitscript.com/forum/index.php?showtopic=58848&st=0&gopid=444082&#entry444082

and

http://www.isthisthingon.org/unicode/allchars1.php

My problem is that I do not know how to write the RegExp.

I can use VBScript, AutoIt, Perl, Python, Ruby, Tcl-Tk, or whatever, but I need some help...


Thanks.

Regards.
Question by:asgarcymed
6 Comments
 
LVL 27

Expert Comment

by:ddrudik
ID: 20420709
Replace:
[^\w.]+
With empty string.

Would allow:
a-zA-Z0-9_.

Add any other characters you want to allow.
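
To make the effect of this pattern concrete, here is a minimal Python sketch (Python being one of the languages the asker listed). It assumes ASCII-only matching is what is wanted: in Python 3, `\w` also matches non-ASCII letters by default, so the `re.ASCII` flag is needed to get the behaviour described above. The function name `strip_non_word` is hypothetical.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, Python 3 would also
# treat Chinese, Cyrillic, etc. letters as "word" characters and keep them.
_NON_WORD = re.compile(r"[^\w.]+", re.ASCII)

def strip_non_word(name: str) -> str:
    """Remove every run of characters other than ASCII letters, digits, '_' and '.'."""
    return _NON_WORD.sub("", name)

if __name__ == "__main__":
    print(strip_non_word("My eBook 中文 (v2).pdf"))  # prints MyeBookv2.pdf
```

Note that this pattern also removes spaces, parentheses, and accented Latin letters, which is why any extra characters to keep have to be added to the character class.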
 

Author Comment

by:asgarcymed
ID: 20421038
I need to allow: All English/German and Latin (Portuguese/Spanish/French/Italian) letters, lower and upper case [A..Z; À; Ã; É; Ê; Í; Ì; Ó; Ò; Õ; Ñ; Ç]
AND
!; ""; #; $; %; &; @; £; §; {; }; '; «; »; [American and European Keyboard]

I very urgently need to remove ALL Chinese, Japanese, Korean, Arabic, Hebrew, and Russian characters (all of those letters look "crazy").

Could you please help?

You "must" look at:

http://www.isthisthingon.org/unicode/allchars1.php

Thank you very much
 
LVL 27

Expert Comment

by:ddrudik
ID: 20421236
Replace:
[^\u0000-\u024F]+
With empty string.

Would cover:
\p{InBasic_Latin}: U+0000..U+007F
\p{InLatin-1_Supplement}: U+0080..U+00FF
\p{InLatin_Extended-A}: U+0100..U+017F
\p{InLatin_Extended-B}: U+0180..U+024F
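
As a rough illustration of what this range keeps and drops, the following Python sketch applies the same character class to a few sample names. It is only a sketch; the function name `strip_outside_latin` and the sample strings are made up for the example.

```python
import re

# Matches any run of characters outside U+0000..U+024F
# (Basic Latin through Latin Extended-B).
_OUTSIDE_LATIN = re.compile(r"[^\u0000-\u024F]+")

def strip_outside_latin(name: str) -> str:
    """Remove all characters beyond the four Latin blocks listed above."""
    return _OUTSIDE_LATIN.sub("", name)

if __name__ == "__main__":
    for sample in ("Ação_日本語.txt", "«Ñ»£§{}!.txt", "Привет.pdf"):
        print(repr(sample), "->", repr(strip_outside_latin(sample)))
```

One caveat: the range keeps precomposed accented letters such as `é` (U+00E9), but combining marks (U+0300..U+036F) lie outside it, so a name stored in decomposed form would lose its accents. Normalizing with `unicodedata.normalize("NFC", name)` first is one way to guard against that.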
 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 20421521
Did you need any additional help implementing the regex pattern?

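A VBScript example (classic ASP; drop the <% %> delimiters for a standalone .vbs file):
<%
' Strip every character outside U+0000..U+024F from a string.
Set regEx = New RegExp
regEx.Global = True                    ' replace all matches, not just the first
regEx.Pattern = "[^\u0000-\u024F]+"    ' anything beyond Latin Extended-B
teststring = "<your string>"
teststring = regEx.Replace(teststring, "")
%>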
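
The example above cleans a single string; the original question was about renaming files. A minimal Python sketch of that step, under some assumptions, might look like the following. The names `clean_name` and `rename_in_dir`, the `"renamed"` fallback for names that end up empty, and the `_1`, `_2` collision suffixes are all hypothetical choices, not part of the accepted answer.

```python
import re
from pathlib import Path

# Same pattern as the VBScript example: anything outside U+0000..U+024F.
NON_LATIN = re.compile(r"[^\u0000-\u024F]+")

def clean_name(name: str) -> str:
    """Return name with all characters outside U+0000..U+024F removed."""
    return NON_LATIN.sub("", name)

def rename_in_dir(directory: str, dry_run: bool = True) -> list:
    """Clean file names in directory (non-recursive).

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing is renamed; the planned changes are only reported.
    """
    changes = []
    planned = set()  # new names already claimed during this run
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or clean_name(path.name) == path.name:
            continue  # skip directories and names that need no change
        stem = clean_name(path.stem)
        suffix = clean_name(path.suffix)
        if suffix == ".":
            suffix = ""  # the extension consisted only of stripped characters
        if stem.strip(" .") == "":
            stem = "renamed"  # assumed placeholder when nothing usable is left
        target = path.with_name(stem + suffix)
        counter = 1
        # Never overwrite an existing file or reuse a name planned in this run.
        while target.exists() or target.name in planned:
            target = path.with_name(f"{stem}_{counter}{suffix}")
            counter += 1
        planned.add(target.name)
        changes.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return changes
```

Calling `rename_in_dir(some_folder)` first returns the planned renames without touching anything; passing `dry_run=False` applies them.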
 

Author Comment

by:asgarcymed
ID: 20421870
Thank you very much! You made my day!
Regards.
 
LVL 27

Expert Comment

by:ddrudik
ID: 20422052
Thanks for the question and the points.  Glad I could help.