Solved

Script to Remove Unicode/Illegal Characters from File Names

Posted on 2007-12-05
Last Modified: 2010-04-21
Many files downloaded with eMule (ed2k/Kad) contain Unicode characters in their names (Chinese, Japanese, Korean, Arabic, Hebrew, Russian, and so on), which explorer.exe on the English version of Windows XP treats as "illegal characters". This causes serious trouble when managing such files.
I would therefore like a script that automatically deletes such characters from file names, to avoid problems when trying to access the files.

PS - Even an eBook written entirely in English often comes with such Unicode/illegal characters in its file names.

Thanks.

Regards.
Question by:asgarcymed
6 Comments
 
LVL 7

Expert Comment

by:Wod
ID: 20416989
Use ReNamer: http://www.den4b.com/ (it supports Unicode)
 

Author Comment

by:asgarcymed
ID: 20420653
Although "ReNamer" is a superb renaming tool, I would still like a script, as a lightweight and quick on-the-fly solution...

Someone outside EE told me about REGULAR EXPRESSIONS... Please take a look at:

http://www.autoitscript.com/forum/index.php?showtopic=58848&st=0&gopid=444082&#entry444082

and

http://www.isthisthingon.org/unicode/allchars1.php

My problem is that I do not know how to write the regular expression...

I can use VBScript, AutoIt, Perl, Python, Ruby, Tcl-Tk, or whatever, but I need some help...


Thanks.

Regards.
 

Author Comment

by:asgarcymed
ID: 20424849
I am now using "RegexBuddy", a superb Win32 app for working with and learning regular expressions...

Using Google, I found a text file (see http://www.xys.org/xys/netters/others/net/wiki2.txt) that contains many, many Chinese characters and only a few English characters. I opened it in RegexBuddy and tested both of these regexes:

[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]

and

[^\u0000-\u024F]+


But the test/debug results were very confusing...

On top of that, I got the Windows XP MUI (Multilingual User Interface) and installed all the languages I mentioned (Chinese/Japanese/Korean/Arabic/Hebrew/Russian)...

Now I am even more confused: some apps load the Chinese characters correctly (for example), but most apps still cannot handle such characters. They show squares, "???????????", or garbled characters, as when you open a binary file in a text editor.

I am thoroughly confused. Do I need to have MUI installed? What is the best regex to strip such characters from file names? If I have MUI installed, do I still need such a regex/script? What should I do to solve this once and for all?

Is anyone here Chinese/Japanese/Korean/Arabic/Hebrew/Russian? If so, how do you manage the character conflicts between your native language and English?

Any help is much appreciated!

Thanks in advance.

Regards.
LVL 7

Accepted Solution

by:Wod
Wod earned 2000 total points
ID: 20424924
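You could try something like this in Perl:

$string =~ s/[^\w.]/_/g;

(this replaces every character that is not a word character (a-z, A-Z, 0-9 and "_") or a period with "_")

$string would be your filename. If Perl treats $string as decoded Unicode text, \w also matches non-ASCII letters, so use [^A-Za-z0-9_.] instead to keep only ASCII.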
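
As a rough sketch of how that substitution could be turned into a renaming script (in Python, which the asker listed as acceptable), the following applies an ASCII-only equivalent of the pattern to every file in a folder. The function names `sanitize_name` and `rename_files`, and the numeric suffix used to avoid name collisions, are illustrative assumptions and not part of the original answer.

```python
import os
import re

# ASCII-only equivalent of the Perl class [^\w.]:
# anything except A-Z, a-z, 0-9, "_" and ".".
ILLEGAL = re.compile(r"[^\w.]", re.ASCII)


def sanitize_name(name):
    """Replace every character outside A-Z, a-z, 0-9, "_" and "." with "_"."""
    return ILLEGAL.sub("_", name)


def rename_files(directory):
    """Rename files directly inside `directory`; return (old, new) name pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        old_path = os.path.join(directory, old)
        if not os.path.isfile(old_path):
            continue  # skip sub-folders
        new = sanitize_name(old)
        if new == old:
            continue  # nothing to change
        # Avoid overwriting an existing file: add a counter before the extension.
        stem, ext = os.path.splitext(new)
        counter = 1
        while os.path.exists(os.path.join(directory, new)):
            new = f"{stem}_{counter}{ext}"
            counter += 1
        os.rename(old_path, os.path.join(directory, new))
        renamed.append((old, new))
    return renamed
```

Calling `rename_files(r"C:\Downloads")` (a hypothetical path) would rename the files in that folder and return the list of changes. It is worth trying this on a copy of the folder first, because the renaming is not reversible from the script itself.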
 
LVL 7

Assisted Solution

by:Wod
Wod earned 2000 total points
ID: 20424932
PS: Some applications don't support Unicode (even if the OS does)
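
To illustrate where the "???" comes from, here is a small Python sketch. It assumes the application works in the legacy Windows code page of an English system (cp1252) instead of Unicode; the file name is made up. Characters with no equivalent in that code page cannot be represented, so they are replaced with "?".

```python
# Hypothetical file name containing Chinese characters.
name = "中文版.pdf"

# Simulate a non-Unicode application: convert to the cp1252 code page,
# replacing every character that has no mapping with "?".
legacy = name.encode("cp1252", errors="replace").decode("cp1252")

print(legacy)  # ???.pdf
```

Because several different names can collapse to the same "???" string, such an application may be unable to open the file at all, which is one reason to rename the files to plain ASCII.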
 

Author Closing Comment

by:asgarcymed
ID: 31413044
Thank you very much!!
Regards


Question has a verified solution.
