Lionel MM asked:

Parse Filenames and Replace characters

I have hundreds of files in directories and subdirectories that have special characters in the file names. These range from "+" to "_". I have been using ReNamer (http://www.den4b.com/?x=products), which works great, but I have to methodically go through each directory to find the files and then add the rule to change, replace or remove the character in question. I also use DoubleKiller to find and delete duplicates, but unless the files are 100% identical it won't see them as duplicates. So, is there a way I can write a script to do this for me? A script that will look at all file names in a given directory and all its subdirectories and, if it finds a + in the file name, replace it with a space. It is very likely that after this renaming a file with that name already exists. In that case, can a log file be produced that shows all the files that cannot be renamed because files with that name already exist? My purpose in this is to find duplicate files and to create uniform naming for all my files, which over the years have been all over the place. Thank you.
Bill Prew

Yes, this could be done fairly easily with either a BAT or VBS script. I'm on a mobile device right now, but if someone else doesn't toss a solution at you I'll work one up later from my computer.

Does the following work to at least find all files that need to be adjusted?

DIR /S /B "c:\yourdir" "*+*.*"

~bp
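For comparison, here is a minimal Python sketch of the same search (an alternative to the batch approach used in this thread, not any poster's code). It lists files under a folder whose own name contains "+". Unlike piping `dir /s /b` into `findstr`, it tests only the file name, so a "+" in a folder name does not pull in every file below that folder. The function name `files_with_char` is hypothetical.

```python
from pathlib import Path


def files_with_char(root, char="+"):
    """Return files under root (recursively) whose own name contains char.

    Only the file name is tested, not the folder part of the path.
    """
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and char in p.name
    )
```

Usage would be something like `for p in files_with_char(r"c:\my videos"): print(p)`.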
SOLUTION
Steve Knight

Lionel MM

ASKER

~bp
DIR /S /B "c:\yourdir" "*+*.*" does not work: it shows all files and folders whether they have a + or not.
cd /d "c:\my videos"
dir /s /b | findstr "+"

@echo off
setlocal enabledelayedexpansion
rem cd /d "c:\my videos"
for /f "tokens=* delims=" %%a in ('dir /s /b ^| findstr "+ "') do (
  set filename=%%~nxa
  set filename=!filename:+= !
  ECHO rename "%%~fa" "!filename!"
)
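The loop above only ECHOes the rename commands, which is a sensible dry run. A rough Python equivalent, offered as a sketch rather than a replacement for the batch file, builds the same list of (current, proposed) names without touching anything. As with `%%~nxa`, only the name part changes, so each file stays in its own folder. The function name `plan_renames` is hypothetical.

```python
from pathlib import Path


def plan_renames(root, old="+", new=" "):
    """Yield (current_path, proposed_path) pairs; nothing is renamed.

    Only the file name changes, so each file stays in its own folder.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and old in path.name:
            yield path, path.with_name(path.name.replace(old, new))
```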
Works, thanks. But how do I deal with duplicates? If the file name already exists, how can I know that and get some kind of notice, so I can go to the file location and deal with the duplicates? Thanks.
I get this on quite a lot of my files when I try to rename them:
"A duplicate file name exists, or the file cannot be found."
OK, you could try something like this, which checks whether the new filename already exists (in the same directory, that is) and shows an error if so.

Steve

@echo off
setlocal enabledelayedexpansion
cd /d "c:\my videos"
for /f "tokens=* delims=" %%a in ('dir /s /b ^| findstr "+ "') do (
  set filename=%%~nxa
  set filename=!filename:+= !
  if exist "%%~dpa!filename!" (
     ECHO ERROR: !filename! is a duplicate in %%~dpa
  ) ELSE (
     ECHO rename "%%~fa" "!filename!"
  )
)
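The original request also asked for a log file of the files that could not be renamed. A minimal Python sketch of that idea follows, assuming the same rule ("+" becomes a space) and a clash test of "a file with the new name already exists in the same folder". It renames what it safely can and writes every clash to a log instead of printing an error. The function name `rename_with_log` and the log line format are hypothetical. Note that it really renames files, so try it on a copy first.

```python
from pathlib import Path


def rename_with_log(root, log_path, old="+", new=" "):
    """Rename files whose names contain `old`; log the ones that would clash.

    Returns the number of files renamed. A file is skipped (and logged) when
    a file with the target name already exists in the same folder.
    """
    renamed = 0
    # Collect candidates first so renaming does not disturb the directory walk.
    candidates = sorted(
        p for p in Path(root).rglob("*") if p.is_file() and old in p.name
    )
    with open(log_path, "w", encoding="utf-8") as log:
        for path in candidates:
            target = path.with_name(path.name.replace(old, new))
            if target.exists():
                log.write(f"DUPLICATE: {path} -> {target}\n")
            else:
                path.rename(target)
                renamed += 1
    return renamed
```

Keep the log file outside the folder being processed so it is not picked up itself.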
Interesting question...

I'm sure most of us have got multiple copies of the same files scattered over our hard disks, me included.

Now that this question has surfaced, I will return to it later today and hope to contribute a solution which may even be of benefit to myself...
SOLUTION
Blimey! I forgot to revisit this question.

The night is young....
If that syntax works for you to find the files, then FOR /R tends to be quicker; otherwise it effectively lists all files with dir /s /b and then runs findstr over them.

The advantage of findstr method is you can probably search for different patterns more easily in some circumstances.

Wonder what you'll conjure up then, Paul. It could be an interesting project, but surely someone must have written a de-dupe utility already (command-line, I mean)? Logically you could a) look for files of the same size (of a certain file type, e.g. .doc*, .xls* etc.), then compare content with fc, and/or by name; b) then maybe look for files with the same name (but not the same size).

Could be as simple as a dump of all the filenames with columns ordered so you can use SORT to get them in an order to parse.
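Step a) above (same size first, then compare content) can be sketched in Python as follows. This is an illustration of the idea, not a tested de-dupe utility. Files are bucketed by size, which is cheap, and contents are compared byte for byte only inside a bucket, similar to running `fc /b` on each candidate pair. The function name `find_identical_files` is hypothetical, and it does not filter by file type.

```python
import filecmp
import os
from collections import defaultdict


def find_identical_files(root):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        clusters = []  # each cluster holds files identical to its first entry
        for path in sorted(paths):
            for cluster in clusters:
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        duplicates.extend(c for c in clusters if len(c) > 1)
    return duplicates
```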

All slightly beyond the question though.

Steve
steve... good thoughts.

got distracted. was assisting someone with hardware issues... damn! won't be able to finish this tonight.

i have 4 HDDs with multiple copies of files, backup copies, copies in 'protected' areas and second and third copies etc... i'm in real need of sorting but i know it takes days and days to unravel my tangled mess.

problem is, i don't tend to trust programs written by someone else - not even those with a GUI. I usually do it manually using nothing other than Explorer and Notepad... silly isn't it?
Agreed. I try to organise new things well as they go on to a new machine, then get bored, and GBs of data stay in the old structure too... Luckily I started organising all the digital HD video and photos well as we got them, with backups to multiple machines, USB drives etc. (using batch, as I trust it too :-) ). I started clearing out some old IDE drives the other day and found a box of 5.25" and 3.5" floppies too. I did actually virtualise a load of floppies at one point, plus PC programs I wrote in the late 80s and Dragon stuff from earlier than that, but the rest just sits there... along with the hundred or so "DV" video tapes (and 8mm before that) that "one day" I will transfer to HD...

Blimey I waffle on.

Steve
Steve, our file system is like a fingerprint showing our use over time.
What you guys are describing, old stuff accumulating and getting duplicated across multiple hard drives, is exactly what I am trying to sort out. I have bought one big drive and am copying everything onto it (is this the best approach?), and then plan to run this solution on that drive to help me find where the duplicate files are, so I can put eyes on them and manually delete them. Will test this Monday and get back to you. Thanks for the help.
What do you do when one of the characters you want to replace with a space (or delete) is the % sign?

for /f "tokens=* delims=" %%a in ('dir /s /b ^| findstr "%"') do (
FINDSTR: No search strings
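you should be able to escape it by doubling it: inside a batch file a literal % is written as %%, e.g. findstr "%%". A single % is treated as the start of a variable and stripped, which is why findstr ends up with no search string.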

steve
SOLUTION
No, I'm trying to:
a) remove all non alpha numeric characters (+ _ % etc.)
b) get them all to the same place
c) then remove any duplicates

I have tried several GUI duplicate finders and use DoubleKiller (the one I like best), and have used it to do some of the work, but now I need this, the question I have asked, to do what I want done in a way I feel more comfortable with than using one of these GUI products. So, thanks for the help thus far; I think I am getting where I want to be.
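Step a) in the list above could be expressed as a single name-normalising function. A minimal Python sketch follows, assuming (since the posts do not settle it) that every run of characters other than letters and digits in the name becomes a single space and that the extension is left alone. Removing the characters outright would glue words together ("Some+Show" would become "SomeShow"). The function name `normalize_name` is hypothetical.

```python
import re
from pathlib import PurePath


def normalize_name(filename):
    """Replace each run of non-alphanumeric characters in the stem with one space.

    Assumption: letters (including non-English ones) and digits are kept,
    '_' counts as a special character, and the extension is left unchanged.
    """
    p = PurePath(filename)
    stem = re.sub(r"[\W_]+", " ", p.stem).strip()
    return stem + p.suffix
```

For example, `normalize_name("The_Big+Movie (2009).avi")` gives `"The Big Movie 2009.avi"`.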
lionelmm

i have just started coding. please be patient! i may be able to appease...
Just an update: I am getting partial success. At worst I am discovering that I have many files with the same name, so this is a great help. Most of the GUI tools can't find duplicates unless the files are 100% identical, but over the years of downloading, many files are the same except for minor differences in size, so this is a great help. I am struggling with moving them to one central folder, though. I am working to move all duplicates to one place so I can view them and decide which ones to delete without having to go through hundreds of directories; having them all in one folder is easier than hundreds of folders. Please check whether what I changed will work:
@echo off
setlocal enabledelayedexpansion
cd /d "T:\my videos"
for /f "tokens=* delims=" %%a in ('dir /s /b ^| findstr "+ 3F 3A"') do (
  set filename=%%~nxa
  set filename=!filename:+= !
  if exist "%%~dpa!filename!" (
     Move %filename% T:\Duplicates
  ) ELSE (
     rename "%%~fa" "!filename!"
  )
)
I get an access denied error with this
ASKER CERTIFIED SOLUTION
I realized once I wrote this that moving the duplicate (same file name) is not really helping me, because then I will have one file as a potential duplicate in T:\Duplicates while the original remains in its original folder, so I won't have them in the same place to look at and decide which one to delete and which one to keep. I hate to ask, but if it's not adding too much to the work you've already done: if a file cannot be renamed because a file with the same name already exists, can you then move that existing file to T:\Duplicates along with the one to be renamed? If it's too much work, no worries, I will find some way to deal with it. Thanks.
You can also use fc.exe to look for an exact match:

fc file1 file2

If these are likely to be text based files then it can do various text based comparisons but /b will do a byte by byte check of the file.

We can do

fc file1 file2 && echo IT MATCHES

for instance ... so I think we need to decide on what is required before re-coding.
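For readers who want the `fc /b` check outside cmd, here is a minimal Python sketch of a byte-by-byte comparison (an alternative, not what the thread used). It rejects files of different length without reading them, then compares fixed-size chunks so large video files are never loaded into memory all at once. The function name `same_bytes` and the 1 MiB default chunk are assumptions.

```python
import os


def same_bytes(file1, file2, chunk_size=1 << 20):
    """Return True if both files have identical contents (like `fc /b`)."""
    if os.path.getsize(file1) != os.path.getsize(file2):
        return False  # different lengths can never match
    with open(file1, "rb") as a, open(file2, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted at the same point
                return True
```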

Maybe:

Loop through all files and directories under a subdirectory.

"correct" filename removing + or other characters

if the filename results in a duplicate then:

1. check if the file is complete duplicate using fc
   if so then move / delete it, or perhaps rename it to filename without the +'s with .duplicate on the end for easy finding later.
2. if it is not a perfect file match then rename it to filename without the +'s with .maybeduplicate on the end.

if the filename does not result in duplicate then just rename it.
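The plan above can be sketched in Python as follows, assuming "+" becomes a space (the `fix_name` rule is a hypothetical stand-in for whatever character rules are chosen). A clean target name is simply used. On a clash, the file gets ".duplicate" appended if its bytes match the existing file, otherwise ".maybeduplicate". If even the tagged name is taken, the file is left alone and reported with `None`. The names `fix_name` and `tidy_folder_tree` are hypothetical, and the function renames for real, so test it on a copy.

```python
import filecmp
from pathlib import Path


def fix_name(name):
    # Hypothetical rule for illustration: '+' becomes a space.
    return name.replace("+", " ")


def tidy_folder_tree(root, fixer=fix_name):
    """Rename cleanly, or tag clashes as .duplicate / .maybeduplicate.

    Returns (old_path, new_path_or_None) for every file that needed fixing.
    """
    results = []
    candidates = sorted(p for p in Path(root).rglob("*") if p.is_file())
    for path in candidates:
        fixed = fixer(path.name)
        if fixed == path.name:
            continue  # nothing to correct
        target = path.with_name(fixed)
        if target.exists():
            same = filecmp.cmp(path, target, shallow=False)  # like fc /b
            tag = ".duplicate" if same else ".maybeduplicate"
            target = path.with_name(fixed + tag)
        if target.exists():
            results.append((path, None))  # even the tagged name is taken
            continue
        path.rename(target)
        results.append((path, target))
    return results
```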


Realistically the best way would be to pull everything into a database, or excel perhaps:

Path, Name, Size, date, time, "fixed" name.
Then use Excel to do vlookups of fixed name in the "name" column for duplicates
etc.
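A sketch of the Excel route in Python, under stated assumptions: the "fixed" name rule (hypothetical `fixed_name`: "+" and "_" become spaces, repeated spaces collapse, compared case-insensitively as Windows does) and the column layout are illustrative choices. Instead of a vlookup, it adds a Count column with how many files anywhere under the root share the same fixed name. Rows with a count above 1 are the candidates to look at, and the CSV opens directly in Excel. The function name `write_inventory` is hypothetical.

```python
import csv
import os
from collections import Counter
from datetime import datetime


def fixed_name(name):
    # Hypothetical normalisation rule: '+' and '_' become spaces.
    return " ".join(name.replace("+", " ").replace("_", " ").split())


def write_inventory(root, csv_path):
    """Write Path, Name, Size, Modified, FixedName, Count rows to a CSV."""
    rows = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            st = os.stat(os.path.join(folder, name))
            modified = datetime.fromtimestamp(st.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
            # Lower-cased because Windows file names are case-insensitive.
            rows.append([folder, name, st.st_size, modified, fixed_name(name).lower()])
    counts = Counter(row[4] for row in rows)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Path", "Name", "Size", "Modified", "FixedName", "Count"])
        for row in sorted(rows, key=lambda r: r[4]):
            writer.writerow(row + [counts[row[4]]])
    return len(rows)
```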

Blimey, I waffle on.
Can you describe the problem you are trying to solve, and what the end result that you desire would be.

It sounds like you have "the same file" scattered around a disk in varying subfolders, and are trying to determine which ones to keep?  But so far you are only doing that based on file names, not size, content or date/time stamps.

It might be helpful at this point to take a step back, describe the current situation, and then describe the desired results and outcome.  That might allow us to better recommend and code a solid potential solution.

~bp
billprew
I have combined three 500GB hard disk drives onto a 2TB hard disk drive. That drive is now full. It has mostly videos on it: backups of DVDs I bought, videos I've downloaded off the internet, and videos I converted from DVDs to FLV, AVI and other video formats over the past several years. I have files in subfolders (named after a TV series or, for a movie, by my rating of the movie). Some of the files I have created and/or downloaded had different naming schemes. I did not think I had so many videos as to fill 2TB, so I used DoubleKiller and was able to remove about 200GB of duplicates, but I also noticed many other files that are the same file but slightly different in size, date or type of video (FLV, AVI, etc.). So what I am trying to do is get all the files, regardless of type of video, to have the same file name. My hope is that the files that cannot be renamed (that cannot have all the special characters removed because a file with that name already exists) are moved, together with the existing file, to a central location, so I can look at each file (using Windows Explorer) and decide which file I want to keep.
dragon-it
Yes, this is what I want
Loop through all files and directories under a subdirectory.
"correct" filename removing +, _, % or other characters

If the filename results in a duplicate then:
1. check if the file is complete duplicate using fc
    rename it to filename without the +'s with .duplicate on the end for easy finding later.
2. if it is not a perfect file match then rename it to filename without the +'s with .maybeduplicate on the end.

This will work even better than what I proposed because I can then do a search on .duplicate and "open containing folder" and go to the folder with the files in it and decide there what to keep and what to delete. This will make it unnecessary to move files and then move them back, so this is a much better solution than what I was suggesting

if the filename does not result in duplicate then just rename it.
SOLUTION
lionelmm
(Apologies for the length - I bashed it in over two days)

I have not been able to find time to help here (I have a daughter in hospital, operated on today) and I am currently caring full-time for her young children (it's been a month so far). What time I did have was squandered elsewhere, I'm afraid.

Regarding the renaming issue. My approach would be to use the same convention Microsoft uses (with a slight personal tweak).

- if a filename matches another
    - then, if file contents are identical
        - then delete the most recent file
    - otherwise, if file contents are not identical
        - append '(n+1)' onto the end of the filename
        - where 'n+1' is the next available integer
    -
-

Example:

    filename.txt
    filename (1).txt
    filename (2).txt
    etc...
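The "next available integer" rule above can be sketched in Python like this. It is a minimal illustration: it only picks a free name, and the decision to delete identical files instead still belongs to the caller. The function name `next_free_name` is hypothetical.

```python
from pathlib import Path


def next_free_name(path):
    """Return path if unused, else 'name (1).ext', 'name (2).ext', ..."""
    path = Path(path)
    if not path.exists():
        return path
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem} ({n}){path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```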

IMPORTANT TIP: You may want to consider some exceptions, though. For example, you might have multiple instances of README.TXT in different folders. You wouldn't necessarily want to delete or rename those if they belong to different setup programs.

Now that your files are on a single drive (which is exactly how I have mine, for nearly a year now, so I know where you're going with this), the next step is to 'classify' your data into smaller, manageable groups.

MP3
So, we'll start with the easy stuff first (it helps reduce bulk): MP3s (and this may include other audio file types such as .WAV, .MID etc.). Create a folder (in uppercase) named 'MUSIC' off the root folder, and 'move' ALL your .MP3 files (with their relative paths) into this folder. This process is normally referred to as 'grafting'. At this stage all you'll be doing is 'grouping' your files rather than renaming or deleting them, so you could end up with something like this:

    MUSIC---Documents and Settings---etc...
    MUSIC---Documents and Settings (1)---etc...
    MUSIC---Documents and Settings (2)---etc...

where you have grafted music from different drives or different 'backup' folders.
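The grafting step can be planned before anything is moved. A minimal Python sketch follows, assuming a FILES-style source folder and a group folder such as MUSIC. It maps each audio file to the same relative path under the group folder and returns the plan, so it can be reviewed and then carried out with `shutil.move` (creating parent folders first). The extension set and the names `AUDIO_EXTS` and `graft_plan` are assumptions for illustration.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".mid"}  # assumed set; extend as needed


def graft_plan(files_root, group_root, extensions=AUDIO_EXTS):
    """Map each matching file to group_root / <its path relative to files_root>.

    Nothing is moved here; the returned (source, destination) pairs are a plan.
    """
    files_root, group_root = Path(files_root), Path(group_root)
    plan = []
    for path in sorted(files_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            plan.append((path, group_root / path.relative_to(files_root)))
    return plan
```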

Btw, I have a single folder on my drive named 'FILES' (containing everything I copied from my other drives), so every time I add a new folder to the root of the drive, i.e. 'MUSIC', 'VIDEO', 'PHOTOS' etc., I transfer the associated files from FILES into those folders. Eventually there'll be so few files remaining in FILES that I'll be able to manage them manually in a fraction of the time it would otherwise have taken.

VIDEO
Next, videos. Create a 'VIDEO' folder in the root and transfer ALL videos from the FILES folder to it (again, and as always, you need to retain their full paths. This is for the second phase later on).

And continue this process for DOCUMENTS, GAMES, UTILS and so forth until the FILES folder is empty.

IMPORTANT TIP: It should be apparent that slightly different rules might apply to different file types. For example, I have a large collection of music albums. It is common to find songs by the same artist on two or more albums; however, I wouldn't want to delete or rename any of them, as they each form part of a 'set'.

PROGRAMS
Like me, you may have lots of downloaded program files (often in .ZIP format, usually a single file in its own folder, or sometimes with its files extracted to the same folder). In this case, you wouldn't want to rename any of the SETUP.EXE files. Some programs don't require 'installing' and will run straight from the folder they were extracted to, in which case you might prefer to delete the .ZIP file, whereas in other cases you may decide to delete the install files and keep the .ZIP file. Or you could simply delete both, or even keep both. The thing is, decisions like these are difficult to automate. This is where good old Windows Explorer comes in handy.

I'll leave it at that for now. I hope I've given you some thought to chew on... It's not quite as black and white as we hoped it would be.

RECOMMENDATION
For the most part, the whole process is going to be manual. This is because there are decisions that cannot be automated. Simple 'search & reporting' can be.

IMPORTANT TIP: Consider the case where you have, say, two identical sets of downloaded install files consisting of SETUP.EXE and README.TXT. You wouldn't want to delete SETUP.EXE from the first folder and README.TXT from the second one. If the contents of both folders are identical (file-for-file), then it's safe to delete a whole folder; however, be mindful that those folders might not be named the same. This type of decision is still best left to humans, as 'BAT2EXE' and 'A000232Tool' mean nothing to the humble PC.
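The "identical file-for-file" test in the tip above is the part that can be automated safely. A minimal Python sketch follows: two folders count as identical when they contain the same relative file paths and every pair of files matches byte for byte. The folders' own names are ignored, so 'BAT2EXE' and 'A000232Tool' can still match. Choosing which folder to delete is left to a human. The function name `folders_identical` is hypothetical.

```python
import filecmp
from pathlib import Path


def folders_identical(folder_a, folder_b):
    """True if both folders hold the same relative file paths with identical bytes."""
    a, b = Path(folder_a), Path(folder_b)
    files_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    if files_a != files_b:
        return False  # a file exists in one folder but not the other
    return all(filecmp.cmp(a / rel, b / rel, shallow=False) for rel in files_a)
```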

I would concentrate on one group at a time: MUSIC, PHOTOS, VIDEOS etc. A good starting point might be .MP3s or videos. This would mean having more than one batch file process your files. Each batch file might be tailored to a particular group, with its own rules for renaming, moving, deleting etc.

As I said earlier, I haven't given this question my fullest support which is a real shame because it's something I am in need of doing myself. I will however dip in and out and contribute as and when I am able to. In the meantime, be prepared to accept the fact the process might take you weeks if not months... I've been chipping away at mine for a year so far... lol
@Paul,

Good luck on the home front, looks like an interesting post, I will digest more fully later today, but I think you and I are thinking along similar lines...

~bp
paultomasi
Thanks for all that work and for the suggestions and comments. I will look it over shortly. I hope and pray all goes well with your daughter's operation and her recovery, and that all goes well caring for your grandkids. God bless.

 I had a server issue last night so I will not have a chance to look this over today--will get back to it tomorrow. Again thanks to all for all the help thus far.
Paul - hope all goes well, and yes agree that is the sort of thing that should be done.    Up all last night with middle son mopping sick and worse every 15 mins so no time to think sensibly about anything.

Dumping the file details to a text file is a start, though, to see how the raw data looks, so it would be interesting to see the results of http:#37803990 (well, a few lines of obvious duplicates).

Steve
bill
Hey, thanks bill.

lionelmm
Thank you too.

steve
Thank you and sorry to hear things aren't too good with your son at the moment either!

Creating a 'master dump file' does sound like a good idea. Just got to come up with some ideas on processing it.

BTW, the master dump file could contain the following info:

    filename.extensionname
    filepath
    filesize
    date and time file created
    date and time file last modified

    files grouped by their folder
    foldersize
    date and time folder created
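The folder half of that list (files grouped by folder, folder size, folder dates) could be gathered with a sketch like this one, under two assumptions: "size" counts only the files directly in each folder, not its subfolders, and "created" uses `st_birthtime` where Python provides it, otherwise falling back to `st_ctime`. On Windows `st_ctime` is the creation time, but on Linux it is the metadata-change time. The names `folder_summary` and `_stamp` are hypothetical.

```python
import os
from datetime import datetime


def _stamp(seconds):
    return datetime.fromtimestamp(seconds).strftime("%Y-%m-%d %H:%M:%S")


def folder_summary(root):
    """One dict per folder: path, file count, size of its own files, dates."""
    summary = []
    for folder, _dirs, names in os.walk(root):
        sizes = [os.path.getsize(os.path.join(folder, n)) for n in names]
        st = os.stat(folder)
        summary.append({
            "folder": folder,
            "files": len(names),
            "size": sum(sizes),  # direct files only, subfolders not included
            "created": _stamp(getattr(st, "st_birthtime", st.st_ctime)),
            "modified": _stamp(st.st_mtime),
        })
    return summary
```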
My apologies for not getting back to you on this. I used a recommendation about hot-swapping eSATA drives that really messed up one of my client's servers and lost me a TrueCrypt archive drive. It has had my undivided attention for about 10 days. I will get back to this this week.
SOLUTION
I know, that is what I have been doing the last week. I have been trying to note what I am doing and add that to some of the suggested solutions, but it is as you say: having eyes on it is really the one way you can know for sure that what you want to save is saved, changed, renamed or deleted.
Seems like it's not a programmer you need then but, a 100 pairs of eyes and 100 pairs of hands... :)

...and lots of strong coffee!
I was hoping to automate the process because each folder had the same miscellaneous characters, and most of those were removed; however, that led to duplicates and the question of how to deal with them. Then there were some whose names were slightly different, a space here, lack of a space there (which is tough to add or remove programmatically, since spaces are so ubiquitous). So a lot of work was done by the suggestions provided, but I have been unable to get to a point where it is all automated, so I switched to putting eyes on the files and making decisions then and there.
Thanks for all the help. It got me started and handled a lot of the really easy deletions of duplicate files; the rest I had to do manually. Thank you very much for the help. I learned a lot too.
No problem, glad you got there.

Steve