Solved

Regular Expression to replace underscores between numbers with dots and other underscores with spaces

Posted on 2010-11-24
Last Modified: 2012-05-10
Could one of you Regex gurus help me with a regular expression to replace only underscores between numbers with a dot and any other underscores with a space?

So, these two lines:
This_is_my_program_v2_0_1_3_signature
This.is.another v1_2_3-name

should become:
This is my program v2.0.1.3 signature
This.is.another v1.2.3-name

As a secondary goal I would like to also remove any text following the number so the original lines would become:
This is my program v2.0.1.3
This.is.another v1.2.3

I know, I'm pushing my luck. ;-) But as a third goal, allow for an "unless the ending text equals 'some string'" exception, so that for the original lines I could specify "except for '-name'" and the lines would become:
This is my program v2.0.1.3
This.is.another v1.2.3-name

Thanks for any help.  Please don't offer a link to a Regex tutorial, I already have that link. ;-)
I'm not sure what "version" of Regex I'm using, but the backreference replacement character is \ not $. The software I need this for is a free utility called "Bulk Rename Utility" v2.7.1.2.
Question by:megnin
27 Comments
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34205760
// Goal 1
Find:

(?<=\d)_(?=\d)

Replace:

.

// Goal 2 & 3

Find:

(?<=\d(?!-name))\D*$

Replace:

[empty string]
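For anyone who wants to experiment with these two patterns outside a renaming tool, here is a minimal sketch in Python (my choice for trying the patterns, not something used in this thread). Python's `re` module supports the same lookbehind/lookahead constructs; its one restriction, that a lookbehind must be fixed-width, is met because `\d(?!-name)` is always one character wide. The names `goal_1` and `goals_2_and_3` are made up for illustration, and like the patterns above it does not turn the remaining underscores into spaces.

```python
import re

# Goal 1: an underscore with a digit on each side becomes a dot. The lookarounds
# don't consume the digits, so a run like 2_0_1_3 is handled in a single pass.
DIGIT_UNDERSCORE = re.compile(r"(?<=\d)_(?=\d)")

# Goals 2 & 3: remove the non-digit tail after the last digit, unless the text
# right after that digit starts with "-name" (the lookahead sits inside the lookbehind).
TRAILING_TEXT = re.compile(r"(?<=\d(?!-name))\D*$")


def goal_1(name: str) -> str:
    return DIGIT_UNDERSCORE.sub(".", name)


def goals_2_and_3(name: str) -> str:
    return TRAILING_TEXT.sub("", name)


for original in ["This_is_my_program_v2_0_1_3_signature", "This.is.another v1_2_3-name"]:
    step1 = goal_1(original)
    print(original, "->", step1, "->", goals_2_and_3(step1))
```

On the question's two sample lines this yields `This_is_my_program_v2.0.1.3` (the word underscores are still there) and `This.is.another v1.2.3-name`.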
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34205810
Disregard. I did not see the name of the utility before posting. I don't believe those patterns will work.
 
LVL 1

Author Comment

by:megnin
ID: 34206094
Okay, I'm working on "Goal 1" right now... Replacing (?<=\d)_(?=\d) with . changed every string to "." shown in red, as if there were an error.

I got limited success with:
Match:         (.*) (\d)_(\d)_(\d)_(\d)
Replace:      \1 \2.\3.\4.\5

but I'm guessing and using trial and error (mostly error)

That did not handle underscores between alpha characters at all. It would replace the three underscores between four single digits only if there were no other underscores in the string; otherwise it made no change to the string.
 
LVL 1

Author Comment

by:megnin
ID: 34206340
Match:           (.*)_(\w+)_(\w+)_(\w+)(\d+)_(\d+)_(\d+)_(\d+)
Replace:        \1 \2 \3 \4.\5.\6.\7.\8

With a bunch of strings containing varying numbers of words separated by _ and varying numbers of digit groups separated by _, the above only works if there are exactly three words followed by four groups of digits.

one_two_three_v6_0_1_12345-string_anystring
becomes:
one two three v6.0.1.12345

which is fine, but if there are any more or fewer word groups or number groups, it makes no change.

Is there a way to say: replace (\w+)_(\w+) with spaces until you run out of words with _ between them, then replace (\d+)_(\d+) with dots until you run out of numbers with _ between them? I see a problem with determining how many backreferences to capture. ;-(
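With a general-purpose regex engine there is no need to count backreferences: a replace call applies its pattern to every non-overlapping match, so a digit rule and an underscore rule can simply run one after the other, whatever the number of words or digit groups. A minimal Python sketch of that idea, assuming Python's `re.sub` (the helper name `tidy` is hypothetical):

```python
import re


def tidy(name: str) -> str:
    # Pass 1: underscores with a digit on each side become dots.
    # re.sub replaces every match, so the number of digit groups doesn't matter.
    name = re.sub(r"(?<=\d)_(?=\d)", ".", name)
    # Pass 2: every underscore that is left becomes a space.
    name = re.sub(r"_", " ", name)
    # Pass 3: drop whatever non-digit text follows the last digit.
    return re.sub(r"(?<=\d)\D*$", "", name)


for sample in ["one_two_three_v6_0_1_12345-string_anystring",
               "one_two_v1_2",
               "a_b_c_d_e_v10_20_30_40_50_60"]:
    print(sample, "->", tidy(sample))
```

Each pattern describes a single underscore and its neighbours rather than the whole filename, which is why the count of groups stops mattering.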
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34206913
I don't think it's a limitation of regex itself but that it's the way the utility processes the pattern. It seems to want the exact pattern for the whole filename and that you can't just affect parts of it (e.g. replace any digits). Is it a requirement to use this tool?
 
LVL 1

Author Comment

by:megnin
ID: 34206980
Using this tool is not really a requirement.  A PowerScript script would be a really good alternative.  ;-)

Yeah, you're right. It wants every piece of the filename accounted for. That's why I started it with (.*), and the \1 restores the first part of the filename before anything else is even matched.

I'm just learning PowerScript so that's why I said it would be a nice alternative.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34207041
Forgive my ignorance. Is PowerScript used for PowerShell?
 
LVL 1

Author Comment

by:megnin
ID: 34207092
No, I'm the dummy.  I meant PowerShell.  Sorry about that.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34207615
Ok. Here goes:
$files = [IO.Directory]::GetFiles("C:\your")
foreach ($file in $files)
{
    $newname = [System.Text.RegularExpressions.Regex]::Replace($file, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    mv "$file" "$newname"
}

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34207637
P.S.

"C:\your" is of course your parent directory. This will replace any occurrence of an underscore between two digits anywhere in the full path, so if your path structure contains that pattern, those underscores will be replaced too. Only the first such rename will go through, because after that the remaining paths are invalid: the parent directory name has been changed.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34207647
Actually, in hindsight, it will probably fail if a parent has that pattern in its name, since the replaced directory name will most likely not exist  :)
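The path problem described in the last two comments can be reproduced without touching the disk. This Python sketch uses the standard `ntpath` module (Windows path rules, importable on any OS); the path `C:\backup_2_0\Setup_v1_2` and both function names are hypothetical.

```python
import ntpath  # Windows-style path handling, available on every platform
import re

DIGIT_UNDERSCORE = r"(?<=\d)_(?=\d)"


def rename_whole_path(full_path: str) -> str:
    # What the first script does: the regex sees the entire path string.
    return re.sub(DIGIT_UNDERSCORE, ".", full_path)


def rename_last_component(full_path: str) -> str:
    # Safer: split off the parent, change only the final name, then rejoin.
    parent, name = ntpath.split(full_path)
    return ntpath.join(parent, re.sub(DIGIT_UNDERSCORE, ".", name))


path = r"C:\backup_2_0\Setup_v1_2"
print(rename_whole_path(path))      # parent folder name is changed as well
print(rename_last_component(path))  # parent folder left alone
```

The first result points into a `C:\backup_2.0` folder that does not exist, which is why the move fails.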
 
LVL 1

Author Comment

by:megnin
ID: 34207980
What if I want to change folder names?  In the parent folder is a bunch of folders that need renaming.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34208002
You can change line 1 to

    $files = [IO.Directory]::GetDirectories("C:\your")

where "your" is the parent.
 
LVL 1

Author Comment

by:megnin
ID: 34208372
Okay, this is fun.  I really like PowerShell, what little I've played with it...

The script replaces the _ between numbers, but it doesn't touch _ between alpha characters.

This is not an urgent project I'm working on.  It's more of a learning experience for me at this point.  I'm going home in a few minutes, but I'll play with this some more at home.  ;-)
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34208621
Sorry, I forgot about the "and any other underscores with a space" requirement. Modified:
$files = [IO.Directory]::GetDirectories("C:\your")
foreach ($file in $files)
{
    $newname = [System.Text.RegularExpressions.Regex]::Replace($file, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    mv "$file" "$newname"
}
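The order of the first two Replace calls above is significant: the digit rule has to run before the catch-all underscore rule, otherwise there are no underscores left for it to turn into dots. A small Python sketch contrasting the two orders (the helper names are hypothetical):

```python
import re

DIGIT_RULE = r"(?<=\d)_(?=\d)"  # an underscore between two digits


def digits_first(name: str) -> str:
    name = re.sub(DIGIT_RULE, ".", name)  # 2_0 -> 2.0 first...
    return re.sub(r"_", " ", name)        # ...then every leftover underscore -> space


def spaces_first(name: str) -> str:
    name = re.sub(r"_", " ", name)        # removes every underscore up front...
    return re.sub(DIGIT_RULE, ".", name)  # ...so this rule has nothing left to match


sample = "This_is_my_program_v2_0_1_3_signature"
print(digits_first(sample))  # dots between the version numbers
print(spaces_first(sample))  # spaces between the version numbers
```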

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34208956
Again, that will replace ALL underscores, so

   C:\path_to\some\folder_you\target

becomes

   C:\path to\some\folder you\target

which will probably break. Actually, since it's bugging me that much, here's a modified script that should only affect the target directory (and not its path). You can switch between files and directories by changing line 2 to either of the following, accordingly:

    $fsObjects = $fso.GetDirectories()
    $fsObjects = $fso.GetFiles()

$fso = New-Object IO.DirectoryInfo("C:\your")
$fsObjects = $fso.GetDirectories()
foreach ($obj in $fsObjects)
{
    $o = [IO.FileSystemInfo] $obj
    $path = [IO.Path]::GetDirectoryName($o.FullName)
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    $obj.MoveTo([IO.Path]::Combine($path, $newname))
}
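For comparison, a rough Python equivalent of the script above, assuming Python 3 and `pathlib`. Like the PowerShell version it rewrites only the entry's own name and rejoins it with the parent, so underscores in the parent path are left alone. `new_name`, `rename_children` and the `dirs` flag (standing in for switching between GetDirectories() and GetFiles()) are hypothetical; unlike the script above, this sketch also skips a rename when the target name already exists.

```python
import re
import tempfile
from pathlib import Path


def new_name(name: str) -> str:
    # The same three passes as the PowerShell script, applied to the bare name.
    name = re.sub(r"(?<=\d)_(?=\d)", ".", name)
    name = re.sub(r"_", " ", name)
    return re.sub(r"(?<=\d(?!-name))\D*$", "", name)


def rename_children(parent: Path, dirs: bool = True) -> list:
    """Rename the immediate sub-directories (dirs=True) or files (dirs=False)
    of `parent` and return the new names that were actually applied."""
    renamed = []
    for entry in sorted(parent.iterdir()):  # list everything first, then rename
        if entry.is_dir() != dirs:
            continue  # not the kind of entry we were asked to rename
        target = entry.with_name(new_name(entry.name))  # parent part untouched
        if target == entry or target.exists():
            continue  # nothing to change, or it would clash with an existing name
        entry.rename(target)
        renamed.append(target.name)
    return renamed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "parent_1_2"  # underscores here must survive
        root.mkdir()
        (root / "This_is_my_program_v2_0_1_3_signature").mkdir()
        (root / "another_v1_2_3-name").mkdir()
        print(rename_children(root))
        print(root.name)
```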

 
LVL 1

Author Comment

by:megnin
ID: 34212337
Wow! That works great! For filenames the extension is also removed, but I changed the exclusion ?! from -name to .txt so I can selectively preserve file extensions.
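That works because the exclusion lives inside the lookbehind: a match may only start right after a digit that is not immediately followed by the excluded text. A minimal Python sketch of the same idea (the helper name is hypothetical; note it escapes the dot as `\.txt`, because an unescaped `.` matches any character):

```python
import re

# Match may only start after a digit that is NOT directly followed by ".txt".
KEEP_TXT = re.compile(r"(?<=\d(?!\.txt))\D*$")


def strip_tail(name: str) -> str:
    return KEEP_TXT.sub("", name)


print(strip_tail("Report v1.2.txt"))        # .txt right after the digit: kept
print(strip_tail("Report v1.2.doc"))        # any other tail: removed
print(strip_tail("Report v1.2 draft.txt"))  # .txt not right after the digit: removed
```

The third case shows the limit of this approach: the exclusion only applies when `.txt` comes directly after the last digit.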

Thank you for the modified version.

This is a really great script. It seems it could be "tweaked" to perform all sorts of file/folder renaming operations.

If I wanted to replace occurrences of %20 in the middle of a filename with a space, could I just add a line like this?...
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "%20", " ")

$fso = New-Object IO.DirectoryInfo("C:\MyTestFolder")
$fsObjects = $fso.GetFiles()
foreach ($obj in $fsObjects)
{
    $o = [IO.FileSystemInfo] $obj
    $path = [IO.Path]::GetDirectoryName($o.FullName)
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*$", "")
    $obj.MoveTo([IO.Path]::Combine($path, $newname))
}

 
LVL 1

Author Comment

by:megnin
ID: 34212351
I guess so.  :-)  I tried it and it works  (the %20 replacing thing) :-)  So, I guess the reverse would also work if you needed to put %20 in place of spaces for web links or something.
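It does, and since `%20` contains no regex metacharacters, a plain string replace works in both directions. A short Python illustration (the sample name is made up); Python's standard `urllib.parse.unquote` is shown as well because it decodes every `%XX` escape, not only `%20`:

```python
import re
from urllib.parse import unquote

name = "My%20Holiday%20Photos_2010"
spaced = re.sub(r"%20", " ", name)  # the same replacement as the PowerShell line
back = spaced.replace(" ", "%20")   # the reverse, e.g. for web links
print(spaced)
print(back)

# unquote() decodes all percent-escapes, so %26 becomes "&" too.
print(unquote("Q%26A%20notes"))
```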
 
LVL 1

Author Comment

by:megnin
ID: 34212589
Could that line be modified to leave any file extension untouched?  
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*$", "")
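The usual way to keep the extension is to capture it in a group and write that group back in the replacement, instead of deleting everything from some dot onward. This Python sketch (sample name made up, and the `(?!...)` exclusion left out for brevity) also shows why the group should be limited to the last dot: with `\..*$` the match can start much earlier than intended, right after the `2` in `v2.0.1.3`, and swallow the rest of the name including the extension.

```python
import re

name = "This is my program v2.0.1.3 signature.txt"

# Deleting "\D*\..*$" outright: ".*" runs across the remaining dots,
# so the match starts right after "v2" and removes everything after it.
print(re.sub(r"(?<=\d)\D*\..*$", "", name))

# Capture only the last ".ext" ([^.]* cannot cross another dot) and write it back
# with \1, so only the text between the last digit and the extension is dropped.
print(re.sub(r"(?<=\d)\D*?(\.[^.]*)$", r"\1", name))
```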

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34215104
Sorry for the late response--I was busy with Turkey Day  :)

You can try altering it to the following. I'm on my Linux box right now so I can't test.
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*\..*$", "")

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34230226
Correction:
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*(\..*)$", "$1")

 
LVL 1

Author Comment

by:megnin
ID: 34245974
With a folder of 15 files with various multi-word filenames, numbers in various positions between words or at the end, and various file extensions, that last example removes the file extension from seemingly random files. About half lose their extension and the others keep it. There doesn't seem to be a recognizable pattern, except that it varies depending on whether other elements in the filename were changed.
There were two files with .txt extensions. The first had only a single underscore between two words and no numbers in the filename. The second had underscores between words and some numbers. The first lost its extension and the second one kept it. Strange.
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34257440
Is it possible to post the original filenames, or perhaps sanitized names, but with the same characteristics?
 
LVL 1

Author Comment

by:megnin
ID: 34282437
Sorry for the delay again.  I took a week of vacation after Thanksgiving. :-}

Here's a pretty good sampling of filename characteristics:

Alpha_Beta_123_1_1.mkv
Alpha_Beta_v123_1_1.rar
Alpha_Beta_v_1_1.zip
Alpha_Beta v123-R1.avi
Alpha_Beta_1_1_1-R1.doc
Alpha Beta 1.1.1-R1.aspx
Alpha_Beta_123_1_1-R1.txt

Alpha_Beta_123_1_1.zip
Alpha_Beta_v123_1_1.ppt
Alpha_Beta_v_1_1.mkv
Alpha_Beta v123-R1.txt
Alpha_Beta_1_1_1-R1.txt
Alpha Beta 1.1.1-R1.ppt
Alpha_Beta_123_1_1-R1.mkv
 
LVL 74

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34283041
Silly me... It seems that double-quoted strings in PS perform variable interpolation on $[symbol] constructs. This caused the $1 in the replacement pattern to be evaluated as a PowerShell variable, and since $1 wasn't declared anywhere, its value was empty by the time the replace function tried to use it. Single quotes are needed so that the regex object receives the literal $1 and evaluates it as a replacement expression (below).
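Python has a close cousin of this pitfall. In an ordinary Python string literal `"\1"` is an octal escape for the single character `chr(1)`, so the regex engine never sees a backreference; a raw string `r"\1"` (or the explicit form `r"\g<1>"`) passes the backslash through, much like single quotes do in PowerShell. A tiny sketch:

```python
import re

# In an ordinary string literal, "\1" is an octal escape: the single character chr(1).
plain = re.sub(r"v(\d)", "\1", "v2")
# A raw string keeps the backslash, so re sees a real backreference to group 1.
raw = re.sub(r"v(\d)", r"\1", "v2")
# The explicit \g<1> form is unambiguous as well.
named = re.sub(r"v(\d)", r"\g<1>", "v2")

print(repr(plain), repr(raw), repr(named))
```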

I also combined the underscore and "%20" into one expression since their replacement value is the same.
$fso = New-Object IO.DirectoryInfo("C:\your")
$fsObjects = $fso.GetFiles()
foreach ($obj in $fsObjects)
{
    $o = [IO.FileSystemInfo] $obj
    $path = [IO.Path]::GetDirectoryName($o.FullName)
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_|%20", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*?(\.[^.]*)$", '$1')
    $path = [IO.Path]::Combine($path, $newname)
    
    If (-not (Test-Path $path))
    {
    	$obj.MoveTo($path)
    }
}
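To check the final three-step logic against names like the sample list above without renaming anything, here is a Python port of just the three Replace calls. It is a sketch: `accepted_rename` is a hypothetical name, and Python's replacement syntax `\1` stands in for .NET's `$1`.

```python
import re


def accepted_rename(name: str) -> str:
    # Underscores between digits -> dots.
    name = re.sub(r"(?<=\d)_(?=\d)", ".", name)
    # Any remaining underscore, and any %20, -> space.
    name = re.sub(r"_|%20", " ", name)
    # Drop non-digit text between the last digit and the final ".ext" (unless it
    # starts with "-name"); the extension is captured and written back via \1.
    return re.sub(r"(?<=\d(?!-name))\D*?(\.[^.]*)$", r"\1", name)


samples = [
    "Alpha_Beta_123_1_1.mkv",
    "Alpha_Beta_v_1_1.zip",
    "Alpha_Beta v123-R1.avi",
    "Alpha_Beta_1_1_1-R1.doc",
    "This_is_my_program_v2_0_1_3_signature.txt",
    "This.is.another v1_2_3-name.txt",
]
for s in samples:
    print(s, "->", accepted_rename(s))
```

Because the last pass needs a trailing `.ext`, a name without an extension generally keeps its trailing text, so this version of the pattern is really aimed at files, which is what the script above iterates over with GetFiles().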

 
LVL 1

Author Closing Comment

by:megnin
ID: 34284351
I think that's done it. Worked fine on my test folder, anyway. :-) My test folder probably has a "worst case scenario" of filenames. Thank you very much for your efforts and patience. ;-) I really appreciate it. There are some good techniques in there that I can learn a lot from as well!
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34284701
NP. I don't mind sticking it out in the least--I just hate that it takes me so many posts to get you to your goal!

Glad to help nonetheless  :)