
Regular Expression to replace underscores between numbers with dots and other underscores with spaces

Could one of you Regex gurus help me with a regular expression that replaces only the underscores between numbers with a dot and any other underscores with a space?

So, these two lines:
This_is_my_program_v2_0_1_3_signature
This.is.another v1_2_3-name

should become:
This is my program v2.0.1.3 signature
This.is.another v1.2.3-name

As a secondary goal I would like to also remove any text following the number so the original lines would become:
This is my program v2.0.1.3
This.is.another v1.2.3

I know I'm pushing my luck ;-), but as a third goal I'd like to allow for an "unless the ending text equals 'some string'" exception, so that for the original lines I could specify "except for '-name'" and the lines would become:
This is my program v2.0.1.3
This.is.another v1.2.3-name

Thanks for any help.  Please don't offer a link to a Regex tutorial, I already have that link. ;-)
I'm not sure what "version" of Regex I'm using, but the backreference replacement character is \ not $.  The software I need this for is a free utility called "Bulk Rename Utility" v2.7.1.2.
 
käµfm³d 👽Commented:
// Goal 1
Find:

(?<=\d)_(?=\d)

Replace:

.

// Goal 2 & 3

Find:

(?<=\d(?!-name))\D*$

Replace:

[empty string]
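
For reference, a minimal sketch of how these two patterns behave in a .NET-style regex engine, driven here through PowerShell's `-replace` operator. That engine is an assumption; the utility named in the question may use a different one. The sample strings come from the question, and underscores between letters are deliberately left alone at this stage.

```powershell
# Sample names taken from the question.
$samples = @(
    'This_is_my_program_v2_0_1_3_signature',
    'This.is.another v1_2_3-name'
)

$results = foreach ($s in $samples) {
    # Goal 1: an underscore with a digit on both sides becomes a dot.
    $t = $s -replace '(?<=\d)_(?=\d)', '.'
    # Goals 2 and 3: drop the trailing non-digits after a digit,
    # unless that digit is directly followed by "-name".
    $t -replace '(?<=\d(?!-name))\D*$', ''
}

$results
```

The first name should lose its "_signature" tail, while the second keeps "-name" because of the negative lookahead.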
 
käµfm³d 👽Commented:
Disregard. I did not see the name of the utility before posting. I don't believe those patterns will work.
 
megninAuthor Commented:
Okay, I'm working on "Goal 1" right now.  Using (?<=\d)_(?=\d) with a replacement of . changed every string to "." shown in red, as if there were an error.

I got limited success with:
Match:         (.*) (\d)_(\d)_(\d)_(\d)
Replace:      \1 \2.\3.\4.\5

but I'm guessing and using trial and error (mostly error)

That did not handle underscores between alpha characters at all.  It replaced the three underscores between four single digits only if there were no other underscores in the string; otherwise it made no change to the string.

 
megninAuthor Commented:
Match:           (.*)_(\w+)_(\w+)_(\w+)(\d+)_(\d+)_(\d+)_(\d+)
Replace:        \1 \2 \3 \4.\5.\6.\7.\8

With a bunch of strings containing varying numbers of words separated by _ and varying numbers of digit groups separated by _, the above only works if there are exactly three words followed by four groups of digits.

one_two_three_v6_0_1_12345-string_anystring
becomes:
one two three v6.0.1.12345

which is fine, but if there are any more or fewer string groups or number groups, it makes no change.

Is there a way to say: replace the _ in (\w+)_(\w+) with spaces until you run out of words, then replace the _ in (\d+)_(\d+) with dots until you run out of numbers?   I see a problem with determining how many backreferences to capture.  ;-(
 
käµfm³d 👽Commented:
I don't think it's a limitation of regex itself but that it's the way the utility processes the pattern. It seems to want the exact pattern for the whole filename and that you can't just affect parts of it (e.g. replace any digits). Is it a requirement to use this tool?
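
To illustrate the difference, here is a rough sketch in PowerShell. The anchored pattern only imitates a tool that insists on matching the whole name; that behaviour is an assumption about the utility, not something confirmed here, and the sample names are made up.

```powershell
# Hypothetical whole-name pattern: exactly one word and three number groups.
$wholeName = '^([a-z]+)_v(\d+)_(\d+)_(\d+)$'

# The name has the expected shape, so it is rewritten.
$a = 'tool_v1_2_3' -replace $wholeName, '$1 v$2.$3.$4'

# One number group short: the pattern does not match and the name comes back unchanged.
$b = 'tool_v1_2' -replace $wholeName, '$1 v$2.$3.$4'

# Per-occurrence replacement does not depend on how many groups there are.
$c = ('tool_v1_2' -replace '(?<=\d)_(?=\d)', '.') -replace '_', ' '
$d = ('my_tool_v1_2_3_4' -replace '(?<=\d)_(?=\d)', '.') -replace '_', ' '

$a, $b, $c, $d
```

With a replace-every-occurrence approach there is no need to know in advance how many backreferences to capture.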
 
megninAuthor Commented:
Using this tool is not really a requirement.  A PowerScript script would be a really good alternative.  ;-)

Yeah, you're right.  It wants every piece of the filename accounted for.  That's why I started it with (.*) and the \1 restores the first part of the filename before anything else is even matched.

I'm just learning PowerScript so that's why I said it would be a nice alternative.
 
käµfm³d 👽Commented:
Forgive my ignorance. Is PowerScript used for PowerShell?
 
megninAuthor Commented:
No, I'm the dummy.  I meant PowerShell.  Sorry about that.
 
käµfm³d 👽Commented:
Ok. Here goes:
$files = [IO.Directory]::GetFiles("C:\your")
foreach ($file in $files)
{
    $newname = [System.Text.RegularExpressions.Regex]::Replace($file, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    mv "$file" "$newname"
}

 
käµfm³d 👽Commented:
P.S.

"C:\your" is of course your parent directory. This will replace any occurrence of underscores between two digits, so if your path structure has this, they will be replaced also--and only the first occurrence of such will be replaced as after that, the remaining paths are invalid because the parent directory name has been changed.
 
käµfm³d 👽Commented:
Actually, in hindsight, it will probably fail if a parent has that pattern in its name, since the replaced directory name will most likely not exist  :)
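
A small sketch of the problem, using a made-up relative path (both segment names are hypothetical). It only manipulates strings and touches nothing on disk.

```powershell
# Hypothetical path whose parent folder also contains digit_digit.
$full = [IO.Path]::Combine('build_1_2', 'notes_3_4.txt')

# Applying the pattern to the whole path rewrites the parent folder name too.
$wholePath = $full -replace '(?<=\d)_(?=\d)', '.'

# Splitting first keeps the parent intact and changes only the last segment.
$parent   = [IO.Path]::GetDirectoryName($full)
$leaf     = [IO.Path]::GetFileName($full) -replace '(?<=\d)_(?=\d)', '.'
$leafOnly = [IO.Path]::Combine($parent, $leaf)

$wholePath
$leafOnly
```

In the first result the parent becomes "build_1.2", a folder that does not exist, which is why a move to that path would fail.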
 
megninAuthor Commented:
What if I want to change folder names?  In the parent folder is a bunch of folders that need renaming.
 
käµfm³d 👽Commented:
You can change line 1 to

    $files = [IO.Directory]::GetDirectories("C:\your")

where "your" is the parent.
 
megninAuthor Commented:
Okay, this is fun.  I really like PowerShell, what little I've played with it...

The script replaces the _ between numbers, but it doesn't touch _ between alpha characters.

This is not an urgent project I'm working on.  It's more of a learning experience for me at this point.  I'm going home in a few minutes, but I'll play with this some more at home.  ;-)
 
käµfm³d 👽Commented:
Sorry, I forgot about the "and any other underscores with a space" requirement. Modified:
$files = [IO.Directory]::GetDirectories("C:\your")
foreach ($file in $files)
{
    $newname = [System.Text.RegularExpressions.Regex]::Replace($file, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    mv "$file" "$newname"
}

 
käµfm³d 👽Commented:
Again, that will replace ALL underscores, so

   C:\path_to\some\folder_you\target

becomes

   C:\path to\some\folder you\target

which will probably break. Actually, since it's bugging me that much, here's a modified script that should only affect the target directory (and not its path). You can switch between files and directories by changing line 2 to either of the following, accordingly:

    $fsObjects = $fso.GetDirectories()
    $fsObjects = $fso.GetFiles()

$fso = New-Object IO.DirectoryInfo("C:\your")
$fsObjects = $fso.GetDirectories()
foreach ($obj in $fsObjects)
{
    $o = [IO.FileSystemInfo] $obj
    $path = [IO.Path]::GetDirectoryName($o.FullName)
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*$", "")
    $obj.MoveTo([IO.Path]::Combine($path, $newname))
}

 
megninAuthor Commented:
Wow!  That works great!  For filenames the extension is also removed, but I changed the (?! ) exclusion from -name to .txt so I can selectively preserve file extensions.

Thank you for the modified version.

This is a really great script.  It seems it could be "tweaked" to perform all sorts of file/folder renaming operations.

If I wanted to replace occurrences of %20 in the middle of a filename with a space, could I just add a line like this?...
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "%20", " ")  

$fso = New-Object IO.DirectoryInfo("C:\MyTestFolder")  
$fsObjects = $fso.GetFiles()  
foreach ($obj in $fsObjects)  
{  
    $o = [IO.FileSystemInfo] $obj  
    $path = [IO.Path]::GetDirectoryName($o.FullName)  
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")  
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_", " ")  
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*$", "")  
    $obj.MoveTo([IO.Path]::Combine($path, $newname))  
}
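
One caveat with the (?!.txt) exclusion: the dot is unescaped, so it stands for any character, not just a literal period. A minimal sketch of the difference, using made-up names and `[regex]::Escape` to build the literal version:

```powershell
# Unescaped: "." matches any character, so "_txt" also counts as ".txt".
$unescaped = '(?<=\d(?!.txt))\D*$'

# Escaped: only a literal ".txt" triggers the exclusion.
$escaped = '(?<=\d(?!' + [regex]::Escape('.txt') + '))\D*$'

$r1 = 'v1_txt' -replace $unescaped, ''   # tail is kept because "_txt" satisfies ".txt"
$r2 = 'v1_txt' -replace $escaped, ''     # tail is removed
$r3 = 'v1.txt' -replace $escaped, ''     # real extension is still preserved

$r1, $r2, $r3
```

For typical filenames the two forms behave the same; the escaped form simply avoids surprises with names that happen to end in something like "_txt".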

 
megninAuthor Commented:
I guess so.  :-)  I tried it and it works  (the %20 replacing thing) :-)  So, I guess the reverse would also work if you needed to put %20 in place of spaces for web links or something.
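
A quick sketch of that reverse direction, with a made-up filename. The plain `-replace` only handles spaces; `[uri]::EscapeDataString` is shown as an alternative that also encodes other reserved characters, which may or may not be what is wanted for a given link.

```powershell
# Hypothetical filename containing spaces.
$name = 'My Report v1.2.txt'

# Straight substitution of spaces only.
$encoded = $name -replace ' ', '%20'

# .NET helper: percent-encodes spaces and any other reserved characters.
$viaUri = [uri]::EscapeDataString($name)

# And back again.
$decoded = [uri]::UnescapeDataString($encoded)

$encoded, $viaUri, $decoded
```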
 
megninAuthor Commented:
Could that line be modified to leave any file extension untouched?  
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*$", "")

 
käµfm³d 👽Commented:
Sorry for the late response--I was busy with Turkey Day  :)

You can try altering it to the following. I'm on my linux box right now so I can't test.
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*\..*$", "")

 
käµfm³d 👽Commented:
Correction:
$newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!.txt))\D*(\..*)$", "$1")

 
megninAuthor Commented:
With a folder of 15 files with various multi-word filenames, numbers in various positions between words or at the end, and various file extensions, that last example removes the file extension of seemingly random files.  About half lose their extension and the others retain it.  There doesn't seem to be a recognizable pattern except that it varies depending on whether other elements in the filename were changed.
There were two files with .txt extensions.  The first had only a single underscore between two words and no numbers in the filename.  The second had underscores between words and some numbers.  The first lost its extension and the second one kept it.  Strange.
 
käµfm³d 👽Commented:
Is it possible to post the original filenames, or perhaps sanitized names, but with the same characteristics?
 
megninAuthor Commented:
Sorry for the delay again.  I took a week of vacation after Thanksgiving. :-}

Here's a pretty good sampling of filename characteristics:

Alpha_Beta_123_1_1.mkv
Alpha_Beta_v123_1_1.rar
Alpha_Beta_v_1_1.zip
Alpha_Beta v123-R1.avi
Alpha_Beta_1_1_1-R1.doc
Alpha Beta 1.1.1-R1.aspx
Alpha_Beta_123_1_1-R1.txt

Alpha_Beta_123_1_1.zip
Alpha_Beta_v123_1_1.ppt
Alpha_Beta_v_1_1.mkv
Alpha_Beta v123-R1.txt
Alpha_Beta_1_1_1-R1.txt
Alpha Beta 1.1.1-R1.ppt
Alpha_Beta_123_1_1-R1.mkv
 
käµfm³d 👽Commented:
Silly me...   It seems that double-quotes in PS will perform variable interpolation on $[symbol] constructs. This caused the $1 in the replacement pattern to be evaluated for its value, and since $1 wasn't declared anywhere, its value was empty when the replace function tried to use it. Apparently, single quotes are needed to make the regex object evaluate the $1 as a replacement expression (below).
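
The quoting difference can be reproduced in isolation. This is a minimal sketch, assuming strict mode is off and no variable named `1` is defined; the sample name and the simplified pattern are made up for illustration.

```powershell
# Hypothetical name and a simplified pattern that captures the extension.
$name    = 'report 1.2 draft.txt'
$pattern = '(?<=\d)\D*?(\.[^.]*)$'

# Double quotes: PowerShell expands $1 as a variable before the regex engine
# sees it. With no such variable defined, the replacement is an empty string.
$broken = [regex]::Replace($name, $pattern, "$1")

# Single quotes: the literal text $1 reaches the regex engine, which
# substitutes the first capture group (the extension).
$fixed = [regex]::Replace($name, $pattern, '$1')

# A backtick-escaped dollar sign inside double quotes behaves the same way.
$alsoFixed = [regex]::Replace($name, $pattern, "`$1")

$broken, $fixed, $alsoFixed
```

Only the first call drops the extension, which matches the symptom described for files whose names were actually matched by the pattern.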

I also combined the underscore and "%20" into one expression since their replacement value is the same.
$fso = New-Object IO.DirectoryInfo("C:\your")
$fsObjects = $fso.GetFiles()
foreach ($obj in $fsObjects)
{
    $o = [IO.FileSystemInfo] $obj
    $path = [IO.Path]::GetDirectoryName($o.FullName)
    $newname = [System.Text.RegularExpressions.Regex]::Replace($o.Name, "(?<=\d)_(?=\d)", ".")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "_|%20", " ")
    $newname = [System.Text.RegularExpressions.Regex]::Replace($newname, "(?<=\d(?!-name))\D*?(\.[^.]*)$", '$1')
    $path = [IO.Path]::Combine($path, $newname)
    
    If (-not (Test-Path $path))
    {
        $obj.MoveTo($path)
    }
}
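
Before running the script on a real folder, the same three replacements can be previewed on plain strings. This is only a sketch: `Get-CleanName` is a hypothetical helper name, and the sample names are adapted from this thread with extensions added.

```powershell
# Hypothetical helper that applies the script's three replacements to one name.
function Get-CleanName {
    param([string]$Name)
    # Underscore between two digits becomes a dot.
    $n = $Name -replace '(?<=\d)_(?=\d)', '.'
    # Remaining underscores and %20 become spaces.
    $n = $n -replace '_|%20', ' '
    # Drop non-digit text between a digit and the extension,
    # unless the digit is followed by "-name"; keep the extension.
    $n -replace '(?<=\d(?!-name))\D*?(\.[^.]*)$', '$1'
}

$preview = foreach ($name in @(
    'This_is_my_program_v2_0_1_3_signature.txt',
    'This.is.another v1_2_3-name.txt',
    'Alpha_Beta_123_1_1.mkv',
    'Alpha_Beta.txt'
)) {
    [pscustomobject]@{ Old = $name; New = Get-CleanName $name }
}

$preview | Format-Table -AutoSize
```

Nothing is renamed here; the table just lists old and new names side by side so odd cases can be spotted first.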

 
megninAuthor Commented:
I think that's done it.  Worked fine on my test folder, anyway.  :-)  My test folder probably has a "worst case scenario" of filenames.  Thank you very much for your efforts and patience.  ;-)  I really appreciate it.  There are some good techniques in there that I can learn a lot from as well!
 
käµfm³d 👽Commented:
NP. I don't mind sticking it out in the least--I just hate that it takes me so many posts to get you to your goal!

Glad to help nonetheless  :)