Regular Expression to remove file extensions in a comma-delimited string

I've got a string of file names that are delimited. I need to strip off the file extension for each file in the delimited string and put the names back into a string using the same type of delimiter. I'd actually like two versions: one that uses a comma delimiter and another that uses a line-break (\n) delimiter. Can you also tell me if there is an easy way to change the delimiter? I foresee getting files delimited by tabs.

Let's make this idiot-proof and assume just about every poor naming convention is used (spaces, periods, underscores, dashes). The constant is the delimiter. I'll define the extension this way: find the delimiter, back up to the first period (that is, the last period in the file name), and strip that period and all text up to, but not including, the delimiter. What's left is a delimited string without the extensions. There won't be any spaces or junk at the end of the string. A file name shouldn't have a space before the period, but I'd bet I'll get one that does; strip that off too. There can also be spaces around the delimiter, and those should go as well. This text string should make someone besides me sick. It's on Windows, so I don't care about case sensitivity.

Mess of an input string:
SQL_BACKUP_20140808_111457 -  Copy.TXT,SQL_BACKUP_20140808_111439.Copy.Copy.TXT,SQL_BACKUP_20140808_ .111442.TXTX ,SQL_BACKUP_20140808_111439.hTml , SQL BACKUP_20140808_111457.LoveMyClients,SQL-BACKUP_20140808_111442 -- Copy.TXT,SQL - BACKUP_20140808_111442 - Copy - Copy.TXT

Desired output string:
SQL_BACKUP_20140808_111457 -  Copy,SQL_BACKUP_20140808_111439.Copy.Copy,SQL_BACKUP_20140808_ .111442,SQL_BACKUP_20140808_111439,SQL BACKUP_20140808_111457,SQL-BACKUP_20140808_111442 -- Copy,SQL - BACKUP_20140808_111442 - Copy - Copy
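The rule above (back up from each delimiter to the nearest period) can be written out without regular expressions, which is handy as a reference for checking any pattern against the sample. This is a minimal sketch in plain Python, not something Orchestrator runs, and the function name `strip_extensions_split` is hypothetical. Names with no period are left whole, and changing the delimiter just means passing a different `delim`.

```python
# Hypothetical reference version of the rule above (plain Python, not Orchestrator):
# split on the delimiter, trim, drop the text after the last period, rejoin.
def strip_extensions_split(text: str, delim: str = ",") -> str:
    names = []
    for item in text.split(delim):
        item = item.strip()              # spaces around the delimiter
        stem = item.rsplit(".", 1)[0]    # last period and the extension after it
        names.append(stem.rstrip())      # a space left before that period
    return delim.join(names)


SAMPLE = (
    "SQL_BACKUP_20140808_111457 -  Copy.TXT,"
    "SQL_BACKUP_20140808_111439.Copy.Copy.TXT,"
    "SQL_BACKUP_20140808_ .111442.TXTX ,"
    "SQL_BACKUP_20140808_111439.hTml , "
    "SQL BACKUP_20140808_111457.LoveMyClients,"
    "SQL-BACKUP_20140808_111442 -- Copy.TXT,"
    "SQL - BACKUP_20140808_111442 - Copy - Copy.TXT"
)
DESIRED = (
    "SQL_BACKUP_20140808_111457 -  Copy,"
    "SQL_BACKUP_20140808_111439.Copy.Copy,"
    "SQL_BACKUP_20140808_ .111442,"
    "SQL_BACKUP_20140808_111439,"
    "SQL BACKUP_20140808_111457,"
    "SQL-BACKUP_20140808_111442 -- Copy,"
    "SQL - BACKUP_20140808_111442 - Copy - Copy"
)

print(strip_extensions_split(SAMPLE) == DESIRED)  # True
# Changing the delimiter is just a different argument, e.g. a tab:
print(strip_extensions_split("a.txt\tb .log", "\t") == "a\tb")  # True
```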


P.S. I'm using MS System Center Orchestrator, and its tools for parsing text are less than great. I don't want to pipe this out to a file, use PowerShell to modify it, and then read it back in. I need to do that in some places, but I want to be able to clean up this mess of a string once and work with the cleaned-up string. If I can, I'll build an Integration Pack (IP) using this expression and then share it with the world.
Asked by: jimbob_sf
 
Terry Woods (IT Guru) commented:
Do you have any filenames containing the delimiter character? If so, can you please confirm how they are escaped?
 
Terry Woods (IT Guru) commented:
Also, I'm not clear on whether you have a specific tool you want to use; you seem to imply that MS System Center Orchestrator doesn't handle regular expressions. Will this task be an ongoing process (and thus need to be automated somehow), or is a one-off solution OK, such as doing a regex replace in Notepad++?
 
Terry Woods (IT Guru) commented:
To do the replace where the delimiter is a comma, use this pattern (note that the first character is intentionally a space):
 *\.[^.]*(?=,|\r?$)

with the replacement being an empty string.

I tested this on regex101.com (with the g pattern modifier set), and the pattern looks like it works nicely. It needs only lookahead support, so it should work in most regex tools (.NET, PCRE, and so on); use whichever suits you.
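A quick way to see what this pattern does to the sample is to run it through Python's `re` module, which should treat this particular pattern the same way a .NET-based tool does (Python is used here only for illustration). The run shows that the pattern removes each extension and any spaces in front of a comma, but not a space after a comma, so `, SQL BACKUP_20140808_111457` keeps its leading space. The second pattern, `(?<=,) +`, is an added step and not part of the answer above; replacing it with an empty string closes that gap.

```python
import re

SAMPLE = (
    "SQL_BACKUP_20140808_111457 -  Copy.TXT,"
    "SQL_BACKUP_20140808_111439.Copy.Copy.TXT,"
    "SQL_BACKUP_20140808_ .111442.TXTX ,"
    "SQL_BACKUP_20140808_111439.hTml , "
    "SQL BACKUP_20140808_111457.LoveMyClients,"
    "SQL-BACKUP_20140808_111442 -- Copy.TXT,"
    "SQL - BACKUP_20140808_111442 - Copy - Copy.TXT"
)
DESIRED = (
    "SQL_BACKUP_20140808_111457 -  Copy,"
    "SQL_BACKUP_20140808_111439.Copy.Copy,"
    "SQL_BACKUP_20140808_ .111442,"
    "SQL_BACKUP_20140808_111439,"
    "SQL BACKUP_20140808_111457,"
    "SQL-BACKUP_20140808_111442 -- Copy,"
    "SQL - BACKUP_20140808_111442 - Copy - Copy"
)

# The comma pattern from the answer, unchanged; re.sub replaces every match (like /g).
EXT_BEFORE_COMMA = re.compile(r" *\.[^.]*(?=,|\r?$)")
# Added second pass (an assumption, not from the answer): spaces right after a comma.
SPACE_AFTER_COMMA = re.compile(r"(?<=,) +")

step1 = EXT_BEFORE_COMMA.sub("", SAMPLE)
step2 = SPACE_AFTER_COMMA.sub("", step1)

print(", SQL BACKUP_20140808_111457" in step1)  # True: the leading space survives
print(step2 == DESIRED)                          # True
```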

Version for a \n delimiter:
 *\.[^.]*(?=\n|\r?$)
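The question also asked for an easy way to change the delimiter (for example to a tab). One approach, sketched here in Python and assuming a single fixed delimiter string, is to build the pattern from the delimiter rather than hard-coding `,` or `\n`; `strip_extensions` is a hypothetical helper name. The sketch also makes one change to the extension class: `[^.]*` becomes a class that excludes the delimiter as well. With the original `[^.]*`, a name with no extension lets a match run across a delimiter, so `a.txt,b,c.txt` turns into `a,c`; the adjusted class gives `a,b,c`.

```python
import re


def strip_extensions(text: str, delim: str = ",") -> str:
    # Hypothetical helper: the answer's pattern with the delimiter substituted.
    # re.escape makes characters such as "|" or a tab safe inside the pattern.
    d = re.escape(delim)
    # [^.{d}] stops an "extension" from running across a delimiter.
    ext = re.compile(rf" *\.[^.{d}]*(?={d}|\r?$)")
    text = ext.sub("", text)
    # Added step: drop spaces that directly follow a delimiter.
    return re.sub(rf"(?<={d}) +", "", text)


print(strip_extensions("a.txt,b,c.tar.gz , d .log"))   # a,b,c.tar,d
print(repr(strip_extensions("x.TXT\ny .hTml", "\n")))   # 'x\ny'
print(repr(strip_extensions("a.txt\tb.log", "\t")))     # 'a\tb'
```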

jimbob_sf (Author) commented:
I hadn't even pressed Submit on my reply before you gave me the answers! Looks great. Thank you!


Great questions.

No file names will contain the delimiter. If they do, my error trapping will prove itself ;) In fact, having the expression vomit would be a great outcome. Puke is easier to detect than a valid-looking string that just sends me looking for files that don't exist ;)

Orchestrator can use regular expressions, so I'm OK there, and yes, I'll be using this one time and again. The root of this is a client that either sends us a bunch of files via FTP or has us pull them off their FTP server. Either way, I start by scanning a folder for particular things. There are a bunch of Integration Packs that do some of this, but since I'm going to use this particular function again and again, I'm going to create my own IP with a few functions I use a lot. I use Notepad++ on a lot of my machines, by the way.

Anders Bengtsson's blog is a great place to check out as I learn SCORCH (System Center Orchestrator). This post is where I'll start building my own IP with this solution:
http://contoso.se/blog/?p=2802
 
Terry Woods (IT Guru) commented:
Great; let me know if I can be of any further help!
 
jimbob_sf (Author) commented:
What if I wanted the opposite? I just ran into some Orchestrator "personality". The way I've been using your patterns, the activity returns the file name. Now I'm trying a different activity that returns the stripped-out text (the extension). Can you whip up two versions that return the opposite?

So instead of the file name with the extension stripped off, it would return the extension, including the period.

This all stems from Orchestrator's stubborn inability to manage the flattening of output more fully.
 
Terry Woods (IT Guru) commented:
For a replace where the delimiter is a comma:
(?<=,|^)[^.,]* *

with the replacement being an empty string.

For a replace where the delimiter is a \n character:
(?<=\n|^)[^.\n]* *

Try those...  If they don't work, I suggest you post a new question so that you can get a speedier response :-)
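Here is a Python check of the comma version of the extension-only pattern, with one adaptation: Python's `re` rejects the variable-width lookbehind `(?<=,|^)`, so it is written as the equivalent `(?:(?<=,)|^)` (.NET's engine accepts the original form). On the sample, `[^.,]*` stops at the first period, so `SQL_BACKUP_20140808_111439.Copy.Copy.TXT` comes back as `.Copy.Copy.TXT` rather than `.TXT`. The second pattern is an alternative and not part of the answer: its lookahead only succeeds at the last period in each item. A final pass trims spaces left in front of a comma. For the `\n` version, swapping `,` for `\n` in each pattern should give the same behavior.

```python
import re

SAMPLE = (
    "SQL_BACKUP_20140808_111457 -  Copy.TXT,"
    "SQL_BACKUP_20140808_111439.Copy.Copy.TXT,"
    "SQL_BACKUP_20140808_ .111442.TXTX ,"
    "SQL_BACKUP_20140808_111439.hTml , "
    "SQL BACKUP_20140808_111457.LoveMyClients,"
    "SQL-BACKUP_20140808_111442 -- Copy.TXT,"
    "SQL - BACKUP_20140808_111442 - Copy - Copy.TXT"
)

# The answer's comma pattern, with (?<=,|^) rewritten for Python as (?:(?<=,)|^).
NAME_TO_FIRST_PERIOD = re.compile(r"(?:(?<=,)|^)[^.,]* *")
# Alternative (an assumption): remove everything up to the LAST period of each item.
NAME_TO_LAST_PERIOD = re.compile(r"(?:(?<=,)|^)[^,]*(?=\.[^.,]*(?:,|$))")
# Trim spaces that sit just before a comma or the end of the string.
TRAILING_SPACES = re.compile(r" +(?=,|$)")

print(NAME_TO_FIRST_PERIOD.sub("", SAMPLE))
# .TXT,.Copy.Copy.TXT,.111442.TXTX ,.hTml ,.LoveMyClients,.TXT,.TXT
print(TRAILING_SPACES.sub("", NAME_TO_LAST_PERIOD.sub("", SAMPLE)))
# .TXT,.TXT,.TXTX,.hTml,.LoveMyClients,.TXT,.TXT
```

Items without any period are left untouched by the alternative pattern, since its lookahead never succeeds for them.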