Link to home
Start Free TrialLog in
Avatar of gedwardnelson
gedwardnelsonFlag for United States of America

asked on

Parse middle of string based on certain and sometimes repetitive characters

Hello,

I have a table containing several thousand records of file names. I need to be able to pull out only part of the file name to use for other purposes. Below I have listed some example names and what the query result should be for each. Basically, I need to remove the .pdf file extension from the right, and I need the group of characters that contains the hyphen starting after the last underscore.

tblDrawings
fFileNames
SystemName: (query expression for parsed text)

IMC_FSB_L1_P_HUPWR-015.pdf                        HUPWR-015
IMC_FSB_L1U_P_CCW-001_R1_10.13.12.pdf      CCW-001_R1_10.13.12
IMC_FSB_P_L1_HPAr-024.1.pdf                        HPAr-024.1
IMC_FSB_P_L1_PCWR-006-R3.pdf                        PCWR-006-R3
IMC_FSB_P_L1_PCWS-021.1.pdf                        PCWS-021.1
IMC_FSB_P_L1_SWS-034_R1_12.27.12.pdf      SWS-034_R1_12.27.12
IMC_FSB_P_L1U_HFW-014.pdf                                HFW-014

Thanks for your help!
George
Avatar of Rey Obrero (Capricorn1)
Rey Obrero (Capricorn1)
Flag of United States of America image

try this query

select  fFileNames, Mid(Left([fFileNames], Len([fFileNames]) - 4), InStrRev([fFileNames], "_") + 1) as SystemName
from tbldrawings
There is no real consistency there

Perhaps it would be easier if you created a list of the known "Prefix" characters:
IMC_FSB_L1_P_
IMC_FSB_L1U_P_
IMC_FSB_P_L1U_
..and just got everything after these characters.
Function shortfile(filename As String) As String
Dim a As Variant
Dim i  As Integer
Dim tempFile As String

a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i)
Next
shortfile = Replace(tempFile, ".pdf", "")
End Function
called in query
select shortfile(fileFieldName) from tableName
Oh, left underscores out:

Function shortfile(filename As String) As String
Dim a As Variant
Dim i  As Integer
Dim tempFile As String

a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i) & "_"
Next
shortfile = Replace(tempFile, ".pdf", "")
shortfile = Left(shortfile, Len(shortfile) - 1)
End Function
Avatar of gedwardnelson

ASKER

I'm hoping it wouldn't need to be done that way because this will be ongoing and the prefix characters will have unlimited possible combinations. However, the information I need will always begin immediately following the final underscore. Is there a way to count the underscores prior to the "dash" or "hyphen" then parse/pull after that underscore and trim off the .pdf off the end?
oops, you wiil need UDF codes to do that

can you upload a copy of the db with the table...

there maybe lot of variations of fFileNames..
Should have added-Create a module with the posted code.
Assumes there are always 4 elements separated by "_" before file name
The FOR statement should be
FOR i = 4 to ubound(a)
OK, Let me give an update to the suggestions from Capricorn and Jerry.

Cap,

Yours came close but worked only on the file names with no underscores after the after the characters I'm looking for. Any underscores after the dash are OK.

Jerry,

Yours also came close but it left in the last underscore and the characters between it and the previous underscore.

I'm attaching a copy of the table as capricorn1 suggested for you guys to look at. Thanks for the help.
FileName.accdb
Jerry, I did. Check out the copy I just uploaded. thanks. :-)
Database is password protected.
If there are 4 elements before desired name,
FOR i = 4 to ubound(a)
If this may vary, send db without the credential requirement
SOLUTION
Avatar of Rey Obrero (Capricorn1)
Rey Obrero (Capricorn1)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Is there any possibility that you will have values with different file extensions?

For example, ".txt" or ".csv" instead of ".pdf"?

If so, what would you want to happen?
One line of code can do it:

Public Function ExtractValue(ByVal strFilename As String) As String
  ExtractValue = StrReverse(Mid(StrReverse(Split(strFilename, "_", 5)(4)), 5))
End Function

/gustav
gedwardnelson,

You wrote:

...and I need the group of characters that contains the hyphen starting after the last underscore.

But then look at two of your examples:

IMC_FSB_L1U_P_CCW-001_R1_10.13.12.pdf      CCW-001_R1_10.13.12
IMC_FSB_P_L1_SWS-034_R1_12.27.12.pdf      SWS-034_R1_12.27.12

If those two examples are correct, then very clearly it is NOT the last underscore.

I think maybe you need to restate your question :)

Patrick
Ultimately you may want to ask the party who creates this filename to use a specific "delimiter" between the two segments.
For example, perhaps a comma?
IMC_FSB_L1U_P_,CCW-001_R1_10.13.12.pdf

Even a double Underscore would probably do in a crunch...
IMC_FSB_L1U_P__CCW-001_R1_10.13.12.pdf

As long as you have some specific delimiter that is placed consistently (and only used) between the segments, ...
...then what you are asking for here is ridiculously easy...

JeffCoachman
Many time you get a file from a "source"
You think that is the way it will always have to be.

Sometimes if you just ask: "Hey, can you insert a comma between the segments?". ...they might say: "Sure, no problem, why didn't you ask sooner?"

Problem solved...


;-)

Jeff
I want to thank everybody for their help. I got way more responses than I'm used to. :-)

I have re-attached the file. It contains an example of the solution. I apologize for the issue earlier. This is actually a sharepoint database and I forgot about the required password when I copied the table over.

JerryB, Your second solution, ID: 38768572, worked perfectly. Like you mentioned, it takes care of the unknown and varying number of underscores.

Jeff, I agree with your suggestion about asking to put in a delimiter. But in this situation we are bound by our end client's naming convention. Each underscore separates a different piece of information. And besides, we still have a ways to go to get that team to be consistent enough already without asking them to be consistent enough to start adding commas. LOL

Mathews, Regarding your "catch" on my description, you're pulling a me. LOL. But if you'd have read the following sentence then that context would have clarified the prior sentence that you quoted. And no issue with different file extensions. The directories these come from contain mostly .pdfs. But I have the other occasional files filtered out when I first pull in the file paths.

Cactus, your proposed solution would not be flexible enough for this scenario.

Cap, your second solution worked partially but it didn't take into account varying underscores. Worked perfectly with the way the files are currently named, but if they change on the next project it wouldn't.

So, considering all of this and the amount of effort put into the various solutions I'm not sure how to break up the scores. Jerry provided the complete solution. Capricorn1 provided a couple partial solutions. I want to be fair on how to spread the points. Like I mentioned earlier I don't normally get this much help so it's much easier to score.

Thanks again everyone for your help,
George
FileName.accdb
George,

With respect, what would the "following sentence" have been?  The sentence I quoted was the last one in your original post.

Indeed, your second comment made exactly the same requirement:

However, the information I need will always begin immediately following the final underscore.

So, either it is the last underscore, or it isn't.  If it's not, then what is it?

Also...

I need the group of characters that contains the hyphen

That was from the original question.  What happens for values that have no hyphen?  Your sample database included the following entries with no hyphen:

FName
IMC_FSB_P_L1_A_Process Waste Lift Stations Rack_10.05.12.pdf
IMC_FSB_P_L1_A_Process Waste Lift Stations_10.05.12.pdf
IMC_FSB_P_L1_Seismic Support B_R2_10.26.12.pdf
IMC_FSB_P_L1_Seismic Support Pipe Layout.pdf
IMC_FSB_P_L1_Seismic Support_A_10.16.12.pdf
IMC_FSB_P_L1_Seismic Support_E_10.16.12.pdf
0492_001.pdf

Open in new window


What should be returned for those values?

Just trying to figure out what you really need, because so far you have not made it at all clear :)

Patrick
Patrick,

I think I was pretty clear in my explanation and the follow-up attempt to clarify. Though I realize what's clear in my mind may not always be clear to the oerson(s) I'm explaining it too. After re-reading it I agree it could have been more clearly stated. But if you look at the examples I gave they do show the underscores after the hyphen as being part of result needed. I'll do better next time. :-)

I regards to the records you found that absolutely do not fit the criteria they will be handled individually. They're only a few. This is a work in progress. If I'd have remembered they were in the sample I would have removed them to avoid the confusion.

In my previous response I stated that JerryB's solution was the one that worked. That was what I was after.

Thanks for your help!
George