gedwardnelson
asked on
Parse middle of string based on certain and sometimes repetitive characters
Hello,
I have a table containing several thousand records of file names. I need to be able to pull out only part of the file name to use for other purposes. Below I have listed some example names and what the query result should be for each. Basically, I need to remove the .pdf file extension from the right, and I need the group of characters that contains the hyphen starting after the last underscore.
tblDrawings
fFileNames
SystemName: (query expression for parsed text)
IMC_FSB_L1_P_HUPWR-015.pdf HUPWR-015
IMC_FSB_L1U_P_CCW-001_R1_1 0.13.12.pd f CCW-001_R1_10.13.12
IMC_FSB_P_L1_HPAr-024.1.pd f HPAr-024.1
IMC_FSB_P_L1_PCWR-006-R3.p df PCWR-006-R3
IMC_FSB_P_L1_PCWS-021.1.pd f PCWS-021.1
IMC_FSB_P_L1_SWS-034_R1_12 .27.12.pdf SWS-034_R1_12.27.12
IMC_FSB_P_L1U_HFW-014.pdf HFW-014
Thanks for your help!
George
I have a table containing several thousand records of file names. I need to be able to pull out only part of the file name to use for other purposes. Below I have listed some example names and what the query result should be for each. Basically, I need to remove the .pdf file extension from the right, and I need the group of characters that contains the hyphen starting after the last underscore.
tblDrawings
fFileNames
SystemName: (query expression for parsed text)
IMC_FSB_L1_P_HUPWR-015.pdf
IMC_FSB_L1U_P_CCW-001_R1_1
IMC_FSB_P_L1_HPAr-024.1.pd
IMC_FSB_P_L1_PCWR-006-R3.p
IMC_FSB_P_L1_PCWS-021.1.pd
IMC_FSB_P_L1_SWS-034_R1_12
IMC_FSB_P_L1U_HFW-014.pdf HFW-014
Thanks for your help!
George
There is no real consistency there
Perhaps it would be easier if you created a list of the known "Prefix" characters:
IMC_FSB_L1_P_
IMC_FSB_L1U_P_
IMC_FSB_P_L1U_
..and just got everything after these characters.
Perhaps it would be easier if you created a list of the known "Prefix" characters:
IMC_FSB_L1_P_
IMC_FSB_L1U_P_
IMC_FSB_P_L1U_
..and just got everything after these characters.
Function shortfile(filename As String) As String
Dim a As Variant
Dim i As Integer
Dim tempFile As String
a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i)
Next
shortfile = Replace(tempFile, ".pdf", "")
End Function
called in query
select shortfile(fileFieldName) from tableName
Dim a As Variant
Dim i As Integer
Dim tempFile As String
a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i)
Next
shortfile = Replace(tempFile, ".pdf", "")
End Function
called in query
select shortfile(fileFieldName) from tableName
Oh, left underscores out:
Function shortfile(filename As String) As String
Dim a As Variant
Dim i As Integer
Dim tempFile As String
a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i) & "_"
Next
shortfile = Replace(tempFile, ".pdf", "")
shortfile = Left(shortfile, Len(shortfile) - 1)
End Function
Function shortfile(filename As String) As String
Dim a As Variant
Dim i As Integer
Dim tempFile As String
a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i) & "_"
Next
shortfile = Replace(tempFile, ".pdf", "")
shortfile = Left(shortfile, Len(shortfile) - 1)
End Function
ASKER
I'm hoping it wouldn't need to be done that way because this will be ongoing and the prefix characters will have unlimited possible combinations. However, the information I need will always begin immediately following the final underscore. Is there a way to count the underscores prior to the "dash" or "hyphen" then parse/pull after that underscore and trim off the .pdf off the end?
oops, you wiil need UDF codes to do that
can you upload a copy of the db with the table...
there maybe lot of variations of fFileNames..
can you upload a copy of the db with the table...
there maybe lot of variations of fFileNames..
Should have added-Create a module with the posted code.
Assumes there are always 4 elements separated by "_" before file name
The FOR statement should be
FOR i = 4 to ubound(a)
The FOR statement should be
FOR i = 4 to ubound(a)
ASKER
OK, Let me give an update to the suggestions from Capricorn and Jerry.
Cap,
Yours came close but worked only on the file names with no underscores after the after the characters I'm looking for. Any underscores after the dash are OK.
Jerry,
Yours also came close but it left in the last underscore and the characters between it and the previous underscore.
I'm attaching a copy of the table as capricorn1 suggested for you guys to look at. Thanks for the help.
FileName.accdb
Cap,
Yours came close but worked only on the file names with no underscores after the after the characters I'm looking for. Any underscores after the dash are OK.
Jerry,
Yours also came close but it left in the last underscore and the characters between it and the previous underscore.
I'm attaching a copy of the table as capricorn1 suggested for you guys to look at. Thanks for the help.
FileName.accdb
ASKER
Jerry, I did. Check out the copy I just uploaded. thanks. :-)
Database is password protected.
If there are 4 elements before desired name,
FOR i = 4 to ubound(a)
If this may vary, send db without the credential requirement
FOR i = 4 to ubound(a)
If this may vary, send db without the credential requirement
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Is there any possibility that you will have values with different file extensions?
For example, ".txt" or ".csv" instead of ".pdf"?
If so, what would you want to happen?
For example, ".txt" or ".csv" instead of ".pdf"?
If so, what would you want to happen?
One line of code can do it:
Public Function ExtractValue(ByVal strFilename As String) As String
ExtractValue = StrReverse(Mid(StrReverse( Split(strF ilename, "_", 5)(4)), 5))
End Function
/gustav
Public Function ExtractValue(ByVal strFilename As String) As String
ExtractValue = StrReverse(Mid(StrReverse(
End Function
/gustav
gedwardnelson,
You wrote:
But then look at two of your examples:
If those two examples are correct, then very clearly it is NOT the last underscore.
I think maybe you need to restate your question :)
Patrick
You wrote:
...and I need the group of characters that contains the hyphen starting after the last underscore.
But then look at two of your examples:
IMC_FSB_L1U_P_CCW-001_R1_1 0.13.12.pd f CCW-001_R1_10.13.12
IMC_FSB_P_L1_SWS-034_R1_12 .27.12.pdf SWS-034_R1_12.27.12
IMC_FSB_P_L1_SWS-034_R1_12
If those two examples are correct, then very clearly it is NOT the last underscore.
I think maybe you need to restate your question :)
Patrick
Ultimately you may want to ask the party who creates this filename to use a specific "delimiter" between the two segments.
For example, perhaps a comma?
IMC_FSB_L1U_P_,CCW-001_R1_ 10.13.12.p df
Even a double Underscore would probably do in a crunch...
IMC_FSB_L1U_P__CCW-001_R1_ 10.13.12.p df
As long as you have some specific delimiter that is placed consistently (and only used) between the segments, ...
...then what you are asking for here is ridiculously easy...
JeffCoachman
For example, perhaps a comma?
IMC_FSB_L1U_P_,CCW-001_R1_
Even a double Underscore would probably do in a crunch...
IMC_FSB_L1U_P__CCW-001_R1_
As long as you have some specific delimiter that is placed consistently (and only used) between the segments, ...
...then what you are asking for here is ridiculously easy...
JeffCoachman
Many time you get a file from a "source"
You think that is the way it will always have to be.
Sometimes if you just ask: "Hey, can you insert a comma between the segments?". ...they might say: "Sure, no problem, why didn't you ask sooner?"
Problem solved...
;-)
Jeff
You think that is the way it will always have to be.
Sometimes if you just ask: "Hey, can you insert a comma between the segments?". ...they might say: "Sure, no problem, why didn't you ask sooner?"
Problem solved...
;-)
Jeff
ASKER
I want to thank everybody for their help. I got way more responses than I'm used to. :-)
I have re-attached the file. It contains an example of the solution. I apologize for the issue earlier. This is actually a sharepoint database and I forgot about the required password when I copied the table over.
JerryB, Your second solution, ID: 38768572, worked perfectly. Like you mentioned, it takes care of the unknown and varying number of underscores.
Jeff, I agree with your suggestion about asking to put in a delimiter. But in this situation we are bound by our end client's naming convention. Each underscore separates a different piece of information. And besides, we still have a ways to go to get that team to be consistent enough already without asking them to be consistent enough to start adding commas. LOL
Mathews, Regarding your "catch" on my description, you're pulling a me. LOL. But if you'd have read the following sentence then that context would have clarified the prior sentence that you quoted. And no issue with different file extensions. The directories these come from contain mostly .pdfs. But I have the other occasional files filtered out when I first pull in the file paths.
Cactus, your proposed solution would not be flexible enough for this scenario.
Cap, your second solution worked partially but it didn't take into account varying underscores. Worked perfectly with the way the files are currently named, but if they change on the next project it wouldn't.
So, considering all of this and the amount of effort put into the various solutions I'm not sure how to break up the scores. Jerry provided the complete solution. Capricorn1 provided a couple partial solutions. I want to be fair on how to spread the points. Like I mentioned earlier I don't normally get this much help so it's much easier to score.
Thanks again everyone for your help,
George
FileName.accdb
I have re-attached the file. It contains an example of the solution. I apologize for the issue earlier. This is actually a sharepoint database and I forgot about the required password when I copied the table over.
JerryB, Your second solution, ID: 38768572, worked perfectly. Like you mentioned, it takes care of the unknown and varying number of underscores.
Jeff, I agree with your suggestion about asking to put in a delimiter. But in this situation we are bound by our end client's naming convention. Each underscore separates a different piece of information. And besides, we still have a ways to go to get that team to be consistent enough already without asking them to be consistent enough to start adding commas. LOL
Mathews, Regarding your "catch" on my description, you're pulling a me. LOL. But if you'd have read the following sentence then that context would have clarified the prior sentence that you quoted. And no issue with different file extensions. The directories these come from contain mostly .pdfs. But I have the other occasional files filtered out when I first pull in the file paths.
Cactus, your proposed solution would not be flexible enough for this scenario.
Cap, your second solution worked partially but it didn't take into account varying underscores. Worked perfectly with the way the files are currently named, but if they change on the next project it wouldn't.
So, considering all of this and the amount of effort put into the various solutions I'm not sure how to break up the scores. Jerry provided the complete solution. Capricorn1 provided a couple partial solutions. I want to be fair on how to spread the points. Like I mentioned earlier I don't normally get this much help so it's much easier to score.
Thanks again everyone for your help,
George
FileName.accdb
George,
With respect, what would the "following sentence" have been? The sentence I quoted was the last one in your original post.
Indeed, your second comment made exactly the same requirement:
So, either it is the last underscore, or it isn't. If it's not, then what is it?
Also...
That was from the original question. What happens for values that have no hyphen? Your sample database included the following entries with no hyphen:
What should be returned for those values?
Just trying to figure out what you really need, because so far you have not made it at all clear :)
Patrick
With respect, what would the "following sentence" have been? The sentence I quoted was the last one in your original post.
Indeed, your second comment made exactly the same requirement:
However, the information I need will always begin immediately following the final underscore.
So, either it is the last underscore, or it isn't. If it's not, then what is it?
Also...
I need the group of characters that contains the hyphen
That was from the original question. What happens for values that have no hyphen? Your sample database included the following entries with no hyphen:
FName
IMC_FSB_P_L1_A_Process Waste Lift Stations Rack_10.05.12.pdf
IMC_FSB_P_L1_A_Process Waste Lift Stations_10.05.12.pdf
IMC_FSB_P_L1_Seismic Support B_R2_10.26.12.pdf
IMC_FSB_P_L1_Seismic Support Pipe Layout.pdf
IMC_FSB_P_L1_Seismic Support_A_10.16.12.pdf
IMC_FSB_P_L1_Seismic Support_E_10.16.12.pdf
0492_001.pdf
What should be returned for those values?
Just trying to figure out what you really need, because so far you have not made it at all clear :)
Patrick
ASKER
Patrick,
I think I was pretty clear in my explanation and the follow-up attempt to clarify. Though I realize what's clear in my mind may not always be clear to the oerson(s) I'm explaining it too. After re-reading it I agree it could have been more clearly stated. But if you look at the examples I gave they do show the underscores after the hyphen as being part of result needed. I'll do better next time. :-)
I regards to the records you found that absolutely do not fit the criteria they will be handled individually. They're only a few. This is a work in progress. If I'd have remembered they were in the sample I would have removed them to avoid the confusion.
In my previous response I stated that JerryB's solution was the one that worked. That was what I was after.
Thanks for your help!
George
I think I was pretty clear in my explanation and the follow-up attempt to clarify. Though I realize what's clear in my mind may not always be clear to the oerson(s) I'm explaining it too. After re-reading it I agree it could have been more clearly stated. But if you look at the examples I gave they do show the underscores after the hyphen as being part of result needed. I'll do better next time. :-)
I regards to the records you found that absolutely do not fit the criteria they will be handled individually. They're only a few. This is a work in progress. If I'd have remembered they were in the sample I would have removed them to avoid the confusion.
In my previous response I stated that JerryB's solution was the one that worked. That was what I was after.
Thanks for your help!
George
select fFileNames, Mid(Left([fFileNames], Len([fFileNames]) - 4), InStrRev([fFileNames], "_") + 1) as SystemName
from tbldrawings