Solved

Parse middle of string based on certain and sometimes repetitive characters

Posted on 2013-01-11
22
251 Views
Last Modified: 2013-01-16
Hello,

I have a table containing several thousand records of file names. I need to be able to pull out only part of the file name to use for other purposes. Below I have listed some example names and what the query result should be for each. Basically, I need to remove the .pdf file extension from the right, and I need the group of characters that contains the hyphen starting after the last underscore.

tblDrawings
fFileNames
SystemName: (query expression for parsed text)

IMC_FSB_L1_P_HUPWR-015.pdf                        HUPWR-015
IMC_FSB_L1U_P_CCW-001_R1_10.13.12.pdf      CCW-001_R1_10.13.12
IMC_FSB_P_L1_HPAr-024.1.pdf                        HPAr-024.1
IMC_FSB_P_L1_PCWR-006-R3.pdf                        PCWR-006-R3
IMC_FSB_P_L1_PCWS-021.1.pdf                        PCWS-021.1
IMC_FSB_P_L1_SWS-034_R1_12.27.12.pdf      SWS-034_R1_12.27.12
IMC_FSB_P_L1U_HFW-014.pdf                                HFW-014

Thanks for your help!
George
0
Comment
Question by:gedwardnelson
  • 7
  • 5
  • 3
  • +3
22 Comments
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 38768352
try this query

select  fFileNames, Mid(Left([fFileNames], Len([fFileNames]) - 4), InStrRev([fFileNames], "_") + 1) as SystemName
from tbldrawings
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 38768360
There is no real consistency there

Perhaps it would be easier if you created a list of the known "Prefix" characters:
IMC_FSB_L1_P_
IMC_FSB_L1U_P_
IMC_FSB_P_L1U_
..and just got everything after these characters.
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 38768372
Function shortfile(filename As String) As String
Dim a As Variant
Dim i  As Integer
Dim tempFile As String

a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i)
Next
shortfile = Replace(tempFile, ".pdf", "")
End Function
called in query
select shortfile(fileFieldName) from tableName
0
Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

 
LVL 26

Expert Comment

by:jerryb30
ID: 38768379
Oh, left underscores out:

Function shortfile(filename As String) As String
Dim a As Variant
Dim i  As Integer
Dim tempFile As String

a = Split(filename, "_")
For i = 3 To UBound(a)
tempFile = tempFile & a(i) & "_"
Next
shortfile = Replace(tempFile, ".pdf", "")
shortfile = Left(shortfile, Len(shortfile) - 1)
End Function
0
 

Author Comment

by:gedwardnelson
ID: 38768387
I'm hoping it wouldn't need to be done that way because this will be ongoing and the prefix characters will have unlimited possible combinations. However, the information I need will always begin immediately following the final underscore. Is there a way to count the underscores prior to the "dash" or "hyphen" then parse/pull after that underscore and trim off the .pdf off the end?
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 38768395
oops, you wiil need UDF codes to do that

can you upload a copy of the db with the table...

there maybe lot of variations of fFileNames..
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 38768412
Should have added-Create a module with the posted code.
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 38768439
Assumes there are always 4 elements separated by "_" before file name
The FOR statement should be
FOR i = 4 to ubound(a)
0
 

Author Comment

by:gedwardnelson
ID: 38768475
OK, Let me give an update to the suggestions from Capricorn and Jerry.

Cap,

Yours came close but worked only on the file names with no underscores after the after the characters I'm looking for. Any underscores after the dash are OK.

Jerry,

Yours also came close but it left in the last underscore and the characters between it and the previous underscore.

I'm attaching a copy of the table as capricorn1 suggested for you guys to look at. Thanks for the help.
FileName.accdb
0
 

Author Comment

by:gedwardnelson
ID: 38768478
Jerry, I did. Check out the copy I just uploaded. thanks. :-)
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 38768513
Database is password protected.
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 38768529
If there are 4 elements before desired name,
FOR i = 4 to ubound(a)
If this may vary, send db without the credential requirement
0
 
LVL 120

Assisted Solution

by:Rey Obrero (Capricorn1)
Rey Obrero (Capricorn1) earned 100 total points
ID: 38768543
use this function

function getSystemName(str)

dim s, j
s=str

for j=1 to 4

     s = mid(s, InStr(s, "_") + 1)
Next
getSystemName = Left(s, Len(s) - 4)


end function


now use it in a query

select  fFileNames,getSystemName([fFileNames]) as SystemName
from tbldrawings
0
 
LVL 26

Accepted Solution

by:
jerryb30 earned 400 total points
ID: 38768572
If there are indeterminate number of "_" before desired file name:
Function shortfile(filename As String) As String
Dim a As Variant
Dim i  As Integer
Dim tempFile As String
Dim ctr As Integer
ctr = 0
Dim x As Integer
For x = 1 To Len(filename)
If Mid(filename, x, 1) = "-" Then
Exit For
End If
If Mid(filename, x, 1) = "_" Then
ctr = ctr + 1


End If
Next x
'MsgBox ctr
a = Split(filename, "_")
For i = ctr To UBound(a)
tempFile = tempFile & a(i) & "_"
Next i
shortfile = Replace(tempFile, ".pdf", "")
shortfile = Left(shortfile, Len(shortfile) - 1)
End Function

Open in new window

0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38768598
Is there any possibility that you will have values with different file extensions?

For example, ".txt" or ".csv" instead of ".pdf"?

If so, what would you want to happen?
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 38769034
One line of code can do it:

Public Function ExtractValue(ByVal strFilename As String) As String
  ExtractValue = StrReverse(Mid(StrReverse(Split(strFilename, "_", 5)(4)), 5))
End Function

/gustav
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38769075
gedwardnelson,

You wrote:

...and I need the group of characters that contains the hyphen starting after the last underscore.

But then look at two of your examples:

IMC_FSB_L1U_P_CCW-001_R1_10.13.12.pdf      CCW-001_R1_10.13.12
IMC_FSB_P_L1_SWS-034_R1_12.27.12.pdf      SWS-034_R1_12.27.12

If those two examples are correct, then very clearly it is NOT the last underscore.

I think maybe you need to restate your question :)

Patrick
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 38769222
Ultimately you may want to ask the party who creates this filename to use a specific "delimiter" between the two segments.
For example, perhaps a comma?
IMC_FSB_L1U_P_,CCW-001_R1_10.13.12.pdf

Even a double Underscore would probably do in a crunch...
IMC_FSB_L1U_P__CCW-001_R1_10.13.12.pdf

As long as you have some specific delimiter that is placed consistently (and only used) between the segments, ...
...then what you are asking for here is ridiculously easy...

JeffCoachman
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 38769231
Many time you get a file from a "source"
You think that is the way it will always have to be.

Sometimes if you just ask: "Hey, can you insert a comma between the segments?". ...they might say: "Sure, no problem, why didn't you ask sooner?"

Problem solved...


;-)

Jeff
0
 

Author Comment

by:gedwardnelson
ID: 38771200
I want to thank everybody for their help. I got way more responses than I'm used to. :-)

I have re-attached the file. It contains an example of the solution. I apologize for the issue earlier. This is actually a sharepoint database and I forgot about the required password when I copied the table over.

JerryB, Your second solution, ID: 38768572, worked perfectly. Like you mentioned, it takes care of the unknown and varying number of underscores.

Jeff, I agree with your suggestion about asking to put in a delimiter. But in this situation we are bound by our end client's naming convention. Each underscore separates a different piece of information. And besides, we still have a ways to go to get that team to be consistent enough already without asking them to be consistent enough to start adding commas. LOL

Mathews, Regarding your "catch" on my description, you're pulling a me. LOL. But if you'd have read the following sentence then that context would have clarified the prior sentence that you quoted. And no issue with different file extensions. The directories these come from contain mostly .pdfs. But I have the other occasional files filtered out when I first pull in the file paths.

Cactus, your proposed solution would not be flexible enough for this scenario.

Cap, your second solution worked partially but it didn't take into account varying underscores. Worked perfectly with the way the files are currently named, but if they change on the next project it wouldn't.

So, considering all of this and the amount of effort put into the various solutions I'm not sure how to break up the scores. Jerry provided the complete solution. Capricorn1 provided a couple partial solutions. I want to be fair on how to spread the points. Like I mentioned earlier I don't normally get this much help so it's much easier to score.

Thanks again everyone for your help,
George
FileName.accdb
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 38772612
George,

With respect, what would the "following sentence" have been?  The sentence I quoted was the last one in your original post.

Indeed, your second comment made exactly the same requirement:

However, the information I need will always begin immediately following the final underscore.

So, either it is the last underscore, or it isn't.  If it's not, then what is it?

Also...

I need the group of characters that contains the hyphen

That was from the original question.  What happens for values that have no hyphen?  Your sample database included the following entries with no hyphen:

FName
IMC_FSB_P_L1_A_Process Waste Lift Stations Rack_10.05.12.pdf
IMC_FSB_P_L1_A_Process Waste Lift Stations_10.05.12.pdf
IMC_FSB_P_L1_Seismic Support B_R2_10.26.12.pdf
IMC_FSB_P_L1_Seismic Support Pipe Layout.pdf
IMC_FSB_P_L1_Seismic Support_A_10.16.12.pdf
IMC_FSB_P_L1_Seismic Support_E_10.16.12.pdf
0492_001.pdf

Open in new window


What should be returned for those values?

Just trying to figure out what you really need, because so far you have not made it at all clear :)

Patrick
0
 

Author Comment

by:gedwardnelson
ID: 38772731
Patrick,

I think I was pretty clear in my explanation and the follow-up attempt to clarify. Though I realize what's clear in my mind may not always be clear to the oerson(s) I'm explaining it too. After re-reading it I agree it could have been more clearly stated. But if you look at the examples I gave they do show the underscores after the hyphen as being part of result needed. I'll do better next time. :-)

I regards to the records you found that absolutely do not fit the criteria they will be handled individually. They're only a few. This is a work in progress. If I'd have remembered they were in the sample I would have removed them to avoid the confusion.

In my previous response I stated that JerryB's solution was the one that worked. That was what I was after.

Thanks for your help!
George
0

Featured Post

Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

QuickBooks® has a great invoice interface that we were happy with for a while but that changed in 2001 through no fault of Intuit®. Our industry's unit names are dictated by RUS: the Rural Utilities Services division of USDA. Contracts contain un…
Phishing attempts can come in all forms, shapes and sizes. No matter how familiar you think you are with them, always remember to take extra precaution when opening an email with attachments or links.
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question