CREATE A QUERY COLUMN FROM A GIVEN STRING IN ANOTHER COLUMN

howcheat
howcheat used Ask the Experts™
on
I have a query /table with a data string of varying length and verbage in a column named "obligation" (which is a text field). some rows contain the obligation number and others do not. The obligation number, when present is in varying locations in the obligation field.  I need to make another column in my query to identify the rows with an obligation number. The format for the obligation number is 6 characters followed by a dash (-). Is there any way I  
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Author

Commented:
Disregard last incomplete sentence (Is there any way I)

thx
Ron MalmsteadInformation Services Manager

Commented:
Can you post a few examples of the raw data you are trying to get the obligation number substring from?

This seems like a "workaround" however, the proper way to do this is to have an obligation number field.  Extracting substrings from a mixed text field, which you intend to perform "lookups" on, is a lot of overhead for MSAccess, especially when the table grows large.
Top Expert 2010

Commented:
I like to use Regular Expressions for this.  To learn more about them, please see:

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

From that article, add the RegExpFind function to a regular VBA module in your VBA project.  The source code is:

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window


Once you have done that, you can run a query like this:


SELECT Obligation, RegExpFind(Obligation, "[A-Za-z0-9]{6}-", 1) AS ObligationNumber
FROM SomeTable

Open in new window


That looks for the first instance of six letters or digits, followed by a hyphen.  If no such text is found, it returns a zero length string.

Patrick
Top Expert 2016

Commented:
place this in a column grid of your query


obligation number:right(left([obligation], inStr([obligation], "-") - 1), 6)

Top Expert 2016

Commented:
or use this query

select [obligation],right(left([obligation], inStr([obligation], "-") - 1), 6) as [obligation number]
from yourtable

As long as your text field only has a "-" when there's an Obligation Number present, this should work:

ObligationNo: IIf(InStr([Obligation],"-")=0,"No Obligation Number Present",Mid([Obligation],InStr([Obligation],"-")-6,6))

Flyster
Top Expert 2010

Commented:
Suppose that the current value of Obligation is "blah blah blah 123456- blah blah blah".

cap1's expression will return " blah ".
Top Expert 2016

Commented:
@patrick
<cap1's expression will return " blah ".>

what erratic codes did you  use ?
Top Expert 2010

Commented:
cap1,

Not sure what you mean.  If you feed in "blah blah blah 123456- blah blah blah" as the string value to parse in your expression, the answer you get back is " blah ".

The Asker indicated that the obligation number could show up anywhere in the field value, and as such I wanted to test what would happen if there were text both before and after the obligation number.

Patrick
Top Expert 2016

Commented:
try it first before making a comment....
Top Expert 2010

Commented:
cap1,

My apologies.  "blah blah blah 123456- blah blah blah" returns "123456", as expected.

However, suppose the input were "blah blah blah 123- blah blah blah".  The output would be "ah 123", and that would be incorrect.

Also, if the input were simply "123-" then the output would be "123".  Again, incorrect.

It would also be useful to tighten up my pattern string to:

"\b[A-Za-z0-9]{6}-"

Or, if the dash is not supposed to be returned:

"\b[A-Za-z0-9]{6}(?=-)"

That uses a lookahead to see if the next character is a dash, but does not include the dash in the result.

All I am trying to say is that any solution that assumes any entry with a dash is going to contain a valid obligation number is going to be vulnerable to false positives.
Top Expert 2016

Commented:
the format is

    obligation number is 6 characters followed by a dash (-).
Top Expert 2010

Commented:
Yes, and my suggestion specifically tests to ensure that that condition is met.  Your does not.  Your suggestion is simply looking for the first dash, and assuming that it follows the first valid obligation number.  That might be OK, but then again it might not be.  It depends on what all goes into that column.

It's not personal, Rey :)
Top Expert 2016

Commented:
what  business rule do we have to follow?  the condition was clearly given by the OP
Top Expert 2010

Commented:
Yes, and note that he did NOT say either of these:

1) That there will always be a valid obligation number
2) That the first dash you encounter will ALWAYS be part of a valid obligation number

You are assuming both to be true, and that may or may not be valid.
Ron MalmsteadInformation Services Manager

Commented:
I think it's worth repeating that...the proper way to do this is to have a dedicated field for the information.  Expecting the users who enter this information to follow a "rule" is probably going to have limited success.  Workarounds like this tend to lead to bigger issues down the road.

Author

Commented:
i messed this up! But all experts were close: I said it "ends" with the dash but it doesn't. After the dash there are additional characters in the obligation number. So I should have said six characters followed by a dash and then additional characters. So the number will actually end on the space after the obligation number.
Top Expert 2016

Commented:
use a query with a where clause


select [obligation],right(left([obligation], inStr([obligation], "-") - 1), 6) as [obligation number]
from yourtable
where instr([obligation],"-") >0
Top Expert 2010

Commented:
OK, still using the RegExpFind function from http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html


SELECT Obligation, RegExpFind(Obligation, "\b[A-Za-z0-9]{6}-[A-Za-z0-9]+", 1) AS ObligationNumber
FROM SomeTable

Open in new window


That pattern matches 6 alphanumeric characters, followed by a dash, followed by 1 or more alphanumeric characters.

If by "characters" you meant only letters...

SELECT Obligation, RegExpFind(Obligation, "\b[A-Za-z]{6}-[A-Za-z]+", 1) AS ObligationNumber
FROM SomeTable

Open in new window


If by "characters" you meant only digits...

SELECT Obligation, RegExpFind(Obligation, "\b\d{6}-\d+", 1) AS ObligationNumber
FROM SomeTable

Open in new window

Top Expert 2010

Commented:
howcheat,

Have you had an opportunity to test any of these approaches?

Patrick

Author

Commented:
Patrick,

Your last expression came close but did not display the remaining characters in the obligation number. For example, it detected and displayed W9789G- but did not show the remaining characters in the word (It stopped after the first dash). The problem may be the additional dashes that are in the remaining string.

Example with obligation number in quotes for this demo only: This is a test "W1234G-07-F-0012" of what I am attempting to retrieve.   The other "W9876Z-7-001"  dashes  have no consistent number of digits nor standard position.
Try this. It should work as long as there are no spaces in your text string prior to the obligation number:

ObligationNo: IIf(InStr([Obligation],"-")=0,"No Obligation Number Present",Mid([Obligation],InStr([Obligation],"-")-6,InStr([Obligation]," ")-InStr([Obligation],"-")+6))
Top Expert 2010
Commented:
I see.  You didn't really say there could be multiple dashes :)

OK, still using the RegExpFind function from http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

SELECT Obligation, RegExpFind(Obligation, "\b[A-Za-z0-9]{6}(-[A-Za-z0-9]+)+", 1) AS ObligationNumber
FROM SomeTable

Open in new window


That matches:

6 alphanumerics, followed by at least 1 instance of a dash followed by 1+ alphanumerics

Author

Commented:
That's it Patrick!!
Top Expert 2010

Commented:
howcheat,

Glad to help!  If you have not already done so, I would really appreciate it if you could please return to my article
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
and click 'Yes' for the 'Was this helpful?' voting.

Patrick

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial