We help IT Professionals succeed at work.

Help with a solution provided on 3/4/20

Coleen Sullivan
on
Medium Priority
51 Views
Last Modified: 2020-03-26
Hello @Aikimark  

You  helped me with a solution earlier this month that I am now using on 'real' data and getting some odd results.  

When running q_eeSolution2, records 4132-4150 are included with the result and I can't determine why.

In addition, I believe your routine ignores any leading or trailing [spaces] I have included on some of the codes. For example, Some of the 'codes' are only 2 characters (SA, AB, AG, etc.), which I have entered as [space]'code'[space] in order not to capture all record where the business name may include that 2-character combination - which is (many) thousands of records. Is there a way to capture the 'code' exactly as it is entered in the table?

Thank you.
EE_Sample.accdb
Comment
Watch Question

Daniel PineaultPresident / Owner CARDA Consultants Inc.
CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
Could you  provide a link to your previous discussion so we can get up to speed and possibly help.
David Johnson, CDSimple Geek from the '70s
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
CERTIFIED EXPERT
Top Expert 2014

Commented:
My first reaction was that this had to do with a Trim() function being used during the construction of the regex pattern.

Thanks for the link, David.

@Coleen,
Let's start with one of the questions I posed in the prior question.  Do you want to match the beginning of the words, the exact word, or anywhere in the business name?
Coleen SullivanIndependent Contractor

Author

Commented:
I need to have two answers for this - keeping in mind that I do not necessarily need to match the beginning of the 'Business_Name' field, although that could be an option.  I do however, need to match the entire [code] word to any word in the Busines_Name field. It seems the issue is with the 2-character codes as I described - where the preceding spaces were being ignored, resulting in thousands of incorrect records that began with those common character combinations (SA, AB, AG, SP, etc)

Example:
1. [code] = " AB" (preceded w/ [space])
            should return: "Company name AB",  not "ABRUSCO, JAMES"
2. [code] = "Bank" should return:
               "Bank of NY", "European Banking Society" and "Mr. Joseph Banks".  

Additionally, as mentioned, I can't determine why records 4132-4150 are hitting the results, as there are no [codes] that are a match.

Thanks.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Thanks for that clarification.

What about " & Co"?  Should that only be an exact match, like " AB" or should it also match "& Company"?

What about "'s"?  What should that match?
CERTIFIED EXPERT
Top Expert 2014

Commented:
I think that I used the Trim() function in an earlier version of the solution.  I removed it based on OP comment.
Coleen SullivanIndependent Contractor

Author

Commented:
I have "Company" as a separate item in the code table, so "& Co" can be exact.

The 'apostrophe s'  was used to capture names that would not otherwise be considered a 'business' name, such as: "Sullivan's" , therefore  it is a yes on that being exact as well I suppose.

Wondering whether it makes sense to add logic that would force a preceding [space] to the [code] IF the [code] LENGTH is 2-characters. This would eliminate the need to store the preceding [space].  It would have to be applied for all 2-character length [codes] that were not the 'apostrophe s" as that one would generally not have the preceding [space].
CERTIFIED EXPERT
Top Expert 2014

Commented:
Thanks.  I'm playing with the pattern(s) now.

I found another issue with my prior solution that I will also address.
Coleen SullivanIndependent Contractor

Author

Commented:
Thanks
CERTIFIED EXPERT
Top Expert 2014

Commented:
I tweaked the code and added "SA" to the business codes table.  The number of rows dropped, but it is filtering what I expected to see, based on a sample of the first 550+ rows query result.
EE_Sample--2-.accdb
Coleen SullivanIndependent Contractor

Author

Commented:
This solution returns 10 fewer records that the original, however records 4132-4149 are still hitting in the result.  I included these in the test data as examples of what I am getting with my live data - on a much larger scale.  I can tweak the [codes] in my table, however I don't see what codes in the code table could be triggering these and I'm not 100% clear on what your routine is doing - which is what has prompted my return for clarity or a new solution.
CERTIFIED EXPERT
Top Expert 2014

Commented:
4132-4149 are still hitting in the result
So, are these rows expected to match?
Coleen SullivanIndependent Contractor

Author

Commented:
NO - as I've been saying, 
When running q_eeSolution2, records 4132-4150 are included with the result and I can't determine why.
Additionally, as mentioned, I can't determine why records 4132-4150 are hitting the results, as there are no [codes] that are a match.
My live data is over 1mm records. I am trying to isolate business names from individual names as best as possible providing results that are easy for individuals to review.  The 10 examples of individuals names that I provided in the sample Business_Name table are representative of many tens of thousands names of individuals that are hitting the live results.  As I mentioned, there are no codes in the code table that should be triggering these.

Coleen SullivanIndependent Contractor

Author

Commented:
Not sure if it makes a difference, however, I expect the table of Business_Codes will continue to grow
Coleen SullivanIndependent Contractor

Author

Commented:
I am getting a ‘Run-time error 13: Type mismatch’  after copying the new code to my live application.  
The debugger has highlighted the ELSE statement in the RegexMatch function:

strCodes = strCodes & “|” & oRE.Replace (rs!code, “\$1”)

this was not an issue in the file you shared earlier,
CERTIFIED EXPERT
Top Expert 2014

Commented:
Ah.  Let me check the patterns and get back to you tomorrow.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Oh.  That is an artifact of my debugging.  If you want to correct that, alter any line (add a space) and position to a different line.  Then do a Debug | Compile operation.  Then save the module and close the database.  When you reopen the database, that behavior should be gone.

I'll also fix that before I post another database.
Coleen SullivanIndependent Contractor

Author

Commented:
ok. We’re you able to resolve the main issue though-as explained a few posts back?
Coleen SullivanIndependent Contractor

Author

Commented:
The Type mismatch run-time error 13 is still an issue after editing the code as you suggest.
I notice the replacement text you are using is: "\$1" which is different from what you were using before:   "\."  
      Is this the reason for the 'Type" mismatch?
Also, the live database I have copied the code to is on a different computer running Access 2007-2016 - this may also be the issue.
Please advise.
CERTIFIED EXPERT
Top Expert 2014

Commented:
The first set of replace operations is different from before.  I needed to escape not only the period character but also parentheses characters (open and close).  My regex pattern is any of these three.  The replacement is a backslash prior to whichever character was found.

So, "." becomes "\.", "(" becomes "\(", and ")" becomes "\)".

I checked the constructed regex pattern and these substitutions are happening as expected.
Note: I used the VBA Replace() function in the original code.  I am now using the regex object's Replace() method.

The Type mismatch run-time error 13 is still an issue
On what line is the error occurring?
Please paste the routine code into your response comment.

I am testing this with Access 2010.

==================
I did a grocery store run this morning and will run another errand after 11.  I have not looked at the last rows to see why they match.
Coleen SullivanIndependent Contractor

Author

Commented:
strCodes = strCodes & “|” & oRE.Replace (rs!code, “\$1”)
CERTIFIED EXPERT
Top Expert 2014

Commented:
Is there really a space after Replace?
Coleen SullivanIndependent Contractor

Author

Commented:
yes

Coleen SullivanIndependent Contractor

Author

Commented:
sorry - no

Coleen SullivanIndependent Contractor

Author

Commented:
strCodes = strCodes & “|” & oRE.Replace(rs!code, “\$1”)
CERTIFIED EXPERT
Top Expert 2014

Commented:
CIO matches
ACOSTA, JUAN IGNACIO
AGUILERA, MAURICIO
QUINTANA, HORACIO

LDA matches
SUAREZ, NILDA MARGARITA

S A matches
ACOSTA, CARLOS ALBERTO
AGOSTINI, CARLOS AURELIO
AGROPECUARIA, LOS ABUELOS
PAZ, LUIS ALBERTO
PETERSEN, TOMAS ANDRES

SL matches
ISLA, SERGIO
ISLAS, SILVIA MARINA


USA matches
RAMIREZ, SUSANA RAQUEL


SCA matches
REYES PACHEO, OSCAR
SILVA, HUGO OSCAR
SULLIVAN, OSCAR
CERTIFIED EXPERT
Top Expert 2014

Commented:
Please paste the function code that you are invoking.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Ah.  You must have used Word to do some editing.  The line that is causing you an error contains smart quotes rather than ASCII quotes.
Coleen SullivanIndependent Contractor

Author

Commented:
I will look at the code line again.
As for the names that are incorrectly appearing in the results - this is not acceptable.  
I can't have every occurance of the code within in a "word" - it must be the code word only - somewhere in the field. in the field.
I think for all the 2 & 3 character text [codes], I will have to store preceding and trailing spaces in the table unless -as I have asked previously- you can do this in the code?



Coleen SullivanIndependent Contractor

Author

Commented:
The examples I provided yesterday should have clarified this:

1. [code] = " AB" (preceded w/ [space])
            should return: "Company name AB",  not "ABRUSCO, JAMES"
2. [code] = "Bank" should return:
               "Bank of NY", "European Banking Society" and "Mr. Joseph Banks".  
Coleen SullivanIndependent Contractor

Author

Commented:
This would translate to matching
  1. The exact [code] word with any of the words in the Business_name field
  2. The beginning of the [code] word with the beginning of any of the words in the Business_Name field

NOT the [code] string anywhere in the Business_Name field
CERTIFIED EXPERT
Top Expert 2014

Commented:
These results are based on the codes in the table and the one that I added.

These (last rows) matches are for codes that do not begin with a space.  Therefore, they will match anywhere in the business name.

This brings us back to the conversation that I tried to have in the original question.
Coleen SullivanIndependent Contractor

Author

Commented:
Was my last comment more helpful?
This would translate to matching
  1. The exact [code] word with any of the words in the Business_name field
  2. The beginning of the [code] word with the beginning of any of the words in the Business_Name field

NOT the [code] string anywhere in the Business_Name field 
CERTIFIED EXPERT
Top Expert 2014

Commented:
Was my last comment more helpful?
Yes
Coleen SullivanIndependent Contractor

Author

Commented:
Great
CERTIFIED EXPERT
Top Expert 2014
Commented:
This version of the function resulted in 3829 rows.  Please test.
Option Explicit
Option Compare Database

Public Function PartialCodeMatch(ByVal parmBusinessCode As Variant) As Boolean
    Dim vParts As Variant
    Dim vItem As Variant
    If IsNull(parmBusinessCode) Then
        PartialCodeMatch = False
        Exit Function
    End If
    vParts = Split(parmBusinessCode, " ")
    For Each vItem In vParts
        If IsNull(DLookup("Code", "[T_BusinessCodes]", "code='" & Replace(vItem, "'", "''") & "'")) Then
        Else
            PartialCodeMatch = True
            Exit Function
        End If
    Next
    PartialCodeMatch = False
End Function

Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    Dim strWholeWordCodes As String
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        oRE.pattern = "([.()])"
        
        Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
        Do Until rs.EOF
            If Mid(rs!code, 1, 1) = " " Then
                strWholeWordCodes = strWholeWordCodes & "|" & oRE.Replace(rs!code, "\$1")
            Else
                strCodes = strCodes & "|" & oRE.Replace(rs!code, "\$1")
            End If
            rs.MoveNext
        Loop
        
        oRE.Global = False
        oRE.ignorecase = True
        strCodes = Mid(strCodes, 2)
        strCodes = "\b(?:" & strCodes & ")"
        strWholeWordCodes = Mid(strWholeWordCodes, 2)
        strWholeWordCodes = "(?:(?:" & strWholeWordCodes & ")\b)"
        oRE.pattern = strCodes & "|" & strWholeWordCodes
    End If
    
    If IsNull(parmBusinessCode) Then
        RegexMatch = False
        Exit Function
    End If
    
    RegexMatch = oRE.test(parmBusinessCode)

End Function

Open in new window

Coleen SullivanIndependent Contractor

Author

Commented:
I don't see any changes in this version of the function as compared to the former? Are you sure you pasted the new version?
I get 3927 records - including 4132, 4134, 4136, 4137, 4147, 4150 - most likely because I am editing the [codes] in the BusinessCodes table, i.e. adding spacesto the 2-character codes.

CERTIFIED EXPERT
Top Expert 2014

Commented:
Yes.  I added \b in this line:
strCodes = "\b(?:" & strCodes & ")"

Open in new window

Coleen SullivanIndependent Contractor

Author

Commented:
Not sure why I'm not getting the same results as you? Are you getting any records 4132-4150?
CERTIFIED EXPERT
Top Expert 2014

Commented:
No.  The last ID is 4129.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Coleen

If you change the code table, please close and reopen the database.

Alternatively, you can open the VBA code window (Alt+F11) and perform a Run | Reset.

The function builds the code patterns once and reuses that for each invocation.  This allows the function to be performant.
Coleen SullivanIndependent Contractor

Author

Commented:
Thank you - I was not aware of that.
Unfortunately I am still getting records 4132 and 4134.  These would both fall under the "S A" code, that I thought I took care of with adding a space: " S A". Is your code adding these spaces as well?

CERTIFIED EXPERT
Top Expert 2014

Commented:
Are you using the codes table from the database that I posted?  It contains an SA code.  You should check that to see if it needs a leading space character.  I added it prior to some of the comments you made, clarifying the filtering requirements.
Coleen SullivanIndependent Contractor

Author

Commented:
Yes I was, but I have since edited all of the 2-character codes, adding preceding spaces
and the records in question contain "S -space- A": CARLO S ABRUZZI
CERTIFIED EXPERT
Top Expert 2014

Commented:
I've done some testing this morning.  The way I'm treating entries like "(NORWAY)" isn't correct.  Would you like to see that version of the code or is what you already have sufficient?
Coleen SullivanIndependent Contractor

Author

Commented:
absolutely- please share.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    Dim strWholeWordCodes As String
    Dim strDelimitedCodes As String
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        oRE.pattern = "([.()])"
        
        Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
        Do Until rs.EOF
            If Mid(rs!code, 1, 1) = " " Then
                strWholeWordCodes = strWholeWordCodes & "|" & oRE.Replace(rs!code, "\$1")
            ElseIf (rs!code Like "[!A-Za-z]*") Or (rs!code Like "*[!A-Za-z]") Then
                strDelimitedCodes = strDelimitedCodes & "|" & oRE.Replace(rs!code, "\$1")
            Else
                strCodes = strCodes & "|" & oRE.Replace(rs!code, "\$1")
            End If
            rs.MoveNext
        Loop
        
        oRE.Global = False
        oRE.ignorecase = True
        strCodes = Mid(strCodes, 2)
        strCodes = "\b(?:" & strCodes & ")"
        strWholeWordCodes = Mid(strWholeWordCodes, 2)
        strWholeWordCodes = "(?:" & strWholeWordCodes & ")\b"
        strDelimitedCodes = Mid(strDelimitedCodes, 2)
        strDelimitedCodes = "(?:" & strDelimitedCodes & ")"
        oRE.pattern = strCodes & "|" & strWholeWordCodes & "|" & strDelimitedCodes
    End If
    
    If IsNull(parmBusinessCode) Then
        RegexMatch = False
        Exit Function
    End If
    
    RegexMatch = oRE.test(parmBusinessCode)

End Function

Open in new window