Link to home
Start Free TrialLog in
Avatar of Coleen Sullivan
Coleen SullivanFlag for United States of America

asked on

Help with a solution provided on 3/4/20

Hello @Aikimark  

You  helped me with a solution earlier this month that I am now using on 'real' data and getting some odd results.  

When running q_eeSolution2, records 4132-4150 are included with the result and I can't determine why.

In addition, I believe your routine ignores any leading or trailing [spaces] I have included on some of the codes. For example, Some of the 'codes' are only 2 characters (SA, AB, AG, etc.), which I have entered as [space]'code'[space] in order not to capture all record where the business name may include that 2-character combination - which is (many) thousands of records. Is there a way to capture the 'code' exactly as it is entered in the table?

Thank you.
EE_Sample.accdb
Avatar of Daniel Pineault
Daniel Pineault

Could you  provide a link to your previous discussion so we can get up to speed and possibly help.
My first reaction was that this had to do with a Trim() function being used during the construction of the regex pattern.

Thanks for the link, David.

@Coleen,
Let's start with one of the questions I posed in the prior question.  Do you want to match the beginning of the words, the exact word, or anywhere in the business name?
Avatar of Coleen Sullivan

ASKER

I need to have two answers for this - keeping in mind that I do not necessarily need to match the beginning of the 'Business_Name' field, although that could be an option.  I do however, need to match the entire [code] word to any word in the Busines_Name field. It seems the issue is with the 2-character codes as I described - where the preceding spaces were being ignored, resulting in thousands of incorrect records that began with those common character combinations (SA, AB, AG, SP, etc)

Example:
1. [code] = " AB" (preceded w/ [space])
            should return: "Company name AB",  not "ABRUSCO, JAMES"
2. [code] = "Bank" should return:
               "Bank of NY", "European Banking Society" and "Mr. Joseph Banks".  

Additionally, as mentioned, I can't determine why records 4132-4150 are hitting the results, as there are no [codes] that are a match.

Thanks.
Thanks for that clarification.

What about " & Co"?  Should that only be an exact match, like " AB" or should it also match "& Company"?

What about "'s"?  What should that match?
I think that I used the Trim() function in an earlier version of the solution.  I removed it based on OP comment.
I have "Company" as a separate item in the code table, so "& Co" can be exact.

The 'apostrophe s'  was used to capture names that would not otherwise be considered a 'business' name, such as: "Sullivan's" , therefore  it is a yes on that being exact as well I suppose.

Wondering whether it makes sense to add logic that would force a preceding [space] to the [code] IF the [code] LENGTH is 2-characters. This would eliminate the need to store the preceding [space].  It would have to be applied for all 2-character length [codes] that were not the 'apostrophe s" as that one would generally not have the preceding [space].
Thanks.  I'm playing with the pattern(s) now.

I found another issue with my prior solution that I will also address.
Thanks
I tweaked the code and added "SA" to the business codes table.  The number of rows dropped, but it is filtering what I expected to see, based on a sample of the first 550+ rows query result.
EE_Sample--2-.accdb
This solution returns 10 fewer records that the original, however records 4132-4149 are still hitting in the result.  I included these in the test data as examples of what I am getting with my live data - on a much larger scale.  I can tweak the [codes] in my table, however I don't see what codes in the code table could be triggering these and I'm not 100% clear on what your routine is doing - which is what has prompted my return for clarity or a new solution.
4132-4149 are still hitting in the result
So, are these rows expected to match?
NO - as I've been saying, 
When running q_eeSolution2, records 4132-4150 are included with the result and I can't determine why.
Additionally, as mentioned, I can't determine why records 4132-4150 are hitting the results, as there are no [codes] that are a match.
My live data is over 1mm records. I am trying to isolate business names from individual names as best as possible providing results that are easy for individuals to review.  The 10 examples of individuals names that I provided in the sample Business_Name table are representative of many tens of thousands names of individuals that are hitting the live results.  As I mentioned, there are no codes in the code table that should be triggering these.

Not sure if it makes a difference, however, I expect the table of Business_Codes will continue to grow
I am getting a ‘Run-time error 13: Type mismatch’  after copying the new code to my live application.  
The debugger has highlighted the ELSE statement in the RegexMatch function:

strCodes = strCodes & “|” & oRE.Replace (rs!code, “\$1”)

this was not an issue in the file you shared earlier,
Ah.  Let me check the patterns and get back to you tomorrow.
Oh.  That is an artifact of my debugging.  If you want to correct that, alter any line (add a space) and position to a different line.  Then do a Debug | Compile operation.  Then save the module and close the database.  When you reopen the database, that behavior should be gone.

I'll also fix that before I post another database.
ok. We’re you able to resolve the main issue though-as explained a few posts back?
The Type mismatch run-time error 13 is still an issue after editing the code as you suggest.
I notice the replacement text you are using is: "\$1" which is different from what you were using before:  "\."  
      Is this the reason for the 'Type" mismatch?
Also, the live database I have copied the code to is on a different computer running Access 2007-2016 - this may also be the issue.
Please advise.
The first set of replace operations is different from before.  I needed to escape not only the period character but also parentheses characters (open and close).  My regex pattern is any of these three.  The replacement is a backslash prior to whichever character was found.

So, "." becomes "\.", "(" becomes "\(", and ")" becomes "\)".

I checked the constructed regex pattern and these substitutions are happening as expected.
Note: I used the VBA Replace() function in the original code.  I am now using the regex object's Replace() method.

The Type mismatch run-time error 13 is still an issue
On what line is the error occurring?
Please paste the routine code into your response comment.

I am testing this with Access 2010.

==================
I did a grocery store run this morning and will run another errand after 11.  I have not looked at the last rows to see why they match.
strCodes = strCodes & “|” & oRE.Replace (rs!code, “\$1”)
Is there really a space after Replace?
yes

sorry - no

strCodes = strCodes & “|” & oRE.Replace(rs!code, “\$1”)
CIO matches
ACOSTA, JUAN IGNACIO
AGUILERA, MAURICIO
QUINTANA, HORACIO

LDA matches
SUAREZ, NILDA MARGARITA

S A matches
ACOSTA, CARLOS ALBERTO
AGOSTINI, CARLOS AURELIO
AGROPECUARIA, LOS ABUELOS
PAZ, LUIS ALBERTO
PETERSEN, TOMAS ANDRES

SL matches
ISLA, SERGIO
ISLAS, SILVIA MARINA


USA matches
RAMIREZ, SUSANA RAQUEL


SCA matches
REYES PACHEO, OSCAR
SILVA, HUGO OSCAR
SULLIVAN, OSCAR
Please paste the function code that you are invoking.
Ah.  You must have used Word to do some editing.  The line that is causing you an error contains smart quotes rather than ASCII quotes.
I will look at the code line again.
As for the names that are incorrectly appearing in the results - this is not acceptable.  
I can't have every occurance of the code within in a "word" - it must be the code word only - somewhere in the field. in the field.
I think for all the 2 & 3 character text [codes], I will have to store preceding and trailing spaces in the table unless -as I have asked previously- you can do this in the code?



The examples I provided yesterday should have clarified this:

1. [code] = " AB" (preceded w/ [space])
            should return: "Company name AB",  not "ABRUSCO, JAMES"
2. [code] = "Bank" should return:
               "Bank of NY", "European Banking Society" and "Mr. Joseph Banks".  
This would translate to matching
  1. The exact [code] word with any of the words in the Business_name field
  2. The beginning of the [code] word with the beginning of any of the words in the Business_Name field

NOT the [code] string anywhere in the Business_Name field
These results are based on the codes in the table and the one that I added.

These (last rows) matches are for codes that do not begin with a space.  Therefore, they will match anywhere in the business name.

This brings us back to the conversation that I tried to have in the original question.
Was my last comment more helpful?
This would translate to matching
  1. The exact [code] word with any of the words in the Business_name field
  2. The beginning of the [code] word with the beginning of any of the words in the Business_Name field

NOT the [code] string anywhere in the Business_Name field 
Was my last comment more helpful?
Yes
Great
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I don't see any changes in this version of the function as compared to the former? Are you sure you pasted the new version?
I get 3927 records - including 4132, 4134, 4136, 4137, 4147, 4150 - most likely because I am editing the [codes] in the BusinessCodes table, i.e. adding spacesto the 2-character codes.

Yes.  I added \b in this line:
strCodes = "\b(?:" & strCodes & ")"

Open in new window

Not sure why I'm not getting the same results as you? Are you getting any records 4132-4150?
No.  The last ID is 4129.
Coleen

If you change the code table, please close and reopen the database.

Alternatively, you can open the VBA code window (Alt+F11) and perform a Run | Reset.

The function builds the code patterns once and reuses that for each invocation.  This allows the function to be performant.
Thank you - I was not aware of that.
Unfortunately I am still getting records 4132 and 4134.  These would both fall under the "S A" code, that I thought I took care of with adding a space: " S A". Is your code adding these spaces as well?

Are you using the codes table from the database that I posted?  It contains an SA code.  You should check that to see if it needs a leading space character.  I added it prior to some of the comments you made, clarifying the filtering requirements.
Yes I was, but I have since edited all of the 2-character codes, adding preceding spaces
and the records in question contain "S -space- A": CARLOS ABRUZZI
I've done some testing this morning.  The way I'm treating entries like "(NORWAY)" isn't correct.  Would you like to see that version of the code or is what you already have sufficient?
absolutely- please share.
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    Dim strWholeWordCodes As String
    Dim strDelimitedCodes As String
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        oRE.pattern = "([.()])"
        
        Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
        Do Until rs.EOF
            If Mid(rs!code, 1, 1) = " " Then
                strWholeWordCodes = strWholeWordCodes & "|" & oRE.Replace(rs!code, "\$1")
            ElseIf (rs!code Like "[!A-Za-z]*") Or (rs!code Like "*[!A-Za-z]") Then
                strDelimitedCodes = strDelimitedCodes & "|" & oRE.Replace(rs!code, "\$1")
            Else
                strCodes = strCodes & "|" & oRE.Replace(rs!code, "\$1")
            End If
            rs.MoveNext
        Loop
        
        oRE.Global = False
        oRE.ignorecase = True
        strCodes = Mid(strCodes, 2)
        strCodes = "\b(?:" & strCodes & ")"
        strWholeWordCodes = Mid(strWholeWordCodes, 2)
        strWholeWordCodes = "(?:" & strWholeWordCodes & ")\b"
        strDelimitedCodes = Mid(strDelimitedCodes, 2)
        strDelimitedCodes = "(?:" & strDelimitedCodes & ")"
        oRE.pattern = strCodes & "|" & strWholeWordCodes & "|" & strDelimitedCodes
    End If
    
    If IsNull(parmBusinessCode) Then
        RegexMatch = False
        Exit Function
    End If
    
    RegexMatch = oRE.test(parmBusinessCode)

End Function

Open in new window

Coleen, you might also want to take a look at my article on complex text searches.  It has a database with code and sample data that allows you to do complex searches.  The article is located here.