We help IT Professionals succeed at work.

Query Help - possible VBA solution? Array?

Hello Experts,

I need help with an Access query that checks a 'Business_Name' field in a table with approximately 1.3m records, against a table of about 300 corporate identifier codes.  
My query provides results, but they are not complete.  I've included both tables (unjoined) and I'm using an InStr function to get these results.  I have a feeling there is a solution for this but that it will involve VBA, which I am NOT proficient in!

I've provide my SQL below.  FYI -  [business_name_tx] is in table "T_UCVI" and  [Code] is in table "T_BusinessCodes"

My current SQL:  
SELECT DISTINCTROW T_UCVI.ID, T_UCVI.BUSINESS_NAME_TX, InStr(1,[business_name_tx],([code])) AS Corp1, InStr(2,[business_name_tx],(" " & [code])) AS Corp2
FROM T_BusinessCodes, T_UCVI
GROUP BY T_UCVI.ID, T_UCVI.BUSINESS_NAME_TX, InStr(1,[business_name_tx],([code])), InStr(2,[business_name_tx],(" " & [code]))
HAVING (((InStr(1,[business_name_tx],([code])))=1)) OR (((InStr(2,[business_name_tx],(" " & [code])))>1))
ORDER BY T_UCVI.BUSINESS_NAME_TX;

Thanks for all help!
Comment
Watch Question

CERTIFIED EXPERT
Top Expert 2014

Commented:
I've included both tables

I don't see any attachments.
Coleen SullivanIndependent Contractor

Author

Commented:

I was referring to my query where I included both tables - I did not attach anything here - provided the sql

CERTIFIED EXPERT
Distinguished Expert 2017

Commented:

If some of the names start with spaces, run an update query first to clean up that mess.


What is missing?  We can't see your results.  We have no idea what your data looks like so we have no idea what might be wrong with your query.

CERTIFIED EXPERT
Top Expert 2014

Commented:
I did not attach anything here - provided the sql

Not useful enough for us to help you - we aren't psychic.
John TsioumprisSoftware & Systems Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:

Without sample data is worse than guessing...Your data could be everything..

One thing is certain...the Instr is probably a performance killer...

Jim Dettman (EE MVE)President / Owner
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2017

Commented:

Hi Coleen


   As the others had said, it's a bit difficult to say if there is a better way without seeing an example of the data from both tables.   Just a few rows from each would be helpful.


  With that said:


a. You never want a query without a join if you can avoid it.   Even a join using a WHERE expression is better than nothing.

b.   HAVING is always applied after the records are fetched.   Always try and put any restrictions on the data as a WHERE.


Jim.

CERTIFIED EXPERT
Most Valuable Expert 2015
Distinguished Expert 2018

Commented:
My query provides results, but they are not complete.
How do you know that?
Coleen SullivanIndependent Contractor

Author

Commented:
Thank you for all the replies.  I've attached a sample database.  
   T_BusinessCodes: Contains 500+ scenarios of potential codes/phrases that would indicate a business name
   T_UCVI: Contains a subset of Business_Names
   q_CpChk: is the query I am intersted in optimizing
   q_delta: displays those records that should have been captured in the q_CpChk query

The ultimate goal is to identify Business Names from Individual's Names.  It is understood this is a first-pass with manual reviews being done on the results, however this should capture more than 90% of the business names.
Query q_Cpchk is capturing most instances.  The reords it doesn't capture (q_delta) contain messy data, i.e. they contain a phrase from T_BusinessCodes in the business_name, but they are not preceeded with, or followed by spaces. Perhaps I should be using something other than InStr?
I know I should never have unjoined tables in a query, I am concerned about crashing though when using 'real data', which is over 1.5m records/ month.   I have in my head this may a good use for any array - but thats as far as my knowledge base goes!
Thanks in advance.
EE_Sample.accdb

CERTIFIED EXPERT
Top Expert 2014

Commented:
Here is one option to test.  I'm going to post another one shortly.

Add a module to your database and paste the following routine:
Option Explicit

Public Function PartialCodeMatch(ByVal parmBusinessCode As Variant) As Boolean
    Dim vParts As Variant
    Dim vItem As Variant
    If IsNull(parmBusinessCode) Then
        PartialCodeMatch = False
        Exit Function
    End If
    vParts = Split(parmBusinessCode, " ")
    For Each vItem In vParts
        If IsNull(DLookup("Code", "[T_BusinessCodes]", "code='" & Replace(vItem, "'", "''") & "'")) Then
        Else
            PartialCodeMatch = True
            Exit Function
        End If
    Next
    PartialCodeMatch = False
End Function

Open in new window


Run this query.  It took about 10 seconds on your sample database.
SELECT T_UCVI.ID, T_UCVI.BUSINESS_NAME
FROM T_UCVI
WHERE PartialCodeMatch([T_UCVI].[BUSINESS_NAME])=True
ORDER BY T_UCVI.ID;

Open in new window

CERTIFIED EXPERT
Top Expert 2014

Commented:
Since we only have about 500 (not 300) codes to check, it was worth exploring a regular expression solution.  This is a much faster solution.

Add this routine to your module.
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    
    If oRE Is Nothing Then
        Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
        Do Until rs.EOF
            strCodes = strCodes & "|" & Trim(rs!Code)
            rs.MoveNext
        Loop
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.pattern = "\b(?:" & Mid(strCodes, 2) & ")\b"
    End If
    
    If IsNull(parmBusinessCode) Then
        RegexMatch = False
        Exit Function
    End If
    
    RegexMatch = oRE.test(parmBusinessCode)

End Function

Open in new window


Test this query.
SELECT T_UCVI.ID, T_UCVI.BUSINESS_NAME
FROM T_UCVI
WHERE RegexMatch([T_UCVI].[BUSINESS_NAME])=True
ORDER BY T_UCVI.ID;

Open in new window


This solution produces more matches than the first one I posted.  You will need to test and validate the results.
Coleen SullivanIndependent Contractor

Author

Commented:
I get error:  "Undefined Function RegexMatch in expression"
Out of curiosity, how many matches does your second solution produce?
My current query q_CpChk produces 4124 results out of 4131 records.  If your solution produces fewer than this, it is not helpful.
As mentioned, the 7 records my query does not produce (q_delta) contain a valid business code within the name, but InStr is not picking them up.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Did you add the routine (code) I posted?
Did you save the module after posting the code?

The second produces 3425 rows.  The first solution produces 3333 rows.

Please invoke that second solution and post a row that your solution finds that neither of my solutions returns.
Coleen SullivanIndependent Contractor

Author

Commented:
All code is added and module saved -
You can see the results of my query by running q_CpChk
Since I can't run your query, I cannot provide the delta for you
CERTIFIED EXPERT
Top Expert 2014

Commented:
Since I can't run your query
You are receiving a Undefined Function RegexMatch in expression?
Coleen SullivanIndependent Contractor

Author

Commented:
correct
CERTIFIED EXPERT
Top Expert 2014

Commented:
Please post the sample database where you've pasted the regexmatch function
Coleen SullivanIndependent Contractor

Author

Commented:
EE_Sample.accdb
Your query: q_eeSolution2
Module: Mod_ee
CERTIFIED EXPERT
Top Expert 2014

Commented:
thanks
CERTIFIED EXPERT
Top Expert 2014

Commented:
1. Mystery solved.  You created a Class Module, not a regular module.  We need to resolve your exact matching requirements.

2.  Are you looking for exact word matches?
In your original example, rows 4-5 match the "Usa" code, but that only matches the start of the word in the business name.  Is that what you really need?  I have been trying to match the entire word.
BUSINESS_NAME
**************N├O USAR******************
*NON Usare

Open in new window


===========================
Tweaking the regex pattern to only match the leading characters of the word returns 4124 rows.

Please test the attached database.
EE_Sample--1-.accdb
Coleen SullivanIndependent Contractor

Author

Commented:
I am using the InStr function to find ANY occurrance of [code] within the business_name field, regardless of start-location. I'm not clear how this could be misunderstood.
I'll have a look at your database tomorrow, although it sounds as if your solution is just producing the same results as my original query - and it is still not capturing the 7 other records,.

John TsioumprisSoftware & Systems Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
Just to understand something...the problem is the No or records ...or the speed?
And for clarification how many T_BusinessCodes you expect to have ...if they are just 500 maybe things could get some speed.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Which 7 records/rows are you missing?
CERTIFIED EXPERT
Top Expert 2014

Commented:
I'm not clear how this could be misunderstood.
For me, it isn't a matter of "understanding", rather it is a matter of "why would someone do such a thing in the first place".  I'm sure that you or your client has a reason.
CERTIFIED EXPERT
Top Expert 2014
Commented:
I've tweaked the regex pattern.  It now returns 4128 rows
EE_Sample--1-.accdb
Coleen SullivanIndependent Contractor

Author

Commented:
Thank you!
Your last sample captures as many records as I believe we can expect -and- it is much faster than my query - great!
The two records it does not capture are:
     4128. name S.R.L
     4130. name Sαrl            aux mains de Mr name

Record 4128 has a punctuation error - missing the last ".".  I can accommodate for this in the T_BusinessCodes table - not a big deal.
Record 4130 contains the word "Sarl" but it's not being picked up. I'm thinking the "a" is being interpreted as another character based on the font for that one character. The input for this data can be from anywhere in the world, so there is a possibility some characters don't translate properly I suppose.
Bottom line - this is a much faster and more efficient solution than what I had, thank you.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Since some of your codes have period characters, please replace your current RegexMatch routine with this one.  It forces the period in character(s) in the code to match a period character.
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    
    If oRE Is Nothing Then
        Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
        Do Until rs.EOF
            strCodes = strCodes & "|" & Replace(rs!Code, ".", "\.")
            rs.MoveNext
        Loop
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.ignorecase = True
        oRE.pattern = Mid(strCodes, 2)
    End If
    
    If IsNull(parmBusinessCode) Then
        RegexMatch = False
        Exit Function
    End If
    
    RegexMatch = oRE.test(parmBusinessCode)

End Function

Open in new window


================================
You are correct that the business name in row 4130 is not in the standard ASCII set.  It is an extended ASCII character with a value of 225.
CERTIFIED EXPERT
Top Expert 2014

Commented:
It is possible to match extended ASCII characters as well as unicode characters (> 255).  However, it isn't a trivial task.
Coleen SullivanIndependent Contractor

Author

Commented:
Thank you. I replaced the RegexMatch routine as you suggested, however it is still not picking up record 4128.
I beleive I understand what what this line of code is doing, but would you please clarify for me?
     strCodes = strCodes & "|" & Replace(rs!Code, ".", "\.") 

Thanks also for the explanation on the extended ASCII character set.  This is not a huge issue at this time, so I will leave it for now.
It is possible I may revisit it in the future though.
CERTIFIED EXPERT
Top Expert 2014

Commented:
Replace(rs!Code, ".", "\.") 

Open in new window

In the above piece of code, I'm replacing every period character with two characters ("\."), a backslash followed by a period character.

In regex patterns, the backslash character is an escape for the character that follows it.  This is necessary in cases where the character we want to match is one of those we use for regex pattern definitions.  For instance, if we wanted to match parentheses characters or square bracket characters, we would precede them with a backslash character.

===============================
If you need to match the extended ASCII set, you would need to change the (mostly) vowels in the codes to accommodate differently accented vowels in the 192-255 range.  Even though the regex object I'm instantiating ignores case, it won't address this (extended ASCII) character encoding issue.  Here are the extended Latin letter characters (reference: https://www.ascii-code.com/ ).

Latin capital letter A with grave
Latin capital letter A with acute
Latin capital letter A with circumflex
Latin capital letter A with tilde
Latin capital letter A with diaeresis
Latin capital letter A with ring above
Latin capital letter AE
Latin capital letter C with cedilla
Latin capital letter E with grave
Latin capital letter E with acute
Latin capital letter E with circumflex
Latin capital letter E with diaeresis
Latin capital letter I with grave
Latin capital letter I with acute
Latin capital letter I with circumflex
Latin capital letter I with diaeresis
Latin capital letter ETH
Latin capital letter N with tilde
Latin capital letter O with grave
Latin capital letter O with acute
Latin capital letter O with circumflex
Latin capital letter O with tilde
Latin capital letter O with diaeresis
Latin capital letter O with slash
Latin capital letter U with grave
Latin capital letter U with acute
Latin capital letter U with circumflex
Latin capital letter U with diaeresis
Latin capital letter Y with acute
Latin capital letter THORN
Latin small letter sharp s - ess-zed
Latin small letter a with grave
Latin small letter a with acute
Latin small letter a with circumflex
Latin small letter a with tilde
Latin small letter a with diaeresis
Latin small letter a with ring above
Latin small letter ae
Latin small letter c with cedilla
Latin small letter e with grave
Latin small letter e with acute
Latin small letter e with circumflex
Latin small letter e with diaeresis
Latin small letter i with grave
Latin small letter i with acute
Latin small letter i with circumflex
Latin small letter i with diaeresis
Latin small letter eth
Latin small letter n with tilde
Latin small letter o with grave
Latin small letter o with acute
Latin small letter o with circumflex
Latin small letter o with tilde
Latin small letter o with diaeresis
Latin small letter o with slash
Latin small letter u with grave
Latin small letter u with acute
Latin small letter u with circumflex
Latin small letter u with diaeresis
Latin small letter y with acute
Latin small letter thorn
Latin small letter y with diaeresis


Both the standard and extended alternatives would need to be placed into the code we use for the pattern.

Example:
Code = Sarl
matching regex pattern = S[aÀÁÂÃÄÅàáâãäå]rl

As I stated in an earlier comment, it can be done but it isn't trivial.
Coleen SullivanIndependent Contractor

Author

Commented:
Thanks. Yes, I will definitely leave trying to match the extended ASCII characters alone for now!
Thanks also for your explanation regarding the line of code:  
strCodes = strCodes & "|" & Replace(rs!Code, ".", "\.")
In regex patterns, the backslash character is an escape for the character that follows it.  This is necessary in cases where the character we want to match is one of those we use for regex pattern definitions.
I thought you were trying to capture record 4128 which was missing the last period. But if it's not there, how can you test for it, right?
All good. Thanks again for your help and insight.


CERTIFIED EXPERT
Top Expert 2014
Commented:
I guess it's time to close this question.