Link to home
Start Free TrialLog in
Avatar of Coleen Sullivan
Coleen SullivanFlag for United States of America

asked on

MS Access: Expanding the scope of an existing Regex Function

I recently received help with a Regex function that is working well, however I would like to refine it so that it runs more efficiently on over 2mm records.  
The existing function matches Codes from T_BusinessCodes with business names in T_MMA and updates field BizCd in T_MMA as 'Corp' if true.

Rather than have the function review each record for each code, I realized there are different categories of codes that can be applied with more certainty than others. By updating the codes based on category, we can eliminate those records from the next pass-through of the function and have also assigned a category for further analysis.

I have added the field:  'Category' in T_BusinessCodes which contains 5 categories:
Corp2
Corp
Geo
Proper2
Proper
I would like to run the function separately for each category (in the order noted)
on all records where the 'BizCd' field is Null 
and update the 'BizCd' field with the category name.

The updated T_BusinessCodes table is attached as an Excel file: T_BusinessCodes.xlsx
The SQL for the query that calls the the RegexMatch function:
UPDATE T_MMA SET 
WHERE (((T_MMA.CDT_BUSINESS_CLASSIFICATION_CD) Like "*Ind*") AND ((T_MMA.BizCd) Is Null) AND ((RegexMatch([T_MMA].[BUSINESS_NAME_TX]))=True));

Open in new window

The 'update-to' value in the query should reflect the category that the function is processing.

The current Regex Function:

Public Function RegexMatch(ByVal parmBusinessName As Variant) As Boolean
    Static oRE As Object
    Dim rs As Recordset
    Dim strCodes As String
    
    If oRE Is Nothing Then              'only do this once in the life of the open database
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True               'look at all possible matches
        oRE.Pattern = "([.()+])"        'characters that need escaping
        
        'clean up and concatenate the codes
        Set rs = DBEngine(0)(0).OpenRecordset("Select Trim(Code) As trim_code from [T_BusinessCodes]")
        Do Until rs.EOF
        
        'Replace is escaping the characters with a backslash
        strCodes = strCodes & "|" & oRE.Replace(rs![trim_code], "\$1")
          
          rs.MoveNext
        
        Loop
        
        oRE.Global = False                      'stop looking with the first match
        oRE.ignorecase = True
        strCodes = Mid(strCodes, 2)                  'skip the leading pipe character
        strCodes = " (?:" & strCodes & ")[ .,;]"     'put finishing touches on the pattern
        oRE.Pattern = strCodes
    End If
    
    If IsNull(parmBusinessName) Then
        RegexMatch = False
        Exit Function
    End If
    
    'prepend and append a space to the business name and invoke the
    'regex .test() method
    RegexMatch = oRE.test(" " & parmBusinessName & " ")

End Function


Open in new window



Avatar of aikimark
aikimark
Flag of United States of America image

Your Update query isn't complete
What has changed with the routine?  Looks the same.

Do you still have complete business names in the codes table?

What's the difference between "Corp" and "Corp2" category?
Likewise, what is the difference between "Proper" and "Proper2"?
Avatar of Coleen Sullivan

ASKER

I do not have an 'update to' value in the query as I need that to reflect the category that is being processed

Category descriptions:
Corp2:      Acronyms or abbreviations I have a high level of confidence in being a Corporate -vs- Individual name
Corp:        Words or phrases I have a high level of confidence in being Corporate -vs- Individual name
Proper2:   Words or symbols that are used in both Corporate & Proper names
Proper:     Words that are found in both Corporate & Proper names
Geo:         Geographic location words found in both Corporate & Proper names

Categorizing the codes will help me when reviewing results.


What has changed with the routine?  Looks the same.
The change is that I want the function to run separately for each category and update the (NULL) BizCd field accordingly.  Each time the function runs, it will be processing fewer and fewer records.
Do you still have complete business names in the codes table? 
There may be a few instances of full business names, but there shouldn't be many

Avatar of Norie
Norie

Coleen

Are you sure you need to expand/change the function?

Can't you do what you want with multiple queries with appropriate criteria?
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
In order to get the performance you need, the regex object is persisted between function invocations (Static declaration).

If this version of the function is too slow, then I can alter it, adding parameters to rebuild the regex object and limit which business code rows are used to build the regex object.
At this point, I'm not sure that you need to include the function in your WHERE clause, just your SET clause
In order to get the performance you need, the regex object is persisted between function invocations (Static declaration).
I'm not sure what you mean...

If this version of the function is too slow, then I can alter it, adding parameters to rebuild the regex object and limit which business code rows are used to build the regex object. 
Yes - this is what I am trying to accomplish by categorizing
Please test what I have posted.
Note: I added an index to the code column of the business codes table
I'm getting error: 'undefined function RegexMatch in expression'
I changed the name of the function, since it now returns a code/Null value.
This is PERFECT! Thank you!!
How's the performance?
Performance is an issue when criteria are specified for the code category.
  • T_BusinessCpInd: contains all records for processing 
  • q_Look_BizCpInd: shows total BizCd's already processed (outside the Function)
  • T_BusinessCodes: contains updated business codes/categories.  Hierarchy for category processing should be: Corp1, Corp2, Geo, Proper1, Proper2
  • q_BizCpInd_Master: calls the function.  It is currently a select querry with no criteria specified.  The goal here is to update the (Null) BizCd with the category generated by the function, in hierarchy order.
Thanks for having a look.

Thanks for posting.
Question: Is the ID field in the business name table used for anything?
The ID field in T_BusinessCodes is not used anywhere else
The ID field in T_BusinessCPInd is derived from the master monthly table and used in multiple other places
There are duplicate business names in the T_BusinessCPInd table.  This is about 5% of the remaining (Null code) rows that need to be processed.

I love that you've already classified 62% of the total rows.  Bravo.

I'll include some performance testing with the user function testing.
Yes, there will be duplicate names in the table to update.

I created a new table of unique business names where the associated bizcd field is null.  That gives me 671201 rows to update.

I added a Frequency column to the business codes table.

I ran routine that would get the word frequency from the unique names and updated the new Frequency column.  The "words" were contiguous non-punctuation characters.  About 1/3 of the business code table contains multiple words or punctuation characters.  I updated those frequencies in a separate query (a very inefficient/slow process).

I created a new regex matching routine (RegexMatch_Code2) that will accept a category as a (new) parameter.  The regex pattern is now based on the descending frequency of the business codes.  This places the most frequent codes at the start of the regex pattern.

I invoked the RunUpdate routine against the categories in the order you prescribed.  Here are the run times (11 mins total).

Starting Corp2	72880.76	Finished    73167.19 
Starting Corp1	73167.2 	Finished    73480.91 
Starting Corp 	73480.91	Finished    73501.72 
Starting Geo  	73501.72	Finished    73533.45 
Starting Proper2	73533.46	Finished    73554.84 
Starting Proper	73554.84	Finished    73572.56 

Open in new window


There are now 445269 rows in the UniqueNotNull table with null bizcd values.  Is this an expected number?

The UniqueNotNull table can now be used to update the business name table.
SampleDBcode.bas
Thank you.  I will review in detail over the weekend.
The 445269 unmatched/null biz codes records sounds accurate.

I've tweaked the RegexMatch_Code2() code, correcting a problem where a miss-typed category name will match business names that contain double space sequences in the business name.

I'm assuming that your data prep tasks include removing empty string business names.  If that isn't the case, please let me know.
SampleDBcode.bas
My preliminary runs assign a 'Corp1' category to records with 0 spaces, which would include those that are empty strings as well.
If you preprocess the data, assigning 'Corp1' to empty strings, then the SQL will not feed such rows to the routine.  Something seems wrong about having an entity with no name.  I usually classify such entities as 'bad' or 'invalid' or don't import them.

More Data Analysis

Here's my list of 259 business codes in the Corp2 category that have a zero frequency.  There might be some typos in the data (I know you're still working on this).  For instance, there is a "Homduras", which might be a misspelled "Honduras", which I would expect to be in the Geo category.
Aleutians
BOISSEROLLE
Boet efter
Biometric
BRANDSCAPE
ALLOCATIONS
ACTUACION
Beligique
BELGOCLEAN
Aluminate
Ambasadeur
Bankfirst
Amazonica
Agricultur
BEAUMONTEL
Bd of Ed
BANSABADELL
Automatiom
Authoriti
Anteproyectos
Bus Brewing
AFFINITAIR
Archbishop
Applicator
C.C.I.A.A
AIGLEVILLE
Place to Grow
RAPPELS G
Objectiv
Obstetric
Opthalmic
Opthalmics
Opthalmol
Optomistist
Ortopedic
PARC SOLAIRE
PETFOODS
Petunia Patch
PhosAgro
Nutriproducts
Pipe-Roll
Notefull
Pneumatiques
Polinator
PREFEREN
Preparat
Profession
Queenswood
Quiltworks
RAPPEL E
RAPPELS B
RAPPELS C
RAPPELS E
Kilometri
Piggly Wiggly
Mediterrane
L. L. L. P
L. V. M. H
LA GRAPPE
LABORATOTIES
Landsdown
Latokartanontie
LIBERTAT
LISBONNE
Lisheen Mine
Lubrikant
LUXINNOVATION
Magyarorsza
Nybropost
Mazda-Volkswagon
RAPPELS P
Meds & Food
Merchandis
Mietverw
MILITAIRE
Minicipal
Mississipps
MMC Norilsk Nickel
Motorways
Mount Hermon
Mykatrade
NatlVegetblOil
NCV-NEBON
MANTENIMIENT
Trygging
RAPPELS F
Supermarked
Symbiotic
Techmique
Tedarikcileri
Tek-Este
Tempering
Therapeut
TOP GERT
Tot Spot
Transatlant
Transmerida
Subsecret
Triathon
SUBC Prop
Upphandlingsgrupp
Valuator
Variante
VERANSTA
Verkehrsmanagement
Volkswagon
VROUWVERENIGING
WERKPLAAT
Wildhores
Windprojekt
Wohnungsbaugesellschaft
Wohnungseigent
TRES SUERTES
SCHMIEDETECHNIK
RAPPELS S
RealFoudation
Rechtsamt
Recyclery
Redcarpet
Refurbish
REIFENVERKAUF
REPOSICION
Republieke
Republiq
ressource
REVETEMENT
Suntrade
Schekinoazot
Konveyor
Scientifiq
Seaplane
Shell Fee
Sindicat
Siriushof
SNC DU PARC
Sosiaality
Soverign
Stade Koln
StadtsSport
Staritskiy syr
Streekraad
Scheepvaartondernemin
EMPRENDI
Kundengeschenke
Dociedad
Donut King
Drankwinkel
Drip Bar
Dubininskoe
Ecologic
Ecotechnolog
Ecovision
Eigentumsve
Eigentumsverwaltung
Ekaterinburgskiy
Diepenmaat
Embroider
Denso Thermal
EN FAILLIT
ENGENEER
Equensworld
EQUIPAMENT
Esquires
ESTABLECIMIENT
ESTUAIRE
Etrangere
EUR IFMS
Expectra
Extermin
Fayetteville
ELMIN BAUXITES
Communic
CAPENERGIE
CAP'ENFANTS
Caspian Flat Glass
Chiroprac
Chocolat
CHOLAT ANAST
CHOMERAC
CLASSIFIED
CLASSIFIEDS
Cloister
Cocoa Cola
COFITEM-COFIMUR
Diocesano
Commandry
Fidal R/
Composure
Comunita
CONCERTACION
CONFEDERAT
CONSORZIATI
Cornerstone-
Cryogenics
CUMERIO MED
Dbedrock
De leden
Delat tre
DeNationale
Comerico
Ironsides
Inails By Gil
INFIRMIERSwith no traffic or rain
Ingenier
Injectrade
Inn & Spa
Innovator
Insolven
INT DEPT
Inteligen
Interfed
Intermes
Intraservice
Federaco
Inzenjerin
Immigratie
Janitoral
Jardines Et
Kangourou
Kes Caves
Kindregarten
Xterminator
KOMPANIA
Kompeten
Komponent
Kompressor
Konstruk
CAMBIO RUT
Invoices
Gläubigerausschuss
Kumpulantie
Firsthand
Fjardalaz
Fonds BOIS
Fotocolor
Francise
Function
GANDON TPS
Gemeinschaftskernkraftwerk
GENEGIS 2
GENVILLE
Geotek Seismorazvedka
In Falin
Gholf Klub
IMMO VANVES
Gobelinmanufaktur
Golden Falcon
Grill & Pub
Grundstücksgemeinschaft
HB JARDINS
Helicpoter
HELIODROMUS
Hokie Spokes
Homduras
Huiseienaarsvereniging
Huisvereeniging
I.D.F.E.E.V
Feldspath
Gez. Eig

Open in new window

What is the routine you're using to determine frequency of use?  This may be faster than what I'm doing to determine if the code is still being used - which would remove those 259 codes from the table.

I can't control the typos... and yes, it is possible I may have added a misspelled word in the code table in order to capture the record.
I'm using the Bus_Name_Word_Freq() routine.  It looks at the UniqueNull table, which you will have to create from your pre-processed rows.  I would have used a dictionary object, but that isn't available on my laptop :-(

You are welcome to tweak any of the routines to best suit your needs.
Also, I'm not sure what this new routine is doing - or in what order.
This line is confusing:
Sub whole_company_names()

Open in new window

As any full names in the main table that match those in the code table are being handled prior to running this routine, assigning a 'Full' biz code and thereby excluding them from the next pass.  Anyting categorized as 'Full' in the code table should not be included in down-stream processing.
This line is also confusing:
Const busname As String = "Alabama Trusking Ascociation WC Self Incurance Fun"   

Open in new window

Why would this be included in the VBA code?  I can't worry about typos, they will jump out and be forwarded for manual editing.

Finally, if this new routine is indeed determining frequency of use within category, then I would combine Corp1, Corp2 & Geo all as Corp1, and allow your frequency of use theory to determine the order in which it wants to process them.  I would leave Proper 1 & Proper2 as they are as these will need a detailed manual review.
Why are you looking at that routine?

The const string was used as part of my regex performance testing.  There is a commented statement next to it that is the same company name, but misspelled.  That is how I compared match performance against non-match performance.
In the lower left corner of the VB code window, you should see a couple of buttons.  They change the view of the code to single routine or all routines.  I usually look at the code in single routine view.
I see, thank you.  
I have a very limited knowledge of VBA, so was just looking through the entire code you posted. RegexMAtch_Code2 is the newest routine, that will incorporate the categories, yes?
Yes.

You might want to look at RunUpdates()
There's a piece missing from the frequency calculations.  If you're going to need this frequency data, I can add the missing piece to the code base.  The missing piece looks for code matches where the code isn't a single word - containing punctuation or multiple words.

Unless you have some unique routines in your original module, you should probably save it and remove it from the current database to avoid name collisions.
Yes, I would like to see the frequency calculation that considers all the codes in the table - no exclusions.
I will remove my original module as you suggest, thanks.
I'm getting a run-time error "Too few parameters expected', breaking on this line of code:
Set rs = DBEngine(0)(0).OpenRecordset("Select Trim(Code) As trim_code from [T_BusinessCodes] Where Category Like '" & parmCategory & "' Order By Frequency Desc")

Open in new window

Does your business code table have a Frequency column? (Long Integer data type)

How are you invoking the routine?
Also, have you created and populated the UniqueNull table?
1. No, my BusinessCode table does not have a Frequency column
2. The error occurred when invoking the routine in both an update query and a select query.
3. Not sure what you mean by "have you created and populated the UniqueNull table?"  I have run the preliminary queries that populate the BizCd for certain scenarios, leaving the NULL values to be evaluated by the routine.
I added a long-integer "Frequency" field to the T_BusinessCodes table, now receive:
Run-Time error '3052':
File sharing lock count exceeded. Increase MaxLocksPerFile registry entry


on this line of code
Set rs = DBEngine(0)(0).OpenRecordset("Select Trim(Code) As trim_code from [T_BusinessCodes] Where Category Like '" & parmCategory & "' Order By Frequency Desc")
   

Open in new window

I have not populated the Frequency field....not clear on how you're doing this.
Your design work on the table, adding the Frequency column, is probably the cause of the lock condition.

Please wait for me to finish the frequency work before continuing.

Please run a make table query to create a unique null table of business names with DISTINCT values, where the bizcd field is null.  There is no need to run the matching against duplicate values.  If the names are the same, you will populate the category associated with that name when you update the business name table after your category updating work with the routine.
I interpreted your asking about the Frequency column as though the new routine was looking for it. I have removed it now and am using the older routine in the interim.

A couple of observations/questions:
1 - I'm getting run-time errors when I invoke the function as the 'update to' Value in an update query
2 - I removed all categories except Corp1, Corp2 and GEO from the T-BusinessCodes table, as I wanted to assign the records that matched these categories first. I invoked the function as an expression[TRUE] in an update query,  and updated the NULL BizCd field as 'Corp'.  There were only 434 updates which didn't seem right, so I ran it as a select query to visually see if it would pick up more records - and it did. Judging from your request above for unique business names, I am thinking this routine will only assign a code once per invocation, requiring additional invocations - is this correct?  If so, There has to be a better way, I don't want to start running subsequent queries based on Business names, this is just not efficient.

I am sending a new db via messaging so as not to post it on the public site.  This contains the partially populated BizCd field in T_BuisinessCpInd and a new T_BusinessCodes table.
I have removed it
The Frequency column or the routine?

Please allow me to correct problems before making changes.  Some of your changes are presenting me with a moving target of problems.
I'm automating the processing.  I created the following setup routine to do the following:
1. delete & create the T_BusinessCpInd_UniqueNull table
2. Ensure that the business codes table has a Frequency column
3. Ensure there's a table (T_BusinessCpInd_WordsCodes) for words and codes

The T_BusinessCpInd_WordsCodes table is a new one. Rather than directly update the frequency column in the business codes table, I realized that I should populate a table with all the results, rather than just those that matched your known codes.  This might help you create some matching codes or change the categories of some codes.

Updating the frequency column in the business codes table will now happen from the new WordsCodes table.

Sub Cat_Setup()
    'run this to set up the environment prior to categorization of business names
    '
    Dim boolExists As Boolean
    
    '=================================
    '1. delete & create the T_BusinessCpInd_UniqueNull table
    
    On Error Resume Next
    boolExists = ItemExistsWithinCollection("T_BusinessCpInd_UniqueNull", _
                                DBEngine(0)(0).TableDefs)
    If boolExists Then
        DBEngine(0)(0).TableDefs.Delete "T_BusinessCpInd_UniqueNull"
        If Err = 0 Then
            Debug.Print Now, "Successful deletion of T_BusinessCpInd_UniqueNull"
        Else
            If Err = 3265 Then
                Debug.Print Now, "T_BusinessCpInd_UniqueNull not found"
            Else
                Debug.Print Now, "Unexpected error (" & Err & ") -- " & Err.Description
            End If
            Err.Clear
        End If
    End If
    
    DBEngine(0)(0).TableDefs.Refresh
    
    DoEvents
    
    Debug.Print Now, "Start: T_BusinessCpInd_UniqueNull Make Table query"
    DBEngine(0)(0).Execute "SELECT BUSINESS_NAME_TX, BizCd " & _
                            "INTO T_BusinessCpInd_UniqueNull " & _
                            "FROM T_BusinessCpInd " & _
                            "WHERE BizCd Is Null"

    
    If Err <> 0 Then
        Debug.Print Now, "Make Table unexpected error (" & Err & ") -- " & Err.Description
        Debug.Print Now, "Stopping Cat_Setup early"
        Exit Sub
    Else
        Debug.Print Now, "End: T_BusinessCpInd_UniqueNull Make Table query"
    End If
    
    Debug.Print Now, "End: Created T_BusinessCpInd_UniqueNull with " & _
                    DBEngine(0)(0).RecordsAffected & " rows"
    
    DoEvents
        
    '=================================
    '2. Ensure that the business codes table has a Frequency column
    Debug.Print Now, "Start: T_BusinessCodes Frequency column check/reset/create"
    
    boolExists = ItemExistsWithinCollection("Frequency", _
                                DBEngine(0)(0).TableDefs("T_BusinessCodes").Fields)
    
    If boolExists Then
        DBEngine(0)(0).Execute "Update T_BusinessCodes Set Frequency = Null"
    Else
        DBEngine(0)(0).Execute "ALTER TABLE T_BusinessCodes " & _
                                "ADD COLUMN Frequency Long", dbFailOnError
        
        DBEngine(0)(0).TableDefs("T_BusinessCodes").Fields.Refresh
        
        Debug.Print Now, "Add Frequency column unexpected error (" & Err & ") -- " & Err.Description
        Debug.Print Now, "Stopping Cat_Setup early"
        Exit Sub
    
    End If
    Debug.Print Now, "End: T_BusinessCodes Frequency column check/reset/create"
    
    '=================================
    '3. Ensure there's a table (T_BusinessCpInd_WordsCodes) for words and codes
    '   and code frequency results
    '   If exists, then delete all its rows
    '   else create the table
    
    Debug.Print Now, "Start: T_BusinessCpInd_WordsCodes check/reset/create"
    boolExists = ItemExistsWithinCollection("T_BusinessCpInd_WordsCodes", _
                                DBEngine(0)(0).TableDefs)
    
    If boolExists Then
        DBEngine(0)(0).Execute "Delete * From T_BusinessCpInd_WordsCodes"
    Else
        DBEngine(0)(0).Execute "CREATE TABLE [T_BusinessCpInd_WordsCodes] (" & _
                                "WordCode CHAR, " & _
                                "Frequency INTEGER)", dbFailOnError
        
        If Err <> 0 Then
            Debug.Print Now, "Create table T_BusinessCpInd_WordsCodes unexpected error (" & Err & ") -- " & Err.Description
            Debug.Print Now, "Stopping Cat_Setup early"
            Exit Sub
        End If
        
        DBEngine(0)(0).TableDefs.Refresh
        
    End If
    Debug.Print Now, "End: T_BusinessCpInd_WordsCodes check/reset/create"
    
    DoEvents

End Sub

Open in new window

Colleen

Here is a list of the currently defined categories.  Please verify your earlier order of category application, including ALL of the categories.

Corp1
Corp2
Full
Geo
Proper1
Proper2
You might need to reopen this question.
I've finished this round of code updates and testing.
There are two new routines that should automate the setup
Cat_Setup() and Bus_Name_WordCode_Freq().  I've included the log statements from the second one.

Bus_Name_WordCode_Freq
12/9/2020 9:16:37 PM        Start: Bus_Name_WordCode_Freq
12/9/2020 9:16:37 PM        Start: T_BusinessCpInd_UniqueNull survey of words
12/9/2020 9:16:56 PM                      Progress: 10.00%
12/9/2020 9:17:22 PM                      Progress: 20.00%
12/9/2020 9:17:42 PM                      Progress: 30.00%
12/9/2020 9:18:01 PM                      Progress: 40.00%
12/9/2020 9:18:17 PM                      Progress: 50.00%
12/9/2020 9:18:32 PM                      Progress: 60.00%
12/9/2020 9:18:50 PM                      Progress: 70.00%
12/9/2020 9:19:06 PM                      Progress: 80.00%
12/9/2020 9:19:21 PM                      Progress: 90.00%
12/9/2020 9:19:35 PM                      Progress: 100.00%
12/9/2020 9:19:35 PM        End: T_BusinessCpInd_UniqueNull survey of words
12/9/2020 9:19:35 PM        Start: Update T_BusinessCpInd_WordsCodes with (word, frequency) data - 261772 words
12/9/2020 9:19:51 PM        End: Update T_BusinessCpInd_WordsCodes with (word, frequency) data
12/9/2020 9:19:51 PM        Start: T_BusinessCpInd_UniqueNull survey of codes
12/9/2020 9:21:03 PM                      Progress: 10.00%
12/9/2020 9:21:46 PM                      Progress: 20.00%
12/9/2020 9:22:30 PM                      Progress: 30.00%
12/9/2020 9:23:11 PM                      Progress: 40.00%
12/9/2020 9:23:41 PM                      Progress: 50.00%
12/9/2020 9:24:12 PM                      Progress: 60.00%
12/9/2020 9:24:44 PM                      Progress: 70.00%
12/9/2020 9:25:22 PM                      Progress: 80.00%
12/9/2020 9:26:00 PM                      Progress: 90.00%
12/9/2020 9:26:32 PM                      Progress: 100.00%
12/9/2020 9:26:32 PM        End: T_BusinessCpInd_UniqueNull survey of codes
12/9/2020 9:26:32 PM        Start: Update T_BusinessCpInd_WordsCodes with (code, frequency) data - 534 codes
12/9/2020 9:26:32 PM        End: Update T_BusinessCpInd_WordsCodes with (code, frequency) data
12/9/2020 9:26:32 PM        End: Bus_Name_WordCode_Freq
12/9/2020 9:31:29 PM        Start: Update T_BusinessCodes with frequency data
12/9/2020 9:31:30 PM        End: Update T_BusinessCodes with frequency data
12/9/2020 9:31:30 PM        End: Bus_Name_WordCode_Freq

Open in new window


There are a little less than 800 business codes that don't match the UniqueNull data.  One that caught my attention was ".COM".  Since our current rules indicate that the business code must appear after a space character or at the start of the business name, this domain name suffix would always appear after a non-space character.  You can make this a regex pattern and see if it matches.
Try "\w+.COM" or "\S+.COM"
If there are a limited number of such business codes, it might be simpler to do this with SQL, bypassing the routine.
Mod_BizCodes_Match_Function.bas
I tested this pattern and it seems to work the way you want it to:
\S{1,}.COM

Open in new window