asked on
Query Help - possible VBA solution? Array?
I need help with an Access query that checks a 'Business_Name' field in a table with approximately 1.3m records, against a table of about 300 corporate identifier codes.
My query provides results, but they are not complete. I've included both tables (unjoined) and I'm using an InStr function to get these results. I have a feeling there is a solution for this but that it will involve VBA, which I am NOT proficient in!
I've provide my SQL below. FYI - [business_name_tx] is in table "T_UCVI" and [Code] is in table "T_BusinessCodes"
My current SQL:
SELECT DISTINCTROW T_UCVI.ID, T_UCVI.BUSINESS_NAME_TX, InStr(1,[business_name_tx]
FROM T_BusinessCodes, T_UCVI
GROUP BY T_UCVI.ID, T_UCVI.BUSINESS_NAME_TX, InStr(1,[business_name_tx]
HAVING (((InStr(1,[business_name_
ORDER BY T_UCVI.BUSINESS_NAME_TX;
Thanks for all help!
ASKER
I was referring to my query where I included both tables - I did not attach anything here - provided the sql
If some of the names start with spaces, run an update query first to clean up that mess.
What is missing? We can't see your results. We have no idea what your data looks like so we have no idea what might be wrong with your query.
I did not attach anything here - provided the sql
Not useful enough for us to help you - we aren't psychic.
Without sample data is worse than guessing...Your data could be everything..
One thing is certain...the Instr is probably a performance killer...
Hi Coleen
As the others had said, it's a bit difficult to say if there is a better way without seeing an example of the data from both tables. Just a few rows from each would be helpful.
With that said:
a. You never want a query without a join if you can avoid it. Even a join using a WHERE expression is better than nothing.
b. HAVING is always applied after the records are fetched. Always try and put any restrictions on the data as a WHERE.
Jim.
My query provides results, but they are not complete.How do you know that?
ASKER
T_BusinessCodes: Contains 500+ scenarios of potential codes/phrases that would indicate a business name
T_UCVI: Contains a subset of Business_Names
q_CpChk: is the query I am intersted in optimizing
q_delta: displays those records that should have been captured in the q_CpChk query
The ultimate goal is to identify Business Names from Individual's Names. It is understood this is a first-pass with manual reviews being done on the results, however this should capture more than 90% of the business names.
Query q_Cpchk is capturing most instances. The reords it doesn't capture (q_delta) contain messy data, i.e. they contain a phrase from T_BusinessCodes in the business_name, but they are not preceeded with, or followed by spaces. Perhaps I should be using something other than InStr?
I know I should never have unjoined tables in a query, I am concerned about crashing though when using 'real data', which is over 1.5m records/ month. I have in my head this may a good use for any array - but thats as far as my knowledge base goes!
Thanks in advance.
EE_Sample.accdb
Add a module to your database and paste the following routine:
Option Explicit
Public Function PartialCodeMatch(ByVal parmBusinessCode As Variant) As Boolean
Dim vParts As Variant
Dim vItem As Variant
If IsNull(parmBusinessCode) Then
PartialCodeMatch = False
Exit Function
End If
vParts = Split(parmBusinessCode, " ")
For Each vItem In vParts
If IsNull(DLookup("Code", "[T_BusinessCodes]", "code='" & Replace(vItem, "'", "''") & "'")) Then
Else
PartialCodeMatch = True
Exit Function
End If
Next
PartialCodeMatch = False
End Function
Run this query. It took about 10 seconds on your sample database.
SELECT T_UCVI.ID, T_UCVI.BUSINESS_NAME
FROM T_UCVI
WHERE PartialCodeMatch([T_UCVI].[BUSINESS_NAME])=True
ORDER BY T_UCVI.ID;
Add this routine to your module.
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
Static oRE As Object
Dim rs As Recordset
Dim strCodes As String
If oRE Is Nothing Then
Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
Do Until rs.EOF
strCodes = strCodes & "|" & Trim(rs!Code)
rs.MoveNext
Loop
Set oRE = CreateObject("vbscript.regexp")
oRE.Global = False
oRE.ignorecase = True
oRE.pattern = "\b(?:" & Mid(strCodes, 2) & ")\b"
End If
If IsNull(parmBusinessCode) Then
RegexMatch = False
Exit Function
End If
RegexMatch = oRE.test(parmBusinessCode)
End Function
Test this query.
SELECT T_UCVI.ID, T_UCVI.BUSINESS_NAME
FROM T_UCVI
WHERE RegexMatch([T_UCVI].[BUSINESS_NAME])=True
ORDER BY T_UCVI.ID;
This solution produces more matches than the first one I posted. You will need to test and validate the results.
ASKER
Out of curiosity, how many matches does your second solution produce?
My current query q_CpChk produces 4124 results out of 4131 records. If your solution produces fewer than this, it is not helpful.
As mentioned, the 7 records my query does not produce (q_delta) contain a valid business code within the name, but InStr is not picking them up.
Did you save the module after posting the code?
The second produces 3425 rows. The first solution produces 3333 rows.
Please invoke that second solution and post a row that your solution finds that neither of my solutions returns.
ASKER
You can see the results of my query by running q_CpChk
Since I can't run your query, I cannot provide the delta for you
Since I can't run your queryYou are receiving a Undefined Function RegexMatch in expression?
ASKER
ASKER
2. Are you looking for exact word matches?
In your original example, rows 4-5 match the "Usa" code, but that only matches the start of the word in the business name. Is that what you really need? I have been trying to match the entire word.
BUSINESS_NAME
**************N├O USAR******************
*NON Usare
==========================
Tweaking the regex pattern to only match the leading characters of the word returns 4124 rows.
Please test the attached database.
EE_Sample--1-.accdb
ASKER
I'll have a look at your database tomorrow, although it sounds as if your solution is just producing the same results as my original query - and it is still not capturing the 7 other records,.
And for clarification how many T_BusinessCodes you expect to have ...if they are just 500 maybe things could get some speed.
I'm not clear how this could be misunderstood.For me, it isn't a matter of "understanding", rather it is a matter of "why would someone do such a thing in the first place". I'm sure that you or your client has a reason.
ASKER
Your last sample captures as many records as I believe we can expect -and- it is much faster than my query - great!
The two records it does not capture are:
4128. name S.R.L
4130. name Sαrl aux mains de Mr name
Record 4128 has a punctuation error - missing the last ".". I can accommodate for this in the T_BusinessCodes table - not a big deal.
Record 4130 contains the word "Sarl" but it's not being picked up. I'm thinking the "a" is being interpreted as another character based on the font for that one character. The input for this data can be from anywhere in the world, so there is a possibility some characters don't translate properly I suppose.
Bottom line - this is a much faster and more efficient solution than what I had, thank you.
Public Function RegexMatch(ByVal parmBusinessCode As Variant) As Boolean
Static oRE As Object
Dim rs As Recordset
Dim strCodes As String
If oRE Is Nothing Then
Set rs = DBEngine(0)(0).OpenRecordset("Select Code from [T_BusinessCodes]")
Do Until rs.EOF
strCodes = strCodes & "|" & Replace(rs!Code, ".", "\.")
rs.MoveNext
Loop
Set oRE = CreateObject("vbscript.regexp")
oRE.Global = False
oRE.ignorecase = True
oRE.pattern = Mid(strCodes, 2)
End If
If IsNull(parmBusinessCode) Then
RegexMatch = False
Exit Function
End If
RegexMatch = oRE.test(parmBusinessCode)
End Function
==========================
You are correct that the business name in row 4130 is not in the standard ASCII set. It is an extended ASCII character with a value of 225.
ASKER
I beleive I understand what what this line of code is doing, but would you please clarify for me?
Thanks also for the explanation on the extended ASCII character set. This is not a huge issue at this time, so I will leave it for now.
It is possible I may revisit it in the future though.
Replace(rs!Code, ".", "\.")
In the above piece of code, I'm replacing every period character with two characters ("\."), a backslash followed by a period character.In regex patterns, the backslash character is an escape for the character that follows it. This is necessary in cases where the character we want to match is one of those we use for regex pattern definitions. For instance, if we wanted to match parentheses characters or square bracket characters, we would precede them with a backslash character.
==========================
If you need to match the extended ASCII set, you would need to change the (mostly) vowels in the codes to accommodate differently accented vowels in the 192-255 range. Even though the regex object I'm instantiating ignores case, it won't address this (extended ASCII) character encoding issue. Here are the extended Latin letter characters (reference: https://www.ascii-code.com/ ).
Latin capital letter A with grave
Latin capital letter A with acute
Latin capital letter A with circumflex
Latin capital letter A with tilde
Latin capital letter A with diaeresis
Latin capital letter A with ring above
Latin capital letter AE
Latin capital letter C with cedilla
Latin capital letter E with grave
Latin capital letter E with acute
Latin capital letter E with circumflex
Latin capital letter E with diaeresis
Latin capital letter I with grave
Latin capital letter I with acute
Latin capital letter I with circumflex
Latin capital letter I with diaeresis
Latin capital letter ETH
Latin capital letter N with tilde
Latin capital letter O with grave
Latin capital letter O with acute
Latin capital letter O with circumflex
Latin capital letter O with tilde
Latin capital letter O with diaeresis
Latin capital letter O with slash
Latin capital letter U with grave
Latin capital letter U with acute
Latin capital letter U with circumflex
Latin capital letter U with diaeresis
Latin capital letter Y with acute
Latin capital letter THORN
Latin small letter sharp s - ess-zed
Latin small letter a with grave
Latin small letter a with acute
Latin small letter a with circumflex
Latin small letter a with tilde
Latin small letter a with diaeresis
Latin small letter a with ring above
Latin small letter ae
Latin small letter c with cedilla
Latin small letter e with grave
Latin small letter e with acute
Latin small letter e with circumflex
Latin small letter e with diaeresis
Latin small letter i with grave
Latin small letter i with acute
Latin small letter i with circumflex
Latin small letter i with diaeresis
Latin small letter eth
Latin small letter n with tilde
Latin small letter o with grave
Latin small letter o with acute
Latin small letter o with circumflex
Latin small letter o with tilde
Latin small letter o with diaeresis
Latin small letter o with slash
Latin small letter u with grave
Latin small letter u with acute
Latin small letter u with circumflex
Latin small letter u with diaeresis
Latin small letter y with acute
Latin small letter thorn
Latin small letter y with diaeresis
Both the standard and extended alternatives would need to be placed into the code we use for the pattern.
Example:
Code = Sarl
matching regex pattern = S[aÀÁÂÃÄÅàáâãäå]rl
As I stated in an earlier comment, it can be done but it isn't trivial.
ASKER
Thanks also for your explanation regarding the line of code:
I thought you were trying to capture record 4128 which was missing the last period. But if it's not there, how can you test for it, right?In regex patterns, the backslash character is an escape for the character that follows it. This is necessary in cases where the character we want to match is one of those we use for regex pattern definitions.
All good. Thanks again for your help and insight.
I don't see any attachments.