tawathav
asked on
Access Query Match/Categorize
Hi
Does anyone know of a way to categorize like items in Access? I am thinking this would involve a bit of VBA/SQL. For example, if I have the following :
Supplier
ABC Co
ABC
ABC Comp
ABC Company
ABC Ltd.
I would like it categorize all of the above as ABC in another field based on a 80% match.
Is something like this possible?
Thank you
Does anyone know of a way to categorize like items in Access? I am thinking this would involve a bit of VBA/SQL. For example, if I have the following :
Supplier
ABC Co
ABC
ABC Comp
ABC Company
ABC Ltd.
I would like it categorize all of the above as ABC in another field based on a 80% match.
Is something like this possible?
Thank you
ASKER
hmmmm.....good point. Any other suggestions? Also, does anyone have any sample code?
Thank you so much! I am working on something that has over 1 mil records and so I need to do something other than manual.
Thanks again
Thank you so much! I am working on something that has over 1 mil records and so I need to do something other than manual.
Thanks again
Do you have any other fields (address, telephone #) that could be used to help refine the problem?
I assume that you are attempting to try to identify all of the "duplicates" so that you only have a single record for each supplier, not multiples with slightly different configurations?
I do something similar to what I described above when someone is creating a new user account in many of my databases. I pass in the users name and parse that value and the users name fields from my database and if there are any matches, then I generally use a pop-up form with a listbox to display the potential matches. I allow the user to view any of the accounts presented and either select from among the possible matches presented, or create a new account (or cancel the operation). But to compare over 1M records to one another and identify all of the possible combinations is daunting.
However, if you make the assumption (to start with) that you only want to use the first word in each supplier name, it would at least give you a head-start. Something like:
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM yourTable as T1
INNER JOIN yourTable as T2
ON Left(T1.Supplier, instr(T1.Supplier & " ", " ") - 1) = Left(T2.Supplier, instr(T2.Supplier & " ", " ") - 1)
ORDER BY T1.Supplier, T2.Supplier
Don't know what you want to do with that, but it would be a start.
I assume that you are attempting to try to identify all of the "duplicates" so that you only have a single record for each supplier, not multiples with slightly different configurations?
I do something similar to what I described above when someone is creating a new user account in many of my databases. I pass in the users name and parse that value and the users name fields from my database and if there are any matches, then I generally use a pop-up form with a listbox to display the potential matches. I allow the user to view any of the accounts presented and either select from among the possible matches presented, or create a new account (or cancel the operation). But to compare over 1M records to one another and identify all of the possible combinations is daunting.
However, if you make the assumption (to start with) that you only want to use the first word in each supplier name, it would at least give you a head-start. Something like:
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM yourTable as T1
INNER JOIN yourTable as T2
ON Left(T1.Supplier, instr(T1.Supplier & " ", " ") - 1) = Left(T2.Supplier, instr(T2.Supplier & " ", " ") - 1)
ORDER BY T1.Supplier, T2.Supplier
Don't know what you want to do with that, but it would be a start.
ASKER
I would like to start with using the first word. Mostly the suppliers are global so the first word is usually similiar or at least parts are. However, after the first word, you might have the country name or an abbrev or something.
I have manually been looking up strings using contains and manually putting the parent supplier name.
I will start with the query you wrote above and see what happens.
I am open to ALL ideas. I just need to do this fairly fast and simple
I have manually been looking up strings using contains and manually putting the parent supplier name.
I will start with the query you wrote above and see what happens.
I am open to ALL ideas. I just need to do this fairly fast and simple
ASKER
Also I keep getting an error with the above query stating the join is incorrect.
You would not be able to see the JOIN in the query design window, just in the SQL view, but you could also try:
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM yourTable as T1, yourTable as T2
WHERE Left(T1.Supplier, instr(T1.Supplier & " ", " ") - 1) = Left(T2.Supplier, instr(T2.Supplier & " ", " ") - 1)
AND T1.SupplierID <> T2.SupplierID
ORDER BY T1.Supplier, T2.Supplier
That should identify the left most "word" in the [Supplier] field for each record in the tables. I added a criteria to the WHERE clause which rules out the possibility that the two values are from the same record.
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM yourTable as T1, yourTable as T2
WHERE Left(T1.Supplier, instr(T1.Supplier & " ", " ") - 1) = Left(T2.Supplier, instr(T2.Supplier & " ", " ") - 1)
AND T1.SupplierID <> T2.SupplierID
ORDER BY T1.Supplier, T2.Supplier
That should identify the left most "word" in the [Supplier] field for each record in the tables. I added a criteria to the WHERE clause which rules out the possibility that the two values are from the same record.
ASKER
The query just keep locking up my laptop. Maybe I could make 2 column using the left formula and than compare so it doesnt get locked up.
You are trying to run a query that generates 1M * 1M (1 trillion) row combinations (every row against every other row, and then do these comparisons. Try:
SELECT SupplierID, Supplier, Address, Phone, Left(Supplier, instr(Supplier & " ", " ") - 1) as LeftWord
FROM your table
Then save that as query1. Then use something like:
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM query1 as T1
INNER JOIN query1 T2
ON T1.LeftWord = T2.LeftWord
WHERE T1.SupplierID <> T2.SupplierID
ORDER BY T1.Supplier, T2.Supplier
If that gives you problems as well, post your SQL string and we will see if we cannot get this working.
SELECT SupplierID, Supplier, Address, Phone, Left(Supplier, instr(Supplier & " ", " ") - 1) as LeftWord
FROM your table
Then save that as query1. Then use something like:
SELECT T1.Supplier, T1.Address, T1.Phone, T2.Supplier, T2.Address, T2.Phone
FROM query1 as T1
INNER JOIN query1 T2
ON T1.LeftWord = T2.LeftWord
WHERE T1.SupplierID <> T2.SupplierID
ORDER BY T1.Supplier, T2.Supplier
If that gives you problems as well, post your SQL string and we will see if we cannot get this working.
ASKER
The query ran, however, it says too many records to display.
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
commonality of words (whole or partial) when broken by spaces?
Would "BCD Comp" be a "match" for "ABC Comp"?
depending on the number of suppliers, you could write a function which you pass two values (Suppliers) and then split those values into their parts on the space character, then analyze to determine whether one or more of the comparison words is a match and have the function return a boolean value. But if you did that "BCD Comp" would match "ABC Comp" because os the "Comp" match.