Hello Experts,
I am working a project that requires me to remove duplicate records within a database. As a starting point I have identified the criteria by which to identify where 2 records are alike. I have then run a process within a piece of proprietary software that based upon fuzzy logic etc etc returns a list of records that identify (based upon my criteria), which records are similar. I now need to be able to group ALL similar records
In order to simplify the problem, please use the following data:
table MATERIALS (fields MATNR | ZGROUP) contains all my unique materials, as example assume it has these records
MATA |
MATB |
MATC |
MATD |
MATE |
MATF |
MATG |
MATH |
MATI |
Table DUPLICATES (MATNR1 | MATNR2) contains the relationships between materials that my de-duplication process has identified as being similar.
MATA | MATB
MATB | MATF
MATG | MATF
MATC | MATD
MATC | MATI
MATI | MATE
I need a query(s) or stored procedure that will identify all the records that are similar i.e.MATA is like MATB, MAT B is like MATF and MATG is like MAT F, therefore, MATA, MATB, MATF and MATG are all similar.
I would like table MATETRIALS and field ZGROUP to be updated with a value that allows all linked records to be grouped. The MATERIALS table would therefore look as follows:
MATA | 1
MATB | 1
MATC | 2
MATD | 2
MATE | 2
MATF | 1
MATG | 1
MATH | 3
MATI | 2
This would allow me to then present my business users with these records, grouped, together by all those that are similar and to then tell me which record should survive.
Any help with this problem would be gratefully received.
Regards
Andy
Start Free Trial