?
Solved

De-duplication query on Access (SQL) database

Posted on 2013-01-17
11
Medium Priority
?
541 Views
Last Modified: 2013-01-20
Hi,

I have a Access database that I need some help with (my Access and SQL skills are pretty basic). I have a single table containing information about documents. Including which other documents in the table are duplicates of it.

My objective is to create a list of unique documents, ie. All the documents that have no duplicates (truly unique) along with only one document for each set (2 or more) duplicates. This is further complicated by the marking of duplicates not including all (in direct) duplicates.

The marking of duplicates does not extend to every duplicate, eg.
Filename      Duplicate
Document A      Document B
Document B      Document A
Document B      Document C
Document C      Document B
Document A has been marked as a duplicate of Document B. Document B is marked as a duplicate of Document C. So Document A and C and also duplicates, but are not directly marked as duplicates.

Currently my information is stored in a table with this layout:
Filename      Duplicate1      Duplicate2      Duplicate3      Duplicate4      Duplicate5
DocumentA      DocumentB      DocumentH                                                            
                              
Where no duplicate could be identified the respective Duplicate 1-5 fields are blank.

This may not be the best way to store this information and I welcome any suggestions to restructure the table or implement additional tables.

The output I require is a list of unique documents, ie. No document in the list will have a duplicate (direct or indirect) in the list.

Many thanks in advance,
0
Comment
Question by:Roger Adams
  • 7
  • 4
11 Comments
 
LVL 31

Expert Comment

by:hnasr
ID: 38787168
Upload an access database with such info in relevant tables. Then list the expected output.
0
 

Author Comment

by:Roger Adams
ID: 38787279
Please see DB attached.

The output I require is a list of reference numbers for unique documents, ie.  Documents that that no duplicates and are not the duplicate of any other document, along with one document from each set (possibly more than one document) of duplicates.

I hope this is clear, thank you for your help.
Database21.accdb
0
 
LVL 31

Expert Comment

by:hnasr
ID: 38787307
Thanks! Back in 2 hours.
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 31

Expert Comment

by:hnasr
ID: 38788354
Hi  r0cky07,
There are many records in table.
Please limit the number of records to a few which you can process the required output by hand, then list the output.
Upload the new table.
0
 

Author Comment

by:Roger Adams
ID: 38792046
Please see a mockup table attached. Finding an example set from the original table would be too time consuming so I created a mock up to show what needs to be done.

The unique list output for this example should be:
J001
J006
J007

I hope this illustrates what I am trying to do, please feel free to ask any questions.
Database22.accdb
0
 
LVL 31

Expert Comment

by:hnasr
ID: 38792638
Try this.
New form with a command button to normalized data in new table table1(fld1, fld2)
Query to list unique values>

Open form Form1
Click Find Unique

Query: query_unique_list
SELECT table1.fld1
FROM table1
WHERE (((table1.fld1) Not In (SELECT fld2 FROM table1 x where x.fld2=table1.fld1)))
GROUP BY table1.fld1;

Private Sub cmdFindUnique_Click()
    Dim r1s As DAO.Recordset
    Dim rs2 As DAO.Recordset
    Set rs1 = CurrentDb.OpenRecordset("All_Documents_Question")
    DoCmd.SetWarnings False
    DoCmd.RunSQL "Delete from table1"
    Set rs2 = CurrentDb.OpenRecordset("table1")
    While Not (rs1.EOF)
        For i = 2 To rs1.Fields.Count - 1
            rs2.AddNew
            rs2(0) = rs1(1)
            If Not IsNull(rs1(i)) Then
                rs2(1) = rs1(i)
            End If
            rs2.Update
        Next
       
       
        rs1.MoveNext
    Wend
    DoCmd.SetWarnings True
    DoCmd.OpenQuery "query_unique_list"
End Sub
Database22-Q-27998802.accdb
0
 
LVL 31

Expert Comment

by:hnasr
ID: 38793344
Please check sample database!
0
 

Author Comment

by:Roger Adams
ID: 38793378
Hi, thank you very much. You are definitely on the right track!

The output of the sample database is missing one entry: J007 (or J008)
They are duplicates, so one of them should be included in the end product, the unique list.

Does that make sense?
0
 
LVL 31

Expert Comment

by:hnasr
ID: 38793538
"Does that make sense?"

Looks like a redundant entry. If you remove J008 from duplicate entry you get the required output.
fld1
J001
J006
J007

Check your requirements and comment.
0
 
LVL 31

Accepted Solution

by:
hnasr earned 2000 total points
ID: 38794029
Modified version to handle extra records in original table.
The previous version is very slow in running the query.
Database23-Q-27998802.accdb
0
 

Author Comment

by:Roger Adams
ID: 38794036
That looks fantastic!

I shall error check over the weekend and revert, excellent skills thought *bow*
0

Featured Post

Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This month, Experts Exchange sat down with resident SQL expert, Jim Horn, for an in-depth look into the makings of a successful career in SQL.
This holiday season, we’re giving away the gift of knowledge—tech knowledge, that is. Keep reading to see what hacks, tips, and trends we have wrapped and waiting for you under the tree.
Visualize your data even better in Access queries. Given a date and a value, this lesson shows how to compare that value with the previous value, calculate the difference, and display a circle if the value is the same, an up triangle if it increased…
Have you created a query with information for a calendar? ... and then, abra-cadabra, the calendar is done?! I am going to show you how to make that happen. Visualize your data!  ... really see it To use the code to create a calendar from a q…

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question