Challenging Access query

Hello Experts:

I need some help with some data management (using Excel and Access).

Background on Excel File (incl. two tabs "Raw Data" and "Summary"):
- The "Raw Data" tab contains four columns [Incident Date], [Incident Time], [Age], [Gender]
- Also, for each of these four columns, I added an adjacent "Delta" column that uses an IF statement to compare each cell with the cell directly below it.
- For example, for the [Date] column, in cell B2, I compare cell A2 with A3 and output either "Same" or "Different". Here A2 = 07/01 and A3 = 07/02, so cell B2 returns "Different".
- Likewise, in cell B3, I compare cell A3 with A4. Here A3 = 07/02 and A4 = 07/02, so cell B3 returns "Same".
- The remaining formulae (in columns D, F, and H) follow the same principle as applied in column B.
- Now column I... it uses a nested IF formula. It returns "Same" only when the values in columns A, C, E, and G all match the next row; otherwise it returns "Different".
- In this case, out of 323 rows, 90 rows are exactly the "Same" and 233 rows equal "Different" (see tab "Summary").
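
To make the spreadsheet logic above concrete, here is a rough Python sketch of what the nested IF in column I does. The sample rows are made up (they are not the actual 323 records), and the function name `adjacent_flags` is hypothetical. The sketch also assumes the last row, which has nothing below it, counts as "Different".

```python
from typing import List, Tuple

# (Incident Date, Incident Time, Age, Gender)
Row = Tuple[str, str, int, str]

def adjacent_flags(rows: List[Row]) -> List[str]:
    """Flag a row "Same" if all four fields equal the next row's, else "Different"."""
    flags = []
    for i, row in enumerate(rows):
        # The last row has no row below it; assume it is flagged "Different".
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        flags.append("Same" if row == nxt else "Different")
    return flags

# Made-up sample data, in the same order as it would appear in the sheet.
sample = [
    ("07/01", "08:00", 30, "M"),
    ("07/02", "09:15", 41, "F"),
    ("07/02", "09:15", 41, "F"),
    ("07/03", "10:30", 25, "M"),
]
flags = adjacent_flags(sample)
```

Note that a run of three identical rows yields two "Same" flags, so the number of "Same" flags is not the same as the number of records that have a duplicate.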

Background on Access file:
- I imported the Excel data from "Raw Data" (without the "Delta" columns). Table "00_tblRawData_323_Records" includes 323 records.
- I created a Select/Make Table query ("00_qry_233_Records") which uses the "Group" feature in the query. Upon executing the MakeTable query, it now creates table "01_tblRawData_233_Records".
- The records in "01_tblRawData_233_Records" should be equivalent to the 233 rows marked "Different" in the spreadsheet.

Here's what I need some assistance with in the Access database:
- Again, the Excel file identified 90 records equal to "Same" and 233 records equal to "Different".
- Upon data import from Excel to Access, I have identified the 233 "Different" records but I have not been able to identify the 90 "Same" records.
- I tried to use a left join query "01_LeftJoinQuery" (following the concept of "which 90 records exist in the 323 records that are not in the 233 records"). However, my left join query doesn't seem to work correctly... it only produces 10 records (vs. 90 records).
- So, for a recap, in Access, I need to have a query that outputs the 90 records marked "Same" in the spreadsheet.

But wait, here's more:
- Again, once the 90 records have been identified, I also want to output the "matching record(s)" in another query. For instance, let's go back to Excel (allow me to use row numbers)...
- Cell I3 = "Same"... that was based on A3=A4 AND C3=C4 AND E3=E4 AND G3=G4. Again, in Excel, only I3 is being marked as "Same" (counting towards the 90 records)... in reality though, row 3 matches row 4 and row 4 matches row 3... so I really want both records to be output in Access.
- It gets a little bit better: in Excel, row #12 (i.e., I12) is listed as "Same"... again, I am comparing cells in row 12 against row 13. However, row 13 also indicates "Same" (when compared against row 14).
- That means that, in reality, rows 12 through 14 have the same values (when looking at Date, Time, Age, and Gender).

All that said, I want Access to output all those records where I have at least two or three (or more) alike records. How can that be achieved?

And maybe I overly complicated this through my queries. All that's needed is a data set where all duplicates are collected in a table. In fact, I included an updated spreadsheet where I marked all such records in a "blue-ish" color. Please see the updated XLS "RecordSet _ with actual records to be included in Access query".

Hopefully this wasn't too confusing. If it was, I will gladly expand on the problem.

Thank you for your help in advance,
EEH
RecordSet.xlsx
RecordSet.accdb
RecordSet-_-with-actual-records-to-.xlsx

Dale Fye

ExpExchHelp,

If you use the query wizard, one of the options is to create a "find duplicates" query, which will allow you to select the fields you want to use to define "duplicates" (Date, Time, Age, and Gender in your case).

But I prefer to do this on my own, with a query that looks like:

SELECT yourTable.*
FROM yourTable
INNER JOIN (
    SELECT [Date], [Time], Age, Gender
    FROM yourTable
    GROUP BY [Date], [Time], Age, Gender
    HAVING Count([PKField]) > 1
) AS T ON yourTable.[Date] = T.[Date]
    AND yourTable.[Time] = T.[Time]
    AND yourTable.[Age] = T.[Age]
    AND yourTable.[Gender] = T.[Gender]
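
If you want to sanity-check the idea behind this query outside Access, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for Access (the SQL dialects differ in places, although this particular query runs in both), and the sample rows are made up. `yourTable` and `PKField` are the same placeholders as in the query above; in your table the fields would be [Incident Date] and [Incident Time].

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "PKField INTEGER PRIMARY KEY, [Date] TEXT, [Time] TEXT, Age INTEGER, Gender TEXT)"
)

# Made-up sample rows: one pair of duplicates and one group of three.
sample = [
    ("07/01", "08:00", 30, "M"),
    ("07/02", "09:15", 41, "F"),
    ("07/02", "09:15", 41, "F"),
    ("07/03", "10:30", 25, "M"),
    ("07/04", "11:00", 52, "F"),
    ("07/04", "11:00", 52, "F"),
    ("07/04", "11:00", 52, "F"),
]
conn.executemany(
    "INSERT INTO yourTable ([Date], [Time], Age, Gender) VALUES (?, ?, ?, ?)",
    sample,
)

# Same shape as the query above: join the table to the groups that occur more than once.
query = """
SELECT yourTable.*
FROM yourTable
INNER JOIN (
    SELECT [Date], [Time], Age, Gender
    FROM yourTable
    GROUP BY [Date], [Time], Age, Gender
    HAVING Count(PKField) > 1
) AS T ON yourTable.[Date] = T.[Date]
     AND yourTable.[Time] = T.[Time]
     AND yourTable.Age = T.Age
     AND yourTable.Gender = T.Gender
ORDER BY yourTable.PKField
"""
duplicates = conn.execute(query).fetchall()
duplicate_ids = [row[0] for row in duplicates]
```

Every member of each duplicate group comes back, so a pair returns both rows and a run of three returns all three. One caveat: a join on equality never matches Null to Null, so records with a Null in any of the four fields would not be returned by this approach.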