Link to home
Start Free TrialLog in
Avatar of ExpExchHelp
ExpExchHelpFlag for United States of America

asked on

Challenging Access query

Hello Experts:

I need some help with some data management (using Excel and Access).

Background on Excel File (incl. two tabs "Raw Data" and "Summary"):
- The "Raw Data" tab contains four columns [Incident Date], [Incident Time], [Age], [Gender]
- Also, for each of these four columns, I added an adjacent "Delta" column using an IF statement to compare two adjacent cells (per row).
- For example, for the [Date] colummn, in cell B2, I compare cell A2 with A3.  If date = 07/01 = 07/02, I output either "Same" or "Different".  In this case, cell B2 returns "Different".
- Alternatively, in cell B3, I compare cell A3 with A4.  If date = 07/02 = 07/02, I output either "Same" or "Different".  In this case, cell B2 returns "Same".
- The remaining formulae (in columns D, F, and H) follow the same principle as applied in column B.
- Now column I... it's using a nested IF formula.   I want to determine where columns A, C, E, G are all the "Same" (or "Different").
- In this case, out of 323 rows, 90 rows are exactly the "Same" and 233 rows equal "Different" (see tab "Summary").

Background on Access file:

- I imported the Excel data from "Raw Data" (without the "Delta" columns).   Table "00_tblRawData_323_Records" includes 323 records.
- I created a Select/Make Table query ("00_qry_233_Records") which uses the "Group" feature in the query.  Upon executing the MakeTable query, it now creates table "01_tblRawData_233_Records".  
- The records in "01_tblRawData_233_Records" should be the equivalent to the 233 rows marked "Different" in the spreadsheet.

Here's what I need some assistance with in the Access database:
- Again, the Excel file identified 90 records equal to "Same" and 233 records equal to "Different".
- Upon data import from Excel to Access, I have identified the 233 "Different" records but I have not been able to identify the 90 "Same" records.  
- I tried to use a left join query "01_LeftJoinQuery" (following concept of "which 90 records exist in the 323 records that are not in the 233 records).  However, my left join query doesn't seem to work correctly... it only produces 10 records (vs. 90 records.
- So, for a recap, in Access, I need to have query that outputs the 90 records marked "Same" in the spreadsheet.

But wait, here's more:
- Again, once the 90 records have been identified, I also want to output the "matching record(s)" in another query.   For instance, let's go back to Excel (allow me to use row numbers)...
- Cell I3 = "Same"... that was based on row A3=A4 AND C3=C4 AND E3=E4 AND G3=G4.  Again, in Excel, only I3 is being marked as "same" (counting towards the 90 records)... in reality though, row 3 matches row 4 and row 4 matches row 3... should I really want both records to be output in Access.
- It gets little bit better, in Excel, row #12 (i.e., I12) is listed as "Same"... again, I am comparing cells in row 12 against row 13.  However, row 13 also indicates "Same" (when compared against row 14).
- That means, that in reality, row 12 through row 14 have the same value (when looking at Date, Time, Age, and Gender).

All that said, I want Access to output all those records where I have at least two or three (or more) alike records.   How can that be achieved?

And, maybe, I overly complicated this through my queries.  All, what's needed is to have a data set where all duplicates are created in a table.   In fact, I included an updated Spreadsheet where I marked all record in color "blue-ish".  Please see updated XLS "RecordSet _ with actual records to be included in Access query"

Hopefully this wasn't too confusing.  If it was, I will gladly expand on the problem.    

Thank you for your help in advance,
EEH
RecordSet.xlsx
RecordSet.accdb
RecordSet-_-with-actual-records-to-.xlsx
Avatar of Dale Fye
Dale Fye
Flag of United States of America image

ExpExchHelp,

If you use the query wizard, one of the options is to create a "find duplicates" query, which will allow you to select the fields you want to use to define "duplicates' (Date, Time, Age, and Gender in your case).

But I prefer to do this on my own, with a query that look like:

SELECT yourTable.*
FROM yourTable
INNER JOIN (
SELECT [Date], [Time], Age, Gender
FROM yourTable
GROUP BY [Date], [Time], Age, Gender
HAVING Count([PKField]) > 1
) as T ON yourTable.[Date] = T.[Date]
AND yourTable.[Time] = T.[Time]
AND yourTable.[Age] = T.[Age]
AND yourTable.[Gender] = T.[Gender]
Avatar of ExpExchHelp

ASKER

Dale:

I used your structured and created a query with the following syntax

SELECT 00_tblRawData_323_Records.* FROM 00_tblRawData_323_Records INNER JOIN (SELECT [Date], [Time], [Age], [Gender] FROM 00_tblRawData_323_Records GROUP BY [Date], [Time], [Age], [Gender] HAVING Count([PKField]) > 1) as T ON 00_tblRawData_323_Records.[Date] = T.[Date] AND 00_tblRawData_323_Records.[Time] = T.[Time] AND 00_tblRawData_323_Records.[Age] = T.[Age] AND 00_tblRawData_323_Records.[Gender] = T.[Gender] 

Open in new window


Unfortunately, when executing it, I get the following syntax error: "Syntax error in query expression 'tblRawData_323_Records.[Date] = T.[Date'.

What am I missing?

EEH
In the HAVING clause, I used: HAVING Count([PKField])
you will need to change "PKField" to the name of the primary key field in the table (if you allowed Access to add one when you imported the data), or to any other field name that is guaranteed to have NON NULL values ([Date] would probably work).

SELECT 00_tblRawData_323_Records.* 
FROM 00_tblRawData_323_Records 
INNER JOIN (
SELECT [Date], [Time], [Age], [Gender] 
FROM 00_tblRawData_323_Records 
GROUP BY [Date], [Time], [Age], [Gender] 
HAVING Count([Date]) > 1) as T 
ON 00_tblRawData_323_Records.[Date] = T.[Date] 
AND 00_tblRawData_323_Records.[Time] = T.[Time] 
AND 00_tblRawData_323_Records.[Age] = T.[Age] 
AND 00_tblRawData_323_Records.[Gender] = T.[Gender] 

Open in new window

Dale:

I changed my table name to "yourTable" and modified the name of the PK... it runs... not sure as to why I got the error w/ my naming concentions.

I counted the # of "blue records" in Excel.. it was 167.   Your query also produces 167 records in Access.   That's very exciting... allow me to do little bit more validation, but I feel already very excited about the progress.

More to follow (on Monday)... 'hope you won't mind my keeping the post open for a bit longer.

Cheers,
EEH
Dale:

The query works great!    I validated the record count it is spot on.

I have one follow-up question... if preferred I will open up a new post (just let me know).

//

As a picture is worth a 1000 words, Excel provides the step-by-step process.   With your help, the first 3 steps are now easily completed.   The 4th step, however, offers a new challenge.    

Ultimately, I need to merge the "Summary" of the records pairings into a single record.   I used a semi-colon to separate summary comments but the ";" may not be required as long as I can show all n comments in the merged records.  

Is step #4 doable?

Thank you in advance,
EEH
Update.xlsx
Update.accdb
Yes, this is doable.

Take a look at the code and description at the following link.  This describes how to concatenate data from a field in multiple records into a new value.

http://theaccessweb.com/modules/mdl0004.htm

Dale
Dale:

I took a look at Dev Ashish's VBA code.    I'm not entirely certain how to tie it into my database (see attached).

That is, I need to run the code on the 2nd query (i.e., the one with the 77 records).   However, it is a GroupBy query w/o the field "Summary".  

Again, I'm not entirely following how to modify the module to make it work w/ my query.

I appreciate if you have any additional feedback/recommendations.

Thank you,
EEH
Update-with-module.accdb
OK,

I've created my own function "fnConcat" in the enclosed database.  Take a look at tbl_fnConcatTest and qry_fnConcatTest.  The function accepts arguments for FieldName, DataSource, Criteria, and Delimiter.

FieldName: this can either be a single field name "Comments" or could be multiple fields "[Comments] & ' xx ' & [SomeOther fieldname]"
Note that this must be wrapped in quotes and any quotes embedded must be doubled up or replaced with single quotes

DataSource: this is the name of the table or query you are concatenating data from

Criteria:  this is a valid criteria string, which can include concatenated values.  In sample query, I used:
"([MyDate] = #" & [T].[MyDate] & "#) AND ([MyTime] = #" & [T].[MyTime] & "#) AND ([Age] = " & [T].[Age] & ") AND ([Gender] = '" & [T].[Gender] & "')"

Open in new window

Note: if you have records with NULL values in any of the fields being used in the criteria expression, you must ensure that your criteria handles those appropriately.

Delimiter: this is simply the character or characters you want to use to delimit the strings or values being concatenated.  In my sample query, I used: " || "

Your best bet is to create an aggregate query that groups on the [Date], [Time], [Age], and [Gender] fields in your FindDuplicates query (similar to my tbl_fnConcatTest).  Then create a computed field based on my function and those values.

HTH
Dale
Concatenation.accdb
Dale:

Thank you... this looks VERY PROMISING.

For testing purposes, I have appended my 323 records into your table.    Then, I added random comments (they don't mean anything) into the comment field.  Upon execution of the query, however, I get multiple errors.   See attached DB.

You referenced NULL values.   Is this why it's throwing the error messages?   If so, how can I work around it again.    With the exception of the comments, the data from the four fields (date, time, age, gender) are valid data.

Thank you for your help thus far... I very much appreciate it.

EEH
Concatenation--2-.accdb
ASKER CERTIFIED SOLUTION
Avatar of Dale Fye
Dale Fye
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Dale:

Thank you... again, I truly appreciate your continued assistance.

The query executes without any issues.    At first glance, I thought that I had an incorrect record count.   I anticipated the record output to be around 77.   However, I then realized this query probably should display the **233** records (marked as "Different") in the original XLS but with the comments from the "Same" records.

Please allow me to do some more validation this weekend.   I believe, however, this appears to be the PERFECT SOLUTION!

I will get back w/ you this weekend, if not early Monday morning.  

Again, thousand thanks thus far!!

EEH
Dale:

Oh, forgot to ask... I have reviewed the query:

SELECT T.MyDate, T.MyTime, T.Age, T.Gender, fnConcat("Comments","tbl_fnConcatTest",
"(NZ([MyDate], #1/1/2000#) = #" & NZ([T].[MyDate], #1/1/2000#) & "#) AND (NZ([MyTime], #12:00:00#) = #" & NZ([T].[MyTime], #12:00:00#) & "#) AND (NZ([Age], 0) = " & NZ([T].[Age], 0) & ") AND (NZ([Gender], 'X') = '" & NZ([T].[Gender], "X") & "')"," || ") AS MergedComments
FROM tbl_fnConcatTest AS T
GROUP BY T.MyDate, T.MyTime, T.Age, T.Gender;

Open in new window


Could you please provide me some more information how this query works?

Thank you,
EEH
adsfad
Well, it is an aggregate query, it groups by the fields you were identifying as the fields you would use where there are duplicates.  Actually you could add a "Having" clause to the query to only return the records where there are duplicates by adding the following at the end of the query (remove the semi-colon and add this as the next line)

HAVING Count(1) > 1

So when you run this query, it is going to group by those 4 columns, and then pass the values from those 4 columns into the fnConcat function as the criteria.  The function will generate a recordset for each set of values that is passed to it, and loop through the records in that recordset to contatenate the Comments column values into a single return value.
Dale:

Thank you -- naturally, my actual data set has an additional 20+ fields that need to be output.   Adding those fields to the query, I presume, will not change the outcome of the query, correct?

Again, thanks!!
EEH
Well, how do you expect to pull in all of the values in those additional 20 fields when you have duplicates?  The way that query works is that it finds the duplicates across those 4 fields, and concatenates the Comments field from all of the records which match on those 4 fields.

If you have 20 other fields, and the data is not the same in all of them, how do you propose to decide which records or which record to pull those other 20 values from?
Would it possible to turn the current query into a MakeTable query and then, somehow, using the primary key ([ID]) to link them back to the original table with the 20 fields?
No, you are going to lose the PKID of the original records in the Group By query.  Otherwise, it will return one row for every PKID.

You could join your aggregate query back to the table it came from, linking on the [Date], [Time], Age, and Gender fields, but if you do that, what is the point of the Group By query.

Lets go back to the beginning.  What do you want to actually accomplish with this table you are importing from Excel?  Ignore your original question, what do you want to do with the data from this Excel Spreadsheet?

If you want to create a table based on the [Date], [Time], Age, and Gender, and then create another table with the details associated with each of the records created by the grouping, that is possible, sort of like an Orders and Order_Details tables.
Dale:

Thank you for your follow-up and continued willingness to assist me.  Ok, allow me to recap the process w/o going into the weeds.

Background:
- Users have access to an online system where they must report "incidents" which may have occurred.  Doing so results in an "initial" report.
- Frequently, users have to go back into the online system and provide an update to the report.
- If trained properly, users would identify the existing report and merely update the record.  
- However, quite frequently (due to lack of training) users create a new record vs. modifying the existing one.  So, in many cases, there are two (or more) records for the same incident (vs. having a single record w/ multiple updates).

Transition to my original post/question:
- Upon the data pull into a CVS/XLS format, the spreadsheet contains hundreds of incidents/records.  As described above, there are scenarios where we have multiple records for the same incident.
- With the help of your query, I now can easily identify both a) distinct records and b) alike records [based on the demographics combined with date/time].  
- Also, as you know, w/ the help of your query, I can merge all comments into a single cell/block for all the secondary/tertiary records.  
- So, HAVING THIS QUERY IS A HUGE HELP !!

Now, here's where I stand:
- For simplification purposes in my original question, I focused only on those four fields ([Date], [Time], [Age], [Gender]) during the initial post.
- However, there's additional information in the incident reports (e.g., 323 records in the provided example) that I need to "tie" to the merged data/records (i.e., 233 records).
- Some fields -- actually, I think it's closer to 70+ fields -- have the same values (based on drop-down boxes) so I believe they could be added to the fnConcat function/query.  
- At the same time, other fields may have different values (just like the comments field).  Maybe they can be merged as well.
- Being at home now , I don't have access to all the data (and, as you can imagine, some of that information is sensitive to begin with).  

My question:
- Although there are roughly 70+ fields, isn't the concept the same?   That is, for any incidents where I have repeated values across multiple records, I could add them to the "Group By" query, right?  
- And, if the data vary for the same incidents (just like the comment field), couldn't I create another merge function?   If so, can I keep using the same function (module) and simply use another field alias (rather than "MergeComments", I would use "MergeOtherField")?

Tomorrow morning, I will integrate your query into my actual data set.   I then will know more details on the specific process.   However, if you have an "work around" solution based on the information described above, I certainly would welcome your recommendation.

Thank you,
EEH
Dale:

I hope my last respond provided little more content.   Please allow me to ask a follow-up question though.

I showed the query result to another work co-worker.   Both of us were thrilled that it worked w/ the actual data set.

However, there's always a "but, ???".

Quick background before:
- Previously, the analysts manually scanned through the record set and determined which records become a "pair" or "triplet".   That was based on fields [Date], [Time], [Age], [Gender].
- I did the record count comparison between the automated procedure in Access and the (previous) manual count in Excel.  
- Turns out that there was a difference of two (2) records.
- Here's why... as part of the manual (human in the loop) check, there's another **sorting criteria** (i.e., Age, Gender, Organization).

My question:
- How should the query expression be modified from originally, 4 fields to now 3 fields?   Pls see code below (which throws a syntax error).
MergedSummary_1: 
fnConcat("Summary","01_tbl_ConcatenateSummary",
"(NZ([Date], #1/1/2000#) = #" & Nz([T].[Date],#1/1/2000#) & "#) AND 
(NZ([Time], #12:00:00#) = #" & Nz([T].[Time],#12:00:00 PM#) & "#) AND 
(NZ([Age], 0) = " & Nz([T].[Age],0) & ") AND 
(NZ([Gender], 'X') = '" & Nz([T].[Gender],"X") & "')"," || ")


MergedSummary_2: 
fnConcat("Summary","01_tbl_ConcatenateSummary",
"(NZ([Age], 0) = " & Nz([T].[Age],0) & ") AND 
(NZ([Gender], 'X') = '" & Nz([T].[Gender],"X") AND
(NZ([Organization], 'X') = '" & Nz([T].[Organization],"X") & "')"," || ")

Open in new window

MergedSummary_2:

fnConcat("Summary","01_tbl_ConcatenateSummary",
"(NZ([Age], 0) = " & Nz([T].[Age],0) & ") AND
(NZ([Gender], 'X') = '" & Nz([T].[Gender],"X") & "') AND
(NZ([Organization], 'X') = '" & Nz([T].[Organization],"X") & "')"," || ")

In the 3rd line above, you have to close out the Gender comparison, by adding in the text that is bolded above.  You should be able to simply cut and paste the text from above.
Dale:

Thank you for the additional query feedback.

You have been extremely gracious, patient, and supportive in this post.   So, I would like to close this question at this time.

In the event I run into additional stumble-blocks, I hope you won't mind my opening a new post/question and provide the cross-reference URL in this question.

Again, I would like to thank you for your assistance.   Your help has been tremendous.

EEH
Dale's help has been absolutely superb.    I had multiple follow-up questions and Dale patiently "walked" me through the process.

The solution is perfect and I couldn't have asked for better assistance.

EEH
glad that I was able to help.
Dale:

Please forgive me to asking a follow-up question.

The query (grouping by [Date], [Time], [Age], [Gender]) which concatenates "Summary" executes within 1-2 seconds.  

For testing purposes (to display data from additional fields) I added ~20 fields following the sample principle applied to "Summary".   For example, I also concatenate "Organization" and ~19 fields in the same manner.  

This query also executes within a few seconds.   HOWEVER, when changing the same query from a "Select" to a "Make Table" query, "seconds turn into minutes".   For example, the modified Select query with multiple concatenated fields executes in maybe 5 seconds.   However, when in "Make Table" mode, it takes nearly 15 minutes to execute.

Why does it take so much longer to write to a table vs. displaying the results?

Thank you,
EEH
Not sure.

You might want to save the SELECT query and then use that as the source for the make-table and see how that performs.
Dale - thank you... I just tried your recommendation.   Apparently, no change... it still takes much longer to create the table (vs. viewing the data).

Thank you,
EEH
One of the things you need to understand is that Access will return results to your screen, before the query is done running, if there are a lot of records in the result set.  So, while you may see records almost immediately, the entire recordset may not be in memory yet.

Another way to test this, with code would be to put the following in a subroutine in a regular code module and run it.  It should open the query move to the last record, and then print the time it took to run.
Dim rs as dao.recordset
debug.print Now()
set rs = currentdb.openrecordset ("yourQueryName")
rs.movelast
debug.print Now()
rs.close
set rs = nothing

Open in new window

Another test you might want to try:

- delete the records from the temp table created by the Make-table query
- convert your make-table query to an append query and append the records from the original query into the temp table.
Dale:

I hope you won't me asking another follow-up question.   Please see image below which now includes the ID number field.

Instead of concatenating by date, time, age, gender, I only want to concatenate records where the ID is the same.

I attempted to change the code from/to; however, upon execution, the query results in a error.   How should I modify the expression?

Summary: fnConcat("Summary","00000000_Testing","
(NZ([IncidentDate], #1/1/2000#) = #" & Nz([T].[IncidentDate],#1/1/2000#) & "#) AND 
(NZ([Age], 0) = " & Nz([T].[Age],0) & ") AND 
(NZ([Gender], 'X') = '" & Nz([T].[Gender],"X") & "')"
," || ")



Summary: fnConcat("Summary","00000000_Testing","
(NZ([ID], 0) = " & Nz([T].[ID],0)& "')" 
," || ")

Open in new window

Snapshot.JPG
Dale... never mind... I had an additional quote... once removed, it worked fine.


Summary: fnConcat("Summary","00000000_Testing","
(NZ([ID], 0) = " & Nz([T].[ID],0) & ")" 
," || ")

Open in new window