Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

Count Distinct Records by Address MySQL

Posted on 2014-11-18
5
323 Views
Last Modified: 2014-12-04
I have a database where I know that I may have more than one record present for a given Address, Individual, or Surname+Address (HOUSEHOLD)

The data in the table is standardized. Fields include :
EMAIL      
FIRST NAME
LAST NAME
ADDRESS
CITY      
STATE      
ZIPCODE
URL
CODE

I need to be able to run a query by either URL, CODE, State, AND ZIP
For Example
WHERE URL LIKE '%motorcycle%'
and STATE in ('mi', 'oh')
GROUP BY STATE

Open in new window


or
WHERE CODE IN ('abcd', 'xyz')
GROUP BY STATE

Open in new window


But I need to return a count or even pull the data based on ONE record per Household (household being defined as : Surname, Address, City, State, ZIP), or one per Address (Address being defined as : Address, City, State, ZIP)

How would I do this in combination with the other selects
0
Comment
Question by:FirstDirect
5 Comments
 
LVL 34

Expert Comment

by:James0628
ID: 40451989
The obvious solution would seem to be to group by the required fields.  If you want one record per "household", group by Surname, Address, City, State and ZIP.  If you want a count of the records for each "household", include Count in the query (I'm assuming that MySQL has a Count function, or something similar).

 I'm not sure how the two different Where's (checking URL and STATE, or checking CODE), or the two different "groups" (Household or Address) fit in.  Is the user going to be selecting which fields they're testing and which "group" they're looking for?  Are you trying to handle all of those possibilities in one query?

 James
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 500 total points
ID: 40452368
If your columns are always filled with data then you could group by Address and zip only because that would make a unique combination. That would do a count by address. You could actually do this:
SELECT 
	ADDRESS,
	ZIPCODE,
	COUNT(*) as Count_ALL,
	COUNT(DISTINCT `LAST NAME`) as Count_Households
FROM 
	YoutTable
WHERE
	...
GROUP BY
	ADDRESS,
	ZIPCODE

Open in new window

Or you could do this:
SELECT 
	ADDRESS,
	CITY,      
	STATE,     
	ZIPCODE,
	COUNT(*) as Count_ALL,
	COUNT(DISTINCT `LAST NAME`) as Count_Households
FROM 
	YoutTable
WHERE
	...
GROUP BY
	ADDRESS,
	CITY,      
	STATE,     
	ZIPCODE

Open in new window

But if the addresses are not accurate(different spelling for the same address/city/state) could be a problem.
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 40452680
We have had to deal with a very similar issue in our constituent management app.  The problem boils down to two things:
There are an infinite number of ways to write a single address.  Between typos, non-standard abbreviations, and just plain wrongness, you can never be sure that any two addresses point to the same place.  We work around this (to a degree) with an in-house application which leverages USPS official address lists and standardization techniques.  Every address we receive is initially sent through this app for formatting and term standardization, and verification of physical existence (according to the USPS, anyways).
Your firm definition of "household" is a good thing, but a bit complicated to be using "on the fly", so to speak.  It might be better to abstract this information to a "household identifier" table.  For example, you could move any address-specific fields in YourTable to AddressIDTable, replacing them with the ID field from the new lookup.  This would allow for a much simpler comparison to decide if two people are living at the same address - if the IDs match, then yes.

Hope this helps!
0
 

Author Comment

by:FirstDirect
ID: 40452758
Zberteoc

Thank you for your response I will try that. The examples is what I was looking for.

Steve Bink
Very helpful suggestions. I will look into this.

James0628
Although I appreciate your response to this question. I do realize that if I want one per household that it is a combination of a few fields. What I did not know how to do is create the correct query.  To answer your question. We are searching by URL or Code to select specific people in the table by what code has been assigned to them or by the URL they came in on.  Its one thing to simply write the query for that, but another to limit the results to one per Address or one per Household in combination with the other.
0
 
LVL 34

Expert Comment

by:James0628
ID: 40453236
Since you were already using Group By, adding more fields to that seemed straightforward enough (which made me think that there might be more to the question).

 As for the counts, I was originally thinking of counting the records within each Household/Address (eg. to identify the Households that have multiple records).  If you want to count the number of unique Households/Addresses, you can combine the fields that make up a Household/Address and do a count on that.  For example, for a Household count, assuming that ZIP is a numeric field, it might be something like

COUNT (DISTINCT Surname, Address, City, State, CAST (ZIP AS CHAR)) AS Count_Households

 James
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on. …
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question