Solved

Counting DISTINCT Values that are in one column of SQL

Posted on 2013-06-21
19
297 Views
Last Modified: 2013-06-25
I have attached an excel file that I will refer to which is just a small portion of the table. I am trying to count data to transfer to a BI tool. I know I have 24 lines of data, but I have 9 different customers.




Count Distinct Customers        Customers Sold            Hit Ratio
                 9                                        1                     11.1% (1/9)
Carrier-Count.xlsx
0
Comment
Question by:Newbi22
  • 8
  • 4
  • 4
  • +1
19 Comments
 
LVL 15

Expert Comment

by:gplana
ID: 39267000
Try SELECT count(DISTINCT your_field) FROM your_table;
0
 

Author Comment

by:Newbi22
ID: 39267012
I need a WHERE clause because in the table there is mulitple carriers. Also the WHERE clause looks at a different table then the count
0
 
LVL 15

Expert Comment

by:gplana
ID: 39267020
No problem, this is just an example, you can add a WHERE and also some joins to other tables.
0
 

Author Comment

by:Newbi22
ID: 39267042
Sorry, I only do well when the code provide more assistance. I am a CFO trying to write SQL..... Never a good thing. Need more assistance with the code.
0
 
LVL 15

Expert Comment

by:gplana
ID: 39267050
You should do something like this:

SELECT count(DISTINCT your_field)
FROM your_table INNER JOIN your_other_table ON your_table.your_field = your_other_table.your_other_field
WHERE some_conditions_here;

Where conditions are something like: salary>1000 and name='John'
0
 

Author Comment

by:Newbi22
ID: 39267088
SELECT count(DISTINCT Client_Last_Name)
FROM NewBusinessHeader INNER JOIN NewBusinessLineItem ON NewBusinessLineItem.Carrier = NewBusinessHeader.Client_Last_Name
WHERE Carrier='Mike Insurance Co';

NOT WORKING.......

Carrier is in NewBusinessLineItem

Client_Last_Name is in NewBusinessHeader
0
 

Author Comment

by:Newbi22
ID: 39267151
I found 1 mistake between header and line item......So I fixed that and tried again. I get zero as the answer.



SELECT count(DISTINCT Client_Last_Name)
FROM NewBusinessHeader INNER JOIN NewBusinessLineItem ON NewBusinessHeader.Client_Last_Name = NewBusinessLineItem.Carrier
WHERE Carrier='QBE';
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39267466
mmm, the question asks how to count distinct values, but the real problem seems to be:
 "how do I join 2 tables - when I don't know what fields to use?"

so far you have tried, 2 versions of the same thing (just swapping the order)

INNER JOIN NewBusinessLineItem
--#1
   ON NewBusinessLineItem.Carrier = NewBusinessHeader.Client_Last_Name
--#2
   ON NewBusinessHeader.Client_Last_Name = NewBusinessLineItem.Carrier

Swapping the order will not assist and most probably neither of these fields (Carrier + Client_Last_Name) relate to each other. The correct method to join these 2 tables is much more likely to be through ID fields (not text fields).

A "Header" table and  "lineItem" table would typically be joined like this

FROM MyHeader
INNER JOIN MyLineItems ON MyHeader.ID = MyLineItems.HeaderID

SO: The solution lies in finding the correct fields to use for the join, which might end-up looking like this:

SELECT count(DISTINCT H.Client_Last_Name)
FROM NewBusinessHeader AS H
INNER JOIN NewBusinessLineItem AS L ON H.ID = L.HeaderID -- this is a guess!
WHERE L.Carrier='QBE';

Note here I have introduced aliases H and L (Header/Lineitems)
and that I have prefixed each field by an appropriate alias

If you are still struggling to identify which fields to use in the join, please provide a listing of fields from each table.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39267471
btw: IF Client_Last_Name did = Carrier, and you filter for Carrier = 'QBE'

then the count of distinct Client_Last_Name would be 1

Client_Last_Name = Carrier
QBE = QBE
count(QBE) = 1
0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 
LVL 48

Expert Comment

by:PortletPaul
ID: 39267489
perhaps this will help.

Let's assume that spreadsheet is created by an accurate query, and that result is called 'derived'.
select
  count(distinct [NB Header Client Last name])
        as Distinct_Customers
, count(distinct ( case when [Sold] = 'Yes' then [NB Header Client Last name] end ) )
        as Customers_Sold
, count(distinct ( case when [Sold] = 'Yes' then [NB Header Client Last name] end ) )
  /
  count(distinct [NB Header Client Last name])
        as Hit_Ratio
from (
           -- whatever it was you used to get the spreadsheet
         ) as derived

Open in new window

0
 
LVL 31

Expert Comment

by:awking00
ID: 39271128
select distinct count(x.a) over (partition by y.customer order by y.b) countdistinctcustomer,
count(y.b) over (partition by x.customer order by x.a) customerssold,
round((count(y.b) over (partition by x.customer order by x.a)/
 count(x.a) over (partition by y.customer order by y.b) * 100),1) hitratio from
(select customer, count(*) a from yourtable group by customer) as x,
(select customer, count(*) b from yourtable where sold = 'Yes' group by customer) as y;
0
 
LVL 31

Expert Comment

by:awking00
ID: 39271132
Note - I replaced "NB Header Client Last Name" with "customer"
0
 

Author Comment

by:Newbi22
ID: 39272354
OK....I am still having a little problem understanding. I think it is because my question and file it not the best...SO, I have uploaded a new file. Row #2 are the exact field names in the tables. There are 2 tables. NewBusinessHeader (Header) and NewBusinessLineItem (Line) both of the table names are in row 1 (shortened them for length purpose).

Both of these table are joined on NBHeaderID with a one to many. Header can only have 1 NBHeaderID where Line can have many.

Background...We are in insurance. I can have 1 customer in (Header), but send their insurance quotes to Many different insurance carriers (Line)... The quest is to find out how many "submissions" I send to each carrier, but if I count lines I get too many and I can't count just headers because the Carrier field is stored in the Line table with Customer in the Header table.
Carrier-Count.xlsx
0
 
LVL 31

Expert Comment

by:awking00
ID: 39272430
Can you show the values in the two tables separately, including the NBHeaderID? It may be as simple as substitiuting the header table for 'yourtable' in my subquery x and a join of the header and line tables for 'yourtable' in my subquery y. However, it might also require a dstinct keyword or use of an analytic function to limit the records returned since you state there can be many NBHeaderIDs in the line table.
0
 

Author Comment

by:Newbi22
ID: 39272476
As requested, I split the information into 2. Attached is a new file with a Header and Line Item Tab. The answer to my question in this file would be 9 submission (nine customers in the header table) where the Carrier is equal to Mike Insurance Co.  

This is small subset. We have probably 30+ carriers.
Carrier-Count.xlsx
0
 
LVL 31

Expert Comment

by:awking00
ID: 39272495
So do you want to see the distinct customers, soldcustomers, and hitratio
for specific carriers or a separate line for all carriers?
0
 

Author Comment

by:Newbi22
ID: 39272571
That would be ideal. The process looks like this.

1. I need to count the number of distinct customers (submission) I have sent to THAT particular carrier. (I will date range it as another where clause)

2. How many of those customers did we sell.

3. The math gives the hit ratio (#2 divided by #3)

I am sorry that I am having a problem explaining this. Right now I do in excel, but it is way to big.
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 39273480
Your sample data and the following query is available at http://sqlfiddle.com/#!3/90e33/1 I have included a sample date range filter below, but your data contains nulls in the date fields - so that is going to be a curse (perhaps you have other fields you can also use?)
SELECT
      carrier  -- this column is optional
    , count(DISTINCT ClientLastName) AS count_ClientLastName
    , count(DISTINCT CASE WHEN sold='Yes' THEN ClientLastName END) AS count_Sold
    , count(DISTINCT CASE WHEN sold='Yes' THEN ClientLastName END) * 1.0
    / count(DISTINCT ClientLastName) * 1.0 AS Hit_Ratio

FROM NewBusinessHeader AS H
INNER JOIN NewBusinessLineItem AS I ON h.NBHeaderID = I.NBHeaderID

WHERE carrier = 'Mike Insurance Co' -- sent to THAT particular carrier
    
    --AND quote = 'Yes' -- is this considered?
    AND (
         (
              I.ProcessCompleteDate >= '2012-01-01'
          AND I.ProcessCompleteDate  < '2013-01-01'
         )
          OR I.ProcessCompleteDate IS NULL -- this is going to be painful
        )

GROUP BY
      carrier
;

Open in new window

nb: I removed spaces from the column names for simplicity
0
 

Author Closing Comment

by:Newbi22
ID: 39275965
WOW!!!!!!!! I tested and got it to work in the SQL program I have. I'll have to make some adjustments to get prompts in their for Process Complete Data and Carrier. Unless you have suggestions for that.  THANK YOU !
0

Featured Post

Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

Join & Write a Comment

Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now