• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 324
  • Last Modified:

Counting DISTINCT Values that are in one column of SQL

I have attached an excel file that I will refer to which is just a small portion of the table. I am trying to count data to transfer to a BI tool. I know I have 24 lines of data, but I have 9 different customers.




Count Distinct Customers        Customers Sold            Hit Ratio
                 9                                        1                     11.1% (1/9)
Carrier-Count.xlsx
0
Michael Franz
Asked:
Michael Franz
  • 8
  • 4
  • 4
  • +1
1 Solution
 
gplanaCommented:
Try SELECT count(DISTINCT your_field) FROM your_table;
0
 
Michael FranzCFOAuthor Commented:
I need a WHERE clause because in the table there is mulitple carriers. Also the WHERE clause looks at a different table then the count
0
 
gplanaCommented:
No problem, this is just an example, you can add a WHERE and also some joins to other tables.
0
2018 Annual Membership Survey

Here at Experts Exchange, we strive to give members the best experience. Help us improve the site by taking this survey today! (Bonus: Be entered to win a great tech prize for participating!)

 
Michael FranzCFOAuthor Commented:
Sorry, I only do well when the code provide more assistance. I am a CFO trying to write SQL..... Never a good thing. Need more assistance with the code.
0
 
gplanaCommented:
You should do something like this:

SELECT count(DISTINCT your_field)
FROM your_table INNER JOIN your_other_table ON your_table.your_field = your_other_table.your_other_field
WHERE some_conditions_here;

Where conditions are something like: salary>1000 and name='John'
0
 
Michael FranzCFOAuthor Commented:
SELECT count(DISTINCT Client_Last_Name)
FROM NewBusinessHeader INNER JOIN NewBusinessLineItem ON NewBusinessLineItem.Carrier = NewBusinessHeader.Client_Last_Name
WHERE Carrier='Mike Insurance Co';

NOT WORKING.......

Carrier is in NewBusinessLineItem

Client_Last_Name is in NewBusinessHeader
0
 
Michael FranzCFOAuthor Commented:
I found 1 mistake between header and line item......So I fixed that and tried again. I get zero as the answer.



SELECT count(DISTINCT Client_Last_Name)
FROM NewBusinessHeader INNER JOIN NewBusinessLineItem ON NewBusinessHeader.Client_Last_Name = NewBusinessLineItem.Carrier
WHERE Carrier='QBE';
0
 
PortletPaulCommented:
mmm, the question asks how to count distinct values, but the real problem seems to be:
 "how do I join 2 tables - when I don't know what fields to use?"

so far you have tried, 2 versions of the same thing (just swapping the order)

INNER JOIN NewBusinessLineItem
--#1
   ON NewBusinessLineItem.Carrier = NewBusinessHeader.Client_Last_Name
--#2
   ON NewBusinessHeader.Client_Last_Name = NewBusinessLineItem.Carrier

Swapping the order will not assist and most probably neither of these fields (Carrier + Client_Last_Name) relate to each other. The correct method to join these 2 tables is much more likely to be through ID fields (not text fields).

A "Header" table and  "lineItem" table would typically be joined like this

FROM MyHeader
INNER JOIN MyLineItems ON MyHeader.ID = MyLineItems.HeaderID

SO: The solution lies in finding the correct fields to use for the join, which might end-up looking like this:

SELECT count(DISTINCT H.Client_Last_Name)
FROM NewBusinessHeader AS H
INNER JOIN NewBusinessLineItem AS L ON H.ID = L.HeaderID -- this is a guess!
WHERE L.Carrier='QBE';

Note here I have introduced aliases H and L (Header/Lineitems)
and that I have prefixed each field by an appropriate alias

If you are still struggling to identify which fields to use in the join, please provide a listing of fields from each table.
0
 
PortletPaulCommented:
btw: IF Client_Last_Name did = Carrier, and you filter for Carrier = 'QBE'

then the count of distinct Client_Last_Name would be 1

Client_Last_Name = Carrier
QBE = QBE
count(QBE) = 1
0
 
PortletPaulCommented:
perhaps this will help.

Let's assume that spreadsheet is created by an accurate query, and that result is called 'derived'.
select
  count(distinct [NB Header Client Last name])
        as Distinct_Customers
, count(distinct ( case when [Sold] = 'Yes' then [NB Header Client Last name] end ) )
        as Customers_Sold
, count(distinct ( case when [Sold] = 'Yes' then [NB Header Client Last name] end ) )
  /
  count(distinct [NB Header Client Last name])
        as Hit_Ratio
from (
           -- whatever it was you used to get the spreadsheet
         ) as derived

Open in new window

0
 
awking00Commented:
select distinct count(x.a) over (partition by y.customer order by y.b) countdistinctcustomer,
count(y.b) over (partition by x.customer order by x.a) customerssold,
round((count(y.b) over (partition by x.customer order by x.a)/
 count(x.a) over (partition by y.customer order by y.b) * 100),1) hitratio from
(select customer, count(*) a from yourtable group by customer) as x,
(select customer, count(*) b from yourtable where sold = 'Yes' group by customer) as y;
0
 
awking00Commented:
Note - I replaced "NB Header Client Last Name" with "customer"
0
 
Michael FranzCFOAuthor Commented:
OK....I am still having a little problem understanding. I think it is because my question and file it not the best...SO, I have uploaded a new file. Row #2 are the exact field names in the tables. There are 2 tables. NewBusinessHeader (Header) and NewBusinessLineItem (Line) both of the table names are in row 1 (shortened them for length purpose).

Both of these table are joined on NBHeaderID with a one to many. Header can only have 1 NBHeaderID where Line can have many.

Background...We are in insurance. I can have 1 customer in (Header), but send their insurance quotes to Many different insurance carriers (Line)... The quest is to find out how many "submissions" I send to each carrier, but if I count lines I get too many and I can't count just headers because the Carrier field is stored in the Line table with Customer in the Header table.
Carrier-Count.xlsx
0
 
awking00Commented:
Can you show the values in the two tables separately, including the NBHeaderID? It may be as simple as substitiuting the header table for 'yourtable' in my subquery x and a join of the header and line tables for 'yourtable' in my subquery y. However, it might also require a dstinct keyword or use of an analytic function to limit the records returned since you state there can be many NBHeaderIDs in the line table.
0
 
Michael FranzCFOAuthor Commented:
As requested, I split the information into 2. Attached is a new file with a Header and Line Item Tab. The answer to my question in this file would be 9 submission (nine customers in the header table) where the Carrier is equal to Mike Insurance Co.  

This is small subset. We have probably 30+ carriers.
Carrier-Count.xlsx
0
 
awking00Commented:
So do you want to see the distinct customers, soldcustomers, and hitratio
for specific carriers or a separate line for all carriers?
0
 
Michael FranzCFOAuthor Commented:
That would be ideal. The process looks like this.

1. I need to count the number of distinct customers (submission) I have sent to THAT particular carrier. (I will date range it as another where clause)

2. How many of those customers did we sell.

3. The math gives the hit ratio (#2 divided by #3)

I am sorry that I am having a problem explaining this. Right now I do in excel, but it is way to big.
0
 
PortletPaulCommented:
Your sample data and the following query is available at http://sqlfiddle.com/#!3/90e33/1 I have included a sample date range filter below, but your data contains nulls in the date fields - so that is going to be a curse (perhaps you have other fields you can also use?)
SELECT
      carrier  -- this column is optional
    , count(DISTINCT ClientLastName) AS count_ClientLastName
    , count(DISTINCT CASE WHEN sold='Yes' THEN ClientLastName END) AS count_Sold
    , count(DISTINCT CASE WHEN sold='Yes' THEN ClientLastName END) * 1.0
    / count(DISTINCT ClientLastName) * 1.0 AS Hit_Ratio

FROM NewBusinessHeader AS H
INNER JOIN NewBusinessLineItem AS I ON h.NBHeaderID = I.NBHeaderID

WHERE carrier = 'Mike Insurance Co' -- sent to THAT particular carrier
    
    --AND quote = 'Yes' -- is this considered?
    AND (
         (
              I.ProcessCompleteDate >= '2012-01-01'
          AND I.ProcessCompleteDate  < '2013-01-01'
         )
          OR I.ProcessCompleteDate IS NULL -- this is going to be painful
        )

GROUP BY
      carrier
;

Open in new window

nb: I removed spaces from the column names for simplicity
0
 
Michael FranzCFOAuthor Commented:
WOW!!!!!!!! I tested and got it to work in the SQL program I have. I'll have to make some adjustments to get prompts in their for Process Complete Data and Carrier. Unless you have suggestions for that.  THANK YOU !
0

Featured Post

2018 Annual Membership Survey

Here at Experts Exchange, we strive to give members the best experience. Help us improve the site by taking this survey today! (Bonus: Be entered to win a great tech prize for participating!)

  • 8
  • 4
  • 4
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now