Solved

Single row for Count(column is NULL), Count(all rows)

Posted on 2014-12-22
10
131 Views
Last Modified: 2015-01-05
Hi all

Does anyone have some quick and dirty (copyright jimpen) T-SQL to return a set with a single row that contains a count of all rows where a specific value is NULL (say column_name), and a count of the rows in the entire table (say table_name)?

I've put together the below CTE, which works fine, but it seems like there's a more elegant way to do this that I'm not grasping.

Thanks in advance.
Jim

;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM table_name
	WHERE column_name IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM table_name) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2)) as pct_missing
FROM m
	JOIN a ON m.label = a.label

Open in new window

0
Comment
Question by:Jim Horn
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
10 Comments
 
LVL 10

Assisted Solution

by:Ray
Ray earned 150 total points
ID: 40513198
Not sure I'd call this elegant, but simpler for sure.

I didn't have time to test this, so the case statement may need a slight syntax adjustment.

select  sum(case when Col_Name NULL then 1 else 0 end ) ,  count(*)
from  Table_Name
0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 275 total points
ID: 40513211
Yep, with a very slight syntax change (no pts for me please):

select  sum(case when Col_Name IS NULL then 1 else 0 end ) AS col_name_null_count,  count(*) as total_rows_in_table
0
 
LVL 10

Expert Comment

by:Ray
ID: 40513214
Too many irons in the fire this morning, Thanks for tidying it up Scott! :-)
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 65

Author Comment

by:Jim Horn
ID: 40513235
Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach.

Tinkering, tinkering..
0
 
LVL 10

Expert Comment

by:Ray
ID: 40513311
Sorry JIm, I wasn't 'reading into' the question.
0
 
LVL 32

Expert Comment

by:awking00
ID: 40513482
select 'account' as label, count(*) as row_count_all, count(column_name) as row_count_column_name_missing,
cast((count(column_name)/count(*)) * 100 as numeric(5,2)) as pct_missing
from yourtable;
0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 275 total points
ID: 40514051
>> Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach. <<

That's odd, because the CTEs should scan to the table twice, the query only once.  Perhaps if the other query is run first and the CTE is run second and uses what's already in the buffers.

You can simplify the query to:
select count(*), count(column_name)
since count will ignore nulls anyway.  Perhaps, maybe, an old optimizer might scan the table twice for the "case(...)" version.

Better than time is to look at the query plan and/or compare logical I/O counts:
SET STATISTICS IO ON
before running the queries.

Be sure to ignore the first run no matter which method is used.  That allows the rows to get into buffers.

Then you can compare I/O and even times, although elapsed time is affected by many things and thus doesn't necessarily directly indicate the overhead in a given query.
0
 
LVL 49

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 75 total points
ID: 40514619
Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach.
Strange. You're sure that isn't a cache issue?
0
 
LVL 32

Expert Comment

by:awking00
ID: 40514784
Actually, I guess row_count_column_name_missing should be count(*) - count(column_name).
0
 
LVL 65

Author Comment

by:Jim Horn
ID: 40531578
Cleared the cache, set statistics io and time on, and reran both my proposed and the new code here.
New code is half the elapsed time and simpler, so I'll go with that.

Code #1 - My original
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON                     
;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM SF_Account_1
	WHERE Address1_BioIQ__c IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM SF_Account_1) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	-- CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4))) * 100 as numeric(5,2)) as pct_missing
	100 - CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2))
FROM m
	JOIN a ON m.label = a.label

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Open in new window

Results #1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 34, logical reads 636238, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2651 ms,  elapsed time = 577 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Open in new window


Code #2 - Proposed here, with minor modificaitons
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON  
 
SELECT 
	a.col_name_null_count, 
	a.total_rows_in_table, 
	a.col_name_null_count / CAST(a.total_rows_in_table as numeric(19,4)) as pct_missing
FROM (
	select 
		SUM(CASE WHEN Address1_BioIQ__c IS NULL then 1 else 0 end ) AS col_name_null_count,  
		COUNT(*) as total_rows_in_table
	FROM SF_Account_1) a
                           
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Open in new window


Results #2
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 17, logical reads 318119, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2701 ms,  elapsed time = 256 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Open in new window

0

Featured Post

The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
Viewers will learn how the fundamental information of how to create a table.

735 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question