Solved

Single row for Count(column is NULL), Count(all rows)

Posted on 2014-12-22
Last Modified: 2015-01-05
Hi all

Does anyone have some quick and dirty (copyright jimpen) T-SQL to return a set with a single row that contains a count of all rows where a specific value is NULL (say column_name), and a count of the rows in the entire table (say table_name)?

I've put together the below CTE, which works fine, but it seems like there's a more elegant way to do this that I'm not grasping.

Thanks in advance.
Jim

;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM table_name
	WHERE column_name IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM table_name) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2)) as pct_missing
FROM m
	JOIN a ON m.label = a.label

Question by:Jim Horn
10 Comments
 
LVL 10

Assisted Solution

by:Ray
Ray earned 150 total points
ID: 40513198
Not sure I'd call this elegant, but simpler for sure.

I didn't have time to test this, so the case statement may need a slight syntax adjustment.

select  sum(case when Col_Name NULL then 1 else 0 end ) ,  count(*)
from  Table_Name
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 275 total points
ID: 40513211
Yep, with a very slight syntax change (no pts for me please):

select  sum(case when Col_Name IS NULL then 1 else 0 end ) AS col_name_null_count,  count(*) as total_rows_in_table
from  Table_Name
 
LVL 10

Expert Comment

by:Ray
ID: 40513214
Too many irons in the fire this morning. Thanks for tidying it up, Scott! :-)
 
LVL 65

Author Comment

by:Jim Horn
ID: 40513235
Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach.

Tinkering, tinkering..
 
LVL 10

Expert Comment

by:Ray
ID: 40513311
Sorry Jim, I wasn't 'reading into' the question.
 
LVL 31

Expert Comment

by:awking00
ID: 40513482
select 'account' as label, count(*) as row_count_all, count(column_name) as row_count_column_name_missing,
cast((count(column_name)/count(*)) * 100 as numeric(5,2)) as pct_missing
from yourtable;
 
LVL 69

Accepted Solution

by:ScottPletcher
ScottPletcher earned 275 total points
ID: 40514051
>> Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach. <<

That's odd, because the CTEs should scan the table twice and the single query only once.  Perhaps the other query was run first and the CTE second, so the CTE used what was already in the buffers.

You can simplify the query to:
select count(*), count(column_name)
since count(column_name) ignores NULLs anyway.  Possibly an older optimizer would scan the table twice for the "case(...)" version.

Better than time is to look at the query plan and/or compare logical I/O counts:
SET STATISTICS IO ON
before running the queries.

Be sure to ignore the first run no matter which method is used.  That allows the rows to get into buffers.

Then you can compare I/O and even times, although elapsed time is affected by many things and thus doesn't necessarily directly indicate the overhead in a given query.
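
A rough sketch of that comparison, using the placeholder names from the question (table_name, column_name, id). It assumes id is never NULL, so COUNT(id) and COUNT(*) agree. Run the whole batch twice and read the "logical reads" figures from the second run:

```sql
-- Sketch only: table_name, column_name, and id are the placeholder names from the question.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Version 1: two CTEs, each reading table_name.
WITH m AS (
    SELECT COUNT(id) AS row_count_column_name_missing
    FROM table_name
    WHERE column_name IS NULL),
a AS (
    SELECT COUNT(id) AS row_count_all
    FROM table_name)
SELECT m.row_count_column_name_missing, a.row_count_all
FROM m CROSS JOIN a;

-- Version 2: a single pass with conditional aggregation.
SELECT
    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) AS row_count_column_name_missing,
    COUNT(*) AS row_count_all
FROM table_name;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Each statement reports its own I/O and timing in the Messages output, so the two versions can be compared side by side.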
 
LVL 45

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 75 total points
ID: 40514619
>> Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach. <<

Strange. Are you sure that isn't a cache issue?
 
LVL 31

Expert Comment

by:awking00
ID: 40514784
Actually, I guess row_count_column_name_missing should be count(*) - count(column_name).
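
With that correction applied, the single-query version might look like the sketch below. It reuses the placeholder names table_name and column_name from the question. Multiplying by 100.0 before dividing is an addition here, because both counts are integers and integer division would otherwise truncate the ratio; NULLIF is also an addition, to return NULL instead of a divide-by-zero error on an empty table.

```sql
-- Sketch only: table_name and column_name are the placeholder names from the question.
SELECT
    'Account' AS label,
    COUNT(*) AS row_count_all,
    -- COUNT(column_name) counts only non-NULL values, so the difference is the NULL count.
    COUNT(*) - COUNT(column_name) AS row_count_column_name_missing,
    -- 100.0 forces decimal arithmetic; NULLIF guards against an empty table.
    CAST((COUNT(*) - COUNT(column_name)) * 100.0 / NULLIF(COUNT(*), 0) AS numeric(5,2)) AS pct_missing
FROM table_name;
```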
 
LVL 65

Author Comment

by:Jim Horn
ID: 40531578
Cleared the cache, set statistics io and time on, and reran both my proposed and the new code here.
New code is half the elapsed time and simpler, so I'll go with that.

Code #1 - My original
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON                     
;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM SF_Account_1
	WHERE Address1_BioIQ__c IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM SF_Account_1) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	-- CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2)) as pct_missing
	100 - CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2)) as pct_populated
FROM m
	JOIN a ON m.label = a.label

SET STATISTICS IO OFF
SET STATISTICS TIME OFF


Results #1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 34, logical reads 636238, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2651 ms,  elapsed time = 577 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.



Code #2 - Proposed here, with minor modifications
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON  
 
SELECT 
	a.col_name_null_count, 
	a.total_rows_in_table, 
	CAST(a.col_name_null_count / CAST(a.total_rows_in_table as numeric(19,4)) * 100 as numeric(5,2)) as pct_missing
FROM (
	select 
		SUM(CASE WHEN Address1_BioIQ__c IS NULL then 1 else 0 end ) AS col_name_null_count,  
		COUNT(*) as total_rows_in_table
	FROM SF_Account_1) a
                           
SET STATISTICS IO OFF
SET STATISTICS TIME OFF



Results #2
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 17, logical reads 318119, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2701 ms,  elapsed time = 256 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
