Solved

Single row for Count(column is NULL), Count(all rows)

Posted on 2014-12-22
10
133 Views
Last Modified: 2015-01-05
Hi all

Does anyone have some quick and dirty (copyright jimpen) T-SQL to return a set with a single row that contains a count of all rows where a specific value is NULL (say column_name), and a count of the rows in the entire table (say table_name)?

I've put together the below CTE, which works fine, but it seems like there's a more elegant way to do this that I'm not grasping.

Thanks in advance.
Jim

;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM table_name
	WHERE column_name IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM table_name) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2)) as pct_missing
FROM m
	JOIN a ON m.label = a.label

Open in new window

0
Comment
Question by:Jim Horn
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
10 Comments
 
LVL 11

Assisted Solution

by:Ray
Ray earned 150 total points
ID: 40513198
Not sure I'd call this elegant, but simpler for sure.

I didn't have time to test this, so the case statement may need a slight syntax adjustment.

select  sum(case when Col_Name NULL then 1 else 0 end ) ,  count(*)
from  Table_Name
0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 275 total points
ID: 40513211
Yep, with a very slight syntax change (no pts for me please):

select  sum(case when Col_Name IS NULL then 1 else 0 end ) AS col_name_null_count,  count(*) as total_rows_in_table
0
 
LVL 11

Expert Comment

by:Ray
ID: 40513214
Too many irons in the fire this morning, Thanks for tidying it up Scott! :-)
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 66

Author Comment

by:Jim Horn
ID: 40513235
Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach.

Tinkering, tinkering..
0
 
LVL 11

Expert Comment

by:Ray
ID: 40513311
Sorry JIm, I wasn't 'reading into' the question.
0
 
LVL 32

Expert Comment

by:awking00
ID: 40513482
select 'account' as label, count(*) as row_count_all, count(column_name) as row_count_column_name_missing,
cast((count(column_name)/count(*)) * 100 as numeric(5,2)) as pct_missing
from yourtable;
0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 275 total points
ID: 40514051
>> Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach. <<

That's odd, because the CTEs should scan to the table twice, the query only once.  Perhaps if the other query is run first and the CTE is run second and uses what's already in the buffers.

You can simplify the query to:
select count(*), count(column_name)
since count will ignore nulls anyway.  Perhaps, maybe, an old optimizer might scan the table twice for the "case(...)" version.

Better than time is to look at the query plan and/or compare logical I/O counts:
SET STATISTICS IO ON
before running the queries.

Be sure to ignore the first run no matter which method is used.  That allows the rows to get into buffers.

Then you can compare I/O and even times, although elapsed time is affected by many things and thus doesn't necessarily directly indicate the overhead in a given query.
0
 
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 75 total points
ID: 40514619
Simpler yes, but on a table with 1m rows this takes 7 seconds vs. 1 second for the CTE approach.
Strange. You're sure that isn't a cache issue?
0
 
LVL 32

Expert Comment

by:awking00
ID: 40514784
Actually, I guess row_count_column_name_missing should be count(*) - count(column_name).
0
 
LVL 66

Author Comment

by:Jim Horn
ID: 40531578
Cleared the cache, set statistics io and time on, and reran both my proposed and the new code here.
New code is half the elapsed time and simpler, so I'll go with that.

Code #1 - My original
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON                     
;
WITH m as (
	SELECT 'Account' as label, COUNT(id) as row_count_column_name_missing
	FROM SF_Account_1
	WHERE Address1_BioIQ__c IS NULL)
, a as (	
	SELECT 'Account' as label, COUNT(id) as row_count_all
	FROM SF_Account_1) 
SELECT 
	m.label, 
	m.row_count_column_name_missing, 
	a.row_count_all,
	-- CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4))) * 100 as numeric(5,2)) as pct_missing
	100 - CAST(m.row_count_column_name_missing / CAST(row_count_all as numeric(19,4)) * 100 as numeric(5,2))
FROM m
	JOIN a ON m.label = a.label

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Open in new window

Results #1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 34, logical reads 636238, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2651 ms,  elapsed time = 577 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Open in new window


Code #2 - Proposed here, with minor modificaitons
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ON  
 
SELECT 
	a.col_name_null_count, 
	a.total_rows_in_table, 
	a.col_name_null_count / CAST(a.total_rows_in_table as numeric(19,4)) as pct_missing
FROM (
	select 
		SUM(CASE WHEN Address1_BioIQ__c IS NULL then 1 else 0 end ) AS col_name_null_count,  
		COUNT(*) as total_rows_in_table
	FROM SF_Account_1) a
                           
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Open in new window


Results #2
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(1 row(s) affected)
Table 'SF_Account_1'. Scan count 17, logical reads 318119, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2701 ms,  elapsed time = 256 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Open in new window

0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

617 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question