selectivity of columns

anushahanna
anushahanna used Ask the Experts™
on
Is there a way to find the selectivity of the columns in a table?

thanks
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Yes!

Selectivity = no. of unique values in column / total number of rows

Author

Commented:
thanks. no sql in-built function that will give this?
PMI ACP® Project Management

Prepare for the PMI Agile Certified Practitioner (PMI-ACP)® exam, which formally recognizes your knowledge of agile principles and your skill with agile techniques.

If you want to do this quickly for one column you can execute this query:


SELECT 1.0*(SELECT COUNT(DISTINCT <your column>) FROM <your table>) / (SELECT COUNT(*) FROM <your table>)

Open in new window

Author

Commented:
ralmada, I would be curious to check it out.. i tried it but it gives some schema errors. Do you have it installed in your system? if you don't mind, can you tell what kind of output do you get? Do you get the selectivity on all the columns in a table?

Author

Commented:
gothamite, thanks. that was helpful. do you think using 2 cursors - one for table level and one for column level and then using your query should effectively get all the selectivity info for all columns in the database?
Yes that would probably have the least impact
I would also add

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

to the start of the query which will cause minimal locking of data so as to not affect any other queries.

Commented:
>>ralmada, I would be curious to check it out.. i tried it but it gives some schema errors. Do you have it installed in your system? if you don't mind, can you tell what kind of output do you get? Do you get the selectivity on all the columns in a table?<<
No I don't have it installed, but I can give it a try if you give me until tomorrow.

Author

Commented:
gothamite, do you mean put the isolation level in the cursor?

Thanks ralmada for your help in that regard.
No, just put the isolation level at start of the batch before all other commands i.e. just before you declare your cursors etc.,


Commented:
Ok, the example I've provided before requires additional functions as well as you need to specify the table and column name, so not good. I have done this however for all tables and views within the database or just for one table see second example:
 
 

--All columns all tables within the database
declare @strSQL varchar(max)

select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']'
from sys.objects a 
inner join sys.columns b on a.object_id = b.object_id
where a.type in ('U', 'V')

set @strSQL = stuff(@strSQL, 1, 11, '')
exec(@strSQL)


--All columns in a table:
declare @strSQL varchar(max)

select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']'
from sys.objects a 
inner join sys.columns b on a.object_id = b.object_id
where a.name = 'productct' -- a.type in ('U', 'V')

set @strSQL = stuff(@strSQL, 1, 11, '')
exec(@strSQL)

Open in new window

Commented:
so in the second example I've used a table of my own, but just change it accordingly.
declare @strSQL varchar(max) 
 
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']' 
from sys.objects a  
inner join sys.columns b on a.object_id = b.object_id 
where a.name = 'yourtablename' 
 
set @strSQL = stuff(@strSQL, 1, 11, '') 
exec(@strSQL)

Open in new window

Author

Commented:
gothamite, got you. Why 'uncommitted' isolation before getting selectivity?

Author

Commented:
ralmada,
Thanks very much for spinning a query that skillfully.

Now I do have a problem in that this database needs the schema with the tablename otherwise it complains.
I was able to add 'inner join sys.schemas s on a.schema_id = s.schema_id' to your query, but I was not able to successfully add that into the @strSQL. can you please help me add the schema also into the string?
Read uncommitted will greatly reduce the number of locks required to run the selectivity query. You don't have to do this, but it will limit the effect on any other queries you may have running.

Commented:
Here you go
declare @strSQL varchar(max)  
  
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as schema, ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '.' + a.name + ']'  
from sys.objects a   
inner join sys.columns b on a.object_id = b.object_id  
inner join sys.schemas s on a.schema_id = s.schema_id
where a.name = 'yourtablename'  
  
set @strSQL = stuff(@strSQL, 1, 11, '')  
exec(@strSQL)

Open in new window

Author

Commented:
gothamite, makes perfect sense, as it will be a very intense query. thanks for that very helpful idea.

hopefully ralmada is going to be a great fix to the cursor.

Author

Commented:
ralmada, that was brilliant. thanks very much. it works beautifully.

For all tables, do you just change the "where a.name = 'yourtablename'" section  to "where a.type = 'U'"?
I did that, and I am getting an error

Msg 8117, Level 16, State 1, Line 1
Operand data type ntext is invalid for count operator.

Commented:
ok, so the problem is that you have ntext columns. If you're in SQL 2005+ then you should definitively avoid using ntext and start using nvarchar(max). Same goes with text datatypes, should be replaced with varchar(max).
Nevertheless here's a fix to avoid that problem
 

declare @strSQL varchar(max)   
   
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as schema, ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct cast([' + b.name + '] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '.' + a.name + ']'   
from sys.objects a    
inner join sys.columns b on a.object_id = b.object_id   
inner join sys.schemas s on a.schema_id = s.schema_id 
where a.name = 'yourtablename'   
   
set @strSQL = stuff(@strSQL, 1, 11, '')   
exec(@strSQL)

Open in new window

Author

Commented:
ralmada, thanks very much for the good advice on text data types.

I tried the query, now I get this error, 7 times if i run for one table, and many times more if I run for the whold DB:
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'schema'.

I changed the
 [' + s.name + '.' + a.name + ']'  
part to
[' + s.name + '].[' + a.name + ']'
still that did not help.. can you kindly take a look at it..

Commented:
what about like this? If not, can you run a print @strSQL instead of exec(@strSQL) for a table that is giving errors and post the result here:
declare @strSQL varchar(max)    
    
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct cast([' + b.name + '] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '].[' + a.name + ']'    
from sys.objects a     
inner join sys.columns b on a.object_id = b.object_id    
inner join sys.schemas s on a.schema_id = s.schema_id  
where a.name = 'yourtablename'    
    
set @strSQL = stuff(@strSQL, 1, 11, '')    
exec(@strSQL)

Open in new window

Author

Commented:
I am getting multiple errors of
Incorrect syntax near the keyword 'as'.

The print gives me
"
 select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'ConfigInfoID' as columnanme, count(distinct cast([ConfigInfoID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'Name' as columnanme, count(distinct cast([Name] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'Value' as columnanme, count(distinct cast([Value] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ItemID' as columnanme, count(distinct cast([ItemID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Path' as columnanme, count(distinct cast([Path] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Name' as columnanme, count(distinct cast([Name] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ParentID' as columnanme, count(distinct cast([ParentID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Type' as columnanme, count(distinct cast([Type] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Content' as columnanme, count(distinct cast([Content] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Intermediate' as columnanme, count(distinct cast([Intermediate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'SnapshotDataID' as columnanme, count(distinct cast([SnapshotDataID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'LinkSourceID' as columnanme, count(distinct cast([LinkSourceID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Property' as columnanme, count(distinct cast([Property] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Description' as columnanme, count(distinct cast([Description] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Hidden' as columnanme, count(distinct cast([Hidden] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'CreatedByID' as columnanme, count(distinct cast([CreatedByID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'CreationDate' as columnanme, count(distinct cast([CreationDate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ModifiedByID' as columnanme, count(distinct cast([ModifiedByID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ModifiedDate' as columnanme, count(distinct cast([ModifiedDate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'MimeType' as columnanme, count(distinct cast([MimeType] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'SnapshotLimit' as columnanme, count(distinct cast([SnapshotLimit] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Parameter' as columnanme, count(distinct cast([Parameter] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'PolicyID' as columnanme, count(distinct cast([PolicyID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'PolicyRoot' as columnanme, count(distinct cast([PolicyRoot] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ExecutionFlag' as columnanme, count(distinct cast([ExecutionFlag] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ExecutionTime' as columnanme, count(distinct cast([ExecutionTime] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'UpgradeInfo' as tablename, 'Item' as columnanme, count(distinct cast([Item] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[UpgradeInfo] union all select 'dbo' as [schema], 'UpgradeInfo' as tablename, 'Status' as columnanme, count(distinct cast([Status] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[UpgradeInfo] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelDrillID' as columnanme, count(distinct cast([ModelDrillID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelID' as columnanme, count(distinct cast([ModelID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ReportID' as columnanme, count(distinct cast([ReportID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelItemID' as columnanme, count(distinct cast([ModelItemID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'Type' as columnanme, count(distinct cast([Type] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'ID' as columnanme, count(distinct cast([ID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'ModelID' as columnanme, count(distinct cast([ModelID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveID' as columnanme, count(distinct cast([PerspectiveID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveName' as columnanme, count(distinct cast([PerspectiveName] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveDescription' as columnanme, count(distinct cast([PerspectiveDescription] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'CachePolicy' as tablename, 'CachePolicyID' as columnanme, count(distinct cast([CachePolicyID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[CachePolicy] union all select 'dbo'
"
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'as'.
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string 'DESC
'.
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'DESC
'.

Open in new window

Commented:
the print seems to be truncated. Have you changed the @strSQL datatype? I've used varchar(max)
If  not, can you post it in the Code section like below

post the result of print here.

Open in new window

Author

Commented:
ralmada, I did not change any of the datatypes;just did copy/paste.

I was also curious.. the print only printed till "union all select 'dbo'" which did not seem to be complete...

Commented:
ok, the problem is that print only takes up to 8000 characters and your string seems to be longer than that!!!
try this:
use select @strSQL instead of print @strSQL  and post the result here.
Also can you run it for a single table causing errors so we can narrow down the problem.

Author

Commented:
ralmada, OK surely will try first thing tomorrow.
thanks for your help.

Commented:
Ok, I found the bug :) there's a missing bracket for the cast function.
I have also attached a new alternative using a cursor. Not a lot of difference in my environment, give both a try and let me know.
For the cursor alternative I have tried with and without the
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

but it didn't really make any differences.
 

/*--Without cursor
declare @strSQL varchar(max)     
     
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tbl, ''' + b.name + ''' as col, count(distinct cast([' + b.name + '] as varchar(max))) * 1.0 / nullif(count(*),0) as sy from [' + s.name + '].[' + a.name + ']'     
from sys.objects a      
inner join sys.columns b on a.object_id = b.object_id     
inner join sys.schemas s on a.schema_id = s.schema_id   
inner join sys.types t on b.user_type_id = t.user_type_id
where a.type = 'U' --a.name = 'yourtablename'     
and t.name <> 'xml'
      
set @strSQL = stuff(@strSQL, 1, 11, '')     
exec(@strSQL)
*/

--Using Cursor
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

declare @t table (schemaname varchar(255), tablename varchar(255), columnname varchar(255), stivity decimal(10,2))
declare @strSQL varchar(max)
declare @schema varchar(255)
declare @table varchar(255)
declare @column varchar(255)

declare selectivity cursor read_only for
select s.name as [schema], a.name as [table], b.name as [column]
from sys.objects a      
inner join sys.columns b on a.object_id = b.object_id     
inner join sys.schemas s on a.schema_id = s.schema_id   
inner join sys.types t on b.user_type_id = t.user_type_id
where a.type = 'U' --a.name = 'yourtablename'     
and t.name <> 'xml'

open selectivity

fetch next from selectivity
into @schema, @table, @column

set @strSQL = ' '

while @@FETCH_STATUS = 0
begin
	
	set @strSQL = 'select ''' + @schema + ''', ''' + @table + ''', ''' + @column + ''', count(distinct cast([' + @column + '] as varchar(max))) * 1.0 / nullif(count(*),0) from [' + @schema + '].[' + @table + ']'     

	insert @t
	exec(@strSQL)

	fetch next from selectivity
	into @schema, @table, @column

end

close selectivity
deallocate selectivity

select * from @t

Open in new window

Author

Commented:
Thanks so much, ralmada.

For the non-cursor one, I get this
Msg 529, Level 16, State 2, Line 1
Explicit conversion from data type image to varchar(max) is not allowed.

i am getting the same for the cursor one also.

This is different than the 'Operand data type ntext is invalid for count operator.' we got before.

Commented:
wow, you have lots of datatypes there lol!
You will have to exclude them
Also remember that ntext, text and image are deprecated, you should consider using varchar(max) instead.

/*--Without cursor 
declare @strSQL varchar(max)      
      
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tbl, ''' + b.name + ''' as col, count(distinct cast([' + b.name + '] as varchar(max))) * 1.0 / nullif(count(*),0) as sy from [' + s.name + '].[' + a.name + ']'      
from sys.objects a       
inner join sys.columns b on a.object_id = b.object_id      
inner join sys.schemas s on a.schema_id = s.schema_id    
inner join sys.types t on b.user_type_id = t.user_type_id 
where a.type = 'U' --a.name = 'yourtablename'      
and t.name not in ('xml', 'image')
       
set @strSQL = stuff(@strSQL, 1, 11, '')      
exec(@strSQL) 
*/ 
 
--Using Cursor 
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED 
 
declare @t table (schemaname varchar(255), tablename varchar(255), columnname varchar(255), stivity decimal(10,2)) 
declare @strSQL varchar(max) 
declare @schema varchar(255) 
declare @table varchar(255) 
declare @column varchar(255) 
 
declare selectivity cursor read_only for 
select s.name as [schema], a.name as [table], b.name as [column] 
from sys.objects a       
inner join sys.columns b on a.object_id = b.object_id      
inner join sys.schemas s on a.schema_id = s.schema_id    
inner join sys.types t on b.user_type_id = t.user_type_id 
where a.type = 'U' --a.name = 'yourtablename'      
and t.name not in('xml', 'image')
 
open selectivity 
 
fetch next from selectivity 
into @schema, @table, @column 
 
set @strSQL = ' ' 
 
while @@FETCH_STATUS = 0 
begin 
         
        set @strSQL = 'select ''' + @schema + ''', ''' + @table + ''', ''' + @column + ''', count(distinct cast([' + @column + '] as varchar(max))) * 1.0 / nullif(count(*),0) from [' + @schema + '].[' + @table + ']'      
 
        insert @t 
        exec(@strSQL) 
 
        fetch next from selectivity 
        into @schema, @table, @column 
 
end 
 
close selectivity 
deallocate selectivity 
 
select * from @t

Open in new window

Author

Commented:
ralmada, You truly are a guru! Many thanks. You made it look so simple!

gothamite, appreciate your input and help very much also.

Commented:
Thanks for your kind words! Glad to help!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial