selectivity of columns

Is there a way to find the selectivity of the columns in a table?

thanks
LVL 6
anushahannaAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

gothamiteCommented:
Yes!

Selectivity = no. of unique values in column / total number of rows
0
anushahannaAuthor Commented:
thanks. no sql in-built function that will give this?
0
ralmadaCommented:
0
Cloud Class® Course: MCSA MCSE Windows Server 2012

This course teaches how to install and configure Windows Server 2012 R2.  It is the first step on your path to becoming a Microsoft Certified Solutions Expert (MCSE).

gothamiteCommented:
If you want to do this quickly for one column you can execute this query:


SELECT 1.0*(SELECT COUNT(DISTINCT <your column>) FROM <your table>) / (SELECT COUNT(*) FROM <your table>)

Open in new window

0
anushahannaAuthor Commented:
ralmada, I would be curious to check it out.. i tried it but it gives some schema errors. Do you have it installed in your system? if you don't mind, can you tell what kind of output do you get? Do you get the selectivity on all the columns in a table?
0
anushahannaAuthor Commented:
gothamite, thanks. that was helpful. do you think using 2 cursors - one for table level and one for column level and then using your query should effectively get all the selectivity info for all columns in the database?
0
gothamiteCommented:
Yes that would probably have the least impact
I would also add

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

to the start of the query which will cause minimal locking of data so as to not affect any other queries.
0
ralmadaCommented:
>>ralmada, I would be curious to check it out.. i tried it but it gives some schema errors. Do you have it installed in your system? if you don't mind, can you tell what kind of output do you get? Do you get the selectivity on all the columns in a table?<<
No I don't have it installed, but I can give it a try if you give me until tomorrow.
0
anushahannaAuthor Commented:
gothamite, do you mean put the isolation level in the cursor?

Thanks ralmada for your help in that regard.
0
gothamiteCommented:
No, just put the isolation level at start of the batch before all other commands i.e. just before you declare your cursors etc.,


0
ralmadaCommented:
Ok, the example I've provided before requires additional functions as well as you need to specify the table and column name, so not good. I have done this however for all tables and views within the database or just for one table see second example:
 
 

--All columns all tables within the database
declare @strSQL varchar(max)

select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']'
from sys.objects a 
inner join sys.columns b on a.object_id = b.object_id
where a.type in ('U', 'V')

set @strSQL = stuff(@strSQL, 1, 11, '')
exec(@strSQL)


--All columns in a table:
declare @strSQL varchar(max)

select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']'
from sys.objects a 
inner join sys.columns b on a.object_id = b.object_id
where a.name = 'productct' -- a.type in ('U', 'V')

set @strSQL = stuff(@strSQL, 1, 11, '')
exec(@strSQL)

Open in new window

0
ralmadaCommented:
so in the second example I've used a table of my own, but just change it accordingly.
declare @strSQL varchar(max) 
 
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + a.name + ']' 
from sys.objects a  
inner join sys.columns b on a.object_id = b.object_id 
where a.name = 'yourtablename' 
 
set @strSQL = stuff(@strSQL, 1, 11, '') 
exec(@strSQL)

Open in new window

0
anushahannaAuthor Commented:
gothamite, got you. Why 'uncommitted' isolation before getting selectivity?
0
anushahannaAuthor Commented:
ralmada,
Thanks very much for spinning a query that skillfully.

Now I do have a problem in that this database needs the schema with the tablename otherwise it complains.
I was able to add 'inner join sys.schemas s on a.schema_id = s.schema_id' to your query, but I was not able to successfully add that into the @strSQL. can you please help me add the schema also into the string?
0
gothamiteCommented:
Read uncommitted will greatly reduce the number of locks required to run the selectivity query. You don't have to do this, but it will limit the effect on any other queries you may have running.
0
ralmadaCommented:
Here you go
declare @strSQL varchar(max)  
  
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as schema, ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct [' + b.name + ']) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '.' + a.name + ']'  
from sys.objects a   
inner join sys.columns b on a.object_id = b.object_id  
inner join sys.schemas s on a.schema_id = s.schema_id
where a.name = 'yourtablename'  
  
set @strSQL = stuff(@strSQL, 1, 11, '')  
exec(@strSQL)

Open in new window

0
anushahannaAuthor Commented:
gothamite, makes perfect sense, as it will be a very intense query. thanks for that very helpful idea.

hopefully ralmada is going to be a great fix to the cursor.
0
anushahannaAuthor Commented:
ralmada, that was brilliant. thanks very much. it works beautifully.

For all tables, do you just change the "where a.name = 'yourtablename'" section  to "where a.type = 'U'"?
I did that, and I am getting an error

Msg 8117, Level 16, State 1, Line 1
Operand data type ntext is invalid for count operator.
0
ralmadaCommented:
ok, so the problem is that you have ntext columns. If you're in SQL 2005+ then you should definitively avoid using ntext and start using nvarchar(max). Same goes with text datatypes, should be replaced with varchar(max).
Nevertheless here's a fix to avoid that problem
 

declare @strSQL varchar(max)   
   
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as schema, ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct cast([' + b.name + '] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '.' + a.name + ']'   
from sys.objects a    
inner join sys.columns b on a.object_id = b.object_id   
inner join sys.schemas s on a.schema_id = s.schema_id 
where a.name = 'yourtablename'   
   
set @strSQL = stuff(@strSQL, 1, 11, '')   
exec(@strSQL)

Open in new window

0
anushahannaAuthor Commented:
ralmada, thanks very much for the good advice on text data types.

I tried the query, now I get this error, 7 times if i run for one table, and many times more if I run for the whold DB:
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'schema'.

I changed the
 [' + s.name + '.' + a.name + ']'  
part to
[' + s.name + '].[' + a.name + ']'
still that did not help.. can you kindly take a look at it..
0
ralmadaCommented:
what about like this? If not, can you run a print @strSQL instead of exec(@strSQL) for a table that is giving errors and post the result here:
declare @strSQL varchar(max)    
    
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tablename, ''' + b.name + ''' as columnanme, count(distinct cast([' + b.name + '] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [' + s.name + '].[' + a.name + ']'    
from sys.objects a     
inner join sys.columns b on a.object_id = b.object_id    
inner join sys.schemas s on a.schema_id = s.schema_id  
where a.name = 'yourtablename'    
    
set @strSQL = stuff(@strSQL, 1, 11, '')    
exec(@strSQL)

Open in new window

0
anushahannaAuthor Commented:
I am getting multiple errors of
Incorrect syntax near the keyword 'as'.

The print gives me
"
 select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'ConfigInfoID' as columnanme, count(distinct cast([ConfigInfoID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'Name' as columnanme, count(distinct cast([Name] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'ConfigurationInfo' as tablename, 'Value' as columnanme, count(distinct cast([Value] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ConfigurationInfo] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ItemID' as columnanme, count(distinct cast([ItemID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Path' as columnanme, count(distinct cast([Path] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Name' as columnanme, count(distinct cast([Name] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ParentID' as columnanme, count(distinct cast([ParentID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Type' as columnanme, count(distinct cast([Type] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Content' as columnanme, count(distinct cast([Content] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Intermediate' as columnanme, count(distinct cast([Intermediate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'SnapshotDataID' as columnanme, count(distinct cast([SnapshotDataID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'LinkSourceID' as columnanme, count(distinct cast([LinkSourceID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Property' as columnanme, count(distinct cast([Property] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Description' as columnanme, count(distinct cast([Description] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Hidden' as columnanme, count(distinct cast([Hidden] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'CreatedByID' as columnanme, count(distinct cast([CreatedByID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'CreationDate' as columnanme, count(distinct cast([CreationDate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ModifiedByID' as columnanme, count(distinct cast([ModifiedByID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ModifiedDate' as columnanme, count(distinct cast([ModifiedDate] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'MimeType' as columnanme, count(distinct cast([MimeType] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'SnapshotLimit' as columnanme, count(distinct cast([SnapshotLimit] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'Parameter' as columnanme, count(distinct cast([Parameter] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'PolicyID' as columnanme, count(distinct cast([PolicyID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'PolicyRoot' as columnanme, count(distinct cast([PolicyRoot] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ExecutionFlag' as columnanme, count(distinct cast([ExecutionFlag] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'Catalog' as tablename, 'ExecutionTime' as columnanme, count(distinct cast([ExecutionTime] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[Catalog] union all select 'dbo' as [schema], 'UpgradeInfo' as tablename, 'Item' as columnanme, count(distinct cast([Item] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[UpgradeInfo] union all select 'dbo' as [schema], 'UpgradeInfo' as tablename, 'Status' as columnanme, count(distinct cast([Status] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[UpgradeInfo] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelDrillID' as columnanme, count(distinct cast([ModelDrillID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelID' as columnanme, count(distinct cast([ModelID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ReportID' as columnanme, count(distinct cast([ReportID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'ModelItemID' as columnanme, count(distinct cast([ModelItemID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelDrill' as tablename, 'Type' as columnanme, count(distinct cast([Type] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelDrill] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'ID' as columnanme, count(distinct cast([ID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'ModelID' as columnanme, count(distinct cast([ModelID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveID' as columnanme, count(distinct cast([PerspectiveID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveName' as columnanme, count(distinct cast([PerspectiveName] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'ModelPerspective' as tablename, 'PerspectiveDescription' as columnanme, count(distinct cast([PerspectiveDescription] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[ModelPerspective] union all select 'dbo' as [schema], 'CachePolicy' as tablename, 'CachePolicyID' as columnanme, count(distinct cast([CachePolicyID] as varchar(max)) * 1.0 / nullif(count(*),0) as selectivity from [dbo].[CachePolicy] union all select 'dbo'
"
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'as'.
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string 'DESC
'.
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'DESC
'.

Open in new window

0
ralmadaCommented:
the print seems to be truncated. Have you changed the @strSQL datatype? I've used varchar(max)
If  not, can you post it in the Code section like below

post the result of print here.

Open in new window

0
anushahannaAuthor Commented:
ralmada, I did not change any of the datatypes;just did copy/paste.

I was also curious.. the print only printed till "union all select 'dbo'" which did not seem to be complete...
0
ralmadaCommented:
ok, the problem is that print only takes up to 8000 characters and your string seems to be longer than that!!!
try this:
use select @strSQL instead of print @strSQL  and post the result here.
Also can you run it for a single table causing errors so we can narrow down the problem.
0
anushahannaAuthor Commented:
ralmada, OK surely will try first thing tomorrow.
thanks for your help.
0
ralmadaCommented:
Ok, I found the bug :) there's a missing bracket for the cast function.
I have also attached a new alternative using a cursor. Not a lot of difference in my environment, give both a try and let me know.
For the cursor alternative I have tried with and without the
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

but it didn't really make any differences.
 

/*--Without cursor
declare @strSQL varchar(max)     
     
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tbl, ''' + b.name + ''' as col, count(distinct cast([' + b.name + '] as varchar(max))) * 1.0 / nullif(count(*),0) as sy from [' + s.name + '].[' + a.name + ']'     
from sys.objects a      
inner join sys.columns b on a.object_id = b.object_id     
inner join sys.schemas s on a.schema_id = s.schema_id   
inner join sys.types t on b.user_type_id = t.user_type_id
where a.type = 'U' --a.name = 'yourtablename'     
and t.name <> 'xml'
      
set @strSQL = stuff(@strSQL, 1, 11, '')     
exec(@strSQL)
*/

--Using Cursor
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

declare @t table (schemaname varchar(255), tablename varchar(255), columnname varchar(255), stivity decimal(10,2))
declare @strSQL varchar(max)
declare @schema varchar(255)
declare @table varchar(255)
declare @column varchar(255)

declare selectivity cursor read_only for
select s.name as [schema], a.name as [table], b.name as [column]
from sys.objects a      
inner join sys.columns b on a.object_id = b.object_id     
inner join sys.schemas s on a.schema_id = s.schema_id   
inner join sys.types t on b.user_type_id = t.user_type_id
where a.type = 'U' --a.name = 'yourtablename'     
and t.name <> 'xml'

open selectivity

fetch next from selectivity
into @schema, @table, @column

set @strSQL = ' '

while @@FETCH_STATUS = 0
begin
	
	set @strSQL = 'select ''' + @schema + ''', ''' + @table + ''', ''' + @column + ''', count(distinct cast([' + @column + '] as varchar(max))) * 1.0 / nullif(count(*),0) from [' + @schema + '].[' + @table + ']'     

	insert @t
	exec(@strSQL)

	fetch next from selectivity
	into @schema, @table, @column

end

close selectivity
deallocate selectivity

select * from @t

Open in new window

0
anushahannaAuthor Commented:
Thanks so much, ralmada.

For the non-cursor one, I get this
Msg 529, Level 16, State 2, Line 1
Explicit conversion from data type image to varchar(max) is not allowed.

i am getting the same for the cursor one also.

This is different than the 'Operand data type ntext is invalid for count operator.' we got before.

0
ralmadaCommented:
wow, you have lots of datatypes there lol!
You will have to exclude them
Also remember that ntext, text and image are deprecated, you should consider using varchar(max) instead.

/*--Without cursor 
declare @strSQL varchar(max)      
      
select @strSQL = coalesce(@strSQL, ' ') +  ' union all select ''' + s.name + ''' as [schema], ''' + a.name + ''' as tbl, ''' + b.name + ''' as col, count(distinct cast([' + b.name + '] as varchar(max))) * 1.0 / nullif(count(*),0) as sy from [' + s.name + '].[' + a.name + ']'      
from sys.objects a       
inner join sys.columns b on a.object_id = b.object_id      
inner join sys.schemas s on a.schema_id = s.schema_id    
inner join sys.types t on b.user_type_id = t.user_type_id 
where a.type = 'U' --a.name = 'yourtablename'      
and t.name not in ('xml', 'image')
       
set @strSQL = stuff(@strSQL, 1, 11, '')      
exec(@strSQL) 
*/ 
 
--Using Cursor 
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED 
 
declare @t table (schemaname varchar(255), tablename varchar(255), columnname varchar(255), stivity decimal(10,2)) 
declare @strSQL varchar(max) 
declare @schema varchar(255) 
declare @table varchar(255) 
declare @column varchar(255) 
 
declare selectivity cursor read_only for 
select s.name as [schema], a.name as [table], b.name as [column] 
from sys.objects a       
inner join sys.columns b on a.object_id = b.object_id      
inner join sys.schemas s on a.schema_id = s.schema_id    
inner join sys.types t on b.user_type_id = t.user_type_id 
where a.type = 'U' --a.name = 'yourtablename'      
and t.name not in('xml', 'image')
 
open selectivity 
 
fetch next from selectivity 
into @schema, @table, @column 
 
set @strSQL = ' ' 
 
while @@FETCH_STATUS = 0 
begin 
         
        set @strSQL = 'select ''' + @schema + ''', ''' + @table + ''', ''' + @column + ''', count(distinct cast([' + @column + '] as varchar(max))) * 1.0 / nullif(count(*),0) from [' + @schema + '].[' + @table + ']'      
 
        insert @t 
        exec(@strSQL) 
 
        fetch next from selectivity 
        into @schema, @table, @column 
 
end 
 
close selectivity 
deallocate selectivity 
 
select * from @t

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
anushahannaAuthor Commented:
ralmada, You truly are a guru! Many thanks. You made it look so simple!

gothamite, appreciate your input and help very much also.
0
ralmadaCommented:
Thanks for your kind words! Glad to help!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.