Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Find distinct value for all table columns in a database

Posted on 2004-10-24
7
Medium Priority
?
1,731 Views
Last Modified: 2012-08-14
I am trying to get a different result from this cursor.  This cursor gives the distinct count for all colums in every table in the database,  I need to get each distinct column value for every column in all tables in the database,  I understand this may take a long time to run but it is most valueable to me and I am not too familiar with cursors.  Any help will be gratly appreciated.

create procedure  sp_tableinfo
as
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname
DECLARE @designed_length int
DECLARE @all_count int
DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,data_type sysname NOT NULL
,designed_length int NULL
,max_length int NULL
,distinct_count int NULL
,all_count int NOT NULL
,cardinality AS
CASE
WHEN distinct_count IS NULL THEN CAST(data_type AS varchar(7))
WHEN all_count = 0 THEN CAST('No rows' AS varchar(7))
ELSE CAST(
CAST(CAST(distinct_count AS decimal)/CAST(all_count AS decimal) AS decimal(18,4)) AS varchar(7))
END
)

DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type,
COALESCE(isc.character_maximum_length, isc.numeric_precision),
si.rowcnt
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
INNER JOIN sysindexes si
ON isc.table_name = OBJECT_NAME(si.id)
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
AND si.indid IN (0,1)
group by isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt
ORDER BY isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt


OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
WHILE @@FETCH_STATUS = 0
BEGIN
IF @datatype IN ('text', 'ntext', 'image')
BEGIN
SET @sql = 'SELECT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + '''' --+ 'group by['+@column+']'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(DATALENGTH([' + @column + ']))'
SET @sql = @sql + ', NULL' + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
ELSE
BEGIN
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', CAST([' + @column + '] AS VARCHAR(8000))'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(LEN(CAST([' + @column + '] AS VARCHAR(8000))))'
SET @sql = @sql + ', COUNT(DISTINCT [' + @column + '])'
SET @sql = @sql + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
PRINT @sql
INSERT INTO table_info (table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count)
EXEC(@sql)
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
END
CLOSE c
DEALLOCATE c

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

/*SELECT table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count, cardinality
FROM table_info */



--DROP TABLE table_info
0
Comment
Question by:jemacc
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
  • +1
7 Comments
 
LVL 8

Expert Comment

by:sigmacon
ID: 12392127
EVERY DISTINCT VALUE IN EVERY COLUMN OF EVERY TABLE? How would you like to have that printed on the screen? That could potentially be GigaBytes of data. At the least this should not be inserted in that table that you use for the statistics, because you would have a lot of redundant rows, the only difference being the distinct column value. Unless you want a comma separated list of all values in a text field.

BTW, the code you posted above seems to only look at columns that have and index, if I am not mistaken, or a statistic on them.

It's possible though, so confirm that this is what you want, and if so, how the result should be printed on the screen or stored.
0
 

Author Comment

by:jemacc
ID: 12392137
I do need the results written to the table because I need to be able to report information. As far a comma sepated value I will still need to store for reporting information.
0
 
LVL 8

Expert Comment

by:sigmacon
ID: 12392147
jemacc,

The current reporting table has one row for each column analysed. How do you want the distinct column values, which could potentially be 1000s or more to be stored? Duplicate the row for each column and for each value? Put the in a separate table? Add them as comma-separated list? If you don't know the answer to that, please describe your reporting issue a little bit more detailed, so I can derive how you would like to get the values back.
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 17

Expert Comment

by:BillAn1
ID: 12392171
the following will give you distinct values.
The only probs are  - 1. it cannot do distinct for text, ntext or image data types. There is no easy method for getting distinct values for these. (IF the text is less than 8000 chars you could convert to varchar, but if it is > 8000 chars it is difficult to do)
2. Because the table you store the data in needs to have the table name, the column name and the data, and there is a restriction on the max size of a row of approx 8016 bytes, you cannot store a full 8000 bytes for the value. If you do have any varchar values which are between 7000 and 8000 you would have to change your table sturcture to be normalised, in order to accommodate the full 8000 chars.
Otherwise this should work, with the caveat that it could get quite large - depending on your data.


create procedure  sp_tableinfo
as
BEGIN
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname

DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,col_value varchar(7000) NULL
)


DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
ORDER BY 1,2

set @sql = ''
OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype
WHILE @@FETCH_STATUS = 0
BEGIN
      IF @datatype NOT IN ('text', 'ntext', 'image')
      BEGIN
            SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
      END
      ---PRINT @sql
      EXEC(@sql)
      FETCH NEXT FROM c INTO @table, @column, @datatype
END
CLOSE c
DEALLOCATE c
end
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 12392777
given that this will take a while anyway...

it may be beneficial to  use the Tableid and Columnid values used in sysobjects in your stats rows that should enable you to store 8000 byte varchars if thats important..

0
 

Author Comment

by:jemacc
ID: 12393906
BillAn1

I did not see my datatype in the results set as the original state

My reports needs is as follows.  I need to be able to give to a department a break down of their data, they are going through a conversion process and must identify and analyze thier data.

for example there maybe in  column name street, there may be diffrent entry for street

street
str
st
strrt
strr
str_...

Having this information will help them determine their business process for entering information. (the database is not normalize and had very little constraints)
0
 
LVL 17

Accepted Solution

by:
BillAn1 earned 2000 total points
ID: 12394124
sorry, if you want the datatype, you can add it back to your table.

create procedure  sp_tableinfo
as
BEGIN
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname

DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,datatype sysname NOT NULL
,col_value varchar(7000) NULL
)


DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
ORDER BY 1,2

set @sql = ''
OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype
WHILE @@FETCH_STATUS = 0
BEGIN
     IF @datatype NOT IN ('text', 'ntext', 'image')
     BEGIN
          SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
     END
     ---PRINT @sql
     EXEC(@sql)
     FETCH NEXT FROM c INTO @table, @column, @datatype
END
CLOSE c
DEALLOCATE c
end
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question