Find distinct value for all table columns in a database

I am trying to get a different result from this cursor.  This cursor gives the distinct count for all colums in every table in the database,  I need to get each distinct column value for every column in all tables in the database,  I understand this may take a long time to run but it is most valueable to me and I am not too familiar with cursors.  Any help will be gratly appreciated.

create procedure  sp_tableinfo
as
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname
DECLARE @designed_length int
DECLARE @all_count int
DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,data_type sysname NOT NULL
,designed_length int NULL
,max_length int NULL
,distinct_count int NULL
,all_count int NOT NULL
,cardinality AS
CASE
WHEN distinct_count IS NULL THEN CAST(data_type AS varchar(7))
WHEN all_count = 0 THEN CAST('No rows' AS varchar(7))
ELSE CAST(
CAST(CAST(distinct_count AS decimal)/CAST(all_count AS decimal) AS decimal(18,4)) AS varchar(7))
END
)

DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type,
COALESCE(isc.character_maximum_length, isc.numeric_precision),
si.rowcnt
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
INNER JOIN sysindexes si
ON isc.table_name = OBJECT_NAME(si.id)
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
AND si.indid IN (0,1)
group by isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt
ORDER BY isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt


OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
WHILE @@FETCH_STATUS = 0
BEGIN
IF @datatype IN ('text', 'ntext', 'image')
BEGIN
SET @sql = 'SELECT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + '''' --+ 'group by['+@column+']'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(DATALENGTH([' + @column + ']))'
SET @sql = @sql + ', NULL' + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
ELSE
BEGIN
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', CAST([' + @column + '] AS VARCHAR(8000))'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(LEN(CAST([' + @column + '] AS VARCHAR(8000))))'
SET @sql = @sql + ', COUNT(DISTINCT [' + @column + '])'
SET @sql = @sql + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
PRINT @sql
INSERT INTO table_info (table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count)
EXEC(@sql)
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
END
CLOSE c
DEALLOCATE c

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

/*SELECT table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count, cardinality
FROM table_info */



--DROP TABLE table_info
jemaccAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

sigmaconCommented:
EVERY DISTINCT VALUE IN EVERY COLUMN OF EVERY TABLE? How would you like to have that printed on the screen? That could potentially be GigaBytes of data. At the least this should not be inserted in that table that you use for the statistics, because you would have a lot of redundant rows, the only difference being the distinct column value. Unless you want a comma separated list of all values in a text field.

BTW, the code you posted above seems to only look at columns that have and index, if I am not mistaken, or a statistic on them.

It's possible though, so confirm that this is what you want, and if so, how the result should be printed on the screen or stored.
0
jemaccAuthor Commented:
I do need the results written to the table because I need to be able to report information. As far a comma sepated value I will still need to store for reporting information.
0
sigmaconCommented:
jemacc,

The current reporting table has one row for each column analysed. How do you want the distinct column values, which could potentially be 1000s or more to be stored? Duplicate the row for each column and for each value? Put the in a separate table? Add them as comma-separated list? If you don't know the answer to that, please describe your reporting issue a little bit more detailed, so I can derive how you would like to get the values back.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

BillAn1Commented:
the following will give you distinct values.
The only probs are  - 1. it cannot do distinct for text, ntext or image data types. There is no easy method for getting distinct values for these. (IF the text is less than 8000 chars you could convert to varchar, but if it is > 8000 chars it is difficult to do)
2. Because the table you store the data in needs to have the table name, the column name and the data, and there is a restriction on the max size of a row of approx 8016 bytes, you cannot store a full 8000 bytes for the value. If you do have any varchar values which are between 7000 and 8000 you would have to change your table sturcture to be normalised, in order to accommodate the full 8000 chars.
Otherwise this should work, with the caveat that it could get quite large - depending on your data.


create procedure  sp_tableinfo
as
BEGIN
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname

DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,col_value varchar(7000) NULL
)


DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
ORDER BY 1,2

set @sql = ''
OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype
WHILE @@FETCH_STATUS = 0
BEGIN
      IF @datatype NOT IN ('text', 'ntext', 'image')
      BEGIN
            SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
      END
      ---PRINT @sql
      EXEC(@sql)
      FETCH NEXT FROM c INTO @table, @column, @datatype
END
CLOSE c
DEALLOCATE c
end
0
LowfatspreadCommented:
given that this will take a while anyway...

it may be beneficial to  use the Tableid and Columnid values used in sysobjects in your stats rows that should enable you to store 8000 byte varchars if thats important..

0
jemaccAuthor Commented:
BillAn1

I did not see my datatype in the results set as the original state

My reports needs is as follows.  I need to be able to give to a department a break down of their data, they are going through a conversion process and must identify and analyze thier data.

for example there maybe in  column name street, there may be diffrent entry for street

street
str
st
strrt
strr
str_...

Having this information will help them determine their business process for entering information. (the database is not normalize and had very little constraints)
0
BillAn1Commented:
sorry, if you want the datatype, you can add it back to your table.

create procedure  sp_tableinfo
as
BEGIN
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname

DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,datatype sysname NOT NULL
,col_value varchar(7000) NULL
)


DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
ORDER BY 1,2

set @sql = ''
OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype
WHILE @@FETCH_STATUS = 0
BEGIN
     IF @datatype NOT IN ('text', 'ntext', 'image')
     BEGIN
          SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
     END
     ---PRINT @sql
     EXEC(@sql)
     FETCH NEXT FROM c INTO @table, @column, @datatype
END
CLOSE c
DEALLOCATE c
end
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.