Solved

Find distinct value for all table columns in a database

Posted on 2004-10-24
1,724 Views
Last Modified: 2012-08-14
I am trying to get a different result from this cursor.  This cursor gives the distinct count for all colums in every table in the database,  I need to get each distinct column value for every column in all tables in the database,  I understand this may take a long time to run but it is most valueable to me and I am not too familiar with cursors.  Any help will be gratly appreciated.

create procedure  sp_tableinfo
as
DECLARE @table sysname
DECLARE @column sysname
DECLARE @datatype sysname
DECLARE @designed_length int
DECLARE @all_count int
DECLARE @sql nvarchar(4000)

SET NOCOUNT ON
--EXEC sp_updatestats

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

CREATE TABLE table_info
(table_name sysname NOT NULL
,column_name sysname NOT NULL
,data_type sysname NOT NULL
,designed_length int NULL
,max_length int NULL
,distinct_count int NULL
,all_count int NOT NULL
,cardinality AS
CASE
WHEN distinct_count IS NULL THEN CAST(data_type AS varchar(7))
WHEN all_count = 0 THEN CAST('No rows' AS varchar(7))
ELSE CAST(
CAST(CAST(distinct_count AS decimal)/CAST(all_count AS decimal) AS decimal(18,4)) AS varchar(7))
END
)

DECLARE c CURSOR FAST_FORWARD FOR
SELECT
isc.table_name,
isc.column_name,
isc.data_type,
COALESCE(isc.character_maximum_length, isc.numeric_precision),
si.rowcnt
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
INNER JOIN sysindexes si
ON isc.table_name = OBJECT_NAME(si.id)
WHERE ist.table_type = 'base table'
AND ist.table_name not like 'dt%'
AND si.indid IN (0,1)
group by isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt
ORDER BY isc.table_name, isc.column_name,isc.data_type,isc.character_maximum_length, isc.numeric_precision,
si.rowcnt


OPEN c
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
WHILE @@FETCH_STATUS = 0
BEGIN
IF @datatype IN ('text', 'ntext', 'image')
BEGIN
SET @sql = 'SELECT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + '''' --+ 'group by['+@column+']'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(DATALENGTH([' + @column + ']))'
SET @sql = @sql + ', NULL' + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
ELSE
BEGIN
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', CAST([' + @column + '] AS VARCHAR(8000))'
SET @sql = @sql + ', ' + CAST(@designed_length AS varchar(10)) + ', MAX(LEN(CAST([' + @column + '] AS VARCHAR(8000))))'
SET @sql = @sql + ', COUNT(DISTINCT [' + @column + '])'
SET @sql = @sql + ', ' + CAST(@all_count AS varchar(10)) + ' FROM [' + @table + ']'
END
PRINT @sql
INSERT INTO table_info (table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count)
EXEC(@sql)
FETCH NEXT FROM c INTO @table, @column, @datatype, @designed_length, @all_count
END
CLOSE c
DEALLOCATE c

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

/*SELECT table_name, column_name, data_type, designed_length, max_length, distinct_count, all_count, cardinality
FROM table_info */



--DROP TABLE table_info
0
Question by:jemacc
    7 Comments
     
    LVL 8

    Expert Comment

    by:sigmacon
    EVERY DISTINCT VALUE IN EVERY COLUMN OF EVERY TABLE? How would you like to have that printed on the screen? That could potentially be GigaBytes of data. At the least this should not be inserted in that table that you use for the statistics, because you would have a lot of redundant rows, the only difference being the distinct column value. Unless you want a comma separated list of all values in a text field.

    BTW, the code you posted above seems to only look at columns that have and index, if I am not mistaken, or a statistic on them.

    It's possible though, so confirm that this is what you want, and if so, how the result should be printed on the screen or stored.
    0
     

    Author Comment

    by:jemacc
    I do need the results written to the table because I need to be able to report information. As far a comma sepated value I will still need to store for reporting information.
    0
     
    LVL 8

    Expert Comment

    by:sigmacon
    jemacc,

    The current reporting table has one row for each column analysed. How do you want the distinct column values, which could potentially be 1000s or more to be stored? Duplicate the row for each column and for each value? Put the in a separate table? Add them as comma-separated list? If you don't know the answer to that, please describe your reporting issue a little bit more detailed, so I can derive how you would like to get the values back.
    0
     
    LVL 17

    Expert Comment

    by:BillAn1
    the following will give you distinct values.
    The only probs are  - 1. it cannot do distinct for text, ntext or image data types. There is no easy method for getting distinct values for these. (IF the text is less than 8000 chars you could convert to varchar, but if it is > 8000 chars it is difficult to do)
    2. Because the table you store the data in needs to have the table name, the column name and the data, and there is a restriction on the max size of a row of approx 8016 bytes, you cannot store a full 8000 bytes for the value. If you do have any varchar values which are between 7000 and 8000 you would have to change your table sturcture to be normalised, in order to accommodate the full 8000 chars.
    Otherwise this should work, with the caveat that it could get quite large - depending on your data.


    create procedure  sp_tableinfo
    as
    BEGIN
    DECLARE @table sysname
    DECLARE @column sysname
    DECLARE @datatype sysname

    DECLARE @sql nvarchar(4000)

    SET NOCOUNT ON
    --EXEC sp_updatestats

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

    CREATE TABLE table_info
    (table_name sysname NOT NULL
    ,column_name sysname NOT NULL
    ,col_value varchar(7000) NULL
    )


    DECLARE c CURSOR FAST_FORWARD FOR
    SELECT
    isc.table_name,
    isc.column_name,
    isc.data_type
    FROM information_schema.columns isc
    INNER JOIN information_schema.tables ist
    ON isc.table_name = ist.table_name
    WHERE ist.table_type = 'base table'
    AND ist.table_name not like 'dt%'
    ORDER BY 1,2

    set @sql = ''
    OPEN c
    FETCH NEXT FROM c INTO @table, @column, @datatype
    WHILE @@FETCH_STATUS = 0
    BEGIN
          IF @datatype NOT IN ('text', 'ntext', 'image')
          BEGIN
                SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
          END
          ---PRINT @sql
          EXEC(@sql)
          FETCH NEXT FROM c INTO @table, @column, @datatype
    END
    CLOSE c
    DEALLOCATE c
    end
    0
     
    LVL 50

    Expert Comment

    by:Lowfatspread
    given that this will take a while anyway...

    it may be beneficial to  use the Tableid and Columnid values used in sysobjects in your stats rows that should enable you to store 8000 byte varchars if thats important..

    0
     

    Author Comment

    by:jemacc
    BillAn1

    I did not see my datatype in the results set as the original state

    My reports needs is as follows.  I need to be able to give to a department a break down of their data, they are going through a conversion process and must identify and analyze thier data.

    for example there maybe in  column name street, there may be diffrent entry for street

    street
    str
    st
    strrt
    strr
    str_...

    Having this information will help them determine their business process for entering information. (the database is not normalize and had very little constraints)
    0
     
    LVL 17

    Accepted Solution

    by:
    sorry, if you want the datatype, you can add it back to your table.

    create procedure  sp_tableinfo
    as
    BEGIN
    DECLARE @table sysname
    DECLARE @column sysname
    DECLARE @datatype sysname

    DECLARE @sql nvarchar(4000)

    SET NOCOUNT ON
    --EXEC sp_updatestats

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Will speed things up a bit

    CREATE TABLE table_info
    (table_name sysname NOT NULL
    ,column_name sysname NOT NULL
    ,datatype sysname NOT NULL
    ,col_value varchar(7000) NULL
    )


    DECLARE c CURSOR FAST_FORWARD FOR
    SELECT
    isc.table_name,
    isc.column_name,
    isc.data_type
    FROM information_schema.columns isc
    INNER JOIN information_schema.tables ist
    ON isc.table_name = ist.table_name
    WHERE ist.table_type = 'base table'
    AND ist.table_name not like 'dt%'
    ORDER BY 1,2

    set @sql = ''
    OPEN c
    FETCH NEXT FROM c INTO @table, @column, @datatype
    WHILE @@FETCH_STATUS = 0
    BEGIN
         IF @datatype NOT IN ('text', 'ntext', 'image')
         BEGIN
              SET @sql = 'INSERT INTO table_info SELECT DISTINCT ''' + @table + ''', ''' + @column + ''', ''' + @datatype + ''', convert(varchar(7000),[' + @column + ']) FROM [' + @table + ']'
         END
         ---PRINT @sql
         EXEC(@sql)
         FETCH NEXT FROM c INTO @table, @column, @datatype
    END
    CLOSE c
    DEALLOCATE c
    end
    0

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Lean Six Sigma Project Manager Certification

    There are many schools of thought around successful project management, but few as highly regarded as the Six Sigma and Lean methods. With 37 hours of learning, this training will explain concrete processes for increasing efficiency and limiting wasted time and effort.

    This is my first ever Article on EE or elsewhere; therefore, please bear with me if I have some discrepancies in my writing. I read many articles and questions related to "how to pass values to SSIS packages at run-time?"  Hence, this common ques…
    Introduction SQL Server Integration Services can read XML files, that’s known by every BI developer.  (If you didn’t, don’t worry, I’m aiming this article at newcomers as well.) But how far can you go?  When does the XML Source component become …
    Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
    Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

    910 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    14 Experts available now in Live!

    Get 1:1 Help Now