SQL query to return all unique values of all fields in a table without knowing at compile time what fields are in the table

I have a java web application that is a front end to a Microsoft SQL Server (I think 2008)
 database. It lets the user add, change, and delete records in the database. The app does not know anything about the database and gets the names of tables and fields in the tables by querying the database with queries like...

SELECT C.ORDINAL_POSITION, C.COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, K.ORDINAL_POSITION AS PK_ORDINAL, CONSTR.CONSTRAINT_TYPE   FROM INFORMATION_SCHEMA.COLUMNS C    LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE K    ON C.TABLE_NAME=K.TABLE_NAME AND   C.COLUMN_NAME=K.COLUMN_NAME    LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS CONSTR   ON K.CONSTRAINT_NAME=CONSTR.CONSTRAINT_NAME   WHERE C.TABLE_SCHEMA='dbo' AND C.TABLE_NAME='tbl_L_Constants'   ORDER BY C.ORDINAL_POSITION

which produces

ORDINAL_POSITION	COLUMN_NAME	DATA_TYPE	CHARACTER_MAXIMUM_LENGTH	NUMERIC_PRECISION	PK_ORDINAL	CONSTRAINT_TYPE
1	Source_System	varchar	10	NULL	1	PRIMARY KEY
2	Product_Id	int	NULL	10	2	PRIMARY KEY
3	Field_Name	varchar	100	NULL	3	PRIMARY KEY
4	Field_Value	varchar	100	NULL	NULL	NULL
5	Field_Purpose	varchar	100	NULL	NULL	NULL

Open in new window


On the add and change screen I display to the user a drop-down list for every field, showing the unique values in that field. But first I do a count on the unique values so I know if there are too many values to display, and I cut it off at some number.

The problem is that first I have to query the database for the list of fields in the table, then I have to run a query for each field to find out the count, then if the count is not too high, I run another query to get the actual values.

I wonder if there is any way to reduce the number of queries? Maybe I could run a query that joins the actual table and syscolumns and maybe uses GROUP BY somehow to get the count of unique values in each field in the table, all in one results set.

Then, I could walk that and run a query for each field that doesn't have too many values.

Here are the other two queries I run

SELECT COUNT(DISTINCT Call_Type) AS NumFound FROM Bonds
SELECT DISTINCT Call_Type, COUNT(*) AS NumFound FROM tbl_M_Bond GROUP BY Call_Type ORDER BY Call_Type

That second query reminds me that in the drop-down list I display the value and the number of occurrences of the value. So, something like this for the first name field in a table of people's names:

Jason (10)
Susan (8)
Sarah (4)

Can anyone think of a way to reduce these queries and enhance performance? Without writing a stored procedure? In my organization that is more difficult to get into production.
jkurantAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

jkurantAuthor Commented:
Here is a screenshot of a chunk of my data-entry screen.
0
jkurantAuthor Commented:
attaching screenshot
0
jkurantAuthor Commented:
screenshot.gif
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Walter RitzelSenior Software EngineerCommented:
What you are doing here is to create dynamic queries, which is something really cool. And unfortunately, I dont see a different approach than the one you are using.
Suggestions to improve that would be keep the table structures in memory or saved in a metadata table, to save some time when building the page. But, get the unique values for each field is something that should be done at runtime.
0
Scott PletcherSenior DBACommented:
1) Stop using INFORMATION_SCHEMA views, as they are deadly slow and contribute to deadlocks; use sys.columns instead.
2) You need to index each column separately for any large tables that allow this type of processing.  
3) You wouldn't need to run both of these queries:
A) SELECT COUNT(DISTINCT Call_Type) AS NumFound FROM Bonds
B) SELECT /*DISTINCT*/ Call_Type, COUNT(*) AS NumFound FROM tbl_M_Bond GROUP BY Call_Type /*ORDER BY Call_Type*/  --don't need the DISTINCT with GROUP BY
Since you can derive the first value from the second query.  For example, you could create a generic result table of:
column_name, value, value_count --clustered on ( column_name, value )
And load the results of all the 3B queries into that table; from that, you can quickly compute the 3A results.
If you do that, you also:
    --don't need ORDER BY in 3B, you can wait until final display to the user to do an ORDER BY
0
PortletPaulfreelancerCommented:
GROUP BY automatically (and always) produces a "distinct list" as determined by the fields specified in that clause.

It is wasteful to ask for "select distinct" of a list produced by  a "group by" (which is already distinct)

nb: the select clause is executed AFTER the group by clause
0
jkurantAuthor Commented:
Thank you, @Paul Maxwell. I was able to remove some redundant usage of DISTINCT with GROUP BY.

@ScottPletcher, it does not help to skip the count because I don't want to pull the data if there are too many records, which there certainly can be. So, if there are a million distinct values, I don't want to pull them from the database. In that case I show "Too many unique values to show". Also, at least on our server the INFORMATION_SCHEMA views are quick, but I do wonder about deadlocks. I am not sure what the alternative is, though.
0
PortletPaulfreelancerCommented:
>>" I am not sure what the alternative is, though."
I believe Scott gave you the alternative (sys.columns)

For more on this may I suggest this blog:
The case against INFORMATION_SCHEMA views

Do note that [column_id] from sys.columns IS the equivalent of [ordinal_position] however gaps in the number can exist if columns are deleted. Using ROW_NUMBER() OVER(|partition by object_id| ORDER BY column_id)  can be used to avoid gaps.
0
jkurantAuthor Commented:
So is there no way to make a query with an OVER CLAUSE that uses the primary key for the table without me having to specify what that key is?
0
PortletPaulfreelancerCommented:
sorry, I'm not sure what you mean by the query on row_number()

I was pointing you toward an alternative to information_schema views as recommended by Scott, and one should always take Scott's recommendations seriously as I have very high regard for his opinions on SQL Server!

My reference to row_number() is only relevant IF you REQUIRE "ordinal_position"  to have no gaps. If you don't care about possible gaps then there is no need for row_number() at all.

If you are now thinking you can use row_number() as an alternative to your existing queries, I'm not sure how it would help. If you NEED to PARTITION the number sequences you will need to know what you are partitioning by.

btw: In the case I showed earlier though sys.columns.object_id won't change any time sooon so using it in a query is pretty darn safe IMHO
0
jkurantAuthor Commented:
Thank you, @Paul. @Scott may be incredibly knowledgeable about SQL Server, but he doesn't read the questions carefully. On two of my questions he has given me answers that are specifically shown to be not possible given my question. For example, he suggests skipping the count when I explained that I did the count to avoid trying to pull millions of records, which I am sure he understands.

My problem is that I do not understand a number of features of SQL Server, including RowNumber(), OVER clause, partitions, and pivot. I wonder if any of these could produce a result that it gives me the unique values of all fields in a table in one query. Maybe not!

I suppose a stored procedure would probably be the best solution, but these are more scrutinized in my organization. Maybe it is worth the scrutiny.
0
Scott PletcherSenior DBACommented:
@jkurant:

OK, I'll drop out then, since you believe I'm incapable of helping you.  Good luck with future qs.

I admit, I can on occasion misread or not fully read a q.  But, keep in mind, I'm very busy with work and volunteering my time to help askers here, who are normally grateful for that effort.
0
Walter RitzelSenior Software EngineerCommented:
@jkurant,

you cannot use the metadata views or tables in SQL Server to run a query that will produce in a single query the information you want. And this is not related to any SQL Server functionality or feature: is that what you are trying to accomplish is complex.
So, what you need is to create a stored procedure that can retrieve the metadata and then use it to produce a dynamic query that will get the information you need.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
jkurantAuthor Commented:
It looks like the only solution will be to create a stored procedure. Paul did help me eliminate the redundant DISTINCT when used with GROUP BY.

Thank you, experts!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.