[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1179
  • Last Modified:

What is Right and/or Wrong with SELECT *

I have been around a couple of blocks with SQL Server. But one direction I am not clear about is whether or not to use SELECT * in my views and stored procedures.

For maintenance purposes, it seems to me that using SELECT * relieves the maintenance person of having to modify a lot of stored procedure and view scripts. All they have to do is run sp_refreshview and sp_recompile as appropriate after altering a table.

Generally, you are altering a table to add columns, and if you always add columns to the end of a table, then even forgetting to run sp_refreshview and sp_recompile will not cause regression errors (the new fields will not be there but the old ones will still be referenced correctly.

So, why else would SELECT * be "evil?
0
anyoneis
Asked:
anyoneis
  • 8
  • 4
  • 2
  • +4
6 Solutions
 
ptjcbCommented:
Never use SELECT * in a production environment. You will be returning data to the view/stored proc that it does not need. Always make your queries as concise as possible.
0
 
Carl TawnSystems and Integration DeveloperCommented:
Only in so much as if you only want a small selection of fields returned then there is a large overhead using *, since you will also retrieve a whole load of fields you're not actually interested in.
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
I agree with above.
 * it needs a small overhead as it needs to expand the * into the list of columns (although inside views and procedures, that's the least problem
 * if tables/views are changing, data types change etc, the dependant views/procedures are not automatically updated (despite the *), as they are precompiled. got some nice surprises with that.
 * it needs to fetch from disk/ram, transport over the network, store in the client RAM more data that is eventually needed (as said above). Optimizing applications starts with reducing also there.
 * it will not be clear from which table column come from when you start joining tables (using table aliases is very important for readability of the query)
0
Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

 
Anthony PerkinsCommented:
>> it seems to me that using SELECT * relieves the maintenance person of having to modify a lot of stored procedure and view scripts.<<
That is just the point:  It relieves the maintenace person of thinking.  Very dangerous indeed.
0
 
anyoneisAuthor Commented:
ptjcb: This extra data overhead is only a factor if you don't need all of the columns. Moreover, "SELECT *" is extremely clear and MUCH MORE CONSISE than having a long list of columns. Is there anything else you can offer to support "Never use SELECT * in a production environment?"

angelIII: I think the overhead that carl_tawn was referring to is a result of transferring data that you don't need. But, I am only talking about using SELECT * when I need all the data anyways.

When the table changes and a view references columns by name, the view gets the correct columns even if the column order changes - of course, it doesn't get the new column(s), until you recode it. If a column changes type, for instance a char becomes an int, the view reflects that change also, without running sp_refreshview.
 
When SELECT * is used in a view and the column order changes, the view still returns the first n columns, where n is the count of the old set of columns. It will however, return the correct new results if you merely run sp_refreshview.

So, in the former case, the system is automatically detecting the changed columns, but in the SELECT * case it is not. Which one of these cases sounds like it will be more performanct? To me, it sounds like the latter will be, since the system is obviously mapping columns by ordinal and not by name. The system seems to be doing less work at run time.

David




0
 
anyoneisAuthor Commented:
acperkins: No, they just have to THINK to run the sp_recompile and sp_refresh sprocs. What they don't have to do is TYPE a bunch of columns names in a bunch pof places. :-)
0
 
Anthony PerkinsCommented:
>>No, they just have to THINK to run the sp_recompile and sp_refresh sprocs. <<
You are missing the point that does not require a thought process.  

>>What they don't have to do is TYPE a bunch of columns names in a bunch pof places.<<
Again you are misisng the point: It is not a question if typing a bunch of columns.  It is a question of which columns should be typed.
0
 
anyoneisAuthor Commented:
acperkins: I still don't see your point. If I want all of the columns, what is the benefit of saying "I want column1, column2, and column3" over saying "I want all three columns"?

Requiring the maintenance person to "think more", without some tangible benefit, is not something I want to do.


0
 
Anthony PerkinsCommented:
Fair enough and good luck.
0
 
anyoneisAuthor Commented:
acperkins: >>and good luck.

Exactly! :-)

Look, I get it that people say you should not use "SELECT *"! And these are people with a great deal of SQL experience and knowledge. But what I don't yet know is the "why" behind the heurstic!

Could it be that this is one of those heurstics that gets drummed into people's head so much that the original purpose (to not process and transfer more data than is needed) is lost? Or is there a bear around the next corner?

I respect the lore enough to dig for a supporting argument for this heuristic. But so far, I can't find one.


0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
WHY?
well, the reason is mainly the fact of documentation: existing, level of it, and the reading of it.
Today, you encode all your views with SELECT *, but if there is no documentation/note in the view itself, that, every week there should be a job that runs sp_refreshview to ensure that in case the table def changes, the view will reflect that change automatically and that indeed you really want all columns, the next person to work with this will not know it.

If you put that note/doc inside all those views, I think you can go with the SELECT * with no problem at all.

Regarding the sp_refreshview:
if you create a table X ( field1 int, field2 datetime, field3 varchar)
then view Y ( select * from X )
then drop table X, and create new table X ( field3 varchar, field1 int, field2 datetime ) ie reordering the columns.
fill data into table X

now, without first running sp_refreshview, run a
SELECT * FROM Y

and see what is happening

0
 
anyoneisAuthor Commented:
angelIII: >>If you put that note/doc inside all those views, I think you can go with the SELECT *
              >>wth no problem at all.

Even better - I put my documentation in a maintanance plan document that tells them how to modify the database.  I don't need to run a sp_refreshviews "in case" something has changed - that indeed would be the mental laziness that acperkins talked about.  Rather, I have defined a process for making changes that includes running sp_refreshviews when appropriate.

Other input: I have a coworker who thinks there is (was?) a difference in how SQL Server optimizes/caches a SELECT * view versus a SELECT columnnames query - this line of investigation might bear some fruit.



0
 
Scott PletcherSenior DBACommented:
>> If I want all of the columns, what is the benefit of saying "I want column1,  column2, and column3" over saying "I want all three columns"? <<

How do you know that you will need all the columns added TOMORROW?  Or the week after that?  Or six months from now?  How can you "need" the data if you don't even know it exists???

Also, if a join is used, an * can cause *errors* if a column added to one of the joined tables duplicates one on an existing table, and the code receiving the results does not expect/allow two columns with the same name.


A final reason is that if you use * people will think you a poor coder -- and there will be evidence to support that conclusion.
0
 
nmcdermaidCommented:
I don't know how many times I have seen this in code:

INSERT INTO TABLE1
SELECT * FROM Table2


and it gets my goat because it means I can't just add another column to either table because this particular bit of code will crash instantly as the fields don't match.


Between that and ScottPletchers comment you can see how you are inviting errors in your systems if you don't specify the columns.




0
 
anyoneisAuthor Commented:
Please note that I have not said that SELECT * is the best solution to use in all cases. I have merely challenged the community here to justify the prevailing heuristic that "SELECT * is evil."

So far, I have heard nothing that comes close to supporting this blanket assertion! (I certainly acknowledge that people have brought up valid reasons for CIRCUMSTANTIALLY eschewing Select *.)

ScottPletcher:
>>How do you know that you will need all the columns added TOMORROW?  Or the week after >>that?  Or six months from now?  How can you "need" the data if you don't even know it exists???

Simple. If the whole purpose of the view or stored procedure is to retrieve ALL the columns of the table, I know TODAY that tomorrow I will need that as yet unknown column to fulfill the purpose of the view or stored procedure.

>>Also, if a join is used, an * can cause *errors* if a column added to one of the joined tables >>duplicates one on an existing table, and the code receiving the results does not expect/allow two >>columns with the same name.

True enough. And, when I have column name duplication, I must resort to naming all of the columns. For this very reason, I don't have 15 different ID columns in my database, I have CustomerID in the Customer table, and AddressID in the Address table, ...

>>A final reason is that if you use * people will think you a poor coder -- and there will be evidence >>to support that conclusion.

In this context, that is nothing but circular reasoning. As a developer with over 30 years of experience, I am seldom intimidated by herd mentality - I have seen too many herds run over the edge of a cliff or miss a watering hole.

If those people can't tell me why they think I am a poor coder for using SELECT *, there is evidence that they might very well be the victims of a misconception or a gross generalization.

nmcdermaid:
>>INSERT INTO TABLE1
>>SELECT * FROM Table2

That might be a problem, and perhaps points out a circumstance where it is shortsighted - On the other hand, I could say that the code given is obviously designed to copy ALL of the columns from table2 into table1, and if you are trying to change table1 and not also change table2, you are violating the original design and might very well be going down a wrong path.
0
 
anyoneisAuthor Commented:
angelIII: I tried:

/*Regarding the sp_refreshview:
if you create a table X ( field1 int, field2 datetime, field3 varchar)
then view Y ( select * from X )
then drop table X, and create new table X ( field3 varchar, field1 int, field2 datetime ) ie reordering the columns.
fill data into table X
*/

SET NOCOUNT ON

DROP TABLE X
GO
CREATE TABLE X ( field1 int, field2 varchar(10), field3 char)
GO
DROP VIEW Y
DROP VIEW Z
GO
CREATE VIEW Y AS SELECT * FROM X
GO
CREATE VIEW Z AS SELECT field1, field2, field3 FROM X
GO
DROP TABLE X
GO
CREATE TABLE X ( field3 varchar(10), field1 int, field2 varchar(10), newfield int  )
GO
INSERT X VALUES('a', 11, 'abc', 99)
INSERT X VALUES('b', 12, 'def', 98)

SELECT '' [Table X], * FROM X
SELECT '' [View Y], * FROM Y
SELECT '' [View Z], * FROM Z
EXEC sp_refreshview 'Y'
EXEC sp_refreshview 'Z'
print '(Did the refresh)'
print ''
SELECT '' [Table X], * FROM X
SELECT '' [View Y], * FROM Y
SELECT '' [View Z], * FROM Z

I got:

Table X field3     field1      field2     newfield    
------- ---------- ----------- ---------- -----------
        a          11          abc        99
        b          12          def        98

View Y field1     field2      field3    
------ ---------- ----------- ----------
       a          11          abc
       b          12          def

View Z field1      field2     field3    
------ ----------- ---------- ----------
       11          abc        a
       12          def        b

(Did the refresh)
 
Table X field3     field1      field2     newfield    
------- ---------- ----------- ---------- -----------
        a          11          abc        99
        b          12          def        98

View Y field3     field1      field2     newfield    
------ ---------- ----------- ---------- -----------
       a          11          abc        99
       b          12          def        98

View Z field1      field2     field3    
------ ----------- ---------- ----------
       11          abc        a
       12          def        b


So, neither view reflects the true contents of table X *before* sp_refreshviews is run. Only the SELECT * view reflects the true contents of table X *after* sp_refreshviews is run.

I will readily concede that if only a subset of columns is wanted, a named subset of columns should be used to define the view . However, if ALL columns are wanted, the evidence is pointing to the superiority of SELECT *.

David

0
 
nmcdermaidCommented:
>> I could say that the code given is obviously designed to copy ALL of the columns from table2 into table1,

In every case, yes, it was originally designed to do that... but in every case I had a good reason to add a column to the source table (i.e. a lineage field or a datestamp) and not the destination table... and therefore cursed the original creator when it broke!! I guess you could say I was extending the design rather than violating it, though perhaps the original developer might have something to say about that!


Actually the place where I cursed most was when it was used to feed a VB App. I needed to add a field to a certain lookup table but when I did it crashed the VB App (due to SELECT * and some poor coding within the App)


Nice little discussion you have going here. Its interesting to see everyones viewpoints.


I guess you could say that SELECT * is OK as long as absolutely EVERYTHING downstream is specifically allowing for the fact that the columns might change.

I have just found in my experience that I've never had the need for that particular functionality, I've had more of a need for an existing application to be changed without it breaking.
0
 
Scott PletcherSenior DBACommented:
>>  If the whole purpose of the view or stored procedure is to retrieve ALL the columns of the table, I know TODAY that tomorrow I will need that as yet unknown column to fulfill the purpose of the view or stored procedure. <<

Now *that's* bad logic!  How do you know that there will be a *business* purpose to have *"all"* columns all of a table *at once* when you don't even know what the columns are?  You *CAN'T*.  It's just not possible.  Especially if you add an IMAGE or TEXT column.  If it isn't needed, you will waste *huge* amounts of resources transferring an image or text column that can't even be displayed without changing the code receiving it.

In my 25+ yrs of experience, I've seen many other people with 30 yrs of experience that didn't seem to grasp some basic concepts; it's very disappointing.  But it helps show why development is moving to India, I guess.
0
 
anyoneisAuthor Commented:
nmcdermaid: >>Nice little discussion you have going here. Its interesting to see everyones viewpoints.<<

I agree, it is interesting! I am usually very leary about doing something that goes against "the lore", partially because I have had it come back to bite me in the past. And also because I like to honor "the principle of least surprise" in my code. But if another approach seems compelling for one reason or another, I like to explore it - I still have to ask "why" when I don't "feel" the reasons for not taking a certain route.

>>Actually the place where I cursed most was when it was used to feed a VB App. I needed to add a field to a certain lookup table but when I did it crashed the VB App (due to SELECT * and some poor coding within the App)<<

Don't you love it when something <should> work and it kicks you in the face? I ran into several adventures of this kind using ADO.NET in my last project - specifically using relations and filters.

>>I have just found in my experience that I've never had the need for that particular functionality, I've had more of a need for an existing application to be changed without it breaking.<<

Perhaps because I am using DataSets, and windows forms databinding, if any table develops an extra column(s) it doesn't threaten my code.

ScottPletcher: >> Especially if you add an IMAGE or TEXT column.  If it isn't needed, you will waste *huge* amounts of resources transferring an image or text column that can't even be displayed without changing the code receiving it.<<

Assuming that the transfer of the large new column was optional, they would have at least two good choices:
1) Recode the stored procedures, views, functions, etc. which referred to that table to use "Select <columnlist>"
2) Create a new table for the new data, and only refer to it when needed.

All other things being equal, I would advise them to go the latter route.

>>In my 25+ yrs of experience, I've seen many other people with 30 yrs of experience that didn't seem to grasp some basic concepts; it's very disappointing.  But it helps show why development is moving to India, I guess.<<

Very little of my 30+ years is SQL related. I cut my teeth on flat files and ISAM, MKAM, ... - all hierarchical databases. Sometimes, I come to problems with a different look than someone who has been living and breathing SQL for many years. And sometimes, a different look is useful.

As for India, I hear that new Computer Science grads there are very concerned about all of their jobs moving ... to Korea! :-)

David
0
 
Anthony PerkinsCommented:
Thanks for the points and hopefully you will come back in 30 years and let us know what you have learned :)
0

Featured Post

Prep for the ITIL® Foundation Certification Exam

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.

  • 8
  • 4
  • 2
  • +4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now