[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

performance issue

Posted on 2012-09-06
17
Medium Priority
?
383 Views
Last Modified: 2012-09-18
I have a table whose the primary keys are set to varchar data type column "fullname" and "version". The entire table doesn't have ID like int...etc column. I think this is a bad design and I should create a ID column as PK.

I want to know if this will cause a future performance issue if using varchar as PK?

My second question is

I have another table doesn't have any index and PK key set. I think this is also a bad design. I want to know if this is rule - every table should have PK and every table should have an index to be a good design?
0
Comment
Question by:wasabi3689
  • 6
  • 4
  • 3
  • +3
17 Comments
 
LVL 70

Accepted Solution

by:
Scott Pletcher earned 556 total points
ID: 38374184
A varchar clustered index and/or PK is NOT necessarily wrong or a "bad design".  In fact, that is often the best choice, if that is the data's true key or that is how data is (almost) always queried.

If you were storing phone book information in a table, isn't it best to cluster it by name!?

The idea that every table should be clustered by an int/number IS 100% WRONG!

In SQL Server, yes, (almost) every table should have a clustered index (not necessarily a PK).  SQL is poor at managing the space in a heap (a table w/o a clus index).
0
 

Author Comment

by:wasabi3689
ID: 38374254
The table whose PK are join key for fullname and version. fullname+version =PK. I think because the fullname cannot be PK individually, and has to combine with version to be PK. Do think this is not a good design. How about create an auto number for the table for PK?
0
 

Author Comment

by:wasabi3689
ID: 38374260
also, cluster index must be unique key? If I set the cluster index to full name, that may not be good?
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 556 total points
ID: 38374280
SQL will make the internal value unique itself if necessary; you can still create a clustered index on the column(s) you need.  For example:

CREATE TABLE dbo.test1 ( c1 varchar(30) )
CREATE CLUSTERED INDEX test1__CL ON dbo.test1 ( c1 )
EXEC sp_helpindex 'dbo.test1'
INSERT INTO dbo.test1 VALUES('a')
INSERT INTO dbo.test1 VALUES('a')
INSERT INTO dbo.test1 VALUES('a')
SELECT * FROM dbo.test1


>> How about create an auto number for the table for PK? <<

I've already stated that could be a very bad idea.

You've fallen under the spell of people who can only accept a super-simple, every-table-the-same key, no matter what the genuine data and business requirements are.
0
 
LVL 8

Assisted Solution

by:databoks
databoks earned 92 total points
ID: 38374338
Well as I look at the columns "FullName" and "Versions" it would appear to be that there could be a chance that two persons has the same name and maybe running the same version.
But that couldn't be the case in yours - what if another developer will maintain your database?

Good Database design is one of the most important things when they are used in software. Even column names can have a huge bad impact for your software.

I believe the best solution is to use an integer value which auto increases as rows are inserted.

I have created, maintained, created complex databases and I have never ever had the requirement for using another data type other than an integer for primary key values.

Also the Index is not really necessary. Indexing uses a lot of space in your database. Because it Indexes this for faster searched. This will eventually fill up more space in your database than the rows itself - this is of course bad design.

You should only have indexes when you have thousands of rows or advanced Queries.

Today every company has a server with fast CPU with good memory and fast Hard drives. Having this would allow you have faster searches.  


I would recommend you to read these links regarding Indexing in SQL server:
http://odetocode.com/articles/70.aspx

It describes, discusses benefits and disadvantage using Indexes and gives examples on WHEN to use them.

The link also included Store procedures that can help you see how much your indexing uses of space. This is quite interesting.
0
 
LVL 13

Assisted Solution

by:alpmoon
alpmoon earned 92 total points
ID: 38374350
You should consider expected table size as well. If you don't expect more than a few pages of data (20-30K), a table without an index would be fine because scanning such a table don't take more than miliseconds. However, a unique index also prevent duplicate rows and it is  still a good idea in most cases.

I agree that fullname+version as PK is definitely OK. However, having an index on varchar column may not be a good idea if varchar column has very long data in a huge table.
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 556 total points
ID: 38374375
>> I have created, maintained, created complex databases and I have never ever had the requirement for using another data type other than an integer for primary key values. <<

LOL.  Just because you refuse to see something doesn't mean it doesn't exist.

EVERY SQL database I've EVER tuned, I've had to undo the dopey, one-size-fits-all-NO-THINKING-REQUIRED-int-clus-index to get decent performance, by changing the clustered key to what it should have been in the first place.  

Typically 60-70%(!) of tuning for pre-existing dbs is just doing that because people don't understand the VITAL importance of the CORRECT clustering key(s).

Yes, they have a simple (SUPER simple) clustering key, then to get even decent performance they have to add covering indexes for almost every major query, using huge amounts of disk space while drastically slowing down ALL table INSERTs, UPDATEs and DELETEs.


When you create the table, determine the proper clustered index and create it.  Even for a couple of HUNDRED rows, it can make a HUGE difference in join/lookup performance.
0
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 368 total points
ID: 38374419
This is a well known debate, if the table is small,i wouldn't bother changing the primary key. If the table is huge and have other non clustered indexes, it will add some overhead while inserting the data also the index size will be huge. Remember,each Non Clustered Index key, must include the entire clusterung key either explicitly in the non clustered definition or implicitly (sql server adds uniquifier) , the wider the Clustering Key the wider the NonClustering key.
as far as an identity column is concerned
- it is unique (no unncessary time wasted for uniqufying)
-it makes all the NCIs narrow
-its ever increasing, which reduces the fragmentation
- and is fixed width (if you choose a varchar column, its not fixed width, sql will add more bytes to the Clustered Key )

So the decision is yours, for smaller tables I wont change them, but for larger tables by creating a composite key on varchar columns, will waste a lot of disk space .
0
 
LVL 25

Assisted Solution

by:lwadwell
lwadwell earned 92 total points
ID: 38374915
My $0.02 worth.
A primary key index is also a unique index.  So I am guessing that the current PK on the varchar columns is enforcing uniqueness in the table - there must be a reason for that.  If you did put an arbitrary identity column in as the PK ... purely to have an integer based PK index ... you would need to add a unique index on the original varchar columns to continue to enforce uniqueness on those values also.
I sit firmly in the @ScottPletcher camp on this.  If the data, as defined and understood, has a naturally occurring column or columns that define a unique instance of the data ... then you have a primary key IMHO.
That is not to say that identity columns shouldn't be used ... they have their role too.  If you were building a billing system for example and you wanted a unique number for each invoice generated or payment receipt ... these are example of where I would use an identity column ... and coincidentally, the value actually has meaning in the context of the data besides 'being unique'.
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 556 total points
ID: 38376484
Quite true.  Nothing wrong an identity column per se, but it should not automatically be the key to a table.  (Identity also causes some serialization of INSERTs -- if multiple INSERTs are running at once, each has to get different ascending numbers -- which people tend to forget.)

Not all tables will have non-clus index(es).  If there are none, the relative size of the clus key is irrelevant regarding that.


>> it is unique (no unncessary time wasted for uniqufying) <<
It's not wasted, it's needed for SQL *INTERNALLY*.  It has NOTHING to do with the data itself.  Therefore it's IRRELEVANT to the proper key from a BUSINESS perspective.



Let's take a simple, common example.  I'm creating a table of state codes to verify, lookup and translate state abbrevs (AL=Alabama,FL=Florida,etc.).

So you're telling me I should create an identity column and make that the *clustered* index on that table??  To me that's insane.  The clus key should be the state code, and in this case it will be unique on its own.
0
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 368 total points
ID: 38376713
>Let's take a simple, common example.  I'm creating a table of state codes to verify, lookup and translate state abbrevs ,So you're telling me I should create an identity column and make that the *clustered* index on that table??
it still depends, if this table is specific to US/ Canada, your ci MUST be on the state abbrev. but if this is specific to all the countries then , you have to think about it.  say the table countain another column to store the country name which is a varchar(100) column, in this case, I prefer an identity column, I dont want something around 104 bytes included for every non clustered index key.  but instead of the varchar column, if it was just a country id (integer), you dont need an identity column, you can use (countryId + provinceCode) which will be 6 bytes. So it all depends;
Here in this particular section, the pk was created on two varchar columns, we dont really have the table structure, but assume, full name is varchar(124) and version is varchar(10), so each index key here will be 124 + 10 + ( i think  2bytes for each variable columns  = 4 ) = 138 bytes. if it is an integer column, it will be 4 bytes. so basically you are wasting 134 bytes per index key. ( i dont consider the uniqueuifier here because, he mentioned it is the primary key) .this 134 is per index, say you have 10 indexes and 1 M  rows., how  mush you can save here ?
0
 
LVL 25

Expert Comment

by:lwadwell
ID: 38376770
I would use the ISO standard country code (2 char or 3 char) abbreviations for the country code.
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 556 total points
ID: 38376914
>> but assume, full name is varchar(124) and version is varchar(10), so each index key here will be 124 + 10 + ( i think  2bytes for each variable columns  = 4 ) = 138 bytes. if it is an integer column, it will be 4 bytes. so basically you are wasting 134 bytes per index key. ( i dont consider the uniqueuifier here because, he mentioned it is the primary key) .this 134 is per index, say you have 10 indexes and 1 M  rows., how  mush you can save here ? <<

Yes, the upper levels of the index will take more disk space using varchar.  But that's not the critical point.  The critical factor is how the table is used.  

A little disk space is nothing compared to the performance issues of the wrong clustered index key(s)!
0
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 368 total points
ID: 38381593
>Yes, the upper levels of the index will take more disk space using varchar.
No, upper levels wont contain the data, Level 0 has the data..

>A little disk space is nothing compared to the performance issues of the wrong clustered index key(s)!
still everything has to be loaded in to the memory, so, will take more buffer space..

As I mentioned earlier, this has been an ongoing debate since SQL 7. these are my opinions
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 556 total points
ID: 38383117
>>
  >Yes, the upper levels of the index will take more disk space using varchar.
  No, upper levels wont contain the data, Level 0 has the data..
<<

The upper levels of EVERY index contain all the key columns of the index, of whatever data type those columns happen to be.
0
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 368 total points
ID: 38385007
Here is a test script which i created, which gives you some idea of the space wastage (check the space for the index )


create table test1 ( i int identity constraint pk_test primary key, name varchar(64) not null , dob date not null, address varchar (128) not null  )
GO
create table test2 (name varchar(64) not null , dob date not null, address varchar (128) not null )
alter table test2 add constraint pf_Composite primary key clustered (name,dob,address)
GO


set nocount on
declare @dat date = '20100101', @i int = 0 --41
WHILE @i < 100000
BEGIN
      SELECT @dat = DATEADD(d,1, @dat), @i+= 1
      insert into test1 values (REPLICATE('a',64) ,@dat, REPLICATE('b',128) )
      insert into test2 values (REPLICATE('a',64) ,@dat, REPLICATE('b',128) )
END

exec sp_spaceused 'test1'
exec sp_spaceused 'test2'

--create another index

create index idx_nc_test1_dob on test1 (dob)
create index idx_nc_test2_dob on test2 (dob)

exec sp_spaceused 'test1'
exec sp_spaceused 'test2'

--create another index
create index idx_nc_test1_address on test1 (address)
create index idx_nc_test2_address on test2 (address)


exec sp_spaceused 'test1'
exec sp_spaceused 'test2'



drop table test1
drop table test2
0
 

Author Closing Comment

by:wasabi3689
ID: 38412257
All these comments are very beneficial.
0

Featured Post

Fill in the form and get your FREE NFR key NOW!

Veeam is happy to provide a FREE NFR server license to certified engineers, trainers, and bloggers.  It allows for the non‑production use of Veeam Agent for Microsoft Windows. This license is valid for five workstations and two servers.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video. If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, …
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question