Change identity column's data type from int to bigint in merge replication.

hi,

right now we have some concern on identity column's data type and we would like to change the type from int to bigint, as from replication point of view we want to store more on an identity column so when replicating the identity column is not going to conflict with each other's update from different host on merge replication.

sth like this:

CREATE TABLE [dbo].[header_transaction_fee](
      [vms_header_transaction_fee_id] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

Open in new window


but if the data type change from int to bigint, if we add an index later on for this bigint column, the performance of building the index will hurt and performance will also impacted as index only good on small int, it should be the clustered index has problem on this but not non clustered index, am I right?

http://stackoverflow.com/questions/2124631/sql-server-int-or-bigint-database-table-ids

as we are replicating this kind of table to 3 servers and need to create an acceptable range on each server. But someone propose that if we use int, we have a limit of 2billion, which is roughly 650 million range per server. In most cases this will be fine for 10 years possible, some tables it may only be good for 2 years. I guess it comes down to what is being loaded into the table and how often.

but:

1) buy why extend that identity to bigint related to how many years the table can store data? just because of the range of merge replication on that table ? so in the case of merge replication, how can we deal with  this kind of range not enough problem for a table keep have data insert and data expected to store in DB for a long time, e.g. 7 years?

2) if we can only enlarge the range of identity, any way to solve this kind of problem except keep increasing the range? any permanent solution for that ?

3) if the type is bigint anyway, really has a performance impact on creating index on that column ? or we usually don't create an index on identity column? and if we create we don't create clustered index on it anyway but non clustered index, this is the way it should be?
LVL 1
marrowyungSenior Technical architecture (Data)Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jacques Bourgeois (James Burger)PresidentCommented:
You might want to explore the notion of a uniqueidentifier. It has been designed expressly for the type of questions that you ask.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
lcohanDatabase AnalystCommented:
Sorry to say however due to the "implicit" data-type it "hides" - varchar(36) it will be a disaster for Clustered (and non clustered indexes), PKeys/Fkeys, ETC in particular on replicated (where merge is the worst) database scenario.


I think the problem would be to deal with the alter table command because of that IDENTITY property column... is this a production already used system?

As far as indexes build/rebuild and performance due to 4 to 8 bytes increase storage from int to bigint this is by far better than GUID and the best you can get so far. Add NONCLUSTERED PKeys and use NONCLUSTERED indexes for these type of IDENTITY columns as most likely it is not relevant (useful) for most of the queries if the order of rows in the table is matching your IDENTITY column.
1
marrowyungSenior Technical architecture (Data)Author Commented:
Jacques,

that one is a 16-byte GUID, clustered index can't define on it as it will make the whole thing even slower.

but why is it ?

from the link: https://msdn.microsoft.com/en-us/library/ms187942(v=sql.100).aspx, it said:

"Merge replication and transactional replication with updating subscriptions use uniqueidentifier columns to guarantee that rows are uniquely identified across multiple copies of the table."

I think the replication will use the  table 's identity column to handle this ?

so what the table structure will be? this :

CREATE TABLE [dbo].[header_transaction_fee](
      [vms_header_transaction_fee_id] [bigint] uniqueidentifier (1,1) NOT FOR REPLICATION NOT NULL,

Open in new window


 ?
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

marrowyungSenior Technical architecture (Data)Author Commented:
lcohan,

"I think the problem would be to deal with the alter table command because of that IDENTITY property column... is this a production already used system?"

yes it is and one of our developer see we have this:

CREATE TABLE [EDOCU].[vms_header_transaction_fee](
      [vms_header_transaction_fee_id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

Open in new window


and he suggest the identity to be bigint.

"varchar(36) "

that's for bigint ?

"alter table command "

why worry about that? The alter can make data lose ?

"As far as indexes build/rebuild and performance due to 4 to 8 bytes increase storage from int to bigint this is by far better than GUID"

so you mean using bigint as the typs is far better than GUID, which is uniqueidentifier on performance ?

"or most of the queries if the order of rows in the table is matching your IDENTITY column. "

you mean if the identity column if change from int to bigint, if nonclustered index is defined on that identity column, the performance doesn't help much?

and defining clustered index on identity column in this case can be faster ?
0
marrowyungSenior Technical architecture (Data)Author Commented:
Jacques,

any update for me ?
0
Jacques Bourgeois (James Burger)PresidentCommented:
Sorry, I have already gave my input. A uniqueidentifier is the tool that was created for that purpose. This is what Microsoft recommends and uses by default. They should know, don't you think?
0
marrowyungSenior Technical architecture (Data)Author Commented:
tks for much, '

but what I am focusing is that:
"
that one is a 16-byte GUID, clustered index can't define on it as it will make the whole thing even slower.

 but why is it ?
"

tks anyway.
0
Jacques Bourgeois (James Burger)PresidentCommented:
I think the replication will use the  table 's identity column to handle this

You are wrong about that one... unless you define your column as a unique identifier. Otherwise, SQL Server will add a column with a ROWID. Look in the reference I sent you in one of my previous posts: merge replication adds the column rowguid to every table, unless the table already has a column of data type uniqueidentifier with the ROWGUIDCOL property set (in which case this column is used)

A GUID is the only way to insure with a sufficient probability that you will not get the same value from 2 different sources/databases.
0
marrowyungSenior Technical architecture (Data)Author Commented:
"A GUID is the only way to insure with a sufficient probability that you will not get the same value from 2 different sources/databases. "

ok, what I saw is lcohan said change from int to bigint is better than GUID.

tks anyway.
0
Jacques Bourgeois (James Burger)PresidentCommented:
The programmers who designed SQL Server know what they are doing. They would not use a uniqueidentifier for replication if it was not good for performance.

Icohan sees the GUID as as a varchar(36), because this is the way it is shown to us in order to be readable. But internally, it is a sequence of hexadecimal numbers. It is thus treated as a numeric as far as indexing is concerned, not as a string.

And if you really want an answer that you can be confident with, do as I would do: test both ways on a test database.
0
marrowyungSenior Technical architecture (Data)Author Commented:
"And if you really want an answer that you can be confident with, do as I would do: test both ways on a test database. "

yeah we will.


happy new year sir.

 I will read more on your suggestion. sometimes programmer may not know what they are doing, that's why they come and ask.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.