Database replication questions

Hi guys, I'm currently reading the documentation for some Firebird (FB) replication engines. The replication type is two-way, log-based. There are some things I don't fully understand.
1. It is certain that the PKs created by each slave must be unique. For example, the first slave will create PKs starting from "1", the second from "1 000 000 000", etc. What I don't understand is that some replication engines, like FiBRE, say you need unique PKs across all tables. For example, if you create a record with pk = 1 in the table "customers", then the first row in the table "clients" must have pk = 2. Is this really necessary? I don't think it is. Please tell me.
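To make the per-slave PK ranges concrete, here is a minimal sketch; the band width, function name, and replica numbering are made up for illustration and not taken from any particular engine:

```python
# Sketch: per-replica primary-key bands (hypothetical scheme). Each replica
# hands out IDs only from its own band, so inserts made independently on two
# replicas can never collide.

REPLICA_BAND = 1_000_000_000  # width of each replica's PK range

def next_pk(replica_no: int, local_counter: int) -> int:
    """PK = replica offset + local sequence number."""
    return replica_no * REPLICA_BAND + local_counter

# Replica 0 creates PKs 1, 2, 3, ...; replica 1 starts at 1 000 000 001.
print(next_pk(0, 1))  # 1
print(next_pk(1, 1))  # 1000000001
```

In a real Firebird setup the local counter would come from a generator/sequence; the point is only that the bands do not overlap.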

2. The second thing I can't get out of my head is the so-called "update conflict". For example, slave 1 creates a record with values 1 2 3 and replicates it to the server. Slave 2 synchronizes its DB, changes field 2 to 5, and replicates. Then the first slave changes the value and replicates. And BOOM, here comes the problem: if slave 1 runs "update client set name = 'Peter' where name = 'Not Peter'", but the second slave has already changed "Not Peter" to "John", then when the first slave replicates, this update statement will change nothing. Please give me a solution to this problem.
pcelba Commented:
Your scenario is possible and it will always cause similar questions...

You may solve it e.g. by timestamps. If you refuse updates that have older timestamps than the existing records, then all records will synchronize. Time must be synchronized on all computers, which is not a problem today.
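A minimal sketch of that timestamp rule, using SQLite in memory; the table and column names are illustrative only:

```python
import sqlite3

# Sketch of timestamp-based conflict resolution: refuse replicated updates
# whose timestamp is older than the stored row's timestamp.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE client (pk INTEGER PRIMARY KEY, name TEXT, ts INTEGER)")
con.execute("INSERT INTO client VALUES (1, 'Not Peter', 100)")

def apply_replicated_update(pk, name, ts):
    # The WHERE clause silently drops updates that are older than the row.
    cur = con.execute(
        "UPDATE client SET name = ?, ts = ? WHERE pk = ? AND ts < ?",
        (name, ts, pk, ts))
    return cur.rowcount  # 1 = applied, 0 = refused as stale

apply_replicated_update(1, 'John', 200)   # newer -> applied
apply_replicated_update(1, 'Peter', 150)  # older than 200 -> refused
print(con.execute("SELECT name FROM client").fetchone()[0])  # John
```

Note that the stale update is not an error; it is simply ignored, which is exactly the "refuse older timestamps" behaviour described above.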

Another solution is based on the timing of your update operations. If you follow the rule "Process incoming data first, then create the outgoing batch", all data will be synchronized, but users who were simultaneously updating the same records on different client sites may not see their own changes afterwards, because those records will be overwritten (synchronized) by the contents incoming from the server.
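The ordering rule can be sketched like this; the queue structures and field names are hypothetical stand-ins for the real replication buffers:

```python
# Sketch of "process incoming data first, then create outgoing batch".
# 'db' is a dict of pk -> row; 'incoming' is what the server sent us;
# 'pending_local' holds rows we edited locally since the last sync.

def sync_cycle(db, incoming, pending_local):
    # 1. Apply everything the server sent us first; server data wins here.
    for row in incoming:
        db[row["pk"]] = row
    # 2. Only then package our local changes for the server, skipping rows
    #    that the incoming batch just overwrote.
    outgoing = [r for r in pending_local if db.get(r["pk"]) is r]
    return outgoing

local_row = {"pk": 1, "name": "edited locally"}
untouched = {"pk": 2, "name": "also edited locally"}
db = {1: local_row, 2: untouched}
incoming = [{"pk": 1, "name": "changed on server"}]
out = sync_cycle(db, incoming, [local_row, untouched])
print([r["pk"] for r in out])  # [2] -- only pk 2 is still ours to send
```

This is why some simultaneous editors lose their change: row 1's local edit never makes it into the outgoing batch.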

The problem can be reduced somewhat if you don't synchronize whole records but only single columns. That, of course, means additional coding.

Deletion only looks unproblematic at first sight. You should avoid deleting records and replace deletion with record expiration (meaning a new column containing an ExpirationDate). That way you can be sure you never delete records whose keys are still used as foreign keys somewhere else.
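A minimal sketch of the "expire instead of delete" idea, again with an illustrative schema:

```python
import sqlite3, time

# Sketch: rows get an expiration_date column and queries filter on it, so a
# "deleted" row (and its PK) stays available for foreign-key integrity.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE customers (
    pk INTEGER PRIMARY KEY, name TEXT, expiration_date INTEGER)""")
con.execute("INSERT INTO customers VALUES (1, 'Peter', NULL)")
con.execute("INSERT INTO customers VALUES (2, 'John', NULL)")

now = int(time.time())
# "Delete" Peter by expiring him instead of removing the row.
con.execute("UPDATE customers SET expiration_date = ? WHERE pk = 1", (now,))

live = con.execute(
    "SELECT name FROM customers WHERE expiration_date IS NULL"
    " OR expiration_date > ?", (now,)).fetchall()
print(live)  # [('John',)] -- Peter's row still exists but is filtered out
```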

Every replication setup is specific in some way, and you'll see many additional problems once the first real data arrives.
1. This is really not necessary, but I know one application that requires it... It uses the PKs from all tables as PKs in a common DeleteLog. Easy to fix, but it still works this way. On the other hand, sometimes the PKs are generated by one global function that does not recognize table names, and then they are unique across the whole application.

2. Update conflicts are inherent in systems that synchronize data in both directions. The simple rule says: the last one wins. The replication obviously does not look at the old data, and it does not send SQL commands; it sends and applies whole rows based on PK and timestamp. Of course, if you use a replication system that replays SQL commands, it will behave exactly as you described. Such a system is dangerous because it can easily cause data-integrity problems.
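A sketch of that row-based "last one wins" application step, as an upsert guarded by the timestamp (requires SQLite 3.24+ for ON CONFLICT; all names are illustrative):

```python
import sqlite3

# Sketch: the replica ships whole rows with a timestamp, never SQL commands,
# and an upsert keeps whichever row is newest.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE client (pk INTEGER PRIMARY KEY, name TEXT, ts INTEGER)")

def apply_row(pk, name, ts):
    con.execute("""
        INSERT INTO client (pk, name, ts) VALUES (?, ?, ?)
        ON CONFLICT(pk) DO UPDATE SET name = excluded.name, ts = excluded.ts
        WHERE excluded.ts > client.ts""", (pk, name, ts))

apply_row(1, 'Not Peter', 100)  # initial insert from slave 1
apply_row(1, 'John', 300)       # slave 2's later edit wins
apply_row(1, 'Peter', 200)      # slave 1's older edit is discarded
print(con.execute("SELECT name, ts FROM client").fetchone())  # ('John', 300)
```

Because whole rows are shipped, the "WHERE name = 'Not Peter'" problem from the question never arises: matching is done on the PK, not on old column values.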

IncognitoManAuthor Commented:
Thank you pcelba,
So you're telling me "the last one wins". That sounds like a good solution.
Let's see if I've got it right.
First you create a table SYSLOG {pk, PCnumber, time, query, pk_of_query, data of BLOB type}, and another table SYSTRANS that holds the records that have already been replicated.
After that you create triggers on all the tables, which insert rows into the SYSLOG table.
Then you write a program that takes from the server all the records that were inserted, updated, or deleted under a SYSLOG.PCnumber different from yours and that don't yet exist in your SYSTRANS table, sends your own records to the server, and then puts their PKs into the SYSTRANS table.
The server will also have those two tables, but its SYSTRANS table will store the computer's ID together with the PK of each replicated record, so the server can remember which record was sent to which computer. It then inserts a record into the server's SYSLOG and SYSTRANS and executes the query.
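A minimal sketch of that SYSLOG/SYSTRANS layout with triggers, in SQLite; all names follow the proposal above, and the hard-coded PCnumber is an assumption (a real setup would read it from a config table):

```python
import sqlite3

# Sketch of the proposed change-log scheme: every change to a user table
# leaves a row in SYSLOG; SYSTRANS records which SYSLOG rows were replicated.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (pk INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SYSLOG (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    PCnumber INTEGER, time TEXT, query TEXT, pk_of_query INTEGER);
CREATE TABLE SYSTRANS (syslog_pk INTEGER);  -- SYSLOG PKs already replicated

CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO SYSLOG (PCnumber, time, query, pk_of_query)
    VALUES (1, datetime('now'), 'INSERT', NEW.pk);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO SYSLOG (PCnumber, time, query, pk_of_query)
    VALUES (1, datetime('now'), 'UPDATE', NEW.pk);
END;
""")
con.execute("INSERT INTO customers VALUES (10, 'Peter')")
con.execute("UPDATE customers SET name = 'John' WHERE pk = 10")

# Rows still to send: logged locally and not yet marked in SYSTRANS.
todo = con.execute("""
    SELECT query, pk_of_query FROM SYSLOG
    WHERE pk NOT IN (SELECT syslog_pk FROM SYSTRANS)
    ORDER BY pk""").fetchall()
print(todo)  # [('INSERT', 10), ('UPDATE', 10)]
```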

But here comes the funny part.
If the operation is an update, client 1 takes the records from the server and sees they were updated by client 2, even though the record was originally created by client 1. In that case it won't take the record, so the server ends up with client 1's version while client 1 has client 2's version :) which is stupid. The same handling would be needed for delete statements, though maybe not, because there you would just try to delete a record that no longer exists.

IncognitoManAuthor Commented:
Please explain this to me: "Process incoming data first, then create the outgoing batch".
Well, this timestamp method seems to be the best, because in my solution the "winner" is not the last one who updated the record, but the last one who replicated.
If you have more solutions, please share them with me. :)
And if you know of more problems I will face during DB replication, tell me about them.
IncognitoManAuthor Commented:
Interesting ideas for people who understand things and don't like the existing solutions. Thanks pcelba.

pcelba Commented:
"Process incoming data first, then create the outgoing batch" is all about batch synchronization. If you decide to process the sync data on the server each evening and then create the outgoing data for the clients, which they process in the morning before work starts, everything should be OK. This loop can of course run on a shorter cycle; it depends on the network topology and its possibilities. Sometimes there are several client levels, or clients are offline during the day, etc.

We have implemented another layer on top of data synchronization: data filtering. Each client is allowed to see just a subset of the server data, based on various conditions and data properties. For example, customers are split among different territories and each client can see just a few of them. This reduces the risk of data loss (or fraud), and it also reduces the amount of transferred data. The code behind it depends on the implementation.
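A minimal sketch of that territory-based filtering when building a client's outgoing batch; the territory assignments and function name are made up for the example:

```python
import sqlite3

# Sketch: each client only ever receives rows from its own territories.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customers (pk INTEGER PRIMARY KEY, name TEXT, territory TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, 'Peter', 'NORTH'), (2, 'John', 'SOUTH'), (3, 'Anna', 'NORTH')])

def outgoing_batch_for(territories):
    """Build the replication batch restricted to the client's territories."""
    marks = ",".join("?" * len(territories))
    return con.execute(
        f"SELECT name FROM customers WHERE territory IN ({marks}) ORDER BY pk",
        territories).fetchall()

print(outgoing_batch_for(['NORTH']))  # [('Peter',), ('Anna',)]
```

Besides limiting exposure, this also shrinks every batch, which matters on slow links.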

An additional option should be "Sync ALL data". It is useful when a new client is installed or connected to the network.