Database replication questions

Posted on 2009-12-19
Last Modified: 2013-12-09
Hi guys, right now I am reading the documentation of some Firebird (FB) replication engines. The replication type is two-way, log-based. There are some things I don't fully understand.
1. It is certain that the PKs created by each slave must be unique. For example, the first slave creates PKs starting from "1", the second starts at "1 000 000 000", etc. What I don't understand is that some replication engines, like FiBRE, say you need unique PKs across all the tables. For example, after you create a record with pk = 1 in table "customers", the first row in table "clients" must get pk = 2. Is this really necessary? I think it's not. Please tell me.
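
For illustration, a per-slave PK range in Firebird can be set up by giving each slave's generator a different starting offset; the generator, table and column names below are only examples:

    CREATE GENERATOR GEN_CUSTOMERS_ID;

    /* On slave 1: PKs will be 1, 2, 3, ... */
    SET GENERATOR GEN_CUSTOMERS_ID TO 0;
    /* On slave 2: PKs will be 1000000001, 1000000002, ... */
    SET GENERATOR GEN_CUSTOMERS_ID TO 1000000000;

    SET TERM ^ ;
    CREATE TRIGGER BI_CUSTOMERS FOR CUSTOMERS
    ACTIVE BEFORE INSERT POSITION 0
    AS
    BEGIN
      IF (NEW.PK IS NULL) THEN
        NEW.PK = GEN_ID(GEN_CUSTOMERS_ID, 1);
    END^
    SET TERM ; ^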

2. The second thing I can't get out of my head is the so-called "update conflict". For example, slave 1 creates a record with values 1 2 3 and replicates it to the server. Slave 2 synchronises its DB, changes field 2 to 5 and replicates. Then the first slave changes the value and replicates. And BOOM, here comes the problem: if slave 1 runs the query "update client set name = 'Peter' where name = 'Not Peter'" but the second slave has already changed the value "Not Peter" to "John", then when the first one replicates, this update statement will change nothing. Please give me a solution to this problem.
Question by: IncognitoMan
    LVL 41

    Expert Comment

    1. This is really not necessary, but I know one application which requires it... It uses the PKs from all tables as the PK in a common DeleteLog. Easy to fix, but it still works this way. On the other hand, sometimes the PKs are generated by one global function which does not recognize table names, so they are unique across the whole application.
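
    A sketch of that "one global function" variant in Firebird: a single generator shared by the BEFORE INSERT triggers of every table, so PK values never repeat anywhere in the database (all names below are only examples):

        CREATE GENERATOR GEN_GLOBAL_ID;

        SET TERM ^ ;
        CREATE TRIGGER BI_CUSTOMERS FOR CUSTOMERS
        ACTIVE BEFORE INSERT POSITION 0
        AS
        BEGIN
          IF (NEW.PK IS NULL) THEN
            NEW.PK = GEN_ID(GEN_GLOBAL_ID, 1);
        END^

        CREATE TRIGGER BI_CLIENTS FOR CLIENTS
        ACTIVE BEFORE INSERT POSITION 0
        AS
        BEGIN
          IF (NEW.PK IS NULL) THEN
            /* same generator as CUSTOMERS => unique across all tables */
            NEW.PK = GEN_ID(GEN_GLOBAL_ID, 1);
        END^
        SET TERM ; ^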

    2. Update conflicts are inherent in systems which synchronize data in both directions. A simple rule says: the last one is the winner. The replication obviously does not look at old data and it does not send SQL commands; it sends and applies the whole row based on PK and timestamp. Of course, if you use a replication system based on replaying SQL commands, then it will behave as you described. Such a system is dangerous because it can easily cause data integrity problems.
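
    As an illustration of "sends and applies the whole row based on PK", the receiving side in Firebird 2.1+ could use UPDATE OR INSERT; the table and column names here are assumptions:

        /* Apply one replicated row - whatever arrives last simply overwrites the row. */
        UPDATE OR INSERT INTO CLIENTS (PK, NAME, LAST_MODIFIED)
        VALUES (:PK, :NAME, :LAST_MODIFIED)
        MATCHING (PK);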


    Author Comment

    Thank you pcelba,
    so you tell me "the last one is the winner". This sounds like a good solution.
    Let's see if I get it right.
    First you create a table SYSLOG {pk, PCnumber, time, query, pk_of_query, data of BLOB type}, and another table SYSTRANS which will hold the records that have already been replicated.
    After that you create triggers on all the tables that will insert data into the SYSLOG table.
    Then you create a program that takes from the server all the records that have been inserted, deleted or updated under a different SYSLOG.PCnumber than yours and that do not yet exist in your SYSTRANS table, sends your own changes to the server, and then puts the PKs of the transferred records into the SYSTRANS table.
    On the server there will also be those two tables, but the server's SYSTRANS table will hold the ID of the computer and the PK of the record that has been replicated, so that the server can remember which record was sent to which computer. The server then inserts a record into its own SYSLOG and SYSTRANS and executes the query.
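
    Roughly sketched in Firebird DDL, the SYSLOG/SYSTRANS structure and one of the logging triggers described above could look like this (column types, names and the trigger body are assumptions based on the description):

        CREATE GENERATOR GEN_SYSLOG_ID;

        CREATE TABLE SYSLOG (
          PK          BIGINT NOT NULL PRIMARY KEY,
          PCNUMBER    INTEGER,              /* which computer made the change */
          CHANGE_TIME TIMESTAMP,
          QUERY       VARCHAR(16),          /* 'INSERT' / 'UPDATE' / 'DELETE' */
          PK_OF_QUERY BIGINT,               /* PK of the changed row */
          DATA        BLOB SUB_TYPE TEXT    /* serialized row contents */
        );

        CREATE TABLE SYSTRANS (
          SYSLOG_PK BIGINT NOT NULL,        /* log record that was already replicated */
          PCNUMBER  INTEGER                 /* on the server: which client received it */
        );

        SET TERM ^ ;
        CREATE TRIGGER AU_CUSTOMERS FOR CUSTOMERS
        ACTIVE AFTER UPDATE POSITION 0
        AS
        BEGIN
          /* 1 = this computer's number; in practice it would come from a config table */
          INSERT INTO SYSLOG (PK, PCNUMBER, CHANGE_TIME, QUERY, PK_OF_QUERY, DATA)
          VALUES (GEN_ID(GEN_SYSLOG_ID, 1), 1, CURRENT_TIMESTAMP, 'UPDATE', NEW.PK, NEW.NAME);
        END^
        SET TERM ; ^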

    But here comes the funny part.
    If the operation is an update: client 1 takes the records from the server and sees that they have been updated by another client (client 2), but the record was originally created by client 1. In that case it will not take the record, because the server will hold client 1's record but client 1 will hold client 2's record :) which is stupid. Something similar needs to be handled for the delete statement, but maybe not, because you would just try to delete a non-existent record.
    LVL 41

    Accepted Solution

    Your scenario is possible and it will always cause similar questions...

    You may solve it e.g. by timestamps. If you refuse updates having older timestamps than the existing records, then all records will synchronize. Time must be synchronized on all computers, which is not a problem today.
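
    A sketch of that "refuse older timestamps" rule at apply time, assuming every replicated table carries a LAST_MODIFIED column:

        /* Apply an incoming update only if it is newer than the row we already have. */
        UPDATE CLIENTS
        SET    NAME = :NEW_NAME,
               LAST_MODIFIED = :INCOMING_TS
        WHERE  PK = :PK
          AND  LAST_MODIFIED < :INCOMING_TS;  /* older incoming changes are silently skipped */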

    Another solution is based on the timing of your update operations. If you decide to "process incoming data first, then create the outgoing batch", all data will be synchronized, but users who are simultaneously updating the same records at different client sites will not always see their own changes later, because those changes will be overwritten (synchronized) by the contents incoming from the server.

    The problem can be somewhat reduced if you synchronize single columns only instead of whole records. That, of course, means additional coding.

    Deletion only looks unproblematic at first sight. You should reduce record deletion and replace it with record expiration (meaning a new column containing an ExpirationDate). That way you can be sure you don't delete records whose keys are used as foreign keys "somewhere else".
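
    A minimal sketch of that expiration approach (the column and view names are assumptions):

        ALTER TABLE CLIENTS ADD EXPIRATION_DATE TIMESTAMP;

        /* "Delete" a client by expiring it; the row stays available for replication and FKs. */
        UPDATE CLIENTS SET EXPIRATION_DATE = CURRENT_TIMESTAMP WHERE PK = :PK;

        /* Applications read through a view that hides expired rows. */
        CREATE VIEW ACTIVE_CLIENTS AS
          SELECT * FROM CLIENTS WHERE EXPIRATION_DATE IS NULL;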

    Each replication setup is somewhat specific, and you'll see many additional problems as soon as the first data flows.

    Author Comment

    Please explain this "process incoming data first, then create the outgoing batch" to me.
    Well, it seems this timestamp method is the best, because in my solution the "winner" would not be the one who last updated the record, but the one who last ran a replication.
    If you have more solutions, please share them with me. :)
    If you know of more problems I will face during DB replication, tell me about them.

    Author Closing Comment

    Interesting ideas for people who understand the subject and don't like the existing solutions. Thanks, pcelba.
    LVL 41

    Expert Comment

    "Process incoming data first, then create the outgoing batch" - this is all about batch synchronization. If you decide to process the sync data on the server each evening and then create outgoing data for the clients, which they process in the morning before work starts, everything should be OK. This loop can of course run with a shorter period; it depends on the network topology and possibilities. Sometimes there are more client levels, or clients are off-line during the day, etc.

    We have implemented another layer on top of data synchronization - data filtering. Each client has rights to see just a subset of the server data, based on various conditions and data properties. E.g. customers are split among different territories and each client can see just a few of them. This reduces the risk of data loss (or fraud) and it also reduces the amount of transferred data. The code behind it depends on the implementation.
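
    A sketch of such a territory filter when building the outgoing batch for one client (CLIENT_TERRITORIES, TERRITORY_ID and the parameter name are assumptions):

        /* Only customers from the territories assigned to the requesting client are sent. */
        SELECT C.*
        FROM   CUSTOMERS C
        JOIN   CLIENT_TERRITORIES CT ON CT.TERRITORY_ID = C.TERRITORY_ID
        WHERE  CT.PCNUMBER = :REQUESTING_CLIENT;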

    An additional option should be "Sync ALL data". It is useful when a new client is installed or connected to the network.
