IncognitoMan asked:

Tutorial or lessons on how to write a replication engine for Firebird

Hi guys,
I have a database and want to write my own replication engine with a log database or tables. I need the technology and algorithms explained somewhere.
Now please don't just google it and give me the first site you find, as I've already done that.
I tried FiBRE, but in order to install it I would need to recreate my tables, and since there are 112 of them that's not an option.
twinsoft replied:

Hi, there are several matters that need to be considered before you start...

1) Kind of replication (one way, both ways, all to all, etc.)
2) Volume of data (estimated number of records and fields per record; do you have blobs?)
3) Bandwidth (need to compress data before sending?)

Waiting for some more info...
IncognitoMan (Asker):

Hello twinsoft, thank you for your reply.

1. The replication kind is master-slave, both ways, with logs. The idea is to have a local DB even if the connection to the server is lost.
2. Record fields range from 2 to 40, and the number of records is indefinite. I mean this DB is going to run at different customers and I have no idea how many records they will be inserting. And I need it to work for an indefinite number of local slave DBs.
3. This part with the compression is interesting :). I hadn't considered it, but I like it. Well, if the customer is in a big city the connection will be high speed, but we have small village towns where the connection is still dial-up.
twinsoft (Accepted Solution):

So, you will have a main server where you will send, for example, all the orders from all the clients, and if the server is down the clients will still continue to work in local mode. When the server is up again the orders will be sent to the main server. Similarly, all the changes on the server (new items, prices etc.) will be replicated to all clients.

Now there is a problem with logs: the amount of data can get very big. Depending on how many changes you make to each record (insert a new record, update to set a flag field, update again to set another field, etc.), one record of actual data can produce several lines in the log table. I usually prefer to synchronize the DBs when the client store is not working (at the end of the day). That way I get the final snapshot of the DBs and not all the intermediate states of the data. First I replicate all data from the clients to the main server, then from the server back to all clients. The procedure is initiated by the main server (with a scheduler), and when it is finished the remote clients are shut down. But this depends on the customer's needs.
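For example, even if you do keep a running log, you can still collapse it to a final snapshot before sending. A minimal sketch, assuming a hypothetical log table REPL_LOG(LOG_ID, TABLE_NAME, PK_VALUE, ACTION) like the one described below:

/* Keep only the last logged action per row; all earlier
   intermediate states of the same record are skipped. */
SELECT r.TABLE_NAME, r.PK_VALUE, r.ACTION
FROM REPL_LOG r
WHERE r.LOG_ID = (SELECT MAX(r2.LOG_ID)
                  FROM REPL_LOG r2
                  WHERE r2.TABLE_NAME = r.TABLE_NAME
                    AND r2.PK_VALUE = r.PK_VALUE);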

Now, if you must use logs, you should add triggers to your DB with a high position number, for every action in the DB (AFTER insert/update/delete triggers). Everything will be recorded in a log table. The log table should hold the name of the table, the value of the primary key(s), the type of action (insert, update, delete) and the actual data. It should also have a timestamp recording the time of the change, in case of a replication conflict (just to know at what time each client updated the record).
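To make this concrete, here is a minimal sketch of such a log table and trigger in Firebird syntax. CUSTOMERS and all the other names are just placeholders, and the "actual data" part is left out for brevity (it would go into an extra column, e.g. a blob):

CREATE TABLE CUSTOMERS (
  ID   BIGINT NOT NULL PRIMARY KEY,
  NAME VARCHAR(60)
);

CREATE TABLE REPL_LOG (
  LOG_ID     BIGINT NOT NULL PRIMARY KEY,
  TABLE_NAME VARCHAR(31) NOT NULL,   /* which table changed          */
  PK_VALUE   BIGINT NOT NULL,        /* primary key of the row      */
  ACTION     CHAR(1) NOT NULL,       /* 'I'nsert, 'U'pdate, 'D'elete */
  CHANGED_AT TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);

CREATE GENERATOR GEN_REPL_LOG;

SET TERM ^ ;
/* POSITION 100 so it fires after all the "normal" triggers. */
CREATE TRIGGER TR_CUSTOMERS_LOG FOR CUSTOMERS
ACTIVE AFTER INSERT OR UPDATE OR DELETE POSITION 100
AS
BEGIN
  IF (INSERTING) THEN
    INSERT INTO REPL_LOG (LOG_ID, TABLE_NAME, PK_VALUE, ACTION)
    VALUES (GEN_ID(GEN_REPL_LOG, 1), 'CUSTOMERS', NEW.ID, 'I');
  ELSE IF (UPDATING) THEN
    INSERT INTO REPL_LOG (LOG_ID, TABLE_NAME, PK_VALUE, ACTION)
    VALUES (GEN_ID(GEN_REPL_LOG, 1), 'CUSTOMERS', NEW.ID, 'U');
  ELSE
    INSERT INTO REPL_LOG (LOG_ID, TABLE_NAME, PK_VALUE, ACTION)
    VALUES (GEN_ID(GEN_REPL_LOG, 1), 'CUSTOMERS', OLD.ID, 'D');
END^
SET TERM ; ^

Note that multi-action triggers and the INSERTING/UPDATING/DELETING variables need Firebird 1.5 or later; on older versions you would write three separate triggers.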

Then the fun part begins. When you want to send the log to the server, you parse the log table and create an SQL statement for every record. These SQL statements are stored in a text file (or a memory stream). Then the statements are compressed and sent to the server, where they are decompressed and executed. You can send the compressed data using an FTP component or make a TCP/IP client/server connection. The same is done the other way around to replicate the server to all clients. In order for all this to work you should have unique primary keys across all your transactional tables. The best approach is to use BIGINT (64-bit) PKs and start every client at a mark one billion apart. So the first client will start from 1, the second from 1,000,000,001, the third from 2,000,000,001, etc. This will give each client a one-billion-record range to work with (I think it is enough).
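A sketch of how each client could get its own range with a generator, reusing the placeholder CUSTOMERS table from above (the offset would be computed per client at install time):

/* Client #3 gets the range 2 000 000 001 .. 3 000 000 000. */
CREATE GENERATOR GEN_CUSTOMERS_ID;
SET GENERATOR GEN_CUSTOMERS_ID TO 2000000000;

SET TERM ^ ;
CREATE TRIGGER TR_CUSTOMERS_BI FOR CUSTOMERS
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  IF (NEW.ID IS NULL) THEN
    NEW.ID = GEN_ID(GEN_CUSTOMERS_ID, 1);
END^
SET TERM ; ^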

Hope this helped...
IncognitoMan (Asker):

Your solution is similar to FiBRE. The part I don't like is that I need globally unique PKs. I already have a database and I would need to rewrite it. God, I'd have to remake over 100 tables, because I know that changing field types can damage a table, and I've experienced it too. So as I understand it, there is no way to do this without globally unique PKs, since every PC needs to create unique PKs. That was the problem I couldn't solve when I tried to create my own DB replication.

Also, I don't like the part with the 1,000,000,001 PKs, because I need it to work with at least 1000 PCs. Dividing a 4,000,000,000 key space between them leaves only 4,000,000 records per client, and those could be filled within a period of 3-4 years.
So the first idea that came to my head is to make the PK a varchar: give it a type of varchar(100) and forget about everything :).
Oooo, look what link I found. It was written a long time ago for IB, but it's applicable to FB:
http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_howto10

Thank you very much, twinsoft!
His answers are superb!!!
Oooh, the PK can be INT64, which is not capped at a value of 4,000,000,000; my mistake. Still, I think varchar is the easiest PK to use. Give me an idea.
Oh, and explain to me why I need globally unique PKs for all the tables; I didn't get that. Doesn't the table name plus the PK value make a record unique???
twinsoft:

Hi,
I wrote "all your transactional tables" earlier. Let me explain.

Let's say that you have a table for the Products/Items in your DB. In terms of replication you have two options:

1) Clients cannot change this table. All modifications are made on the server-side DB and are then replicated to all the clients.
2) Clients can make changes to this table (insert/update/delete), and these changes are then replicated to the server DB and, if necessary, to all the other clients.

In the first case this table is not transactional, but in the second it is. So as I said before, it depends on what kind of replication you want. Inside every single DB the PKs are local, and yes, there the table name plus the PK value make a record unique. But when records from different DBs are gathered into one DB, that is when we need globally unique PKs.
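A tiny illustration, with a placeholder ORDERS table. Each client's local insert is fine on its own, but replaying both logs into the one server DB collides:

CREATE TABLE ORDERS (
  ID   BIGINT NOT NULL PRIMARY KEY,
  NOTE VARCHAR(50)
);

INSERT INTO ORDERS VALUES (1, 'from client A');  /* OK                    */
INSERT INTO ORDERS VALUES (1, 'from client B');  /* primary key violation */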

Varchar(100) for a PK does not sound good to me, as it would increase the size of your DB. A BIGINT needs 64 bits (8 bytes) to store its value, and this does not change as the numbers grow, which is not the case with varchar. One other important disadvantage is that FB handles indexes on numeric types better than on varchar (it has to do with index compression, paging and other FB internals). This will definitely have an impact on your memory consumption and FB performance...

Good luck...