Solved

Design of Distributed Databases

Posted on 2004-04-21
6
790 Views
Last Modified: 2013-12-12
Objective:

I want to design and implimentation using oracle the following:

*** Design of Distributed Databases ***

And after that ( which I am not discussing at the moment but later in some other question)

*** Query Decomposition ***
*** Data Localization and Optimization of Distributed Queries ***

Scenario:
I am oracle Developer and a little bit knowledge of DBA

So .. .. For "Design of Distributed Databases"

What type of hardware specifications are required.
From where I am going to start.
I am currently using Oracle 9i on windows xp and 2000professional.

In Short I just want to impliment this in lesser time, lesser cost, highest achievements.

Healthy web links/referenes/book links are helpful for me but if you explain yourself, It' s great effort for me

===> I know Orale well but very much new to DDBMS(Practical approach)

Thanks ... remercier ... bedanken
0
Comment
Question by:leoahmad
  • 3
  • 2
6 Comments
 
LVL 47

Accepted Solution

by:
schwertner earned 500 total points
ID: 10887134
Distributed Database, Transactions and Two Phase Commit

What is the Distributed Database Option?  
----------------------------------------  
A distributed system is one in which both data and transaction processing  
are divided between one or more computers connected by a network, each  
computer playing a specific role in the system.  
 
This configuration has multiple databases, each of which is accessed  
directly by a single server and can be accessed indirectly by other  
instances through server/server cooperation. Distributed systems  
allow you to have data physically located at several sites, and  
each site can transparently access all of the data.  
 
Each node can be used for database processing, but the data is  
permanently partitioned among the nodes. Several smaller server  
machines can be cheaper and more flexible than one large,  
centrally located server.  
 
The key goals of a distributed database system are availability,  
accuracy, concurrency, and recoverability.  
 
The Client-Server Model and Distributed Systems  
-----------------------------------------------  
The client-server model is basic to distributed systems. It is a  
response to the limitations presented by the traditional mainframe  
client-host model, in which a single mainframe provides shared data  
access to many dumb terminals. The client-server model is also a  
response to the local area network (LAN) model, in which many isolated  
systems access a file server that provides no processing power.  
 
Client-server architecture provides integration of data and services  
and allows clients to be isolated from inherent complexities, such as  
communication protocols. The simplicity of the client-server architecture  
allows clients to make requests that are routed to the appropriate server.  
These requests are made in the form of transactions. Client transactions  
are often SQL or PL/SQL procedures and functions that access individual  
databases and services.
 
Distributed Database Characteristics  
------------------------------------
 
This article describes the twelve specifications for the ideal distributed  
database management system and how ORACLE conforms to these specifications.  
 
Oracle's distributed architecture, comprising SQL*Net, Open Gateway and the  
Oracle Server, provides an effective solution to the challenge of sharing  
data in a networked environment.  
 
The Oracle Server's distributed architecture provides effective data sharing  
in a networked environment using both client-server and distributed database  
architectures.  
 
In a client-server environment, communication takes place between two  
processes that reside on different machines.  The client executes the  
application or application tool and sends requests to the server for data.  
The received data is processed at the client machine.  This is known as  
distributed processing.  The ideal distributed system should look like a  
non-distributed system.  Twelve specifications for the ideal distributed  
database were developed by C.J. Date.  
 
The Oracle Server supports most of the ideal distributed features.  
 
1. Local Autonomy  
 
The data is owned and managed locally.  Local operations remain purely local.  
One site (node) in the distributed system does not depend on another site to  
function successfully.  
 
2. No reliance on a central site  
 
All sites are treated as equals.  Each site has its own data dictionary.  
 
3. Continuous Operation  
 
Incorporating a new site has no effect on existing applications and does not  
disrupt service.  
 
4. Location Independence  
 
Users can retrieve and update data independent of the site.  
 
5. Fragmentation Independence  
 
Users can store parts of a table at different locations.  Both horizontal and  
vertical partitioning of data is possible.  
 
6. Replication Independence  
 
Stored copies of data can be located at multiple sites.  Read-only snapshots  
(v7.0) and updatable snapshots (v7.1 and beyond) provide read-only and  
updatable copies of tables, respectively.  Symmetric Replication using  
triggers make readable and writable replication possible.  
 
7. Distributed Query Processing  
 
Users can query a database residing on another node.  The query is executed  
at the node where the data is located.  
 
8. Distributed Transaction Management  
 
A transaction can update, insert or delete data from multiple databases.  
The two-phase commit mechanism in Oracle ensures the integrity of distributed  
transactions.  Row level locking ensures a high level of data concurrency.  
 
9. Hardware Independence  
 
Oracle7 runs on all major hardware platforms.  
 
10. Operating System Independence  
 
A specific operating system is not required.  Oracle7 runs under a variety  
of operating systems.  
 
11. Network Independence  
 
The Oracle Server's SQL*Net supports most popular networking software.  
Network independence allows communication across homogeneous and heterogenous  
networks.  Oracle's MultiProtocol Interchange enables applications to  
communicate with databases across multiple network protocols.  
 
12. DBMS Independence  
 
DBMS Independence is the ability to integrate different databases.  Open  
Gateway supports connections to non-Oracle databases.  
 
 
 
DISTRIBURTED TRANSACTIONS AND THE TWO PHASE COMMIT  
==================================================
 
Two phase commit only comes in play during a commit of a distributed  
transaction.  The whole purpose is to maintain the integrity of the  
"global" database.  In other words, two phase commit guarantees that  
everything will either commit or rollback.  
 
TRANSACTION TYPES:  
 
        LOCAL TRANSACTION      contains ONLY statements on the local node.  
 
        REMOTE TRANSACTION     contains one or more statements which ALL  
                               reference the same remote node.  
 
        DISTRIBUTED  
        TRANSACTION            contains statements that modify data in  
                               two or more distinct nodes.  
                               * The only place where 2-PHASE COMMIT comes  
                                 into play.  
 
TERMS:  
 
        CLIENTS    are nodes that references information from another database  
        (C)        server in a distributed transaction.  
 
        SERVERS    are nodes that are directly referenced in a distributed  
        (S)        transaction, or is requested to participate in a transaction  
                   because another node requires data from it.  
 
        GLOBAL  
        COORDINATOR    is the node in which the distributed transaction  
        (GC)           originates.  
 
        LOCAL  
        COORDINATOR    is the node that references data on other nodes to  
        (LC)           complete its part in the distributed transaction.  
 
        COMMIT POINT  
        SITE           is the site with the highest commit point strength  
        (CPS)          "init.ora" parameter.  It is usually the most critical  
                       site that can not afford collisions in case of an  
                       in-doubt transaction.  
 
        SCN            is the system commit number.  
 
        SCN            is essentially an internal database clock.  
                       This is a monotonically increasing an unique number  
                       for each transaction.  
 
        COMMIT_POINT_STRENGTH is the init.ora parameter that determines the  
        (CPstr)               COMMIT POINT SITE.  
 
 
When you attempt to commit a distributed transaction, you will enter  
ORACLE'S TWO PHASE COMMIT MECHANISM.  
 
TWO-PHASE COMMIT:  
 
        PREPARE PHASE:  
 
                1) Commit point site is determined.  
 
                2) Global coordinator asks all participating nodes  
                   (except commit point site) to promise to COMMIT or  
                   ROLLBACK the transaction regardless of failure).  
 
                        This information is propagated by the local  
                        coordinators.   The servers have to be prepared before  
                        the local or global coordinators (except the commit  
                        point site).  The local coordinator is responsible  
                        for asking dependant nodes to prepare.  
 
 
                                POSSIBLE RESPONSES FROM NODES:  
                                        * PREPARED  
                                        * ABORT  
                                        * READ-ONLY NODES  
 
                        Locks obtained during the distributed transaction  
                        will continue to be held.  
 
                        Redo is flushed to the local redo logs.  
 
                3) Each node will pass back the SCN for his node.  
 
                4) Global Coordinator determines the max SCN.  
 
 
 
After all the nodes have prepared successfully, we enter the commit phase.  
All transactions except those found in the commit point site are "in-doubt"  
until the commit point phase completes successfully.  
 
        COMMIT PHASE:  
 
                1) Global coordinator sends the max SCN to the commit point  
                   site and asks it to commit.  
 
                2) Commit Point Site will try to commit or everything is  
                   rolled back.    
 
                        The locks are released in the commit point site first.  
 
                3) If committed/rolled back, the Commit Point Site will  
                   inform the Global Coordinator which will commit/rolled back  
                   at that time.  
 
                4) The information will propogate down to its clients/local  
                   coordinators and they will commit/roll back and propagate  
                   the information down to their servers until there are no  
                   more servers.  
 
(note: READ ONLY nodes do not participate in 2-phase commit.)  
 
 
EXAMPLE:  
 
                UPDATE EMP SET SAL= SAL * 1.10 WHERE DEPTNO=10;  
                UPDATE EMP@HAWAII SET SAL = SAL*1.20 WHERE DEPNO=20;  
                        BEFORE UPDATE TRIGGER Fires and updates  
                        dept@newyork.  
                                another trigger fires and inserts  
                                dept_audit@paranoid.  
                        after update trigger fires and inserts into the global  
                        audit table, emp_audit@headquarters  
                INSERT INTO VACATION_TIME SELECT * FROM DAYS_OFF@HOME  
                COMMIT; (TWO-PHASE COMMIT BEGINS HERE)  
 
 
 
 
                                _______________  
                                |             |    
          select days_off       |    HOME     | SERVER   /*READ-ONLY*/  
                                |_____________| SCN=20000  
                                      |         CPstr=250  
                                      |  
                                _______________  
                                |             | GC/LC/CLIENT  
          update emp            |   LOCAL     | SCN=100  
                                |_____________| CPstr=125  
 
                                /             \  
                               /               \  
                        ____________       _____________  
   update emp           |           |      |            |   insert emp_audit  
   SERVER/LC/CLIENT     | HAWAII    |      |     HQ     |   SERVER  
   SCN=250              |___________|      |____________|   SCN =1000  
   CPstr=95               /                                 CPstr=10  
                         /  
                        /  
                    ------------  
                    |          | inserts dept_audit  
                    | PARANOID | SERVER  
                    |__________| SCN=50  
                                 CPstr=205  
 
 
        PREPARE PHASE:  
                1) COMMIT POINT SITE IS PARANOID  
                     The global coordinator will already know what the  
                     commit point strength of each node prior to the commit.  
                     Read-only nodes are not included.  
                2) All nodes except for PARANOID is asked to prepare.  
                3) HAWAII, the local coordinator, is responsible to ask her  
                   dependent nodes to prepare before she prepares.  In this  
                   case, PARANOID is a commit point site; thus, it is ignored.  
                4) The highest SCN is sent to LOCAL node via the  
                   local coordinators.  The highest SCN is 1000.  
                5) All nodes which PREPARED will flush entries of the  
                   transaction to the redo logs if not already done.  
 
If any of the nodes send an "ABORT" message back, then the transaction is  
rolled back at this time.  Any failure after the PREPARE phase will result  
with "in-doubt" transactions.  
 
        COMMIT PHASE:  
                1) PARANOID IS ASKED TO COMMIT OR ROLLBACK BY THE LOCAL (GC).  
                2) PARANOID commits at a SCN greater than 1000.  
                        a) Redo is flushed.  
                        b) Locks are released.  
                        c) outcome is relayed back to the LOCAL node (GC).  
                Assume success:  
                3) AFter receiving the information, GC will commit at the  
                   same SCN and pass the information to its dependents.  
                        a) commit flushed to redo logs.  
                        b) data locks are released.  
                        c) GC will pass the information to HAWAII and HQ.  
                                (1) They, in turn, will commit and HAWAII  
                                    will pass the information to PARANOID.  
 
If all is successful, every statement will commit with the same SCN and then  
RECO will delete the entries from "dba_2pc_pending" and "dba_2pc_neighbors"  
tables.  Afterwards, the nodes will "forget" the transaction.  
 
 
TWO-PHASE COMMIT QUIZ/ANSWERS  
=============================
 
1) What is the difference between remote transaction and distributed  
   transaction?  
 
        REMOTE TRANSACTION contains one or more statements which ALL  
                           reference the same remote node.  
 
        DISTRIBUTED TRANSACTION contains statements that modify data in  
                                two or more distinct nodes.  
                * The only place where 2-PHASE COMMIT comes into play.  
 
2) When does two-phase commit come in play?  
 
Two phase commit only comes in play during a commit of a distributed  
transaction.  The whole purpose is to maintain the integrity of the  
"global" database.  In other words, two phase commit guarantees that  
everything will either commit or rollback.  
 
3) Define distributed database.  
 
An environment that includes multiple servers where users manipulate  
data as if there is one "global database".   With the widening use of  
heterogeneous hardware environments and multiple database servers, it has  
become essential to maintain the integrity of this "global database".  Of  
course, the distributed concept is usually transparent to the end user and  
the application programmers.  
 
Another defintion could be:  
A distributed database is a database that is not stored in its entirety at  
a single physical location.  Rather, a distributed database is a database that  
is stored across a network of locations that are connected via communication  
links.  A distributed database consists of a collection of sites or nodes  
connected together into a communication network.  
 
4) When is a transaction considered "in-doubt"?  
 
After all the nodes have prepared successfully, we enter the commit phase.  
All transactions except those found in the commit point site are "in-doubt"  
until the commit point phase completes successfully.  
 
5) Define the following terms:  CLIENTS, DATABASE SERVERS, GLOBAL COORDINATOR,  
                                LOCAL COORDINATOR, COMMIT POINT SITE  
 
        CLIENTS are nodes that references information from another database  
        (C)     server in a distributed transaction.  
 
        SERVERS are nodes that are directly referenced in a distributed  
        (S)     transaction, or is requested to participate in a transaction  
                because another node requires data from it.  
 
        GLOBAL COORDINATOR is the node in which the distributed transaction  
        (GC)               originates.  
 
        LOCAL COORDINATOR is the node that references data on other nodes to  
        (LC)              complete its part in the distributed transaction.  
 
        COMMIT POINT SITE is the site with the highest commit point strength  
        (CPS)             init.ora parameter.  It is usually the most critical  
                          site that can not afford collisions in case of an  
                          in-doubt transaction.  
 
        SCN is the system commit number.  
        SCN is essentially an internal database clock.  
            This is a monotonically increasing an unique number for  
            each transaction.  
 
        COMMIT_POINT_STRENGTH is the init.ora parameter that determines the  
        (CPstr)               COMMIT POINT SITE.  
 
6) Can the global coordinator be the commit point site as well?  
 
Yes, the only prerequisites are that the node is part of the distributed  
transaction (not read-only) and it has the highest commit point strength  
parameter.  
 
7) What distinguishes the commit point site?  
 
        COMMIT POINT SITE is the site with the highest commit point strength  
        (CPS)             "init.ora" parameter. It is usually the most critical  
                          site that can not afford collisions in case of an  
                          in-doubt transaction.  
 
8) What does RECO do?  
 
RECO PROCESS  
        * gets information from the "DBA_2PC_PENDING" and  
          "DBA_2PC_NEIGHBORS" views.  
        * executes as the global or local coordinator  
        * tries to communicate wit other nodes in the in-doubt transaction  
                If it cannot connect, it will continue to try at exponential  
                intervals.  
        * resolves in-doubt transactions if the connection is up.  
        * uses the same DBLINK as the transaction  
        * removes resolved transactions from the pending transaction table.  
 
9) What happens during the PREPARE phase?  Please be as detailed as possible?  
 
        PREPARE PHASE:  
 
                1) Commit point site is determined.  
 
                2) Global coordinator asks all participating nodes  
                   (except commit point site) to promise to COMMIT or  
                   ROLLBACK the transaction regardless of failure).  
 
                        This information is propagated by the local  
                        coordinators.   The servers have to be prepared before  
                        the local or global coordinators (except the commit  
                        point site).  The local coordinator is responsible  
                        for asking dependant nodes to prepare.  
 
 
                                POSSIBLE RESPONSES FROM NODES:  
                                        * PREPARED  
                                        * ABORT  
                                        * READ-ONLY NODES  
 
                        Locks obtained during the distributed transaction  
                        will continue to be held.  
 
                        Redo is flushed to the local redo logs.  
 
                3) Each node will pass back the SCN for his node.  
 
                4) Global Coordinator determines the max SCN.  
 
 
 
After all the nodes have prepared successfully, we enter the commit phase.  
All transactions except those found in the commit point site are "in-doubt"  
until the commit point phase completes successfully.  
 
10) What happens during the COMMIT phase?  Please be as detailed as possible?  
 
        COMMIT PHASE:  
 
                1) Global coordinator sends the max SCN to the commit point  
                   site and asks it to commit.  
 
                2) Commit Point Site will try to commit or everything is  
                   rolled back.    
 
                        The locks are released in the commit point site first.  
 
                3) If committed/rolled back, the Commit Point Site will  
                   inform the Global Coordinator which will commit/rolled back  
                   at that time.  
 
                4) The information will propogate down to its clients/local  
                   coordinators and they will commit/roll back and propagate  
                   the information down to their servers until there are no  
                   more servers.  
 
(NOTE: READ ONLY nodes do not participate in 2-phase commit.)  
 
11) What is SCN and how is it significant in two-phase commit?  
 
        SCN is the system commit number.  
        SCN is essentially an internal database clock.  
            This is a monotonically increasing an unique number for  
            each transaction.  
 
The entire distributed transaction will be committed at the same SCN.  
This will allow some type of consistency during distributed recovery.  
 
12) Which two views are helpful during MANUAL distributed recovery?  
 
They are "dba_2pc_pending" and "dba_2pc_neighbors".  
 
13) When does RECO remove the entries from those views?  
 
If all is successful, every statement will commit with the same SCN and then  
RECO will delete the entries from "dba_2pc_pending" and "dba_2pc_neighbors"  
tables.  Afterwards, the nodes will "forget" the transaction.  
 
14) List some cases where you may do MANUAL RECOVERY?  
 
A dba may do some manual recovery if there was a network or system or  
database crash and the "in-doubt" transactions are holding locks that  
are crucial to other applications.  
 
15) How should you pick your COMMIT POINT SITE?  
 
COMMIT POINT SITE is the site with the highest commit point strength  
(CPS) "init.ora" parameter.  It is usually the most critical  
site that can not afford collisions in case of a doubted transaction.  
You need to coordinate with the other dba adminstrators to make that  
decision.

0
 
LVL 14

Author Comment

by:leoahmad
ID: 10888252
@ schwertner

Where sholud I start from...

I have two systems both p4 with winxp
One has install Oracle 8i the other one is oracle 9i

I am expecting answer from you.

So please help me in saying that I am working on DDBMS

Thanks
0
 
LVL 22

Expert Comment

by:earth man2
ID: 10888480
look at DATABASE LINKS and REPLICATION in Oracle Concepts Docs.
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 47

Expert Comment

by:schwertner
ID: 10888487
I read the course "Distributed DB" at the University.
Now let us speak about Oracle.

Facts:
1. You have 2 Oracle databases on separate/same machine.  Yes!

2. You can access the other DB from the each machine. Yes! How? Using DB-links! Right!

3. You can construct Distrubuted Transaction on each machine! Yes!
 
This runs under "instance 1":
BEGIN TRANSACTION;
SELECT a FROM instance1.tableX;
UPDATE instance2.tableY;                      --via db-link - this makes the transaction distributed!!!
UPDATE instance1.TableZ;
END TRANSACTION;

The most important part is in 8. above:
                              8. Distributed Transaction Management  
 
                              A transaction can update, insert or delete data from multiple databases.  
                              The two-phase commit mechanism in Oracle ensures the integrity of distributed  
                               nsactions.  Row level locking ensures a high level of data concurrency.  

4. THE MOST IMPORTANT FACT:
  In the case of the above transaction Oracle automatically will apply 2 PHASE COMMIT PROTOCOL!!!!!

  The main characteristic of 2PC protocol:
   "two phase commit guarantees that  everything (in the transaction!!!!!!) will either commit or rollback"
   Only if you define TRANSACTION.
   
WARNING: Outside TRANSACTIONS 2PC protocol doesn't work!


Now I hope the distributed functionality in Oracle is clear!!

0
 
LVL 14

Author Comment

by:leoahmad
ID: 10888759
it means that any two databases on two different system on any envoirnment, LAN WAN
a distributed database is constructed??

I am using LAN

Any further idea

I am student in DDBMS so your hints are helpful for me.

LeoAhmad
0
 
LVL 47

Expert Comment

by:schwertner
ID: 10889648
Oracle is invariant what kind of connection are you using. Oracle NET protocol uses TCP/IP connection and LISTENERS. Normally they are listening on port 1521. So you can connect Oracle instances on different platforms. The only prerequiosite is to have TCP/IP connection between the instances.
Of course the quality of the connection (mainly the speed) has a great impact of the productivity.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Cursors in Oracle: A cursor is used to process individual rows returned by database system for a query. In oracle every SQL statement executed by the oracle server has a private area. This area contains information about the SQL statement and the…
I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows how to recover a database from a user managed backup

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now