I have mission-critical servers (two of them run the Oracle DB engine under Red Hat Enterprise Linux 4, and one is a Windows 2003 Server running Oracle applications). I need to minimize downtime. That means that if any of these servers has a problem, I should be able either to fail over automatically to another server, or to use a virtual server and deploy a true image of the faulty server. What I have in mind is the following:
1- Buy a server and install VMware on it for virtualization.
2- Buy true-image software (such as Acronis Echo Server Enterprise) to snapshot the servers so they are ready for recovery/deployment in case of failure.
Is this a good way to achieve what I am looking for?
Please note that I have the Intelligent Disaster Recovery (IDR) agent for Symantec Backup Exec 11d, but the problem is that an IDR backup has to be restored to exactly the same hardware. Other recovery software, however, uses universal drivers for this type of operation.
If you deployed these two servers using VMware ESX and HA (High Availability), you could have them fail over to the other server automatically - and be able to take servers offline without downtime to your client-facing servers.
You would need 2x ESX Servers, VirtualCenter (this can run as a VM within one of the ESX servers) and some shared storage (iSCSI or Fibre Channel SAN, for example) for it to work, but that would be the most elegant VMware solution.
If you bought only ESX and VirtualCenter, you could "hot" clone the servers periodically - though it would be better to "snapshot" them, so that you could roll back to a point in time if they failed,
but HA is the correct way to achieve what you asked.
Excellent. Please let me know if I understood you correctly: we need to buy hardware (a server), and on that server we install the two ESX Server licenses. On each of these two servers (virtual servers), we install the two servers that we need to protect. As for the shared storage, it will be needed as a shared place to store the database files and other dynamic, constantly changing files, right?
I am not really familiar with this VMware technology. Do you have any docs on it, or a how-to manual?
You will need 2 servers and some storage (so that's either a third server or some SAN storage).
>>On each of these two servers (virtual servers), we install the two servers
You install the two servers you need on one of the ESX servers; then, if that ESX server goes offline, the other one will start up the two servers (the servers' virtual hard disks are on the shared storage, NOT on the ESX server itself).
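To make that behaviour concrete, here is a toy Python model of it. It is only a sketch: it does not use any VMware API, and the `EsxHost` class, the `fail_host` function and the host/VM names are all made up for illustration. The assumption is simply that every host can see the same shared storage, so a surviving host can restart any VM.

```python
from dataclasses import dataclass, field


@dataclass
class EsxHost:
    """Hypothetical stand-in for an ESX host; not a VMware API object."""
    name: str
    online: bool = True
    running: set = field(default_factory=set)


def fail_host(hosts, failed_name):
    """Mark one host as down and restart its VMs on a surviving host.

    The VMs can be restarted elsewhere only because their virtual disks
    live on shared storage, not on the failed host's local disks.
    """
    failed = next(h for h in hosts if h.name == failed_name)
    failed.online = False
    orphaned, failed.running = failed.running, set()

    survivors = [h for h in hosts if h.online]
    if not survivors:
        raise RuntimeError("no surviving host to restart the VMs on")

    restarted = []
    for vm in sorted(orphaned):
        # Pick the least-loaded survivor (a simplification of real placement).
        target = min(survivors, key=lambda h: len(h.running))
        target.running.add(vm)
        restarted.append((vm, target.name))
    return restarted


# Both VMs start out on esx1; esx2 is idle but sees the same storage.
hosts = [EsxHost("esx1", running={"oracle-db-1", "oracle-db-2"}), EsxHost("esx2")]
result = fail_host(hosts, "esx1")
print(result)
```

Note that the VMs are restarted, not moved while running, so anything in flight on the failed host is lost.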
>>As for the shared storage, it will be needed as a shared place to store the database files and other dynamic, constantly changing files, right?
It holds the virtual hard disks and configuration files for the virtual servers.
>>I am not really familiar with this VMware technology. Do you have any docs on it, or a how-to manual?
Can you define what your needs for recovery time or uptime are? VMware HA is an excellent solution for recovery, but it will allow the machines to go down. What happens is that a VM dies when the server it is on (the ESX host) goes down. The other ESX host(s) in the cluster will sense this, take over the VM and restart it. It is important to understand a few things about this:

1. The virtualized servers (Oracle and W2K3) will die when the ESX host dies. They will be restarted, but currently connected users will be disconnected. There is high availability but not 100% uptime.

2. Since the VM goes down hard when the hardware dies, your VM and/or the databases might not come up cleanly. It is the same as if a physical server goes down hard. Sometimes they come up okay and sometimes they do not. Databases don't like being hard reset.

3. This solution will be expensive to implement properly. You will need two ESX hosts, each capable of running all three servers, because if one host fails, the other has to take on the failed host's load in addition to its own. You will need to invest in a SAN or NFS filer if your company doesn't already have one. This can be very expensive and non-trivial to implement. You will need to add a whole SAN infrastructure (HBAs, switches, etc.) if you elect to go with a traditional FC SAN.

4. If you truly need to ensure the servers never go offline or are hard reset, then you will need to implement a different type of clustering. This can be MSCS or Linux-HA. On the paid front, Veritas also makes a popular cluster suite. These, along with redundant hardware and networking, can approach 100% uptime.
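The sizing rule in point 3 can be checked with a few lines of Python. This is only a rough sketch: the function name, the host and VM names and the memory figures are invented for illustration, and it looks at memory only. With more than two hosts it compares total remaining capacity against total demand, which ignores how VMs actually pack onto individual hosts.

```python
def survives_single_host_failure(host_capacity_gb, vm_demand_gb):
    """Return True if the VMs still fit after any one host fails.

    host_capacity_gb: dict of host name -> usable memory in GB
    vm_demand_gb:     dict of VM name -> memory the VM needs in GB
    """
    total_demand = sum(vm_demand_gb.values())
    for failed in host_capacity_gb:
        # Capacity left over when this particular host is down.
        remaining = sum(
            cap for host, cap in host_capacity_gb.items() if host != failed
        )
        if remaining < total_demand:
            return False
    return True


# Hypothetical figures for the three servers in this thread.
vms = {"oracle-db-1": 8, "oracle-db-2": 8, "win2003-apps": 4}

# Two equal hosts, each able to hold all three VMs.
ok = survives_single_host_failure({"esx1": 32, "esx2": 32}, vms)

# One host too small: if esx1 fails, esx2 cannot hold everything.
too_small = survives_single_host_failure({"esx1": 32, "esx2": 16}, vms)

print(ok, too_small)
```

The same check would need to be repeated for CPU and storage I/O before buying hardware.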