DonFreeman

asked on

Databases Running on Virtual Machines (VM)

I have taken over a database shop that has some databases running on VMWare.  I am getting a lot of pressure to do more of that, and just before I took over they moved two large SQL servers to VM that run a myriad of smaller databases supporting our GenGov program.  Most of these have fewer than 10 users and they appear to be running well.  Oracle announced its own VM solution yesterday.  I asked about this online this morning and haven't received a lot of feedback yet, but one guy mentioned that the main problem with VM and Oracle is licensing costs.  The processor/licensing math works to Oracle's advantage in this case.

What I believe I am seeing, however, is that the VM technology competes with RAC, Grid, and SQL Cluster technology.  The primary means of managing failover, disaster recovery and maintaining availability shifts from the DB team to the server team.  Is this a correct view?  The Oracle DBAs I talked to at our last Oracle User Group lunch seemed universally against the use of VMware, but we didn't have time to get into the specifics of why.  Since we are getting ready to upgrade the database servers on one of our major apps, I want to make sure that when I debate this the facts are on my side and I'm not defending an inferior technology just because I manage it.  I need some discussion of this and am willing to share points for informed, knowledgeable commentary (which I will shamelessly claim as my own to my boss).

Should I be fighting this or going with it?
schwertner

Note:249212.1
Support Status for VMWare
-----------------------------
 
Oracle provides support of the Oracle Stack when running on a VMware virtual machine in the following manner. If a problem arises and it is a known Oracle issue, Oracle support will recommend the appropriate solution. If that solution does not work, the issue will be referred back to VMware for support. If the problem is determined to be an unknown Oracle issue when running on a VMware virtual machine, and the issue cannot be reproduced on a physical system by Oracle support, the issue will be referred back to VMware for support. Oracle and VMware have in place a joint customer support agreement to enable customer support issues to be transferred between the two partners.

Please Note: RAC is not supported on VMWare by Oracle.
No experience with Oracle on VMWare, but we are steadily (but slowly) migrating our SQL Servers to VMWare with excellent results.  That doesn't mean we don't run into the odd wrinkle here and there, but once we fix an issue with VMWare it seems to stay fixed.  We currently have four virtual servers running SQL Server, two of them for more than 6 months.
ASKER CERTIFIED SOLUTION
schwertner

DonFreeman (ASKER)

So far not what I am looking for.  I know dbs will run on VMs and I know that to some degree it is supported.  We briefly tried to get a RAC running on VMWare and gave it up.  But is VM better than RAC or SQL Cluster for High Availability or Disaster Recovery?  The VM is a good solution for reducing rack crowding, heat, hardware, power, etc.  You can also migrate images to new hardware without having to reinstall software.  All our databases are on SANs, so the question is: can VMs compete with Oracle RAC for HA and DR?  Our VM wizard seems to think he has the goods!  The problem may be that there aren't many people who are experts on both and can make accurate comparisons.
I would never recommend installing Oracle PRODUCTION instances on VMWare, even for performance testing. For development you can take this risk .... Take into account that as you increase the number of virtual machines on a physical server, the overall performance goes down.

Instead I would recommend creating many instances on one physical machine. By adjusting the size of each SGA you can fit them there, and there will be no overhead from the VMWare software (it also consumes resources!).
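To make the consolidation idea above a bit more concrete, here is a minimal sketch that checks whether a set of planned SGA sizes fits into one server's RAM. The helper name `fits_in_ram`, the OS reserve, and the per-instance overhead are illustrative assumptions, not Oracle sizing guidance; real instances also need PGA and process memory that has to be measured.

```python
def fits_in_ram(sga_sizes_gb, ram_gb, os_reserve_gb=2.0, per_instance_overhead_gb=0.5):
    """Return True if all planned instances fit into physical RAM.

    os_reserve_gb and per_instance_overhead_gb are placeholder assumptions.
    """
    needed = (
        sum(sga_sizes_gb)
        + per_instance_overhead_gb * len(sga_sizes_gb)
        + os_reserve_gb
    )
    return needed <= ram_gb


# Three small instances on a 12 GB box (hypothetical sizes).
print(fits_in_ram([2.0, 2.0, 1.0], ram_gb=12.0))
# Three larger instances on the same box.
print(fits_in_ram([4.0, 4.0, 4.0], ram_gb=12.0))
```

If the check fails, the options are the ones discussed in the thread: shrink the SGAs, move an instance elsewhere, or add memory.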
VMWare and RAC are different technologies.

VMWare makes it possible to run many (virtual) machines on ONE physical machine. The operating systems can differ, which is perfect for development: imagine a laptop with Windows XP and Linux, with no need for two boxes.

RAC is another solution. It is used:
1. For load balancing: redirecting work to the less loaded machine. On VMWare this is not possible, because the virtual machines all reside on one physical machine.

2. For failover of a node: the node is down or needs maintenance and you take it out of service. With VMWare, to fix the box you need to shut down all the VMWare instances on that machine.

So you cannot compare these solutions; their targets are different ...

VMWare is an important attempt to build many virtual machines on one physical machine, but nowadays it seems to be good for small applications, not for software monsters like Oracle servers.

I will also mention that Oracle certifies virtual machines on Fujitsu/Siemens Itanium-2 servers.
Personally I like real servers more than VMs, and I started using VMs just for testing.
Later we did not have much space in the server room and the SAN was utilized, so as a result VMs helped us to consolidate some databases and free up the server room.
So it is probably all about budget ($$$) and personal priorities.

I see a mix of VMs and real servers, not a VM-only environment.
About VM and RAC: I realize they are different, but they have some overlap in benefits.  Our VM guru is advocating VM as an HA solution.  He can copy and replicate a VM partition to another machine very quickly, although I haven't heard this described as failover yet.  The same goes for disaster recovery.

It is mentioned that VM does not do load balancing and RAC does.  The primary benefit of RAC is failover, and of Data Guard DR, so I need to compare those two things and see if virtual machines can perform them adequately.

I'm going to need some time to read all those links.  Thanks for all your inputs!
We have a lot of ESX servers.  But how does ESX determine that a server is failing and needs to fail over to another one?  Would Oracle RAC do it the same way?  Have you tested this with a running database?
robocat

VMware HA is a lot cheaper than RAC because you need only one Oracle license. The setup is also less complex to understand and manage.

The failover (when an ESX server suddenly goes down) happens by restarting the VM on another node. This is not a clean failover: your database will probably do a consistency check, and this can take some time depending on its size.

So call VMWare HA a poor man's failover. It all depends on whether you really need to be up 24/7 or can deal with an outage of, say, 15 minutes.

VMWare is also great for disaster recovery. If your primary site goes down, and you have replicated your storage to another site, you can be up again in a very short time. All of this at lower cost than RAC.
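Whether an outage of about 15 minutes is tolerable is easier to judge as an availability figure. A minimal sketch, assuming every failover costs the same fixed downtime; the function name and the outage counts are illustrative, not measurements from any real cluster:

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(outages_per_year, minutes_per_outage):
    """Fraction of the year the service is up, given fixed-length outages."""
    downtime = outages_per_year * minutes_per_outage
    return 1.0 - downtime / MINUTES_PER_YEAR


# Hypothetical failure counts, each costing a 15-minute restart.
for outages in (1, 4, 12):
    print(f"{outages} outage(s)/year of 15 min -> {availability(outages, 15):.4%} available")
```

Comparing the result with whatever uptime the business actually requires shows quickly whether restart-based failover is good enough.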
Please pay attention to Lahousden's comment - it is perfect!!!!

My comments:

----> VMware HA is a lot cheaper than RAC because you need only one Oracle license.
My comment on this:
If you install more images on an ESX server you have to pay for an Oracle license for every installation. Oracle's licensing does not recognize the concept of an 'image' of an instance. If you don't use VMWare, you can use a single Oracle license to create and use many instances on the same computer. Be skeptical here, I may be wrong, so investigate the Oracle licenses. I have them, but I have no time to investigate right now.

@Don: Your VM guru can copy and replicate a VM partition to another machine very quickly BUT only if the ESX server is running. But failure of a node means that the node is out of order and cannot run at all. So I am very curious how the guru will transfer the image. This is the same mistake as placing the primary and the Data Guard instance on one machine or even in the same building - imagine a fire and you will understand what will happen ....

I think that VMWare is a good technology for development: every developer group can have its own instance and recover its data without disturbing other developer groups. It cannot substitute for RAC and it cannot be used for Data Guard. In our VMWare installations it is a very hard task to add additional hard disk space to an image. Sharing the hardware resources is a very hard task even for gurus.

@schwertner: The licensing of Oracle is based upon the number of physical processors in the box (or a number of named users). Should one install multiple Oracle instances on a box (in separate VMWare virtual machines or not), the license cost remains the same.

As a consequence, one should size the VMWare ESX server to fit one (or several) Oracle instances only, and not put anything but Oracle onto the ESX server.

RAC, however, requires you to buy 2, 3 or more times this license cost, depending on whether your cluster has 2, 3 or more nodes.

The idea here is not to consolidate lots of other servers, but to use ESX as a means for HA and DR of Oracle. This might sound a bit weird but it is actually a valid idea. The cost savings can be significant compared to RAC, if you can tolerate a very small downtime.
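The licensing argument above can be put into a quick calculation. This is only a sketch: the price is a made-up placeholder, the function name is hypothetical, and it follows the thread's assumption that cost scales with physical processors per licensed server. Actual Oracle license terms (including whether failover or standby hosts need their own licenses, and the cost of the RAC option itself) must be checked with Oracle.

```python
def license_cost(price_per_processor, processors_per_server, licensed_servers):
    """Per-processor license cost summed over all licensed servers."""
    return price_per_processor * processors_per_server * licensed_servers


PRICE = 40_000.0  # hypothetical placeholder, NOT a real Oracle list price

single_esx = license_cost(PRICE, processors_per_server=2, licensed_servers=1)
rac_two_nodes = license_cost(PRICE, processors_per_server=2, licensed_servers=2)
rac_three_nodes = license_cost(PRICE, processors_per_server=2, licensed_servers=3)

print(f"one ESX host: {single_esx:,.0f}")
print(f"2-node RAC:   {rac_two_nodes:,.0f}")
print(f"3-node RAC:   {rac_three_nodes:,.0f}")
```

Under these assumptions the RAC cost is simply the single-server cost multiplied by the node count, which is the point being made about 2, 3 or more nodes.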

>>>So I am very curious how the Guru will transfer the image.

Simply using ESX High Availability (HA), this will happen 100% automatically.

>>>It can not substitute RAC and it can not be used for DataGuard.

True, but you can get close as far as HA and DR are concerned, at a much lower cost.


@robocat: I think you quoted me out of context when you said:

1.
>>>So I am very curious how the Guru will transfer the image.

Simply using ESX High Availability (HA), this will happen 100% automatically.

The context is this (note that the presumption is that the VMware computer doesn't work and cannot be started AT ALL!!!!!). How will you back up the image? This happens often. This is the risk of centralization: installing the enterprise computing farm on one or a few computers.

Your VM guru can copy and replicate a VM partition to another machine very quickly BUT only if the ESX server is running. But failure of a node means that the node is out of order and cannot run at all. So I am very curious how the guru will transfer the image.

The same happens in big cities. Look at Hamburg: every Underground or S-Train line crosses the Main Railway Station. Every accident at this station stops rail public transportation ... I have experienced this .. for hours.
Additionally, note that tests on a VMWare installation cannot give realistic results, because of the overhead of VMWare itself and of the other images running on the installation. In contrast, the production Oracle server runs without VMWare.

A real scenario from yesterday in our company:
1. I encrypted the connections to the Oracle server.
2. The bosses asked for an estimate of the delay caused by the encryption.
3. The testers said that the overhead caused by the VM server cannot be calculated and the experiments would not reflect the real production installation.

As a result I have to provide them with a testbed without virtual servers.


>>> pay attention that the presumption is that the VMware computer doesn't work and cannot be started AT ALL. Your VM guru can copy and replicate a VM partition to another machine very quickly BUT only if the ESX server is running.

No, not true. That's what ESX HA is all about. Another ESX server will boot the virtual machine image located on your FC/iSCSI SAN. No need for the original server to be available at all.

Many companies also mirror the SAN storage (synchronously or asynchronously), to guarantee that the storage is available on the DR site also. You can restart Oracle on the DR site in minutes when the primary site goes down.

Schwertner, I get the impression you're talking about VMware Server instead of ESX/Virtual Infrastructure? These are completely different beasts.
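As a generic illustration of the restart-on-another-host idea (not VMware's actual algorithm), failure detection in HA clusters is commonly based on heartbeats with a timeout, and because the VM images live on shared storage, any surviving host can boot them. All names and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    last_heartbeat: float  # time of the last heartbeat received, in seconds


def failed_hosts(hosts, now, timeout):
    """Names of hosts whose last heartbeat is older than `timeout` seconds."""
    return [h.name for h in hosts if now - h.last_heartbeat > timeout]


def plan_restarts(vm_placement, failed, survivors):
    """Assign each VM from a failed host to a surviving host, round-robin."""
    stranded = sorted(vm for vm, host in vm_placement.items() if host in failed)
    if stranded and not survivors:
        raise RuntimeError("no surviving host available")
    return {vm: survivors[i % len(survivors)] for i, vm in enumerate(stranded)}


hosts = [Host("esx1", last_heartbeat=100.0), Host("esx2", last_heartbeat=40.0)]
placement = {"oracle-vm": "esx2", "sql-vm": "esx1"}

down = failed_hosts(hosts, now=105.0, timeout=15.0)
up = [h.name for h in hosts if h.name not in down]
restart_plan = plan_restarts(placement, down, up)
print(down, restart_plan)
```

The restarted VM boots from the same image on shared storage, which is why the database then goes through crash recovery rather than a clean failover.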




>> SAN????? No, most companies do not use SANs at all. Why? Too expensive: they need Fibre Channel and special NICs. So I do not believe that in the common case you can access the hard disk of a failed box.

In our company the smallest server running ESX has 2 Xeon processors, but some departments are using boxes with 4 and even 8 processors and 12 GB RAM. Can you imagine what the Oracle license cost is for such an installation?

Finally, the original question is about VMWare.

Oracle's VM is not good enough. It seems that they support only their own Oracle Linux and the latest versions (10g, 11g) of the DB.

Interesting: we couldn't install Windows 2000 on the ESX server. This was not supported. I needed it urgently because Oracle's standalone Forms/Reports server is not certified for Windows XP. So I had to provide a separate old computer and install W2000 and the Forms/Reports server on it.

Virtualization is a good thing. I appreciate the idea. BUT never forget that:
1. The underlying hardware is a Personal Computer. Even with Xeon and Itanium 2 processors its potential is limited, and it is in fact a luxury PC. So you cannot expect too many images to work on this hardware with its L1, L2, L3 caches, memory and buses. Also, Linux is mainly an OS for small to moderate computers.
2. Big software complexes like Oracle sometimes prefer to command the devices themselves, bypassing the OS and the file system (Oracle's raw devices, etc.). So they need direct access to the iron ...

>> No, most of the companies do not use SANs at all...The underlying hardware is a Personal Computer...

I'm sorry, let's stay "professional" here. The author of the original question said they have a SAN, and they also seem to have an operational Virtual Infrastructure with HA. They are probably a fairly large organization with high availability requirements.

>> Finally the original question is about VMWare.

Indeed. The original question was: can you achieve HA/DR using VMware like you would using database clustering or RAC?

The answer to the question is that VMWare Virtual Infrastructure gets close, if you can live with a few minutes of downtime during failover and a small virtualisation overhead. In both cases you need shared storage.



This question will be read by many askers, and they could draw the wrong conclusions.
Be aware that Oracle (and other!) DBAs are highly responsible for the failover of the instance. There is no excuse if a production instance fails and is not available for days. The DBA will face big trouble and could possibly be fired if he is the author of the idea to use virtualization.

I opened the internal mailbox and will post a letter here:
09/05/2007 09:58 AM
ESX Server down!
hi *,
currently the ESX servers are down - means all charly2/3 images and CruiseControl are not reachable.
IT is informed... I'll inform when back again.
greetings,
..... end of letter

This lasted 2 days!
This was a development installation .....
3 Oracle servers, overloaded!
But it was not my responsibility; another DBA was responsible.

In our company the big boss normally visits my office and explains to me:
"Joseph, 45 employees aren't developing software, and I have to pay them N bucks daily!
Please multiply 45*N*2!"
No objections!
We have four or five SANs.  We are a state health department.  We have 3 large projects using Oracle (all stovepiped) and a couple of hundred SQL databases supporting various departments.  I've been here for two weeks, and the position I am filling was gapped for about 3 months, so I am scrambling to map things out.

There are no diagrams, drawings or anything else that describes the enterprise architecture.  I am getting a plotter out of storage and getting ready to start documenting.  Nobody in the database group is aware of or knowledgeable about the server team's plans for virtualization.

We are not really a 24/7 shop.  There is no operations center, at least not one managing our ops, and we won't know if something broke until we come in the next morning.  Last night the AC in the server room went down at 5pm and the janitor and the landlord (leased building) were left to sort it out.  It's still down and nobody has come.  This morning a lot of databases are down.
I have followed all the links and read the articles.  The article with the most intriguing title has a dead link:
"The Pros and Cons of Virtual Machines in the Datacenter"    It doesn't look like anybody has written a definitive paper, that I can find, on the pros and cons of databases on VM.

In one of the articles I found this comment on performance testing:

"Two good performance comparisons of VMware and Xen were conducted by the computer science departments at University of Cambridge, England and Clarkson University (PDF). Based on the Cambridge study, VMware Workstation achieves near-native performance for processor-intensive tasks, but experiences slow-downs of up to 88 percent on I/O-bound tasks. That means your I/O-bound process would be running at nearly 1/10 of its native speed—something that may be unacceptable to you. The Cambridge group performed its study based on VMware Workstation 3.2 because licensing restrictions in newer VMware versions prohibit test comparisons. "

I read that as VMWare specifically prohibiting anybody from testing their software and publishing the results, in case it performs badly.  They follow up by saying you have to test it yourself.  I'll be the first to say that I'm not an engineer with a lab that matches our production environment where I can conduct fully instrumented tests of hardware/software configurations.  We do the best we can.  I'll come back and award points by the end of the day.  I hope you'll add any further comments that will help.  Considering how few actual references exist out there for this subject, this thread may become a good reference for the question.
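A quick check of the arithmetic in the quoted passage, as a sketch only. It assumes "slow-down of 88 percent" can be read in two ways: throughput drops by 88%, or run time grows by 88%. Which reading the study intended is an assumption here, and the function names are made up for illustration.

```python
def speed_if_throughput_drops(slowdown):
    """Relative speed when throughput is reduced by the given fraction."""
    return 1.0 - slowdown


def speed_if_runtime_grows(slowdown):
    """Relative speed when run time increases by the given fraction."""
    return 1.0 / (1.0 + slowdown)


s = 0.88
print(f"throughput reading: {speed_if_throughput_drops(s):.2f} of native speed")
print(f"run-time reading:   {speed_if_runtime_grows(s):.2f} of native speed")
```

Only the first reading lands anywhere near the article's "nearly 1/10 of its native speed" (it gives about 1/8); the second gives roughly half of native speed.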



Possibly you can try VMWare first with 2-3 instances and investigate how they work.
Of course no RAC!

@robocat: I appreciate your input. But RAC and VMWare have different tasks. We cannot compare apples and bulls! Of course, to have RAC load balancing and failover (excluding crashes of the disk storage on the SAN) the enterprise has to pay money.

>>> That means your I/O-bound process would be running at nearly 1/10 of its native speed—something that may be unacceptable to you. The Cambridge group performed its study based on VMware Workstation 3.2

... which completely invalidates the test, because VM Workstation (or VM Server) has to go through the underlying OS, which implies massive overhead. The ESX kernel works directly on the hardware and has a lot less overhead.

Still you have to pay a lot of attention to your Storage Infrastructure and some tuning to get good results. Currently we're migrating our VMWare datastores from iSCSI (IP SAN) to NFS (on NetApp) and we're getting 30% to 45% increases in I/O performance for our VM hosts. In some cases this can even outperform FC SAN.

Also remember that databases need lots of RAM to perform at their best, so buy your VMWare servers with at least 16GB or 32GB RAM.