Use of ext3 filesystem for Oracle clusterware cluster nodes

nd2011 asked:
I would like to know the feasibility of use of the following and any gotchas, constraints, risks for the same.

In an Oracle Clusterware (now called Grid Infrastructure) two-node cluster on Linux, configured in active-passive mode (only one node runs a given cluster resource/service at a time):
Can I present a LUN to both nodes, format it as an ext3 filesystem, and have the cluster mount it on only one server at a time?
[I won't be using clustered LVM - just formatting the LUN as a single ext3 partition and mounting it on one server, ensuring that the filesystem is mounted on only one server at any given point in time (through the scripts that define the cluster resource).]

Is this a well-used approach?
Is this an acceptable approach?
Are there any issues or risks in this approach?

mrjoltcola, Top Expert 2009:

>>Is this a well used approach?

Probably not nowadays. People who do clustering with Oracle want high availability, scaling, or disaster recovery protection.
HA means Data Guard.
DR means Data Guard.
Scaling means RAC.

An active-standby setup, in my book, needs a completely separate disk to be useful.

We used to run Oracle Failsafe clusters years ago, but now, with the maturity of RAC and Data Guard, they don't bring much to the table. From a redundancy perspective they don't provide much protection, and they don't help you scale either, as there is always an idle CPU.

>>Is this an acceptable approach?

Personally, I'd say no. ext3 isn't a cluster filesystem, so with a shared LUN you will corrupt the filesystem if one node doesn't know what the other is doing. Will it work? Yes, probably, as long as you can guarantee against simultaneous mounting.

>>Are there any issues or risks in this approach?

See above.

If you really want to do clustering, I'd build a RAC cluster or a Data Guard configuration. Active-standby really isn't much use in my experience, because most failures I've seen have historically been disk related.

My question to you is, "Do you want HA, scaling, or DR protection?"
Richard Olutola, Consultant:

If you want to use hardware clustering, then you could use ext3. You would not be using RAC and Oracle would not be involved in/aware of your clustering efforts.

To fail-over you'll need a way of 'switching'/presenting the storage to the active server.

I know AIX has HACMP (or something like that) to achieve this. Some shops use hardware clustering as a cheaper alternative to RAC. One of the downsides is that it requires administrator intervention to switch over/fail over; otherwise it would require extensive scripting to do anything automatically.

So you can use ext3 for a single-node installation but not for RAC. RAC requires OCFS2, direct/raw device access (no longer available from 11.2), or ASM.



-> rolutola:
We plan to use Oracle Clusterware, but not RAC. The LUN (on which we will have the ext3 filesystem) is presented to both nodes/servers configured in the Clusterware cluster.
We will define a cluster resource whose start/stop scripts mount/unmount the filesystem. The cluster setup will ensure that the cluster service/resource runs on only one node at a time, and the resource's start/stop scripts will then mount/umount the filesystem - thus effectively having the filesystem mounted on only one server at any given point in time.
Do you think there is even a remote chance of data corruption, LUN reservation issues, etc., if the above is taken care of and no manual mount/umount of the filesystem ever happens?
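A resource of this kind is typically implemented as an action script that Clusterware invokes with start/stop/check arguments. A minimal sketch follows; the device and mount-point names are placeholders for illustration, and the way you register the script as a resource differs between Clusterware versions:

```shell
#!/bin/sh
# Sketch of a Clusterware action script for a single-owner ext3 filesystem.
# DEVICE and MOUNTPOINT are hypothetical names - substitute your own.
DEVICE="/dev/mapper/app_lun1"
MOUNTPOINT="/u02/appdata"

fs_start() {
    # Idempotent: succeed quietly if the filesystem is already mounted here.
    if grep -qs " $MOUNTPOINT " /proc/mounts; then
        echo "already mounted"
        return 0
    fi
    mount -t ext3 "$DEVICE" "$MOUNTPOINT"
}

fs_stop() {
    # Only attempt umount when it actually appears in /proc/mounts.
    grep -qs " $MOUNTPOINT " /proc/mounts && umount "$MOUNTPOINT"
    return 0
}

fs_check() {
    # Clusterware polls this; a non-zero status means "resource offline".
    grep -qs " $MOUNTPOINT " /proc/mounts
}

# Dispatch only when called with an action argument.
if [ $# -gt 0 ]; then
    case "$1" in
        start) fs_start ;;
        stop)  fs_stop ;;
        check) fs_check ;;
        *)     echo "usage: $0 {start|stop|check}" >&2; exit 1 ;;
    esac
fi
```

The important property is that start and stop are idempotent, so a repeated or half-completed failover does not leave the resource in a confused state.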

-> mrjoltcola :
I want HA. We are using Oracle Clusterware without RAC: RAC costs a lot, and we do not need it because we do not intend to scale much (the present configuration is sufficient).
I have seen failures related to network access, SAN ports, etc., where local HA provides automatic failover.
We do not need a cluster filesystem as such, since the filesystem does not need to be available from multiple nodes simultaneously - only to whichever cluster node is running the cluster service/resource at the time.
I agree that it's an idle CPU, but it provides automatic failover, and you can plan updates on one node at a time by failing services over to the other node.


Richard Olutola, Consultant:

Your plan sounds technically achievable. The only thing I would be concerned about is Oracle support of the plan. If you encounter any problems requiring their support you may not be covered.

The advantage of grid computing is removing single points of failure. Oracle borrowed the concept from the electrical power grid: regardless of which node supplies it, the service is always available to the client. If you use only one node in the cluster at any given time, your cluster should work. Oracle uses OCFS so that a shared LUN is available to multiple nodes at the same time; the nodes cannot take an exclusive lock on the whole LUN, only locks at the file level.



Thanks for all your comments so far. Before I can accept any comments/suggestions as the solution, I would like to know:
Does anyone see any other risks/issues with this arrangement, apart from the points highlighted above (Oracle support, and ensuring the filesystem is mounted on only one node)?

For example, do you feel there could be any other situation where each server tries to access or acquire a lock on the LUN, or anything else that could cause either corruption of data on the LUN or a panic on one of the nodes?

I'd suggest you trial this on a test system. From your description of the method, I personally can't see any risks as long as the filesystem is configured identically on both nodes.

Perhaps if one server has not finished unmounting the device when the second node tries to mount it, there may be problems. Since the two servers are isolated from each other and there is no coordinator, there is always the risk of human error.
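One cheap belt-and-braces guard against exactly that race: before mounting, inspect the ext3 superblock with tune2fs and refuse to mount while the journal still shows the needs_recovery feature flag, which is set while the filesystem is mounted (or after a crash) and cleared by a clean unmount. A sketch, with the device and mount-point names being assumed placeholders:

```shell
#!/bin/sh
# Pre-mount safety check sketch. Device/mount names are hypothetical.
DEVICE="/dev/mapper/app_lun1"
MOUNTPOINT="/u02/appdata"

# Reads `tune2fs -l` output on stdin; succeeds if the journal is dirty,
# i.e. the filesystem looks mounted (or crashed) elsewhere.
journal_dirty() {
    grep -q 'needs_recovery'
}

safe_mount() {
    if tune2fs -l "$DEVICE" 2>/dev/null | journal_dirty; then
        echo "refusing to mount: journal dirty, fs may still be live on the peer" >&2
        return 1
    fi
    mount -t ext3 "$DEVICE" "$MOUNTPOINT"
}
```

This is only a heuristic: there is still a small window between the check and the mount, so it complements, rather than replaces, the cluster's single-owner guarantee.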

