Solved

RED HAT cluster fails over too quickly

Posted on 2014-03-13
Last Modified: 2014-03-26
We have a Red Hat cluster that panics very easily once activity increases on the database server, which forces a failover to the second node.
What parameters should we look at, and what should we increase them to?
Question by:it-rex
11 Comments
 
LVL 76

Expert Comment

by:arnold
Which check is failing and triggering the cluster failover?
A database connection timeout? System load?
A backend storage timeout?

What is the average load on the system?
Peak usage?

It is impossible to suggest what to increase without knowing what triggers the failover.
 
LVL 11

Author Comment

by:it-rex
This is all we have:


Mar 13 00:01:21 rgmanager [fs] Checking fs "oracle_data_report_fs_res", Level 10
Mar 13 00:08:08 rgmanager I am node #1
Mar 13 00:08:09 rgmanager Resource Group Manager Starting
Mar 13 00:08:09 rgmanager DBus Notifications Initialized
Mar 13 00:08:09 rgmanager Loading Service Data
Mar 13 00:08:09 rgmanager Loading Resource Rules
Mar 13 00:08:09 rgmanager 24 rules loaded
Mar 13 00:08:09 rgmanager Building Resource Trees
Mar 13 00:08:10 rgmanager 22 resources defined
Mar 13 00:08:10 rgmanager Loading Failover Domains
Mar 13 00:08:10 rgmanager 1 domains defined
Mar 13 00:08:10 rgmanager Loading Event Triggers
Mar 13 00:08:10 rgmanager 1 events defined
Mar 13 00:08:10 rgmanager Initializing Services
Mar 13 00:08:10 rgmanager [oracledb] Validating configuration for risprs01
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_report_vg/oracle_data_report_lv with a real device
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_load_vg/oracle_data_load_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_rman1_vg/oracle_rman1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_scripts_vg/oracle_scripts_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo2_vg/oracle_redo2_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo1_vg/oracle_redo1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_arch_redo_vg/oracle_arch_redo_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_data_vg/oracle_data_lv with a real device
Mar 13 00:09:50 rgmanager [ip] 10.90.72.73 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.70 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.69 is not configured
Mar 13 00:09:50 rgmanager Services Initialized
Mar 13 00:09:51 rgmanager Event: Port Opened
Mar 13 00:09:51 rgmanager State change: Local UP
Mar 13 00:09:51 rgmanager State change: synlp2876-clust UP
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (1:1:1) Processed
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (0:2:1) Processed
Mar 13 00:09:56 rgmanager 2 events processed
 
LVL 11

Author Comment

by:it-rex
This happens exactly when the database backup starts.
 
LVL 76

Expert Comment

by:arnold
Is your backup using TSM or RMAN?

Do you have two network interfaces, or just one, over which the backup saturates the network so that the heartbeat is not detected?

It starts complaining about access to the partitions.

Does your backup take LVM resources offline?
Are your disk resources (the Oracle data storage) iSCSI-based or FC?
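A small filter can make the rgmanager restart and the resource complaints easier to spot in a longer log. This is only a sketch: the function name `rgmanager_events` is made up for illustration, the log path has to be supplied because its location differs between releases, and the three message patterns are copied from the log posted above.

```shell
# Print rgmanager start markers and resource complaints from a log file.
# Usage: rgmanager_events /path/to/logfile   (the path is an assumption you must supply)
rgmanager_events() {
    log="$1"
    # "Resource Group Manager Starting" marks a fresh rgmanager start;
    # the other two patterns are the [fs] and [ip] complaints seen in the posted log.
    grep -E 'Resource Group Manager Starting|Could not match|is not configured' "$log"
}
```

The timestamps just before each start marker are the place to look for whatever took the node down; in the posted log there is a gap between 00:01:21 and 00:08:08.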
 
LVL 11

Author Comment

by:it-rex
We have a VNX and it is FC.
 
LVL 11

Author Comment

by:it-rex
This also happens with an RMAN backup to disk.
 
LVL 76

Expert Comment

by:arnold
You only have one event.
Which fencing event are you using?
Is this when you start up the cluster services?
 
LVL 11

Author Comment

by:it-rex
I'm a DBA, so I'm not sure what you mean.
How can I find out which fencing event I'm using?
 
LVL 76

Accepted Solution

by:arnold
You have to look at the cluster configuration.
Which Red Hat version are you using, RHEL 5.x?

more /etc/cluster/cluster.conf


There is also a GUI, system-config-cluster, if you have a graphical interface.
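To answer the fencing question without reading the whole file, the fence device entries can be pulled out on their own. The following is a minimal sketch, not a definitive tool: the function name `list_fence_devices` is hypothetical, and it assumes the usual cluster.conf layout in which each fence device is a `<fencedevice ... />` element with an `agent` attribute, and that any password is stored in a `passwd` attribute.

```shell
# List the fence device definitions from a cluster.conf-style file.
# With no argument it reads the path mentioned above; pass another path to work on a copy.
list_fence_devices() {
    conf="${1:-/etc/cluster/cluster.conf}"
    # Keep only the <fencedevice ...> elements (the trailing space skips <fencedevices>),
    # then drop any passwd="..." attribute so the output can be shared.
    grep -o '<fencedevice [^>]*>' "$conf" | sed 's/ passwd="[^"]*"//'
}
```

Running `list_fence_devices` on a cluster node prints one line per device; the `agent` value on each line names the fencing method in use. Only the `passwd` attribute is stripped, so the output should still be reviewed before it is posted anywhere.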
 

Who set up the cluster? Can they be consulted?

Are all your databases being backed up at the same time, or do you have the backups spread over a period of time?

Your fencing process might be contingent on querying the database. Your backups might be locking it and preventing a response to the check, which triggers the failover.
 
LVL 13

Expert Comment

by:Sandy
Please paste /etc/cluster/cluster.conf, with passwords and any other confidential information removed, so we can take a look.

TY/SA
 
LVL 11

Author Closing Comment

by:it-rex
thanks