Solved

RED HAT cluster fails over too quickly

Posted on 2014-03-13
Last Modified: 2014-03-26
We have a Red Hat cluster that panics very easily once activity increases on the database server, which forces a failover to the second node.
Which parameters should we look at, and what should we increase them to?
Question by:it-rex
11 Comments
 
LVL 77

Expert Comment

by:arnold
ID: 39927754
Which check is failing and triggering the cluster failover?
A database connection timeout? System load?
A backend storage timeout?

What is the average load on the system?
Peak usage?

It is impossible to suggest what to increase without knowing what triggers the failover.
 
LVL 11

Author Comment

by:it-rex
ID: 39927781
This is all we have:


Mar 13 00:01:21 rgmanager [fs] Checking fs "oracle_data_report_fs_res", Level 10
Mar 13 00:08:08 rgmanager I am node #1
Mar 13 00:08:09 rgmanager Resource Group Manager Starting
Mar 13 00:08:09 rgmanager DBus Notifications Initialized
Mar 13 00:08:09 rgmanager Loading Service Data
Mar 13 00:08:09 rgmanager Loading Resource Rules
Mar 13 00:08:09 rgmanager 24 rules loaded
Mar 13 00:08:09 rgmanager Building Resource Trees
Mar 13 00:08:10 rgmanager 22 resources defined
Mar 13 00:08:10 rgmanager Loading Failover Domains
Mar 13 00:08:10 rgmanager 1 domains defined
Mar 13 00:08:10 rgmanager Loading Event Triggers
Mar 13 00:08:10 rgmanager 1 events defined
Mar 13 00:08:10 rgmanager Initializing Services
Mar 13 00:08:10 rgmanager [oracledb] Validating configuration for risprs01
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_report_vg/oracle_data_report_lv with a real device
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_load_vg/oracle_data_load_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_rman1_vg/oracle_rman1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_scripts_vg/oracle_scripts_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo2_vg/oracle_redo2_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo1_vg/oracle_redo1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_arch_redo_vg/oracle_arch_redo_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_data_vg/oracle_data_lv with a real device
Mar 13 00:09:50 rgmanager [ip] 10.90.72.73 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.70 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.69 is not configured
Mar 13 00:09:50 rgmanager Services Initialized
Mar 13 00:09:51 rgmanager Event: Port Opened
Mar 13 00:09:51 rgmanager State change: Local UP
Mar 13 00:09:51 rgmanager State change: synlp2876-clust UP
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (1:1:1) Processed
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (0:2:1) Processed
Mar 13 00:09:56 rgmanager 2 events processed
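One detail worth noting in the log above: there is a pause of several minutes between the last filesystem check at 00:01:21 and rgmanager announcing "I am node #1" at 00:08:08, which would be consistent with the node having been restarted in between, though the log alone does not prove it. A minimal sketch for spotting such pauses in a longer log follows. The function name `log_gaps` and the log path in the usage comment are assumptions for illustration, not something from this thread.

```shell
# Print every log line that follows a pause of at least MIN seconds.
# Assumes syslog-style lines ("Mar 13 00:01:21 ...") read from stdin,
# and only compares lines that share the same month and day.
log_gaps() {
  awk -v min="${1:-300}" '
    {
      split($3, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (NR > 1 && ($1 " " $2) == day && now - prev >= min)
        printf "%d s gap before: %s\n", now - prev, $0
      day = $1 " " $2
      prev = now
    }'
}

# Hypothetical usage; the log path is an assumption, adjust to your system:
# grep rgmanager /var/log/messages | log_gaps 300
```

Pauses that span midnight are not detected by this sketch, since it only compares timestamps within the same day.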
 
LVL 11

Author Comment

by:it-rex
ID: 39927793
This happens exactly when the database backup starts.

 
LVL 77

Expert Comment

by:arnold
ID: 39927810
Is your backup using TSM or RMAN?

Do you have two network interfaces, or just one, over which the backup saturates the network so that the heartbeat is not detected?

It starts complaining about access to the partitions.

Does your backup take LVM resources offline?
Are your disk resources (the Oracle data storage) iSCSI-based or FC?
 
LVL 11

Author Comment

by:it-rex
ID: 39928128
We have VNX and it is FC...
 
LVL 11

Author Comment

by:it-rex
ID: 39928131
Also this happens with RMAN backup to disk....
 
LVL 77

Expert Comment

by:arnold
ID: 39928271
You only have one event.
Which fencing event are you using?
Is this when you startup the cluster services?
 
LVL 11

Author Comment

by:it-rex
ID: 39928290
I'm a DBA, so I'm not sure what you mean.
How can I find out which fencing event I'm using?
 
LVL 77

Accepted Solution

by:
arnold earned 500 total points
ID: 39928407
You have to look at the cluster configuration.
Which RedHat version are you using, RHEL 5.x?

more /etc/cluster/cluster.conf


There is also a GUI, system-config-cluster, if you have a graphical interface.
 

Who set up the cluster? Can they be consulted?

Are all your databases being backed up at the same time, or do you have the backups spread over a period of time?

Your fencing process might be contingent on talking to or querying the database. Your backups might be causing locking that prevents a response to the check, which in turn triggers the failover.
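To narrow down the cluster.conf review suggested above, a rough sketch like the following pulls out only the lines that usually carry fencing, timeout, and service settings. It assumes the file uses the common `totem`, `quorumd`, `fencedevice`, `method`, `device`, and `service` elements, one per line; the function name `show_failover_settings` is made up for illustration, and anything the filter misses still has to be read in the full file.

```shell
# List the cluster.conf lines that define fencing, heartbeat/quorum
# timeouts, and services. $1 is the path to the file and defaults to
# the standard location.
show_failover_settings() {
  grep -E '<(totem|quorumd|fencedevice|device|method|service)[ >]' \
    "${1:-/etc/cluster/cluster.conf}"
}

# Hypothetical usage:
# show_failover_settings
```

The `fencedevice` lines show which fence agent is configured, and a `totem` or `quorumd` line, if present, shows timeouts that may be worth discussing with whoever built the cluster.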
 
LVL 13

Expert Comment

by:Sandy
ID: 39933666
Please paste /etc/cluster/cluster.conf here, after removing passwords and any other confidential information, so we can take a look.
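A small sketch for blanking passwords before pasting, assuming they appear as `passwd="..."` attributes, which is the attribute name commonly used by fence agents. The function name `redact_cluster_conf` is hypothetical, and the output should still be checked by eye for other secrets, hostnames, or IP addresses you do not want to share.

```shell
# Print cluster.conf with the value of every passwd="..." attribute
# replaced by REDACTED. The original file is not modified.
redact_cluster_conf() {
  sed 's/passwd="[^"]*"/passwd="REDACTED"/g' \
    "${1:-/etc/cluster/cluster.conf}"
}

# Hypothetical usage:
# redact_cluster_conf > /tmp/cluster.conf.redacted
```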

TY/SA
 
LVL 11

Author Closing Comment

by:it-rex
ID: 39957602
thanks
