  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1042

RED HAT cluster fails over too quickly

We have a Red Hat cluster that panics very easily once activity increases on the database server, which forces a failover to the second node.
What parameters should we look at, and what should we increase them to?
Asked by: it-rex
1 Solution
 
arnoldCommented:
Which check is failing and triggering the cluster failover?
A database connection timeout? System load?
A backend storage timeout?

What is the average load on the system?
Peak usage?

It is impossible to suggest what to increase without knowing what triggers the failover.
 
it-rexAuthor Commented:
This is all we have:


Mar 13 00:01:21 rgmanager [fs] Checking fs "oracle_data_report_fs_res", Level 10
Mar 13 00:08:08 rgmanager I am node #1
Mar 13 00:08:09 rgmanager Resource Group Manager Starting
Mar 13 00:08:09 rgmanager DBus Notifications Initialized
Mar 13 00:08:09 rgmanager Loading Service Data
Mar 13 00:08:09 rgmanager Loading Resource Rules
Mar 13 00:08:09 rgmanager 24 rules loaded
Mar 13 00:08:09 rgmanager Building Resource Trees
Mar 13 00:08:10 rgmanager 22 resources defined
Mar 13 00:08:10 rgmanager Loading Failover Domains
Mar 13 00:08:10 rgmanager 1 domains defined
Mar 13 00:08:10 rgmanager Loading Event Triggers
Mar 13 00:08:10 rgmanager 1 events defined
Mar 13 00:08:10 rgmanager Initializing Services
Mar 13 00:08:10 rgmanager [oracledb] Validating configuration for risprs01
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_report_vg/oracle_data_report_lv with a real device
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_load_vg/oracle_data_load_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_rman1_vg/oracle_rman1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_scripts_vg/oracle_scripts_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo2_vg/oracle_redo2_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo1_vg/oracle_redo1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_arch_redo_vg/oracle_arch_redo_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_data_vg/oracle_data_lv with a real device
Mar 13 00:09:50 rgmanager [ip] 10.90.72.73 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.70 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.69 is not configured
Mar 13 00:09:50 rgmanager Services Initialized
Mar 13 00:09:51 rgmanager Event: Port Opened
Mar 13 00:09:51 rgmanager State change: Local UP
Mar 13 00:09:51 rgmanager State change: synlp2876-clust UP
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (1:1:1) Processed
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (0:2:1) Processed
Mar 13 00:09:56 rgmanager 2 events processed
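
A minimal sketch for reading a log like the one above: it looks for long silent periods between consecutive rgmanager lines and for the "Resource Group Manager Starting" message. The sample lines are copied from this log; the function names, the two-minute threshold, and the placeholder year are assumptions made for illustration. A silent period followed by a fresh start message is consistent with the node having been restarted, but it does not say why.

```python
from datetime import datetime, timedelta

# Sample lines copied from the log above. Syslog timestamps carry no year,
# so 2000 is used as an arbitrary placeholder (an assumption).
SAMPLE = """\
Mar 13 00:01:21 rgmanager [fs] Checking fs "oracle_data_report_fs_res", Level 10
Mar 13 00:08:08 rgmanager I am node #1
Mar 13 00:08:09 rgmanager Resource Group Manager Starting
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_report_vg/oracle_data_report_lv with a real device
"""


def parse_line(line, year=2000):
    """Split a syslog-style line into (timestamp, message)."""
    stamp, message = line[:15], line[16:]
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"), message


def find_gaps(text, threshold=timedelta(minutes=2)):
    """Return (gap, message_before, message_after) for each silent period."""
    entries = [parse_line(line) for line in text.splitlines() if line.strip()]
    gaps = []
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t1 - t0, m0, m1))
    return gaps


def find_restarts(text):
    """Return the lines where rgmanager reports that it is starting."""
    return [line for line in text.splitlines()
            if "Resource Group Manager Starting" in line]


if __name__ == "__main__":
    for gap, before, after in find_gaps(SAMPLE):
        print(f"silent for {gap}: '{before}' -> '{after}'")
    for line in find_restarts(SAMPLE):
        print("restart:", line)
```

Comparing the start of each silent period with the backup schedule is one way to check whether the two really coincide.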
 
it-rexAuthor Commented:
This happens exactly when the database backup starts.
 
arnoldCommented:
Is your backup using TSM or RMAN?

Do you have two network interfaces, or just one over which the backup saturates the network so that the heartbeat is not detected?

It starts complaining about access to the partitions.

Does your backup take LVM resources offline?
Are your disk resources (the Oracle data storage) iSCSI-based or FC?
 
it-rexAuthor Commented:
We have a VNX, and it is FC.
 
it-rexAuthor Commented:
Also, this happens with an RMAN backup to disk.
 
arnoldCommented:
You only have one event.
Which fencing event are you using?
Is this when you start up the cluster services?
 
it-rexAuthor Commented:
I'm a DBA, so I'm not sure what you mean.
How can I find out which fencing event I'm using?
 
arnoldCommented:
You have to look at the cluster configuration.
Which RedHat version are you using, RHEL 5.x?

more /etc/cluster/cluster.conf


There is also a GUI tool, system-config-cluster, if you have a graphical interface.
 

Who set up the cluster? Can they be consulted?

Are all your databases being backed up at the same time, or are the backups spread over a period of time?

Your fencing process might be contingent on talking to or querying the database. Your backups might be locking it and preventing a response to the check, which triggers the failover.
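
For someone who is not a cluster administrator, a small script can pull the fencing setup out of cluster.conf. The sketch below assumes the usual layout, with clusternode entries that reference fence devices by name and a fencedevices section that names the agent for each one. The sample XML is hypothetical: its node names, device names, addresses, and fence agent are invented for illustration and are not taken from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment for illustration only. The node names,
# device names, addresses, and agent are invented, not from this cluster.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ipmi-node1" agent="fence_ipmilan" ipaddr="192.0.2.11"/>
    <fencedevice name="ipmi-node2" agent="fence_ipmilan" ipaddr="192.0.2.12"/>
  </fencedevices>
</cluster>
"""


def fencing_summary(xml_text):
    """Map each node name to a list of (fence device name, agent) pairs."""
    root = ET.fromstring(xml_text)
    # Look up table: fence device name -> agent that drives it.
    agents = {d.get("name"): d.get("agent") for d in root.iter("fencedevice")}
    summary = {}
    for node in root.iter("clusternode"):
        devices = [dev.get("name") for dev in node.iter("device")]
        summary[node.get("name")] = [(name, agents.get(name)) for name in devices]
    return summary


if __name__ == "__main__":
    for node, devices in fencing_summary(SAMPLE_CONF).items():
        print(node, devices)
```

To run it against the real file, read /etc/cluster/cluster.conf into a string and pass that to fencing_summary instead of SAMPLE_CONF.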
 
SandyCommented:
Please paste /etc/cluster/cluster.conf, with passwords and any other confidential information removed, so we can take a look.

TY/SA
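
One possible way to strip credentials before pasting, sketched under assumptions: it masks every XML attribute whose name contains "pass", "login", or "secret". That list of hints, the function name, and the sample fragment are all hypothetical; the real file may hold sensitive values under other attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; the values are invented.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <fencedevices>
    <fencedevice name="ipmi-node1" agent="fence_ipmilan" ipaddr="192.0.2.11" login="admin" passwd="secret"/>
  </fencedevices>
</cluster>
"""

# Substrings that mark an attribute name as sensitive (an assumption).
SENSITIVE_HINTS = ("pass", "login", "secret")


def redact(xml_text, hints=SENSITIVE_HINTS, mask="REDACTED"):
    """Return the XML with every sensitive-looking attribute value masked."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Copy the attribute names first, since values are changed in the loop.
        for attr in list(element.attrib):
            if any(hint in attr.lower() for hint in hints):
                element.set(attr, mask)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(redact(SAMPLE_CONF))
```

The parser drops XML comments, and IP addresses and host names are left as they are, so the output still needs a manual read-through before it is posted.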
 
it-rexAuthor Commented:
thanks