Solved

Red Hat cluster fails over too quickly

Posted on 2014-03-13
Medium Priority
Last Modified: 2014-03-26
We have a Red Hat cluster that panics very easily once activity increases on the database server, forcing a failover to the second node.
What parameters should we look at, and what should we increase them to?
Question by:it-rex
11 Comments
 
LVL 80

Expert Comment

by:arnold
ID: 39927754
Which check is "failing" and triggering the cluster failover?
A database connection timeout? System load?
A backend storage timeout?

What is the average load on the system?
Peak usage?

It is impossible to suggest what to increase without knowing what triggers the failover.
 
LVL 11

Author Comment

by:it-rex
ID: 39927781
This is all we have:


Mar 13 00:01:21 rgmanager [fs] Checking fs "oracle_data_report_fs_res", Level 10
Mar 13 00:08:08 rgmanager I am node #1
Mar 13 00:08:09 rgmanager Resource Group Manager Starting
Mar 13 00:08:09 rgmanager DBus Notifications Initialized
Mar 13 00:08:09 rgmanager Loading Service Data
Mar 13 00:08:09 rgmanager Loading Resource Rules
Mar 13 00:08:09 rgmanager 24 rules loaded
Mar 13 00:08:09 rgmanager Building Resource Trees
Mar 13 00:08:10 rgmanager 22 resources defined
Mar 13 00:08:10 rgmanager Loading Failover Domains
Mar 13 00:08:10 rgmanager 1 domains defined
Mar 13 00:08:10 rgmanager Loading Event Triggers
Mar 13 00:08:10 rgmanager 1 events defined
Mar 13 00:08:10 rgmanager Initializing Services
Mar 13 00:08:10 rgmanager [oracledb] Validating configuration for risprs01
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_report_vg/oracle_data_report_lv with a real device
Mar 13 00:09:44 rgmanager [fs] stop: Could not match /dev/oracle_data_load_vg/oracle_data_load_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_rman1_vg/oracle_rman1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_scripts_vg/oracle_scripts_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo2_vg/oracle_redo2_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_redo1_vg/oracle_redo1_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_arch_redo_vg/oracle_arch_redo_lv with a real device
Mar 13 00:09:46 rgmanager [fs] stop: Could not match /dev/oracle_data_vg/oracle_data_lv with a real device
Mar 13 00:09:50 rgmanager [ip] 10.90.72.73 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.70 is not configured
Mar 13 00:09:50 rgmanager [ip] 10.90.72.69 is not configured
Mar 13 00:09:50 rgmanager Services Initialized
Mar 13 00:09:51 rgmanager Event: Port Opened
Mar 13 00:09:51 rgmanager State change: Local UP
Mar 13 00:09:51 rgmanager State change: synlp2876-clust UP
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (1:1:1) Processed
Mar 13 00:09:51 rgmanager Evaluating RG service:oracle_service, state started, owner synlp2876-clust
Mar 13 00:09:51 rgmanager Event (0:2:1) Processed
Mar 13 00:09:56 rgmanager 2 events processed
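One quick way to start answering "which check is failing" is to pull the resource-agent messages out of the log and group them by agent. The Python sketch below is a hypothetical helper, not part of rgmanager: it assumes the line format posted above (and also tolerates a raw /var/log/messages prefix such as a hostname plus `rgmanager[PID]:`), and the keyword list in `SUSPICIOUS` is an illustrative guess. It only surfaces messages; whether a given one actually explains the failover still has to be judged against the cluster configuration.

```python
import re
from collections import defaultdict

# Matches rgmanager lines in the format posted above, e.g.
#   Mar 13 00:09:44 rgmanager [fs] stop: Could not match ... with a real device
# and also tolerates a raw syslog prefix such as "node1 rgmanager[4321]:".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?:\S+\s+)?"                    # optional hostname
    r"rgmanager(?:\[\d+\])?:?\s+"     # program name, optional PID and colon
    r"(?:\[(?P<agent>[^\]]+)\]\s+)?"  # optional resource-agent tag like [fs]
    r"(?P<msg>.*)$"
)

# Illustrative keywords only (an assumption); extend to match your own logs.
SUSPICIOUS = ("could not match", "not configured", "failed", "timed out", "error")


def summarize(lines):
    """Group suspicious rgmanager messages by resource-agent tag."""
    findings = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an rgmanager line
        msg = m.group("msg")
        if any(word in msg.lower() for word in SUSPICIOUS):
            agent = m.group("agent") or "rgmanager"  # untagged = core daemon
            findings[agent].append((m.group("ts"), msg))
    return dict(findings)


def report(findings):
    """Print one block per agent, with timestamps."""
    for agent, hits in findings.items():
        print(f"[{agent}] {len(hits)} suspicious message(s)")
        for ts, msg in hits:
            print(f"  {ts}  {msg}")
```

Run against the excerpt above, it would put the eight "Could not match ... with a real device" lines under `[fs]` and the three "is not configured" lines under `[ip]`. Against a live system, something like `report(summarize(open("/var/log/messages")))` around the time the backup starts is the intended use.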
 
LVL 11

Author Comment

by:it-rex
ID: 39927793
This happens exactly when the database backup starts.

 
LVL 80

Expert Comment

by:arnold
ID: 39927810
Is your backup using TSM or RMAN?

Do you have two network interfaces, or just one, over which the backup saturates the network so that the heartbeat is not detected?

It starts complaining about access to the partitions.

Does your backup take LVM resources offline?
Are your disk resources (the Oracle data storage) iSCSI-based or FC?
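To test the single-interface theory, one option is to compare per-interface throughput while the system is idle and again when the backup starts. The Python sketch below assumes a Linux host and the standard /proc/net/dev layout (eight receive counters followed by eight transmit counters, with bytes first in each group). Which interface carries the cluster heartbeat depends on how the cluster was set up, so no interface names are assumed here; the function names are hypothetical.

```python
import time


def read_counters(text):
    """Parse /proc/net/dev text into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, seconds):
    """Bytes per second (rx, tx) per interface between two snapshots."""
    return {
        iface: ((after[iface][0] - rx) / seconds, (after[iface][1] - tx) / seconds)
        for iface, (rx, tx) in before.items()
        if iface in after
    }


def sample(seconds=5.0, path="/proc/net/dev"):
    """Take two snapshots `seconds` apart on a live Linux host."""
    with open(path) as f:
        first = read_counters(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = read_counters(f.read())
    return rates(first, second, seconds)
```

If the heartbeat interface's rate jumps close to its link speed as soon as the backup begins, that would support moving the heartbeat (or the backup traffic) to a separate interface.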
 
LVL 11

Author Comment

by:it-rex
ID: 39928128
We have a VNX, and it is FC.
 
LVL 11

Author Comment

by:it-rex
ID: 39928131
This also happens with RMAN backups to disk.
 
LVL 80

Expert Comment

by:arnold
ID: 39928271
You only have one event defined.
Which fencing event are you using?
Is this when you start up the cluster services?
 
LVL 11

Author Comment

by:it-rex
ID: 39928290
I'm a DBA, so I'm not sure what you mean.
How can I find which fencing event I'm using?
 
LVL 80

Accepted Solution

by:arnold
ID: 39928407
You have to look at the cluster configuration.
Which Red Hat version are you using? RHEL 5.x?

more /etc/cluster/cluster.conf


If you have a graphical interface, there is also a GUI tool, system-config-cluster.

Who set up the cluster? Can they be consulted?

Are all your databases being backed up at the same time, or are the backups spread over a period of time?

Your fencing process might be contingent on talking to or querying the database. Your backups might be holding locks that prevent a response to that check, which then triggers the failover.
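Reading cluster.conf by eye is the main step here; a sketch like the one below could pull out the parts this thread is asking about: which fence devices exist, which one each node uses, which services rgmanager manages, and whether the totem token (heartbeat) timeout has been overridden. It assumes the RHEL 5/6 cman-style layout (`<clusternodes>`, `<fencedevices>`, `<rm>`, and an optional `<totem token="..."/>` whose value is in milliseconds); `describe_cluster_conf` is a hypothetical name, and the function only reads the file, it does not validate it.

```python
import xml.etree.ElementTree as ET


def describe_cluster_conf(xml_text):
    """Summarize fencing, services and heartbeat settings from cluster.conf."""
    root = ET.fromstring(xml_text)
    summary = {
        "name": root.get("name"),
        # <fencedevices><fencedevice agent="fence_..." name="..."/></fencedevices>
        "fence_devices": [
            (fd.get("name"), fd.get("agent"))
            for fd in root.findall("./fencedevices/fencedevice")
        ],
        # Fence devices referenced by each node's <fence><method> entries.
        "node_fencing": {
            node.get("name"): [dev.get("name") for dev in node.findall("./fence/method/device")]
            for node in root.findall("./clusternodes/clusternode")
        },
        # Services defined under the resource manager (<rm>) section.
        "services": [svc.get("name") for svc in root.findall("./rm/service")],
        # Stays None when no <totem token="..."/> override is present.
        "totem_token_ms": None,
    }
    totem = root.find("totem")
    if totem is not None and totem.get("token") is not None:
        summary["totem_token_ms"] = int(totem.get("token"))
    return summary
```

Usage would be along the lines of `describe_cluster_conf(open("/etc/cluster/cluster.conf").read())`; the output is small enough to post in the thread without any passwords, since attributes such as fence-device credentials are not included.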
 
LVL 13

Expert Comment

by:Sandy
ID: 39933666
Please paste /etc/cluster/cluster.conf, with passwords and any other confidential information removed, so we can see it.

TY/SA
 
LVL 11

Author Closing Comment

by:it-rex
ID: 39957602
Thanks.
Question has a verified solution.
