Solved

MySQL Percona Cluster: WSREP Error "Failed to initialize backend"

Posted on 2012-08-16
Last Modified: 2012-08-26
Hello everyone -
 
I am dealing with another DBA's installation of Percona Cluster this afternoon - and he's not available.
 
I have searched the forums here and I have failed to find a post regarding the problem I am seeing.
 
The sysadmin restarted NODE01 and MySQL (Percona Server) is not starting due to:
 
120816 13:11:33 WSREP: gcs/src/gcs_backend.c:gcs_backend_init():87: Invalid backend URI: 0
 120816 13:11:33 WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to initialize backend using '0': -22 (Invalid argument)
 120816 13:11:33 WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'c_l' at '0': -22 (Invalid argument)
 120816 13:11:33 WSREP: gcs connect failed: Invalid argument
 120816 13:11:33 WSREP: wsrep::connect() failed: 6
 120816 13:11:33 Aborting
 

That's from the NODE01 error log.
 
Any idea at all what's going on here?
 
The system has been previously up and running in a four node arrangement. The other three nodes have MySQL up and running.
 
Thanks in advance for ANY advice, thoughts or suggestions you may be able to provide.
 
/David C.
Question by:learningtechnologies
7 Comments
 
LVL 2

Author Comment

by:learningtechnologies
ID: 38302348
Additional information that may help:

 uname -a
Linux hsdb01 3.2.0-24-virtual #37-Ubuntu SMP Wed Apr 25 10:17:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

free -m
             total       used       free     shared    buffers     cached
Mem:         16050        605      15444          0         30        155
-/+ buffers/cache:        419      15630
Swap:         3317          0       3317



Thanks!

/David C.
 
LVL 24

Accepted Solution

by:
johanntagle earned 1500 total points
ID: 38302829
Not too many people know about the Percona Cluster here.  I haven't used it yet myself though I plan to try it out on test machines.   May I suggest you also post at the Percona Users Google Group?  See https://groups.google.com/forum/?fromgroups#!forum/percona-discussion
 
LVL 2

Author Comment

by:learningtechnologies
ID: 38302932
Thank you for that excellent suggestion!

I have posted at the Percona community forum.

I have the feeling I'm going to be a Percona Cluster expert by the time this is all over.

Thanks -

/David C.

 
LVL 24

Assisted Solution

by:johanntagle
johanntagle earned 1500 total points
ID: 38302947
Teach me!  Teach me! hehehe
 
LVL 2

Author Comment

by:learningtechnologies
ID: 38304031
Well ... I ~have~ been wondering about when I could write it all up because the next three days are full for me ... but I did figure it out after about six straight hours of working on it.
So ... more info will be on its way eventually.

I appreciate everyone's willingness to help.

/David C.
 
LVL 2

Assisted Solution

by:learningtechnologies
learningtechnologies earned 0 total points
ID: 38316058
What I did to get the node rejoined to the cluster was to:

1) Copy the running process values shown by the ps aux | grep ^mysql command
2) Change the cluster address in those values from the network where none of the nodes were listening to the network where they were listening, then start the node with the modified values

Original process values shown by ps aux | grep ^mysql:

/usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --datadir=/data/cluster-data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --user=mysql --log-error=/data/cluster-data/pcdb01.err --pid-file=/data/cluster-data/pcdb01.pid --wsrep_cluster_address=gcomm://10.4.8.142:4567

Values passed to system as start up command that resulted in the node rejoining the cluster:

/usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --datadir=/data/cluster-data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --user=mysql --log-error=/data/cluster-data/pcdb01.err --pid-file=/data/cluster-data/pcdb01.pid --wsrep_cluster_address=gcomm://10.10.13.142
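The edit between the two command lines above can be sketched as a small shell helper. This is only an illustration: the function name `rewrite_cluster_address` is made up for this example, and the command line is abbreviated to two options to keep it short. It prints the rewritten command for review rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Percona or Galera): replace the value of
# --wsrep_cluster_address in a saved mysqld_safe command line.
#   $1 = original command line, $2 = new cluster address
rewrite_cluster_address() {
    printf '%s\n' "$1" |
        sed "s|--wsrep_cluster_address=[^ ]*|--wsrep_cluster_address=$2|"
}

# Abbreviated version of the original command line, for illustration only.
old_cmd='/usr/local/mysql/bin/mysqld_safe --user=mysql --wsrep_cluster_address=gcomm://10.4.8.142:4567'
new_cmd=$(rewrite_cluster_address "$old_cmd" 'gcomm://10.10.13.142')

# Print the command for review instead of executing it.
printf '%s\n' "$new_cmd"
```

All other options are left untouched, so the only difference between the old and new command is the address the node uses to find the cluster.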

Discussion:
The sysadmin's maintenance, which is what required the reboot, was to add an additional network to eth0.

Because none of the other nodes in the cluster had that network configured they were not able to communicate with that node.

That node was using the last network added to eth0 as its default network route for all traffic.

So - Percona Cluster (a third-party MySQL product) relies heavily on Galera Cluster (a fourth party), which in turn relies on wsrep (a fifth party!) for the communications layer between the nodes. (From what I am told, this is actually fairly common practice.)

Once I had all that teased apart and understood the error message more accurately, I found the following page to be VERY helpful:

http://www.codership.com/wiki/doku.php?id=info

especially this section on joining a new node to the cluster - because rejoining an existing node works the same way:

http://www.codership.com/wiki/doku.php?id=info#adding_another_node_to_a_cluster

I tried using the network address in the running process list above (10.4.8.142:4567) - but that failed with a 'cluster not found' error.

At this point I knew that I needed to find which address one of the other nodes was running on.

A quick ps aux | grep ^mysql on node 2 showed that it was participating on the 10.10.13.142 address, not the 10.4.8.142 address.
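For anyone repeating that check, here is a minimal sketch of pulling just the cluster address out of the process listing. The function name `extract_cluster_address` and the sample line are made up for illustration, and it assumes a grep that supports `-o` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helper: read a process listing on stdin and print each
# distinct --wsrep_cluster_address value found in it.
extract_cluster_address() {
    grep -o -- '--wsrep_cluster_address=[^ ]*' | cut -d= -f2- | sort -u
}

# Made-up sample line standing in for real ps output on a healthy node.
sample='mysql 4242 /usr/local/mysql/bin/mysqld_safe --user=mysql --wsrep_cluster_address=gcomm://10.10.13.142'
printf '%s\n' "$sample" | extract_cluster_address

# On a live node the same function could be fed from ps:
#   ps aux | grep ^mysql | extract_cluster_address
```

Running it on a node that is already in the cluster shows which network the cluster is actually using, which is the value the failed node needs.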

Once I passed the correct network address value to node 1 for start up, it came up right away and began using rsync to catch itself up with the other nodes on the cluster.
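One way to confirm that the node has finished catching up is to look at the wsrep status variables. The sketch below is an assumption on my part rather than something from the original troubleshooting session: it relies on the `wsrep_local_state_comment` status variable reporting `Synced`, and the helper name `is_synced` is made up. It only parses tab-separated output, so it can be tried without a running server.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the tab-separated status output on
# stdin reports wsrep_local_state_comment as "Synced".
is_synced() {
    awk -F '\t' '
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { if (state == "Synced") exit 0; exit 1 }
    '
}

# Simulated output for illustration; no database connection is made here.
if printf 'wsrep_local_state_comment\tSynced\n' | is_synced; then
    echo "node is synced"
fi

# Against a live node (assumes a mysql client with working credentials):
#   mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | is_synced
```

While the state transfer is still running the variable should report some other state, so the check fails until the node has caught up.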

I did not have to specify the port 4567 because that is the default port for Percona Cluster.
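To make the default-port behaviour concrete, here is a small sketch that fills in 4567 only when an address has no port of its own. The helper name `with_default_port` is made up, and it assumes plain IPv4 addresses or hostnames (an IPv6 literal would need different handling).

```shell
#!/bin/sh
# Default port taken from the post above.
GALERA_DEFAULT_PORT=4567

# Hypothetical helper: append the default port unless one is already given.
with_default_port() {
    case "$1" in
        *:*) printf '%s\n' "$1" ;;
        *)   printf '%s:%s\n' "$1" "$GALERA_DEFAULT_PORT" ;;
    esac
}

with_default_port 10.10.13.142      # no port given, default is appended
with_default_port 10.4.8.142:4567   # explicit port is left as-is
```

Having the full host:port pair also makes it easy to test reachability from the failed node before starting MySQL, for example with whatever port-checking tool is installed on the system.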

I think that's about it.

Any questions?

/David C.
 
LVL 2

Author Closing Comment

by:learningtechnologies
ID: 38333777
I figured it out on my own. I thought that the suggestion to visit Percona's google group was excellent.  I had not thought of that.

Question has a verified solution.
