marrowyung asked:
My Percona XtraDB Cluster suddenly went down; how do I fix it?

Hi,

My PXC is down, and this is what I got from the error log:


2020-08-26T21:06:22.229917+08:00 0 [Note] [MY-000000] [Galera] Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 2/2 (primary/total),
        act_id     = 6004,
        last_appl. = 5999,
        protocols  = 2/10/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = f55a0575-de37-11ea-be24-0b54a6765364
2020-08-26T21:06:22.230038+08:00 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2020-08-26T21:06:22.230319+08:00 1 [Note] [MY-000000] [Galera] ####### processing CC 6005, local, ordered
2020-08-26T21:06:22.230388+08:00 1 [Note] [MY-000000] [Galera] Drain monitors from 6004 upto 6004
2020-08-26T21:06:22.230429+08:00 1 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5, 3)
2020-08-26T21:06:22.230471+08:00 1 [Note] [MY-000000] [Galera] ####### My UUID: 995a4f35-e2a4-11ea-bdc3-aad79852f241
2020-08-26T21:06:22.230504+08:00 1 [Note] [MY-000000] [Galera] ####### ST not required
2020-08-26T21:06:22.230533+08:00 1 [Note] [MY-000000] [Galera] Skipping cert index reset
2020-08-26T21:06:22.230566+08:00 1 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 6004 -> 6005
2020-08-26T21:06:22.230630+08:00 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-08-26T21:06:22.230731+08:00 1 [Note] [MY-000000] [Galera] ####### Setting monitor position to 6005
2020-08-26T21:06:22.261514+08:00 1 [Note] [MY-000000] [Galera] Recording CC from group: 6005
2020-08-26T21:06:22.261595+08:00 1 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 6005
2020-08-26T21:06:22.261620+08:00 1 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 5972
2020-08-26T21:06:22.261705+08:00 1 [Note] [MY-000000] [Galera] ================================================
View:
  id: f55a0575-de37-11ea-be24-0b54a6765364:6005
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(2):
        0: 995a4f35-e2a4-11ea-bdc3-aad79852f241, iuatifmkdb22
        1: f2266321-e2a4-11ea-b702-d66d0cee8089, iuatifmkdb23
=================================================
2020-08-26T21:06:22.261763+08:00 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-26T21:06:24.860399+08:00 0 [Note] [MY-000000] [Galera] cleaning up 4855f0e4 (tcp://<IP address>:4567)
2020-08-26T21:06:30.613053+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection established to 4855f0e4 tcp://<IP address>:4567
2020-08-26T21:06:30.613742+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2020-08-26T21:06:30.619103+08:00 0 [Note] [MY-000000] [Galera] declaring 4855f0e4 at tcp://<IP address>:4567 stable
2020-08-26T21:06:30.619157+08:00 0 [Note] [MY-000000] [Galera] declaring f2266321 at tcp://<IP address>:4567 stable
2020-08-26T21:06:30.619692+08:00 0 [Note] [MY-000000] [Galera] Node 995a4f35 state primary
2020-08-26T21:06:30.621315+08:00 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,4855f0e4,45)
memb {
        4855f0e4,0
        995a4f35,0
        f2266321,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2020-08-26T21:06:30.621364+08:00 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2020-08-26T21:06:30.639510+08:00 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2020-08-26T21:06:30.639577+08:00 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2020-08-26T21:06:30.645994+08:00 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: f5693549-e79c-11ea-a174-56dfef54d120
2020-08-26T21:06:30.646473+08:00 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f5693549-e79c-11ea-a174-56dfef54d120 from 0 (iuatifmkdb25)
2020-08-26T21:06:30.646521+08:00 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f5693549-e79c-11ea-a174-56dfef54d120 from 1 (iuatifmkdb22)
2020-08-26T21:06:30.646569+08:00 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f5693549-e79c-11ea-a174-56dfef54d120 from 2 (iuatifmkdb23)
2020-08-26T21:06:30.646597+08:00 0 [Note] [MY-000000] [Galera] Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 5,
        members    = 2/3 (primary/total),
        act_id     = 6005,
        last_appl. = 5999,
        protocols  = 2/10/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = f55a0575-de37-11ea-be24-0b54a6765364
2020-08-26T21:06:30.646731+08:00 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]
2020-08-26T21:06:30.647008+08:00 1 [Note] [MY-000000] [Galera] ####### processing CC 6006, local, ordered
2020-08-26T21:06:30.647066+08:00 1 [Note] [MY-000000] [Galera] Drain monitors from 6005 upto 6005
2020-08-26T21:06:30.647096+08:00 1 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5, 3)
2020-08-26T21:06:30.647127+08:00 1 [Note] [MY-000000] [Galera] ####### My UUID: 995a4f35-e2a4-11ea-bdc3-aad79852f241
2020-08-26T21:06:30.647150+08:00 1 [Note] [MY-000000] [Galera] ####### ST not required
2020-08-26T21:06:30.647170+08:00 1 [Note] [MY-000000] [Galera] Skipping cert index reset
2020-08-26T21:06:30.647193+08:00 1 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 6005 -> 6006
2020-08-26T21:06:30.647248+08:00 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2020-08-26T21:06:30.647316+08:00 1 [Note] [MY-000000] [Galera] ####### Setting monitor position to 6006
2020-08-26T21:06:30.649593+08:00 0 [Note] [MY-000000] [Galera] Member 0.0 (iuatifmkdb25) requested state transfer from '*any*'. Selected 1.0 (iuatifmkdb22)(SYNCED) as donor.
2020-08-26T21:06:30.649639+08:00 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> DONOR/DESYNCED (TO: 6006)
2020-08-26T21:06:30.671764+08:00 1 [Note] [MY-000000] [Galera] Recording CC from group: 6006
2020-08-26T21:06:30.671931+08:00 1 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 6006
2020-08-26T21:06:30.671973+08:00 1 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 5972
2020-08-26T21:06:30.672056+08:00 1 [Note] [MY-000000] [Galera] ================================================
View:
  id: f55a0575-de37-11ea-be24-0b54a6765364:6006
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 1
  members(3):
        0: 4855f0e4-e2a5-11ea-b57b-2a6d1a5ea2b7, iuatifmkdb25
        1: 995a4f35-e2a4-11ea-bdc3-aad79852f241, iuatifmkdb22
        2: f2266321-e2a4-11ea-b702-d66d0cee8089, iuatifmkdb23
=================================================
2020-08-26T21:06:30.672106+08:00 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2020-08-26T21:06:30.703596+08:00 1 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 87, req: STRv1
2020-08-26T21:06:30.703701+08:00 1 [Note] [MY-000000] [Galera] IST request: f55a0575-de37-11ea-be24-0b54a6765364:6004-6006|tcp://<IP address>:4568
2020-08-26T21:06:30.705597+08:00 0 [Note] [MY-000000] [Galera] async IST sender starting to serve tcp://<IP address>:4568 sending 6005-6006
2020-08-26T21:06:30.706433+08:00 0 [Note] [MY-000000] [Galera] IST sender 6005 -> 6006
2020-08-26T21:06:30.706458+08:00 0 [Note] [MY-000000] [Galera] 1.0 (iuatifmkdb22): State transfer to 0.0 (iuatifmkdb25) complete.
2020-08-26T21:06:30.706509+08:00 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 6006)
2020-08-26T21:06:30.707166+08:00 0 [Note] [MY-000000] [Galera] Member 1.0 (iuatifmkdb22) synced with group.
2020-08-26T21:06:30.707204+08:00 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 6006)
2020-08-26T21:06:30.707261+08:00 1 [Note] [MY-000000] [Galera] Server iuatifmkdb22 synced with group
2020-08-26T21:06:30.797688+08:00 0 [Note] [MY-000000] [Galera] async IST sender served
2020-08-26T21:06:30.829325+08:00 0 [Note] [MY-000000] [Galera] 0.0 (iuatifmkdb25): State transfer from 1.0 (iuatifmkdb22) complete.
2020-08-26T21:06:30.830583+08:00 0 [Note] [MY-000000] [Galera] Member 0.0 (iuatifmkdb25) synced with group.
2020-08-26T21:06:33.862430+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting off
2020-08-26T21:10:54.588710+08:00 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.17151S), skipping check
2020-08-26T21:10:55.088728+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection to peer 4855f0e4 with addr tcp://<IP address>:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-08-26T21:10:55.089033+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection to peer f2266321 with addr tcp://<IP address>:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-08-26T21:10:55.089104+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.18.116.186:4567 tcp://<IP address>:4567
2020-08-26T21:10:56.089413+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') reconnecting to f2266321 (tcp://<IP address>:4567), attempt 0
2020-08-26T21:10:56.090266+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') reconnecting to 4855f0e4 (tcp://<IP address>:4567), attempt 0
2020-08-26T21:10:56.092185+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection established to f2266321 tcp://<IP address>:4567
2020-08-26T21:10:56.092985+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection established to 4855f0e4 tcp://<IP address>:4567
2020-08-26T21:10:59.591477+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting off
2020-08-26T21:11:03.283008+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection to peer 4855f0e4 with addr tcp://<IP address>:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-08-26T21:11:03.283264+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection to peer f2266321 with addr tcp://<IP address>:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2020-08-26T21:11:03.283338+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.18.116.186:4567 tcp://<IP address>:4567
2020-08-26T21:11:03.283395+08:00 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.1916S), skipping check
2020-08-26T21:11:04.234524+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection established to f2266321 tcp://<IP address>:4567
2020-08-26T21:11:04.237550+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') connection established to 4855f0e4 tcp://<IP address>:4567
2020-08-26T21:11:07.283651+08:00 0 [Note] [MY-000000] [Galera] (995a4f35, 'tcp://0.0.0.0:4567') turning message relay requesting off




Any idea why it can't even bootstrap at all?



SOLUTION by theGhost_k8
marrowyung (ASKER):
" I'd avoid looking at "Notes" to start my debug and consider "Errors" first. "

What is the command to only see error part of the log ?  other than general log and slow query log, what log I should read on the REAL problem of the PXC ?
forgot the tell you that after one day without doing anything, the PXC back to life by itself!

but data sync has problem.
As rightly pointed out by Tomas, you're looking at the right error log, and it is where you should see errors, but I couldn't see the errors you were mentioning.

I am not sure what you mean by "data sync has a problem", but the easiest way is to identify the "master" node as a single source of truth, purge the data on the others, and let them rejoin the cluster with a full SST.
Tomas Helgi Johannsson,

"This is crucial to log any errors and put them aside from any other noise in the main/general log."

I already checked, and that's why I posted the error log.

"Any error code in the log can be further analyzed with the perror utility that displays more detailed info on the error (more human readable :) )"

Actually, from the log I posted, what code should I look for? There seem to be no error codes at all!

"The only "error" I see in the provided log-part is tcp connection timeout which may be caused of misconfiguration or network related issues."

This probably tells me that someone changed something in the network and broke it! So does that mean that once the network is OK the Galera cluster will recover by itself?

theGhost_k8,

"I am not sure how do you mean by data sync has a problem but easiest way is identify "master" node as a single source of truth, purge data on others and let it rejoin the cluster with full SST."

I tested it by creating a login and it didn't sync; this is the problem. If the cluster seems unable to connect, we should test it to make sure everything is OK.





Hi,

"this probably say to me that someone change sth in the network and make it doesn't work! so is that mean once the network is ok the galera cluster will recover itself ?"

Once a node rejoins the cluster after it has lost connection to the rest of the cluster, it starts recovering the missing data. The node and the cluster need enough available buffers and wsrep_slave_threads (default is 1, max 512) to apply the missing data quickly in parallel.
And thus the binlog retention policy should be equal to the incremental backup retention policy.

Regards,
    Tomas Helgi

" and wsrep_slave_threads (default is 1 - max 512) to quickly apply missing data in parallel. "

wsrep_slave_threads highly related to the speed data in sync with each other?

"The node and the cluster needs to have enough available buffers "

what is the buffer you are saying ? what is the global variable name for it and what should be the size ?

"Once a node in the cluster joins the cluster after it has lost connection to the rest of the cluster it starts the recovery of missing data.  "
how often they detect if the peer back online or connectable ?
Hi,

gcache.size and wsrep_slave_threads, as well as the other configuration I mentioned here, need to be tuned with respect to your workload. The slave threads take the working sets stored in the gcache and replay them on the node's tables.
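To make that concrete, here is a minimal my.cnf sketch. The values below are illustrative placeholders of mine, not recommendations; tune them against your own workload as described above:

```ini
# my.cnf (illustrative values, not recommendations)
[mysqld]
# Parallel applier threads (default 1, max 512)
wsrep_slave_threads = 8
# Size of the gcache ring buffer that write-sets are served from
wsrep_provider_options = "gcache.size=1G"
```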
One parameter not mentioned in my comments on the question is gcache.recover=yes in wsrep_provider_options, which prevents the gcache from being deleted and recreated at node startup, and thus preserves the option of an IST transfer rather than falling back to an SST transfer (which in many cases is slower). This blog article is worth reading, as well as the other links I have posted in relation to this matter.
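As a sketch (assuming your config already carries a wsrep_provider_options line; Galera takes all provider options in that one semicolon-separated string, so combine rather than repeat the variable):

```ini
[mysqld]
# Preserve the gcache across clean shutdowns so a restarted node can use IST
wsrep_provider_options = "gcache.recover=yes;gcache.size=1G"
```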

Regards,
    Tomas Helgi
"in wsrep_provider_options that prevents the gcache to be deleted  "

by this, the gcache is not going to stay forever, right? must have retention period ?
under what situation this setting is not going to work ?when the full and increment backup cycle start again ?
Hi,

The gcache is deleted and recreated at node startup (by default gcache.recover=no). However, as stated in these links, you need the gcache to survive when a node is brought down for maintenance or a crash, to be able to favor an IST transfer over an SST transfer. The gcache has nothing to do with the database backups. It holds all the working sets (transactions in a UOW) that a node gets from other nodes to replay. Once a working set is replayed on the node, it is deleted from the gcache.
https://galeracluster.com/2016/11/faster-cluster-restarts-in-3-19-with-gcache-recovery/
https://www.percona.com/blog/2016/11/30/galera-cache-gcache-finally-recoverable-restart/

Regards,
     Tomas Helgi
"Once a workingset is replayed on the node it is deleted from the gcache. "

so it means it never need retention period at all ? it just use to store transaction log, the bin log.




Hi,

Don't confuse the gcache with the binlog. They have different purposes, although at some point they hold the same data.
The difference between the binlog and the gcache is that the binlog is never emptied of transactional data, while data that has been replayed into the instance's tables is deleted from the gcache.
https://www.percona.com/blog/2016/04/11/understanding-gcache-record-set-cache-percona-xtradb-cluster/
https://www.burnison.ca/notes/fun-mysql-fact-of-the-day-binlog-cache
https://severalnines.com/database-blog/how-avoid-sst-when-adding-new-node-mysql-galera-cluster
https://severalnines.com/database-blog/understanding-gcache-galera

Regards,
    Tomas Helgi
"Difference between the binlog and the gcache is that the binlog is never emptied of the transactional data while the data that has been replayed into the instances tables is deleted from the gcache. "

but bin log has retention period too? once the backup is done and committed log are truncated ?

yeah they are similiar !

Look at it conceptually, binlog is a LOG and gcache is a CACHE.

ok. cache is the STORE the uncommitted log ?
Hi,

The gcache holds the same data as the binlog. Thus:

"When the node is done with the transaction and is about to commit, it will generate the final-write-set using the two files (if the data size grew enough to use FileStore) plus HEADER, and will publish it for certification to cluster.
The native node executing the transaction will also act as subscription node, and will receive its own write-set through the cluster publish mechanism. This time, the native node will try to cache write-set into its GCache. How much data GCache retains is controlled by the GCache configuration.
GCache holds the write-set published on the cluster for replication. The lifetime of write-set in GCache is not transaction-linked"

Taken from the first link I provided. This means that when a transaction is about to be committed, it is written to the binlog and the gcache. The gcache is only used by the cluster to replicate data to other nodes and to replay each write-set (a set of one or more committed transactions) on each node.

Regards,
    Tomas Helgi
"Gcache is only used by the cluster to replicate data to other nodes and replay each writeset ( set of one or more commited transactions) on each node. "
ok, very clear ! so once the transaction is committed and replicated to the slave, it will be removed from the Gache?


"This means that when a transaction is about to be commited it is written to binlog  "

and the transaction in the binlog will be commited to the data files? and once that transaction will be remove from the log ?

Hi,

"ok, very clear ! so once the transaction is committed and replicated to the slave, it will be removed from the Gache?"

Yes.

"and the transaction in the binlog will be commited to the data files? and once that transaction will be remove from the log ?"

No, the transaction is never removed from the binlog. The binlog is used by master-slave replication as well as to restore incremental backups (point-in-time recovery), so removing a transaction from such a log would make the relevant data inconsistent and compromise the integrity of the database.
Removing data from a binlog, or part of a binlog, will render the incremental backups that rely on those logs useless.
And thus the binlog retention policy should be equal to the incremental backup retention policy, to have a valid incremental backup at any time.

Regards,
   Tomas Helgi
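For instance, if incremental backups are kept for 7 days (an assumed retention window, purely for illustration), the binlog expiry could be set to match. On MySQL/PXC 8.0 the variable is binlog_expire_logs_seconds:

```ini
[mysqld]
# Keep binlogs at least as long as the incremental backup retention (7 days here)
binlog_expire_logs_seconds = 604800
```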
"ok. cache is the STORE the uncommitted log ?"
> you can think of it so... it stores the write-sets / txns coming from other nodes...

"ok, very clear ! so once the transaction is committed and replicated to the slave, it will be removed from the Gache? "
> No. It is a "circular cache"... Once it is full, it will start to overwrite from the beginning.

"the transaction in the binlog will be commited to the data files? and once that transaction will be remove from the log ? "
> I dont think I can think of anything that is removing information from a log. binlog is never removed, it is a log.
Tomas,

"And thus the binlog retention policy should be equal to the incremental backup retention policy to have a valid incremental backup at any time."

So this means it is the backup process that cleans the binlog transactions up?


"backup process clean the bin log transaction up ? "
> No process does anything with the binary logs. Not mysql, not xtrabackup.
> Binary logs are written and retained by mysql and that's it.

"Binary logs are written and retained by mysql and that's it."
gcache will be clean up automatically once slave receive it and commit it./
what I want to know is when binary log clean up, or recycle should I say? if it is not recycle the log file will occupy all disk space.

or just the binlog retention policy does the matter? 
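For what it's worth, my understanding (hedged; please verify against the MySQL 8.0 docs for your version) is that mysqld itself recycles binlogs: files older than the expiry setting are purged at startup and when the binlog rotates, and you can also purge manually. A sketch:

```sql
-- Automatic recycling: binlogs older than this are purged by mysqld
-- at startup and at binlog rotation (MySQL 8.0 variable).
SET GLOBAL binlog_expire_logs_seconds = 604800;

-- Manual purge, if disk pressure forces it. Careful: this can invalidate
-- incremental backups that still need the purged logs.
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;
```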
ASKER CERTIFIED SOLUTION
SOLUTION
Thanks.
Hopefully I won't need to come back.
Hi,

I read this: https://www.percona.com/blog/2016/11/30/galera-cache-gcache-finally-recoverable-restart/

"gcache revive doesn't work if... gcache pages are involved. Gcache pages are still removed on shutdown, and the gcache write-sets up to that point also get cleared.
Again let's see an example:

  • Let's assume the same configuration and workflow as mentioned above. We will just change the workload pattern.
  • n1, n2, n3 are in sync and an average-size workload is executed, such that the write-set fits in the gcache. (seqno=1-x)
  • n2 and n3 are shutdown.
  • n1 continues to operate and executes some average-size workload followed by a huge transaction that results in the creation of a gcache page. (1-x-a-b-c-h) [h represents the transaction seqno]
  • Now n1 is shutdown. During shutdown, gcache pages are purged (irrespective of the keep_page_sizes setting).
  • The purge ensures that all the write-sets that have a seqno smaller than the gcache-page-residing write-set are purged, too. This effectively means (1-h) everything is removed, including (a,b,c).
  • On restart, even though n1 can revive the gcache it can't revive anything, as all the write-sets are purged.
  • When n2 boots up, it requests IST, but n1 can't service the missing write-sets (a,b,c,h). This causes SST to take place."

And I don't know why:

  • n1 continues to operate and executes some average-size workload followed by a huge transaction that results in the creation of a gcache page. (1-x-a-b-c-h) [h represents the transaction seqno]
  • Now n1 is shutdown. During shutdown, gcache pages are purged (irrespective of the keep_page_sizes setting).

I don't understand why, even with the gcache in play, an SST is still needed.


Marrow,

I'm late in commenting, and I hope you have already figured it out yourself, but for the sake of completeness let me comment on the last line, "I don't understand why even gcache is running SST still needed."
- The gcache is a cache, not a service, so it is not "running".
- If the gcache size is not enough to hold the changes needed for the node to join and catch up, it will have to do an SST.
You can read more about the gcache on the documentation page to get more clarity.
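To close the loop on IST vs SST: whether a joiner actually got an IST or fell back to SST is visible in its error log. A sketch, grepping a synthetic sample file here (on a real node, point LOG at the server's actual error log; the path varies by install):

```shell
# Build a small sample log (lines taken from the log format shown above)
LOG=joiner-error.log
cat > "$LOG" <<'EOF'
2020-08-26T21:06:30.703701+08:00 1 [Note] [MY-000000] [Galera] IST request: f55a0575-de37-11ea-be24-0b54a6765364:6004-6006|tcp://<IP address>:4568
2020-08-26T21:06:30.829325+08:00 0 [Note] [MY-000000] [Galera] 0.0 (iuatifmkdb25): State transfer from 1.0 (iuatifmkdb22) complete.
EOF

# "IST request" / "SST request" lines show which transfer was chosen;
# "State transfer ... complete" shows that it finished.
grep -E 'IST request|SST request|State transfer' "$LOG"
```

On this sample, both lines match, showing an IST was requested and the transfer completed.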