If I build an M3 edition cluster, and the one node that has the CLDB on it has a catastrophic failure, what are the steps to recovery? Can I install the CLDB on a couple of nodes but only bring it up on one? Would the recovery in that case be:

  • Primary CLDB node fails
  • Reassign the IP/host of the original CLDB node to another node
  • Start up the replacement host?
  • Start up the new CLDB?
  • Is there any copying, or are there scripts that need to be run on each node, or something else?

My sense is that the CLDB is the "equivalent" of the NameNode, in that it needs to exist for clients to operate properly. If the CLDB dies, do all clients die, too?

asked 05 Jul '11, 07:42 by jacques
edited 14 Jul '11, 22:18 by MC Srivas ♦♦

There are many questions here; I have summarized the main points as they apply to M3.

Q1. How is the CLDB's data protected?

The CLDB automatically re-replicates its data to other nodes in the cluster. It aims for a minimum of 2 copies at all times, but usually keeps 3 around. If the number of copies falls below 2, it re-replicates aggressively before allowing any new updates. Any available node in the cluster is used, and the CLDB's data will move around as disk or node failures occur.

Q2. What happens when the CLDB dies?

The CLDB process is automatically restarted on the same node (the warden monitors and restarts MapR services). All jobs and processes wait for the CLDB to return, then resume from where they left off, with no data or job loss.
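A quick way to watch this happen is to check the warden and tail the CLDB log. This is a minimal sketch: it assumes a default /opt/mapr install and an init script that supports the status verb, so adjust for your layout:

    # On the CLDB node: the warden supervises and restarts the CLDB process
    # (assumes your init script supports "status")
    /etc/init.d/mapr-warden status

    # Watch the CLDB log to see it come back up (default log location)
    tail -f /opt/mapr/logs/cldb.log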

Q3. What happens if the node running the CLDB crashes and burns and never returns?

The data is still safe, as it is replicated on at least one other node, and usually two. You can find the nodes that currently hold the data with a command (maprcli dump cldbnodes), then install the mapr-cldb package on one of those nodes (apt-get install mapr-cldb) and start it up (via configure.sh and /etc/init.d/mapr-warden start). Next, go to all the other nodes and run configure.sh to point them to the new CLDB node. Note that none of this is needed on M5.
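Concretely, the recovery sequence might look like the following sketch. The hostnames (newcldb, zk1) are placeholders, and the configure.sh path and -C/-Z flags assume a default /opt/mapr install, so check them against your version:

    # 1. Find the nodes that currently hold replicas of the CLDB data
    maprcli dump cldbnodes -zkconnect zk1:5181 -json

    # 2. On one of those nodes, install the CLDB package and start it
    apt-get install mapr-cldb
    /opt/mapr/server/configure.sh -C newcldb:7222 -Z zk1:5181
    /etc/init.d/mapr-warden start

    # 3. On every other node, re-run configure.sh to point at the new CLDB
    /opt/mapr/server/configure.sh -C newcldb:7222 -Z zk1:5181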

At this point, the new CLDB will automatically re-replicate its data to ensure that there are at least 2 copies, and run normally.

answered 05 Jul '11, 14:59 by MC Srivas ♦♦
edited 05 Jul '11, 15:28

What output should we expect to see from "maprcli dump cldbnodes"? I don't have a failed CLDB, but I thought it would be a good idea to get familiar with this process. I ran "maprcli dump cldbnodes -cluster mycluster -zkconnect mapr000:5181" and all I get back is:

valid ...

Whether or not I connect to a ZooKeeper node, I get the same output. Am I doing something wrong?

(14 Oct '11, 10:25) Matt

"maprcli dump cldbnodes -cluster mycluster -zkconnect mapr000:5181 -json"

The -json format seems to work.
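For reference, maprcli -json responses use a common envelope; the exact fields under data vary by version, so treat the shape and values below as purely illustrative placeholders:

    maprcli dump cldbnodes -zkconnect mapr000:5181 -json
    # illustrative response shape only -- fields and values vary by version
    {
      "timestamp": 1318612653,
      "status": "OK",
      "total": 1,
      "data": [ { "valid": "10.10.1.2:5660,10.10.1.3:5660" } ]
    }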

(14 Oct '11, 11:37) Nabeel ♦♦

The CLDB data is not restricted to the CLDB nodes, but is in fact stored in volumes that are distributed across the cluster (and replicated). When you move the CLDB service, it immediately has access to the same data. Since you didn't mention actually physically removing any nodes, there would be no impact at all on the data layout.
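If you want to see where that volume lives, something like the following sketch should work; mapr.cldb.internal is the conventional name of the CLDB's internal volume, but confirm it on your own cluster first:

    # List all volumes; the CLDB data lives in an internal system volume
    maprcli volume list -json

    # Inspect the CLDB volume's replication and placement (the name is an
    # assumption -- confirm it in the volume list output above)
    maprcli volume info -name mapr.cldb.internal -json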

answered 23 Nov '11, 13:30 by Peter Conrad ♦♦

The CLDB's data is replicated (assuming the cluster has more than one node). You can install the CLDB on multiple nodes, but in M3 the failure of one CLDB will not cause automatic failover. If the node that was running the CLDB is gone forever, you can start the CLDB on a node where the CLDB data was replicated.

answered 05 Jul '11, 09:06 by Lohit ♦♦
edited 05 Jul '11, 14:26 by Peter Conrad ♦♦
Can you please give the exact steps and say which "tool" to use?

(05 Jul '11, 09:14) jacques

You can use the following procedure to recover from a catastrophic failure of the CLDB node: http://www.mapr.com/doc/display/MapR/CLDB+Failover

answered 05 Jul '11, 15:20 by Peter Conrad ♦♦

If I want to change the CLDB node by adding a new node to the cluster and assigning it as the CLDB, is there a way to keep the data?

(22 Nov '11, 20:54) Kas

You can add the CLDB to a different node. In an M3 cluster, the procedure would be similar to CLDB failover; you would add the CLDB role to the new node, stop the CLDB service on the original CLDB node, and so on.
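A rough sketch of that move, under the same assumptions as above (placeholder hostnames, default configure.sh path; the maprcli node services call is one way to stop the service, so check your version's docs):

    # On the new node: add the CLDB role and point it at itself
    apt-get install mapr-cldb
    /opt/mapr/server/configure.sh -C newnode:7222 -Z zk1:5181

    # Stop the CLDB service on the original node
    maprcli node services -nodes oldnode -cldb stop

    # Re-run configure.sh on the remaining nodes with the new CLDB address
    /opt/mapr/server/configure.sh -C newnode:7222 -Z zk1:5181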

answered 23 Nov '11, 09:27 by Peter Conrad ♦♦

So, does this automatically copy over the data from the backup nodes?

(23 Nov '11, 12:45) Kas