|
Suppose I build an M3 edition cluster, and the one node that has CLDB on it has a catastrophic failure. What are the steps to recovery? Can I install CLDB on a couple of nodes but only bring it up on one? So would the recovery in that case be:
And part of my sense is that CLDB is the "equivalent" of NameNode in that it needs to exist for clients to operate properly? If CLDB dies, do all clients die, too? |
|
Many questions here; I have summarized some of the main points as they apply to M3.

Q1. How is the CLDB's data protected?
The CLDB automatically re-replicates its data to other nodes in the cluster. It aims for a minimum of 2 copies at all times, but usually keeps 3 around. If the number of copies falls below 2, it will aggressively re-replicate before allowing any new updates. Any available node in the cluster is used, and the CLDB's data will move around as disk failures or node failures happen.

Q2. When CLDB dies, what happens?
The CLDB process is automatically restarted on the node. All jobs and all processes wait for the CLDB to return, and resume from where they left off, with no data or job loss.

Q3. What happens if the node running the CLDB crashes and burns and never returns?
The data is still safe, as it is replicated on at least one other node, and usually two. The nodes that currently hold the data can be found with a command (maprcli dump cldbnodes). Install the mapr-cldb package on one of those nodes (apt-get install mapr-cldb) and start it up (via configure.sh and /etc/init.d/mapr-warden start). Next, go to all the other nodes and run configure.sh to point them to the new CLDB node. Note that none of this is needed on M5. At this point, the new CLDB will automatically re-replicate itself to ensure that there are 2 copies, and run normally.

What output should we expect to see from "maprcli dump cldbnodes"? I don't have a failed CLDB, but I thought it would be a good idea to get familiar with this process. I ran "maprcli dump cldbnodes -cluster mycluster -zkconnect mapr000:5181" and all I get back is: valid ... It doesn't matter whether I connect to a zookeeper node or not, I get the same output. Am I doing something wrong?
(14 Oct '11, 10:25)
Matt
"maprcli dump cldbnodes -cluster mycluster -zkconnect mapr000:5181 -json" The -json format seems to work.
(14 Oct '11, 11:37)
Nabeel ♦♦
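
The M3 recovery steps from Q3 above can be sketched as a dry-run shell script. This is only an illustration under stated assumptions, not an official procedure: the hostnames mapr001, mapr002 and mapr003 are hypothetical, running the per-node steps over ssh is my choice, and the configure.sh path and its -C (CLDB node) and -Z (ZooKeeper node) options should be checked against the documentation for your MapR version. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the M3 CLDB recovery steps described in Q3.
# All hostnames except mapr000 are hypothetical placeholders.

NEW_CLDB="mapr001"               # hypothetical: a node that holds a copy of the CLDB data
ZK="mapr000:5181"                # ZooKeeper address used earlier in this thread
CLUSTER="mycluster"              # cluster name used earlier in this thread
OTHER_NODES="mapr002 mapr003"    # hypothetical: the remaining cluster nodes
CONFIGURE="/opt/mapr/server/configure.sh"   # assumed install path

# Print each command; set RUN=1 in the environment to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

recover_cldb() {
    # 1. Find the nodes that currently hold the CLDB data.
    run maprcli dump cldbnodes -cluster "$CLUSTER" -zkconnect "$ZK" -json

    # 2. On one of those nodes: install the CLDB package, configure, start the warden.
    run ssh "$NEW_CLDB" apt-get install mapr-cldb
    run ssh "$NEW_CLDB" "$CONFIGURE" -C "$NEW_CLDB" -Z "$ZK"   # -C/-Z are assumptions
    run ssh "$NEW_CLDB" /etc/init.d/mapr-warden start

    # 3. Point every other node at the new CLDB node.
    for node in $OTHER_NODES; do
        run ssh "$node" "$CONFIGURE" -C "$NEW_CLDB" -Z "$ZK"
    done
}

recover_cldb
```

Run as-is, it prints the six planned commands prefixed with "+ " so they can be reviewed before anything is changed on the cluster.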
|
|
The CLDB data is not restricted to the CLDB nodes, but is in fact stored in volumes that are distributed across the cluster (and replicated). When you move the CLDB service, it immediately has access to the same data. Since you didn't mention actually physically removing any nodes, there would be no impact at all on the data layout. |
|
CLDB's data is replicated (assuming the cluster has more than 1 node). One can start multiple CLDB nodes, but failure of one CLDB will not cause failover in M3. If the node which was running the CLDB is gone forever, then one can start the CLDB on a node where the CLDB data was replicated.

Can you please give the exact steps and what "tool"?
(05 Jul '11, 09:14)
jacques
|
|
You can use the following procedure to recover from a catastrophic failure of the CLDB node: http://www.mapr.com/doc/display/MapR/CLDB+Failover

If I want to change the CLDB node by adding a new node to the cluster and assigning it as the CLDB, is there a way to keep the data?
(22 Nov '11, 20:54)
Kas
|
|
You can add the CLDB to a different node. In an M3 cluster, the procedure would be similar to CLDB failover; you would add the CLDB role to the new node, stop the CLDB service on the original CLDB node, and so on.

So, does this automatically copy over the data from the backup nodes?
(23 Nov '11, 12:45)
Kas
|