Suppose I build an M3 edition cluster, and the one node that has the CLDB on it suffers a catastrophic failure. What are the steps to recover? Can I install the CLDB on a couple of nodes but only bring it up on one? Would the recovery in that case be:
Primary CLDB node fails
Reassign the IP/hostname of the original CLDB node to another node
Start up the replacement host?
Start up the new CLDB?
Is there any copying to do, or are there scripts that need to be run on each node?
Also, my sense is that the CLDB is the "equivalent" of the NameNode, in that it needs to be running for clients to operate properly. If the CLDB dies, do all clients die too?
Answer by MC Srivas · Jul 05, 2011 at 02:59 PM
There are many questions here, so I have summarized the main points as they apply to M3.
Q1. How is the CLDB's data protected?
The CLDB automatically re-replicates its data to other nodes in the cluster. It aims for a minimum of 2 copies at all times, but usually keeps 3 around. If the number of copies falls below 2, it will aggressively re-replicate before allowing any new updates. Any available node in the cluster is used, and the CLDB's data will move around as disk failures or node failures happen.
Q2. When CLDB dies, what happens?
The CLDB process is automatically restarted on the node. All jobs and all processes wait for the CLDB to return, and resume from where they left off, with no data or job loss.
Q3. What happens if the node running the CLDB crashes and burns and never returns?
The data is still safe, as it is replicated on at least one other node, and usually two. To recover, first find the nodes that currently hold the CLDB data by running maprcli dump cldbnodes. Then install the mapr-cldb package on one of those nodes (apt-get install mapr-cldb) and start it up (via configure.sh and /etc/init.d/mapr-warden start). Finally, go to all the other nodes and run configure.sh to point them at the new CLDB node. Note that none of this is needed on M5.
At this point, the new CLDB will automatically re-replicate itself to ensure that there are 2 copies, and run normally.
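The recovery steps in Q3 could be scripted roughly as follows. This is only a dry-run sketch, not an official procedure. The commands named in the answer (maprcli dump cldbnodes, apt-get install mapr-cldb, configure.sh, /etc/init.d/mapr-warden start) are taken as given. Everything else is an assumption to check against the documentation for your release: the hostnames, the /opt/mapr/server path for configure.sh, the -zkconnect option, and the -C/-Z options.

```shell
#!/bin/sh
# Dry-run sketch of the M3 CLDB recovery steps described in Q3.
# By default each command is only printed; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical values; replace with hosts from your own cluster.
NEW_CLDB="${NEW_CLDB:-node2.example.com}"
ZK_CONNECT="${ZK_CONNECT:-zk1.example.com:5181}"

# Assumed install location of configure.sh; adjust if yours differs.
CONFIGURE="${CONFIGURE:-/opt/mapr/server/configure.sh}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (on any surviving node): list the nodes holding CLDB data.
# The -zkconnect option is an assumption; verify it for your version.
find_cldb_data_nodes() {
    run maprcli dump cldbnodes -zkconnect "$ZK_CONNECT"
}

# Steps 2-3 (on one node from step 1): install the CLDB and start it.
# The -C (CLDB host) and -Z (ZooKeeper host) options are assumptions.
promote_to_cldb() {
    run apt-get install mapr-cldb
    run "$CONFIGURE" -C "$NEW_CLDB" -Z "$ZK_CONNECT"
    run /etc/init.d/mapr-warden start
}

# Step 4 (on every other node): point the node at the new CLDB.
repoint_node() {
    run "$CONFIGURE" -C "$NEW_CLDB" -Z "$ZK_CONNECT"
}
```

With the defaults, sourcing this file and calling find_cldb_data_nodes, promote_to_cldb, and repoint_node in turn prints the five commands without touching the system, which makes it easy to review the plan before running anything with DRY_RUN=0 on the appropriate nodes.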
Answer by Peter Conrad · Nov 23, 2011 at 01:30 PM
The CLDB data is not restricted to the CLDB nodes, but is in fact stored in volumes that are distributed across the cluster (and replicated). When you move the CLDB service, it immediately has access to the same data. Since you didn't mention actually physically removing any nodes, there would be no impact at all on the data layout.
Answer by Lohit Vijayarenu · Jul 05, 2011 at 09:06 AM
The CLDB's data is replicated (assuming the cluster has more than one node). You can start the CLDB on multiple nodes, but in M3 the failure of one CLDB will not trigger automatic failover. If the node that was running the CLDB is gone forever, you can start the CLDB on a node where the CLDB data was replicated.
Answer by Peter Conrad · Nov 23, 2011 at 09:27 AM
You can add the CLDB to a different node. In an M3 cluster, the procedure would be similar to CLDB failover; you would add the CLDB role to the new node, stop the CLDB service on the original CLDB node, and so on.