Been playing with a 3-node lab test of M3. Wondering if there's any way to force the datanodes to re-replicate data after a disk or node failure. I'm also curious how this is scheduled, and whether the schedule can be changed. In a similar vein, how and when are data blocks scrubbed, and can that process be started manually? Thanks!
On a disk failure, the system re-replicates automatically. On a node failure, the system waits for about 1 hour (to see if the node comes back). If the node doesn't come back, the data is re-replicated gently (10% at a time). But if two nodes fail, the data whose replication factor has dropped to 1 is re-replicated immediately, since it is considered dangerously under-replicated. All of these thresholds are configurable through system parameters (please see the documentation at http://mapr.com/doc).
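To make the policy above easier to reason about, here is a toy Python model of the decision it describes. It is a sketch, not MapR's implementation: the names `NODE_DOWN_WAIT_SECS`, `GENTLE_BATCH_PERCENT`, `Failure`, `Container` and `replication_plan` are hypothetical, and the values are simply the "about 1 hour" and "10% at a time" figures from the answer.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical names; values taken from the answer above. In MapR these
# thresholds are configurable parameters, so real values may differ.
NODE_DOWN_WAIT_SECS = 60 * 60  # "about 1 hour" before a lost node's data moves
GENTLE_BATCH_PERCENT = 10      # "10% at a time"


class Failure(Enum):
    DISK = "disk"
    NODE = "node"


@dataclass
class Container:
    container_id: int
    live_replicas: int  # copies still reachable after the failure


def replication_plan(failure, containers, secs_since_failure):
    """Toy model of the described policy (not MapR's code).

    Returns (immediate, batch): container ids re-replicated right away,
    and the current gentle batch of the remaining containers.
    """
    if failure is Failure.DISK:
        # Disk failure: everything affected is re-replicated right away.
        return [c.container_id for c in containers], []

    # Node failure: a container down to one live copy is dangerously
    # under-replicated and is handled without waiting for the node.
    immediate = [c.container_id for c in containers if c.live_replicas <= 1]
    rest = [c.container_id for c in containers if c.live_replicas > 1]

    if secs_since_failure < NODE_DOWN_WAIT_SECS:
        # Still inside the wait window: give the node a chance to return.
        return immediate, []

    # Wait is over: move the remaining data gently, ~10% per batch.
    batch_size = max(1, len(rest) * GENTLE_BATCH_PERCENT // 100)
    return immediate, rest[:batch_size]


if __name__ == "__main__":
    # 20 containers still at 2 copies, one container down to a single copy.
    containers = [Container(i, 2) for i in range(20)] + [Container(99, 1)]
    print(replication_plan(Failure.NODE, containers, 10 * 60))      # ([99], [])
    print(replication_plan(Failure.NODE, containers, 2 * 60 * 60))  # ([99], [0, 1])
```

Returning the gentle work as one batch at a time mirrors the "10% at a time" wording; a real system would presumably call something like this repeatedly as batches complete.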
I've tried both of the above, and re-replication doesn't seem to begin immediately, even when I take 2 nodes of the 6-node cluster offline at once. Instead, it begins after 15 minutes or so. It sounds like there's no specific way to prompt the cluster to resolve this. Does anyone know how often the scrub runs across the datanodes to verify the checksums of all blocks, and where that is set? The documentation isn't entirely clear on either point.
It should be 5 minutes, not 15. A container that is idle is not re-replicated immediately (the system simply waits for the servers to return). But if your program starts writing to any of the containers with repl=1, that container will be re-replicated immediately.
(18 Feb, 12:25)
MC Srivas
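The idle-versus-written distinction for repl=1 containers can be sketched as a small predicate. This is a hypothetical model, not MapR code: `rereplicate_single_copy_now` and `idle_wait_secs` are illustrative names, and treating the "5 minutes" from the comment as the idle container's wait is an assumption, so it is a parameter rather than a fixed rule.

```python
# Assumption: the "5 minutes" in the comment is how long an idle repl=1
# container waits for its servers to return. Illustrative model only.
DEFAULT_IDLE_WAIT_SECS = 5 * 60


def rereplicate_single_copy_now(has_pending_write, secs_since_failure,
                                idle_wait_secs=DEFAULT_IDLE_WAIT_SECS):
    """Decide whether a container with only one live copy is re-replicated now."""
    if has_pending_write:
        # A write into a repl=1 container triggers re-replication at once.
        return True
    # An idle container waits for the failed servers to come back first.
    return secs_since_failure >= idle_wait_secs


if __name__ == "__main__":
    print(rereplicate_single_copy_now(True, 0))     # True
    print(rereplicate_single_copy_now(False, 60))   # False
    print(rereplicate_single_copy_now(False, 600))  # True
```

This may explain why pulling nodes on an idle test cluster shows no immediate activity: nothing is writing to the affected containers.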
If you crash the CLDB, it waits for 15 minutes after coming back up before deciding that servers are indeed down and that containers need to be re-replicated. This gives nodes a chance to report to the CLDB and lets things stabilize before re-replication starts. Perhaps that is what you are seeing. The parameter "cldb.replication.manager.start.mins" is set to 15 mins by default, and can be lowered to trigger re-replication sooner.
(20 Feb, 07:24)
MC Srivas
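A quick way to see why a CLDB restart can produce the "15 minutes or so" delay is to add the start-up grace period to a failure timeline. The sketch below is hypothetical: `earliest_rereplication` and `policy_delay` are illustrative names, only the default of 15 for `cldb.replication.manager.start.mins` comes from the comment, and combining the two delays with `max` is an assumption.

```python
from datetime import datetime, timedelta

# Default of cldb.replication.manager.start.mins, per the comment above.
DEFAULT_START_MINS = 15


def earliest_rereplication(cldb_started_at, failed_at, policy_delay,
                           start_mins=DEFAULT_START_MINS):
    """Earliest time re-replication could begin after a CLDB restart.

    Assumption (not from MapR docs): the CLDB start-up grace period and the
    per-failure delay are independent, so whichever ends later wins.
    """
    grace_ends = cldb_started_at + timedelta(minutes=start_mins)
    return max(grace_ends, failed_at + policy_delay)


if __name__ == "__main__":
    # Hypothetical timeline: the CLDB and two nodes go down at 12:00,
    # the CLDB is back up at 12:01, and the affected data needs no extra delay.
    down = datetime(2024, 1, 1, 12, 0)
    up = down + timedelta(minutes=1)
    print(earliest_rereplication(up, down, timedelta(0)))                # 2024-01-01 12:16:00
    print(earliest_rereplication(up, down, timedelta(0), start_mins=5))  # 2024-01-01 12:06:00
```

Under these assumptions, lowering the parameter only helps when the CLDB itself restarted; for a plain node failure the per-failure delay dominates.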