I set up a 6-node cluster a week ago, and the data does not seem to be well balanced across the disks in the cluster.
Take today's data, for instance:
Data seems to favor the master node, but why?
And what is the data-balancing strategy?
How can I balance data in my cluster?
asked 21 Sep '11, 19:32
The MapR system balances automatically by moving data from nodes that are more full than the cluster average. A node that is within +/-10% of the cluster-average is considered to be within average fullness.
Additionally, by default the balancer doesn't move data from a node unless it is at least 70% full. This behavior can be changed by modifying the config variable "cldb.balancer.disk.threshold.percentage".
In the instance that you pasted, all of the nodes have disk utilization within 10% of the average, and all of them are below 70% full. So the balancer would take no action.
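To make the two conditions above concrete, here is a minimal sketch of the decision rule in Python. It is not MapR's implementation. It assumes the +/-10% band is measured in percentage points around the cluster average, and the function name, parameter names, and utilization figures are all hypothetical.

```python
def nodes_to_drain(utilization, band=10.0, threshold=70.0):
    """Return the nodes a balancer following the rule above would move data off.

    utilization: dict of node name -> percent of disk space used.
    band: assumed width of the "average fullness" band, in percentage points.
    threshold: stand-in for cldb.balancer.disk.threshold.percentage (default 70).
    """
    average = sum(utilization.values()) / len(utilization)
    return [
        node
        for node, pct in utilization.items()
        # A node qualifies only if it is above the average band AND
        # at least as full as the threshold.
        if pct > average + band and pct >= threshold
    ]


# Hypothetical figures, not taken from the cluster in the question.
balanced = {"node1": 42.0, "node2": 38.0, "node3": 35.0,
            "node4": 36.0, "node5": 37.0, "node6": 34.0}
skewed = {"node1": 85.0, "node2": 40.0, "node3": 42.0,
          "node4": 41.0, "node5": 39.0, "node6": 43.0}

print(nodes_to_drain(balanced))  # no node qualifies
print(nodes_to_drain(skewed))    # only node1 qualifies
```

In the first case every node is inside the band and below 70%, so nothing moves. In the second, node1 is both well above the average and above the threshold.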
By master, I'm assuming you mean the CLDB node. MapR tries hard to keep the first copy of every write local. If this causes an anomaly in which some nodes have excessive space utilization compared to the rest of the cluster (beyond +/-10% of the cluster average), the balancer will fix it.
Please look at http://www.mapr.com/doc/display/MapR/Balancers for additional details regarding the balancer.
answered 22 Sep '11, 14:27
MC Srivas ♦♦
Is the disk balancer running on your cluster? Cut and paste the output of: /opt/mapr/bin/maprcli config load -json | grep "cldb.balancer"
Also, how many volumes do you have?
answered 21 Sep '11, 20:43
hadoop@lord:~$ /opt/mapr/bin/maprcli config load -json | grep "cldb.balancer"
I have 18 system volumes and 11 user-data volumes in my cluster.
The system volumes hold little data; most of the data is in the user-data volumes: