I generated a terabyte of data on the cluster using TeraGen and noticed that the reported Used value was around 700 GB.
This leads me to ask: do these reports take compression or replication into account?
Answer by Lohit Vijayarenu · Aug 05, 2011 at 08:08 AM
Is the Used value you are referring to from the Dashboard page? If so, it is the raw disk space used across the entire cluster, regardless of how the data is stored (compressed or uncompressed). It also includes snapshots.
If the Used value you are seeing is per volume, on the volume properties page, then it shows both used-uncompressed and used-compressed, neither of which accounts for replication (i.e., they are logical data sizes). The page also shows the volume's current replication factor. From these, you can compute the total disk usage as replication × used-compressed.
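As a rough sketch of that calculation, assuming you read the used-compressed figure and replication factor off the volume properties page by hand, the arithmetic looks like this. The function name and the 250 GB / factor 3 volume are hypothetical, and the estimate leaves out snapshots, which the Dashboard figure does include.

```python
def raw_disk_usage(used_compressed_bytes: int, replication: int) -> int:
    """Estimate physical disk space consumed by one volume.

    Assumes the per-volume used-compressed figure is a logical size that
    excludes replication, so every replica stores one compressed copy.
    Snapshots are not included in this estimate.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return used_compressed_bytes * replication


GB = 1024 ** 3

# Hypothetical volume: 250 GB after compression, replication factor 3.
print(raw_disk_usage(250 * GB, 3) // GB)  # 750
```

Summing this estimate over all volumes gives a number you can compare against the cluster-wide Used value on the Dashboard, keeping in mind that snapshots account for part of any difference.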