|
I generated a terabyte of data on the cluster using TeraGen and noticed that the "Used" figure reported around 700 GB. This leads me to ask: do these reports take compression or replication into account? |
|
Is the "used" figure you are referring to from the Dashboard page? If so, it is the raw disk space used across the entire cluster, no matter how the data is stored (compressed or uncompressed). It also includes snapshots. If the "used" value you are seeing is per volume, via the volume properties page, then it displays both used-uncompressed and used-compressed, neither of which accounts for replication (i.e., they are logical data sizes). The page also shows the volume's current replication factor. From these, you can compute total disk usage as replication × used-compressed.

What Lohit said is good. In general, we make a distinction between data size (how many bytes you can read) and disk usage (how many blocks of disk are consumed). For a fixed data size, compression decreases disk usage and replication increases disk usage.
(05 Aug '11, 10:27)
TedDunning ♦♦
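For anyone who wants to check the arithmetic, here is a minimal sketch of the "replication × used-compressed" calculation described above. The function name and the numbers are hypothetical, chosen only for illustration; they are not read from a real cluster, and the real values would come from the volume properties page.

```python
def total_disk_usage(used_compressed: float, replication: int) -> float:
    """Estimate raw disk usage for a volume as replication x used-compressed.

    used_compressed: logical size after compression, in any unit (e.g. GB)
    replication:     the volume's replication factor
    """
    if used_compressed < 0:
        raise ValueError("used_compressed must be non-negative")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return replication * used_compressed


# Hypothetical volume properties, in GB (illustrative values only).
used_uncompressed_gb = 1000.0   # data size: how many bytes you can read
used_compressed_gb = 250.0      # logical size after compression
replication_factor = 3

disk_gb = total_disk_usage(used_compressed_gb, replication_factor)
print(f"data size (used-uncompressed): {used_uncompressed_gb:.0f} GB")
print(f"disk usage (replication x used-compressed): {disk_gb:.0f} GB")
```

With these made-up inputs, the disk usage comes out lower than the data size even at a replication factor of 3, which shows how compression and replication pull the "used" number in opposite directions.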
|