I generated a terabyte of data on the cluster using TeraGen, and noticed that the "Used" figure reported only around 700 GB.

This leads me to ask: does the cluster take compression or replication into account in these reports?

asked 05 Aug '11, 05:27


Andrew Wells


Is the "Used" figure you are referring to from the Dashboard page? If so, it is the raw disk space used across the entire cluster, regardless of how the data is stored (compressed or uncompressed). This figure also includes snapshots.

If the "used" value you are seeing is per volume, from the volume properties page, then it shows both used-uncompressed and used-compressed, neither of which accounts for replication (i.e., they are logical data sizes, counting a single copy of the data). The page also shows the volume's current replication factor. From these, you can compute the total disk usage as replication × used-compressed.
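As a rough illustration of the formula above, here is a minimal Python sketch that takes the three numbers shown on a volume properties page and derives the compression ratio and the estimated raw disk usage. The function name and argument names are hypothetical labels for illustration, not part of any MapR API, and the numbers in the usage example are made up.

```python
# Sketch: estimate one volume's raw disk usage from the figures on its
# volume properties page. Names here are hypothetical, not a MapR API.

def volume_disk_usage(used_uncompressed_gb: float,
                      used_compressed_gb: float,
                      replication: int) -> dict:
    """Return logical size, compression ratio and estimated raw disk usage."""
    if used_compressed_gb <= 0 or replication < 1:
        raise ValueError("need a positive compressed size and replication >= 1")
    return {
        # Logical data size: the bytes a reader gets back (one copy).
        "logical_gb": used_uncompressed_gb,
        # How much compression shrank a single copy of the data.
        "compression_ratio": used_uncompressed_gb / used_compressed_gb,
        # Every replica stores the compressed bytes, so raw usage
        # is replication x used-compressed.
        "raw_disk_gb": replication * used_compressed_gb,
    }

if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real cluster.
    print(volume_disk_usage(used_uncompressed_gb=1000,
                            used_compressed_gb=400,
                            replication=3))
```

Note that replication multiplies the compressed size, not the uncompressed one, since each replica stores the already-compressed blocks.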


answered 05 Aug '11, 08:08


Lohit ♦♦

edited 06 Aug '11, 09:54


TedDunning ♦♦

What Lohit said is good. In general, we make a distinction between data size (how many bytes you can read) and disk usage (how many blocks of disk are consumed). For a fixed data size, compression decreases disk usage and replication increases disk usage.

(05 Aug '11, 10:27) TedDunning ♦♦
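To connect this distinction back to the original question, the sketch below assumes raw disk usage ≈ logical size / compression ratio × replication, and inverts that relation to estimate the compression ratio a Dashboard figure would imply. The replication factor of 3 and treating 1 TB as 1000 GB are assumptions for illustration; and because the Dashboard number also counts snapshots and every other volume on the cluster, the result is only a rough indication.

```python
# Sketch: relate data size (bytes you can read) to disk usage, assuming
#   raw_disk = logical_size / compression_ratio * replication
# Replication of 3 and 1 TB = 1000 GB are assumptions, not cluster facts.

def raw_disk_usage(logical_gb: float, compression_ratio: float,
                   replication: int) -> float:
    """Compression divides disk usage; replication multiplies it."""
    return logical_gb / compression_ratio * replication

def implied_compression_ratio(logical_gb: float, raw_disk_gb: float,
                              replication: int) -> float:
    """Invert raw_disk_usage to see what compression a raw figure implies."""
    return logical_gb * replication / raw_disk_gb

if __name__ == "__main__":
    # ~1 TB written by TeraGen, ~700 GB shown as Used, assumed replication 3.
    ratio = implied_compression_ratio(logical_gb=1000, raw_disk_gb=700,
                                      replication=3)
    print(f"implied compression ratio: {ratio:.2f}")
    # Plugging the ratio back in should recover the raw figure.
    print(f"raw disk usage: {raw_disk_usage(1000, ratio, 3):.0f} GB")
```

If other volumes or snapshots hold data, the true raw usage of the TeraGen output is smaller than 700 GB, so the actual compression ratio would be higher than this estimate.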