This is a roundabout way of asking "which storage is more expensive for Hadoop: SAN or distributed DAS?" If you can dial the FS replication factor down to 1 when using a SAN, then the storage cost equation favors SAN over the default 3x replication factor on distributed DAS.


asked 28 Mar '12, 09:39

Chris Almond
accept rate: 9%

edited 28 Mar '12, 09:40

Yes, there is a 3x cost to replication, but there is also a 3x performance increase, and the increase in cost comes off a baseline that is about 30x lower than the net cost of most SAN storage. The net result is that distributed local storage wins most price/performance comparisons very easily because of the massive residual advantage.

Your mileage will vary and a detailed comparison is a very complex computation, but with storage heavy nodes the numbers are hard to overcome.
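The arithmetic behind that claim can be sketched in a few lines. This is only a rough illustration: it normalizes the raw cost of local disk to 1, takes the "about 30x" figure above at face value, and ignores performance, power, and networking. The function and variable names are made up for the example.

```python
def cost_per_usable_tb(raw_cost_per_tb: float, replication_factor: int) -> float:
    """Cost of one usable TB when every block is stored `replication_factor` times."""
    return raw_cost_per_tb * replication_factor


# Assumption: raw DAS cost normalized to 1.0 per TB.
DAS_RAW_COST = 1.0
# Assumption: SAN is about 30x the DAS baseline, per the rough figure in the answer.
SAN_RAW_COST = 30.0 * DAS_RAW_COST

das_cost = cost_per_usable_tb(DAS_RAW_COST, replication_factor=3)  # default 3x replication
san_cost = cost_per_usable_tb(SAN_RAW_COST, replication_factor=1)  # single copy on the SAN
san_to_das_ratio = san_cost / das_cost

print(f"DAS at 3x: {das_cost}, SAN at 1x: {san_cost}, SAN/DAS: {san_to_das_ratio}")
```

Under these assumptions the SAN would still cost about ten times as much per usable TB, even with replication dialed down to 1. Plug in your own prices; the ratio is very sensitive to them.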

As an example, with conventional hardware, 12 drives can move about a GB per second using MapR. With sufficient networking bandwidth, this gives you an aggregate bandwidth of about a TB/s for a 1000-node cluster. Smaller clusters also show very impressive performance. Even with stock Hadoop you can get about a third of this bandwidth, which isn't all that shabby.
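Spelled out, the bandwidth numbers above work out as follows. This is a back-of-the-envelope sketch that assumes perfectly linear scaling and a network that never becomes the bottleneck; the names are illustrative only.

```python
def aggregate_bandwidth_gb_s(nodes: int, per_node_gb_s: float) -> float:
    """Aggregate disk bandwidth, assuming it scales linearly with node count."""
    return nodes * per_node_gb_s


PER_NODE_GB_S = 1.0    # about 1 GB/s from 12 drives, the figure quoted above
DRIVES_PER_NODE = 12
NODES = 1000

per_drive_mb_s = PER_NODE_GB_S * 1000 / DRIVES_PER_NODE          # implied rate per drive
mapr_gb_s = aggregate_bandwidth_gb_s(NODES, PER_NODE_GB_S)       # about 1 TB/s
stock_hadoop_gb_s = mapr_gb_s / 3                                # "about a third of this"

print(f"per drive: {per_drive_mb_s:.0f} MB/s")
print(f"cluster:   {mapr_gb_s:.0f} GB/s, stock Hadoop: {stock_hadoop_gb_s:.0f} GB/s")
```

The implied per-drive rate (a bit over 80 MB/s) is a useful sanity check, since the node figure depends entirely on keeping all 12 spindles busy.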

Your mileage will vary, of course, and if somebody is giving you a big SAN for free and paying the power costs, you should probably come to a different answer. Likewise, if you are using EC2, their SAN is your only real choice for persistent storage.


answered 28 Mar '12, 14:01

TedDunning ♦♦
accept rate: 23%

Ted - thank you for this response. I assume the "magic" SAN I refer to also makes up for some of the aggregate network bandwidth benefits you mention. I'm adding more to this thread below.

(28 Mar '12, 14:55) Chris Almond

Within the context of this thread, the "what if a data node fails" question comes up. Here is my crack at an answer to that...

In a typical cluster using distributed DAS with “N” replication factor:

  • The cluster re-replicates the data blocks associated with the failed data node by sourcing an existing replica node and writing to a new replica node until Nx replication factor is restored.

In an atypical cluster that is using shared storage…

  • If the replication factor is still > 1, then I’m thinking the same thing happens as above, assuming the data nodes in the cluster see the SAN LUNs as local DAS devices: the cluster re-replicates the data from the SAN-based “disk” devices associated with the failed data node by sourcing an existing replica node and writing to a new replica node until Nx replication factor is restored.
  • If the replication factor = 1 (single copy), then I’m thinking the Hadoop cluster will see that as a data loss because it will think the only replica of the data stored “on” that node has been lost, when in fact the SAN-based “disks” mounted by the failed node are not affected by the node failure.

If my assumptions are correct, then using shared storage for Hadoop doesn’t win us anything unless the Hadoop distribution is smart enough to see a data node failure as not affecting the storage components locally assigned to it, and it must be able to reassign those storage components to a healthy compute node.
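To make the two cases concrete, here is a toy model of what the cluster "sees" when a data node drops out. It is only a sketch of the reasoning above, not how HDFS or MapR actually schedules re-replication; the block and node names and the `fail_node` helper are invented for the illustration.

```python
from typing import Dict, List, Set, Tuple


def fail_node(
    placement: Dict[str, Set[str]],
    failed: str,
    all_nodes: List[str],
    replication_factor: int,
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Return (new placement, blocks the cluster considers lost) after `failed` dies."""
    new_placement: Dict[str, Set[str]] = {}
    lost: List[str] = []
    for block, replicas in placement.items():
        survivors = replicas - {failed}
        if not survivors:
            # No other replica is known to the cluster, so it reports data loss,
            # even if the underlying SAN LUN is physically intact.
            lost.append(block)
            continue
        # Copy from a surviving replica to healthy nodes that lack the block
        # until the replication factor is restored.
        candidates = [n for n in all_nodes if n != failed and n not in survivors]
        while len(survivors) < replication_factor and candidates:
            survivors = survivors | {candidates.pop(0)}
        new_placement[block] = survivors
    return new_placement, lost


nodes = ["n1", "n2", "n3", "n4", "n5"]

# Replication factor 3: the failure is absorbed and replicas are rebuilt.
rf3_placement, rf3_lost = fail_node(
    {"b1": {"n1", "n2", "n3"}, "b2": {"n2", "n3", "n4"}}, "n1", nodes, 3
)

# Replication factor 1: the only known copy of b1 was "on" n1.
rf1_placement, rf1_lost = fail_node({"b1": {"n1"}, "b2": {"n2"}}, "n1", nodes, 1)

print("RF=3 lost blocks:", rf3_lost)
print("RF=1 lost blocks:", rf1_lost)
```

In the single-copy case the model reports `b1` as lost purely because block ownership is tied to the node, which is the gap described above: something outside this logic would have to remount the LUN on a healthy node and tell the cluster about it.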


answered 28 Mar '12, 14:58

Chris Almond
accept rate: 9%

What you say is pretty much correct.

In the single-replica case, if you can move the LUNs to a new node, MapR can recognize the storage pools and re-instantiate them. So you get part of what you want, but not all. Notably, manual intervention will be required. This same process can be used to recover data in dire cases with multiple node failures where enough disks survive.

The local storage replication case is what we design for. Certain aspects of the design can help the other case, but it isn't the design center, so you may have a little bit of a round hole/square peg effect.

(28 Mar '12, 15:05) TedDunning ♦♦