This is a roundabout way of asking "which storage is more expensive for Hadoop: SAN or distributed DAS?" If you can dial the filesystem replication factor down to 1 when using a SAN, then the storage cost equation favors the SAN over the default 3x replication factor on distributed DAS.
Answer by Ted Dunning · Mar 28, 2012 at 02:01 PM
Yes, there is a 3x cost to replication, but there is also a 3x performance increase and the increase in cost comes off of a baseline that is about 30x better than the net cost of most SAN storage. The net is that distributed local storage wins most price/performance comparisons very easily because of the massive residual advantage.
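The arithmetic behind that claim can be sketched as follows. The dollar figures here are placeholder assumptions chosen only to illustrate the "~30x baseline advantage" mentioned above, not vendor quotes:

```python
# Illustrative price/usable-TB comparison (all dollar figures are
# assumptions for the sake of the arithmetic, not real prices).
san_cost_per_tb = 3000.0   # assumed $/TB for SAN storage
das_cost_per_tb = 100.0    # assumed $/TB for commodity local disks (~30x cheaper)
replication = 3            # default HDFS replication factor

# DAS pays the 3x replication penalty on raw capacity, but the extra
# replicas also buy roughly 3x aggregate read throughput.
das_usable_cost = das_cost_per_tb * replication   # cost per usable TB on DAS
san_usable_cost = san_cost_per_tb * 1             # SAN with replication dialed to 1

print(das_usable_cost, san_usable_cost)  # 300.0 3000.0
```

Even after paying for three copies, DAS comes out roughly 10x cheaper per usable TB under these assumptions, before counting the throughput advantage at all.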
Your mileage will vary and a detailed comparison is a very complex computation, but with storage heavy nodes the numbers are hard to overcome.
As an example, with conventional hardware, 12 drives can move about a GB per second using MapR. With sufficient networking bandwidth, this gives you an aggregate bandwidth of about a TB/s for a 1000-node cluster. Smaller clusters also show very impressive performance. Even with stock Hadoop you can get about a third of this bandwidth, which isn't all that shabby.
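The back-of-the-envelope math for those numbers, using the figures from the paragraph above:

```python
# Aggregate bandwidth estimate from the per-node figures quoted above.
gb_per_sec_per_node = 1.0   # ~1 GB/s from 12 local drives per node
nodes = 1000                # cluster size in the example

aggregate_gb_per_sec = gb_per_sec_per_node * nodes
print(aggregate_gb_per_sec / 1000, "TB/s")   # ~1 TB/s aggregate

# Stock Hadoop at roughly a third of that figure:
stock_hadoop_gb_per_sec = aggregate_gb_per_sec / 3
print(round(stock_hadoop_gb_per_sec), "GB/s")
```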
Your mileage will vary, of course, and if somebody is giving you a big SAN for free and paying the power costs, you probably should come to a different answer. Likewise, if you are using EC2, their SAN is your only real choice for persistent storage.
Answer by Chris Almond · Mar 28, 2012 at 02:58 PM
Within the context of this thread, the "what if a data node fails?" question comes up. Here is my crack at an answer to that....
In a typical cluster using distributed DAS with “N” replication factor:
The cluster re-replicates the data blocks associated with the failed data node by sourcing an existing replica node and writing to a new replica node until Nx replication factor is restored.
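The typical-case re-replication process above can be sketched with a toy model. This is a hypothetical simplification for illustration, not the actual NameNode algorithm: each block tracks the set of nodes holding a replica, and after a failure the cluster copies from a surviving replica to a new node until the target factor is restored:

```python
import random

# Toy model of re-replication after a data node failure (hypothetical
# sketch, not the real HDFS/NameNode implementation).
N = 3  # target replication factor
nodes = {f"node{i}" for i in range(6)}
# block id -> set of nodes currently holding a replica
blocks = {"blk1": {"node0", "node1", "node2"},
          "blk2": {"node0", "node3", "node4"}}

failed = "node0"
nodes.discard(failed)
for blk, replicas in blocks.items():
    replicas.discard(failed)                   # replica on the failed node is gone
    while len(replicas) < N:                   # restore Nx replication
        source = next(iter(replicas))          # source an existing replica node
        target = random.choice(sorted(nodes - replicas))
        replicas.add(target)                   # write to a new replica node

print(blocks)  # every block is back at 3 replicas, none on the failed node
```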
In an atypical cluster that is using shared storage…
If the replication factor is still > 1, then I'm thinking the same thing happens as above, assuming the data nodes in the cluster see the SAN LUNs as local DAS devices: the cluster re-replicates the data from the SAN-based "disk" devices associated with the failed data node by sourcing an existing replica node and writing to a new replica node until the Nx replication factor is restored.
If the replication factor = 1 (single copy), then I'm thinking the Hadoop cluster will see that as data loss, because it will think the only replica of the data stored "on" that node has been lost, when in fact the SAN-based "disks" mounted by the failed node are not affected by the node failure.
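That replication-factor-1 scenario can be sketched the same way. In this toy model (again hypothetical, not real NameNode behavior), any block whose only replica lived on the failed node is reported missing, even though the SAN LUN behind it is physically intact:

```python
# Toy model of the replication-factor-1 case: the cluster reports the
# block as lost because it has no surviving replica, regardless of the
# fact that the SAN disk behind the failed node is untouched.
blocks = {"blk1": {"node0"},   # single replica, on the node that will fail
          "blk2": {"node1"}}   # single replica, on a healthy node

failed = "node0"
missing = [blk for blk, replicas in blocks.items()
           if not (replicas - {failed})]   # no replica survives the failure

print(missing)  # ['blk1'] -- flagged as data loss despite the intact SAN LUN
```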
If my assumptions are correct, then using shared storage for Hadoop doesn't win us anything unless the Hadoop distribution is smart enough to see a data node failure as not affecting the storage components locally assigned to it, and it has the ability to reassign those local storage components to a healthy compute node.