Is there anything fundamentally wrong with setting the DFS block size to a low value, say 64K? Given that the "NameNode" is completely distributed, should this be an issue? Does this complicate the container reports in any way? In short, are there any issues with doing this?
|
Container size (and hence the container report) is not tied to file block size. Making the block size small only fits more blocks into a container. On the other hand, too small a block size will hurt the performance of your Map/Reduce jobs. By default, the Map/Reduce framework splits input into blocks and hands those blocks out to mappers. Blocks are typically 64-128M so that each mapper has enough data to process. If you do make the block size small, you will have to change your input split calculation to overcome this problem. This is an issue only for Map/Reduce workloads.
If MR is taken out of the picture, a lower block size doesn't affect the system's behavior in any way or form?
(24 Jul '11, 23:05)
John
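To make the answer's point about input splits concrete, here is a rough sketch. It assumes the split-size rule used by stock Hadoop's FileInputFormat, max(minSize, min(maxSize, blockSize)); whether the Map/Reduce framework discussed in this thread applies exactly this rule is an assumption, and the function names and the 1 GB file are hypothetical.

```python
import math

KB = 1024
MB = 1024 * KB

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split-size rule assumed from stock Hadoop's FileInputFormat."""
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, split_size):
    """Approximate mapper count: one split per split_size bytes, rounded up."""
    return math.ceil(file_size / split_size)

file_size = 1024 * MB  # hypothetical 1 GB input file

# Default minimum split size: every 64 KB block becomes its own split.
small = compute_split_size(64 * KB)
# Raising the minimum split size groups many small blocks into one split.
grouped = compute_split_size(64 * KB, min_size=64 * MB)

print(num_splits(file_size, small))    # 16384
print(num_splits(file_size, grouped))  # 16
```

With 64 KB blocks and default settings, each mapper would get only 64 KB of input; raising the minimum split size is one way to hand each mapper a reasonable amount of data again.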
Block size does affect how many containers a file can live in. If you have a 1MB file with a block size of 2MB, that file is going to live in one container. If you decrease the block size to 100KB, then you can have that file in 10 containers. This means that reads of the file can be parallelized. Of course, with such small reads it may be difficult to detect the difference.
(24 Jul '11, 23:08)
TedDunning ♦♦
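The arithmetic in the comment above can be checked with a small sketch. The block count is only an upper bound on the number of containers a file can span, since several blocks may share a container. Decimal units are assumed here so that 1 MB divided by 100 KB is exactly 10, and `chunk_count` is a hypothetical helper name.

```python
KB = 1000          # decimal units (assumption), so 1 MB / 100 KB == 10
MB = 1000 * KB

def chunk_count(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)

print(chunk_count(1 * MB, 2 * MB))    # 1
print(chunk_count(1 * MB, 100 * KB))  # 10
```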
Of course there's also overhead. If the system is asked to move every 64K to a new spot in the cluster, there's corresponding metadata overhead, triplicated, of about 300 bytes of disk per chunk. The number of RPCs needed to read the file also increases.
(24 Jul '11, 23:26)
MC Srivas ♦♦
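A back-of-the-envelope sketch of that overhead, under stated assumptions: the roughly 300 bytes is read as the total metadata cost per chunk (replication included), the chunk count is used as a rough proxy for the number of read RPCs, and the 1 GiB file and the function name are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

METADATA_BYTES_PER_CHUNK = 300  # approximate figure quoted in the comment

def chunk_overhead(file_size, chunk_size):
    """Return (chunk count, estimated metadata bytes) for one file."""
    chunks = -(-file_size // chunk_size)  # ceiling division
    return chunks, chunks * METADATA_BYTES_PER_CHUNK

for chunk_size in (64 * KIB, 64 * MIB):
    chunks, meta = chunk_overhead(1 * GIB, chunk_size)
    print(chunk_size, chunks, meta)
```

For a 1 GiB file this gives 16384 chunks and about 4.9 MB of metadata at 64 KiB, versus 16 chunks and 4800 bytes at 64 MiB.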