Is there anything fundamentally wrong with setting the DFS block size to a low value, say 64K? Given that the "NameNode" is completely distributed, should this be an issue? Does this complicate the container reports in any way? In short, are there any issues with doing this?
Answer by Lohit Vijayarenu · Jul 24, 2011 at 10:42 PM
Container size (and hence the container report) is not tied to file block size; making the block size small only means more blocks fit in a container. On the other hand, too low a block size will hurt performance for your Map/Reduce jobs. The Map/Reduce framework by default splits input by block and hands each block to a mapper, and blocks are typically 64-128M so that mappers have enough data to process. If you do make the block size small, you would have to change your input split calculation to overcome this problem (see the sketch below).
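
As a minimal sketch of what "changing the input split calculation" could look like, here is a Hadoop MapReduce job setup that raises the minimum split size so that many small blocks are packed into one split instead of spawning one tiny mapper per 64K block. The class name and 128 MB figure are illustrative assumptions; the relevant knob is `FileInputFormat.setMinInputSplitSize` (equivalently the `mapreduce.input.fileinputformat.split.minsize` property).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SmallBlockJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "small-block-example"); // hypothetical job name
        job.setJarByClass(SmallBlockJob.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));

        // With 64K file blocks, the default split-per-block behaviour would
        // create one tiny mapper per block. Raising the minimum split size
        // lets the framework pack many small blocks into each split (~128 MB here,
        // an assumed value; tune to your data).
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);

        // ... set mapper/reducer classes, output path, etc.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that this only coalesces blocks within a single input file; data locality per split degrades a bit, but mappers get a reasonable amount of work.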