Is there anything fundamentally wrong with setting the DFS block size to a low value, say 64K? Given that the "NameNode" is completely distributed, should this be an issue? Does it complicate the container reports in any way? In short, are there any issues with doing this?

asked 24 Jul '11, 22:31

John

accept rate: 0%

retagged 10 Aug '11, 09:00

Andrew Wells

Container size (and hence the container report) is not tied to file block size. Making the block size small only fits more blocks into a container. On the other hand, too small a block size will hurt the performance of your Map/Reduce jobs. By default the Map/Reduce framework splits input into blocks and hands those blocks out to mappers. Blocks are typically 64-128M so that each mapper has enough data to process. If you do make the block size small, you will have to change your input split calculation to overcome this problem.
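To make the split point concrete, here is a minimal sketch of how block size drives the number of map tasks. It assumes the split size follows the rule used by Hadoop's `FileInputFormat`, `max(minSize, min(maxSize, blockSize))`; whether your cluster uses exactly this rule is an assumption. The function names and the 1 GB file are hypothetical and chosen only for illustration.

```python
def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size, modeled on max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    """Splits (roughly, map tasks) needed to cover one file, by ceiling division."""
    return (file_size + split_size - 1) // split_size

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

file_size = 1 * GB  # hypothetical input file

# Default behavior: the split size follows the block size.
splits_64k = count_splits(file_size, compute_split_size(64 * KB))
splits_64m = count_splits(file_size, compute_split_size(64 * MB))

# Raising the minimum split size restores large splits on top of small blocks.
splits_64k_min = count_splits(
    file_size, compute_split_size(64 * KB, min_size=64 * MB)
)

print(splits_64k, splits_64m, splits_64k_min)
```

Under these assumptions a 64K block size yields thousands of tiny splits for the same file, while raising the minimum split size brings the count back down without touching the block size. The sketch ignores details such as the slack a real implementation may allow on the last split.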


answered 24 Jul '11, 22:42

Lohit ♦♦
accept rate: 44%

So this is an issue only for Map/Reduce workloads? If MR is taken out of the picture, does a lower block size not affect the system's behavior in any way?

(24 Jul '11, 23:05) John

Block size does affect how many containers a file can live in. If you have a 1MB file with a block size of 2MB, that file is going to live in one container. If you decrease the block size to 100KB, that file can be spread across as many as 10 containers. This means that reads of the file can be parallelized.

Of course, with such small reads it may be difficult to detect the difference.

(24 Jul '11, 23:08) TedDunning ♦♦
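The arithmetic in the comment above can be checked with a short sketch. It assumes the number of blocks is an upper bound on the number of containers a file can occupy (each block landing in a different container), and it uses decimal units to match the round numbers in the comment. The helper name is hypothetical.

```python
def blocks_in_file(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size

KB = 1000       # decimal units, to match the round numbers in the comment
MB = 1000 * KB

# 1MB file, 2MB blocks: a single block, so a single container.
large_blocks = blocks_in_file(1 * MB, 2 * MB)

# 1MB file, 100KB blocks: ten blocks, so at most ten containers.
small_blocks = blocks_in_file(1 * MB, 100 * KB)

print(large_blocks, small_blocks)
```

With binary units (1MB = 1024 × 1024 bytes, 100KB = 100 × 1024 bytes) the second case comes out to 11 blocks, since the final block is only partly filled.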

Of course, there's also overhead. If the system is asked to move every 64K to a new spot in the cluster, there's corresponding metadata overhead, triplicated, of about 300 bytes of disk per chunk. The number of RPCs needed to read the file also increases.

(24 Jul '11, 23:26) MC Srivas ♦♦
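A rough estimate of that overhead, as a sketch only: it takes the "about 300 bytes of disk per chunk" figure from the comment above as the total per chunk (whether that figure is before or after triplication is not stated, so it is left as a parameter), and it treats the chunk count as a rough proxy for the number of read RPCs, which is an assumption. The function name and the 1 GB file are hypothetical.

```python
def chunk_overhead(file_size, chunk_size, metadata_bytes_per_chunk=300):
    """Return (chunk count, estimated metadata bytes) for one file.

    metadata_bytes_per_chunk is the approximate figure quoted in the thread;
    adjust it if the figure should be multiplied by the replication factor.
    """
    chunks = (file_size + chunk_size - 1) // chunk_size
    return chunks, chunks * metadata_bytes_per_chunk

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

file_size = 1 * GB  # hypothetical file

small_chunks, small_meta = chunk_overhead(file_size, 64 * KB)
large_chunks, large_meta = chunk_overhead(file_size, 64 * MB)

print(f"64K chunks: {small_chunks} chunks, ~{small_meta} bytes of metadata")
print(f"64M chunks: {large_chunks} chunks, ~{large_meta} bytes of metadata")
```

Under these assumptions the metadata for a 1 GB file grows from a few kilobytes to a few megabytes when the chunk size drops from 64M to 64K, and the number of chunks a reader has to fetch grows by the same factor of 1024.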
Asked: 24 Jul '11, 22:31

Seen: 3,067 times

Last updated: 10 Aug '11, 09:00
