We currently have 100 map slots available on a working cluster of 20 nodes. However, when we start a Hive job, only 14 maps are allocated. How can we control this and allow more maps per job?

asked 01 May '12, 13:15

thurman


Typically, the number of maps for a job is determined by the size of the input data, or more specifically by the number of chunks.

The chunk size is 256 MB by default. See the documentation for changing the chunk size: http://mapr.com/doc/display/MapR/hadoop+mfs
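To make the arithmetic concrete, here is a rough sketch of how the map count follows from input size and chunk size. It assumes one map task per chunk and that chunks never span files; the real count also depends on the input format and how splits are computed. The function name and the 3584 MB input are made up for illustration.

```python
import math

CHUNK_SIZE_MB = 256  # default chunk size mentioned above


def estimate_map_tasks(file_sizes_mb, chunk_size_mb=CHUNK_SIZE_MB):
    """Rough estimate: one map task per chunk, chunks never span files.

    Hypothetical helper for illustration only; not part of Hadoop or Hive.
    """
    return sum(
        math.ceil(size / chunk_size_mb)
        for size in file_sizes_mb
        if size > 0
    )


# Hypothetical input: a single 3584 MB file.
print(estimate_map_tasks([3584]))                    # 14
print(estimate_map_tasks([3584], chunk_size_mb=64))  # 56
```

Under these assumptions, a job whose input is about 3.5 GB would get 14 maps no matter how many slots are free, and a smaller chunk size would raise that number.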

answered 01 May '12, 15:36

steven

This is more of a scheduling question. Despite having 100 maps available, we are only ever allowed 14 concurrent maps per Hive job. We can run additional jobs to take up the remaining maps, but what we'd like is some control over this limit.

(01 May '12, 17:07) thurman

If it is a scheduling question, check the jobtracker page to see whether there is a backlog of Hive jobs with lots of unscheduled map tasks. Otherwise, try tweaking the input format, chunk size, and other parameters as suggested by Steven.
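The check described here can be written down as a small decision rule. This is only a sketch: the function and its labels are hypothetical, and the two numbers are assumed to be read by hand from the jobtracker page rather than fetched through any API.

```python
def diagnose(pending_maps, free_slots):
    """Classify why a job runs fewer maps than the cluster could hold.

    pending_maps: unscheduled map tasks of the job (read from the jobtracker page)
    free_slots:   idle map slots in the cluster at the same moment
    Hypothetical helper for illustration only.
    """
    if pending_maps > 0 and free_slots > 0:
        # Work is waiting while slots sit idle: a scheduling limit.
        return "scheduling"
    if pending_maps > 0:
        # Work is waiting, but every slot is busy.
        return "cluster-full"
    # Nothing is waiting: the job simply has no more map tasks,
    # so the input format or chunk size is what caps the count.
    return "input-bound"


print(diagnose(pending_maps=0, free_slots=86))   # input-bound
print(diagnose(pending_maps=40, free_slots=86))  # scheduling
```

If the job shows no pending maps while 86 slots are idle, the 14 running maps are all the job has, and the chunk size route is the one to pursue.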

answered 01 May '12, 19:53

gera


Asked: 01 May '12, 13:15

Seen: 650 times

Last updated: 01 May '12, 19:53
