The cluster was running fine, then all of a sudden all TaskTrackers went down.

Task Tracker Log:

2012-04-16 06:36:43,602 ERROR org.apache.hadoop.mapred.TaskTracker: Failed to send heartbeat to jobTracker. java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:95)
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:68)
        at java.util.Arrays.mergeSort(Arrays.java:1270)
        at java.util.Arrays.sort(Arrays.java:1210)
        at java.util.Collections.sort(Collections.java:159)
        at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:629)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3763)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)
. Exiting...
2012-04-16 06:36:43,607 INFO org.apache.hadoop.util.AsyncDiskService: Shutting down all AsyncDiskService threads...
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.AsyncDiskService: All AsyncDiskService threads are terminated.
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.MRAsyncDiskService: Deleting toBeDeleted directory.
2012-04-16 06:36:43,609 INFO org.apache.hadoop.mapred.TaskTracker: Shutting down: Map-events fetcher for all reduce tasks on tracker_nia-dev15.eng.narus.com:localhost.localdomain/127.0.0.1:46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: Stopping server on 46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 46400: exiting
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 46400: exiting
.....
.....
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 32 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 46400: exiting
2012-04-16 06:36:43,820 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/*********
SHUTDOWN_MSG: Shutting down TaskTracker at nia-dev15/172.31.3.158
*********/
Job Tracker Log:

2012-04-16 06:36:43,603 INFO org.apache.hadoop.mapred.JobTracker: Creating a recovery entry for tasktracker: nia-dev15.eng.narus.com
2012-04-16 06:36:43,610 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_nia-dev15.eng.narus.com:localhost.localdomain/127.0.0.1:46400 to host nia-dev15.eng.narus.com
2012-04-16 06:36:43,611 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@79f1d448, true, true, true, -1) from 172.31.3.158:41371: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:95)
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:68)
        at java.util.Arrays.mergeSort(Arrays.java:1270)
        at java.util.Arrays.sort(Arrays.java:1210)
        at java.util.Collections.sort(Collections.java:159)
        at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:629)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3763)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)

asked 16 Apr '12, 01:54

Ashish

edited 20 Nov '12, 18:16

Nabeel ♦♦

Did you recently modify any scheduler parameters in pools.xml or any related configuration file?

answered 16 Apr '12, 05:27

Nabeel ♦♦

No. We are using the out-of-the-box version of MapR (build version 1.2.3.12961.GA), with no changes to any configuration.

answered 17 Apr '12, 04:17

Ashish

How many JT and TT nodes do you have in the cluster? Do all the TaskTrackers die, or is it just one particular node? Does restarting the TT work?

(17 Apr '12, 04:31) Nabeel ♦♦

One JT and two TTs. Yes, all TaskTrackers die. Restarting the TT didn't help; it went down again within a few seconds.

answered 17 Apr '12, 07:45

Ashish

Can you post your pools.xml and mapred-site.xml here?

(17 Apr '12, 08:16) Nabeel ♦♦

pools.xml:

<allocations></allocations>

mapred-site.xml:

mapred.job.tracker maprfs:///

mapred.local.dir /tmp/mapr-hadoop/mapred/local

webinterface.private.actions true

mapred.jobtracker.port 9001

mapreduce.tasktracker.outofband.heartbeat false

mapred.system.dir /var/mapr/cluster/mapred/jobTracker/system

mapred.job.tracker.persist.jobstatus.dir /var/mapr/cluster/mapred/jobTracker/jobsInfo

mapreduce.jobtracker.staging.root.dir /var/mapr/cluster/mapred/jobTracker/staging

mapreduce.job.split.metainfo.maxsize 10000000

mapred.jobtracker.retiredjobs.cache.size 1000

mapred.jobtracker.completeuserjobs.maximum 5

mapred.job.tracker.history.completed.location /var/mapr/cluster/mapred/jobTracker/history/done

hadoop.job.history.location

mapred.jobtracker.jobhistory.lru.cache.size 5

mapreduce.jobtracker.recovery.dir /var/mapr/cluster/mapred/jobTracker/recovery

mapreduce.jobtracker.recovery.maxtime 480

mapred.jobtracker.restart.recover true

mapred.fairscheduler.allocation.file

mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.FairScheduler

mapred.fairscheduler.assignmultiple true

mapred.fairscheduler.eventlog.enabled false

mapred.fairscheduler.smalljob.schedule.enable true

mapred.fairscheduler.smalljob.max.maps 10

mapred.fairscheduler.smalljob.max.reducers 10

mapred.fairscheduler.smalljob.max.inputsize 10737418240

mapred.fairscheduler.smalljob.max.reducer.inputsize 1073741824

mapred.cluster.ephemeral.tasks.memory.limit.mb 200

mapred.tasktracker.map.tasks.maximum (CPUS > 2) ? (CPUS * 0.75) : 1

mapreduce.tasktracker.prefetch.maptasks 0.5

mapred.tasktracker.reduce.tasks.maximum (CPUS > 2) ? (CPUS * 0.50): 1

mapred.tasktracker.ephemeral.tasks.maximum 1

mapred.tasktracker.ephemeral.tasks.timeout 10000

mapred.tasktracker.ephemeral.tasks.ulimit 4294967296>

mapreduce.tasktracker.reserved.physicalmemory.mb

mapreduce.tasktracker.heapbased.memory.management false

mapred.tasktracker.taskmemorymanager.killtask.maxRSS false

mapreduce.tasktracker.reserved.physicalmemory.mb.low 0.80

mapreduce.tasktracker.jvm.idle.time 10000

mapred.task.tracker.task-controller org.apache.hadoop.mapred.LinuxTaskController

mapred.tasktracker.task-controller.config.overwrite true

mapreduce.tasktracker.group root

mapreduce.cluster.map.userlog.retain-size -1

mapreduce.cluster.reduce.userlog.retain-size -1

mapreduce.tasktracker.task.slowlaunch false

keep.failed.task.files false

mapred.job.reuse.jvm.num.tasks -1

mapred.map.tasks.speculative.execution true

mapred.reduce.tasks.speculative.execution true

mapred.job.map.memory.physical.mb

mapred.job.reduce.memory.physical.mb

mapreduce.task.classpath.user.precedence false

mapred.map.child.java.opts -XX:ErrorFile=/opt/cores/mapreducejavaerror%p.log

mapred.reduce.child.java.opts -XX:ErrorFile=/opt/cores/mapreducejavaerror%p.log

mapred.map.child.env

mapred.reduce.child.env

mapred.map.child.ulimit

mapred.reduce.child.ulimit

io.sort.mb 100

io.sort.factor 256

io.sort.record.percent 0.17

mapred.reduce.slowstart.completed.maps 0.95

mapreduce.reduce.input.limit -1

mapred.reduce.parallel.copies 12

hadoop.proxyuser.root.hosts *

hadoop.proxyuser.root.groups root
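
The listing above lost its XML tags when it was pasted, so each line reads as a property name followed by its value. In a Hadoop site file each pair normally sits in a `<property>` element with `<name>` and `<value>` children under `<configuration>`. The following is a minimal sketch, assuming that standard layout, of how one might read such a file and check which scheduler is configured. The class `SiteConfigSketch` and its helper methods are hypothetical names for illustration, and only two entries from the listing are reproduced.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteConfigSketch {
    // Two entries from the listing, re-wrapped in the usual site-file layout (assumed).
    static final String SAMPLE =
        "<configuration>"
      + "<property><name>mapred.jobtracker.taskScheduler</name>"
      + "<value>org.apache.hadoop.mapred.FairScheduler</value></property>"
      + "<property><name>mapred.fairscheduler.allocation.file</name>"
      + "<value></value></property>"
      + "</configuration>";

    /** Collects every property name and value, in document order. */
    static Map<String, String> readProperties(String xml) {
        Map<String, String> props = new LinkedHashMap<>();
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (!name.isEmpty()) {
                    props.put(name, childText(property, "value"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable site file", e);
        }
        return props;
    }

    /** Returns the trimmed text of the first child with the given tag, or "" if absent. */
    static String childText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? "" : found.item(0).getTextContent().trim();
    }

    /** True when the fair scheduler class named in the listing is configured. */
    static boolean usesFairScheduler(Map<String, String> props) {
        return "org.apache.hadoop.mapred.FairScheduler"
            .equals(props.get("mapred.jobtracker.taskScheduler"));
    }

    public static void main(String[] args) {
        Map<String, String> props = readProperties(SAMPLE);
        System.out.println("fair scheduler configured: " + usesFairScheduler(props));
        System.out.println("allocation file: '"
            + props.get("mapred.fairscheduler.allocation.file") + "'");
    }
}
```

A check like this only confirms what the listing already shows: the fair scheduler is active and no allocation file is set. It does not by itself explain the NullPointerException.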

answered 18 Apr '12, 02:30

Ashish

edited 18 Apr '12, 02:33

I have the same problem with the TaskTracker. The cluster runs well, but after 1-2 hours all TaskTrackers go down.
Which Java distribution do you use?
Sun Java or some other Java distribution?

answered 20 Apr '12, 02:04

Frank

edited 23 Apr '12, 06:37

Hi Ashish,

I believe you may be hitting Apache bug MAPREDUCE-3674. I'll open a support case and contact you through there to get this issue resolved.

Best Regards, Aaron Eng

answered 23 Apr '12, 10:08

Aaron Eng ♦♦

When I google for this NPE, I get exactly two hits: this question on the MapR forum and the JIRA MAPREDUCE-3674. The question is whether you also browsed to "jobqueue_details.jsp" manually before all the TTs were knocked down.
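
For readers unfamiliar with how an exception inside a comparator ends up failing a whole sort, as in the `FairShareComparator.compare` → `Arrays.sort` → `Collections.sort` frames of the trace, here is a minimal sketch. It is not the Hadoop code and makes no claim about which value was null there. The `Item` class, its nullable `weight` field, and both comparators are hypothetical and exist only to illustrate the general mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorNpeSketch {
    /** Hypothetical stand-in for something being scheduled; not a Hadoop class. */
    static final class Item {
        final String name;
        final Double weight; // may be null in this illustration

        Item(String name, Double weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Unboxes weight without a null check, so a null weight throws NullPointerException. */
    static final Comparator<Item> UNSAFE =
        (a, b) -> Double.compare(a.weight, b.weight);

    /** Orders by weight and places items with a null weight last instead of throwing. */
    static final Comparator<Item> SAFE =
        Comparator.comparing((Item i) -> i.weight,
                             Comparator.nullsLast(Comparator.<Double>naturalOrder()));

    /** Sample data: the middle item has no weight. */
    static List<Item> sample() {
        return Arrays.asList(new Item("a", 2.0), new Item("b", null), new Item("c", 1.0));
    }

    /** Sorts a copy of the items and returns their names in sorted order. */
    static List<String> sortedNames(List<Item> items, Comparator<Item> comparator) {
        List<Item> copy = new ArrayList<>(items);
        Collections.sort(copy, comparator); // an exception thrown in compare() propagates from here
        List<String> names = new ArrayList<>();
        for (Item item : copy) {
            names.add(item.name);
        }
        return names;
    }

    /** Reports whether sorting with the given comparator fails with a NullPointerException. */
    static boolean throwsNpe(List<Item> items, Comparator<Item> comparator) {
        try {
            sortedNames(items, comparator);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe comparator throws: " + throwsNpe(sample(), UNSAFE));
        System.out.println("safe comparator order: " + sortedNames(sample(), SAFE));
    }
}
```

In the thread's trace the sort runs inside the JobTracker's heartbeat handler, so the exception is returned to the TaskTracker as a failed heartbeat, which is consistent with the "Failed to send heartbeat to jobTracker ... Exiting..." line in the TaskTracker log.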

answered 23 Apr '12, 10:11

gera

edited 25 Apr '12, 13:07

Hi Ashish,

I'm still figuring out how to look up your email address here, but in the meantime, if you want to email me at aeng@maprtech.com, I can get back to you with more details immediately.

Best Regards, Aaron Eng

answered 23 Apr '12, 11:27

Aaron Eng ♦♦
