|
The cluster was running fine when, all of a sudden, all the TaskTrackers went down.
Task Tracker Log:
2012-04-16 06:36:43,602 ERROR org.apache.hadoop.mapred.TaskTracker: Failed to send heartbeat to jobTracker. java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:95)
at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:68)
at java.util.Arrays.mergeSort(Arrays.java:1270)
at java.util.Arrays.sort(Arrays.java:1210)
at java.util.Collections.sort(Collections.java:159)
at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:629)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3763)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)
. Exiting...
2012-04-16 06:36:43,607 INFO org.apache.hadoop.util.AsyncDiskService: Shutting down all AsyncDiskService threads...
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.AsyncDiskService: All AsyncDiskService threads are terminated.
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.MRAsyncDiskService: Deleting toBeDeleted directory.
2012-04-16 06:36:43,609 INFO org.apache.hadoop.mapred.TaskTracker: Shutting down: Map-events fetcher for all reduce tasks on tracker_nia-dev15.eng.narus.com:localhost.localdomain/127.0.0.1:46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: Stopping server on 46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 46400: exiting
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 46400: exiting
.....
.....
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 32 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 46400: exiting
2012-04-16 06:36:43,820 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/*********
SHUTDOWN_MSG: Shutting down TaskTracker at nia-dev15/172.31.3.158
*********/
Job Tracker Log:
|
|
No, we are using the out-of-the-box version of MapR (build version 1.2.3.12961.GA), with no changes to any configuration.
How many JT and TT nodes do you have in the cluster? Do all the TaskTrackers die, or is it just one particular node? Does restarting the TT work?
(17 Apr '12, 04:31)
Nabeel ♦♦
|
|
pools.xml:
<allocations></allocations>
mapred-site.xml:
mapred.job.tracker maprfs:///
mapred.local.dir /tmp/mapr-hadoop/mapred/local
webinterface.private.actions true
mapred.jobtracker.port 9001
mapreduce.tasktracker.outofband.heartbeat false
mapred.system.dir /var/mapr/cluster/mapred/jobTracker/system
mapred.job.tracker.persist.jobstatus.dir /var/mapr/cluster/mapred/jobTracker/jobsInfo
mapreduce.jobtracker.staging.root.dir /var/mapr/cluster/mapred/jobTracker/staging
mapreduce.job.split.metainfo.maxsize 10000000
mapred.jobtracker.retiredjobs.cache.size 1000
mapred.jobtracker.completeuserjobs.maximum 5
mapred.job.tracker.history.completed.location /var/mapr/cluster/mapred/jobTracker/history/done
hadoop.job.history.location
mapred.jobtracker.jobhistory.lru.cache.size 5
mapreduce.jobtracker.recovery.dir /var/mapr/cluster/mapred/jobTracker/recovery
mapreduce.jobtracker.recovery.maxtime 480
mapred.jobtracker.restart.recover true
mapred.fairscheduler.allocation.file
mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.FairScheduler
mapred.fairscheduler.assignmultiple true
mapred.fairscheduler.eventlog.enabled false
mapred.fairscheduler.smalljob.schedule.enable true
mapred.fairscheduler.smalljob.max.maps 10
mapred.fairscheduler.smalljob.max.reducers 10
mapred.fairscheduler.smalljob.max.inputsize 10737418240
mapred.fairscheduler.smalljob.max.reducer.inputsize 1073741824
mapred.cluster.ephemeral.tasks.memory.limit.mb 200
mapred.tasktracker.map.tasks.maximum (CPUS > 2) ? (CPUS * 0.75) : 1
mapreduce.tasktracker.prefetch.maptasks 0.5
mapred.tasktracker.reduce.tasks.maximum (CPUS > 2) ? (CPUS * 0.50): 1
mapred.tasktracker.ephemeral.tasks.maximum 1
mapred.tasktracker.ephemeral.tasks.timeout 10000
mapred.tasktracker.ephemeral.tasks.ulimit 4294967296>
mapreduce.tasktracker.reserved.physicalmemory.mb
mapreduce.tasktracker.heapbased.memory.management false
mapred.tasktracker.taskmemorymanager.killtask.maxRSS false
mapreduce.tasktracker.reserved.physicalmemory.mb.low 0.80
mapreduce.tasktracker.jvm.idle.time 10000
mapred.task.tracker.task-controller org.apache.hadoop.mapred.LinuxTaskController
mapred.tasktracker.task-controller.config.overwrite true
mapreduce.tasktracker.group root
mapreduce.cluster.map.userlog.retain-size -1
mapreduce.cluster.reduce.userlog.retain-size -1
mapreduce.tasktracker.task.slowlaunch false
keep.failed.task.files false
mapred.job.reuse.jvm.num.tasks -1
mapred.map.tasks.speculative.execution true
mapred.reduce.tasks.speculative.execution true
mapred.job.map.memory.physical.mb
mapred.job.reduce.memory.physical.mb
mapreduce.task.classpath.user.precedence false
mapred.map.child.java.opts -XX:ErrorFile=/opt/cores/mapreducejavaerror%p.log
mapred.reduce.child.java.opts -XX:ErrorFile=/opt/cores/mapreducejavaerror%p.log
mapred.map.child.env
mapred.reduce.child.env
mapred.map.child.ulimit
mapred.reduce.child.ulimit
io.sort.mb 100
io.sort.factor 256
io.sort.record.percent 0.17
mapred.reduce.slowstart.completed.maps 0.95
mapreduce.reduce.input.limit -1
mapred.reduce.parallel.copies 12
hadoop.proxyuser.root.hosts *
hadoop.proxyuser.root.groups root |
|
I have the same problem with the TaskTracker. The cluster runs well, but after 1-2 hours all the TaskTrackers go down. |
|
Hi Ashish, I believe you may be hitting Apache bug MAPREDUCE-3674. I'll open a support case and contact you through there to get this issue resolved. Best Regards, Aaron Eng |
|
When I google this NPE, I get exactly two hits: this question on the MapR forum and the JIRA MAPREDUCE-3674. The question is whether you also browsed to "jobqueue_details.jsp" manually before all the TTs were knocked down.
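For anyone reading the trace in the question: the NPE is thrown from FairShareComparator.compare while Collections.sort runs inside FairScheduler.assignTasks, which the JobTracker calls on every heartbeat. So a single bad entry in the list being sorted is enough to fail every heartbeat. The sketch below is not Hadoop code. It only illustrates that general failure mode with a hypothetical Job class, a hypothetical nullable pool field, and two made-up comparators (UNSAFE and SAFE). Which field is actually null in the real scheduler is not something this thread establishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NullSafeSortSketch {

    /** Hypothetical stand-in for a schedulable job; NOT a Hadoop class. */
    public static final class Job {
        final String name;
        final String pool; // assumption: this field may end up null
        public Job(String name, String pool) {
            this.name = name;
            this.pool = pool;
        }
    }

    /** Uses pool without a null check, so a null pool throws an NPE mid-sort. */
    public static final Comparator<Job> UNSAFE = new Comparator<Job>() {
        public int compare(Job a, Job b) {
            return a.pool.compareTo(b.pool);
        }
    };

    /** Null-tolerant variant: jobs with a null pool sort first instead of throwing. */
    public static final Comparator<Job> SAFE = new Comparator<Job>() {
        public int compare(Job a, Job b) {
            if (a.pool == null || b.pool == null) {
                return (a.pool == null ? 0 : 1) - (b.pool == null ? 0 : 1);
            }
            return a.pool.compareTo(b.pool);
        }
    };

    /** Sorts a copy of the list and reports whether the sort threw an NPE. */
    public static boolean sortThrowsNpe(List<Job> jobs, Comparator<Job> cmp) {
        try {
            Collections.sort(new ArrayList<Job>(jobs), cmp);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("a", "default"),
                new Job("b", null),
                new Job("c", "etl"));
        System.out.println("unsafe comparator throws NPE: " + sortThrowsNpe(jobs, UNSAFE));
        System.out.println("safe comparator throws NPE: " + sortThrowsNpe(jobs, SAFE));
    }
}
```

With the list above, the unsafe comparator fails as soon as the sort touches the job whose pool is null, while the safe one completes. A guard like this only hides the symptom in the sketch; on a real cluster the fix would be whatever the vendor ships for the underlying bug.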
|
Hi Ashish, I'm still figuring out how to look up your email address here, but in the meantime, if you want to email me at aeng@maprtech.com, I can get back to you with more details immediately. Best Regards, Aaron Eng |