At times some tasks fail with an OutOfMemoryError, and the syslog shows a stack trace like this:

2012-02-13 08:09:42,003 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at com.mapr.fs.MapRFsInStream.getCacheSize(MapRFsInStream.java:64)
    at com.mapr.fs.Inode.<init>(Inode.java:112)
    at com.mapr.fs.MapRFsInStream.<init>(MapRFsInStream.java:36)
    at com.mapr.fs.MapRClient.open(MapRClient.java:191)
    at com.mapr.fs.MapRFileSystem.open(MapRFileSystem.java:307)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:460)
    at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:204)
    at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:444)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:407)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergePerPartition(MapTask.java:1885)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1312)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:589)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)

I would expect such tasks to be killed and retried. Strangely, some of them are not killed, and their stderr logs contain something like this:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: Problem signalling task 32039 with KILL; exit = 6
    at org.apache.hadoop.mapred.LinuxTaskController.signalTask(LinuxTaskController.java:339)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:704)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:351)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:330)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:321)
    at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:170)
    at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:810)
    at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:4474)
    at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:4446)
    at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:3353)
    at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:4878)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:275)
    at $Proxy0.fatalError(Unknown Source)
    at org.apache.hadoop.mapred.Child.main(Child.java:324)

What could be wrong here?

asked 13 Feb, 09:28 by Vinod Singh (edited 13 Feb, 09:29)


Exit code 6 is the SETUID_OPER_FAILED error code, which the Hadoop TaskController returned while trying to kill the task:

6  SETUID_OPER_FAILED  Either could not read the local groups database, or could not set UID or GID.
(Source: https://ccp.cloudera.com/display/CDHDOC/Appendix+E+-+Task-controller+Error+Codes)
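When going through many task logs, the exit code can be pulled out of the "Problem signalling task ... with KILL; exit = 6" message and matched against the error table. A minimal Python sketch, assuming the message format shown in the stderr log in the question; the lookup table deliberately contains only code 6, the one quoted above, and `explain_signal_failure` is a hypothetical helper, not part of Hadoop:

```python
import re
from typing import Optional, Tuple

# Only code 6 is documented in this thread; other task-controller exit
# codes are left unmapped on purpose rather than guessed.
TASK_CONTROLLER_ERRORS = {
    6: "SETUID_OPER_FAILED: could not read the local groups database, "
       "or could not set UID or GID",
}

# Matches e.g. "Problem signalling task 32039 with KILL; exit = 6".
SIGNAL_FAILURE = re.compile(r"Problem signalling task (\d+) with (\w+); exit = (\d+)")


def explain_signal_failure(log_text: str) -> Optional[Tuple[int, str, int, str]]:
    """Return (task number, signal name, exit code, meaning), or None if no match."""
    match = SIGNAL_FAILURE.search(log_text)
    if match is None:
        return None
    task, sig, code = int(match.group(1)), match.group(2), int(match.group(3))
    meaning = TASK_CONTROLLER_ERRORS.get(code, "unknown task-controller exit code")
    return task, sig, code, meaning


if __name__ == "__main__":
    line = ("java.io.IOException: Problem signalling task 32039 with KILL; exit = 6 "
            "at org.apache.hadoop.mapred.LinuxTaskController.signalTask(...)")
    print(explain_signal_failure(line))
```

Unknown codes are reported as such instead of being mapped, so the table can be extended from the Cloudera appendix as needed.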


answered 16 Feb, 14:19 by gera

We ran into this same error. The SETUID_OPER_FAILED error code comes from the setuid call (http://linux.die.net/man/2/setuid) made by the task-controller. We were getting EAGAIN (Resource temporarily unavailable), which pointed to the user being over its RLIMIT_NPROC resource limit. We had to raise the ulimit -u setting on each node to fix it.
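To see how close a user is to that limit before raising ulimit -u, one can compare the soft RLIMIT_NPROC with the number of threads the user is running. On Linux this limit counts threads, not only processes, which may also be why the task JVM reported "unable to create new native thread". A rough, Linux-only sketch, assuming /proc/<pid>/status is readable; the helper names are hypothetical, and the script reports the limit of the process running it, which need not match the limit the TaskTracker or task-controller runs under:

```python
import os
import resource
from typing import Iterable, Iterator, Optional, Tuple


def parse_status(text: str) -> Tuple[Optional[int], int]:
    """Return (real UID, thread count) from the text of a /proc/<pid>/status file."""
    real_uid: Optional[int] = None
    threads = 0
    for line in text.splitlines():
        if line.startswith("Uid:"):
            # Fields are: real, effective, saved-set, filesystem UID.
            real_uid = int(line.split()[1])
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    return real_uid, threads


def threads_for_uid(status_texts: Iterable[str], uid: int) -> int:
    """Sum the thread counts of every process whose real UID equals uid."""
    total = 0
    for text in status_texts:
        real_uid, threads = parse_status(text)
        if real_uid == uid:
            total += threads
    return total


def read_proc_status(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the status text of every live process (Linux-specific)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                yield f.read()
        except OSError:
            continue  # the process exited or is unreadable; skip it


def nproc_headroom(soft_limit: int, used: int) -> Optional[int]:
    """Threads left before the soft RLIMIT_NPROC is reached, or None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - used


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = threads_for_uid(read_proc_status(), os.getuid())
    print(f"uid={os.getuid()} threads={used} headroom={nproc_headroom(soft, used)}")
```

Parsing is kept separate from reading /proc so the counting logic can be checked against sample status text on any machine.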


answered 02 Mar, 14:44 by shawnnussbaum
