We generated a million files in a single directory. When we attempted to grab a few of them with the Hadoop FS command, the command ran for a while but then just started erroring out over and over with:

#hadoop fs -copyToLocal /path/to/file/filePrefix-* ~/output/
2012-02-11 17:39:18,4121 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:479 Thread: 140457850164992 createMapRFileStatus failed, could not get gidstr for gid 0
2012-02-11 17:39:22,2184 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1402 Thread: 140457850164992 readdirplus failed, could not create MapRFileStatus object for [filename]

This appears to be an out-of-memory error. We had to kill -9 the process.

Mounting the same directory via NFS was slow, but did ultimately copy the files.
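
Since the NFS mount did eventually work, one way to pull a subset by prefix without expanding a glob over a million names is to stream the directory entries and copy matches as they are found. A minimal sketch in Python, assuming the cluster directory is mounted locally; the function name `copy_by_prefix` and the mount point in the usage line are hypothetical:

```python
import os
import shutil


def copy_by_prefix(src_dir, prefix, dest_dir, limit=None):
    """Copy regular files in src_dir whose names start with prefix.

    Entries are streamed with os.scandir, so this code never builds or
    sorts the full listing itself. Returns the number of files copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            # Cheap name check first; skip anything without the prefix.
            if not entry.name.startswith(prefix):
                continue
            # Skip subdirectories and other non-regular entries.
            if not entry.is_file():
                continue
            shutil.copy2(entry.path, os.path.join(dest_dir, entry.name))
            copied += 1
            # Optional cap, useful when only a few files are needed.
            if limit is not None and copied >= limit:
                break
    return copied
```

For example, `copy_by_prefix("/mnt/nfs/path/to/file", "filePrefix-", os.path.expanduser("~/output"), limit=10)` would stop after the first ten matches, where `/mnt/nfs` stands in for wherever the volume is actually mounted.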

asked 11 Feb, 18:28

jacques

edited 11 Feb, 18:30


I think the problem is the Hadoop shell tries to suck in all the filenames and sort them before copying them.
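
If that is what happens, the difference is roughly the one between the two functions below. This is only an illustration in Python, not the Hadoop shell's actual code; `list_names` is a hypothetical stand-in for the file system's directory listing:

```python
import fnmatch


def list_names(count, prefix="filePrefix-"):
    """Hypothetical stand-in for a directory listing; yields names lazily."""
    for i in range(count):
        yield f"{prefix}{i}" if i % 2 == 0 else f"other-{i}"


def eager_matches(names, pattern):
    # Materialize and sort every name first; memory grows with the
    # directory size before the first file is processed.
    all_names = sorted(names)
    return [n for n in all_names if fnmatch.fnmatchcase(n, pattern)]


def lazy_matches(names, pattern):
    # Test each name as it arrives; only one name is held at a time.
    for n in names:
        if fnmatch.fnmatchcase(n, pattern):
            yield n
```

With the lazy version the first match is available as soon as it is listed, while the eager version has to hold the whole directory in memory before it returns anything.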

answered 11 Feb, 22:56

MC Srivas ♦♦

Having not looked at the code, I assumed that it would have pushed the prefix down to a PathFilter which the FileSystem implementation would have handled. I suppose either way this isn't an edge case that someone with a NameNode would be running across.

(12 Feb, 11:01) jacques