|
We generated a million files in a single directory. When we tried to copy a few of them with the Hadoop FS command, it ran for a while and then started failing over and over with:
This appears to be an out-of-memory error. We had to kill -9 the process. Copying from the same directory mounted via NFS was slow, but it did ultimately copy the files. |
|
I think the problem is that the Hadoop shell tries to read in all the filenames and sort them before copying any of them. Not having looked at the code, I had assumed it would push the prefix down to a PathFilter that the FileSystem implementation would handle. Either way, I suppose this isn't an edge case that someone with a NameNode would run across.
(12 Feb, 11:01)
jacques
|
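To illustrate the difference described above, here is a minimal sketch of the "push the prefix down to a filter" approach. It deliberately does not use the Hadoop API: it works on a local directory with java.nio.file, and the class name PrefixCopy, the method copyByPrefix, and the limit parameter are all hypothetical names made up for this example. The point is only that entries are filtered and copied as they are iterated, so the full listing is never held in memory or sorted.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: copies files by name prefix
// while streaming the directory instead of listing and sorting it.
public class PrefixCopy {

    // Copies at most `limit` regular files whose names start with `prefix`
    // from srcDir into dstDir and returns the paths that were written.
    static List<Path> copyByPrefix(Path srcDir, Path dstDir, String prefix, int limit)
            throws IOException {
        List<Path> copied = new ArrayList<>();
        if (limit <= 0) {
            return copied;
        }
        Files.createDirectories(dstDir);

        // The filter is applied entry by entry during iteration, so
        // non-matching names are discarded immediately.
        DirectoryStream.Filter<Path> byPrefix = entry ->
                entry.getFileName().toString().startsWith(prefix)
                        && Files.isRegularFile(entry);

        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, byPrefix)) {
            for (Path entry : stream) {
                Path target = dstDir.resolve(entry.getFileName());
                Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
                if (copied.size() >= limit) {
                    break; // stop early; the rest of the directory is never read
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: PrefixCopy <srcDir> <dstDir> <prefix> <limit>");
            return;
        }
        List<Path> copied = copyByPrefix(
                Paths.get(args[0]), Paths.get(args[1]), args[2], Integer.parseInt(args[3]));
        System.out.println("copied " + copied.size() + " file(s)");
    }
}
```

Memory use here depends on how many files are copied, not on how many are in the directory. The order of the copied files is whatever order the directory stream returns, which is unspecified. Whether a given Hadoop FileSystem implementation can apply a filter this way on the server side is a separate question that this sketch does not answer.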