I wanted to avoid exporting output files from a map-reduce program, so I set mapred.output.dir to a file:// location that points to an NFS-mounted partition.
Unfortunately, this means that the output files are owned by the hadoop user (with either CDH or stock Hadoop). What can I do?
I would like the resulting output dir to belong to "user" so that downstream processing works correctly. I can't run all jobs as the hadoop user.
asked 22 Jun '11, 08:41
This problem doesn't arise with MapR, because you can use the standard Hadoop output mechanisms to write the files to maprfs and then access them over NFS. Doing so preserves ownership the way you want and gives you the access you want, while keeping the speed of normal map-reduce and of a distributed file system (maprfs in this case). Accessing map-reduce output over NFS also lets you use traditional Linux, Mac, or Windows tools on a single node wherever such simpler tools suffice. No export or import step is needed in these cases.
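As a rough illustration of the "simpler tools" point, here is a minimal Python sketch that reads job output straight from an NFS-mounted directory and checks who owns the files. It assumes the job wrote the usual part-* files; the helper names and the mount path in the usage note are hypothetical, not anything defined by MapR or Hadoop.

```python
import glob
import os


def read_job_output(output_dir):
    """Yield every line from the part-* files in a job output directory."""
    # Marker files such as _SUCCESS do not match part-* and are skipped.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


def files_not_owned_by(output_dir, uid):
    """Return the part-* files whose owner differs from the numeric uid."""
    return [
        path
        for path in sorted(glob.glob(os.path.join(output_dir, "part-*")))
        if os.stat(path).st_uid != uid
    ]
```

For example, with a hypothetical mount point, `files_not_owned_by("/mnt/cluster/user/out", os.getuid())` should come back empty if ownership was preserved, and `read_job_output` on the same directory lets downstream processing consume the results without an export step.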
answered 30 Jun '11, 18:38