|
First off, MapR is running like a champ. Loving it. We've got a directory with multiple sub-directories of log files that we'd like to process in a Hadoop streaming job. It seems that if we pass the parent directory as the -input param, it doesn't "know" to look in the sub-dirs for the log files. So our gnarly workaround is to pass each sub-dir as a separate -input. Curious if there's a more graceful workaround, perhaps another flag that makes it look in the sub-dirs recursively for content. Help? |
|
Check out the FileSystem method globStatus(). This will work for Java MapReduce jobs. For streaming jobs, standard Linux globbing in the -input path should work. |
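
To make the streaming suggestion concrete, here is a rough sketch of building the command with a single glob instead of one -input per sub-dir. It assumes Hadoop expands the glob in the -input path itself, as suggested above. The jar name, the /logs and /logs-out paths, and the cat/wc mapper and reducer are placeholders, and build_streaming_command is a made-up helper, not part of any Hadoop or MapR API.

```python
import shlex


def build_streaming_command(parent_dir, output_dir, mapper, reducer,
                            streaming_jar="hadoop-streaming.jar"):
    """Return the argv list for a streaming job that reads every sub-dir of parent_dir.

    streaming_jar is a placeholder; point it at the streaming jar of your install.
    """
    # One glob stands in for the per-sub-directory -input flags.
    input_glob = parent_dir.rstrip("/") + "/*"
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_glob,
        "-output", output_dir,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    argv = build_streaming_command("/logs", "/logs-out", "cat", "wc")
    # shlex.join quotes the glob so the local shell hands it to Hadoop unexpanded.
    print(shlex.join(argv))
```

The quoting matters if the command goes through a shell: unquoted, the shell may try to expand the pattern against the local filesystem before Hadoop ever sees it. The pattern here only reaches one level down, so deeper nesting would presumably need a longer pattern such as /logs/*/*.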