First off, MapR is running like a champ. Loving it.

We've got a parent directory with multiple sub-directories of log files that we'd like to use as input to a Hadoop streaming job. It seems that if we pass the parent directory as the -input param, it doesn't "know" to look in the sub-dirs for the log files. So our gnarly workaround is to pass each sub-dir as its own -input. Is there a more graceful way, perhaps another flag that makes it look in the sub-dirs recursively for content?

Help?

asked 20 Sep '11, 16:59


niyogi
1444
accept rate: 0%


Check out the FileSystem method globStatus(). This will work for Java MapReduce jobs. For streaming jobs, standard Linux-style globbing in the -input path should work.
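A rough sketch of what the globbing suggestion could look like for a streaming job. The jar location, the /logs parent directory and the /logs-out output directory are hypothetical placeholders, not taken from the question. It assumes the log files sit exactly one level below the parent, and that the pattern is quoted so the local shell hands it to Hadoop unexpanded. The script only prints the command (a dry run), so it is safe to try before submitting anything.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
STREAMING_JAR="/path/to/hadoop-streaming.jar"
INPUT_GLOB="/logs/*"      # parent directory plus one wildcard level for the sub-dirs
OUTPUT_DIR="/logs-out"

# Print the streaming command with the glob single-quoted, so the local shell
# passes the pattern through as-is instead of expanding it against local files.
streaming_cmd() {
  printf "hadoop jar %s -input '%s' -output %s -mapper /bin/cat -reducer /usr/bin/wc\n" \
    "$STREAMING_JAR" "$INPUT_GLOB" "$OUTPUT_DIR"
}

# Dry run: show the command that would be submitted.
streaming_cmd
```

If the printed command looks right, it can be pasted into the shell as-is. For logs nested deeper than one level, the pattern would presumably need one more wildcard level (for example /logs/*/*), though that is worth checking against your own layout.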


answered 20 Sep '11, 18:27


steven
32213
accept rate: 19%

edited 20 Sep '11, 18:52

Asked: 20 Sep '11, 16:59

Seen: 907 times

Last updated: 20 Sep '11, 18:52
