I get a NumberFormatException when I run the Mahout job org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on MapR, using mahout-core-0.5-job.jar. With the default value of "mapred.reduce.tasks" it runs fine, but when I set it to more than 1, I get the exception. On Apache Hadoop, I get a result file with the same setting.

Are there any solutions?

asked 23 Aug '11, 12:39

yuji

In the org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils class, in the readIntFromFile(Configuration conf, Path outputDir) method, fs.listStatus(outputDir, PathFilters.partFilter())[0].getPath() returns maprfs://host:port/tmpdir/countUsers/part-r-00004. part-r-00004 is empty; part-r-00000 is the correct file. I guess Apache Hadoop returns part-r-00000.

(23 Aug '11, 12:42) yuji

What happened here is that the values deduced from the files were not deterministic except in the case of a single reducer. The trunk version avoids this issue entirely by reading counters from the Hadoop job instead of parsing output files. That is a much better approach.
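For anyone who has to stay on 0.5, the failure mode described above (taking element [0] of a listing whose order is not guaranteed, then parsing an empty file) can be avoided by not depending on listing order at all. The following is only a sketch in plain Java, with no Hadoop classes: `PartFileCount` and `readIntFromParts` are hypothetical names, the map stands in for the part files and their contents, and it assumes that exactly one reducer writes the count while the others write empty files, as in the listing reported in this thread.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, order-independent stand-in for "parse the first part file". */
final class PartFileCount {

    /**
     * Keys are part-file names, values are the file contents.
     * Returns the integer found in the single non-empty part file,
     * regardless of the order in which the entries are visited.
     */
    static int readIntFromParts(Map<String, String> partContents) {
        Integer result = null;
        for (Map.Entry<String, String> part : partContents.entrySet()) {
            String text = part.getValue().trim();
            if (text.isEmpty()) {
                continue; // assumption: reducers that received no keys wrote empty files
            }
            if (result != null) {
                throw new IllegalStateException(
                        "More than one non-empty part file, e.g. " + part.getKey());
            }
            result = Integer.parseInt(text);
        }
        if (result == null) {
            throw new IllegalStateException("No non-empty part file found");
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative contents only; the real count is not shown in the thread.
        Map<String, String> parts = new TreeMap<>();
        parts.put("part-r-00004", "");
        parts.put("part-r-00001", "");
        parts.put("part-r-00000", "123456\n");
        System.out.println(readIntFromParts(parts));
    }
}
```

In an actual patch against 0.5, the same loop would run over the array returned by fs.listStatus(...), skipping zero-length files before parsing. That is a local workaround, not what trunk does.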

Mahout is still not entirely stable, even in the more stable recommender code. I think that trunk is significantly better than 0.5. In fact, we have started the process of finishing off the final bugs for the 0.6 release which should happen within a month or so.

I would recommend moving to the trunk version. You should be able to simply substitute the mahout jar file.

answered 23 Aug '11, 16:03

TedDunning ♦♦

edited 23 Aug '11, 16:03

I understand the details now. Thank you.

(23 Aug '11, 20:39) yuji

The ordering of the result files should not matter to the recommender as long as the names are correct.

Can you paste in a stack trace?

answered 23 Aug '11, 13:20

TedDunning ♦♦

There were some race conditions in Mahout 0.5 that were not accounted for properly. I don't remember anything that sounded like what you describe.

Could you try your test on the trunk version of Mahout instead of 0.5? I note that readIntFromFile doesn't even exist in TasteHadoopUtils any more.

answered 23 Aug '11, 13:29

TedDunning ♦♦

Stack trace is below.

    Exception in thread "main" java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:470)
        at java.lang.Integer.parseInt(Integer.java:499)
        at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

mapred.map.tasks = 5

Here is the listing of the temporary countUsers files. The listing is the same as on Apache Hadoop.

    -rw-r--r-- 3 mapr supergroup 7 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00000
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00001
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00002
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00003
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00004

That's right, the code has changed.

0.5:

    int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

trunk:

    numberOfUsers = (int) toUserVector.getCounters().findCounter(ToUserVectorReducer.Counters.USERS).getValue();
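Why the counter-based line does not care about the number of reducers can be sketched without Hadoop. A job-level counter is the sum of the increments made by every task, so empty reducers simply contribute zero and the order of tasks is irrelevant. The class below is a plain-Java stand-in, not Mahout or Hadoop code: `CounterAggregation` and `jobLevelTotal` are hypothetical names, the per-reducer numbers are made up, and it assumes each reducer increments the USERS counter once per user it processes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for how a job-level counter total is formed. */
final class CounterAggregation {

    /** Sums per-task counter values; the result is independent of task order. */
    static long jobLevelTotal(List<Long> perTaskCounts) {
        long total = 0;
        for (long count : perTaskCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative numbers: five reducers, two of which saw no users.
        List<Long> perReducerUsers = Arrays.asList(3L, 0L, 2L, 0L, 1L);
        System.out.println(jobLevelTotal(perReducerUsers));
    }
}
```

Contrast this with reading the first listed part file, where the answer depends on which file happens to come first.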

I haven't tried trunk's RecommenderJob yet. Is there no option other than replacing the Mahout 0.5 jar file?

Thanks.

answered 23 Aug '11, 15:42

yuji
