|
I get a NumberFormatException when I run the Mahout job org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on MapR, using mahout-core-0.5-job.jar. With the default value of "mapred.reduce.tasks" the job runs fine, but when I set it to more than 1 I get the exception. On Apache Hadoop the same job produces a result file. Is there a solution?
|
What happened here is that the values deduced from the files were not deterministic except in the case of a single reducer. The trunk version avoids this issue entirely by reading counters from the Hadoop job, which is a much better approach. Mahout is still not entirely stable, even in the more stable recommender code, but I think that trunk is significantly better than 0.5. In fact, we have started fixing the final bugs for the 0.6 release, which should happen within a month or so. I would recommend moving to the trunk version. You should be able to simply substitute the Mahout jar file.
I understand the details now. Thank you.
(23 Aug '11, 20:39)
yuji
|
|
The ordering of the result files should not matter to the recommender as long as the names are correct. Can you paste in a stack trace?
|
There were some race conditions in Mahout 0.5 that were not accounted for properly, but I don't remember anything that sounded like what you describe. Is it possible to try your test on the trunk version of Mahout instead of 0.5? I note that readIntFromFile doesn't even exist in TasteHadoopUtils any more.
|
The stack trace is below.

Exception in thread "main" java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

mapred.map.tasks = 5

Here is the listing of the temporary countUsers files. The listing is the same as on Apache Hadoop.

    -rw-r--r-- 3 mapr supergroup 7 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00000
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00001
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00002
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00003
    -rw-r--r-- 3 mapr supergroup 0 2011-08-24 07:22 /mahout/user_based/tmp/countUsers/part-r-00004

That's right, the code has changed.

0.5:
    int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);

trunk:
    numberOfUsers = (int) toUserVector.getCounters().findCounter(ToUserVectorReducer.Counters.USERS).getValue();

I have not tried trunk's RecommenderJob yet. Is there no option other than replacing the 0.5 Mahout jar file? Thanks.
In the readIntFromFile(Configuration conf, Path outputDir) method of the org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils class, fs.listStatus(outputDir, PathFilters.partFilter())[0].getPath() returns maprfs://host:port/tmpdir/countUsers/part-r-00004. part-r-00004 is empty; part-r-00000 is the correct file. I guess Apache Hadoop returns part-r-00000.
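
For anyone who has to stay on 0.5, the idea behind a workaround can be sketched as follows: do not trust the order of the directory listing, sort the part files by name, and read the first one that is not empty. This is only a rough illustration, not Mahout code. PartFileReader and readIntFromPartFiles are hypothetical names, the sketch works on a local directory through java.nio.file instead of the Hadoop FileSystem API, and it assumes the count is stored as plain text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Mahout. It illustrates the idea on a
// local directory; a real fix would use the Hadoop FileSystem API instead.
final class PartFileReader {

    private PartFileReader() {}

    // Returns the int stored in the first non-empty part-* file,
    // scanning the files in name order.
    static int readIntFromPartFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Listing order depends on the file system, so sort explicitly.
        Collections.sort(parts);
        for (Path part : parts) {
            // Assumption: the value is written as plain text.
            String content = new String(Files.readAllBytes(part), StandardCharsets.UTF_8).trim();
            if (!content.isEmpty()) {
                return Integer.parseInt(content);
            }
        }
        throw new IOException("No non-empty part file found in " + outputDir);
    }
}
```

With the five countUsers files listed above, this would skip the empty part-r-00001 to part-r-00004 and parse part-r-00000, whatever order the file system returns them in. Reading the job counter, as trunk does, avoids the temporary file altogether and is probably still the better choice.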