What does 100% compatible mean? Can I use my existing packaged Java applications and simply point them at a new file system location, or do I have to change my application dependencies to use a MapR JAR? Per your other answers, it seems like the latter. If so, is the required JAR in a Maven repo?
MapR supports the full API specified in Apache Hadoop 0.20.2. This means you do not need to recompile your existing applications that use Hadoop 0.20.2. The API calls themselves are identical.
However, the actual work that's done when those API calls are issued is different. You must load the MapR Hadoop jar in your application instead of, for instance, the Apache or Cloudera Hadoop jars.
In addition, with Apache and Cloudera Hadoop you would specify something like: hdfs://namenode:8020
On MapR, you should instead refer to the MapR cluster as: maprfs:///
or as: maprfs:///mapr/<clustername>
When you refer to the MapR system using the maprfs:/// URI, it invokes our file system protocol. As long as your application loads our Hadoop library when it launches, the URI will work just fine and you will be able to interact with the MapR cluster using the same API calls as with Apache/Cloudera Hadoop.
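To make the URI change concrete, here is a minimal sketch of rewriting an existing hdfs:// location into the maprfs:/// form using only java.net.URI. The class and method names (MapRUriSketch, toMapRfs, toMapRfsOnCluster) are hypothetical and not part of any MapR or Hadoop API. The example cluster name is made up, and the idea that a path under a named cluster is formed by appending it to maprfs:///mapr/<clustername> is an assumption rather than something stated above.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the MapR or Hadoop API.
class MapRUriSketch {

    // Rewrites an hdfs:// location to the maprfs:/// form, dropping the
    // namenode host and port and keeping only the path.
    public static String toMapRfs(String location) {
        URI uri = URI.create(location);
        if (!"hdfs".equals(uri.getScheme())) {
            return location; // not an HDFS location, leave it untouched
        }
        return "maprfs://" + pathOrRoot(uri);
    }

    // Same rewrite, addressed through maprfs:///mapr/<clustername>.
    // Assumption: the path is appended directly after the cluster name.
    public static String toMapRfsOnCluster(String location, String clusterName) {
        URI uri = URI.create(location);
        if (!"hdfs".equals(uri.getScheme())) {
            return location;
        }
        String path = pathOrRoot(uri);
        return "maprfs:///mapr/" + clusterName + ("/".equals(path) ? "" : path);
    }

    // Returns the URI path, or "/" when the URI has no path component.
    private static String pathOrRoot(URI uri) {
        String path = uri.getPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    public static void main(String[] args) {
        // prints maprfs:///user/alice/input
        System.out.println(toMapRfs("hdfs://namenode:8020/user/alice/input"));
        // prints maprfs:///mapr/my.cluster.com/user/alice/input
        System.out.println(toMapRfsOnCluster("hdfs://namenode:8020/user/alice/input", "my.cluster.com"));
    }
}
```

The resulting string would be passed wherever the application previously passed its hdfs:// location; the Hadoop API calls themselves stay the same.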
Regarding how to get the MapR Hadoop library, you will find our jars under /opt/mapr/hadoop/hadoop-0.20.2/lib
You can get those installed on a client machine through either our "mapr-client" or "mapr-core" package. Installing either one on your client machine gives you the MapR libraries as well as the "hadoop" command and a basic configuration.
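If you launch your application with plain java rather than through the "hadoop" command, you need those jars on the classpath. A rough sketch of building a classpath string from a lib directory follows. The class and method names (MapRClasspathSketch, classpathFrom) are hypothetical, and whether every jar in that directory is actually needed by your application is something you would have to check yourself.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper for illustration; not shipped with MapR.
class MapRClasspathSketch {

    // Joins every *.jar file directly under libDir into a classpath string,
    // using the platform's path separator. Returns "" if libDir is unreadable.
    public static String classpathFrom(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return ""; // directory missing or not readable
        }
        Arrays.sort(jars); // stable ordering makes the result reproducible
        StringBuilder cp = new StringBuilder();
        for (File jar : jars) {
            if (cp.length() > 0) {
                cp.append(File.pathSeparator);
            }
            cp.append(jar.getAbsolutePath());
        }
        return cp.toString();
    }

    public static void main(String[] args) {
        // Directory taken from the answer above; pass another one as args[0].
        String dir = args.length > 0 ? args[0] : "/opt/mapr/hadoop/hadoop-0.20.2/lib";
        System.out.println(classpathFrom(new File(dir)));
    }
}
```

The printed string can be passed to java -cp together with your own application jar.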
Please let me know if that answers your questions.
answered 05 Jul '11, 08:53
I would recommend installing mapr-client on the node from which you submit your job (or using any node where mapr-tasktracker is installed) and submitting the job there in the regular fashion (.../bin/hadoop jar <your_jar> <your job params>). Here is a link to the documentation on mapr-client installation and configuration: http://mapr.com/doc/display/MapR/Setting+Up+the+Client This way everything is preconfigured and your job should just run (unless for some reason you create a jar-with-dependencies that includes the hadoop* jars from the standard distribution). Also, please let us know if you are talking about running Pig and/or Hive jobs.
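If you are not sure whether your job jar was built as a jar-with-dependencies that pulled in Hadoop classes, one way to check is to scan its entries. This is only a sketch: the class and method names (BundledHadoopCheck, bundlesHadoopClasses) are hypothetical, and it assumes that any .class entry under the org/apache/hadoop/ package path means Hadoop classes were bundled.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of MapR or Hadoop.
class BundledHadoopCheck {

    // Returns true if the jar contains at least one class file under
    // org/apache/hadoop/, which suggests Hadoop was bundled into it.
    public static boolean bundlesHadoopClasses(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith("org/apache/hadoop/") && name.endsWith(".class")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BundledHadoopCheck <your_jar>");
            return;
        }
        System.out.println(bundlesHadoopClasses(args[0])
                ? "Hadoop classes are bundled in this jar"
                : "No bundled Hadoop classes found");
    }
}
```

If the check reports bundled classes, rebuilding the jar without the Hadoop dependency would let the job pick up the libraries installed by mapr-client instead.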
answered 05 Jul '11, 08:55
Does anybody know where the Maven repository for the MapR jar files is? It would be nice if MapR could publish their custom jar files to the central Maven repository (http://mvnrepository.com/) so that everybody can share them. Otherwise, every developer using the MapR distribution has to set up their own private Maven repository.
answered 21 Jul '11, 16:59