Hello, I am having problems running Hadoop test jobs on a newly configured system. I am new to Hadoop and this is my first setup using MapR, so it is very likely that I am missing something basic here. My setup:
HOWEVER, when I try to run some of the examples from "hadoop-*-test.jar", the job fails with "ConnectionLost, Reconnecting..." errors (see below).

Any ideas what I am missing here? I am not even sure of the nature of the problem in the first place. Is this related to an incorrect or incomplete configuration of the client? Or is it related to networking (although both cluster and client can ping each other)? Is there some other missing service (ZooKeeper?) that needs to be running? Is there some other info (log files, config files) I should provide?

Any ideas are appreciated.. need help :)
Thanks, Marek

[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
TestDFSIO.0.0.4
12/02/29 17:29:18 INFO fs.TestDFSIO: nrFiles = 10
12/02/29 17:29:18 INFO fs.TestDFSIO: fileSize (MB) = 1.0
12/02/29 17:29:18 INFO fs.TestDFSIO: bufferSize = 1000000
12/02/29 17:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFS
12/02/29 17:29:19 INFO fs.TestDFSIO: creating control file: 1048576 bytes, 10 files
12/02/29 17:29:19 INFO fs.TestDFSIO: created control files for: 10 files
12/02/29 17:29:19 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/02/29 17:29:20 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #1
12/02/29 17:29:22 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/02/29 17:29:22 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #2
...
I suspect that on mapr-desktop the ZooKeeper host is set to "localhost" in all configuration files. When you connect from mapr-client, the client cannot reach ZooKeeper at "localhost" because that address is not local to mapr-client. Please double-check this and, if needed, rerun configure.sh on mapr-desktop with -Z mapr-desktop -C mapr-desktop.

You may also find that mapr-desktop resolves to 127.0.0.1 on mapr-desktop itself. You should see a message about that if so. In that event, the simplest fix is to use the desired IP address instead. Fixing the name resolution is nice, but not strictly necessary.
(29 Feb '12, 19:21)
TedDunning ♦♦
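The loopback concern raised above can be checked on both machines with a short shell sketch. It assumes a Linux host where `getent` is available; the function names `is_loopback`, `resolve_host`, and `check_host` are made up for illustration and are not part of MapR. Only IPv4 addresses are handled.

```shell
# Hypothetical helpers (not MapR tools): report what a host name resolves to.

# Succeed if the given IPv4 address lies in 127.0.0.0/8.
is_loopback() {
  case "$1" in
    127.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Print the first address the system resolver (/etc/hosts, DNS) returns for a name.
resolve_host() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Print a one-line verdict for a host name.
check_host() {
  ip=$(resolve_host "$1")
  if [ -z "$ip" ]; then
    echo "$1: does not resolve"
  elif is_loopback "$ip"; then
    echo "$1 -> $ip (loopback, not reachable from other machines)"
  else
    echo "$1 -> $ip"
  fi
}
```

Running `check_host mapr-desktop` on the server and again on the client shows what each machine resolves the name to. A 127.x.x.x result on the server would match the situation described in the comment above.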
I believe resolving names into IPs works fine on both the server (mapr-desktop=10.10.0.243) and the client (mapr-client=10.10.0.232).
root@mapr-desktop:~# ping mapr-desktop
root@mapr-desktop:~# ping mapr-client
[marek@mapr-client ~]$ ping mapr-client
[marek@mapr-client ~]$ ping mapr-desktop
(01 Mar '12, 13:45)
Marek
Could you paste /opt/mapr/conf/cldb.conf from your "server" (mapr-desktop) here?
(01 Mar '12, 13:51)
yufeldman ♦♦
root@mapr-desktop:~ # cat /opt/mapr/conf/cldb.conf
(01 Mar '12, 14:05)
Marek
Following the advice, I re-ran the configuration of the server (mapr-desktop), but it did NOT help.
root@mapr-desktop:~ # /opt/mapr/server/configure.sh -N MyCluster -C mapr-desktop -Z mapr-desktop
root@mapr-desktop:~ # cat /opt/mapr/logs/configure.log
root@mapr-desktop:~ # cat /opt/mapr/conf/mapr-clusters.conf
root@mapr-desktop:~ # /etc/init.d/mapr-zookeeper status
root@mapr-desktop:~ # cat /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
root@mapr-desktop:~ # cat /opt/mapr/conf/cldb.conf
========
b) "cldb.zookeeper.servers=mapr-desktop:5181" was set in cldb.conf.
c) However, running the test job (on the client) still fails with "ConnectionLost, Reconnecting":
d) After running the test, there are 10 new "control(?) files" created on the server: running the -clean command removes these files:
e) Another "random" observation is that the timestamps on the two systems (client & server) are a few hours off. I can/will fix it, but could it matter in this case?
So, something is working? But the "hadoop-*-test.jar TestDFSIO" test job does not complete. Again, it looks to me like my client CAN "talk" to the server, but the job fails. Please advise.

An update: maybe this will give a clue...
a) modified the /etc/hosts file on the server
(02 Mar '12, 14:04)
Marek
b) now running the same test:
(02 Mar '12, 14:05)
Marek
Please notice that the old error:
changed to
However, the problem moved further:
ANY IDEAS about this?
Please be patient with me :) I'm new to hadoop/MapR
(02 Mar '12, 14:09)
Marek
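The clock difference mentioned in (e) above can be measured before deciding whether it is worth fixing. This is a minimal sketch, not a statement that the skew causes the failure; `skew_seconds` is a made-up helper name, and the usage line assumes passwordless ssh from the client to mapr-desktop, which the thread does not confirm.

```shell
# Hypothetical helper: absolute difference in seconds between two epoch timestamps.
skew_seconds() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=$(( 0 - d ))
  fi
  echo "$d"
}

# Usage on the client (assumes ssh access to the server):
#   skew_seconds "$(date +%s)" "$(ssh mapr-desktop date +%s)"
```

`date +%s` prints seconds since the epoch in UTC, so the result is independent of each machine's time zone setting. A difference of several thousand seconds would confirm that the clocks themselves disagree, not just the displayed time zones.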
The problem here is that it is looking for the CLDB at 127.0.0.1. This is incorrect, and you need to figure out why it is looking there. Can you check /opt/mapr/conf/mapr-clusters.conf and make sure that the correct IP/hostname is listed for the CLDB node? If mapr-clusters.conf uses a hostname, make sure that it resolves correctly. Are you still able to run hadoop fs -ls from the client? Can you successfully submit jobs directly on the server?
(02 Mar '12, 14:46)
steven
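The mapr-clusters.conf check suggested above can be sketched as a small grep wrapper. It assumes each line has the form "clustername host:port ...", as in the "my.cluster.com 127.0.0.1:7222" entry quoted later in this thread; `find_loopback_entries` is a made-up name for illustration.

```shell
# Hypothetical helper: print (with line numbers) every line of a cluster config
# file that lists a CLDB at localhost or at a 127.x.x.x address.
find_loopback_entries() {
  grep -nE '(^|[[:space:]])(localhost|127\.[0-9]+\.[0-9]+\.[0-9]+):[0-9]+' "$1"
}

# Usage (path taken from the comment above):
#   find_loopback_entries /opt/mapr/conf/mapr-clusters.conf
```

No output means no loopback CLDB entries were found. Any printed line is a candidate for removal or correction, since a remote client cannot reach a CLDB advertised at a loopback address.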
Steven, a) Yes, indeed the mapr-clusters.conf on the server did not look ok:
(02 Mar '12, 15:00)
Marek
b) I removed the "my.cluster.com 127.0.0.1:7222" line, and now I get much further:
[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
This looks like some problem with permissions, right?
(02 Mar '12, 15:02)
Marek