Hello,

I am having a problem running Hadoop test jobs on a newly configured system. I am new to Hadoop and this is my first setup using MapR, so it is very likely that I am missing something basic here.

My setup:

  1. I am using the "MapR Virtual Machine" (M3 Demo VM) as the test cluster. I can access the cluster via https://mapr-desktop:8443 just fine, all services seem to be running OK, etc.

  2. I created another VMware machine to be used as a client, running CentOS 6.2 and Java 1.7.0. I installed "mapr-client" from the repo and ran configure.sh (/opt/mapr/server/configure.sh -N MyCluster -c -C mapr-desktop:7222).

  3. Running really basic commands on the client (such as "hadoop fs -ls") seems to work. To prepare for the tests I created "/myvolume" on MapR-FS and copied some data files there ("hadoop fs -copyFromLocal bigfile.txt /myvolume/in"). So far everything was OK.

HOWEVER, when I try to run some of the examples from "hadoop-*-test.jar", the job fails with "ConnectionLost, Reconnecting..." errors (see below).

Any ideas what I am missing here? I am not even sure of the nature of the problem in the first place. Is this related to an incorrect or incomplete configuration of the client? Or is it related to networking (although the cluster and the client can ping each other)? Is there some other missing service (ZooKeeper?) that needs to be running?

Is there some other info (log files, config files) I should provide? Any ideas are appreciated. I need help :)

Thanks, Marek


[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
TestDFSIO.0.0.4
12/02/29 17:29:18 INFO fs.TestDFSIO: nrFiles = 10
12/02/29 17:29:18 INFO fs.TestDFSIO: fileSize (MB) = 1.0
12/02/29 17:29:18 INFO fs.TestDFSIO: bufferSize = 1000000
12/02/29 17:29:19 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFS
12/02/29 17:29:19 INFO fs.TestDFSIO: creating control file: 1048576 bytes, 10 files
12/02/29 17:29:19 INFO fs.TestDFSIO: created control files for: 10 files
12/02/29 17:29:19 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/02/29 17:29:20 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #1
12/02/29 17:29:22 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/02/29 17:29:22 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #2 ...

asked 29 Feb '12, 14:14

Marek

edited 01 Mar '12, 13:49


I have a feeling that on mapr-desktop the ZooKeeper host is set to "localhost" in all configurations. When mapr-client then tries to connect to ZooKeeper on "localhost", it cannot, because ZooKeeper is not local to mapr-client. Please double-check this and, if needed, rerun configure.sh on mapr-desktop with -Z mapr-desktop -C mapr-desktop.
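
One way to double-check the ZooKeeper setting on the server is to scan cldb.conf for a loopback address. This is only a sketch: the function name loopback_zk_servers is made up for illustration, and it assumes that cldb.zookeeper.servers holds a comma-separated list of host:port entries.

```shell
# Hypothetical helper (not part of MapR): reads cldb.conf-style text on stdin
# and prints every ZooKeeper entry that points at the local machine.
# Assumption: cldb.zookeeper.servers is a comma-separated host:port list.
loopback_zk_servers() {
  awk -F= '
    $1 == "cldb.zookeeper.servers" {
      n = split($2, servers, ",")
      for (i = 1; i <= n; i++)
        if (servers[i] ~ /^(localhost|127\.)/) print servers[i]
    }'
}
```

It could be run as `loopback_zk_servers < /opt/mapr/conf/cldb.conf`. Any output would suggest that the server is configured with a ZooKeeper address that a remote client cannot use.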

answered 29 Feb '12, 19:16

yufeldman ♦♦

You may also find that mapr-desktop resolves to 127.0.0.1 on mapr-desktop. You should see a message about that if so.

In that event, the simplest fix is to use the desired IP address instead. Fixing the resolution of the name is nice, but not entirely necessary.

(29 Feb '12, 19:21) TedDunning ♦♦
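
A quick way to test the loopback-resolution theory is to look for the host name on a 127.x.x.x line of /etc/hosts. The helper below is a hedged sketch, not a MapR tool; the name loopback_addrs_for is hypothetical, and it only inspects hosts-file text, not DNS.

```shell
# Hypothetical helper: reads hosts-file-style text on stdin and prints each
# loopback address (127.x.x.x) that is assigned to the host name given as $1.
loopback_addrs_for() {
  awk -v host="$1" '
    $1 ~ /^#/ { next }                 # skip whole-line comments
    $1 ~ /^127\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break           # ignore a trailing comment
        if ($i == host) print $1
      }
    }'
}
```

For example, `loopback_addrs_for mapr-desktop < /etc/hosts` on the server prints nothing when the name is only mapped to a routable address. `getent hosts mapr-desktop` shows what the system resolver actually returns.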

I believe resolving names into IPs works fine on both server (mapr-desktop=10.10.0.243) and client (mapr-client=10.10.0.232).


root@mapr-desktop:~# ping mapr-desktop
PING mapr-desktop (10.10.0.243) 56(84) bytes of data.

root@mapr-desktop:~# ping mapr-client
PING mapr-client.internal.xtremedata.com (10.10.0.232) 56(84) bytes of data.


[marek@mapr-client ~]$ ping mapr-client
PING mapr-client.internal.xtremedata.com (10.10.0.232) 56(84) bytes of data.

[marek@mapr-client ~]$ ping mapr-desktop
PING mapr-desktop.internal.xtremedata.com (10.10.0.243) 56(84) bytes of data.

(01 Mar '12, 13:45) Marek

Could you paste /opt/mapr/conf/cldb.conf from your "server" (mapr-desktop) here?

(01 Mar '12, 13:51) yufeldman ♦♦

root@mapr-desktop:~ # cat /opt/mapr/conf/cldb.conf
...
# CLDB before creating Root Volume
cldb.min.fileservers=1
# CLDB listening port
cldb.port=7222
# Number of worker threads
cldb.numthreads=10
# CLDB webport
cldb.web.port=7221
# Number of RW containers in cache
#cldb.containers.cache.entries=1000000
# Topology script to be used to determine
# Rack topology of node
# Script should take an IP address as input and print rack path
# on STDOUT. eg
# $>/home/mapr/topo.pl 10.10.10.10
# $>/mapr-rack1
# $>/home/mapr/topo.pl 10.10.10.20
# $>/mapr-rack2
#net.topology.script.file.name=/home/mapr/topo.pl
# ZooKeeper address
cldb.zookeeper.servers=mapr-desktop:5181
# Hadoop metrics jar version
hadoop.version=0.20.2
# CLDB JMX remote port
cldb.jmxremote.port=7220
cldb.zk.timeout=300

(01 Mar '12, 14:05) Marek

Following the advice, I re-ran the configuration on the server (mapr-desktop), but it did NOT help.
See more details below...

root@mapr-desktop:~ # /opt/mapr/server/configure.sh -N MyCluster -C mapr-desktop -Z mapr-desktop
Node setup configuration: cldb fileserver hbinternal hbmaster hbregionserver hive jobtracker nfs pig tasktracker webserver zookeeper
Log can be found at: /opt/mapr/logs/configure.log


root@mapr-desktop:~ # cat /opt/mapr/logs/configure.log
2012-03-01 13:16:18.857 mapr-desktop configure.sh(6077) Install main:670 Using 7222 port for CLDB mapr-desktop
2012-03-01 13:16:18.897 mapr-desktop configure.sh(6077) Install main:778 Using 5181 port for ZooKeeper mapr-desktop
2012-03-01 13:16:18.922 mapr-desktop configure.sh(6077) Install main:849
2012-03-01 13:16:18.928 mapr-desktop configure.sh(6077) Install main:850 Node install STARTED
2012-03-01 13:16:18.934 mapr-desktop configure.sh(6077) Install main:851 -----------------------
2012-03-01 13:16:18.944 mapr-desktop configure.sh(6077) Install ConstructMapRClustersConfFile:195 Contructing ClusterConfFile: cldb node list: 10.10.0.243:7222
2012-03-01 13:16:18.953 mapr-desktop configure.sh(6077) Install ConstructMapRClustersConfFile:207 Contructing ClusterConfFile: Done
2012-03-01 13:16:18.964 mapr-desktop configure.sh(6077) Install UpdateFileClientConfig:295 Updating file client config
2012-03-01 13:16:19.10 mapr-desktop configure.sh(6077) Install ConfigureJTRole:331 Configuring Hadoop
2012-03-01 13:16:19.16 mapr-desktop configure.sh(6077) Install ConfigureJTRole:334 Updating JT config
2012-03-01 13:16:19.22 mapr-desktop configure.sh(6077) Install UpdateFileClientConfig:295 Updating file client config
2012-03-01 13:16:19.35 mapr-desktop configure.sh(6077) Install ConfigureTTRole:344 Configuring TaskTracker role
2012-03-01 13:16:19.42 mapr-desktop configure.sh(6077) Install UpdateFileClientConfig:295 Updating file client config
2012-03-01 13:16:19.54 mapr-desktop configure.sh(6077) Install ConfigureHBMRole:395 Configuring Hbase Master Role
2012-03-01 13:16:19.60 mapr-desktop configure.sh(6077) Install ConfigureHBase:353 Configuring Hbase
2012-03-01 13:16:19.84 mapr-desktop configure.sh(6077) Install ConfigureHBRRole:424 Configuring Hbase RS Role
2012-03-01 13:16:19.90 mapr-desktop configure.sh(6077) Install ConfigureHBase:353 Configuring Hbase
2012-03-01 13:16:19.110 mapr-desktop configure.sh(6077) Install ConfigureHBIRole:409 Configuring Hbase Client Role
2012-03-01 13:16:19.116 mapr-desktop configure.sh(6077) Install ConfigureHBase:353 Configuring Hbase
2012-03-01 13:16:19.138 mapr-desktop configure.sh(6077) Install UpdateWardenConfig:436 Updating Warden config
2012-03-01 13:16:19.170 mapr-desktop configure.sh(6077) Install main:860
2012-03-01 13:16:19.176 mapr-desktop configure.sh(6077) Install main:861 Node install FINISHED
2012-03-01 13:16:19.182 mapr-desktop configure.sh(6077) Install main:862 -----------------------


root@mapr-desktop:~ # cat /opt/mapr/conf/mapr-clusters.conf
my.cluster.com 127.0.0.1:7222
MyCluster 10.10.0.243:7222


root@mapr-desktop:~ # /etc/init.d/mapr-zookeeper status
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
zookeeper running as process 1227.


root@mapr-desktop:~ # cat /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=20
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10
# the directory where the snapshot is stored.
dataDir=/opt/mapr/zkdata
# the port at which the clients will connect
clientPort=5181
# max number of client connections
maxClientCnxns=100
maxSessionTimeout=300000


root@mapr-desktop:~ # cat /opt/mapr/conf/cldb.conf
#
# CLDB Config file.
# Properties defined in this file are loaded during startup
# and are valid for only CLDB which loaded the config.
# These parameters are not persisted anywhere else.
#
# Wait until minimum number of fileserver register with
# CLDB before creating Root Volume
cldb.min.fileservers=1
# CLDB listening port
cldb.port=7222
# Number of worker threads
cldb.numthreads=10
# CLDB webport
cldb.web.port=7221
# Number of RW containers in cache
#cldb.containers.cache.entries=1000000
# Topology script to be used to determine
# Rack topology of node
# Script should take an IP address as input and print rack path
# on STDOUT. eg
# $>/home/mapr/topo.pl 10.10.10.10
# $>/mapr-rack1
# $>/home/mapr/topo.pl 10.10.10.20
# $>/mapr-rack2
#net.topology.script.file.name=/home/mapr/topo.pl
# ZooKeeper address
cldb.zookeeper.servers=mapr-desktop:5181
# Hadoop metrics jar version
hadoop.version=0.20.2
# CLDB JMX remote port
cldb.jmxremote.port=7220
cldb.zk.timeout=300
root@mapr-desktop:~ #

========
Here are a few things I noticed:
a) I used "MyCluster" name during configure.
"MyCluster 10.10.0.243:7222" was ADDED into mapr-clusters.conf
Old statement "my.cluster.com 127.0.0.1:7222" is still there.

b) "cldb.zookeeper.servers=mapr-desktop:5181" was set in the cldb.conf
It used to be "cldb.zookeeper.servers=localhost:5181"

c) However, running the test job (on the client) still fails with "ConnectionLost, Reconnecting":
[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
TestDFSIO.0.0.4
12/03/01 16:28:16 INFO fs.TestDFSIO: nrFiles = 10
12/03/01 16:28:16 INFO fs.TestDFSIO: fileSize (MB) = 1.0
12/03/01 16:28:16 INFO fs.TestDFSIO: bufferSize = 1000000
12/03/01 16:28:17 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
12/03/01 16:28:17 INFO fs.TestDFSIO: creating control file: 1048576 bytes, 10 files
12/03/01 16:28:17 INFO fs.TestDFSIO: created control files for: 10 files
12/03/01 16:28:17 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/03/01 16:28:19 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #1
12/03/01 16:28:21 INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting... Current ZooKeeper Server: localhost:5181
12/03/01 16:28:21 ERROR fs.MapRClient: Retrying...Fetching new Zookeeper locations from CLDB. Attempt #2
...

d) After running the test, there are 10 new "control(?) files" created on the server:
root@mapr-desktop:~ # ls /mapr/my.cluster.com/benchmarks/TestDFSIO/io_control/ -a
. in_file_test_io_0 in_file_test_io_2 in_file_test_io_4 in_file_test_io_6 in_file_test_io_8
.. in_file_test_io_1 in_file_test_io_3 in_file_test_io_5 in_file_test_io_7 in_file_test_io_9

Running the -clean command removes these files:
[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean
TestDFSIO.0.0.4
12/03/01 16:50:58 INFO fs.TestDFSIO: nrFiles = 1
12/03/01 16:50:58 INFO fs.TestDFSIO: fileSize (MB) = 1.0
12/03/01 16:50:58 INFO fs.TestDFSIO: bufferSize = 1000000
12/03/01 16:50:58 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
12/03/01 16:50:58 INFO fs.TestDFSIO: Cleaning up test files

e) Another "random" observation is that the clocks on the two systems (client & server) are a few hours apart. I can/will fix it, but could it matter in this case?


So, something is working? But the "hadoop-*-test.jar TestDFSIO" test job does not complete.
What is the meaning of "INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting..."?
Is this a problem with the JobTracker?

Again, it looks to me like my client CAN "talk" to the server, but the job fails.
I wonder if this is a case of a "misconfigured" server (but I am using the "stock" M3 demo VM)?
Or is this a problem with my client? In that case, is there an additional configuration step that I missed?

Please advise.
Marek

answered 01 Mar '12, 14:02

Marek

An update: maybe this will give a clue...

a) modified /etc/hosts file on the server
root@mapr-desktop:~# cat /etc/hosts
10.10.0.243 mapr-desktop
127.0.0.1 localhost.localdomain localhost

(02 Mar '12, 14:04) Marek

b) now running the same test:
[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
TestDFSIO.0.0.4
(...) INFO fs.TestDFSIO: nrFiles = 10
(...) INFO fs.TestDFSIO: fileSize (MB) = 1.0
(...) INFO fs.TestDFSIO: bufferSize = 1000000
(...) INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
(...) INFO fs.TestDFSIO: creating control file: 1048576 bytes, 10 files
(...) INFO fs.TestDFSIO: created control files for: 10 files
(...) INFO fs.JobTrackerWatcher: Current running JobTracker is: mapr-desktop/10.10.0.243:9001
(...),3209 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1047 Thread: 140029808441088 Lookup of volume mapr.cluster.root failed, error Connection reset by peer(104), CLDB: 127.0.0.1:7222 trying another CLDB

(02 Mar '12, 14:05) Marek

Please notice that the old error:
"...INFO fs.JobTrackerWatcher: findJobTrackerAddr: ConnectionLost, Reconnecting..."

changed to
"...INFO fs.JobTrackerWatcher: Current running JobTracker is: mapr-desktop/10.10.0.243:9001..."

However, the problem moved further:
"...ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1047 Thread: 140029808441088 Lookup of volume mapr.cluster.root failed, error Connection reset by peer(104), CLDB: 127.0.0.1:7222 trying another CLDB..."

ANY IDEAS about this? Please be patient with me :) I'm new to hadoop/MapR
Thanks,
Marek

(02 Mar '12, 14:09) Marek

The problem here is that it is looking for the CLDB at 127.0.0.1. This is incorrect, and you need to figure out why it is looking there. Can you check /opt/mapr/conf/mapr-clusters.conf and make sure that the correct IP/hostname is listed for the CLDB node? If mapr-clusters.conf uses a hostname, make sure that it resolves correctly.

Are you still able to run hadoop fs -ls from the client? Can you successfully submit jobs directly on the server?

(02 Mar '12, 14:46) steven
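
The check on mapr-clusters.conf can be scripted roughly as follows. This is a sketch under assumptions: the helper name loopback_cldb_entries is hypothetical, and the file is assumed to hold one cluster per line, the cluster name followed by one or more CLDB host:port entries, as in the listings in this thread.

```shell
# Hypothetical helper: reads mapr-clusters.conf-style text on stdin and prints
# "cluster-name cldb-address" for every CLDB entry that points at loopback.
# Assumption: each line is "<cluster name> <host:port> [<host:port> ...]".
loopback_cldb_entries() {
  awk '
    $1 ~ /^#/ { next }                 # skip comment lines, if any
    {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(localhost|127\.)/) print $1, $i
    }'
}
```

Running `loopback_cldb_entries < /opt/mapr/conf/mapr-clusters.conf` on both the server and the client lists any cluster whose CLDB is given as a local address, which is what the "CLDB: 127.0.0.1:7222" error points to.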

Steven,

a) Yes, indeed the mapr-clusters.conf on the server did not look ok:
root@mapr-desktop:~# cat /opt/mapr/conf/mapr-clusters.conf
my.cluster.com 127.0.0.1:7222
MyCluster 10.10.0.243:7222

(02 Mar '12, 15:00) Marek

b) I removed the "my.cluster.com 127.0.0.1:7222" line, and now I get much further:

[marek@mapr-client ~]$ hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10
TestDFSIO.0.0.4
...
(...) INFO fs.JobTrackerWatcher: Current running JobTracker is: mapr-desktop/10.10.0.243:9001
(...) INFO mapred.FileInputFormat: Total input paths to process : 10
(...) INFO mapred.JobClient: Running job: job_201203021455_0001
(...) INFO mapred.JobClient: map 0% reduce 0%
(...) INFO mapred.JobClient: Job complete: job_201203021455_0001
(...) INFO mapred.JobClient: Counters: 0
(...) INFO mapred.JobClient: Job Failed: Inline Setup for Job job_201203021455_0001 failed:
java.io.IOException: Could not set owner/group marek/null for path maprfs://10.10.0.243:7222/benchmarks/TestDFSIO/io_write

This looks like some problem with permissions, right?

(02 Mar '12, 15:02) Marek