I'm trying to install MapR on a cluster running Ubuntu 10.04 with a custom kernel (3.3-rc3). There is only one master node (running the single zookeeper and the single cldb for the entire cluster); all the other nodes have only the mfs and tasktracker services installed.

Each cluster machine has two network interfaces, and MAPR_SUBNETS is configured to use only one of them. The hostname-to-IP mappings in /etc/hosts are also set correctly.
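A quick way to sanity-check this kind of two-interface setup is to confirm which of a node's addresses fall inside the subnet given to MAPR_SUBNETS. The sketch below uses only Python's standard ipaddress module. The two addresses are the ones that appear in the cldb log quoted in this post; the subnet value 10.11.91.0/24 and the helper name are assumptions for illustration, not the actual configuration.

```python
import ipaddress

def addresses_in_subnets(addresses, subnets):
    """Return the addresses that fall inside at least one of the given subnets."""
    networks = [ipaddress.ip_network(s, strict=False) for s in subnets]
    return [a for a in addresses
            if any(ipaddress.ip_address(a) in n for n in networks)]

# Hypothetical subnet; in practice this would be whatever MAPR_SUBNETS is set to.
assumed_subnets = ["10.11.91.0/24"]
# The two interface addresses one node reports in the cldb log.
node_addresses = ["10.11.91.208", "172.16.200.2"]

selected = addresses_in_subnets(node_addresses, assumed_subnets)
print(selected)
```

If the list does not contain exactly the address of the intended interface, the subnet value is worth rechecking before looking anywhere else.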

After following the instructions for the M3 install in the documentation, when I start warden on the first node (the master node), I get the following error:


2012-03-14 09:48:31,323 INFO com.mapr.warden.service.baseservice.Service [main-EventThread]: Thread: 41, NodeCreated: /services/hoststats/tmel-bd-n11.tmel.vmem.int

2012-03-14 09:48:31,350 INFO com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Need delayed alarm clearing for: NODE_ALARM_SERVICE_HOSTSTATS_DOWN

2012-03-14 09:48:31,350 INFO com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats. Event state: SyncConnected. Event type: NodeChildrenChanged

2012-03-14 09:48:31,351 ERROR com.mapr.warden.service.CLDBService isToStartNow [cldb_monitor]: Exception while trying to get children of: /datacenter/controlnodes/cldb/active/CLDBNodes
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /datacenter/controlnodes/cldb/active/CLDBNodes
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
    at com.mapr.warden.service.baseservice.common.ZKUtilsLocking.getZkNodeChildren(ZKUtilsLocking.java:89)
    at com.mapr.warden.service.CLDBService.isToStartNow(CLDBService.java:299)
    at com.mapr.warden.service.baseservice.Service$ServiceMonitorRun.run(Service.java:1559)
    at java.lang.Thread.run(Thread.java:662)

2012-03-14 09:48:31,353 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [nice, -n, -10, /etc/init.d/mapr-cldb, start], Directory: /etc/init.d/

2012-03-14 09:48:31,379 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [nice, -n, -10, /etc/init.d/mapr-hoststats, start], Directory: /etc/init.d/

2012-03-14 09:48:31,434 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]:

2012-03-14 09:48:32,443 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Starting CLDB, logging to /opt/mapr/logs/cldb.log


The zookeeper log has no errors, but the INFO messages look like this:

2012-03-14 09:48:34,064 - INFO [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x17 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb

2012-03-14 09:48:34,073 - INFO [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x18 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb/epoch Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb/epoch

2012-03-14 09:48:34,081 - INFO [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x19 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb/epoch/1 Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb/epoch/1


The cldb log has the following (trimmed) output:

2012-03-14 10:06:21,121 INFO com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: FSRegister: Request FSID: 2836519655559445901 FSNetworkLocation: / FSHost:Port: 10.11.91.208:5660-172.16.200.2:5660- FSHostName: tmel-bd-n12.tmel.vmem.int StoragePools f282cb583e097e88004f5ff2f804576c-7f6bacbec8c602d3004f5ff2f909a659-04bceaaae9520992004f5ff2f602a258- Capacity: 6406795 Available: 6404544 Used: 2250 Role: 0 isDCA: false Received registration request

2012-03-14 10:06:21,121 INFO com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: FSRegister: CLDB waiting for local mfs to register and become master, requesting fileserver 10.11.91.208:5660-172.16.200.2:5660- FSID: 2836519655559445901 to try again by returning ESRCH

2012-03-14 10:06:21,910 INFO com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 10.11.91.207:51687 Generating reply with status: 3

2012-03-14 10:06:21,929 INFO com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 40 from 10.11.91.207:51687 Generating reply with status: 3

2012-03-14 10:06:25,674 INFO com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 10.11.91.207:53592 Generating reply with status: 3

asked 14 Mar '12, 10:10

Kshitij S


Yes, it was an mfs/disk issue, not a warden issue. Once I fixed the disks, all services came back up without a problem.

I initially thought of it as a warden issue because, as you pointed out, I saw the error in the warden log and it looked as if warden itself was not initializing properly.

answered 14 Mar '12, 12:24

Kshitij S

quick update

I just checked the mfs service, and its log shows errors about a block device being busy. The disksetup step did not complain about it, though. I'm trying to fix this now, but I don't understand why zookeeper and warden would report errors, since it seems mfs is started after warden.


2012-03-14 09:48:22,7101 ERROR Global fs/server/mapserver/loadsp.cc:177 clnt x.x.0.0:0 req 0 seq 0 InitStoargaePools from disktab

2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/iomgr.cc:2253 clnt x.x.0.0:0 req 0 seq 0 skip comments line 1

2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/iomgr.cc:2249 clnt x.x.0.0:0 req 0 seq 0 Skip line in disktab, incorrect format, missing fileds

2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/iomgr.cc:2268 clnt x.x.0.0:0 req 0 seq 0 found 7 disks in disktab

2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/lun.cc:769 clnt x.x.0.0:0 req 0 seq 0 Loading disk:/dev/sdb

2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/lun.cc:780 clnt x.x.0.0:0 req 0 seq 0 /dev/sdb LoadDiskPrivateData 0x1223880

2012-03-14 09:48:22,7102 ERROR IOMgr fs/server/io/lun.cc:576 clnt x.x.0.0:0 req 0 seq 0 target device open(/dev/sdb) error Device or resource busy failed(errno 16)

2012-03-14 09:48:22,7102 ERROR IOMgr fs/server/io/lun.cc:784 clnt x.x.0.0:0 req 0 seq 0 OnlineDisk(/dev/sdb) failed Device or resource busy.(16)

2012-03-14 09:48:22,7102 ERROR Global fs/server/mapserver/loadsp.cc:80 clnt x.x.0.0:0 req 0 seq 0 Onlinedisk(/dev/sdb) failed(16)

2012-03-14 09:48:22,7102 INFO IOMgr fs/server/io/iomgr.cc:843 clnt x.x.0.0:0 req 0 seq 0 execv of /opt/mapr/server/handle_disk_failure.sh with option : /opt/mapr/server/handle_disk_failure.sh
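To pull the failing devices out of a longer mfs log, the lines above can be scanned for the `open(...) error ... (errno N)` pattern. The following is a minimal sketch that assumes the log format shown in this post; the regular expression and the function name are my own, not part of MapR. It maps the numeric errno to its symbolic name with Python's standard errno module.

```python
import errno
import re

# Matches mfs log lines such as:
#   ... ERROR IOMgr ... target device open(/dev/sdb) error ... failed(errno 16)
OPEN_ERROR = re.compile(
    r"ERROR .*open\((?P<dev>/dev/[^)]+)\) error .*\(errno (?P<code>\d+)\)"
)

def failed_disks(log_text):
    """Return {device: errno symbol} for every device whose open() failed."""
    failures = {}
    for line in log_text.splitlines():
        m = OPEN_ERROR.search(line)
        if m:
            code = int(m.group("code"))
            # Fall back to the raw number if the platform has no name for it.
            failures[m.group("dev")] = errno.errorcode.get(code, str(code))
    return failures

# Two lines copied from the log above: one INFO line and the open() failure.
sample = (
    "2012-03-14 09:48:22,7101 INFO IOMgr fs/server/io/lun.cc:769 clnt x.x.0.0:0 "
    "req 0 seq 0 Loading disk:/dev/sdb\n"
    "2012-03-14 09:48:22,7102 ERROR IOMgr fs/server/io/lun.cc:576 clnt x.x.0.0:0 "
    "req 0 seq 0 target device open(/dev/sdb) error Device or resource busy "
    "failed(errno 16)\n"
)

print(failed_disks(sample))
```

Errno 16 is EBUSY on Linux, which for a block device usually means something else already holds it, for example a mounted filesystem or another storage layer. The log alone does not say what was holding /dev/sdb here.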

answered 14 Mar '12, 10:18

Kshitij S

Why did you decide that warden had issues starting up? I think it did start, but because of the issues in mfs (as you said), and subsequently in CLDB, no other services were starting up. If you are referring to "ERROR com.mapr.warden.service.CLDBService isToStartNow": it is raised because the znode for CLDBMaster is not there at that moment, which in turn is because CLDB could not fully initialize. CLDB depends on mfs, and mfs is failing with disk device errors.
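The chain of cause and effect described here can be written down as a small dependency check. This is only a sketch of the reasoning, not of how warden is implemented: the dependency table is an assumption based on this explanation (CLDB needs the local mfs, and the remaining services need CLDB), and the function name is hypothetical.

```python
# Assumed startup dependencies, taken only from the explanation above.
DEPENDS_ON = {
    "mfs": [],
    "cldb": ["mfs"],
    "tasktracker": ["cldb"],
}

def blocked_services(failed, depends_on):
    """Return the services that cannot start because a direct or
    transitive dependency is in the failed set."""
    blocked = set()
    changed = True
    while changed:  # repeat until no new service becomes blocked
        changed = False
        for service, deps in depends_on.items():
            if service in failed or service in blocked:
                continue
            if any(d in failed or d in blocked for d in deps):
                blocked.add(service)
                changed = True
    return sorted(blocked)

# mfs fails on the busy disk, so everything downstream of it is held back.
print(blocked_services({"mfs"}, DEPENDS_ON))
```

Read this way, the error in the warden log is a symptom of the first failed link in the chain, so the mfs log is the place to start.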

answered 14 Mar '12, 10:55

yufeldman ♦♦
