No matter how thoroughly I remove MapR code and configuration from all the systems involved, I get this error:

Input nodes do not match any of the cluster nodes

It appears when I interact with any node other than the CLDB node.

I've run zkdatacleaner.sh on all nodes with ZooKeeper present, deleted the entire MapR directory tree, deleted the directories ZooKeeper creates in /tmp and /var, and created new cluster keys on the website. The only things I haven't done are delete any Java caches (only because I can't find any) and reinstall the OS (which I don't think will solve the problem).

With that said, I have two questions:

1) Is there a document or set of documents with an exhaustive list of which MapR files are responsible for which MapR components? E.g. /opt/mapr/conf/warden.conf = CLDB startup, admin UI startup, etc. I'd like some sort of map of where to look when things go wrong, rather than combing through every file in every directory every time. If this can be found on the Apache Hadoop site, that is fine as well, but I've come up empty-handed on a number of searches on this topic.

2) I've followed the documentation exactly on how to add nodes to a cluster and how to clean ZooKeeper data, and I still have not been able to fix the above error (Input nodes do not match any of the cluster nodes). Some snippets from the logs:

Control node:

root@MAPR1:/opt/mapr/logs# jps
3558 WardenMain
4465 CommandServer
14689 Jps
4477 JobTracker
572 CommandServer
3911 CLDB
3503 QuorumPeerMain
root@MAPR1:/opt/mapr/logs#

CLDB.LOG

2011-09-23 08:29:57,571 INFO  com.mapr.fs.cldb.ActiveContainersMap [pool-1-thread-262]: BatchUpdate containerUpdate CID: 2050 Container ID:2050 Master:192.168.106.67:5660--3-VALID(1733412610376772269) Servers:  192.168.106.67:5660--3-VALID(1733412610376772269) 192.168.106.65:5660--3-VALID(7735693932272923427) 192.168.106.66:5660--3-VALID(7451503201337471211) Inactive Servers:  Unused Servers:  Latest epoch:3 SizeMB:0
2011-09-23 08:29:57,635 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [pool-1-thread-262]: Storing KvStoreContainerInfo to ZooKeeper  Container ID:1 VolumeId:1 Master:192.168.106.65:5660--5-VALID Servers:  192.168.106.65:5660--5-VALID 192.168.106.66:5660--5-VALID 192.168.106.67:5660--5-VALID Inactive Servers:  Unused Servers:  Latest epoch:5
2011-09-23 08:30:32,868 WARN  com.mapr.fs.cldb.alarms.Alarms [ReplicationManagerThread]: VOLUME_ALARM_DATA_UNDER_REPLICATED cleared, for volume mapr.cldb.internal
2011-09-23 08:41:55,833 ERROR com.mapr.fs.cldb.CLDBServer [pool-1-thread-265]: VolumeLookup: VolName: mapr.MAPR2.lab.net.local.mapred Volume not found
root@MAPR1:/opt/mapr/logs#

Please note that 192.168.106.66 and .67 no longer have CLDB or ZooKeeper installed. I've uninstalled them from those nodes, removed all directories, etc. They are simply processing nodes now.

Still from control node: WARDEN.LOG

Header: hostName: MAPR1.lab.net, Time Zone: Eastern Standard Time, processName: warden, processId: 3528
2011-09-23 08:27:51,084 INFO  com.mapr.warden.service.baseservice.Service [Thread-8-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDeleted
2011-09-23 08:27:51,092 INFO  com.mapr.warden.service.baseservice.Service [Thread-8-EventThread]: MasterNode is:/services/nfs/master am I master: false
2011-09-23 08:27:51,098 INFO  com.mapr.warden.service.baseservice.Service [Thread-8-EventThread]: Thread: 110, MasterIP: MAPR7.lab.net
up
2011-09-23 08:27:51,099 INFO  com.mapr.warden.service.baseservice.Service [Thread-8-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDataChanged
2011-09-23 08:28:18,002 INFO  com.mapr.warden.WardenServer [main-EventThread]: Process path: /servers. Event state: SyncConnected. Event type: NodeChildrenChanged
2011-09-23 08:29:33,105 INFO  com.mapr.warden.WardenServer [main-EventThread]: Process path: /servers. Event state: SyncConnected. Event type: NodeChildrenChanged
2011-09-23 08:29:33,280 INFO  com.mapr.warden.service.baseservice.Service [Thread-7-EventThread]: Process path: /services/tasktracker. Event state: SyncConnected. Event type: NodeChildrenChanged

MFS.LOG

2011-09-23 08:29:55,6843 INFO Replication fs/server/replication/containerresyncfromsnapshot.cc:89 clnt x.x.0.0:0 req 0 seq 6929664 Resyncing from cid 1 replica 1 txnVN 1048991 snapVN 32 writeVN 1048576 iscow 1 isundo 0, rollforwardcontainer 0 dumpSnapshotInode 1
2011-09-23 08:29:56,7075 INFO Replication fs/server/replication/containerresync.cc:2141 clnt x.x.0.0:0 req 0 seq 6929664 Resync from snapshot completed srccid:1 replicacid:1 resynccid:1 err 0 svderr 0
2011-09-23 08:29:56,7287 INFO Replication fs/server/replication/containerresync.cc:2833 clnt x.x.0.0:0 req 0 seq 0 ResyncContainer complete srccid 1 replicacid 1 err 0x0

2011-09-23 08:29:56,7287 INFO Replication fs/server/replication/replicate.cc:1287 clnt x.x.0.0:0 req 0 seq 0 Adding 192.168.106.67:5660 as replica for container (1) after completing resync.
2011-09-23 08:29:56,7287 INFO Replication fs/server/replication/containerresync.cc:2422 clnt x.x.0.0:0 req 0 seq 0 Deleting snapshot 4063809596
2011-09-23 08:29:56,7287 INFO Container fs/server/container/delete.cc:976 clnt x.x.0.0:0 req 0 seq 0 Container delete request for cid 4063809596 cb 0x7bba10
2011-09-23 08:29:56,7289 INFO Container fs/server/container/container.cc:3012 clnt x.x.0.0:771 req 3 seq 11089408 update state for container 4063809596 : removing old orphanEntry with opcode 64908768
2011-09-23 08:29:56,7290 INFO Replication fs/server/replication/containerresync.cc:2474 clnt x.x.0.0:771 req 0 seq 6738432 Deleting resync container WA 0x3de16e0 cid 1
2011-09-23 08:29:56,7290 INFO KvStore fs/server/mapserver/kvstoremultiop.cc:1311 clnt x.x.0.0:0 req 0 seq 7647488 Multiop on cid 1 without logflush took 1651 msec
2011-09-23 08:29:56,7477 INFO Container fs/server/container/delete.cc:3667 clnt x.x.0.0:0 req 0 seq 2 Deleted container with cid 4063809596
root@MAPR1:/opt/mapr/logs#

Misc Logs:

root@MAPR1:/opt/mapr/logs# tail createJTVolume.log
stat: cannot stat `/var/mapr/cluster/mapred': No such file or directory
stat: cannot stat `/var/mapr/cluster/mapred': No such file or directory
stat: cannot stat `/var/mapr/cluster/mapred': No such file or directory
stat: cannot stat `/var/mapr/cluster/mapred': No such file or directory
2011-09-22 15:48:49
---- Thu Sep 22 11:48:51 EDT 2011 --- ALL OK

I've verified forward and reverse DNS; local resolution via gethostip resolves to 127.0.1.1, and my hosts file is correct.

ERROR (22) -  Unable to map host: MAPR1.lab.net to non-local ipaddress while creating volume mapr.MAPR1.lab.net.local.logs
2011-09-22 11:54:15.236 MAPR1 createsystemvolumes.sh(3614) Install CreateLocalVolumeDirectories:170 CreateLocalVolume: Retrying after 20 seconds. RetryCnt: 1
3
ERROR (22) -  Unable to map host: MAPR1.lab.net to non-local ipaddress while creating volume mapr.MAPR1.lab.net.local.logs
2011-09-22 11:54:37.746 MAPR1 createsystemvolumes.sh(3614) Install CreateLocalVolumeDirectories:170 CreateLocalVolume: Retrying after 20 seconds. RetryCnt: 1
4
2011-09-22 11:54:37.757 MAPR1 createsystemvolumes.sh(3614) Install CreateLocalVolumeDirectories:170 'logs' volume could not be created after multiple retries
stat: cannot stat `/var/mapr/local/MAPR1.lab.net/logs': No such file or directory
stat: cannot stat `/var/mapr/local/MAPR1.lab.net/logs': No such file or directory
stat: cannot stat `/var/mapr/local/MAPR1.lab.net/logs': No such file or directory
stat: cannot stat `/var/mapr/local/MAPR1.lab.net/logs': No such file or directory

asked 23 Sep '11, 10:01

tskyers

edited 23 Sep '11, 10:03


I get that error from the web UI when I try to start or stop services on cluster nodes. I have verified that each /etc/hosts file is correct and working, as are DNS resolution from name to IP and from IP to name, local host lookup, and resolv.conf.

answered 25 Sep '11, 19:12

tskyers

Here is a line from one of your replies:

"I've verified forward and reverse DNS; local resolution via gethostip resolves to 127.0.1.1, and my hosts file is correct."

MAPRN.lab.net should resolve to a non-local address (in other words, it should NOT resolve only to 127.0.0.1, but should also have some non-local resolution).
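One way to check this on each node is a small script along the following lines. This is only a sketch: the function names `resolve` and `non_loopback_addresses` are made up for illustration, and it only asks the system resolver (hosts file, then DNS, depending on your configuration), so it may not match exactly how MapR itself looks up the host.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted, unique IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # Strip any IPv6 zone suffix such as "%eth0" so the address parses cleanly.
    return sorted({info[4][0].split("%", 1)[0] for info in infos})


def non_loopback_addresses(addresses):
    """Keep only the addresses outside the loopback ranges (127.0.0.0/8 and ::1)."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_loopback]
```

Calling `non_loopback_addresses(resolve("MAPR1.lab.net"))` on a node and getting an empty list back would suggest the name resolves only to a loopback address there. `resolve` raises `socket.gaierror` if the name cannot be resolved at all.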

(25 Sep '11, 20:40) yufeldman ♦♦

It does. The systems are all configured to resolve names from the hosts file first, then DNS. The local host is set to 127 and the local IP; all other names are resolved using DNS.

The server worked the first time I installed MapR; however, subsequent installations have had various issues, this being one of them.

(26 Sep '11, 04:49) tskyers

I would recommend contacting support@mapr.com at this point.

(26 Sep '11, 10:09) yufeldman ♦♦

Did you rerun configure.sh on all nodes, giving the correct IP addresses of the CLDB and ZooKeeper nodes?

answered 23 Sep '11, 10:37

steven

Yes, I ran the configuration script at the appropriate point during installation on all the nodes.

answered 25 Sep '11, 18:29

tskyers

From which application/command do you get this error:

"Input nodes do not match any of the cluster nodes"

I suspect that it comes from either the CLI or the UI when you try to manage services. Could you check your /etc/hosts or other network services to make sure that from each and every host you see the others as MAPRN.lab.net and not as short names or any other names?
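To make that check repeatable across nodes, something like the sketch below could be run against the contents of each /etc/hosts. It is an illustration only: `parse_hosts` and `check_fqdns` are hypothetical helper names, the status labels are my own, and it looks at the hosts file alone, not DNS.

```python
import ipaddress


def parse_hosts(text):
    """Map each name or alias in hosts-file text to the addresses listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), []).append(address)
    return mapping


def check_fqdns(text, expected_fqdns):
    """Classify each expected FQDN as 'missing', 'loopback-only', or 'ok'."""
    mapping = parse_hosts(text)
    result = {}
    for fqdn in expected_fqdns:
        addresses = mapping.get(fqdn.lower(), [])
        if not addresses:
            result[fqdn] = "missing"  # absent, or present only under a short name
        elif all(ipaddress.ip_address(a).is_loopback for a in addresses):
            result[fqdn] = "loopback-only"
        else:
            result[fqdn] = "ok"
    return result
```

For example, you could read the file with `open("/etc/hosts").read()` on each node and pass in the list of MAPRN.lab.net names you expect. A name reported as "missing" is either absent or listed only under a short name, and "loopback-only" means the file maps it to nothing but a 127.x (or ::1) address.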

answered 25 Sep '11, 18:56

yufeldman ♦♦
