I have a 3-node M3 cluster. Each node has 4 GB of memory. I've been able to successfully get the cluster started. Each node was using about 65-70% of its available memory but the cluster was still starting up.

When I start up mapr-warden now, however, I get these entries in my process list:

16178 pts/0 Sl 0:00 java -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/core
16200 pts/0 S 0:00 bash /opt/mapr/server/createsystemvolumes.sh
16330 ? S<Lsl 0:01 /opt/mapr/server/mfs -b -f /ramfs/mapr/cachefile -O /opt/mapr/conf/mapr-clusters.conf -p 5660 -n inode:6
16503 ? S<s 0:00 /opt/mapr/server/hoststats 5660 /opt/mapr/logs/TaskTracker.stats
19419 pts/0 S<l 0:02 java -server -Xmx479m -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapD

Any idea what this means? I checked the logs but couldn't find any entry that could help diagnose the problem. I get a similar Out of Memory error on each cluster node.

I increased the memory on each node to either 6 or 8 GB, but I still get the error.

Do I have to tweak a heapsize variable somewhere in the conf directory to allocate more memory?

(After edit)

OK, when I type ps -aef | grep mapr, this is what I get:

root 918 1 0 Oct25 ? 00:00:03 java -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores -XX:+UseConcMarkSweepGC -Dlog.file=/opt/mapr/logs/warden.log -Djava.library.path=/opt/mapr/lib -classpath /opt/mapr:/opt/mapr/conf:/opt/mapr/lib/JPam-1.1.jar:/opt/mapr/lib/adminuiapp-0.1.jar:/opt/mapr/lib/ant-1.7.1.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/cldb-0.1.jar:/opt/mapr/lib/cliframework-0.1.jar:/opt/mapr/lib/commons-codec-1.4.jar:/opt/mapr/lib/commons-el-1.0.jar:/opt/mapr/lib/commons-email-1.2.jar:/opt/mapr/lib/commons-logging-1.0.4.jar:/opt/mapr/lib/commons-logging-api-1.0.4.jar:/opt/mapr/lib/eval-0.5.jar:/opt/mapr/lib/globalfsck-0.1.jar:/opt/mapr/lib/google-collect-1.0.jar:/opt/mapr/lib/hadoop-metrics-0.20.2-dev.jar:/opt/mapr/lib/jasper-compiler-5.5.12.jar:/opt/mapr/lib/jasper-runtime-5.5.12.jar:/opt/mapr/lib/jetty-6.1.14.jar:/opt/mapr/lib/jetty-plus-6.1.14.jar:/opt/mapr/lib/jetty-util-6.1.14.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/jsp-2.1.jar:/opt/mapr/lib/jsp-api-2.1.jar:/opt/mapr/lib/junit-3.8.1.jar:/opt/mapr/lib/junit-4.5.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/log4j-1.2.14.jar:/opt/mapr/lib/log4j-1.2.15.jar:/opt/mapr/lib/logging-0.1.jar:/opt/mapr/lib/mail.jar:/opt/mapr/lib/maprbuildversion.jar:/opt/mapr/lib/maprcli-0.1.jar:/opt/mapr/lib/maprsecurity-0.1.jar:/opt/mapr/lib/maprutil-0.1.jar:/opt/mapr/lib/protobuf-java-2.3.0-lite.jar:/opt/mapr/lib/servlet-api-2.5-6.1.14.jar:/opt/mapr/lib/volumemirror-0.1.jar:/opt/mapr/lib/warden-0.1.jar:/opt/mapr/lib/zookeeper-3.3.2.jar -Dcom.sun.management.jmxremote -Dpid=873 -Dpname=warden -Dmapr.home.dir=/opt/mapr com.mapr.warden.WardenMain /opt/mapr/conf/warden.conf
root 996 1 0 Oct25 ? 00:00:03 java -Dzookeeper.log.dir=/opt/mapr/zookeeper/zookeeper-3.3.2/logs -Dzookeeper.root.logger=WARN, ROLLINGFILE -XX:ErrorFile=/opt/mapr/zookeeper/zookeeper-3.3.2/logs/hs_err_pid%p.log -cp /opt/mapr/zookeeper/zookeeper-3.3.2/bin/../build/classes:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../build/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../zookeeper-3.3.2.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../lib/log4j-1.2.15.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../lib/jline-0.9.94.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../src/java/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
root 1226 1 0 Oct25 ? 00:00:01 /opt/mapr/server/mfs -b -f /ramfs/mapr/cachefile -O /opt/mapr/conf/mapr-clusters.conf -p 5660 -n inode:6:log:6:meta:10:dir:40:small:15 -m 1197
root 1406 1 0 Oct25 ? 00:00:01 /opt/mapr/server/hoststats 5660 /opt/mapr/logs/TaskTracker.stats

This looks a lot better.

But now when I type this command:

/opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc

I get the dreaded "ERROR (10009) - Couldn't connect to the CLDB service".

I'm not sure why I'm getting this: I looked through the other messages mentioning that error and I tried running zkdatacleaner.sh, restarting mapr-warden and mapr-zookeeper, and running both configure.sh and disksetup again.

asked 25 Oct '11, 20:22

rpark31

edited 25 Oct '11, 22:06


If you run "ps -aef | grep mapr" or similar, it will show you more info.

In any case, what you see are processes listed with a shortened command line. Those command lines just show some JVM parameters that control what diagnostic information is dumped if the process ever runs out of memory, so they have nothing to do with a lack of memory on the box. If the box were short of memory, that process would not even be running.
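
As a minimal sketch of that point, the snippet below filters "ps" output down to the flags that actually size the JVM heap (-Xms / -Xmx). The helper name heap_flags is made up for this example, and it assumes a grep that supports -o (GNU or BSD grep). Options such as -XX:+HeapDumpOnOutOfMemoryError only say what to do if an out-of-memory condition ever occurs, so the filter skips them.

```shell
# Hypothetical helper (not part of MapR): read "ps" output on stdin and
# print only the JVM heap-size flags (-Xms / -Xmx), one per line.
# Diagnostic options such as -XX:+HeapDumpOnOutOfMemoryError are not
# matched, because they do not set any memory limit.
heap_flags() {
  grep -oE -- '-Xm[sx][0-9]+[kKmMgG]?'
}

# Usage (assumes Linux procps, where "ww" asks ps for untruncated lines):
# ps -efww | grep '[m]apr' | heap_flags
```

Fed the last line of the process list in the question, this prints -Xmx479m, which is the only heap limit visible there.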

Is your cluster running OK otherwise?

answered 25 Oct '11, 21:12

yufeldman

See edited question above - I definitely see more with the ps -aef command but I'm not able to connect to CLDB.

(25 Oct '11, 22:07) rpark31

I assume you configured CLDB to run on the node you took the "ps" output from, and it looks like it is not running. I would encourage you to check /opt/mapr/logs/cldb.log for the reason CLDB shut down. If that does not help, please submit a ticket to support@mapr.com
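
One possible way to scan that log is sketched below. The helper name recent_errors and the search terms (error, fatal, exception) are assumptions made for this example; the actual wording CLDB uses when it shuts down may differ, so treat this as a starting point and not a definitive diagnostic.

```shell
# Hypothetical helper (not part of MapR): print the most recent lines of
# a log file that mention an error, a fatal condition, or an exception.
#   $1 = path to the log file
#   $2 = maximum number of matching lines to print (default 20)
recent_errors() {
  log_file="$1"
  max_lines="${2:-20}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # Case-insensitive match, then keep only the newest matches.
  grep -iE 'error|fatal|exception' "$log_file" | tail -n "$max_lines"
}

# Usage on the CLDB node (log path taken from the answer above):
# recent_errors /opt/mapr/logs/cldb.log 50
```

If nothing is printed, the shutdown reason may be logged under different wording, and reading the end of the file directly is the next step.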

answered 25 Oct '11, 23:47

yufeldman
