How is your garbage collector by the way?

It is good to know how your garbage collection works and how much pause it brings to your application. Reading through this blog post I could easily generate stats and convert them to charts I could analyze and compare. I’ll keep this short and go through the steps very quickly.

Collect Logs

Add some JVM arguments to print GC stats, something like:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:gc-stats.log

Add some JVM arguments to choose your garbage collection algorithm (the last flag is useful to know if JVM has understood your configuration):

-XX:+UseG1GC -XX:+PrintFlagsFinal

to use G1 (garbage first), or

-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+PrintFlagsFinal

to use parallel garbage collection algorithm.

Now run your application and collect GC log file.

Install Naarad

Follow instructions on this page to install Naarad. It is a open source Python project on GitHub, developed by LinkedIn.

Convert and Chart

Naarad comes with a few example configurations that can be used in a simple command like this to parse GC statistics and generate HTML reports:

./bin/naarad -c examples/conf/config-gc -i logs/g1/ -o output/g1/

Which will read the log file gc_g1.log (specified in the configuration file) from logs/g1 and generate reports in output/g1.

This utility gives me graphs like the one below. This one shows GC pauses when using parallel and G1 algorithms respectively, there are a lot more of these I’m going to discover.

More

Naarad can be used to collect system telemetries, including but not limited to CPU and memory usage, which can be very useful and easily achieved. I haven’t tried that myself, but more information is provided here if you are interested.

Advertisements

Monitoring your Tomcat application server, or any other Java process, using JMX and OpenNMS

This post will show you how to monitor a Tomcat application server using JMX MBeans and an SNMP management application. The first part will guide you through securely enabling JMX management on the Java process. Once you have done this, you can use tools like Visual VM to monitor your process and its MBeans. If you want to take a step further and have this data gathered all the time, you might want to continue with the second part and integrate it with OpenNMS.

Requirements

It is assumed that you have Tomcat on a server you want to manage, and you have also got Sun Java 7 SDK (not JRE) to run on your machine (they both can be the same machine, but we keep them separated here). You can apply this to any other Java process with little modification (to the MBean names maybe).

Enable JMX on Server

Add a few Java arguments to your CATALINA_OPTS to enable remote connection to JMX, and then restart the application server:

export CATALINA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1100 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/jre/lib/management/jmxremote.password"

Replacing with the appropriate path for your Java, copy /jre/lib/management/jmxremote.password.template into /jre/lib/management/jmxremote.password and set the passwords for monitorRole and controlRole at the end of the file as you wish. We will assume you have set them to MONITOR_PASS and CONTROL_PASS here .

Make sure that your firewall doesn’t block port 1100. You may change the port number as you wish, but remember to change it everywhere mentioned in this guide.

Now run a JMX utility to connect to your JMX server, like JConsole or Visual VM. If using JConsole, connect to a remote host providing hostname, port (1100), username (controlRole), and password (CONTROL_PASS). If using Visual VM with a VisualVM-MBeans plugin installed, you might be able to browse among the MBeans and call some operations (click to enlarge):

This might be enough if you want to see what’s going on in the JVM at the moment. You can even double click the numbers (only the ones highlighted) to see a graph over time.

But if you want to keep historical records of different things over time, keep reading. We will be demonstrating how to use a network management software to connect to JMX and gather information. This will be based on an open source application called OpenNMS, but you can probably use your own software so long as it supports JSR160.

Install OpenNMS

This document provides instructions for Fedora 16. Please follow the OpenNMS installation guide for other operating systems.

Add OpenNMS repository.

rpm -Uvh http://yum.opennms.org/repofiles/opennms-repo-stable-fc16.noarch.rpm

Install, init, and start PostgreSQL server, if you don’t have one.

yum -y install postgresql postgresql-server
/sbin/service postgresql initdb
/sbin/service postgresql start
/sbin/chkconfig postgresql on

To allow OpenNMS which is run as root to connect to PostgreSQL as opennms, you need to relax some access requirements. Edit /var/lib/pgsql/data/pg_hba.conf and make sure you have entries like this (change the last column):

local   all         all                               trust
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust

Now restart PostgreSQL for these changes to take effect:

/sbin/service postgresql restart

Now you are ready to install OpenNMS:

yum -y install opennms
/opt/opennms/bin/runjava -S /usr/java/latest/bin/java
/opt/opennms/bin/install -dis
/sbin/service opennms start
Discover and Configure

If you have followed the instructions in the previous section, you are now able to access the web interface of OpenNMS via http://:8980/opennms/. Enter admin for both username and password when prompted. Just remember to open port 8980 in your firewall for hosts you will be accessing OpenNMS web interface from.
The easiest way to set up OpenNMS to monitor a JMX service is to hijack the configuration to set up its own JMX interface. The only things we need to change is port number, username, password, and a few names that are different in Tomcat. Please see this wiki page for an explanation.
Please modify your configuration files stated below to make sure that it matches what provided here.

/opt/opennms/etc/capsd-configuration.xml

<protocol-plugin protocol="OpenNMS-JVM" class-name="org.opennms.netmgt.capsd.plugins.Jsr160Plugin" scan="on" user-defined="false">
    <property key="port" value="1100" />
    <property key="factory" value="PASSWORD-CLEAR"/>
    <property key="username" value="controlRole" />
    <property key="password" value="CONTROL_PASS" />
    <property key="protocol" value="rmi"/>
    <property key="urlPath" value="/jmxrmi"/>
    <property key="timeout" value="3000" />
    <property key="retry" value="2" />
    <property key="type" value="default" />

/opt/opennms/etc/collectd-configuration.xml

<service name="OpenNMS-JVM" interval="300000" user-defined="false" status="on">
        <parameter key="port" value="1100"/>
        <parameter key="factory" value="PASSWORD-CLEAR"/>
        <parameter key="username" value="controlRole" />
        <parameter key="password" value="CONTROL_PASS" />
        <parameter key="retry" value="2"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="protocol" value="rmi"/>
        <parameter key="urlPath" value="/jmxrmi"/>
        <parameter key="rrd-base-name" value="java" />
        <parameter key="ds-name" value="opennms-jvm"/>
        <parameter key="friendly-name" value="opennms-jvm"/>
        <parameter key="collection" value="jsr160"/>
        <parameter key="thresholding-enabled" value="true"/>

/opt/opennms/etc/poller-configuration.xml

<service name="OpenNMS-JVM" interval="300000" user-defined="false" status="on">
  <parameter key="port" value="1100"/>
  <parameter key="factory" value="PASSWORD-CLEAR"/>
  <parameter key="username" value="controlRole"/>
  <parameter key="password" value="CONTROL_PASS"/>
  <parameter key="retry" value="2"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
  <parameter key="ds-name" value="opennms-jvm"/>
  <parameter key="friendly-name" value="opennms-jvm"/>

/opt/opennms/etc/jmx-datacollection-config.xml

...
<mbean name="JVM MemoryPool:Eden Space" objectname="java.lang:type=MemoryPool,name=PS Eden Space">
...
<mbean name="JVM MemoryPool:Survivor Space" objectname="java.lang:type=MemoryPool,name=PS Survivor Space">
...
<mbean name="JVM MemoryPool:Perm Gen" objectname="java.lang:type=MemoryPool,name=PS Perm Gen">
...
<mbean name="JVM MemoryPool:Old Gen" objectname="java.lang:type=MemoryPool,name=PS Old Gen">
...

Now restart the server after making these changes.

/sbin/service opennms restart

Once restarted successfully, go to the web interface and perform the following steps:

  • In the Admin tab, click “Add Interface for Scanning”, then enter and add.
  • In the Events tab, click “All Events” and look for services being discovered.
  • In the Reports tab, click “Resource Graphs”, select Tomcat server in the standard reports, select the opennms-jvm, then click “Graph Selection”.

Here are your graphs.

You need to follow the same pattern to monitor other MBeans. For example to graph total compilation time we need to follow these steps.
First, add an entry in jmx-datacollection-config.xml file to query the MBean (use JConsole or JVisualVM to find the name of the MBean you are interested in):

<mbean name="JVM Compilation" objectname="java.lang:type=Compilation">
  <attrib name="TotalCompilationTime" alias="TotCompilationTime" type="gauge"/>

Then, add a report template in the snmp-graph.properties section:

report.jvm.compilation.name=JVM Compilation
report.jvm.compilation.columns=TotCompilationTime
report.jvm.compilation.type=interfaceSnmp
report.jvm.compilation.command=--title="JVM Compilation" \
 DEF:compilationTime={rrd1}:TotCompilationTime:AVERAGE \
 LINE2:compilationTime#0000ff:"Compilation Time" \
 GPRINT:compilationTime:AVERAGE:" Avg \\: %8.2lf %s\\n"

Finally, don’t forget to add this new graph to the list of graphs (mind the semicolon and backslash at the end of the line):

reports=mib2.HCbits, mib2.bits, mib2.percentdiscards, mib2.percenterrors, \
...
jvm.gc.copy, jvm.gc.msc, jvm.gc.parnew, jvm.gc.cms, jvm.gc.psms, jvm.gc.pss, jvm.compilation, \
...

How to start investigating Java’s OutOfMemoryError

This blog post addresses services/support people and doesn’t provide too much detailed information about memory management in Java.

You get out of memory, now what? First of all you need to understand what kind of OutOfMemory is it. You may run out of OS virtual memory, native memory allocated to Java process, or Java heap. The error message normally gives a good indication of specifics, see a few examples:

OutOfMemoryError: Java heap space
OutOfMemoryError: PermGen space
OutOfMemoryError: unable to create new native thread
OutOfMemoryError: requested XXX bytes for ChunkPool::allocate

A typical Java process has a heap where Objects go into, which is also divided into different sections. Depending on the implementation of JVM they might be called: Eden Space (New), From Space (Survivor 1), To Space (Survivor 2), Old Generation (Tenured), and Perm Generation (it is considered outside Heap in some implementations). For a detailed explanation see this page or this page. On top it add memory required for class-loaders, garbage collection process, threads stack, JNI, native memory buffers, etc. For a detailed explanation see this page.

Here, I’m going to list what you need to know before analyzing an OutOfMemoryError:

1- What is the architecture of your machine and the JVM running your Java process: 32-bit, 64-bit, or else? How to proceed from here depends on answers to this question, because a 32-bit Java process is limited to about 3GB of usable virtual memory in user space with default settings, and 4GB in best case.

$ uname -a
Linux houman-laptop 3.9.10-100.fc17.x86_64 #1 SMP Sun Jul 14 01:31:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

The commands above tell me that I’m running a 64-bit JVM on a 64-bin machine.

2- How much memory is allocated to the Java process?

$ ps awwwxo pid,user,%mem,%cpu,vsz,rss,cmd | head -1; ps awwwxo pid,user,%mem,%cpu,vsz,rss,cmd | grep tomcat
 PID USER     %MEM %CPU    VSZ   RSS CMD
2885 houman   10.2  1.0 3405352 826028 /usr/java/jdk1.7.0_25/bin/java ... org.apache.catalina.startup.Bootstrap start
5774 houman    0.0  0.0 109408   872 grep --color=auto tomcat

The command above suggests that my Java process is allocated 3.4GB of memory by OS (VSZ) and is currently utilizing 0.8GB (RSS) of it.

3- How much memory is allocated to Java heap?

$ /usr/java/jdk1.7.0_25/bin/jmap -heap 2885
Attaching to process ID 2885, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 23.25-b01
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 536870912 (512.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 134217728 (128.0MB)
   MaxPermSize      = 268435456 (256.0MB)
   G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
   capacity = 55836672 (53.25MB)
   used     = 44561864 (42.49750518798828MB)
   free     = 11274808 (10.752494812011719MB)
   79.8075214797902% used
From Space:
   capacity = 59637760 (56.875MB)
   used     = 51424800 (49.042510986328125MB)
   free     = 8212960 (7.832489013671875MB)
   86.2285907451923% used
To Space:
   capacity = 59637760 (56.875MB)
   used     = 0 (0.0MB)
   free     = 59637760 (56.875MB)
   0.0% used
PS Old Generation
   capacity = 300941312 (287.0MB)
   used     = 124813120 (119.03106689453125MB)
   free     = 176128192 (167.96893310546875MB)
   41.47423933607361% used
PS Perm Generation
   capacity = 209780736 (200.0625MB)
   used     = 124733744 (118.95536804199219MB)
   free     = 85046992 (81.10713195800781MB)
   59.459103051292566% used
37997 interned Strings occupying 4151064 bytes.

It says my Java process has 287MB in old generation (119MB used) and 200MB in perm generation (118MB used) for example. On top it also indicates what are the ultimate limits for the Java process. When utilization of these 2 sections get high and the capacity is near the maximum available (Eden + From + To + PS Old grow towards MaxHeapSize, and PS Perm grows towards MaxPermSize), chances are you are running out of heap space, one way or another. Referring to this picture might help you understand the output better:

This command will tell you about garbage collections performed and the time spent doing so:

$ /usr/java/jdk1.7.0_25/bin/jstat -gcutil 2885
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00   86.34  20.72  41.71  59.46 83      1.991   6      1.832    3.823

This command will tell you about biggest objects that live in heap:

$ /usr/java/jdk1.7.0_25/bin/jmap -histo 2885 | head -20
 num     #instances         #bytes  class name
----------------------------------------------
   1:          1244       81546688  [Lorg.apache.activemq.command.DataStructure;
   2:        229010       31452120  
   3:        229010       31156304  
   4:        239863       28604688  [C
   5:         18433       22196816  
   6:         39290       15346496  [B
   7:         18433       14414408  
   8:         14020       13783136  
   9:         71423        6119320  [Ljava.util.HashMap$Entry;
  10:        237082        5689968  java.lang.String
  11:         50255        5038688  [I
  12:        125650        4020800  java.util.HashMap$Entry
  13:         97931        3917240  java.util.LinkedHashMap$Entry
  14:         42557        3404560  java.lang.reflect.Method
  15:        107680        3022072  [Ljava.lang.String;
  16:         38100        2438400  java.util.LinkedHashMap
  17:         19563        2362808  java.lang.Class

To investigate further, we might need to refer to the application logs, or use a more sophisticate tools (like YourKit, or Eclipse Memory Analyzer).

4- How much more memory are you using besides heap? Now deduct the RSS figure by your heap capacity, and that’s what you are using for everything else.

$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
826.028 - (53.25+56.875+56.875+287.0+200.0625)
171.9655

For me, it was 171MB. If this number is too high, it is worth checking threads. Running too many threads can affect memory usage of the application.

$ /usr/java/jdk1.7.0_25/bin/jstack 2885
Output is too long...

There is no easy way to figure out what’s wrong if native memory usage is too high, there are some methods mentioned here though.