1.11.12

Testing Accumulo with G1


Background

In order to avoid long Java garbage collection pauses and memory fragmentation, Accumulo long ago resorted to using off-heap memory to store and sort incoming data. This was accomplished using JNI and C++ STL maps with custom allocators.

The main reason long GC pauses must be avoided is that they can cause Accumulo server processes to lose their locks in ZooKeeper and shut down. A secondary reason to avoid GC pauses is to keep the system responsive to queries.

With the release of the new G1 garbage collector in Java 7, I wanted to see how Accumulo performed when storing and sorting its incoming data on the Java heap. This is accomplished by setting tserver.memory.maps.native.enabled to false.
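
For reference, here is a rough sketch of flipping that setting through the Java client API. The instance name, ZooKeeper quorum, and credentials are placeholders, and the sketch is written against the 1.5 client API as released; the property can just as well go in accumulo-site.xml, and either way the tablet servers likely need a restart before the change takes effect.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class DisableNativeMap {
      public static void main(String[] args) throws Exception {
        // placeholder instance name, ZooKeeper quorum, and credentials
        Connector conn = new ZooKeeperInstance("test", "localhost:2181")
            .getConnector("root", new PasswordToken("secret"));

        // store the setting in ZooKeeper; tablet servers likely need a restart
        // before they switch to the Java in-memory map.  If the version at hand
        // does not accept this property in ZooKeeper, put it in accumulo-site.xml
        // instead.
        conn.instanceOperations().setProperty("tserver.memory.maps.native.enabled", "false");
      }
    }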

Test Environment

I ran a set of experiments on a single-node Accumulo instance, using the environment below, to get a sense of how well G1 would work.
Variable          Value
Accumulo version  1.5.0-SNAPSHOT circa 10/28
Java version      7u9
Hadoop version    1.0.3
OS                RHEL 6
Compression       snappy
Java GC options   -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+UseG1GC
# tablets         64

Continuous ingest, an Accumulo test suite, was used to generate key value pairs for this experiment. Two continuous ingest processes were run for each test.
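
To give a sense of the write path being exercised, below is a minimal stand-in for the kind of writes continuous ingest issues. It is not the actual test suite; the instance name, ZooKeeper quorum, table name, and credentials are placeholders, and it is written against the 1.5 client API as released. Each mutation sits in the tablet server's in-memory map (native or Java heap, depending on the setting above) until a minor compaction flushes it to disk.

    import java.util.Random;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class SimpleIngest {
      public static void main(String[] args) throws Exception {
        // placeholder instance name, ZooKeeper quorum, table, and credentials
        Connector conn = new ZooKeeperInstance("test", "localhost:2181")
            .getConnector("root", new PasswordToken("secret"));
        BatchWriter bw = conn.createBatchWriter("ci", new BatchWriterConfig());

        Random rand = new Random();
        byte[] val = new byte[50];
        for (long i = 0; i < 1000000; i++) {
          // random row keys spread the writes uniformly across all tablets,
          // which is the access pattern continuous ingest produces
          long row = rand.nextLong() & 0x7fffffffffffffffL;
          Mutation m = new Mutation(new Text(String.format("%016x", row)));
          rand.nextBytes(val);
          m.put(new Text("cf"), new Text("cq"), new Value(val));
          bw.addMutation(m);
        }
        bw.close();
      }
    }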

Test Results

Below are the results of running the tests. A rate of zero means the test did not complete because a long Java GC pause caused the tablet server to die. No swapping occurred during these tests.
Map Memory  Java Heap  Native map  Ingest  Rate     Max Pause
6G          1G         TRUE        100M    247,787  0.77s
6G          1G         TRUE        202M    232,894  0.98s
2G          1G         TRUE        202M    232,221  1.46s
2G          1G         TRUE        100M    230,562  0.08s
2G          6G         FALSE       100M    144,916  3.84s
6G          12G        FALSE       100M    144,790  10s
2G          6G         FALSE       202M    142,650  4.8s
6G          12G        FALSE       202M    141,658  10.54s
6G          8G         FALSE       100M    0        > 30s
2G          4G         FALSE       100M    0        > 30s

Conclusion


From the test results two things are apparent. First, the native C++ map is faster. Second, if the difference between the Java heap size and the amount of memory Accumulo uses is too small, then a long pause occurs. For example, when the Java heap was 8G and Accumulo used 6G for ingest, the test failed.

I think the failures occurred because there were no good G1 heap regions to collect. All tests had 64 tablets. Continuous ingest generates uniform data that spreads evenly across all tablets. Since writes continually go to all tablets, each G1 heap region will have data for each tablet. When one tablet is minor compacted, this frees a little bit of memory in each heap region. When Java attempts to free memory, there are a lot of heap regions each with a little bit of free memory, so G1 is forced to pick a set of these and copy the live objects to new heap regions. I suspect it is this copying that eventually takes too long and causes a long pause.

In the cases where G1 did not fail, the Java heap was much larger than the memory Accumulo used to store incoming data. In this case, I think there were older heap regions that were mostly free and quickly collected. With more memory, as maps are compacted, older heap regions have fewer live objects.

So it seems Accumulo can be made to work with G1, but it is slower and uses more memory.