Background
In order to avoid long Java garbage collection pauses and avoid memory fragmentation Accumulo long ago resorted to using off heap memory to store and sort incoming data. This was accomplished using JNI and C++ STL maps with custom allocators.The main reason long GC pauses must be avoided is that it can cause Accumulo server processes to lose their lock in zookeeper and shut down. A secondary reason to avoid GC pauses is to keep the system responsive to queries.
With the release of the new G1 garbage collector in Java 7, I wanted to see how Accumulo performed storing and sorting its incoming data on the Java heap. This is accomplished by setting tserver.memory.maps.native.enabled to false.
Test Environment
I ran a set of experiments on a single node Accumulo instance to get a sense of how well G1 would work with the following environment.Variable | Value |
---|---|
Accumulo version | 1.5.0-SNAPSHOT circa 10/28 |
Java version | 7u9 |
Hadoop verions | 1.0.3 |
OS | RHEL 6 |
Compression | snappy |
Java GC options | -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+UseG1GC |
# tablets | 64 |
Continuous ingest, an Accumulo test suite, was used to generate key value pairs for this experiment. Two continuous ingest processes were run for each test.
Test Results
Below are the results of running the test. A rate of zero means the test did not complete because a long Java GC pause caused the tablet server to die. No swapping occurred during these test.Map Memory | Java Heap | Native map | Ingest | Rate | Max Pause |
---|---|---|---|---|---|
6G | 1G | TRUE | 100M | 247,787 | 0.77s |
6G | 1G | TRUE | 202M | 232,894 | 0.98s |
2G | 1G | TRUE | 202M | 232,221 | 1.46s |
2G | 1G | TRUE | 100M | 230,562 | 0.08s |
2G | 6G | FALSE | 100M | 144,916 | 3.84s |
6G | 12G | FALSE | 100M | 144,790 | 10s |
2G | 6G | FALSE | 202M | 142,650 | 4.8s |
6G | 12G | FALSE | 202M | 141,658 | 10.54s |
6G | 8G | FALSE | 100M | 0 | > 30s |
2G | 4G | FALSE | 100M | 0 | > 30s |
Conclusion
From the test results two things are apparent. First the native C++ map is faster. Second, if the difference between the Java Heap size and the amount of memory Accumulo uses is too small then a long pause occurs. For example when the Java heap was 8G and Accumulo used 6G for ingest, the test failed.
I think the failures occurred because there were no good G1 heap regions to collect. All test had 64 tablets. Continuous ingest generates uniform data that spreads evenly across all tablets. Since writes continually go to all tablets, each G1 heap region will have data for each tablet. When one tablet is minor compacted this frees a little bit of memory in each heap region. When Java attempts to free memory there are a lot of heap regions with a little bit of free memory. Therefore G1 is forced to pick a set of these and copy live objects to newer heap regions. I suspect its this copying that eventually takes too long and causes a long pause.
In the cases where G1 did not fail the Java heap was much larger than the memory Accumulo used to store incoming data. In this case, I think there were older heap regions that were mostly free and quickly collected. With more memory, as maps are compacted older heap regions have less live objects.
So it seems Accumulo can be made to work with G1, but its slower and uses more memory.