
Friday, December 7, 2012

Part 2: Garbage Collection - Nuts and Bolts

Overview:
In Part 1 of this series, we saw that Java memory is divided into a heap and a stack, how local variables and objects get created on the stack or the heap, and how objects are destroyed by the garbage collector once their references go out of scope.

In this part, we will see:
  1. Types of garbage collectors and their configuration.
  2. Heap Memory and how to configure the size.
  3. Types of Garbage Collections.
  4. JVM Options for GC.
  5. How to read GC Logs.


Introduction:
As we all know, an application server and the applications deployed inside it run on the JVM, and their objects live on the JVM heap. The garbage collector works as the heap's administrator (or caretaker). It manages the heap by removing unwanted (unreachable) objects and by moving long-lived objects between heap areas to avoid memory issues.
Types of Garbage Collectors:
  As per the Oracle documentation, the following collectors are available:
  1. Serial Collector: The serial collector uses a single thread to perform all garbage collection work. This makes it relatively efficient, since there is no communication overhead between threads. It is best suited to single-processor machines because it cannot take advantage of multiprocessor hardware. It can still be useful on multiprocessors for applications with small data sets (up to approximately 100 MB). The serial collector is selected by default on certain hardware and operating system configurations, or it can be enabled with the JVM option -XX:+UseSerialGC. It is a “stop-the-world” collector: all application threads are stopped while it performs a collection.
  2. Parallel Collector: The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium- to large-sized data sets that run on multiprocessor or multi-threaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or it can be enabled with the JVM option -XX:+UseParallelGC. It is also a “stop-the-world” collector: all application threads are stopped while it performs a collection. It uses several GC threads to make each pause shorter.
  3. Concurrent Mark Sweep (CMS) Collector: Recommended for JDK 5 or above. The concurrent collector performs most of its work concurrently (i.e., while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium- to large-sized data sets for which response time is more important than overall throughput. The techniques it uses to minimize pauses can reduce application performance. JVM options:
    -XX:+UseConcMarkSweepGC ----- tells the JVM to use the CMS collector.
    -XX:+UseParNewGC ---- collects the young generation in parallel (multiple GC threads, still a stop-the-world pause).
    -XX:ParallelGCThreads=<number> ----- number of threads to use for the parallel phases of GC.
    This collector makes two very short stop-the-world pauses per cycle: one at the beginning of the collection (initial mark) and one in the middle (remark). For the rest of the GC cycle it runs at the same time as the application threads. The trade-off is that it accepts “false negatives”. Objects that become unreachable while marking is in progress may not be identified as garbage in this cycle (so-called floating garbage). The garbage it misses is collected in the following GC cycle.
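You can check which collector a JVM actually selected, for example after adding -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC. A small program can ask the standard java.lang.management API. This is a minimal sketch, and the class name GcInfo is only illustrative. On HotSpot JVMs of this era the reported names are typically "Copy"/"MarkSweepCompact" for the serial collector, "PS Scavenge"/"PS MarkSweep" for the parallel collector, and "ParNew"/"ConcurrentMarkSweep" for CMS. The exact names depend on the JVM version and vendor.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {
    // Names of the collectors the running JVM reports (typically one young, one old).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "not available".
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running it as java -XX:+UseSerialGC GcInfo and then as java -XX:+UseParallelGC GcInfo is a quick way to compare the two configurations.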
Heap Memory:
The JVM divides the heap into two generations.
a) Young Generation (aka New Generation): The young generation is further divided into three areas.
  1. Eden: Eden is the area where all objects are initially created by the application. Many objects are short-lived and die in Eden itself.
  2. From: Objects that survive a collection of Eden are moved into the “From” survivor space.
  3. To: Objects that survive in the “From” space are moved into the “To” survivor space.
b) Old Generation (aka Tenured space, or the concurrent mark-sweep generation when CMS is used): This is the area where objects that survive the Young Generation (Eden + From + To) are moved. In other words, long-lived objects live here.
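The generations described above are visible at runtime as memory pools. The sketch below lists them through java.lang.management (the class name HeapPools is only illustrative). With the parallel collector, for example, HotSpot reports pools named like "PS Eden Space", "PS Survivor Space" and "PS Old Gen". The names differ per collector and JVM version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    // Names of the heap pools (Eden, survivor and old-generation spaces) this JVM exposes.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // null if the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || usage == null) continue;
            // getMax() returns -1 when the maximum is undefined
            System.out.printf("%-20s used=%dK committed=%dK max=%s%n",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024,
                    usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K");
        }
    }
}
```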

[Figure: Logical view of JVM heap]

Heap Configurations:

Total Heap: The total heap size includes the Young Generation + the Old Generation (Tenured).
To set the minimum heap size to 512 MB and the maximum heap size to 1024 MB, use the configuration below:
       -Xms512m -Xmx1024m
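A quick way to confirm that -Xms/-Xmx took effect is to print the heap figures the JVM reports through java.lang.Runtime. For example, run java -Xms512m -Xmx1024m HeapLimits, where HeapLimits is a hypothetical class name. Treat the numbers as approximate. Depending on the collector, maxMemory() can report somewhat less than -Xmx, for example because one survivor space is not counted.

```java
public class HeapLimits {
    private static final long MB = 1024L * 1024L;

    // Converts a byte count to whole megabytes (rounding down).
    static long toMb(long bytes) {
        return bytes / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (roughly the -Xmx value)
        System.out.println("max heap       = " + toMb(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently committed by the JVM (starts around -Xms)
        System.out.println("committed heap = " + toMb(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the committed heap
        System.out.println("free in heap   = " + toMb(rt.freeMemory()) + " MB");
    }
}
```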
Young Generation Heap: Similarly, we can set the young generation size using the JVM options below.
     -XX:NewSize=<size> ----sets the initial (minimum) size of the Young Generation.
     -XX:MaxNewSize=<size> ----sets the maximum size of the Young Generation.
     
     How are the Eden, From and To spaces sized?

     As we discussed earlier, the Young Generation is the combination of the “Eden + From + To” spaces.
     The JVM divides the total young generation into the Eden, From and To spaces using the option below:
    -XX:SurvivorRatio=<ratio>
     SurvivorRatio is the ratio of Eden to ONE survivor space:
     Eden = From * SurvivorRatio = To * SurvivorRatio, and Young Generation = Eden + From + To
     So if we set the Young Generation size to 512 MB and SurvivorRatio = 8, then each survivor space is 512 / (8 + 2) MB:
      "From" space will be approximately 51.2 MB
      "To" space will be approximately 51.2 MB
      "Eden" space will be approximately 8 * 51.2 = 409.6 MB.

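The sizing arithmetic can be written out as a small helper. It assumes, as the HotSpot tuning documentation describes, that SurvivorRatio is the ratio of Eden to one survivor space. The class and method names are illustrative. The real JVM also rounds each space to its alignment, so actual sizes differ slightly.

```java
import java.util.Locale;

public class SurvivorSizing {
    // Assumes SurvivorRatio = Eden / (one survivor space) and Young = Eden + From + To,
    // so each survivor space = Young / (SurvivorRatio + 2).
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden is SurvivorRatio times one survivor space.
    static double edenMb(double youngMb, int survivorRatio) {
        return survivorMb(youngMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        double young = 512;   // e.g. -XX:NewSize=512m -XX:MaxNewSize=512m
        int ratio = 8;        // -XX:SurvivorRatio=8
        double survivor = survivorMb(young, ratio);
        System.out.printf(Locale.ROOT, "From = %.1f MB, To = %.1f MB, Eden = %.1f MB%n",
                survivor, survivor, edenMb(young, ratio));
    }
}
```

For a 512 MB young generation and a ratio of 8, this prints From = 51.2 MB, To = 51.2 MB, Eden = 409.6 MB.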
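PermGen Space: PermGen (permanent generation) space is the area where the class definitions (class metadata) loaded by the JVM are stored and managed. Application objects are not created or stored in PermGen space.
 JVM Options:
-XX:PermSize=512m ----initial size of PermGen
-XX:MaxPermSize=512m ----maximum size of PermGen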
Types of Garbage Collections:
a) Minor or Young Collections
b) Major or Full Collections
a) Minor Garbage Collections (aka Young Collections):
  1. When an application creates an object, it gets created in the Eden space (as shown in picture below).
  2. Objects keep being created in Eden until Eden is full and there is no more space left to create a new object.
  3. A minor garbage collection then kicks in, which does the following:
    1. Removes all unreferenced (unreachable) objects from the Eden space. This means short-lived objects are removed from memory.
    2. If some of them are still alive (reachable), they are moved from Eden to the "From" space (as shown in picture below).
  4. Now, when Eden becomes full again, a minor collection kicks in and:
    1. Removes short-lived objects from memory.
    2. Copies the surviving objects from Eden and from the “From” space into the “To” space. The two survivor spaces then swap roles: the space just filled becomes the new “From” space, and the emptied one becomes the next “To” space.
  5. The JVM tracks each object's age (the number of minor collections it has survived). Some objects survive enough minor collections to reach the tenuring threshold (bounded by -XX:MaxTenuringThreshold). Others no longer fit in a survivor space. Both kinds are moved (promoted) to the “Old Generation” or "Tenured space".
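The copying between Eden and the survivor spaces can be illustrated with a toy model. This is a deliberately simplified sketch, not how HotSpot is implemented. The class MinorGcModel and its fields are hypothetical. Liveness is passed in as a set of ids instead of being computed by tracing. Promotion happens only when an object's age reaches a fixed tenuring threshold. HotSpot's real threshold is adaptive and capped by -XX:MaxTenuringThreshold, and it also promotes objects early when a survivor space overflows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class MinorGcModel {
    // Hypothetical toy object: an id plus the number of minor GCs it has survived.
    static final class Obj {
        final String id;
        int age;
        Obj(String id) { this.id = id; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    MinorGcModel(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects are always allocated in Eden.
    void allocate(String id) {
        eden.add(new Obj(id));
    }

    // One minor GC: live objects from Eden and From are copied to To (or promoted to
    // Old once old enough); dead objects are simply not copied. Then From and To swap.
    void minorGc(Set<String> liveIds) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!liveIds.contains(o.id)) continue;      // unreachable: reclaimed
            o.age++;
            if (o.age >= tenuringThreshold) old.add(o);  // promoted to the Old Generation
            else to.add(o);
        }
        eden.clear();
        from.clear();
        List<Obj> emptied = from;   // swap roles: the filled To becomes the new From
        from = to;
        to = emptied;
    }

    static List<String> ids(List<Obj> space) {
        List<String> result = new ArrayList<>();
        for (Obj o : space) result.add(o.id);
        return result;
    }

    public static void main(String[] args) {
        MinorGcModel heap = new MinorGcModel(2);
        heap.allocate("session");
        heap.allocate("tmp1");
        heap.minorGc(Collections.singleton("session")); // tmp1 dies, session -> survivor (age 1)
        heap.allocate("tmp2");
        heap.minorGc(Collections.singleton("session")); // tmp2 dies, session reaches age 2 -> Old
        System.out.println("eden=" + ids(heap.eden) + " from=" + ids(heap.from) + " old=" + ids(heap.old));
    }
}
```

With a threshold of 2, "session" survives the first collection into a survivor space and is promoted on the second. "tmp1" and "tmp2" are simply never copied.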
Minor collections are usually very fast and efficient because only the young generation is processed. In HotSpot, however, they still pause the application threads briefly (they are stop-the-world pauses). During a minor collection, an object in the Old Generation may hold a reference to an object in the Young Generation. That reference must also be scanned and treated as a root, otherwise the young object could be collected while it is still in use.
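HotSpot finds old-to-young references without scanning the whole old generation by using a card table. The old generation is split into fixed-size cards. A write barrier marks a card dirty whenever a reference is stored into it, and a minor collection scans only the dirty cards. The toy below only illustrates that idea. CardTableModel and its card size of 4 slots are hypothetical; HotSpot uses 512-byte cards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CardTableModel {
    // Toy card size (hypothetical): 4 reference slots per card.
    static final int SLOTS_PER_CARD = 4;

    final Object[] oldSlots;   // reference fields of objects in the Old Generation
    final boolean[] dirty;     // one flag per card

    CardTableModel(int slots) {
        oldSlots = new Object[slots];
        dirty = new boolean[(slots + SLOTS_PER_CARD - 1) / SLOTS_PER_CARD];
    }

    // Write barrier: every reference store into the Old Generation dirties its card.
    void store(int slot, Object ref) {
        oldSlots[slot] = ref;
        dirty[slot / SLOTS_PER_CARD] = true;
    }

    // During a minor GC only slots on dirty cards are scanned for old-to-young references.
    List<Integer> slotsToScan() {
        List<Integer> result = new ArrayList<>();
        for (int card = 0; card < dirty.length; card++) {
            if (!dirty[card]) continue;
            int start = card * SLOTS_PER_CARD;
            int end = Math.min(start + SLOTS_PER_CARD, oldSlots.length);
            for (int s = start; s < end; s++) result.add(s);
        }
        return result;
    }

    // In this toy, cards are simply cleaned after the minor GC has scanned them.
    void clearCards() {
        Arrays.fill(dirty, false);
    }

    public static void main(String[] args) {
        CardTableModel oldGen = new CardTableModel(16);
        oldGen.store(5, new Object());   // an old object now references a (young) object
        System.out.println("scan slots " + oldGen.slotsToScan() + " instead of all 16");
        oldGen.clearCards();
    }
}
```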

b) Major Collections (aka Full Collections):
  1. When a minor collection can’t move any more objects to the tenured space (due to lack of space), a major garbage collection kicks in (as shown in picture below).
  2. This is a stop-the-world collection, which forces all application threads to be paused while it performs an object reachability test. It marks the live (reachable) objects and sweeps away all the remaining (unreachable) objects to free the memory. That’s why this algorithm is called “mark and sweep”. Mark and sweep is a very processor-intensive operation.
  3. It may also involve moving objects around within the Old Generation to remove fragmentation, which is called compaction.
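The mark and sweep phases can be sketched on a toy object graph. This is an illustrative model only. The class MarkSweep and the map-based heap are assumptions, and compaction is not modelled. Marking walks everything reachable from the roots, and sweeping frees whatever was not marked. The unreachable pair c and d is freed even though c and d reference each other.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MarkSweep {
    // Toy heap (hypothetical model): object id -> ids of the objects it references.
    // Returns the ids that were freed, sorted.
    public static Set<String> collect(Map<String, List<String>> heap, Collection<String> roots) {
        // Mark phase: depth-first traversal from the roots.
        Set<String> marked = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String id = stack.pop();
            if (!heap.containsKey(id) || !marked.add(id)) continue; // unknown or already marked
            for (String ref : heap.get(id)) stack.push(ref);
        }
        // Sweep phase: every object that was not marked is unreachable and is freed.
        Set<String> freed = new TreeSet<>(heap.keySet());
        freed.removeAll(marked);
        heap.keySet().retainAll(marked);
        return freed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("root", Arrays.asList("a"));
        heap.put("a", Arrays.asList("b"));
        heap.put("b", Collections.<String>emptyList());
        heap.put("c", Arrays.asList("d")); // c and d reference each other,
        heap.put("d", Arrays.asList("c")); // but nothing reachable references them
        System.out.println("freed = " + collect(heap, Collections.singletonList("root")));
        System.out.println("live  = " + new TreeSet<>(heap.keySet()));
    }
}
```

This prints freed = [c, d] and live = [a, b, root].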
Safepoints:
As we have seen above, a major garbage collection cannot run without pausing the application threads, and application threads cannot be stopped at an arbitrary moment for GC. There are certain well-defined points in the execution where a thread can be safely paused; these are called “Safepoints”. So for a full GC to take place, all application threads must first be stopped at a safepoint.


JVM Options for Garbage Collection:
-verbose:gc ----- prints a line of information for each garbage collection.
-XX:+DisableExplicitGC ---- disables (ignores) explicit GC calls made by the application, such as System.gc() or Runtime.getRuntime().gc().
-XX:+PrintGCApplicationStoppedTime ---- prints how long the application threads were stopped during each pause (GC or other safepoint).
-XX:+PrintGCApplicationConcurrentTime ----- prints how long the application threads ran between pauses.
-XX:+PrintGCDetails ---- prints additional details, such as the occupancy of each generation.
-XX:+PrintGCTimeStamps ---- adds a time stamp (seconds since JVM start) at the start of each collection.
-Xloggc:C:\gc.log ---- name of the log file to write GC output to.
-XX:+PrintClassHistogram ---- prints a histogram of class instances in the heap on Ctrl-Break.
-XX:+HeapDumpOnOutOfMemoryError ------ creates a heap dump automatically when the JVM runs out of memory (OOM).
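Any program that allocates a lot of short-lived objects will show these options in action. The class below (GcChurn, a hypothetical example) creates 10 KB arrays in a loop and keeps only every 100th one. Most arrays die young in Eden, while the retained ones survive and are eventually promoted. On JDK 8 and earlier it could be run, for example, as java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log GcChurn. Newer JDKs replace most of these flags with unified logging (-Xlog:gc*) and may reject some of them.

```java
import java.util.ArrayList;
import java.util.List;

public class GcChurn {
    // Allocates many short-lived arrays (they die young in Eden) while keeping every
    // keepEvery-th one reachable (those survive minor GCs and are eventually promoted).
    static List<byte[]> churn(int iterations, int keepEvery) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            byte[] chunk = new byte[10 * 1024];   // 10 KB per iteration
            if (i % keepEvery == 0) retained.add(chunk);
        }
        return retained;
    }

    public static void main(String[] args) {
        List<byte[]> kept = churn(200_000, 100);   // about 2 GB allocated, about 20 MB retained
        System.out.println("retained " + kept.size() + " chunks");
    }
}
```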

Understand GC Logs

In order to understand GC logs, we need to add the above JVM options to get a detailed picture of each GC cycle. Before we use these JVM options, we should have a clear idea of what each option does and what message it prints.

Let's take a close look at each option and see what kind of message it prints and what it means.

Each option below is followed by a sample of the output it produces.

-verbose:gc
[Full GC 252523K->252518K(253440K), 0.6746807 secs]

-XX:+PrintGCApplicationStoppedTime

Total time for which application threads were stopped: 0.1796777 seconds

-XX:+PrintGCApplicationConcurrentTime
Application time: 0.0102534 seconds

-XX:+PrintGCDetails
[Full GC [Tenured: 173867K->173867K(174784K), 0.5704368 secs] 252523K->252523K(253440K), [Perm : 100K->100K(12288K)], 0.5704791 secs] [Times: user=0.53 sys=0.00, real=0.57 secs]


-XX:+PrintGCTimeStamps
3.042: [Full GC 3.042: [Tenured: 173867K->173863K(174784K), 0.6854068 secs] 252523K->252518K(253440K), [Perm : 100K->100K(12288K)], 0.6854853 secs] [Times: user=0.69 sys=0.00, real=0.69 secs]
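The stopped-time and application-time lines can be combined into a rough measure of how much of the run was spent paused. The sketch below (PauseSummary is a hypothetical name) assumes the log contains lines in exactly the two formats shown above. These lines also cover non-GC safepoint pauses, so the result can overstate the GC overhead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseSummary {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");
    private static final Pattern RUNNING = Pattern.compile(
            "Application time: ([0-9.]+) seconds");

    // Adds up the seconds value of every line that matches the given pattern.
    static double sumSeconds(List<String> lines, Pattern pattern) {
        double total = 0;
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    // Fraction of the logged time during which application threads were stopped.
    static double pauseFraction(List<String> lines) {
        double stopped = sumSeconds(lines, STOPPED);
        double running = sumSeconds(lines, RUNNING);
        return (stopped + running) == 0 ? 0.0 : stopped / (stopped + running);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PauseSummary <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf(Locale.ROOT, "application threads stopped %.2f%% of the logged time%n",
                100 * pauseFraction(lines));
    }
}
```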

So far we have seen the data displayed by each GC option. Now let's see what each piece of data means. We will take the snapshot below as an example.

3.042: [Full GC 3.042: [Tenured: 173867K->173863K(174784K), 0.6854068 secs] 252523K->252518K(253440K), [Perm : 100K->100K(12288K)], 0.6854853 secs] [Times: user=0.69 sys=0.00, real=0.69 secs]
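Reading the sample from left to right:
  1. 3.042 is the time in seconds since the JVM started (added by -XX:+PrintGCTimeStamps).
  2. Full GC is the type of collection (a minor collection is printed as GC).
  3. Tenured: 173867K->173863K(174784K), 0.6854068 secs is the old generation occupancy before -> after the collection, its capacity in parentheses, and the time spent collecting it.
  4. 252523K->252518K(253440K) is the same before -> after (capacity) for the whole heap.
  5. Perm : 100K->100K(12288K) is the same for PermGen.
  6. 0.6854853 secs is the total pause for this collection.
  7. Times: user=0.69 sys=0.00, real=0.69 secs gives the CPU time spent in user mode and in the kernel, and the elapsed wall-clock time.

In this sample the full GC reclaimed only 5K, and the old generation stays almost full. That pattern often precedes an OutOfMemoryError. The parser below is a sketch that pulls these fields out with a regular expression. It assumes exactly the serial-collector Full GC format shown here (other collectors and JVM versions print differently). GcLineParser and the field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineParser {
    // One "before->after(capacity)" group, sizes in K.
    private static final String SIZES = "(\\d+)K->(\\d+)K\\((\\d+)K\\)";
    private static final String SECS = "([0-9.]+)";
    // Assumes the serial-collector Full GC format from -XX:+PrintGCDetails,
    // optionally prefixed by the -XX:+PrintGCTimeStamps timestamp.
    private static final Pattern FULL_GC = Pattern.compile(
            "^(?:" + SECS + ": )?\\[Full GC .*?\\[Tenured: " + SIZES + ", " + SECS + " secs\\] "
            + SIZES + ", \\[Perm : " + SIZES + "\\], " + SECS + " secs\\] "
            + "\\[Times: user=" + SECS + " sys=" + SECS + ", real=" + SECS + " secs\\]");

    // Field names in the order of the capturing groups above.
    private static final String[] FIELDS = {
        "timestampSecs",
        "tenuredBeforeK", "tenuredAfterK", "tenuredCapacityK", "tenuredSecs",
        "heapBeforeK", "heapAfterK", "heapCapacityK",
        "permBeforeK", "permAfterK", "permCapacityK",
        "pauseSecs", "userSecs", "sysSecs", "realSecs"
    };

    // Returns the numeric fields of the line, or null if the line has a different format.
    static Map<String, Double> parse(String line) {
        Matcher m = FULL_GC.matcher(line);
        if (!m.find()) return null;
        Map<String, Double> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            String value = m.group(i + 1);
            if (value != null) fields.put(FIELDS[i], Double.parseDouble(value)); // timestamp is optional
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "3.042: [Full GC 3.042: [Tenured: 173867K->173863K(174784K), 0.6854068 secs] "
                + "252523K->252518K(253440K), [Perm : 100K->100K(12288K)], 0.6854853 secs] "
                + "[Times: user=0.69 sys=0.00, real=0.69 secs]";
        Map<String, Double> f = parse(line);
        for (Map.Entry<String, Double> e : f.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        System.out.println("reclaimed = " + (f.get("heapBeforeK") - f.get("heapAfterK")) + "K");
    }
}
```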



In Part 3, we will see how to troubleshoot OutOfMemoryError issues. So stay tuned.
