cool hit counter JVM Virtual Machine Learning I: Summary of Garbage Collection Algorithm_Intefrankly

JVM Virtual Machine Learning I: Summary of Garbage Collection Algorithm

1. data types involved in the java virtual machine

  Javain the virtual machine, Data types can be divided into two categories: Basic type and reference type。

  A variable of a basic type holds the original value, i.e.: the value he represents is the value itself; while a variable of a reference type holds the reference value. A "reference value" represents a reference to an object, not the object itself, which is stored at the location of the address represented by the reference value.

   Basic types include: byte,short,int,long,char,float,double,Boolean,returnAddress

   Reference types include: class types, interface types, and arrays.

   Just like inc Language andc++ in, Variables involving pointers need to be released manually, (located) atjava The middle and bottom layers have an impact on our reference type Recycling was also done;

2. Heap and Stack

   The heap and stack are key to the operation of the program, and it is important to make their relationship clear.

Figure 1 Heap and stack of a program

   The difference between a heap and a stack?
   The stack is the runtime unit, while the heap is the storage unit.

   The stack solves the problem of running the program, i.e. how the program is executed, or how the data is handled; the heap solves the problem of storing the data, i.e. how and where the data is put.

   In Java a thread has a corresponding thread stack with it, which is easy to understand because different threads have different execution logic and therefore need a separate thread stack. The heap, on the other hand, is shared by all threads. The stack, because it is the unit of operation, stores information that is relevant to the current thread (or program). This includes local variables, program run state, method return values, etc.; whereas the heap is only responsible for storing object information.

Why do you need to distinguish between heap and stack? Isn't it possible to store data in a stack?

  First, from a software design perspective, the stack represents the processing logic, while the heap represents the data. This separation makes the processing logic much clearer. The idea of divide and conquer. This idea of isolation and modularity is reflected in all aspects of software design.

   Second, the separation of the heap from the stack allows the contents of the heap to be shared by multiple stacks (which can also be interpreted as multiple threads accessing the same object). The benefits of this sharing are many. On the one hand this sharing provides an efficient way to interact with data (e.g., shared memory), and on the other hand, shared constants and caches in the heap can be accessed by all stacks, saving space.

   Third, the stack requires address segmentation because of runtime needs, such as keeping the context in which the system is running. Since the stack can only grow upwards, it limits the ability of the stack to store content. Unlike the heap, the objects in the heap can grow dynamically as needed, so the stack and heap are split to make dynamic growth possible, and only one address in the heap needs to be recorded in the corresponding stack.

   Fourth, object orientation is the perfect combination of heap and stack. In fact, there is no difference in execution between a program in the object-oriented way and a previously structured program. However, the introduction of object orientation has led to a change in the way problems are approached and closer to the natural way of thinking. When we take the object apart, you find that the object's properties are actually data, stored in the heap; and the object's behavior (methods), which is the runtime logic, is placed on the stack. When we write the object, we actually write the data structure and the logic to handle the data. Object-oriented design, I have to admit, is indeed beautiful.

In Java, the Main function is the start of the stack, and the start of the program.

   There is always a starting point for a program to run. As with C, Main in java is that starting point. No matter what java program, finding main is the entry point for program execution :)

What is stored in the heap? What is stored in the stack?

   Objects are stored in the heap. The stack holds references to basic data types and objects in the heap. The size of an object is not estimable, or can change dynamically, but in the stack, an object corresponds to only one 4btye reference (the benefit of stack separation).

Why not put the basic types in the heap? Because its footprint is typically 1 to 8 bytes - it takes less space, and because it's a basic type, there's no dynamic growth - the length is fixed, so storing it on the stack is enough, and there's little point in having him in the heap (and wasting space, as explained later). Suffice it to say that references to both basic types and objects are stored on the stack and are both a few bytes a piece, so they are handled in a uniform manner at program runtime. But there's a difference between basic types, object references and objects themselves, because one is data on the stack and one is data in the heap. One of the most common problems is when passing parameters in Java.

What about passing values when passing parameters in Java? Or pass the citation?

   To illustrate this, two points need to be made clear.

1. Don't try to have a relationship withC draw an analogy,Java There is no concept of a pointer in。

2. The program will always run on the stack, Thus when parameters are passed, There is only the problem of passing basic types and object references。 will not pass the object itself directly。

   After clarifying the above two points. Java makes pass-value calls when passing arguments to method calls because it doesn't have pointers (see C's pass-value calls for this point). Therefore, many books inside say that Java makes pass-value calls, which is fine, and also simplifies the complexity in C.

   But how does the illusion of passing on a quote create?In the run stack, basic types and references are handled the same way, both passing values, so if a method call is passed by reference, it can also be interpreted as a pass-value call that "passes a reference value" at the same time, i.e. the reference is handled exactly the same as the basic type. But when entering the method being called, the value of this reference being passed, is interpreted (or looked up) by the program to an object in the heap, which only corresponds to the real object at this time. If a modification is made at this point, it is the object corresponding to the reference that is modified, not the reference itself, i.e.: it is the data in the heap that is modified. So this modification is a keeper now.

   Objects, in a sense, are made up of basic types. An object can be thought of as a tree, the properties of the object are still a tree if it is still an object (i.e., not a leaf node), and the base type is a leaf node of the tree. When program parameters are passed, none of the values being passed can be modified per se, but if the value is a non-leaf node (i.e. an object reference), everything below that node can be modified.

   Of the heap and stack, the stack is the most fundamental thing for program operation. Programs can run without a heap, but not without a stack. Whereas the heap serves the stack for data storage, to put it bluntly the heap is a shared piece of memory. However, it is the idea of separating the heap from the stack that makes Java's garbage collection possible.

   In Java, the size of the stack is set via -Xss, and when there is more data stored in the stack, this value needs to be adjusted up appropriately, otherwise a java.lang.StackOverflowError exception will occur. A common occurrence of this exception is a recursion that cannot return, because the information held on the stack at this point is the record point returned by the method.

3, the size of Java objects

   The size of the basic data type is fixed, so I won't go into it here. For Java objects of non-basic types, the size is debatable. The following description is based on the 32-bit Oracle HotSpot JVM.

   In Java, the size of an empty Object object is 8byte, and this size is just the size of an object in the heap that holds no properties. Look at the following statement.

Object ob = new Object();

   This completes the life of a Java object in the program, but the space it takes up is: 4byte + 8byte. 4byte is the space needed to hold the reference in the Java stack as described in the above section (pointers in c also occupy a size of 4 bytes). And that 8byte is the information about the object in the Java heap. Because all Java objects of non-basic types need to inherit Object objects by default, the size of Java objects must be larger than 8byte, no matter what kind of object they are.

   With the size of the Object object, we can calculate the size of other objects.

Class NewObject {
   int count;  //4 bytes
   boolean flag;  //One byte
   Object ob;  //8 bytes

   Consider memory to it.

  Its size is: size of empty object (8byte) + size of int (4byte) + size of Boolean (1byte) + size of empty Object reference (4byte) = 17byte. But since Java allocates memory for objects in integer multiples of 8, the closest integer multiple of 8 that is larger than 17byte is 24, so the size of this object is 24byte.

   Here's something to note about the size of the basic type's wrapper type。 Because this wrapper type has become the object, So they need to be treated as objects。 The size of the package type is at least12byte( Declare an emptyObject The minimum space required), moreover12byte Does not contain any valid information, at the same time, owing toJava The object size is8 integer multiple of, Thus a basic type wrapper class has a size of at least16byte。 This memory footprint is horrible, It is use basicN times (multiplier)(N>2), Some types of memory usage are even more exaggerated( Just think about it.)。 therefore, As little as possible if possible use Packaging category。 (located) atJDK5.0 after, Because of the inclusion of automatic type change, therefore,Java The virtual machine will be optimized accordingly in terms of storage。

reference type

  Object reference types are classified as strong references, soft references, weak references, and dummy references.

  Strong reference: is the reference generated by the virtual machine when we generally declare the object is, strong reference environment, garbage collection needs to strictly determine whether the current object is strongly referenced. If it is strongly referenced, it will not be garbage collected.

   soft citation: Soft references are generally made as caches to use。 The difference with a strong reference is, Soft references are recycled at garbage collection time, and the VM will decide whether to recycle the soft references based on the current system memory remaining. If the remaining memory is tight, the VM reclaims the space referenced by the soft reference; if the remaining memory is relatively rich, it does not reclaim it. In other words, the VM must not have a soft reference present when OutOfMemory occurs.

   weak citation: Weak references are similar to soft references, are used as caches to use。 But unlike soft quotes, Weak references are bound to be recycled when garbage collection is done, so their lifecycle exists for only one garbage collection cycle

   It goes without saying that strong references, Our system is generally in use The strong references are used when。 but (not)“ soft citation” harmony“ weak citation” relatively uncommon。 They are generally used as caches use, And it's usually done as a cache when the memory size is more constrained。 Because if the memory is big enough, You can directly use A strong reference as a cache is sufficient, Also more controllable。 thereby, They are commonly seen being use Caching in desktop applications。

   The basic garbage collection algorithm is described below. Garbage collection algorithms can be classified from different perspectives.

By basic recovery strategy

  Reference Counting:

  A relatively old recycling algorithm. The principle is that this object has a reference, that is, to increase the count by one, and the deletion of a reference decreases the count by one. When garbage collected, only objects with a count of 0 are used to collect. The most fatal aspect of this algorithm is its inability to handle circular references.


Figure 2 Mark-Remove Policy

   This algorithm is executed in two stages. The first stage marks all referenced objects starting at the root of the reference, and the second stage traverses the entire heap to clear out the unmarked objects. This algorithm requires suspending the entire application and, at the same time, generates memory fragmentation.

  Copying (Copying):

Figure 3 Replication Strategy

   This algorithm divides the memory space into two equal regions, Only one at a time. use One of the regions。 When garbage is collected, Traversing the current use district, Putting in use The objects in the。 This algorithm processes only the data being use The object in the, Therefore the cost of replication is relatively small, Also, it can be copied over and memory is organized accordingly, It won't happen.“ debris” issues。 definitely, The disadvantages of this algorithm are also obvious, You need twice the memory space.。


Figure 4 Tagging-compression strategy

   This algorithm combines the advantages of both "mark-and-clear" and "copy" algorithms. The first stage marks all referenced objects from the root node, and the second stage traverses the entire heap, clearing unmarked objects and "compacting" surviving objects into one piece of the heap, in order. This algorithm avoids the fragmentation problem of "mark-and-clear" and the space problem of "copy" algorithms.

By the way zoning is treated

   Incremental collection(Incremental Collecting): Real-time garbage collection algorithm, i.e.: Garbage collection while the application is in progress。 I don't know what it is.JDK5.0 The collector in the use This algorithm of。

Subgenerational collection(Generational Collecting): Garbage collection algorithm based on analysis of object life cycle。 Dividing objects into young generations、 aged generation、 persistent generation, For objects with different life cycles use Different algorithms( One of the above approaches) Conduct recovery。 The current garbage collector( through (a gap)J2SE1.2 commencement) all use of this algorithm。

By System Threads

   Serial collection: serial collection uses a single thread to handle all garbage collection because it does not require multi-threaded interaction, is easy to implement, and is more efficient. However, its limitations are also more obvious, i.e., it is not possible to use the advantages of multiprocessors, so this collection is suitable for single-processor machines. Of course, this collector can also be used on multi-processor machines in the case of small data volumes (around 100M).

   parallel collection: parallel collection use Multi-threaded handling of garbage collection, Thus the speed is fast, high efficiency。 And theoreticallyCPU The higher the number, The more you can show the parallel collector advantages。

  Concurrent collection: in contrast to serial and parallel collection, the previous two require a pause in the entire runtime environment when performing garbage collection work, while only the garbage collector is running; therefore, the system will have a significant pause in garbage collection, and the pause will be longer as the heap gets larger. Concurrent collection, on the other hand, is done concurrently by the application and garbage collection, and the application does not pause.

How to tell the difference between garbage

   The above-mentioned“ Quote Count” law, By counting the number of references when the object is generated and deleted by the control。 The garbage collector collects a count of0 The object of the。 But this approach cannot resolve circular references。 consequently, In a later implementation of the garbage determination algorithm, All from the root node where the program is running, Iterate over the entire object reference, Find surviving objects。 Then in this way of implementation, Where does garbage collection start?? i.e., Where to start to find out which objects are being used by the current system use of。 The difference between heap and stack analyzed above, Where the stack is where the actual program execution takes place, So to get which objects are being use, then you need to start withJava stack start。 at the same time, A stack is the one corresponding to a thread, therefore, If there are multiple threads, then all the stacks corresponding to these threads must be checked。

Figure 5 Root object and object tree

   Also, in addition to the stack, there are system runtime registers, etc., that store program runtime data. In this way, starting with a reference on the stack or in a register, we can find objects in the heap, and from these objects we can find references to other objects in the heap, and such references gradually expand, ending with null references or basic types. This forms a single object tree with the object corresponding to the reference in the Java stack as the root node, and if there are multiple references in the stack, multiple object trees will eventually be formed. Objects in these object trees are objects that are needed for current system operation and cannot be garbage collected. Other remaining objects, on the other hand, can be considered as objects that cannot be referenced to and can be recycled as garbage.

   So the garbage collection starts with some root objects (java stack, static variables, registers... ), and the simplest Java stack is the main function where the Java program executes. This recycling method is also the "mark-and-clean" recycling method mentioned above.

How to deal with debris

   Since different Java objects do not live for a certain amount of time, fragmented memory fragments can occur after the program has been running for a while if memory is not tidied up. The most immediate problem with fragmentation is that it results in the inability to allocate large blocks of memory space, as well as programs running less efficiently. So, among the basic garbage collection algorithms mentioned above, the "copy" approach and the "mark-and-clean" approach can both solve the fragmentation problem.

How to solve the simultaneous object creation and object recycling problem

   A garbage collection thread recovers memory, while a program runner thread consumes (or allocates) memory; one recovers memory and one allocates it, and from that point of view, the two are contradictory. Therefore, in the existing garbage collection methods, to perform garbage collection, it is generally necessary to suspend the entire application (i.e.: suspend memory allocation) before performing garbage collection, and then continue the application after the collection is completed. This implementation is the most direct and effective way to resolve the conflict between the two.

   But there is an obvious disadvantage to this approach, which is that as the heap space continues to grow, the garbage collection time will also continue to grow accordingly, corresponding to a corresponding increase in the application pause time. Some applications with high corresponding time requirements, such as a maximum pause time requirement of a few hundred milliseconds, are likely to exceed this limit when the heap space is larger than a few G. In this case, garbage collection will become a bottleneck in the system operation. To resolve this contradiction, there are concurrent garbage collection algorithms, using which the garbage collection threads run simultaneously with the program run threads. In this way, the problem of pausing is solved, but the complexity of the algorithm will be greatly increased and the processing power of the system will be correspondingly reduced because of the need to recycle objects at the same time as new objects are generated, and the "fragmentation" problem will be more difficult to solve.

Why do we need to split the generation

   The generational garbage collection strategy is based on the fact that The life cycle is different for different objects. Therefore, objects with different life cycles can be collected in different ways in order to improve recycling efficiency.

   In the process of running Java programs, a large number of objects are generated, some of which are related to business information, such as Session objects in Http requests, threads, Socket connections, which are directly linked to the business and therefore have a long life cycle. But there are also some objects, mainly temporary variables generated during the program run, these objects will have a relatively short life cycle, for example: String objects, due to its invariant class characteristics, the system will generate a large number of these objects, some objects can even be recycled after only one use.

   Imagine that, without object lifetime differentiation, each garbage collection is done on the entire heap space, which takes a relatively long time, and because each collection requires a traversal of all surviving objects, but in practice, this traversal is ineffective for objects with a long lifetime, because many traversals may be done, but they still exist. Therefore, generational garbage collection uses the idea of partitioning and dividing generations, placing objects with different lifecycles on different generations and using the garbage collection method most suitable for it on different generations for recycling.

How to divide the generations

Figure 6 Java object partitioning

   As shown in the figure.

   The virtual machine is divided into three generations: Young Generation, Old Generation, and Permanent Generation. One of the persistence generations mainly holds class information about Java classes and has little to do with the Java objects to be collected by garbage collection. The division between younger and older generations is one that has a greater impact on garbage collection.

Younger generations:

  All newly generated objects are placed in the young generation in the first place. The goal of the younger generation is to collect off objects that have a short life cycle as quickly as possible. The young generation is divided into three zones. One Eden zone, two Survivor zones (in general). Most objects are generated in the Eden zone. When the Eden zone is full, the surviving objects will be copied to the Survivor zone (one of two), and when this Survivor zone is full, the surviving objects in this zone will be copied to another Survivor zone, and when this Survivor zone is also full, the objects copied from the first Survivor zone and still alive at this time will be copied to the "Tenured" zone. Note that the two zones of Survivor are symmetric and not sequential, so there may be both objects copied from Eden and objects copied from the previous Survivor in the same zone, while the only objects copied to the aged zone are those from the first Survivor. Also, there is always an empty Survivor zone. Also, Survivor zones are configurable as multiple (more than two) depending on the needs of the program, which increases the time the object is present in the younger generation and reduces the possibility of being placed in the older generation.

Older generation :

   Objects that survive after N garbage collections in the younger generation are placed in the older generation. Therefore, it can be assumed that the older generations store objects that have a longer life cycle.

Durable generation:

   Used to store static files, nowadaysJava kind、 methods etc.。 Persistent generation has no significant effect on garbage collection, But some applications may dynamically generate or call someclass, for exampleHibernate etc., At such times a relatively large persistent generation space needs to be set up to hold these classes added during the run。 Persistent generation size through-XX:MaxPermSize=<N> Make settings。

When to trigger garbage collection

   Since the objects are handled in generations, the garbage collection area and time are different. There are two types of GC: Scavenge GC and Full GC.

Scavenge GC (occurs when a new object is created with no space requested)

In general, When a new object is generated, And inEden When a space request fails, It will triggerScavenge GC, rightEden regional implementationGC, Removal of non-surviving objects, and move the still-alive objects toSurvivor distinguish。 Then organizeSurvivor of two districts。 in this wayGC It's for the younger generation.Eden district, Will not affect older generations。 Because most of the objects are created fromEden district start, at the same timeEden The district will not be allocated a large, consequentlyEden districtGC will occur frequently。 thereby, Generally needed here use speedy、 Efficient algorithms, enableEden Go get free as soon as you can.。

Full GC

   Organize the entire heap, including Young, Tenured and Perm. Full GC is slower than Scavenge GC because it requires recycling of the entire pair, so the number of Full GCs should be reduced as much as possible. A large part of the work in tuning the JVM is for FullGC tuning. There are the following reasons that may lead to Full GC.

- Aged generation (Tenured) is written all over · persistent generation(Perm) be covered in writing · System.gc() Called by display -Dynamic changes in the allocation policy of each domain of Heap after the last GC

Illustration of the sub-generation waste recycling process

Figure 7-1 Generational garbage collection1

Figure 7-2 Generational garbage collection2

Figure 7-3 Generational garbage collection3

Figure 7-4 Generational garbage collection4

Choosing the right garbage collection algorithm

serial collector

Figure 8 Serial Collector

   Handling all garbage collection with a single thread is more efficient because there is no need for multi-threaded interaction. However, it is also not possible to use the advantages of multiple processors, so this collector is suitable for single processor machines. Of course, this collector can also be used on multi-processor machines in the case of small data volumes (around 100M).

may use-XX:+UseSerialGC Open.

parallel collector

Figure 9 Parallel collector

   Parallel garbage collection for the younger generation, This reduces garbage collection time。 Typically on multithreaded multiprocessor machines use。

   use -XX:+UseParallelGC. open

  The parallel collector was introduced on J2SE 5.0 update 6 and enhanced in Java SE 6.0 to allow parallel collection of older generations. If the aged generation does not use concurrent collection, garbage collection is performed by default using a single thread, thus constraining scalability. use-XX:+UseParallelOldGC Open.

use-XX:ParallelGCThreads=<N> Sets the number of threads for parallel garbage collection。 This value can be set to be equal to the number of machine processors。

   This collector can be configured as follows.

Maximum garbage collection pause: Specify the maximum pause time for garbage collection, Pass-XX:MaxGCPauseMillis=<N> designate。<N> is milliseconds。 If this value is specified, Heap size and garbage collection-related parameters are adjusted to reach the specified value。 Setting this value may reduce the throughput of your app。 throughput: Throughput is the ratio of garbage collection time to non-garbage collection time, Pass-XX:GCTimeRatio=<N> to set, The formula is1/(1+N)。 for example,-XX:GCTimeRatio=19 time, express5% of time spent on garbage collection。 The default is99, i.e.1% of time spent on garbage collection。

Concurrent Collector

   It is guaranteed that most of the work is done concurrently (the application does not stop) and garbage collection is suspended for only a small amount of time. This collector is suitable for medium and large scale applications with high response time requirements.

use-XX:+UseConcMarkSweepGC Open.


Figure 10 Concurrent Collector

   Concurrent Collector Reduction of suspension time mainly for older generations, He's applying it without stopping use Separate garbage collection thread, Track up to objects。 In each older generation of garbage collection cycle, In the early stages of collection Concurrent Collector A short pause is made for the entire app, It is paused again in the collection。 The second pause will be slightly longer than the first, During this process, multiple threads do garbage collection work at the same time。

   Concurrent Collector use The processor is replaced by a short pause。 in oneN processor on the system, Concurrent collection section useK/N available for recycling, In general1<=K<=N/4。

   On a host with only one processor use Concurrent Collector, Set toincremental mode The mode also provides a shorter pause time。

  Floating Garbage: Since garbage collection is performed while the application is running, some garbage may be generated while garbage collection is being completed, resulting in "Floating Garbage", which needs to be collected during the next garbage collection cycle. So, concurrent collectors typically require 20% of the reserved space for these floating garbage.

  Concurrent Mode Failure: Concurrent Collector Collected while the app is running, So you need to make sure that there is enough space for the program during the garbage collection period use, otherwise, Garbage collection is not yet complete, The heap space is full first。 This will happen“ Concurrent mode failed”, At this point the entire application will pause, Conducting garbage collection。

   launch Concurrent Collector: Because concurrent collection is done at application runtime, So it must be ensured that there is enough memory space for the program before the collection is completed use, Otherwise it will appear“Concurrent Mode Failure”。 By setting-XX:CMSInitiatingOccupancyFraction=<N> Specify how much of the heap is left to start concurrent collection。


Serial processor.

-- Where applicable: The amount of data is relatively small(100M around); Applications with a single processor and no requirement for response time。 --Cons: Only for small applications

Parallel processors.

-- Where applicable:“ There are high requirements for throughput”, manyCPU、 There is no requirement for app response times、 Large applications。 Example: Background processing、 Scientific calculations。 --Disadvantage: application response time may be longer during garbage collection

Concurrent processors.

--Applicable to: "high demand for response time", multi-CPU, medium and large applications with high demand for application response time. Examples: Web server/application server, telecommunication exchange, integrated development environment.

1、Watch out Hackers will use smartphone sensors to crack your PIN
2、Marshal Bull cheat auxiliary control
3、Rookie Learning Machine Learning Kickstart Episode
4、A word about closures in Python
5、The Birth of the Ulord Mascot

    已推荐到看一看 和朋友分享想法
    最多200字,当前共 发送