
Synchronized and Lock in the JVM: implementation principles and code analysis


1. In-depth JVM locking mechanism: synchronized

The synchronized keyword is simple, clear, and semantically unambiguous, so it is still very widely used even though the Lock interface exists. Its application-level semantics are that any non-null object can be used as a "lock": when synchronized is applied to an instance method, the lock is the object instance (this); when applied to a static method, the lock is the class's corresponding Class instance, and because Class data lives in the permanent generation, a static-method lock is effectively a global lock for that class; when synchronized is applied to an object reference, the lock covers the corresponding code block. In the HotSpot JVM implementation, the lock has a special name: the Object Monitor.
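To make the three forms concrete, here is a minimal sketch (the class and member names are invented for illustration):

public class Counter {
    private static int total;
    private int count;

    // Lock is the instance (this)
    public synchronized void increment() {
        count++;
    }

    // Lock is Counter.class: effectively a global lock for the class
    public static synchronized void incrementTotal() {
        total++;
    }

    // Lock is whatever non-null object the block names
    private final Object lock = new Object();
    public void incrementWithBlock() {
        synchronized (lock) {
            count++;
        }
    }
}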

1.1 Thread states and state transitions

When multiple threads request an object's monitor at the same time, the object monitor uses several states to distinguish the requesting threads:

◆ Contention List: All threads requesting locks will be placed into this contention queue first.

◆ Entry List: Those threads in the Contention List that qualify as candidates are moved to the Entry List.

◆ Wait Set: Those threads whose calls to the wait method are blocked are placed in the Wait Set.

◆ OnDeck: At most one thread may be competing for the lock at any given moment; that thread is called OnDeck and can be understood as the Ready thread.

◆ Owner: The thread that acquires the lock is called Owner.

◆ !Owner: The thread that has released the lock.

The following figure (not reproduced here) reflects the state transition relationships.

A new thread requesting the lock is first added to the ContentionList. When some thread holding the lock (the Owner) calls unlock, if the EntryList is empty it moves threads from the ContentionList into the EntryList. The following explains how the ContentionList and EntryList are implemented.

1.2 ContentionList: a virtual queue

The ContentionList is not a true queue but only a virtual one, because it is made up of Nodes linked by their next pointers; there is no Queue data structure behind it. The ContentionList is last-in-first-out (LIFO): each new Node is inserted at the head of the queue, the head pointer is swung to the new node with CAS, and the new node's next is set to the former head, while dequeue operations take place at the tail.

Clearly, this structure is in fact a lock-free queue.

Because only the Owner thread can take elements from the tail of the queue (the Owner migrates threads from the ContentionList to the EntryList when it unlocks), there is no contention on the dequeue operation, which of course also avoids the ABA problem of CAS.
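As an analogy (a plain-Java sketch, not HotSpot's C++ internals), a head-insertion-only virtual queue built from nothing but nodes and a CAS on the head pointer looks like this:

import java.util.concurrent.atomic.AtomicReference;

// Sketch of the ContentionList idea: a LIFO "virtual queue" with no Queue
// data structure, only nodes linked by next and a CAS-guarded head pointer.
class VirtualQueue {
    static final class Node {
        final Thread thread;
        Node next;
        Node(Thread t) { this.thread = t; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // Contended enqueue at the head: loop the CAS until it succeeds.
    void push(Thread t) {
        Node node = new Node(t);
        for (;;) {
            Node h = head.get();
            node.next = h;                    // new node points at the old head
            if (head.compareAndSet(h, node))  // swing head to the new node
                return;
        }
    }
    // Dequeue happens at the tail and, as in HotSpot, is assumed to be
    // performed only by the single Owner thread, so it needs no CAS.
}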

1.3 EntryList

The EntryList and the ContentionList logically belong to the same waiting queue. The ContentionList is accessed concurrently by threads; to reduce contention on its tail, the EntryList was introduced. The Owner thread migrates threads from the ContentionList to the EntryList when it unlocks, and designates one thread in the EntryList (usually the head) as the Ready (OnDeck) thread. The Owner does not pass the lock to the OnDeck thread; it merely hands over the right to compete for the lock, and the OnDeck thread must re-compete for it. This sacrifices some fairness but greatly improves overall throughput; in HotSpot, this selection of the OnDeck thread is called "competitive switching".

When the OnDeck thread acquires the lock it becomes the Owner; if it fails to acquire the lock it stays in the EntryList, and for fairness its position in the EntryList does not change (it remains at the head of the queue). If the Owner thread is blocked by the wait method, it is moved to the WaitSet queue; if it is later woken by notify/notifyAll, it is moved back to the EntryList.

1.4 Spin Lock

Threads in the ContentionList, EntryList, and WaitSet are all in the blocked state, and the blocking is performed by the operating system (on Linux, via the pthread_mutex_lock function). After a thread is blocked it enters kernel (Linux) scheduling, so the system switches back and forth between user mode and kernel mode, which seriously hurts lock performance. This is also one essential difference from Lock, which does its own blocking by parking threads in its queue with the park method of LockSupport.

The way to alleviate this problem is spinning. The principle: when contention occurs, if the Owner can release the lock within a short time, the contending threads can wait a little (spin); once the Owner releases the lock, a contending thread may acquire it immediately, avoiding a system-level block. The Owner may, however, run past the threshold, and a contending thread that has spun for a while still without the lock then stops spinning and enters the blocked state (backs off). The basic idea is: spin first, and block only if spinning fails, minimizing the chance of blocking; this is a very important performance gain for code blocks with short execution times. A more apt name for this kind of spin lock is a spin lock with exponential back-off, also called a composite lock. Obviously, spinning only makes sense on a multiprocessor.

Another question: what does a thread do while spinning? Actually nothing useful; it may run a few for-loop iterations or a few empty assembly instructions, the point being to hold the CPU while waiting for a chance to acquire the lock. Spinning is therefore a double-edged sword: spin too long and overall performance suffers; spin too short and the goal of delaying blocking is not met. Choosing the spin period is clearly important, but it depends on the operating system, the hardware architecture, the system load, and many other factors. It is hard to choose well, and chosen badly it may degrade rather than improve performance, which is why spin locks are widely considered non-scalable.
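A minimal sketch of the spin-then-block (composite lock) idea described above, assuming an arbitrary spin limit and Java 9+ for Thread.onSpinWait(); this is illustrative, not HotSpot's implementation:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Spin a bounded number of times, then back off into real blocking.
class SpinThenBlockLock {
    private static final int SPIN_LIMIT = 1 << 10;  // arbitrary illustrative value
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    public void lock() {
        // Phase 1: spin, hoping the Owner releases the lock quickly.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (locked.compareAndSet(false, true))
                return;
            Thread.onSpinWait();  // CPU hint; "do nothing" while occupying the core
        }
        // Phase 2: stop spinning and block until unparked.
        Thread current = Thread.currentThread();
        waiters.add(current);
        while (!locked.compareAndSet(false, true))
            LockSupport.park(this);
        waiters.remove(current);
    }

    public void unlock() {
        locked.set(false);
        Thread next = waiters.peek();
        if (next != null)
            LockSupport.unpark(next);  // let one waiter re-compete for the lock
    }
}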

On the choice of the spin period, HotSpot's view is that the optimal duration is one thread context switch, but it does not currently implement that; investigation shows it merely pauses for a few CPU cycles via assembly. Besides spin-period selection, HotSpot applies many other spin optimization strategies, as follows.

◆ Always spin if the average load is lower than the number of CPUs.

◆ If more than (CPUs/2) threads are spinning, the later threads block directly.

◆ Shorten the spin time (spin count), or block directly, if a spinning thread discovers that the Owner has changed.

◆ Stop spinning if the CPU is in power-saving mode.

◆ The worst-case spin time is the CPU's store propagation latency (the time between CPU A storing a value and CPU B learning of that value).

◆ Differences in thread priority are appropriately disregarded while spinning.

So when does the synchronized implementation use the spin lock? The answer is when the thread is about to enter the ContentionList, i.e. before the first step above. A thread first tries to acquire the lock by spinning as it enters the wait queue; if that fails, it then enters the wait queue. This is slightly unfair to threads already in the waiting queue; another unfairness is that a spinning thread may grab the lock ahead of the Ready (OnDeck) thread. Spin locks are maintained per monitor object, one for each.

1.5 Biased locking

Biased locking was introduced in JVM 1.6, mainly to solve locking performance in the uncontended case. First, consider what costs locking has without contention. Almost all locks today are reentrant: a thread that has acquired a lock may lock/unlock the monitored object multiple times, and under the earlier HotSpot design each lock/unlock involved CAS operations (e.g. CAS on the wait queue), and CAS introduces local latency (explained below). So the idea of biased locking is that once a thread acquires the monitor object for the first time, the monitor is then "biased" toward that thread, and subsequent acquisitions avoid the CAS: the lock simply sets a flag, and if the flag is found to be true, the whole lock/unlock machinery is skipped. But there are still several concepts to explain and several introduced problems to solve.

1.5.1 CAS and SMP Architecture

Why does CAS introduce local latency? This starts with the SMP (symmetric multiprocessing) architecture, roughly illustrated in the figure below (not reproduced here).

The idea is that all CPUs share a system bus (BUS) and connect to main memory through it. Each core has its own level-1 cache, and the cores are symmetrically distributed around the bus, hence the name "symmetric multiprocessor".

CAS stands for Compare-And-Swap: an atomic CPU instruction that lets the CPU compare and, if the comparison succeeds, atomically update the value at a memory location.

Core1 and Core2 may both load the value at some main-memory location into their own L1 caches. When Core1 modifies that value in its L1 cache, it "invalidates" the corresponding value in Core2's L1 cache via the bus; when Core2 then finds the value in its L1 cache invalid (a cache miss), it loads the latest value at that address from memory over the bus. Bringing the values seen by Core1 and Core2 back into agreement is called "cache coherence", and at this level the ultimate goal of lock design is to reduce cache-coherence traffic.

And CAS is exactly what generates cache-coherence traffic: if many threads share the same object, each core's successful CAS inevitably causes a bus storm. This is what is meant by local latency; in essence, biased locking exists to eliminate the CAS and reduce cache-coherence traffic.
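A hedged demonstration of that traffic (the class name and iteration counts are invented for the demo): every thread below CASes the same AtomicLong, so each successful CAS invalidates the cache line in the other cores, and each loser's retry re-reads it over the bus.

import java.util.concurrent.atomic.AtomicLong;

public class CasTrafficDemo {
    private static final AtomicLong counter = new AtomicLong();

    static void increment() {
        long v;
        do {
            v = counter.get();                      // may hit an invalidated line
        } while (!counter.compareAndSet(v, v + 1)); // losers loop and retry
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < 1_000_000; n++) increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(counter.get()); // 8000000
    }
}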

1.5.2 Cache coherence

The cache coherence mentioned above is backed by a protocol; the common one today is MESI (first used by Intel). See http://en.wikipedia.org/wiki/MESI_protocol; this part will be explained carefully later.

1.5.3 Exceptions to cache-coherence traffic

In fact, not every CAS causes a bus storm; it depends on the cache-coherence protocol. See http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot.

NUMA (Non-Uniform Memory Access) architecture:

Corresponding to SMP there is also the asymmetric multiprocessor architecture, now used mainly in some high-end processors. Its main feature is that there is no shared bus and no common main memory; each Core has its own memory. This structure is not discussed here.

1.5.4 Revoking the bias

One important problem introduced by biased locking: in a contended scenario, if another thread contends for the biased object, the Owner must revoke the bias, and the revocation introduces some performance overhead; overall, though, the benefits of biased locking outweigh the CAS costs.

1.6 Summary

Regarding locks, the JVM also introduces other techniques such as lock inflation; compared with spin locks and biased locks these have less impact and are not covered here. As can be seen from the above, the underlying implementation of synchronized is based mainly on a lock-free queue; the basic idea is to block after spinning fails and to keep competing for the lock after a competitive switch, slightly sacrificing fairness in exchange for high throughput.

2. In-depth JVM locking mechanism: Lock

1. The procedure for calling ReentrantLock

  AbstractQueuedSynchronizer

    |---Sync

      |---NonfairSync

      |---FairSync

Observation shows that ReentrantLock delegates all Lock interface operations to a Sync class, which extends AbstractQueuedSynchronizer:

  static abstract class Sync extends AbstractQueuedSynchronizer

Sync in turn has two subclasses:

  final static class NonfairSync extends Sync
  final static class FairSync extends Sync

These are obviously defined to support both fair and non-fair locks, with non-fair as the default.
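Choosing between them is just a constructor argument:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockChoice {
    Lock nonFair = new ReentrantLock();      // default: uses NonfairSync
    Lock fair    = new ReentrantLock(true);  // uses FairSync
}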

Let's first trace the call path of the ReentrantLock.lock() method (the default, non-fair lock).
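Roughly, the chain looks like this in the JDK source of that era (a sketch; details vary by JDK version):

// ReentrantLock.lock() delegates to sync.lock(); in NonfairSync:
final void lock() {
    if (compareAndSetState(0, 1))   // barge: try to grab the lock directly
        setExclusiveOwnerThread(Thread.currentThread());
    else
        acquire(1);                 // fall into AbstractQueuedSynchronizer
}

// AbstractQueuedSynchronizer.acquire(int):
public final void acquire(int arg) {
    if (!tryAcquire(arg) &&         // deferred to the subclass
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
        selfInterrupt();
}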

These pesky Template Method patterns make it hard to visualize the whole call chain. In fact, as the call chain above and the Javadoc of AbstractQueuedSynchronizer show, AbstractQueuedSynchronizer implements most of the Lock functionality and defers only the tryAcquire method to its subclasses. The semantics of tryAcquire is that the concrete subclass decides whether the requesting thread may take the lock; succeed or fail, AbstractQueuedSynchronizer handles everything that follows.

2. Lock implementation (locking)

In short, AbstractQueuedSynchronizer organizes all requesting threads into a CLH queue. When a thread finishes executing (lock.unlock()) it wakes its own successor node; the currently executing thread is not in the queue, while the threads waiting to execute are all blocked. Investigation shows that explicit blocking is done by calling LockSupport.park(), which in turn calls the native method sun.misc.Unsafe.park(); furthermore, on Linux, HotSpot calls the pthread_mutex_lock function to hand the thread to the system kernel for blocking.

The queue is shown in the figure (not reproduced here).

As with synchronized, this too is a virtual queue: there is no queue instance, only the links between nodes.

What is puzzling is why a CLH queue is used. The original CLH queue is for spin locks, but Doug Lea adapted it into a blocking lock.

When a thread competes for the lock, it first tries to grab the lock directly, which seems unfair to the threads already in the queue; this is where the non-fair lock comes from, and, similar to the synchronized implementation, it greatly improves throughput.

If a Running thread already exists, the newly competing thread is appended to the tail of the queue, specifically with a CAS-based lock-free algorithm. Because concurrent threads may CAS on Tail at the same time, some threads' CAS calls will fail; the solution is to loop the CAS until it succeeds (the same looping-CAS idiom is used in the AtomicXXX classes). The implementation of AbstractQueuedSynchronizer is very sophisticated and breathtaking, and it is hard to appreciate its essence without going into the details, as follows.

2.1 Sync.nonfairTryAcquire

The nonfairTryAcquire method is the first method indirectly invoked by the lock method, and it is called first every time a lock is requested.

final boolean nonfairTryAcquire(int acquires) {   // acquires == 1
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {   // no thread currently occupies the lock
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) {   // the current thread already owns the lock: add 1 to the counter and update the state
        int nextc = c + acquires;
        if (nextc < 0)   // overflow
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}

This method first checks the current state: c == 0 means no thread is holding the lock; c != 0 means some thread owns it.

If c == 0, the state is set to acquires, whose value on the initial lock call is 1. Each time a thread re-enters this lock the state is incremented by 1, and each unlock decrements it by 1; the lock is released only when it reaches 0. If the CAS succeeds, it can be expected that any other thread's CAS will now fail, so the current thread is considered to have acquired the lock and becomes the Running thread; clearly, this Running thread is not in the waiting queue.

If c != 0 but the thread finds it already owns the lock, it simply increments acquires and updates the state value, via setState rather than CAS because there is no contention; in other words, this code implements the biased lock, and does it beautifully.

2.2 AbstractQueuedSynchronizer.addWaiter

The addWaiter method is responsible for wrapping a thread that is currently unable to obtain a lock as a Node and adding it to the end of the queue.

private Node addWaiter(Node mode) {   
    Node node = new Node(Thread.currentThread(), mode);   
    // Try the fast path of enq; backup to full enq on failure   
    Node pred = tail;   
    if (pred != null) {   
        node.prev = pred;   
        if (compareAndSetTail(pred, node)) {   
            pred.next = node;   
            return node;   
        }   
    }   
    enq(node);   
    return node;   
}   

The mode parameter indicates an exclusive or shared lock; the default, null, means exclusive. Appending to the tail of the queue takes two steps:

  1. If the queue tail already exists (tail != null), use CAS to set the current thread's node as the new Tail.
  2. If the current Tail is null, or the CAS in step 1 fails, continue setting the Tail via the enq method.

Here is the enq method: it loops the CAS until the append succeeds.

private Node enq(final Node node) {   
    for (;;) {   
        Node t = tail;   
        if (t == null) { // Must initialize   
            Node h = new Node(); // Dummy header   
            h.next = node;   
            node.prev = h;   
            if (compareAndSetHead(h)) {   
                tail = node;   
                return h;   
            }   
        }   
        else {   
            node.prev = t;   
            if (compareAndSetTail(t, node)) {   
                t.next = node;   
                return t;   
            }   
        }   
    }   
}   

The method loops around CAS: even under high concurrency, the infinite loop will eventually succeed in appending the current thread to the tail of the queue (or setting the queue head). In a nutshell, addWaiter appends the current thread to the tail of the queue via CAS and returns the wrapping Node instance.

Threads are wrapped in Node objects not only because the virtual queue is built from Nodes, but mainly because the Node carries the thread's wait status, carefully encoded as a few numeric values (a Node sketch follows the list):

◆ SIGNAL(-1): the node's successor is (or will soon be) blocked; when this node releases the lock or is cancelled, it must wake (unpark) that successor thread.

◆ CANCELLED(1): The thread has been cancelled because of a timeout or interruption.

◆ CONDITION(-2): indicates that the thread is in the condition queue, that is, it is blocked because Condition.await was called.

◆ PROPAGATE(-3): propagate the wakeup for shared locks.

◆ 0: no status; the initial state.
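For orientation, the Node inside AbstractQueuedSynchronizer looks roughly like this (a simplified sketch of the JDK source of that era; some members are omitted):

// Static nested class inside AbstractQueuedSynchronizer (simplified sketch)
static final class Node {
    static final int CANCELLED =  1;
    static final int SIGNAL    = -1;
    static final int CONDITION = -2;
    static final int PROPAGATE = -3;

    volatile int waitStatus;  // one of the values above, or 0
    volatile Node prev;       // predecessor in the CLH queue
    volatile Node next;       // successor in the CLH queue
    volatile Thread thread;   // the thread this node wraps
    Node nextWaiter;          // link in a condition queue, or the shared-mode marker
}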

2.3 AbstractQueuedSynchronizer.acquireQueued

The main job of acquireQueued is to block the thread node already appended to the queue (the return value of addWaiter); but before blocking, it retries tryAcquire to see whether the lock can be obtained. If the retry succeeds there is no need to block, and it returns directly.

final boolean acquireQueued(final Node node, int arg) {
    try {
        boolean interrupted = false;
        for (;;) {
            final Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) {
                setHead(node);
                p.next = null; // help GC
                return interrupted;   // the thread at the head of the queue got the lock: return and exit the whole for loop
            }
            if (shouldParkAfterFailedAcquire(p, node) && parkAndCheckInterrupt())   // blocks here
                interrupted = true;
        }
    } catch (RuntimeException ex) {
        cancelAcquire(node);
        throw ex;
    }
}

Look closely: this method is an infinite loop, and it may seem as if the loop can never end until the condition p == head && tryAcquire(arg) is satisfied. Of course it is not a busy loop; the secret is that the call to parkAndCheckInterrupt suspends the current thread, blocking its call stack right there.

private final boolean parkAndCheckInterrupt() { 
    LockSupport.park(this); 
    return Thread.interrupted(); 
} 

As mentioned earlier, LockSupport.park ultimately hands the thread to the system (Linux) kernel for blocking. Of course, a thread that failed to acquire the lock is not blocked immediately: its node status is checked first; for example, there is no need to block behind a predecessor in the CANCELLED state. The specific check is in shouldParkAfterFailedAcquire.

private static boolean shouldParkAfterFailedAcquire(Node pred, Node node) {
    int ws = pred.waitStatus;
    if (ws == Node.SIGNAL)
        /*
         * This node has already set status asking a release
         * to signal it, so it can safely park
         */
        return true;
    if (ws > 0) {
        /*
         * Predecessor was cancelled. Skip over predecessors and
         * indicate retry.
         */
        do {
            node.prev = pred = pred.prev;
        } while (pred.waitStatus > 0);
        pred.next = node;
    } else {
        /*
         * waitStatus must be 0 or PROPAGATE. Indicate that we
         * need a signal, but don't park yet. Caller will need to
         * retry to make sure it cannot acquire before parking.
         */
        compareAndSetWaitStatus(pred, ws, Node.SIGNAL);
    }
    return false;
}

The rules of the check are:

◆ Rule 1: if the predecessor node's status is SIGNAL, the current node will be unparked later, so it can safely park; the method returns true, and the subsequent parkAndCheckInterrupt call in acquireQueued then blocks the thread.

◆ Rule 2: if the predecessor's status is CANCELLED (ws > 0), the predecessor has been abandoned; backtrack to a non-cancelled predecessor and return false. The infinite loop in acquireQueued will call this method again, until Rule 1 returns true and the thread blocks.

◆ Rule 3: if the predecessor's status is neither SIGNAL nor CANCELLED, set the predecessor's status to SIGNAL and return false; acquireQueued's infinite loop then re-enters, as in Rule 2.

Overall, shouldParkAfterFailedAcquire relies on the predecessor node's status to decide whether the current thread should block, and along the way it removes CANCELLED nodes, repairing the queue.

At this point, the logic for locking the thread is complete, and the process of unlocking is discussed below.

3. Unlock

A thread that fails to acquire the lock is parked inside parkAndCheckInterrupt in acquireQueued, and the code after that call can only run once the thread is unparked. When the blocked thread is unparked, interrupted = true is executed and the loop goes around again.

From the loop's code we can see that a thread that gets unparked does not necessarily get the lock: it must call tryAcquire again (in the p == head && tryAcquire(arg) check) to re-compete, because the lock is non-fair and may be snatched by a new thread, causing the just-woken thread to block once more. As the unlocking mechanism described below shows, the first thread to be unparked is the one at the head of the queue, so the p == head test will basically succeed.

As can be seen here, deferring the implementation of tryAcquire to a subclass is very subtle and extremely extensible, amazingly so! Of course, the subtlety lies not in the Template Method pattern itself, but in Doug Lea's careful layout of the lock structure.

The unlocking code is relatively simple and is mainly found in the AbstractQueuedSynchronizer.release and Sync.tryRelease methods.

In class AbstractQueuedSynchronizer:

public final boolean release(int arg) {   
    if (tryRelease(arg)) {   
        Node h = head;   
        if (h != null && h.waitStatus != 0)   
            unparkSuccessor(h);   
        return true;   
    }   
    return false;   
}   
In class Sync:

protected final boolean tryRelease(int releases) {   
    int c = getState() - releases;   
    if (Thread.currentThread() != getExclusiveOwnerThread())   
        throw new IllegalMonitorStateException();   
    boolean free = false;   
    if (c == 0) {   
        free = true;   
        setExclusiveOwnerThread(null);   
    }   
    setState(c);   
    return free;   
}   

tryRelease mirrors tryAcquire, deferring the release logic to the subclass. Its semantics are clear: if the thread locked multiple times, it must release the same number of times, and only when state == 0 is the lock actually released; releasing the lock simply means setting state to 0, and since there is no competition, no CAS is needed.
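A usage sketch of that counting behavior (the outer/inner methods are hypothetical; the state transitions in the comments follow the tryAcquire/tryRelease logic above):

import java.util.concurrent.locks.ReentrantLock;

class ReentrancyDemo {
    private final ReentrantLock lock = new ReentrantLock();

    void outer() {
        lock.lock();        // state: 0 -> 1 (CAS)
        try {
            inner();
        } finally {
            lock.unlock();  // state: 1 -> 0, lock actually released here
        }
    }

    void inner() {
        lock.lock();        // reentrant: state 1 -> 2, plain setState, no CAS
        try {
            // critical section
        } finally {
            lock.unlock();  // state: 2 -> 1, tryRelease returns false
        }
    }
}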

The semantics of release: if the lock can be released, wake the first thread in the queue (at the Head). The wakeup code follows.

private void unparkSuccessor(Node node) {   
    /*  
     * If status is negative (i.e., possibly needing signal) try  
     * to clear in anticipation of signalling. It is OK if this  
     * fails or if status is changed by waiting thread.  
     */   
    int ws = node.waitStatus;   
    if (ws < 0)   
        compareAndSetWaitStatus(node, ws, 0);    
   
    /*  
     * Thread to unpark is held in successor, which is normally  
     * just the next node.  But if cancelled or apparently null,  
     * traverse backwards from tail to find the actual  
     * non-cancelled successor.  
     */   
    Node s = node.next;   
    if (s == null || s.waitStatus > 0) {   
        s = null;   
        for (Node t = tail; t != null && t != node; t = t.prev)   
            if (t.waitStatus <= 0)   
                s = t;   
    }   
    if (s != null)   
        LockSupport.unpark(s.thread);   
}   

The point of this code is to find the first thread that can be unparked. Normally that is head.next, the node right after the Head; but head.next may have been cancelled or set to null, so it is safer to search backwards from the tail for the first usable thread. It looks as though this backtracking would hurt performance, but in practice the chance of needing it is very small, so there is no real impact. The thread is then handed back to the system kernel to resume (on Linux, via pthread_mutex_unlock), after which the unparked thread enters the re-contention state described above.

4. Lock VS Synchronized

AbstractQueuedSynchronizer holds all blocked threads in a blocking-oriented CLH queue, operating on the queue with lock-free (CAS) operations; but for threads that have already acquired the lock, ReentrantLock implements biased locking (the reentrant fast path uses plain setState, not CAS).

The underlying synchronized is likewise a wait queue operated on with CAS, but the JVM implementation is more refined: it splits the wait queue into a ContentionList and an EntryList, with the aim of reducing contention when threads are dequeued; it also implements a biased lock. In terms of data structures there is no essential difference between the two designs. But synchronized additionally implements spin locks, optimized for different systems and hardware architectures, whereas Lock relies entirely on system blocking to suspend waiting threads.

Of course, Lock is more extensible than synchronized at the application level: you can extend AbstractQueuedSynchronizer to define all kinds of synchronizers, such as read-write locks (ReadWriteLock) and fair or non-fair locks; at the same time, Lock's Condition is far more convenient and flexible than wait/notify.
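For example, a minimal non-reentrant mutex can be built on AbstractQueuedSynchronizer in a few lines; the sketch below stays close to the example in the AQS Javadoc:

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Only tryAcquire/tryRelease are defined; AQS supplies the CLH queue,
// the parking, and the wakeup machinery described above.
class Mutex {
    private static class Sync extends AbstractQueuedSynchronizer {
        protected boolean tryAcquire(int ignored) {
            return compareAndSetState(0, 1);  // 0 = free, 1 = held
        }
        protected boolean tryRelease(int ignored) {
            setState(0);                      // no CAS needed: only the holder releases
            return true;
        }
        protected boolean isHeldExclusively() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    public void lock()       { sync.acquire(1); }
    public void unlock()     { sync.release(1); }
    public boolean tryLock() { return sync.tryAcquire(1); }
}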

