    net: core: Correct the sock::sk_lock.owned lockdep annotations · 2dcb96ba
    Committed by Thomas Gleixner
    lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the
    sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex. It
    is just lockdep wise equivalent. In fact it's an open coded trivial mutex
    implementation with some interesting features.
    
    sock::sk_lock.slock is a regular spinlock protecting the 'mutex'
    representation sock::sk_lock.owned which is a plain boolean. If 'owned' is
    true, then some other task holds the 'mutex', otherwise it is uncontended.
    As this locking construct is obviously prone to lock ordering issues like
    any other locking primitive, it is lockdep annotated via a dedicated
    dependency map, sock::sk_lock.dep_map, which has to be updated at the lock
    and unlock sites.
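    For reference, the underlying data structure in include/net/sock.h looks
    roughly like this (a simplified sketch; dep_map only exists with
    CONFIG_DEBUG_LOCK_ALLOC):
    
      typedef struct {
              spinlock_t              slock;   /* protects 'owned' and the waitqueue */
              int                     owned;   /* the open coded 'mutex'; 0 == uncontended */
              wait_queue_head_t       wq;      /* tasks waiting for 'owned' to be released */
      #ifdef CONFIG_DEBUG_LOCK_ALLOC
              struct lockdep_map      dep_map; /* lockdep's view of the 'mutex' */
      #endif
      } socket_lock_t;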
    
    lock_sock_nested() is a straightforward 'mutex' lock operation:
    
      might_sleep();
      spin_lock_bh(sock::sk_lock.slock)
      while (!try_lock(sock::sk_lock.owned)) {
          spin_unlock_bh(sock::sk_lock.slock);
          wait_for_release();
          spin_lock_bh(sock::sk_lock.slock);
      }
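    The wait_for_release() pseudo-step maps to __lock_sock() in
    net/core/sock.c, which roughly does the following (a sketch; sparse
    annotations omitted):
    
      static void __lock_sock(struct sock *sk)
      {
              DEFINE_WAIT(wait);
    
              for (;;) {
                      prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
                                                TASK_UNINTERRUPTIBLE);
                      spin_unlock_bh(&sk->sk_lock.slock);
                      schedule();
                      spin_lock_bh(&sk->sk_lock.slock);
                      if (!sock_owned_by_user(sk))
                              break;
              }
              finish_wait(&sk->sk_lock.wq, &wait);
      }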
    
    The lockdep annotation for sock::sk_lock.owned is for unknown reasons
    placed _after_ the lock has been acquired, i.e. after the code block above
    and after releasing sock::sk_lock.slock, but inside the bottom halves
    disabled region:
    
      spin_unlock(sock::sk_lock.slock);
      mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
      local_bh_enable();
    
    The placement after the unlock is obvious because otherwise the
    mutex_acquire() would nest into the spin lock held region.
    
    But that's from the lockdep perspective still the wrong place:
    
     1) The mutex_acquire() is issued _after_ the successful acquisition, which
        is pointless because in a deadlock scenario this point is never
        reached. That means if the deadlock is the first instance of exposing
        the wrong lock order, lockdep does not have a chance to detect it.
    
     2) It only works because lockdep is rather lax on the context from which
        the mutex_acquire() is issued. Acquiring a mutex inside a bottom halves
        disabled and therefore non-preemptible region is obviously invalid,
        except for a trylock, which is clearly not the case here.
    
        This accidental 'works' stops working on RT enabled kernels, where
        bottom halves serialization is done via a local lock. That exposes the
        misplacement, because the 'mutex' and the local lock nest the wrong way
        around, and lockdep rightfully complains about a lock inversion.
    
    The placement has been wrong since the initial commit a5b5bb9a ("[PATCH]
    lockdep: annotate sk_locks"), which introduced this annotation.
    
    Fix it by moving the mutex_acquire() in front of the actual lock
    acquisition, which is what the regular mutex_lock() operation does as well.
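    A sketch of the corrected ordering (modeled on the regular mutex_lock()
    implementation, which annotates before it acquires):
    
      void lock_sock_nested(struct sock *sk, int subclass)
      {
              /* The sk_lock has mutex_lock() semantics here. Annotate
               * _before_ acquiring, so lockdep can flag a wrong lock order
               * even if the deadlock would trigger right here.
               */
              mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
    
              might_sleep();
              spin_lock_bh(&sk->sk_lock.slock);
              if (sk->sk_lock.owned)
                      __lock_sock(sk);
              sk->sk_lock.owned = 1;
              spin_unlock_bh(&sk->sk_lock.slock);
      }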
    
    lock_sock_fast() is not that straightforward. At first glance it looks
    like a convoluted trylock operation:
    
      spin_lock_bh(sock::sk_lock.slock)
      if (!sock::sk_lock.owned)
          return false;
      while (!try_lock(sock::sk_lock.owned)) {
          spin_unlock_bh(sock::sk_lock.slock);
          wait_for_release();
          spin_lock_bh(sock::sk_lock.slock);
      }
      spin_unlock(sock::sk_lock.slock);
      mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
      local_bh_enable();
      return true;
    
    But that's not the case: lock_sock_fast() is an interesting optimization
    for short critical sections which can run with bottom halves disabled and
    sock::sk_lock.slock held. This shortcuts the 'mutex' operation in the
    non-contended case; other lockers cannot acquire sock::sk_lock.owned
    because they are blocked on sock::sk_lock.slock. That in turn avoids the
    overhead of the heavy processing in release_sock(), including waking up
    wait queue waiters.
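    The usage pattern for callers looks roughly like this (a minimal sketch;
    the boolean return value records whether the slow path, i.e. the full
    'mutex', was taken):
    
      bool slow = lock_sock_fast(sk);
    
      /* Short critical section. In the fast (non-contended) case this runs
       * with bottom halves disabled and sock::sk_lock.slock held.
       */
    
      unlock_sock_fast(sk, slow);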
    
    In the contended case, i.e. when sock::sk_lock.owned == true, the behavior
    is the same as lock_sock_nested().
    
    Semantically this shortcut means that the task has acquired the 'mutex'
    even though it does not touch the sock::sk_lock.owned field in the
    non-contended case. Not telling lockdep about this shortcut acquisition
    hides potential lock ordering violations in the fast path.
    
    As a consequence, the same reasoning about the placement of the lockdep
    annotation as in the lock_sock_nested() case above applies.
    
    The current placement of the lockdep annotation was just copied from the
    original lock_sock() implementation, which has since been renamed to
    lock_sock_nested().
    
    Fix this by moving the mutex_acquire() in front of the actual lock
    acquisition and adding the corresponding mutex_release() into
    unlock_sock_fast(). Also document the fast path return case with a comment.
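    The resulting unlock_sock_fast() then looks roughly like this (a sketch;
    sparse annotations omitted). In the slow path release_sock() performs the
    mutex_release() when it clears sock::sk_lock.owned; the fast path has to
    do it explicitly before dropping slock:
    
      static inline void unlock_sock_fast(struct sock *sk, bool slow)
      {
              if (slow) {
                      /* Full 'mutex' held: release_sock() clears 'owned',
                       * drops the lockdep annotation and wakes up waiters.
                       */
                      release_sock(sk);
              } else {
                      /* Fast path: drop the shortcut acquisition annotation
                       * and just release the spinlock.
                       */
                      mutex_release(&sk->sk_lock.dep_map, _RET_IP_);
                      spin_unlock_bh(&sk->sk_lock.slock);
              }
      }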
    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: netdev@vger.kernel.org
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>