1. 16 Jul, 2017: 2 commits
    • fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks · 9d5b86ac
      Benjamin Coddington committed
      Since commit c69899a1 "NFSv4: Update of VFS byte range lock must be
      atomic with the stateid update", NFSv4 has been inserting locks in rpciod
      worker context.  The result is that the file_lock's fl_nspid is the
      kworker's pid instead of the original userspace pid.
      
      The fl_nspid is only used to represent the namespaced virtual pid number
      when displaying locks or returning from F_GETLK.  There's no reason to set
      it for every inserted lock, since we can usually just look it up from
      fl_pid.  So, instead of looking up and holding struct pid for every lock,
      let's just look up the virtual pid number from fl_pid when it is needed.
      That means we can remove fl_nspid entirely.
      
      The translation and presentation of fl_pid should handle the following four
      cases:
      
      1 - F_GETLK on a remote file with a remote lock:
          In this case, the filesystem should determine the l_pid to return here.
          Filesystems should indicate that the fl_pid represents a non-local pid
          value that should not be translated by returning an fl_pid <= 0.
      
      2 - F_GETLK on a local file with a remote lock:
          This should be the l_pid of the lock manager process, and translated.
      
      3 - F_GETLK on a remote file with a local lock, and
      4 - F_GETLK on a local file with a local lock:
          These should be the translated l_pid of the local locking process.
      
      Fuse was already doing the correct thing by translating the pid into the
      caller's namespace.  With this change we must update fuse to translate
      to init's pid namespace, so that the locks API can then translate from
      init's pid namespace into the pid namespace of the caller.
      
      With this change, the locks API will expect that if a filesystem returns
      a remote pid as opposed to a local pid for F_GETLK, that remote pid will
      be <= 0.  This signifies that the pid is remote, and the locks API will
      forgo translating that pid into the pid namespace of the local calling
      process.
      
      Finally, we convert remote filesystems to present remote pids using
      negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
      pid returned for F_GETLK lock requests.
      
      Since local pids will never be larger than PID_MAX_LIMIT (which is
      currently defined as <= 4 million) and pid_t is a signed int, we
      should have plenty of room to represent remote pids with negative
      numbers if we assume that remote pid numbers are similarly limited.
      
      If this is not the case, then we run the risk of having a remote pid
      returned for which there is also a corresponding local pid.  This is a
      problem we have now, but this patch should reduce the chances of that
      occurring, while also returning those remote pid numbers, for whatever
      that may be worth.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • fs/locks: Use allocation rather than the stack in fcntl_getlk() · 52306e88
      Benjamin Coddington committed
      Struct file_lock is fairly large, so let's save some space on the stack by
      using an allocation for struct file_lock in fcntl_getlk(), just as we do
      for fcntl_setlk().
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
  2. 27 May, 2017: 2 commits
  3. 21 Apr, 2017: 1 commit
  4. 25 Dec, 2016: 1 commit
  5. 18 Oct, 2016: 1 commit
  6. 28 Sep, 2016: 1 commit
  7. 22 Sep, 2016: 3 commits
    • fs/locks: Use percpu_down_read_preempt_disable() · 87709e28
      Peter Zijlstra committed
      Avoid spurious preemption.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • fs/locks: Replace lg_local with a per-cpu spinlock · 7c3f654d
      Peter Zijlstra committed
      As Oleg suggested, replace file_lock_list with a structure containing
      the hlist head and a spinlock.
      
      This completely removes the lglock from fs/locks.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • fs/locks: Replace lg_global with a percpu-rwsem · aba37660
      Peter Zijlstra committed
      Replace the global part of the lglock with a percpu-rwsem.
      
      Since fcl_lock is a spinlock and itself nests under i_lock, which is also
      a spinlock, we cannot acquire sleeping locks in
      locks_{insert,remove}_global_locks().
      
      We can however wrap all fcl_lock acquisitions with percpu_down_read
      such that all invocations of locks_{insert,remove}_global_locks() have
      that read lock held.
      
      This allows us to replace the lg_global part of the lglock with the
      write side of the rwsem.
      
      In the absence of writers, percpu_{down,up}_read() are free of atomic
      instructions. This further avoids the very long preempt-disable
      regions caused by lglock on larger machines.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 16 Sep, 2016: 2 commits
    • vfs: do get_write_access() on upper layer of overlayfs · 4d0c5ba2
      Miklos Szeredi committed
      The problem with writecount is: we want consistent handling of it for
      underlying filesystems as well as overlayfs.  Making sure i_writecount is
      correct on all layers is difficult.  Instead this patch makes sure that
      when write access is acquired, it's always done on the underlying writable
      layer (called the upper layer).  We must also make sure to look at the
      writecount on this layer when checking for conflicting leases.
      
      Open for write already updates the upper layer's writecount, which leaves
      only truncate.
      
      For truncate, copy-up must happen before get_write_access() so that the
      writecount is updated on the upper layer.  The problem with this is that if
      something fails after that, the copy-up was done needlessly, e.g. if
      break_lease() was interrupted.  Probably not a big deal in practice.
      
      Another interesting case is if there's a denywrite on a lower file that is
      then opened for write or truncated.  With this patch these will succeed,
      which is somewhat counterintuitive.  But I think it's still acceptable,
      considering that the copy-up does actually create a different file, so the
      old, denywrite mapping won't be touched.
      
      On non-overlayfs d_real() is an identity function and d_real_inode() is
      equivalent to d_inode() so this patch doesn't change behavior in that case.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Jeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
    • locks: fix file locking on overlayfs · c568d683
      Miklos Szeredi committed
      This patch allows flock, posix locks, ofd locks and leases to work
      correctly on overlayfs.
      
      Instead of using the underlying inode for storing lock context, use the
      overlay inode.  This allows locks to be persistent across copy-up.
      
      This is done by introducing locks_inode() helper and using it instead of
      file_inode() to get the inode in locking code.  For non-overlayfs the two
      are equivalent, except for an extra pointer dereference in locks_inode().
      
      Since lock operations are in "struct file_operations" we must also make
      sure not to call the underlying filesystem's lock operations.  Introduce a
      super block flag MS_NOREMOTELOCK to this effect.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Jeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
  9. 19 Aug, 2016: 1 commit
  10. 01 Jul, 2016: 1 commit
    • locks: use file_inode() · 6343a212
      Miklos Szeredi committed
      (Another one for the f_path debacle.)
      
      ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
      
      The reason is that generic_add_lease() used filp->f_path.dentry->inode
      while all the others use file_inode().  This makes a difference for files
      opened on overlayfs, since the former will point to the overlay inode and
      the latter to the underlying inode.
      
      So generic_add_lease() added the lease to the overlay inode and
      generic_delete_lease() removed it from the underlying inode.  When the file
      was released, the lease remained on the overlay inode's lock list, resulting
      in use after free.
      Reported-by: Eryu Guan <eguan@redhat.com>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. 23 Jan, 2016: 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. 09 Jan, 2016: 5 commits
  13. 08 Jan, 2016: 1 commit
    • locks: fix unlock when fcntl_setlk races with a close · 7f3697e2
      Jeff Layton committed
      Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
      fires in locks_free_lock_context when the flc_posix list isn't empty.
      
      The problem turns out to be that we're basically rebuilding the
      file_lock from scratch in fcntl_setlk when we discover that the setlk
      has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
      then we may end up with fl_start and fl_end values that differ from
      when the lock was initially set, if the file position or length of the
      file has changed in the interim.
      
      Fix this by just reusing the same lock request structure, and simply
      overriding the fl_type value with F_UNLCK as appropriate. That ensures that
      we really are unlocking the lock that was initially set.
      
      While we're there, make sure that we do pop a WARN_ON_ONCE if the
      removal ever fails. Also return -EBADF in this event, since that's
      what we would have returned if the close had happened earlier.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Fixes: c293621b (stale POSIX lock handling)
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
  14. 18 Dec, 2015: 1 commit
    • fs: make locks.c explicitly non-modular · 91899226
      Paul Gortmaker committed
      The Kconfig currently controlling compilation of this code is:
      
      config FILE_LOCKING
           bool "Enable POSIX file locking API" if EXPERT
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Let's remove the couple of traces of modularity so that when reading the
      driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering gets bumped to one level earlier when we
      use the more appropriate fs_initcall here.  However we've made similar
      changes before without any fallout and none is expected here either.
      
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Acked-by: Jeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
  15. 08 Dec, 2015: 1 commit
  16. 18 Nov, 2015: 1 commit
  17. 16 Nov, 2015: 1 commit
    • locks: Allow disabling mandatory locking at compile time · 9e8925b6
      Jeff Layton committed
      Mandatory locking appears to be almost unused and buggy and there
      appears to be no real interest in doing anything with it.  Since effectively
      no one uses the code and since the code is buggy let's allow it to be
      disabled at compile time.  I would just suggest removing the code but
      undoubtedly that will break some piece of userspace code somewhere.
      
      For the distributions that don't care about this piece of code
      this gives a nice starting point to make mandatory locking go away.
      
      Cc: Benjamin Coddington <bcodding@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jeff Layton <jeff.layton@primarydata.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
  18. 23 Oct, 2015: 3 commits
  19. 15 Oct, 2015: 1 commit
  20. 21 Sep, 2015: 1 commit
    • fs: fix data races on inode->i_flctx · 128a3785
      Dmitry Vyukov committed
      locks_get_lock_context() uses cmpxchg() to install i_flctx.
      cmpxchg() is a release operation which is correct. But it uses
      a plain load to load i_flctx. This is incorrect. Subsequent loads
      from i_flctx can hoist above the load of i_flctx pointer itself
      and observe uninitialized garbage there. This in turn can lead
      to corruption of ctx->flc_lock and other members.
      
      Documentation/memory-barriers.txt explicitly requires the use of
      a barrier in such a context:
      "A load-load control dependency requires a full read memory barrier".
      
      Use smp_load_acquire() in locks_get_lock_context() and in a bunch
      of other functions that can proceed concurrently with
      locks_get_lock_context().
      
      The data race was found with KernelThreadSanitizer (KTSAN).
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
  21. 01 Sep, 2015: 1 commit
  22. 13 Jul, 2015: 3 commits
  23. 17 Apr, 2015: 1 commit
    • proc: show locks in /proc/pid/fdinfo/X · 6c8c9031
      Andrey Vagin committed
      Let's show locks which are associated with a file descriptor in
      its fdinfo file.
      
      Currently we don't have a reliable way to determine who holds a lock.  We
      can find some information in /proc/locks, but the PID reported there
      can be wrong.  For example, a process takes a lock, then forks a child and
      dies.  In this case /proc/locks contains the parent pid, which can be
      reused by another process.
      
      $ cat /proc/locks
      ...
      6: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
      ...
      
      $ ps -C rpcbind
        PID TTY          TIME CMD
        332 ?        00:00:00 rpcbind
      
      $ cat /proc/332/fdinfo/4
      pos:	0
      flags:	0100000
      mnt_id:	22
      lock:	1: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
      
      $ ls -l /proc/332/fd/4
      lr-x------ 1 root root 64 Mar  5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock
      
      $ ls -l /proc/324/fd/
      total 0
      lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
      lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
      lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0
      
      You can see that the process with pid 324 doesn't hold the lock.
      
      This information is required for proper dumping and restoring file
      locks.
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Acked-by: Jeff Layton <jlayton@poochiereds.net>
      Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 03 Apr, 2015: 4 commits
    • locks: use cmpxchg to assign i_flctx pointer · 0429c2b5
      Jeff Layton committed
      During the v3.20/v4.0 cycle, I had originally had the code manage the
      inode->i_flctx pointer using a compare-and-swap operation instead of the
      i_lock.
      
      Sasha Levin, though, hit a problem while testing with trinity that made
      me believe that wasn't safe. At the time, changing the code to protect
      the i_flctx pointer with the i_lock seemed to fix the issue, but I now
      think that was just coincidence.
      
      The issue was likely the same race that Kirill Shutemov hit while
      testing the pre-rc1 v4.0 kernel and that Linus spotted. Due to the way
      that the spinlock was dropped in the middle of flock_lock_file, you
      could end up with multiple flock locks for the same struct file on the
      inode.
      
      Reinstate the use of a CAS operation to assign this pointer since it's
      likely to be more efficient and gets the i_lock completely out of the
      file locking business.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    • locks: get rid of WE_CAN_BREAK_LSLK_NOW dead code · 3648888e
      Jeff Layton committed
      As Bruce points out, there's no compelling reason to change /proc/locks
      output at this point. If we did want to do this, then we'd almost
      certainly want to introduce a new file to display this info (maybe via
      debugfs?).
      
      Let's remove the dead WE_CAN_BREAK_LSLK_NOW ifdef here and just plan to
      stay with the legacy format.
      Reported-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    • locks: change lm_get_owner and lm_put_owner prototypes · cae80b30
      Jeff Layton committed
      The current prototypes for these operations are somewhat awkward as they
      deal with fl_owners but take struct file_lock arguments. In the future,
      we'll want to be able to take references without necessarily dealing
      with a struct file_lock.
      
      Change them to take fl_owner_t arguments instead and have the callers
      deal with assigning the values to the file_lock structs.
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
    • locks: don't allocate a lock context for an F_UNLCK request · 5c1c669a
      Jeff Layton committed
      In the event that we get an F_UNLCK request on an inode that has no lock
      context, there is no reason to allocate one. Change
      locks_get_lock_context to take a "type" argument and avoid allocating a
      new context if it's F_UNLCK.
      
      Then, fix the callers to return appropriately if that function returns
      NULL.
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>