1. 03 1月, 2019 1 次提交
    • N
      locks: fix error in locks_move_blocks() · bf77ae4c
      NeilBrown 提交于
      After moving all requests from
         fl->fl_blocked_requests
      to
         new->fl_blocked_requests
      
      it is nonsensical to do anything to all the remaining elements, there
      aren't any.  This should do something to all the requests that have been
      moved. For simplicity, it does it to all requests in the target list.
      
      Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests
      is "obviously correct" as it preserves the invariant of the linkage
      among requests.
      
      Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
      Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      bf77ae4c
  2. 17 12月, 2018 1 次提交
  3. 07 12月, 2018 5 次提交
    • N
      fs/locks: remove unnecessary white space. · 7bbd1fc0
      NeilBrown 提交于
       - spaces before tabs,
       - spaces at the end of lines,
       - multiple blank lines,
       - blank lines before EXPORT_SYMBOL,
      can all go.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      7bbd1fc0
    • N
      fs/locks: merge posix_unblock_lock() and locks_delete_block() · cb03f94f
      NeilBrown 提交于
      posix_unblock_lock() is not specific to posix locks, and behaves
      nearly identically to locks_delete_block() - the former returning a
      status while the later doesn't.
      
      So discard posix_unblock_lock() and use locks_delete_block() instead,
      after giving that function an appropriate return value.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      cb03f94f
    • N
      fs/locks: create a tree of dependent requests. · fd7732e0
      NeilBrown 提交于
      When we find an existing lock which conflicts with a request,
      and the request wants to wait, we currently add the request
      to a list.  When the lock is removed, the whole list is woken.
      This can cause the thundering-herd problem.
      To reduce the problem, we make use of the (new) fact that
      a pending request can itself have a list of blocked requests.
      When we find a conflict, we look through the existing blocked requests.
      If any one of them blocks the new request, the new request is attached
      below that request, otherwise it is added to the list of blocked
      requests, which are now known to be mutually non-conflicting.
      
      This way, when the lock is released, only a set of non-conflicting
      locks will be woken, the rest can stay asleep.
      If the lock request cannot be granted and the request needs to be
      requeued, all the other requests it blocks will then be woken
      
      To make this more concrete:
      
        If you have a many-core machine, and have many threads all wanting to
        briefly lock a give file (udev is known to do this), you can get quite
        poor performance.
      
        When one thread releases a lock, it wakes up all other threads that
        are waiting (classic thundering-herd) - one will get the lock and the
        others go to sleep.
        When you have few cores, this is not very noticeable: by the time the
        4th or 5th thread gets enough CPU time to try to claim the lock, the
        earlier threads have claimed it, done what was needed, and released.
        So with few cores, many of the threads don't end up contending.
        With 50+ cores, lost of threads can get the CPU at the same time,
        and the contention can easily be measured.
      
        This patchset creates a tree of pending lock requests in which siblings
        don't conflict and each lock request does conflict with its parent.
        When a lock is released, only requests which don't conflict with each
        other a woken.
      
        Testing shows that lock-acquisitions-per-second is now fairly stable
        even as the number of contending process goes to 1000.  Without this
        patch, locks-per-second drops off steeply after a few 10s of
        processes.
      
        There is a small cost to this extra complexity.
        At 20 processes running a particular test on 72 cores, the lock
        acquisitions per second drops from 1.8 million to 1.4 million with
        this patch.  For 100 processes, this patch still provides 1.4 million
        while without this patch there are about 700,000.
      Reported-and-tested-by: NMartin Wilck <mwilck@suse.de>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      fd7732e0
    • N
      fs/locks: change all *_conflict() functions to return bool. · c0e15908
      NeilBrown 提交于
      posix_locks_conflict() and flock_locks_conflict() both return int.
      leases_conflict() returns bool.
      
      This inconsistency will cause problems for the next patch if not
      fixed.
      
      So change posix_locks_conflict() and flock_locks_conflict() to return
      bool.
      Also change the locks_conflict() helper.
      
      And convert some
         return (foo);
      to
         return foo;
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      c0e15908
    • N
      fs/locks: always delete_block after waiting. · 16306a61
      NeilBrown 提交于
      Now that requests can block other requests, we
      need to be careful to always clean up those blocked
      requests.
      Any time that we wait for a request, we might have
      other requests attached, and when we stop waiting,
      we must clean them up.
      If the lock was granted, the requests might have been
      moved to the new lock, though when merged with a
      pre-exiting lock, this might not happen.
      In all cases we don't want blocked locks to remain
      attached, so we remove them to be safe.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      Tested-by: syzbot+a4a3d526b4157113ec6a@syzkaller.appspotmail.com
      Tested-by: Nkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      16306a61
  4. 01 12月, 2018 4 次提交
  5. 09 8月, 2018 1 次提交
  6. 07 8月, 2018 1 次提交
  7. 21 7月, 2018 1 次提交
    • E
      signal: Use PIDTYPE_TGID to clearly store where file signals will be sent · 01919134
      Eric W. Biederman 提交于
      When f_setown is called a pid and a pid type are stored.  Replace the use
      of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
      group.  Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
      is only for a thread.
      
      Update the users of __f_setown to use PIDTYPE_TGID instead of
      PIDTYPE_PID.
      
      For now the code continues to capture task_pid (when task_tgid would
      really be appropriate), and iterate on PIDTYPE_PID (even when type ==
      PIDTYPE_TGID) out of an abundance of caution to preserve existing
      behavior.
      
      Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID
      for tgid lookup also be used to avoid taking the tasklist lock.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      01919134
  8. 18 7月, 2018 2 次提交
  9. 14 6月, 2018 2 次提交
  10. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  11. 16 5月, 2018 1 次提交
  12. 26 3月, 2018 1 次提交
  13. 09 2月, 2018 1 次提交
  14. 28 11月, 2017 1 次提交
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds 提交于
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
  15. 22 7月, 2017 1 次提交
  16. 16 7月, 2017 2 次提交
    • B
      fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks · 9d5b86ac
      Benjamin Coddington 提交于
      Since commit c69899a1 "NFSv4: Update of VFS byte range lock must be
      atomic with the stateid update", NFSv4 has been inserting locks in rpciod
      worker context.  The result is that the file_lock's fl_nspid is the
      kworker's pid instead of the original userspace pid.
      
      The fl_nspid is only used to represent the namespaced virtual pid number
      when displaying locks or returning from F_GETLK.  There's no reason to set
      it for every inserted lock, since we can usually just look it up from
      fl_pid.  So, instead of looking up and holding struct pid for every lock,
      let's just look up the virtual pid number from fl_pid when it is needed.
      That means we can remove fl_nspid entirely.
      
      The translaton and presentation of fl_pid should handle the following four
      cases:
      
      1 - F_GETLK on a remote file with a remote lock:
          In this case, the filesystem should determine the l_pid to return here.
          Filesystems should indicate that the fl_pid represents a non-local pid
          value that should not be translated by returning an fl_pid <= 0.
      
      2 - F_GETLK on a local file with a remote lock:
          This should be the l_pid of the lock manager process, and translated.
      
      3 - F_GETLK on a remote file with a local lock, and
      4 - F_GETLK on a local file with a local lock:
          These should be the translated l_pid of the local locking process.
      
      Fuse was already doing the correct thing by translating the pid into the
      caller's namespace.  With this change we must update fuse to translate
      to init's pid namespace, so that the locks API can then translate from
      init's pid namespace into the pid namespace of the caller.
      
      With this change, the locks API will expect that if a filesystem returns
      a remote pid as opposed to a local pid for F_GETLK, that remote pid will
      be <= 0.  This signifies that the pid is remote, and the locks API will
      forego translating that pid into the pid namespace of the local calling
      process.
      
      Finally, we convert remote filesystems to present remote pids using
      negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
      pid returned for F_GETLK lock requests.
      
      Since local pids will never be larger than PID_MAX_LIMIT (which is
      currently defined as <= 4 million), but pid_t is an unsigned int, we
      should have plenty of room to represent remote pids with negative
      numbers if we assume that remote pid numbers are similarly limited.
      
      If this is not the case, then we run the risk of having a remote pid
      returned for which there is also a corresponding local pid.  This is a
      problem we have now, but this patch should reduce the chances of that
      occurring, while also returning those remote pid numbers, for whatever
      that may be worth.
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      9d5b86ac
    • B
      fs/locks: Use allocation rather than the stack in fcntl_getlk() · 52306e88
      Benjamin Coddington 提交于
      Struct file_lock is fairly large, so let's save some space on the stack by
      using an allocation for struct file_lock in fcntl_getlk(), just as we do
      for fcntl_setlk().
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      52306e88
  17. 27 5月, 2017 2 次提交
  18. 21 4月, 2017 1 次提交
  19. 25 12月, 2016 1 次提交
  20. 18 10月, 2016 1 次提交
  21. 28 9月, 2016 1 次提交
  22. 22 9月, 2016 3 次提交
    • P
      fs/locks: Use percpu_down_read_preempt_disable() · 87709e28
      Peter Zijlstra 提交于
      Avoid spurious preemption.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      87709e28
    • P
      fs/locks: Replace lg_local with a per-cpu spinlock · 7c3f654d
      Peter Zijlstra 提交于
      As Oleg suggested, replace file_lock_list with a structure containing
      the hlist head and a spinlock.
      
      This completely removes the lglock from fs/locks.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7c3f654d
    • P
      fs/locks: Replace lg_global with a percpu-rwsem · aba37660
      Peter Zijlstra 提交于
      Replace the global part of the lglock with a percpu-rwsem.
      
      Since fcl_lock is a spinlock and itself nests under i_lock, which too
      is a spinlock we cannot acquire sleeping locks at
      locks_{insert,remove}_global_locks().
      
      We can however wrap all fcl_lock acquisitions with percpu_down_read
      such that all invocations of locks_{insert,remove}_global_locks() have
      that read lock held.
      
      This allows us to replace the lg_global part of the lglock with the
      write side of the rwsem.
      
      In the absense of writers, percpu_{down,up}_read() are free of atomic
      instructions. This further avoids the very long preempt-disable
      regions caused by lglock on larger machines.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: der.herr@hofr.at
      Cc: paulmck@linux.vnet.ibm.com
      Cc: riel@redhat.com
      Cc: tj@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      aba37660
  23. 16 9月, 2016 2 次提交
    • M
      vfs: do get_write_access() on upper layer of overlayfs · 4d0c5ba2
      Miklos Szeredi 提交于
      The problem with writecount is: we want consistent handling of it for
      underlying filesystems as well as overlayfs.  Making sure i_writecount is
      correct on all layers is difficult.  Instead this patch makes sure that
      when write access is acquired, it's always done on the underlying writable
      layer (called the upper layer).  We must also make sure to look at the
      writecount on this layer when checking for conflicting leases.
      
      Open for write already updates the upper layer's writecount.  Leaving only
      truncate.
      
      For truncate copy up must happen before get_write_access() so that the
      writecount is updated on the upper layer.  Problem with this is if
      something fails after that, then copy-up was done needlessly.  E.g. if
      break_lease() was interrupted.  Probably not a big deal in practice.
      
      Another interesting case is if there's a denywrite on a lower file that is
      then opened for write or truncated.  With this patch these will succeed,
      which is somewhat counterintuitive.  But I think it's still acceptable,
      considering that the copy-up does actually create a different file, so the
      old, denywrite mapping won't be touched.
      
      On non-overlayfs d_real() is an identity function and d_real_inode() is
      equivalent to d_inode() so this patch doesn't change behavior in that case.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Acked-by: NJeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      4d0c5ba2
    • M
      locks: fix file locking on overlayfs · c568d683
      Miklos Szeredi 提交于
      This patch allows flock, posix locks, ofd locks and leases to work
      correctly on overlayfs.
      
      Instead of using the underlying inode for storing lock context use the
      overlay inode.  This allows locks to be persistent across copy-up.
      
      This is done by introducing locks_inode() helper and using it instead of
      file_inode() to get the inode in locking code.  For non-overlayfs the two
      are equivalent, except for an extra pointer dereference in locks_inode().
      
      Since lock operations are in "struct file_operations" we must also make
      sure not to call underlying filesystem's lock operations.  Introcude a
      super block flag MS_NOREMOTELOCK to this effect.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Acked-by: NJeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      c568d683
  24. 19 8月, 2016 1 次提交
  25. 01 7月, 2016 1 次提交
    • M
      locks: use file_inode() · 6343a212
      Miklos Szeredi 提交于
      (Another one for the f_path debacle.)
      
      ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
      
      The reason is that generic_add_lease() used filp->f_path.dentry->inode
      while all the others use file_inode().  This makes a difference for files
      opened on overlayfs since the former will point to the overlay inode the
      latter to the underlying inode.
      
      So generic_add_lease() added the lease to the overlay inode and
      generic_delete_lease() removed it from the underlying inode.  When the file
      was released the lease remained on the overlay inode's lock list, resulting
      in use after free.
      Reported-by: NEryu Guan <eguan@redhat.com>
      Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6343a212
  26. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c