1. 04 September 2021, 1 commit
    • memcg: enable accounting for file lock caches · 0f12156d
      Vasily Averin committed
      A user can create file locks for each open file and force the kernel to
      allocate small but long-lived objects for each open file.
      
      It makes sense to account for these objects to limit the host's memory
      consumption from inside the memcg-limited container.
      
      Link: https://lkml.kernel.org/r/b009f4c7-f0ab-c0ec-8e83-918f47d677da@virtuozzo.com
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Yutian Yang <nglaive@gmail.com>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f12156d
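      A minimal sketch of the kind of change this entry describes, assuming the
      slab caches keep the names used in fs/locks.c; adding SLAB_ACCOUNT is what
      makes these allocations chargeable to the calling task's memcg:

      /* Sketch: charge file_lock / file_lock_context objects to the memcg. */
      static struct kmem_cache *flctx_cache __read_mostly;
      static struct kmem_cache *filelock_cache __read_mostly;

      static int __init filelock_init(void)
      {
              flctx_cache = kmem_cache_create("file_lock_ctx",
                              sizeof(struct file_lock_context), 0,
                              SLAB_PANIC | SLAB_ACCOUNT, NULL);

              filelock_cache = kmem_cache_create("file_lock_cache",
                              sizeof(struct file_lock), 0,
                              SLAB_PANIC | SLAB_ACCOUNT, NULL);

              /* per-CPU file_lock_list setup omitted for brevity */
              return 0;
      }
      core_initcall(filelock_init);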
  2. 20 April 2021, 1 commit
  3. 13 April 2021, 1 commit
  4. 11 March 2021, 1 commit
  5. 09 March 2021, 1 commit
    • Revert "nfsd4: a client's own opens needn't prevent delegations" · 6ee65a77
      J. Bruce Fields committed
      This reverts commit 94415b06.
      
      That commit claimed to allow a client to get a read delegation when it
      was the only writer.  Actually it allowed a client to get a read
      delegation when *any* client has a write open!
      
      The main problem is that it depends on nfs4_clnt_odstate structures
      that are actually only maintained for pNFS exports.
      
      This causes clients to miss writes performed by other clients, even when
      there have been intervening closes and opens, violating close-to-open
      cache consistency.
      
      We can do this a different way, but first we should just revert this.
      
      I've added pynfs 4.1 test DELEG19 to test for this, as I should have
      done originally!
      
      Cc: stable@vger.kernel.org
      Reported-by: Timo Rothenpieler <timo@rothenpieler.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      6ee65a77
  6. 11 December 2020, 1 commit
  7. 26 October 2020, 2 commits
  8. 24 August 2020, 1 commit
  9. 14 July 2020, 1 commit
    • nfsd4: a client's own opens needn't prevent delegations · 94415b06
      J. Bruce Fields committed
      We recently fixed lease breaking so that a client's actions won't break
      its own delegations.
      
      But we still have an unnecessary self-conflict when granting
      delegations: a client's own write opens will prevent us from handing out
      a read delegation even when no other client has the file open for write.
      
      Fix that by turning off the checks for conflicting opens under
      vfs_setlease, and instead performing those checks in the nfsd code.
      
      We don't depend much on locks here: instead we acquire the delegation,
      then check for conflicts, and drop the delegation again if we find any.
      
      The check beforehand is an optimization of sorts, just to avoid
      acquiring the delegation unnecessarily.  There's a race where the first
      check could cause us to deny the delegation when we could have granted
      it.  But, that's OK, delegation grants are optional (and probably not
      even a good idea in that case).
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      94415b06
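      A hedged sketch of the ordering described above (acquire the delegation,
      re-check for conflicting opens, drop it again on conflict). The helper
      nfsd4_try_read_deleg() and other_clients_have_write_opens() are
      illustrative names, not the actual nfsd functions, and the
      nfs4_set_delegation() signature shown here is an assumption:

      /* Illustrative only: the "acquire, then check, then drop" ordering. */
      static struct nfs4_delegation *
      nfsd4_try_read_deleg(struct nfs4_file *fp, struct nfs4_ol_stateid *stp)
      {
              struct nfs4_delegation *dp;

              /* 1. Take the lease first, without the vfs_setlease open checks. */
              dp = nfs4_set_delegation(fp, stp);        /* hypothetical signature */
              if (IS_ERR(dp))
                      return dp;

              /* 2. Now look for conflicting opens from *other* clients. */
              if (other_clients_have_write_opens(fp)) { /* hypothetical helper */
                      /* 3. Conflict found after the fact: give the delegation back. */
                      nfs4_put_stid(&dp->dl_stid);
                      return ERR_PTR(-EAGAIN);
              }
              return dp;        /* no conflict: hand the read delegation out */
      }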
  10. 03 June 2020, 1 commit
  11. 19 May 2020, 1 commit
  12. 09 May 2020, 1 commit
    • nfsd: clients don't need to break their own delegations · 28df3d15
      J. Bruce Fields committed
      We currently revoke read delegations on any write open or any operation
      that modifies file data or metadata (including rename, link, and
      unlink).  But if the delegation in question is the only read delegation
      and is held by the client performing the operation, that's not really
      necessary.
      
      It's not always possible to prevent this in the NFSv4.0 case, because
      there's not always a way to determine which client an NFSv4.0 delegation
      came from.  (In theory we could try to guess this from the transport
      layer, e.g., by assuming all traffic on a given TCP connection comes
      from the same client.  But that's not really correct.)
      
      In the NFSv4.1 case the session layer always tells us the client.
      
      This patch should remove such self-conflicts in all cases where we can
      reliably determine the client from the compound.
      
      To do that we need to track "who" is performing a given (possibly
      lease-breaking) file operation.  We're doing that by storing the
      information in the svc_rqst and using kthread_data() to map the current
      task back to a svc_rqst.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      28df3d15
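      A rough sketch of the mechanism described above: nfsd records which client
      is acting in the svc_rqst, and the lease-break path maps the current task
      back to its svc_rqst with kthread_data() to ask whether the breaker is the
      delegation holder. Field and helper names follow my reading of the
      description and should be treated as approximate:

      /* Sketch: is the task breaking this lease the client that holds it? */
      static bool nfsd_breaker_owns_lease(struct file_lock *fl)
      {
              struct nfs4_delegation *dl = fl->fl_owner;
              struct svc_rqst *rqst;
              struct nfs4_client *clp;

              if (!i_am_nfsd())                 /* only nfsd threads carry an svc_rqst */
                      return false;
              rqst = kthread_data(current);     /* map the kthread back to its request */
              if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
                      return false;             /* can't reliably identify the client */
              clp = *(rqst->rq_lease_breaker);  /* "who" was stored in the svc_rqst */
              return dl->dl_stid.sc_client == clp;
      }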
  13. 05 May 2020, 1 commit
  14. 25 April 2020, 1 commit
  15. 19 March 2020, 1 commit
  16. 07 March 2020, 1 commit
    • locks: fix a potential use-after-free problem when wakeup a waiter · 6d390e4b
      yangerkun committed
      Commit 16306a61 ("fs/locks: always delete_block after waiting.") added
      logic that checks waiter->fl_blocker without holding blocked_lock_lock,
      which can trigger a use-after-free when we try to wake up a waiter:

      Thread 1 has created a write flock a on a file; now thread 2 tries to
      unlock and delete flock a while thread 3 tries to add flock b on the same
      file.
      
      Thread 2                            Thread 3
                                          flock syscall (create flock b)
                                          ...flock_lock_inode_wait
                                             flock_lock_inode (inserts b's
                                             fl_blocked_member into flock a's
                                             fl_blocked_requests list)
                                             sleep
      flock syscall (unlock)
      ...flock_lock_inode_wait
          locks_delete_lock_ctx
          ...__locks_wake_up_blocks
              __locks_delete_block(
                  b->fl_blocker = NULL)
              ...
                                          interrupted by a signal
                                          locks_delete_block
                                             b->fl_blocker == NULL &&
                                             list_empty(&b->fl_blocked_requests)
                                             -> returns early ("success")
                                          locks_free_lock b
          wake_up(&b->fl_wait)
          triggers use-after-free
      
      Fix it by removing this logic; this patch may also fix CVE-2019-19769.
      
      Cc: stable@vger.kernel.org
      Fixes: 16306a61 ("fs/locks: always delete_block after waiting.")
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      6d390e4b
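      For reference, a sketch of locks_delete_block() with the lockless early
      return removed, so the waiter is only examined under blocked_lock_lock
      (reconstructed from the description above; treat details as approximate):

      int locks_delete_block(struct file_lock *waiter)
      {
              int status = -ENOENT;

              /* The unlocked "fl_blocker == NULL && list_empty()" fast path is
               * gone: the waiter is now only inspected with the lock held. */
              spin_lock(&blocked_lock_lock);
              if (waiter->fl_blocker)
                      status = 0;
              __locks_wake_up_blocks(waiter);
              __locks_delete_block(waiter);
              spin_unlock(&blocked_lock_lock);
              return status;
      }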
  17. 29 December 2019, 1 commit
  18. 20 August 2019, 1 commit
  19. 19 August 2019, 2 commits
  20. 25 July 2019, 1 commit
    • locks: Fix procfs output for file leases · 43e4cb94
      Pavel Begunkov committed
      Since commit 778fc546 ("locks: fix tracking of inprogress lease breaks"),
      a lease break doesn't change @fl_type but instead modifies @fl_flags.
      However, the procfs side hasn't been updated.

      Previously, for a breaking lease the target type was printed (see
      target_leasetype()), matching what fcntl(F_GETLEASE) returns. But now
      it's always "READ", as F_UNLCK no longer means "breaking". Unlike the
      previous behaviour, this doesn't provide a complete description of the
      lease.

      Below are /proc/pid/fdinfo/ outputs for a lease (the same for READ and
      WRITE) broken by an O_WRONLY open.
      -- before:
      lock:   1: LEASE  BREAKING  READ  2558 08:03:815793 0 EOF
      -- after:
      lock:   1: LEASE  BREAKING  UNLCK  2558 08:03:815793 0 EOF
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      43e4cb94
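      A hedged sketch of the direction of the fix: derive the lease's target
      type from the pending-break flags (which is what target_leasetype() in
      fs/locks.c does) and print that in the procfs output instead of the raw
      fl_type. The exact placement inside the /proc printer is an assumption:

      static int target_leasetype(struct file_lock *fl)
      {
              if (fl->fl_flags & FL_UNLOCK_PENDING)
                      return F_UNLCK;           /* break-to-unlock in progress */
              if (fl->fl_flags & FL_DOWNGRADE_PENDING)
                      return F_RDLCK;           /* break-to-read in progress */
              return fl->fl_type;
      }

      /* In the /proc printer, use the target type for leases: */
      int type = IS_LEASE(fl) ? target_leasetype(fl) : fl->fl_type;

      seq_printf(f, "%s ", (type == F_WRLCK) ? "WRITE" :
                           (type == F_RDLCK) ? "READ " : "UNLCK");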
  21. 04 July 2019, 1 commit
  22. 19 June 2019, 2 commits
    • locks: eliminate false positive conflicts for write lease · 387e3746
      Amir Goldstein committed
      check_conflicting_open() checks for existing fds open for read or write
      before allowing a write lease to be taken.  The check, which was
      implemented using i_count and d_count, is an approximation with several
      false positives.  For example, overlayfs since v4.19 takes an extra
      reference on the dentry, and an open with O_PATH takes a reference on
      the dentry although the file can be neither read nor written.
      
      Change the implementation to use i_readcount and i_writecount to
      eliminate the false positive conflicts and allow a write lease to be
      taken on an overlayfs file.
      
      The change of behavior with existing fd's open with O_PATH is symmetric
      w.r.t. current behavior of lease breakers - an open with O_PATH currently
      does not break a write lease.
      
      This increases the size of struct inode by 4 bytes on 32-bit archs when
      CONFIG_FILE_LOCKING is defined and CONFIG_IMA is not already
      defined.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      387e3746
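      A sketch of the counter-based check described above, assuming the lease
      requestor's own file descriptor is excluded from the comparison; the
      exact signature and the FL_LAYOUT special case follow my reading of
      fs/locks.c and should be treated as approximate:

      static int check_conflicting_open(struct file *filp, const long arg, int flags)
      {
              struct inode *inode = locks_inode(filp);
              int self_wcount = 0, self_rcount = 0;

              if (flags & FL_LAYOUT)
                      return 0;

              if (arg == F_RDLCK)
                      /* a read lease only conflicts with writers */
                      return inode_is_open_for_write(inode) ? -EAGAIN : 0;
              else if (arg != F_WRLCK)
                      return 0;

              /* Don't count the requestor's own open against the lease. */
              if (filp->f_mode & FMODE_WRITE)
                      self_wcount = 1;
              else if (filp->f_mode & FMODE_READ)
                      self_rcount = 1;

              if (atomic_read(&inode->i_writecount) != self_wcount ||
                  atomic_read(&inode->i_readcount) != self_rcount)
                      return -EAGAIN;    /* some other fd has the file open */

              return 0;
      }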
    • locks: Add trace_leases_conflict · d51f527f
      Ira Weiny committed
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      d51f527f
  23. 21 May 2019, 1 commit
  24. 24 April 2019, 1 commit
    • locks: move checks from locks_free_lock() to locks_release_private() · 5926459e
      NeilBrown committed
      Code that allocates locks using locks_alloc_lock() will free them
      using locks_free_lock(), and will benefit from the BUG_ON()
      consistency checks therein.

      However, some code (nfsd and lockd) allocates a lock embedded in
      some other data structure, and so frees the lock itself after
      calling locks_release_private().  This path does not benefit from
      the consistency checks.

      To help catch future errors, move the BUG_ON() checks to
      locks_release_private() - which locks_free_lock() already calls.
      This ensures that all users of locks will find out if a lock
      isn't detached properly before being freed.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      5926459e
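      A sketch of where the checks end up, assuming the four consistency checks
      previously in locks_free_lock() are the ones being moved; the surrounding
      release logic is abbreviated:

      void locks_release_private(struct file_lock *fl)
      {
              /* Checks moved here so embedded locks (nfsd, lockd) get them too. */
              BUG_ON(waitqueue_active(&fl->fl_wait));
              BUG_ON(!list_empty(&fl->fl_blocked_requests));
              BUG_ON(!list_empty(&fl->fl_blocked_member));
              BUG_ON(!hlist_unhashed(&fl->fl_link));

              if (fl->fl_ops) {
                      if (fl->fl_ops->fl_release_private)
                              fl->fl_ops->fl_release_private(fl);
                      fl->fl_ops = NULL;
              }
              /* ... existing fl_lmops owner release follows ... */
      }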
  25. 09 April 2019, 1 commit
    • fs: mark expected switch fall-throughs · 0a4c9265
      Gustavo A. R. Silva committed
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing effort to enable
      -Wimplicit-fallthrough.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      0a4c9265
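      For context, a minimal (hypothetical) example of the annotation these
      warnings ask for: at warning level 3, GCC accepts a "fall through"
      comment immediately before the next case label as the marker that
      silences -Wimplicit-fallthrough:

      /* Sketch: how an intentional fall through is annotated for the compiler. */
      static int classify_example(int c)
      {
              int score = 0;

              switch (c) {
              case 'a':
                      score += 1;
                      /* fall through */      /* 'a' also gets the 'b' handling below */
              case 'b':
                      score += 10;
                      break;
              default:
                      score = -1;
              }
              return score;
      }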
  26. 25 March 2019, 1 commit
    • locks: wake any locks blocked on request before deadlock check · 945ab8f6
      Jeff Layton committed
      Andreas reported that he was seeing the tdbtorture test fail in some
      cases with -EDEADLCK when it wasn't before. Some debugging showed that
      deadlock detection was sometimes discovering the caller's lock request
      itself in a dependency chain.
      
      While we remove the request from the blocked_lock_hash prior to
      reattempting to acquire it, any locks that are blocked on that request
      will still be present in the hash and will still have their fl_blocker
      pointer set to the current request.
      
      This causes posix_locks_deadlock to find a deadlock dependency chain
      when it shouldn't, as a lock request cannot block itself.
      
      We are going to end up waking all of those blocked locks anyway when we
      go to reinsert the request back into the blocked_lock_hash, so just do
      it prior to checking for deadlocks. This ensures that any lock blocked
      on the current request will no longer be part of any blocked request
      chain.
      
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975
      Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
      Cc: stable@vger.kernel.org
      Reported-by: Andreas Schneider <asn@redhat.com>
      Signed-off-by: Neil Brown <neilb@suse.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      945ab8f6
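      A sketch of the ordering change described above, placed in the blocking
      path of posix_lock_inode(); the surrounding error handling is abbreviated
      and approximate:

      error = -EDEADLK;
      spin_lock(&blocked_lock_lock);
      /*
       * Wake anything currently blocked on this request first, so deadlock
       * detection can never find the request in its own dependency chain.
       */
      __locks_wake_up_blocks(request);
      if (likely(!posix_locks_deadlock(request, fl))) {
              error = FILE_LOCK_DEFERRED;
              __locks_insert_block(fl, request, posix_locks_conflict);
      }
      spin_unlock(&blocked_lock_lock);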
  27. 28 February 2019, 1 commit
  28. 03 January 2019, 1 commit
    • locks: fix error in locks_move_blocks() · bf77ae4c
      NeilBrown committed
      After moving all requests from
         fl->fl_blocked_requests
      to
         new->fl_blocked_requests
      
      it is nonsensical to do anything to the remaining elements; there
      aren't any.  The code should instead act on the requests that have been
      moved.  For simplicity, it acts on all requests in the target list.

      Setting "f->fl_blocker = new" for all members of new->fl_blocked_requests
      is "obviously correct", as it preserves the invariant of the linkage
      among requests.
      
      Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
      Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      bf77ae4c
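      A sketch of the corrected helper: after the splice, the waiters that need
      re-pointing live on new's list, not on fl's (which is now empty).
      Reconstructed from the description; treat details as approximate:

      void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
      {
              struct file_lock *f;

              if (list_empty(&fl->fl_blocked_requests))
                      return;                   /* nothing was blocked on fl */

              spin_lock(&blocked_lock_lock);
              list_splice_init(&fl->fl_blocked_requests, &new->fl_blocked_requests);
              /* The fix: walk new's list (the moved requests), not fl's. */
              list_for_each_entry(f, &new->fl_blocked_requests, fl_blocked_member)
                      f->fl_blocker = new;
              spin_unlock(&blocked_lock_lock);
      }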
  29. 17 December 2018, 1 commit
  30. 07 December 2018, 5 commits
    • fs/locks: remove unnecessary white space. · 7bbd1fc0
      NeilBrown committed
       - spaces before tabs,
       - spaces at the end of lines,
       - multiple blank lines,
       - blank lines before EXPORT_SYMBOL,
      can all go.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      7bbd1fc0
    • fs/locks: merge posix_unblock_lock() and locks_delete_block() · cb03f94f
      NeilBrown committed
      posix_unblock_lock() is not specific to posix locks, and behaves
      nearly identically to locks_delete_block() - the former returns a
      status while the latter doesn't.
      
      So discard posix_unblock_lock() and use locks_delete_block() instead,
      after giving that function an appropriate return value.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      cb03f94f
    • fs/locks: create a tree of dependent requests. · fd7732e0
      NeilBrown committed
      When we find an existing lock which conflicts with a request,
      and the request wants to wait, we currently add the request
      to a list.  When the lock is removed, the whole list is woken.
      This can cause the thundering-herd problem.
      To reduce the problem, we make use of the (new) fact that
      a pending request can itself have a list of blocked requests.
      When we find a conflict, we look through the existing blocked requests.
      If any one of them blocks the new request, the new request is attached
      below that request, otherwise it is added to the list of blocked
      requests, which are now known to be mutually non-conflicting.
      
      This way, when the lock is released, only a set of non-conflicting
      locks will be woken, the rest can stay asleep.
      If the lock request cannot be granted and the request needs to be
      requeued, all the other requests it blocks will then be woken.
      
      To make this more concrete:
      
        If you have a many-core machine, and have many threads all wanting to
        briefly lock a given file (udev is known to do this), you can get quite
        poor performance.
      
        When one thread releases a lock, it wakes up all other threads that
        are waiting (classic thundering-herd) - one will get the lock and the
        others go to sleep.
        When you have few cores, this is not very noticeable: by the time the
        4th or 5th thread gets enough CPU time to try to claim the lock, the
        earlier threads have claimed it, done what was needed, and released.
        So with few cores, many of the threads don't end up contending.
        With 50+ cores, lots of threads can get the CPU at the same time,
        and the contention can easily be measured.
      
        This patchset creates a tree of pending lock requests in which siblings
        don't conflict and each lock request does conflict with its parent.
        When a lock is released, only requests which don't conflict with each
        other are woken.
      
        Testing shows that lock-acquisitions-per-second is now fairly stable
        even as the number of contending processes goes to 1000.  Without this
        patch, locks-per-second drops off steeply after a few tens of
        processes.
      
        There is a small cost to this extra complexity.
        At 20 processes running a particular test on 72 cores, the lock
        acquisitions per second drops from 1.8 million to 1.4 million with
        this patch.  For 100 processes, this patch still provides 1.4 million
        while without this patch there are about 700,000.
      Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      fd7732e0
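      A sketch of the tree-building insertion the text describes: before queueing
      a waiter on a blocker, walk the blocker's existing waiters and, if one of
      them also conflicts with the new request, descend and queue below that
      waiter instead. Reconstructed from the description; treat details as
      approximate:

      static void __locks_insert_block(struct file_lock *blocker,
                                       struct file_lock *waiter,
                                       bool conflict(struct file_lock *,
                                                     struct file_lock *))
      {
              struct file_lock *fl;

              BUG_ON(!list_empty(&waiter->fl_blocked_member));

      new_blocker:
              /* Siblings on one fl_blocked_requests list never conflict with
               * each other, so a conflicting entry means we hang below it. */
              list_for_each_entry(fl, &blocker->fl_blocked_requests, fl_blocked_member)
                      if (conflict(fl, waiter)) {
                              blocker = fl;
                              goto new_blocker;
                      }
              waiter->fl_blocker = blocker;
              list_add_tail(&waiter->fl_blocked_member, &blocker->fl_blocked_requests);
              if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
                      locks_insert_global_blocked(waiter);
      }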
    • fs/locks: change all *_conflict() functions to return bool. · c0e15908
      NeilBrown committed
      posix_locks_conflict() and flock_locks_conflict() both return int.
      leases_conflict() returns bool.
      
      This inconsistency will cause problems for the next patch if not
      fixed.
      
      So change posix_locks_conflict() and flock_locks_conflict() to return
      bool.
      Also change the locks_conflict() helper.
      
      And convert some
         return (foo);
      to
         return foo;
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      c0e15908
    • fs/locks: always delete_block after waiting. · 16306a61
      NeilBrown committed
      Now that requests can block other requests, we
      need to be careful to always clean up those blocked
      requests.
      Any time that we wait for a request, we might have
      other requests attached, and when we stop waiting,
      we must clean them up.
      If the lock was granted, the requests might have been
      moved to the new lock, though when merged with a
      pre-existing lock, this might not happen.
      In all cases we don't want blocked locks to remain
      attached, so we remove them to be safe.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
      Tested-by: syzbot+a4a3d526b4157113ec6a@syzkaller.appspotmail.com
      Tested-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      16306a61
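      A sketch of the resulting wait pattern, assuming the posix variant is
      representative: the cleanup call moves out of the error branch so any
      requests still attached to fl are always detached once waiting stops:

      static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
      {
              int error;

              might_sleep();
              for (;;) {
                      error = posix_lock_inode(inode, fl, NULL);
                      if (error != FILE_LOCK_DEFERRED)
                              break;
                      error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
                      if (error)
                              break;
              }
              locks_delete_block(fl);   /* always detach anything still queued on fl */
              return error;
      }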
  31. 01 December 2018, 3 commits