1. 20 September 2019, 1 commit
2. 26 July 2019, 5 commits
    • fix the struct mount leak in umount_tree() · 19a1c409
      Committed by Al Viro
      	We need to drop everything we remove from the tree, whether
      mnt_has_parent() is true or not.  Usually the bug manifests as a slow
      memory leak (leaked struct mount for initramfs); it becomes much more
      visible in mount_subtree() users, such as btrfs.  There we leak
      a struct mount for the btrfs superblock being mounted, which prevents
      fs shutdown on subsequent umount.
      
      Fixes: 56cbb429 ("switch the remnants of releasing the mountpoint away from fs_pin")
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
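
      A minimal userspace sketch of the bug class in the entry above, not
      the kernel code itself (struct node and both helpers are hypothetical
      names): the reference taken for every node removed from the tree must
      be dropped unconditionally, not only for nodes passing a parent check.

        #include <stdlib.h>

        struct node {
                int refcount;
                struct node *parent;
        };

        static void node_put(struct node *n)
        {
                if (--n->refcount == 0)
                        free(n);
        }

        /* Buggy shape: only parented nodes are released, so a parentless
         * node (cf. !mnt_has_parent()) keeps its removal reference and
         * its struct leaks. */
        static void teardown_buggy(struct node **removed, int count)
        {
                for (int i = 0; i < count; i++)
                        if (removed[i]->parent)
                                node_put(removed[i]);
        }

        /* Fixed shape: drop everything we removed, whether it has a
         * parent or not. */
        static void teardown_fixed(struct node **removed, int count)
        {
                for (int i = 0; i < count; i++)
                        node_put(removed[i]);
        }
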
    • btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range · a3b46b86
      Committed by Naohiro Aota
      btrfs_lock_and_flush_ordered_range() loads the given "*cached_state"
      into cachedp, which, in general, is NULL. Then, lock_extent_bits()
      updates "cachedp", but the update never propagates back to the caller.
      Thus the caller still sees its "cached_state" as NULL and never frees
      the state allocated under btrfs_lock_and_flush_ordered_range(). As a
      result, we see a massive state leak with e.g. fstests btrfs/005. Fix
      this bug by properly handling the pointers.
      
      Fixes: bd80d94e ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
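
      The bug above is the classic out-parameter pattern gone wrong. A
      small, runnable C sketch (all names hypothetical, not the btrfs
      functions): copying the caller's double pointer into a local and
      updating only the local means the caller never sees the allocation.

        #include <stdio.h>
        #include <stdlib.h>

        struct state { int refs; };

        /* Callee allocates a state, returned via a double pointer. */
        static void lock_bits(struct state **cached)
        {
                if (!*cached)
                        *cached = calloc(1, sizeof(**cached));
        }

        /* Buggy shape: lock_bits() updates the local cachedp only, so
         * the caller's pointer stays NULL and the allocation leaks. */
        static void flush_range_buggy(struct state **cached_state)
        {
                struct state *cachedp = *cached_state; /* usually NULL */
                lock_bits(&cachedp);
        }

        /* Fixed shape: hand the caller's pointer straight through. */
        static void flush_range_fixed(struct state **cached_state)
        {
                lock_bits(cached_state);
        }

        int main(void)
        {
                struct state *s = NULL;
                flush_range_fixed(&s);
                printf("caller sees %p\n", (void *)s); /* non-NULL */
                free(s);
                return 0;
        }
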
    • afs: fsclient: Mark expected switch fall-throughs · 29881608
      Committed by Gustavo A. R. Silva
      In preparation to enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      fs/afs/fsclient.c: In function ‘afs_deliver_fs_fetch_acl’:
      fs/afs/fsclient.c:2199:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/fsclient.c:2202:2: note: here
        case 1:
        ^~~~
      fs/afs/fsclient.c:2216:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/fsclient.c:2219:2: note: here
        case 2:
        ^~~~
      fs/afs/fsclient.c:2225:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/fsclient.c:2228:2: note: here
        case 3:
        ^~~~
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
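
      A compile-checkable sketch of the marking (hypothetical function, not
      the afs code): at -Wimplicit-fallthrough=3, GCC accepts comments such
      as "Fall through" as deliberate-fallthrough markers and silences the
      warning, which is the style applied by this patch.

        /* Build with: gcc -Wimplicit-fallthrough=3 -c unmarshall.c */
        int unmarshall_step(int unmarshall, int *acl_len)
        {
                switch (unmarshall) {
                case 0:
                        *acl_len = 0;
                        unmarshall++;
                        /* Fall through */
                case 1:
                        *acl_len += 4;
                        unmarshall++;
                        /* Fall through */
                case 2:
                        *acl_len += 8;
                        break;
                }
                return unmarshall;
        }
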
    • afs: yfsclient: Mark expected switch fall-throughs · 35a3a90c
      Committed by Gustavo A. R. Silva
      In preparation to enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      fs/afs/yfsclient.c: In function ‘yfs_deliver_fs_fetch_opaque_acl’:
      fs/afs/yfsclient.c:1984:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/yfsclient.c:1987:2: note: here
        case 1:
        ^~~~
      fs/afs/yfsclient.c:2005:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/yfsclient.c:2008:2: note: here
        case 2:
        ^~~~
      fs/afs/yfsclient.c:2014:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/yfsclient.c:2017:2: note: here
        case 3:
        ^~~~
      fs/afs/yfsclient.c:2035:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/yfsclient.c:2038:2: note: here
        case 4:
        ^~~~
      fs/afs/yfsclient.c:2047:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
         call->unmarshall++;
         ~~~~~~~~~~~~~~~~^~
      fs/afs/yfsclient.c:2050:2: note: here
        case 5:
        ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      Also, fix some commenting style issues.
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    • io_uring: ensure ->list is initialized for poll commands · 36703247
      Committed by Jens Axboe
      Daniel reports that when testing an http server that uses io_uring
      to poll for incoming connections, it sometimes hard crashes. This is
      due to an uninitialized list member in the io_uring request. Normally
      this doesn't trigger, and none of the test cases caught it.
      Reported-by: Daniel Kozak <kozzi11@gmail.com>
      Tested-by: Daniel Kozak <kozzi11@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
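
      A runnable userspace analogue of the crash (the list type and helpers
      below mirror the kernel's, re-declared here): deleting an entry whose
      link pointers were never initialized writes through stack garbage,
      while an initialized, self-linked entry makes the same deletion safe.

        #include <stdio.h>

        struct list_head { struct list_head *next, *prev; };

        static void init_list_head(struct list_head *h)
        {
                h->next = h;
                h->prev = h;
        }

        static void list_del(struct list_head *e)
        {
                /* Dereferences garbage if e was never initialized. */
                e->prev->next = e->next;
                e->next->prev = e->prev;
        }

        struct poll_req {
                struct list_head list; /* init before any list_del() */
        };

        int main(void)
        {
                struct poll_req req;       /* req.list is stack garbage  */
                init_list_head(&req.list); /* the fix: self-linked, so a */
                list_del(&req.list);       /* never-queued request is ok */
                puts("ok");
                return 0;
        }
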
3. 25 July 2019, 4 commits
    • btrfs: Fix deadlock caused by missing memory barrier · 6e7ca09b
      Committed by Nikolay Borisov
      Commit 06297d8c ("btrfs: switch extent_buffer blocking_writers from
      atomic to int") changed the type of blocking_writers but forgot to
      adjust the relevant code in btrfs_tree_unlock by converting the
      smp_mb__after_atomic to smp_mb.  This opened up the possibility of
      a deadlock due to reordering between setting blocking_writers and
      checking/waking up the waiter. This particular lockup is explained
      in a comment above the waitqueue_active() function.
      
      Fix it by converting the memory barrier to a full smp_mb, accounting
      for the fact that blocking_writers is a simple integer.
      
      Fixes: 06297d8c ("btrfs: switch extent_buffer blocking_writers from atomic to int")
      Tested-by: Johannes Thumshirn <jthumshirn@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
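
      A userspace C11 sketch of the waker side of the waitqueue_active()
      pattern above (names hypothetical): once blocking_writers is a plain
      int, the store is no longer an atomic RMW, so smp_mb__after_atomic()
      no longer provides the needed ordering; a full fence (the smp_mb()
      analogue below) must order the store against the waiter check.

        #include <stdatomic.h>
        #include <stdbool.h>

        int blocking_writers;      /* plain int since 06297d8c          */
        atomic_bool waiter_queued; /* stands in for waitqueue_active()  */

        void tree_unlock(void)
        {
                blocking_writers = 0; /* plain store, not an atomic RMW */

                /* smp_mb() analogue: without it, the store above and
                 * the load below may be reordered, and a waiter that
                 * queues itself in between is never woken: deadlock. */
                atomic_thread_fence(memory_order_seq_cst);

                if (atomic_load(&waiter_queued)) {
                        /* wake_up(...) would run here */
                }
        }
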
    • sched/fair: Don't free p->numa_faults with concurrent readers · 16d51a59
      Committed by Jann Horn
      When going through execve(), zero out the NUMA fault statistics instead of
      freeing them.
      
      During execve, the task is reachable through procfs and the scheduler. A
      concurrent /proc/*/sched reader can read data from a freed ->numa_faults
      allocation (confirmed by KASAN) and write it back to userspace.
      I believe that it would also be possible for a use-after-free read to occur
      through a race between a NUMA fault and execve(): task_numa_fault() can
      lead to task_numa_compare(), which invokes task_weight() on the currently
      running task of a different CPU.
      
      Another way to fix this would be to make ->numa_faults RCU-managed or add
      extra locking, but it seems easier to wipe the NUMA fault statistics on
      execve.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Fixes: 82727018 ("sched/numa: Call task_numa_free() from do_execve()")
      Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
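
      A sketch of the two approaches (hypothetical types; the kernel
      operates on task_struct and its numa_faults array): freeing leaves a
      window in which a lockless reader dereferences dead memory, while
      zeroing in place keeps the pointer valid and hands readers zeros.

        #include <stdlib.h>
        #include <string.h>

        struct task_stats {
                unsigned long *numa_faults; /* read locklessly by the
                                             * /proc/<pid>/sched path  */
                int nr_entries;
        };

        /* Buggy shape: a concurrent reader holding the old pointer reads
         * freed memory (the use-after-free that KASAN confirmed). */
        void on_exec_buggy(struct task_stats *t)
        {
                unsigned long *p = t->numa_faults;
                t->numa_faults = NULL;
                free(p);
        }

        /* Fixed shape: keep the allocation across execve and wipe it, so
         * concurrent readers only ever see valid (zeroed) statistics. */
        void on_exec_fixed(struct task_stats *t)
        {
                if (t->numa_faults)
                        memset(t->numa_faults, 0,
                               t->nr_entries * sizeof(*t->numa_faults));
        }
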
    • iomap: fix Invalid License ID · 0ce38c5f
      Committed by Masahiro Yamada
      Detected by:
      
        $ ./scripts/spdxcheck.py
        fs/iomap/Makefile: 1:27 Invalid License ID: GPL-2.0-or-newer
      
      Fixes: 1c230208 ("iomap: start moving code to fs/iomap/")
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
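
      For reference, "GPL-2.0-or-newer" is not in the SPDX license list;
      the valid identifier for the intended license is GPL-2.0-or-later,
      making the corrected first line of fs/iomap/Makefile (assuming that
      is the spelling the fix adopts):

        # SPDX-License-Identifier: GPL-2.0-or-later
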
    • access: avoid the RCU grace period for the temporary subjective credentials · d7852fbd
      Committed by Linus Torvalds
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves an RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of secondary damage: instead of a
      nice alloc/free pattern that hits in hot per-CPU slab caches, you have
      all those delayed frees, and on big machines with hundreds of cores,
      the RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU-freed at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
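
      A simplified, kernel-flavored sketch of the shape this takes
      (condensed, not the verbatim patch; release_cred is a hypothetical
      stand-in for the cred-release path): the flag shares storage with the
      RCU head via a union, and the final put picks synchronous or
      grace-period freeing based on it.

        struct cred {
                atomic_t usage;
                /* ... */
                union {
                        int non_rcu;         /* safe to skip RCU delay? */
                        struct rcu_head rcu; /* RCU deletion hook       */
                };
        };

        static void release_cred(struct cred *cred)
        {
                if (cred->non_rcu)
                        put_cred_rcu(&cred->rcu);  /* free right away */
                else
                        call_rcu(&cred->rcu,       /* free after a    */
                                 put_cred_rcu);    /* grace period    */
        }
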
4. 22 July 2019, 3 commits
    • io_uring: track io length in async_list based on bytes · 9310a7ba
      Committed by Zhengyuan Liu
      We are using PAGE_SIZE as the unit to determine whether the total len
      in async_list has exceeded max_pages, which is unfair to smaller io
      sizes. For example, if we are doing 1k-size io streams, we will never
      exceed max_pages since len >>= PAGE_SHIFT always yields zero. So use
      the original byte count to make the accounting accurate.
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
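
      A runnable illustration of the arithmetic (a PAGE_SHIFT of 12 is
      assumed): for 1k requests, len >> PAGE_SHIFT is 1024 >> 12 == 0, so
      a page-based accumulator never grows, while a byte-based one trips
      the same limit as intended.

        #include <stdio.h>

        #define PAGE_SHIFT 12

        int main(void)
        {
                unsigned long max_pages = 256, io_len = 1024;
                unsigned long pages = 0, bytes = 0;

                for (int i = 0; i < 4096; i++) {
                        pages += io_len >> PAGE_SHIFT; /* always adds 0 */
                        bytes += io_len;               /* adds 1024     */
                }
                printf("pages=%lu (limit %lu never hit)\n",
                       pages, max_pages);
                printf("bytes=%lu (limit %lu exceeded)\n",
                       bytes, max_pages << PAGE_SHIFT);
                return 0;
        }
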
    • io_uring: don't use iov_iter_advance() for fixed buffers · bd11b3a3
      Committed by Jens Axboe
      Hrvoje reports that when a large fixed buffer is registered and IO is
      being done to the latter pages of said buffer, the IO submission time
      is much worse:
      
      reading to the start of the buffer: 11238 ns
      reading to the end of the buffer:   1039879 ns
      
      In fact, it's worse by two orders of magnitude. The reason for that is
      how io_uring figures out how to setup the iov_iter. We point the iter
      at the first bvec, and then use iov_iter_advance() to fast-forward to
      the offset within that buffer we need.
      
      However, that is abysmally slow, as it entails iterating the bvecs
      that we set up as part of buffer registration. There's really no need
      to use this generic helper, as we know it's a BVEC type iterator, and
      we also know that each bvec is PAGE_SIZE in size, apart from possibly
      the first and last. Hence we can just use a shift on the offset to
      find the right index, and then adjust the iov_iter appropriately.
      After this fix, the timings are:
      
      reading to the start of the buffer: 10135 ns
      reading to the end of the buffer:   1377 ns
      
      Or about a 755x improvement for the tail page.
      Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
      Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
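
      A sketch of the O(1) fast-forward (hypothetical types; the kernel
      works on struct bio_vec and struct iov_iter, and the real patch also
      handles an offset landing inside the first, possibly short, bvec):
      since every middle segment covers exactly one page, the target index
      is a shift rather than a segment-by-segment walk.

        #include <stddef.h>

        #define PAGE_SHIFT 12
        #define PAGE_SIZE  (1UL << PAGE_SHIFT)

        struct seg  { void *page; size_t len; };
        struct iter {
                const struct seg *seg;
                size_t nr_segs;
                size_t seg_off;
        };

        static void iter_seek(struct iter *it, size_t offset)
        {
                size_t skip = offset >> PAGE_SHIFT; /* index, no loop */

                it->seg += skip;
                it->nr_segs -= skip;
                it->seg_off = offset & (PAGE_SIZE - 1); /* in-page part */
        }
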
    • block: properly handle IOCB_NOWAIT for async O_DIRECT IO · 6a43074e
      Committed by Jens Axboe
      A caller is supposed to pass in REQ_NOWAIT if we can't block for any
      given operation, but O_DIRECT for block devices just ignores this.
      Hence we'll block for various resource shortages on the block layer
      side, like having to wait for requests.
      
      Use the new REQ_NOWAIT_INLINE to ask for this error to be returned
      inline, so we can handle it appropriately and return -EAGAIN to the
      caller.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
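
      A heavily simplified sketch of the submission-side shape (condensed;
      dio_bio_submit is a hypothetical stand-in, and the BLK_QC_T_EAGAIN
      cookie is an assumption beyond the flags named in the commit
      message): propagate the caller's nowait request into the bio flags,
      and turn an inline shortage report into -EAGAIN for the caller.

        static int dio_bio_submit(struct kiocb *iocb, struct bio *bio)
        {
                blk_qc_t qc;

                if (iocb->ki_flags & IOCB_NOWAIT)
                        bio->bi_opf |= REQ_NOWAIT | REQ_NOWAIT_INLINE;

                qc = submit_bio(bio);
                if (qc == BLK_QC_T_EAGAIN) /* reported inline, not via
                                            * the completion handler   */
                        return -EAGAIN;

                return 0;
        }
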
5. 19 July 2019, 13 commits
6. 18 July 2019, 1 commit
7. 17 July 2019, 13 commits