1. 10 Aug, 2021 · 1 commit
  2. 03 Aug, 2021 · 4 commits
  3. 28 Jul, 2021 · 1 commit
  4. 30 Jun, 2021 · 1 commit
  5. 25 Jun, 2021 · 2 commits
  6. 24 Jun, 2021 · 1 commit
  7. 04 Jun, 2021 · 1 commit
  8. 01 Jun, 2021 · 6 commits
  9. 20 May, 2021 · 2 commits
  10. 06 May, 2021 · 2 commits
    • block: reexpand iov_iter after read/write · cf7b39a0
      yangerkun authored
      We get a bug:
      
      BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
      lib/iov_iter.c:1139
      Read of size 8 at addr ffff0000d3fb11f8 by task
      
      CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
      5.10.0-00843-g352c8610ccd2 #2
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
       io_read fs/io_uring.c:3421 [inline]
       io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       __kmalloc+0x23c/0x334 mm/slub.c:3970
       kmalloc include/linux/slab.h:557 [inline]
       __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
       io_setup_async_rw fs/io_uring.c:3229 [inline]
       io_read fs/io_uring.c:3436 [inline]
       io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       io_dismantle_req fs/io_uring.c:1855 [inline]
       __io_free_req+0x70/0x254 fs/io_uring.c:1867
       io_put_req_find_next fs/io_uring.c:2173 [inline]
       __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
       __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
       io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
       task_work_run+0xdc/0x128 kernel/task_work.c:151
       get_signal+0x6f8/0x980 kernel/signal.c:2562
       do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
       do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
       work_pending+0xc/0x180
      
      blkdev_read_iter can truncate the iov_iter's count, since count + pos
      may exceed the size of the blkdev. This misleads io_read into thinking
      the whole iovec has been consumed, and once io_read calls
      iov_iter_revert we trigger the slab-out-of-bounds. Fix it by
      re-expanding the count by the amount that was truncated.
      
      blkdev_write_iter can trigger the problem too.
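      
      A minimal sketch of the re-expand pattern described above (condensed
      from the shape of blkdev_read_iter in fs/block_dev.c; treat the exact
      function body as an illustration, not the verbatim fix):
      
       static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
       {
               struct inode *bd_inode = iocb->ki_filp->f_mapping->host;
               loff_t size = i_size_read(bd_inode);
               loff_t pos = iocb->ki_pos;
               size_t shorted = 0;
               ssize_t ret;
      
               if (pos >= size)
                       return 0;
      
               size -= pos;
               if (iov_iter_count(to) > size) {
                       /* Remember how much of the request we cut off... */
                       shorted = iov_iter_count(to) - size;
                       iov_iter_truncate(to, size);
               }
      
               ret = generic_file_read_iter(iocb, to);
               /* ...and give it back so a later iov_iter_revert stays in bounds. */
               iov_iter_reexpand(to, iov_iter_count(to) + shorted);
               return ret;
       }
      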
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Acked-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mm: introduce and use mapping_empty() · 7716506a
      Matthew Wilcox (Oracle) authored
      Patch series "Remove nrexceptional tracking", v2.
      
      We actually use nrexceptional for very little these days.  It's a minor
      pain to keep in sync with nrpages, but the pain becomes much bigger with
      the THP patches because we don't know how many indices a shadow entry
      occupies.  It's easier to just remove it than keep it accurate.
      
      Also, we save 8 bytes per inode which is nothing to sneeze at; on my
      laptop, it would improve shmem_inode_cache from 22 to 23 objects per
      16kB, and inode_cache from 26 to 27 objects.  Combined, that saves
      a megabyte of memory from a combined usage of 25MB for both caches.
      Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
      any memory for ext4.
      
      This patch (of 4):
      
      Instead of checking the two counters (nrpages and nrexceptional), we can
      just check whether i_pages is empty.
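      
      A sketch of the helper this introduces (assuming the xarray-backed
      i_pages; essentially a one-liner in include/linux/pagemap.h):
      
       /* True iff the mapping holds no pages and no shadow/DAX/swap entries. */
       static inline bool mapping_empty(struct address_space *mapping)
       {
               return xa_empty(&mapping->i_pages);
       }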
      
      Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Vishal Verma <vishal.l.verma@intel.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 12 Apr, 2021 · 2 commits
  12. 09 Apr, 2021 · 1 commit
  13. 02 Apr, 2021 · 1 commit
  14. 29 Mar, 2021 · 1 commit
  15. 23 Mar, 2021 · 1 commit
  16. 11 Mar, 2021 · 1 commit
  17. 06 Mar, 2021 · 1 commit
    • block: Try to handle busy underlying device on discard · 56887cff
      Jan Kara authored
      Commit 384d87ef ("block: Do not discard buffers under a mounted
      filesystem") made paths issuing discard or zeroout requests to the
      underlying device try to grab the block device in exclusive mode. If
      that failed, we returned EBUSY to userspace. This however caused
      unexpected fallout in userspace where e.g. FUSE filesystems issue
      discard requests from userspace daemons although the device is open
      exclusively by the kernel. Also shrinking of a logical volume by LVM
      issues discard requests to a device which may be claimed exclusively
      because there's another LV on the same PV. So to avoid these userspace
      regressions, fall back to invalidate_inode_pages2_range() instead of
      returning EBUSY to userspace, and return EBUSY only if that call fails
      as well (meaning that there's indeed someone using the particular
      device range we are trying to discard).
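      
      A rough sketch of the resulting fallback in truncate_bdev_range() (the
      claim-helper names and signatures are assumed from this era of
      fs/block_dev.c, so read this as the shape of the change rather than the
      verbatim diff):
      
       static int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
                                      loff_t lstart, loff_t lend)
       {
               /* If the caller doesn't already hold the device exclusively,
                * try to claim it while we drop the buffer cache. */
               if (!(mode & FMODE_EXCL)) {
                       int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
                       if (err)
                               goto invalidate;
               }
      
               truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
               if (!(mode & FMODE_EXCL))
                       bd_abort_claiming(bdev, truncate_bdev_range);
               return 0;
      
       invalidate:
               /* Someone else holds the device exclusively (e.g. the kernel
                * for a FUSE-backed device, or another LV on the same PV).
                * Invalidate the range instead of failing; 'lend' is
                * inclusive, so the shift rounding is safe. */
               return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
                                                    lstart >> PAGE_SHIFT,
                                                    lend >> PAGE_SHIFT);
       }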
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167
      Fixes: 384d87ef ("block: Do not discard buffers under a mounted filesystem")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 27 Feb, 2021 · 1 commit
  19. 25 Feb, 2021 · 1 commit
  20. 28 Jan, 2021 · 2 commits
    • block: use an on-stack bio in blkdev_issue_flush · c6bf3f0e
      Christoph Hellwig authored
      There is no point in allocating memory for a synchronous flush.
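      
      A sketch of the on-stack variant this describes (the exact
      blkdev_issue_flush prototype in this tree is an assumption; the point
      is bio_init on a stack bio plus submit_bio_wait, with no allocation):
      
       int blkdev_issue_flush(struct block_device *bdev)
       {
               struct bio bio;
      
               /* The bio lives on the stack for this synchronous wait. */
               bio_init(&bio, NULL, 0);
               bio_set_dev(&bio, bdev);
               bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
               return submit_bio_wait(&bio);
       }
      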
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Revert "block: simplify set_init_blocksize" to regain lost performance · 8dc932d3
      Maxim Mikityanskiy authored
      The cited commit introduced a serious regression in SATA write speed,
      as found by bisecting. This patch reverts it, restoring write speed to
      the values observed before the faulty commit.
      
      The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
      (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
      single HDD, the rest are different RAID levels built over the first
      partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
      
                      | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
      ----------------+--------+-------+-----------+-----------+--------
      9011495c        | R:256  | R:313 | R:276     | R:313     | R:323
      (before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
      ----------------+--------+-------+-----------+-----------+--------
      5ff9f192        | R:257  | R:398 | R:312     | R:344     | R:391
      (faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
      ----------------+--------+-------+-----------+-----------+--------
      5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
      unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
      ----------------+--------+-------+-----------+-----------+--------
      5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
      patched         | W:247  | W:274 | W:220     | W:225     | W:121
      
      Applying this patch doesn't hurt read performance, while it improves
      the write speed by 1.5x - 3.5x (with more impact on the RAID tests).
      The write speed is restored to the state before the faulty commit, and
      is even a bit higher in the RAID tests (which aren't HDD-bound on this
      device); that is likely related to other optimizations done between
      the faulty commit and 5.10.10 which also improved the read speed.
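      
      For context, a sketch of the logic this revert restores: grow the soft
      block size up to PAGE_SIZE as long as it still evenly divides the
      device size, instead of always using the logical block size (a
      reconstruction of the pre-5ff9f192 helper, not a quote from the tree):
      
       static void set_init_blocksize(struct block_device *bdev)
       {
               unsigned int bsize = bdev_logical_block_size(bdev);
               loff_t size = i_size_read(bdev->bd_inode);
      
               /* Double bsize while it still evenly divides the device size. */
               while (bsize < PAGE_SIZE) {
                       if (size & bsize)
                               break;
                       bsize <<= 1;
               }
               bdev->bd_inode->i_blkbits = blksize_bits(bsize);
       }
      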
      Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
      Fixes: 5ff9f192 ("block: simplify set_init_blocksize")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 27 Jan, 2021 · 2 commits
  22. 25 Jan, 2021 · 1 commit
  23. 08 Jan, 2021 · 2 commits
    • block: pre-initialize struct block_device in bdev_alloc_inode · 2d2f6f1b
      Christoph Hellwig authored
      bdev_evict_inode and bdev_free_inode are also called for the root inode
      of bdevfs, for which bdev_alloc is never called.  Move the zeroing of
      struct block_device and the initialization of the bd_bdi field into
      bdev_alloc_inode to make sure they are initialized for the root inode
      as well.
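      
      A sketch of where that initialization ends up (type and cache names
      follow fs/block_dev.c; simplified):
      
       static struct inode *bdev_alloc_inode(struct super_block *sb)
       {
               struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
      
               if (!ei)
                       return NULL;
               /* Zero and initialize here so the bdevfs root inode is covered
                * too, even though bdev_alloc() is never called for it. */
               memset(&ei->bdev, 0, sizeof(ei->bdev));
               ei->bdev.bd_bdi = &noop_backing_dev_info;
               return &ei->vfs_inode;
       }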
      
      Fixes: e6cb5382 ("block: initialize struct block_device in bdev_alloc")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb · 04a6a536
      Satya Tangirala authored
      freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
      whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
      bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
      bd_fsfreeze_sb.
      
      But this means a freeze_bdev() call followed by a thaw_bdev() call can
      leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
      zero. If freeze_bdev() is called again, and this time
      get_active_super() returns NULL (e.g. because the FS is unmounted),
      we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
      *untouched* - it stays the same (now garbage) value. A subsequent
      thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
      (since bd_fsfreeze_count > 0), and attempt to use it.
      
      Fix this by always setting bd_fsfreeze_sb to NULL when
      bd_fsfreeze_count is successfully decremented to 0 in thaw_bdev().
      Alternatively, we could set bd_fsfreeze_sb to whatever
      get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
      is successfully incremented to 1 from 0 (which can be achieved cleanly
      by moving the line currently setting bd_fsfreeze_sb to immediately
      after the "sync:" label, but it might be a little too subtle/easily
      overlooked in future).
      
      This fixes the currently panicking xfstests generic/085.
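      
      A sketch of the resulting thaw path with the fix applied (structure
      follows fs/block_dev.c of this era; helper details are simplified and
      should be read as assumptions):
      
       int thaw_bdev(struct block_device *bdev)
       {
               struct super_block *sb;
               int error = -EINVAL;
      
               mutex_lock(&bdev->bd_fsfreeze_mutex);
               if (!bdev->bd_fsfreeze_count)
                       goto out;
      
               error = 0;
               if (--bdev->bd_fsfreeze_count > 0)
                       goto out;
      
               sb = bdev->bd_fsfreeze_sb;
               if (!sb)
                       goto out;
      
               if (sb->s_op->thaw_super)
                       error = sb->s_op->thaw_super(sb);
               else
                       error = thaw_super(sb);
               if (error)
                       bdev->bd_fsfreeze_count++;
               else
                       /* The fix: don't leave a stale pointer once unfrozen. */
                       bdev->bd_fsfreeze_sb = NULL;
       out:
               mutex_unlock(&bdev->bd_fsfreeze_mutex);
               return error;
       }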
      
      Fixes: 040f04bd ("fs: simplify freeze_bdev/thaw_bdev")
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. 30 Dec, 2020 · 1 commit
  25. 22 Dec, 2020 · 1 commit