1. 13 May, 2021 2 commits
    • H
      nvmet: use new ana_log_size instead the old one · e181811b
      Hou Pu committed
      The new ana_log_size should be used instead of the old one.
      Otherwise a kernel NULL pointer dereference occurs, as shown below:
      
      [   38.957849][   T69] BUG: kernel NULL pointer dereference, address: 000000000000003c
      [   38.975550][   T69] #PF: supervisor write access in kernel mode
      [   38.975955][   T69] #PF: error_code(0x0002) - not-present page
      [   38.976905][   T69] PGD 0 P4D 0
      [   38.979388][   T69] Oops: 0002 [#1] SMP NOPTI
      [   38.980488][   T69] CPU: 0 PID: 69 Comm: kworker/0:2 Not tainted 5.12.0+ #54
      [   38.981254][   T69] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [   38.982502][   T69] Workqueue: events nvme_loop_execute_work
      [   38.985219][   T69] RIP: 0010:memcpy_orig+0x68/0x10f
      [   38.986203][   T69] Code: 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 4c 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 <4c> 89 47 f8 4c 89 4f f0 4c 89 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2
      [   38.987677][   T69] RSP: 0018:ffffc900001b7d48 EFLAGS: 00000287
      [   38.987996][   T69] RAX: 0000000000000020 RBX: 0000000000000024 RCX: 0000000000000010
      [   38.988327][   T69] RDX: ffffffffffffffe4 RSI: ffff8881084bc004 RDI: 0000000000000044
      [   38.988620][   T69] RBP: 0000000000000024 R08: 0000000100000000 R09: 0000000000000000
      [   38.988991][   T69] R10: 0000000100000000 R11: 0000000000000001 R12: 0000000000000024
      [   38.989289][   T69] R13: ffff8881084bc000 R14: 0000000000000000 R15: 0000000000000024
      [   38.989845][   T69] FS:  0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
      [   38.990234][   T69] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   38.990490][   T69] CR2: 000000000000003c CR3: 00000001085b2000 CR4: 00000000000006f0
      [   38.991105][   T69] Call Trace:
      [   38.994157][   T69]  sg_copy_buffer+0xb8/0xf0
      [   38.995357][   T69]  nvmet_copy_to_sgl+0x48/0x6d
      [   38.995565][   T69]  nvmet_execute_get_log_page_ana+0xd4/0x1cb
      [   38.995792][   T69]  nvmet_execute_get_log_page+0xc9/0x146
      [   38.995992][   T69]  nvme_loop_execute_work+0x3e/0x44
      [   38.996181][   T69]  process_one_work+0x1c3/0x3c0
      [   38.996393][   T69]  worker_thread+0x44/0x3d0
      [   38.996600][   T69]  ? cancel_delayed_work+0x90/0x90
      [   38.996804][   T69]  kthread+0xf7/0x130
      [   38.996961][   T69]  ? kthread_create_worker_on_cpu+0x70/0x70
      [   38.997171][   T69]  ret_from_fork+0x22/0x30
      [   38.997705][   T69] Modules linked in:
      [   38.998741][   T69] CR2: 000000000000003c
      [   39.000104][   T69] ---[ end trace e719927b609d0fa0 ]---
      
      Fixes: 5e1f6899 ("nvme-multipath: fix double initialization of ANA state")
      Signed-off-by: Hou Pu <houpu.main@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      e181811b
    • D
      nvmet: seset ns->file when open fails · 85428bea
      Daniel Wagner committed
      Reset the ns->file value to NULL also in the error case in
      nvmet_file_ns_enable().
      
      The ns->file variable either points to a file object or contains the
      error code after the filp_open() call. This can lead to the following
      problem:
      
      When the user first sets up an invalid file backend and tries to enable
      the ns, enabling fails. If the user then switches over to a bdev backend
      and successfully enables the ns, the first received I/O will crash the
      system because the I/O backend is chosen based on the ns->file value:
      
      static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
      {
      	[...]
      
      	if (req->ns->file)
      		return nvmet_file_parse_io_cmd(req);
      
      	return nvmet_bdev_parse_io_cmd(req);
      }
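
The failure mode described above can be modelled in user space. The following is a minimal sketch, not the kernel code: `err_ptr()`, `is_err()`, `struct ns_model`, `model_open()` and the other `model_*` names are hypothetical stand-ins introduced only for illustration. It shows why an error value left in the file pointer would be mistaken for a valid file backend, and how resetting it to NULL on failure avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for error-encoded pointers (assumption: the top
 * 4095 address values are reserved for negative errno codes). */
#define MODEL_MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical, heavily reduced namespace: only the file backend handle. */
struct ns_model {
	void *file; /* NULL means "no file backend" */
};

/* Hypothetical open helper: fails with -ENOENT when no path is given. */
static void *model_open(const char *path)
{
	static int dummy_file;

	return path ? (void *)&dummy_file : err_ptr(-ENOENT);
}

/* Enable the file backend. On failure ns->file is reset to NULL so that
 * it never keeps holding an error-encoded pointer. */
static int model_file_ns_enable(struct ns_model *ns, const char *path)
{
	assert(ns != NULL);

	ns->file = model_open(path);
	if (is_err(ns->file)) {
		int err = (int)(intptr_t)ns->file;

		ns->file = NULL; /* the reset described in this commit */
		return err;
	}
	return 0;
}

/* Same decision as the dispatch above: non-NULL selects the file path. */
static bool model_uses_file_backend(const struct ns_model *ns)
{
	return ns->file != NULL;
}
```

Without the reset, `model_uses_file_backend()` would return true after a failed enable, because an error-encoded pointer is non-NULL.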
      Reported-by: Enzo Matsumiya <ematsumiya@suse.com>
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      85428bea
  2. 12 May, 2021 7 commits
    • C
      nvmet: demote fabrics cmd parse err msg to debug · 7a4ffd20
      Chaitanya Kulkarni committed
      Host can send invalid commands and flood the target with error messages.
      Demote the error message from pr_err() to pr_debug() in
      nvmet_parse_fabrics_cmd() and nvmet_parse_connect_cmd().
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7a4ffd20
    • C
      nvmet: use helper to remove the duplicate code · 4c2dab2b
      Chaitanya Kulkarni committed
      Use the helper nvmet_report_invalid_opcode() to report invalid opcode
      so we can remove the duplicate code.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4c2dab2b
    • C
      nvmet: demote discovery cmd parse err msg to debug · 3651aaac
      Chaitanya Kulkarni committed
      Host can send invalid commands and flood the target with error messages
      for the discovery controller. Demote the error message from pr_err() to
      pr_debug() in nvmet_parse_discovery_cmd().
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      3651aaac
    • M
      nvmet-rdma: Fix NULL deref when SEND is completed with error · 8cc365f9
      Michal Kalderon committed
      When running some traffic and taking down the link on peer, a
      retry counter exceeded error is received. This leads to
      nvmet_rdma_error_comp which tried accessing the cq_context to
      obtain the queue. The cq_context is no longer valid after the
      fix to use shared CQ mechanism and should be obtained similar
      to how it is obtained in other functions from the wc->qp.
      
      [ 905.786331] nvmet_rdma: SEND for CQE 0x00000000e3337f90 failed with status transport retry counter exceeded (12).
      [ 905.832048] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [ 905.839919] PGD 0 P4D 0
      [ 905.842464] Oops: 0000 1 SMP NOPTI
      [ 905.846144] CPU: 13 PID: 1557 Comm: kworker/13:1H Kdump: loaded Tainted: G OE --------- - - 4.18.0-304.el8.x86_64 #1
      [ 905.872135] RIP: 0010:nvmet_rdma_error_comp+0x5/0x1b [nvmet_rdma]
      [ 905.878259] Code: 19 4f c0 e8 89 b3 a5 f6 e9 5b e0 ff ff 0f b7 75 14 4c 89 ea 48 c7 c7 08 1a 4f c0 e8 71 b3 a5 f6 e9 4b e0 ff ff 0f 1f 44 00 00 <48> 8b 47 48 48 85 c0 74 08 48 89 c7 e9 98 bf 49 00 e9 c3 e3 ff ff
      [ 905.897135] RSP: 0018:ffffab601c45fe28 EFLAGS: 00010246
      [ 905.902387] RAX: 0000000000000065 RBX: ffff9e729ea2f800 RCX: 0000000000000000
      [ 905.909558] RDX: 0000000000000000 RSI: ffff9e72df9567c8 RDI: 0000000000000000
      [ 905.916731] RBP: ffff9e729ea2b400 R08: 000000000000074d R09: 0000000000000074
      [ 905.923903] R10: 0000000000000000 R11: ffffab601c45fcc0 R12: 0000000000000010
      [ 905.931074] R13: 0000000000000000 R14: 0000000000000010 R15: ffff9e729ea2f400
      [ 905.938247] FS: 0000000000000000(0000) GS:ffff9e72df940000(0000) knlGS:0000000000000000
      [ 905.938249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 905.950067] nvmet_rdma: SEND for CQE 0x00000000c7356cca failed with status transport retry counter exceeded (12).
      [ 905.961855] CR2: 0000000000000048 CR3: 000000678d010004 CR4: 00000000007706e0
      [ 905.961855] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 905.961856] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 905.961857] PKRU: 55555554
      [ 906.010315] Call Trace:
      [ 906.012778] __ib_process_cq+0x89/0x170 [ib_core]
      [ 906.017509] ib_cq_poll_work+0x26/0x80 [ib_core]
      [ 906.022152] process_one_work+0x1a7/0x360
      [ 906.026182] ? create_worker+0x1a0/0x1a0
      [ 906.030123] worker_thread+0x30/0x390
      [ 906.033802] ? create_worker+0x1a0/0x1a0
      [ 906.037744] kthread+0x116/0x130
      [ 906.040988] ? kthread_flush_work_fn+0x10/0x10
      [ 906.045456] ret_from_fork+0x1f/0x40
      
      Fixes: ca0f1a80 ("nvmet-rdma: use new shared CQ mechanism")
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8cc365f9
    • C
      nvmet: fix inline bio check for passthru · ab96de5d
      Chaitanya Kulkarni committed
      When handling passthru commands, for inline bio allocation we only
      consider the transfer size. This works well when req->sg_cnt fits into
      the req->inline_bvec, but it will result in the early return from
      bio_add_hw_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC.
      
      Consider an I/O of size 32768 whose first buffer is not aligned to the
      page boundary; the I/O is then split in the following manner:
      
      [ 2206.256140] nvmet: sg->length 3440 sg->offset 656
      [ 2206.256144] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256148] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256152] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256155] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256159] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256163] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256166] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256170] nvmet: sg->length 656 sg->offset 0
      
      Now the req->transfer_size == NVMET_MAX_INLINE_DATA_LEN i.e. 32768, but
      the req->sg_cnt is (9) > NVMET_MAX_INLINE_BIOVEC which is (8).
      This will result in early return in the following code path :-
      
      nvmet_bdev_execute_rw()
      	bio_add_pc_page()
      		bio_add_hw_page()
      			if (bio_full(bio, len))
      				return 0;
      
      Use previously introduced helper nvmet_use_inline_bvec() to consider
      req->sg_cnt when using inline bio. This only affects nvme-loop
      transport.
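
The arithmetic behind the nine-entry split can be checked with a small user-space model. This is a sketch under stated assumptions, not the kernel helper itself: the `MODEL_*` constants and `model_*` functions are hypothetical names, the 4096-byte page size is assumed from the log above, and the limit of 8 inline vectors is the value quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: 4096-byte pages and 8 inline vectors, as quoted above. */
#define MODEL_PAGE_SIZE       4096u
#define MODEL_MAX_INLINE_BVEC 8u
#define MODEL_MAX_INLINE_LEN  (MODEL_MAX_INLINE_BVEC * MODEL_PAGE_SIZE)

/* Number of page-bounded segments needed to cover len bytes that start
 * at the given offset inside the first page. */
static size_t model_sg_count(size_t offset, size_t len)
{
	assert(offset < MODEL_PAGE_SIZE);

	if (len == 0)
		return 0;
	return (offset + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* The insufficient check: only the transfer size is considered. */
static bool model_inline_by_len_only(size_t len)
{
	return len <= MODEL_MAX_INLINE_LEN;
}

/* The stricter check: both the size and the segment count must fit. */
static bool model_use_inline_bvec(size_t len, size_t sg_cnt)
{
	return len <= MODEL_MAX_INLINE_LEN &&
	       sg_cnt <= MODEL_MAX_INLINE_BVEC;
}
```

For a 32768-byte transfer starting at offset 656, `model_sg_count()` yields 9 segments, so the size-only check accepts the request while the combined check rejects it. A page-aligned buffer of the same size needs only 8 segments and passes both.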
      
      Fixes: dab3902b ("nvmet: use inline bio for passthru fast path")
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ab96de5d
    • C
      nvmet: fix inline bio check for bdev-ns · 608a9690
      Chaitanya Kulkarni committed
      When handling rw commands, for inline bio case we only consider
      transfer size. This works well when req->sg_cnt fits into the
      req->inline_bvec, but it will result in the warning in
      __bio_add_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC.
      
      Consider an I/O of size 32768 whose first page is not aligned to the
      page boundary; the I/O is then split in the following manner:
      
      [ 2206.256140] nvmet: sg->length 3440 sg->offset 656
      [ 2206.256144] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256148] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256152] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256155] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256159] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256163] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256166] nvmet: sg->length 4096 sg->offset 0
      [ 2206.256170] nvmet: sg->length 656 sg->offset 0
      
      Now the req->transfer_size == NVMET_MAX_INLINE_DATA_LEN i.e. 32768, but
      the req->sg_cnt is (9) > NVMET_MAX_INLINE_BIOVEC which is (8).
      This will result in the following warning message :-
      
      nvmet_bdev_execute_rw()
      	bio_add_page()
      		__bio_add_page()
      			WARN_ON_ONCE(bio_full(bio, len));
      
      This scenario is very hard to reproduce: it occurs only on the nvme-loop
      transport, with rw commands issued through the passthru IOCTL interface
      from a host application whose data buffer is allocated with malloc()
      rather than posix_memalign().
      
      Fixes: 73383adf ("nvmet: don't split large I/Os unconditionally")
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      608a9690
    • C
      nvme-multipath: fix double initialization of ANA state · 5e1f6899
      Christoph Hellwig committed
      nvme_init_identify and thus nvme_mpath_init can be called multiple
      times and thus must not overwrite potentially initialized or in-use
      fields.  Split out a helper for the basic initialization when the
      controller is initialized and make sure the init_identify path does
      not blindly change in-use data structures.
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Reported-by: Martin Wilck <mwilck@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      5e1f6899
  3. 11 May, 2021 1 commit
    • O
      kyber: fix out of bounds access when preempted · efed9a33
      Omar Sandoval committed
      __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
      passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
      for the current CPU again and uses that to get the corresponding Kyber
      context in the passed hctx. However, the thread may be preempted between
      the two calls to blk_mq_get_ctx(), and the ctx returned the second time
      may no longer correspond to the passed hctx. This "works" accidentally
      most of the time, but it can cause us to read garbage if the second ctx
      came from an hctx with more ctx's than the first one (i.e., if
      ctx->index_hw[hctx->type] > hctx->nr_ctx).
      
      This manifested as this UBSAN array index out of bounds error reported
      by Jakub:
      
      UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
      index 13106 is out of range for type 'long unsigned int [128]'
      Call Trace:
       dump_stack+0xa4/0xe5
       ubsan_epilogue+0x5/0x40
       __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
       queued_spin_lock_slowpath+0x476/0x480
       do_raw_spin_lock+0x1c2/0x1d0
       kyber_bio_merge+0x112/0x180
       blk_mq_submit_bio+0x1f5/0x1100
       submit_bio_noacct+0x7b0/0x870
       submit_bio+0xc2/0x3a0
       btrfs_map_bio+0x4f0/0x9d0
       btrfs_submit_data_bio+0x24e/0x310
       submit_one_bio+0x7f/0xb0
       submit_extent_page+0xc4/0x440
       __extent_writepage_io+0x2b8/0x5e0
       __extent_writepage+0x28d/0x6e0
       extent_write_cache_pages+0x4d7/0x7a0
       extent_writepages+0xa2/0x110
       do_writepages+0x8f/0x180
       __writeback_single_inode+0x99/0x7f0
       writeback_sb_inodes+0x34e/0x790
       __writeback_inodes_wb+0x9e/0x120
       wb_writeback+0x4d2/0x660
       wb_workfn+0x64d/0xa10
       process_one_work+0x53a/0xa80
       worker_thread+0x69/0x5b0
       kthread+0x20b/0x240
       ret_from_fork+0x1f/0x30
      
      Only Kyber uses the hctx, so fix it by passing the request_queue to
      ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can
      map the queues itself to avoid the mismatch.
      
      Fixes: a6088845 ("block: kyber: make kyber more friendly with merging")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      efed9a33
  4. 10 May, 2021 1 commit
  5. 09 May, 2021 1 commit
  6. 06 May, 2021 1 commit
    • Y
      block: reexpand iov_iter after read/write · cf7b39a0
      yangerkun committed
      We get a bug:
      
      BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
      lib/iov_iter.c:1139
      Read of size 8 at addr ffff0000d3fb11f8 by task
      
      CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
      5.10.0-00843-g352c8610ccd2 #2
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
       io_read fs/io_uring.c:3421 [inline]
       io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       __kmalloc+0x23c/0x334 mm/slub.c:3970
       kmalloc include/linux/slab.h:557 [inline]
       __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
       io_setup_async_rw fs/io_uring.c:3229 [inline]
       io_read fs/io_uring.c:3436 [inline]
       io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       io_dismantle_req fs/io_uring.c:1855 [inline]
       __io_free_req+0x70/0x254 fs/io_uring.c:1867
       io_put_req_find_next fs/io_uring.c:2173 [inline]
       __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
       __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
       io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
       task_work_run+0xdc/0x128 kernel/task_work.c:151
       get_signal+0x6f8/0x980 kernel/signal.c:2562
       do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
       do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
       work_pending+0xc/0x180
      
      blkdev_read_iter can truncate the iov_iter's count, since count + pos may
      exceed the size of the blkdev. This misleads io_read into believing that
      we have consumed the iovec, and once io_read calls iov_iter_revert we
      trigger the slab-out-of-bounds access. Fix it by re-expanding the count
      by the size that was truncated.
      
      blkdev_write_iter can trigger the problem too.
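
The truncate-then-re-expand idea can be illustrated with a user-space model. This is a sketch only, under the assumption that the caller expects the iterator's count to drop by exactly the number of bytes transferred. `struct iter_model`, `model_do_io()` and `model_blkdev_read()` are hypothetical names and do not reproduce the kernel's iov_iter API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator: only the number of bytes still to transfer. */
struct iter_model {
	size_t count;
};

/* Hypothetical I/O step: consumes up to `want` bytes from the iterator
 * and returns how many bytes were transferred. */
static size_t model_do_io(struct iter_model *it, size_t want)
{
	size_t done = want < it->count ? want : it->count;

	it->count -= done;
	return done;
}

/* Read clamped to the device size. The count is truncated for the I/O
 * and then re-expanded by the truncated amount, so the caller sees
 * count == original count - bytes transferred. */
static size_t model_blkdev_read(struct iter_model *it, size_t pos,
				size_t dev_size, size_t io_bytes)
{
	size_t shorted = 0;
	size_t room;
	size_t done;

	assert(it != NULL);
	if (pos >= dev_size)
		return 0;

	room = dev_size - pos;
	if (it->count > room) {
		shorted = it->count - room; /* bytes hidden from the I/O */
		it->count = room;           /* truncate */
	}

	done = model_do_io(it, io_bytes);

	it->count += shorted;               /* re-expand */
	return done;
}
```

With a request of 8192 bytes at position 4096 on an 8192-byte device, a 1024-byte transfer leaves the count at 7168. Without the re-expansion step it would be 3072, and a caller reverting by the difference it expects would step back further than the bytes actually consumed.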
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Acked-by: Pavel Begunkov <asml.silencec@gmail.com>
      Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cf7b39a0
  7. 05 May, 2021 1 commit
    • J
      Merge tag 'nvme-5.13-2021-05-05' of git://git.infradead.org/nvme into block-5.13 · 9c38475c
      Jens Axboe committed
      Pull NVMe fixes from Christoph:
      
      "nvme updates for Linux 5.13
      
       - reset the bdev to ns head when failover (Daniel Wagner)
       - remove unsupported command noise (Keith Busch)
       - misc passthrough improvements (Kanchan Joshi)
       - fix controller ioctl through ns_head (Minwoo Im)
       - fix controller timeouts during reset (Tao Chiu)"
      
      * tag 'nvme-5.13-2021-05-05' of git://git.infradead.org/nvme:
        nvmet: remove unsupported command noise
        nvme-multipath: reset bdev to ns head when failover
        nvme-pci: fix controller reset hang when racing with nvme_timeout
        nvme: move the fabrics queue ready check routines to core
        nvme: avoid memset for passthrough requests
        nvme: add nvme_get_ns helper
        nvme: fix controller ioctl through ns_head
      9c38475c
  8. 04 May, 2021 14 commits
  9. 30 Apr, 2021 8 commits
    • L
      Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 635de956
      Linus Torvalds committed
      Pull x86 tlb updates from Ingo Molnar:
       "The x86 MM changes in this cycle were:
      
         - Implement concurrent TLB flushes, which overlaps the local TLB
           flush with the remote TLB flush.
      
           In testing this improved sysbench performance measurably by a
           couple of percentage points, especially if TLB-heavy security
           mitigations are active.
      
         - Further micro-optimizations to improve the performance of TLB
           flushes"
      
      * tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp: Micro-optimize smp_call_function_many_cond()
        smp: Inline on_each_cpu_cond() and on_each_cpu()
        x86/mm/tlb: Remove unnecessary uses of the inline keyword
        cpumask: Mark functions as pure
        x86/mm/tlb: Do not make is_lazy dirty for no reason
        x86/mm/tlb: Privatize cpu_tlbstate
        x86/mm/tlb: Flush remote and local TLBs concurrently
        x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
        x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
        smp: Run functions concurrently in smp_call_function_many_cond()
      635de956
    • L
      Merge tag 'microblaze-v5.13' of git://git.monstr.eu/linux-2.6-microblaze · d0cc7eca
      Linus Torvalds committed
      Pull Microblaze updates from Michal Simek:
       "No new features, just about cleaning up some code and moving to
        generic syscall solution used by other architectures:
      
         - Switch to generic syscall scripts
      
         - Some small fixes"
      
      * tag 'microblaze-v5.13' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: add 'fallthrough' to memcpy/memset/memmove
        microblaze: Fix a typo
        microblaze: tag highmem_setup() with __meminit
        microblaze: syscalls: switch to generic syscallhdr.sh
        microblaze: syscalls: switch to generic syscalltbl.sh
      d0cc7eca
    • L
      Merge tag 'mips_5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 77d51337
      Linus Torvalds committed
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - removed get_fs/set_fs
      
       - removed broken/unmaintained MIPS KVM trap and emulate support
      
       - added support for Loongson-2K1000
      
       - fixes and cleanups
      
      * tag 'mips_5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (107 commits)
        MIPS: BCM63XX: Use BUG_ON instead of condition followed by BUG.
        MIPS: select ARCH_KEEP_MEMBLOCK unconditionally
        mips: Do not include hi and lo in clobber list for R6
        MIPS:DTS:Correct the license for Loongson-2K
        MIPS:DTS:Fix label name and interrupt number of ohci for Loongson-2K
        MIPS: Avoid handcoded DIVU in `__div64_32' altogether
        lib/math/test_div64: Correct the spelling of "dividend"
        lib/math/test_div64: Fix error message formatting
        mips/bootinfo:correct some comments of fw_arg
        MIPS: Avoid DIVU in `__div64_32' is result would be zero
        MIPS: Reinstate platform `__div64_32' handler
        div64: Correct inline documentation for `do_div'
        lib/math: Add a `do_div' test module
        MIPS: Makefile: Replace -pg with CC_FLAGS_FTRACE
        MIPS: pci-legacy: revert "use generic pci_enable_resources"
        MIPS: Loongson64: Add kexec/kdump support
        MIPS: pci-legacy: use generic pci_enable_resources
        MIPS: pci-legacy: remove busn_resource field
        MIPS: pci-legacy: remove redundant info messages
        MIPS: pci-legacy: stop using of_pci_range_to_resource
        ...
      77d51337
    • L
      Merge tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 3644286f
      Linus Torvalds committed
      Pull fsnotify updates from Jan Kara:
      
       - support for limited fanotify functionality for unprivileged users
      
       - faster merging of fanotify events
      
       - a few smaller fsnotify improvements
      
      * tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        shmem: allow reporting fanotify events with file handles on tmpfs
        fs: introduce a wrapper uuid_to_fsid()
        fanotify_user: use upper_32_bits() to verify mask
        fanotify: support limited functionality for unprivileged users
        fanotify: configurable limits via sysfs
        fanotify: limit number of event merge attempts
        fsnotify: use hash table for faster events merge
        fanotify: mix event info and pid into merge key hash
        fanotify: reduce event objectid to 29-bit hash
        fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
      3644286f
    • L
      Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 767fcbc8
      Linus Torvalds committed
      Pull quota, ext2, reiserfs updates from Jan Kara:
      
       - support for path (instead of device) based quotactl syscall
         (quotactl_path(2))
      
       - ext2 conversion to kmap_local()
      
       - other minor cleanups & fixes
      
      * tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fs/reiserfs/journal.c: delete useless variables
        fs/ext2: Replace kmap() with kmap_local_page()
        ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
        fs/ext2/: fix misspellings using codespell tool
        quota: report warning limits for realtime space quotas
        quota: wire up quotactl_path
        quota: Add mountpath based quota support
      767fcbc8
    • L
      Merge tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · d2b6f8a1
      Linus Torvalds committed
      Pull xfs updates from Darrick Wong:
       "The notable user-visible addition this cycle is the ability to remove
        space from the last AG in a filesystem. This is the first of many
        changes needed for full-fledged support for shrinking a filesystem.
        Still needed are (a) the ability to reorganize files and metadata away
        from the end of the fs; (b) the ability to remove entire allocation
        groups; (c) shrink support for realtime volumes; and (d) thorough
        testing of (a-c).
      
        There are a number of performance improvements in this code drop: Dave
        streamlined various parts of the buffer logging code and reduced the
        cost of various debugging checks, and added the ability to pre-create
        the xattr structures while creating files. Brian eliminated
        transaction reservations that were being held across writeback (thus
        reducing livelock potential).
      
        Other random pieces: Pavel fixed the repetitive warnings about
        deprecated mount options, I fixed online fsck to behave itself when a
        readonly remount comes in during scrub, and refactored various other
        parts of that code, Christoph contributed a lot of refactoring this
        cycle. The xfs_icdinode structure has been absorbed into the (incore)
        xfs_inode structure, and the format and flags handling around
        xfs_inode_fork structures has been simplified. Chandan provided a
        number of fixes for extent count overflow related problems that have
        been shaken out by debugging knobs added during 5.12.
      
        Summary:
      
         - Various minor fixes in online scrub.
      
         - Prevent metadata files from being automatically inactivated.
      
         - Validate btree heights by the computed per-btree limits.
      
         - Don't warn about remounting with deprecated mount options.
      
         - Initialize attr forks at create time if we suspect we're going to
           need to store them.
      
         - Reduce memory reallocation workouts in the logging code.
      
         - Fix some theoretical math calculation errors in logged buffers that
           span multiple discontig memory ranges but contiguous ondisk
           regions.
      
         - Speedups in dirty buffer bitmap handling.
      
         - Make type verifier functions more inline-happy to reduce overhead.
      
         - Reduce debug overhead in directory checking code.
      
         - Many many typo fixes.
      
         - Begin to handle the permanent loss of the very end of a filesystem.
      
         - Fold struct xfs_icdinode into xfs_inode.
      
         - Deprecate the long defunct BMV_IF_NO_DMAPI_READ from the bmapx
           ioctl.
      
         - Remove a broken directory block format check from online scrub.
      
         - Fix a bug where we could produce an unnecessarily tall data fork
           btree when creating an attr fork.
      
         - Fix scrub and readonly remounts racing.
      
         - Fix a writeback ioend log deadlock problem by dropping the behavior
           where we could preallocate a setfilesize transaction.
      
         - Fix some bugs in the new extent count checking code.
      
         - Fix some bugs in the attr fork preallocation code.
      
         - Refactor if_flags out of the incore inode fork data structure"
      
      * tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (77 commits)
        xfs: remove xfs_quiesce_attr declaration
        xfs: remove XFS_IFEXTENTS
        xfs: remove XFS_IFINLINE
        xfs: remove XFS_IFBROOT
        xfs: only look at the fork format in xfs_idestroy_fork
        xfs: simplify xfs_attr_remove_args
        xfs: rename and simplify xfs_bmap_one_block
        xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
        xfs: drop unnecessary setfilesize helper
        xfs: drop unused ioend private merge and setfilesize code
        xfs: open code ioend needs workqueue helper
        xfs: drop submit side trans alloc for append ioends
        xfs: fix return of uninitialized value in variable error
        xfs: get rid of the ip parameter to xchk_setup_*
        xfs: fix scrub and remount-ro protection when running scrub
        xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks
        xfs: move the xfs_can_free_eofblocks call under the IOLOCK
        xfs: precalculate default inode attribute offset
        xfs: default attr fork size does not handle device inodes
        xfs: inode fork allocation depends on XFS_IFEXTENT flag
        ...
      d2b6f8a1
    • L
      Merge tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · f2c80837
      Linus Torvalds committed
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Fix some compiler and kernel-doc warnings
      
       - Various minor cleanups and optimizations
      
       - Add a new sysfs gfs2 status file with some filesystem wide
         information
      
      * tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Fix fall-through warnings for Clang
        gfs2: Fix a number of kernel-doc warnings
        gfs2: Make gfs2_setattr_simple static
        gfs2: Add new sysfs file for gfs2 status
        gfs2: Silence possible null pointer dereference warning
        gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer
        gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent
        gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent
        gfs2: Add new gfs2_iomap_get helper
        gfs2: Remove unused variable sb_format
        gfs2: Fix dir.c function parameter descriptions
        gfs2: Eliminate gh parameter from go_xmote_bh func
        gfs2: don't create empty buffers for NO_CREATE
    • L
      Merge tag 'exfat-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat · 8ae8932c
      Linus Torvalds committed
      Pull exfat updates from Namjae Jeon:
      
       - Improve write performance with dirsync mount option
      
       - Improve lookup performance
      
       - Add support for FITRIM ioctl
      
       - Fix a bug with discard option
      
      * tag 'exfat-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
        exfat: speed up iterate/lookup by fixing start point of traversing cluster chain
        exfat: improve write performance when dirsync enabled
        exfat: add support ioctl and FITRIM function
        exfat: introduce bitmap_lock for cluster bitmap access
        exfat: fix erroneous discard when clear cluster bit
  10. 29 Apr, 2021 4 commits
    • L
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · d72cd4ad
      Linus Torvalds committed
      Pull SCSI updates from James Bottomley:
       "This consists of the usual driver updates (ufs, target, tcmu,
        smartpqi, lpfc, zfcp, qla2xxx, mpt3sas, pm80xx).
      
        The major core change is using a sbitmap instead of an atomic for
        queue tracking"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (412 commits)
        scsi: target: tcm_fc: Fix a kernel-doc header
        scsi: target: Shorten ALUA error messages
        scsi: target: Fix two format specifiers
        scsi: target: Compare explicitly with SAM_STAT_GOOD
        scsi: sd: Introduce a new local variable in sd_check_events()
        scsi: dc395x: Open-code status_byte(u8) calls
        scsi: 53c700: Open-code status_byte(u8) calls
        scsi: smartpqi: Remove unused functions
        scsi: qla4xxx: Remove an unused function
        scsi: myrs: Remove unused functions
        scsi: myrb: Remove unused functions
        scsi: mpt3sas: Fix two kernel-doc headers
        scsi: fcoe: Suppress a compiler warning
        scsi: libfc: Fix a format specifier
        scsi: aacraid: Remove an unused function
        scsi: core: Introduce enum scsi_disposition
        scsi: core: Modify the scsi_send_eh_cmnd() return value for the SDEV_BLOCK case
        scsi: core: Rename scsi_softirq_done() into scsi_complete()
        scsi: core: Remove an incorrect comment
        scsi: core: Make the scsi_alloc_sgtables() documentation more accurate
        ...
    • L
      Merge tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio · 238da4d0
      Linus Torvalds committed
      Pull VFIO updates from Alex Williamson:
      
       - Embed struct vfio_device into vfio driver structures (Jason
         Gunthorpe)
      
       - Make vfio_mdev type safe (Jason Gunthorpe)
      
       - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig)
      
       - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao)
      
       - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar
         Chowdhury)
      
       - Simplify unpin_pages error handling (Shenming Lu)
      
       - Fix i915 mdev Kconfig dependency (Arnd Bergmann)
      
       - Remove unused structure member (Keqian Zhu)
      
      * tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio: (43 commits)
        vfio/gvt: fix DRM_I915_GVT dependency on VFIO_MDEV
        vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu
        vfio/mdev: Correct the function signatures for the mdev_type_attributes
        vfio/mdev: Remove kobj from mdev_parent_ops->create()
        vfio/gvt: Use mdev_get_type_group_id()
        vfio/gvt: Make DRM_I915_GVT depend on VFIO_MDEV
        vfio/mbochs: Use mdev_get_type_group_id()
        vfio/mdpy: Use mdev_get_type_group_id()
        vfio/mtty: Use mdev_get_type_group_id()
        vfio/mdev: Add mdev/mtype_get_type_group_id()
        vfio/mdev: Remove duplicate storage of parent in mdev_device
        vfio/mdev: Add missing error handling to dev_set_name()
        vfio/mdev: Reorganize mdev_device_create()
        vfio/mdev: Add missing reference counting to mdev_type
        vfio/mdev: Expose mdev_get/put_parent to mdev_private.h
        vfio/mdev: Use struct mdev_type in struct mdev_device
        vfio/mdev: Simplify driver registration
        vfio/mdev: Add missing typesafety around mdev_device
        vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer
        vfio/mdev: Fix missing static's on MDEV_TYPE_ATTR's
        ...
    • L
      Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 35655ceb
      Linus Torvalds committed
      Pull clk updates from Stephen Boyd:
       "Here's a collection of largely clk driver updates. The usual suspects
        are here: i.MX, Qualcomm, Renesas, Allwinner, Samsung, and Rockchip,
        but it feels pretty light on commits.
      
        There's only one real commit to the framework core and that's to
        consolidate code. Otherwise the diffstat is dominated by many Qualcomm
        clk driver patches that modernize the driver for the proper way of
        specifying clk parents. That's shifting data around, which could subtly
        break things so I'll be on the lookout for fixes.
      
        New Drivers:
         - Proper clk driver for Mediatek MT7621 SoCs
         - Support for the clock controller on the new Rockchip rk3568
      
        Updates:
         - Simplify Zynq Kconfig dependencies
         - Use clk_hw pointers in socfpga driver
         - Cleanup parent data in qcom clk drivers
         - Some cleanups for rk3399 modularization
         - Fix reparenting of i.MX UART clocks by initializing only the ones
           associated to stdout
         - Correct the PCIE clocks for i.MX8MP and i.MX8MQ
         - Make i.MX LPCG and SCU clocks return on registering failure
         - Kernel doc fixes
         - Add DAB hardware accelerator clocks on Renesas R-Car E3 and M3-N
         - Add timer (TMU) clocks on Renesas R-Car H3 ES1.0
         - Add Timer (TMU & CMT) and thermal sensor (TSC) clocks on
           Renesas R-Car V3U
         - Sigma-delta modulation on Allwinner V3s audio PLL"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (82 commits)
        MAINTAINERS: add MT7621 CLOCK maintainer
        staging: mt7621-dts: use valid vendor 'mediatek' instead of invalid 'mtk'
        staging: mt7621-dts: make use of new 'mt7621-clk'
        clk: ralink: add clock driver for mt7621 SoC
        clk: uniphier: Fix potential infinite loop
        clk: qcom: rpmh: add support for SDX55 rpmh IPA clock
        clk: qcom: gcc-sdm845: get rid of the test clock
        clk: qcom: convert SDM845 Global Clock Controller to parent_data
        dt-bindings: clock: separate SDM845 GCC clock bindings
        clk: qcom: apss-ipq-pll: Add missing MODULE_DEVICE_TABLE
        clk: qcom: a53-pll: Add missing MODULE_DEVICE_TABLE
        clk: qcom: a7-pll: Add missing MODULE_DEVICE_TABLE
        dt: bindings: add mt7621-sysc device tree binding documentation
        dt-bindings: clock: add dt binding header for mt7621 clocks
        clk: samsung: Remove redundant dev_err calls
        clk: zynqmp: pll: add set_pll_mode to check condition in zynqmp_pll_enable
        clk: zynqmp: move zynqmp_pll_set_mode out of round_rate callback
        clk: zynqmp: Drop dependency on ARCH_ZYNQMP
        clk: zynqmp: Enable the driver if ZYNQMP_FIRMWARE is selected
        clk: qcom: gcc-sm8350: use ARRAY_SIZE instead of specifying num_parents
        ...
    • L
      Merge tag 'mailbox-v5.13' of git://git.linaro.org/landing-teams/working/fujitsu/integration · d8201efe
      Linus Torvalds committed
      Pull mailbox updates from Jassi Brar:
       "qcom:
         - enable support for SM8350 and SC7280
      
        sprd:
         - refcount channel usage
         - specify interrupt names in dt
         - support sc9863a
      
        arm:
         - drop redundant print
      
        ti:
         - convert dt-bindings to json schema
      
        and misc spelling fixes"
      
      * tag 'mailbox-v5.13' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        dt-bindings: mailbox: qcom-ipcc: Add compatible for SC7280
        dt-bindings: mailbox: ti,secure-proxy: Convert to json schema
        mailbox: arm_mhu_db: Remove redundant dev_err call in mhu_db_probe()
        mailbox: sprd: Add supplementary inbox support
        dt-bindings: mailbox: Add interrupt-names to SPRD mailbox
        mailbox: sprd: Introduce refcnt when clients requests/free channels
        MAINTAINERS: Add DT bindings directory to mailbox
        mailbox: fix various typos in comments
        mailbox: pcc: fix platform_no_drv_owner.cocci warnings
        dt-bindings: mailbox: Add compatible for SM8350 IPCC