1. 06 Sep, 2019 9 commits
    • nvme-pci: Fix async probe remove race · 4a982919
      Committed by Keith Busch
      [ Upstream commit bd46a90634302bfe791e93ad5496f98f165f7ae0 ]
      
      Ensure the controller is not in the NEW state when nvme_probe() exits.
      This will always allow a subsequent nvme_remove() to set the state to
      DELETING, fixing a potential race between the initial asynchronous probe
      and device removal.
      Reported-by: Li Zhong <lizhongfs@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
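
The race described above can be modelled in userspace as a small state machine. This is only a sketch: the reduced state set, the transition table, and the names `change_state`, `probe`, and `remove_ctrl` are assumptions made for illustration and are not the driver's actual code. It shows why leaving the NEW state before probe returns lets a later remove always reach DELETING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of controller states for illustration. */
enum ctrl_state { CTRL_NEW, CTRL_RESETTING, CTRL_LIVE, CTRL_DELETING };

struct ctrl {
    enum ctrl_state state;
};

/* Assumed transition rules: DELETING cannot be entered directly from NEW. */
static bool change_state(struct ctrl *c, enum ctrl_state next)
{
    bool ok = false;

    assert(c);
    switch (next) {
    case CTRL_RESETTING:
        ok = (c->state == CTRL_NEW || c->state == CTRL_LIVE);
        break;
    case CTRL_LIVE:
        ok = (c->state == CTRL_RESETTING);
        break;
    case CTRL_DELETING:
        ok = (c->state == CTRL_RESETTING || c->state == CTRL_LIVE);
        break;
    case CTRL_NEW:
        break; /* nothing transitions back to NEW in this model */
    }
    if (ok)
        c->state = next;
    return ok;
}

/* Probe leaves NEW before returning; asynchronous setup would continue later. */
static void probe(struct ctrl *c)
{
    c->state = CTRL_NEW;
    change_state(c, CTRL_RESETTING);
}

/* Remove succeeds only if the DELETING transition is permitted. */
static bool remove_ctrl(struct ctrl *c)
{
    return change_state(c, CTRL_DELETING);
}
```

In this model, calling `remove_ctrl()` on a controller that is still in `CTRL_NEW` fails, while calling it after `probe()` succeeds.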
    • nvme: fix a possible deadlock when passthru commands sent to a multipath device · 431f579a
      Committed by Sagi Grimberg
      [ Upstream commit b9156daeb1601d69007b7e50efcf89d69d72ec1d ]
      
      When the user issues a command with side effects, we will end up freezing
      the namespace request queue when updating disk info (and the same for
      the corresponding mpath disk node).
      
      However, we are not freezing the mpath node request queue,
      which means that mpath I/O can still come in and block on blk_queue_enter
      (called from nvme_ns_head_make_request -> direct_make_request).
      
      This is a deadlock, because blk_queue_enter will block until the inner
      namespace request queue is unfrozen, but that process is blocked because
      the namespace revalidation is trying to update the mpath disk info
      and freeze its request queue (which will never complete because
      of the I/O that is blocked on blk_queue_enter).
      
      Fix this by freezing all the subsystem nsheads request queues before
      executing the passthru command. Given that these commands are infrequent,
      the temporary I/O freeze is an acceptable price for keeping things sane.
      
      Here are the matching hang traces:
      --
      [ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds.
      [ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.487274] systemd-udevd D 0 17994 1 0x00000000
      [ 374.493407] Call Trace:
      [ 374.496145] __schedule+0x2ef/0x620
      [ 374.500047] schedule+0x38/0xa0
      [ 374.503569] blk_queue_enter+0x139/0x220
      [ 374.507959] ? remove_wait_queue+0x60/0x60
      [ 374.512540] direct_make_request+0x60/0x130
      [ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core]
      [ 374.523740] ? generic_make_request_checks+0x307/0x6f0
      [ 374.529484] generic_make_request+0x10d/0x2e0
      [ 374.534356] submit_bio+0x75/0x140
      [ 374.538163] ? guard_bio_eod+0x32/0xe0
      [ 374.542361] submit_bh_wbc+0x171/0x1b0
      [ 374.546553] block_read_full_page+0x1ed/0x330
      [ 374.551426] ? check_disk_change+0x70/0x70
      [ 374.556008] ? scan_shadow_nodes+0x30/0x30
      [ 374.560588] blkdev_readpage+0x18/0x20
      [ 374.564783] do_read_cache_page+0x301/0x860
      [ 374.569463] ? blkdev_writepages+0x10/0x10
      [ 374.574037] ? prep_new_page+0x88/0x130
      [ 374.578329] ? get_page_from_freelist+0xa2f/0x1280
      [ 374.583688] ? __alloc_pages_nodemask+0x179/0x320
      [ 374.588947] read_cache_page+0x12/0x20
      [ 374.593142] read_dev_sector+0x2d/0xd0
      [ 374.597337] read_lba+0x104/0x1f0
      [ 374.601046] find_valid_gpt+0xfa/0x720
      [ 374.605243] ? string_nocheck+0x58/0x70
      [ 374.609534] ? find_valid_gpt+0x720/0x720
      [ 374.614016] efi_partition+0x89/0x430
      [ 374.618113] ? string+0x48/0x60
      [ 374.621632] ? snprintf+0x49/0x70
      [ 374.625339] ? find_valid_gpt+0x720/0x720
      [ 374.629828] check_partition+0x116/0x210
      [ 374.634214] rescan_partitions+0xb6/0x360
      [ 374.638699] __blkdev_reread_part+0x64/0x70
      [ 374.643377] blkdev_reread_part+0x23/0x40
      [ 374.647860] blkdev_ioctl+0x48c/0x990
      [ 374.651956] block_ioctl+0x41/0x50
      [ 374.655766] do_vfs_ioctl+0xa7/0x600
      [ 374.659766] ? locks_lock_inode_wait+0xb1/0x150
      [ 374.664832] ksys_ioctl+0x67/0x90
      [ 374.668539] __x64_sys_ioctl+0x1a/0x20
      [ 374.672732] do_syscall_64+0x5a/0x1c0
      [ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds.
      [ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.760170] nvmeadm D 0 49141 36333 0x00004080
      [ 374.766301] Call Trace:
      [ 374.769038] __schedule+0x2ef/0x620
      [ 374.772939] schedule+0x38/0xa0
      [ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100
      [ 374.781614] ? remove_wait_queue+0x60/0x60
      [ 374.786192] blk_mq_freeze_queue+0x1a/0x20
      [ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core]
      [ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core]
      [ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core]
      [ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
      [ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core]
      [ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core]
      [ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core]
      [ 374.833114] do_vfs_ioctl+0xa7/0x600
      [ 374.837117] ? __audit_syscall_entry+0xdd/0x130
      [ 374.842184] ksys_ioctl+0x67/0x90
      [ 374.845891] __x64_sys_ioctl+0x1a/0x20
      [ 374.850082] do_syscall_64+0x5a/0x1c0
      [ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      --
      Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
      Tested-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvmet-loop: Flush nvme_delete_wq when removing the port · 32c0b8f1
      Committed by Logan Gunthorpe
      [ Upstream commit 86b9a63e595ff03f9d0a7b92b6acc231fecefc29 ]
      
      After calling nvme_loop_delete_ctrl(), the controllers will not
      yet be deleted because nvme_delete_ctrl() only schedules work
      to do the delete.
      
      This means a race can occur if a port is removed but there
      are still active controllers trying to access that memory.
      
      To fix this, flush the nvme_delete_wq before returning from
      nvme_loop_remove_port() so that any controllers that might
      be in the process of being deleted won't access a freed port.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • afs: Only update d_fsdata if different in afs_d_revalidate() · 9c55dc85
      Committed by David Howells
      [ Upstream commit 5dc84855b0fc7e1db182b55c5564fd539d6eff92 ]
      
      In the in-kernel afs filesystem, d_fsdata is set with the data version of
      the parent directory.  afs_d_revalidate() will update this to the current
      directory version, but it shouldn't do this if the value it read from
      d_fsdata is the same, as no lock is held and cmpxchg() is not used.
      
      Fix the code to only change the value if it is different from the current
      directory version.
      
      Fixes: 260a9803 ("[AFS]: Add "directory write" support.")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
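
The "write only when different" pattern from this commit can be sketched as follows. The struct `dentry_model` and the function `update_dir_version` are hypothetical stand-ins, not the AFS code; the point is only that the store is skipped when the cached value already matches, which avoids a redundant unlocked write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry carrying a cached directory version. */
struct dentry_model {
    unsigned long fsdata;
};

/*
 * Update the cached version only if it differs from the current one.
 * Returns true when a store was actually performed.
 */
static bool update_dir_version(struct dentry_model *de, unsigned long dir_version)
{
    assert(de);
    if (de->fsdata == dir_version)
        return false;   /* already current: skip the write entirely */
    de->fsdata = dir_version;
    return true;
}
```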
    • fs: afs: Fix a possible null-pointer dereference in afs_put_read() · 24e093b9
      Committed by Jia-Ju Bai
      [ Upstream commit a6eed4ab5dd4bfb696c1a3f49742b8d1846a66a0 ]
      
      In afs_read_dir(), there is an if statement on line 255 to check whether
      req->pages is NULL:
      	if (!req->pages)
      		goto error;
      
      If req->pages is NULL, afs_put_read() on line 337 is executed.
      In afs_put_read(), req->pages[i] is used on line 195.
      Thus, a possible null-pointer dereference may occur in this case.
      
      To fix this possible bug, an if statement is added in afs_put_read() to
      check req->pages.
      
      This bug was found by STCheck, a static analysis tool written by us.
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
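
A minimal userspace sketch of the guard described above, assuming simplified hypothetical types (`read_req`, `page_model`) and a hypothetical `put_read` rather than the real AFS structures: the release path checks the page array pointer before indexing it, so a request whose array was never allocated can still be released safely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified types; the real request structure differs. */
struct page_model {
    int refs;
};

struct read_req {
    unsigned int nr_pages;
    struct page_model **pages;  /* may be NULL if allocation failed early */
};

/* Release a request, tolerating a missing page array. */
static void put_read(struct read_req *req)
{
    if (!req)
        return;
    if (req->pages) {           /* the added check: never index a NULL array */
        for (unsigned int i = 0; i < req->nr_pages; i++) {
            if (req->pages[i])
                req->pages[i]->refs--;
        }
        free(req->pages);
    }
    free(req);
}
```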
    • afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u() · 8e5179f9
      Committed by Marc Dionne
      [ Upstream commit 4a46fdba449a5cd890271df5a9e23927d519ed00 ]
      
      afs_deliver_vl_get_entry_by_name_u() scans through the vl entry
      received from the volume location server and builds a return list
      containing the sites that are currently valid.  When assigning
      values for the return list, the index into the vl entry (i) is used
      rather than the one for the new list (entry->nr_server).  If all
      sites are usable, this works out fine as the indices will match.
      If some sites are not valid, for example if AFS_VLSF_DONTUSE is
      set, fs_mask and the uuid will be set for the wrong return site.
      
      Fix this by using entry->nr_server as the index into the arrays
      being filled in rather than i.
      
      This can lead to EDESTADDRREQ errors if none of the returned sites
      have a valid fs_mask.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
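
The index mixup is easier to see in a reduced model. The types, the `SITE_DONTUSE` flag (standing in for AFS_VLSF_DONTUSE), and `build_site_list` below are hypothetical; the sketch only illustrates the general fix of indexing the output arrays by the output count instead of the input index when some entries are skipped.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SITES    8
#define SITE_DONTUSE 0x1u  /* hypothetical flag, standing in for AFS_VLSF_DONTUSE */

struct raw_site {
    unsigned int flags;
    unsigned int fs_mask;
};

struct site_list {
    size_t nr;                       /* number of valid sites copied so far */
    unsigned int fs_mask[MAX_SITES];
    size_t src_index[MAX_SITES];     /* where each entry came from in the input */
};

/* Copy usable sites, indexing the output by out->nr rather than by i. */
static void build_site_list(const struct raw_site *in, size_t n,
                            struct site_list *out)
{
    assert(in && out);
    out->nr = 0;
    for (size_t i = 0; i < n && out->nr < MAX_SITES; i++) {
        if (in[i].flags & SITE_DONTUSE)
            continue;                /* skipped entries make i and out->nr diverge */
        out->fs_mask[out->nr] = in[i].fs_mask;
        out->src_index[out->nr] = i;
        out->nr++;
    }
}
```

If the first input site is marked unusable, the first output slot still receives the mask of the first usable site. Indexing the output by `i` would instead leave slot 0 unset.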
    • afs: Fix the CB.ProbeUuid service handler to reply correctly · dfc438c0
      Committed by David Howells
      [ Upstream commit 2067b2b3f4846402a040286135f98f46f8919939 ]
      
      Fix the service handler function for the CB.ProbeUuid RPC call so that it
      replies in the correct manner - that is an empty reply for success and an
      abort of 1 for failure.
      
      Putting 0 or 1 in an integer in the body of the reply should result in the
      fileserver throwing an RX_PROTOCOL_ERROR abort and discarding its record of
      the client; older servers, however, don't necessarily check that all the
      data got consumed, and so might incorrectly think that they got a positive
      response and associate the client with the wrong host record.
      
      If the client is incorrectly associated, this will result in callbacks
      intended for a different client being delivered to this one and then, when
      the other client connects and responds positively, all of the callback
      promises meant for the client that issued the improper response will be
      lost and it won't receive any further change notifications.
      
      Fixes: 9396d496 ("afs: support the CB.ProbeUuid RPC op")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns · 7436dc2a
      Committed by Anthony Iliopoulos
      [ Upstream commit fab7772bfbcfe8fb8e3e352a6a8fcaf044cded17 ]
      
      When CONFIG_NVME_MULTIPATH is set, only the hidden gendisk associated
      with the per-controller ns is run through revalidate_disk when a
      rescan is triggered, while the visible blockdev never gets its size
      (bdev->bd_inode->i_size) updated to reflect any capacity changes that
      may have occurred.
      
      This prevents online resizing of nvme block devices and, by extension, of
      any filesystems on top of them, which are unable to expand while mounted, as
      userspace relies on the blockdev size for obtaining the disk capacity
      (via BLKGETSIZE/64 ioctls).
      
      Fix this by explicitly revalidating the actual namespace gendisk in
      addition to the per-controller gendisk, when multipath is enabled.
      Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
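
For context, this is roughly how userspace reads the block device size that the commit message refers to. It is a sketch for Linux only; the helper name `blockdev_size_bytes` is hypothetical, while `BLKGETSIZE64` is the ioctl from `<linux/fs.h>` that stores the device size in bytes into a 64-bit integer.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Query the size of a block device in bytes.
 * Returns 0 on success and stores the size in *size, or -1 on failure
 * (for example if the path cannot be opened or is not a block device).
 */
static int blockdev_size_bytes(const char *path, uint64_t *size)
{
    int fd, rc;

    assert(path && size);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = ioctl(fd, BLKGETSIZE64, size);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

A tool that grows a mounted filesystem would typically compare a value obtained this way with the filesystem's current size, which is why a stale size on the visible multipath device blocks online resizing.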
    • dmaengine: ste_dma40: fix unneeded variable warning · 2013d6ec
      Committed by Arnd Bergmann
      [ Upstream commit 5d6fb560729a5d5554e23db8d00eb57cd0021083 ]
      
      clang-9 points out that there are two variables that depending on the
      configuration may only be used in an ARRAY_SIZE() expression but not
      referenced:
      
      drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
      static u32 d40_backup_regs[] = {
                 ^
      drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
      static u32 d40_backup_regs_chan[] = {
      
      Mark these __maybe_unused to shut up the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
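
The situation clang warns about can be reproduced in a small standalone file. In this sketch the array contents, the `MODEL_CONFIG_PM` switch, and the function names are invented for illustration. The local `maybe_unused` macro expands to the GCC/Clang `unused` attribute, which is what the kernel's `__maybe_unused` is built on.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
/* Local spelling of the GCC/Clang attribute behind the kernel's __maybe_unused. */
#define maybe_unused __attribute__((unused))

/* Hypothetical register offsets; only the array's size matters below. */
static unsigned int backup_regs[] maybe_unused = { 0x10, 0x14, 0x18 };

#ifdef MODEL_CONFIG_PM
/* With the (invented) option enabled, the array contents are really read. */
static unsigned int saved[ARRAY_SIZE(backup_regs)];

static void save_regs(void)
{
    for (size_t i = 0; i < ARRAY_SIZE(backup_regs); i++)
        saved[i] = backup_regs[i];
}
#endif

/*
 * Without MODEL_CONFIG_PM the array is referenced only inside sizeof,
 * which is the case the attribute is meant to silence.
 */
static size_t backup_reg_count(void)
{
    assert(ARRAY_SIZE(backup_regs) > 0);
    return ARRAY_SIZE(backup_regs);
}
```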
  2. 29 Aug, 2019 31 commits