1. 21 Sep, 2019: 1 commit
  2. 06 Sep, 2019: 1 commit
  3. 16 Aug, 2019: 2 commits
    • drbd: dynamically allocate shash descriptor · 38c919ec
      Committed by Arnd Bergmann
      [ Upstream commit 77ce56e2bfaa64127ae5e23ef136c0168b818777 ]
      
      Building with clang and KASAN, we get a warning about an overly large
      stack frame on 32-bit architectures:
      
      drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
            [-Werror,-Wframe-larger-than=]
      
      We already allocate other data dynamically in this function, so
      just do the same for the shash descriptor, which makes up most of
      this memory.
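A user-space sketch of the same pattern follows: the large object moves from the stack to the heap and is freed on every exit path. `struct big_desc` and `connect_with_heap_desc()` are hypothetical; in the kernel the allocation would use kmalloc()/kfree() and would presumably be sized for the hash transform rather than a fixed array.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a descriptor that was too large for the stack;
 * the size is illustrative only. */
struct big_desc {
	unsigned char ctx[1024];
};

/* Returns 0 on success or -1 if the allocation fails. */
int connect_with_heap_desc(void)
{
	/* Previously: "struct big_desc desc;" as a local variable (1 KiB of stack). */
	struct big_desc *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return -1;

	desc->ctx[0] = 1;  /* ... use the descriptor through the pointer ... */

	free(desc);        /* release it on every exit path */
	return 0;
}
```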
      
      Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • loop: set PF_MEMALLOC_NOIO for the worker thread · c9a1c104
      Committed by Mikulas Patocka
      commit d0a255e795ab976481565f6ac178314b34fbf891 upstream.
      
      A deadlock with the stack traces below was observed.
      
      The loop thread does a GFP_KERNEL allocation that calls into the dm-bufio
      shrinker, and the shrinker depends on I/O completion in the dm-bufio
      subsystem.
      
      In order to fix the deadlock (and other similar ones), we set the flag
      PF_MEMALLOC_NOIO at loop thread entry.
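The effect of the flag can be modelled in a few lines, loosely after the kernel's current_gfp_context() behaviour: when the task is marked NOIO, its allocations lose permission to start I/O (and filesystem) reclaim, while plain reclaim stays allowed. The flag names and values below are hypothetical stand-ins for __GFP_RECLAIM, __GFP_IO, __GFP_FS, GFP_KERNEL and PF_MEMALLOC_NOIO.

```c
/* Hypothetical bit values; the real kernel constants differ. */
#define GFP_RECLAIM_BIT    0x1u  /* may reclaim memory               */
#define GFP_IO_BIT         0x2u  /* reclaim may issue or wait on I/O */
#define GFP_FS_BIT         0x4u  /* reclaim may call into filesystems */
#define MODEL_GFP_KERNEL   (GFP_RECLAIM_BIT | GFP_IO_BIT | GFP_FS_BIT)
#define TASK_MEMALLOC_NOIO 0x1u  /* models PF_MEMALLOC_NOIO on the task */

/* Narrows an allocation request according to the task's flags: a NOIO task
 * keeps reclaim, but reclaim it triggers must not depend on I/O. */
unsigned int effective_gfp(unsigned int task_flags, unsigned int gfp)
{
	if (task_flags & TASK_MEMALLOC_NOIO)
		gfp &= ~(GFP_IO_BIT | GFP_FS_BIT);
	return gfp;
}
```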
      
      PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
         #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
         #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
         #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
         #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
         #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
         #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
         #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
         #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
         #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
         #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
        #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
        #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
        #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
        #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
        #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
      
        PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
         #0 [ffff88272f5af228] __schedule at ffffffff8173f405
         #1 [ffff88272f5af280] schedule at ffffffff8173fa27
         #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
         #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
         #4 [ffff88272f5af330] mutex_lock at ffffffff81742133
         #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
         #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
         #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
         #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
         #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
        #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
        #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
        #12 [ffff88272f5af760] new_slab at ffffffff811f4523
        #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
        #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
        #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
        #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
        #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
        #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
        #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
        #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
        #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
        #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
        #23 [ffff88272f5afec0] kthread at ffffffff810a8428
        #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 07 Aug, 2019: 1 commit
    • nbd: replace kill_bdev() with __invalidate_device() again · eb828241
      Committed by Munehisa Kamata
      commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream.
      
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev(), and it has been there ever since. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. 26 Jul, 2019: 5 commits
    • floppy: fix out-of-bounds read in copy_buffer · ff54c44f
      Committed by Denis Efremov
      [ Upstream commit da99466ac243f15fbba65bd261bfc75ffa1532b6 ]
      
      This fixes a global out-of-bounds read access in the copy_buffer
      function of the floppy driver.
      
      The FDDEFPRM ioctl allows one to set the geometry of a disk.  The sect
      and head fields (unsigned int) of the floppy_drive structure are used to
      compute the max_sector (int) in the make_raw_rw_request function.  It is
      possible to overflow the max_sector.  Next, max_sector is passed to the
      copy_buffer function and used in one of the memcpy calls.
      
      An unprivileged user could trigger the bug if the device is accessible,
      but a floppy disk must be inserted.
      
      The patch adds a check in the set_geometry function that the
      .sect * .head multiplication does not overflow.
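A minimal sketch of such an overflow check, assuming a hypothetical `struct geometry` that carries only the two fields involved; the upstream patch may phrase the test differently (for example by casting the product to int and rejecting non-positive results).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical subset of the geometry passed to FDDEFPRM. */
struct geometry {
	unsigned int sect;   /* sectors per track */
	unsigned int head;   /* number of heads   */
};

/* True if sect * head is non-zero and fits in an int, so an int max_sector
 * computed from it cannot overflow. */
bool geometry_ok(const struct geometry *g)
{
	if (g->sect == 0 || g->head == 0)
		return false;
	/* Compare by division so the overflowing product is never computed. */
	return g->sect <= (unsigned int)INT_MAX / g->head;
}
```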
      
      The bug was found by syzkaller.
      Signed-off-by: Denis Efremov <efremov@ispras.ru>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • floppy: fix invalid pointer dereference in drive_name · a9444d9d
      Committed by Denis Efremov
      [ Upstream commit 9b04609b784027968348796a18f601aed9db3789 ]
      
      This fixes the invalid pointer dereference in the drive_name function of
      the floppy driver.
      
      The native_format field of the struct floppy_drive_params is used as
      floppy_type array index in the drive_name function.  Thus, the field
      should be checked the same way as the autodetect field.
      
      To trigger the bug, one could use a value out of range and set the drive
      parameters with the FDSETDRVPRM ioctl.  Next, the FDGETDRVTYP ioctl should
      be used to call drive_name.  A floppy disk is not required to be
      inserted.
      
      CAP_SYS_ADMIN is required to call FDSETDRVPRM.
      
      The patch adds a check that the value of the native_format field is in
      the '0 <= x < ARRAY_SIZE(floppy_type)' range of valid floppy_type array
      indices.
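The range rule can be sketched as follows. The table `format_names[]` is a hypothetical, shortened stand-in for the driver's floppy_type[] array, and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, shortened stand-in for the driver's floppy_type[] table. */
static const char *const format_names[] = {
	"format-0", "format-1", "format-2", "format-3",
};

/* Implements the '0 <= x < ARRAY_SIZE(table)' rule for a user-supplied index. */
bool native_format_valid(int native_format)
{
	return native_format >= 0 &&
	       (size_t)native_format < ARRAY_SIZE(format_names);
}

/* Looks up the name only after validation; returns NULL for a bad index. */
const char *drive_name_checked(int native_format)
{
	if (!native_format_valid(native_format))
		return NULL;
	return format_names[native_format];
}
```

Both bounds matter: the value comes from user space, so a negative index is as dangerous as one past the end.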
      
      The bug was found by syzkaller.
      Signed-off-by: Denis Efremov <efremov@ispras.ru>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • floppy: fix out-of-bounds read in next_valid_format · 5b565f32
      Committed by Denis Efremov
      [ Upstream commit 5635f897ed83fd539df78e98ba69ee91592f9bb8 ]
      
      This fixes a global out-of-bounds read access in the next_valid_format
      function of the floppy driver.
      
      The values from the autodetect field of the struct floppy_drive_params
      are used as indices into the floppy_type array in the next_valid_format
      function: 'floppy_type[DP->autodetect[probed_format]].sect'.
      
      To trigger the bug, one could use a value out of range and set the drive
      parameters with the FDSETDRVPRM ioctl.  A floppy disk is not required to
      be inserted.
      
      CAP_SYS_ADMIN is required to call FDSETDRVPRM.
      
      The patch adds a check that all values of the autodetect field are in the
      '0 <= x < ARRAY_SIZE(floppy_type)' range of valid floppy_type array indices.
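Because autodetect is a list rather than a single index, the check has to cover every entry. A sketch, assuming an 8-entry list of shorts and a hypothetical table size of 32; neither constant is taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_AUTODETECT    8   /* assumed length of the autodetect list */
#define NR_FLOPPY_TYPES 32   /* hypothetical number of floppy_type[] entries */

/* Accepts the parameter block only if every autodetect entry is a usable
 * floppy_type[] index, so next_valid_format() can never read out of bounds. */
bool autodetect_valid(const short autodetect[NR_AUTODETECT])
{
	for (size_t i = 0; i < NR_AUTODETECT; i++) {
		if (autodetect[i] < 0 || autodetect[i] >= NR_FLOPPY_TYPES)
			return false;
	}
	return true;
}
```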
      
      The bug was found by syzkaller.
      Signed-off-by: Denis Efremov <efremov@ispras.ru>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • floppy: fix div-by-zero in setup_format_params · 6e34fd07
      Committed by Denis Efremov
      [ Upstream commit f3554aeb991214cbfafd17d55e2bfddb50282e32 ]
      
      This fixes a divide by zero error in the setup_format_params function of
      the floppy driver.
      
      Two consecutive ioctls can trigger the bug: the first one should set the
      drive geometry with .sect and .rate values that make F_SECT_PER_TRACK
      zero.  Next, the floppy format operation should be called.
      
      A floppy disk is not required to be inserted.  An unprivileged user
      could trigger the bug if the device is accessible.
      
      The patch checks F_SECT_PER_TRACK for a non-zero value in the
      set_geometry function.  The proper check should involve a reasonable
      upper limit for the .sect and .rate fields, but it could change the
      UAPI.
      
      The patch also checks F_SECT_PER_TRACK in the setup_format_params, and
      cancels the formatting operation in case of zero.
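The sketch below illustrates how a non-zero .sect can still produce a zero divisor once the derived value is truncated to 8 bits. It assumes the sectors-per-track value is computed as (sect << 2) >> size_code and stored in an unsigned char, with the size code passed in directly instead of being derived from .rate; this is a simplification, not the driver's exact expression.

```c
#include <stdbool.h>

/* Assumption: sectors per track is derived as (sect << 2) >> size_code and
 * stored in an unsigned char, so a large .sect can wrap around to zero. */
unsigned char sect_per_track(unsigned int sect, unsigned int size_code)
{
	return (unsigned char)((sect << 2) >> size_code);
}

/* set_geometry-style check: reject a geometry whose derived sectors-per-track
 * value would later be used as a zero divisor. */
bool format_geometry_ok(unsigned int sect, unsigned int size_code)
{
	return sect_per_track(sect, size_code) != 0;
}
```

For example, sect = 64 with size code 0 gives 256, which truncates to 0 in a byte.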
      
      The bug was found by syzkaller.
      Signed-off-by: Denis Efremov <efremov@ispras.ru>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • block: null_blk: fix race condition for null_del_dev · c8f75e75
      Committed by Bob Liu
      [ Upstream commit 7602843fd873cae43a444b83b14dfdd114a9659c ]
      
      A duplicate call of null_del_dev() triggers a NULL pointer dereference
      like the one below. The cause is a race condition between
      nullb_device_power_store() and nullb_group_drop_item().
      
        CPU#0                         CPU#1
        ----------------              -----------------
        do_rmdir()
         >configfs_rmdir()
          >client_drop_item()
           >nullb_group_drop_item()
                                      nullb_device_power_store()
      				>null_del_dev()
      
            >test_and_clear_bit(NULLB_DEV_FL_UP
             >null_del_dev()
             ^^^^^
             Duplicated null_del_dev() triggers a NULL pointer error
      
      				>clear_bit(NULLB_DEV_FL_UP
      
      The fix is to keep the same sequence in both paths: clear NULLB_DEV_FL_UP
      first, then call null_del_dev() only if the bit was set.
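The "only the caller that clears the bit does the teardown" rule can be modelled with C11 atomics. `struct model_dev`, `DEV_FL_UP` and the `del_calls` counter are hypothetical; `atomic_fetch_and` plays the role of the kernel's test_and_clear_bit().

```c
#include <stdatomic.h>

#define DEV_FL_UP 0x1u          /* models NULLB_DEV_FL_UP */

struct model_dev {
	atomic_uint flags;
	int del_calls;          /* how many times teardown actually ran */
};

/* Models test_and_clear_bit(): atomically clears the bit and reports whether
 * it was set before. */
static int test_and_clear_flag(atomic_uint *flags, unsigned int bit)
{
	unsigned int old = atomic_fetch_and(flags, ~bit);

	return (old & bit) != 0;
}

/* Both paths (power_store and drop_item) use the same sequence, so at most
 * one of them wins the bit and performs the delete. */
void teardown(struct model_dev *dev)
{
	if (test_and_clear_flag(&dev->flags, DEV_FL_UP))
		dev->del_calls++;   /* stands in for null_del_dev() */
}
```

Calling teardown() twice on the same device runs the delete only once.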
      
      [  698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [  698.613608] #PF error: [normal kernel read fault]
      [  698.613611] PGD 0 P4D 0
      [  698.613619] Oops: 0000 [#1] SMP PTI
      [  698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ #35
      [  698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018
      [  698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk]
      [  698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48
      [  698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286
      [  698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681
      [  698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000
      [  698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000
      [  698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08
      [  698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000
      [  698.613680] FS:  00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000
      [  698.613685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0
      [  698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  698.613700] Call Trace:
      [  698.613712]  nullb_group_drop_item+0x50/0x70 [null_blk]
      [  698.613722]  client_drop_item+0x29/0x40
      [  698.613728]  configfs_rmdir+0x1ed/0x300
      [  698.613738]  vfs_rmdir+0xb2/0x130
      [  698.613743]  do_rmdir+0x1c7/0x1e0
      [  698.613750]  __x64_sys_rmdir+0x17/0x20
      [  698.613759]  do_syscall_64+0x5a/0x110
      [  698.613768]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 11 Jun, 2019: 1 commit
  7. 26 May, 2019: 1 commit
  8. 10 May, 2019: 1 commit
    • virtio-blk: limit number of hw queues by nr_cpu_ids · 0e8e67b8
      Committed by Dongli Zhang
      [ Upstream commit bf348f9b78d413e75bb079462751a1d86b6de36c ]
      
      When tag_set->nr_maps is 1, the block layer limits the number of hw queues
      to nr_cpu_ids. Since virtio-blk has tag_set->nr_maps == 1, it can use at
      most nr_cpu_ids hw queues, no matter how many it requests.
      
      In addition, specifically for the PCI scenario, when the 'num-queues'
      value specified by QEMU is larger than maxcpus, virtio-blk cannot allocate
      more than maxcpus vectors, so it cannot have a vector for each queue. As a
      result, it falls back to MSI-X with one vector for config and one vector
      shared by all queues.
      
      For these reasons, this patch limits the number of hw queues used by
      virtio-blk to nr_cpu_ids.
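The clamp itself is a single minimum; the kernel patch presumably expresses it with min_t(). In this sketch `nr_cpus` is a hypothetical parameter standing in for nr_cpu_ids.

```c
/* Caps the number of hardware queues at the number of possible CPUs.
 * nr_cpus stands in for the kernel's nr_cpu_ids. */
unsigned int limit_hw_queues(unsigned int requested, unsigned int nr_cpus)
{
	return requested < nr_cpus ? requested : nr_cpus;
}
```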
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  9. 08 May, 2019: 1 commit
    • xsysace: Fix error handling in ace_setup · a82cfd77
      Committed by Guenter Roeck
      [ Upstream commit 47b16820c490149c2923e8474048f2c6e7557cab ]
      
      If xace hardware reports a bad version number, the error handling code
      in ace_setup() calls put_disk(), followed by queue cleanup. However, since
      the disk data structure has the queue pointer set, put_disk() also
      cleans and releases the queue. This results in blk_cleanup_queue()
      accessing an already released data structure, which in turn may result
      in a crash such as the following.
      
      [   10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
      [   10.681826] Faulting instruction address: 0xc0431480
      [   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
      [   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
      [   10.682387] Modules linked in:
      [   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.0.0-rc6-next-20190218+ #2
      [   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
      [   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: G        W          (5.0.0-rc6-next-20190218+)
      [   10.683065] MSR:  00029000 <CE,EE,ME>  CR: 22000222  XER: 00000000
      [   10.683236] DEAR: 00000040 ESR: 00000000
      [   10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000
      [   10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000
      [   10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000
      [   10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800
      [   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
      [   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
      [   10.684602] Call Trace:
      [   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
      [   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
      [   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
      [   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
      [   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
      [   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
      [   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
      [   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
      [   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
      [   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
      [   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
      [   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
      [   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
      [   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
      [   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
      [   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
      [   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
      [   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
      [   10.687349] Instruction dump:
      [   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008
      [   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008
      [   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---
      
      Fix the problem by setting the disk queue pointer to NULL before calling
      put_disk(). A more comprehensive fix might be to rearrange the code
      to check the hardware version before initializing data structures,
      but I don't know if this would have undesirable side effects, and
      it would increase the complexity of backporting the fix to older kernels.
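The ownership problem and the fix can be modelled with two structs. Everything here is hypothetical and heavily simplified: `model_put_disk()` stands in for put_disk() releasing the attached queue, and the second `release_queue()` call stands in for blk_cleanup_queue().

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified ownership model. */
struct model_queue {
	int id;                      /* placeholder field */
};

struct model_disk {
	struct model_queue *queue;   /* like gendisk->queue */
};

static int queue_releases;       /* how many times a queue was torn down */

static void release_queue(struct model_queue *q)
{
	queue_releases++;
	free(q);
}

/* Models put_disk(): a disk that still points at its queue releases it too. */
static void model_put_disk(struct model_disk *disk)
{
	if (disk->queue)
		release_queue(disk->queue);
	free(disk);
}

/* Error path with the fix applied. Returns the number of queue releases,
 * which is exactly 1; without the detach the queue would be released twice. */
int setup_error_path(void)
{
	struct model_disk *disk = calloc(1, sizeof(*disk));
	struct model_queue *q = calloc(1, sizeof(*q));

	if (!disk || !q) {
		free(disk);
		free(q);
		return -1;
	}
	disk->queue = q;

	/* Bad hardware version detected: unwind. */
	disk->queue = NULL;          /* the fix: detach before put_disk() */
	model_put_disk(disk);
	release_queue(q);            /* stands in for blk_cleanup_queue() */
	return queue_releases;
}
```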
      
      Fixes: 74489a91 ("Add support for Xilinx SystemACE CompactFlash interface")
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  10. 02 May, 2019: 2 commits
  11. 06 Apr, 2019: 1 commit
    • loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() · 61584032
      Committed by Dongli Zhang
      [ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ]
      
      Commit 0da03cab87e6
      ("loop: Fix deadlock when calling blkdev_reread_part()") moves
      blkdev_reread_part() out of the loop_ctl_mutex. However,
      GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
      __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
      will not rescan the loop device to delete all partitions.
      
      Below are steps to reproduce the issue:
      
      step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
      step2 # losetup -P /dev/loop0 tmp.raw
      step3 # parted /dev/loop0 mklabel gpt
      step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
      step5 # losetup -d /dev/loop0
      
      Step5 will not be able to delete /dev/loop0p1 (created by step4), and
      the following kernel warning message is printed:
      
      [  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)
      
      This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().
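The ordering issue can be modelled as below. `struct model_loop`, `FL_NO_PART_SCAN` and both functions are hypothetical; `reread_part()` refuses with -EINVAL when scanning is disabled, which matches the rc=-22 in the warning above. With the old order (set the flag, then rescan) the rescan would fail and the partition would survive.

```c
#include <errno.h>

#define FL_NO_PART_SCAN 0x1u   /* models GENHD_FL_NO_PART_SCAN */

struct model_loop {
	unsigned int disk_flags;
	int partitions;            /* partitions currently visible, e.g. loop0p1 */
};

/* Models __blkdev_reread_part(): refuses with -EINVAL when scanning is disabled. */
static int reread_part(struct model_loop *lo)
{
	if (lo->disk_flags & FL_NO_PART_SCAN)
		return -EINVAL;
	lo->partitions = 0;        /* backing file detached: rescan drops all partitions */
	return 0;
}

/* Teardown in the corrected order: rescan first, then disable scanning. */
int model_clr_fd(struct model_loop *lo)
{
	int rc = reread_part(lo);

	lo->disk_flags |= FL_NO_PART_SCAN;
	return rc;
}
```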
      
      Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  12. 27 Mar, 2019: 1 commit
    • loop: access lo_backing_file only when the loop device is Lo_bound · 3254dd30
      Committed by Dongli Zhang
      commit f7c8a4120eedf24c36090b7542b179ff7a649219 upstream.
      
      Commit 758a58d0bc67 ("loop: set GENHD_FL_NO_PART_SCAN after
      blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and
      "lo->lo_state = Lo_unbound" into different critical regions protected by
      loop_ctl_mutex.
      
      However, there is the following race, in which the NULL lo->lo_backing_file
      would be accessed when the backend of a loop is another loop device, e.g.,
      loop0's backend is a file, while loop1's backend is loop0.
      
      loop0's backend is file            loop1's backend is loop0
      
      __loop_clr_fd()
        mutex_lock(&loop_ctl_mutex);
        lo->lo_backing_file = NULL; --> set to NULL
        mutex_unlock(&loop_ctl_mutex);
                                         loop_set_fd()
                                           mutex_lock_killable(&loop_ctl_mutex);
                                           loop_validate_file()
                                             f = l->lo_backing_file; --> NULL
                                               access if loop0 is not Lo_unbound
        mutex_lock(&loop_ctl_mutex);
        lo->lo_state = Lo_unbound;
        mutex_unlock(&loop_ctl_mutex);
      
      lo->lo_backing_file should be accessed only when the loop device is
      Lo_bound.
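A sketch of the rule: decide on the device state, not on whether the pointer happens to be non-NULL, because during rundown the pointer may already be cleared (or be about to be). The enum and struct are hypothetical models of Lo_unbound/Lo_bound/Lo_rundown and struct loop_device.

```c
#include <stddef.h>

/* Hypothetical models of the loop device states. */
enum model_lo_state { LO_UNBOUND, LO_BOUND, LO_RUNDOWN };

struct model_lo {
	enum model_lo_state state;
	const char *backing_file;   /* may be NULL once teardown has started */
};

/* Returns the backing file only when the device is fully bound; in any
 * other state the device is not treated as a stackable loop backend. */
const char *backing_if_bound(const struct model_lo *lo)
{
	if (lo->state != LO_BOUND)
		return NULL;
	return lo->backing_file;
}
```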
      
      In fact, the problem has been introduced already in commit 7ccd0791d985
      ("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which
      loop_validate_file() could see devices in the Lo_rundown state, which it
      did not expect. It was harmless at that point, but still wrong.
      
      Fixes: 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()")
      Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  13. 24 Mar, 2019: 1 commit
  14. 13 Feb, 2019: 6 commits
    • block/swim3: Fix -EBUSY error when re-opening device after unmount · 295b3e2a
      Committed by Finn Thain
      [ Upstream commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 ]
      
      When the block device is opened with FMODE_EXCL, ref_count is set to -1.
      This value doesn't get reset when the device is closed which means the
      device cannot be opened again. Fix this by checking for refcount <= 0
      in the release method.
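A model of the open/release bookkeeping, assuming (as the message says) that -1 marks an exclusive open. `struct model_fs`, `model_open()` and `model_release()` are hypothetical simplifications, not the driver's functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Models a driver-private open count: -1 means "opened exclusively". */
struct model_fs {
	int ref_count;
};

int model_open(struct model_fs *fs, bool excl)
{
	if (fs->ref_count == -1 || (excl && fs->ref_count > 0))
		return -EBUSY;
	if (excl)
		fs->ref_count = -1;
	else
		fs->ref_count++;
	return 0;
}

/* Release: a positive count is decremented; the exclusive marker (<= 0) is
 * reset to 0, so the device can be opened again. */
void model_release(struct model_fs *fs)
{
	if (fs->ref_count > 0)
		fs->ref_count--;
	else
		fs->ref_count = 0;
}
```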
      Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • zram: fix lockdep warning of free block handling · 3b3ee499
      Committed by Minchan Kim
      [ Upstream commit 3c9959e025472122a61faebb208525cf26b305d1 ]
      
      Patch series "zram idle page writeback", v3.
      
      Inherently, a swap device has many idle pages which are rarely touched
      after they were allocated.  This is never a problem if we use a storage
      device as swap.  However, it is just a waste for zram-swap.
      
      This patchset supports a zram idle page writeback feature.
      
      * Admin can define what an idle page is: "no access since X time ago"
      * Admin can define when zram should write them back
      * Admin can define when zram should stop writeback to prevent wearout
      
      Details are in each patch's description.
      
      This patch (of 7):
      
        ================================
        WARNING: inconsistent lock state
        4.19.0+ #390 Not tainted
        --------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
        00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
        {SOFTIRQ-ON-W} state was registered at:
          _raw_spin_lock+0x2c/0x40
          zram_make_request+0x755/0xdc9
          generic_make_request+0x373/0x6a0
          submit_bio+0x6c/0x140
          __swap_writepage+0x3a8/0x480
          shrink_page_list+0x1102/0x1a60
          shrink_inactive_list+0x21b/0x3f0
          shrink_node_memcg.constprop.99+0x4f8/0x7e0
          shrink_node+0x7d/0x2f0
          do_try_to_free_pages+0xe0/0x300
          try_to_free_pages+0x116/0x2b0
          __alloc_pages_slowpath+0x3f4/0xf80
          __alloc_pages_nodemask+0x2a2/0x2f0
          __handle_mm_fault+0x42e/0xb50
          handle_mm_fault+0x55/0xb0
          __do_page_fault+0x235/0x4b0
          page_fault+0x1e/0x30
        irq event stamp: 228412
        hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
        hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
        softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
        softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&zram->bitmap_lock)->rlock);
          <Interrupt>
            lock(&(&zram->bitmap_lock)->rlock);
      
         *** DEADLOCK ***
      
        no locks held by zram_verify/2095.
      
        stack backtrace:
        CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack+0x67/0x9b
         print_usage_bug+0x1bd/0x1d3
         mark_lock+0x4aa/0x540
         __lock_acquire+0x51d/0x1300
         lock_acquire+0x90/0x180
         _raw_spin_lock+0x2c/0x40
         put_entry_bdev+0x1e/0x50
         zram_free_page+0xf6/0x110
         zram_slot_free_notify+0x42/0xa0
         end_swap_bio_read+0x5b/0x170
         blk_update_request+0x8f/0x340
         scsi_end_request+0x2c/0x1e0
         scsi_io_completion+0x98/0x650
         blk_done_softirq+0x9e/0xd0
         __do_softirq+0xcc/0x427
         irq_exit+0xd1/0xe0
         do_IRQ+0x93/0x120
         common_interrupt+0xf/0xf
         </IRQ>
      
      With the writeback feature, zram_slot_free_notify could be called in
      softirq context by end_swap_bio_read.  However, bitmap_lock is not taken
      with softirqs disabled, so lockdep complains:
      
        get_entry_bdev
        spin_lock(bitmap->lock);
        irq
        softirq
        end_swap_bio_read
        zram_slot_free_notify
        zram_slot_lock <-- deadlock prone
        zram_free_page
        put_entry_bdev
        spin_lock(bitmap->lock); <-- deadlock prone
      
      Following akpm's suggestion (i.e.  bitmap operations are already atomic),
      we could remove the bitmap lock.  It might fail to find an empty slot if
      serious contention happens.  However, that is not a severe problem because
      huge page writeback can already fail under severe memory pressure.  The
      worst case is just keeping the incompressible page in memory, not on
      storage.
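A lock-free slot allocator over an atomic bitmap illustrates the idea. It is a sketch under stated assumptions: a single 64-bit word of slots (`NR_SLOTS` is hypothetical), with `atomic_fetch_or`/`atomic_fetch_and` playing the roles of test_and_set_bit()/clear_bit(). A claim only succeeds if the bit was clear before the atomic OR, so two racing allocators can never get the same slot.

```c
#include <stdatomic.h>

#define NR_SLOTS 64u  /* hypothetical number of backing-device slots (one word) */

/* Returns the claimed slot index, or -1 if no free slot could be claimed;
 * the caller then keeps the page in memory instead of writing it back. */
int alloc_slot(atomic_ullong *bitmap)
{
	for (unsigned int i = 0; i < NR_SLOTS; i++) {
		unsigned long long bit = 1ULL << i;

		/* Analogue of test_and_set_bit(): succeed only if it was clear. */
		if (!(atomic_fetch_or(bitmap, bit) & bit))
			return (int)i;
	}
	return -1;
}

/* Analogue of clear_bit(): hands the slot back. */
void free_slot(atomic_ullong *bitmap, unsigned int i)
{
	atomic_fetch_and(bitmap, ~(1ULL << i));
}
```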
      
      The other problem is zram_slot_lock in zram_slot_free_notify.  To make it
      safe, this patch introduces zram_slot_trylock, which zram_slot_free_notify
      uses instead.  Although contention should be rare, this patch adds a new
      debug stat, "miss_free", to monitor how often it happens.
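The trylock-and-count pattern can be sketched with a C11 atomic_flag as the per-slot lock. `struct model_slot`, the helpers and the `miss_free` variable are hypothetical models, not zram's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-slot lock plus the "miss_free" debug counter. */
struct model_slot {
	atomic_flag lock;
	bool has_data;
};

static unsigned long miss_free;  /* how often a free was skipped */

static bool slot_trylock(struct model_slot *s)
{
	return !atomic_flag_test_and_set(&s->lock);
}

static void slot_unlock(struct model_slot *s)
{
	atomic_flag_clear(&s->lock);
}

/* Free notification that may run in softirq context: never spin on the lock.
 * If the slot is busy, skip the free and count the miss instead of deadlocking. */
void slot_free_notify(struct model_slot *s)
{
	if (!slot_trylock(s)) {
		miss_free++;
		return;
	}
	s->has_data = false;
	slot_unlock(s);
}
```

Skipping a free only leaks the slot until it is reused or reset, which is why a counter, rather than an error, is enough here.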
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drbd: skip spurious timeout (ping-timeo) when failing promote · 66345d53
      Committed by Lars Ellenberg
      [ Upstream commit 9848b6ddd8c92305252f94592c5e278574e7a6ac ]
      
      If you try to promote a Secondary while connected to a Primary
      and allow-two-primaries is NOT set, we will wait for "ping-timeout"
      to give this node a chance to detect a dead primary,
      in case the cluster manager noticed faster than we did.
      
      But if we then are *still* connected to a Primary,
      we fail (after an additional timeout of ping-timeout).
      
      This change skips the spurious second timeout.
      
      Most people won't really notice,
      since "ping-timeout" by default is half a second.
      
      But in some installations, ping-timeout may be 10 or 20 seconds or more,
      and spuriously delaying the error return becomes annoying.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drbd: disconnect, if the wrong UUIDs are attached on a connected peer · af70af5b
      Committed by Lars Ellenberg
      [ Upstream commit b17b59602b6dcf8f97a7dc7bc489a48388d7063a ]
      
      With "on-no-data-accessible suspend-io", DRBD requires the next attach
      or connect to be to the very same data generation uuid tag it lost last.
      
      If we first lost connection to the peer,
      then later lost connection to our own disk,
      we would usually refuse to re-connect to the peer,
      because it presents the wrong data set.
      
      However, if the peer first connected without a disk,
      and then attached its disk, we accepted that same wrong data set,
      which would be "unexpected" by any user of that DRBD
      and cause "undefined results" (read: very likely data corruption).
      
      The fix is to forcefully disconnect as soon as we notice that the peer
      attached to the "wrong" dataset.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drbd: narrow rcu_read_lock in drbd_sync_handshake · 3d67b428
      Committed by Roland Kammerer
      [ Upstream commit d29e89e34952a9ad02c77109c71a80043544296e ]
      
      So far there was the possibility that we called
      genlmsg_new(GFP_NOIO)/mutex_lock() while holding an rcu_read_lock().
      
      This included cases like:
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            drbd_bcast_event
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              mutex_lock --> may sleep
      
      While using GFP_ATOMIC would have been possible in the first two cases,
      the real fix is to narrow the rcu_read_lock.
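"Narrowing" the read-side section means copying the needed configuration out while the lock is held, dropping the lock, and only then calling code that may sleep. The sketch below models this without real RCU: a flag tracks whether we are inside the read-side section, and the stand-in helper counts calls made while it is held. All names, including the `after_sb_1p` field, are hypothetical here.

```c
#include <stdbool.h>

/* Hypothetical model: a flag tracks whether we are inside an RCU-style
 * read-side section, where sleeping is not allowed. */
static bool in_read_section;
static int sleeping_calls_in_section;      /* would be bugs in the kernel */

struct model_conf {
	int after_sb_1p;                    /* hypothetical policy field */
};
static struct model_conf shared_conf = { 2 };

static void read_lock(void)   { in_read_section = true; }
static void read_unlock(void) { in_read_section = false; }

/* Stands in for drbd_khelper()/genlmsg_new(GFP_NOIO)/mutex_lock(): may sleep. */
static void may_sleep_helper(int policy)
{
	(void)policy;
	if (in_read_section)
		sleeping_calls_in_section++;
}

/* Narrowed pattern: copy under the lock, unlock, then call the helper.
 * Returns how many sleeping calls happened inside the section (here 0). */
int sync_handshake_narrowed(void)
{
	int policy;

	read_lock();
	policy = shared_conf.after_sb_1p;   /* copy what is needed */
	read_unlock();

	may_sleep_helper(policy);           /* safe: lock already dropped */
	return sleeping_calls_in_section;
}
```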
      Reported-by: Jia-Ju Bai <baijiaju1990@163.com>
      Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN · 3fbba4e5
      Committed by Young Xiao
      [ Upstream commit a11f6ca9aef989b56cd31ff4ee2af4fb31a172ec ]
      
      __vdc_tx_trigger should only loop on EAGAIN a finite
      number of times.
      
      See commit adddc32d ("sunvnet: Do not spin in an
      infinite loop when vio_ldc_send() returns EAGAIN") for details.
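A bounded retry loop of the kind described can be sketched as follows. `send_fn`, `MAX_RETRIES` and the `always_busy` stub are hypothetical and do not reflect the driver's actual constant or callback.

```c
#include <errno.h>

#define MAX_RETRIES 10  /* hypothetical bound, not the driver's constant */

/* Hypothetical transmit callback: 0 on success, -EAGAIN when the channel is busy. */
typedef int (*send_fn)(void *ctx);

/* Tries at most MAX_RETRIES times in total while the result is -EAGAIN,
 * instead of spinning forever. Returns the last result. */
int send_with_bounded_retry(send_fn send, void *ctx)
{
	int err;
	int attempts = 0;

	do {
		err = send(ctx);
		attempts++;
		/* A driver may also want to back off briefly here before retrying. */
	} while (err == -EAGAIN && attempts < MAX_RETRIES);

	return err;
}

/* Example stub that is always busy; counts how often it was called. */
int always_busy(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -EAGAIN;
}
```

With the always-busy stub, the loop gives up after MAX_RETRIES calls and returns -EAGAIN to its caller instead of hanging.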
      Signed-off-by: Young Xiao <YangX92@hotmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  15. 23 Jan, 2019: 15 commits