1. 14 Nov 2018, 6 commits
  2. 11 Oct 2018, 2 commits
    • dm linear: fix linear_end_io conditional definition · 118aa47c
      Committed by Damien Le Moal
      The dm-linear target is independent of the dm-zoned target. For code
      requiring support for zoned block devices, use CONFIG_BLK_DEV_ZONED
      instead of CONFIG_DM_ZONED.
      
      While at it, similarly to dm linear, also enable the DM_TARGET_ZONED_HM
      feature in dm-flakey only if CONFIG_BLK_DEV_ZONED is defined.
      
      Fixes: beb9caac ("dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled")
      Fixes: 0be12c1c ("dm linear: add support for zoned block devices")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled · beb9caac
      Committed by Mike Snitzer
      It is best to avoid any extra overhead associated with bio completion.
      DM core will indirectly call a DM target's .end_io if it is defined.
      In the case of DM linear, there is no need to do so (for every bio that
      completes) if CONFIG_DM_ZONED is not enabled.
      
      Avoiding an extra indirect call for every bio completion is very
      important for ensuring DM linear doesn't incur more overhead that
      further widens the performance gap between dm-linear and raw block
      devices.
      
      Fixes: 0be12c1c ("dm linear: add support for zoned block devices")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  3. 10 Oct 2018, 2 commits
    • dm: fix report zone remapping to account for partition offset · 9864cd5d
      Committed by Damien Le Moal
      If dm-linear or dm-flakey are layered on top of a partition of a zoned
      block device, remapping of the start sector and write pointer position
      of the zones reported by a report zones BIO must be modified to account
      for the target table entry mapping (start offset within the device and
      entry mapping with the dm device).  If the target's backing device is a
      partition of a whole disk, the start sector on the physical device of
      the partition must also be accounted for when modifying the zone
      information.  However, dm_remap_zone_report() was not considering this
      last case, resulting in incorrect zone information remapping with
      targets using disk partitions.
      
      Fix this by calculating the target backing device start sector using
      the position of the completed report zones BIO and the unchanged
      position and size of the original report zone BIO. With this value
      calculated, the start sector and write pointer position of the target
      zones can be correctly remapped.
      
      Fixes: 10999307 ("dm: introduce dm_remap_zone_report()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: destroy migration_cache if cache target registration failed · c7cd5550
      Committed by Shenghui Wang
      Commit 7e6358d2 ("dm: fix various targets to dm_register_target
      after module __init resources created") inadvertently introduced this
      bug when it moved dm_register_target() after the call to KMEM_CACHE().
      
      Fixes: 7e6358d2 ("dm: fix various targets to dm_register_target after module __init resources created")
      Cc: stable@vger.kernel.org
      Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 06 Oct 2018, 1 commit
    • treewide: Replace more open-coded allocation size multiplications · 329e0989
      Committed by Kees Cook
      As done treewide earlier, this catches several more open-coded
      allocation size calculations that were added to the kernel during the
      merge window. This performs the following mechanical transformations
      using Coccinelle:
      
      	kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
      	kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
      	devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
      Signed-off-by: Kees Cook <keescook@chromium.org>
  5. 05 Oct 2018, 2 commits
    • dm cache: fix resize crash if user doesn't reload cache table · 5d07384a
      Committed by Mike Snitzer
      A reload of the cache's DM table is needed during resize because
      otherwise a crash will occur when attempting to access smq policy
      entries associated with the portion of the cache that was recently
      extended.
      
      The reason is that cache-size based data structures in the policy will
      not be resized; the only way to safely extend the cache is to allow for
      a proper cache policy initialization, which occurs when the cache table
      is loaded.  For example, the smq policy's space_init(), init_allocator()
      and calc_hotspot_params() must be sized based on the extended cache size.
      
      The fix for this is to disallow cache resizes of this pattern:
      1) suspend "cache" target's device
      2) resize the fast device used for the cache
      3) resume "cache" target's device
      
      Instead, the last step must be a full reload of the cache's DM table.
      
      Fixes: 66a63635 ("dm cache: add stochastic-multi-queue (smq) policy")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache metadata: ignore hints array being too small during resize · 4561ffca
      Committed by Joe Thornber
      Commit fd2fa954 ("dm cache metadata: save in-core policy_hint_size to
      on-disk superblock") enabled previously written policy hints to be
      used after a cache is reactivated.  But in doing so the cache
      metadata's hint array was left exposed to out of bounds access because
      on resize the metadata's on-disk hint array wasn't ever extended.
      
      Fix this by ignoring that there are no on-disk hints associated with the
      newly added cache blocks.  An expanded on-disk hint array is later
      rewritten upon the next clean shutdown of the cache.
      
      Fixes: fd2fa954 ("dm cache metadata: save in-core policy_hint_size to on-disk superblock")
      Cc: stable@vger.kernel.org
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  6. 27 Sep 2018, 1 commit
    • bcache: add separate workqueue for journal_write to avoid deadlock · 0f843e65
      Committed by Guoju Fang
      After a write to the SSD completes, bcache schedules the journal_write
      work on system_wq, a shared system workqueue without the WQ_MEM_RECLAIM
      flag.  system_wq is also a bound workqueue, so there may be no idle
      kworker on the current processor.  Creating a new kworker may
      unfortunately need to reclaim memory first, by shrinking the caches and
      slabs used by the VFS, which in turn depend on the bcache device.
      That's a deadlock.
      
      This patch creates a new workqueue for journal_write with the
      WQ_MEM_RECLAIM flag.  Its rescuer thread will kick in to avoid the
      deadlock.
      Signed-off-by: Guoju Fang <fangguoju@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 18 Sep 2018, 2 commits
  8. 17 Sep 2018, 1 commit
  9. 11 Sep 2018, 1 commit
    • dm thin metadata: try to avoid ever aborting transactions · 3ab91828
      Committed by Joe Thornber
      Committing a transaction can consume some metadata of its own, so we
      now reserve a small amount of metadata to cover this.  Free metadata
      reported by the kernel will not include this reserve.
      
      If any of the reserve has been used after a commit, we enter a new
      internal state, PM_OUT_OF_METADATA_SPACE.  This is reported as
      PM_READ_ONLY, so no userland changes are needed.  If the metadata
      device is resized, the pool will move back to PM_WRITE.
      
      These changes mean we never need to abort and roll back a transaction
      due to running out of metadata space.  This is particularly important
      because there have been a handful of reports of data corruption against
      DM thin-provisioning that can all be attributed to the thin-pool having
      run out of metadata space.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 07 Sep 2018, 6 commits
  11. 04 Sep 2018, 1 commit
  12. 01 Sep 2018, 3 commits
    • md-cluster: release RESYNC lock after the last resync message · 41a95041
      Committed by Guoqing Jiang
      All the RESYNC messages are sent with the resync lock held; the only
      exception is resync_finish, which releases resync_lockres before
      sending the last resync message.  This should be changed as well.
      Otherwise, we can see a deadlock issue as follows:
      
      clustermd2-gqjiang2:~ # cat /proc/mdstat
      Personalities : [raid10] [raid1]
      md0 : active raid1 sdg[0] sdf[1]
            134144 blocks super 1.2 [2/2] [UU]
            [===================>.]  resync = 99.6% (134144/134144) finish=0.0min speed=26K/sec
            bitmap: 1/1 pages [4KB], 65536KB chunk
      
      unused devices: <none>
      clustermd2-gqjiang2:~ # ps aux|grep md|grep D
      root     20497  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
      clustermd2-gqjiang2:~ # cat /proc/20497/stack
      [<ffffffffc05ff51e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
      [<ffffffffc05ff7e8>] __sendmsg+0x98/0x130 [md_cluster]
      [<ffffffffc05ff900>] sendmsg+0x20/0x30 [md_cluster]
      [<ffffffffc05ffc35>] resync_info_update+0xb5/0xc0 [md_cluster]
      [<ffffffffc0593e84>] md_reap_sync_thread+0x134/0x170 [md_mod]
      [<ffffffffc059514c>] md_check_recovery+0x28c/0x510 [md_mod]
      [<ffffffffc060c882>] raid1d+0x42/0x800 [raid1]
      [<ffffffffc058ab61>] md_thread+0x121/0x150 [md_mod]
      [<ffffffff9a0a5b3f>] kthread+0xff/0x140
      [<ffffffff9a800235>] ret_from_fork+0x35/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      clustermd-gqjiang1:~ # ps aux|grep md|grep D
      root     20531  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
      root     20537  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_cluster_rec]
      root     20676  0.0  0.0      0     0 ?        D    16:01   0:00 [md0_resync]
      clustermd-gqjiang1:~ # cat /proc/mdstat
      Personalities : [raid10] [raid1]
      md0 : active raid1 sdf[1] sdg[0]
            134144 blocks super 1.2 [2/2] [UU]
            [===================>.]  resync = 97.3% (131072/134144) finish=8076.8min speed=0K/sec
            bitmap: 1/1 pages [4KB], 65536KB chunk
      
      unused devices: <none>
      clustermd-gqjiang1:~ # cat /proc/20531/stack
      [<ffffffffc080974d>] metadata_update_start+0xcd/0xd0 [md_cluster]
      [<ffffffffc079c897>] md_update_sb.part.61+0x97/0x820 [md_mod]
      [<ffffffffc079f15b>] md_check_recovery+0x29b/0x510 [md_mod]
      [<ffffffffc0816882>] raid1d+0x42/0x800 [raid1]
      [<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
      [<ffffffff9e0a5b3f>] kthread+0xff/0x140
      [<ffffffff9e800235>] ret_from_fork+0x35/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      clustermd-gqjiang1:~ # cat /proc/20537/stack
      [<ffffffffc0813222>] freeze_array+0xf2/0x140 [raid1]
      [<ffffffffc080a56e>] recv_daemon+0x41e/0x580 [md_cluster]
      [<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
      [<ffffffff9e0a5b3f>] kthread+0xff/0x140
      [<ffffffff9e800235>] ret_from_fork+0x35/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      clustermd-gqjiang1:~ # cat /proc/20676/stack
      [<ffffffffc080951e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
      [<ffffffffc080957f>] lock_token+0x2f/0xa0 [md_cluster]
      [<ffffffffc0809622>] lock_comm+0x32/0x90 [md_cluster]
      [<ffffffffc08098f5>] sendmsg+0x15/0x30 [md_cluster]
      [<ffffffffc0809c0a>] resync_info_update+0x8a/0xc0 [md_cluster]
      [<ffffffffc08130ba>] raid1_sync_request+0xa9a/0xb10 [raid1]
      [<ffffffffc079b8ea>] md_do_sync+0xbaa/0xf90 [md_mod]
      [<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
      [<ffffffff9e0a5b3f>] kthread+0xff/0x140
      [<ffffffff9e800235>] ret_from_fork+0x35/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0 · 1d0ffd26
      Committed by Xiao Ni
      In raid10 reshape_request, max_sectors is obtained in read_balance.  If
      the underlying disks have bad blocks, max_sectors is less than last,
      and the code goes to read_more many times.  Each iteration calls
      raise_barrier(conf, sectors_done != 0), and since sectors_done is not
      0 at that point, the force argument passed to raise_barrier is true.
      
      In raise_barrier, when force is true it checks conf->barrier; if force
      is true and conf->barrier is 0, it hits a BUG_ON.  In this case
      reshape_request submits bios to the underlying disks, and each bio's
      completion callback calls lower_barrier.  If a bio finishes before
      raise_barrier is called again, the BUG_ON can trigger.
      
      Add one pair of raise_barrier/lower_barrier to fix this bug.
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Suggested-by: Neil Brown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/raid5-cache: disable reshape completely · e254de6b
      Committed by Shaohua Li
      We don't yet support reshape if an array has a log device.  Previously
      we determined this by checking ->log.  However, ->log can be NULL after
      a log device is removed even though the array is still marked as
      supporting one, so don't allow reshape in that case either.  Users can
      disable log device support by setting 'consistency_policy' to 'resync'
      and then reshape.
      Reported-by: Xiao Ni <xni@redhat.com>
      Tested-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  13. 23 Aug 2018, 2 commits
  14. 17 Aug 2018, 1 commit
  15. 14 Aug 2018, 1 commit
  16. 12 Aug 2018, 8 commits