1. 13 Dec, 2018 33 commits
  2. 12 Dec, 2018 7 commits
    • M
      block: deactivate blk_stat timer in wbt_disable_default() · 544fbd16
      Ming Lei authored
      rwb_enabled() can't be changed when there is any inflight IO.
      
      wbt_disable_default() may set rwb->wb_normal to zero, however the
      blk_stat timer may still be pending, and the timer function will update
      rwb->wb_normal again.
      
      This patch introduces blk_stat_deactivate() and applies it in
      wbt_disable_default(), which fixes the following IO hang triggered when
      running parted while switching the IO scheduler:
      
      [  369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
      [  369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
      [  369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  369.940768] parted          D    0  3645   3239 0x00000000
      [  369.941500] Call Trace:
      [  369.941874]  ? __schedule+0x6d9/0x74c
      [  369.942392]  ? wbt_done+0x5e/0x5e
      [  369.942864]  ? wbt_cleanup_cb+0x16/0x16
      [  369.943404]  ? wbt_done+0x5e/0x5e
      [  369.943874]  schedule+0x67/0x78
      [  369.944298]  io_schedule+0x12/0x33
      [  369.944771]  rq_qos_wait+0xb5/0x119
      [  369.945193]  ? karma_partition+0x1c2/0x1c2
      [  369.945691]  ? wbt_cleanup_cb+0x16/0x16
      [  369.946151]  wbt_wait+0x85/0xb6
      [  369.946540]  __rq_qos_throttle+0x23/0x2f
      [  369.947014]  blk_mq_make_request+0xe6/0x40a
      [  369.947518]  generic_make_request+0x192/0x2fe
      [  369.948042]  ? submit_bio+0x103/0x11f
      [  369.948486]  ? __radix_tree_lookup+0x35/0xb5
      [  369.949011]  submit_bio+0x103/0x11f
      [  369.949436]  ? blkg_lookup_slowpath+0x25/0x44
      [  369.949962]  submit_bio_wait+0x53/0x7f
      [  369.950469]  blkdev_issue_flush+0x8a/0xae
      [  369.951032]  blkdev_fsync+0x2f/0x3a
      [  369.951502]  do_fsync+0x2e/0x47
      [  369.951887]  __x64_sys_fsync+0x10/0x13
      [  369.952374]  do_syscall_64+0x89/0x149
      [  369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  369.953492] RIP: 0033:0x7f95a1e729d4
      [  369.953996] Code: Bad RIP value.
      [  369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
      [  369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
      [  369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
      [  369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
      [  369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
      [  369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008
      
      Cc: stable@vger.kernel.org
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      544fbd16
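The race described in this commit can be modelled in plain C. This is a minimal userspace sketch, not the kernel code: `struct fake_wbt`, `fake_timer_fire()`, and both `fake_disable_*()` functions are hypothetical stand-ins, and the non-zero limit of 8 is arbitrary. It only shows why a still-pending timer can undo the write of zero to `wb_normal`, and why cancelling the timer first avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the writeback throttling state. */
struct fake_wbt {
    unsigned int wb_normal;   /* 0 models "throttling disabled" */
    bool timer_pending;       /* models a pending blk_stat timer */
};

/* Models the timer function: when it fires it recomputes the limit. */
static void fake_timer_fire(struct fake_wbt *w)
{
    if (!w->timer_pending)
        return;
    w->timer_pending = false;
    w->wb_normal = 8;         /* arbitrary non-zero limit */
}

/* Buggy ordering: clear the limit but leave the timer armed. */
static void fake_disable_buggy(struct fake_wbt *w)
{
    w->wb_normal = 0;
}

/* Fixed ordering: cancel the timer first, then clear the limit. */
static void fake_disable_fixed(struct fake_wbt *w)
{
    w->timer_pending = false; /* stands in for blk_stat_deactivate() */
    w->wb_normal = 0;
}
```

With the buggy variant, a later `fake_timer_fire()` restores a non-zero `wb_normal`; with the fixed variant the value stays at zero.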
    • J
      sbitmap: flush deferred clears for resize and shallow gets · b2dbff1b
      Jens Axboe authored
      We're missing a deferred clear off the shallow get, which can cause
      a hang. Additionally, when we resize the sbitmap, we should also
      flush deferred clears for good measure.
      
      Ensure we have full coverage on batch clears, even for paths where
      we would not be doing deferred clear. This makes it less error
      prone for future additions.
      
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b2dbff1b
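The effect of a missed deferred clear can be illustrated with a one-word bitmap. This is a simplified sketch under stated assumptions: `struct fake_word` and the `fake_*` functions are hypothetical, there is no locking or atomics, and `depth` must not exceed 32. The `flush_on_full` flag models the difference between a get path that flushes deferred clears and one that does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-word bitmap: 'word' holds allocated bits,
 * 'cleared' holds frees that have not been applied to 'word' yet. */
struct fake_word {
    uint32_t word;
    uint32_t cleared;
};

/* Deferred free: only record the bit; 'word' still shows it as busy. */
static void fake_put(struct fake_word *w, unsigned int bit)
{
    w->cleared |= UINT32_C(1) << bit;
}

/* Apply pending frees to 'word'. Returns true if anything was flushed. */
static bool fake_flush(struct fake_word *w)
{
    if (!w->cleared)
        return false;
    w->word &= ~w->cleared;
    w->cleared = 0;
    return true;
}

/* Allocate among the first 'depth' bits (a shallow get).
 * Returns the bit index, or -1 if none is free. With flush_on_full
 * set to false this models a path that skips the deferred clear. */
static int fake_get(struct fake_word *w, unsigned int depth, bool flush_on_full)
{
    for (;;) {
        for (unsigned int bit = 0; bit < depth; bit++) {
            if (!(w->word & (UINT32_C(1) << bit))) {
                w->word |= UINT32_C(1) << bit;
                return (int)bit;
            }
        }
        /* Nothing free: retry only if a flush released something. */
        if (!flush_on_full || !fake_flush(w))
            return -1;
    }
}
```

A caller that keeps retrying a get which never flushes would wait forever even though a bit has already been freed, which is the kind of hang the commit message describes.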
    • I
      lightnvm: pblk: do not overwrite ppa list with meta list · 2c4d5356
      Igor Konopko authored
      When using pblk with 0-sized metadata, both the ppa list and the meta
      list point to the same memory, since pblk_dma_meta_size() returns 0 in
      that case.
      
      This patch fixes that issue by ensuring that pblk_dma_meta_size()
      always returns space equal to sizeof(struct pblk_sec_meta), so that
      the ppa list and the meta list point to different memory addresses.
      
      Even though the drive does not really care about the meta_list pointer
      in that case, this is the easiest way to fix the issue without
      introducing changes in many places in the code just for the 0-sized
      metadata case.
      
      The same approach also needs to be applied to pblk_get_sec_meta(),
      since we also cannot point to the same memory address in the meta
      buffer when we are using it for the pblk recovery process.
      
      Reported-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
      Tested-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2c4d5356
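The aliasing problem comes down to pointer arithmetic. The sketch below assumes the ppa list is placed directly after the meta region in one buffer, with 64 entries per request and a 16-byte `struct pblk_sec_meta`; those constants and all `fake_*` names are assumptions for illustration, and reading "always returns space equal to sizeof(struct pblk_sec_meta)" as a per-entry minimum is an interpretation, not the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_VLBA      64   /* assumed entries per request */
#define FAKE_SEC_META_SIZE 16   /* assumed sizeof(struct pblk_sec_meta) */

/* Original behaviour: the meta region is empty on 0-sized metadata. */
static size_t fake_dma_meta_size_old(size_t oob_meta_size)
{
    return oob_meta_size * FAKE_MAX_VLBA;
}

/* Fixed behaviour: reserve at least a sec_meta-sized slot per entry. */
static size_t fake_dma_meta_size_new(size_t oob_meta_size)
{
    size_t per_entry = oob_meta_size > FAKE_SEC_META_SIZE
                           ? oob_meta_size : FAKE_SEC_META_SIZE;
    return per_entry * FAKE_MAX_VLBA;
}

/* The ppa list starts right after the meta region in the same buffer. */
static uint8_t *fake_ppa_list(uint8_t *meta_list, size_t dma_meta_size)
{
    return meta_list + dma_meta_size;
}
```

With a metadata size of 0, the old size function makes `fake_ppa_list()` return the meta list address itself, while the new one separates the two lists by 1024 bytes.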
    • I
      lightnvm: pblk: support packed metadata · 55d8ec35
      Igor Konopko authored
      pblk performs recovery of open lines by storing the LBA in the per LBA
      metadata field. Recovery therefore only works for drives that have this
      field.
      
      This patch adds support for packed metadata, which stores the l2p
      mapping for open lines in the last sector of every write unit and
      enables drives without per IO metadata to recover open lines.
      
      After this patch, drives with OOB size <16B will use packed metadata
      and metadata size larger than 16B will continue to use the device per
      IO metadata.
      
      Reviewed-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      55d8ec35
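The selection rule from this commit message can be written as a small predicate. This is a hedged sketch: the `fake_*` names are hypothetical, the 16-byte threshold is taken from the message, treating exactly 16 bytes as per IO metadata is an assumption, and the idea that packed mode costs one sector per write unit follows from "last sector of every write unit" rather than from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_SEC_META_SIZE 16   /* threshold from the commit message */

/* True when the OOB area is too small to hold per-sector metadata. */
static bool fake_is_packed_meta(size_t oob_meta_size)
{
    return oob_meta_size < FAKE_SEC_META_SIZE;
}

/* Sectors left for user data in a write unit of 'unit_sectors'. */
static size_t fake_data_sectors(size_t unit_sectors, size_t oob_meta_size)
{
    /* Packed mode gives up the last sector to hold the l2p entries. */
    return fake_is_packed_meta(oob_meta_size) ? unit_sectors - 1
                                              : unit_sectors;
}
```

For example, with an 8-sector write unit a drive without OOB metadata would keep 7 sectors for data, while a drive with 128 bytes of OOB metadata keeps all 8.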
    • I
      lightnvm: disable interleaved metadata · a16816b9
      Igor Konopko authored
      Currently pblk only checks the size of I/O metadata and does not take
      into account whether this metadata is in a separate buffer or
      interleaved in a single metadata buffer.
      
      In reality only the first scenario is supported; the second mode will
      break pblk functionality during any IO operation.
      
      This patch prevents pblk from being instantiated when the device only
      supports interleaved metadata.
      
      Reviewed-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a16816b9
    • I
      lightnvm: dynamic DMA pool entry size · 24828d05
      Igor Konopko authored
      Currently lightnvm and pblk use a single DMA pool, for which the entry
      size is always equal to PAGE_SIZE. The contents of each entry allocated
      from the DMA pool consist of a PPA list (8 bytes * 64), leaving
      56 bytes * 64 space for metadata. Since the metadata field can be
      bigger, such as 128 bytes, the static size does not cover this use-case.
      
      This patch adds support for I/O metadata above 56 bytes by changing DMA
      pool size based on device meta size and allows pblk to use OOB metadata
      >=16B.
      
      Reviewed-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      24828d05
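The sizing arithmetic in this message can be checked directly. The sketch assumes a 4096-byte page and uses hypothetical `fake_*` names; rounding the dynamic entry size up to whole pages is an assumption made for the example and is not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_PAGE_SIZE 4096   /* assumed page size */
#define FAKE_MAX_VLBA  64     /* entries per request, from the message */
#define FAKE_PPA_SIZE  8      /* bytes per PPA, from the message */

/* Metadata bytes per entry left over by the fixed PAGE_SIZE layout. */
static size_t fake_static_meta_room(void)
{
    return (FAKE_PAGE_SIZE - FAKE_PPA_SIZE * FAKE_MAX_VLBA) / FAKE_MAX_VLBA;
}

/* Pool entry size scaled to the device metadata size and rounded up
 * to whole pages (the rounding policy is an assumption). */
static size_t fake_pool_entry_size(size_t meta_size)
{
    size_t need = (FAKE_PPA_SIZE + meta_size) * FAKE_MAX_VLBA;
    size_t pages = (need + FAKE_PAGE_SIZE - 1) / FAKE_PAGE_SIZE;
    return pages * FAKE_PAGE_SIZE;
}
```

Under these assumptions the static layout leaves 56 bytes of metadata per entry, metadata of up to 56 bytes still fits in one page, and 128-byte metadata needs (8 + 128) * 64 = 8704 bytes, which rounds up to three pages.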
    • I
      lightnvm: pblk: add helpers for OOB metadata · faa79f27
      Igor Konopko authored
      pblk currently assumes that the size of OOB metadata on the drive is
      always equal to the size of the pblk_sec_meta struct. This commit adds
      helpers that will allow handling different sizes of OOB metadata on the
      drive in the future.
      
      After this patch only OOB metadata equal to 16 bytes is supported.
      
      Reviewed-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      faa79f27
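A helper of the kind this commit describes would index the metadata buffer by a runtime stride instead of `sizeof(struct pblk_sec_meta)`. The following is a sketch only: `struct fake_pblk` and `fake_get_meta()` are hypothetical names, and the real helpers may differ in signature and behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device context carrying the drive's OOB size. */
struct fake_pblk {
    size_t oob_meta_size;   /* bytes of OOB metadata per sector */
};

/* Return the metadata slot for sector 'index', stepping through the
 * buffer by the drive's OOB size rather than a fixed struct size. */
static void *fake_get_meta(const struct fake_pblk *p, void *meta_buf,
                           size_t index)
{
    return (uint8_t *)meta_buf + p->oob_meta_size * index;
}
```

Routing every access through one accessor means a later change in the supported OOB size only has to touch the stored `oob_meta_size`, not each call site.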