1. 12 Jun, 2015 8 commits
    • dm thin metadata: add dm_thin_find_mapped_range() · a5d895a9
      Joe Thornber committed
      Retrieve the next run of contiguously mapped blocks.  Useful for working
      out where to break up IO.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
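To illustrate what "the next run of contiguously mapped blocks" can mean, here is a minimal user-space sketch. It is not the kernel's dm_thin_find_mapped_range(); the flat `map` array, the `UNMAPPED` marker, and the `struct mapped_range` layout are all assumptions made for this example. A run ends at the first block that is unmapped or whose data block is not adjacent to the previous one, which is where IO would need to be split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNMAPPED UINT64_MAX   /* hypothetical marker: virtual block has no mapping */

struct mapped_range {
    uint64_t thin_begin;  /* first mapped virtual block in the run */
    uint64_t thin_end;    /* one past the last virtual block in the run */
    uint64_t data_begin;  /* data block that thin_begin maps to */
};

/*
 * Find the next run of contiguously mapped blocks in [begin, end).
 * map[i] holds the data block for virtual block i, or UNMAPPED.
 * Returns true and fills *out when a run exists, false otherwise.
 */
static bool find_mapped_range(const uint64_t *map, uint64_t begin, uint64_t end,
                              struct mapped_range *out)
{
    assert(begin <= end);

    /* Skip leading unmapped blocks. */
    uint64_t b = begin;
    while (b < end && map[b] == UNMAPPED)
        b++;
    if (b == end)
        return false;

    /* Extend while the next block is mapped and physically adjacent. */
    uint64_t e = b + 1;
    while (e < end && map[e] != UNMAPPED && map[e] == map[e - 1] + 1)
        e++;

    out->thin_begin = b;
    out->thin_end = e;
    out->data_begin = map[b];
    return true;
}
```

A caller could invoke this repeatedly, restarting at `thin_end`, to walk every run in a request and issue one IO per run.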
    • dm btree: add dm_btree_remove_leaves() · 4ec331c3
      Joe Thornber committed
      Removes a range of leaf values from the tree.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm stats: Use kvfree() in dm_kvfree() · 0f24b79b
      Pekka Enberg committed
      Use kvfree() instead of open-coding it.
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: age and write back cache entries even without active IO · fba10109
      Joe Thornber committed
      The policy tick() method is normally called from interrupt context.
      Both the mq and smq policies do some bottom half work for the tick
      method in their map functions.  However if no IO is going through the
      cache, then that bottom half work doesn't occur.  With these policies
      this means recently hit entries do not age and do not get written
      back as early as we'd like.
      
      Fix this by introducing a new 'can_block' parameter to the tick()
      method.  When this is set the bottom half work occurs immediately.
      'can_block' is set when the tick method is called every second by the
      core target (not in interrupt context).
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: prefix all DMERR and DMINFO messages with cache device name · b61d9509
      Mike Snitzer committed
      Having the DM device name associated with the ERR or INFO message is
      very helpful.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: add fail io mode and needs_check flag · 028ae9f7
      Joe Thornber committed
      If a cache metadata operation fails (e.g. transaction commit) the
      cache's metadata device will abort the current transaction, set a new
      needs_check flag, and the cache will transition to "read-only" mode.  If
      aborting the transaction or setting the needs_check flag fails the cache
      will transition to "fail-io" mode.
      
      Once needs_check is set the cache device will not be allowed to
      activate.  Activation requires write access to metadata.  Future work is
      needed to add proper support for running the cache in read-only mode.
      
      Once in fail-io mode the cache will report a status of "Fail".
      
      Also, add a commit() wrapper that disallows commits while in read_only
      or fail mode.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: wake the worker thread every time we free a migration object · 88bf5184
      Joe Thornber committed
      When the cache is idle, writeback work was only being issued every
      second.  With this change outstanding writebacks are streamed
      constantly.  This offers a writeback performance improvement.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: add stochastic-multi-queue (smq) policy · 66a63635
      Joe Thornber committed
      The stochastic-multi-queue (smq) policy addresses some of the problems
      with the current multiqueue (mq) policy.
      
      Memory usage
      ------------
      
      The mq policy uses a lot of memory; 88 bytes per cache block on a 64
      bit machine.
      
      SMQ uses 28-bit indexes to implement its data structures rather than
      pointers.  It avoids storing an explicit hit count for each block.  It
      has a 'hotspot' queue rather than a pre-cache, which uses a quarter of
      the entries (each hotspot block covers a larger area than a single
      cache block).
      
      All this means smq uses ~25 bytes per cache block.  Still a lot of
      memory, but a substantial improvement nonetheless.
      
      Level balancing
      ---------------
      
      MQ places entries in different levels of the multiqueue structures
      based on their hit count (~ln(hit count)).  This means the bottom
      levels generally have the most entries, and the top ones have very
      few.  Having unbalanced levels like this reduces the efficacy of the
      multiqueue.
      
      SMQ does not maintain a hit count; instead it swaps hit entries with
      the least recently used entry from the level above.  The overall
      ordering is a side effect of this stochastic process.  With this
      scheme we can decide how many entries occupy each multiqueue level,
      resulting in better promotion/demotion decisions.
      
      Adaptability
      ------------
      
      The MQ policy maintains a hit count for each cache block.  For a
      different block to get promoted to the cache its hit count has to
      exceed the lowest currently in the cache.  This means it can take a
      long time for the cache to adapt between varying IO patterns.
      Periodically degrading the hit counts could help with this, but I
      haven't found a nice general solution.
      
      SMQ doesn't maintain hit counts, so a lot of this problem just goes
      away.  In addition it tracks performance of the hotspot queue, which
      is used to decide which blocks to promote.  If the hotspot queue is
      performing badly then it starts moving entries more quickly between
      levels.  This lets it adapt to new IO patterns very quickly.
      
      Performance
      -----------
      
      In my tests SMQ shows substantially better performance than MQ.  Once
      this matures a bit more I'm sure it'll become the default policy.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
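The level-balancing scheme described above (swap a hit entry with the least recently used entry from the level above) can be sketched as follows. This is a toy model, not the smq implementation: `struct mq`, the fixed `NR_LEVELS` and `PER_LEVEL` sizes, the array-per-level layout, and the choice to place both moved entries at the most recently used end are assumptions for illustration. Entries are referred to by small integer indexes rather than pointers, in the spirit of the 28-bit indexes mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LEVELS 4   /* hypothetical sizes, chosen to keep the example small */
#define PER_LEVEL 2

struct mq {
    /* level[l][0] is the least recently used index, level[l][PER_LEVEL - 1] the most. */
    uint32_t level[NR_LEVELS][PER_LEVEL];
};

/* Drop the index at pos, shift newer ones down, append e as most recently used. */
static void replace_as_mru(uint32_t *lvl, unsigned pos, uint32_t e)
{
    for (unsigned i = pos; i + 1 < PER_LEVEL; i++)
        lvl[i] = lvl[i + 1];
    lvl[PER_LEVEL - 1] = e;
}

/* Record a hit on the entry stored at position pos of level l. */
static void hit(struct mq *q, unsigned l, unsigned pos)
{
    assert(l < NR_LEVELS && pos < PER_LEVEL);
    uint32_t e = q->level[l][pos];
    assert(e < (1u << 28));                   /* index would fit in 28 bits */

    if (l + 1 == NR_LEVELS) {
        replace_as_mru(q->level[l], pos, e);  /* top level: only refresh recency */
        return;
    }

    uint32_t demoted = q->level[l + 1][0];      /* LRU entry of the level above */
    replace_as_mru(q->level[l + 1], 0, e);      /* hit entry moves up one level */
    replace_as_mru(q->level[l], pos, demoted);  /* displaced entry moves down */
}
```

Because every promotion is paired with a demotion, each level keeps exactly `PER_LEVEL` entries, which is the "decide how many entries occupy each level" property, and no per-entry hit count is stored.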
  2. 30 May, 2015 24 commits
  3. 29 May, 2015 1 commit
    • dm: fix false warning in free_rq_clone() for unmapped requests · e5d8de32
      Mike Snitzer committed
      When stacking a request-based dm device on a non blk-mq device, if the
      device-mapper target cannot map the request (error target is used,
      multipath target with all paths down, etc), the WARN_ON_ONCE() in
      free_rq_clone() will trigger when it shouldn't.
      
      The warning was added by commit aa6df8dd ("dm: fix free_rq_clone() NULL
      pointer when requeueing unmapped request").  But free_rq_clone() with
      clone->q == NULL is valid usage for the case where
      dm_kill_unmapped_request() initiates request cleanup.
      
      Fix this false warning by just removing the WARN_ON -- it only generated
      false positives and was never useful in catching the intended case
      (completing clone request not being mapped e.g. clone->q being NULL).
      
      Fixes: aa6df8dd ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 28 May, 2015 2 commits
    • dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY · 45714fbe
      Mike Snitzer committed
      Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
      DM blk-mq device's .queue_rq.  This cleans up the previous convoluted
      handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
      (even though it wasn't) and then run blk_mq_requeue_request() followed
      by blk_mq_kick_requeue_list().
      
      Also, document that DM blk-mq on top of old request_fn devices cannot
      fail in clone_rq() since the clone request is preallocated as part of
      the pdu.
      Reported-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d
      Mike Snitzer committed
      Otherwise kmemleak reported:
      
      unreferenced object 0xffff88009b14e2b0 (size 16):
        comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
        hex dump (first 16 bytes):
          40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
        backtrace:
          [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
          [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
          [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
          [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
          [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
          [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
          [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
          [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
          [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
          [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
          [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
          [<ffffffff812c0ff2>] submit_bio+0x72/0x150
          [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
          [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
          [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
  5. 27 May, 2015 2 commits
    • dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED · 3a140755
      Junichi Nomura committed
      When stacking request-based DM on a blk-mq device, request cloning and
      remapping are done in a single call to the target's clone_and_map_rq().
      The clone is allocated and valid only if clone_and_map_rq() returns
      DM_MAPIO_REMAPPED.
      
      The "IS_ERR(clone)" check in map_request() does not cover all the
      !DM_MAPIO_REMAPPED cases that are possible (e.g. if underlying devices
      are not ready or unavailable, clone_and_map_rq() may return
      DM_MAPIO_REQUEUE without ever having established an ERR_PTR).  Fix this
      by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
      map_request().
      
      Without this fix, DM core may call setup_clone() for a NULL clone
      and oops like this:
      
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
         IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
         ...
         CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
         ...
         Call Trace:
          [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
          [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
          [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
          [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
          [<ffffffff81071fdd>] kthread+0xe6/0xee
          [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
          [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
          [<ffffffff814c2d98>] ret_from_fork+0x58/0x90
          [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
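The shape of this fix can be illustrated with a small user-space model: check the return code explicitly instead of inferring success from the clone pointer. The enum below reuses the DM_MAPIO_* names from the message as local stand-ins, while `struct clone`, the `paths_up` parameter, and the function signatures are hypothetical simplifications rather than the DM core interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins named after the return codes in the commit message. */
enum { DM_MAPIO_SUBMITTED, DM_MAPIO_REMAPPED, DM_MAPIO_REQUEUE };

struct clone { int prepared; };

/* Hypothetical target: a clone is produced only when the request is remapped. */
static int clone_and_map_rq(int paths_up, struct clone *storage, struct clone **out)
{
    if (!paths_up)
        return DM_MAPIO_REQUEUE;  /* *out is left untouched: no clone, no ERR_PTR */
    *out = storage;
    return DM_MAPIO_REMAPPED;
}

/* Dereferences the clone, so it must never be reached with NULL. */
static void setup_clone(struct clone *c)
{
    c->prepared = 1;
}

static int map_request(int paths_up, struct clone *storage)
{
    struct clone *clone = NULL;
    int r = clone_and_map_rq(paths_up, storage, &clone);

    /* The fix: bail out on anything other than REMAPPED before touching clone. */
    if (r != DM_MAPIO_REMAPPED)
        return r;

    assert(clone != NULL);
    setup_clone(clone);
    return r;
}
```

An error-pointer test alone would miss the requeue case here, because the target returns without ever writing to `*out`.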
    • block: fix returnvar.cocci warnings · f6454b04
      Julia Lawall committed
      Remove unneeded variable used to store return value.
      
      Generated by: scripts/coccinelle/misc/returnvar.cocci
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  6. 26 May, 2015 1 commit
  7. 25 May, 2015 2 commits
    • Linux 4.1-rc5 · ba155e2d
      Linus Torvalds committed
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 5b139666
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "This is a set of five fixes: Two MAINTAINER email updates (urgent
        because the non-avagotech emails will start bouncing) an lpfc big
        endian oops fix, a 256 byte sector hang fix (to eliminate 256 byte
        sectors) and a storvsc fix which could cause test unit ready failures
        on bringup"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        MAINTAINERS: Revise lpfc maintainers for Avago Technologies ownership of Emulex
        MAINTAINERS, be2iscsi: change email domain
        sd: Disable support for 256 byte/sector disks
        lpfc: Fix breakage on big endian kernels
        storvsc: Set the SRB flags correctly when no data transfer is needed