1. 11 November 2017 (3 commits)
    • dm cache: submit writethrough writes in parallel to origin and cache · 2df3bae9
      Mike Snitzer committed
      Discontinue issuing writethrough write IO in series to the origin and
      then cache.
      
      Use bio_clone_fast() to create a new origin clone bio that will be
      mapped to the origin device and then bio_chain() it to the bio that gets
      remapped to the cache device.  The origin clone bio does _not_ have a
      copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
      be called.
      
      The cache bio (parent bio) will not complete until the origin bio has
      completed -- this fulfills bio_clone_fast()'s requirements as well as
      the requirement to not complete the original IO until the write IO has
      completed to both the origin and cache device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: pass cache structure to mode functions · 8e3c3827
      Mike Snitzer committed
      No functional changes; just a bit cleaner than passing the
      cache_features structure.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: fix race condition in the writeback mode overwrite_bio optimisation · d1260e2a
      Joe Thornber committed
      When a DM cache in writeback mode moves data between the slow and fast
      device it can often avoid a copy if either:
      
      i) the triggering bio covers the whole block (there is no point copying
         data we are about to overwrite), or
      ii) the migration is a promotion and the origin block is currently
          discarded.
      
      Prior to this fix there was a race with case (ii).  The discard status
      was checked with a shared lock held (rather than exclusive).  This meant
      another bio could run in parallel and write data to the origin, removing
      the discard state.  After the promotion the parallel write would have
      been lost.
      
      With this fix the discard status is re-checked once the exclusive lock
      has been acquired.  If the block is no longer discarded it falls back to
      the slower full copy path.
      
      Fixes: b29d4986 ("dm cache: significant rework to leverage dm-bio-prison-v2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 28 August 2017 (1 commit)
  3. 24 August 2017 (1 commit)
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 09 June 2017 (2 commits)
  5. 15 May 2017 (3 commits)
  6. 09 April 2017 (1 commit)
  7. 31 March 2017 (1 commit)
    • dm cache: set/clear the cache core's dirty_bitset when loading mappings · 449b668c
      Joe Thornber committed
      When loading metadata make sure to set/clear the dirty bits in the cache
      core's dirty_bitset as well as the policy.
      
      Otherwise the cache core is unaware that any blocks were dirty when the
      cache was last shut down.  A very serious side-effect is that the
      cleaner policy would therefore never be tasked with writing back dirty
      data from a cache that was in writeback mode (e.g. when switching from
      the smq policy to the cleaner policy when decommissioning a writeback
      cache).
      
      This fixes a serious data corruption bug associated with writeback mode.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  8. 08 March 2017 (2 commits)
  9. 17 February 2017 (2 commits)
  10. 02 February 2017 (1 commit)
  11. 28 January 2017 (1 commit)
  12. 21 November 2016 (1 commit)
  13. 08 August 2016 (1 commit)
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe committed
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  14. 08 June 2016 (2 commits)
  15. 11 March 2016 (2 commits)
  16. 23 February 2016 (1 commit)
  17. 10 December 2015 (1 commit)
  18. 01 November 2015 (1 commit)
  19. 01 September 2015 (3 commits)
  20. 14 August 2015 (1 commit)
    • block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet committed
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  21. 12 August 2015 (1 commit)
  22. 30 July 2015 (2 commits)
    • dm cache: fix device destroy hang due to improper prealloc_used accounting · 795e633a
      Mike Snitzer committed
      Commit 665022d7 ("dm cache: avoid calls to prealloc_free_structs() if
      possible") introduced a regression that caused the removal of a DM cache
      device to hang in cache_postsuspend()'s call to wait_for_migrations()
      with the following stack trace:
      
        [<ffffffff81651457>] schedule+0x37/0x80
        [<ffffffffa041e21b>] cache_postsuspend+0xbb/0x470 [dm_cache]
        [<ffffffff810ba970>] ? prepare_to_wait_event+0xf0/0xf0
        [<ffffffffa0006f77>] dm_table_postsuspend_targets+0x47/0x60 [dm_mod]
        [<ffffffffa0001eb5>] __dm_destroy+0x215/0x250 [dm_mod]
        [<ffffffffa0004113>] dm_destroy+0x13/0x20 [dm_mod]
        [<ffffffffa00098cd>] dev_remove+0x10d/0x170 [dm_mod]
        [<ffffffffa00097c0>] ? dev_suspend+0x240/0x240 [dm_mod]
        [<ffffffffa0009f85>] ctl_ioctl+0x255/0x4d0 [dm_mod]
        [<ffffffff8127ac00>] ? SYSC_semtimedop+0x280/0xe10
        [<ffffffffa000a213>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
        [<ffffffff811fd432>] do_vfs_ioctl+0x2d2/0x4b0
        [<ffffffff81117d5f>] ? __audit_syscall_entry+0xaf/0x100
        [<ffffffff81022636>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff811fd689>] SyS_ioctl+0x79/0x90
        [<ffffffff81023e58>] ? syscall_trace_leave+0xb8/0x110
        [<ffffffff81654f6e>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Fix this by setting prealloc_used (the accounting flag) immediately
      _before_ the call to prealloc_data_structs(), as opposed to after it.
      This is needed because it is possible to break out of the control loop
      after the call to prealloc_data_structs() but before prealloc_used was
      set to true.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • Revert "dm cache: do not wake_worker() in free_migration()" · 3508e659
      Mike Snitzer committed
      This reverts commit 386cb7cd.
      
      Taking the wake_worker() out of free_migration() will slow writeback
      dramatically, and hence adaptability.
      
      Say we have 10k blocks that need writing back, but are only able to
      issue 5 concurrently due to the migration bandwidth: it's imperative
      that we wake_worker() immediately after migration completion; waiting
      for the next 1 second wake up (via do_waker) means it'll take a long
      time to write that all back.
      Reported-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  23. 29 July 2015 (1 commit)
    • block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  24. 17 July 2015 (3 commits)
  25. 16 July 2015 (1 commit)
  26. 12 June 2015 (1 commit)
    • dm cache: age and write back cache entries even without active IO · fba10109
      Joe Thornber committed
      The policy tick() method is normally called from interrupt context.
      Both the mq and smq policies do some bottom half work for the tick
      method in their map functions.  However if no IO is going through the
      cache, then that bottom half work doesn't occur.  With these policies
      this means recently hit entries do not age and do not get written
      back as early as we'd like.
      
      Fix this by introducing a new 'can_block' parameter to the tick()
      method.  When this is set the bottom half work occurs immediately.
      'can_block' is set when the tick method is called every second by the
      core target (not in interrupt context).
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>