1. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
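The remapping step this commit message describes can be pictured with a small user-space model. This is only a sketch: `toy_disk`, `toy_bio`, and `toy_partition_remap` are hypothetical stand-ins invented for illustration, not the kernel's structures or functions. It shows why a disk pointer plus a partition index is enough to turn a partition-relative sector into a whole-disk sector.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTS 4  /* illustrative limit, not a kernel constant */

/* Hypothetical stand-in for a disk; index 0 means "the whole disk". */
struct toy_disk {
    uint64_t part_start[MAX_PARTS]; /* first sector of each partition */
    uint64_t part_len[MAX_PARTS];   /* length of each partition, in sectors */
};

/* Hypothetical stand-in for a bio: it names a disk and a partition index
 * instead of pointing at a separate per-open device object. */
struct toy_bio {
    const struct toy_disk *disk;
    unsigned partno;
    uint64_t sector;     /* partition-relative until remapped */
    uint64_t nr_sectors;
};

/* Translate a partition-relative sector into a whole-disk sector.
 * Returns 0 on success, -1 if the request would leave the partition. */
static int toy_partition_remap(struct toy_bio *bio)
{
    assert(bio->partno < MAX_PARTS);
    if (bio->partno == 0)
        return 0; /* already addressed against the whole disk */

    uint64_t len = bio->disk->part_len[bio->partno];
    if (bio->nr_sectors > len || bio->sector > len - bio->nr_sectors)
        return -1;

    bio->sector += bio->disk->part_start[bio->partno];
    bio->partno = 0; /* from here on the bio targets the whole disk */
    return 0;
}
```

In this model a driver that receives the bio after remapping only ever needs `disk`, which mirrors the commit's point about removing a layer of indirection.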
  2. 19 Jun, 2017 1 commit
    • drbd: use bio_clone_fast() instead of bio_clone() · 8cb0defb
      Authored by NeilBrown
      drbd does not modify the bi_io_vec of the cloned bio,
      so there is no need to clone that part.  So bio_clone_fast()
      is the better choice.
      For bio_clone_fast() we need to specify a bio_set.
      We could use fs_bio_set, which bio_clone() uses, or
      drbd_md_io_bio_set, which drbd uses for metadata, but it is
      generally best to avoid sharing bio_sets unless you can
      be certain that there are no interdependencies.
      
      So create a new bio_set, drbd_io_bio_set, and use bio_clone_fast().
      
      Also remove a "XXX cannot fail ???" comment because it definitely
      cannot fail - bio_clone_fast() doesn't fail if the GFP flags allow for
      sleeping.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
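The difference between the two kinds of clone mentioned in this commit can be sketched in user space. The types and functions below (`toy_vec`, `toy_seg_bio`, `toy_clone_fast`, `toy_clone_full`, `toy_bio_free`) are hypothetical illustrations, not the kernel API: the "fast" variant shares the source's segment array, which is safe only if the clone never modifies it, while the "full" variant copies it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment descriptor and bio, for illustration only. */
struct toy_vec { void *page; unsigned len; };
struct toy_seg_bio {
    struct toy_vec *vecs;
    unsigned nr_vecs;
    int owns_vecs; /* 1 if this bio allocated its own segment array */
};

/* Shallow clone: share the source's segment array. Valid only while the
 * clone never modifies the array and the source outlives the clone. */
static struct toy_seg_bio toy_clone_fast(const struct toy_seg_bio *src)
{
    struct toy_seg_bio c = *src;
    c.owns_vecs = 0;
    return c;
}

/* Deep clone: allocate and copy the segment array. Returns 0 on success. */
static int toy_clone_full(const struct toy_seg_bio *src, struct toy_seg_bio *dst)
{
    assert(src->nr_vecs > 0);
    struct toy_vec *copy = malloc(src->nr_vecs * sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy, src->vecs, src->nr_vecs * sizeof(*copy));
    dst->vecs = copy;
    dst->nr_vecs = src->nr_vecs;
    dst->owns_vecs = 1;
    return 0;
}

/* Release a clone; only a deep clone has an array of its own to free. */
static void toy_bio_free(struct toy_seg_bio *bio)
{
    if (bio->owns_vecs)
        free(bio->vecs);
    bio->vecs = NULL;
}
```

The shallow clone skips an allocation and a copy, which is the saving the commit is after when the segment array is left untouched.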
  3. 09 Jun, 2017 1 commit
  4. 09 Apr, 2017 1 commit
  5. 02 Mar, 2017 1 commit
  6. 03 Aug, 2016 1 commit
  7. 14 Jun, 2016 9 commits
    • drbd: al_write_transaction: skip re-scanning of bitmap page pointer array · 27ea1d87
      Authored by Lars Ellenberg
      For larger devices, the array of bitmap page pointers can grow very
      large (8000 pointers per TB of storage).
      
      For each activity log transaction, we need to flush the associated
      bitmap pages to stable storage. Currently, we just "mark" the respective
      pages while setting up the transaction, then tell the bitmap code to
      write out all marked pages, but skip unchanged pages.
      
      But since one such transaction can affect only a small number of bitmap
      pages, there is no need to scan the full array of several (ten-)thousand
      page pointers to find the few marked ones.
      
      Instead, remember the index numbers of the few affected pages,
      and later only re-check those to skip duplicates and unchanged ones.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
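The idea of remembering only the affected page indices, rather than scanning the whole pointer array for marks, can be sketched as follows. This is a user-space illustration under stated assumptions: `toy_bitmap`, `toy_mark_page`, `toy_write_touched`, and the `MAX_TOUCHED` bound are all hypothetical names, and "writing" a page is modeled by clearing its changed flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TOUCHED 64 /* illustrative per-transaction bound */

struct toy_bitmap {
    bool *changed;               /* per page: content differs from disk */
    size_t nr_pages;
    size_t touched[MAX_TOUCHED]; /* indices remembered during setup */
    size_t nr_touched;
};

/* Remember a page index while setting up the transaction. */
static void toy_mark_page(struct toy_bitmap *bm, size_t idx)
{
    assert(idx < bm->nr_pages);
    assert(bm->nr_touched < MAX_TOUCHED);
    bm->touched[bm->nr_touched++] = idx;
}

/* Visit only the remembered indices, skipping duplicates and unchanged
 * pages. Returns the number of pages that would be written. */
static size_t toy_write_touched(struct toy_bitmap *bm)
{
    size_t written = 0;
    for (size_t i = 0; i < bm->nr_touched; i++) {
        size_t idx = bm->touched[i];
        if (!bm->changed[idx])
            continue; /* unchanged, or already handled as a duplicate */
        bm->changed[idx] = false; /* stands in for submitting the page */
        written++;
    }
    bm->nr_touched = 0;
    return written;
}
```

The loop in `toy_write_touched` costs time proportional to the number of remembered indices, independent of `nr_pages`.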
    • drbd: code cleanups without semantic changes · 7e5fec31
      Authored by Fabian Frederick
      This contains various cosmetic fixes ranging from simple typos to
      const-ifying, and using booleans properly.
      
      Original commit messages from Fabian's patch set:
      drbd: debugfs: constify drbd_version_fops
      drbd: use seq_put instead of seq_print where possible
      drbd: include linux/uaccess.h instead of asm/uaccess.h
      drbd: use const char * const for drbd strings
      drbd: kerneldoc warning fix in w_e_end_data_req()
      drbd: use unsigned for one bit fields
      drbd: use bool for peer is_ states
      drbd: fix typo
      drbd: use | for bitmask combination
      drbd: use true/false for bool
      drbd: fix drbd_bm_init() comments
      drbd: introduce peer state union
      drbd: fix maybe_pull_ahead() locking comments
      drbd: use bool for growing
      drbd: remove redundant declarations
      drbd: replace if/BUG by BUG_ON
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: introduce WRITE_SAME support · 9104d31a
      Authored by Lars Ellenberg
      We will support WRITE_SAME, if
       * all peers support WRITE_SAME (both in kernel and DRBD version),
       * all peer devices support WRITE_SAME, and
       * logical_block_size is identical on all peers.
      
      We may at some point introduce a fallback on the receiving side
      for devices/kernels that do not support WRITE_SAME,
      by open-coding a submit loop. But not yet.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: introduce unfence-peer handler · 26a96110
      Authored by Lars Ellenberg
      When resync is finished, we already call the "after-resync-target"
      handler (on the former sync target, obviously), once per volume.
      
      Paired with the before-resync-target handler, you can create snapshots,
      before the resync causes the volumes to become inconsistent,
      and discard those snapshots again, once they are no longer needed.
      
      It was also overloaded to be paired with the "fence-peer" handler,
      to "unfence" once the volumes are up-to-date and known good.
      
      This has some disadvantages, though: we call "fence-peer" for the whole
      connection (once for the group of volumes), but would call unfence as
      side-effect of after-resync-target once for each volume.
      
      Also, we fence on a (current, or about to become) Primary,
      which will later become the sync-source.
      
      Calling unfence only as a side effect of the after-resync-target
      handler opens a race window, between a new fence on the Primary
      (SyncTarget) and the unfence on the SyncTarget, which is difficult to
      close without some kind of "cluster wide lock" in those handlers.
      
      We would not need those handlers if we could still communicate.
      Which makes trying to acquire a cluster wide lock from those handlers
      seem like a very bad idea.
      
      This introduces the "unfence-peer" handler, which will be called
      per connection (once for the group of volumes), just like the fence
      handler, only once all volumes are back in sync, and on the SyncSource.
      
      Which is expected to be the node that previously called "fence", the
      node that is currently allowed to be Primary, and thus the only node
      that could trigger a new "fence" that could race with this unfence.
      
      This removes the need for any cluster wide synchronization here:
      serializing two scripts running on the same node is trivial.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: finish resync on sync source only by notification from sync target · 5052fee2
      Authored by Lars Ellenberg
      If the replication link breaks exactly during "resync finished" detection,
      finishing too early on the sync source could again lead to UUIDs rotated
      too fast, and potentially a spurious full resync on next handshake.
      
      Always wait for explicit resync finished state change notification from
      the sync target.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: allow larger max_discard_sectors · 505675f9
      Authored by Lars Ellenberg
      Make sure we have at least 67 (> AL_UPDATES_PER_TRANSACTION)
      al-extents available, and allow up to half of that to be
      discarded in one bio.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: zero-out partial unaligned discards on local backend · 7435e901
      Authored by Lars Ellenberg
      For consistency, also zero-out partial unaligned chunks of discard
      requests on the local backend.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: when receiving P_TRIM, zero-out partial unaligned chunks · dd4f699d
      Authored by Lars Ellenberg
      We can avoid spurious data divergence caused by partially-ignored
      discards on certain backends with discard_zeroes_data=0, if we
      translate partial unaligned discard requests into explicit zero-out.
      
      The relevant use case is LVM/DM thin.
      
      If on different nodes, DRBD is backed by devices with differing
      discard characteristics, discards may lead to data divergence
      (old data or garbage left over on one backend, zeroes due to
      unmapped areas on the other backend). Online verify would now
      potentially report tons of spurious differences.
      
      While probably harmless for most use cases (fstrim on a file system),
      DRBD cannot have that: it would violate our promise to upper layers
      that our data instances on the nodes are identical.
      
      To be correct and play safe (make sure data is identical on both copies),
      we would have to disable discard support, if our local backend (on a
      Primary) does not support "discard_zeroes_data=true".
      
      We'd also have to translate discards to explicit zero-out on the
      receiving (typically: Secondary) side, unless the receiving side
      supports "discard_zeroes_data=true".
      
      Which both would allocate those blocks, instead of unmapping them,
      in contrast with expectations.
      
      LVM/DM thin does set discard_zeroes_data=0,
      because it silently ignores discards to partial chunks.
      
      We can work around this by checking the alignment first.
      For unaligned (wrt. alignment and granularity) or too small discards,
      we zero-out the initial (and/or) trailing unaligned partial chunks,
      but discard all the aligned full chunks.
      
      At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
      
      Arguably it should behave this way internally, by default,
      and we'll try to make that happen.
      
      But our workaround is still valid for already deployed setups,
      and for other devices that may behave this way.
      
      Setting discard-zeroes-if-aligned=yes will allow DRBD to use
      discards, and to announce discard_zeroes_data=true, even on
      backends that announce discard_zeroes_data=false.
      
      Setting discard-zeroes-if-aligned=no will cause DRBD to always
      fall-back to zero-out on the receiving side, and to not even
      announce discard capabilities on the Primary, if the respective
      backend announces discard_zeroes_data=false.
      
      We used to ignore the discard_zeroes_data setting completely.
      To not break established and expected behaviour, and suddenly
      cause fstrim on thin-provisioned LVs to run out-of-space,
      instead of freeing up space, the default value is "yes".
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
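The alignment check described in this commit message (zero-out the leading and trailing partial chunks, discard the aligned full chunks) can be sketched as a pure range computation. This is a minimal user-space illustration, not DRBD's code: `toy_range`, `toy_split`, and `toy_split_discard` are hypothetical names, and the sketch assumes chunk boundaries lie at `alignment + k * granularity` sectors.

```c
#include <assert.h>
#include <stdint.h>

struct toy_range { uint64_t start, len; }; /* in sectors; len 0 = empty */

struct toy_split {
    struct toy_range zero_head; /* leading partial chunk: zero-out */
    struct toy_range discard;   /* whole aligned chunks: discard */
    struct toy_range zero_tail; /* trailing partial chunk: zero-out */
};

/* Split [start, start + len) into head, aligned middle, and tail.
 * Assumes chunk boundaries at alignment + k * granularity. */
static struct toy_split toy_split_discard(uint64_t start, uint64_t len,
                                          uint64_t granularity,
                                          uint64_t alignment)
{
    assert(granularity > 0 && alignment < granularity);
    struct toy_split s = { {start, 0}, {start, 0}, {start, 0} };
    uint64_t end = start + len;

    /* First chunk boundary at or after start. */
    uint64_t rem = (start + granularity - alignment) % granularity;
    uint64_t first = rem ? start + (granularity - rem) : start;

    if (first >= end || end - first < granularity) {
        /* Too small to contain a full chunk: zero-out everything. */
        s.zero_head.len = len;
        return s;
    }

    /* Last chunk boundary at or before end. */
    uint64_t last = end - (end - first) % granularity;

    s.zero_head = (struct toy_range){ start, first - start };
    s.discard   = (struct toy_range){ first, last - first };
    s.zero_tail = (struct toy_range){ last, end - last };
    return s;
}
```

For example, with a granularity of 128 sectors and no alignment offset, a request covering sectors 100 to 499 yields a zero-out of 28 sectors, a discard of 256 sectors starting at 128, and a zero-out of the remaining 116 sectors.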
    • drbd: Implement handling of thinly provisioned storage on resync target nodes · 700ca8c0
      Authored by Philipp Reisner
      If, during resync, we read only zeroes for a range of sectors, assume
      that these sectors can be discarded on the sync target node.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
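The decision this commit describes boils down to an all-zero check on the data read for a resync request. A minimal user-space sketch, with hypothetical names (`toy_all_zero`, `toy_classify_block`, and the `toy_resync_reply` values are not DRBD identifiers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every byte in buf[0..len) is zero. */
static bool toy_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Hypothetical reply kinds: ship the data, or tell the sync target that
 * the range holds only zeroes and may be discarded there. */
enum toy_resync_reply { TOY_SEND_DATA, TOY_SEND_DEALLOCATED };

static enum toy_resync_reply toy_classify_block(const unsigned char *buf,
                                                size_t len)
{
    assert(buf != NULL || len == 0);
    return toy_all_zero(buf, len) ? TOY_SEND_DEALLOCATED : TOY_SEND_DATA;
}
```

A byte-wise loop keeps the sketch obvious; a real implementation would likely compare in wider units.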
  8. 10 Jun, 2016 1 commit
  9. 08 Jun, 2016 1 commit
  10. 05 Apr, 2016 1 commit
  11. 27 Jan, 2016 1 commit
  12. 23 Jan, 2016 1 commit
  13. 26 Nov, 2015 10 commits
  14. 08 Nov, 2015 1 commit
  15. 14 Aug, 2015 1 commit
    • block: kill merge_bvec_fn() completely · 8ae12666
      Authored by Kent Overstreet
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  16. 29 Jul, 2015 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Authored by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
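Why an error stored in the bio itself survives queuing and chaining can be illustrated with a small user-space model. Everything here is a hypothetical sketch (`toy_chained_bio`, `toy_bio_endio`, and the field names other than the idea of a per-bio error value are invented): each bio carries its own errno-style value, and a completing child hands its error to the parent if the parent has none yet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical chained bio: the error lives in the object itself. */
struct toy_chained_bio {
    int error;                      /* 0 on success, negative errno otherwise */
    int remaining;                  /* completions still outstanding */
    struct toy_chained_bio *parent; /* NULL for a top-level bio */
    int done;
};

/* Complete one reference on a bio. When the last reference drops, pass
 * the first recorded error up to the parent and complete it in turn. */
static void toy_bio_endio(struct toy_chained_bio *bio)
{
    while (bio) {
        assert(bio->remaining > 0);
        if (--bio->remaining > 0)
            return; /* other completions are still pending */
        bio->done = 1;

        struct toy_chained_bio *parent = bio->parent;
        if (parent && bio->error && !parent->error)
            parent->error = bio->error; /* keep the first error seen */
        bio = parent;
    }
}
```

Because the value is stored rather than passed as a callback argument, it can carry any errno and is still present however long the parent waits for its other children.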
  17. 02 Jun, 2015 1 commit
    • writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Authored by Tejun Heo
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  18. 11 Nov, 2014 3 commits
  19. 11 Sep, 2014 3 commits