1. 30 Aug, 2017 2 commits
    • drbd: Send P_NEG_ACK upon write error in protocol != C · e1fbc4ca
      Lars Ellenberg committed
      In protocol != C, we forgot to send the P_NEG_ACK for failing writes.
      
      Once we no longer submit to local disk, because we already "detached",
      due to the typical "on-io-error detach;" config setting,
      we already send the neg acks right away.
      
      Only those requests that have been submitted,
      and have been error-completed by the local disk,
      would forget to send the neg-ack,
      and only in asynchronous replication (protocol != C).
      Unless this happened during resync,
      where we already always send acks, regardless of protocol.
      
      The primary side needs the P_NEG_ACK in order to mark
      the affected block(s) for resync in its out-of-sync bitmap.
      
      If the blocks in question are not rewritten,
      we may fail to resync them later, causing data inconsistencies.
      
      This patch will always send the neg-acks, and also at least try to
      persist the out-of-sync status on the local node already.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
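The ack decision described in this commit message can be sketched as a small userspace model. Every name here (`ack_after_local_write`, `primary_handle_ack`, `struct oos_bitmap`) is hypothetical and not DRBD's real code. The model also assumes, as a simplification, that a successful write is only acknowledged per request in protocol C or during resync.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not DRBD's real definitions. */
enum repl_protocol { PROTO_A, PROTO_B, PROTO_C };
enum ack_kind { ACK_NONE, ACK_POSITIVE, ACK_NEGATIVE };

/* Toy out-of-sync bitmap: one bit per block. */
struct oos_bitmap { unsigned long bits; };

/*
 * What the receiving side sends once the local disk has completed a write.
 * After the fix, a failed write yields a negative ack in every protocol.
 * Assumption: a successful write is acked per request only in protocol C
 * or during resync.
 */
static enum ack_kind ack_after_local_write(enum repl_protocol proto,
                                           bool io_error, bool during_resync)
{
    if (io_error)
        return ACK_NEGATIVE;
    if (proto == PROTO_C || during_resync)
        return ACK_POSITIVE;
    return ACK_NONE;
}

/* On the primary: a negative ack marks the block for a later resync. */
static void primary_handle_ack(struct oos_bitmap *bm, unsigned int block,
                               enum ack_kind ack)
{
    assert(block < 8 * sizeof(bm->bits));
    if (ack == ACK_NEGATIVE)
        bm->bits |= 1UL << block;
}
```

In this model a failed write in protocol A still returns `ACK_NEGATIVE`, which is the case the patch fixes.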
    • drbd: introduce drbd_recv_header_maybe_unplug · c51a0ef3
      Lars Ellenberg committed
      Recently, drbd_recv_header() was changed to potentially
      implicitly "unplug" the backend device(s), in case there
      is currently nothing to receive.
      
      Be more explicit about it: re-introduce the original drbd_recv_header(),
      and introduce a new drbd_recv_header_maybe_unplug() for use by the
      receiver "main loop".
      
      Using explicit plugging via blk_start_plug(); blk_finish_plug();
      really helps the io-scheduler of the backend with merging requests.
      
      Wrap the receiver "main loop" with such a plug.
      Also catch unplug events on the Primary,
      and try to propagate.
      
      This is performance relevant.  Without this, if the receiving side does
      not merge requests, the number of IOPS on the peer can be significantly
      higher than the IOPS on the Primary, and can easily become the bottleneck.
      
      Together, both changes should help to reduce the number of IOPS
      as seen on the backend of the receiving side, by increasing
      the chance of merging mergeable requests, without trading latency
      for more throughput.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
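To illustrate why plugging reduces IOPS on the backend, here is a toy userspace model of a plug that back-merges contiguous requests before they are issued. It is only a sketch: `struct plug`, `plug_start`, `plug_submit` and `plug_finish` are hypothetical names, and the real `blk_start_plug()` / `blk_finish_plug()` machinery in the kernel is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define PLUG_MAX 16

/* One pending backend request: start sector and length in sectors. */
struct io_req { unsigned long sector; unsigned int nr_sectors; };

/* Toy "plug": holds requests back so that neighbours can be merged. */
struct plug { struct io_req pending[PLUG_MAX]; size_t count; };

static void plug_start(struct plug *p)
{
    p->count = 0;
}

/* Queue a request; back-merge it if it starts where the last one ends. */
static void plug_submit(struct plug *p, unsigned long sector, unsigned int nr)
{
    if (p->count > 0) {
        struct io_req *last = &p->pending[p->count - 1];
        if (last->sector + last->nr_sectors == sector) {
            last->nr_sectors += nr;
            return;
        }
    }
    assert(p->count < PLUG_MAX);
    p->pending[p->count].sector = sector;
    p->pending[p->count].nr_sectors = nr;
    p->count++;
}

/* "Unplug": hand everything to the backend; returns the I/Os issued. */
static size_t plug_finish(struct plug *p)
{
    size_t issued = p->count;
    p->count = 0;
    return issued;
}
```

Submitting sectors 0, 8 and 16 (8 sectors each) followed by sector 100 inside one plug results in two backend I/Os in this model, where unplugged submission would issue four.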
  2. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
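The idea of addressing I/O by disk plus partition index, and remapping it to whole-disk sectors before it reaches the driver, can be sketched in userspace as follows. The types and the `remap_partition` helper are hypothetical stand-ins and do not mirror the kernel's `struct gendisk`, `struct bio` or `generic_make_request`.

```c
#include <assert.h>

#define MAX_PARTS 4

/* Hypothetical stand-ins for a partition, a disk and an I/O descriptor. */
struct part { unsigned long start_sect; unsigned long nr_sects; };
struct disk { struct part parts[MAX_PARTS]; };  /* index 0: whole disk */
struct io {
    struct disk *disk;      /* exists once per block device */
    unsigned int partno;    /* 0 means "already relative to the disk" */
    unsigned long sector;
};

/*
 * Remap a partition-relative I/O to whole-disk sectors.
 * Returns 0 on success, -1 if the sector lies outside the partition.
 */
static int remap_partition(struct io *io)
{
    const struct part *p;

    assert(io->partno < MAX_PARTS);
    if (io->partno == 0)
        return 0;                       /* nothing to remap */
    p = &io->disk->parts[io->partno];
    if (io->sector >= p->nr_sects)
        return -1;
    io->sector += p->start_sect;
    io->partno = 0;                     /* now addressed against the disk */
    return 0;
}
```

Note that the submitter only needs a disk pointer and an index here; nothing resembling an opened device node is required.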
  3. 09 Jun, 2017 1 commit
  4. 09 Apr, 2017 1 commit
  5. 02 Mar, 2017 1 commit
  6. 08 Aug, 2016 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe committed
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
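The layout described above (flags in the low bits, op code in the high bits of one word) can be illustrated with a small model. The constants and helper names below are hypothetical and chosen for illustration only; they are not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative layout: the top OP_BITS bits hold the op, the rest flags. */
#define OP_BITS   3u
#define OP_SHIFT  (8u * (unsigned int)sizeof(unsigned int) - OP_BITS)
#define FLAG_MASK ((1u << OP_SHIFT) - 1u)

enum op { OP_READ = 0, OP_WRITE = 1, OP_DISCARD = 2 };

#define FLAG_SYNC (1u << 0)
#define FLAG_FUA  (1u << 1)

struct opf_bio { unsigned int opf; };

/* Combine op and flags through an accessor instead of assigning by hand. */
static void set_op_attrs(struct opf_bio *b, enum op op, unsigned int flags)
{
    assert((flags & ~FLAG_MASK) == 0);
    b->opf = ((unsigned int)op << OP_SHIFT) | flags;
}

static enum op get_op(const struct opf_bio *b)
{
    return (enum op)(b->opf >> OP_SHIFT);
}

static unsigned int get_flags(const struct opf_bio *b)
{
    return b->opf & FLAG_MASK;
}
```

Old-style code that assigned something like `OP_WRITE | FLAG_SYNC` directly to the field would, under this layout, encode a read with the wrong flag set. That is the kind of silent runtime breakage the rename turns into a compile error.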
  7. 21 Jul, 2016 1 commit
  8. 14 Jun, 2016 5 commits
    • drbd: code cleanups without semantic changes · 7e5fec31
      Fabian Frederick committed
      This contains various cosmetic fixes ranging from simple typos to
      const-ifying, and using booleans properly.
      
      Original commit messages from Fabian's patch set:
      drbd: debugfs: constify drbd_version_fops
      drbd: use seq_put instead of seq_print where possible
      drbd: include linux/uaccess.h instead of asm/uaccess.h
      drbd: use const char * const for drbd strings
      drbd: kerneldoc warning fix in w_e_end_data_req()
      drbd: use unsigned for one bit fields
      drbd: use bool for peer is_ states
      drbd: fix typo
      drbd: use | for bitmask combination
      drbd: use true/false for bool
      drbd: fix drbd_bm_init() comments
      drbd: introduce peer state union
      drbd: fix maybe_pull_ahead() locking comments
      drbd: use bool for growing
      drbd: remove redundant declarations
      drbd: replace if/BUG by BUG_ON
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: introduce WRITE_SAME support · 9104d31a
      Lars Ellenberg committed
      We will support WRITE_SAME, if
       * all peers support WRITE_SAME (both in kernel and DRBD version),
       * all peer devices support WRITE_SAME, and
       * logical_block_size is identical on all peers.
      
      We may at some point introduce a fallback on the receiving side
      for devices/kernels that do not support WRITE_SAME,
      by open-coding a submit loop. But not yet.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
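The three conditions listed in this commit message amount to a simple predicate over all peers. A minimal sketch follows, assuming a hypothetical `struct peer_info` that summarizes what each peer reported; DRBD's actual feature negotiation is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer summary; not DRBD's real data structures. */
struct peer_info {
    bool peer_supports_ws;      /* kernel and DRBD version can do it */
    bool device_supports_ws;    /* backing device can do it */
    unsigned int logical_block_size;
};

/* WRITE_SAME is usable only if every peer passes all three checks. */
static bool can_use_write_same(const struct peer_info *peers, size_t n)
{
    size_t i;

    assert(peers != NULL || n == 0);
    if (n == 0)
        return false;
    for (i = 0; i < n; i++) {
        if (!peers[i].peer_supports_ws || !peers[i].device_supports_ws)
            return false;
        if (peers[i].logical_block_size != peers[0].logical_block_size)
            return false;
    }
    return true;
}
```

A single peer with a different `logical_block_size` is enough to disable the feature for the whole resource in this model.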
    • drbd: introduce unfence-peer handler · 26a96110
      Lars Ellenberg committed
      When resync is finished, we already call the "after-resync-target"
      handler (on the former sync target, obviously), once per volume.
      
      Paired with the before-resync-target handler, you can create snapshots,
      before the resync causes the volumes to become inconsistent,
      and discard those snapshots again, once they are no longer needed.
      
      It was also overloaded to be paired with the "fence-peer" handler,
      to "unfence" once the volumes are up-to-date and known good.
      
      This has some disadvantages, though: we call "fence-peer" for the whole
      connection (once for the group of volumes), but would call unfence as
      side-effect of after-resync-target once for each volume.
      
      Also, we fence on a (current, or about to become) Primary,
      which will later become the sync-source.
      
      Calling unfence only as a side effect of the after-resync-target
      handler opens a race window, between a new fence on the Primary
      (SyncSource) and the unfence on the SyncTarget, which is difficult to
      close without some kind of "cluster wide lock" in those handlers.
      
      We would not need those handlers if we could still communicate,
      which makes trying to acquire a cluster wide lock from those handlers
      seem like a very bad idea.
      
      This introduces the "unfence-peer" handler, which will be called
      per connection (once for the group of volumes), just like the fence
      handler, only once all volumes are back in sync, and on the SyncSource.
      
      Which is expected to be the node that previously called "fence", the
      node that is currently allowed to be Primary, and thus the only node
      that could trigger a new "fence" that could race with this unfence.
      
      This removes the need for any cluster wide synchronization here:
      serializing two scripts running on the same node is trivial.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
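The difference in granularity between the two handlers can be summarized in a small model: after-resync-target fires per volume on the sync target, while unfence-peer fires once per connection on the sync source, and only after every volume is back in sync. All types and function names below are hypothetical and merely restate the rules from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sync_role { SYNC_NONE, SYNC_SOURCE, SYNC_TARGET };

struct volume { bool in_sync; };

/* Hypothetical view of one connection and its group of volumes. */
struct connection {
    enum sync_role role;
    const struct volume *volumes;
    size_t nr_volumes;
};

/* Per volume, on the sync target, once that volume has finished. */
static bool run_after_resync_target(const struct connection *c, size_t vol)
{
    assert(vol < c->nr_volumes);
    return c->role == SYNC_TARGET && c->volumes[vol].in_sync;
}

/* Per connection, on the sync source, once ALL volumes are in sync. */
static bool run_unfence_peer(const struct connection *c)
{
    size_t i;

    if (c->role != SYNC_SOURCE || c->nr_volumes == 0)
        return false;
    for (i = 0; i < c->nr_volumes; i++)
        if (!c->volumes[i].in_sync)
            return false;
    return true;
}
```

Because `run_unfence_peer` is only ever true on the sync source, it runs on the same node that could issue a new fence, so the two can be serialized locally.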
    • drbd: Create the protocol feature THIN_RESYNC · 92d94ae6
      Philipp Reisner committed
      If thinly provisioned volumes are used, during a resync the sync source
      tries to find out if a block is deallocated. If it is deallocated, then
      the resync target uses blkdev_issue_zeroout() on the range in
      question.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • drbd: Implement handling of thinly provisioned storage on resync target nodes · 700ca8c0
      Philipp Reisner committed
      If during resync we read only zeroes for a range of sectors, assume
      that these sectors can be discarded on the sync target node.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
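The zero-detection step can be sketched as below. The `resync_action_for` helper and its enum are hypothetical names for illustration; how the result is sent to the peer and acted upon is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every byte in the buffer is zero. */
static bool all_zero(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

enum resync_action { RESYNC_WRITE_DATA, RESYNC_DISCARD };

/*
 * Decide what to do with a range read during resync: an all-zero range
 * is assumed to be deallocated and can be discarded on the sync target
 * instead of being written out in full.
 */
static enum resync_action resync_action_for(const unsigned char *buf,
                                            size_t len)
{
    return all_zero(buf, len) ? RESYNC_DISCARD : RESYNC_WRITE_DATA;
}
```

A byte-wise loop keeps the sketch obvious; a production version would more likely compare word-sized chunks.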
  9. 08 Jun, 2016 1 commit
  10. 27 Jan, 2016 1 commit
  11. 26 Nov, 2015 5 commits
  12. 29 Jul, 2015 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
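Storing the errno value in the bio itself is what lets an error survive queueing and travel from a chained child to its parent. A minimal userspace sketch of that idea, using a hypothetical `struct mini_bio` rather than the kernel's `struct bio`:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature bio; only the fields needed for the sketch. */
struct mini_bio {
    int error;                  /* 0 or a negative errno, like bi_error */
    int remaining;              /* completions still outstanding */
    struct mini_bio *parent;    /* NULL for a top-level bio */
};

/*
 * Complete one reference on a bio. When the last reference drops, the
 * stored error is handed to the parent (the first error wins) and the
 * parent is completed in turn.
 */
static void mini_bio_endio(struct mini_bio *bio)
{
    while (bio != NULL) {
        struct mini_bio *parent;

        assert(bio->remaining > 0);
        if (--bio->remaining > 0)
            return;
        parent = bio->parent;
        if (parent != NULL && bio->error != 0 && parent->error == 0)
            parent->error = bio->error;
        bio = parent;
    }
}
```

Unlike a single "not up to date" flag, the stored value can carry a specific error such as `-ENOSPC` all the way up the chain.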
  13. 11 Nov, 2014 1 commit
  14. 11 Sep, 2014 5 commits
  15. 11 Jul, 2014 13 commits