1. 10 Apr, 2019 1 commit
    • G
      block: Mark expected switch fall-throughs · e16fb3a8
      Gustavo A. R. Silva committed
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_receiver.c:3093:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_receiver.c:3120:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_req.c:856:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Acked-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      e16fb3a8
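To illustrate what "marking" an expected fall-through looks like, here is a minimal sketch. The enum and function names are invented for this example and are not taken from the DRBD sources; the only assumption about the compiler is that GCC at -Wimplicit-fallthrough=3 accepts a comment such as `/* fall through */` as the marker.

```c
#include <assert.h>

/* Hypothetical states, invented for this illustration only. */
enum demo_state { DEMO_IDLE, DEMO_STARTING, DEMO_RUNNING };

/* Returns how many setup steps are still needed to reach DEMO_RUNNING. */
int demo_steps_remaining(enum demo_state s)
{
	int steps = 0;

	switch (s) {
	case DEMO_IDLE:
		steps++;	/* leave the idle state */
		/* fall through */
	case DEMO_STARTING:
		steps++;	/* finish start-up */
		/* fall through */
	case DEMO_RUNNING:
		break;
	default:
		assert(0 && "unknown state");
	}
	return steps;
}
```

Without the two marker comments, each `steps++` line would trigger the "this statement may fall through" warning quoted in the commit message.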
  2. 21 Dec, 2018 8 commits
    • L
      drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") · f31e583a
      Lars Ellenberg committed
      And also re-enable partial-zero-out + discard aligned.
      
      With the introduction of REQ_OP_WRITE_ZEROES,
      we started to use that for both WRITE_ZEROES and DISCARDS,
      hoping that WRITE_ZEROES would "do what we want",
      UNMAP if possible, zero-out the rest.
      
      The example scenario is some LVM "thin" backend.
      
      While an un-allocated block on dm-thin reads as zeroes, on a dm-thin
      with "skip_block_zeroing=true", after a partial block write allocated
      that block, that same block may well map "undefined old garbage" from
      the backends on LBAs that have not yet been written to.
      
      If we cannot distinguish between zero-out and discard on the receiving
      side, to avoid "undefined old garbage" to pop up randomly at later times
      on supposedly zero-initialized blocks, we'd need to map all discards to
      zero-out on the receiving side.  But that would potentially do a full
      alloc on thinly provisioned backends, even when the expectation was to
      unmap/trim/discard/de-allocate.
      
      We need to distinguish on the protocol level, whether we need to guarantee
      zeroes (and thus use zero-out, potentially doing the mentioned full-alloc),
      or if we want to put the emphasis on discard, and only do a "best effort
      zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing
      only potential unaligned head and tail clippings to at least *try* to
      avoid "false positives" in an online-verify later), hoping that someone
      set skip_block_zeroing=false.
      
      For some discussion regarding this on dm-devel, see also
      https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html
      https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html
      
      For backward compatibility, P_TRIM means zero-out, unless the
      DRBD_FF_WZEROES feature flag is agreed upon during handshake.
      
      To have upper layers even try to submit WRITE ZEROES requests,
      we need to announce "efficient zeroout" independently.
      
      We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits():
      if we can handle "zeroes" efficiently on the protocol,
      we want to do that, even if our backend does not announce
      max_write_zeroes_sectors itself.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f31e583a
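The backward-compatibility rule described above (P_TRIM means zero-out unless DRBD_FF_WZEROES was agreed upon) can be sketched roughly as follows. This is a simplified model, not the driver's code: the `DEMO_` constants, their numeric values, and the function name are placeholders, and treating P_ZEROES as always "guaranteed zeroes" is an assumption made for brevity.

```c
#include <assert.h>

#define DEMO_FF_WZEROES 0x8u	/* placeholder bit standing in for DRBD_FF_WZEROES */

enum demo_packet { DEMO_P_TRIM, DEMO_P_ZEROES };
enum demo_action {
	DEMO_ZERO_OUT,			/* guarantee zeroes, may fully allocate */
	DEMO_DISCARD_BEST_EFFORT	/* discard aligned part, zero head/tail */
};

/* Decide what the receiving side does with a packet, given the feature
 * flags both peers agreed upon during the handshake. */
enum demo_action demo_receiver_action(enum demo_packet p, unsigned agreed)
{
	switch (p) {
	case DEMO_P_ZEROES:
		return DEMO_ZERO_OUT;
	case DEMO_P_TRIM:
		/* Old peers cannot tell discard from zero-out: play safe. */
		if (!(agreed & DEMO_FF_WZEROES))
			return DEMO_ZERO_OUT;
		return DEMO_DISCARD_BEST_EFFORT;
	}
	return DEMO_ZERO_OUT;
}
```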
    • L
      drbd: don't retry connection if peers do not agree on "authentication" settings · 9049ccd4
      Lars Ellenberg committed
      emma: "Unexpected data packet AuthChallenge (0x0010)"
       ava: "expected AuthChallenge packet, received: ReportProtocol (0x000b)"
            "Authentication of peer failed, trying again."
      
      Pattern repeats.
      
      There is no point in retrying the handshake,
      if we expect to receive an AuthChallenge,
      but the peer is not even configured to expect or use a shared secret.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9049ccd4
    • L
      drbd: fix comment typos · a2823ea9
      Lars Ellenberg committed
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a2823ea9
    • L
      drbd: reject attach of unsuitable uuids even if connected · fe43ed97
      Lars Ellenberg committed
      Multiple failure scenario:
      a) all good
         Connected Primary/Secondary UpToDate/UpToDate
      b) lose disk on Primary,
         Connected Primary/Secondary Diskless/UpToDate
      c) continue to write to the device,
         changes only make it to the Secondary storage.
      d) lose disk on Secondary,
         Connected Primary/Secondary Diskless/Diskless
      e) now try to re-attach on Primary
      
      This would have succeeded before, even though that is clearly the
      wrong data set to attach to (missing the modifications from c).
      Because we only compared our "effective" and the "to-be-attached"
      data generation uuid tags if (device->state.conn < C_CONNECTED).
      
      Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE),
      compare the uuids, and reject the attach.
      
      This patch also tries to improve the reverse scenario:
      first lose Secondary, then Primary disk,
      then try to attach the disk on Secondary.
      
      Before this patch, the attach on the Secondary succeeds, but since commit
      drbd: disconnect, if the wrong UUIDs are attached on a connected peer
      the Primary will notice unsuitable data, and drop the connection hard.
      
      Though unfortunately at a point in time during the handshake where
      we cannot easily abort the attach on the peer without more
      refactoring of the handshake.
      
      We now reject any attach to "unsuitable" uuids,
      as long as we can see a Primary role,
      unless we already have access to "good" data.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fe43ed97
    • L
      drbd: attach on connected diskless peer must not shrink a consistent device · ad6e8979
      Lars Ellenberg committed
      If we would reject a new handshake, if the peer had attached first,
      and then connected, we should force disconnect if the peer first connects,
      and only then attaches.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ad6e8979
    • L
      drbd: disconnect, if the wrong UUIDs are attached on a connected peer · b17b5960
      Lars Ellenberg committed
      With "on-no-data-accessible suspend-io", DRBD requires the next attach
      or connect to be to the very same data generation uuid tag it lost last.
      
      If we first lost connection to the peer,
      then later lost connection to our own disk,
      we would usually refuse to re-connect to the peer,
      because it presents the wrong data set.
      
      However, if the peer first connects without a disk,
      and then attached its disk, we accepted that same wrong data set,
      which would be "unexpected" by any user of that DRBD
      and cause "undefined results" (read: very likely data corruption).
      
      The fix is to forcefully disconnect as soon as we notice that the peer
      attached to the "wrong" dataset.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b17b5960
    • L
      drbd: ignore "all zero" peer volume sizes in handshake · 94c43a13
      Lars Ellenberg committed
      During handshake, if we are diskless ourselves, we used to accept any size
      presented by the peer.
      
      Which could be zero if that peer was just brought up and connected
      to us without having a disk attached first, in which case both
      peers would just "flip" their volume sizes.
      
      Now, even a diskless node will ignore "zero" sizes
      presented by a diskless peer.
      
      Also a currently Diskless Primary will refuse to shrink during handshake:
      it may be frozen, and waiting for a "suitable" local disk or peer to
      re-appear (on-no-data-accessible suspend-io). If the peer is smaller
      than what we used to be, it is not suitable.
      
      The logic for a diskless node during handshake is now supposed to be:
      believe the peer, if
       - I don't have a current size myself
       - we agree on the size anyways
       - I do have a current size, am Secondary, and he has the only disk
       - I do have a current size, am Primary, and he has the only disk,
         which is larger than my current size
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      94c43a13
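The "believe the peer, if ..." rules for a diskless node can be written down as a small predicate. The sketch below is one reading of the list in the commit message, not the driver's actual function: the name, the parameters, and the convention that a size of 0 means "no current size" are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Should a diskless node adopt the size presented by its peer?
 * A size of 0 means "no size known" (assumption of this sketch). */
bool demo_believe_peer_size(uint64_t my_size, uint64_t peer_size,
			    bool i_am_primary, bool peer_has_disk)
{
	/* Ignore "all zero" sizes presented by a diskless peer. */
	if (!peer_has_disk && peer_size == 0)
		return false;
	/* I don't have a current size myself. */
	if (my_size == 0)
		return true;
	/* We agree on the size anyway. */
	if (my_size == peer_size)
		return true;
	/* From here on the peer must hold the only disk. */
	if (!peer_has_disk)
		return false;
	/* A Secondary follows the only disk; a Primary refuses to shrink. */
	if (!i_am_primary)
		return true;
	return peer_size > my_size;
}
```

For example, a frozen diskless Primary that used to be 100 sectors large would reject a peer presenting 50 sectors but accept one presenting 200.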
    • R
      drbd: narrow rcu_read_lock in drbd_sync_handshake · d29e89e3
      Roland Kammerer committed
      So far there was the possibility that we called
      genlmsg_new(GFP_NOIO)/mutex_lock() while holding an rcu_read_lock().
      
      This included cases like:
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            drbd_bcast_event
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              mutex_lock --> may sleep
      
      While using GFP_ATOMIC would have been possible in the first two cases,
      the real fix is to narrow the rcu_read_lock.
      Reported-by: Jia-Ju Bai <baijiaju1990@163.com>
      Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d29e89e3
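The idea of "narrowing" the read-side critical section can be shown with a userspace stand-in. Nothing below is kernel API: the lock is simulated by a depth counter, and every name (including the `recovery_policy` field) is invented for the illustration. The point is only the shape of the fix: copy what is needed while inside the section, leave it, and call anything that may sleep afterwards.

```c
#include <assert.h>

static int demo_read_depth;	/* > 0 while inside the simulated read-side section */
static int demo_helper_calls;	/* counts calls to the sleeping helper */

static void demo_read_lock(void)   { demo_read_depth++; }
static void demo_read_unlock(void) { demo_read_depth--; }

/* Stand-in for a call that may sleep (an allocation, a mutex, ...). */
static void demo_may_sleep_helper(int policy)
{
	/* Sleeping inside the read-side section would be a bug. */
	assert(demo_read_depth == 0);
	(void)policy;
	demo_helper_calls++;
}

/* Hypothetical shared configuration that is only read under the lock. */
struct demo_conf { int recovery_policy; };
static struct demo_conf demo_shared_conf = { .recovery_policy = 3 };

int demo_sync_handshake(void)
{
	int policy;

	demo_read_lock();
	policy = demo_shared_conf.recovery_policy;	/* copy what we need */
	demo_read_unlock();

	/* The lock is dropped, so sleeping is allowed from here on. */
	demo_may_sleep_helper(policy);
	return policy;
}
```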
  3. 20 Nov, 2018 1 commit
  4. 24 Oct, 2018 1 commit
    • D
      iov_iter: Separate type from direction and use accessor functions · aa563d7b
      David Howells committed
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them rather
      than chains of bitwise-AND statements.  This makes it easier to add further
      iterator types.  Also, this can be more efficient as to implement a switch
      of small contiguous integers, the compiler can use ~50% fewer compare
      instructions than it has to use bitwise-and instructions.
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: David Howells <dhowells@redhat.com>
      aa563d7b
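A rough sketch of the layout this commit describes: the iterator type and the direction live in separate fields, callers read them through accessors, and code dispatches on the type with a switch. The `demo_` names and the particular set of types are placeholders rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_iter_type { DEMO_ITER_IOVEC, DEMO_ITER_KVEC, DEMO_ITER_BVEC, DEMO_ITER_PIPE };
enum demo_iter_dir  { DEMO_DIR_READ, DEMO_DIR_WRITE };

struct demo_iter {
	enum demo_iter_type type;	/* small contiguous integer, not a bit mask */
	enum demo_iter_dir dir;
};

/* Accessors, so callers never touch the fields directly. */
static inline enum demo_iter_type demo_iter_get_type(const struct demo_iter *i)
{
	return i->type;
}

static inline enum demo_iter_dir demo_iter_get_dir(const struct demo_iter *i)
{
	return i->dir;
}

/* The setup function sets the type itself; only the direction is passed. */
void demo_iter_kvec_init(struct demo_iter *i, enum demo_iter_dir dir)
{
	i->type = DEMO_ITER_KVEC;
	i->dir = dir;
}

/* Dispatch with a switch instead of a chain of bitwise-AND tests. */
bool demo_iter_is_user_memory(const struct demo_iter *i)
{
	switch (demo_iter_get_type(i)) {
	case DEMO_ITER_IOVEC:
		return true;
	case DEMO_ITER_KVEC:
	case DEMO_ITER_BVEC:
	case DEMO_ITER_PIPE:
		return false;
	}
	return false;
}
```

Adding a further iterator type then means adding one enumerator and one `case` label, rather than finding a free bit.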
  5. 07 Sep, 2018 1 commit
  6. 18 Jul, 2018 1 commit
  7. 09 Jul, 2018 1 commit
  8. 31 May, 2018 1 commit
  9. 03 Dec, 2017 1 commit
  10. 07 Nov, 2017 1 commit
    • K
      drbd: Convert timers to use timer_setup() · 2bccef39
      Kees Cook committed
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: drbd-dev@lists.linbit.com
      Signed-off-by: Kees Cook <keescook@chromium.org>
      2bccef39
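The pattern behind passing the timer pointer to the callback can be sketched in plain userspace C. This is not the kernel's timer API: `demo_timer`, `demo_timer_setup()` and `demo_container_of()` are simplified stand-ins, and the device structure is invented. The callback receives only the timer and recovers its enclosing object from the member offset.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a timer whose callback receives the timer itself. */
struct demo_timer {
	void (*function)(struct demo_timer *t);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device embedding a timer. */
struct demo_device {
	int id;
	int fired;
	struct demo_timer resync_timer;
};

static void demo_timer_setup(struct demo_timer *t,
			     void (*fn)(struct demo_timer *))
{
	t->function = fn;
}

/* Callback: no separate data argument is needed. */
static void demo_resync_timer_fn(struct demo_timer *t)
{
	struct demo_device *dev =
		demo_container_of(t, struct demo_device, resync_timer);

	dev->fired++;
}

/* Set up the timer, fire it once by hand, and report the counter. */
int demo_fire_once(int id)
{
	struct demo_device dev = { .id = id, .fired = 0 };

	demo_timer_setup(&dev.resync_timer, demo_resync_timer_fn);
	dev.resync_timer.function(&dev.resync_timer);
	return dev.fired;
}
```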
  11. 30 Aug, 2017 5 commits
  12. 24 Aug, 2017 1 commit
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
  13. 09 Jun, 2017 1 commit
  14. 09 Apr, 2017 3 commits
  15. 02 Mar, 2017 2 commits
  16. 22 Nov, 2016 1 commit
  17. 01 Nov, 2016 1 commit
  18. 08 Aug, 2016 1 commit
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe committed
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      1eff9d32
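Why manually setting the old field breaks can be seen from a small model of the layout the message describes, with flags in the low bits and the op code in the high bits. The bit widths and `DEMO_` names below are assumptions chosen for the illustration, not the kernel's actual constants.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OP_BITS	3u			/* assumed width of the op code */
#define DEMO_OP_SHIFT	(32u - DEMO_OP_BITS)	/* op code sits in the top bits */
#define DEMO_FLAG_MASK	((UINT32_C(1) << DEMO_OP_SHIFT) - 1)

/* Combine an op code and flags into one 32-bit word. */
static inline uint32_t demo_pack_opf(uint32_t op, uint32_t flags)
{
	assert(op < (UINT32_C(1) << DEMO_OP_BITS));
	assert((flags & ~DEMO_FLAG_MASK) == 0);
	return (op << DEMO_OP_SHIFT) | flags;
}

static inline uint32_t demo_opf_op(uint32_t opf)
{
	return opf >> DEMO_OP_SHIFT;
}

static inline uint32_t demo_opf_flags(uint32_t opf)
{
	return opf & DEMO_FLAG_MASK;
}
```

In this model, old-style code that stores a bare `1` meaning "write" ends up with op code 0 and a stray flag bit, which is the kind of silent runtime breakage the rename turns into a compile error.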
  19. 14 Jun, 2016 8 commits
    • L
      drbd: correctly handle failed crypto_alloc_hash · 1b57e663
      Lars Ellenberg committed
      crypto_alloc_hash returns an ERR_PTR(), not NULL.
      
      Also reset peer_integrity_tfm to NULL, to not call crypto_free_hash()
      on an errno in the cleanup path.
      Reported-by: Insu Yun <wuninsu@gmail.com>
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      1b57e663
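The bug class fixed here, checking for NULL when the allocator reports failure as an encoded errno, can be reproduced in userspace. The helpers below only imitate the kernel's ERR_PTR convention; all `demo_` names, the fake allocator, and the 4095 limit are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitation of the error-pointer convention. */
static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_tfm { int dummy; };
static struct demo_tfm demo_the_tfm;

/* Hypothetical allocator: returns an encoded errno on failure, never NULL. */
static struct demo_tfm *demo_alloc_tfm(int should_fail)
{
	return should_fail ? demo_err_ptr(-ENOENT) : &demo_the_tfm;
}

/* Returns 0 on success or a negative errno; *out is NULL on failure. */
int demo_setup_integrity(struct demo_tfm **out, int should_fail)
{
	struct demo_tfm *tfm = demo_alloc_tfm(should_fail);

	if (demo_is_err(tfm)) {
		*out = NULL;	/* so a cleanup path never frees an errno */
		return (int)demo_ptr_err(tfm);
	}
	*out = tfm;
	return 0;
}
```

A plain `if (!tfm)` check would let the failure case through, because the encoded errno is a non-NULL pointer value.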
    • F
      drbd: code cleanups without semantic changes · 7e5fec31
      Fabian Frederick committed
      This contains various cosmetic fixes ranging from simple typos to
      const-ifying, and using booleans properly.
      
      Original commit messages from Fabian's patch set:
      drbd: debugfs: constify drbd_version_fops
      drbd: use seq_put instead of seq_print where possible
      drbd: include linux/uaccess.h instead of asm/uaccess.h
      drbd: use const char * const for drbd strings
      drbd: kerneldoc warning fix in w_e_end_data_req()
      drbd: use unsigned for one bit fields
      drbd: use bool for peer is_ states
      drbd: fix typo
      drbd: use | for bitmask combination
      drbd: use true/false for bool
      drbd: fix drbd_bm_init() comments
      drbd: introduce peer state union
      drbd: fix maybe_pull_ahead() locking comments
      drbd: use bool for growing
      drbd: remove redundant declarations
      drbd: replace if/BUG by BUG_ON
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7e5fec31
    • L
      drbd: sync_handshake: handle identical uuids with current (frozen) Primary · f2d3d75b
      Lars Ellenberg committed
      If in a two-primary scenario, we lost our peer, freeze IO,
      and are still frozen (no UUID rotation) when the peer comes back
      as Secondary after a hard crash, we will see identical UUIDs.
      
      The "rule_nr = 40" chose to use the "CRASHED_PRIMARY" bit as
      arbitration, but that would cause the still running (but frozen) Primary
      to become SyncTarget (which it typically refuses), and the handshake is
      declined.
      
      Fix: check current roles.
      If we have *one* current primary, the Primary wins.
      (rule_nr = 41)
      
      Since that is a protocol change, use the newly introduced DRBD_FF_WSAME
      to determine if rule_nr = 41 can be applied.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f2d3d75b
    • L
      drbd: introduce WRITE_SAME support · 9104d31a
      Lars Ellenberg committed
      We will support WRITE_SAME, if
       * all peers support WRITE_SAME (both in kernel and DRBD version),
       * all peer devices support WRITE_SAME
       * logical_block_size is identical on all peers.
      
      We may at some point introduce a fallback on the receiving side
      for devices/kernels that do not support WRITE_SAME,
      by open-coding a submit loop. But not yet.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9104d31a
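The three conditions for enabling WRITE_SAME amount to an "all peers qualify" check. A minimal sketch follows, with an invented `demo_peer` structure standing in for whatever per-peer state the driver actually keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer summary, invented for this sketch. */
struct demo_peer {
	bool protocol_has_wsame;	/* peer kernel and DRBD version support it */
	bool backend_has_wsame;		/* peer's backing device supports it */
	unsigned logical_block_size;
};

/* WRITE_SAME is only usable if every peer passes all three checks. */
bool demo_can_write_same(unsigned my_logical_block_size,
			 const struct demo_peer *peers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!peers[i].protocol_has_wsame || !peers[i].backend_has_wsame)
			return false;
		if (peers[i].logical_block_size != my_logical_block_size)
			return false;
	}
	return true;
}
```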
    • L
    • L
      drbd: when receiving P_TRIM, zero-out partial unaligned chunks · dd4f699d
      Lars Ellenberg committed
      We can avoid spurious data divergence caused by partially-ignored
      discards on certain backends with discard_zeroes_data=0, if we
      translate partial unaligned discard requests into explicit zero-out.
      
      The relevant use case is LVM/DM thin.
      
      If on different nodes, DRBD is backed by devices with differing
      discard characteristics, discards may lead to data divergence
      (old data or garbage left over on one backend, zeroes due to
      unmapped areas on the other backend). Online verify would now
      potentially report tons of spurious differences.
      
      While probably harmless for most use cases (fstrim on a file system),
      DRBD cannot have that, it would violate our promise to upper layers
      that our data instances on the nodes are identical.
      
      To be correct and play safe (make sure data is identical on both copies),
      we would have to disable discard support, if our local backend (on a
      Primary) does not support "discard_zeroes_data=true".
      
      We'd also have to translate discards to explicit zero-out on the
      receiving (typically: Secondary) side, unless the receiving side
      supports "discard_zeroes_data=true".
      
      Which both would allocate those blocks, instead of unmapping them,
      in contrast with expectations.
      
      LVM/DM thin does set discard_zeroes_data=0,
      because it silently ignores discards to partial chunks.
      
      We can work around this by checking the alignment first.
      For unaligned (wrt. alignment and granularity) or too small discards,
      we zero-out the initial (and/or) trailing unaligned partial chunks,
      but discard all the aligned full chunks.
      
      At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
      
      Arguably it should behave this way internally, by default,
      and we'll try to make that happen.
      
      But our workaround is still valid for already deployed setups,
      and for other devices that may behave this way.
      
      Setting discard-zeroes-if-aligned=yes will allow DRBD to use
      discards, and to announce discard_zeroes_data=true, even on
      backends that announce discard_zeroes_data=false.
      
      Setting discard-zeroes-if-aligned=no will cause DRBD to always
      fall-back to zero-out on the receiving side, and to not even
      announce discard capabilities on the Primary, if the respective
      backend announces discard_zeroes_data=false.
      
      We used to ignore the discard_zeroes_data setting completely.
      To not break established and expected behaviour, and suddenly
      cause fstrim on thin-provisioned LVs to run out-of-space,
      instead of freeing up space, the default value is "yes".
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      dd4f699d
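The workaround described above, zeroing the unaligned head and tail and discarding only the full aligned chunks, comes down to a little arithmetic. The sketch below shows one way to compute the split. It is not the driver's function: the names and the structure are invented, all quantities are in sectors, and `alignment` is assumed to be smaller than `granularity`.

```c
#include <assert.h>
#include <stdint.h>

struct demo_split {
	uint64_t head_zero;	/* sectors to zero-out before the aligned part */
	uint64_t discard_start;	/* first sector of the aligned part */
	uint64_t discard_len;	/* whole chunks that can really be discarded */
	uint64_t tail_zero;	/* sectors to zero-out after the aligned part */
};

struct demo_split demo_split_discard(uint64_t start, uint64_t len,
				     uint64_t granularity, uint64_t alignment)
{
	/* Default: too small or unaligned throughout, so zero everything. */
	struct demo_split s = { .head_zero = len };
	uint64_t offset, head, rest;

	assert(granularity > 0 && alignment < granularity);

	/* Distance of start past the previous chunk boundary. */
	offset = (start % granularity + granularity - alignment) % granularity;
	head = offset ? granularity - offset : 0;
	if (head >= len)
		return s;

	rest = len - head;
	if (rest < granularity)
		return s;	/* not even one full chunk left */

	s.head_zero = head;
	s.discard_start = start + head;
	s.discard_len = rest - rest % granularity;
	s.tail_zero = rest % granularity;
	return s;
}
```

For a request starting at sector 10 with length 100 and a granularity of 16 (alignment 0), this zeroes 6 sectors, discards 80 sectors starting at sector 16, and zeroes the trailing 14.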
    • L
      drbd: allow parallel flushes for multi-volume resources · f9ff0da5
      Lars Ellenberg committed
      To maintain write-order fidelity across all volumes in a DRBD resource,
      the receiver of a P_BARRIER needs to issue flushes to all volumes.
      We used to do this by calling blkdev_issue_flush(), synchronously,
      one volume at a time.
      
      We now submit all flushes to all volumes in parallel, then wait for all
      completions, to reduce worst-case latencies on multi-volume resources.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f9ff0da5
    • P
      drbd: Create the protocol feature THIN_RESYNC · 92d94ae6
      Philipp Reisner committed
      If thinly provisioned volumes are used, during a resync the sync source
      tries to find out if a block is deallocated. If it is deallocated, then
      the resync target uses blkdev_issue_zeroout() on the range in
      question.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      92d94ae6