1. 10 April 2019 (1 commit)
    • block: Mark expected switch fall-throughs · e16fb3a8
      Authored by Gustavo A. R. Silva
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_receiver.c:3093:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_receiver.c:3120:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      drivers/block/drbd/drbd_req.c:856:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Acked-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      e16fb3a8
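      A minimal, hypothetical sketch of the annotation style this commit adds (GCC's
      -Wimplicit-fallthrough=3 accepts comments such as "/* fall through */" as an
      intentional-fallthrough marker; the enum and function below are invented, not DRBD code):

        /* Hypothetical example; not taken from the DRBD sources. */
        enum cmd { CMD_PREPARE, CMD_RUN };

        static int dispatch(enum cmd c, int *prepared, int *ran)
        {
                switch (c) {
                case CMD_PREPARE:
                        *prepared = 1;
                        /* fall through */
                case CMD_RUN:
                        *ran = 1;
                        return 0;
                default:
                        return -1;
                }
        }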
  2. 21 December 2018 (16 commits)
    • drbd: Change drbd_request_detach_interruptible's return type to int · 5816a093
      Authored by Nathan Chancellor
      Clang warns when an implicit conversion is done between enumerated
      types:
      
      drivers/block/drbd/drbd_state.c:708:8: warning: implicit conversion from
      enumeration type 'enum drbd_ret_code' to different enumeration type
      'enum drbd_state_rv' [-Wenum-conversion]
                      rv = ERR_INTR;
                         ~ ^~~~~~~~
      
      drbd_request_detach_interruptible's only call site is in the return
      statement of adm_detach, which returns an int. Change the return type of
      drbd_request_detach_interruptible to match, silencing Clang's warning.
      Reported-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5816a093
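      A hypothetical illustration of the warning class being silenced (names invented;
      the point is that returning plain int, like the only caller does, avoids mixing
      two enum types):

        /* Invented names.  Assigning a value of one enum type to a variable of
         * another enum type triggers clang's -Wenum-conversion. */
        enum err_code { E_INTERRUPTED = 10 };
        enum state_rv { SV_SUCCESS = 1 };

        static int detach_interruptible(int interrupted)
        {
                /* before: enum state_rv rv = E_INTERRUPTED;  -> -Wenum-conversion */
                int rv = interrupted ? E_INTERRUPTED : SV_SUCCESS;
                return rv;
        }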
    • drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") · f31e583a
      Authored by Lars Ellenberg
      And also re-enable partial-zero-out + discard aligned.
      
      With the introduction of REQ_OP_WRITE_ZEROES,
      we started to use that for both WRITE_ZEROES and DISCARDS,
      hoping that WRITE_ZEROES would "do what we want",
      UNMAP if possible, zero-out the rest.
      
      The example scenario is some LVM "thin" backend.
      
      While an un-allocated block on dm-thin reads as zeroes, on a dm-thin
      with "skip_block_zeroing=true", after a partial block write allocated
      that block, that same block may well map "undefined old garbage" from
      the backends on LBAs that have not yet been written to.
      
      If we cannot distinguish between zero-out and discard on the receiving
      side, to avoid "undefined old garbage" popping up randomly at later times
      on supposedly zero-initialized blocks, we'd need to map all discards to
      zero-out on the receiving side.  But that would potentially do a full
      alloc on thinly provisioned backends, even when the expectation was to
      unmap/trim/discard/de-allocate.
      
      We need to distinguish on the protocol level, whether we need to guarantee
      zeroes (and thus use zero-out, potentially doing the mentioned full-alloc),
      or if we want to put the emphasis on discard, and only do a "best effort
      zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing
      only potential unaligned head and tail clippings to at least *try* to
      avoid "false positives" in an online-verify later), hoping that someone
      set skip_block_zeroing=false.
      
      For some discussion regarding this on dm-devel, see also
      https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html
      https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html
      
      For backward compatibility, P_TRIM means zero-out, unless the
      DRBD_FF_WZEROES feature flag is agreed upon during handshake.
      
      To have upper layers even try to submit WRITE ZEROES requests,
      we need to announce "efficient zeroout" independently.
      
      We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits():
      if we can handle "zeroes" efficiently on the protocol,
      we want to do that, even if our backend does not announce
      max_write_zeroes_sectors itself.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f31e583a
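      A simplified, hypothetical sketch of the receiving-side distinction described
      above (this is not the actual DRBD receive path; blkdev_issue_zeroout() and
      blkdev_issue_discard() are the generic block-layer helpers):

        #include <linux/blkdev.h>

        /* Hypothetical dispatcher: once discard and zero-out are separate on
         * the wire, the receiver can honour the sender's intent. */
        enum wire_req { WIRE_ZEROES, WIRE_DISCARD };

        static int submit_peer_request(struct block_device *bdev, sector_t sector,
                                       sector_t nr_sectors, enum wire_req what)
        {
                if (what == WIRE_ZEROES)
                        /* zeroes are guaranteed; may fully allocate a thin backend */
                        return blkdev_issue_zeroout(bdev, sector, nr_sectors,
                                                    GFP_NOIO, 0);

                /* emphasis on de-allocation; zeroing is only best effort */
                return blkdev_issue_discard(bdev, sector, nr_sectors, GFP_NOIO, 0);
        }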
    • drbd: skip spurious timeout (ping-timeo) when failing promote · 9848b6dd
      Authored by Lars Ellenberg
      If you try to promote a Secondary while connected to a Primary
      and allow-two-primaries is NOT set, we will wait for "ping-timeout"
      to give this node a chance to detect a dead primary,
      in case the cluster manager noticed faster than we did.
      
      But if we then are *still* connected to a Primary,
      we fail (after an additional timeout of ping-timeout).
      
      This change skips the spurious second timeout.
      
      Most people won't notice really,
      since "ping-timeout" by default is half a second.
      
      But in some installations, ping-timeout may be 10 or 20 seconds or more,
      and spuriously delaying the error return becomes annoying.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9848b6dd
    • drbd: don't retry connection if peers do not agree on "authentication" settings · 9049ccd4
      Authored by Lars Ellenberg
      emma: "Unexpected data packet AuthChallenge (0x0010)"
       ava: "expected AuthChallenge packet, received: ReportProtocol (0x000b)"
            "Authentication of peer failed, trying again."
      
      Pattern repeats.
      
      There is no point in retrying the handshake,
      if we expect to receive an AuthChallenge,
      but the peer is not even configured to expect or use a shared secret.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9049ccd4
    • drbd: fix print_st_err()'s prototype to match the definition · 2c38f035
      Authored by Luc Van Oostenryck
      print_st_err() is defined with its 4th argument taking an
      'enum drbd_state_rv', but its prototype uses an int for it.
      
      Fix this by using 'enum drbd_state_rv' in the prototype too.
      Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2c38f035
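      A tiny, invented example of the same bug class (a prototype and its definition
      disagreeing on an enum parameter), and the fix of making both use the enum:

        /* Invented names; illustrates the declaration/definition mismatch only. */
        enum status { ST_OK, ST_FAIL };

        /* header -- was: void report_status(const char *what, int err); */
        void report_status(const char *what, enum status err);

        /* source file */
        void report_status(const char *what, enum status err)
        {
                pr_warn("%s: status %d\n", what, err);
        }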
    • drbd: avoid spurious self-outdating with concurrent disconnect / down · be80ff88
      Authored by Lars Ellenberg
      If peers are "simultaneously" told to disconnect from each other,
      either explicitly, or implicitly by taking down the resource,
      with bad timing, one side may see its disconnect "fail" with
      a result of "state change failed by peer", and interpret this as
      "please oudate yourself".
      
      Try to catch this by checking for current connection status,
      and possibly retry as a local-only state change instead.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      be80ff88
    • drbd: do not block when adjusting "disk-options" while IO is frozen · f708bd08
      Authored by Lars Ellenberg
      "suspending" IO is overloaded.
      It can mean "do not allow new requests" (obviously),
      but it also may mean "must not complete pending IO",
      for example while the fencing handlers do their arbitration.
      
      When adjusting disk options, we suspend io (disallow new requests), then
      wait for the activity-log to become unused (drain all IO completions),
      and possibly replace it with a new activity log of different size.
      
      If the other "suspend IO" aspect is active, pending IO completions won't
      happen, and we would block forever (unkillable drbdsetup process).
      
      Fix this by skipping the activity log adjustment if the "al-extents"
      setting did not change. Also, in case it did change, fail early without
      blocking if it looks like we would block forever.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f708bd08
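      A rough sketch of the two-part fix described above, with invented names
      (skip the work when the setting is unchanged; otherwise fail early rather
      than wait for completions that cannot arrive):

        /* Invented names; kernel-style sketch only. */
        struct al_ctx {
                unsigned int extents;           /* current "al-extents" */
                bool completions_suspended;     /* the other "suspend IO" aspect */
        };

        static int maybe_resize_activity_log(struct al_ctx *c, unsigned int new_extents)
        {
                if (new_extents == c->extents)
                        return 0;               /* unchanged: nothing to drain, never blocks */

                if (c->completions_suspended)
                        return -EBUSY;          /* would block forever: fail early instead */

                c->extents = new_extents;       /* stands in for the real AL replacement */
                return 0;
        }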
    • drbd: fix comment typos · a2823ea9
      Authored by Lars Ellenberg
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a2823ea9
    • drbd: reject attach of unsuitable uuids even if connected · fe43ed97
      Authored by Lars Ellenberg
      Multiple failure scenario:
      a) all good
         Connected Primary/Secondary UpToDate/UpToDate
      b) lose disk on Primary,
         Connected Primary/Secondary Diskless/UpToDate
      c) continue to write to the device,
         changes only make it to the Secondary storage.
      d) lose disk on Secondary,
         Connected Primary/Secondary Diskless/Diskless
      e) now try to re-attach on Primary
      
      This would have succeeded before, even though that is clearly the
      wrong data set to attach to (missing the modifications from c).
      Because we only compared our "effective" and the "to-be-attached"
      data generation uuid tags if (device->state.conn < C_CONNECTED).
      
      Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE),
      compare the uuids, and reject the attach.
      
      This patch also tries to improve the reverse scenario:
      first lose Secondary, then Primary disk,
      then try to attach the disk on Secondary.
      
      Before this patch, the attach on the Secondary succeeds, but since commit
      "drbd: disconnect, if the wrong UUIDs are attached on a connected peer"
      the Primary will notice unsuitable data, and drop the connection hard.
      
      Though unfortunately at a point in time during the handshake where
      we cannot easily abort the attach on the peer without more
      refactoring of the handshake.
      
      We now reject any attach to "unsuitable" uuids,
      as long as we can see a Primary role,
      unless we already have access to "good" data.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fe43ed97
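      A simplified sketch of the changed constraint (the helper is invented;
      C_CONNECTED / D_UP_TO_DATE are the existing DRBD state names quoted above):

        /* Sketch only: compare data generation UUIDs whenever the peer's disk is
         * not known UpToDate, instead of only while not yet connected. */
        static bool attach_uses_stale_data(enum drbd_disk_state peer_disk,
                                           u64 effective_uuid, u64 to_be_attached_uuid)
        {
                /* before: the check was gated on (conn < C_CONNECTED) */
                return peer_disk != D_UP_TO_DATE &&
                       effective_uuid != to_be_attached_uuid;
        }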
    • drbd: attach on connected diskless peer must not shrink a consistent device · ad6e8979
      Authored by Lars Ellenberg
      If we would reject a new handshake because the peer had attached first
      and then connected, we should also force a disconnect if the peer first
      connects, and only then attaches.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ad6e8979
    • drbd: fix confusing error message during attach · 4ef2a4f4
      Authored by Lars Ellenberg
      If we attach a (consistent) backing device,
      which knows about a last-agreed effective size,
      and that effective size is *larger* than the currently requested size,
      we refused to attach with ERR_DISK_TOO_SMALL
        Failure: (111) Low.dev. smaller than requested DRBD-dev. size.
      which is confusing to say the least.
      
      This patch changes the error code in that case to ERR_IMPLICIT_SHRINK
        Failure: (170) Implicit device shrinking not allowed. See kernel log.
        additional info from kernel:
        To-be-attached device has last effective > current size, and is consistent
        (9999 > 7777 sectors). Refusing to attach.
      
      It also allows attaching with an explicit size.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4ef2a4f4
    • drbd: disconnect, if the wrong UUIDs are attached on a connected peer · b17b5960
      Authored by Lars Ellenberg
      With "on-no-data-accessible suspend-io", DRBD requires the next attach
      or connect to be to the very same data generation uuid tag it lost last.
      
      If we first lost connection to the peer,
      then later lost connection to our own disk,
      we would usually refuse to re-connect to the peer,
      because it presents the wrong data set.
      
      However, if the peer first connected without a disk,
      and then attached its disk, we accepted that same wrong data set,
      which would be "unexpected" by any user of that DRBD
      and cause "undefined results" (read: very likely data corruption).
      
      The fix is to forcefully disconnect as soon as we notice that the peer
      attached to the "wrong" dataset.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b17b5960
    • drbd: ignore "all zero" peer volume sizes in handshake · 94c43a13
      Authored by Lars Ellenberg
      During handshake, if we are diskless ourselves, we used to accept any size
      presented by the peer.
      
      Which could be zero if that peer was just brought up and connected
      to us without having a disk attached first, in which case both
      peers would just "flip" their volume sizes.
      
      Now, even a diskless node will ignore "zero" sizes
      presented by a diskless peer.
      
      Also a currently Diskless Primary will refuse to shrink during handshake:
      it may be frozen, and waiting for a "suitable" local disk or peer to
      re-appear (on-no-data-accessible suspend-io). If the peer is smaller
      than what we used to be, it is not suitable.
      
      The logic for a diskless node during handshake is now supposed to be:
      believe the peer, if
       - I don't have a current size myself
       - we agree on the size anyways
       - I do have a current size, am Secondary, and he has the only disk
       - I do have a current size, am Primary, and he has the only disk,
         which is larger than my current size
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      94c43a13
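      The handshake decision listed above, rewritten as a hypothetical predicate
      (invented names, simplified to plain sizes and booleans):

        /* Sketch only: should a diskless node adopt the size the peer presents? */
        static bool accept_peer_size(u64 my_size, u64 peer_size,
                                     bool i_am_primary, bool peer_has_only_disk)
        {
                if (peer_size == 0)
                        return false;           /* ignore "all zero" peer sizes */
                if (my_size == 0)
                        return true;            /* no current size of my own */
                if (my_size == peer_size)
                        return true;            /* we agree anyways */
                if (!peer_has_only_disk)
                        return false;
                if (!i_am_primary)
                        return true;            /* Secondary: believe the only disk */
                return peer_size > my_size;     /* Primary: never shrink while frozen */
        }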
    • drbd: centralize printk reporting of new size into drbd_set_my_capacity() · d5412e8d
      Authored by Lars Ellenberg
      Previously, some implicit resizes that happened during the handshake
      have not been reported as prominently as explicit resizes.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d5412e8d
    • drbd: narrow rcu_read_lock in drbd_sync_handshake · d29e89e3
      Authored by Roland Kammerer
      So far there was the possibility that we called
      genlmsg_new(GFP_NOIO)/mutex_lock() while holding an rcu_read_lock().
      
      This included cases like:
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            drbd_bcast_event
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              mutex_lock --> may sleep
      
      While using GFP_ATOMIC would have been possible in the first two cases,
      the real fix is to narrow the rcu_read_lock.
      Reported-by: Jia-Ju Bai <baijiaju1990@163.com>
      Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d29e89e3
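      A generic sketch of the fix pattern (types and the notify helper are invented):
      copy what is needed under the RCU read lock, drop the lock, and only then call
      anything that may sleep:

        #include <linux/rcupdate.h>
        #include <linux/string.h>

        /* Invented types; sketch of the locking pattern only. */
        struct peer_conf { char helper_name[32]; };
        struct peer_ctx  { struct peer_conf __rcu *conf; };

        void notify_userspace(const char *helper);      /* may sleep */

        static void run_helper_event(struct peer_ctx *p)
        {
                char helper[32];

                rcu_read_lock();
                strscpy(helper, rcu_dereference(p->conf)->helper_name, sizeof(helper));
                rcu_read_unlock();

                /* safe to sleep from here on (GFP_NOIO allocation, mutex_lock(), ...) */
                notify_userspace(helper);
        }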
  3. 20 November 2018 (1 commit)
  4. 16 November 2018 (2 commits)
  5. 24 October 2018 (1 commit)
    • iov_iter: Separate type from direction and use accessor functions · aa563d7b
      Authored by David Howells
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them rather
      than chains of bitwise-AND statements.  This makes it easier to add further
      iterator types.  Also, this can be more efficient: to implement a switch over
      small contiguous integers, the compiler can use ~50% fewer compare
      instructions than it needs for chains of bitwise-AND instructions.
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: David Howells <dhowells@redhat.com>
      aa563d7b
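      A small sketch of the accessor/switch style this series moves to
      (iov_iter_type() and the ITER_* constants are the accessors introduced by the
      series; the classifying helper itself is invented):

        #include <linux/uio.h>

        /* Invented helper: classify an iterator via the accessor instead of
         * AND-ing bits out of the type field by hand. */
        static const char *iter_kind(const struct iov_iter *i)
        {
                switch (iov_iter_type(i)) {
                case ITER_IOVEC:        return "user iovec";
                case ITER_KVEC:         return "kernel kvec";
                case ITER_BVEC:         return "bio_vec";
                case ITER_PIPE:         return "pipe";
                default:                return "other";
                }
        }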
  6. 11 October 2018 (1 commit)
    • drivers/block: remove redundant 'default n' from Kconfig-s · 486c6fba
      Authored by Bartlomiej Zolnierkiewicz
      'default n' is the default value for any bool or tristate Kconfig
      setting so there is no need to write it explicitly.
      
      Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO
      is not set' for visible symbols") the Kconfig behavior is the same
      regardless of 'default n' being present or not:
      
          ...
          One side effect of (and the main motivation for) this change is making
          the following two definitions behave exactly the same:
      
              config FOO
                      bool
      
              config FOO
                      bool
                      default n
      
          With this change, neither of these will generate a
          '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
          That might make it clearer to people that a bare 'default n' is
          redundant.
          ...
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      486c6fba
  7. 04 October 2018 (1 commit)
  8. 07 September 2018 (1 commit)
  9. 09 August 2018 (1 commit)
  10. 18 July 2018 (2 commits)
    • block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Authored by Michael Callahan
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should get updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function.  They are now
      indexed by op_is_write().
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ddcf35d3
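      A rough sketch of the new indexing (op_stat_group() and bio_op() are the helpers
      named above; the wrapper is invented, and the surrounding per-cpu accounting
      macros are elided because their exact signatures changed across kernel versions):

        /* Invented wrapper: the stat group now comes from the operation,
         * not from the read/write data direction bit. */
        static int bio_stat_group(struct bio *bio)
        {
                /* STAT_READ or STAT_WRITE in this series; used to index the
                 * per-partition ios[] / sectors[] stat fields */
                return op_stat_group(bio_op(bio));
        }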
    • block: Add part_stat_read_accum to read across field entries. · 59767fbd
      Authored by Michael Callahan
      Add a part_stat_read_accum macro to genhd.h to read and sum across
      field entries, for example to sum up the number of read and write
      sectors completed.  In addition to being a reasonable cleanup by
      itself, this will make it easier to add new stat fields in the future.
      
      tj: Refreshed on top of v4.17.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      59767fbd
  11. 09 July 2018 (2 commits)
  12. 02 July 2018 (1 commit)
    • drbd: fix access after free · 64dafbc9
      Authored by Lars Ellenberg
      We have
        struct drbd_requests { ... struct bio *private_bio;  ... }
      to hold a bio clone for local submission.
      
      On local IO completion, we put that bio, and in case we want to use the
      result later, we overload that member to hold the ERR_PTR() of the
      completion result, which, before v4.3, used to be the passed-in
      "int error", so we could first bio_put(), then assign.
      
      v4.3-rc1~100^2~21 4246a0b6 block: add a bi_error field to struct bio
      changed that:
        	bio_put(req->private_bio);
       -	req->private_bio = ERR_PTR(error);
       +	req->private_bio = ERR_PTR(bio->bi_error);
      
      This introduces an access after free,
      because it was not obvious that req->private_bio == bio.
      
      The impact of that was mostly unnoticeable, because we only use that value
      in a multiple-failure case, and even then map any "unexpected" error
      code to EIO, so worst case we could potentially mask a more specific
      error with EIO in a multiple failure case.
      
      Unless the pointed to memory region was unmapped, as is the case with
      CONFIG_DEBUG_PAGEALLOC, in which case this results in
      
        BUG: unable to handle kernel paging request
      
      v4.13-rc1~70^2~75 4e4cbee9 block: switch bios to blk_status_t
      changes it further to
        	bio_put(req->private_bio);
        	req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));
      
      And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected
      values, which catches this "sometimes", if the memory has been reused
      quickly enough for other things.
      
      Should also go into stable since 4.3, with the trivial change around 4.13.
      
      Cc: stable@vger.kernel.org
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Reported-by: Sarah Newman <srn@prgmr.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      64dafbc9
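      A simplified sketch of the fix pattern (the request type is invented): read
      everything needed from the bio before dropping the last reference, since
      private_bio is that very bio:

        #include <linux/blkdev.h>
        #include <linux/err.h>

        /* Sketch only; 'struct my_request' is invented. */
        struct my_request { struct bio *private_bio; };

        static void complete_local_bio(struct my_request *req, struct bio *bio)
        {
                blk_status_t status = bio->bi_status;   /* read first ... */

                bio_put(bio);                           /* ... then drop the reference */
                req->private_bio = ERR_PTR(blk_status_to_errno(status));
        }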
  13. 29 June 2018 (1 commit)
  14. 13 June 2018 (1 commit)
    • treewide: kzalloc() -> kcalloc() · 6396bb22
      Authored by Kees Cook
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6396bb22
  15. 31 May 2018 (1 commit)
  16. 25 May 2018 (1 commit)
  17. 16 May 2018 (1 commit)
  18. 14 May 2018 (1 commit)
  19. 09 March 2018 (1 commit)
  20. 01 March 2018 (1 commit)
    • block: Fix a race between the cgroup code and request queue initialization · 498f6650
      Authored by Bart Van Assche
      Initialize the request queue lock earlier such that the following
      race can no longer occur:
      
      blk_init_queue_node()             blkcg_print_blkgs()
        blk_alloc_queue_node (1)
          q->queue_lock = &q->__queue_lock (2)
          blkcg_init_queue(q) (3)
                                          spin_lock_irq(blkg->q->queue_lock) (4)
        q->queue_lock = lock (5)
                                          spin_unlock_irq(blkg->q->queue_lock) (6)
      
      (1) allocate an uninitialized queue;
      (2) initialize queue_lock to its default internal lock;
      (3) initialize blkcg part of request queue, which will create blkg and
          then insert it to blkg_list;
      (4) traverse blkg_list and find the created blkg, and then take its
          queue lock, here it is the default *internal lock*;
      (5) *race window*, now queue_lock is overridden with *driver specified
          lock*;
      (6) now unlock *driver specified lock*, not the locked *internal lock*,
          unlock balance breaks.
      
      The changes in this patch are as follows:
      - Move the .queue_lock initialization from blk_init_queue_node() into
        blk_alloc_queue_node().
      - Only override the .queue_lock pointer for legacy queues because it
        is not useful for blk-mq queues to override this pointer.
      - For all block drivers that initialize .queue_lock explicitly,
        change the blk_alloc_queue() call in the driver into a
        blk_alloc_queue_node() call and remove the explicit .queue_lock
        initialization. Additionally, initialize the spin lock that will
        be used as queue lock earlier if necessary.
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      498f6650
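      A simplified, hypothetical model of the fix idea (not the actual block-layer
      code): point queue_lock at its final lock before the queue becomes visible to
      the cgroup code, so nobody can lock one spinlock and unlock another:

        #include <linux/spinlock.h>

        /* Invented stand-ins for request_queue / blkcg_init_queue(). */
        struct queue_stub {
                spinlock_t  internal_lock;
                spinlock_t *queue_lock;
        };

        int publish_queue(struct queue_stub *q);        /* makes q reachable, e.g. via blkg_list */

        static int alloc_queue_stub(struct queue_stub *q, spinlock_t *driver_lock)
        {
                spin_lock_init(&q->internal_lock);
                /* choose the final lock *before* anyone else can find the queue */
                q->queue_lock = driver_lock ? driver_lock : &q->internal_lock;

                return publish_queue(q);
        }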
  21. 07 January 2018 (1 commit)
  22. 03 December 2017 (1 commit)