1. 27 12月, 2019 4 次提交
    • Y
      md/raid1: fail run raid1 array when active disk less than one · ca48081e
      Yufen Yu 提交于
      [ Upstream commit 07f1a6850c5d5a65c917c3165692b5179ac4cb6b ]
      
      When run test case:
        mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal
        mdadm -S /dev/md1
        mdadm -A /dev/md1 /dev/sd[b-c] --run --force
      
        mdadm --zero /dev/sda
        mdadm /dev/md1 -a /dev/sda
      
        echo offline > /sys/block/sdc/device/state
        echo offline > /sys/block/sdb/device/state
        sleep 5
        mdadm -S /dev/md1
      
        echo running > /sys/block/sdb/device/state
        echo running > /sys/block/sdc/device/state
        mdadm -A /dev/md1 /dev/sd[a-c] --run --force
      
      mdadm run fail with kernel message as follow:
      [  172.986064] md: kicking non-fresh sdb from array!
      [  173.004210] md: kicking non-fresh sdc from array!
      [  173.022383] md/raid1:md1: active with 0 out of 4 mirrors
      [  173.022406] md1: failed to create bitmap (-5)
      
      In fact, when active disk in raid1 array less than one, we
      need to return fail in raid1_run().
      Reviewed-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ca48081e
    • Y
      md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev · 10783f79
      Yufen Yu 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 18683
      CVE: NA
      ---------------------------
      
      We get a NULL pointer dereference oops when test raid1 as follow:
      
      mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab]
      
      mdadm /dev/md1 -f /dev/sda
      mdadm /dev/md1 -r /dev/sda
      mdadm /dev/md1 -a /dev/sda
      sleep 5
      mdadm /dev/md1 -f /dev/sdb
      mdadm /dev/md1 -r /dev/sdb
      mdadm /dev/md1 -a /dev/sdb
      
      After a disk(/dev/sda) has been removed, we add the disk to
      raid array again, which would trigger recovery action.
      Since the rdev current state is 'spare', read/write bio can
      be issued to the disk.
      
      Then we set the other disk (/dev/sdb) faulty. Since the raid
      array is now in degraded state and /dev/sdb is the only
      'In_sync' disk, raid1_error() will return but without set
      faulty success.
      
      However, that can interrupt the recovery action and
      md_check_recovery will try to call remove_and_add_spares()
      to remove the spare disk. And the race condition between
      remove_and_add_spares() and raid1_write_request() in follow
      can cause NULL pointer dereference for conf->mirrors[i].rdev:
      
      raid1_write_request()   md_check_recovery    raid1_error()
      rcu_read_lock()
      rdev != NULL
      !test_bit(Faulty, &rdev->flags)
      
                                                 conf->recovery_disabled=
                                                   mddev->recovery_disabled;
                                                  return busy
      
                              remove_and_add_spares
                              raid1_remove_disk
                              rdev->nr_pending == 0
      
      atomic_inc(&rdev->nr_pending);
      rcu_read_unlock()
      
                              p->rdev=NULL
      
      conf->mirrors[i].rdev->data_offset
      NULL pointer deref!!!
      
                              if (!test_bit(RemoveSynchronized,
                                &rdev->flags))
                               synchronize_rcu();
                               p->rdev=rdev
      
      To fix the race condition, we add a new flag 'WantRemove' for rdev.
      Before access conf->mirrors[i].rdev, we need to ensure the rdev
      without 'WantRemove' bit.
      
      Link: https://marc.info/?l=linux-raid&m=156412052717709&w=2Reported-by: NZou Wei <zou_wei@huawei.com>
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      10783f79
    • Y
      md/raid1: end bio when the device faulty · 5cbc7d01
      Yufen Yu 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 18664
      CVE: NA
      ---------------------------
      
      When write bio return error, it would be added to conf->retry_list
      and wait for raid1d thread to retry write and acknowledge badblocks.
      
      In narrow_write_error(), the error bio will be split in the unit of
      badblock shift (such as one sector) and raid1d thread issues them
      one by one. Until all of the splited bio has finished, raid1d thread
      can go on processing other things, which is time consuming.
      
      But, there is a scene for error handling that is not necessary.
      When the device has been set faulty, flush_bio_list() may end
      bios in pending_bio_list with error status. Since these bios
      has not been issued to the device actually, error handlding to
      retry write and acknowledge badblocks make no sense.
      
      Even without that scene, when the device is faulty, badblocks info
      can not be written out to the device. Thus, we also no need to
      handle the error IO.
      
      Link: https://marc.info/?l=linux-raid&m=156351499609153&w=2Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5cbc7d01
    • N
      md/raid1: don't clear bitmap bits on interrupted recovery. · c31def90
      Nate Dailey 提交于
      commit dfcc34c99f3ebc16b787b118763bf9cb6b1efc7a upstream.
      
      sync_request_write no longer submits writes to a Faulty device. This has
      the unfortunate side effect that bitmap bits can be incorrectly cleared
      if a recovery is interrupted (previously, end_sync_write would have
      prevented this). This means the next recovery may not copy everything
      it should, potentially corrupting data.
      
      Add a function for doing the proper md_bitmap_end_sync, called from
      end_sync_write and the Faulty case in sync_request_write.
      
      backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync
      Cc: stable@vger.kernel.org 4.14+
      Fixes: 0c9d5b12 ("md/raid1: avoid reusing a resync bio after error handling.")
      Reviewed-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Tested-by: NJack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: NNate Dailey <nate.dailey@stratus.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c31def90
  2. 14 11月, 2018 1 次提交
  3. 02 8月, 2018 1 次提交
  4. 13 6月, 2018 2 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
    • K
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook 提交于
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6da2ec56
  5. 31 5月, 2018 1 次提交
  6. 18 5月, 2018 1 次提交
  7. 02 5月, 2018 1 次提交
  8. 09 4月, 2018 2 次提交
    • M
    • Y
      md/raid1: exit sync request if MD_RECOVERY_INTR is set · 8c242593
      Yufen Yu 提交于
      We met a sync thread stuck as follows:
      
       raid1_sync_request+0x2c9/0xb50
       md_do_sync+0x983/0xfa0
       md_thread+0x11c/0x160
       kthread+0x111/0x130
       ret_from_fork+0x35/0x40
       0xffffffffffffffff
      
      At the same time, there is a stuck mdadm thread (mdadm --manage
      /dev/md2 --add /dev/sda). It is trying to stop the sync thread:
      
       kthread_stop+0x42/0xf0
       md_unregister_thread+0x3a/0x70
       md_reap_sync_thread+0x15/0x160
       action_store+0x142/0x2a0
       md_attr_store+0x6c/0xb0
       kernfs_fop_write+0x102/0x180
       __vfs_write+0x33/0x170
       vfs_write+0xad/0x1a0
       SyS_write+0x52/0xc0
       do_syscall_64+0x6e/0x190
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Debug tools show that the sync thread is waiting in raise_barrier(),
      until raid1d() end all normal IO bios into bio_end_io_list(introduced
      in commit 55ce74d4). But, raid1d() cannot end these bios if
      MD_CHANGE_PENDING bit is set. It needs to get mddev->reconfig_mutex lock
      and then clear the bit in md_check_recovery().
      However, the lock is holding by mdadm in action_store().
      
      Thus, there is a loop:
      mdadm waiting for sync thread to stop, sync thread waiting for
      raid1d() to end bios, raid1d() waiting for mdadm to release
      mddev->reconfig_mutex lock and then it can end bios.
      
      Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(),
      so that sync thread can exit while mdadm is stoping the sync thread.
      
      Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      8c242593
  9. 09 3月, 2018 1 次提交
  10. 26 2月, 2018 1 次提交
    • Y
      md/raid1: fix NULL pointer dereference · 3de59bb9
      Yufen Yu 提交于
      In handle_write_finished(), if r1_bio->bios[m] != NULL, it thinks
      the corresponding conf->mirrors[m].rdev is also not NULL. But, it
      is not always true.
      
      Even if some io hold replacement rdev(i.e. rdev->nr_pending.count > 0),
      raid1_remove_disk() can also set the rdev as NULL. That means,
      bios[m] != NULL, but mirrors[m].rdev is NULL, resulting in NULL
      pointer dereference in handle_write_finished and sync_request_write.
      
      This patch can fix BUGs as follows:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000140
       IP: [<ffffffff815bbbbd>] raid1d+0x2bd/0xfc0
       PGD 12ab52067 PUD 12f587067 PMD 0
       Oops: 0000 [#1] SMP
       CPU: 1 PID: 2008 Comm: md3_raid1 Not tainted 4.1.44+ #130
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
       Call Trace:
        ? schedule+0x37/0x90
        ? prepare_to_wait_event+0x83/0xf0
        md_thread+0x144/0x150
        ? wake_atomic_t_function+0x70/0x70
        ? md_start_sync+0xf0/0xf0
        kthread+0xd8/0xf0
        ? kthread_worker_fn+0x160/0x160
        ret_from_fork+0x42/0x70
        ? kthread_worker_fn+0x160/0x160
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
       IP: sync_request_write+0x9e/0x980
       PGD 800000007c518067 P4D 800000007c518067 PUD 8002b067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 24 PID: 2549 Comm: md3_raid1 Not tainted 4.15.0+ #118
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
       Call Trace:
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0xc/0xb0
        ? flush_pending_writes+0x3a/0xd0
        ? pick_next_task_fair+0x4d5/0x5f0
        ? __switch_to+0xa2/0x430
        raid1d+0x65a/0x870
        ? find_pers+0x70/0x70
        ? find_pers+0x70/0x70
        ? md_thread+0x11c/0x160
        md_thread+0x11c/0x160
        ? finish_wait+0x80/0x80
        kthread+0x111/0x130
        ? kthread_create_worker_on_cpu+0x70/0x70
        ? do_syscall_64+0x6f/0x190
        ? SyS_exit_group+0x10/0x10
        ret_from_fork+0x35/0x40
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NShaohua Li <sh.li@alibaba-inc.com>
      3de59bb9
  11. 22 2月, 2018 1 次提交
    • I
      treewide/trivial: Remove ';;$' typo noise · ed7158ba
      Ingo Molnar 提交于
      On lkml suggestions were made to split up such trivial typo fixes into per subsystem
      patches:
      
        --- a/arch/x86/boot/compressed/eboot.c
        +++ b/arch/x86/boot/compressed/eboot.c
        @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
                struct efi_uga_draw_protocol *uga = NULL, *first_uga;
                efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
                unsigned long nr_ugas;
        -       u32 *handles = (u32 *)uga_handle;;
        +       u32 *handles = (u32 *)uga_handle;
                efi_status_t status = EFI_INVALID_PARAMETER;
                int i;
      
      This patch is the result of the following script:
      
        $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
      
      ... followed by manual review to make sure it's all good.
      
      Splitting this up is just crazy talk, let's get over with this and just do it.
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed7158ba
  12. 18 2月, 2018 1 次提交
  13. 12 12月, 2017 1 次提交
    • N
      md/raid1,raid10: silence warning about wait-within-wait · 474beb57
      NeilBrown 提交于
      If you prepare_to_wait() after a previous prepare_to_wait(),
      but before calling schedule(), you get warning:
      
        do not call blocking ops when !TASK_RUNNING; state=2
      
      This is appropriate as it is often a bug.  The event that the
      first prepare_to_wait() expects might wake up the schedule following
      the second prepare_to_wait(), which could be confusing.
      
      However if both prepare_to_wait()s are part of simple wait_event()
      loops, and if the inner one is rarely called, then there is
      no problem.  The inner loop is too simple to get confused by
      a stray wakeup, and the outer loop won't spin unduly because the
      inner doesnt affect it often.
      
      This pattern occurs in both raid1.c and raid10.c in the use of
      flush_pending_writes().
      
      The warning can be silenced by setting current->state to TASK_RUNNING.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      474beb57
  14. 02 12月, 2017 1 次提交
  15. 02 11月, 2017 5 次提交
    • G
      raid1: remove obsolete code in raid1_write_request · f81f7302
      Guoqing Jiang 提交于
      There are some lines could be removed due to recent
      change for raid1 such as commit 3956df15d634 ("md:
      move suspend_hi/lo handling into core md code").
      
      Also, seems some comments are put to wrong place,
      move them before wait_barrier.
      Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      f81f7302
    • N
      raid1: prevent freeze_array/wait_all_barriers deadlock · f6eca2d4
      Nate Dailey 提交于
      If freeze_array is attempted in the middle of close_sync/
      wait_all_barriers, deadlock can occur.
      
      freeze_array will wait for nr_pending and nr_queued to line up.
      wait_all_barriers increments nr_pending for each barrier bucket, one
      at a time, but doesn't actually issue IO that could be counted in
      nr_queued. So freeze_array is blocked until wait_all_barriers
      completes and allow_all_barriers runs. At the same time, when
      _wait_barrier sees array_frozen == 1, it stops and waits for
      freeze_array to complete.
      
      Prevent the deadlock by making close_sync call _wait_barrier and
      _allow_barrier for one bucket at a time, instead of deferring the
      _allow_barrier calls until after all _wait_barriers are complete.
      Signed-off-by: NNate Dailey <nate.dailey@stratus.com>
      Fix: fd76863e(RAID1: a new I/O barrier implementation to remove resync window)
      Reviewed-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org (v4.11)
      Signed-off-by: NShaohua Li <shli@fb.com>
      f6eca2d4
    • M
      md: use TASK_IDLE instead of blocking signals · ae89fd3d
      Mikulas Patocka 提交于
      Hi - I submit this patch for the next merge window:
      
      Some times ago, I made a patch f9c79bc0 that blocks signals around the
      schedule() calls in MD. The MD subsystem needs to do an uninterruptible
      sleep that is not accounted in load average - so we block signals and use
      interruptible sleep.
      
      The kernel has a special TASK_IDLE state for this purpose, so we can use
      it instead of blocking signals. This patch doesn't fix any bug, it just
      makes the code simpler.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      ae89fd3d
    • N
      md: remove special meaning of ->quiesce(.., 2) · b03e0ccb
      NeilBrown 提交于
      The '2' argument means "wake up anything that is waiting".
      This is an inelegant part of the design and was added
      to help support management of suspend_lo/suspend_hi setting.
      Now that suspend_lo/hi is managed in mddev_suspend/resume,
      that need is gone.
      These is still a couple of places where we call 'quiesce'
      with an argument of '2', but they can safely be changed to
      call ->quiesce(.., 1); ->quiesce(.., 0) which
      achieve the same result at the small cost of pausing IO
      briefly.
      
      This removes a small "optimization" from suspend_{hi,lo}_store,
      but it isn't clear that optimization served a useful purpose.
      The code now is a lot clearer.
      Suggested-by: NShaohua Li <shli@kernel.org>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      b03e0ccb
    • N
      md: move suspend_hi/lo handling into core md code · b3143b9a
      NeilBrown 提交于
      responding to ->suspend_lo and ->suspend_hi is similar
      to responding to ->suspended.  It is best to wait in
      the common core code without incrementing ->active_io.
      This allows mddev_suspend()/mddev_resume() to work while
      requests are waiting for suspend_lo/hi to change.
      This is will be important after a subsequent patch
      which uses mddev_suspend() to synchronize updating for
      suspend_lo/hi.
      
      So move the code for testing suspend_lo/hi out of raid1.c
      and raid5.c, and place it in md.c
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      b3143b9a
  16. 17 10月, 2017 2 次提交
  17. 28 8月, 2017 1 次提交
  18. 26 8月, 2017 1 次提交
  19. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  20. 22 7月, 2017 5 次提交
  21. 19 6月, 2017 1 次提交
  22. 17 6月, 2017 1 次提交
  23. 14 6月, 2017 2 次提交
    • M
      md: don't use flush_signals in userspace processes · f9c79bc0
      Mikulas Patocka 提交于
      The function flush_signals clears all pending signals for the process. It
      may be used by kernel threads when we need to prepare a kernel thread for
      responding to signals. However using this function for an userspaces
      processes is incorrect - clearing signals without the program expecting it
      can cause misbehavior.
      
      The raid1 and raid5 code uses flush_signals in its request routine because
      it wants to prepare for an interruptible wait. This patch drops
      flush_signals and uses sigprocmask instead to block all signals (including
      SIGKILL) around the schedule() call. The signals are not lost, but the
      schedule() call won't respond to them.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      f9c79bc0
    • N
      md: fix deadlock between mddev_suspend() and md_write_start() · cc27b0c7
      NeilBrown 提交于
      If mddev_suspend() races with md_write_start() we can deadlock
      with mddev_suspend() waiting for the request that is currently
      in md_write_start() to complete the ->make_request() call,
      and md_write_start() waiting for the metadata to be updated
      to mark the array as 'dirty'.
      As metadata updates done by md_check_recovery() only happen then
      the mddev_lock() can be claimed, and as mddev_suspend() is often
      called with the lock held, these threads wait indefinitely for each
      other.
      
      We fix this by having md_write_start() abort if mddev_suspend()
      is happening, and ->make_request() aborts if md_write_start()
      aborted.
      md_make_request() can detect this abort, decrease the ->active_io
      count, and wait for mddev_suspend().
      Reported-by: NNix <nix@esperi.org.uk>
      Fix: 68866e42(MD: no sync IO while suspended)
      Cc: stable@vger.kernel.org
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      cc27b0c7
  24. 09 6月, 2017 1 次提交
  25. 06 6月, 2017 1 次提交