1. 13 12月, 2018 3 次提交
    • C
      bcache: make cutoff_writeback and cutoff_writeback_sync tunable · 9aaf5165
      Coly Li 提交于
      Currently the cutoff writeback and cutoff writeback sync thresholds are
      defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as
      static values. Most of time these they work fine, but when people want
      to do research on bcache writeback mode performance tuning, there is no
      chance to modify the soft and hard cutoff writeback values.
      
      This patch introduces two module parameters bch_cutoff_writeback_sync
      and bch_cutoff_writeback which permit people to tune the values when
      loading bcache.ko. If they are not specified by module loading, current
      values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK will be used as
      default and nothing changes.
      
      When people want to tune this two values,
      - cutoff_writeback can be set in range [1, 70]
      - cutoff_writeback_sync can be set in range [1, 90]
      - cutoff_writeback always <= cutoff_writeback_sync
      
      The default values are strongly recommended to most of users for most of
      workloads. Anyway, if people wants to take their own risk to do research
      on new writeback cutoff tuning for their own workload, now they can make
      it.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9aaf5165
    • C
      bcache: add MODULE_DESCRIPTION information · 009673d0
      Coly Li 提交于
      This patch moves MODULE_AUTHOR and MODULE_LICENSE to end of super.c, and
      add MODULE_DESCRIPTION("Bcache: a Linux block layer cache").
      
      This is preparation for adding module parameters.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      009673d0
    • S
      bcache: do not check if debug dentry is ERR or NULL explicitly on remove · ae171023
      Shenghui Wang 提交于
      debugfs_remove and debugfs_remove_recursive will check if the dentry
      pointer is NULL or ERR, and will do nothing in that case.
      
      Remove the check in cache_set_free and bch_debug_init.
      Signed-off-by: NShenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ae171023
  2. 08 10月, 2018 7 次提交
  3. 27 9月, 2018 1 次提交
    • G
      bcache: add separate workqueue for journal_write to avoid deadlock · 0f843e65
      Guoju Fang 提交于
      After write SSD completed, bcache schedules journal_write work to
      system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM
      flag. system_wq is also a bound wq, and there may be no idle kworker on
      current processor. Creating a new kworker may unfortunately need to
      reclaim memory first, by shrinking cache and slab used by vfs, which
      depends on bcache device. That's a deadlock.
      
      This patch create a new workqueue for journal_write with WQ_MEM_RECLAIM
      flag. It's rescuer thread will work to avoid the deadlock.
      Signed-off-by: NGuoju Fang <fangguoju@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0f843e65
  4. 12 8月, 2018 9 次提交
  5. 09 8月, 2018 3 次提交
    • C
      bcache: set max writeback rate when I/O request is idle · ea8c5356
      Coly Li 提交于
      Commit b1092c9a ("bcache: allow quick writeback when backing idle")
      allows the writeback rate to be faster if there is no I/O request on a
      bcache device. It works well if there is only one bcache device attached
      to the cache set. If there are many bcache devices attached to a cache
      set, it may introduce performance regression because multiple faster
      writeback threads of the idle bcache devices will compete the btree level
      locks with the bcache device who have I/O requests coming.
      
      This patch fixes the above issue by only permitting fast writebac when
      all bcache devices attached on the cache set are idle. And if one of the
      bcache devices has new I/O request coming, minimized all writeback
      throughput immediately and let PI controller __update_writeback_rate()
      to decide the upcoming writeback rate for each bcache device.
      
      Also when all bcache devices are idle, limited wrieback rate to a small
      number is wast of thoughput, especially when backing devices are slower
      non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
      rate for each backing device if the whole cache set is idle. A faster
      writeback rate in idle time means new I/Os may have more available space
      for dirty data, and people may observe a better write performance then.
      
      Please note bcache may change its cache mode in run time, and this patch
      still works if the cache mode is switched from writeback mode and there
      is still dirty data on cache.
      
      Fixes: Commit b1092c9a ("bcache: allow quick writeback when backing idle")
      Cc: stable@vger.kernel.org #4.16+
      Signed-off-by: NColy Li <colyli@suse.de>
      Tested-by: NKai Krakow <kai@kaishome.de>
      Tested-by: NStefan Priebe <s.priebe@profihost.ag>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ea8c5356
    • C
      bcache: add a comment in super.c · e57fd746
      Coly Li 提交于
      This patch adds a line of code comment in super.c:register_bdev(), to
      make code to be more comprehensible.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e57fd746
    • C
      bcache: do not check return value of debugfs_create_dir() · 78ac2107
      Coly Li 提交于
      Greg KH suggests that normal code should not care about debugfs. Therefore
      no matter successful or failed of debugfs_create_dir() execution, it is
      unncessary to check its return value.
      
      There are two functions called debugfs_create_dir() and check the return
      value, which are bch_debug_init() and closure_debug_init(). This patch
      changes these two functions from int to void type, and ignore return values
      of debugfs_create_dir().
      
      This patch does not fix exact bug, just makes things work as they should.
      Signed-off-by: NColy Li <colyli@suse.de>
      Suggested-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Cc: Kai Krakow <kai@kaishome.de>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      78ac2107
  6. 27 7月, 2018 5 次提交
  7. 13 6月, 2018 2 次提交
    • K
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook 提交于
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fad953ce
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  8. 31 5月, 2018 1 次提交
  9. 29 5月, 2018 2 次提交
    • A
      bcache: Move couple of string arrays to sysfs.c · 04cbc211
      Andy Shevchenko 提交于
      There is couple of string arrays that are used exclusively in sysfs.c.
      Move it to there and make them static.
      
      Besides above, it will allow further clean up.
      
      No functional change intended.
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      04cbc211
    • C
      bcache: stop bcache device when backing device is offline · 0f0709e6
      Coly Li 提交于
      Currently bcache does not handle backing device failure, if backing
      device is offline and disconnected from system, its bcache device can still
      be accessible. If the bcache device is in writeback mode, I/O requests even
      can success if the requests hit on cache device. That is to say, when and
      how bcache handles offline backing device is undefined.
      
      This patch tries to handle backing device offline in a rather simple way,
      - Add cached_dev->status_update_thread kernel thread to update backing
        device status in every 1 second.
      - Add cached_dev->offline_seconds to record how many seconds the backing
        device is observed to be offline. If the backing device is offline for
        BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and
        call bcache_device_stop() to stop the bache device which linked to the
        offline backing device.
      
      Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds,
      its bcache device will be removed, then user space application writing on
      it will get error immediately, and handler the device failure in time.
      
      This patch is quite simple, does not handle more complicated situations.
      Once the bcache device is stopped, users need to recovery the backing
      device, register and attach it manually.
      
      Changelog:
      v3: call wait_for_kthread_stop() before exits kernel thread.
      v2: remove "bcache: " prefix when calling pr_warn().
      v1: initial version.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Cc: Michael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0f0709e6
  10. 03 5月, 2018 4 次提交
    • C
      bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set · 09a44ca2
      Coly Li 提交于
      It is possible that multiple I/O requests hits on failed cache device or
      backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
      set already when a task tries to set the bit from bch_cache_set_error().
      Currently the message "CACHE_SET_IO_DISABLE already set" is printed by
      pr_warn(), which might mislead users to think a serious fault happens in
      source code.
      
      This patch uses pr_info() to print the information in such situation,
      avoid extra worries. This information is helpful to understand bcache
      behavior in cache device failures, so I still keep them in source code.
      
      Fixes: 771f393e ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09a44ca2
    • C
      bcache: set dc->io_disable to true in conditional_stop_bcache_device() · 4fd8e138
      Coly Li 提交于
      Commit 7e027ca4 ("bcache: add stop_when_cache_set_failed option to
      backing device") adds stop_when_cache_set_failed option and stops bcache
      device if stop_when_cache_set_failed is auto and there is dirty data on
      broken cache device. There might exists a small time gap that the cache
      set is released and set to NULL but bcache device is not released yet
      (because they are released in parallel). During this time gap, dc->c is
      NULL so CACHE_SET_IO_DISABLE won't be checked, and dc->io_disable is still
      false, so new coming I/O requests will be accepted and directly go into
      backing device as no cache set attached to. If there is dirty data on
      cache device, this behavior may introduce potential inconsistent data.
      
      This patch sets dc->io_disable to true before calling bcache_device_stop()
      to make sure the backing device will reject new coming I/O request as
      well, so even in the small time gap no I/O will directly go into backing
      device to corrupt data consistency.
      
      Fixes: 7e027ca4 ("bcache: add stop_when_cache_set_failed option to backing device")
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4fd8e138
    • C
      bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() · 6147305c
      Coly Li 提交于
      Commit c7b7bd07 ("bcache: add io_disable to struct cached_dev") tries
      to stop bcache device by calling bcache_device_stop() when too many I/O
      errors happened on backing device. But if there is internal I/O happening
      on cache device (writeback scan, garbage collection, etc), a regular I/O
      request triggers the internal I/Os may still holds a refcount of dc->count,
      and the refcount may only be dropped after the internal I/O stopped.
      
      By this patch, bch_cached_dev_error() will check if the backing device is
      attached to a cache set, if yes that CACHE_SET_IO_DISABLE will be set to
      flags of this cache set. Then internal I/Os on cache device will be
      rejected and stopped immediately, and the bcache device can be stopped.
      
      For people who are not familiar with the interesting refcount dependance,
      let me explain a bit more how the fix works. Example the writeback thread
      will scan cache device for dirty data writeback purpose. Before it stopps,
      it holds a refcount of dc->count. When CACHE_SET_IO_DISABLE bit is set,
      the internal I/O will stopped and the while-loop in bch_writeback_thread()
      quits and calls cached_dev_put() to drop dc->count. If this is the last
      refcount to drop, then cached_dev_detach_finish() will be called. In this
      call back function, in turn closure_put(dc->disk.cl) is called to drop a
      refcount of closure dc->disk.cl. If this is the last refcount of this
      closure to drop, then cached_dev_flush() will be called. Then the cached
      device is freed. So if CACHE_SET_IO_DISABLE is not set, the bache device
      can not be stopped until all inernal cache device I/O stopped. For large
      size cache device, and writeback thread competes locks with gc thread,
      there might be a quite long time to wait.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6147305c
    • C
      bcache: store disk name in struct cache and struct cached_dev · 6e916a7e
      Coly Li 提交于
      Current code uses bdevname() or bio_devname() to reference gendisk
      disk name when bcache needs to display the disk names in kernel message.
      It was safe before bcache device failure handling patch set merged in,
      because when devices are failed, there was deadlock to prevent bcache
      printing error messages with gendisk disk name. But after the failure
      handling patch set merged, the deadlock is fixed, so it is possible
      that the gendisk structure bdev->hd_disk is released when bdevname() is
      called to reference bdev->bd_disk->disk_name[]. This is why I receive
      bug report of NULL pointers deference panic.
      
      This patch stores gendisk disk name in a buffer inside struct cache and
      struct cached_dev, then print out the offline device name won't reference
      bdev->hd_disk anymore. And this patch also avoids extra function calls
      of bdevname() and bio_devnmae().
      
      Changelog:
      v3, add Reviewed-by from Hannes.
      v2, call bdevname() earlier in register_bdev()
      v1, first version with segguestion from Junhui Tang.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Fixes: 5138ac67 ("bcache: fix misleading error message in bch_count_io_errors()")
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6e916a7e
  11. 19 3月, 2018 3 次提交
    • B
      bcache: Fix a compiler warning in bcache_device_init() · 5f2b18ec
      Bart Van Assche 提交于
      Avoid that building with W=1 triggers the following compiler warning:
      
      drivers/md/bcache/super.c:776:20: warning: comparison is always false due to limited range of data type [-Wtype-limits]
            d->nr_stripes > SIZE_MAX / sizeof(atomic_t)) {
                          ^
      Reviewed-by: NColy Li <colyli@suse.de>
      Reviewed-by: NMichael Lyle <mlyle@lyle.org>
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5f2b18ec
    • C
      bcache: add io_disable to struct cached_dev · c7b7bd07
      Coly Li 提交于
      If a bcache device is configured to writeback mode, current code does not
      handle write I/O errors on backing devices properly.
      
      In writeback mode, write request is written to cache device, and
      latter being flushed to backing device. If I/O failed when writing from
      cache device to the backing device, bcache code just ignores the error and
      upper layer code is NOT noticed that the backing device is broken.
      
      This patch tries to handle backing device failure like how the cache device
      failure is handled,
      - Add a error counter 'io_errors' and error limit 'error_limit' in struct
        cached_dev. Add another io_disable to struct cached_dev to disable I/Os
        on the problematic backing device.
      - When I/O error happens on backing device, increase io_errors counter. And
        if io_errors reaches error_limit, set cache_dev->io_disable to true, and
        stop the bcache device.
      
      The result is, if backing device is broken of disconnected, and I/O errors
      reach its error limit, backing device will be disabled and the associated
      bcache device will be removed from system.
      
      Changelog:
      v2: remove "bcache: " prefix in pr_error(), and use correct name string to
          print out bcache device gendisk name.
      v1: indeed this is new added in v2 patch set.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NMichael Lyle <mlyle@lyle.org>
      Cc: Michael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c7b7bd07
    • C
      bcache: add backing_request_endio() for bi_end_io · 27a40ab9
      Coly Li 提交于
      In order to catch I/O error of backing device, a separate bi_end_io
      call back is required. Then a per backing device counter can record I/O
      errors number and retire the backing device if the counter reaches a
      per backing device I/O error limit.
      
      This patch adds backing_request_endio() to bcache backing device I/O code
      path, this is a preparation for further complicated backing device failure
      handling. So far there is no real code logic change, I make this change a
      separate patch to make sure it is stable and reliable for further work.
      
      Changelog:
      v2: Fix code comments typo, remove a redundant bch_writeback_add() line
          added in v4 patch set.
      v1: indeed this is new added in this patch set.
      
      [mlyle: truncated commit subject]
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NMichael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      27a40ab9