1. 01 Sep 2016, 1 commit
    • raid5-cache: fix a deadlock in superblock write · 8e018c21
      Committed by Shaohua Li
      There is a potential deadlock in the superblock write path. Discard could zero
      data, so before issuing a discard we must make sure the superblock has been
      updated to the new log tail. Updating the superblock (either by calling
      md_update_sb() directly or by depending on the md thread) must hold the
      reconfig mutex. On the other hand, raid5_quiesce() is called with the
      reconfig_mutex held. The first step of raid5_quiesce() is waiting for all IO to
      finish, hence waiting for the reclaim thread, while the reclaim thread is
      calling this function and waiting for the reconfig mutex. So there is a
      deadlock. We work around this issue with a trylock. The downside of this
      solution is that we could miss a discard if we cannot take the reconfig mutex.
      But this should happen rarely (mainly during raid array stop), so a missed
      discard should not be a big problem.
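
      A minimal sketch of the trylock workaround, assuming field and helper names
      close to the raid5-cache code (this is not the verbatim patch, and the log
      wrap-around handling is omitted):

      	/* Runs in the reclaim thread before discarding reclaimed log space. */
      	static void r5l_write_super_and_discard_space(struct r5l_log *log,
      						      sector_t end)
      	{
      		struct mddev *mddev = log->rdev->mddev;

      		/* Ask for a superblock update recording the new log tail. */
      		set_bit(MD_CHANGE_DEVS, &mddev->flags);
      		set_bit(MD_CHANGE_PENDING, &mddev->flags);

      		/*
      		 * Taking the reconfig mutex unconditionally here can deadlock
      		 * against raid5_quiesce(), which holds it while waiting for IO
      		 * (and hence for this reclaim thread).  Trylock instead and
      		 * give up on the discard if the mutex is busy (e.g. array stop).
      		 */
      		if (!mddev_trylock(mddev))
      			return;

      		md_update_sb(mddev, 1);
      		mddev_unlock(mddev);

      		/* The superblock now points at the new tail; discard is safe. */
      		if (blk_queue_discard(bdev_get_queue(log->rdev->bdev)))
      			blkdev_issue_discard(log->rdev->bdev,
      					     log->rdev->data_offset + log->last_checkpoint,
      					     end - log->last_checkpoint, GFP_NOIO, 0);
      	}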
      
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  2. 25 Aug 2016, 7 commits
  3. 19 Aug 2016, 3 commits
    • bcache: pr_err: more meaningful error message when nr_stripes is invalid · 90706094
      Committed by Eric Wheeler
      The original error was thought to indicate corruption, but was actually caused by:
      	make-bcache --data-offset N
      where N was given in bytes but should have been in sectors. While the userspace
      tools should be updated to check that --data-offset does not point beyond the
      end of the volume, hopefully this will help others who might not have noticed
      the units.
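
      A sketch of the kind of check this message comes from, assuming the bcache
      device-init path derives nr_stripes from the stripe size (the exact bounds and
      wording in bcache may differ):

      	/* 'sectors' is the usable device size in sectors. */
      	d->nr_stripes = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
      	if (!d->nr_stripes || d->nr_stripes > SIZE_MAX / sizeof(atomic_t)) {
      		/* A bogus --data-offset (e.g. bytes instead of sectors) lands here. */
      		pr_err("nr_stripes too large or invalid: %u (start sector beyond end of disk?)",
      		       (unsigned int)d->nr_stripes);
      		return -ENOMEM;
      	}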
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
    • bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two. · acc9cf8c
      Committed by Kent Overstreet
      This patch fixes a cache device registration-time allocation deadlock.
      It can deadlock on boot if your initrd auto-registers bcache devices:
      
      Allocator thread:
      [  720.727614] INFO: task bcache_allocato:3833 blocked for more than 120 seconds.
      [  720.732361]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.732963]  [<ffffffffa05192b8>] bch_bucket_alloc+0x188/0x360 [bcache]
      [  720.733538]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.734137]  [<ffffffffa05302bd>] bch_prio_write+0x19d/0x340 [bcache]
      [  720.734715]  [<ffffffffa05190bf>] bch_allocator_thread+0x3ff/0x470 [bcache]
      [  720.735311]  [<ffffffff816ee41c>] ? __schedule+0x2dc/0x950
      [  720.735884]  [<ffffffffa0518cc0>] ? invalidate_buckets+0x980/0x980 [bcache]
      
      Registration thread:
      [  720.710403] INFO: task bash:3531 blocked for more than 120 seconds.
      [  720.715226]  [<ffffffff816eeac7>] schedule+0x37/0x90
      [  720.715805]  [<ffffffffa05235cd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [  720.716409]  [<ffffffffa0522d30>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
      [  720.717008]  [<ffffffffa05236e4>] bch_btree_insert+0xf4/0x170 [bcache]
      [  720.717586]  [<ffffffff810e6950>] ? prepare_to_wait_event+0xf0/0xf0
      [  720.718191]  [<ffffffffa0527d9a>] bch_journal_replay+0x14a/0x290 [bcache]
      [  720.718766]  [<ffffffff810cc90d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
      [  720.719369]  [<ffffffff810cf684>] ? try_to_wake_up+0x1d4/0x350
      [  720.719968]  [<ffffffffa05317d0>] run_cache_set+0x580/0x8e0 [bcache]
      [  720.720553]  [<ffffffffa053302e>] register_bcache+0xe2e/0x13b0 [bcache]
      [  720.721153]  [<ffffffff81354cef>] kobj_attr_store+0xf/0x20
      [  720.721730]  [<ffffffff812a2dad>] sysfs_kf_write+0x3d/0x50
      [  720.722327]  [<ffffffff812a225a>] kernfs_fop_write+0x12a/0x180
      [  720.722904]  [<ffffffff81225177>] __vfs_write+0x37/0x110
      [  720.723503]  [<ffffffff81228048>] ? __sb_start_write+0x58/0x110
      [  720.724100]  [<ffffffff812cedb3>] ? security_file_permission+0x23/0xa0
      [  720.724675]  [<ffffffff812258a9>] vfs_write+0xa9/0x1b0
      [  720.725275]  [<ffffffff8102479c>] ? do_audit_syscall_entry+0x6c/0x70
      [  720.725849]  [<ffffffff81226755>] SyS_write+0x55/0xd0
      [  720.726451]  [<ffffffff8106a390>] ? do_page_fault+0x30/0x80
      [  720.727045]  [<ffffffff816f2cae>] system_call_fastpath+0x12/0x71
      
      The fifo code in upstream bcache cannot use the last element in the buffer,
      which was the cause of the bug: if you asked for a power-of-two size, it gave
      you a fifo that could hold one element fewer than you asked for, rather than
      allocating a buffer twice as big.
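
      A minimal userspace sketch of that behaviour (not bcache's actual fifo code)
      may make the off-by-one clearer; the +1 in the fix guarantees room for all
      prio_buckets(ca) entries even when that count is already a power of two:

      	#include <stdbool.h>
      	#include <stddef.h>
      	#include <stdlib.h>

      	/* Stand-in for the kernel's roundup_pow_of_two(). */
      	static size_t roundup_pow2(size_t n)
      	{
      		size_t p = 1;

      		while (p < n)
      			p <<= 1;
      		return p;
      	}

      	struct fifo {
      		size_t front, back, mask;	/* mask = capacity - 1 */
      		unsigned long *data;
      	};

      	/*
      	 * The ring distinguishes "empty" (front == back) from "full" by
      	 * leaving one slot unused, so only capacity - 1 elements fit.  If
      	 * size is already a power of two, capacity == size and the fifo
      	 * silently holds one element fewer than requested.
      	 */
      	static bool fifo_init(struct fifo *f, size_t size)
      	{
      		size_t capacity = roundup_pow2(size);

      		f->data = calloc(capacity, sizeof(f->data[0]));
      		f->front = f->back = 0;
      		f->mask = capacity - 1;
      		return f->data != NULL;
      	}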
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
    • bcache: register_bcache(): call blkdev_put() when cache_alloc() fails · d9dc1702
      Committed by Eric Wheeler
      register_cache() is supposed to return an error string on failure so that
      register_bcache() will call blkdev_put() and clean up other usage counters,
      but it does not set 'char *err' when cache_alloc() fails (e.g. due to
      memory pressure), and thus register_bcache() performs no cleanup.
      
      register_bcache() <----------\  <- no jump to err_close, no blkdev_put()
         |                         |
         +->register_cache()       |  <- fails to set char *err
               |                   |
               +->cache_alloc() ---/  <- returns error
      
      This patch sets `char *err` for this failure case so that register_cache()
      will cause register_bcache() to correctly jump to err_close and do
      cleanup.  This was tested under OOM conditions that triggered the bug.
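
      A self-contained illustration of that contract, with stub names standing in
      for the real bcache functions (this is not the kernel code): the caller only
      runs its cleanup path when the callee hands back an error string, so a callee
      that fails silently leaks the block-device reference.

      	#include <stdio.h>

      	static int cache_alloc_stub(void)
      	{
      		return -1;		/* pretend the allocation failed (OOM) */
      	}

      	static const char *register_cache_stub(void)
      	{
      		if (cache_alloc_stub() != 0)
      			return "cache_alloc(): -ENOMEM";	/* the fix: report it */
      		return NULL;		/* success */
      	}

      	int main(void)
      	{
      		const char *err = register_cache_stub();

      		if (err) {
      			/* register_bcache()'s err_close path: blkdev_put(), etc. */
      			fprintf(stderr, "register error: %s\n", err);
      			return 1;
      		}
      		return 0;
      	}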
      Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
  4. 18 Aug 2016, 2 commits
  5. 17 Aug 2016, 5 commits
  6. 15 Aug 2016, 2 commits
  7. 08 Aug 2016, 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Committed by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains the request flags in the lower
      bits and the op code in the upper bits. This means that old code that
      relies on manually setting bi_rw is most likely broken. Instead of letting
      that brokenness linger, rename the member to force old and out-of-tree
      code to break at compile time instead of at runtime.
      
      No intended functional changes in this commit.
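
      A rough sketch of typical usage around this kernel version (helper names may
      differ slightly between releases): since the op and the flags share one word,
      callers should go through the accessors rather than writing the field by hand.

      	struct bio *bio = bio_alloc(GFP_NOIO, 1);

      	/* Old, now non-compiling style:  bio->bi_rw = WRITE | REQ_SYNC; */

      	/* New style: set the op and its flags together... */
      	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);

      	/* ...and read them back through the helpers. */
      	if (bio_op(bio) == REQ_OP_WRITE && (bio->bi_opf & REQ_SYNC))
      		submit_bio(bio);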
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 06 Aug 2016, 2 commits
    • md: Prevent IO hold during accessing to faulty raid5 array · 11367799
      Committed by Alexey Obitotskiy
      After the array enters a faulty state (e.g. the number of failed drives
      exceeds what the raid5 level can tolerate), it sets error flags (one of
      them being MD_CHANGE_PENDING). For arrays with internal metadata,
      MD_CHANGE_PENDING is cleared in md_update_sb(), but not for arrays with
      external metadata. With MD_CHANGE_PENDING set, all new and unfinished IOs
      to the array are prevented from completing and are held in a pending
      state. In some cases this can lead to a deadlock.

      For example, we have a faulty array (2 of 4 drives failed), udev handles
      the array state change and blkid is started (or some other userspace
      application reads/writes the array), but it is unable to finish its reads
      because the IO is held. At the same time we are unable to get exclusive
      access to the array (to stop it, in our case) because the other external
      application is still using it.

      The fix makes it possible to return IO with errors immediately, so the
      external application can finish working with the array and give exclusive
      access to other applications to perform the required management actions
      on the array.
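
      A rough sketch of the idea (not the verbatim patch): when the array has
      already failed and its metadata is managed externally, fail the bio right
      away instead of parking it until MD_CHANGE_PENDING clears, which for
      external metadata may never happen without help from userspace.

      	/* Illustrative helper; the condition in the real patch differs in detail. */
      	static bool raid5_should_fail_bio(struct r5conf *conf, struct mddev *mddev)
      	{
      		return mddev->external &&
      		       test_bit(MD_CHANGE_PENDING, &mddev->flags) &&
      		       mddev->degraded > conf->max_degraded;
      	}

      	/* In the make_request path: */
      	if (raid5_should_fail_bio(conf, mddev)) {
      		bio_io_error(bi);	/* complete with -EIO instead of holding it */
      		return;
      	}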
      Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • MD: hold mddev lock to change bitmap location · d9dd26b2
      Committed by Shaohua Li
      Changing the bitmap location changes a lot of state, so hold the mddev
      lock to avoid races. This also means .quiesce is now called with the
      mddev lock held.
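
      A sketch of the locking pattern, assuming the sysfs store handler in
      md/bitmap.c (simplified; the real location_store() does considerably more
      validation and teardown):

      	static ssize_t location_store(struct mddev *mddev, const char *buf, size_t len)
      	{
      		int rv;

      		rv = mddev_lock(mddev);	/* serialize against other reconfiguration */
      		if (rv)
      			return rv;

      		/* ...tear down the old bitmap, parse buf, set the new location... */

      		mddev_unlock(mddev);
      		return len;
      	}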
      Acked-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  9. 04 Aug 2016, 2 commits
  10. 03 Aug 2016, 5 commits
  11. 02 Aug 2016, 1 commit
    • raid5: fix incorrectly counter of conf->empty_inactive_list_nr · ff00d3b4
      Committed by ZhengYuan Liu
      The counter conf->empty_inactive_list_nr is only used to determine whether
      the raid5 array is congested, which is handled in raid5_congested(). It is
      increased in get_free_stripe() when conf->inactive_list becomes empty and
      decreased in release_inactive_stripe_list() when temp_inactive_list is
      spliced back onto conf->inactive_list. However, the accounting can go wrong
      when raid5_get_active_stripe() or stripe_add_to_batch_list() is called,
      because these two functions may call list_del_init(&sh->lru) to delete sh
      from "conf->inactive_list + hash", which may leave "conf->inactive_list +
      hash" empty when atomic_inc_not_zero(&sh->count) returns false. So a check
      should be done at these two points and empty_inactive_list_nr increased
      accordingly. Otherwise the counter may become negative, which would affect
      async readahead from the VFS.
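
      A sketch of the accounting fix at those two call sites, assuming the
      surrounding locking of the real code (which holds conf->device_lock and
      tracks more state): only bump the counter if our list_del_init() is what
      emptied the hash list.

      	bool was_nonempty = !list_empty(conf->inactive_list + hash);

      	list_del_init(&sh->lru);

      	if (was_nonempty && list_empty(conf->inactive_list + hash))
      		atomic_inc(&conf->empty_inactive_list_nr);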
      Signed-off-by: ZhengYuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Shaohua Li <shli@fb.com>
  12. 31 Jul 2016, 1 commit
  13. 29 Jul 2016, 1 commit
  14. 21 Jul 2016, 7 commits