1. 12 Jul 2013, 8 commits
    • bcache: Allocation kthread fixes · 79826c35
      Kent Overstreet committed
      The alloc kthread should've been using try_to_freeze() - and also there
      was the potential for the alloc kthread to get woken up after it had
      shut down, which would have been bad.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    • bcache: Fix GC_SECTORS_USED() calculation · 29ebf465
      Kent Overstreet committed
      Part of the job of garbage collection is to add up however many sectors
      of live data it finds in each bucket, but that doesn't work very well if
      it doesn't reset GC_SECTORS_USED() when it starts. Whoops.
      
      This wouldn't have broken anything horribly, but allocation tries to
      preferentially reclaim buckets that are mostly empty and that's not
      gonna work with an incorrect GC_SECTORS_USED() value.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
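The accounting this fix restores can be shown with a small userspace model. It assumes a simplified bucket that carries only a live-sector counter; `struct bucket_model`, `gc_start()`, `gc_mark_key()` and `bucket_mostly_empty()` are hypothetical names, not bcache's. The point is that GC must zero each per-bucket count before walking the keys, otherwise every pass adds on top of the previous one and the allocator's "mostly empty" preference works from inflated numbers.

```c
#include <stddef.h>

/* Hypothetical, simplified bucket: only the GC live-sector counter is modeled. */
struct bucket_model {
	unsigned gc_sectors_used;
};

/* Start of a GC pass: zero every counter so this pass counts from scratch.
 * Leaving this loop out is the bug described above: totals from earlier
 * passes would carry over and keep growing. */
static void gc_start(struct bucket_model *b, size_t nbuckets)
{
	for (size_t i = 0; i < nbuckets; i++)
		b[i].gc_sectors_used = 0;
}

/* Called once for each live extent found while walking the keys. */
static void gc_mark_key(struct bucket_model *b, size_t bucket, unsigned sectors)
{
	b[bucket].gc_sectors_used += sectors;
}

/* Hypothetical allocator heuristic: a bucket is "mostly empty" when less
 * than half of it holds live data. */
static int bucket_mostly_empty(const struct bucket_model *b, unsigned bucket_size)
{
	return b->gc_sectors_used * 2 < bucket_size;
}
```

Running two GC passes over the same live data leaves each counter at its single-pass value, which is what the allocator needs to rank buckets correctly.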
    • bcache: Journal replay fix · faa56736
      Kent Overstreet committed
      The journal replay code starts by finding something that looks like a
      valid journal entry, then it does a binary search over the unchecked
      region of the journal for the journal entries with the highest sequence
      numbers.
      
      Trouble is, the logic was wrong - journal_read_bucket() returns true if
      it found journal entries we need, but if the range of journal entries
      we're looking for loops around the end of the journal - in that case
      journal_read_bucket() could return true when it hadn't found the highest
      sequence number we'd seen yet, and in that case the binary search did
      the wrong thing. Whoops.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
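Because the journal is circular, the binary search needs a test that stays correct across the wrap point. A minimal sketch, assuming every journal bucket has been written at least once and is summarized by one sequence number, so the ring looks like a strictly increasing array rotated once (for example `{7, 8, 9, 3, 4, 5, 6}`). `newest_journal_bucket()` is a hypothetical helper, not bcache's journal_read_bucket(); it compares against the sequence number at `lo`, which always sits before the wrap, instead of asking "did this bucket contain entries we need".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: seq[i] is the sequence number in journal bucket i.
 * Buckets are written in ring order, so seq[] is a strictly increasing run
 * that may wrap around once. Returns the index of the bucket holding the
 * highest sequence number. Requires n >= 1.
 */
static size_t newest_journal_bucket(const uint64_t *seq, size_t n)
{
	size_t lo = 0, hi = n - 1;

	/* Invariant: lo lies in the run before the wrap point, and the
	 * newest bucket lies somewhere in [lo, hi]. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo + 1) / 2;	/* round up so lo advances */

		if (seq[mid] > seq[lo])
			lo = mid;	/* mid is still before the wrap */
		else
			hi = mid - 1;	/* mid is past the wrap; newest is earlier */
	}
	return lo;
}
```

The same routine handles an unwrapped ring (`{1, 2, 3, 4}` yields the last index) and a wrap right after the first bucket (`{5, 1, 2, 3, 4}` yields index 0).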
    • bcache: Shutdown fix · 5caa52af
      Kent Overstreet committed
      Stopping a cache set is supposed to make it stop attached backing
      devices, but somewhere along the way that code got lost. Fixing this
      mainly has the effect of fixing our reboot notifier.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
    • bcache: Fix a sysfs splat on shutdown · c9502ea4
      Kent Overstreet committed
      If we stopped a bcache device when we were already detaching (or
      something like that), bcache_device_unlink() would try to remove a
      symlink from sysfs that was already gone because the bcache dev kobject
      had already been removed from sysfs.
      
      So keep track of whether we've removed stuff from sysfs.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
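"Keep track of whether we've removed stuff from sysfs" amounts to making the unlink step idempotent. A minimal sketch, assuming a single boolean per device is enough state; `struct dev_model`, `dev_link()` and `dev_unlink()` are hypothetical, and the counter only stands in for the real sysfs_remove_link() calls so the behavior can be checked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bcache device's sysfs state. */
struct dev_model {
	bool sysfs_linked;	/* symlinks currently present in sysfs */
	int  removals;		/* how many times links were actually removed */
};

static void dev_link(struct dev_model *d)
{
	d->sysfs_linked = true;
}

/* Safe to call more than once: only the first call removes the links. */
static void dev_unlink(struct dev_model *d)
{
	if (!d->sysfs_linked)
		return;		/* already gone, e.g. removed during detach */
	d->removals++;		/* stands in for sysfs_remove_link() */
	d->sysfs_linked = false;
}
```

With the flag in place, a stop that races with a detach calls dev_unlink() twice but removes the links only once, instead of touching entries that no longer exist.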
    • bcache: Advertise that flushes are supported · 54d12f2b
      Kent Overstreet committed
      Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered
      out unless we say we support them...
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
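The "filtered out" behavior can be illustrated with a simplified userspace model, patterned on how block layers of this era treat FLUSH/FUA bios sent to a queue that has not advertised flush support: the flags are stripped, and a bare flush is completed immediately without reaching the driver. The `*_model` types, `submit_checks()` and the `M_REQ_*` values are hypothetical; in kernels of this era a driver advertises support by calling blk_queue_flush() on its request queue.

```c
#include <stdbool.h>

/* Hypothetical flag values for the model; not the kernel's REQ_* bits. */
enum { M_REQ_FLUSH = 1u << 0, M_REQ_FUA = 1u << 1 };

struct queue_model {
	unsigned flush_flags;	/* what the driver has advertised */
};

struct bio_model {
	unsigned rw;		/* request flags */
	unsigned sectors;	/* payload size; 0 for a bare flush */
};

/*
 * Returns true if the bio should be passed down to the driver. If the queue
 * advertises no flush support, FLUSH/FUA are stripped, and an empty flush is
 * completed right away without the driver ever seeing it.
 */
static bool submit_checks(const struct queue_model *q, struct bio_model *bio)
{
	if ((bio->rw & (M_REQ_FLUSH | M_REQ_FUA)) && !q->flush_flags) {
		bio->rw &= ~(unsigned)(M_REQ_FLUSH | M_REQ_FUA);
		if (bio->sectors == 0)
			return false;	/* "completed" without reaching the driver */
	}
	return true;
}
```

This is why correct flush/FUA handling inside a driver is not enough on its own: unless the queue advertises support, those requests never arrive.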
    • bcache: check for allocation failures · d2a65ce2
      Dan Carpenter committed
      There is a missing NULL check after the kzalloc().
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
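The pattern the patch restores, checking an allocation before using it, is sketched below in userspace. calloc() stands in for the kernel's kzalloc() (both return zeroed memory or NULL); `struct widget` and `widget_create()` are hypothetical and do not correspond to the bcache code the patch touches.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object; calloc() stands in for the kernel's kzalloc(). */
struct widget {
	int  id;
	char name[32];
};

/* Allocate a zeroed widget; on failure report -ENOMEM to the caller
 * instead of dereferencing a NULL pointer. */
static int widget_create(struct widget **out, int id)
{
	struct widget *w = calloc(1, sizeof(*w));

	if (!w)
		return -ENOMEM;	/* the kind of check the patch adds */

	w->id = id;
	*out = w;
	return 0;
}
```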
    • bcache: Fix a dumb race · 6aa8f1a6
      Kent Overstreet committed
      In the far-too-complicated closure code - closures can have destructors,
      for probably dubious reasons; they get run after the closure is no
      longer waiting on anything but before dropping the parent ref, intended
      just for freeing whatever memory the closure is embedded in.
      
      Trouble is, when remaining goes to 0 and we've got nothing more to run -
      we also have to unlock the closure, setting remaining to -1. If there's
      a destructor, that unlock isn't doing anything - nobody could be trying
      to lock it if we're about to free it - but if the unlock _is_ needed...
      that check for a destructor was racy. Argh.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
  2. 02 Jul 2013, 4 commits
  3. 27 Jun 2013, 13 commits
  4. 13 Jun 2013, 4 commits
    • md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      H. Peter Anvin committed
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drives behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      NeilBrown committed
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The latter has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
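The role of the new `extra` argument can be shown with a heavily simplified pthread model, built from the commit's description: freeze the array, then wait until every pending request is either queued for the management thread or one of the `extra` requests the caller itself is holding. `struct conf_model`, `freeze_array_model()` and `unfreeze_array_model()` are hypothetical; the real raid1/raid10 code keeps more state and also calls flush_pending_writes() while it waits, which this model omits.

```c
#include <pthread.h>

/* Hypothetical, simplified model of raid1/raid10 barrier state. */
struct conf_model {
	pthread_mutex_t lock;
	pthread_cond_t  wait_barrier;
	int barrier;		/* non-zero blocks new requests */
	int nr_pending;		/* requests submitted but not yet finished */
	int nr_queued;		/* requests parked for the management thread */
};

/*
 * Block new I/O, then wait until all pending requests are either queued or
 * among the `extra` requests the caller is holding. Without `extra`, a
 * caller that holds a pending request itself would wait on its own work.
 */
static void freeze_array_model(struct conf_model *c, int extra)
{
	pthread_mutex_lock(&c->lock);
	c->barrier++;
	while (c->nr_pending != c->nr_queued + extra)
		pthread_cond_wait(&c->wait_barrier, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void unfreeze_array_model(struct conf_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->barrier--;
	pthread_cond_broadcast(&c->wait_barrier);
	pthread_mutex_unlock(&c->lock);
}
```

For example, with three requests pending, two of them queued, and the caller holding the third, `freeze_array_model(&c, 1)` returns at once; with `extra` fixed at 0 it would never return.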
    • md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. · 3056e3ae
      Alex Lyakas committed
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io() calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
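The rule in the subject line can be written as a small predicate. A minimal sketch, assuming each mirror's outcome is reduced to three booleans; `struct mirror_result` and `write_succeeded()` are hypothetical and only model the decision, not how md tracks device state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mirror view of one completed WRITE. */
struct mirror_result {
	bool faulty;	/* device has been marked Faulty */
	bool in_sync;	/* false while the device is still rebuilding */
	bool write_ok;	/* this mirror reported success */
};

/*
 * The master WRITE may be reported as successful only if some mirror that is
 * neither Faulty nor rebuilding completed it. A success from a rebuilding
 * drive alone does not count, because reads will not be served from it.
 */
static bool write_succeeded(const struct mirror_result *m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (m[i].write_ok && m[i].in_sync && !m[i].faulty)
			return true;
	return false;
}
```

In the scenario above (A in sync but its write failed, B rebuilding but its write succeeded) the predicate is false, so the error reaches the upper layer instead of a false success.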
    • md: md_stop_writes() should always freeze recovery. · 6b6204ee
      NeilBrown committed
      __md_stop_writes() will currently sometimes freeze recovery.
      So any caller must be ready for that to happen, and indeed they are.
      
      However if __md_stop_writes() doesn't freeze_recovery, then
      a recovery could start before mddev_suspend() is called, which
      could be awkward.  This can particularly cause problems for dm-raid.
      
      So change __md_stop_writes() to always freeze recovery.  This is safe
      and more predictable.
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Tested-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 30 May 2013, 1 commit
  6. 20 May 2013, 1 commit
    • dm thin: fix metadata dev resize detection · 610bba8b
      Alasdair G Kergon committed
      Fix detection of the need to resize the dm thin metadata device.
      
      The code incorrectly tried to extend the metadata device when it
      didn't need to, due to a merging error with patch 24347e95 ("dm thin:
      detect metadata device resizing").
      
        device-mapper: transaction manager: couldn't open metadata space map
        device-mapper: thin metadata: tm_open_with_sm failed
        device-mapper: thin: aborting transaction failed
        device-mapper: thin: switching pool to failure mode
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  7. 15 May 2013, 3 commits
    • bcache: Fix error handling in init code · f59fce84
      Kent Overstreet committed
      This code appears to have rotted... fix various bugs and do some
      refactoring.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
    • bcache: drop "select CLOSURES" · bbb1c3b5
      Paul Bolle committed
      The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig
      symbol CLOSURES. That symbol was used in development versions of bcache,
      but was removed when the closures code was no longer provided as a
      kernel library. It can safely be dropped.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
    • bcache: Fix incompatible pointer type warning · 867e1162
      Emil Goode committed
      The function pointer release in struct block_device_operations
      should point to functions declared as void.
      
      Sparse warnings:
      
      drivers/md/bcache/super.c:656:27: warning:
      	incorrect type in initializer (different base types)
      	drivers/md/bcache/super.c:656:27:
      	expected void ( *release )( ... )
      	drivers/md/bcache/super.c:656:27:
      	got int ( static [toplevel] *<noident> )( ... )
      
      drivers/md/bcache/super.c:656:2: warning:
      	initialization from incompatible pointer type [enabled by default]
      
      drivers/md/bcache/super.c:656:2: warning:
      	(near initialization for ‘bcache_ops.release’) [enabled by default]
      Signed-off-by: Emil Goode <emilgoode@gmail.com>
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
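As the sparse output shows, the `release` member expects a function returning void, so assigning an int-returning function to it is an incompatible pointer type. A self-contained sketch of the fixed shape, using a hypothetical ops table (`struct disk_ops_model`, `demo_open()`, `demo_release()`) rather than the real struct block_device_operations:

```c
/* Hypothetical ops table; release() returns void, so it has no status
 * to report back to the caller. */
struct disk_model;

struct disk_ops_model {
	int  (*open)(struct disk_model *disk);
	void (*release)(struct disk_model *disk);
};

struct disk_model {
	int open_count;
	const struct disk_ops_model *ops;
};

static int demo_open(struct disk_model *d)
{
	d->open_count++;
	return 0;
}

/* Declared void to match the pointer type. Declaring it as returning int
 * is what produces "initialization from incompatible pointer type". */
static void demo_release(struct disk_model *d)
{
	d->open_count--;
}

static const struct disk_ops_model demo_ops = {
	.open    = demo_open,
	.release = demo_release,
};
```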
  8. 10 May 2013, 6 commits