1. 09 Nov, 2013 (1 commit)
  2. 24 Oct, 2013 (1 commit)
    • md: Fix skipping recovery for read-only arrays. · 61e4947c
      Lukasz Dorau authored
      Since:
              commit 7ceb17e8
              md: Allow devices to be re-added to a read-only array.
      
      spares are activated on a read-only array. For the raid1 and raid10
      personalities this causes not-in-sync devices to be marked in-sync
      without checking whether recovery has finished.
      
      If a read-only array is degraded and one of its devices is not in-sync
      (because the array has been only partially recovered) recovery will be skipped.
      
      This patch adds a check that recovery has finished before marking a device
      in-sync for the raid1 and raid10 personalities. For the raid5 personality
      such a condition is already present (at raid5.c:6029).
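      
      A minimal sketch of the kind of guard this adds, assuming the usual md
      identifiers and mirroring the raid5 precedent above (illustrative, not
      the exact hunk):
      
              /* only treat the re-added spare as in-sync if its recovery
               * actually completed */
              if (rdev->saved_raid_disk >= 0 &&
                  rdev->recovery_offset == MaxSector)
                      set_bit(In_sync, &rdev->flags);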
      
      Bug was introduced in 3.10 and causes data corruption.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      61e4947c
  3. 25 Jul, 2013 (1 commit)
    • md/raid10: remove use-after-free bug. · 0eb25bb0
      NeilBrown authored
      We always need to be careful when calling generic_make_request, as it
      can start a chain of events which might free something that we are
      using.
      
      Here is one place I wasn't careful enough.  If the wbio2 is not in
      use, then it might get freed at the first generic_make_request call.
      So perform all necessary tests first.
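      
      A minimal sketch of the reordering idea (illustrative, not the exact hunk):
      
              /* decide whether wbio2 will be submitted BEFORE submitting wbio,
               * because generic_make_request(wbio) may complete and free the
               * r10_bio - and with it wbio2 */
              bool submit_wbio2 = wbio2 && wbio2->bi_end_io;
      
              generic_make_request(wbio);
              if (submit_wbio2)
                      generic_make_request(wbio2);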
      
      This bug was introduced in 3.3-rc3 (24afd80d) and can cause an
      oops, so the fix is suitable for any -stable since then.
      
      Cc: stable@vger.kernel.org (3.3+)
      Signed-off-by: NeilBrown <neilb@suse.de>
      0eb25bb0
  4. 18 Jul, 2013 (1 commit)
    • md/raid10: fix two problems with RAID10 resync. · 7bb23c49
      NeilBrown authored
      1/ When a difference between blocks is found, data is copied from
         one bio to the other.  However, bv_len is used as the length to
         copy and this could be zero.  So use r10_bio->sectors to calculate
         the length instead (see the sketch after this list).
         Using bv_len was probably always a bit dubious, but the introduction
         of bio_advance made it much more likely to be a problem.
      
      2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
         except on bios that we schedule for a read.  This ensures that
         missing/failed devices don't confuse the loop at the top of
         sync_request write.
         Commit 8be185f2 "raid10: Use bio_reset()"
         removed a loop which set BIO_UPTODATE on all appropriate bios.
         So we need to re-add that flag.
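      
      A sketch of point 1/ above, assuming the usual md identifiers
      (illustrative, not the exact hunk):
      
              int sectors = r10_bio->sectors;
      
              for (i = 0; i < tbio->bi_vcnt; i++) {
                      /* derive the per-page length from r10_bio->sectors,
                       * not from bv_len, which may legitimately be zero */
                      int len = min_t(int, sectors << 9, PAGE_SIZE);
      
                      memcpy(page_address(tbio->bi_io_vec[i].bv_page),
                             page_address(fbio->bi_io_vec[i].bv_page),
                             len);
                      sectors -= len >> 9;
              }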
      
      These bugs were introduced in 3.10, so this patch is suitable for
      3.10-stable, and removes a potential source of data corruption.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      7bb23c49
  5. 04 Jul, 2013 (1 commit)
    • md/raid10: fix bug which causes all RAID10 reshapes to move no data. · 13765120
      NeilBrown authored
      The recent commit:
              commit 7e83ccbe
          md/raid10: Allow skipping recovery when clean arrays are assembled
      
      causes raid10 to skip a recovery in certain cases where it is safe to
      do so.  Unfortunately it also causes a reshape to be skipped, which is
      never safe.  The result is that an attempt to reshape a RAID10 will
      appear to complete instantly, but no data will have been moved, so the
      array will now contain garbage.
      (If nothing is written, you can recover by simply performing the
      reverse reshape, which will also complete instantly.)
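      
      Roughly, the skip condition must exclude a running reshape; the exact
      test differs, but the shape of the guard is (illustrative):
      
              /* a recovery of a clean array may be skippable; a reshape never is */
              int skippable = mddev->bitmap == NULL &&
                              mddev->recovery_cp == MaxSector &&
                              mddev->reshape_position == MaxSector &&  /* the added guard */
                              !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);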
      
      Bug was introduced in 3.10, so this is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Cc: Martin Wilck <mwilck@arcor.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      13765120
  6. 03 Jul, 2013 (1 commit)
    • md/raid10: fix two bugs affecting RAID10 reshape. · 78eaa0d4
      NeilBrown authored
      1/ If a RAID10 is being reshaped to a smaller number of devices
       and is stopped while this is ongoing, then when the array is
       reassembled the 'mirrors' array will be allocated too small.
       This will lead to an access error or memory corruption.
      
      2/ A sanity test performed when a reshaping RAID10 array is restarted
       is slightly incorrect.
      
      Due to the first bug, this is suitable for any -stable
      kernel since 3.5 where this code was introduced.
      
      Cc: stable@vger.kernel.org (v3.5+)
      Signed-off-by: NeilBrown <neilb@suse.de>
      78eaa0d4
  7. 14 Jun, 2013 (3 commits)
    • md/raid10: check In_sync flag in 'enough()'. · 725d6e57
      NeilBrown authored
      It isn't really enough to check that the rdev is present, we need to
      also be sure that the device is still In_sync.
      
      Doing this requires using rcu_dereference to access the rdev, and
      holding the rcu_read_lock() to ensure the rdev doesn't disappear while
      we look at it.
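      
      A sketch of the per-device test under RCU (illustrative, not the exact hunk):
      
              rcu_read_lock();
              rdev = rcu_dereference(conf->mirrors[this].rdev);
              if (rdev && test_bit(In_sync, &rdev->flags))
                      cnt++;          /* this copy still counts */
              rcu_read_unlock();
      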
      Signed-off-by: NeilBrown <neilb@suse.de>
      725d6e57
    • md/raid10: locking changes for 'enough()'. · 635f6416
      NeilBrown authored
      As 'enough' accesses conf->prev and conf->geo, which can change
      spontaneously, it should guard against changes.
      This can be done with device_lock as start_reshape holds device_lock
      while updating 'geo' and end_reshape holds it while updating 'prev'.
      
      So 'error' needs to hold 'device_lock'.
      
      On the other hand, raid10_end_read_request knows which of the two it
      really wants to access, and as it is an active request on that one,
      the value cannot change underneath it.
      
      So change _enough to take a flag rather than a pointer, pass the
      appropriate flag from raid10_end_read_request(), and remove the locking.
      
      All other calls to 'enough' are made with reconfig_mutex held, so
      neither 'prev' nor 'geo' can change.
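      
      Roughly what the error() path gains (illustrative, not the exact hunk):
      
              unsigned long flags;
      
              spin_lock_irqsave(&conf->device_lock, flags);
              if (test_bit(In_sync, &rdev->flags) &&
                  !enough(conf, rdev->raid_disk)) {
                      /* refusing to fail the drive: it holds the last copy */
                      spin_unlock_irqrestore(&conf->device_lock, flags);
                      return;
              }
              /* otherwise mark the device Faulty under the same lock */
              spin_unlock_irqrestore(&conf->device_lock, flags);
      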
      Signed-off-by: NeilBrown <neilb@suse.de>
      635f6416
    • DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Jonathan Brassow authored
      DM RAID: Add ability to restore transiently failed devices on resume
      
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      9092c02d
  8. 13 Jun, 2013 (3 commits)
    • md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      H. Peter Anvin authored
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drivers behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
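      
      The effect of the change is essentially one line per personality
      (a sketch; the queue simply stops advertising WRITE SAME support):
      
              blk_queue_max_write_same_sectors(mddev->queue, 0);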
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      5026d7a9
    • md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      NeilBrown authored
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The latter has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
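      
      Illustrative call sites with the new 'extra' argument (a sketch, not the
      exact hunks):
      
              freeze_array(conf, 1);  /* handle_read_error: our own request is still pending */
              freeze_array(conf, 0);  /* management-thread callers have nothing of their own to ignore */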
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e2d59925
    • md/raid1: consider WRITE as successful only if at least one non-Faulty and... · 3056e3ae
      Alex Lyakas authored
      md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io()  calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
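      
      A sketch of the idea in raid1_end_write_request (illustrative condition,
      not the exact hunk):
      
              /* only count the write as good if the completing device is
               * neither Faulty nor still rebuilding through this range */
              if (!test_bit(Faulty, &rdev->flags) &&
                  test_bit(In_sync, &rdev->flags))
                      set_bit(R1BIO_Uptodate, &r1_bio->state);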
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      3056e3ae
  9. 30 Apr, 2013 (2 commits)
  10. 24 Apr, 2013 (1 commit)
  11. 24 Mar, 2013 (5 commits)
    • raid10: Use bio_reset() · 8be185f2
      Kent Overstreet authored
      More prep work for immutable bio vecs, mainly getting rid of references
      to bi_idx.
      
      bio_reset was being open coded in a few places. The one in sync_request
      was a bit nontrivial to convert, so could use some extra eyeballs.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: NeilBrown <neilb@suse.de>
      8be185f2
    • block: Add submit_bio_wait(), remove from md · 9e882242
      Kent Overstreet authored
      Random cleanup - this code was duplicated and it's not really specific
      to md.
      
      Also added the ability to return the actual error code.
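      
      Typical use of the helper as added here (a hedged usage sketch):
      
              int ret = submit_bio_wait(WRITE, bio);  /* submits and sleeps until completion */
              if (ret)
                      pr_err("write failed: %d\n", ret);  /* the actual error code is returned */
      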
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      9e882242
    • block: Remove bi_idx references · 4f2ac93c
      Kent Overstreet authored
      For immutable bvecs, all bi_idx usage needs to be audited - so here
      we're removing all the unnecessary uses.
      
      Most of these are places where it was being initialized on a bio that
      was just allocated, a few others are conversions to standard macros.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      4f2ac93c
    • block: Change bio_split() to respect the current value of bi_idx · 5b83636a
      Kent Overstreet authored
      In the current code bio_split() won't be seeing partially completed bios
      so this doesn't change any behaviour, but this makes the code a bit
      clearer as to what bio_split() actually requires.
      
      The immediate purpose of the patch is removing unnecessary bi_idx
      references, but the end goal is to allow partial completed bios to be
      submitted, which along with immutable biovecs enables efficient bio
      splitting.
      
      Some of the callers were (double) checking that bios could be split, so
      update their checks too.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Neil Brown <neilb@suse.de>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      5b83636a
    • block: Use bio_sectors() more consistently · aa8b57aa
      Kent Overstreet authored
      Bunch of places in the code weren't using it where they could be -
      this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
      into a struct bvec_iter.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: "Ed L. Cashin" <ecashin@coraid.com>
      CC: Nick Piggin <npiggin@kernel.dk>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Jim Paris <jim@jtan.com>
      CC: Geoff Levand <geoff@infradead.org>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Ed Cashin <ecashin@coraid.com>
      aa8b57aa
  12. 26 Feb, 2013 (5 commits)
    • md/raid1,raid10: fix deadlock with freeze_array() · ee0b0244
      NeilBrown authored
      When raid1/raid10 needs to fix a read error, it first drains
      all pending requests by calling freeze_array().
      This calls flush_pending_writes() if it needs to sleep,
      but some writes may be pending in a per-process plug rather
      than in the per-array request queue.
      
      When raid1{,0}_unplug() moves the request from the per-process
      plug to the per-array request queue (from which
      flush_pending_writes() can flush them), it needs to wake up
      freeze_array(), or freeze_array() will never flush them and so
      it will block forever.
      
      So add the required wake_up() calls.
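      
      Roughly what the unplug path gains (illustrative, field names as in raid1;
      not the exact hunk):
      
              spin_lock_irq(&conf->device_lock);
              bio_list_merge(&conf->pending_bio_list, &plug->pending);
              conf->pending_count += plug->pending_cnt;
              spin_unlock_irq(&conf->device_lock);
              wake_up(&conf->wait_barrier);   /* the added call: let freeze_array() notice */
              md_wakeup_thread(mddev->thread);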
      
      This bug was introduced by commit
         f54a9d0e
      for raid1 and a similar commit for RAID10, and so has been present
      since linux-3.6.  As the bug causes a deadlock I believe this fix is
      suitable for -stable.
      
      Cc: stable@vger.kernel.org (3.6.y 3.7.y 3.8.y)
      Reported-by: Tregaron Bayly <tbayly@bluehost.com>
      Tested-by: Tregaron Bayly <tbayly@bluehost.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      ee0b0244
    • MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2) · 9a3152ab
      Jonathan Brassow authored
      MD RAID10:  Improve redundancy for 'far' and 'offset' algorithms (part 2)
      
      This patch addresses raid arrays that have a number of devices that cannot
      be evenly divided by 'far_copies'.  (E.g. 5 devices, far_copies = 2)  This
      case must be handled differently because it causes the last set to be of
      a different size than the rest of the sets.  We must compute a new modulo
      for this last set so that copied chunks are properly wrapped around.
      
      Example use_far_sets=1, far_copies=2, near_copies=1, devices=5:
                      "far" algorithm
              dev1 dev2 dev3 dev4 dev5
      	==== ==== ==== ==== ====
      	[ A   B ] [ C    D   E ]
              [ G   H ] [ I    J   K ]
                          ...
              [ B   A ] [ E    C   D ] --> nominal set of 2 and last set of 3
              [ H   G ] [ K    I   J ]     []'s show far/offset sets
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      9a3152ab
    • MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) · 475901af
      Jonathan Brassow authored
      The MD RAID10 'far' and 'offset' algorithms make copies of entire stripe
      widths - copying them to a different location on the same devices after
      shifting the stripe.  An example layout of each follows below:
      
      	        "far" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 G    H    I    J    K    L
      	            ...
      	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
      	 L    G    H    I    J    K
      	            ...
      
      		"offset" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
      	 G    H    I    J    K    L
      	 L    G    H    I    J    K
      	            ...
      
      Redundancy for these algorithms is gained by shifting the copied stripes
      one device to the right.  This patch proposes that the array be divided into
      sets of adjacent devices and when the stripe copies are shifted, they wrap
      on set boundaries rather than the array size boundary.  That is, for the
      purposes of shifting, the copies are confined to their sets within the
      array.  The sets are 'near_copies * far_copies' in size.
      
      The above "far" algorithm example would change to:
      	        "far" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 G    H    I    J    K    L
      	            ...
      	 B    A    D    C    F    E  --> Copy of stripe0, shifted 1, 2-dev sets
      	 H    G    J    I    L    K      Dev sets are 1-2, 3-4, 5-6
      	            ...
      
      This has the effect of improving the redundancy of the array.  We can
      always sustain at least one failure, but sometimes more than one can
      be handled.  In the first examples, the pairs of devices that CANNOT fail
      together are:
      	(1,2) (2,3) (3,4) (4,5) (5,6) (1, 6) [40% of possible pairs]
      In the example where the copies are confined to sets, the pairs of
      devices that cannot fail together are:
      	(1,2) (3,4) (5,6)                    [20% of possible pairs]
      
      We cannot simply replace the old algorithms, so the 17th bit of the 'layout'
      variable is used to indicate whether we use the old or new method of computing
      the shift.  (This is similar to the way the 16th bit indicates whether the
      "far" algorithm or the "offset" algorithm is being used.)
      
      This patch only handles the cases where the number of total raid disks is
      a multiple of 'far_copies'.  A follow-on patch addresses the condition where
      this is not true.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      475901af
    • MD RAID10: Minor non-functional code changes · 4c0ca26b
      Jonathan Brassow authored
      Changes include assigning 'addr' from 's' instead of 'sector' to be
      consistent with the way the code does it just a few lines later and
      using '%=' vs a conditional and subtraction.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      4c0ca26b
    • md: raid1,10: Handle REQ_WRITE_SAME flag in write bios · c8dc9c65
      Joe Lawrence authored
      Set mddev queue's max_write_same_sectors to its chunk_sector value (before
      disk_stack_limits merges the underlying disk limits.)  With that in place,
      be sure to handle writes coming down from the block layer that have the
      REQ_WRITE_SAME flag set.  That flag needs to be copied into any newly cloned
      write bio.
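      
      A sketch of the flag propagation when cloning the write, in the style of
      the existing flag handling there (local names illustrative):
      
              const unsigned long do_same = (bio->bi_rw & REQ_WRITE_SAME);
      
              /* carried into every cloned write bio alongside the other flags */
              mbio->bi_rw = WRITE | do_flush_fua | do_sync | do_discard | do_same;
      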
      Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
      Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      c8dc9c65
  13. 30 Nov, 2012 (1 commit)
    • wait: add wait_event_lock_irq() interface · eed8c02e
      Lukas Czerner authored
      New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit
      moves the private wait_event_lock_irq() macro from MD to the regular wait
      includes, introduces the new macro wait_event_lock_irq_cmd() instead of the
      old method of omitting the cmd parameter (which is ugly), and makes use of
      the new macros in MD. It also introduces the _interruptible_ variant.
      
      The new interface is for use when one has a special lock to protect data
      structures used in the condition, or also needs to invoke "cmd"
      before putting the task to sleep.
      
      All new macros are expected to be called with the lock taken. The lock
      is released before sleep and is reacquired afterwards. We will leave the
      macro with the lock held.
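      
      A hedged usage sketch (identifiers illustrative): the lock is held on
      entry, dropped across the sleep, and re-taken before the macro returns:
      
              spin_lock_irq(&conf->device_lock);
              wait_event_lock_irq(conf->wait_barrier,
                                  conf->nr_pending == 0,
                                  conf->device_lock);
              /* still holding device_lock here */
              spin_unlock_irq(&conf->device_lock);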
      
      Note to DM: IMO this should also fix theoretical race on waitqueue while
      using simultaneously wait_event_lock_irq() and wait_event() because of
      lack of locking around current state setting and wait queue removal.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eed8c02e
  14. 27 Nov, 2012 (1 commit)
    • md/raid1{,0}: fix deadlock in bitmap_unplug. · 874807a8
      NeilBrown authored
      If the raid1 or raid10 unplug function gets called
      from a make_request function (which is very possible) when
      there are bios on the current->bio_list list, then it will not
      be able to successfully call bitmap_unplug() and it could
      need to submit more bios and wait for them to complete.
      But they won't complete while current->bio_list is non-empty.
      
      So detect that case and hand the unplugging off to another thread,
      just like we already do when called from within the scheduler.
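      
      The detection is essentially (illustrative, not the exact hunk):
      
              if (from_schedule || current->bio_list) {
                      /* inside a make_request (or the scheduler): submitting
                       * bios here could deadlock, so hand the pending writes
                       * to raid1d/raid10d instead */
                      md_wakeup_thread(mddev->thread);
                      return;
              }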
      
      The RAID1 version of the bug was introduced in 3.6, so that part of the
      fix is suitable for 3.6.y.  The RAID10 part won't apply.
      
      Cc: stable@vger.kernel.org
      Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      874807a8
  15. 22 Nov, 2012 (2 commits)
    • md/raid10: decrement correct pending counter when writing to replacement. · 884162df
      NeilBrown authored
      When a write to a replacement device completes, we carefully
      and correctly found the rdev that the write actually went to
      and then blithely called rdev_dec_pending on the primary rdev,
      even if this write was to the replacement.
      
      This means that any writes to an array while a replacement
      was ongoing would cause the nr_pending count for the primary
      device to go negative, so it could never be removed.
      
      This bug has been present since replacement was introduced in
      3.3, so it is suitable for any -stable kernel since then.
      Reported-by: N"George Spelvin" <linux@horizon.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NNeilBrown <neilb@suse.de>
      884162df
    • N
      md/raid10: close race that lose writes lost when replacement completes. · e7c0c3fa
      NeilBrown 提交于
      When a replacement operation completes there is a small window
      when the original device is marked 'faulty' and the replacement
      still looks like a replacement.  The faulty device should be removed and
      the replacement moved into place very quickly, but it isn't instant.
      
      So the code that writes out to the array must handle the possibility that
      the only working device for some slot is the replacement - but it
      doesn't.  If the primary device is faulty it just gives up.  This
      can lead to corruption.
      
      So make the code more robust: if either the primary or the
      replacement is present and working, write to them.  Only when
      neither is present do we give up.
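      
      A sketch of the more robust per-slot handling (illustrative, inside the
      per-copy loop; not the exact hunk):
      
              struct md_rdev *rdev  = conf->mirrors[d].rdev;
              struct md_rdev *rrdev = conf->mirrors[d].replacement;
      
              if (rdev && test_bit(Faulty, &rdev->flags))
                      rdev = NULL;            /* primary unusable */
              if (rrdev && test_bit(Faulty, &rrdev->flags))
                      rrdev = NULL;           /* replacement unusable */
              if (!rdev && !rrdev)
                      continue;               /* only now give up on this slot */
              /* otherwise queue the write to whichever of the two is non-NULL */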
      
      This bug has been present since replacement was introduced in
      3.3, so it is suitable for any -stable kernel since then.
      Reported-by: N"George Spelvin" <linux@horizon.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e7c0c3fa
  16. 31 Oct, 2012 (1 commit)
  17. 11 Oct, 2012 (6 commits)
  18. 27 Sep, 2012 (1 commit)
    • md/raid10: fix "enough" function for detecting if array is failed. · 80b48124
      NeilBrown authored
      The 'enough' function is written to work with 'near' arrays only
      in that it implicitly assumes that the offset from one 'group' of
      devices to the next is the same as the number of copies.
      In reality it is the number of 'near' copies.
      
      So change it to make this number explicit.
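      
      The key change, roughly (illustrative): advance to the next group by the
      number of 'near' copies rather than the total copy count:
      
              first = (first + geo->near_copies) % geo->raid_disks;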
      
      This bug makes it possible to run arrays without enough drives
      present, which is dangerous.
      It is appropriate for an -stable kernel, but will almost certainly
      need to be modified for some of them.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jakub Husák <jakub@gooseman.cz>
      Signed-off-by: NeilBrown <neilb@suse.de>
      80b48124
  19. 18 Aug, 2012 (1 commit)
    • md/raid10: fix problem with on-stack allocation of r10bio structure. · e0ee7785
      NeilBrown authored
      A 'struct r10bio' has an array of per-copy information at the end.
      This array is declared with size [0] and r10bio_pool_alloc allocates
      enough extra space to store the per-copy information depending on the
      number of copies needed.
      
      So declaring a 'struct r10bio' on the stack isn't going to work.  It
      won't allocate enough space, and memory corruption will ensue.
      
      So in the two places where this is done, declare a sufficiently large
      structure and use that instead.
      
      The two call-sites of this bug were introduced in 3.4 and 3.5
      so this is suitable for both those kernels.  The patch will have to
      be modified for 3.4 as it only has one bug.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Tested-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e0ee7785
  20. 31 Jul, 2012 (2 commits)
    • md: remove plug_cnt feature of plugging. · 0021b7bc
      NeilBrown authored
      This seemed like a good idea at the time, but after further thought I
      cannot see it making a difference other than very occasionally and
      testing to try to exercise the case it is most likely to help did not
      show any performance difference by removing it.
      
      So remove the counting of active plugs and allow 'pending writes' to
      be activated at any time, not just when no plugs are active.
      
      This is only relevant when there is a write-intent bitmap, and the
      updating of the bitmap will likely introduce enough delay that
      the single-threading of bitmap updates will be enough to collect large
      numbers of updates together.
      
      Removing this will make it easier to centralise the unplug code, and
      will clear the way for other unplug enhancements which have a
      measurable effect.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0021b7bc
    • MD RAID10: Export md_raid10_congested · cc4d1efd
      Jonathan Brassow authored
      md/raid10: Export is_congested test.
      
      In similar fashion to commits
      	11d8a6e3
      	1ed7242e
      we export the RAID10 congestion checking function so that dm-raid.c can
      make use of it and make use of the personality.  The 'queue' and 'gendisk'
      structures will not be available to the MD code when device-mapper sets
      up the device, so we conditionalize access to these fields also.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      cc4d1efd