1. 19 Mar 2012, 12 commits
    • md/bitmap: change a 'goto' to a normal 'if' construct. · 278c1ca2
      Committed by NeilBrown
      The use of a goto makes the control flow more obscure here.
      
      So make it a normal:
        if (x) {
           Y;
        }
      
      No functional change.
      Signed-off-by: NeilBrown <neilb@suse.de>
      278c1ca2
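
      A minimal standalone illustration of the change described above
      (write_superblock() is a hypothetical stand-in, not the kernel helper):

        /* Before: the cleanup path hides behind a goto. */
        static void update_goto(int dirty)
        {
                if (!dirty)
                        goto out;
                write_superblock();     /* hypothetical helper */
        out:
                return;
        }

        /* After: the same control flow as a plain 'if'; no functional change. */
        static void update_if(int dirty)
        {
                if (dirty)
                        write_superblock();
        }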
    • md/bitmap: move printing of bitmap status to bitmap.c · 57148964
      Committed by NeilBrown
      The part of /proc/mdstat which describes the bitmap should really
      be generated by code in bitmap.c.  So move it there.
      Signed-off-by: NeilBrown <neilb@suse.de>
      57148964
    • md/bitmap: remove some unused noise from bitmap.h · 4ba97dff
      Committed by NeilBrown
      Signed-off-by: NeilBrown <neilb@suse.de>
      4ba97dff
    • md/raid10 - support resizing some RAID10 arrays. · 006a09a0
      Committed by NeilBrown
      'resizing' an array in this context means making use of extra
      space that has become available in component devices, not adding new
      devices.
      It also includes shrinking the array to take up less space on the
      component devices.
      
      This is not supported for arrays with a 'far' layout.  However
      for 'near' and 'offset' layout arrays, adding and removing space at
      the end of the devices is easy to support, and this patch provides
      that support.
      Signed-off-by: NeilBrown <neilb@suse.de>
      006a09a0
    • md/raid1: handle merge_bvec_fn in member devices. · 6b740b8d
      Committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So create a raid1 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      6b740b8d
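
      A self-contained sketch of the idea, not the kernel code (the types and
      names below are invented stand-ins): the array-level merge_bvec_fn asks
      each member how much of a proposed bio it will accept and returns the
      most restrictive answer.

        struct bvec_query { unsigned long long sector; int max_bytes; };
        struct member { int (*merge_bvec_fn)(struct bvec_query *q); };

        /* Return the smallest limit any member imposes at this offset. */
        static int array_mergeable_bvec(struct member *m, int n,
                                        struct bvec_query *q)
        {
                int max = q->max_bytes;
                int i;

                for (i = 0; i < n; i++)
                        if (m[i].merge_bvec_fn) {
                                int lim = m[i].merge_bvec_fn(q);
                                if (lim < max)
                                        max = lim;
                        }
                return max;
        }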
    • md/raid10: handle merge_bvec_fn in member devices. · 050b6615
      Committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So enhance the raid10 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      050b6615
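
      The synchronize_sched() point made in both commits above, sketched
      (synchronize_sched() was a real RCU primitive in kernels of this era;
      the surrounding two lines are an assumed shape, not the actual diff):

        conf->mirrors[i].rdev = rdev;   /* publish the newly added member */
        synchronize_sched();            /* wait out in-flight, non-preempted
                                         * code that may have consulted
                                         * ->merge_bvec_fn on the old setup */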
    • md: add proper merge_bvec handling to RAID0 and Linear. · ba13da47
      Committed by NeilBrown
      These personalities currently set a max request size of one page
      when any member device has a merge_bvec_fn because they don't
      bother to call that function.
      
      This causes extra work in splitting and combining requests.
      
      So make the extra effort to call the merge_bvec_fn when it exists
      so that we end up with larger requests out the bottom.
      Signed-off-by: NeilBrown <neilb@suse.de>
      ba13da47
    • md: tidy up rdev_for_each usage. · dafb20fa
      Committed by NeilBrown
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicit list_for_each_entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
      Signed-off-by: NeilBrown <neilb@suse.de>
      dafb20fa
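
      The resulting macro pair, in the shape the commit message implies (the
      'same_set' list field name is recalled from md.h of this era, not
      verified):

        /* plain iteration - the common case */
        #define rdev_for_each(rdev, mddev) \
                list_for_each_entry(rdev, &((mddev)->disks), same_set)

        /* 'safe' iteration - only where rdevs may be removed inside the loop */
        #define rdev_for_each_safe(rdev, tmp, mddev) \
                list_for_each_entry_safe(rdev, tmp, &((mddev)->disks), same_set)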
    • md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb
      Committed by NeilBrown
      If RAID1 or RAID10 is used under LVM or some other stacking
      block device, it is possible to enter a deadlock during
      resync or recovery.
      This can happen if the upper level block device creates
      two requests to the RAID1 or RAID10.  The first request gets
      processed, blocks recovery, and queues requests for the underlying
      devices on current->bio_list.  A resync request then starts; it
      waits for those queued requests and blocks new IO.
      
      But then the second request to the RAID1/10 will be attempted
      and it cannot progress until the resync request completes,
      which cannot progress until the underlying device requests complete,
      which are on a queue behind that second request.
      
      So allow that second request to proceed even though there is
      a resync request about to start.
      
      This is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ray Morris <support@bettercgi.com>
      Tested-by: Ray Morris <support@bettercgi.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      d6b42dcb
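
      A sketch of the relaxed barrier wait this implies (identifiers recalled
      from raid1.c of this era, and the macro arity varied between kernels):
      a request may pass a pending resync barrier if the caller already holds
      queued bios on current->bio_list, because blocking it would deadlock as
      described above.

        /* inside wait_barrier(), under conf->resync_lock */
        wait_event_lock_irq(conf->wait_barrier,
                            !conf->barrier ||
                            (conf->nr_pending &&
                             current->bio_list &&
                             !bio_list_empty(current->bio_list)),
                            conf->resync_lock);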
    • md/bitmap: ensure to load bitmap when creating via sysfs. · 4474ca42
      Committed by NeilBrown
      When commit 69e51b44 (md/bitmap:  separate out loading a bitmap...)
      created bitmap_load, it missed calling it after bitmap_create when a
      bitmap is created through the sysfs interface.
      So if a bitmap is added this way, we don't allocate memory properly
      and can crash.
      
      This is suitable for any -stable release since 2.6.35.
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      4474ca42
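
      A sketch of the corrected sysfs path (assumed shape, not the literal
      diff): creating the bitmap must be followed by loading it, exactly as
      the normal array-start path does.

        rv = bitmap_create(mddev);
        if (rv == 0)
                rv = bitmap_load(mddev);        /* previously missing */
        if (rv)
                bitmap_destroy(mddev);          /* unwind on failure */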
    • md: don't set md arrays to readonly on shutdown. · c744a65c
      Committed by NeilBrown
      It seems that with recent kernels, writeback can still be in progress
      while shutdown is happening, and consequently data can be written
      after the md reboot notifier switches all arrays to read-only.
      This causes a BUG.
      
      So don't switch them to read-only - just mark them clean and
      set 'safemode' to '2', which means that immediately after any
      write the array will be switched back to 'clean'.
      
      This could result in the shutdown happening while the array is marked
      dirty, thus forcing a resync on reboot.  However if you reboot
      without performing a "sync" first, you get to keep both halves.
      
      This is suitable for any stable kernel (though there might be some
      conflicts with obvious fixes in earlier kernels).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      c744a65c
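
      The essence of the new shutdown behaviour, sketched (__md_stop_writes()
      is assumed from kernels of this era; treat the shape as illustrative):

        /* in the md reboot notifier, for each array */
        if (mddev_trylock(mddev)) {
                if (mddev->pers)
                        __md_stop_writes(mddev);  /* flush and mark clean */
                mddev->safemode = 2;    /* any later write flips the array
                                         * straight back to 'clean' */
                mddev_unlock(mddev);
        }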
    • md: allow re-add to failed arrays. · dc10c643
      Committed by NeilBrown
      When an array has failed (some data is inaccessible), there is no
      point attempting to add a spare as it could not possibly be recovered.
      
      However there may be value in re-adding a recently removed device:
      e.g. if there is a write-intent bitmap and it is clear, then access
      to the data could be restored by this action.
      
      So don't reject a re-add to a failed array for RAID10 and RAID5 (the
      only array types that check for a failed array).
      Signed-off-by: NeilBrown <neilb@suse.de>
      dc10c643
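
      The shape of the relaxed check, as the commit message describes it (a
      sketch; has_failed() matches raid5.c, the raid10 test differs): only
      devices with no previous slot in the array - true spares - are still
      rejected.

        /* in the personality's hot-add path */
        if (rdev->saved_raid_disk < 0 && has_failed(conf))
                return -EINVAL; /* a new spare on a failed array is pointless */
        /* a re-added device (saved_raid_disk >= 0) is allowed through */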
  2. 13 Mar 2012, 4 commits
  3. 08 Mar 2012, 8 commits
    • dm raid: fix flush support · 0ca93de9
      Committed by Jonathan E Brassow
      Fix dm-raid flush support.
      
      Both md and dm have support for flush, but the dm-raid target
      forgot to set the flag to indicate that flushes should be
      passed on.  (Important for data integrity e.g. with writeback cache
      enabled.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      0ca93de9
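
      The fix amounts to a one-line declaration in the target constructor;
      the field was called num_flush_requests in kernels of this era (treat
      the exact name as recalled, not verified):

        /* in raid_ctr(): tell DM this target passes flushes down */
        ti->num_flush_requests = 1;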
    • dm raid: set MD_CHANGE_DEVS when rebuilding · 3aa3b2b2
      Committed by Jonathan E Brassow
      The 'rebuild' parameter is used to rebuild individual devices in an
      array (e.g. resynchronize a RAID1 device or recalculate a parity device
      in higher RAID levels).  The MD_CHANGE_DEVS flag must be set when this
      parameter is given in order to write out the superblocks and make the
      change take immediate effect.  The code that handles new devices in
      super_load already sets MD_CHANGE_DEVS and 'FirstUse'.  (The 'FirstUse'
      flag was being set as a special case for rebuilds in
      super_init_validation.)
      
      Add a condition for rebuilds in super_load to take care of both flags
      without the special case in 'super_init_validation'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      3aa3b2b2
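
      A sketch of the unified condition in super_load (the two trigger names
      are invented for illustration; MD_CHANGE_DEVS and FirstUse come from
      the commit message):

        /* treat a device being rebuilt like a brand-new device */
        if (device_is_new || rebuild_requested) {
                set_bit(FirstUse, &rdev->flags);
                set_bit(MD_CHANGE_DEVS, &rdev->mddev->flags);
        }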
    • dm thin metadata: decrement counter after removing mapped block · af63bcb8
      Committed by Joe Thornber
      Correct the number of mapped sectors shown on a thin device's
      status line by decrementing td->mapped_blocks in __remove() each time
      a block is removed.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      af63bcb8
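
      The essence of the fix, sketched (delete_mapping() is a hypothetical
      stand-in for the real btree removal):

        static int __remove(struct dm_thin_device *td, unsigned long long block)
        {
                int r = delete_mapping(td, block);      /* hypothetical */

                if (r)
                        return r;

                td->mapped_blocks--;    /* previously missing, so the status
                                         * line over-counted mapped sectors */
                td->changed = 1;
                return 0;
        }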
    • dm thin metadata: unlock superblock in init_pmd error path · 4469a5f3
      Committed by Joe Thornber
      If dm_sm_disk_create() fails the superblock must be unlocked.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      4469a5f3
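
      A sketch of the corrected error path (dm_sm_disk_create() and
      dm_bm_unlock() are real persistent-data interfaces of this era; the
      surrounding shape is assumed):

        data_sm = dm_sm_disk_create(tm, nr_blocks);
        if (IS_ERR(data_sm)) {
                r = PTR_ERR(data_sm);
                dm_bm_unlock(sblock);   /* previously leaked on this path */
                return r;
        }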
    • dm thin metadata: remove incorrect close_device on creation error paths · 1f3db25d
      Committed by Mike Snitzer
      The __open_device() error paths in __create_thin() and __create_snap()
      incorrectly call __close_device() even if td was not initialized by
      __open_device().  Remove this.
      
      Also document __open_device() return values, remove a redundant
      td->changed = 1 in __create_thin(), and insert an additional
      safeguard against creating an already-existing device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1f3db25d
    • dm flakey: fix crash on read when corrupt_bio_byte not set · 1212268f
      Committed by Mike Snitzer
      The following BUG is hit on the first read that is submitted to a dm
      flakey test device while the device is "down" if the corrupt_bio_byte
      feature wasn't requested when the device's table was loaded.
      
      Example DM table that will hit this BUG:
      0 2097152 flakey 8:0 2048 0 30
      
      This bug was introduced by commit a3998799
      (dm flakey: add corrupt_bio_byte feature) in v3.1-rc1.
      
      BUG: unable to handle kernel paging request at ffff8801cfce3fff
      IP: [<ffffffffa008c233>] corrupt_bio_data+0x6e/0xae [dm_flakey]
      PGD 1606063 PUD 0
      Oops: 0002 [#1] SMP
      ...
      Call Trace:
       <IRQ>
       [<ffffffffa008c2b5>] flakey_end_io+0x42/0x48 [dm_flakey]
       [<ffffffffa00dca98>] clone_endio+0x54/0xb6 [dm_mod]
       [<ffffffff81130587>] bio_endio+0x2d/0x2f
       [<ffffffff811c819a>] req_bio_endio+0x96/0x9f
       [<ffffffff811c94b9>] blk_update_request+0x1dc/0x3a9
       [<ffffffff812f5ee2>] ? rcu_read_unlock+0x21/0x23
       [<ffffffff811c96a6>] blk_update_bidi_request+0x20/0x6e
       [<ffffffff811c9713>] blk_end_bidi_request+0x1f/0x5d
       [<ffffffff811c978d>] blk_end_request+0x10/0x12
       [<ffffffff8128f450>] scsi_io_completion+0x1e5/0x4b1
       [<ffffffff812882a9>] scsi_finish_command+0xec/0xf5
       [<ffffffff8128f830>] scsi_softirq_done+0xff/0x108
       [<ffffffff811ce284>] blk_done_softirq+0x84/0x98
       [<ffffffff81048d19>] __do_softirq+0xe3/0x1d5
       [<ffffffff8138f83f>] ? _raw_spin_lock+0x62/0x69
       [<ffffffff810997cf>] ? handle_irq_event+0x4c/0x61
       [<ffffffff8139833c>] call_softirq+0x1c/0x30
       [<ffffffff81003b37>] do_softirq+0x4b/0xa3
       [<ffffffff81048a39>] irq_exit+0x53/0xca
       [<ffffffff81398acd>] do_IRQ+0x9d/0xb4
       [<ffffffff81390333>] common_interrupt+0x73/0x73
      ...
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.1+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1212268f
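
      The guard the fix adds, sketched from the description above (the
      condition's exact shape is recalled, not verified): a corrupt_bio_byte
      of 0 means the feature was never configured, so read completions must
      not be touched.

        /* in flakey_end_io(), for bios that passed through while "down" */
        if (fc->corrupt_bio_byte &&
            fc->corrupt_bio_rw == READ &&
            all_corrupt_bio_flags_match(bio, fc))
                corrupt_bio_data(bio, fc);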
    • dm io: fix discard support · 0c535e0d
      Committed by Milan Broz
      This patch fixes a crash by recognising discards in dm_io.
      
      Currently dm_mirror can send REQ_DISCARD bios when running over a
      discard-enabled device, and without support in dm_io the system
      crashes badly.
      
      BUG: unable to handle kernel paging request at 00800000
      IP:  __bio_add_page.part.17+0xf5/0x1e0
      ...
       bio_add_page+0x56/0x70
       dispatch_io+0x1cf/0x240 [dm_mod]
       ? km_get_page+0x50/0x50 [dm_mod]
       ? vm_next_page+0x20/0x20 [dm_mod]
       ? mirror_flush+0x130/0x130 [dm_mirror]
       dm_io+0xdc/0x2b0 [dm_mod]
      ...
      
      Introduced in 2.6.38-rc1 by commit 5fc2ffea
      (dm raid1: support discard).
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Cc: stable@kernel.org
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      0c535e0d
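
      A sketch of the discard branch in dm-io's region dispatch (bi_size was
      the bio size field in this era; the exact code is not reproduced): a
      discard bio carries no data pages, so its size is set directly instead
      of via bio_add_page().

        if (rw & REQ_DISCARD) {
                /* no payload: just describe the range */
                bio->bi_size = num_sectors << SECTOR_SHIFT;
                remaining -= num_sectors;
        } else {
                /* normal I/O keeps the old path: bio_add_page() per page */
        }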
    • dm ioctl: do not leak argv if target message only contains whitespace · 902c6a96
      Committed by Jesper Juhl
      If 'argc' is zero we jump to the 'out:' label, but this leaks the
      (unused) memory that 'dm_split_args()' allocated for 'argv' if the
      string being split consisted entirely of whitespace.  Jump to the
      'out_argv:' label instead to free up that memory.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      902c6a96
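
      The corrected flow, sketched around the labels named in the message:

        r = dm_split_args(&argc, &argv, tmsg->message);
        if (r)
                goto out;

        if (!argc) {
                DMWARN("Empty message received.");
                goto out_argv;          /* was 'goto out;', leaking argv */
        }

        /* ... dispatch the message ... */

        out_argv:
                kfree(argv);
        out:
                return r;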
  4. 06 Mar 2012, 1 commit
  5. 14 Feb 2012, 1 commit
    • md/raid10: fix handling of error on last working device in array. · fae8cc5e
      Committed by NeilBrown
      If we get a read error on the last working device in a RAID10 which
      contains the target block, then we don't fail the device (which is
      good) but we don't abort retries, which is wrong.
      We end up in an infinite loop retrying the read on the one device.
      
      This patch fixes the problem in two places:
      1/ in raid10_end_read_request we don't even ask for a retry if this
         was the last usable device.  This is efficient but a little racy
         and will sometimes retry when it should not.
      
      2/ in handle_read_error we are careful to exclude any device from
         retry which we tried to mark as faulty (that might have failed if
         it was the last device).  This is race-free but less efficient.
      Signed-off-by: NeilBrown <neilb@suse.de>
      fae8cc5e
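
      A sketch of check 1/ (enough() is a real raid10.c helper that reports
      whether every block still has a working copy elsewhere; the surrounding
      shape is assumed):

        /* in raid10_end_read_request(), after a read error */
        if (!enough(conf, rdev->raid_disk))
                uptodate = 1;   /* last usable device: complete the bio (it
                                 * still reports failure) instead of queueing
                                 * an endless retry */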
  6. 13 Feb 2012, 1 commit
  7. 07 Feb 2012, 1 commit
    • md: two small fixes to the handling of an interrupted resync. · db91ff55
      Committed by NeilBrown
      1/ If a resync is aborted we should record, as how far we got
       (recovery_cp), the last request that we know has completed
       (->curr_resync_completed) rather than the last request that was
       submitted (->curr_resync).
      
      2/ When a resync aborts we still want to update the metadata with
       any changes, so set MD_CHANGE_DEVS even if we 'skip'.
      Signed-off-by: NeilBrown <neilb@suse.de>
      db91ff55
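
      Both fixes sketched together (field and flag names are taken from the
      commit message itself):

        /* 1/ on abort, trust only what is known to be complete */
        if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
                mddev->recovery_cp = mddev->curr_resync_completed;

        /* 2/ write the metadata out even when the resync is skipped */
        set_bit(MD_CHANGE_DEVS, &mddev->flags);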
  8. 31 Jan 2012, 1 commit
    • Prevent DM RAID from loading bitmap twice. · 34f8ac6d
      Committed by Jonathan Brassow
      The life cycle of a device-mapper target is:
      1) create
      2) resume
      3) suspend
      *) possibly repeat from 2
      4) destroy
      
      The dm-raid target is unconditionally calling MD's bitmap_load function upon
      every resume.  If steps 2 & 3 above are repeated, bitmap_load is called
      multiple times.  It is only written to be called once; otherwise, it
      allocates new memory for the bitmap (without freeing the old) and
      increments the number of pages it thinks it has without zeroing them
      first.  This ultimately leads to accesses beyond the allocated memory
      and to leaked memory.
      
      Simply avoiding the bitmap_load call upon resume is not sufficient.  If the
      target was suspended while the initial recovery was only partially complete,
      it needs to be restarted when the target is resumed.  This is why
      'md_wakeup_thread' is called before issuing the 'mddev_resume'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      34f8ac6d
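
      A sketch of the resume handler after the fix (the bitmap_loaded flag is
      implied by the description; the exact code is not reproduced):

        static void raid_resume(struct dm_target *ti)
        {
                struct raid_set *rs = ti->private;

                if (!rs->bitmap_loaded) {
                        bitmap_load(&rs->md);
                        rs->bitmap_loaded = 1;
                } else
                        /* restart a partially completed initial recovery */
                        md_wakeup_thread(rs->md.thread);

                mddev_resume(&rs->md);
        }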
  9. 15 Jan 2012, 1 commit
  10. 11 Jan 2012, 3 commits
  11. 04 Jan 2012, 1 commit
  12. 23 Dec 2011, 6 commits