1. 09 1月, 2009 8 次提交
    • N
      md: don't retry recovery of raid1 that fails due to error on source drive. · 4044ba58
      NeilBrown 提交于
      If a raid1 has only one working drive and it has a sector which
      gives an error on read, then an attempt to recover onto a spare will
      fail, but as the single remaining drive is not removed from the
      array, the recovery will be immediately re-attempted, resulting
      in an infinite recovery loop.
      
      So detect this situation and don't retry recovery once an error
      on the lone remaining drive is detected.
      
      Allow recovery to be retried once every time a spare is added
      in case the problem wasn't actually a media error.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      4044ba58
    • N
      md: Allow md devices to be created by name. · efeb53c0
      NeilBrown 提交于
      Using sequential numbers to identify md devices is somewhat artificial.
      Using names can be a lot more user-friendly.
      
      Also, creating md devices by opening the device special file is a bit
      awkward.
      
      So this patch provides a new option for creating and naming devices.
      
      Writing a name such as "md_home" to
          /sys/modules/md_mod/parameters/new_array
      will cause an array with that name to be created.  It will appear in
      /sys/block/ /proc/partitions and /proc/mdstat as 'md_home'.
      It will have an arbitrary minor number allocated.
      
      md devices that a created by an open are destroyed on the last
      close when the device is inactive.
      For named md devices, they will not be destroyed until the array
      is explicitly stopped, either with the STOP_ARRAY ioctl or by
      writing 'clear' to /sys/block/md_XXXX/md/array_state.
      
      The name of the array must start 'md_' to avoid conflict with
      other devices.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      efeb53c0
    • N
      md: make devices disappear when they are no longer needed. · d3374825
      NeilBrown 提交于
      Currently md devices, once created, never disappear until the module
      is unloaded.  This is essentially because the gendisk holds a
      reference to the mddev, and the mddev holds a reference to the
      gendisk, this a circular reference.
      
      If we drop the reference from mddev to gendisk, then we need to ensure
      that the mddev is destroyed when the gendisk is destroyed.  However it
      is not possible to hook into the gendisk destruction process to enable
      this.
      
      So we drop the reference from the gendisk to the mddev and destroy the
      gendisk when the mddev gets destroyed.  However this has a
      complication.
      Between the call
         __blkdev_get->get_gendisk->kobj_lookup->md_probe
      and the call
         __blkdev_get->md_open
      
      there is no obvious way to hold a reference on the mddev any more, so
      unless something is done, it will disappear and gendisk will be
      destroyed prematurely.
      
      Also, once we decide to destroy the mddev, there will be an unlockable
      moment before the gendisk is unlinked (blk_unregister_region) during
      which a new reference to the gendisk can be created.  We need to
      ensure that this reference can not be used.  i.e. the ->open must
      fail.
      
      So:
       1/  in md_probe we set a flag in the mddev (hold_active) which
           indicates that the array should be treated as active, even
           though there are no references, and no appearance of activity.
           This is cleared by md_release when the device is closed if it
           is no longer needed.
           This ensures that the gendisk will survive between md_probe and
           md_open.
      
       2/  In md_open we check if the mddev we expect to open matches
           the gendisk that we did open.
           If there is a mismatch we return -ERESTARTSYS and modify
           __blkdev_get to retry from the top in that case.
           In the -ERESTARTSYS sys case we make sure to wait until
           the old gendisk (that we succeeded in opening) is really gone so
           we loop at most once.
      
      Some udev configurations will always open an md device when it first
      appears.   If we allow an md device that was just created by an open
      to disappear on an immediate close, then this can race with such udev
      configurations and result in an infinite loop the device being opened
      and closed, then re-open due to the 'ADD' even from the first open,
      and then close and so on.
      So we make sure an md device, once created by an open, remains active
      at least until some md 'ioctl' has been made on it.  This means that
      all normal usage of md devices will allow them to disappear promptly
      when not needed, but the worst that an incorrect usage will do it
      cause an inactive md device to be left in existence (it can easily be
      removed).
      
      As an array can be stopped by writing to a sysfs attribute
        echo clear > /sys/block/mdXXX/md/array_state
      we need to use scheduled work for deleting the gendisk and other
      kobjects.  This allows us to wait for any pending gendisk deletion to
      complete by simply calling flush_scheduled_work().
      Signed-off-by: NNeilBrown <neilb@suse.de>
      d3374825
    • N
      md: centralise all freeing of an 'mddev' in 'md_free' · a21d1504
      NeilBrown 提交于
      md_free is the .release handler for the md kobj_type.
      So it makes sense to release all the objects referenced by
      the mddev in there, rather than just prior to calling kobject_put
      for what we think is the last time.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      a21d1504
    • N
      md: move allocation of ->queue from mddev_find to md_probe · 8b765398
      NeilBrown 提交于
      It is more balanced to just do simple initialisation in mddev_find,
      which allocates and links a new md device, and leave all the
      more sophisticated allocation to md_probe (which calls mddev_find).
      md_probe already allocated the gendisk.  It should allocate the
      queue too.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      8b765398
    • C
      md: need another print_sb for mdp_superblock_1 · cd2ac932
      Cheng Renquan 提交于
      md_print_devices is called in two code path: MD_BUG(...), and md_ioctl
      with PRINT_RAID_DEBUG.  it will dump out all in use md devices
      information;
      
      However, it wrongly processed two types of superblock in one:
      
      The header file <linux/raid/md_p.h> has defined two types of superblock,
      struct mdp_superblock_s (typedefed with mdp_super_t) according to md with
      metadata 0.90, and struct mdp_superblock_1 according to md with metadata
      1.0 and later,
      
      These two types of superblock are very different,
      
      The md_print_devices code processed them both in mdp_super_t, that would
      lead to wrong informaton dump like:
      
      	[ 6742.345877]
      	[ 6742.345887] md:	**********************************
      	[ 6742.345890] md:	* <COMPLETE RAID STATE PRINTOUT> *
      	[ 6742.345892] md:	**********************************
      	[ 6742.345896] md1: <ram7><ram6><ram5><ram4>
      	[ 6742.345907] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
      	[ 6742.345909] md: rdev superblock:
      	[ 6742.345914] md:  SB: (V:0.90.0) ID:<42ef13c7.598c059a.5f9f1645.801e9ee6> CT:4919856d
      	[ 6742.345918] md:     L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
      	[ 6742.345922] md:     UT:4919856d ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:b7992907 E:00000001
      	[ 6742.345924]      D  0:  DISK<N:0,(1,8),R:0,S:6>
      	[ 6742.345930]      D  1:  DISK<N:1,(1,10),R:1,S:6>
      	[ 6742.345933]      D  2:  DISK<N:2,(1,12),R:2,S:6>
      	[ 6742.345937]      D  3:  DISK<N:3,(1,14),R:3,S:6>
      	[ 6742.345942] md:     THIS:  DISK<N:3,(1,14),R:3,S:6>
      	...
      	[ 6742.346058] md0: <ram3><ram2><ram1><ram0>
      	[ 6742.346067] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
      	[ 6742.346070] md: rdev superblock:
      	[ 6742.346073] md:  SB: (V:1.0.0) ID:<369aad81.00000000.00000000.00000000> CT:9a322a9c
      	[ 6742.346077] md:     L-1507699579 S976570180 ND:48 RD:0 md0 LO:65536 CS:196610
      	[ 6742.346081] md:     UT:00000018 ST:0 AD:131048 WD:0 FD:8 SD:0 CSUM:00000000 E:00000000
      	[ 6742.346084]      D  0:  DISK<N:-1,(-1,-1),R:-1,S:-1>
      	[ 6742.346089]      D  1:  DISK<N:-1,(-1,-1),R:-1,S:-1>
      	[ 6742.346092]      D  2:  DISK<N:-1,(-1,-1),R:-1,S:-1>
      	[ 6742.346096]      D  3:  DISK<N:-1,(-1,-1),R:-1,S:-1>
      	[ 6742.346102] md:     THIS:  DISK<N:0,(0,0),R:0,S:0>
      	...
      	[ 6742.346219] md:	**********************************
      	[ 6742.346221]
      
      Here md1 is metadata 0.90.0, and md0 is metadata 1.2
      
      After some more code to distinguish these two types of superblock, in this patch,
      
      it will generate dump information like:
      
      	[ 7906.755790]
      	[ 7906.755799] md:	**********************************
      	[ 7906.755802] md:	* <COMPLETE RAID STATE PRINTOUT> *
      	[ 7906.755804] md:	**********************************
      	[ 7906.755808] md1: <ram7><ram6><ram5><ram4>
      	[ 7906.755819] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
      	[ 7906.755821] md: rdev superblock (MJ:0):
      	[ 7906.755826] md:  SB: (V:0.90.0) ID:<3fca7a0d.a612bfed.5f9f1645.801e9ee6> CT:491989f3
      	[ 7906.755830] md:     L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
      	[ 7906.755834] md:     UT:491989f3 ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:00fb52ad E:00000001
      	[ 7906.755836]      D  0:  DISK<N:0,(1,8),R:0,S:6>
      	[ 7906.755842]      D  1:  DISK<N:1,(1,10),R:1,S:6>
      	[ 7906.755845]      D  2:  DISK<N:2,(1,12),R:2,S:6>
      	[ 7906.755849]      D  3:  DISK<N:3,(1,14),R:3,S:6>
      	[ 7906.755855] md:     THIS:  DISK<N:3,(1,14),R:3,S:6>
      	...
      	[ 7906.755972] md0: <ram3><ram2><ram1><ram0>
      	[ 7906.755981] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
      	[ 7906.755984] md: rdev superblock (MJ:1):
      	[ 7906.755989] md:  SB: (V:1) (F:0) Array-ID:<5fbcf158:55aa:5fbe:9a79:1e939880dcbd>
      	[ 7906.755990] md:    Name: "DG5:0" CT:1226410480
      	[ 7906.755998] md:       L5 SZ130944 RD:4 LO:2 CS:128 DO:24 DS:131048 SO:8 RO:0
      	[ 7906.755999] md:     Dev:00000003 UUID: 9194d744:87f7:a448:85f2:7497b84ce30a
      	[ 7906.756001] md:       (F:0) UT:1226410480 Events:0 ResyncOffset:-1 CSUM:0dbcd829
      	[ 7906.756003] md:         (MaxDev:384)
      	...
      	[ 7906.756113] md:	**********************************
      	[ 7906.756116]
      
      this md0 (metadata 1.2) information dumping is exactly according to struct
      mdp_superblock_1.
      Signed-off-by: NCheng Renquan <crquan@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      cd2ac932
    • C
      md: use list_for_each_entry macro directly · 159ec1fc
      Cheng Renquan 提交于
      The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
      list_for_each_entry_safe, from <linux/list.h>, it should be defined to
      use list_for_each_entry_safe, instead of reinventing the wheel.
      
      But some calls to each_entry_safe don't really need a safe version,
      just a direct list_for_each_entry is enough, this could save a temp
      variable (tmp) in every function that used rdev_for_each.
      
      In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
      totally save many tmp vars; and only in the other situations that will call
      list_del to delete an entry, the safe version is used.
      Signed-off-by: NCheng Renquan <crquan@gmail.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      159ec1fc
    • N
      md: use sysfs_notify_dirent to notify changes to md/sync_action. · 0c3573f1
      NeilBrown 提交于
      There is no compelling need for this, but sysfs_notify_dirent is a
      nicer interface and the change is good for consistency.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0c3573f1
  2. 06 11月, 2008 1 次提交
    • N
      md: revert the recent addition of a call to the BLKRRPART ioctl. · cb3ac42b
      NeilBrown 提交于
      It turns out that it is only safe to call blkdev_ioctl when the device
      is actually open (as ->bd_disk is set to NULL on last close).  And it
      is quite possible for do_md_stop to be called when the device is not
      open.  So discard the call to blkdev_ioctl(BLKRRPART) which was
      added in
         commit 934d9c23
      
      It is just as easy to call this ioctl from userspace when needed (on
      mdadm -S) so leave it out of the kernel
      Signed-off-by: NNeilBrown <neilb@suse.de>
      cb3ac42b
  3. 28 10月, 2008 1 次提交
    • N
      md: destroy partitions and notify udev when md array is stopped. · 934d9c23
      NeilBrown 提交于
      md arrays are not currently destroyed when they are stopped - they
      remain in /sys/block.  Last time I tried this I tripped over locking
      too much.
      
      A consequence of this is that udev doesn't remove anything from /dev.
      This is rather ugly.
      
      As an interim measure until proper device removal can be achieved,
      make sure all partitions are removed using the BLKRRPART ioctl, and
      send a KOBJ_CHANGE when an md array is stopped.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      934d9c23
  4. 21 10月, 2008 6 次提交
    • A
      [PATCH] pass fmode_t to blkdev_put() · 9a1c3542
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9a1c3542
    • A
      [PATCH] switch md · a39907fa
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a39907fa
    • A
      [PATCH] beginning of methods conversion · d4430d62
      Al Viro 提交于
      To keep the size of changesets sane we split the switch by drivers;
      to keep the damn thing bisectable we do the following:
      	1) rename the affected methods, add ones with correct
      prototypes, make (few) callers handle both.  That's this changeset.
      	2) for each driver convert to new methods.  *ALL* drivers
      are converted in this series.
      	3) kill the old (renamed) methods.
      
      Note that it _is_ a flagday; all in-tree drivers are converted and by the
      end of this series no trace of old methods remain.  The only reason why
      we do that this way is to keep the damn thing bisectable and allow per-driver
      debugging if anything goes wrong.
      
      New methods:
      	open(bdev, mode)
      	release(disk, mode)
      	ioctl(bdev, mode, cmd, arg)		/* Called without BKL */
      	compat_ioctl(bdev, mode, cmd, arg)
      	locked_ioctl(bdev, mode, cmd, arg)	/* Called with BKL, legacy */
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d4430d62
    • N
      md: allow extended partitions on md devices. · 92850bbd
      NeilBrown 提交于
      The new extended partition support provides a much nicer was
      to have partitions on md devices that the 'mdp' alternate major.
      We cannot really get rid of 'mdp' at this time, but we can
      enable extended partitions as that will probably make life
      easier for sysadmins.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      92850bbd
    • N
      md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state · 3c0ee63a
      NeilBrown 提交于
      The 'state' file for a device reports, for example, when the device
      has failed.  Changes should be reported to userspace ASAP without
      the possibility of blocking on low-memory.  sysfs_notify does
      have that possibility (as it takes a mutex which can be held
      across a kmalloc) so use sysfs_notify_dirent instead.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      3c0ee63a
    • N
      md: use sysfs_notify_dirent to notify changes to md/array_state · b62b7590
      NeilBrown 提交于
      Now that we have sysfs_notify_dirent, use it to notify changes
      to md/array_state.
      As sysfs_notify_dirent can be called in atomic context, we can
      remove the delayed notify and the MD_NOTIFY_ARRAY_STATE flag.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      b62b7590
  5. 16 10月, 2008 2 次提交
  6. 15 10月, 2008 1 次提交
    • S
      md: build failure due to missing delay.h · 25570727
      Stephen Rothwell 提交于
      Today's linux-next build (powerpc ppc64_defconfig) failed like this:
      
      drivers/md/raid1.c: In function 'sync_request':
      drivers/md/raid1.c:1759: error: implicit declaration of function 'msleep_interruptible'
      make[3]: *** [drivers/md/raid1.o] Error 1
      make[3]: *** Waiting for unfinished jobs....
      drivers/md/raid10.c: In function 'sync_request':
      drivers/md/raid10.c:1749: error: implicit declaration of function 'msleep_interruptible'
      make[3]: *** [drivers/md/raid10.o] Error 1
      drivers/md/md.c: In function 'md_do_sync':
      drivers/md/md.c:5915: error: implicit declaration of function 'msleep'
      
      Caused by commit 6caa3b0bbdb474647f6bdd8a958ffc46f78d8d58 ("md: Remove
      unnecessary #includes, #defines, and function declarations").  I added
      the following patch.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      25570727
  7. 13 10月, 2008 6 次提交
  8. 09 10月, 2008 3 次提交
    • T
      block: move stats from disk to part0 · 074a7aca
      Tejun Heo 提交于
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now updates part0 together if the specified partition
        is not part0.  ie. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similary.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_fligh() is implemented which automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      074a7aca
    • T
      block: always set bdev->bd_part · 0762b8bd
      Tejun Heo 提交于
      Till now, bdev->bd_part is set only if the bdev was for parts other
      than part0.  This patch makes bdev->bd_part always set so that code
      paths don't have to differenciate common handling.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0762b8bd
    • T
      block: implement and use {disk|part}_to_dev() · ed9e1982
      Tejun Heo 提交于
      Implement {disk|part}_to_dev() and use them to access generic device
      instead of directly dereferencing {disk|part}->dev.  To make sure no
      user is left behind, rename generic devices fields to __dev.
      
      This is in preparation of unifying partition 0 handling with other
      partitions.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ed9e1982
  9. 19 9月, 2008 1 次提交
    • N
      md: Don't wait UNINTERRUPTIBLE for other resync to finish · 9744197c
      NeilBrown 提交于
      When two md arrays share some block device (e.g each uses different
      partitions on the one device), a resync of one array will wait for
      the resync on the other to finish.
      
      This can be a long time and as it currently waits TASK_UNINTERRUPTIBLE,
      the softlockup code notices and complains.
      
      So use TASK_INTERRUPTIBLE instead and make sure to flush signals
      before calling schedule.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      9744197c
  10. 01 9月, 2008 1 次提交
    • N
      Remove invalidate_partition call from do_md_stop. · 271f5a9b
      NeilBrown 提交于
      When stopping an md array, or just switching to read-only, we
      currently call invalidate_partition while holding the mddev lock.
      The main reason for this is probably to ensure all dirty buffers
      are flushed (invalidate_partition calls fsync_bdev).
      
      However if any dirty buffers are found, it will almost certainly cause
      a deadlock as starting writeout will require an update to the
      superblock, and performing that updates requires taking the mddev
      lock - which is already held.
      
      This deadlock can be demonstrated by running "reboot -f -n" with
      a root filesystem on md/raid, and some dirty buffers in memory.
      
      All other calls to stop an array should already happen after a flush.
      The normal sequence is to stop using the array (e.g. umount) which
      will cause __blkdev_put to call sync_blockdev.  Then open the
      array and issue the STOP_ARRAY ioctl while the buffers are all still
      clean.
      
      So this invalidate_partition is normally a no-op, except for one case
      where it will cause a deadlock.
      
      So remove it.
      
      This patch possibly addresses the regression recored in
         http://bugzilla.kernel.org/show_bug.cgi?id=11460
      and
         http://bugzilla.kernel.org/show_bug.cgi?id=11452
      
      though it isn't yet clear how it ever worked.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      271f5a9b
  11. 08 8月, 2008 1 次提交
  12. 05 8月, 2008 4 次提交
    • N
      Allow faulty devices to be removed from a readonly array. · c89a8eee
      NeilBrown 提交于
      Removing faulty devices from an array is a two stage process.
      First the device is moved from being a part of the active array
      to being similar to a spare device.  Then it can be removed
      by a request from user space.
      
      The first step is currently not performed for read-only arrays,
      so the second step can never succeed.
      
      So allow readonly arrays to remove failed devices (which aren't
      blocked).
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c89a8eee
    • N
      Fail safely when trying to grow an array with a write-intent bitmap. · dba034ee
      NeilBrown 提交于
      We cannot currently change the size of a write-intent bitmap.
      So if we change the size of an array which has such a bitmap, it
      tries to set bits beyond the end of the bitmap.
      
      For now, simply reject any request to change the size of an array
      which has a bitmap.  mdadm can remove the bitmap and add a new one
      after the array has changed size.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      dba034ee
    • N
      Restore force switch of md array to readonly at reboot time. · 2b25000b
      NeilBrown 提交于
      A recent patch allowed do_md_stop to know whether it was being called
      via an ioctl or not, and thus where to allow for an extra open file
      descriptor when checking if it is in use.
      This broke then switch to readonly performed by the shutdown notifier,
      which needs to work even when the array is still (apparently) active
      (as md doesn't get told when the filesystem becomes readonly).
      
      So restore this feature by pretending that there can be lots of
      file descriptors open, but we still want do_md_stop to switch to
      readonly.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      2b25000b
    • N
      Make writes to md/safe_mode_delay immediately effective. · 19052c0e
      NeilBrown 提交于
      If we reduce the 'safe_mode_delay', it could still wait for the old
      delay to completely expire before doing anything about safe_mode.
      Thus the effect if the change is delayed.
      
      To make the effect more immediate, run the timeout function
      immediately if the delay was reduced.  This may cause it to run
      slightly earlier that required, but that is the safer option.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      19052c0e
  13. 29 7月, 2008 1 次提交
  14. 24 7月, 2008 1 次提交
  15. 21 7月, 2008 3 次提交
    • N
      md: Protect access to mddev->disks list using RCU · 4b80991c
      NeilBrown 提交于
      All modifications and most access to the mddev->disks list are made
      under the reconfig_mutex lock.  However there are three places where
      the list is walked without any locking.  If a reconfig happens at this
      time, havoc (and oops) can ensue.
      
      So use RCU to protect these accesses:
        - wrap them in rcu_read_{,un}lock()
        - use list_for_each_entry_rcu
        - add to the list with list_add_rcu
        - delete from the list with list_del_rcu
        - delay the 'free' with call_rcu rather than schedule_work
      
      Note that export_rdev did a list_del_init on this list.  In almost all
      cases the entry was not in the list anymore so it was a no-op and so
      safe.  It is no longer safe as after list_del_rcu we may not touch
      the list_head.
      An audit shows that export_rdev is called:
        - after unbind_rdev_from_array, in which case the delete has
           already been done,
        - after bind_rdev_to_array fails, in which case the delete isn't needed.
        - before the device has been put on a list at all (e.g. in
            add_new_disk where reading the superblock fails).
        - and in autorun devices after a failure when the device is on a
            different list.
      
      So remove the list_del_init call from export_rdev, and add it back
      immediately before the called to export_rdev for that last case.
      
      Note also that ->same_set is sometimes used for lists other than
      mddev->list (e.g. candidates).  In these cases rcu is not needed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      4b80991c
    • N
      md: only count actual openers as access which prevent a 'stop' · f2ea68cf
      NeilBrown 提交于
      Open isn't the only thing that increments ->active.  e.g. reading
      /proc/mdstat will increment it briefly.  So to avoid false positives
      in testing for concurrent access, introduce a new counter that counts
      just the number of times the md device it open.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f2ea68cf
    • A
      md: Make mddev->array_size sector-based. · f233ea5c
      Andre Noll 提交于
      This patch renames the array_size field of struct mddev_s to array_sectors
      and converts all instances to use units of 512 byte sectors instead of 1k
      blocks.
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f233ea5c