1. 01 7月, 2009 1 次提交
  2. 18 6月, 2009 5 次提交
    • N
      md/linear: use call_rcu to free obsolete 'conf' structures. · 495d3573
      NeilBrown 提交于
      Current, when we update the 'conf' structure, when adding a
      drive to a linear array, we keep the old version around until
      the array is finally stopped, as it is not safe to free it
      immediately.
      
      Now that we have rcu protection on all accesses to 'conf',
      we can use call_rcu to free it more promptly.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      495d3573
    • S
      md linear: Protecting mddev with rcu locks to avoid races · af11c397
      SandeepKsinha 提交于
      
      Due to the lack of memory ordering guarantees, we may have races around
      mddev->conf.
      
      In particular, the correct contents of the structure we get from
      dereferencing ->private might not be visible to this CPU yet, and
      they might not be correct w.r.t mddev->raid_disks.
      
      This patch addresses the problem using rcu protection to avoid
      such race conditions.
      Signed-off-by: NSandeepKsinha <sandeepksinha@gmail.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      af11c397
    • A
      md: Move check for bitmap presence to personality code. · 0894cc30
      Andre Noll 提交于
      If the superblock of a component device indicates the presence of a
      bitmap but the corresponding raid personality does not support bitmaps
      (raid0, linear, multipath, faulty), then something is seriously wrong
      and we'd better refuse to run such an array.
      
      Currently, this check is performed while the superblocks are examined,
      i.e. before entering personality code. Therefore the generic md layer
      must know which raid levels support bitmaps and which do not.
      
      This patch avoids this layer violation without adding identical code
      to various personalities. This is accomplished by introducing a new
      public function to md.c, md_check_no_bitmap(), which replaces the
      hard-coded checks in the superblock loading functions.
      
      A call to md_check_no_bitmap() is added to the ->run method of each
      personality which does not support bitmaps and assembly is aborted
      if at least one component device contains a bitmap.
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0894cc30
    • N
      md: raid0/linear: ensure device sizes are rounded to chunk size. · 13f2682b
      NeilBrown 提交于
      This is currently ensured by common code, but it is more reliable to
      ensure it where it is needed in personality code.
      All the other personalities that care already round the size to
      the chunk_size.  raid0 and linear are the only hold-outs.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      13f2682b
    • A
      md: Make mddev->chunk_size sector-based. · 9d8f0363
      Andre Noll 提交于
      This patch renames the chunk_size field to chunk_sectors with the
      implied change of semantics.  Since
      
      	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
      				  = is_power_of_2(chunk_sectors)
      
      these bits don't need an adjustment for the shift.
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      9d8f0363
  3. 16 6月, 2009 4 次提交
  4. 23 5月, 2009 1 次提交
  5. 31 3月, 2009 6 次提交
  6. 06 2月, 2009 1 次提交
    • A
      md: Fix a bug in linear.c causing which_dev() to return the wrong device. · 852c8bf4
      Andre Noll 提交于
      ab5bd5cb introduced the following
      bug in linear software raid for large arrays on 32 bit machines:
      
      which_dev() computes the device holding a given sector by shifting
      down the sector number to a 32 bit range, dividing by the array
      spacing and looking up the resulting index in the hash table of
      the array.
      
      Because the computed index might be slightly too small, a loop at
      the end of which_dev() increases the index until the given sector
      actually falls into the range of the device associated with that index.
      
      The changes of the above mentioned commit caused this loop to check
      whether the _index_ rather than the sector number is small enough,
      effectively bypassing the loop and thus possibly returning the wrong
      device.
      
      As reported by Simon Kirby, this leads to errors such as
      
      	linear_make_request: Sector 2340486136 out of bounds on dev sdi: 156301312 sectors, offset 2109870464
      
      Fix this bug by introducing a local variable for the index so that
      the variable containing the passed sector is left unchanged.
      
      Cc: stable@kernel.org
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      852c8bf4
  7. 09 1月, 2009 1 次提交
    • C
      md: use list_for_each_entry macro directly · 159ec1fc
      Cheng Renquan 提交于
      The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
      list_for_each_entry_safe, from <linux/list.h>, it should be defined to
      use list_for_each_entry_safe, instead of reinventing the wheel.
      
      But some calls to each_entry_safe don't really need a safe version,
      just a direct list_for_each_entry is enough, this could save a temp
      variable (tmp) in every function that used rdev_for_each.
      
      In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
      totally save many tmp vars; and only in the other situations that will call
      list_del to delete an entry, the safe version is used.
      Signed-off-by: NCheng Renquan <crquan@gmail.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      159ec1fc
  8. 06 11月, 2008 1 次提交
    • A
      md: linear: Fix a division by zero bug for very small arrays. · f1cd14ae
      Andre Noll 提交于
      We currently oops with a divide error on starting a linear software
      raid array consisting of at least two very small (< 500K) devices.
      
      The bug is caused by the calculation of the hash table size which
      tries to compute sector_div(sz, base) with "base" being zero due to
      the small size of the component devices of the array.
      
      Fix this by requiring the hash spacing to be at least one which
      implies that also "base" is non-zero.
      
      This bug has existed since about 2.6.14.
      
      Cc: stable@kernel.org
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f1cd14ae
  9. 13 10月, 2008 7 次提交
  10. 09 10月, 2008 3 次提交
    • D
      block: mark bio_split_pool static · 6feef531
      Denis ChengRq 提交于
      Since all bio_split calls refer the same single bio_split_pool, the bio_split
      function can use bio_split_pool directly instead of the mempool_t parameter;
      
      then the mempool_t parameter can be removed from bio_split param list, and
      bio_split_pool is only referred in fs/bio.c file, can be marked static.
      Signed-off-by: NDenis ChengRq <crquan@gmail.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      6feef531
    • T
      block: move stats from disk to part0 · 074a7aca
      Tejun Heo 提交于
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now updates part0 together if the specified partition
        is not part0.  ie. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similary.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_fligh() is implemented which automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      074a7aca
    • T
      block: fix diskstats access · c9959059
      Tejun Heo 提交于
      There are two variants of stat functions - ones prefixed with double
      underbars which don't care about preemption and ones without which
      disable preemption before manipulating per-cpu counters.  It's unclear
      whether the underbarred ones assume that preemtion is disabled on
      entry as some callers don't do that.
      
      This patch unifies diskstats access by implementing disk_stat_lock()
      and disk_stat_unlock() which take care of both RCU (for partition
      access) and preemption (for per-cpu counter access).  diskstats access
      should always be enclosed between the two functions.  As such, there's
      no need for the versions which disables preemption.  They're removed
      and double underbars ones are renamed to drop the underbars.  As an
      extra argument is added, there's no danger of using the old version
      unconverted.
      
      disk_stat_lock() uses get_cpu() and returns the cpu index and all
      diskstat functions which access per-cpu counters now has @cpu
      argument to help RT.
      
      This change adds RCU or preemption operations at some places but also
      collapses several preemption ops into one at others.  Overall, the
      performance difference should be negligible as all involved ops are
      very lightweight per-cpu ones.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      c9959059
  11. 21 7月, 2008 2 次提交
  12. 03 7月, 2008 1 次提交
    • A
      Add bvec_merge_data to handle stacked devices and ->merge_bvec() · cc371e66
      Alasdair G Kergon 提交于
      When devices are stacked, one device's merge_bvec_fn may need to perform
      the mapping and then call one or more functions for its underlying devices.
      
      The following bio fields are used:
        bio->bi_sector
        bio->bi_bdev
        bio->bi_size
        bio->bi_rw  using bio_data_dir()
      
      This patch creates a new struct bvec_merge_data holding a copy of those
      fields to avoid having to change them directly in the struct bio when
      going down the stack only to have to change them back again on the way
      back up.  (And then when the bio gets mapped for real, the whole
      exercise gets repeated, but that's a problem for another day...)
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Milan Broz <mbroz@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      cc371e66
  13. 28 6月, 2008 1 次提交
  14. 15 5月, 2008 1 次提交
    • N
      Remove blkdev warning triggered by using md · e7e72bf6
      Neil Brown 提交于
      As setting and clearing queue flags now requires that we hold a spinlock
      on the queue, and as blk_queue_stack_limits is called without that lock,
      get the lock inside blk_queue_stack_limits.
      
      For blk_queue_stack_limits to be able to find the right lock, each md
      personality needs to set q->queue_lock to point to the appropriate lock.
      Those personalities which didn't previously use a spin_lock, us
      q->__queue_lock.  So always initialise that lock when allocated.
      
      With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
      longer cause warnings as it will be clear that the proper lock is held.
      
      Thanks to Dan Williams for review and fixing the silly bugs.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Alistair John Strachan <alistair@devzero.co.uk>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Jacek Luczak <difrost.kernel@gmail.com>
      Cc: Prakash Punnoor <prakash@punnoor.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7e72bf6
  15. 07 2月, 2008 1 次提交
  16. 09 11月, 2007 1 次提交
  17. 16 10月, 2007 1 次提交
  18. 10 10月, 2007 1 次提交
  19. 24 7月, 2007 1 次提交