1. 29 Jul, 2015 2 commits
    • block: manipulate bio->bi_flags through helpers · b7c44ed9
      Jens Axboe committed
      Some places use helpers now, others don't. We only have the 'is set'
      helper, add helpers for setting and clearing flags too.
      
      It was a bit of a mess of atomic vs non-atomic access. With
      BIO_UPTODATE gone, we don't have any risk of concurrent access to the
      flags. So relax the restriction and don't make any of them atomic. The
      flags that do have serialization issues (reffed and chained), we
      already handle those separately.
      Signed-off-by: Jens Axboe <axboe@fb.com>
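      The helper trio described above can be sketched in userspace C. This is a
      minimal sketch, not the kernel code: `struct mock_bio`, the `mock_*`
      helper names and the flag bit numbers are all hypothetical stand-ins. The
      set and clear helpers use plain read-modify-write, matching the commit's
      decision not to make flag access atomic.

      ```c
      #include <assert.h>
      #include <limits.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for struct bio; only the flags word is modelled. */
      struct mock_bio {
          unsigned long bi_flags;
      };

      /* Illustrative bit numbers, not the kernel's actual flag values. */
      enum { MOCK_BIO_SEG_VALID = 1, MOCK_BIO_CLONED = 2 };

      /* The 'is set' helper. */
      static inline bool mock_bio_flagged(const struct mock_bio *bio, unsigned int bit)
      {
          assert(bit < sizeof(bio->bi_flags) * CHAR_BIT);
          return (bio->bi_flags & (1UL << bit)) != 0;
      }

      /* Set a flag with a plain (non-atomic) read-modify-write. */
      static inline void mock_bio_set_flag(struct mock_bio *bio, unsigned int bit)
      {
          assert(bit < sizeof(bio->bi_flags) * CHAR_BIT);
          bio->bi_flags |= 1UL << bit;
      }

      /* Clear a flag with a plain (non-atomic) read-modify-write. */
      static inline void mock_bio_clear_flag(struct mock_bio *bio, unsigned int bit)
      {
          assert(bit < sizeof(bio->bi_flags) * CHAR_BIT);
          bio->bi_flags &= ~(1UL << bit);
      }
      ```

      Routing every access through such helpers keeps the choice between atomic
      and non-atomic access in one place.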
    • block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
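      To illustrate why an error stored in the bio itself survives chaining,
      here is a minimal userspace sketch. Everything in it is an assumption
      made for illustration: `struct mock_bio`, `mock_bio_endio()` and the
      single-child chaining model are hypothetical and much simpler than the
      real block layer, which tracks how many child completions remain.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>

      /* Hypothetical stand-in for struct bio. */
      struct mock_bio {
          int bi_error;                           /* 0 or a negative errno */
          struct mock_bio *parent;                /* next bio in the chain */
          void (*bi_end_io)(struct mock_bio *bio);
      };

      static int mock_last_error;

      /* Example completion callback: it reads the error from the bio itself. */
      static void mock_record_error(struct mock_bio *bio)
      {
          mock_last_error = bio->bi_error;
      }

      /* Complete a bio and walk up the chain, carrying the first error along. */
      static void mock_bio_endio(struct mock_bio *bio)
      {
          while (bio) {
              struct mock_bio *parent = bio->parent;

              assert(bio->bi_error <= 0);
              if (bio->bi_end_io)
                  bio->bi_end_io(bio);
              /* Keep the parent's own error if it already has one. */
              if (parent && bio->bi_error && !parent->bi_error)
                  parent->bi_error = bio->bi_error;
              bio = parent;
          }
      }
      ```

      Because the errno lives in the structure, a specific value such as
      -ENOSPC reaches the parent's callback instead of collapsing to -EIO.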
  2. 17 Jul, 2015 1 commit
  3. 11 Jul, 2015 1 commit
  4. 01 Jul, 2015 2 commits
  5. 26 Jun, 2015 4 commits
  6. 25 Jun, 2015 3 commits
    • md: clear Blocked flag on failed devices when array is read-only. · ab16bfc7
      Neil Brown committed
      The Blocked flag indicates that a device has failed but that this
      fact hasn't been recorded in the metadata yet.  Writes to such
      devices cannot be allowed until the metadata has been updated.
      
      On a read-only array, the Blocked flag will never be cleared.
      This prevents the device being removed from the array.
      
      If the metadata is being handled by the kernel
      (i.e. !mddev->external), then we can be sure that if the array is
      switched to writable, then a metadata update will happen and will
      record the failure.  So we don't need the flag set.
      
      If metadata is externally managed, it is up to the external manager
      to clear the 'blocked' flag.
      Reported-by: XiaoNi <xni@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: unlock mddev_lock on an error path. · 9a8c0fa8
      NeilBrown committed
      This error path returns while still holding the lock - bad.
      
      Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
      Cc: stable@vger.kernel.org (v4.0+)
      Signed-off-by: NeilBrown <neilb@suse.com>
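      The class of bug fixed here is commonly avoided by funnelling every exit
      through a single unlock label. The sketch below shows that pattern in
      userspace C under stated assumptions: `struct mock_lock` and
      `mock_store()` are hypothetical, and the lock is a simple flag rather
      than the kernel's mutex.

      ```c
      #include <assert.h>
      #include <errno.h>

      /* Hypothetical lock: a flag that lets a caller check lock balance. */
      struct mock_lock {
          int held;
      };

      static void mock_lock_acquire(struct mock_lock *lock)
      {
          assert(!lock->held);
          lock->held = 1;
      }

      static void mock_lock_release(struct mock_lock *lock)
      {
          assert(lock->held);
          lock->held = 0;
      }

      /* Store a value under the lock; every exit path releases the lock. */
      static int mock_store(struct mock_lock *lock, int value, int *dst)
      {
          int err = 0;

          mock_lock_acquire(lock);
          if (value < 0) {
              err = -EINVAL;
              goto unlock;    /* a bare 'return' here would leak the lock */
          }
          *dst = value;
      unlock:
          mock_lock_release(lock);
          return err;
      }
      ```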
    • md: clear mddev->private when it has been freed. · bd691922
      NeilBrown committed
      If ->private is set when ->run is called, it is assumed to be
      a 'config'  prepared as part of 'reshape'.
      
      So it is important when we free that config, that we also clear ->private.
      This is not often a problem as the mddev will normally be discarded
      shortly after the config is freed.
      However if an 'assemble' races with a final close, the assemble can use
      the old mddev which has a stale ->private.  This leads to any of
      various sorts of crashes.
      
      So clear ->private after calling ->free().
      Reported-by: Nate Clark <nate@neworld.us>
      Cc: stable@vger.kernel.org (v4.0+)
      Fixes: afa0f557 ("md: rename ->stop to ->free")
      Signed-off-by: NeilBrown <neilb@suse.com>
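      A minimal userspace sketch of the rule "clear the pointer after freeing
      it" follows. `struct mock_mddev`, `mock_run()` and `mock_stop()` are
      hypothetical names; they only model the assumption stated above that a
      non-NULL ->private seen by ->run is taken to be a prepared config.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct mddev. */
      struct mock_mddev {
          void *private;
      };

      /* Returns true if an existing ->private was taken as a prepared config. */
      static bool mock_run(struct mock_mddev *mddev)
      {
          if (mddev->private)
              return true;            /* assumed to come from 'reshape' */
          mddev->private = malloc(16);
          assert(mddev->private);
          return false;
      }

      static void mock_stop(struct mock_mddev *mddev)
      {
          free(mddev->private);       /* stands in for calling ->free() */
          mddev->private = NULL;      /* the fix: leave no stale pointer */
      }
      ```

      Without the assignment in `mock_stop()`, a later `mock_run()` on the
      same object would treat the dangling pointer as a valid config.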
  7. 24 Jun, 2015 2 commits
  8. 18 Jun, 2015 5 commits
  9. 17 Jun, 2015 7 commits
    • dm space map metadata: fix occasional leak of a metadata block on resize · 6096d91a
      Joe Thornber committed
      The metadata space map has a simplified 'bootstrap' mode that is
      operational when extending the space maps.  Whilst in this mode it's
      possible for some refcount decrement operations to become queued (eg, as
      a result of shadowing one of the bitmap indexes).  These decrements were
      not being applied when switching out of bootstrap mode.
      
      The effect of this bug was the leaking of a 4k metadata block.  This is
      detected by the latest version of thin_check as a non fatal error.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
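      The leak mechanism can be modelled in a few lines of userspace C. This is
      a sketch under stated assumptions, not the dm code: `struct mock_sm`, its
      fixed-size queue and the `mock_sm_*` functions are hypothetical. It shows
      decrements being queued while in bootstrap mode and then applied when the
      mode is left, which is the step the commit says was missing.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MOCK_NR_BLOCKS  4
      #define MOCK_MAX_QUEUED 8

      /* Hypothetical space map: refcounts plus a queue of deferred decrements. */
      struct mock_sm {
          bool bootstrap;
          int refcount[MOCK_NR_BLOCKS];
          int queued[MOCK_MAX_QUEUED];
          int nr_queued;
      };

      /* In bootstrap mode a decrement is only recorded, not applied. */
      static void mock_sm_dec(struct mock_sm *sm, int block)
      {
          assert(block >= 0 && block < MOCK_NR_BLOCKS);
          if (sm->bootstrap) {
              assert(sm->nr_queued < MOCK_MAX_QUEUED);
              sm->queued[sm->nr_queued++] = block;
          } else {
              sm->refcount[block]--;
          }
      }

      /* Leaving bootstrap mode must replay the queue, or blocks leak. */
      static void mock_sm_leave_bootstrap(struct mock_sm *sm)
      {
          int i;

          sm->bootstrap = false;
          for (i = 0; i < sm->nr_queued; i++)
              sm->refcount[sm->queued[i]]--;
          sm->nr_queued = 0;
      }
      ```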
    • md: fix a build warning · 4e023612
      Firo Yang committed
      Warning like this:
      
      drivers/md/md.c: In function "update_array_info":
      drivers/md/md.c:6394:26: warning: logical not is only applied
      to the left hand side of comparison [-Wlogical-not-parentheses]
            !mddev->persistent  != info->not_persistent||
      
      Fix it as Neil Brown said:
      mddev->persistent != !info->not_persistent ||
      Signed-off-by: Firo Yang <firogm@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
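      The two spellings of the comparison can be tried side by side in a small
      userspace sketch. The function names are hypothetical, and the old form
      is written with explicit parentheses so that the sketch itself compiles
      without the warning. The sketch assumes plain int arguments: for 0/1
      inputs both forms agree, while for a nonzero value other than 1 in
      not_persistent only the new form normalizes it first.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Old spelling: '!' binds to 'persistent' only, as the warning notes. */
      static bool mock_differs_old(int persistent, int not_persistent)
      {
          return (!persistent) != not_persistent;
      }

      /* New spelling: 'not_persistent' is normalized to 0/1 before comparing. */
      static bool mock_differs_new(int persistent, int not_persistent)
      {
          assert(persistent == 0 || persistent == 1);
          return persistent != !not_persistent;
      }
      ```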
    • md/raid5: ignore released_stripes check · 713bc5c2
      Shaohua Li committed
      The conf->released_stripes list isn't always related to whether there are
      free stripes pending. Active stripes can be in the list too,
      and even free stripes may have been active very recently.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: per hash value and exclusive wait_for_stripe · e9e4c377
      Yuanhan Liu committed
      I noticed heavy spin lock contention in get_active_stripe() with fsmark
      multi-threaded write workloads.
      
      Here is where this contention comes from. We have a limited number of
      stripes, and the workload is multi-threaded writes. Hence, those stripes
      are taken quickly, which puts later processes to sleep waiting for free
      stripes. When enough stripes (>= 1/4 of the total) are released, all
      processes are woken and try to get the lock. But only one of them can
      take each hash lock, leaving the other processes spinning to acquire
      it.
      
      Thus, it is ineffective to wake up all processes and let them fight over
      a lock that admits only one holder at a time. Instead, we can make the
      wake-up exclusive: wake up only one process. That naturally avoids the
      heavy spin lock contention.
      
      To do the exclusive wake-up, we have to split wait_for_stripe into
      multiple wait queues, one per hash value, just like the hash lock.
      
      Here are some test results I got with this patch applied (all tests were
      run 3 times):
      
      `fsmark.files_per_sec'
      =====================
      
      next-20150317                 this patch
      -------------------------     -------------------------
      metric_value     ±stddev      metric_value     ±stddev     change      testbox/benchmark/testcase-params
      -------------------------     -------------------------   --------     ------------------------------
            25.600     ±0.0              92.700     ±2.5          262.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
            25.600     ±0.0              77.800     ±0.6          203.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
            32.000     ±0.0              93.800     ±1.7          193.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
            32.000     ±0.0              81.233     ±1.7          153.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
            48.800     ±14.5             99.667     ±2.0          104.2%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
             6.400     ±0.0              12.800     ±0.0          100.0%     ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
            63.133     ±8.2              82.800     ±0.7           31.2%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
           245.067     ±0.7             306.567     ±7.9           25.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-f2fs-4M-30G-fsyncBeforeClose
            17.533     ±0.3              21.000     ±0.8           19.8%     ivb44/fsmark/1x-1t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose
           188.167     ±1.9             215.033     ±3.1           14.3%     ivb44/fsmark/1x-1t-4BRD_12G-RAID5-btrfs-4M-30G-NoSync
           254.500     ±1.8             290.733     ±2.4           14.2%     ivb44/fsmark/1x-1t-9BRD_6G-RAID5-btrfs-4M-30G-NoSync
      
      `time.system_time'
      =====================
      
      next-20150317                 this patch
      -------------------------    -------------------------
      metric_value     ±stddev     metric_value     ±stddev     change       testbox/benchmark/testcase-params
      -------------------------    -------------------------    --------     ------------------------------
          7235.603     ±1.2             185.163     ±1.9          -97.4%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
          7666.883     ±2.9             202.750     ±1.0          -97.4%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
         14567.893     ±0.7             421.230     ±0.4          -97.1%     ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
          3697.667     ±14.0            148.190     ±1.7          -96.0%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
          5572.867     ±3.8             310.717     ±1.4          -94.4%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
          5565.050     ±0.5             313.277     ±1.5          -94.4%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
          2420.707     ±17.1            171.043     ±2.7          -92.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
          3743.300     ±4.6             379.827     ±3.5          -89.9%     ivb44/fsmark/1x-64t-3HDD-RAID5-ext4-4M-40G-fsyncBeforeClose
          3308.687     ±6.3             363.050     ±2.0          -89.0%     ivb44/fsmark/1x-64t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose
      
      Where,
      
           1x: where 'x' means iterations or loops, corresponding to the '-L' option of fsmark
      
           1t, 64t: where 't' means threads
      
           4M: means the single file size, corresponding to the '-s' option of fsmark
           40G, 30G, 120G: means the total test size
      
           4BRD_12G: BRD is the ramdisk, where '4' means 4 ramdisks, and where '12G' means
                     the size of one ramdisk. So, it would be 48G in total. And we made a
                     RAID on those ramdisks
      
      As you can see, although there is not much performance gain for the hard
      disk workload, the system time drops heavily, by up to 97%. And as expected,
      performance increases a lot, by up to 260%, for fast devices (ram disk).
      
      v2: use bits instead of an array to record which wait queues need to be woken up.
      Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
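      The idea of per-hash wait queues with exclusive wake-up, plus the v2 note
      about recording pending wake-ups as bits, can be modelled in userspace C.
      This is only a sketch under stated assumptions: `struct mock_conf`, the
      bucket count and the `mock_*` functions are hypothetical, and sleeping
      processes are reduced to per-bucket counters rather than real wait
      queues.

      ```c
      #include <assert.h>

      #define MOCK_NR_HASH 8

      /* Hypothetical model: a count of sleepers per hash bucket. */
      struct mock_conf {
          int waiters[MOCK_NR_HASH];
          unsigned long wake_mask;    /* one bit per bucket that needs a wake-up */
      };

      /* Record that stripes were released for the given hash bucket. */
      static void mock_note_release(struct mock_conf *conf, unsigned int hash)
      {
          assert(hash < MOCK_NR_HASH);
          conf->wake_mask |= 1UL << hash;
      }

      /* Wake at most one waiter per flagged bucket; return how many were woken. */
      static int mock_wake_exclusive(struct mock_conf *conf)
      {
          int woken = 0;
          unsigned int hash;

          for (hash = 0; hash < MOCK_NR_HASH; hash++) {
              if (!(conf->wake_mask & (1UL << hash)))
                  continue;
              if (conf->waiters[hash] > 0) {
                  conf->waiters[hash]--;
                  woken++;
              }
          }
          conf->wake_mask = 0;
          return woken;
      }
      ```

      With three sleepers on one bucket, a single release wakes one of them
      instead of all three, so only one process contends for that hash lock.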
    • md/raid5: split wait_for_stripe and introduce wait_for_quiescent · b1b46486
      Yuanhan Liu committed
      I noticed heavy spin lock contention in get_active_stripe(), introduced
      at the wake-up stage, where a bunch of processes try to take the
      spin lock again.
      
      After giving this issue some thought, I found the lock contention could be
      relieved (and even avoided) if we turn wait_for_stripe into one
      waitqueue per lock hash and make the wake-up exclusive: wake up
      one process each time, which avoids the lock contention naturally.
      
      Before hacking on wait_for_stripe, I found it actually has two
      usages: for the array to enter or leave the quiescent state, and also
      to wait for an available stripe in each of the hash lists.
      
      So this patch splits the first usage off into a separate wait_queue,
      wait_for_quiescent, and the next patch will turn the second usage into
      one waitqueue for each hash value, and make it exclusive, to relieve
      the lock contention.
      
      v2: wake_up(wait_for_quiescent) when (active_stripes == 0)
          Commit log refactor suggestion from Neil.
      Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: convert to kstrto*() · 4c9309c0
      Alexey Dobriyan committed
      Convert away from deprecated simple_strto*() functions.
      
      Add "fit into sector_t" checks.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
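      The difference between lenient and strict parsing, together with a "fits
      into the sector type" check, can be sketched in userspace C. This is not
      the kernel's kstrto*() implementation: `mock_parse_sector()` is a
      hypothetical helper built on the standard strtoull(), `mock_sector_t` is
      an assumed 32-bit sector type, and the error codes are illustrative.

      ```c
      #include <assert.h>
      #include <ctype.h>
      #include <errno.h>
      #include <stdint.h>
      #include <stdlib.h>

      /* Assumed 32-bit sector type, chosen so the range check is visible. */
      typedef uint32_t mock_sector_t;

      /* Strictly parse a decimal sector count; reject junk and overflow. */
      static int mock_parse_sector(const char *s, mock_sector_t *out)
      {
          unsigned long long v;
          char *end;

          assert(s && out);
          /* strtoull() alone would accept leading whitespace and a sign. */
          if (!isdigit((unsigned char)s[0]))
              return -EINVAL;
          errno = 0;
          v = strtoull(s, &end, 10);
          if (errno == ERANGE)
              return -ERANGE;
          if (*end == '\n')           /* tolerate one trailing newline */
              end++;
          if (*end != '\0')           /* trailing garbage is an error */
              return -EINVAL;
          if (v != (mock_sector_t)v)  /* the "fit into sector type" check */
              return -EINVAL;
          *out = (mock_sector_t)v;
          return 0;
      }
      ```

      A lenient parser would turn "12abc" into 12 without complaint; the strict
      form reports an error and leaves the output untouched.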
    • md/raid10: make sync_request_write() call bio_copy_data() · c31df25f
      Kent Overstreet committed
      Refactor sync_request_write() of md/raid10 to use bio_copy_data()
      instead of open coding bio_vec iterations.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: add more description in commit message]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <mlin@kernel.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
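      The kind of loop that a shared copy helper replaces can be sketched in
      userspace C. This is a minimal sketch and not bio_copy_data() itself:
      `struct mock_vec` and `mock_copy_vecs()` are hypothetical, and segments
      are plain memory buffers rather than pages. It copies between two
      segment lists whose boundaries do not line up, which is the bookkeeping
      that open-coded bio_vec iteration has to repeat at every call site.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical segment descriptor, loosely analogous to a bio_vec. */
      struct mock_vec {
          unsigned char *base;
          size_t len;
      };

      /* Copy from src segments to dst segments; returns the bytes copied. */
      static size_t mock_copy_vecs(const struct mock_vec *dst, size_t ndst,
                                   const struct mock_vec *src, size_t nsrc)
      {
          size_t di = 0, si = 0, doff = 0, soff = 0, total = 0;

          assert(dst && src);
          while (di < ndst && si < nsrc) {
              /* Copy as much as fits in both current segments. */
              size_t n = dst[di].len - doff;
              size_t m = src[si].len - soff;

              if (m < n)
                  n = m;
              memcpy(dst[di].base + doff, src[si].base + soff, n);
              total += n;
              doff += n;
              soff += n;
              if (doff == dst[di].len) {  /* destination segment is full */
                  di++;
                  doff = 0;
              }
              if (soff == src[si].len) {  /* source segment is drained */
                  si++;
                  soff = 0;
              }
          }
          return total;
      }
      ```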
  10. 12 Jun, 2015 13 commits