1. 15 Oct, 2012 6 commits
  2. 13 Oct, 2012 5 commits
  3. 12 Oct, 2012 12 commits
  4. 11 Oct, 2012 17 commits
    • N
      md: refine reporting of resync/reshape delays. · 72f36d59
      NeilBrown committed
      If 'resync_max' is set to 0 (as is often done when starting a
      reshape, so that mdadm can remain in control during a sensitive
      period), and if the reshape request is initially delayed because
      another array using the same device is resyncing or reshaping etc,
      then user-space cannot easily tell when the delay changes from being
      due to a conflicting reshape, to being due to resync_max = 0.
      
      So introduce a new state: (curr_resync == 3) to reflect this, make
      sure it is visible both via /proc/mdstat and via the "sync_completed"
      sysfs attribute, and ensure that the event transition from one delay
      state to the other is properly notified.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • N
      md/raid5: be careful not to resize_stripes too big. · e56108d6
      NeilBrown committed
      When a RAID5 is reshaping, conf->raid_disks is increased
      before mddev->delta_disks becomes zero.
      This can result in check_reshape calling resize_stripes with a
      number that is too large.  This particularly happens
      when md_check_recovery calls ->check_reshape().
      
      If we use ->previous_raid_disks, we don't risk this.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • N
      md: make sure manual changes to recovery checkpoint are saved. · db07d85e
      NeilBrown committed
      If you make an array bigger but suppress resync of the new region with
        mdadm --grow /dev/mdX --size=max --assume-clean
      
      then stop the array before anything is written to it, the effect of
      the "--assume-clean" is lost and the array will resync the new space
      when restarted.
      So ensure that we update the metadata in that case.
      Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • D
      md/raid10: use correct limit variable · 91502f09
      Dan Carpenter committed
      Clang complains that we are assigning a variable to itself.  This should
      be using bad_sectors like the similar earlier check does.
      
      Bug has been present since 3.1-rc1.  It is minor but could
      conceivably cause corruption or other bad behaviour.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
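
The class of bug described in this commit can be sketched in plain C. This is a minimal userspace illustration, not the raid10 code itself: the function name `clamp_sync_len` and the `sector_t` typedef are hypothetical, chosen only to mirror the `bad_sectors` variable named in the commit message and an assumed limit variable `max_sync`.

```c
typedef unsigned long long sector_t; /* stand-in for the kernel type */

/*
 * Limit a sync window to the length of a bad-block range.
 * A self-assignment ("max_sync = max_sync") in the branch below would
 * leave the window unclamped; assigning bad_sectors applies the limit.
 */
sector_t clamp_sync_len(sector_t max_sync, sector_t bad_sectors)
{
    if (max_sync > bad_sectors)
        max_sync = bad_sectors; /* the corrected assignment */
    return max_sync;
}
```

With the self-assignment in place the function would return its first argument unchanged, which is why the compiler warning pointed at a real logic error.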
    • N
      md: writing to sync_action should clear the read-auto state. · 48c26ddc
      NeilBrown committed
      In some cases arrays are started in 'read-auto' state, wherein
      nothing gets written to any device until the array is written
      to.  The purpose of this is to make accidental auto-assembly
      of the wrong arrays less of a risk, and to allow arrays to be
      started to read suspend-to-disk images without actually changing
      anything (as might happen if the array were dirty and a
      resync seemed necessary).
      
      Explicitly writing the 'sync_action' for a read-auto array currently
      doesn't clear the read-auto state, so the sync action doesn't
      happen, which can be confusing.
      
      So allow any successful write to sync_action to clear any read-auto
      state.
      Reported-by: Alexander Kühn <alexander.kuehn@nagilum.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • J
      Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races · 7f7583d4
      Jianpeng Ma committed
      Now that multiple threads can handle stripes, it is safer to
      use an atomic64_t for resync_mismatches, to avoid update races.
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
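
The idea of an atomically updated 64-bit counter can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel patch: `struct mismatch_counter`, `mismatch_add` and `mismatch_read` are hypothetical names, and the kernel itself would use its own `atomic64_t` helpers such as `atomic64_add()` and `atomic64_read()` rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-array mismatch counter. */
struct mismatch_counter {
    _Atomic int64_t sectors;
};

/* Add to the counter as one indivisible read-modify-write, so that
 * concurrent updates from several threads cannot lose increments. */
void mismatch_add(struct mismatch_counter *c, int64_t sectors)
{
    atomic_fetch_add(&c->sectors, sectors);
}

/* Read the current total. */
int64_t mismatch_read(struct mismatch_counter *c)
{
    return atomic_load(&c->sectors);
}
```

A plain `counter += n` compiles to separate load, add and store steps, so two threads handling stripes at once could overwrite each other's update; the atomic form avoids that without taking a lock.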
    • H
      e1000e: Change wthresh to 1 to avoid possible Tx stalls · 8edc0e62
      Hiroaki SHIMODA committed
      This patch originated from Hiroaki SHIMODA but has been modified
      by Intel with some minor cleanups and additional commit log text.
      
      Denys Fedoryshchenko and others reported Tx stalls on e1000e with
      BQL enabled.  The issue was root-caused to hardware delays. They
      arise because some e1000e hardware with transmit writeback
      bursting enabled waits until the driver does an explicit flush
      or there are WTHRESH descriptors to write back.
      
      Sometimes the delays in question were on the order of seconds,
      causing visible lag for ssh sessions and unacceptable tx
      completion latency, especially for BQL enabled kernels.
      
      To avoid possible Tx stalls, change WTHRESH back to 1.
      
      The current plan is to investigate a method for re-enabling
      WTHRESH while not harming BQL, but those patches will come later
      for net-next if they work.
      
      Please enqueue for stable since v3.3, as this bug was introduced in
      commit 3f0cfa3b
      Author: Tom Herbert <therbert@google.com>
      Date:   Mon Nov 28 16:33:16 2011 +0000
      
          e1000e: Support for byte queue limits
      
          Changes to e1000e to use byte queue limits.
      Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
      Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      CC: eric.dumazet@gmail.com
      CC: therbert@google.com
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      xen: netback: handle compound page fragments on transmit. · 6a8ed462
      Ian Campbell committed
      An SKB paged fragment can consist of a compound page with order > 0.
      However the netchannel protocol deals only in PAGE_SIZE frames.
      
      Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
      iterating over the frames which make up the page.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
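
Iterating over the PAGE_SIZE frames of a larger fragment can be sketched as follows. This is a simplified userspace illustration, not the netback code: `count_frag_slots` and `SKETCH_PAGE_SIZE` are hypothetical names, and a 4096-byte frame size is assumed.

```c
#define SKETCH_PAGE_SIZE 4096UL /* assumed frame size for illustration */

/*
 * Count how many PAGE_SIZE frames a fragment touches when it starts
 * `offset` bytes into a (possibly compound) page and is `size` bytes long.
 */
unsigned int count_frag_slots(unsigned long offset, unsigned long size)
{
    unsigned int slots = 0;

    /* Whole frames before the fragment's first byte are not touched. */
    offset &= SKETCH_PAGE_SIZE - 1;

    while (size > 0) {
        unsigned long bytes = SKETCH_PAGE_SIZE - offset; /* room left in this frame */

        if (bytes > size)
            bytes = size;

        size -= bytes;
        offset = 0; /* later frames are consumed from their start */
        slots++;
    }
    return slots;
}
```

A fragment that fits in one frame still counts as one slot, while a fragment that starts partway into a frame and spans two further frames counts as three, which is the case a single-page assumption would undercount.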
    • D
      isdn: fix a wrapping bug in isdn_ppp_ioctl() · 435f08a7
      Dan Carpenter committed
      "protos" is an array of unsigned longs and "i" is the number of bits in
      an unsigned long so we need to use 1UL as well to prevent the shift
      from wrapping around.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
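
The shift problem can be illustrated with a small bitmap helper. This is a sketch under stated assumptions, not the isdn_ppp_ioctl() code: `mark_proto`, `PROTO_WORDS` and `BITS_PER_LONG_SKETCH` are hypothetical names, and only the `protos` array of unsigned longs comes from the commit message.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define PROTO_WORDS 8 /* hypothetical bitmap size, in unsigned longs */

/*
 * Set bit number `num` in a bitmap made of unsigned longs.
 * `1 << i` is an int shift, which is undefined once i reaches the
 * width of int; `1UL << i` shifts at the full width of unsigned long.
 */
void mark_proto(unsigned long protos[PROTO_WORDS], unsigned int num)
{
    unsigned int j = num / BITS_PER_LONG_SKETCH; /* which word */
    unsigned int i = num % BITS_PER_LONG_SKETCH; /* which bit in that word */

    assert(j < PROTO_WORDS);
    protos[j] |= 1UL << i;
}
```

On a platform where unsigned long is 64 bits wide, `i` can be as large as 63, so the constant being shifted has to be an unsigned long for the upper bits of each word to be reachable.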
    • N
      md/raid5: make sure to_read and to_write never go negative. · 1ed850f3
      NeilBrown committed
      to_read and to_write are part of the result of analysing
      a stripe before handling it.
      Their use is to avoid some loops and tests if the values are
      known to be zero.  Thus it is not a problem if they are a
      little bit larger than they should be.
      
      So decrementing them in handle_failed_stripe serves little value, and
      due to races it could cause some loops to be skipped incorrectly.
      
      So remove those decrements.
      Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • N
      md/raid5: protect debug message against NULL dereference. · b97390ae
      NeilBrown committed
      The pr_debug in add_stripe_bio could race with something
      changing *bip, so it is best to hold the lock until
      after the pr_debug.
      Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • N
      md/raid5: add some missing locking in handle_failed_stripe. · 143c4d05
      NeilBrown committed
      We really should hold the stripe_lock while accessing
      'toread' else we could race with add_stripe_bio and corrupt
      a list.
      Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • S
      MD: raid5 avoid unnecessary zero page for trim · 9e444768
      Shaohua Li committed
      We want to avoid zeroing the discarded dev page, because the zeroed
      payload is useless for discard. But if we don't zero it, another
      read/write that hits such a page in the cache will get inconsistent data.
      
      To avoid zeroing the page, we don't set the R5_UPTODATE flag after
      construction is done. This way, the discard write request is still issued
      and finished, but a read will not hit the page. If the stripe gets accessed
      soon afterwards, we need to reread the stripe, but since the chance is low,
      the reread isn't a big deal.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • S
      MD: raid5 trim support · 620125f2
      Shaohua Li committed
      
      Discard for raid4/5/6 has limitations. If the discard request size is
      small, we discard on one disk, but we still need to calculate parity
      and write the parity disk.  To calculate parity correctly,
      zero_after_discard must be guaranteed. Even if it is, we discard on
      one disk but write the other disks, which makes the parity disks wear
      out fast. This doesn't make sense. So an efficient discard for
      raid4/5/6 should discard all data disks and parity disks, which
      requires the write pattern to be (A, A+chunk_size,
      A+chunk_size*2...). If A's size is smaller than chunk_size, such a
      pattern is almost impossible in practice. So in this patch, I only
      handle the case where A's size equals chunk_size. That is, the discard
      request should be aligned to the stripe size and its size should be a
      multiple of the stripe size.
      
      Since we can only handle requests with a specific alignment and size
      (or the part of a request that fits whole stripes), we can't guarantee
      zero_after_discard even if zero_after_discard is true in the low-level
      drives.
      
      The block layer doesn't send down correctly aligned requests even
      when the correct discard alignment is set, so I must filter them out.
      
      For raid4/5/6 parity calculation, if the data is 0, the parity is 0.
      So if zero_after_discard is true for all disks, data is consistent
      after discard.  Otherwise, data might be lost. Consider a scenario:
      discard a stripe, then write data to one disk and write the parity
      disk. The stripe could still be inconsistent at that point, depending
      on whether data from the other data disks or from the parity disks is
      used to calculate the new parity. If a disk then breaks, we can't
      restore it. So in this patch, we only enable discard support if all
      disks have zero_after_discard.
      
      If discard fails on one disk, we face an inconsistency issue similar
      to the one above. The patch makes discard follow the same path as a
      normal write request. If discard fails, a resync will be scheduled to
      make the data consistent. The extra writes are not ideal, but data
      consistency is important.
      
      If a subsequent read/write request hits the raid5 cache of a discarded
      stripe, the discarded dev page should be zero-filled, so the data is
      consistent. This patch always zeroes the dev page for a discarded
      stripe. This isn't optimal because a discard request doesn't need such
      a payload. The next patch will avoid it.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
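
The alignment rule described in this commit (only whole stripes are discarded, and any unaligned head or tail is filtered out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the raid5 implementation: `full_stripe_range`, `struct sector_range` and the parameter names are hypothetical, and a stripe is assumed to span `chunk_sectors` on each of `data_disks` data disks.

```c
#include <assert.h>

typedef unsigned long long sector_t; /* stand-in for the kernel type */

struct sector_range {
    sector_t start; /* inclusive */
    sector_t end;   /* exclusive */
};

/*
 * Trim a discard request [start, end) to the whole stripes it covers.
 * Returns an empty range (start == end) if no full stripe is covered.
 */
struct sector_range full_stripe_range(sector_t start, sector_t end,
                                      sector_t chunk_sectors,
                                      unsigned int data_disks)
{
    sector_t stripe_sectors = chunk_sectors * data_disks;
    struct sector_range r;

    assert(stripe_sectors > 0 && start <= end);

    /* Round the start up and the end down to stripe boundaries. */
    r.start = (start + stripe_sectors - 1) / stripe_sectors * stripe_sectors;
    r.end = end / stripe_sectors * stripe_sectors;

    if (r.end < r.start)
        r.end = r.start; /* request lies inside a single stripe */
    return r;
}
```

For example, with 128-sector chunks and 3 data disks a stripe is 384 sectors, so a request for sectors 100 to 1000 shrinks to 384 to 768, and a request that never covers a complete stripe yields an empty range and nothing is discarded.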
    • J
      md/bitmap:Don't use IS_ERR to judge alloc_page(). · 582e2e05
      Jianpeng Ma committed
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
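
This commit has no message body, so the reasoning behind its title is sketched here in userspace C. alloc_page() reports failure by returning NULL, while IS_ERR() only recognises pointers that encode a negative errno in the top 4095 values of the address space, so it never flags NULL. The names `is_err_ptr`, `alloc_failed` and `SKETCH_MAX_ERRNO` are hypothetical imitations for illustration, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095 /* mirrors the kernel's MAX_ERRNO */

/* Userspace imitation of IS_ERR(): true only for error-encoded pointers. */
int is_err_ptr(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* The check that matches an allocator which returns NULL on failure. */
int alloc_failed(const void *ptr)
{
    return ptr == NULL;
}
```

Because `is_err_ptr(NULL)` is false, an IS_ERR-style test after a failed allocation would let a NULL pointer through to later code; a plain NULL test is the appropriate check for this kind of allocator.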
    • N
      md/raid1: Don't release reference to device while handling read error. · 7ad4d4a6
      NeilBrown committed
      When we get a read error, we arrange for raid1d to handle it.
      Currently we release the reference on the device.  This can result
      in
         conf->mirrors[read_disk].rdev
      being NULL in fix_read_error, if the device happens to get removed
      before the read error is handled.
      
      So instead keep the reference until the read error has been fully
      handled.
      Reported-by: hank <pyu@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>