1. 13 Oct 2012, 2 commits
  2. 12 Oct 2012, 4 commits
  3. 27 Sep 2012, 9 commits
    • md/raid10: fix "enough" function for detecting if array is failed. · 80b48124
      Committed by NeilBrown
      The 'enough' function is written to work with 'near' arrays only,
      in that it implicitly assumes that the offset from one 'group' of
      devices to the next is the same as the number of copies.
      In reality it is the number of 'near' copies.
      
      So change it to make this number explicit.
      
      This bug makes it possible to run arrays without enough drives
      present, which is dangerous.
      It is appropriate for an -stable kernel, but will almost certainly
      need to be modified for some of them.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jakub Husák <jakub@gooseman.cz>
      Signed-off-by: NeilBrown <neilb@suse.de>
      80b48124
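      A minimal userspace sketch of the corrected logic (the names and the
      'working' array representation are illustrative, not the raid10 code):

        /* Step from one group of devices to the next by near_copies, and
         * check that every group of 'copies' consecutive slots still has
         * at least one working member. */
        #include <stdbool.h>
        #include <stdio.h>

        static bool enough(const bool *working, int raid_disks,
                           int copies, int near_copies)
        {
                int first = 0;

                do {
                        int cnt = 0, this = first;
                        for (int n = copies; n--; ) {
                                if (working[this])
                                        cnt++;
                                this = (this + 1) % raid_disks;
                        }
                        if (cnt == 0)
                                return false;  /* some data has no live copy */
                        /* the fix: advance by near_copies, not by copies */
                        first = (first + near_copies) % raid_disks;
                } while (first != 0);
                return true;
        }

        int main(void)
        {
                /* 4 disks, 2 copies, 1 'near' copy (a 'far'-style layout),
                 * disks 1 and 2 failed: the group starting at disk 1 has no
                 * working member, so the array is failed.  Stepping by
                 * 'copies' would skip that group and wrongly say "enough". */
                bool working[4] = { true, false, false, true };
                printf("enough: %d\n", enough(working, 4, 2, 1));
                return 0;
        }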
    • dm verity: fix overflow check · 1d55f6bc
      Committed by Mikulas Patocka
      This patch fixes sector_t overflow checking in dm-verity.
      
      Without this patch, the code checks for overflow only if sector_t is
      smaller than long long, not if sector_t and long long have the same size.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1d55f6bc
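      A compilable sketch of the distinction being fixed (sector_t here is a
      stand-in typedef, not the dm-verity source):

        #include <stdint.h>
        #include <stdio.h>

        typedef uint64_t sector_t;  /* may be the same width as long long */

        static int fits(unsigned long long num_ll, unsigned shift)
        {
                /* A bare '(sector_t)num_ll != num_ll' comparison only
                 * detects overflow when sector_t is narrower.  Shifting,
                 * truncating to sector_t, and shifting back also catches
                 * bits lost off the top when the widths are equal. */
                return ((sector_t)(num_ll << shift) >> shift) == num_ll;
        }

        int main(void)
        {
                printf("%d\n", fits(1ULL << 40, 3));  /* 1: value survives */
                printf("%d\n", fits(1ULL << 62, 3));  /* 0: overflow caught */
                return 0;
        }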
    • dm thin: fix discard support for data devices · 0424caa1
      Committed by Mike Snitzer
      The discard limits that get established for a thin-pool or thin device
      may be incompatible with the pool's data device.  Avoid this by checking
      the discard limits of the pool's data device.  If an incompatibility is
      found then the pool's 'discard passdown' feature is disabled.
      
      Change thin_io_hints to ensure that a thin device always uses the same
      queue limits as its pool device.
      
      Introduce requested_pf to track whether or not the table line originally
      contained the no_discard_passdown flag and use this directly for table
      output.  We prepare the correct setting for discard_passdown directly in
      bind_control_target (called from pool_io_hints) and store it in
      adjusted_pf rather than waiting until we have access to pool->pf in
      pool_preresume.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      0424caa1
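      One plausible shape of the compatibility check described above, as a
      sketch; every name and field here is invented for illustration:

        #include <stdbool.h>
        #include <stdint.h>

        struct data_dev_discard_limits {
                bool     supported;
                uint32_t max_discard_sectors;
                uint32_t granularity_bytes;
        };

        /* Passdown is only safe if the data device can discard at least a
         * whole pool block and its granularity divides the block size. */
        static bool can_pass_down_discards(const struct data_dev_discard_limits *dl,
                                           uint32_t pool_block_sectors)
        {
                if (!dl->supported || dl->granularity_bytes == 0)
                        return false;
                if (dl->max_discard_sectors < pool_block_sectors)
                        return false;
                return ((uint64_t)pool_block_sectors << 9) %
                       dl->granularity_bytes == 0;
        }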
    • dm thin: tidy discard support · 9bc142dd
      Committed by Mike Snitzer
      A little thin discard code refactoring to make the next patch (dm thin:
      fix discard support for data devices) more readable.
      Pull out a couple of functions (and use bools instead of unsigned for
      features).
      
      No functional changes.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      9bc142dd
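      An illustrative before/after of the bool tidy-up (the field names are a
      guess at the shape, not the exact dm-thin struct):

        #include <stdbool.h>

        struct pool_features_old {         /* before: unsigned for flags */
                unsigned zero_new_blocks;
                unsigned discard_enabled;
                unsigned discard_passdown;
        };

        struct pool_features_new {         /* after: self-documenting bools */
                bool zero_new_blocks:1;
                bool discard_enabled:1;
                bool discard_passdown:1;
        };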
    • dm: retain table limits when swapping to new table with no devices · 3ae70656
      Committed by Mike Snitzer
      Add a safety net that will re-use the DM device's existing limits in the
      event that DM device has a temporary table that doesn't have any
      component devices.  This is to reduce the chance that requests not
      respecting the hardware limits will reach the device.
      
      DM recalculates queue limits based only on devices which currently exist
      in the table.  This creates a problem in the event all devices are
      temporarily removed, such as when all paths are lost in multipath.  DM
      will reset the limits to the maximum permissible, allowing requests to
      be assembled that exceed the limits of the paths once they are restored.
      Such a request will fail the blk_rq_check_limits() test when sent to a
      path with lower limits, and will be retried without end by multipath.
      This became a much bigger issue after v3.6 commit fe86cdce ("block: do
      not artificially constrain max_sectors for stacking drivers").
      Reported-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      3ae70656
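      A sketch of the safety net in userspace terms (structure and names are
      illustrative, not the dm-table code):

        #include <string.h>

        struct queue_limits_like {
                unsigned max_sectors;
                unsigned max_segments;
        };

        static void calc_limits(int num_component_devices,
                                struct queue_limits_like *new_limits,
                                const struct queue_limits_like *live_limits)
        {
                if (num_component_devices == 0) {
                        /* e.g. multipath with every path lost: keep the
                         * current limits rather than resetting to the
                         * stacking defaults, so requests built now still
                         * fit the paths when they return */
                        memcpy(new_limits, live_limits, sizeof(*new_limits));
                        return;
                }
                /* otherwise: stack limits from each component device */
        }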
    • dm table: clear add_random unless all devices have it set · c3c4555e
      Committed by Milan Broz
      Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
      have it set. Otherwise devices with predictable characteristics may
      contribute entropy.
      
      QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
      contribute to the random pool.
      
      For bio-based targets this flag is always 0 because such devices have no
      real queue.
      
      For request-based devices this flag was always set to 1 by default.
      
      Now set it according to the flags on underlying devices. If there is at
      least one device which should not contribute, set the flag to zero: If a
      device, such as fast SSD storage, is not suitable for supplying entropy,
      a request-based queue stacked over it will not be either.
      
      Because the checking logic is exactly the same as for the rotational
      flag, share the iteration function with device_is_nonrot().
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      c3c4555e
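      The shared-iterator pattern, sketched in plain C with invented names:
      one walk over the devices, parameterized by the per-device test, reused
      for both the rotational and the add_random decisions:

        #include <stdbool.h>
        #include <stddef.h>

        struct dev_like { bool nonrot; bool add_random; };

        typedef bool (*dev_pred_fn)(const struct dev_like *d);

        static bool dev_is_nonrot(const struct dev_like *d)   { return d->nonrot; }
        static bool dev_adds_random(const struct dev_like *d) { return d->add_random; }

        /* A queue flag may stay set only if every underlying device has it. */
        static bool all_devices(const struct dev_like *devs, size_t n,
                                dev_pred_fn pred)
        {
                for (size_t i = 0; i < n; i++)
                        if (!pred(&devs[i]))
                                return false;
                return true;
        }

        /* set QUEUE_FLAG_ADD_RANDOM iff all_devices(devs, n, dev_adds_random) */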
    • dm: handle requests beyond end of device instead of using BUG_ON · ba1cbad9
      Committed by Mike Snitzer
      The BUG_ON for access beyond the end of the device, introduced in
      dm_request_fn via commit 29e4013d ("dm: implement
      REQ_FLUSH/FUA support for request-based dm"), was an overly
      drastic (but simple) response to this situation.
      
      I have received a report that this BUG_ON was hit and now think
      it would be better to use dm_kill_unmapped_request() to fail the clone
      and original request with -EIO.
      
      map_request() will assign the valid target returned by
      dm_table_find_target to tio->ti.  But when the target
      isn't valid, tio->ti is never assigned (because map_request isn't
      called); so add a check for tio->ti != NULL to dm_done().
      Reported-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@vger.kernel.org # v2.6.37+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      ba1cbad9
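      A control-flow sketch of the change, with stub types and invented
      helpers standing in for the dm internals:

        #include <errno.h>
        #include <stddef.h>

        struct target;
        struct target_io { struct target *ti; };

        static void fail_unmapped(struct target_io *tio, int error)
        {
                (void)tio; (void)error;   /* fail clone + original with -EIO */
        }

        static void map_request(struct target_io *tio, struct target *ti)
        {
                tio->ti = ti;             /* only the mapped path sets ti */
        }

        /* 'ti' is NULL when the request starts beyond the end of the device */
        static void request_fn(struct target_io *tio, struct target *ti)
        {
                if (!ti) {
                        /* was: BUG_ON -- now the request simply fails */
                        fail_unmapped(tio, -EIO);
                        return;
                }
                map_request(tio, ti);
        }

        static void done(struct target_io *tio)
        {
                if (tio->ti != NULL) {
                        /* per-target completion runs only when the request
                         * was actually mapped */
                }
        }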
    • dm mpath: only retry ioctl when no paths if queue_if_no_path set · 7ba10aa6
      Committed by Mike Snitzer
      When there are no paths and multipath receives an ioctl, it waits until
      a path becomes available.  This behaviour is incorrect if the
      "queue_if_no_path" setting was not specified, as then the ioctl should
      be rejected immediately, which this patch now does.
      
      commit 35991652 ("dm mpath: allow ioctls to trigger pg init") should
      have checked if queue_if_no_path was configured before queueing IO.
      
      Checking for the queue_if_no_path feature, as is done in map_io(),
      allows the following table load to work without blocking in the
      multipath_ioctl retry loop:
      
        echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs
      
      Without this fix the multipath_ioctl will block with the following stack
      trace:
      
        blkid           D 0000000000000002     0 23936      1 0x00000000
         ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
         ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
         ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
        Call Trace:
         [<ffffffff814ce099>] schedule+0x29/0x70
         [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
         [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
         [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
         [<ffffffff8104f840>] msleep+0x20/0x30
         [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
         [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
         [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
         [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
         [<ffffffff811970ac>] block_ioctl+0x3c/0x40
         [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
         [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
         [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
         [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.5+
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      7ba10aa6
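      The decision being fixed, as a small sketch (error codes and names are
      illustrative; the actual retry loop lives in the dm core):

        #include <errno.h>
        #include <stdbool.h>

        static int ioctl_disposition(bool have_path, bool queue_if_no_path)
        {
                if (have_path)
                        return 0;          /* forward the ioctl to a path */
                if (queue_if_no_path)
                        return -ENOTCONN;  /* caller may wait and retry */
                return -EIO;               /* fail immediately, no retry */
        }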
    • dm thin: do not set discard_zeroes_data · 307615a2
      Committed by Mike Snitzer
      The dm thin pool target claims to support the zeroing of discarded
      data areas.  This turns out to be incorrect when processing discards
      that do not exactly cover a complete number of blocks, so the target
      must always set discard_zeroes_data_unsupported.
      
      The thin pool target will zero blocks when they are allocated if the
      skip_block_zeroing feature is not specified.  The block layer
      may send a discard that only partly covers a block.  If a thin pool
      block is partially discarded then there is no guarantee that the
      discarded data will get zeroed before it is accessed again.
      Due to this, thin devices cannot claim that discards will always zero
      data.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      307615a2
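      The underlying arithmetic, sketched: only a discard aligned to whole
      pool blocks can guarantee zeroed-on-reallocation semantics.

        #include <stdbool.h>
        #include <stdint.h>

        static bool covers_whole_blocks(uint64_t start_sector,
                                        uint64_t nr_sectors,
                                        uint64_t block_sectors)
        {
                return start_sector % block_sectors == 0 &&
                       nr_sectors % block_sectors == 0;
        }

        /* e.g. a 96-sector discard at offset 32 with 128-sector blocks
         * unmaps no complete block, so the data is not zeroed -- which is
         * why the target must set discard_zeroes_data_unsupported. */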
  4. 24 Sep 2012, 1 commit
    • md/raid5: add missing spin_lock_init. · cb13ff69
      Committed by NeilBrown
      commit b17459c0
         raid5: add a per-stripe lock
      
      added a spin_lock to the 'stripe_head' struct.
      Unfortunately there are two places where this struct is allocated
      but the spin lock was only initialised in one of them.
      
      So add the missing spin_lock_init.
      Signed-off-by: NeilBrown <neilb@suse.de>
      cb13ff69
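      A userspace analogue of the bug and fix (a pthread mutex stands in for
      the kernel spinlock):

        #include <pthread.h>
        #include <stdlib.h>

        struct stripe_like {
                pthread_mutex_t stripe_lock;  /* the per-stripe lock */
        };

        static struct stripe_like *alloc_path_one(void)
        {
                struct stripe_like *sh = calloc(1, sizeof(*sh));
                if (sh)
                        pthread_mutex_init(&sh->stripe_lock, NULL);  /* was present */
                return sh;
        }

        static struct stripe_like *alloc_path_two(void)
        {
                struct stripe_like *sh = calloc(1, sizeof(*sh));
                if (sh)
                        pthread_mutex_init(&sh->stripe_lock, NULL);  /* the missing init */
                return sh;
        }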
  5. 20 Sep 2012, 1 commit
  6. 19 Sep 2012, 3 commits
    • md: make sure metadata is updated when spares are activated or removed. · 6dafab6b
      Committed by NeilBrown
      It isn't always necessary to update the metadata when spares are
      removed as the presence-or-not of a spare isn't really important to
      the integrity of an array.
      Also activating a spare doesn't always require updating the metadata
      as the update on 'recovery-completed' is usually sufficient.
      
      However the introduction of 'replacement' devices has made these
      transitions sometimes more important.  For example the 'Replacement'
      flag isn't cleared until the original device is removed, so we need
      to ensure a metadata update after that 'spare' is removed.
      
      So set MD_CHANGE_DEVS whenever a spare is activated or removed, to
      complement the current situation where it is set when a spare is added
      or a device is failed (or a number of other less common situations).
      
      This is suitable for -stable as out-of-date metadata could lead
      to data corruption.
      This is only relevant for 3.3 and later, when 'replacement' was
      introduced.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      6dafab6b
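      As a one-line policy, sketched with an invented helper and constant:

        /* call on every spare activation or removal, mirroring the existing
         * calls on the spare-add and device-failure paths */
        #define MD_CHANGE_DEVS_FLAG (1UL << 0)   /* illustrative constant */

        static void mark_metadata_dirty(unsigned long *md_flags)
        {
                *md_flags |= MD_CHANGE_DEVS_FLAG;
        }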
    • md/raid5: fix calculate of 'degraded' when a replacement becomes active. · e5c86471
      Committed by NeilBrown
      When a replacement device becomes active, we mark the device that it
      replaces as 'faulty' so that it can subsequently get removed.
      However 'calc_degraded' only pays attention to the primary device, not
      the replacement, so the array appears to become degraded, which is
      wrong.
      
      So teach 'calc_degraded' to consider any replacement if a primary
      device is faulty.
      
      This is suitable for -stable as an incorrect 'degraded' value can
      confuse md and could lead to data corruption.
      This is only relevant for 3.3 and later.
      
      Cc: stable@vger.kernel.org
      Reported-by: Robin Hill <robin@robinhill.me.uk>
      Reported-by: John Drescher <drescherjm@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e5c86471
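      The corrected accounting, sketched with simplified structures:

        #include <stdbool.h>

        struct slot_like {
                bool primary_ok;
                bool replacement_ok;  /* false if absent or faulty */
        };

        static int calc_degraded(const struct slot_like *slots, int raid_disks)
        {
                int degraded = 0;

                for (int i = 0; i < raid_disks; i++) {
                        if (slots[i].primary_ok)
                                continue;
                        /* the fix: a faulty primary with an active
                         * replacement still provides the data, so the
                         * slot is not degraded */
                        if (slots[i].replacement_ok)
                                continue;
                        degraded++;
                }
                return degraded;
        }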
    • Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE." · a852d7b8
      Committed by NeilBrown
      This reverts commit 895e3c5c.
      
      While this patch seemed like a good idea and did help some workloads,
      it hurts other workloads.
      Large sequential O_DIRECT writes were faster;
      small random O_DIRECT writes were slower.
      
      Other changes (batching RAID5 writes) have improved the sequential
      writes using a different mechanism, so the net result of this patch
      is definitely negative.  So revert it.
      Reported-by: Shaohua Li <shli@kernel.org>
      Tested-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      a852d7b8
  7. 09 Sep 2012, 4 commits
  8. 21 Aug 2012, 1 commit
    • workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Committed by Tejun Heo
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
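      The conversion itself is mechanical; a kernel-style fragment of the
      before/after (not a standalone program):

        #include <linux/workqueue.h>

        static struct work_struct my_work;
        static struct delayed_work my_dwork;

        static void teardown(void)
        {
                /* was: flush_work_sync(&my_work); */
                flush_work(&my_work);
                /* was: flush_delayed_work_sync(&my_dwork); */
                flush_delayed_work(&my_dwork);
        }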
  9. 18 Aug 2012, 1 commit
    • md/raid10: fix problem with on-stack allocation of r10bio structure. · e0ee7785
      Committed by NeilBrown
      A 'struct r10bio' has an array of per-copy information at the end.
      This array is declared with size [0] and r10bio_pool_alloc allocates
      enough extra space to store the per-copy information depending on the
      number of copies needed.
      
      So declaring a 'struct r10bio' on the stack isn't going to work.  It
      won't allocate enough space, and memory corruption will ensue.
      
      So in the two places where this is done, declare a sufficiently large
      structure and use that instead.
      
      The two call-sites of this bug were introduced in 3.4 and 3.5
      so this is suitable for both those kernels.  The patch will have to
      be modified for 3.4 as it only has one bug.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Tested-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      e0ee7785
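      A compact illustration of the problem and the wrapper fix (simplified
      stand-ins for r10bio; nesting a flexible-array struct relies on a GCC
      extension, as the kernel does):

        struct copy_info { int devnum; };

        struct r10bio_like {
                int sectors;
                struct copy_info devs[];   /* sized by the pool allocator */
        };

        void example(void)
        {
                /* BROKEN: 'struct r10bio_like rb;' reserves no storage for
                 * rb.devs, so writing rb.devs[0] would corrupt the stack. */

                /* Fix: declare a wrapper that is sufficiently large. */
                struct {
                        struct r10bio_like r10_bio;
                        struct copy_info devs[2];  /* max copies needed here */
                } on_stack;

                on_stack.r10_bio.sectors = 0;
                on_stack.r10_bio.devs[0].devnum = 1;  /* real storage now */
        }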
  10. 16 Aug 2012, 1 commit
    • md: Don't truncate size at 4TB for RAID0 and Linear · 667a5313
      Committed by NeilBrown
      commit 27a7b260
         md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
      
      changed 0.90 metadata handling to truncate the size to 4TB, as that is
      all that 0.90 can record.
      However for RAID0 and Linear, 0.90 doesn't need to record the size, so
      this truncation is not needed and causes working arrays to become too small.
      
      So avoid the truncation for RAID0 and Linear.
      
      This bug was introduced in 3.1 and is suitable for any stable kernels
      from then onwards.
      As the offending commit was tagged for 'stable', any stable kernel
      that it was applied to should also get this patch.  That includes
      at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for
      providing that list).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Neil Brown <neilb@suse.de>
      667a5313
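      The shape of the fix, sketched (the level constants follow md
      conventions, but this is an illustration, not the kernel code):

        #include <stdint.h>

        #define LEVEL_LINEAR (-1)

        static uint64_t cap_0_90_size(uint64_t num_sectors, int level)
        {
                /* 0.90 metadata records at most 4TB == 2 * 2^32 sectors,
                 * but RAID0 and Linear don't store the size there at all */
                if (num_sectors >= (2ULL << 32) &&
                    level != 0 && level != LEVEL_LINEAR)
                        num_sectors = (2ULL << 32) - 2;
                return num_sectors;
        }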
  11. 02 Aug 2012, 4 commits
  12. 01 Aug 2012, 1 commit
  13. 31 Jul 2012, 8 commits