1. 23 Feb, 2015 22 commits
  2. 18 Feb, 2015 3 commits
    • dm snapshot: fix a possible invalid memory access on unload · 22aa66a3
      Mikulas Patocka authored
      When the snapshot target is unloaded, snapshot_dtr() waits until
      pending_exceptions_count drops to zero.  Then, it destroys the snapshot.
      Therefore, the function that decrements pending_exceptions_count
      should not touch the snapshot structure after the decrement.
      
      pending_complete() calls free_pending_exception(), which decrements
      pending_exceptions_count, and then it performs up_write(&s->lock) and it
      calls retry_origin_bios() which dereferences s->origin.  These two
      memory accesses to the fields of the snapshot may touch the dm_snapshot
      structure after it is freed.
      
      This patch moves the call to free_pending_exception() to the end of
      pending_complete(), so that the snapshot will not be destroyed while
      pending_complete() is in progress.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
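The ordering rule in the dm snapshot fix can be illustrated with a small user-space model. This is only a sketch: `struct snap`, `pending_complete_fixed()` and `may_destroy()` are hypothetical stand-ins rather than kernel code, and the lock and the origin retry are reduced to plain integer fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the snapshot structure; not the kernel's layout. */
struct snap {
    atomic_int pending_exceptions_count;
    int lock_held;      /* models s->lock being held for writing */
    int origin_retries; /* models the work done by retry_origin_bios() */
};

/*
 * Models the fixed ordering: every access to *s happens before the
 * counter is decremented, because the destructor may free *s as soon
 * as the counter reaches zero.
 */
static void pending_complete_fixed(struct snap *s)
{
    assert(atomic_load(&s->pending_exceptions_count) > 0);
    s->lock_held = 0;     /* corresponds to up_write(&s->lock) */
    s->origin_retries++;  /* corresponds to the s->origin dereference */
    atomic_fetch_sub(&s->pending_exceptions_count, 1); /* last touch of *s */
}

/* Models the wait in the destructor: tearing down is safe only at zero. */
static bool may_destroy(struct snap *s)
{
    return atomic_load(&s->pending_exceptions_count) == 0;
}
```

With this ordering, the condition the destructor waits for cannot become true while the completion path still has accesses to the structure ahead of it.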
    • dm: fix a race condition in dm_get_md · 2bec1f4a
      Mikulas Patocka authored
      The function dm_get_md finds a device mapper device with a given dev_t,
      increases the reference count and returns the pointer.
      
      dm_get_md calls dm_find_md. dm_find_md takes _minor_lock, finds the
      device, tests that the device doesn't have the DMF_DELETING or DMF_FREEING
      flag set, drops _minor_lock and returns a pointer to the device. dm_get_md then
      calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
      otherwise it increments the reference count.
      
      There is a possible race condition - after dm_find_md exits and before
      dm_get is called, there are no locks held, so the device may disappear or
      DMF_FREEING flag may be set, which results in BUG.
      
      To fix this bug, we need to call dm_get while we hold _minor_lock. This
      patch renames dm_find_md to dm_get_md and changes it so that it calls
      dm_get while holding the lock.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
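The idea of taking the reference while the lookup lock is still held can be sketched in user space as follows. The names `struct mdev`, `get_md()`, `FLAG_DELETING` and `FLAG_FREEING` are hypothetical, a pthread mutex stands in for `_minor_lock`, and a two-entry array stands in for the real device lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FLAG_DELETING 1u /* models DMF_DELETING */
#define FLAG_FREEING  2u /* models DMF_FREEING */

/* Hypothetical stand-in for a mapped device. */
struct mdev {
    unsigned dev;
    unsigned flags;
    int holders; /* reference count */
};

static pthread_mutex_t minor_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mdev table[2] = { { 1, 0, 0 }, { 2, FLAG_FREEING, 0 } };

/*
 * Look up a device and take a reference in one critical section, so the
 * flags cannot change between the check and the increment.
 */
static struct mdev *get_md(unsigned dev)
{
    struct mdev *found = NULL;

    pthread_mutex_lock(&minor_lock);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        struct mdev *md = &table[i];

        if (md->dev != dev)
            continue;
        if (!(md->flags & (FLAG_DELETING | FLAG_FREEING))) {
            assert(md->holders >= 0);
            md->holders++; /* reference taken under the lock */
            found = md;
        }
        break;
    }
    pthread_mutex_unlock(&minor_lock);
    return found;
}
```

A caller that receives a non-NULL pointer already owns a reference, so there is no unlocked window in which the device could start being freed.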
    • md/raid5: Fix livelock when array is both resyncing and degraded. · 26ac1073
      NeilBrown authored
      Commit a7854487:
        md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
      
      Causes an RCW cycle to be forced even when the array is degraded.
      A degraded array cannot support RCW as that requires reading all data
      blocks, and one may be missing.
      
      Forcing an RCW when it is not possible causes a live-lock and the code
      spins, repeatedly deciding to do something that cannot succeed.
      
      So change the condition to only force RCW on non-degraded arrays.
      Reported-by: Manibalan P <pmanibalan@amiindia.co.in>
      Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Fixes: a7854487
      Cc: stable@vger.kernel.org (v3.7+)
  3. 17 Feb, 2015 7 commits
    • dm crypt: sort writes · b3c5fd30
      Mikulas Patocka authored
      Write requests are sorted in a red-black tree structure and are
      submitted in the sorted order.
      
      In theory the sorting should be performed by the underlying disk
      scheduler, however, in practice the disk scheduler only accepts and
      sorts a finite number of requests.  To allow the sorting of all
      requests, dm-crypt needs to implement its own sorting.
      
      The overhead associated with rbtree-based sorting is considered
      negligible so it is not used conditionally.  Even on SSD sorting can be
      beneficial since in-order request dispatch promotes lower latency IO
      completion to the upper layers.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
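The effect of submitting writes in sector order can be illustrated with a short sketch. Note the substitution: the commit keeps pending writes in a red-black tree, whereas this example simply sorts an array with the standard `qsort()`. `struct write_req` and `sort_writes()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pending write: only the starting sector matters here. */
struct write_req {
    unsigned long long sector;
    int id;
};

/* Order two requests by ascending start sector. */
static int cmp_sector(const void *a, const void *b)
{
    const struct write_req *x = a;
    const struct write_req *y = b;

    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Sort a batch of writes so they can be submitted in sector order. */
static void sort_writes(struct write_req *reqs, size_t n)
{
    if (n == 0)
        return;
    assert(reqs != NULL);
    qsort(reqs, n, sizeof *reqs, cmp_sector);
}
```

A tree has the advantage that requests can be inserted one at a time as encryption completes, while an array has to be sorted as a whole batch.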
    • dm crypt: add 'submit_from_crypt_cpus' option · 0f5d8e6e
      Mikulas Patocka authored
      Make it possible to disable offloading writes by setting the optional
      'submit_from_crypt_cpus' table argument.
      
      There are some situations where offloading write bios from the
      encryption threads to a single thread degrades performance
      significantly.
      
      The default is to offload write bios to the same thread because it
      benefits CFQ to have writes submitted using the same IO context.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: offload writes to thread · dc267621
      Mikulas Patocka authored
      Submitting write bios directly in the encryption thread caused serious
      performance degradation.  On a multiprocessor machine, encryption requests
      finish in a different order than they were submitted.  Consequently, write
      requests would be submitted in a different order and it could cause severe
      performance degradation.
      
      Move the submission of write requests to a separate thread so that the
      requests can be sorted before submitting.  But this commit improves
      dm-crypt performance even without having dm-crypt perform request
      sorting (in particular it enables IO schedulers like CFQ to sort more
      effectively).
      
      Note: it is required that a previous commit ("dm crypt: don't allocate
      pages for a partial request") be applied before applying this patch.
      Otherwise, this commit could introduce a crash.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: remove unused io_pool and _crypt_io_pool · 94f5e024
      Mikulas Patocka authored
      The previous commit ("dm crypt: don't allocate pages for a partial
      request") stopped using the io_pool slab mempool and backing
      _crypt_io_pool kmem cache.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: avoid deadlock in mempools · 7145c241
      Mikulas Patocka authored
      Fix a theoretical deadlock introduced in the previous commit ("dm crypt:
      don't allocate pages for a partial request").
      
      The function crypt_alloc_buffer may be called concurrently.  If we allocate
      from the mempool concurrently, there is a possibility of deadlock.  For
      example, if we have a mempool of 256 pages and two processes, each wanting
      256 pages, allocate from the mempool concurrently, it may deadlock in a
      situation where both processes have allocated 128 pages and the mempool
      is exhausted.
      
      To avoid such a scenario we allocate the pages under a mutex.  In order
      to not degrade performance with excessive locking, we try non-blocking
      allocations without a mutex first and if that fails, we fall back to
      blocking allocations with a mutex.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
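The two-phase allocation strategy described in this commit can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the dm-crypt code: the pool is a plain counter guarded by a pthread mutex and condition variable, and `put_pages()`, `take_page()`, `try_alloc_pages()` and `alloc_pages_for_request()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical fixed-size page pool standing in for a mempool. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_nonempty = PTHREAD_COND_INITIALIZER;
static int pool_free = 4;

/* Serialises blocking allocators, like the mutex the commit introduces. */
static pthread_mutex_t bio_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return n pages to the pool and wake any sleeping allocator. */
static void put_pages(int n)
{
    pthread_mutex_lock(&pool_lock);
    pool_free += n;
    pthread_cond_broadcast(&pool_nonempty);
    pthread_mutex_unlock(&pool_lock);
}

/* Take one page; if 'block' is set, sleep until one is available. */
static bool take_page(bool block)
{
    bool ok = false;

    pthread_mutex_lock(&pool_lock);
    while (pool_free == 0 && block)
        pthread_cond_wait(&pool_nonempty, &pool_lock);
    if (pool_free > 0) {
        pool_free--;
        ok = true;
    }
    pthread_mutex_unlock(&pool_lock);
    return ok;
}

/* Non-blocking pass: all n pages or none (partial takes are rolled back). */
static bool try_alloc_pages(int n)
{
    for (int got = 0; got < n; got++) {
        if (!take_page(false)) {
            put_pages(got);
            return false;
        }
    }
    return true;
}

/* Full request: fast path without the mutex, then blocking path under it. */
static void alloc_pages_for_request(int n)
{
    assert(n >= 0);
    if (try_alloc_pages(n))
        return;
    pthread_mutex_lock(&bio_alloc_lock);
    for (int got = 0; got < n; got++)
        take_page(true);
    pthread_mutex_unlock(&bio_alloc_lock);
}
```

Because only one allocator at a time can hold a partial set of pages while sleeping, two requests can no longer each sit on half of the pool and wait for the other. The model assumes that no single request asks for more pages than the pool holds and that pages are eventually returned.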
    • dm crypt: don't allocate pages for a partial request · cf2f1abf
      Mikulas Patocka authored
      Change crypt_alloc_buffer so that it only ever allocates pages for a
      full request.  This is a prerequisite for the commit "dm crypt: offload
      writes to thread".
      
      This change simplifies the dm-crypt code at the expense of reduced
      throughput in low memory conditions (where allocation for a partial
      request is most useful).
      
      Note: the next commit ("dm crypt: avoid deadlock in mempools") is needed
      to fix a theoretical deadlock.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: use unbound workqueue for request processing · f3396c58
      Mikulas Patocka authored
      Use unbound workqueue by default so that work is automatically balanced
      between available CPUs.  The original behavior of encrypting using the
      same cpu that IO was submitted on can still be enabled by setting the
      optional 'same_cpu_crypt' table argument.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 16 Feb, 2015 2 commits
  5. 14 Feb, 2015 3 commits
    • dm io: reject unsupported DISCARD requests with EOPNOTSUPP · 37527b86
      Darrick J. Wong authored
      I created a dm-raid1 device backed by a device that supports DISCARD
      and another device that does NOT support DISCARD with the following
      dm configuration:
      
       # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
       # lsblk -D
       NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
       sda                 0        4K       1G         0
       `-moo (dm-0)        0        4K       1G         0
       sdb                 0        0B       0B         0
       `-moo (dm-0)        0        4K       1G         0
      
      Notice that the mirror device /dev/mapper/moo advertises DISCARD
      support even though one of the mirror halves doesn't.
      
      If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
      BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
      loop in do_region() when it tries to issue a DISCARD request to sdb.
      The problem is that when we call do_region() against sdb, num_sectors
      is set to zero because q->limits.max_discard_sectors is zero.
      Therefore, "remaining" never decreases and the loop never terminates.
      
      To fix this: before entering the loop, check for the combination of
      REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
      the mirror device.
      
      This bug was found by the unfortunate coincidence of pvmove and a
      discard operation in the RHEL 6.5 kernel; upstream is also affected.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm mirror: do not degrade the mirror on discard error · f2ed51ac
      Mikulas Patocka authored
      It is possible that a device claims discard support but rejects
      discards with -EOPNOTSUPP.  This happens when using loopback on an ext2/ext3
      filesystem driven by the ext4 driver.  It may also happen if the
      underlying devices are moved from one disk to another.
      
      If a discard error happens, we reject the bio with -EOPNOTSUPP, but we do
      not degrade the array.
      
      This patch fixes failed test shell/lvconvert-repair-transient.sh in the
      lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
      filesystem and it is being driven by the ext4 driver.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm space map disk: fix sm_disk_count_is_more_than_one() · 145b9006
      Mike Snitzer authored
      dm_tm_shadow_block() is the only caller of
      dm_sm_count_is_more_than_one() which only ever operates on a metadata
      space-map.  So in practice, sm_disk_count_is_more_than_one() isn't
      actually used (which explains why this bug never amounted to anything).
      
      But fix sm_disk_count_is_more_than_one() to properly set *result and
      return 0.
      Reported-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
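The calling convention that the fix restores (answer through the out-parameter, status through the return value) can be sketched as below. `get_count()` and `count_is_more_than_one()` are hypothetical user-space stand-ins. The "broken" variant is not taken from the source: it is an assumed illustration of how such a function can fail to set `*result`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup of a block's reference count; fails for block 0. */
static int get_count(unsigned int block, unsigned int *count)
{
    if (block == 0)
        return -EINVAL;
    *count = block; /* toy rule: block b has reference count b */
    return 0;
}

/*
 * Correct convention: the boolean answer goes into *result and the
 * return value carries only success (0) or a negative error code.
 * An assumed broken variant would "return count > 1;" and leave
 * *result untouched, so callers would read a stale value.
 */
static int count_is_more_than_one(unsigned int block, int *result)
{
    unsigned int count;
    int r;

    assert(result != NULL);
    r = get_count(block, &count);
    if (r)
        return r;

    *result = count > 1;
    return 0;
}
```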
  6. 12 Feb, 2015 1 commit
    • md/raid10: fix conversion from RAID0 to RAID10 · 53a6ab4d
      NeilBrown authored
      A RAID0 array (like a LINEAR array) does not have a concept
      of 'size' being the amount of each device that is in use.
      Rather, as much of each device as is available is used.
      So the 'size' is set to 0 and ignored.
      
      RAID10 does have this concept and needs it to be set correctly.
      So when we convert RAID0 to RAID10 we must determine the
      'size' (that being the size of the first 'strip_zone' in the
      RAID0), and set it correctly.
      Reported-and-tested-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 11 Feb, 2015 1 commit
  8. 10 Feb, 2015 1 commit