1. 13 7月, 2017 1 次提交
  2. 11 7月, 2017 2 次提交
    • X
      Raid5 should update rdev->sectors after reshape · b5d27718
      Xiao Ni 提交于
      The raid5 md device is created by the disks which we don't use the total size. For example,
      the size of the device is 5G and it just uses 3G of the devices to create one raid5 device.
      Then change the chunksize and wait reshape to finish. After reshape finishing stop the raid
      and assemble it again. It fails.
      mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
      mdadm /dev/md0 --grow --chunk=64
      wait reshape to finish
      mdadm -S /dev/md0
      mdadm -As
      The error messages:
      [197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
      [197519.821686] md: md_import_device returned -22
      
      After reshape the data offset is changed. It selects backwards direction in this condition.
      In function super_1_load it compares the available space of the underlying device with
      sb->data_size. The new data offset gets bigger after reshape. So super_1_load returns -EINVAL.
      rdev->sectors is updated in md_finish_reshape. Then sb->data_size is set in super_1_sync based
      on rdev->sectors. So add md_finish_reshape in end_reshape.
      Signed-off-by: NXiao Ni <xni@redhat.com>
      Acked-by: NGuoqing Jiang <gqjiang@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NShaohua Li <shli@fb.com>
      b5d27718
    • G
      md/bitmap: don't read page from device with Bitmap_sync · 4aaf7694
      Guoqing Jiang 提交于
      The device owns Bitmap_sync flag needs recovery
      to become in sync, and read page from this type
      device could get stale status.
      
      Also add comments for Bitmap_sync bit per the
      suggestion from Shaohua and Neil.
      
      Previous disscussion can be found here:
      https://marc.info/?t=149760428900004&r=1&w=2Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      4aaf7694
  3. 06 7月, 2017 1 次提交
  4. 04 7月, 2017 2 次提交
  5. 30 6月, 2017 1 次提交
  6. 28 6月, 2017 2 次提交
    • V
      dm thin: do not queue freed thin mapping for next stage processing · 00a0ea33
      Vallish Vaidyeshwara 提交于
      process_prepared_discard_passdown_pt1() should cleanup
      dm_thin_new_mapping in cases of error.
      
      dm_pool_inc_data_range() can fail trying to get a block reference:
      
      metadata operation 'dm_pool_inc_data_range' failed: error = -61
      
      When dm_pool_inc_data_range() fails, dm thin aborts current metadata
      transaction and marks pool as PM_READ_ONLY. Memory for thin mapping
      is released as well. However, current thin mapping will be queued
      onto next stage as part of queue_passdown_pt2() or passdown_endio().
      This dangling thin mapping memory when processed and accessed in
      next stage will lead to device mapper crashing.
      
      Code flow without fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
         -> dm_pool_inc_data_range() fails, frees memory m
                  but does not remove it from next stage queue
      
      -> process_prepared_discard_passdown_pt2(m)
         -> processes freed memory m and crashes
      
      One such stack:
      
      Call Trace:
      [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
      [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
      [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
      [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
      [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
      [<ffffffff8152bf54>] ? __schedule+0x244/0x680
      [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
      [<ffffffff81089f53>] process_one_work+0x153/0x3f0
      [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
      [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
      [<ffffffff8108fd6a>] kthread+0xca/0xe0
      [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
      [<ffffffff81530b45>] ret_from_fork+0x25/0x30
      
      The fix is to first take the block ref count for discarded block and
      then do a passdown discard of this block. If block ref count fails,
      then bail out aborting current metadata transaction, mark pool as
      PM_READ_ONLY and also free current thin mapping memory (existing error
      handling code) without queueing this thin mapping onto next stage of
      processing. If block ref count succeeds, then passdown discard of this
      block. Discard callback of passdown_endio() will queue this thin mapping
      onto next stage of processing.
      
      Code flow with fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> dm_pool_inc_data_range()
            --> if fails, free memory m and bail out
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
      
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Reviewed-by: NEduardo Valentin <eduval@amazon.com>
      Reviewed-by: NCristian Gafton <gafton@amazon.com>
      Reviewed-by: NAnchal Agarwal <anchalag@amazon.com>
      Signed-off-by: NVallish Vaidyeshwara <vallish@amazon.com>
      Reviewed-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      00a0ea33
    • C
      dm: don't set bounce limit · 41341afa
      Christoph Hellwig 提交于
      Now all queues allocators come without abounce limit by default,
      dm doesn't have to override this anymore.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      41341afa
  7. 24 6月, 2017 2 次提交
  8. 22 6月, 2017 2 次提交
    • N
      md: use a separate bio_set for synchronous IO. · 5a85071c
      NeilBrown 提交于
      md devices allocate a bio_set and use it for two
      distinct purposes.
      mddev->bio_set is used to clone bios as part of sending
      upper level requests down to lower level devices,
      and it is also use for synchronous IO such as superblock
      and bitmap updates, and for correcting read errors.
      
      This multiple usage can lead to deadlocks.  It is likely
      that cloned bios might be queued for write and to be
      waiting for a metadata update before the write can be permitted.
      If the cloning exhausted mddev->bio_set, the metadata update
      may not be able to proceed.
      
      This scenario has been seen during heavy testing, with lots of IO and
      lots of memory pressure.
      
      Address this by adding a new bio_set specifically for synchronous IO.
      All synchronous IO goes directly to the underlying device and is not
      queued at the md level, so request using entries from the new
      mddev->sync_set will complete in a timely fashion.
      Requests that use mddev->bio_set will sometimes need to wait
      for synchronous IO, but will no longer risk deadlocking that iO.
      
      Also: small simplification in mddev_put(): there is no need to
      wait until the spinlock is released before calling bioset_free().
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      5a85071c
    • M
      dm io: fix duplicate bio completion due to missing ref count · feb7695f
      Mike Snitzer 提交于
      If only a subset of the devices associated with multiple regions support
      a given special operation (eg. DISCARD) then the dec_count() that is
      used to set error for the region must increment the io->count.
      
      Otherwise, when the dec_count() is called it can cause the dm-io
      caller's bio to be completed multiple times.  As was reported against
      the dm-mirror target that had mirror legs with a mix of discard
      capabilities.
      
      Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077Reported-by: NZhang Yi <yizhan@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      feb7695f
  9. 21 6月, 2017 1 次提交
  10. 20 6月, 2017 1 次提交
    • I
      sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar 提交于
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ac6424b9
  11. 19 6月, 2017 19 次提交
  12. 17 6月, 2017 3 次提交
  13. 16 6月, 2017 1 次提交
  14. 15 6月, 2017 1 次提交
  15. 14 6月, 2017 1 次提交