1. 11 9月, 2017 2 次提交
    • J
      dm log writes: fix >512b sectorsize support · 228bb5b2
      Josef Bacik 提交于
      512b sectors vs device's physical sectorsize was not maintained
      consistently and as such the support for >512b sector devices has bugs.
      The log metadata expects native sectorsize but 512b sectors were being
      stored.  Also, device's sectorsize was assumed when assigning the
      bi_sector for blocks that were being logged.
      
      Fix this up by adding two helpers to convert between bio and dev
      sectors, and use these in the appropriate places to fix the problem and
      make it clear which units go where.  Doing so allows dm-log-writes use
      with 4k devices.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      228bb5b2
    • J
      dm log writes: don't use all the cpu while waiting to log blocks · 0c79c620
      Josef Bacik 提交于
      The check to see if the logging kthread needs to go to sleep is wrong,
      it checks lc->pending_blocks, which will be non-0 if there are any
      blocks that are pending, whether they are ready to be logged or not.
      What we really want is to go to sleep until it's time to log blocks, so
      change this check so we do actually go to sleep in between flushes.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      0c79c620
  2. 28 8月, 2017 20 次提交
    • E
      dm ioctl: constify ioctl lookup table · cf0dec66
      Eric Biggers 提交于
      Constify the lookup table for device-mapper ioctls so that it is placed
      in .rodata.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      cf0dec66
    • E
      dm: constify argument arrays · 5916a22b
      Eric Biggers 提交于
      The arrays of 'struct dm_arg' are never modified by the device-mapper
      core, so constify them so that they are placed in .rodata.
      
      (Exception: the args array in dm-raid cannot be constified because it is
      allocated on the stack and modified.)
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      5916a22b
    • M
      dm integrity: count and display checksum failures · 3f2e5393
      Mikulas Patocka 提交于
      This changes DM integrity to count the number of checksum failures and
      report the counter in response to STATUSTYPE_INFO request (via 'dmsetup
      status').
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      3f2e5393
    • M
      dm integrity: optimize writing dm-bufio buffers that are partially changed · 1e3b21c6
      Mikulas Patocka 提交于
      Rather than write the entire dm-bufio buffer when only a subset is
      changed, improve dm-bufio (and dm-integrity) by only writing the subset
      of the buffer that changed.
      
      Update dm-integrity to make use of dm-bufio's new
      dm_bufio_mark_partial_buffer_dirty() interface.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      1e3b21c6
    • M
      dm rq: do not update rq partially in each ending bio · dc6364b5
      Ming Lei 提交于
      We don't need to update the original dm request partially when ending
      each cloned bio: just update original dm request once when the whole
      cloned request is finished.  This still allows full support for partial
      completion because a new 'completed' counter accounts for incremental
      progress as the clone bios complete.
      
      Partial request update can be a bit expensive, so we should try to avoid
      it, especially because it is run in softirq context.
      
      Avoiding all the partial request updates fixes both hard lockup and
      soft lockups that were easily reproduced while running Laurence's
      test[1] on IB/SRP.
      
      BTW, after d4acf365 ("block: Make blk_mq_delay_kick_requeue_list()
      rerun the queue at a quiet time"), we need to make the test more
      aggressive for reproducing the lockup:
      
      	1) run hammer_write.sh 32 or 64 concurrently.
      	2) write 8M each time
      
      [1] https://marc.info/?l=linux-block&m=150220185510245&w=2Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      dc6364b5
    • B
      dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior · d5c27f3f
      Bart Van Assche 提交于
      DM_MAPIO_DELAY_REQUEUE causes dm-mq to requeue after a delay but
      causes dm-sq to requeue immediately.  Make the behavior of dm-sq
      consistent with that of dm-mq.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      d5c27f3f
    • B
      dm mpath: complain about unsupported __multipath_map_bio() return values · 9157c8d3
      Bart Van Assche 提交于
      WARN_ONCE() if __multipath_map_bio() returns an unsupported return value.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      9157c8d3
    • B
    • B
      dm mpath: do not lock up a CPU with requeuing activity · 1c23484c
      Bart Van Assche 提交于
      When using the block layer in single queue mode, get_request()
      returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
      flag has been passed to get_request(). Avoid that the kernel
      reports soft lockup complaints in this case due to continuous
      requeuing activity.
      
      Fixes: 7083abbb ("dm mpath: avoid that path removal can trigger an infinite loop")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Tested-by: NLaurence Oberman <loberman@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      1c23484c
    • B
      dm: fix printk() rate limiting code · 60440789
      Bart Van Assche 提交于
      Using the same rate limiting state for different kinds of messages
      is wrong because this can cause a high frequency message to suppress
      a report of a low frequency message. Hence use a unique rate limiting
      state per message type.
      
      Fixes: 71a16736 ("dm: use local printk ratelimit")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      60440789
    • B
      dm mpath: retry BLK_STS_RESOURCE errors · 68515cc7
      Bart Van Assche 提交于
      Retry requests instead of failing them if an out-of-memory error occurs
      or the block driver below dm-mpath is busy.  This restores the v4.12
      behavior of noretry_error(), namely that -ENOMEM results in a retry.
      
      Fixes: 2a842aca ("block: introduce new block status code type")
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      68515cc7
    • B
      dm: fix the second dec_pending() argument in __split_and_process_bio() · 54385bf7
      Bart Van Assche 提交于
      Detected by sparse.
      
      Fixes: 4e4cbee9 ("block: switch bios to blk_status_t")
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NLaurence Oberman <loberman@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      54385bf7
    • L
      Linux 4.13-rc7 · cc4a41fe
      Linus Torvalds 提交于
      cc4a41fe
    • L
      Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 2c25833c
      Linus Torvalds 提交于
      Pull IOMMU fix from Joerg Roedel:
       "Another fix, this time in common IOMMU sysfs code.
      
        In the conversion from the old iommu sysfs-code to the
        iommu_device_register interface, I missed to update the release path
        for the struct device associated with an IOMMU. It freed the 'struct
        device', which was a pointer before, but is now embedded in another
        struct.
      
        Freeing from the middle of allocated memory had all kinds of nasty
        side effects when an IOMMU was unplugged. Unfortunatly nobody
        unplugged and IOMMU until now, so this was not discovered earlier. The
        fix is to make the 'struct device' a pointer again"
      
      * tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu: Fix wrong freeing of iommu_device->dev
      2c25833c
    • L
      Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 80f73b2d
      Linus Torvalds 提交于
      Pull char/misc fix from Greg KH:
       "Here is a single misc driver fix for 4.13-rc7. It resolves a reported
        problem in the Android binder driver due to previous patches in
        4.13-rc.
      
        It's been in linux-next with no reported issues"
      
      * tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        ANDROID: binder: fix proc->tsk check.
      80f73b2d
    • L
      Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · c3c16263
      Linus Torvalds 提交于
      Pull staging/iio fixes from Greg KH:
       "Here are few small staging driver fixes, and some more IIO driver
        fixes for 4.13-rc7. Nothing major, just resolutions for some reported
        problems.
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: magnetometer: st_magn: remove ihl property for LSM303AGR
        iio: magnetometer: st_magn: fix status register address for LSM303AGR
        iio: hid-sensor-trigger: Fix the race with user space powering up sensors
        iio: trigger: stm32-timer: fix get trigger mode
        iio: imu: adis16480: Fix acceleration scale factor for adis16480
        PATCH] iio: Fix some documentation warnings
        staging: rtl8188eu: add RNX-N150NUB support
        Revert "staging: fsl-mc: be consistent when checking strcmp() return"
        iio: adc: stm32: fix common clock rate
        iio: adc: ina219: Avoid underflow for sleeping time
        iio: trigger: stm32-timer: add enable attribute
        iio: trigger: stm32-timer: fix get/set down count direction
        iio: trigger: stm32-timer: fix write_raw return value
        iio: trigger: stm32-timer: fix quadrature mode get routine
        iio: bmp280: properly initialize device for humidity reading
      c3c16263
    • L
      Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb · fff4e7a0
      Linus Torvalds 提交于
      Pull NTB fixes from Jon Mason:
       "NTB bug fixes to address an incorrect ntb_mw_count reference in the
        NTB transport, improperly bringing down the link if SPADs are
        corrupted, and an out-of-order issue regarding link negotiation and
        data passing"
      
      * tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
        ntb: ntb_test: ensure the link is up before trying to configure the mws
        ntb: transport shouldn't disable link due to bogus values in SPADs
        ntb: use correct mw_count function in ntb_tool and ntb_transport
      fff4e7a0
    • L
      Avoid page waitqueue race leaving possible page locker waiting · a8b169af
      Linus Torvalds 提交于
      The "lock_page_killable()" function waits for exclusive access to the
      page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry
      set.
      
      That means that if it gets woken up, other waiters may have been
      skipped.
      
      That, in turn, means that if it sees the page being unlocked, it *must*
      take that lock and return success, even if a lethal signal is also
      pending.
      
      So instead of checking for lethal signals first, we need to check for
      them after we've checked the actual bit that we were waiting for.  Even
      if that might then delay the killing of the process.
      
      This matches the order of the old "wait_on_bit_lock()" infrastructure
      that the page locking used to use (and is still used in a few other
      areas).
      
      Note that if we still return an error after having unsuccessfully tried
      to acquire the page lock, that is ok: that means that some other thread
      was able to get ahead of us and lock the page, and when that other
      thread then unlocks the page, the wakeup event will be repeated.  So any
      other pending waiters will now get properly woken up.
      
      Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8b169af
    • L
      Minor page waitqueue cleanups · 3510ca20
      Linus Torvalds 提交于
      Tim Chen and Kan Liang have been battling a customer load that shows
      extremely long page wakeup lists.  The cause seems to be constant NUMA
      migration of a hot page that is shared across a lot of threads, but the
      actual root cause for the exact behavior has not been found.
      
      Tim has a patch that batches the wait list traversal at wakeup time, so
      that we at least don't get long uninterruptible cases where we traverse
      and wake up thousands of processes and get nasty latency spikes.  That
      is likely 4.14 material, but we're still discussing the page waitqueue
      specific parts of it.
      
      In the meantime, I've tried to look at making the page wait queues less
      expensive, and failing miserably.  If you have thousands of threads
      waiting for the same page, it will be painful.  We'll need to try to
      figure out the NUMA balancing issue some day, in addition to avoiding
      the excessive spinlock hold times.
      
      That said, having tried to rewrite the page wait queues, I can at least
      fix up some of the braindamage in the current situation. In particular:
      
       (a) we don't want to continue walking the page wait list if the bit
           we're waiting for already got set again (which seems to be one of
           the patterns of the bad load).  That makes no progress and just
           causes pointless cache pollution chasing the pointers.
      
       (b) we don't want to put the non-locking waiters always on the front of
           the queue, and the locking waiters always on the back.  Not only is
           that unfair, it means that we wake up thousands of reading threads
           that will just end up being blocked by the writer later anyway.
      
      Also add a comment about the layout of 'struct wait_page_key' - there is
      an external user of it in the cachefiles code that means that it has to
      match the layout of 'struct wait_bit_key' in the two first members.  It
      so happens to match, because 'struct page *' and 'unsigned long *' end
      up having the same values simply because the page flags are the first
      member in struct page.
      
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3510ca20
    • L
      Clarify (and fix) MAX_LFS_FILESIZE macros · 0cc3b0ec
      Linus Torvalds 提交于
      We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
      filesystems (and other IO targets) that know they are 64-bit clean and
      don't have any 32-bit limits in their IO path.
      
      It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
      the VM layer is limited by the page cache to only 32-bit index values,
      but our logic for that was confusing and actually wrong.  We used to
      define that value to
      
      	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
      
      which is actually odd in several ways: it limits the index to 31 bits,
      and then it limits files so that they can't have data in that last byte
      of a page that has the highest 31-bit index (ie page index 0x7fffffff).
      
      Neither of those limitations make sense.  The index is actually the full
      32 bit unsigned value, and we can use that whole full page.  So the
      maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
      
      However, we do wan tto avoid the maximum index, because we have code
      that iterates over the page indexes, and we don't want that code to
      overflow.  So the maximum size of a file on a 32-bit host should
      actually be one page less than the full 32-bit index.
      
      So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
      not actually be using the page of that last index (ULONG_MAX), but we
      can grow a file up to that limit.
      
      The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
      Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
      volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
      byte less), but the actual true VM limit is one page less than 16TiB.
      
      This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
      in truncate_inode_pages_range()"), which started applying that
      MAX_LFS_FILESIZE limit to block devices too.
      
      NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
      actually just the offset type itself (loff_t), which is signed.  But for
      clarity, on 64-bit, just use the maximum signed value, and don't make
      people have to count the number of 'f' characters in the hex constant.
      
      So just use LLONG_MAX for the 64-bit case.  That was what the value had
      been before too, just written out as a hex constant.
      
      Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      Reported-and-tested-by: NDoug Nazar <nazard@nazar.ca>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0cc3b0ec
  3. 27 8月, 2017 4 次提交
  4. 26 8月, 2017 14 次提交