1. 21 Jun, 2017 1 commit
    • J
      time: Clean up CLOCK_MONOTONIC_RAW time handling · fc6eead7
      John Stultz committed
      Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
      remove the duplicative tk->raw_time.tv_nsec, which can be
      stored in tk->tkr_raw.xtime_nsec (similarly to how it's handled
      for monotonic time).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      fc6eead7
  2. 20 Jun, 2017 4 commits
    • T
      Merge branch 'clockevents/4.12-fixes' of... · 8e6cec1c
      Thomas Gleixner committed
      Merge branch 'clockevents/4.12-fixes' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
      
      Pull clockevents fixes from Daniel Lezcano:
      
       - Fixed wrong iomem area unmapped in the arch_arm_timer (Frank Rowand)
      
       - Added missing includes for sun5i and cadence-ttc (Stephen Rothwell)
      8e6cec1c
    • W
      arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · dbb236c1
      Will Deacon committed
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, by accelerating the MONOTONIC_RAW clockid, it
      uncovered potential 1ns time inconsistencies caused by the
      timekeeping core not handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dbb236c1
    • J
      time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 3d88d56c
      John Stultz committed
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3d88d56c
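The 1ns discontinuity described in the two CLOCK_MONOTONIC_RAW commits above can be reproduced with a small userspace model. This is a minimal sketch, not the kernel code: `struct raw_clock` and every function name below are hypothetical, and the model assumes a clocksource that converts cycles to nanoseconds as `(cycles * mult) >> shift`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a raw clock; not the kernel's struct timekeeper. */
struct raw_clock {
	uint32_t mult;          /* cycles -> shifted ns multiplier */
	uint32_t shift;         /* shifted ns -> ns */
	uint64_t nsec;          /* truncating scheme: whole nanoseconds only */
	uint64_t shifted_nsec;  /* sub-ns scheme: nanoseconds << shift */
};

static struct raw_clock raw_clock_init(uint32_t mult, uint32_t shift)
{
	struct raw_clock c = { .mult = mult, .shift = shift };

	assert(shift < 32);
	return c;
}

/* Truncating scheme: each accumulation throws away the sub-ns remainder. */
static void accumulate_truncating(struct raw_clock *c, uint64_t interval)
{
	c->nsec += (interval * c->mult) >> c->shift;
}

static uint64_t read_truncating(const struct raw_clock *c, uint64_t delta)
{
	return c->nsec + ((delta * c->mult) >> c->shift);
}

/* Sub-ns scheme: accumulate in shifted units, shift once when reading. */
static void accumulate_shifted(struct raw_clock *c, uint64_t interval)
{
	c->shifted_nsec += interval * c->mult;
}

static uint64_t read_shifted(const struct raw_clock *c, uint64_t delta)
{
	return (c->shifted_nsec + delta * c->mult) >> c->shift;
}
```

With the illustrative values mult = 3 and shift = 1 (1.5 ns per cycle), a read taken 2 cycles after the base returns 3 ns. After accumulating 1 cycle, the same instant is 1 cycle past the new base: the truncating scheme now returns 2 ns (time steps back by 1 ns), while the shifted scheme still returns 3 ns.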
    • J
      time: Fix clock->read(clock) race around clocksource changes · ceea5e37
      John Stultz committed
      In tests which exercise switching of clocksources, a NULL
      pointer dereference can be observed on ARM64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed then tkr->clock and tkr->read
      are updated sequentially. The code above results in a sequential
      load operation of tkr->read and tkr->clock as well.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issue
      the read function via clk->read(clk). This guarantees that the
      read() function always gets the proper clocksource pointer handed
      in.
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather than just a dummy read function.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ceea5e37
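The helper described in the clock->read(clock) commit above (load the clock pointer once, then call read() through that same pointer) might look roughly like the following userspace sketch. The struct layouts are simplified stand-ins, `clock_read_once()` and `struct fixed_clock` are hypothetical names, and the volatile access is only an approximation of what the kernel would do with READ_ONCE().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures named in the commit. */
struct clocksource {
	uint64_t (*read)(struct clocksource *cs);
	uint64_t mask;
};

struct tk_read_base {
	struct clocksource *clock;
};

/* Hypothetical clocksource that embeds the generic part as first member. */
struct fixed_clock {
	struct clocksource cs;  /* must stay first: read() casts back */
	uint64_t value;
};

static uint64_t fixed_clock_read(struct clocksource *cs)
{
	/* Only valid if cs really is part of a struct fixed_clock. */
	struct fixed_clock *fc = (struct fixed_clock *)cs;

	return fc->value & cs->mask;
}

/*
 * Load tkr->clock exactly once into a local, then use that one pointer
 * both to find read() and as its argument, so the function and the
 * clocksource it operates on always belong together.
 */
static uint64_t clock_read_once(struct tk_read_base *tkr)
{
	struct clocksource *clk = *(struct clocksource * volatile *)&tkr->clock;

	assert(clk && clk->read);
	return clk->read(clk);
}
```

Because the function pointer is derived from the local `clk` rather than from a separately cached field, a concurrent update of `tkr->clock` can make a caller see either the old or the new clocksource, but never the old read() paired with the new clock.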
  3. 19 Jun, 2017 9 commits
    • L
      Linux 4.12-rc6 · 41f1830f
      Linus Torvalds committed
      41f1830f
    • H
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins committed
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs like gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1be7107f
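The "gap between vmas" idea from the stack guard commit above can be illustrated with a userspace model. This is a sketch under stated assumptions rather than the kernel implementation: `struct demo_vma`, `vm_start_gap_sketch()`, `fits_below()` and the `DEMO_VM_GROWSDOWN` flag value are all hypothetical, and the gap is fixed at 256 pages of 4k (the 1MB default the commit mentions).

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT   12       /* assumes 4k pages */
#define DEMO_VM_GROWSDOWN 0x100UL  /* illustrative flag value only */

/* Gap kept below a downward-growing stack: 256 pages = 1MB with 4k pages. */
static unsigned long stack_guard_gap = 256UL << DEMO_PAGE_SHIFT;

/* Hypothetical, heavily reduced stand-in for a vma. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/*
 * Effective start of a vma for placement decisions: a stack that grows
 * down "starts" one guard gap earlier, clamped at address 0.
 */
static unsigned long vm_start_gap_sketch(const struct demo_vma *vma)
{
	unsigned long start = vma->vm_start;

	assert(vma->vm_start <= vma->vm_end);
	if (vma->vm_flags & DEMO_VM_GROWSDOWN) {
		start -= stack_guard_gap;
		if (start > vma->vm_start)  /* unsigned underflow */
			start = 0;
	}
	return start;
}

/* A new mapping ending at new_end fits below next only if it respects the gap. */
static int fits_below(unsigned long new_end, const struct demo_vma *next)
{
	return new_end <= vm_start_gap_sketch(next);
}
```

Note that the gap is never added to the vma itself, so in this model, as in the commit's description, it is not counted against the size of the stack mapping; it only affects where neighbouring mappings may be placed.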
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 1132d5e7
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "Stream of fixes has slowed down, only a few this week:
      
         - Some DT fixes for Allwinner platforms, and addition of a clock to
           the R_CCU clock controller that had been missed.
      
         - A couple of small DT fixes for am335x-sl50"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
        ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
        ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
        ARM: dts: am335x-sl50: Fix card detect pin for mmc1
        arm64: allwinner: h5: Remove syslink to shared DTSI
        ARM: sunxi: h3/h5: fix the compatible of R_CCU
      1132d5e7
    • O
      Merge tag 'sunxi-fixes-for-4.12' of... · a1858df9
      Olof Johansson committed
      Merge tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
      
      Allwinner fixes for 4.12
      
      A few fixes around the PRCM support that got in 4.12 with a wrong
      compatible, and a missing clock in the binding.
      
      * tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
        arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
        ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
        arm64: allwinner: h5: Remove syslink to shared DTSI
        ARM: sunxi: h3/h5: fix the compatible of R_CCU
      Signed-off-by: Olof Johansson <olof@lixom.net>
      a1858df9
    • O
      Merge tag 'omap-for-v4.12/fixes-sl50' of... · 51b6e281
      Olof Johansson committed
      Merge tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
      
      Two fixes for am335x-sl50 to fix a boot time error
      for claiming SPI pins, and to fix a SDIO card detect
      pin for production version of the device.
      
      * tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
        ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
        ARM: dts: am335x-sl50: Fix card detect pin for mmc1
      Signed-off-by: Olof Johansson <olof@lixom.net>
      51b6e281
    • L
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3696e4f0
      Linus Torvalds committed
      Pull virtio bugfix from Michael Tsirkin:
       "It turns out balloon does not handle IOMMUs correctly. We should fix
        that at some point, for now let's just disable this configuration"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_balloon: disable VIOMMU support
      3696e4f0
    • L
      Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 7d62d947
      Linus Torvalds committed
      Pull i2c fixes from Wolfram Sang:
       "Two driver bugfixes"
      
      * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: ismt: fix wrong device address when unmap the data buffer
        i2c: rcar: use correct length when unmapping DMA
      7d62d947
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · b3ee4edd
      Linus Torvalds committed
      Pull MIPS fixes from Ralf Baechle:
      
       - Three highmem fixes:
          + Fixed mapping initialization
          + Adjust the pkmap location
          + Ensure we use at most one page for PTEs
      
       - Fix makefile dependencies for .its targets to depend on vmlinux
      
       - Fix reversed condition in BNEZC and JIALC software branch emulation
      
       - Only flush initialized flush_insn_slot to avoid NULL pointer
         dereference
      
       - perf: Remove incorrect odd/even counter handling for I6400
      
       - ftrace: Fix init functions tracing
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: .its targets depend on vmlinux
        MIPS: Fix bnezc/jialc return address calculation
        MIPS: kprobes: flush_insn_slot should flush only if probe initialised
        MIPS: ftrace: fix init functions tracing
        MIPS: mm: adjust PKMAP location
        MIPS: highmem: ensure that we don't use more than one page for PTEs
        MIPS: mm: fixed mappings: correct initialisation
        MIPS: perf: Remove incorrect odd/even counter handling for I6400
      b3ee4edd
    • M
      virtio_balloon: disable VIOMMU support · e41b1355
      Michael S. Tsirkin committed
      virtio balloon bypasses the DMA API entirely so does not support the
      VIOMMU right now.  It's not clear we need that support, for now let's
      just make sure we don't pretend to support it.
      
      Cc: stable@vger.kernel.org
      Cc: Wei Wang <wei.w.wang@intel.com>
      Fixes: 1a937693 ("virtio: new feature to detect IOMMU device quirk")
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      e41b1355
  4. 18 Jun, 2017 10 commits
  5. 17 Jun, 2017 14 commits
    • L
      Merge tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · adc31103
      Linus Torvalds committed
      Pull xfs fix from Darrick Wong:
       "One more bugfix for you for 4.12-rc6 to fix something that came up in
        an earlier rc:
      
         - Fix some bogus ASSERT failures on CONFIG_SMP=n and CONFIG_XFS_DEBUG=y"
      
      * tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix spurious spin_is_locked() assert failures on non-smp kernels
      adc31103
    • L
      Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · c8636b90
      Linus Torvalds committed
      Pull ufs fixes from Al Viro:
       "Fix assorted ufs bugs: a couple of deadlocks, fs corruption in
        truncate(), oopsen on tail unpacking and truncate when racing with
        vmscan, mild fs corruption (free blocks stats summary buggered, *BSD
        fsck would complain and fix), several instances of broken logics
        around reserved blocks (starting with "check almost never triggers
        when it should" and then there are issues with sufficiently large
        UFS2)"
      
      [ Note: ufs hasn't gotten any loving in a long time, because nobody
        really seems to use it. These ufs fixes are triggered by people
        actually caring now, not some sudden influx of new bugs.  - Linus ]
      
      * 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        ufs_truncate_blocks(): fix the case when size is in the last direct block
        ufs: more deadlock prevention on tail unpacking
        ufs: avoid grabbing ->truncate_mutex if possible
        ufs_get_locked_page(): make sure we have buffer_heads
        ufs: fix s_size/s_dsize users
        ufs: fix reserved blocks check
        ufs: make ufs_freespace() return signed
        ufs: fix logics in "ufs: make fsck -f happy"
      c8636b90
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ccd3d905
      Linus Torvalds committed
      Pull vfs fixes from Al Viro:
       "A couple of fixes; a leak in mntns_install() caught by Andrei (this
        cycle regression) + d_invalidate() softlockup fix - that had been
        reported by a bunch of people lately, but the problem is pretty old"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: don't forget to put old mntns in mntns_install
        Hang/soft lockup in d_invalidate with simultaneous calls
      ccd3d905
    • L
      Merge tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 1439ccf7
      Linus Torvalds committed
      Pull PCI fixes from Bjorn Helgaas:
      
       - fix another PCI_ENDPOINT build error (merged for v4.12)
      
       - fix error codes added to config accessors for v4.12
      
      * tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: endpoint: Select CRC32 to fix test build error
        PCI: Make error code types consistent in pci_{read,write}_config_*
      1439ccf7
    • L
      Merge tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux · 3a448294
      Linus Torvalds committed
      Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
      
       - fix udlfb driver to stop spamming logs (Mike Gerow)
      
       - add missing endianness conversions in smscufx & udlfb drivers (Johan
         Hovold)
      
       - fix few gcc warnings/errors (Arnd Bergmann)
      
      * tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux:
        video: fbdev: udlfb: drop log level for blanking
        video: fbdev: via: remove possibly unused variables
        video: fbdev: add missing USB-descriptor endianness conversions
        video: fbdev: avoid int-in-bool-context warning
      3a448294
    • L
      Merge branch 'akpm' (patches from Andrew) · 162f73f4
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: correct the comment when reclaimed pages exceed the scanned pages
        userfaultfd: shmem: handle coredumping in handle_userfault()
        mm: numa: avoid waiting on freed migrated pages
        swap: cond_resched in swap_cgroup_prepare()
        mm/memory-failure.c: use compound_head() flags for huge pages
      162f73f4
    • Z
      mm: correct the comment when reclaimed pages exceed the scanned pages · d7143e31
      zhongjiang committed
      Commit e1587a49 ("mm: vmpressure: fix sending wrong events on
      underflow") declared that reclaimed pages exceed the scanned pages due
      to the thp reclaim.
      
      That is incorrect because THP will be split into normal pages and loop
      again, which increments the count of scanned pages.
      
      [akpm@linux-foundation.org: tweak comment text]
      Link: http://lkml.kernel.org/r/1496824266-25235-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: zhongjiang <zhongjiang@huawei.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7143e31
    • A
      userfaultfd: shmem: handle coredumping in handle_userfault() · 64c2b203
      Andrea Arcangeli committed
      Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to
      __get_user_pages().
      
      shmem, by contrast, has no special FOLL_DUMP handling there, so
      handle_mm_fault() is invoked without mmap_sem and ends up calling
      handle_userfault() that isn't expecting to be invoked without mmap_sem
      held.
      
      This makes handle_userfault() fail immediately if invoked through
      shmem_vm_ops->fault during coredumping and solves the problem.
      
      The side effect is a BUG_ON with no lock held triggered by the
      coredumping process which exits.  Only 4.11 is affected, pre-4.11 anon
      memory holes are skipped in __get_user_pages by checking FOLL_DUMP
      explicitly against empty pagetables (mm/gup.c:no_page_table()).
      
      It's zero cost as we already had a check for current->flags to prevent
      futex to trigger userfaults during exit (PF_EXITING).
      
      Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64c2b203
    • M
      mm: numa: avoid waiting on freed migrated pages · 3c226c63
      Mark Rutland committed
      In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
      waiting until the pmd is unlocked before we return and retry.  However,
      we can race with migrate_misplaced_transhuge_page():
      
          // do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
          // Holds 0 refs on page                 // Holds 2 refs on page
      
          vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
          /* ... */
          if (pmd_trans_migrating(*vmf->pmd)) {
                  page = pmd_page(*vmf->pmd);
                  spin_unlock(vmf->ptl);
                                                  ptl = pmd_lock(mm, pmd);
                                                  if (page_count(page) != 2) {
                                                          /* roll back */
                                                  }
                                                  /* ... */
                                                  mlock_migrate_page(new_page, page);
                                                  /* ... */
                                                  spin_unlock(ptl);
                                                  put_page(page);
                                                  put_page(page); // page freed here
                  wait_on_page_locked(page);
                  goto out;
          }
      
      This can result in the freed page having its waiters flag set
      unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
      page alloc/free functions.  This has been observed on arm64 KVM guests.
      
      We can avoid this by having do_huge_pmd_numa_page() take a reference on
      the page before dropping the pmd lock, mirroring what we do in
      __migration_entry_wait().
      
      When we hit the race, migrate_misplaced_transhuge_page() will see the
      reference and abort the migration, as it may do today in other cases.
      
      Fixes: b8916634 ("mm: Prevent parallel splits during THP migration")
      Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c226c63
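The reference-counting argument in the NUMA migration commit above can be followed with a single-threaded model. This is only a sketch: `struct demo_page` and the helpers are hypothetical userspace stand-ins that borrow kernel-style names, and real concurrency and locking are deliberately left out so that the refcount arithmetic is visible.

```c
#include <assert.h>

/* Hypothetical reduced page: just a refcount and a "was freed" marker. */
struct demo_page {
	int refcount;
	int freed;
};

/* Take a reference unless the page is already on its way to being freed. */
static int demo_get_page_unless_zero(struct demo_page *page)
{
	if (page->refcount == 0)
		return 0;
	page->refcount++;
	return 1;
}

/* Drop a reference; the last put frees the page. */
static void demo_put_page(struct demo_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		page->freed = 1;
}

/*
 * Migration side of the race: it holds 2 references and only proceeds
 * when nobody else holds one. Returns 1 if the old page was migrated
 * away and freed, 0 if the migration was rolled back.
 */
static int demo_try_migrate(struct demo_page *page)
{
	if (page->refcount != 2)
		return 0;           /* roll back: someone else pinned the page */
	demo_put_page(page);
	demo_put_page(page);        /* page freed here */
	return 1;
}
```

In this model, a fault path that calls `demo_get_page_unless_zero()` before letting the migration run raises the count to 3, so `demo_try_migrate()` rolls back and the page is still alive when the fault path later waits on it and drops its reference. Without that extra reference the migration succeeds and the fault path would be left holding a pointer to a freed page.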
    • Y
      swap: cond_resched in swap_cgroup_prepare() · ef707629
      Yu Zhao committed
      I saw need_resched() warnings when swapping on large swapfile (TBs)
      because continuously allocating many pages in swap_cgroup_prepare() took
      too long.
      
      We already cond_resched when freeing page in swap_cgroup_swapoff().  Do
      the same for the page allocation.
      
      Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef707629
    • J
      mm/memory-failure.c: use compound_head() flags for huge pages · 7258ae5c
      James Morse committed
      memory_failure() chooses a recovery action function based on the page
      flags.  For huge pages it uses the tail page flags which don't have
      anything interesting set, resulting in:
      
      > Memory failure: 0x9be3b4: Unknown page state
      > Memory failure: 0x9be3b4: recovery action for unknown page: Failed
      
      Instead, save a copy of the head page's flags if this is a huge page.
      This means that if there are no relevant flags for this tail page, we use
      the head page's flags instead.  This results in the me_huge_page() recovery
      action being called:
      
      > Memory failure: 0x9b7969: recovery action for huge page: Delayed
      
      For hugepages that have not yet been allocated, this allows the hugepage
      to be dequeued.
      
      Fixes: 524fca1e ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
      Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
      Signed-off-by: James Morse <james.morse@arm.com>
      Tested-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7258ae5c
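The effect of looking at the head page's flags, as described in the memory-failure commit above, can be shown with a reduced model of a flag-driven action table. Everything here is hypothetical (the `DEMO_PG_*` bits, `struct demo_page`, the table and both functions), and the sketch simplifies the commit's logic to a single lookup on either the tail's or the head's flags.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PG_HUGE  (1UL << 0)  /* hypothetical: set on the head page only */
#define DEMO_PG_DIRTY (1UL << 1)  /* hypothetical second state */

enum demo_action { ACTION_UNKNOWN, ACTION_HUGE, ACTION_DIRTY };

struct demo_page {
	unsigned long flags;
	struct demo_page *head;  /* itself for head or non-compound pages */
};

struct demo_state {
	unsigned long mask;
	unsigned long res;
	enum demo_action action;
};

/* First matching entry wins; the final entry matches any flags. */
static const struct demo_state demo_states[] = {
	{ DEMO_PG_HUGE,  DEMO_PG_HUGE,  ACTION_HUGE },
	{ DEMO_PG_DIRTY, DEMO_PG_DIRTY, ACTION_DIRTY },
	{ 0,             0,             ACTION_UNKNOWN },
};

static enum demo_action lookup_action(unsigned long flags)
{
	size_t n = sizeof(demo_states) / sizeof(demo_states[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if ((flags & demo_states[i].mask) == demo_states[i].res)
			return demo_states[i].action;
	return ACTION_UNKNOWN;
}

/* For a huge page, classify using the head page's flags, not the tail's. */
static enum demo_action choose_action(const struct demo_page *p, int is_huge)
{
	assert(p->head);
	return lookup_action(is_huge ? p->head->flags : p->flags);
}
```

Looking up a tail page by its own (empty) flags falls through to the catch-all entry, which corresponds to the "Unknown page state" message quoted in the commit; going through the head page selects the huge-page action instead.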
    • L
      Merge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5ac447d2
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "Three small fixes for recently merged code:
      
         - remove a spurious WARN_ON when a PCI device has no of_node, it's
           allowed in some circumstances for there to be no of_node.
      
         - fix the offset for store EOI MMIOs in the XIVE interrupt
           controller.
      
         - fix non-const WARN_ONs which were becoming BUGs due to them losing
           BUGFLAG_WARNING in a recent cleanup patch.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
        Herrenschmidt"
      
      * tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
        powerpc/xive: Fix offset for store EOI MMIOs
        powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
      5ac447d2
    • I
      Merge tag 'perf-urgent-for-mingo-4.12-20170616' of... · 531c221d
      Ingo Molnar committed
      Merge tag 'perf-urgent-for-mingo-4.12-20170616' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      - Fix probing of precise_ip level for default cycles event, that
        got broken recently on x86_64 when its arch code started
        considering invalid requesting precise samples when not sampling
        (i.e. when attr.sample_period == 0).
      
        This also fixes another problem in s/390 where the precision
        probing with sample_period == 0 returned precise_ip > 0, that
        then, when setting up the real cycles event (not probing) would
        return EOPNOTSUPP for precise_ip > 0 (as determined previously
        by probing) and sample_period > 0.
      
        These problems resulted in attr_precise not being set to the
        highest precision available on x86_64 when no event was specified,
        i.e. the canonical:
      
      	perf record ./workload
      
        would end up using attr.precise_ip = 0. As a workaround this would
        need to be done:
      
      	perf record -e cycles:P ./workload
      
        And on s/390 it would plain not work, requiring using:
      
              perf record -e cycles ./workload
      
        as a workaround.  (Arnaldo Carvalho de Melo)
      
      - Fix perf build with ARCH=x86_64, when ARCH should be transformed
        into ARCH=x86, just like with the main kernel Makefile and
        tools/objtool's, i.e. use SRCARCH. (Jiada Wang)
      
      - Avoid accessing uninitialized data structures when unwinding with
        elfutils's libdw, making it more closely mimic libunwind's unwinder.
        (Milian Wolff)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      531c221d
    • M
      perf unwind: Report module before querying isactivation in dwfl unwind · 9126cbba
      Milian Wolff committed
      The PC returned by dwfl_frame_pc() may map into a not-yet-reported
      module. We have to report it before we continue unwinding. But when we
      query for the isactivation flag in dwfl_frame_pc, libdw will actually do
      one more unwinding step internally which can then break and lead to
      missed frames or broken stacks.
      
      With libunwind we get e.g.:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	           f5a1c QGuiApplicationPrivate::createPlatformIntegration (/usr/lib/libQt5Gui.so.5.8.0)
      	           f650c QGuiApplicationPrivate::createEventDispatcher (/usr/lib/libQt5Gui.so.5.8.0)
      	          298524 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	          1589e8 QApplicationPrivate::init (/usr/lib/libQt5Widgets.so.5.8.0)
      	           78622 main (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      ~~~~~
      
      Note the two frames 1589e8 and 78622 in the first sample. These are
      missing when unwinding with libdw. The second sample's breakage is
      more obvious:
      
      ~~~~~
        heaptrack_gui  2228 135073.400474:     613969 cycles:
      	          108c8e [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1093bc [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          109e7b QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1470ff [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          147f67 QSystemLocale::query (/usr/lib/libQt5Core.so.5.8.0)
      	          109fbf QLocalePrivate::updateSystemPrivate (/usr/lib/libQt5Core.so.5.8.0)
      	          10aa27 QLocale::QLocale (/usr/lib/libQt5Core.so.5.8.0)
      	          1e02c3 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          2113bb [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          211505 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b5df0 QFileInfo::exists (/usr/lib/libQt5Core.so.5.8.0)
      	           92eb2 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93423 [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	           93d2a QLibraryInfo::location (/usr/lib/libQt5Core.so.5.8.0)
      	          2170af [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          297c53 QCoreApplicationPrivate::init (/usr/lib/libQt5Core.so.5.8.0)
      	           f7cde QGuiApplicationPrivate::init (/usr/lib/libQt5Gui.so.5.8.0)
      	           20439 __libc_start_main (/usr/lib/libc-2.25.so)
      	           78299 _start (/home/milian/projects/compiled/other/bin/heaptrack_gui)
      
        heaptrack_gui  2228 135073.401156:     569521 cycles:
      	          131633 QString::endsWith (/usr/lib/libQt5Core.so.5.8.0)
      	          1a0701 QDir::cleanPath (/usr/lib/libQt5Core.so.5.8.0)
      	          21b82d [unknown] (/usr/lib/libQt5Core.so.5.8.0)
      	          1b3727 QFileInfo::canonicalFilePath (/usr/lib/libQt5Core.so.5.8.0)
      	          2780c7 QFactoryLoader::update (/usr/lib/libQt5Core.so.5.8.0)
      	          279525 QFactoryLoader::QFactoryLoader (/usr/lib/libQt5Core.so.5.8.0)
      	           e5bd0 QPlatformIntegrationFactory::create (/usr/lib/libQt5Gui.so.5.8.0)
      	          723dbf [unknown] ([unknown])
      ~~~~~
      
      This patch fixes the issue, so the libdw unwinder mimics the
      libunwind behavior more closely.

      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20170602143753.16907-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9126cbba
  6. 16 Jun, 2017 2 commits