1. 18 Dec, 2018 8 commits
  2. 15 Dec, 2018 1 commit
  3. 12 Dec, 2018 2 commits
    • D
      x86/mm: Fix decoy address handling vs 32-bit builds · 51c3fbd8
      Dan Williams committed
      A decoy address is used by set_mce_nospec() to update the cache attributes
      for a page that may contain poison (multi-bit ECC error) while attempting
      to minimize the possibility of triggering a speculative access to that
      page.
      
      When reserve_memtype() is handling a decoy address it needs to convert it
      to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
      is broken for a 32-bit physical mask and reserve_memtype() is passed the
      last physical page. Gert reports triggering the:
      
          BUG_ON(start >= end);
      
      ...assertion when running a 32-bit non-PAE build on a platform that has
      a driver resource at the top of physical memory:
      
          BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
      
      Given that the decoy address scheme is only targeted at 64-bit builds and
      assumes that the top of physical address space is free for use as a decoy
      address range, simply bypass address sanitization in the 32-bit case.
      
      Lastly, there was no need to crash the system when this failure occurred,
      and no need to crash future systems if the assumptions of decoy addresses
      are ever violated. Change the BUG_ON() to a WARN() with an error return.
      
      Fixes: 510ee090 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
      Reported-by: Gert Robben <t2@gert.gr>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Gert Robben <t2@gert.gr>
      Cc: stable@vger.kernel.org
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
      51c3fbd8
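
A minimal userspace sketch of the failure mode and the fix described in this commit; it is not the kernel's code. The names `sketch_mask` and `sketch_reserve` and the `sanitize` flag are hypothetical. The range is taken from the e820 line in the message, with the end written as an exclusive bound. Masking the exclusive end `0x100000000` with a 32-bit mask yields 0, so `start >= end` holds; bypassing the mask avoids that, and an invalid range produces a warning plus `-EINVAL` instead of a crash.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mask covering the low `bits` physical address bits. */
static uint64_t sketch_mask(unsigned int bits)
{
    assert(bits > 0 && bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical stand-in for a range reservation. When `sanitize` is true,
 * both bounds are AND'ed with the physical mask (the 64-bit behaviour);
 * when false, they are used as passed (the 32-bit bypass).
 * Returns 0 on success, or -EINVAL after a warning for an empty range.
 */
static int sketch_reserve(uint64_t start, uint64_t end,
                          unsigned int phys_bits, bool sanitize)
{
    if (sanitize) {
        start &= sketch_mask(phys_bits);
        end &= sketch_mask(phys_bits);
    }
    if (start >= end) {
        /* Warn and report the error instead of aborting. */
        fprintf(stderr, "sketch: invalid range [0x%" PRIx64 "-0x%" PRIx64 ")\n",
                start, end);
        return -EINVAL;
    }
    return 0;
}
```

Calling `sketch_reserve(0xfff00000, 0x100000000, 32, true)` reproduces the empty-range case, while the same call with `sanitize` set to `false` succeeds.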
    • R
      x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence · 80b71c34
      Reinette Chatre committed
      The user triggers the creation of a pseudo-locked region when writing
      the requested schemata to the schemata resctrl file. The pseudo-locking
      of a region is required to be done on a CPU that is associated with the
      cache on which the pseudo-locked region will reside. In order to run the
      locking code on a specific CPU, the needed CPU has to be selected and
      ensured to remain online during the entire locking sequence.
      
      At this time, the cpu_hotplug_lock is not taken during the pseudo-lock
      region creation and it is thus possible for a CPU to be selected to run
      the pseudo-locking code and then that CPU to go offline before the
      thread is able to run on it.
      
      Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on
      which code has to run needs to be controlled. Since the cpu_hotplug_lock
      is always taken before rdtgroup_mutex the lock order is maintained.
      
      Fixes: e0bdfe8e ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: gavin.hindman@intel.com
      Cc: jithu.joseph@intel.com
      Cc: stable <stable@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
      80b71c34
  4. 11 Dec, 2018 2 commits
  5. 08 Dec, 2018 1 commit
  6. 05 Dec, 2018 6 commits
  7. 03 Dec, 2018 5 commits
    • J
      x86/boot: Clear RSDP address in boot_params for broken loaders · 182ddd16
      Juergen Gross committed
      Gunnar Krueger reported a systemd-boot failure and bisected it down to:
      
        e6e094e0 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
      
      In case a broken boot loader doesn't clear its 'struct boot_params', clear
      rsdp_addr in sanitize_boot_params().
      Reported-by: Gunnar Krueger <taijian@posteo.de>
      Tested-by: Gunnar Krueger <taijian@posteo.de>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: sstabellini@kernel.org
      Fixes: e6e094e0 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
      Link: http://lkml.kernel.org/r/20181203103811.17056-1-jgross@suse.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      182ddd16
    • L
      Linux 4.20-rc5 · 25956467
      Linus Torvalds committed
      25956467
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 6a512726
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "Volume is a little higher than usual due to a set of gpio fixes for
        Davinci platforms that's been around a while, still seemed appropriate
        to not hold off until next merge window.
      
        Besides that it's the usual mix of minor fixes, mostly corrections of
        small stuff in device trees.
      
        Major stability-related one is the removal of a regulator from DT on
        Rock960, since DVFS caused undervoltage. I expect it'll be restored
        once they figure out the underlying issue"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
        MAINTAINERS: Remove unused Qualcomm SoC mailing list
        ARM: davinci: dm644x: set the GPIO base to 0
        ARM: davinci: da830: set the GPIO base to 0
        ARM: davinci: dm355: set the GPIO base to 0
        ARM: davinci: dm646x: set the GPIO base to 0
        ARM: davinci: dm365: set the GPIO base to 0
        ARM: davinci: da850: set the GPIO base to 0
        gpio: davinci: restore a way to manually specify the GPIO base
        ARM: davinci: dm644x: define gpio interrupts as separate resources
        ARM: davinci: dm355: define gpio interrupts as separate resources
        ARM: davinci: dm646x: define gpio interrupts as separate resources
        ARM: davinci: dm365: define gpio interrupts as separate resources
        ARM: davinci: da8xx: define gpio interrupts as separate resources
        ARM: dts: at91: sama5d2: use the divided clock for SMC
        ARM: dts: imx51-zii-rdu1: Remove EEPROM node
        ARM: dts: rockchip: Remove @0 from the veyron memory node
        arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
        arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP
        arm64: dts: sdm845-mtp: Reserve reserved gpios
        arm64: dts: ti: k3-am654: Fix wakeup_uart reg address
        ...
      6a512726
    • L
      Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 292974c5
      Linus Torvalds committed
      Pull xen fixes from Juergen Gross:
      
       - A revert of a previous commit as it is no longer necessary and has
         been shown to cause problems in some memory hotplug cases.
      
       - Some small fixes and a minor cleanup.
      
       - A patch for adding better diagnostic data in a very rare failure
         case.
      
      * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        pvcalls-front: fixes incorrect error handling
        Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
        xen: xlate_mmu: add missing header to fix 'W=1' warning
        xen/x86: add diagnostic printout to xen_mc_flush() in case of error
        x86/xen: cleanup includes in arch/x86/xen/spinlock.c
      292974c5
    • L
      Merge tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma · a234c737
      Linus Torvalds committed
      Pull dmaengine fixes from Vinod Koul:
       "This contains two fixes to at_hdmac which fix a long-standing bug
        reported recently on serial transfers causing a memory leak. These
        fixes were done by Richard Genoud"
      
      * tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: at_hdmac: fix module unloading
        dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
      a234c737
  8. 02 Dec, 2018 3 commits
    • L
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b783176
      Linus Torvalds committed
      Pull STIBP fallout fixes from Thomas Gleixner:
       "The performance destruction department finally got its act together
        and came up with a cure for the STIBP regression:
      
         - Provide a command line option to control the spectre v2 user space
           mitigations. Default is either seccomp or prctl (if seccomp is
           disabled in Kconfig). prctl allows mitigation opt-in, seccomp
           enables the mitigation for sandboxed processes.
      
         - Rework the code to handle the conditional STIBP/IBPB control and
           remove the now unused ptrace_may_access_sched() optimization
           attempt
      
         - Disable STIBP automatically when SMT is disabled
      
         - Optimize the switch_to() logic to avoid MSR writes and invocations
           of __switch_to_xtra().
      
         - Make the asynchronous speculation TIF updates synchronous to
           prevent stale mitigation state.
      
        As a general cleanup this also makes retpoline directly depend on
        compiler support and removes the 'minimal retpoline' option which just
        pretended to provide some form of security while providing none"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
        x86/speculation: Provide IBPB always command line options
        x86/speculation: Add seccomp Spectre v2 user space protection mode
        x86/speculation: Enable prctl mode for spectre_v2_user
        x86/speculation: Add prctl() control for indirect branch speculation
        x86/speculation: Prepare arch_smt_update() for PRCTL mode
        x86/speculation: Prevent stale SPEC_CTRL msr content
        x86/speculation: Split out TIF update
        ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
        x86/speculation: Prepare for conditional IBPB in switch_mm()
        x86/speculation: Avoid __switch_to_xtra() calls
        x86/process: Consolidate and simplify switch_to_xtra() code
        x86/speculation: Prepare for per task indirect branch speculation control
        x86/speculation: Add command line control for indirect branch speculation
        x86/speculation: Unify conditional spectre v2 print functions
        x86/speculataion: Mark command line parser data __initdata
        x86/speculation: Mark string arrays const correctly
        x86/speculation: Reorder the spec_v2 code
        x86/l1tf: Show actual SMT state
        x86/speculation: Rework SMT state change
        sched/smt: Expose sched_smt_present static key
        ...
      4b783176
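
The prctl opt-in mentioned in this pull request can be probed from user space. The following is a hedged sketch, assuming a Linux system whose kernel implements `PR_GET_SPECULATION_CTRL` for `PR_SPEC_INDIRECT_BRANCH`; the function name `sketch_query_indirect_branch` is hypothetical, and the fallback constant values are the ones I recall from `linux/prctl.h`, so treat them as assumptions. It only queries the state and changes nothing.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (assumed values from linux/prctl.h). */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif

/*
 * Query the indirect branch speculation state of the calling task.
 * Returns a non-negative PR_SPEC_* bitmask on success, or a negative
 * errno value when the kernel rejects or does not support the query.
 */
static int sketch_query_indirect_branch(void)
{
    int ret = prctl(PR_GET_SPECULATION_CTRL,
                    (unsigned long)PR_SPEC_INDIRECT_BRANCH, 0UL, 0UL, 0UL);

    if (ret < 0) {
        assert(errno > 0);
        return -errno;
    }
    return ret;
}
```

A caller would test the returned bitmask against the `PR_SPEC_*` flags documented for this prctl; a negative return on older kernels is expected rather than an error in the program.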
    • L
      Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block · 88058417
      Linus Torvalds committed
      Pull block layer fixes from Jens Axboe:
      
       - Single range elevator discard merge fix, that caused crashes (Ming)
      
       - Fix for a regression in O_DIRECT, where we could potentially lose the
         error value (Maximilian Heyne)
      
       - NVMe pull request from Christoph, with little fixes all over the map
         for NVMe.
      
      * tag 'for-linus-20181201' of git://git.kernel.dk/linux-block:
        block: fix single range discard merge
        nvme-rdma: fix double freeing of async event data
        nvme: flush namespace scanning work just before removing namespaces
        nvme: warn when finding multi-port subsystems without multipathing enabled
        fs: fix lost error code in dio_complete
        nvme-pci: fix surprise removal
        nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
        nvme: Free ctrl device name on init failure
      88058417
    • L
      Merge tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c734b425
      Linus Torvalds committed
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix a link speed checking interface that broke PCIe gen3 cards in
         gen1 slots (Mikulas Patocka)
      
       - Fix an imx6 link training error (Trent Piepho)
      
       - Fix a layerscape outbound window accessor calling error (Hou
         Zhiqiang)
      
       - Fix a DesignWare endpoint MSI-X address calculation error (Gustavo
         Pimentel)
      
      * tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Fix incorrect value returned from pcie_get_speed_cap()
        PCI: dwc: Fix MSI-X EP framework address calculation bug
        PCI: layerscape: Fix wrong invocation of outbound window disable accessor
        PCI: imx6: Fix link training status detection in link up check
      c734b425
  9. 01 Dec, 2018 12 commits
    • B
      Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus · c74eadf8
      Bjorn Helgaas committed
        - Fix DesignWare endpoint MSI-X address calculation bug (Gustavo
          Pimentel)
      
        - Fix Layerscape outbound window disable usage (Hou Zhiqiang)
      
        - Fix imx6 link up detection (Trent Piepho)
      
      * lorenzo/pci/controller-fixes:
        PCI: dwc: Fix MSI-X EP framework address calculation bug
        PCI: layerscape: Fix wrong invocation of outbound window disable accessor
        PCI: imx6: Fix link training status detection in link up check
      c74eadf8
    • M
      PCI: Fix incorrect value returned from pcie_get_speed_cap() · f1f90e25
      Mikulas Patocka committed
      The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
      the register and compare it against them.
      
      This fixes errors like this:
      
        amdgpu: [powerplay] failed to send message 261 ret is 0
      
      when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
      being incorrectly reported as PCIe-v3 capable.
      
      6cf57be0, which appeared in v4.17, added pcie_get_speed_cap() with the
      incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask.  5d9a6330, which
      appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
      amdgpu bug reports below are regressions in v4.19.
      
      Fixes: 6cf57be0 ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
      Fixes: 5d9a6330 ("drm/amdgpu: use pcie functions for link width and speed")
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
      PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
      remove test of PCI_EXP_LNKCAP for zero, since that register is required]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org	# v4.17+
      f1f90e25
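
The difference between testing a field as a bitmask and comparing its value can be sketched as follows. This is an illustration, not the kernel function: the `SKETCH_` constants mirror the Supported Link Speeds encoding as I understand it and should be treated as assumptions, both function names are hypothetical, and the "fixed" variant keeps an 8.0 GT/s comparison purely for symmetry even though the commit notes say the real code leaves that speed to `PCI_EXP_LNKCAP2`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings of the Supported Link Speeds field (low 4 bits). */
#define SKETCH_LNKCAP_SLS       0x0000000fu
#define SKETCH_LNKCAP_SLS_2_5GB 0x1u
#define SKETCH_LNKCAP_SLS_5_0GB 0x2u
#define SKETCH_LNKCAP_SLS_8_0GB 0x3u

/* The encodings overlap bitwise, which is why a bitmask test misfires. */
static_assert((SKETCH_LNKCAP_SLS_8_0GB & SKETCH_LNKCAP_SLS_2_5GB) != 0,
              "8.0GB and 2.5GB encodings share a bit");

/* Buggy shape: treats each value as a bit mask. Returns tenths of GT/s. */
static int sketch_speed_buggy(uint32_t lnkcap)
{
    if (lnkcap & SKETCH_LNKCAP_SLS_8_0GB)
        return 80;
    if (lnkcap & SKETCH_LNKCAP_SLS_5_0GB)
        return 50;
    if (lnkcap & SKETCH_LNKCAP_SLS_2_5GB)
        return 25;
    return 0;
}

/* Fixed shape: mask the field first, then compare against each value. */
static int sketch_speed_fixed(uint32_t lnkcap)
{
    uint32_t sls = lnkcap & SKETCH_LNKCAP_SLS;

    if (sls == SKETCH_LNKCAP_SLS_8_0GB)
        return 80;
    if (sls == SKETCH_LNKCAP_SLS_5_0GB)
        return 50;
    if (sls == SKETCH_LNKCAP_SLS_2_5GB)
        return 25;
    return 0; /* unknown encoding */
}
```

With a field value of 1 (a 2.5 GT/s slot), the buggy variant reports 8.0 GT/s because `1 & 3` is non-zero, which matches the misreported slot described in the message.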
    • L
      Merge branch 'akpm' (patches from Andrew) · d8f190ee
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "31 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
        ocfs2: fix potential use after free
        mm/khugepaged: fix the xas_create_range() error path
        mm/khugepaged: collapse_shmem() do not crash on Compound
        mm/khugepaged: collapse_shmem() without freezing new_page
        mm/khugepaged: minor reorderings in collapse_shmem()
        mm/khugepaged: collapse_shmem() remember to clear holes
        mm/khugepaged: fix crashes due to misaccounted holes
        mm/khugepaged: collapse_shmem() stop if punched or truncated
        mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
        mm/huge_memory: splitting set mapping+index before unfreeze
        mm/huge_memory: rename freeze_page() to unmap_page()
        initramfs: clean old path before creating a hardlink
        kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
        psi: make disabling/enabling easier for vendor kernels
        proc: fixup map_files test on arm
        debugobjects: avoid recursive calls with kmemleak
        userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
        userfaultfd: shmem: add i_size checks
        userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
        userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
        ...
      d8f190ee
    • L
      Merge tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 6c7954b7
      Linus Torvalds committed
      Pull a few more MIPS fixes from Paul Burton:
      
       - Fix mips_get_syscall_arg() to operate on the task specified when
         detecting o32 tasks running on MIPS64 kernels.
      
       - Fix some incorrect GPIO pin muxing for the MT7620 SoC.
      
       - Update the linux-mips mailing list address.
      
      * tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MAINTAINERS: Update linux-mips mailing list address
        MIPS: ralink: Fix mt7620 nd_sd pinmux
        mips: fix mips_get_syscall_arg o32 check
      6c7954b7
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 868dda00
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - Cortex-A76 erratum workaround
      
       - ftrace fix to enable syscall events on arm64
      
       - Fix uninitialised pointer in iort_get_platform_device_domain()
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value
        arm64: ftrace: Fix to enable syscall events on arm64
        arm64: Add workaround for Cortex-A76 erratum 1286807
      868dda00
    • L
      Merge tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1f817429
      Linus Torvalds committed
      Pull stackleak plugin fix from Kees Cook:
       "Fix crash by not allowing kprobing of stackleak_erase() (Alexander
        Popov)"
      
      * tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        stackleak: Disable function tracing and kprobes for stackleak_erase()
      1f817429
    • L
      Merge tag 'fscache-fixes-20181130' of... · fd3b3e0e
      Linus Torvalds committed
      Merge tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache and cachefiles fixes from David Howells:
       "Misc fixes:
      
         - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a
           race between a cache object lookup failing and someone attempting
           to reenable that object, thereby triggering an update of the
           object's attributes.
      
         - Fix an assertion failure at fs/fscache/operation.c:449 caused by a
           split atomic subtract and atomic read that allows a race to happen.
      
         - Fix a leak of backing pages when simultaneously reading the same
           page from the same object from two or more threads.
      
         - Fix a hang due to a race between a cache object being discarded and
           the corresponding cookie being reenabled.
      
        There are also some minor cleanups:
      
         - Cast an enum value to a different enum type to prevent clang from
           generating a warning. This shouldn't cause any sort of change in
           the emitted code.
      
         - Use ktime_get_real_seconds() instead of get_seconds(). This is just
           used to uniquify a filename for an object to be placed in the
           graveyard. Objects placed there are deleted by cachefilesd in
           userspace immediately thereafter.
      
         - Remove an initialised, but otherwise unused variable. This should
           have been entirely optimised away anyway"
      
      * tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        fscache, cachefiles: remove redundant variable 'cache'
        cachefiles: avoid deprecated get_seconds()
        cachefiles: Explicitly cast enumerated type in put_object
        fscache: fix race between enablement and dropping of object
        cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
        fscache: Fix race in fscache_op_complete() due to split atomic_sub & read
        cachefiles: Fix an assertion failure when trying to update a failed object
      fd3b3e0e
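
The "split atomic subtract and atomic read" race named in this pull request can be illustrated with C11 atomics standing in for the kernel's atomic helpers. This is a sketch under that assumption; `sketch_retire_split` and `sketch_retire_combined` are hypothetical names, not functions from fscache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Racy shape: the counter can be changed by another thread between the
 * subtract and the read, so the decision may be based on someone else's
 * update, or two callers may both observe a value <= 0.
 */
static bool sketch_retire_split(atomic_int *remaining, int n)
{
    assert(n > 0);
    atomic_fetch_sub(remaining, n);
    return atomic_load(remaining) <= 0;
}

/*
 * Single read-modify-write: the decision uses the value produced by this
 * call's own subtraction, so exactly one caller sees the final transition.
 */
static bool sketch_retire_combined(atomic_int *remaining, int n)
{
    assert(n > 0);
    /* atomic_fetch_sub() returns the old value; subtract n to get the new one. */
    return atomic_fetch_sub(remaining, n) - n <= 0;
}
```

Single-threaded, both variants agree; the difference only shows up when two threads retire work concurrently, which is the situation the fix addresses.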
    • P
      MAINTAINERS: Update linux-mips mailing list address · 6584297b
      Paul Burton committed
      The linux-mips.org infrastructure has been unreliable recently & nobody
      with sufficient access to fix it is around to do so. As a result we're
      moving away from it, and part of this is migrating our mailing list to
      kernel.org.
      
      Replace all instances of linux-mips@linux-mips.org in MAINTAINERS with
      the shiny new linux-mips@vger.kernel.org address.
      
      The new list is now being archived on kernel.org at
      https://lore.kernel.org/linux-mips/ which also holds the history of the
      old linux-mips.org list.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      6584297b
    • P
      ocfs2: fix potential use after free · 164f7e58
      Pan Bian committed
      ocfs2_get_dentry() calls iput(inode) to drop the reference count of
      inode, and if the reference count hits 0, inode is freed.  However, in
      this function, it then reads inode->i_generation, which may result in a
      use after free bug.  Move the put operation later.
      
      Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
      Fixes: 781f200c ("ocfs2: Remove masklog ML_EXPORT.")
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      164f7e58
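
The ordering problem in this commit, reading a field after dropping the last reference, can be sketched with a toy reference-counted object. All names here (`sketch_inode`, `sketch_iget`, `sketch_iput`, `sketch_check_generation`) are hypothetical, and the single-threaded refcount is a simplification of the kernel's inode lifetime rules.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy reference-counted object standing in for an inode. */
struct sketch_inode {
    int refcount;
    uint32_t generation;
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct sketch_inode *sketch_iget(uint32_t generation)
{
    struct sketch_inode *inode = malloc(sizeof(*inode));

    if (!inode)
        return NULL;
    inode->refcount = 1;
    inode->generation = generation;
    return inode;
}

/* Drop one reference; the object is freed when the count reaches 0. */
static void sketch_iput(struct sketch_inode *inode)
{
    assert(inode->refcount > 0);
    if (--inode->refcount == 0)
        free(inode);
}

/*
 * Consumes the caller's reference. The generation is read while the
 * reference is still held, and only then is the reference dropped.
 * Returns 1 when the generation matches, 0 otherwise.
 */
static int sketch_check_generation(struct sketch_inode *inode, uint32_t expected)
{
    int match = (inode->generation == expected);

    sketch_iput(inode); /* inode must not be dereferenced after this */
    return match;
}
```

Swapping the two statements in `sketch_check_generation` would recreate the use after free: the read would touch memory that `sketch_iput` may already have released.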
    • H
      mm/khugepaged: fix the xas_create_range() error path · 95feeabb
      Hugh Dickins committed
      collapse_shmem()'s xas_nomem() is very unlikely to fail, but it is
      rightly given a failure path, so move the whole xas_create_range() block
      up before __SetPageLocked(new_page): so that it does not need to
      remember to unlock_page(new_page).
      
      Add the missing mem_cgroup_cancel_charge(), and set (currently unused)
      result to SCAN_FAIL rather than SCAN_SUCCEED.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261531200.2275@eggly.anvils
      Fixes: 77da9389 ("mm: Convert collapse_shmem to XArray")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95feeabb
    • H
      mm/khugepaged: collapse_shmem() do not crash on Compound · 06a5e126
      Hugh Dickins committed
      collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
      it holds page lock of the first page, racing truncation then extension
      might conceivably have inserted a hugepage there already.  Fail with the
      SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
      otherwise mishandling the unexpected hugepage - though later we might
      code up a more constructive way of handling it, with SCAN_SUCCEED.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      06a5e126
    • H
      mm/khugepaged: collapse_shmem() without freezing new_page · 87c460a0
      Hugh Dickins committed
      khugepaged's collapse_shmem() does almost all of its work, to assemble
      the huge new_page from 512 scattered old pages, with the new_page's
      refcount frozen to 0 (and refcounts of all old pages so far also frozen
      to 0).  Including shmem_getpage() to read in any which were out on swap,
      memory reclaim if necessary to allocate their intermediate pages, and
      copying over all the data from old to new.
      
      Imagine the frozen refcount as a spinlock held, but without any lock
      debugging to highlight the abuse: it's not good, and under serious load
      heads into lockups - speculative getters of the page are not expecting
      to spin while khugepaged is rescheduled.
      
      One can get a little further under load by hacking around elsewhere; but
      fortunately, freezing the new_page turns out to have been entirely
      unnecessary, with no hacks needed elsewhere.
      
      The huge new_page lock is already held throughout, and guards all its
      subpages as they are brought one by one into the page cache tree; and
      anything reading the data in that page, without the lock, before it has
      been marked PageUptodate, would already be in the wrong.  So simply
      eliminate the freezing of the new_page.
      
      Each of the old pages remains frozen with refcount 0 after it has been
      replaced by a new_page subpage in the page cache tree, until they are
      all unfrozen on success or failure: just as before.  They could be
      unfrozen sooner, but cause no problem once no longer visible to
      find_get_entry(), filemap_map_pages() and other speculative lookups.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87c460a0