1. 18 8月, 2018 1 次提交
  2. 11 8月, 2018 1 次提交
  3. 02 8月, 2018 1 次提交
    • H
      mm: delete historical BUG from zap_pmd_range() · 53406ed1
      Hugh Dickins 提交于
      Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted
      that mmap_sem must be held when splitting an "anonymous" vma there.
      Whether that's still strictly true nowadays is not entirely clear,
      but the danger of sometimes crashing on the BUG is now fairly clear.
      
      Even with the new stricter rules for anonymous vma marking, the
      condition it checks for can possible trigger. Commit 44960f2a
      ("staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem
      pages") is good, and originally I thought it was safe from that
      VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
      disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
      insists on VM_SHARED.
      
      But after I read John's earlier mail, drawing attention to the
      vfs_fallocate() in there: I may be wrong, and I don't know if Android
      has THP in the config anyway, but it looks to me like an
      unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely
      the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53406ed1
  4. 17 7月, 2018 1 次提交
    • R
      x86/mm/tlb: Leave lazy TLB mode at page table free time · 2ff6ddf1
      Rik van Riel 提交于
      Andy discovered that speculative memory accesses while in lazy
      TLB mode can crash a system, when a CPU tries to dereference a
      speculative access using memory contents that used to be valid
      page table memory, but have since been reused for something else
      and point into la-la land.
      
      The latter problem can be prevented in two ways. The first is to
      always send a TLB shootdown IPI to CPUs in lazy TLB mode, while
      the second one is to only send the TLB shootdown at page table
      freeing time.
      
      The second should result in fewer IPIs, since operationgs like
      mprotect and madvise are very common with some workloads, but
      do not involve page table freeing. Also, on munmap, batching
      of page table freeing covers much larger ranges of virtual
      memory than the batching of unmapped user pages.
      Tested-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NRik van Riel <riel@surriel.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: efault@gmx.de
      Cc: kernel-team@fb.com
      Cc: luto@kernel.org
      Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ff6ddf1
  5. 09 7月, 2018 1 次提交
  6. 21 6月, 2018 1 次提交
    • A
      x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings · 42e4089c
      Andi Kleen 提交于
      For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
      table entry. This sets the high bits in the CPU's address space, thus
      making sure to point to not point an unmapped entry to valid cached memory.
      
      Some server system BIOSes put the MMIO mappings high up in the physical
      address space. If such an high mapping was mapped to unprivileged users
      they could attack low memory by setting such a mapping to PROT_NONE. This
      could happen through a special device driver which is not access
      protected. Normal /dev/mem is of course access protected.
      
      To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
      
      Valid page mappings are allowed because the system is then unsafe anyways.
      
      It's not expected that users commonly use PROT_NONE on MMIO. But to
      minimize any impact this is only enforced if the mapping actually refers to
      a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
      the check for root.
      
      For mmaps this is straight forward and can be handled in vm_insert_pfn and
      in remap_pfn_range().
      
      For mprotect it's a bit trickier. At the point where the actual PTEs are
      accessed a lot of state has been changed and it would be difficult to undo
      on an error. Since this is a uncommon case use a separate early page talk
      walk pass for MMIO PROT_NONE mappings that checks for this condition
      early. For non MMIO and non PROT_NONE there are no changes.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      
      42e4089c
  7. 08 6月, 2018 3 次提交
  8. 01 6月, 2018 1 次提交
  9. 06 4月, 2018 2 次提交
  10. 18 3月, 2018 1 次提交
  11. 17 2月, 2018 1 次提交
    • A
      mm: hide a #warning for COMPILE_TEST · af27d940
      Arnd Bergmann 提交于
      We get a warning about some slow configurations in randconfig kernels:
      
        mm/memory.c:83:2: error: #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. [-Werror=cpp]
      
      The warning is reasonable by itself, but gets in the way of randconfig
      build testing, so I'm hiding it whenever CONFIG_COMPILE_TEST is set.
      
      The warning was added in 2013 in commit 75980e97 ("mm: fold
      page->_last_nid into page->flags where possible").
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af27d940
  12. 07 2月, 2018 1 次提交
  13. 01 2月, 2018 3 次提交
  14. 20 1月, 2018 2 次提交
  15. 16 12月, 2017 1 次提交
    • L
      Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths" · f6f37321
      Linus Torvalds 提交于
      This reverts commits 5c9d2d5c, c7da82b8, and e7fe7b5c.
      
      We'll probably need to revisit this, but basically we should not
      complicate the get_user_pages_fast() case, and checking the actual page
      table protection key bits will require more care anyway, since the
      protection keys depend on the exact state of the VM in question.
      
      Particularly when doing a "remote" page lookup (ie in somebody elses VM,
      not your own), you need to be much more careful than this was.  Dave
      Hansen says:
      
       "So, the underlying bug here is that we now a get_user_pages_remote()
        and then go ahead and do the p*_access_permitted() checks against the
        current PKRU. This was introduced recently with the addition of the
        new p??_access_permitted() calls.
      
        We have checks in the VMA path for the "remote" gups and we avoid
        consulting PKRU for them. This got missed in the pkeys selftests
        because I did a ptrace read, but not a *write*. I also didn't
        explicitly test it against something where a COW needed to be done"
      
      It's also not entirely clear that it makes sense to check the protection
      key bits at this level at all.  But one possible eventual solution is to
      make the get_user_pages_fast() case just abort if it sees protection key
      bits set, which makes us fall back to the regular get_user_pages() case,
      which then has a vma and can do the check there if we want to.
      
      We'll see.
      
      Somewhat related to this all: what we _do_ want to do some day is to
      check the PAGE_USER bit - it should obviously always be set for user
      pages, but it would be a good check to have back.  Because we have no
      generic way to test for it, we lost it as part of moving over from the
      architecture-specific x86 GUP implementation to the generic one in
      commit e585513b ("x86/mm/gup: Switch GUP to the generic
      get_user_page_fast() implementation").
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f6f37321
  16. 15 12月, 2017 1 次提交
  17. 30 11月, 2017 4 次提交
  18. 28 11月, 2017 1 次提交
    • K
      mm, thp: Do not make pmd/pud dirty without a reason · 152e93af
      Kirill A. Shutemov 提交于
      Currently we make page table entries dirty all the time regardless of
      access type and don't even consider if the mapping is write-protected.
      The reasoning is that we don't really need dirty tracking on THP and
      making the entry dirty upfront may save some time on first write to the
      page.
      
      Unfortunately, such approach may result in false-positive
      can_follow_write_pmd() for huge zero page or read-only shmem file.
      
      Let's only make page dirty only if we about to write to the page anyway
      (as we do for small pages).
      
      I've restructured the code to make entry dirty inside
      maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
      write-protected.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      152e93af
  19. 16 11月, 2017 6 次提交
  20. 25 10月, 2017 1 次提交
    • P
      locking/atomics, mm: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() · b03a0fe0
      Paul E. McKenney 提交于
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't currently harmful.
      
      However, for some features it is necessary to instrument reads and
      writes separately, which is not possible with ACCESS_ONCE(). This
      distinction is critical to correct operation.
      
      It's possible to transform the bulk of kernel code using the Coccinelle
      script below. However, this doesn't handle comments, leaving references
      to ACCESS_ONCE() instances which have been removed. As a preparatory
      step, this patch converts the mm code and comments to use
      {READ,WRITE}_ONCE() consistently.
      
      ----
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Link: http://lkml.kernel.org/r/1508792849-3115-15-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b03a0fe0
  21. 04 10月, 2017 1 次提交
  22. 09 9月, 2017 5 次提交