1. 13 6月, 2017 1 次提交
  2. 23 4月, 2017 1 次提交
    • I
      Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar 提交于
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust looking bisection
      points to this commit suggesting that are unknown bugs in the conversion
      as well, revert it for the time being - we'll re-try in v4.13.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6dd29b3d
  3. 18 3月, 2017 1 次提交
  4. 17 3月, 2017 1 次提交
    • A
      mm, x86: fix native_pud_clear build error · d0f33ac9
      Arnd Bergmann 提交于
      We still get a build error in random configurations, after this has been
      modified a few times:
      
        In file included from include/linux/mm.h:68:0,
                         from include/linux/suspend.h:8,
                         from arch/x86/kernel/asm-offsets.c:12:
        arch/x86/include/asm/pgtable.h:66:26: error: redefinition of 'native_pud_clear'
         #define pud_clear(pud)   native_pud_clear(pud)
      
      My interpretation is that the build error comes from a typo in
      __PAGETABLE_PUD_FOLDED, so fix that typo now, and remove the incorrect
      #ifdef around the native_pud_clear definition.
      
      Fixes: 3e761a42 ("mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()")
      Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
      Link: http://lkml.kernel.org/r/20170314121330.182155-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Ackedy-by: NDave Jiang <dave.jiang@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0f33ac9
  5. 15 3月, 2017 1 次提交
    • A
      mm, x86: Fix native_pud_clear build error · d434936e
      Arnd Bergmann 提交于
      We still get a build error in random configurations, after this has been
      modified a few times:
      
      In file included from include/linux/mm.h:68:0,
                       from include/linux/suspend.h:8,
                       from arch/x86/kernel/asm-offsets.c:12:
      arch/x86/include/asm/pgtable.h:66:26: error: redefinition of 'native_pud_clear'
       #define pud_clear(pud)   native_pud_clear(pud)
      
      My interpretation is that the build error comes from a typo in __PAGETABLE_PUD_FOLDED,
      so fix that typo now, and remove the incorrect #ifdef around the native_pud_clear
      definition.
      
      Fixes: 3e761a42 ("mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()")
      Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NDave Jiang <dave.jiang@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: http://lkml.kernel.org/r/20170314121330.182155-1-arnd@arndb.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d434936e
  6. 28 2月, 2017 1 次提交
  7. 25 2月, 2017 1 次提交
  8. 11 2月, 2015 1 次提交
  9. 14 8月, 2013 1 次提交
  10. 21 6月, 2012 1 次提交
    • A
      thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE · e4eed03f
      Andrea Arcangeli 提交于
      In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
      mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
      Xen.
      
      So instead of dealing only with "consistent" pmdvals in
      pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
      simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
      where the low 32bit and high 32bit could be inconsistent (to avoid having
      to use cmpxchg8b).
      
      The only guarantee we get from pmd_read_atomic is that if the low part of
      the pmd was found null, the high part will be null too (so the pmd will be
      considered unstable).  And if the low part of the pmd is found "stable"
      later, then it means the whole pmd was read atomically (because after a
      pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
      and we read the high part after the low part).
      
      In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
      atomically to declare the pmd as "stable" and that's true for THP and no
      THP, furthermore in the THP case we also have a barrier() that will
      prevent any inconsistent pmdvals to be cached by a later re-read of the
      *pmd.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Tested-by: NAndrew Jones <drjones@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4eed03f
  11. 06 6月, 2012 1 次提交
  12. 30 5月, 2012 1 次提交
    • A
      mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 26c19178
      Andrea Arcangeli 提交于
      When holding the mmap_sem for reading, pmd_offset_map_lock should only
      run on a pmd_t that has been read atomically from the pmdp pointer,
      otherwise we may read only half of it leading to this crash.
      
      PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
       #0 [f06a9dd8] crash_kexec at c049b5ec
       #1 [f06a9e2c] oops_end at c083d1c2
       #2 [f06a9e40] no_context at c0433ded
       #3 [f06a9e64] bad_area_nosemaphore at c043401a
       #4 [f06a9e6c] __do_page_fault at c0434493
       #5 [f06a9eec] do_page_fault at c083eb45
       #6 [f06a9f04] error_code (via page_fault) at c083c5d5
          EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
          00000000
          DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
          CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
       #7 [f06a9f38] _spin_lock at c083bc14
       #8 [f06a9f44] sys_mincore at c0507b7d
       #9 [f06a9fb0] system_call at c083becd
                               start           len
          EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
          DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
          SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
          CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286
      
      This should be a longstanding bug affecting x86 32bit PAE without THP.
      Only archs with 64bit large pmd_t and 32bit unsigned long should be
      affected.
      
      With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
      would partly hide the bug when the pmd transition from none to stable,
      by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
      enabled a new set of problem arises by the fact could then transition
      freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
      So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
      unconditional isn't good idea and it would be a flakey solution.
      
      This should be fully fixed by introducing a pmd_read_atomic that reads
      the pmd in order with THP disabled, or by reading the pmd atomically
      with cmpxchg8b with THP enabled.
      
      Luckily this new race condition only triggers in the places that must
      already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
      is localized there but this bug is not related to THP.
      
      NOTE: this can trigger on x86 32bit systems with PAE enabled with more
      than 4G of ram, otherwise the high part of the pmd will never risk to be
      truncated because it would be zero at all times, in turn so hiding the
      SMP race.
      
      This bug was discovered and fully debugged by Ulrich, quote:
      
      ----
      [..]
      pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
      eax.
      
          496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
          *pmd)
          497 {
          498         /* depend on compiler for an atomic pmd read */
          499         pmd_t pmdval = *pmd;
      
                                      // edi = pmd pointer
      0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
      ...
                                      // edx = PTE page table high address
      0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
      ...
                                      // eax = PTE page table low address
      0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax
      
      [..]
      
      Please note that the PMD is not read atomically. These are two "mov"
      instructions where the high order bits of the PMD entry are fetched
      first. Hence, the above machine code is prone to the following race.
      
      -  The PMD entry {high|low} is 0x0000000000000000.
         The "mov" at 0xc0507a84 loads 0x00000000 into edx.
      
      -  A page fault (on another CPU) sneaks in between the two "mov"
         instructions and instantiates the PMD.
      
      -  The PMD entry {high|low} is now 0x00000003fda38067.
         The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
      ----
      Reported-by: NUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26c19178
  13. 18 3月, 2011 1 次提交
    • S
      x86: Flush TLB if PGD entry is changed in i386 PAE mode · 4981d01e
      Shaohua Li 提交于
      According to intel CPU manual, every time PGD entry is changed in i386 PAE
      mode, we need do a full TLB flush. Current code follows this and there is
      comment for this too in the code.
      
      But current code misses the multi-threaded case. A changed page table
      might be used by several CPUs, every such CPU should flush TLB. Usually
      this isn't a problem, because we prepopulate all PGD entries at process
      fork. But when the process does munmap and follows new mmap, this issue
      will be triggered.
      
      When it happens, some CPUs keep doing page faults:
      
        http://marc.info/?l=linux-kernel&m=129915020508238&w=2
      
      Reported-by: Yasunori Goto<y-goto@jp.fujitsu.com>
      Tested-by: Yasunori Goto<y-goto@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: Shaohua Li<shaohua.li@intel.com>
      Cc: Mallick Asit K <asit.k.mallick@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: stable <stable@kernel.org>
      LKML-Reference: <1300246649.2337.95.camel@sli10-conroe>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4981d01e
  14. 14 1月, 2011 1 次提交
  15. 19 3月, 2009 1 次提交
  16. 07 2月, 2009 8 次提交
  17. 17 12月, 2008 1 次提交
    • J
      x86: consolidate __swp_XXX() macros · 1796316a
      Jan Beulich 提交于
      Impact: cleanup, code robustization
      
      The __swp_...() macros silently relied upon which bits are used for
      _PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
      our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
      were reported that could have been avoided if these macros properly
      used the symbolic constants. Since, as pointed out earlier, for Xen
      Dom0 support mainline likewise will need to eliminate the conflict
      between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
      adjustments, plus it introduces a mechanism to check consistency
      between MAX_SWAPFILES_SHIFT and the actual encoding macros.
      
      This also fixes a latent bug in that x86-64 used a 6-bit mask in
      __swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
      seemingly unrelated) linux/swap.h, this would have resulted in a
      collision with _PAGE_FILE.
      
      Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
      pgoff_to_pte() calculations.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1796316a
  18. 30 10月, 2008 1 次提交
  19. 23 10月, 2008 2 次提交
  20. 10 9月, 2008 1 次提交
    • H
      x86: unsigned long pte_pfn · 91030ca1
      Hugh Dickins 提交于
      pte_pfn() has always been of type unsigned long, even on 32-bit PAE;
      but in the current tip/next/mm tree it works out to be unsigned long
      long on 64-bit, which gives an irritating warning if you try to printk
      a pfn with the usual %lx.
      
      Now use the same pte_pfn() function, moved from pgtable-3level.h
      to pgtable.h, for all models: as suggested by Jeremy Fitzhardinge.
      And pte_page() can well move along with it (remaining a macro to
      avoid dependence on mm_types.h).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      91030ca1
  21. 23 7月, 2008 1 次提交
    • V
      x86: consolidate header guards · 77ef50a5
      Vegard Nossum 提交于
      This patch is the result of an automatic script that consolidates the
      format of all the headers in include/asm-x86/.
      
      The format:
      
      1. No leading underscore. Names with leading underscores are reserved.
      2. Pathname components are separated by two underscores. So we can
         distinguish between mm_types.h and mm/types.h.
      3. Everything except letters and numbers are turned into single
         underscores.
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      77ef50a5
  22. 22 7月, 2008 1 次提交
    • J
      x86: rename PTE_MASK to PTE_PFN_MASK · 59438c9f
      Jeremy Fitzhardinge 提交于
      Rusty, in his peevish way, complained that macros defining constants
      should have a name which somewhat accurately reflects the actual
      purpose of the constant.
      
      Aside from the fact that PTE_MASK gives no clue as to what's actually
      being masked, and is misleadingly similar to the functionally entirely
      different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the
      problem is.
      
      But if this patch silences the incessent noise, then it will have
      achieved its goal (TODO: write test-case).
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      59438c9f
  23. 20 5月, 2008 1 次提交
  24. 17 4月, 2008 1 次提交
  25. 04 2月, 2008 3 次提交
  26. 30 1月, 2008 5 次提交
    • J
      x86: defer cr3 reload when doing pud_clear() · fa28ba21
      Jeremy Fitzhardinge 提交于
      PAE mode requires that we reload cr3 in order to guarantee that
      changes to the pgd will be noticed by the processor.  This means that
      in principle pud_clear needs to reload cr3 every time.  However,
      because reloading cr3 implies a tlb flush, we want to avoid it where
      possible.
      
      pud_clear() is only used in a couple of places:
       - in free_pmd_range(), when pulling down a range of process address space, and
       - huge_pmd_unshare()
      
      In both cases, the calling code will do a a tlb flush anyway, so
      there's no need to do it within pud_clear().
      
      In free_pmd_range(), the pud_clear is immediately followed by
      pmd_free_tlb(); we can hook that to make the mmu_gather do an
      unconditional full flush to make sure cr3 gets reloaded.
      
      In huge_pmd_unshare, it is followed by flush_tlb_range, which always
      results in a full cr3-reload tlb flush.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: William Irwin <wli@holomorphy.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      fa28ba21
    • J
      x86: don't special-case pmd allocations as much · 6194ba6f
      Jeremy Fitzhardinge 提交于
      In x86 PAE mode, stop treating pmds as a special case.  Previously
      they were always allocated and freed with the pgd.  The modifies the
      code to be the same as 64-bit mode, where they are allocated on
      demand.
      
      This is a step on the way to unifying 32/64-bit pagetable allocation
      as much as possible.
      
      There is a complicating wart, however.  When you install a new
      reference to a pmd in the pgd, the processor isn't guaranteed to see
      it unless you reload cr3.  Since reloading cr3 also has the
      side-effect of flushing the tlb, this is an expense that we want to
      avoid whereever possible.
      
      This patch simply avoids reloading cr3 unless the update is to the
      current pagetable.  Later patches will optimise this further.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: William Irwin <wli@holomorphy.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6194ba6f
    • A
      x86: clean up pte_exec · 4c3c4b45
      Andi Kleen 提交于
      - Rename it to pte_exec() from pte_exec_kernel(). There is nothing
      kernel specific in there.
      - Move it into the common file because _PAGE_NX is 0 on !PAE and then
      pte_exec() will be always evaluate to true.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      4c3c4b45
    • J
      x86: demacro asm-x86/pgalloc_32.h · a5a19c63
      Jeremy Fitzhardinge 提交于
      Convert macros into inline functions, for better type-checking.
      
      This patch required a little bit of fiddling with headers in order to
      make __(pte|pmd)_free_tlb inline rather than macros.
      asm-generic/tlb.h includes asm/pgalloc.h, though it doesn't directly
      use any pgalloc definitions.  I removed this include to avoid an
      include cycle, but it may cause secondary compile failures by things
      depending on the indirect inclusion; arch/x86/mm/hugetlbpage.c was one
      such place; there may be others.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      a5a19c63
    • J
      x86: unify paravirt pagetable accessors · 4891645e
      Jeremy Fitzhardinge 提交于
      Put all the defines for mapping pagetable operations to their native
      versions (for the non-paravirt case) into one place.  Make the
      corresponding changes to paravirt.h.
      
      The tricky part here is that when a pagetable entry can't be updated
      atomically (ie, 32-bit PAE), we need special handlers for pte_clear,
      set_pte_atomic and set_pte_present.  However, the other two modes
      don't need special handling for these, and can use a common
      set_pte(_at) path.
      
      [ mingo@elte.hu: fixes ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      4891645e