1. 14 Sep, 2018 1 commit
    • J
      Revert "x86/mm/legacy: Populate the user page-table with user pgd's" · 61a6bd83
      Joerg Roedel committed
      This reverts commit 1f40a46c.
      
      It turned out that this patch is not sufficient to enable PTI on 32 bit
      systems with legacy 2-level page-tables. In this paging mode the huge-page
      PTEs are in the top-level page-table directory, where also the mirroring to
      the user-space page-table happens. So every huge PTE exists twice, in the
      kernel and in the user page-table.
      
      That means that accessed/dirty bits need to be fetched from two PTEs in
      this mode to be safe, but this is not trivial to implement because it needs
      changes to generic code just for the sake of enabling PTI with 32-bit
      legacy paging. As all systems that need PTI should support PAE anyway,
      remove support for PTI when 32-bit legacy paging is used.
      
      Fixes: 7757d607 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
      61a6bd83
  2. 20 Jul, 2018 1 commit
    • J
      x86/mm/legacy: Populate the user page-table with user pgd's · 1f40a46c
      Joerg Roedel committed
      Also populate the user-space pgd's in the user page-table.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-24-git-send-email-joro@8bytes.org
      1f40a46c
  3. 21 Jun, 2018 1 commit
    • A
      x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation · 6b28baca
      Andi Kleen committed
      When PTEs are set to PROT_NONE the kernel just clears the Present bit and
      preserves the PFN, which creates attack surface for L1TF speculation
      attacks.
      
      This is important inside guests, because L1TF speculation bypasses physical
      page remapping. While the host has its own mitigations preventing leaking
      data from other VMs into the guest, this would still risk leaking the wrong
      page inside the current guest.
      
      This uses the same technique as Linus' swap entry patch: while an entry
      is in PROTNONE state, invert the complete PFN part of it. This ensures
      that the highest bit will point to non-existing memory.
      
      The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
      pte/pmd/pud_pfn undo it.
      
      This assumes that no code path touches the PFN part of a PTE directly
      without using these primitives.
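
The inversion described above can be illustrated with a small user-space sketch. This is not the kernel's implementation: the names `protnone_xor_mask`, `make_pte` and `pte_to_pfn`, and the simplified 64-bit PTE layout (Present at bit 0, PROTNONE at bit 8, PFN in bits 12..51), are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified 64-bit PTE layout (illustration only). */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_PROTNONE  (1ULL << 8)
#define PTE_PFN_SHIFT 12
#define PTE_PFN_MASK  0x000ffffffffff000ULL /* bits 12..51 hold the PFN */

/* All-ones over the PFN field when the entry is in PROT_NONE state
 * (PROTNONE set, Present clear), zero otherwise. */
static uint64_t protnone_xor_mask(uint64_t pte)
{
    int protnone = (pte & (PTE_PRESENT | PTE_PROTNONE)) == PTE_PROTNONE;

    return protnone ? PTE_PFN_MASK : 0;
}

/* Build an entry; the PFN bits are stored inverted for PROT_NONE entries. */
static uint64_t make_pte(uint64_t pfn, uint64_t flags)
{
    uint64_t pte;

    assert((flags & PTE_PFN_MASK) == 0); /* flags must not overlap the PFN */
    pte = ((pfn << PTE_PFN_SHIFT) & PTE_PFN_MASK) | flags;
    return pte ^ protnone_xor_mask(pte);
}

/* Read the PFN back, undoing the inversion when it was applied. */
static uint64_t pte_to_pfn(uint64_t pte)
{
    return ((pte ^ protnone_xor_mask(pte)) & PTE_PFN_MASK) >> PTE_PFN_SHIFT;
}
```

Because the XOR only touches the PFN field, the flag bits that decide whether an entry is inverted survive the transformation, so the same mask function works for both packing and unpacking. Code that reads the PFN bits directly, bypassing such accessors, would see the inverted value, which is the assumption the commit message calls out.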
      
      This doesn't handle the case that MMIO is on the top of the CPU physical
      memory. If such an MMIO region was exposed by an unprivileged driver for
      mmap it would be possible to attack some real memory.  However this
      situation is all rather unlikely.
      
      For 32bit non PAE the inversion is not done because there are really not
      enough bits to protect anything.
      
      Q: Why does the guest need to be protected when the hypervisor already has
         L1TF mitigations?
      
      A: Here's an example:
      
         Physical pages 1 2 get mapped into a guest as
         GPA 1 -> PA 2
         GPA 2 -> PA 1
         through EPT.
      
         The L1TF speculation ignores the EPT remapping.
      
         Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
         they belong to different users and should be isolated.
      
         A sets the GPA 1 -> PA 2 PTE to PROT_NONE to bypass the EPT remapping and
         gets read access to the underlying physical page. With the remapping
         ignored, that page is PA 1, so A can read process B's data if it
         happens to be in L1, and isolation inside the guest is broken.
      
         There's nothing the hypervisor can do about this. This mitigation has to
         be done in the guest itself.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      
      
      6b28baca
  4. 02 Nov, 2017 1 commit
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  5. 25 Feb, 2017 1 commit
  6. 11 Feb, 2015 1 commit
  7. 05 Jun, 2014 2 commits
    • C
      mm: x86 pgtable: require X86_64 for soft-dirty tracker · 2bf01f9f
      Cyrill Gorcunov committed
      Tracking dirty status on 2-level pages requires very ugly macros, and
      given how old the machines that can only operate without PAE mode
      are, let's drop the soft-dirty tracker from them for code simplicity
      (note that I can't drop all the macros from 2-level pages yet, since
      _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without the
      tracker).
      
      Linus proposed to completely rip out soft-dirty support on x86-32 (even
      with PAE), and since for CRIU we're not planning to support native x86-32
      mode, let's do that.
      
      (The soft-dirty tracker is a relatively new feature which is mostly used
      by CRIU, so I don't expect this API change to cause problems for
      userspace.)
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2bf01f9f
    • C
      mm: x86 pgtable: drop unneeded preprocessor ifdef · 2373eaec
      Cyrill Gorcunov committed
      _PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so
      drop redundant #ifdef.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2373eaec
  8. 19 Nov, 2013 1 commit
  9. 14 Aug, 2013 1 commit
  10. 06 Jun, 2012 1 commit
  11. 14 Jan, 2011 1 commit
  12. 19 Mar, 2009 1 commit
  13. 07 Feb, 2009 1 commit
  14. 17 Dec, 2008 1 commit
    • J
      x86: consolidate __swp_XXX() macros · 1796316a
      Jan Beulich committed
      Impact: cleanup, code robustization
      
      The __swp_...() macros silently relied upon which bits are used for
      _PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
      our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
      were reported that could have been avoided if these macros properly
      used the symbolic constants. Since, as pointed out earlier, for Xen
      Dom0 support mainline likewise will need to eliminate the conflict
      between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
      adjustments, plus it introduces a mechanism to check consistency
      between MAX_SWAPFILES_SHIFT and the actual encoding macros.
      
      This also fixes a latent bug in that x86-64 used a 6-bit mask in
      __swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
      seemingly unrelated) linux/swap.h, this would have resulted in a
      collision with _PAGE_FILE.
      
      Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
      pgoff_to_pte() calculations.
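
A minimal sketch of the idea, assuming the non-PAE bit positions mentioned elsewhere in this history (_PAGE_BIT_FILE at bit 6, _PAGE_BIT_PROTNONE at bit 8) and Present at bit 0. The names `SWP_TYPE_SHIFT`, `swp_entry`, `swp_make`, `swp_type` and `swp_offset` are hypothetical, and the compile-time check is written with C11 `_Static_assert` as a stand-in for whatever mechanism the patch itself introduces.

```c
#include <assert.h>

/* Bit positions assumed for illustration (non-PAE 32-bit layout). */
#define PAGE_BIT_PRESENT    0
#define PAGE_BIT_FILE       6
#define PAGE_BIT_PROTNONE   8

/* Assumed value of the generic limit on the number of swap types. */
#define MAX_SWAPFILES_SHIFT 5

/* Derive the field layout from the symbolic bit numbers, not literals. */
#define SWP_TYPE_SHIFT   (PAGE_BIT_PRESENT + 1)
#define SWP_TYPE_BITS    (PAGE_BIT_FILE - PAGE_BIT_PRESENT - 1)
#define SWP_OFFSET_SHIFT (PAGE_BIT_PROTNONE + 1)

/* Fail the build if the generic limit no longer fits the type field. */
_Static_assert(MAX_SWAPFILES_SHIFT <= SWP_TYPE_BITS,
               "swap type field would collide with PAGE_BIT_FILE");

typedef struct { unsigned long val; } swp_entry;

/* Pack a swap type and offset around the reserved bits. */
static swp_entry swp_make(unsigned long type, unsigned long offset)
{
    swp_entry e;

    assert(type < (1UL << SWP_TYPE_BITS));
    e.val = (type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
    return e;
}

static unsigned long swp_type(swp_entry e)
{
    return (e.val >> SWP_TYPE_SHIFT) & ((1UL << SWP_TYPE_BITS) - 1);
}

static unsigned long swp_offset(swp_entry e)
{
    return e.val >> SWP_OFFSET_SHIFT;
}
```

With the mask derived from `SWP_TYPE_BITS`, moving one of the flag bits changes the encoding automatically, and raising `MAX_SWAPFILES_SHIFT` past the available width stops the build instead of silently corrupting entries.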
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1796316a
  15. 23 Oct, 2008 2 commits
  16. 10 Sep, 2008 1 commit
    • H
      x86: unsigned long pte_pfn · 91030ca1
      Hugh Dickins committed
      pte_pfn() has always been of type unsigned long, even on 32-bit PAE;
      but in the current tip/next/mm tree it works out to be unsigned long
      long on 64-bit, which gives an irritating warning if you try to printk
      a pfn with the usual %lx.
      
      Now use the same pte_pfn() function, moved from pgtable-3level.h
      to pgtable.h, for all models: as suggested by Jeremy Fitzhardinge.
      And pte_page() can well move along with it (remaining a macro to
      avoid dependence on mm_types.h).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      91030ca1
  17. 23 Jul, 2008 1 commit
    • V
      x86: consolidate header guards · 77ef50a5
      Vegard Nossum committed
      This patch is the result of an automatic script that consolidates the
      format of all the headers in include/asm-x86/.
      
      The format:
      
      1. No leading underscore. Names with leading underscores are reserved.
      2. Pathname components are separated by two underscores. So we can
         distinguish between mm_types.h and mm/types.h.
      3. Everything except letters and numbers is turned into single
         underscores.
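
The three rules can be sketched as a small C helper. This is not the script the commit refers to: the function name `header_guard_name` is hypothetical, upper-casing the result is an assumption (the rules above do not state it), rule 3 is read as one underscore per replaced character, and the input path is assumed to start with a letter so that rule 1 holds trivially.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Derive a header guard name from a path relative to the include root.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int header_guard_name(const char *path, char *out, size_t outsz)
{
    size_t n = 0;

    assert(path && out);
    if (outsz == 0)
        return -1;

    for (const char *p = path; *p; p++) {
        unsigned char c = (unsigned char)*p;
        size_t need = (c == '/') ? 2 : 1;

        if (n + need >= outsz)      /* keep room for the trailing NUL */
            return -1;

        if (c == '/') {
            /* Rule 2: path separators become two underscores. */
            out[n++] = '_';
            out[n++] = '_';
        } else {
            /* Rule 3: anything but letters and digits becomes '_'. */
            out[n++] = isalnum(c) ? (char)toupper(c) : '_';
        }
    }
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, the input `asm-x86/pgtable-2level.h` produces `ASM_X86__PGTABLE_2LEVEL_H`, and `mm_types.h` and `mm/types.h` map to the distinct names `MM_TYPES_H` and `MM__TYPES_H`.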
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      77ef50a5
  18. 17 Apr, 2008 1 commit
  19. 30 Jan, 2008 5 commits
  20. 11 Oct, 2007 1 commit
  21. 17 Jul, 2007 1 commit
  22. 03 May, 2007 5 commits
    • Z
      [PATCH] i386: pte simplify ops · 9e5e3162
      Zachary Amsden committed
      Add comment and condense code to make use of native_local_ptep_get_and_clear
      function.  Also, it turns out the 2-level and 3-level paging definitions were
      identical, so move the common definition into pgtable.h
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      9e5e3162
    • Z
      [PATCH] i386: pte xchg optimization · 142dd975
      Zachary Amsden committed
      In situations where page table updates need only be made locally, and there is
      no cross-processor A/D bit races involved, we need not use the heavyweight
      xchg instruction to atomically fetch and clear page table entries.  Instead,
      we can just read and clear them directly.
      
      This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
      operations from page table updates.
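
The difference between the two variants can be sketched in user-space C. The names `pte_val`, `ptep_get_and_clear_atomic` and `ptep_get_and_clear_local` are hypothetical, a 32-bit entry is assumed, and the atomic exchange uses the GCC/Clang `__atomic_exchange_n` builtin rather than the kernel's own xchg wrapper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_val; /* assumed 32-bit (non-PAE) page table entry */

/* Fetch and clear in one atomic step: needed when another processor may
 * set accessed/dirty bits in the entry concurrently. */
static pte_val ptep_get_and_clear_atomic(pte_val *ptep)
{
    assert(ptep);
    return __atomic_exchange_n(ptep, 0, __ATOMIC_SEQ_CST);
}

/* Plain read followed by a plain write: only safe when no other processor
 * can race on the entry, e.g. on a uniprocessor kernel. */
static pte_val ptep_get_and_clear_local(pte_val *ptep)
{
    pte_val old;

    assert(ptep);
    old = *ptep;
    *ptep = 0;
    return old;
}
```

Both helpers return the old entry and leave the slot zeroed; the local variant simply avoids the locked bus operation that an exchange instruction implies on x86.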
      
      Thanks to Michel Lespinasse for noting this potential optimization.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      142dd975
    • Z
      [PATCH] i386: pte clear optimization · c2c1accd
      Zachary Amsden committed
      When exiting from an address space, no special hypervisor notification of page
      table updates needs to occur; direct page table hypervisors, such as Xen,
      switch to another address space first (init_mm) and unprotect the page tables
      to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
      mode hypervisors, such as VMI and lhype, don't need to do the extra work of
      calling through paravirt-ops, and can just directly clear the page table
      entries without notifying the hypervisor, since all the page tables are about
      to be freed.
      
      So introduce native_pte_clear functions which bypass any paravirt-ops
      notification.  This results in a significant performance win for VMI and
      removes some indirect calls from zap_pte_range.
      
      Note the 3-level paging already had a native_pte_clear function, thus
      demanding argument conformance and extra args for the 2-level definition.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      c2c1accd
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge committed
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4 x 64-bit entries large, but Xen requires the PAE
      pgd to be page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accommodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course they could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries · 3dc494e8
      Jeremy Fitzhardinge committed
      Add a set of accessors to pack, unpack and modify page table entries
      (at all levels).  This allows a paravirt implementation to control the
      contents of pgd/pmd/pte entries.  For example, Xen uses this to
      convert the (pseudo-)physical address into a machine address when
      populating a pagetable entry, and to convert back to a pphys address
      when an entry is read.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      3dc494e8
  23. 07 Dec, 2006 3 commits
    • Z
      [PATCH] paravirt: fix missing pte update · 8ecb8950
      Zachary Amsden committed
      The function ptep_get_and_clear uses an atomic instruction sequence to get and
      clear an active pte.  Rather than add such an atomic operator to all virtual
      machine implementations in paravirt-ops, it is easier to support the raw
      atomic sequence and use either a trapping writable pagetable approach, or a
      post-update notification.  For the post update notification, we require the
      pte_update function to be called after the access.  Combine the 2-level and
      3-level paging operators into one common function which does the post-update
      notification, and rename the actual atomic sequences to raw_ptep_xxx
      operators.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      8ecb8950
    • Z
      [PATCH] paravirt: Preparatory mmu header movement · a2952d89
      Zachary Amsden committed
      Move header includes for the nopud / nopmd types to the location of the actual
      pte / pgd type definitions.  This allows generic 4-level page type code to be
      written before the split 2/3 level page table headers are included.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      a2952d89
    • R
      [PATCH] paravirt: Add MMU virtualization to paravirt_ops · da181a8b
      Rusty Russell committed
      Add the three bare TLB accessor functions to paravirt-ops.  Most amusingly,
      flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
      Instead, I chose to indicate the actual flush type, kernel (global) vs. user
      (non-global).  Global in this sense means using the global bit in the page
      table entry, which makes TLB entries persistent across CR3 reloads, not
      global as in the SMP sense of invoking remote shootdowns, so the term is
      confusingly overloaded.
      
      AK: folded in fix from Zach for PAE compilation
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      da181a8b
  24. 01 Oct, 2006 1 commit
    • Z
      [PATCH] paravirt: optimize ptep establish for pae · d6d861e3
      Zachary Amsden committed
      The ptep_establish macro is only used on user-level PTEs, for P->P mapping
      changes.  Since these always happen under protection of the pagetable lock,
      the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
      even a lock prefix needs to be used.  We can simply instead clear the P-bit,
      followed by a normal set.  The write ordering is still important to avoid the
      possibility of the TLB snooping a partially written PTE and getting a bad
      mapping installed.
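
A minimal sketch of the "clear the P-bit, then do a normal set" sequence, assuming a PAE entry split into two 32-bit words with the Present bit in the low word. The names `pae_pte`, `compiler_barrier` and `pae_set_pte_present` are hypothetical, and the barrier shown only constrains the compiler (GCC/Clang inline-asm syntax); it stands in for whatever write barrier the kernel actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PAE entry layout: two 32-bit halves, Present bit in pte_low. */
typedef struct {
    uint32_t pte_low;
    uint32_t pte_high;
} pae_pte;

/* Stops the compiler from reordering the stores around this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Replace a present entry with another present entry without cmpxchg8b. */
static void pae_set_pte_present(volatile pae_pte *ptep, pae_pte newval)
{
    assert(ptep);
    ptep->pte_low = 0;                 /* 1: entry becomes not-present */
    compiler_barrier();
    ptep->pte_high = newval.pte_high;  /* 2: safe while not present */
    compiler_barrier();
    ptep->pte_low = newval.pte_low;    /* 3: P-bit set last */
}
```

At every intermediate point the entry is either not present or fully consistent, so a hardware page walk can never observe the new high word combined with the old low word.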
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d6d861e3
  25. 26 Sep, 2006 1 commit
  26. 28 Apr, 2006 1 commit
    • Z
      [PATCH] x86/PAE: Fix pte_clear for the >4GB RAM case · 6e5882cf
      Zachary Amsden committed
      Proposed fix for ptep_get_and_clear_full PAE bug.  Pte_clear had the same bug,
      so use the same fix for both.  Turns out pmd_clear had it as well, but pgds
      are not affected.
      
      The problem is rather intricate.  Page table entries in PAE mode are 64-bits
      wide, but the only atomic 8-byte write operation available in 32-bit mode is
      cmpxchg8b, which is expensive (at least on P4), and thus avoided.  But it can
      happen that the processor may prefetch entries into the TLB in the middle of an
      operation which clears a page table entry.  So one must always clear the P-bit
      in the low word of the page table entry first when clearing it.
      
      Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
      the compiler, it must be coded explicitly as a clear of the low word followed
      by a clear of the high word.  Further, there must be a write memory barrier
      here to enforce proper ordering by the compiler (and, in the future, by the
      processor as well).
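
The explicit ordering can be sketched as follows. The names `pae_pte`, `compiler_barrier` and `pae_pte_clear` are hypothetical, the entry is modelled as two 32-bit words with the P-bit in the low word, and the barrier is a compiler-only barrier in GCC/Clang syntax used as a stand-in for the kernel's write barrier.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PAE entry layout: two 32-bit halves, Present bit in pte_low. */
typedef struct {
    uint32_t pte_low;
    uint32_t pte_high;
} pae_pte;

/* Stops the compiler from reordering the stores around this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Clear a 64-bit PAE entry with two 32-bit stores. The low word goes
 * first so the entry is already not-present when the high word (the
 * upper physical address bits) is zeroed.
 */
static void pae_pte_clear(volatile pae_pte *ptep)
{
    assert(ptep);
    ptep->pte_low = 0;
    compiler_barrier();
    ptep->pte_high = 0;
}
```

Writing the high word first would briefly leave a present entry whose upper address bits are already zero, which is the aliasing of memory above 4GB into the first 4GB that the rest of this message describes.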
      
      On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
      deficient, as it could leave virtual mappings of physical memory above 4GB
      aliased to memory below 4GB in the TLB.  The implementation of
      ptep_get_and_clear_full has a similar bug, although not nearly as likely to
      occur, since the mappings being cleared are in the process of being destroyed,
      and should never be dereferenced again.
      
      But, as luck would have it, it is possible to trigger bugs even without ever
      dereferencing these bogus TLB mappings, even if the clear is followed fairly
      soon after with a TLB flush or invalidation.  The problem is that memory above
      4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
      region of memory with non-memory semantics.  These regions include AGP and PCI
      space.  As such, these memory regions are not cached by the processor.  This
      introduces the bug.
      
      The processor can speculate memory operations, including memory writes, as long
      as they are committed with the proper ordering.  Speculating a memory write to
      a linear address that has a bogus TLB mapping is possible.  Normally, the
      speculation is harmless.  But for cached memory, it does leave the falsely
      speculated cacheline unmodified, but in a dirty state.  This cache line will be
      eventually written back.  If this cacheline happens to intersect a region of
      memory that is not protected by the cache coherency protocol, it can corrupt
      data in I/O memory, which is generally a very bad thing to do, and can cause
      total system failure or just plain undefined behavior.
      
      These bugs are extremely unlikely, but the severity is of such magnitude, and
      the fix so simple that I think fixing them immediately is justified.  Also,
      they are nearly impossible to debug.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6e5882cf
  27. 23 Mar, 2006 1 commit
    • J
      [PATCH] i386: actively synchronize vmalloc area when registering certain callbacks · 101f12af
      Jan Beulich committed
      Registering a callback handler through register_die_notifier() is obviously
      primarily intended for use by modules.  However, the way these currently
      get called it is basically impossible for them to actually be used by
      modules, as there is, on non-PAE configurations, a good chance (the larger
      the module, the better) for the system to crash as a result.
      
      This is because the callback gets invoked
      
      (a) in the page fault path before the top level page table propagation
          gets carried out (hence a fault to propagate the top level page table
          entry/entries mapping to module's code/data would nest infinitely) and
      
      (b) in the NMI path, where nested faults must absolutely not happen,
          since otherwise the IRET from the nested fault re-enables NMIs,
          potentially resulting in nested NMI occurrences.
      
      Besides the modular aspect, similar problems would even arise for in-
      kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
      memory inside their handlers.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      101f12af
  28. 31 Oct, 2005 1 commit