1. 26 Sep, 2006 3 commits
  2. 28 Apr, 2006 1 commit
    • [PATCH] x86/PAE: Fix pte_clear for the >4GB RAM case · 6e5882cf
      Zachary Amsden committed
      Proposed fix for ptep_get_and_clear_full PAE bug.  Pte_clear had the same bug,
      so use the same fix for both.  Turns out pmd_clear had it as well, but pgds
      are not affected.
      
      The problem is rather intricate.  Page table entries in PAE mode are 64-bits
      wide, but the only atomic 8-byte write operation available in 32-bit mode is
      cmpxchg8b, which is expensive (at least on P4), and thus avoided.  But it can
      happen that the processor may prefetch entries into the TLB in the middle of an
      operation which clears a page table entry.  So one must always clear the P-bit
      in the low word of the page table entry first when clearing it.
      
      Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
      the compiler, it must be coded explicitly as a clear of the low word followed
      by a clear of the high word.  Further, there must be a write memory barrier
      here to enforce proper ordering by the compiler (and, in the future, by the
      processor as well).
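
The ordering described above can be sketched in user-space C. This is only an illustration under stated assumptions: `pae_pte_t`, `compiler_wmb()` and `pae_pte_clear()` are hypothetical stand-ins, not the kernel's definitions, and the barrier shown is a compiler-only barrier in GCC/Clang syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE page table entry, split into
 * two 32-bit words because no cheap atomic 8-byte store is available. */
typedef struct {
    volatile uint32_t pte_low;   /* contains the Present bit (bit 0) */
    volatile uint32_t pte_high;  /* upper physical address bits */
} pae_pte_t;

/* Compiler-only barrier (GCC/Clang): stops the compiler from reordering
 * the two stores. It emits no instruction. */
#define compiler_wmb() __asm__ __volatile__("" ::: "memory")

static inline void pae_pte_clear(pae_pte_t *ptep)
{
    ptep->pte_low = 0;   /* clear the Present bit first */
    compiler_wmb();      /* keep the low-word store ahead of the high-word store */
    ptep->pte_high = 0;  /* only now drop the high address bits */
}
```

With this order, an entry observed between the two stores is already not present, so it cannot be loaded as a mapping whose high address bits have been zeroed.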
      
      On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
      deficient, as it could leave virtual mappings of physical memory above 4GB
      aliased to memory below 4GB in the TLB.  The implementation of
      ptep_get_and_clear_full has a similar bug, although not nearly as likely to
      occur, since the mappings being cleared are in the process of being destroyed,
      and should never be dereferenced again.
      
      But, as luck would have it, it is possible to trigger bugs even without ever
      dereferencing these bogus TLB mappings, even if the clear is followed fairly
      soon after with a TLB flush or invalidation.  The problem is that memory above
      4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
      region of memory with non-memory semantics.  These regions include AGP and PCI
      space.  As such, these memory regions are not cached by the processor.  This
      introduces the bug.
      
      The processor can speculate memory operations, including memory writes, as long
      as they are committed with the proper ordering.  Speculating a memory write to
      a linear address that has a bogus TLB mapping is possible.  Normally, the
      speculation is harmless.  But for cached memory, it does leave the falsely
      speculated cacheline unmodified, but in a dirty state.  This cache line will be
      eventually written back.  If this cacheline happens to intersect a region of
      memory that is not protected by the cache coherency protocol, it can corrupt
      data in I/O memory, which is generally a very bad thing to do, and can cause
      total system failure or just plain undefined behavior.
      
      These bugs are extremely unlikely, but the severity is of such magnitude, and
      the fix so simple that I think fixing them immediately is justified.  Also,
      they are nearly impossible to debug.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 26 Apr, 2006 1 commit
  4. 22 Mar, 2006 1 commit
    • [PATCH] Enable mprotect on huge pages · 8f860591
      Zhang, Yanmin committed
      2.6.16-rc3 uses hugetlb on-demand paging, but it doesn't support hugetlb
      mprotect.
      
      From: David Gibson <david@gibson.dropbear.id.au>
      
        Remove a test from the mprotect() path which checks that the mprotect()ed
        range on a hugepage VMA is hugepage aligned (yes, really, the sense of
        is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
      
        In fact, we don't need this test.  If the given addresses match the
        beginning/end of a hugepage VMA they must already be suitably aligned.  If
        they don't, then mprotect_fixup() will attempt to split the VMA.  The very
        first test in split_vma() will check for a badly aligned address on a
        hugepage VMA and return -EINVAL if necessary.
      
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
        On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
        identity of a hugetlb pte is lost when changing page protection via mprotect.
        A page fault that occurs later will trigger a bug check in huge_pte_alloc().
      
        The fix is to always make new pte a hugetlb pte and also to clean up
        legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
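
A rough sketch of the flag collision and of the fix's idea follows. The flag values are the i386 values as I recall them for that era and should be treated as assumptions, as should `CHG_MASK`, `modify_prot()` and `huge_modify_prot()`, which are hypothetical names that only imitate how a protection change rebuilds the low flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 flag values; note that PSE and PROTNONE share bit 7. */
#define PAGE_PRESENT  0x001u
#define PAGE_RW       0x002u
#define PAGE_PSE      0x080u
#define PAGE_PROTNONE 0x080u

/* Hypothetical mask of the bits preserved across a protection change:
 * page frame, accessed (0x020) and dirty (0x040). PSE is not in it. */
#define CHG_MASK      0xFFFFF060u

/* Generic protection change: keep the frame, replace the flag bits. */
static inline uint32_t modify_prot(uint32_t pte, uint32_t newprot)
{
    return (pte & CHG_MASK) | newprot;
}

/* Huge-page variant in the spirit of the fix: always mark the result huge. */
static inline uint32_t huge_modify_prot(uint32_t pte, uint32_t newprot)
{
    return modify_prot(pte, newprot) | PAGE_PSE;
}
```

In this sketch the generic path silently drops the huge-page marker, while the huge-page variant restores it after every protection change.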
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 07 Nov, 2005 1 commit
  6. 31 Oct, 2005 2 commits
  7. 30 Oct, 2005 1 commit
    • [PATCH] mm: pte_offset_map_lock loops · 705e87c0
      Hugh Dickins committed
      Convert those common loops using page_table_lock on the outside and
      pte_offset_map within to use just pte_offset_map_lock within instead.
      
      These all hold mmap_sem (some exclusively, some not), so at no level can a
      page table be whipped away from beneath them.  But whereas pte_alloc loops
      tested with the "atomic" pmd_present, these loops are testing with pmd_none,
      which on i386 PAE tests both lower and upper halves.
      
      That's now unsafe, so add a cast into pmd_none to test only the vital lower
      half: we lose a little sensitivity to a corrupt middle directory, but not
      enough to worry about.  It appears that i386 and UML were the only
      architectures vulnerable in this way, and pgd and pud no problem.
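
The difference between the two tests can be illustrated with a small sketch. The type `pmd_t` below and the helpers `pmd_none_full()` and `pmd_none_low()` are hypothetical user-space stand-ins, chosen only to show what "test only the lower half" means for a 64-bit entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit PAE middle-directory entry. */
typedef struct { uint64_t pmd; } pmd_t;

/* Tests both halves: a half-written entry can look populated. */
static inline int pmd_none_full(pmd_t e)
{
    return e.pmd == 0;
}

/* Tests only the low 32 bits, where the Present bit lives. */
static inline int pmd_none_low(pmd_t e)
{
    return (uint32_t)e.pmd == 0;
}
```

For an entry whose high word is set but whose low word is still zero, the two helpers disagree; the low-half test is the one that matches what the "atomic" present check would see.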
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 13 Sep, 2005 1 commit
  9. 05 Sep, 2005 4 commits
    • [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Zachary Amsden committed
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anywhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
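
A minimal sketch of such an accessor on native hardware, assuming it can simply copy the entries, might look like the following. The `pgd_t` typedef here is a hypothetical user-space stand-in, and a hypervisor port would presumably replace the body with explicit per-entry updates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a page global directory entry. */
typedef struct { unsigned long pgd; } pgd_t;

/* Copy `count` pgd entries from src to dst. The ranges must not overlap,
 * which is also what memcpy() itself requires. */
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
    memcpy(dst, src, (size_t)count * sizeof(pgd_t));
}
```

Routing every copy through one function gives a single place to change when page table updates must be made explicit.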
      
      Note that I omitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure, rather it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86: ptep_clear optimization · a600388d
      Zachary Amsden committed
      Add a new accessor for PTEs, which passes the full hint from the mmu_gather
      struct; this allows architectures with hardware pagetables to optimize away
      atomic PTE operations when destroying an address space.  Removing the
      locked operation should allow better pipelining of memory access in this
      loop.  I measured an average savings of 30-35 cycles per zap_pte_range on
      the first 500 destructions on Pentium-M, but I believe the optimization
      would win more on older processors which still assert the bus lock on xchg
      for an exclusive cacheline.
      
      Update: I made some new measurements, and this saves exactly 26 cycles over
      ptep_get_and_clear on Pentium M.  On P4, with a PAE kernel, this saves 180
      cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
      full address space destruction.
      
      pte_clear_full is not yet used, but is provided for future optimizations
      (in particular, when running inside of a hypervisor that queues page table
      updates, the full hint allows us to avoid queueing unnecessary page table
      updates for an address space in the process of being destroyed).
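
The idea of the full hint can be sketched as follows. This is an assumption-laden illustration: `pte_t` is a hypothetical single-word entry, and the atomic exchange uses the GCC/Clang `__atomic_exchange_n` builtin as a stand-in for a locked xchg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word page table entry. */
typedef struct { uint32_t pte_low; } pte_t;

/* Normal path: atomically fetch the old entry and clear it. */
static inline pte_t ptep_get_and_clear(pte_t *ptep)
{
    pte_t old;
    old.pte_low = __atomic_exchange_n(&ptep->pte_low, 0u, __ATOMIC_SEQ_CST);
    return old;
}

/* With full != 0 the whole address space is being torn down, so a plain
 * read followed by a plain store is enough and the locked operation is
 * skipped. */
static inline pte_t ptep_get_and_clear_full(pte_t *ptep, int full)
{
    if (full) {
        pte_t old = *ptep;
        ptep->pte_low = 0;
        return old;
    }
    return ptep_get_and_clear(ptep);
}
```

Both paths return the old entry and leave the slot cleared; only the cost of getting there differs.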
      
      This is not a huge win, but it does help a bit, and sets the stage for
      further hypervisor optimization of the mm layer on all architectures.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugetlb: add pte_huge() macro · 32e51a8c
      Adam Litke committed
      This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
      patch later in the series.  Instead of repeating (_PAGE_PRESENT |
      _PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
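
A sketch of what such a macro could look like is shown below. The names drop the leading underscores to stay clear of reserved identifiers, the flag values are assumed i386 values, and the test operates on a raw integer rather than on the kernel's pte type.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 flag values (stand-ins for _PAGE_PRESENT and _PAGE_PSE). */
#define PAGE_PRESENT 0x001u
#define PAGE_PSE     0x080u

/* Stand-in for __LARGE_PTE: both bits together mark a present huge page. */
#define LARGE_PTE    (PAGE_PRESENT | PAGE_PSE)

/* True only when both bits are set, not just one of them. */
#define pte_huge(pte) (((pte) & LARGE_PTE) == LARGE_PTE)
```

Comparing against the full mask, rather than testing for any overlap, keeps an entry with only one of the two bits from being reported as huge.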
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: correct _PAGE_FILE comment · 9b4ee40e
      Paolo 'Blaisorblade' Giarrusso committed
      _PAGE_FILE does not indicate whether a file is in page / swap cache, it is
      set just for non-linear PTE's.  Correct the comment for i386, x86_64, UML.
      Also clarify _PAGE_NONE.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 24 Jun, 2005 1 commit
  11. 22 Jun, 2005 1 commit
    • [PATCH] Hugepage consolidation · 63551ae0
      David Gibson committed
      A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
      attempts to consolidate a lot of the code across the arch's, putting the
      combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
      order to convert all the hugepage archs, but the result is a very large
      reduction in the total amount of code.  It also means things like hugepage
      lazy allocation could be implemented in one place, instead of six.
      
      Tested, at least a little, on ppc64, i386 and x86_64.
      
      Notes:
      	- this patch changes the meaning of set_huge_pte() to be more
      	  analogous to set_pte()
      	- does SH4 need a special huge_ptep_get_and_clear()??
      Acked-by: William Lee Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 01 May, 2005 1 commit
    • [PATCH] misc verify_area cleanups · e49332bd
      Jesper Juhl committed
      There were still a few comments left referring to verify_area, and two
      functions, verify_area_skas & verify_area_tt that just wrap corresponding
      access_ok_skas & access_ok_tt functions, just like verify_area does for
      access_ok - deprecate those.
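
The wrapper relationship mentioned above can be sketched like this. Everything here is a hypothetical user-space imitation: `USER_LIMIT`, `access_ok_sketch()` and `verify_area_sketch()` are invented names, and the range check is only a placeholder for whatever the real access check does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical upper bound of the user address range. */
#define USER_LIMIT 0xC0000000UL

/* Placeholder check: nonzero when [addr, addr + size) fits below the limit. */
static inline int access_ok_sketch(unsigned long addr, unsigned long size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}

/* Old-style wrapper: same check, but 0 on success and -EFAULT on failure. */
static inline int verify_area_sketch(unsigned long addr, unsigned long size)
{
    return access_ok_sketch(addr, size) ? 0 : -EFAULT;
}
```

Because the wrapper only inverts the return convention, callers can switch to the underlying check directly, which is what makes the wrapper easy to deprecate.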
      
      There were also a few places that still used verify_area in commented-out
      code, fix those up to use access_ok.
      
      After applying this one there should not be anything left but finally
      removing verify_area completely, which will happen after a kernel release
      or two.
      Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 20 Apr, 2005 1 commit
  14. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!