1. 09 5月, 2007 1 次提交
    • C
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig 提交于
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NBryan Wu <bryan.wu@analog.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
  2. 08 5月, 2007 2 次提交
  3. 03 5月, 2007 9 次提交
    • Z
      [PATCH] i386: pte simplify ops · 9e5e3162
      Zachary Amsden 提交于
      Add comment and condense code to make use of native_local_ptep_get_and_clear
      function.  Also, it turns out the 2-level and 3-level paging definitions were
      identical, so move the common definition into pgtable.h
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      9e5e3162
    • Z
      [PATCH] i386: pte clear optimization · c2c1accd
      Zachary Amsden 提交于
      When exiting from an address space, no special hypervisor notification of page
      table updates needs to occur; direct page table hypervisors, such as Xen,
      switch to another address space first (init_mm) and unprotects the page tables
      to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
      mode hypervisors, such as VMI and lhype don't need to do the extra work of
      calling through paravirt-ops, and can just directly clear the page table
      entries without notifiying the hypervisor, since all the page tables are about
      to be freed.
      
      So introduce native_pte_clear functions which bypass any paravirt-ops
      notification.  This results in a significant performance win for VMI and
      removes some indirect calls from zap_pte_range.
      
      Note the 3-level paging already had a native_pte_clear function, thus
      demanding argument conformance and extra args for the 2-level definition.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      c2c1accd
    • J
      [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear · 4cdd9c89
      Jeremy Fitzhardinge 提交于
      In shadow mode hypervisors, ptep_get_and_clear achieves the desired
      purpose of keeping the shadows in sync by issuing a native_get_and_clear,
      followed by a call to pte_update, which indicates the PTE has been
      modified.
      
      Direct mode hypervisors (Xen) have no need for this anyway, and will trap
      the update using writable pagetables.
      
      This means no hypervisor makes use of ptep_get_and_clear; there is no
      reason to have it in the paravirt-ops structure.  Change confusing
      terminology about raw vs. native functions into consistent use of
      native_pte_xxx for operations which do not invoke paravirt-ops.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      4cdd9c89
    • J
      [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages · ce6234b5
      Jeremy Fitzhardinge 提交于
      Xen and VMI both have special requirements when mapping a highmem pte
      page into the kernel address space.  These can be dealt with by adding
      a new kmap_atomic_pte() function for mapping highptes, and hooking it
      into the paravirt_ops infrastructure.
      
      Xen specifically wants to map the pte page RO, so this patch exposes a
      helper function, kmap_atomic_prot, which maps the page with the
      specified page protections.
      
      This also adds a kmap_flush_unused() function to clear out the cached
      kmap mappings.  Xen needs this to clear out any potential stray RW
      mappings of pages which will become part of a pagetable.
      
      [ Zach - vmi.c will need some attention after this patch.  It wasn't
        immediately obvious to me what needs to be done. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      ce6234b5
    • J
      [PATCH] i386: PARAVIRT: revert map_pt_hook. · a27fe809
      Jeremy Fitzhardinge 提交于
      Back out the map_pt_hook to clear the way for kmap_atomic_pte.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      a27fe809
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge 提交于
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
      pgd to page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accomodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course the could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NWilliam Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable · b239fb25
      Jeremy Fitzhardinge 提交于
      This patch introduces paravirt_ops hooks to control how the kernel's
      initial pagetable is set up.
      
      In the case of a native boot, the very early bootstrap code creates a
      simple non-PAE pagetable to map the kernel and physical memory.  When
      the VM subsystem is initialized, it creates a proper pagetable which
      respects the PAE mode, large pages, etc.
      
      When booting under a hypervisor, there are many possibilities for what
      paging environment the hypervisor establishes for the guest kernel, so
      the constructon of the kernel's pagetable depends on the hypervisor.
      
      In the case of Xen, the hypervisor boots the kernel with a fully
      constructed pagetable, which is already using PAE if necessary.  Also,
      Xen requires particular care when constructing pagetables to make sure
      all pagetables are always mapped read-only.
      
      In order to make this easier, kernel's initial pagetable construction
      has been changed to only allocate and initialize a pagetable page if
      there's no page already present in the pagetable.  This allows the Xen
      paravirt backend to make a copy of the hypervisor-provided pagetable,
      allowing the kernel to establish any more mappings it needs while
      keeping the existing ones.
      
      A slightly subtle point which is worth highlighting here is that Xen
      requires all kernel mappings to share the same pte_t pages between all
      pagetables, so that updating a kernel page's mapping in one pagetable
      is reflected in all other pagetables.  This makes it possible to
      allocate a page and attach it to a pagetable without having to
      explicitly enumerate that page's mapping in all pagetables.
      
      And:
      
      +From: "Eric W. Biederman" <ebiederm@xmission.com>
      
      If we don't set the leaf page table entries it is quite possible that
      will inherit and incorrect page table entry from the initial boot
      page table setup in head.S.  So we need to redo the effort here,
      so we pick up PSE, PGE and the like.
      
      Hypervisors like Xen require that their page tables be read-only,
      which is slightly incompatible with our low identity mappings, however
      I discussed this with Jeremy he has modified the Xen early set_pte
      function to avoid problems in this area.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      b239fb25
    • J
      [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries · 3dc494e8
      Jeremy Fitzhardinge 提交于
      Add a set of accessors to pack, unpack and modify page table entries
      (at all levels).  This allows a paravirt implementation to control the
      contents of pgd/pmd/pte entries.  For example, Xen uses this to
      convert the (pseudo-)physical address into a machine address when
      populating a pagetable entry, and converting back to pphys address
      when an entry is read.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      3dc494e8
    • J
      [PATCH] x86: Improve handling of kernel mappings in change_page_attr · d01ad8dd
      Jan Beulich 提交于
      Fix various broken corner cases in i386 and x86-64 change_page_attr.
      
      AK: split off from tighten kernel image access rights
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      d01ad8dd
  4. 05 3月, 2007 1 次提交
    • Z
      [PATCH] vmi: fix highpte · 9a1c13e9
      Zachary Amsden 提交于
      Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
      page tables.  This information is required so the physical address of PTE
      updates can be determined; otherwise, the mm layer would have to carry the
      physical address all the way to each PTE modification callsite, which is even
      more hideous that the macros required to provide the proper hooks.
      
      So lets not mess up arch neutral code to achieve this, but keep the horror in
      an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
      because some types are not yet defined in all the include paths for this
      header.
      
      This patch is absolutely required for HIGHPTE kernels to operate properly with
      VMI.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a1c13e9
  5. 08 12月, 2006 1 次提交
  6. 07 12月, 2006 3 次提交
    • Z
      [PATCH] paravirt: fix missing pte update · 8ecb8950
      Zachary Amsden 提交于
      The function ptep_get_and_clear uses an atomic instruction sequence to get and
      clear an active pte.  Rather than add such an atomic operator to all virtual
      machine implementations in paravirt-ops, it is easier to support the raw
      atomic sequence and use either a trapping writable pagetable approach, or a
      post-update notification.  For the post update notification, we require the
      pte_update function to be called after the access.  Combine the 2-level and
      3-level paging operators into one common function which does the post-update
      notification, and rename the actual atomic sequences to raw_ptep_xxx
      operators.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      8ecb8950
    • Z
      [PATCH] paravirt: fix parameter names in mmu operations · dfbea0ad
      Zachary Amsden 提交于
      Make parameter names match function argument names for the yet to be defined
      pte_update_defer accessor.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      dfbea0ad
    • R
      [PATCH] paravirt: Add MMU virtualization to paravirt_ops · da181a8b
      Rusty Russell 提交于
      Add the three bare TLB accessor functions to paravirt-ops.  Most amusingly,
      flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
      Instead, I chose to indicate the actual flush type, kernel (global) vs. user
      (non-global).  Global in this sense means using the global bit in the page
      table entry, which makes TLB entries persistent across CR3 reloads, not
      global as in the SMP sense of invoking remote shootdowns, so the term is
      confusingly overloaded.
      
      AK: folded in fix from Zach for PAE compilation
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      da181a8b
  7. 01 10月, 2006 4 次提交
    • Z
      [PATCH] paravirt: update pte hook · 789e6ac0
      Zachary Amsden 提交于
      Add a pte_update_hook which notifies about pte changes that have been made
      without using the set_pte / clear_pte interfaces.  This allows shadow mode
      hypervisors which do not trap on page table access to maintain synchronized
      shadows.
      
      It also turns out, there was one pte update in PAE mode that wasn't using any
      accessor interface at all for setting NX protection.  Considering it is PAE
      specific, and the accessor is i386 specific, I didn't want to add a generic
      encapsulation of this behavior yet.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      789e6ac0
    • Z
      [PATCH] paravirt: optimize ptep establish for pae · d6d861e3
      Zachary Amsden 提交于
      The ptep_establish macro is only used on user-level PTEs, for P->P mapping
      changes.  Since these always happen under protection of the pagetable lock,
      the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
      even a lock prefix needs to be used.  We can simply instead clear the P-bit,
      followed by a normal set.  The write ordering is still important to avoid the
      possibility of the TLB snooping a partially written PTE and getting a bad
      mapping installed.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d6d861e3
    • Z
      [PATCH] paravirt: kpte flush · 23002d88
      Zachary Amsden 提交于
      Create a new PTE function which combines clearing a kernel PTE with the
      subsequent flush.  This allows the two to be easily combined into a single
      hypercall or paravirt-op.  More subtly, reverse the order of the flush for
      kmap_atomic.  Instead of flushing on establishing a mapping, flush on clearing
      a mapping.  This eliminates the possibility of leaving stale kmap entries
      which may still have valid TLB mappings.  This is required for direct mode
      hypervisors, which need to reprotect all mappings of a given page when
      changing the page type from a normal page to a protected page (such as a page
      table or descriptor table page).  But it also provides some nicer semantics
      for real hardware, by providing extra debug-proofing against using stale
      mappings, as well as ensuring that no stale mappings exist when changing the
      cacheability attributes of a page, which could lead to cache conflicts when
      two different types of mappings exist for the same page.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      23002d88
    • Z
      [PATCH] paravirt: combine flush accessed dirty.patch · 25e4df5b
      Zachary Amsden 提交于
      Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
      dominating functions, ptep_clear_flush_{dirty|young}.  This allows the TLB
      page flush to be contained in the same macro, and allows for an eager
      optimization - if reading the PTE initially returned dirty/accessed, we can
      assume the fact that no subsequent update to the PTE which cleared accessed /
      dirty has occurred, as the only way A/D bits can change without holding the
      page table lock is if a remote processor clears them.  This eliminates an
      extra branch which came from the generic version of the code, as we know that
      no other CPU could have cleared the A/D bit, so the flush will always be
      needed.
      
      We still export these two defines, even though we do not actually define
      the macros in the i386 code:
      
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
       #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
      
      The reason for this is that the only use of these functions is within the
      generic clear_flush functions, and we want a strong guarantee that there
      are no other users of these functions, so we want to prevent the generic
      code from defining them for us.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      25e4df5b
  8. 26 9月, 2006 4 次提交
    • R
      [PATCH] x86: trivial move of ptep_set_access_flags · 2965a0e6
      Rusty Russell 提交于
      Move ptep_set_access_flags to be closer to the other ptep accessors, and make
      the indentation standard.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2965a0e6
    • R
      [PATCH] x86: trivial move of __HAVE macros in i386 pagetable headers · 6049742d
      Rusty Russell 提交于
      Move the __HAVE_ARCH_PTEP defines to accompany the function definitions.
      Anything else is just a complete nightmare to track through the 2/3-level
      paging code, and this caused duplicate definitions to be needed (pte_same),
      which could have easily been taken care of with the asm-generic pgtable
      functions.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6049742d
    • D
      [PATCH] Standardize pxx_page macros · 46a82b2d
      Dave McCracken 提交于
      One of the changes necessary for shared page tables is to standardize the
      pxx_page macros.  pte_page and pmd_page have always returned the struct
      page associated with their entry, while pte_page_kernel and pmd_page_kernel
      have returned the kernel virtual address.  pud_page and pgd_page, on the
      other hand, return the kernel virtual address.
      
      Shared page tables needs pud_page and pgd_page to return the actual page
      structures.  There are very few actual users of these functions, so it is
      simple to standardize their usage.
      
      Since this is basic cleanup, I am submitting these changes as a standalone
      patch.  Per Hugh Dickins' comments about it, I am also changing the
      pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
      Signed-off-by: NDave McCracken <dmccr@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      46a82b2d
    • R
      [PATCH] i386: Replace i386 open-coded cmdline parsing with · 1a3f239d
      Rusty Russell 提交于
      This patch replaces the open-coded early commandline parsing
      throughout the i386 boot code with the generic mechanism (already used
      by ppc, powerpc, ia64 and s390).  The code was inconsistent with
      whether it deletes the option from the cmdline or not, meaning some of
      these will get passed through the environment into init.
      
      This transformation is mainly mechanical, but there are some notable
      parts:
      
      1) Grammar: s/linux never set's it up/linux never sets it up/
      
      2) Remove hacked-in earlyprintk= option scanning.  When someone
         actually implements CONFIG_EARLY_PRINTK, then they can use
         early_param().
      [AK: actually it is implemented, but I'm adding the early_param it in the next
      x86-64 patch]
      
      3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
      
      4) Various parameters now moved into their appropriate files (thanks Andi).
      
      5) All parse functions which examine arg need to check for NULL,
         except one where it has subtle humor value.
      
      AK: readded acpi_sci handling which was completely dropped
      AK: moved some more variables into acpi/boot.c
      
      Cc: len.brown@intel.com
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      1a3f239d
  9. 28 4月, 2006 1 次提交
    • Z
      [PATCH] x86/PAE: Fix pte_clear for the >4GB RAM case · 6e5882cf
      Zachary Amsden 提交于
      Proposed fix for ptep_get_and_clear_full PAE bug.  Pte_clear had the same bug,
      so use the same fix for both.  Turns out pmd_clear had it as well, but pgds
      are not affected.
      
      The problem is rather intricate.  Page table entries in PAE mode are 64-bits
      wide, but the only atomic 8-byte write operation available in 32-bit mode is
      cmpxchg8b, which is expensive (at least on P4), and thus avoided.  But it can
      happen that the processor may prefetch entries into the TLB in the middle of an
      operation which clears a page table entry.  So one must always clear the P-bit
      in the low word of the page table entry first when clearing it.
      
      Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
      the compiler, it must be coded explicitly as a clear of the low word followed
      by a clear of the high word.  Further, there must be a write memory barrier
      here to enforce proper ordering by the compiler (and, in the future, by the
      processor as well).
      
      On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
      deficient, as it could leave virtual mappings of physical memory above 4GB
      aliased to memory below 4GB in the TLB.  The implementation of
      ptep_get_and_clear_full has a similar bug, although not nearly as likely to
      occur, since the mappings being cleared are in the process of being destroyed,
      and should never be dereferenced again.
      
      But, as luck would have it, it is possible to trigger bugs even without ever
      dereferencing these bogus TLB mappings, even if the clear is followed fairly
      soon after with a TLB flush or invalidation.  The problem is that memory above
      4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
      region of memory with non-memory semantics.  These regions include AGP and PCI
      space.  As such, these memory regions are not cached by the processor.  This
      introduces the bug.
      
      The processor can speculate memory operations, including memory writes, as long
      as they are committed with the proper ordering.  Speculating a memory write to
      a linear address that has a bogus TLB mapping is possible.  Normally, the
      speculation is harmless.  But for cached memory, it does leave the falsely
      speculated cacheline unmodified, but in a dirty state.  This cache line will be
      eventually written back.  If this cacheline happens to intersect a region of
      memory that is not protected by the cache coherency protocol, it can corrupt
      data in I/O memory, which is generally a very bad thing to do, and can cause
      total system failure or just plain undefined behavior.
      
      These bugs are extremely unlikely, but the severity is of such magnitude, and
      the fix so simple that I think fixing them immediately is justified.  Also,
      they are nearly impossible to debug.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6e5882cf
  10. 26 4月, 2006 1 次提交
  11. 22 3月, 2006 1 次提交
    • Z
      [PATCH] Enable mprotect on huge pages · 8f860591
      Zhang, Yanmin 提交于
      2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
      mprotect.
      
      From: David Gibson <david@gibson.dropbear.id.au>
      
        Remove a test from the mprotect() path which checks that the mprotect()ed
        range on a hugepage VMA is hugepage aligned (yes, really, the sense of
        is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
      
        In fact, we don't need this test.  If the given addresses match the
        beginning/end of a hugepage VMA they must already be suitably aligned.  If
        they don't, then mprotect_fixup() will attempt to split the VMA.  The very
        first test in split_vma() will check for a badly aligned address on a
        hugepage VMA and return -EINVAL if necessary.
      
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
        On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
        identify of hugetlb pte is lost when changing page protection via mprotect.
        A page fault occurs later will trigger a bug check in huge_pte_alloc().
      
        The fix is to always make new pte a hugetlb pte and also to clean up
        legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
      Signed-off-by: NZhang Yanmin <yanmin.zhang@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NKen Chen <kenneth.w.chen@intel.com>
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8f860591
  12. 07 11月, 2005 1 次提交
  13. 31 10月, 2005 2 次提交
  14. 30 10月, 2005 1 次提交
    • H
      [PATCH] mm: pte_offset_map_lock loops · 705e87c0
      Hugh Dickins 提交于
      Convert those common loops using page_table_lock on the outside and
      pte_offset_map within to use just pte_offset_map_lock within instead.
      
      These all hold mmap_sem (some exclusively, some not), so at no level can a
      page table be whipped away from beneath them.  But whereas pte_alloc loops
      tested with the "atomic" pmd_present, these loops are testing with pmd_none,
      which on i386 PAE tests both lower and upper halves.
      
      That's now unsafe, so add a cast into pmd_none to test only the vital lower
      half: we lose a little sensitivity to a corrupt middle directory, but not
      enough to worry about.  It appears that i386 and UML were the only
      architectures vulnerable in this way, and pgd and pud no problem.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      705e87c0
  15. 13 9月, 2005 1 次提交
  16. 05 9月, 2005 4 次提交
    • Z
      [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Zachary Amsden 提交于
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anwhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
      
      Note that I ommitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure, rather it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d7271b14
    • Z
      [PATCH] x86: ptep_clear optimization · a600388d
      Zachary Amsden 提交于
      Add a new accessor for PTEs, which passes the full hint from the mmu_gather
      struct; this allows architectures with hardware pagetables to optimize away
      atomic PTE operations when destroying an address space.  Removing the
      locked operation should allow better pipelining of memory access in this
      loop.  I measured an average savings of 30-35 cycles per zap_pte_range on
      the first 500 destructions on Pentium-M, but I believe the optimization
      would win more on older processors which still assert the bus lock on xchg
      for an exclusive cacheline.
      
      Update: I made some new measurements, and this saves exactly 26 cycles over
      ptep_get_and_clear on Pentium M.  On P4, with a PAE kernel, this saves 180
      cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
      full address space destruction.
      
      pte_clear_full is not yet used, but is provided for future optimizations
      (in particular, when running inside of a hypervisor that queues page table
      updates, the full hint allows us to avoid queueing unnecessary page table
      update for an address space in the process of being destroyed.
      
      This is not a huge win, but it does help a bit, and sets the stage for
      further hypervisor optimization of the mm layer on all architectures.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a600388d
    • A
      [PATCH] hugetlb: add pte_huge() macro · 32e51a8c
      Adam Litke 提交于
      This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
      patch later in the series.  Instead of repeating (_PAGE_PRESENT |
      _PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
      Signed-off-by: NAdam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      32e51a8c
    • P
      [PATCH] mm: correct _PAGE_FILE comment · 9b4ee40e
      Paolo 'Blaisorblade' Giarrusso 提交于
      _PAGE_FILE does not indicate whether a file is in page / swap cache, it is
      set just for non-linear PTE's.  Correct the comment for i386, x86_64, UML.
      Also clearify _PAGE_NONE.
      Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9b4ee40e
  17. 24 6月, 2005 1 次提交
  18. 22 6月, 2005 1 次提交
    • D
      [PATCH] Hugepage consolidation · 63551ae0
      David Gibson 提交于
      A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
      attempts to consolidate a lot of the code across the arch's, putting the
      combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
      order to covert all the hugepage archs, but the result is a very large
      reduction in the total amount of code.  It also means things like hugepage
      lazy allocation could be implemented in one place, instead of six.
      
      Tested, at least a little, on ppc64, i386 and x86_64.
      
      Notes:
      	- this patch changes the meaning of set_huge_pte() to be more
      	  analagous to set_pte()
      	- does SH4 need s special huge_ptep_get_and_clear()??
      Acked-by: NWilliam Lee Irwin <wli@holomorphy.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      63551ae0
  19. 01 5月, 2005 1 次提交
    • J
      [PATCH] misc verify_area cleanups · e49332bd
      Jesper Juhl 提交于
      There were still a few comments left refering to verify_area, and two
      functions, verify_area_skas & verify_area_tt that just wrap corresponding
      access_ok_skas & access_ok_tt functions, just like verify_area does for
      access_ok - deprecate those.
      
      There was also a few places that still used verify_area in commented-out
      code, fix those up to use access_ok.
      
      After applying this one there should not be anything left but finally
      removing verify_area completely, which will happen after a kernel release
      or two.
      Signed-off-by: NJesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e49332bd