1. 03 5月, 2007 5 次提交
    • Z
      [PATCH] i386: pte simplify ops · 9e5e3162
      Zachary Amsden 提交于
      Add comment and condense code to make use of native_local_ptep_get_and_clear
      function.  Also, it turns out the 2-level and 3-level paging definitions were
      identical, so move the common definition into pgtable.h
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      9e5e3162
    • Z
      [PATCH] i386: pte xchg optimization · 142dd975
      Zachary Amsden 提交于
      In situations where page table updates need only be made locally, and there is
      no cross-processor A/D bit races involved, we need not use the heavyweight
      xchg instruction to atomically fetch and clear page table entries.  Instead,
      we can just read and clear them directly.
      
      This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
      operations from page table updates.
      
      Thanks to Michel Lespinasse for noting this potential optimization.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      142dd975
    • Z
      [PATCH] i386: pte clear optimization · c2c1accd
      Zachary Amsden 提交于
      When exiting from an address space, no special hypervisor notification of page
      table updates needs to occur; direct page table hypervisors, such as Xen,
      switch to another address space first (init_mm) and unprotects the page tables
      to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
      mode hypervisors, such as VMI and lhype don't need to do the extra work of
      calling through paravirt-ops, and can just directly clear the page table
      entries without notifiying the hypervisor, since all the page tables are about
      to be freed.
      
      So introduce native_pte_clear functions which bypass any paravirt-ops
      notification.  This results in a significant performance win for VMI and
      removes some indirect calls from zap_pte_range.
      
      Note the 3-level paging already had a native_pte_clear function, thus
      demanding argument conformance and extra args for the 2-level definition.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      c2c1accd
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge 提交于
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
      pgd to page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accomodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course the could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NWilliam Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries · 3dc494e8
      Jeremy Fitzhardinge 提交于
      Add a set of accessors to pack, unpack and modify page table entries
      (at all levels).  This allows a paravirt implementation to control the
      contents of pgd/pmd/pte entries.  For example, Xen uses this to
      convert the (pseudo-)physical address into a machine address when
      populating a pagetable entry, and converting back to pphys address
      when an entry is read.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      3dc494e8
  2. 07 12月, 2006 3 次提交
    • Z
      [PATCH] paravirt: fix missing pte update · 8ecb8950
      Zachary Amsden 提交于
      The function ptep_get_and_clear uses an atomic instruction sequence to get and
      clear an active pte.  Rather than add such an atomic operator to all virtual
      machine implementations in paravirt-ops, it is easier to support the raw
      atomic sequence and use either a trapping writable pagetable approach, or a
      post-update notification.  For the post update notification, we require the
      pte_update function to be called after the access.  Combine the 2-level and
      3-level paging operators into one common function which does the post-update
      notification, and rename the actual atomic sequences to raw_ptep_xxx
      operators.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      8ecb8950
    • Z
      [PATCH] paravirt: Preparatory mmu header movement · a2952d89
      Zachary Amsden 提交于
      Move header includes for the nopud / nopmd types to the location of the actual
      pte / pgd type definitions.  This allows generic 4-level page type code to be
      written before the split 2/3 level page table headers are included.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      a2952d89
    • R
      [PATCH] paravirt: Add MMU virtualization to paravirt_ops · da181a8b
      Rusty Russell 提交于
      Add the three bare TLB accessor functions to paravirt-ops.  Most amusingly,
      flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
      Instead, I chose to indicate the actual flush type, kernel (global) vs. user
      (non-global).  Global in this sense means using the global bit in the page
      table entry, which makes TLB entries persistent across CR3 reloads, not
      global as in the SMP sense of invoking remote shootdowns, so the term is
      confusingly overloaded.
      
      AK: folded in fix from Zach for PAE compilation
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      da181a8b
  3. 01 10月, 2006 1 次提交
    • Z
      [PATCH] paravirt: optimize ptep establish for pae · d6d861e3
      Zachary Amsden 提交于
      The ptep_establish macro is only used on user-level PTEs, for P->P mapping
      changes.  Since these always happen under protection of the pagetable lock,
      the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
      even a lock prefix needs to be used.  We can simply instead clear the P-bit,
      followed by a normal set.  The write ordering is still important to avoid the
      possibility of the TLB snooping a partially written PTE and getting a bad
      mapping installed.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d6d861e3
  4. 26 9月, 2006 1 次提交
  5. 28 4月, 2006 1 次提交
    • Z
      [PATCH] x86/PAE: Fix pte_clear for the >4GB RAM case · 6e5882cf
      Zachary Amsden 提交于
      Proposed fix for ptep_get_and_clear_full PAE bug.  Pte_clear had the same bug,
      so use the same fix for both.  Turns out pmd_clear had it as well, but pgds
      are not affected.
      
      The problem is rather intricate.  Page table entries in PAE mode are 64-bits
      wide, but the only atomic 8-byte write operation available in 32-bit mode is
      cmpxchg8b, which is expensive (at least on P4), and thus avoided.  But it can
      happen that the processor may prefetch entries into the TLB in the middle of an
      operation which clears a page table entry.  So one must always clear the P-bit
      in the low word of the page table entry first when clearing it.
      
      Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
      the compiler, it must be coded explicitly as a clear of the low word followed
      by a clear of the high word.  Further, there must be a write memory barrier
      here to enforce proper ordering by the compiler (and, in the future, by the
      processor as well).
      
      On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
      deficient, as it could leave virtual mappings of physical memory above 4GB
      aliased to memory below 4GB in the TLB.  The implementation of
      ptep_get_and_clear_full has a similar bug, although not nearly as likely to
      occur, since the mappings being cleared are in the process of being destroyed,
      and should never be dereferenced again.
      
      But, as luck would have it, it is possible to trigger bugs even without ever
      dereferencing these bogus TLB mappings, even if the clear is followed fairly
      soon after with a TLB flush or invalidation.  The problem is that memory above
      4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
      region of memory with non-memory semantics.  These regions include AGP and PCI
      space.  As such, these memory regions are not cached by the processor.  This
      introduces the bug.
      
      The processor can speculate memory operations, including memory writes, as long
      as they are committed with the proper ordering.  Speculating a memory write to
      a linear address that has a bogus TLB mapping is possible.  Normally, the
      speculation is harmless.  But for cached memory, it does leave the falsely
      speculated cacheline unmodified, but in a dirty state.  This cache line will be
      eventually written back.  If this cacheline happens to intersect a region of
      memory that is not protected by the cache coherency protocol, it can corrupt
      data in I/O memory, which is generally a very bad thing to do, and can cause
      total system failure or just plain undefined behavior.
      
      These bugs are extremely unlikely, but the severity is of such magnitude, and
      the fix so simple that I think fixing them immediately is justified.  Also,
      they are nearly impossible to debug.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6e5882cf
  6. 23 3月, 2006 1 次提交
    • J
      [PATCH] i386: actively synchronize vmalloc area when registering certain callbacks · 101f12af
      Jan Beulich 提交于
      Registering a callback handler through register_die_notifier() is obviously
      primarily intended for use by modules.  However, the way these currently
      get called it is basically impossible for them to actually be used by
      modules, as there is, on non-PAE configurationes, a good chance (the larger
      the module, the better) for the system to crash as a result.
      
      This is because the callback gets invoked
      
      (a) in the page fault path before the top level page table propagation
          gets carried out (hence a fault to propagate the top level page table
          entry/entries mapping to module's code/data would nest infinitly) and
      
      (b) in the NMI path, where nested faults must absolutely not happen,
          since otherwise the IRET from the nested fault re-enables NMIs,
          potentially resulting in nested NMI occurences.
      
      Besides the modular aspect, similar problems would even arise for in-
      kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
      memory inside their handlers.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      101f12af
  7. 31 10月, 2005 1 次提交
  8. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4