1. 17 Apr 2008 · 2 commits
  2. 12 Mar 2008 · 1 commit
    • x86: remove quicklists · 985a34bd
      Committed by Thomas Gleixner
      Quicklists cause a serious memory leak on 32-bit x86,
      as documented at:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=9991
      
      The reason is that the quicklist pool is a special-purpose
      cache that grows out of proportion. It is not accounted for
      anywhere and users have no way to even realize that it's
      the quicklists that are causing RAM usage spikes. It was
      supposed to be a relatively small pool, but as demonstrated
      by KOSAKI Motohiro, they can grow as large as:
      
        Quicklists:    1194304 kB
      
      Given how much trouble this code has caused historically,
      and given that Andrew objected to its introduction on x86
      (years ago), the best option at this point is to remove them.
      
      [ any performance benefits of caching constructed pgds should
        be implemented in a more generic way (possibly within the page
        allocator), while still allowing constructed pages to be
        allocated by other workloads. ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      985a34bd
  3. 09 Feb 2008 · 1 commit
    • CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Committed by Martin Schwidefsky
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory, the standard pte table allocation should return
      1K/2K (31/64 bit), and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86, pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bits for the return value of pte_alloc_one (and the pte * would not be
      accessible anyway, since it's not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
      To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate, a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
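
      A minimal sketch of the resulting allocation pattern on a struct-page
      architecture (x86-style; the GFP flags and exact signatures here are
      illustrative, not quoted from the patch):

        #include <linux/mm.h>

        /* pgtable_t is a (struct page *) on most architectures */
        pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
        {
                struct page *pte;

                pte = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
                if (pte)
                        pgtable_page_ctor(pte); /* init ptl, count NR_PAGETABLE */
                return pte;
        }

        void pte_free(struct mm_struct *mm, pgtable_t pte)
        {
                pgtable_page_dtor(pte);         /* undo the ctor before freeing */
                __free_page(pte);
        }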
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  4. 06 Feb 2008 · 1 commit
  5. 04 Feb 2008 · 2 commits
  6. 01 Feb 2008 · 1 commit
  7. 30 Jan 2008 · 5 commits
  8. 18 Oct 2007 · 2 commits
  9. 17 Oct 2007 · 1 commit
  10. 11 Oct 2007 · 2 commits
  11. 22 Jul 2007 · 1 commit
  12. 13 May 2007 · 1 commit
  13. 03 May 2007 · 2 commits
    • [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Committed by Jeremy Fitzhardinge
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only four 64-bit entries (32 bytes), but Xen
      requires the PAE pgd to be page aligned anyway, so this patch forces the
      pgd to be page aligned and page sized when the kernel pmd is unshared, to
      accommodate both requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course they could be cached if there is just a single kernel pmd - which is
      the default with a 3G user/kernel split - but it doesn't seem worthwhile to
      add yet another case into this code).
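
      A hedged sketch of the pgd_list propagation pattern described above
      (simplified; the function name is illustrative, and the way the list is
      threaded through unused struct page fields varied across versions):

        /* copy the authoritative kernel entry into every process pgd */
        static void sync_kernel_mapping(unsigned long address)
        {
                unsigned long flags;
                struct page *page;

                spin_lock_irqsave(&pgd_lock, flags);
                for (page = pgd_list; page; page = (struct page *)page->index) {
                        pgd_t *pgd = (pgd_t *)page_address(page) +
                                     pgd_index(address);

                        set_pgd(pgd, *pgd_offset_k(address));
                }
                spin_unlock_irqrestore(&pgd_lock, flags);
        }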
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5311ab62
    • [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO · d4f7a2c1
      Committed by Jeremy Fitzhardinge
      Some versions of libc can't deal with a VDSO which doesn't have its
      ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
      a specific system-wide fixed address.  Previously this was all done at
      build time, on the grounds that the fixed VDSO address is always at
      the top of the address space.  However, a hypervisor may reserve some
      of that address space, pushing the fixmap address down.
      
      This patch does the adjustment dynamically at runtime, depending on
      the runtime location of the VDSO fixmap.
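
      A hedged sketch of that runtime adjustment: every address recorded in
      the VDSO image is shifted by the gap between its link-time address and
      the fixmap address chosen at boot (simplified to program and section
      headers; treat the exact fixups as illustrative):

        #include <linux/elf.h>

        static void relocate_vdso(Elf32_Ehdr *ehdr, unsigned long offset)
        {
                Elf32_Phdr *phdr = (void *)ehdr + ehdr->e_phoff;
                Elf32_Shdr *shdr = (void *)ehdr + ehdr->e_shoff;
                int i;

                ehdr->e_entry += offset;

                for (i = 0; i < ehdr->e_phnum; i++) {   /* mapped segments */
                        phdr[i].p_vaddr += offset;
                        phdr[i].p_paddr += offset;
                }
                for (i = 0; i < ehdr->e_shnum; i++)     /* sections, for libc */
                        shdr[i].sh_addr += offset;
        }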
      
      [ Patch has been through several hands: Jan Beulich wrote the original
        version; Zach reworked it, and Jeremy converted it to relocate phdrs
        as well as sections. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Jan Beulich" <JBeulich@novell.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      d4f7a2c1
  14. 13 Feb 2007 · 2 commits
    • [PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd
      Committed by Zachary Amsden
      Fairly straightforward implementation of VMI backend for paravirt-ops.
      
      [Adrian Bunk: some cleanups]
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      7ce0bcfd
    • [PATCH] MM: page allocation hooks for VMI backend · c119ecce
      Committed by Zachary Amsden
      The VMI backend uses explicit page type notification to track shadow page
      tables.  The allocation of page table roots is especially tricky.  We need to
      clone the root for non-PAE mode while it is protected under the pgd lock to
      correctly copy the shadow.
      
      We don't need to allocate pgds in PAE mode (PDPs in Intel terminology), as
      they only have 4 entries, and are cached entirely by the processor, which
      makes shadowing them rather simple.
      
      For base page table level allocation, pmd_populate provides the exact hook
      point we need.  Also, we need to allocate pages when splitting a large page,
      and we must release pages before returning the page to any free pool.
      
      Despite being required with these slightly odd semantics for VMI, Xen also
      uses these hooks to determine the exact moment when page tables are created or
      released.
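
      A hedged sketch of the hook placement (modeled on the paravirt-ops code
      of that era; paravirt_alloc_pt/paravirt_release_pt are the hook names
      used there, but the exact definitions below are illustrative):

        /* notify the backend just before a page becomes a live page table */
        #define pmd_populate(mm, pmd, pte)                              \
        do {                                                            \
                paravirt_alloc_pt(page_to_pfn(pte));                    \
                set_pmd(pmd, __pmd(_PAGE_TABLE +                        \
                        ((u64)page_to_pfn(pte) << PAGE_SHIFT)));        \
        } while (0)

        /* ...and just before the page goes back to a free pool */
        static inline void pte_free(struct page *pte)
        {
                paravirt_release_pt(page_to_pfn(pte));
                __free_page(pte);
        }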
      
      AK: All nops for other architectures
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      c119ecce
  15. 08 Dec 2006 · 1 commit
  16. 07 Dec 2006 · 1 commit
    • [PATCH] i386: clear_fixmap() should not use set_pte() · b0bfece4
      Committed by Jan Beulich
      While not strictly required with the current code (as the upper half of
      page table entries generated by __set_fixmap() cannot be non-zero due
      to the second parameter of this function being 'unsigned long'), the
      use of set_pte() in __set_fixmap() in the context of clear_fixmap() is
      still improper with CONFIG_X86_PAE (see the respective comment in
      include/asm-i386/pgtable-3level.h) and would turn into a bug if that
      second parameter ever gets changed to a 64-bit type.
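
      The underlying ordering rule, as a hedged sketch (it mirrors the logic
      in the old include/asm-i386/pgtable-3level.h rather than quoting it):
      setting a pte writes the high word first so the present bit appears
      last, while clearing must drop the present bit first, which is why
      set_pte() with a zero pte is the wrong tool for clearing.

        static inline void pae_set_pte(pte_t *ptep, pte_t pte)
        {
                ptep->pte_high = pte.pte_high;  /* high word first... */
                smp_wmb();
                ptep->pte_low = pte.pte_low;    /* ...present bit last */
        }

        static inline void pae_pte_clear(pte_t *ptep)
        {
                ptep->pte_low = 0;              /* present bit goes first */
                smp_wmb();
                ptep->pte_high = 0;             /* then the high word */
        }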
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      b0bfece4
  17. 26 Sep 2006 · 2 commits
  18. 01 Jul 2006 · 6 commits
  19. 28 Mar 2006 · 1 commit
  20. 30 Oct 2005 · 2 commits
    • [PATCH] memory hotplug locking: node_size_lock · 208d54e5
      Committed by Dave Hansen
      pgdat->node_size_lock is basically only needed in one place in the normal
      code: show_mem(), which is the arch-specific sysrq-m printing function.
      
      Strictly speaking, the architectures not doing memory hotplug do not need this
      locking in show_mem().  However, they are all included for completeness.  This
      should also make any future consolidation of all of the implementations a
      little more straightforward.
      
      This lock is also held in the sparsemem code during a memory removal, as
      sections are invalidated.  This is the place where pfn_valid() is made false
      for a memory area that's being removed.  The lock is only required when doing
      pfn_valid() operations on memory for which the caller does not already hold a
      reference on the page, such as in show_mem().
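
      A hedged sketch of the usage pattern (pgdat_resize_lock/unlock are the
      wrappers around node_size_lock from this series; the scan body itself is
      illustrative):

        void show_node_pages(pg_data_t *pgdat)
        {
                unsigned long i, flags, present = 0;

                pgdat_resize_lock(pgdat, &flags);
                for (i = 0; i < pgdat->node_spanned_pages; i++) {
                        unsigned long pfn = pgdat->node_start_pfn + i;

                        if (pfn_valid(pfn))     /* stable while lock is held */
                                present++;
                }
                pgdat_resize_unlock(pgdat, &flags);

                printk(KERN_INFO "Node %d: %lu present pages\n",
                       pgdat->node_id, present);
        }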
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      208d54e5
    • [PATCH] mm: split page table lock · 4c21e2f2
      Committed by Hugh Dickins
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let the preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
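
      A condensed, hedged sketch of the resulting pattern (simplified from the
      include/linux/mm.h of that era; the Kconfig-supplied threshold is
      compared by the preprocessor exactly as described above):

        #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
        /* the lock lives in the struct page of the page-table page */
        #define pte_lockptr(mm, pmd)    (&pmd_page(*(pmd))->ptl)
        #else
        /* small configs keep the single mm-wide lock */
        #define pte_lockptr(mm, pmd)    (&(mm)->page_table_lock)
        #endif

        #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
        ({                                                      \
                spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
                pte_t *__pte = pte_offset_map(pmd, address);    \
                *(ptlp) = __ptl;                                \
                spin_lock(__ptl);                               \
                __pte;                                          \
        })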
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4c21e2f2
  21. 05 Sep 2005 · 1 commit
    • [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Committed by Zachary Amsden
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anywhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
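
      On native hardware the new call is just a typed memcpy; a minimal
      sketch (hypervisor ports substitute explicit per-entry updates):

        static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
        {
                memcpy(dst, src, count * sizeof(pgd_t));
        }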
      
      Note that I omitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure, rather it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d7271b14
  22. 26 Jun 2005 · 1 commit
  23. 24 Jun 2005 · 1 commit