1. 07 Jan, 2009 4 commits
  2. 13 Nov, 2008 1 commit
  3. 07 Nov, 2008 2 commits
    • A
      hugetlb: pull gigantic page initialisation out of the default path · 18229df5
      Andy Whitcroft committed
      As we can determine exactly when a gigantic page is in use we can optimise
      the common regular page cases by pulling out gigantic page initialisation
      into its own function.  As gigantic pages are never released to buddy we
      do not need a destructor.  This effectively reverts the previous change to
      the main buddy allocator.  It also adds a paranoid check to ensure we
      never release gigantic pages from hugetlbfs to the main buddy.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18229df5
    • A
      hugetlbfs: handle pages higher order than MAX_ORDER · 69d177c2
      Andy Whitcroft committed
      When working with hugepages, hugetlbfs assumes that those hugepages are
      smaller than MAX_ORDER.  Specifically it assumes that the mem_map is
      contiguous and uses that to optimise access to the elements of the mem_map
      that represent the hugepage.  Gigantic pages (such as 16GB pages on
      powerpc) by definition are of greater order than MAX_ORDER (larger than
      MAX_ORDER_NR_PAGES in size).  This means that we can no longer make use of
      the buddy allocator guarantees for the contiguity of the mem_map, which
      ensures that the mem_map is at least contiguous for maximally aligned
      areas of MAX_ORDER_NR_PAGES pages.
      
      This patch adds new mem_map accessors and iterator helpers which handle
      any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then uses these to
      implement gigantic page versions of copy_huge_page and clear_huge_page,
      and to allow follow_hugetlb_page to handle gigantic pages.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69d177c2
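The accessor and iterator idea described in this commit can be sketched in userspace as follows. This is a toy model under stated assumptions, not the patch's code: every `toy_*` name, the block size of 4, and the padding used to break contiguity are hypothetical, and `base` is assumed to be aligned to a block boundary (as the head of a gigantic page would be).

```c
#include <stddef.h>

/* Toy userspace model; every toy_* name and constant is hypothetical. */
#define TOY_MAX_ORDER_NR_PAGES 4UL	/* must be a power of two */
#define TOY_NR_BLOCKS          3UL

struct toy_page { unsigned long pfn; };

/* The gap keeps consecutive blocks from being pointer-contiguous. */
struct toy_block {
	struct toy_page pages[TOY_MAX_ORDER_NR_PAGES];
	char gap[64];
};

static struct toy_block toy_mem_map[TOY_NR_BLOCKS];

static void toy_init(void)
{
	for (unsigned long b = 0; b < TOY_NR_BLOCKS; b++)
		for (unsigned long i = 0; i < TOY_MAX_ORDER_NR_PAGES; i++)
			toy_mem_map[b].pages[i].pfn = b * TOY_MAX_ORDER_NR_PAGES + i;
}

/* Slow but always correct lookup; NULL when the pfn is out of range. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	if (pfn >= TOY_NR_BLOCKS * TOY_MAX_ORDER_NR_PAGES)
		return NULL;
	return &toy_mem_map[pfn / TOY_MAX_ORDER_NR_PAGES]
		.pages[pfn % TOY_MAX_ORDER_NR_PAGES];
}

/* Random access: pointer arithmetic is only trusted inside the first block. */
static struct toy_page *toy_mem_map_offset(struct toy_page *base,
					   unsigned long offset)
{
	if (offset >= TOY_MAX_ORDER_NR_PAGES)
		return toy_pfn_to_page(base->pfn + offset);
	return base + offset;
}

/* Iteration: redo the lookup only when stepping onto a block boundary. */
static struct toy_page *toy_mem_map_next(struct toy_page *iter,
					 struct toy_page *base,
					 unsigned long offset)
{
	if ((offset & (TOY_MAX_ORDER_NR_PAGES - 1)) == 0)
		return toy_pfn_to_page(base->pfn + offset);
	return iter + 1;
}

/* Walk nr pages from an aligned base; returns 1 if every pfn is as expected. */
static int toy_walk_ok(struct toy_page *base, unsigned long nr)
{
	struct toy_page *p = base;

	for (unsigned long i = 0; i < nr; ) {
		if (p == NULL || p->pfn != base->pfn + i)
			return 0;
		i++;
		p = toy_mem_map_next(p, base, i);
	}
	return 1;
}
```

The point of the split is that the common step stays a plain pointer increment, and the more expensive lookup runs only once per block.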
  4. 23 Oct, 2008 1 commit
  5. 20 Oct, 2008 3 commits
    • K
      hugepage: support ZERO_PAGE() · 4b2e38ad
      KOSAKI Motohiro committed
      Presently hugepage doesn't use zero page at all because zero page is only
      used for coredumping and hugepage can't core dump.
      
      However we have now implemented hugepage coredumping.  Therefore we should
      implement the zero page of hugepage.
      
      Implementation note:
      
      o Why do we only check VM_SHARED for zero page?
        Normal pages are checked as follows:
      
      	static inline int use_zero_page(struct vm_area_struct *vma)
      	{
      	        if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
      	                return 0;
      
      	        return !vma->vm_ops || !vma->vm_ops->fault;
      	}
      
      First, hugepages are never mlock()ed.  We aren't concerned with VM_LOCKED.
      
      Second, hugetlbfs is a pseudo filesystem, not a real filesystem and it
      doesn't have any file backing.  Thus ops->fault checking is meaningless.
      
      o Why don't we use zero page if !pte?
      
      !pte indicates that the {pud, pmd} doesn't exist or that some error
      happened.  So we shouldn't return zero page if any error occurred.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
      Cc: Mel Gorman <mel@skynet.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b2e38ad
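The two implementation notes in this commit boil down to a small predicate, sketched here in userspace. It is an illustration only: the `toy_*` types and names are hypothetical, and the `write` parameter is an added assumption (a zero page only makes sense for reads), not something the message states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; all names here are hypothetical. */
#define TOY_VM_SHARED 0x1u

struct toy_vma { unsigned int vm_flags; };
struct toy_pte { bool present; };

/*
 * Decide whether a read of a hugepage address may be served by the zero page.
 * Only VM_SHARED is tested: per the notes, VM_LOCKED and ops->fault checks
 * do not apply to hugepages.
 */
static bool toy_huge_zeropage_ok(const struct toy_pte *ptep, bool write,
				 const struct toy_vma *vma)
{
	if (ptep == NULL)	/* missing pud/pmd or an error: never use zero page */
		return false;
	if (write)		/* assumption: writes need a real page */
		return false;
	if (vma->vm_flags & TOY_VM_SHARED)
		return false;
	return !ptep->present;	/* only an empty entry is a candidate */
}
```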
    • H
      mm: hugetlb.c make functions static, use NULL rather than 0 · 2a4b3ded
      Harvey Harrison committed
      mm/hugetlb.c:265:17: warning: symbol 'resv_map_alloc' was not declared. Should it be static?
      mm/hugetlb.c:277:6: warning: symbol 'resv_map_release' was not declared. Should it be static?
      mm/hugetlb.c:292:9: warning: Using plain integer as NULL pointer
      mm/hugetlb.c:1750:5: warning: symbol 'unmap_ref_private' was not declared. Should it be static?
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a4b3ded
    • R
      vmscan: split LRU lists into anon & file sets · 4f98a2fe
      Rik van Riel committed
      Split the LRU lists in two, one set for pages that are backed by real file
      systems ("file") and one for pages that are backed by memory and swap
      ("anon").  The latter includes tmpfs.
      
      The advantage of doing this is that the VM will not have to scan over lots
      of anonymous pages (which we generally do not want to swap out), just to
      find the page cache pages that it should evict.
      
      This patch has the infrastructure and a basic policy to balance how much
      we scan the anon lists and how much we scan the file lists.  The big
      policy changes are in separate patches.
      
      [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
      [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
      [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
      [hugh@veritas.com: memcg swapbacked pages active]
      [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
      [akpm@linux-foundation.org: fix /proc/vmstat units]
      [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
      [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
      [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4f98a2fe
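A minimal userspace sketch of the list split described in this commit: each page is filed under one of four lists, chosen by whether it is swap-backed ("anon", which would include tmpfs) or file-backed, and whether it is active. The `toy_*` names and the two-flag page struct are hypothetical simplifications, and the balancing policy the message mentions is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list indices: an inactive/active pair per backing type. */
enum toy_lru {
	TOY_LRU_INACTIVE_ANON,
	TOY_LRU_ACTIVE_ANON,
	TOY_LRU_INACTIVE_FILE,
	TOY_LRU_ACTIVE_FILE,
	TOY_NR_LRU
};

struct toy_page {
	bool active;		/* recently referenced */
	bool swap_backed;	/* backed by memory and swap rather than a real file */
};

/* Pick the list for a page: base index by backing type, +1 if active. */
static enum toy_lru toy_page_lru(const struct toy_page *page)
{
	int lru = page->swap_backed ? TOY_LRU_INACTIVE_ANON
				    : TOY_LRU_INACTIVE_FILE;

	if (page->active)
		lru += 1;
	return (enum toy_lru)lru;
}

/* Tally how many pages land on each list. */
static void toy_count_lists(const struct toy_page *pages, size_t n,
			    size_t counts[TOY_NR_LRU])
{
	for (size_t l = 0; l < TOY_NR_LRU; l++)
		counts[l] = 0;
	for (size_t i = 0; i < n; i++)
		counts[toy_page_lru(&pages[i])]++;
}
```

With separate lists, a scan looking for page cache to evict only has to visit the file lists, whose length is independent of how many anonymous pages exist.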
  6. 17 Oct, 2008 1 commit
    • D
      hugetlb: handle updating of ACCESSED and DIRTY in hugetlb_fault() · b4d1d99f
      David Gibson committed
      The page fault path for normal pages, if the fault is neither a no-page
      fault nor a write-protect fault, will update the DIRTY and ACCESSED bits
      in the page table appropriately.
      
      The hugepage fault path, however, does not do this, handling only no-page
      or write-protect type faults.  It assumes that either the ACCESSED and
      DIRTY bits are irrelevant for hugepages (usually true, since they are
      never swapped) or that they are handled by the arch code.
      
      This is inconvenient for some software-loaded TLB architectures, where the
      _PAGE_ACCESSED (_PAGE_DIRTY) bits need to be set to enable read (write)
      access to the page at the TLB miss.  This could be worked around in the
      arch TLB miss code, but the TLB miss fast path can be made simple more
      easily if the hugetlb_fault() path handles this, as the normal page fault
      path does.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4d1d99f
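The fault classification this commit describes can be sketched as a small userspace model. The bit values, the `toy_*` names, and the three-way result enum are hypothetical; the real hugetlb_fault() does considerably more (locking, allocation, copy-on-write) that is left out here.

```c
#include <stdbool.h>

/* Hypothetical pte bits for the toy model. */
#define TOY_PTE_PRESENT  0x1u
#define TOY_PTE_WRITE    0x2u
#define TOY_PTE_ACCESSED 0x4u
#define TOY_PTE_DIRTY    0x8u

enum toy_fault_result {
	TOY_FAULT_NOPAGE,	/* nothing mapped: caller would map a page */
	TOY_FAULT_COW,		/* write to a read-only entry: caller would copy */
	TOY_FAULT_UPDATED	/* neither: only the status bits were updated */
};

static enum toy_fault_result toy_hugetlb_fault(unsigned int *pte, bool write)
{
	if (!(*pte & TOY_PTE_PRESENT))
		return TOY_FAULT_NOPAGE;

	if (write) {
		if (!(*pte & TOY_PTE_WRITE))
			return TOY_FAULT_COW;
		*pte |= TOY_PTE_DIRTY;	/* a permitted write dirties the page */
	}
	*pte |= TOY_PTE_ACCESSED;	/* any resolved access marks it referenced */
	return TOY_FAULT_UPDATED;
}
```

On a software-loaded TLB architecture of the kind the message mentions, the third case is what lets the TLB miss handler find ACCESSED (or DIRTY) already set the next time around.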
  7. 13 Aug, 2008 3 commits
  8. 07 Aug, 2008 1 commit
  9. 02 Aug, 2008 1 commit
  10. 30 Jul, 2008 1 commit
  11. 29 Jul, 2008 2 commits
    • A
      mm/hugetlb.c must #include <asm/io.h> · 78a34ae2
      Adrian Bunk committed
      This patch fixes the following build error on sh caused by commit
      aa888a74 ("hugetlb: support larger than MAX_ORDER"):
      
        mm/hugetlb.c: In function 'alloc_bootmem_huge_page':
        mm/hugetlb.c:958: error: implicit declaration of function 'virt_to_phys'
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78a34ae2
    • A
      mmu-notifiers: core · cddb8a5c
      Andrea Arcangeli committed
      With KVM/GRU/XPMEM there isn't just the primary CPU MMU pointing to pages.
      There are secondary MMUs (with secondary sptes and secondary tlbs) too.
      sptes in the kvm case are shadow pagetables, but when I say spte in
      mmu-notifier context, I mean "secondary pte".  In GRU case there's no
      actual secondary pte and there's only a secondary tlb because the GRU
      secondary MMU has no knowledge about sptes and every secondary tlb miss
      event in the MMU always generates a page fault that has to be resolved by
      the CPU (this is not the case of KVM where a secondary tlb miss will
      walk sptes in hardware and it will refill the secondary tlb transparently
      to software if the corresponding spte is present).  The same way
      zap_page_range has to invalidate the pte before freeing the page, the spte
      (and secondary tlb) must also be invalidated before any page is freed and
      reused.
      
      Currently we take a page_count pin on every page mapped by sptes, but that
      means the pages can't be swapped whenever they're mapped by any spte
      because they're part of the guest working set.  Furthermore a spte unmap
      event can immediately lead to a page being freed when the pin is released
      (so requiring the same complex and relatively slow tlb_gather smp safe
      logic we have in zap_page_range and that can be avoided completely if the
      spte unmap event doesn't require an unpin of the page previously mapped in
      the secondary MMU).
      
      The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know
      when the VM is swapping or freeing or doing anything on the primary MMU so
      that the secondary MMU code can drop sptes before the pages are freed,
      avoiding all page pinning and allowing 100% reliable swapping of guest
      physical address space.  Furthermore it avoids the code that tears down the
      mappings of the secondary MMU, to implement a logic like tlb_gather in
      zap_page_range that would require many IPI to flush other cpu tlbs, for
      each fixed number of spte unmapped.
      
      To make an example: if what happens on the primary MMU is a protection
      downgrade (from writeable to wrprotect) the secondary MMU mappings will be
      invalidated, and the next secondary-mmu-page-fault will call
      get_user_pages and trigger a do_wp_page through get_user_pages if it
      called get_user_pages with write=1, and it'll re-establish an updated
      spte or secondary-tlb-mapping on the copied page.  Or it will set up a
      readonly spte or readonly tlb mapping if it's a guest-read, if it calls
      get_user_pages with write=0.  This is just an example.
      
      This allows mapping any page pointed to by any pte (and in turn visible in
      the primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or a
      full MMU with both sptes and secondary-tlb like the shadow-pagetable layer
      with kvm), or a remote DMA in software like XPMEM (hence needing of
      schedule in XPMEM code to send the invalidate to the remote node, while no
      need to schedule in kvm/gru as it's an immediate event like invalidating
      primary-mmu pte).
      
      At least for KVM without this patch it's impossible to swap guests
      reliably.  And having this feature and removing the page pin allows
      several other optimizations that simplify life considerably.
      
      Dependencies:
      
      1) mm_take_all_locks() to register the mmu notifier when the whole VM
         isn't doing anything with "mm".  This allows mmu notifier users to keep
         track if the VM is in the middle of the invalidate_range_begin/end
         critical section with an atomic counter increased in range_begin and
         decreased in range_end.  No secondary MMU page fault is allowed to map
         any spte or secondary tlb reference, while the VM is in the middle of
         range_begin/end as any page returned by get_user_pages in that critical
         section could later immediately be freed without any further
         ->invalidate_page notification (invalidate_range_begin/end works on
         ranges and ->invalidate_page isn't called immediately before freeing
         the page).  To stop all page freeing and pagetable overwrites the
         mmap_sem must be taken in write mode and all other anon_vma/i_mmap
         locks must be taken too.
      
      2) It'd be a waste to add branches in the VM if nobody could possibly
         run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be enabled if
         CONFIG_KVM=m/y.  In the current kernel kvm won't yet take advantage of
         mmu notifiers, but this already allows compiling a KVM external module
         against a kernel with mmu notifiers enabled and from the next pull from
         kvm.git we'll start using them.  And GRU/XPMEM will also be able to
         continue the development by enabling KVM=m in their config, until they
         submit all GRU/XPMEM GPLv2 code to the mainline kernel.  Then they can
         also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n).
         This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM
         are all =n.
      
      The mmu_notifier_register call can fail because mm_take_all_locks may be
      interrupted by a signal and return -EINTR.  Because mmu_notifier_register
      is used when a driver starts up, a failure can be gracefully handled.  Here
      is an example of the change applied to kvm to register the mmu notifiers.
      Usually when a driver starts up other allocations are required anyway and
      -ENOMEM failure paths exist already.
      
       struct  kvm *kvm_arch_create_vm(void)
       {
              struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
      +       int err;
      
              if (!kvm)
                      return ERR_PTR(-ENOMEM);
      
              INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
      
      +       kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops;
      +       err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm);
      +       if (err) {
      +               kfree(kvm);
      +               return ERR_PTR(err);
      +       }
      +
              return kvm;
       }
      
      mmu_notifier_unregister returns void and it's reliable.
      
      The patch also adds a few needed but missing includes whose absence would
      prevent the kernel from compiling after these changes on non-x86 archs
      (x86 didn't need them by luck).
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix mm/filemap_xip.c build]
      [akpm@linux-foundation.org: fix mm/mmu_notifier.c build]
      Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Izik Eidus <izike@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cddb8a5c
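The counter scheme from dependency 1 of this commit can be sketched with C11 atomics. This is a single-threaded toy under stated assumptions, not the mmu notifier API: the `toy_*` names and the `mapped` field are hypothetical, and a real user would likely also need a way to notice an invalidation that started and finished between looking up a page and installing the mapping, which is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state kept by a secondary-MMU driver. */
struct toy_secondary_mmu {
	atomic_int invalidate_count;	/* >0 while inside range_begin/end */
	int mapped;			/* secondary mappings currently installed */
};

static void toy_mmu_init(struct toy_secondary_mmu *m)
{
	atomic_init(&m->invalidate_count, 0);
	m->mapped = 0;
}

/* Primary MMU is about to change a range: block new mappings, drop old ones. */
static void toy_range_begin(struct toy_secondary_mmu *m)
{
	atomic_fetch_add(&m->invalidate_count, 1);
	m->mapped = 0;	/* toy: drop everything rather than just the range */
}

/* Primary MMU is done with the range. */
static void toy_range_end(struct toy_secondary_mmu *m)
{
	atomic_fetch_sub(&m->invalidate_count, 1);
}

/*
 * Secondary MMU page fault: refuse to install a mapping while any
 * invalidation is in flight, because the page could be freed without
 * any further notification. The caller is expected to retry later.
 */
static bool toy_secondary_fault(struct toy_secondary_mmu *m)
{
	if (atomic_load(&m->invalidate_count) > 0)
		return false;
	m->mapped++;
	return true;
}
```

A counter rather than a flag is used so that overlapping invalidations nest: faults stay blocked until the last range_end.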
  12. 27 Jul, 2008 1 commit
  13. 26 Jul, 2008 1 commit
  14. 25 Jul, 2008 18 commits