1. 07 Jan 2006, 10 commits
    • [PATCH] Add NUMA policy support for huge pages. · 5da7ca86
      Committed by Christoph Lameter
      The huge_zonelist() function in the memory policy layer provides a list of
      zones ordered by NUMA distance.  The hugetlb layer will walk that list looking
      for a zone that has available huge pages but is also in the nodeset of the
      current cpuset.
      
      This patch does not contain the folding of find_or_alloc_huge_page() that was
      controversial in the earlier discussion.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: William Lee Irwin III <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: dequeue a huge page near to this node · 96df9333
      Committed by Christoph Lameter
      This was discussed at
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113166526217117&w=2
      
      This patch changes the dequeueing to select a huge page near the
      executing node instead of always beginning the search for free nodes at
      node 0.  This results in placement of the huge pages near the executing
      processor, improving performance.
      
      The existing implementation can place the huge pages far away from the
      executing processor causing significant degradation of performance.  The
      search starting from zero also means that the lower zones quickly run out
      of memory.  Selecting a huge page near the process distributes the huge
      pages better.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Hugetlb: Copy on Write support · 1e8f889b
      Committed by David Gibson
      Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
      supported.  This helps us to safely use hugetlb pages in many more
      applications.  The patch makes the following changes.  If needed, I also have
      it broken out according to the following paragraphs.
      
      1. Add a pair of functions to set/clear write access on huge ptes.  The
         writable check in make_huge_pte is moved out to the caller for use by COW
         later.
      
      2. Hugetlb copy-on-write requires special case handling in the following
         situations:
      
         - copy_hugetlb_page_range() - Copied pages must be write protected so
           a COW fault will be triggered (if necessary) if those pages are written
           to.
      
         - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
           page cache.  MAP_PRIVATE pages still need to be locked however.
      
      3. Provide hugetlb_cow(), called from hugetlb_fault() and
         hugetlb_no_page(), which handles the COW fault by making the actual
         copy.
      
      4. Remove the check in hugetlbfs_file_mmap() so that MAP_PRIVATE mmaps
         will be allowed.  Make MAP_HUGETLB exempt from the deprecated
         VM_RESERVED mapping check.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Hugetlb: Reorganize hugetlb_fault to prepare for COW · 86e5216f
      Committed by Adam Litke
      This patch splits the "no_page()" type activity into its own function,
      hugetlb_no_page().  hugetlb_fault() becomes the entry point for hugetlb faults
      and delegates to the appropriate handler depending on the type of fault.
      Right now we still have only hugetlb_no_page() but a later patch introduces a
      COW fault.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Hugetlb: Rename find_lock_page to find_or_alloc_huge_page · 85ef47f7
      Committed by Adam Litke
      find_lock_huge_page() isn't a great name, since it does extra things not
      analogous to find_lock_page().  Rename it find_or_alloc_huge_page() which is
      closer to the mark.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Hugetlb: Remove duplicate i_size check · f0916794
      Committed by Adam Litke
      cleanup
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23
      Committed by Badari Pulavarty
      Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
      given range of pages & its associated backing store.  Current
      implementation supports only shmfs/tmpfs and other filesystems return
      -ENOSYS.
      
      "Some app allocates large tmpfs files, then when some task quits and some
      client disconnect, some memory can be released.  However the only way to
      release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
      
      Databases want to use this feature to drop a section of their bufferpool
      (shared memory segments) - without writing back to disk/swap space.
      
      This feature is also useful for supporting hot-plug memory on UML.
      
      Concerns raised by Andrew Morton:
      
      - "We have no plan for holepunching!  If we _do_ have such a plan (or
        might in the future) then what would the API look like?  I think
        sys_holepunch(fd, start, len), so we should start out with that."
      
      - Using madvise is very weird, because people will ask "why do I need to
        mmap my file before I can stick a hole in it?"
      
      - None of the other madvise operations call into the filesystem in this
        manner.  A broad question is: is this capability an MM operation or a
        filesystem operation?  truncate, for example, is a filesystem operation
        which sometimes has MM side-effects.  madvise is an mm operation and with
        this patch, it gains FS side-effects, only they're really, really
        significant ones.
      
      Comments:
      
      - Andrea suggested the fs operation too, but it's more efficient to have
        it as an mm operation with fs side effects, because the caller doesn't
        immediately know the fd and physical offset of the range.  It's
        possible to fix that up in userland and use the fs operation, but it's
        more expensive; the vmas are already in the kernel and we can use them.
      
      Short-term plan & future direction:
      
      - We seem to need this interface only for shmfs/tmpfs files in the short
        term.  We have to add hooks into the filesystem for correctness and
        completeness.  This is what this patch does.
      
      - In the future, the plan is to support both fs and mmap APIs.  This
        also involves implementing (other) filesystem-specific functions.
      
      - Current patch doesn't support VM_NONLINEAR - which can be addressed in
        the future.
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reiser4: vfs: add truncate_inode_pages_range() · d7339071
      Committed by Hans Reiser
      This patch creates truncate_inode_pages_range() from
      truncate_inode_pages(); truncate_inode_pages() becomes a one-line call
      to truncate_inode_pages_range().
      
      Reiser4 needs truncate_inode_pages_range() because it tries to keep a
      correspondence between the existence of metadata pointing to data pages
      and the pages that metadata points to.  So, when the metadata for a
      certain part of a file is removed from the filesystem tree, only the
      pages of the corresponding range are truncated.
      
      (Needed by the madvise(MADV_REMOVE) patch)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] memhotplug: __add_section remove unused pgdat definition · 5ac24eef
      Committed by Andy Whitcroft
      __add_section defines an unused pointer to the zone's pgdat.  Remove
      this definition.  This fixes a compile warning.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: fix __alloc_pages cpuset ALLOC_* flags · 47f3a867
      Committed by Paul Jackson
      Two changes to the setting of the ALLOC_CPUSET flag in
      mm/page_alloc.c:__alloc_pages()
      
      - A bug fix: the "ignoring mins" case should not be honoring
        ALLOC_CPUSET.  This case, of all cases, is handling a request that
        will free up more memory than it asks for (exiting tasks, e.g.), so it
        should be allowed to escape cpuset constraints when memory is tight.
      
      - A logic change to make it simpler.  Honor cpusets even on GFP_ATOMIC
        (!wait) requests.  With this, cpuset confinement applies to all requests
        except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch, I can
        remove the ALLOC_CPUSET flag entirely.  Since I don't know any real reason
        this logic has to be either way, I am choosing the path of the simplest
        code.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 04 Jan 2006, 1 commit
    • [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE · 994fc28c
      Committed by Zach Brown
      readpage(), prepare_write(), and commit_write() callers are updated to
      understand the special return code AOP_TRUNCATED_PAGE in the style of
      writepage() and WRITEPAGE_ACTIVATE.  AOP_TRUNCATED_PAGE tells the caller that
      the callee has unlocked the page and that the operation should be tried again
      with a new page.  OCFS2 uses this to detect and work around a lock inversion in
      its aop methods.  There should be no change in behaviour for methods that don't
      return AOP_TRUNCATED_PAGE.
      
      WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
      made enums so that kerneldoc can be used to document their semantics.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
  3. 03 Jan 2006, 1 commit
  4. 17 Dec 2005, 1 commit
  5. 16 Dec 2005, 1 commit
  6. 14 Dec 2005, 1 commit
  7. 13 Dec 2005, 2 commits
    • get_user_pages: don't try to follow PFNMAP pages · 1ff80389
      Committed by Linus Torvalds
      Nick Piggin points out that a few drivers play games with VM_IO (why?
      who knows..) and thus a pfn-remapped area may not have that bit set even
      if remap_pfn_range() set it originally.
      
      So make it explicit in get_user_pages() that we don't follow VM_PFNMAP
      pages, since pretty much by definition they do not have a "struct page"
      associated with them.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix in __alloc_bootmem_core() when there is no free page in first node's memory · 66d43e98
      Committed by Haren Myneni
      We hit the BUG_ON() in __alloc_bootmem_core() when there is no free page
      available in the first node's memory.  In the case of kdump on PPC64 (a
      Power 4 machine), the capture kernel uses two memory regions: memory for
      TCE tables (tce-base and tce-size, at the top of RAM and reserved) and
      the captured kernel's memory region (crashk_base and crashk_size).
      Since we reserve the memory in the first node, we should be returning
      from __alloc_bootmem_core() to search the next node (pgdat).
      
      Currently, find_next_zero_bit() returns the n^th bit (eidx) when there
      is no free page.  Then test_bit() fails, since we initially set 0xff
      only for the actual size (in init_bootmem_core()) even though
      bdata->node_bootmem_map is rounded up to one page.  We hit the BUG_ON
      after failing to enter the second "for" loop.
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 12 Dec 2005, 3 commits
  9. 04 Dec 2005, 2 commits
  10. 01 Dec 2005, 1 commit
    • VM: add "vm_insert_page()" function · a145dd41
      Committed by Linus Torvalds
      This is what a lot of drivers will actually want to use to insert
      individual pages into a user VMA.  It doesn't have the old PageReserved
      restrictions of remap_pfn_range(), and it doesn't complain about partial
      remappings.
      
      The page you insert needs to be a nice clean kernel allocation, so you
      can't insert arbitrary page mappings with this, but that's not what
      people want.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 30 Nov 2005, 7 commits
  12. 29 Nov 2005, 6 commits
    • [PATCH] Fix vma argument in get_user_pages() for gate areas · fa2a455b
      Committed by Nick Piggin
      The system call gate area handling called vm_normal_page() with the
      wrong vma (which was always NULL, and caused an oops).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] shrinker->nr = LONG_MAX means deadlock for icache · ea164d73
      Committed by Andrea Arcangeli
      With Andrew Morton <akpm@osdl.org>
      
      The slab scanning code tries to balance the scanning rate of slabs versus the
      scanning rate of LRU pages.  To do this, it retains state concerning how many
      slabs have been scanned - if a particular slab shrinker didn't scan enough
      objects, we remember that for next time, and scan more objects on the next
      pass.
      
      The problem with this is that with (say) a huge number of GFP_NOIO
      direct-reclaim attempts, the number of objects to be scanned when we
      finally get a GFP_KERNEL request can be huge, because some shrinker
      handlers just bail out if !__GFP_FS.
      
      So the patch clamps the number of objects-to-be-scanned to 2* the total number
      of objects in the slab cache.
      Signed-off-by: Andrea Arcangeli <andrea@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Committed by Rik van Riel
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - the system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: __alloc_pages cleanup fix · 3148890b
      Committed by Nick Piggin
      I believe this patch is required to fix breakage in the asynch reclaim
      watermark logic introduced by this patch:
      
      http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7fb1d9fca5c6e3b06773b69165a73f3fb786b8ee
      
      Just some background of the watermark logic in case it isn't clear...
      Basically what we have is this:
      
       ---  pages_high
         |
         | (a)
         |
       ---  pages_low
         |
         | (b)
         |
       ---  pages_min
         |
         | (c)
         |
       ---  0
      
      Now when pages_low is reached, we want to kick asynch reclaim, which gives us
      an interval of "b" before we must start synch reclaim, and gives kswapd an
      interval of "a" before it needs to go back to sleep.
      
      When pages_min is reached, normal allocators must enter synch reclaim, but
      PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie.  atomic allocations, recursive
      allocations, etc.) get access to varying amounts of the reserve "c".
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Workaround for gcc 2.96 (undefined references) · e0f39591
      Committed by Alan Stern
        LD      .tmp_vmlinux1
      mm/built-in.o(.text+0x100d6): In function `copy_page_range':
      : undefined reference to `__pud_alloc'
      mm/built-in.o(.text+0x1010b): In function `copy_page_range':
      : undefined reference to `__pmd_alloc'
      mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
      : undefined reference to `__pud_alloc'
      fs/built-in.o(.text+0xc930): In function `install_arg_page':
      : undefined reference to `__pud_alloc'
      make: *** [.tmp_vmlinux1] Error 1
      
      Those missing references in mm/memory.c arise from this code in
      include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
      __PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
      
      /*
       * The following ifdef needed to get the 4level-fixup.h header to work.
       * Remove it when 4level-fixup.h has been removed.
       */
      #if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
      static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
      {
              return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
                      NULL: pud_offset(pgd, address);
      }
      
      static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
      {
              return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
                      NULL: pmd_offset(pud, address);
      }
      #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
      
      With my configuration the pgd_none and pud_none routines are inlines
      returning a constant 0.  Apparently the old compiler avoids generating
      calls to __pud_alloc and __pmd_alloc but still lists them as undefined
      references in the module's symbol table.
      
      I don't know which change caused this problem.  I think it was added
      somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
      several 2.6.14-rc kernels without difficulty.  However I can't point to an
      individual culprit.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • mm: re-architect the VM_UNPAGED logic · 6aab341e
      Committed by Linus Torvalds
      This replaces the (in my opinion horrible) VM_UNPAGED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 24 Nov 2005, 2 commits
  14. 23 Nov 2005, 2 commits
    • [PATCH] hugetlb: fix race in set_max_huge_pages for multiple updaters of nr_huge_pages · 0bd0f9fb
      Committed by Eric Paris
      If there are multiple updaters to /proc/sys/vm/nr_hugepages simultaneously
      it is possible for the nr_huge_pages variable to become incorrect.  There
      is no locking in the set_max_huge_pages function around
      alloc_fresh_huge_page which is able to update nr_huge_pages.  Two callers
      to alloc_fresh_huge_page could race against each other as could a call to
      alloc_fresh_huge_page and a call to update_and_free_page.  This patch just
      expands the area covered by the hugetlb_lock to cover the call into
      alloc_fresh_huge_page.  I don't see how a sysctl path could be
      considered performance-critical enough to justify finer-grained locking.
      
      My reproducer was to run a couple of copies of the following script
      simultaneously:
      
      while [ true ]; do
      	echo 1000 > /proc/sys/vm/nr_hugepages
      	echo 500 > /proc/sys/vm/nr_hugepages
      	echo 750 > /proc/sys/vm/nr_hugepages
      	echo 100 > /proc/sys/vm/nr_hugepages
      	echo 0 > /proc/sys/vm/nr_hugepages
      done
      
      and then watch /proc/meminfo and eventually you will see things like
      
      HugePages_Total:     100
      HugePages_Free:      109
      
      After applying the patch all seemed well.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: William Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unpaged: PG_reserved bad_page · 689bcebf
      Committed by Hugh Dickins
      It used to be the case that PG_reserved pages were silently never freed, but
      in 2.6.15-rc1 they may be freed with a "Bad page state" message.  We should
      work through such cases as they appear, fixing the code; but for now it's
      safer to issue the message without freeing the page, leaving PG_reserved set.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>