1. 07 January 2006, 18 commits
    • [PATCH] Remove old node based policy interface from mempolicy.c · 21abb147
      Committed by Christoph Lameter
      mempolicy.c contains a provisional interface for huge page allocation based
      on node numbers.  This is in use in SLES9 but was never used (AFAIK) in
      upstream versions of Linux.
      
      Huge page allocations now use zonelists to figure out where to allocate
      pages.  Using zonelists lets us find the closest huge page, which was the
      point of considering NUMA distance for huge page allocations.
      
      Remove the obsolete functions.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: William Lee Irwin III <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Acked-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add NUMA policy support for huge pages. · 5da7ca86
      Committed by Christoph Lameter
      The huge_zonelist() function in the memory policy layer provides a list of
      zones ordered by NUMA distance.  The hugetlb layer will walk that list
      looking for a zone that has available huge pages and is also in the nodeset
      of the current cpuset (a toy sketch of the walk follows this entry).
      
      This patch does not contain the folding of find_or_alloc_huge_page() that was
      controversial in the earlier discussion.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: William Lee Irwin III <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
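      For illustration, a minimal userspace sketch (not the kernel code) of the
      walk described above: try nodes in NUMA-distance order, skipping any node
      outside the cpuset's nodeset.  The node count, free-page counts, zonelist
      ordering, and the bitmask standing in for the nodeset are all invented.

      #include <stdio.h>

      #define MAX_NUMNODES 4

      /* invented free huge page counts per node */
      static int free_huge_pages[MAX_NUMNODES] = { 0, 2, 0, 5 };

      /* invented zonelist: node IDs ordered by NUMA distance from the caller */
      static const int zonelist[MAX_NUMNODES] = { 1, 0, 3, 2 };

      /* walk the distance-ordered list, skipping nodes outside the cpuset */
      static int pick_huge_page_node(unsigned int cpuset_nodes)
      {
              for (int i = 0; i < MAX_NUMNODES; i++) {
                      int nid = zonelist[i];

                      if (!(cpuset_nodes & (1u << nid)))
                              continue;  /* not in the current cpuset's nodeset */
                      if (free_huge_pages[nid] > 0)
                              return nid;  /* closest allowed node with a page */
              }
              return -1;  /* no huge page available */
      }

      int main(void)
      {
              /* cpuset allows nodes 0 and 3 only */
              printf("allocate on node %d\n",
                     pick_huge_page_node((1u << 0) | (1u << 3)));
              return 0;
      }

      With nodes 0 and 3 allowed and node 0 empty, the walk settles on node 3,
      the closest allowed node with a free huge page.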
    • [PATCH] mm: dequeue a huge page near to this node · 96df9333
      Committed by Christoph Lameter
      This was discussed at
      http://marc.theaimsgroup.com/?l=linux-kernel&m=113166526217117&w=2
      
      This patch changes the dequeueing to select a huge page near the executing
      node instead of always starting the check for free pages at node 0.  This
      results in huge pages being placed near the executing processor, improving
      performance.
      
      The existing implementation can place the huge pages far away from the
      executing processor, causing significant degradation of performance.  A
      search starting from zero also means that the lower zones quickly run out
      of memory.  Selecting a huge page near the process distributes the huge
      pages better (the two behaviours are contrasted in the sketch after this
      entry).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
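      A toy contrast of the old and new behaviour, with invented per-node
      counts; the real patch walks the zonelist built for the executing node
      rather than wrapping an index.

      #include <stdio.h>

      #define MAX_NUMNODES 4

      static int free_huge_pages[MAX_NUMNODES] = { 1, 1, 3, 1 };

      /* old behaviour: always scan nodes starting from node 0 */
      static int dequeue_from_zero(void)
      {
              for (int nid = 0; nid < MAX_NUMNODES; nid++)
                      if (free_huge_pages[nid] > 0)
                              return nid;
              return -1;
      }

      /* new behaviour: scan starting at the executing node, wrapping around
       * (a stand-in for walking that node's distance-ordered zonelist) */
      static int dequeue_near(int this_node)
      {
              for (int i = 0; i < MAX_NUMNODES; i++) {
                      int nid = (this_node + i) % MAX_NUMNODES;

                      if (free_huge_pages[nid] > 0)
                              return nid;
              }
              return -1;
      }

      int main(void)
      {
              printf("old: node %d, new: node %d\n",
                     dequeue_from_zero(), dequeue_near(2));
              return 0;
      }

      Run on "node 2", the old scan returns node 0 while the new one returns
      the local node 2.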
    • [PATCH] Hugetlb: Copy on Write support · 1e8f889b
      Committed by David Gibson
      Implement copy-on-write support for hugetlb mappings so that MAP_PRIVATE
      can be supported.  This helps us to safely use hugetlb pages in many more
      applications (a userspace view of the semantics is sketched after this
      entry).  The patch makes the following changes.  If needed, I also have it
      broken out according to the following paragraphs.
      
      1. Add a pair of functions to set/clear write access on huge ptes.  The
         writable check in make_huge_pte is moved out to the caller for use by COW
         later.
      
      2. Hugetlb copy-on-write requires special case handling in the following
         situations:
      
         - copy_hugetlb_page_range() - Copied pages must be write protected so
           a COW fault will be triggered (if necessary) if those pages are written
           to.
      
         - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
           page cache.  MAP_PRIVATE pages still need to be locked however.
      
      3. Provide hugetlb_cow(), called from hugetlb_fault() and
         hugetlb_no_page(), which handles the COW fault by making the actual copy.
      
      4. Remove the check in hugetlbfs_file_mmap() so that MAP_PRIVATE mmaps
         will be allowed.  Make MAP_HUGETLB exempt from the deprecated VM_RESERVED
         mapping check.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
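      A hedged userspace sketch of what the patch enables: MAP_PRIVATE COW
      semantics on a mapped file.  The demo uses a regular file path for
      portability; with this patch the same program works with a file on a
      hugetlbfs mount (mount point assumed, e.g. /mnt/huge).

      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void)
      {
              int fd = open("/tmp/cow-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);

              if (fd < 0 || ftruncate(fd, 4096) < 0) {
                      perror("setup");
                      return 1;
              }

              char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
              char *priv = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE, fd, 0);
              if (shared == MAP_FAILED || priv == MAP_FAILED) {
                      perror("mmap");
                      return 1;
              }

              strcpy(priv, "private");        /* first write triggers a COW fault */
              /* the private copy is not visible through the file: prints "" */
              printf("file sees: \"%s\"\n", shared);
              return 0;
      }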
    • [PATCH] Hugetlb: Reorganize hugetlb_fault to prepare for COW · 86e5216f
      Committed by Adam Litke
      This patch splits the "no_page()" type activity into its own function,
      hugetlb_no_page().  hugetlb_fault() becomes the entry point for hugetlb
      faults and delegates to the appropriate handler depending on the type of
      fault.  Right now we still have only hugetlb_no_page(), but a later patch
      introduces a COW fault (the resulting call structure is sketched after
      this entry).
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
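      A minimal sketch of the resulting call structure; the argument list and
      fault codes here are invented stand-ins for the kernel's.

      #include <stdio.h>

      #define VM_FAULT_MINOR 0

      static int hugetlb_no_page(unsigned long address)
      {
              printf("no_page handler for address %#lx\n", address);
              return VM_FAULT_MINOR;
      }

      /* entry point: look at the (simulated) pte and delegate; a later patch
       * adds a hugetlb_cow() branch for write faults on read-only ptes */
      static int hugetlb_fault(unsigned long address, int pte_present)
      {
              if (!pte_present)
                      return hugetlb_no_page(address);
              return VM_FAULT_MINOR;  /* nothing to do in this sketch */
      }

      int main(void)
      {
              return hugetlb_fault(0x40000000UL, 0);
      }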
    • [PATCH] Hugetlb: Rename find_lock_page to find_or_alloc_huge_page · 85ef47f7
      Committed by Adam Litke
      find_lock_huge_page() isn't a great name, since it does extra things not
      analogous to find_lock_page().  Rename it find_or_alloc_huge_page(), which
      is closer to the mark.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Hugetlb: Remove duplicate i_size check · f0916794
      Committed by Adam Litke
      cleanup
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23
      Committed by Badari Pulavarty
      Here is the patch to implement madvise(MADV_REMOVE), which frees up a
      given range of pages and its associated backing store.  The current
      implementation supports only shmfs/tmpfs; other filesystems return
      -ENOSYS (a usage sketch follows this entry).
      
      "Some app allocates large tmpfs files, then when some task quits and some
      client disconnect, some memory can be released.  However the only way to
      release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
      
      Databases want to use this feature to drop a section of their bufferpool
      (shared memory segments) without writing it back to disk/swap space.
      
      This feature is also useful for supporting hot-plug memory on UML.
      
      Concerns raised by Andrew Morton:
      
      - "We have no plan for holepunching!  If we _do_ have such a plan (or
        might in the future) then what would the API look like?  I think
        sys_holepunch(fd, start, len), so we should start out with that."
      
      - Using madvise is very weird, because people will ask "why do I need to
        mmap my file before I can stick a hole in it?"
      
      - None of the other madvise operations call into the filesystem in this
        manner.  A broad question is: is this capability an MM operation or a
        filesystem operation?  truncate, for example, is a filesystem operation
        which sometimes has MM side-effects.  madvise is an MM operation and,
        with this patch, it gains FS side-effects, only they're really, really
        significant ones.
      
      Comments:
      
      - Andrea suggested the fs operation too, but it is more efficient as an mm
        operation with fs side effects, because userspace doesn't immediately
        know the fd and physical offset of the range.  It is possible to fix
        that up in userland and use the fs operation, but it's more expensive;
        the vmas are already in the kernel and we can use them.
      
      Short term plan & future direction:
      
      - We seem to need this interface only for shmfs/tmpfs files in the short
        term.  We have to add hooks into the filesystem for correctness and
        completeness.  This is what this patch does.
      
      - In the future, the plan is to support both fs and mmap APIs.  This also
        requires implementing (other) filesystem-specific functions.
      
      - The current patch doesn't support VM_NONLINEAR; that can be addressed in
        the future.
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
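      A hedged usage sketch: punch the backing store out of half of a
      tmpfs-backed mapping.  The /dev/shm path is assumed to be a tmpfs mount,
      and the MADV_REMOVE fallback define covers libcs of this era that don't
      yet expose the constant.

      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      #ifndef MADV_REMOVE
      #define MADV_REMOVE 9           /* Linux value; older headers lack it */
      #endif

      int main(void)
      {
              size_t len = 1 << 20;   /* 1MB */
              int fd = open("/dev/shm/madv-demo", O_RDWR | O_CREAT | O_TRUNC,
                            0600);

              if (fd < 0 || ftruncate(fd, len) < 0) {
                      perror("setup");
                      return 1;
              }

              char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                             fd, 0);
              if (p == MAP_FAILED) {
                      perror("mmap");
                      return 1;
              }
              memset(p, 'x', len);    /* instantiate the backing pages */

              /* free the first 512KB of pages and their tmpfs backing store */
              if (madvise(p, len / 2, MADV_REMOVE) != 0)
                      perror("madvise(MADV_REMOVE)");  /* ENOSYS elsewhere */
              return 0;
      }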
    • [PATCH] reiser4: vfs: add truncate_inode_pages_range() · d7339071
      Committed by Hans Reiser
      This patch creates truncate_inode_pages_range() from truncate_inode_pages().
      truncate_inode_pages() becomes a one-line call to
      truncate_inode_pages_range() (the shape of the refactor is sketched after
      this entry).
      
      Reiser4 needs truncate_inode_pages_range() because it tries to maintain a
      correspondence between the metadata pointing to data pages and the pages
      that metadata points to.  So, when the metadata for a certain part of a
      file is removed from the filesystem tree, only the pages of the
      corresponding range are truncated.
      
      (Needed by the madvise(MADV_REMOVE) patch.)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
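      The shape of the refactor as a generic, compilable sketch: the work moves
      into a range variant, and the old entry point becomes a one-liner over the
      full range (the real functions operate on an address_space and loff_t
      offsets, not arrays).

      #include <limits.h>
      #include <stdio.h>

      /* the work now lives in the range variant */
      static void truncate_range(int *pages, int npages, int start, int end)
      {
              for (int i = start; i < npages && i <= end; i++)
                      pages[i] = 0;   /* "truncate" the page */
      }

      /* the old entry point becomes a one-liner over the full range */
      static void truncate_all(int *pages, int npages, int start)
      {
              truncate_range(pages, npages, start, INT_MAX);
      }

      int main(void)
      {
              int pages[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };

              truncate_all(pages, 8, 3);
              for (int i = 0; i < 8; i++)
                      printf("%d", pages[i]);
              printf("\n");           /* prints 11100000 */
              return 0;
      }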
    • [PATCH] memhotplug: register_memory should be global · 900b2b46
      Committed by Andy Whitcroft
      register_memory is global and declared so in linux/memory.h.  Update the
      HOTPLUG-specific definition to match.  This fixes a compile warning when
      HOTPLUG is enabled.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] memhotplug: register_ and unregister_memory_notifier should be global · 98a38ebd
      Committed by Andy Whitcroft
      Both register_memory_notifier and unregister_memory_notifier are global
      and declared so in linux/memory.h.  Update the HOTPLUG-specific
      definitions to match.  This fixes a compile warning when HOTPLUG is
      enabled.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] memhotplug: __add_section remove unused pgdat definition · 5ac24eef
      Committed by Andy Whitcroft
      __add_section defines an unused pointer to the zone's pgdat.  Remove this
      definition.  This fixes a compile warning.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: fix __alloc_pages cpuset ALLOC_* flags · 47f3a867
      Committed by Paul Jackson
      Two changes to the setting of the ALLOC_CPUSET flag in
      mm/page_alloc.c:__alloc_pages() (the resulting flag logic is sketched
      after this entry):
      
      - A bug fix: the "ignoring mins" case should not be honoring ALLOC_CPUSET.
        This case above all others, since it is handling a request that will
        free up more memory than it asks for (exiting tasks, e.g.), should be
        allowed to escape cpuset constraints when memory is tight.
      
      - A logic change to make it simpler: honor cpusets even on GFP_ATOMIC
        (!wait) requests.  With this, cpuset confinement applies to all requests
        except ALLOC_NO_WATERMARKS, so in a subsequent cleanup patch I can
        remove the ALLOC_CPUSET flag entirely.  Since I don't know any real
        reason this logic has to be either way, I am choosing the path of the
        simplest code.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
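      A schematic sketch of the resulting flag logic, with invented flag values:
      every allocation honours cpusets except the no-watermarks case.

      #include <stdio.h>

      #define ALLOC_NO_WATERMARKS     0x01    /* invented values */
      #define ALLOC_CPUSET            0x02

      static int alloc_flags(int ignore_watermarks)
      {
              int flags = 0;

              if (ignore_watermarks)
                      /* e.g. an exiting task that will free more than it asks
                       * for: allowed to escape cpuset constraints */
                      flags |= ALLOC_NO_WATERMARKS;
              else
                      /* honour cpusets, even for GFP_ATOMIC (!wait) requests */
                      flags |= ALLOC_CPUSET;
              return flags;
      }

      int main(void)
      {
              printf("normal: %#x, ignoring mins: %#x\n",
                     alloc_flags(0), alloc_flags(1));
              return 0;
      }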
    • [PATCH] swsusp: resume_store() retval fix · a576219a
      Committed by Andrew Morton
      - This function returns -EINVAL all the time.  Fix.
      
      - Decruftify it a bit too.
      
      - Writing to it doesn't seem to do what it's supposed to do.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] alpha: dma_map_page() fix · 817c41d7
      Committed by Andrew Morton
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: fix hash function for IP addresses on 64bit little-endian machines. · 1f1e030b
      Committed by NeilBrown
      The hash.h hash_long() function, when used on a 64-bit machine, ignores
      many of the middle-order bits.  (The prime chosen is too bit-sparse.)
      
      IP addresses of an NFS server's clients are very likely to differ only in
      the low-order bits.  As addresses are stored in network byte order, these
      bits become middle-order bits in a little-endian 64-bit 'long', and so do
      not contribute to the hash.  Thus you can have a situation where all
      clients appear on one hash chain.
      
      So, until hash_long() is fixed (or maybe forever), use a hash function
      that works well on IP addresses: xor the bytes together (sketched after
      this entry).
      
      Thanks to "Iozone" <capps@iozone.org> for identifying this problem.
      
      Cc: "Iozone" <capps@iozone.org>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
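      A standalone sketch of the replacement scheme: xor the bytes of the
      network-order address together so the varying low-order bits always reach
      the hash (the table size is invented).

      #include <arpa/inet.h>
      #include <stdio.h>

      #define HASH_BITS 6
      #define HASH_SIZE (1 << HASH_BITS)

      /* xor the four bytes of an IPv4 address together; unlike hash_long() on
       * a 64-bit little-endian machine, every byte influences the result */
      static unsigned int ip_hash(unsigned int addr_be)
      {
              unsigned char *b = (unsigned char *)&addr_be;

              return (b[0] ^ b[1] ^ b[2] ^ b[3]) & (HASH_SIZE - 1);
      }

      int main(void)
      {
              /* clients differing only in the last octet land in different
               * buckets */
              printf("%u %u\n", ip_hash(inet_addr("10.0.0.1")),
                                ip_hash(inet_addr("10.0.0.2")));
              return 0;
      }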
    • [PATCH] nbd: fix TX/RX race condition · 4b2f0260
      Committed by Herbert Xu
      Janos Haar of First NetCenter Bt. reported numerous crashes involving the
      NBD driver.  With his help, this was tracked down to bogus bio vectors,
      which in turn were the result of a race condition between the
      receive/transmit routines in the NBD driver.
      
      The bug manifests itself like this:
      
      CPU0				CPU1
      do_nbd_request
      	add req to queuelist
      	nbd_send_request
      		send req head
      		for each bio
      			kmap
      			send
      				nbd_read_stat
      					nbd_find_request
      					nbd_end_request
      			kunmap
      
      When CPU1 finishes nbd_end_request, the request and all its associated
      bios are freed.  So when CPU0 calls kunmap, whose argument is derived from
      the last bio, it may crash.
      
      Under normal circumstances, the race occurs only on the last bio.
      However, if an error is encountered on the remote NBD server (such as an
      incorrect magic number in the request), or if there is a bug in the
      server, it is possible for nbd_end_request to occur at any time after the
      request's addition to the queuelist.
      
      The following patch fixes this problem by making sure that requests are
      not added to the queuelist until after they have completed transmission
      (the ordering is sketched after this entry).
      
      In order for the receiving side to be ready for responses involving
      requests still being transmitted, the patch introduces the concept of the
      active request.
      
      When a response matches the current active request, its processing is
      delayed until after the transmission has come to a stop.
      
      This has been tested by Janos and it has been successful in curing this
      race condition.
      
      From: Herbert Xu <herbert@gondor.apana.org.au>
      
        Here is an updated patch which removes the active_req wait in
        nbd_clear_queue and the associated memory barrier.
      
        I've also clarified this in the comment.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: <djani22@dynamicweb.hu>
      Cc: Paul Clements <Paul.Clements@SteelEye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
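      A toy userspace model of the fix (all names invented): the sender
      finishes transmitting a request before putting it on the list the
      receiver completes from, so the receiver can never free a request whose
      pages are still mapped for transmission.

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct request {
              int id;
              struct request *next;
      };

      static struct request *waiting;   /* requests the receiver may complete */
      static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

      /* transmit first, and only then expose the request to the receive side */
      static void send_request(struct request *req)
      {
              /* ... kmap/send of the header and every bio happens here ... */
              pthread_mutex_lock(&queue_lock);
              req->next = waiting;
              waiting = req;            /* a matching reply can now complete it */
              pthread_mutex_unlock(&queue_lock);
      }

      /* receive side: find a fully-transmitted request by id and remove it */
      static struct request *find_request(int id)
      {
              struct request **p, *req = NULL;

              pthread_mutex_lock(&queue_lock);
              for (p = &waiting; *p; p = &(*p)->next) {
                      if ((*p)->id == id) {
                              req = *p;
                              *p = req->next;
                              break;
                      }
              }
              pthread_mutex_unlock(&queue_lock);
              return req;               /* safe to free: its pages are unmapped */
      }

      int main(void)
      {
              struct request *r = malloc(sizeof(*r));

              r->id = 1;
              send_request(r);
              free(find_request(1));
              return 0;
      }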
    • [PATCH] hfsplus oops fix · bd6a59b2
      Committed by Joshua Kwan
      nls_utf8 is available, and the check in hfsplus_fill_super checks the
      wrong pointer for NULLness (it checks the saved nls, not the new one that
      it needs to use).
      Signed-off-by: Joshua Kwan <joshk@triplehelix.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 06 January 2006, 22 commits