1. 22 May, 2006 3 commits
    • [PATCH] Align the node_mem_map endpoints to a MAX_ORDER boundary · e984bb43
      Bob Picco committed
      Andy added code to the buddy allocator so that it does not require a
      zone's endpoints to be aligned to MAX_ORDER.  The remaining issue is that
      the buddy allocator still requires the node_mem_map's endpoints to be
      MAX_ORDER aligned.  Otherwise __page_find_buddy could compute a buddy that
      is not in node_mem_map for partial MAX_ORDER regions at a zone's endpoints.
      With aligned endpoints, page_is_buddy will detect that these pages at the
      endpoints are not PG_buddy (they were zeroed out by the bootmem allocator
      and are not part of the zone).  The downside is that we could waste a
      little memory; the upside is that all the old checks for zone boundary
      conditions go away.
      
      SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
      when SPARSEMEM is configured.  ia64 VIRTUAL_MEM_MAP doesn't need the logic
      either because the holes and endpoints are handled differently.  This
      leaves checking alloc_remap and other arches which privately allocate for
      node_mem_map.
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e984bb43
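The alignment argument in this commit can be illustrated with a small user-space sketch. It is not the patch itself: the constant values are assumptions chosen for illustration, and `align_down()`, `align_up()` and `buddy_index()` are hypothetical helper names. The XOR in `buddy_index()` is modelled on how `__page_find_buddy` locates a buddy.

```c
/* Assumed values for illustration; the kernel derives these from its config. */
#define MAX_ORDER 11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Round a page frame number down to a MAX_ORDER block boundary. */
unsigned long align_down(unsigned long pfn)
{
	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
}

/* Round a page frame number up to a MAX_ORDER block boundary. */
unsigned long align_up(unsigned long pfn)
{
	return (pfn + MAX_ORDER_NR_PAGES - 1) & ~(MAX_ORDER_NR_PAGES - 1);
}

/* Buddy of a block: flip the bit that corresponds to the block's order. */
unsigned long buddy_index(unsigned long page_idx, unsigned int order)
{
	return page_idx ^ (1UL << order);
}
```

With these assumed values, a map covering pfns 1100 up to 1500 is not aligned: the order-8 block at pfn 1280 has its buddy at pfn 1024, which lies below the start of the map. Extending the map to `align_down(1100)` and `align_up(1500)` (1024 and 2048 here) keeps every computed buddy inside it.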
    • [PATCH] Cpuset: might sleep checking zones allowed fix · bdd804f4
      Paul Jackson committed
      Fix a couple of infrequently encountered 'sleeping function called from
      invalid context' warnings in the cpuset hooks in __alloc_pages.  The hooks
      could sleep while interrupts were disabled.
      
      The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
      __alloc_pages() to determine if a zone is allowed in the current task's
      cpuset.  This routine can sleep, for certain GFP_KERNEL allocations, if the
      zone is on a memory node not allowed in the current cpuset, but might be
      allowed in a parent cpuset.
      
      But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
      GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).
      
      The rule was intended to be:
        Don't call cpuset_zone_allowed() if you can't sleep, unless you
        pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
        the code that might scan up ancestor cpusets and sleep.
      
      This rule was being violated in a couple of places, due to a bogus change
      made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
      to clean up its logic, and also due to a later fix to constrain which swap
      daemons were awoken.
      
      The bogus change can be seen at:
        http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
        [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
      
      This was first noticed on a tight memory system, in code that was disabling
      interrupts and doing allocation requests with __GFP_WAIT not set, which
      resulted in __might_sleep() writing complaints to the log "Debug: sleeping
      function called ...", when the code in cpuset_zone_allowed() tried to take
      the callback_sem cpuset semaphore.
      
      We haven't seen a system hang on this 'might_sleep' yet, but we are at
      decent risk of seeing it fairly soon, especially since the additional
      cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
      March 2006.
      
      Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
      to Nick Piggin who warned me of this back in Nov 2005, before I was ready
      to listen.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      bdd804f4
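The rule stated in this commit message can be sketched as a small decision function. This is only an illustration under assumptions: the `SKETCH_GFP_*` values and the `zone_check_flags()` helper are hypothetical and do not reproduce the kernel's gfp bits or the actual change to __alloc_pages().

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch; the kernel's gfp bits differ. */
#define SKETCH_GFP_WAIT     0x1u  /* caller may sleep */
#define SKETCH_GFP_HARDWALL 0x2u  /* do not scan ancestor cpusets */

/*
 * Return flags that are safe to hand to a cpuset_zone_allowed()-style
 * check: a caller that cannot sleep gets the hardwall bit forced on,
 * so the check never walks up ancestor cpusets and never sleeps.
 */
unsigned int zone_check_flags(unsigned int gfp_mask, bool in_interrupt)
{
	bool can_sleep = (gfp_mask & SKETCH_GFP_WAIT) && !in_interrupt;

	if (!can_sleep)
		gfp_mask |= SKETCH_GFP_HARDWALL;
	return gfp_mask;
}
```

A request that may sleep passes through unchanged, while an atomic request or one made in interrupt context comes back with the hardwall bit set.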
    • [PATCH] SPARSEMEM incorrectly calculates section number · 12783b00
      Mike Kravetz committed
      A bad calculation/loop in __section_nr() could result in incorrect section
      information being put into sysfs memory entries.  This primarily impacts
      memory add operations as the sysfs information is used while onlining new
      memory.
      
      Fix suggested by Dave Hansen.
      
      Note that the bug may not be obvious from the patch.  It actually occurs in
      the function's return statement:
      
      	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
      
      In the existing code, root_nr has already been multiplied by
      SECTIONS_PER_ROOT.
      Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      12783b00
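The double multiplication described in this commit is easier to see in a simplified user-space model. The sizes, the flat `sections` array, and both function names are assumptions made for this sketch; the kernel's section roots are allocated differently.

```c
/* Hypothetical sizes for this sketch; the kernel computes them from its config. */
#define SECTIONS_PER_ROOT 4
#define NR_SECTION_ROOTS  3
#define NR_MEM_SECTIONS   (NR_SECTION_ROOTS * SECTIONS_PER_ROOT)

struct mem_section { int dummy; };

/* All sections in one flat array; root r starts at index r * SECTIONS_PER_ROOT. */
struct mem_section sections[NR_MEM_SECTIONS];

/* Buggy pattern: root_nr already counts sections, then is scaled again. */
int section_nr_buggy(const struct mem_section *ms)
{
	int root_nr;
	const struct mem_section *root = sections;

	for (root_nr = 0; root_nr < NR_MEM_SECTIONS; root_nr += SECTIONS_PER_ROOT) {
		root = &sections[root_nr];
		if (ms >= root && ms < root + SECTIONS_PER_ROOT)
			break;
	}
	return (root_nr * SECTIONS_PER_ROOT) + (int)(ms - root);
}

/* Fixed pattern: root_nr counts roots, so scaling it once is correct. */
int section_nr_fixed(const struct mem_section *ms)
{
	int root_nr;
	const struct mem_section *root = sections;

	for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
		root = &sections[root_nr * SECTIONS_PER_ROOT];
		if (ms >= root && ms < root + SECTIONS_PER_ROOT)
			break;
	}
	return (root_nr * SECTIONS_PER_ROOT) + (int)(ms - root);
}
```

In this model the two versions agree for sections in the first root, which is why the bug is easy to miss; they diverge for any later root.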
  2. 16 May, 2006 3 commits
  3. 02 May, 2006 3 commits
    • [PATCH] spufs: fix for CONFIG_NUMA · bed120c6
      Joel H Schopp committed
      Based on an older patch from Mike Kravetz <kravetz@us.ibm.com>
      
      We need to have a mem_map for high addresses in order to make fops->no_page
      work on spufs mem and register files.  So far, we have used the
      memory_present() function during early bootup, but that did not work when
      CONFIG_NUMA was enabled.
      
      We now use the __add_pages() function to add the mem_map when loading the
      spufs module, which is a lot nicer.
      Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      bed120c6
    • [PATCH] sparsemem interaction with memory add bug fixes · 46a66eec
      Mike Kravetz committed
      This patch fixes two bugs with the way sparsemem interacts with memory add.
      They are:
      
      - memory leak if memmap for section already exists
      
      - calling alloc_bootmem_node() after boot
      
      These bugs were discovered, and a first cut at the fixes was provided, by
      Arnd Bergmann <arnd@arndb.de> and Joel Schopp <jschopp@us.ibm.com>.
      Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      46a66eec
    • [PATCH] page migration: Fix fallback behavior for dirty pages · 4c28f811
      Christoph Lameter committed
      Currently we check PageDirty() in order to make the decision to swap out
      the page.  However, the dirty information may only be contained in the
      ptes pointing to the page.  We need to first unmap the ptes before checking
      for PageDirty().  If unmap is successful then the page count of the page
      will also be decreased so that pageout() works properly.
      
      This is a fix necessary for 2.6.17.  Without this fix we may migrate dirty
      pages for filesystems without migration functions.  Filesystems may keep
      pointers to dirty pages.  Migration of dirty pages can result in the
      filesystem keeping pointers to freed pages.
      
      Unmapping is currently not separated out from removing all the
      references to a page and moving the mapping.  Therefore try_to_unmap will
      be called again in migrate_page() if the writeout is successful.  However,
      it won't do anything since the ptes are already removed.
      
      The coming updates to the page migration code will restructure the code
      so that this is no longer necessary.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4c28f811
  4. 29 April, 2006 1 commit
  5. 27 April, 2006 1 commit
  6. 26 April, 2006 1 commit
  7. 23 April, 2006 1 commit
    • [PATCH] add migratepage address space op to shmem · 304dbdb7
      Lee Schermerhorn committed
      Basic problem: pages of a shared memory segment can only be migrated once.
      
      In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
      migratepage address space op.  Therefore, migrate_pages() falls back to
      default processing.  In this path, it will try to pageout() dirty pages.
      Once a shared memory page has been migrated it becomes dirty, so
      migrate_pages() will try to page it out.  However, because the page count
      is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
      is_page_cache_freeable() returns false.  This will abort all subsequent
      migrations.
      
      This patch adds a migratepage address space op to shared memory segments to
      avoid taking the default path.  We use the "migrate_page()" function
      because it knows how to migrate dirty pages.  This allows shared memory
      segment pages to migrate when requested, subject to other conditions such
      as the number of ptes referencing the page [page_mapcount(page)].
      
      I think this is safe.  If we're migrating a shared memory page, then we
      found the page via a page table, so it must be in memory.
      
      Can be verified with memtoy and the shmem-mbind-test script, both
      available at:  http://free.linux.hp.com/~lts/Tools/
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      304dbdb7
  8. 20 April, 2006 5 commits
  9. 18 April, 2006 1 commit
  10. 11 April, 2006 11 commits
  11. 10 April, 2006 1 commit
    • [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory · a8062231
      Andi Kleen committed
      The node setup code would try to allocate the node metadata in the node
      itself, but that fails if there is no memory in there.
      
      This can happen with memory hotplug when the hotplug area defines a so
      far empty node.
      
      Now use bootmem to try to allocate the mem_map in other nodes.
      
      And if it fails don't panic, but just ignore the node.
      
      To make this work I added a new __alloc_bootmem_nopanic function that
      does what its name implies.
      
      TBD should try to use nearby nodes here.  Currently we just use any.
      It's hard to do it better because bootmem doesn't have proper fallback
      lists yet.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a8062231
  12. 02 April, 2006 3 commits
  13. 01 April, 2006 6 commits
    • BUG_ON() Conversion in mm/vmalloc.c · 5aae277e
      Eric Sesterhenn committed
      This changes if() BUG(); constructs to BUG_ON(), which is
      cleaner, contains unlikely() and can be better optimized away.
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      5aae277e
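The conversion described in this commit has the following shape. The macros below are user-space stand-ins written for this sketch (the kernel's own definitions are architecture specific, and `__builtin_expect` assumes GCC or Clang); `checked_div_old()` and `checked_div_new()` are hypothetical examples, not code from mm/vmalloc.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-ins for the kernel macros, for illustration only. */
#define unlikely(x) __builtin_expect(!!(x), 0)
#define BUG() do { \
	fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
	abort(); \
} while (0)
#define BUG_ON(cond) do { if (unlikely(cond)) BUG(); } while (0)

/* Before: open-coded check followed by BUG(). */
int checked_div_old(int a, int b)
{
	if (b == 0)
		BUG();
	return a / b;
}

/* After: the same check expressed with BUG_ON(). */
int checked_div_new(int a, int b)
{
	BUG_ON(b == 0);
	return a / b;
}
```

Both functions behave identically; the second form is shorter and carries the unlikely() branch hint automatically.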
    • BUG_ON() Conversion in mm/swap_state.c · e74ca2b4
      Eric Sesterhenn committed
      This changes if() BUG(); constructs to BUG_ON(), which is
      cleaner, contains unlikely() and can be better optimized away.
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      e74ca2b4
    • BUG_ON() Conversion in mm/mmap.c · 46a350ef
      Eric Sesterhenn committed
      This changes if() BUG(); constructs to BUG_ON(), which is
      cleaner, contains unlikely() and can be better optimized away.
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      46a350ef
    • [PATCH] sys_sync_file_range() · f79e2abb
      Andrew Morton committed
      Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
      fadvise() additions, and do it in a new sys_sync_file_range() syscall
      instead.  Reasons:
      
      - It's more flexible.  Things which would require two or three syscalls with
        fadvise() can be done in a single syscall.
      
      - Using fadvise() in this manner is something not covered by POSIX.
      
      The patch wires up the syscall for x86.
      
      The syscall is implemented in the new fs/sync.c.  The intention is that we can
      move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
      
      Documentation for the syscall is in fs/sync.c.
      
      A test app (sync_file_range.c) is in
      http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
      
      The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
      say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the ->fsync call for
      NFS_DATA_SYNC which is hopefully the more common."
      
      Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
      the queue is congested.  This is trivial to fix: add a new flag bit, set
      wbc->nonblocking.  But I'm not sure that we want to expose implementation
      details down to that level.
      
      Note: it's notable that we can sync an fd which wasn't opened for writing.
      The same holds for fsync() and fdatasync().
      
      Note: the code takes some care to handle attempts to sync file contents
      outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
      succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
      requests fail...
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f79e2abb
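From user space, the syscall introduced by this commit is reachable today through the glibc `sync_file_range()` wrapper on Linux. The sketch below assumes that wrapper and its `SYNC_FILE_RANGE_*` flags are available; `flush_range()` and `start_writeback()` are hypothetical helper names, not part of the patch or of the test app mentioned above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

/*
 * Write out the byte range [offset, offset + nbytes) of fd and wait for
 * the writeback to finish, all in one call.
 * Returns 0 on success, -1 with errno set on failure.
 */
int flush_range(int fd, off_t offset, off_t nbytes)
{
	return sync_file_range(fd, offset, nbytes,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Only start writeback for the range, without waiting for completion. */
int start_writeback(int fd, off_t offset, off_t nbytes)
{
	return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}
```

Combining the three flags in `flush_range()` is an example of the flexibility argument in the commit message: one call covers what would otherwise take several.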
    • [PATCH] Don't pass boot parameters to argv_init[] · 9b41046c
      OGAWA Hirofumi committed
      The boot cmdline is parsed in parse_early_param() and
      parse_args(,unknown_bootoption).
      
      And __setup() is used in obsolete_checksetup().
      
      	start_kernel()
      		-> parse_args()
      			-> unknown_bootoption()
      				-> obsolete_checksetup()
      
      If a __setup() callback (->setup_func()) returns 1 in
      obsolete_checksetup(), obsolete_checksetup() considers the parameter
      handled.
      
      If ->setup_func() returns 0, obsolete_checksetup() tries the other
      ->setup_func() callbacks.  If every ->setup_func() that matched a parameter
      returns 0, the parameter is added to argv_init[].
      
      Then, when running /sbin/init or init=app, argv_init[] is passed to the app.
      If the app doesn't ignore those arguments, it will warn and exit.
      
      This patch fixes such wrong usages, but only the obvious ones.
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9b41046c
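The return-value convention described in this commit can be modelled in user space as follows. This is a simplified sketch under assumptions: the handler table, the handler names, the parameter strings, and `handle_param()` are all hypothetical, and the model ignores environment-style `name=value` parameters.

```c
#include <string.h>

/* Hypothetical handler table, modelled on the __setup() convention. */
struct setup_entry {
	const char *str;
	int (*setup_func)(const char *arg);
};

/* Returns 1: the parameter counts as handled. */
int setup_good(const char *arg)
{
	(void)arg;
	return 1;
}

/* Returns 0: the parameter counts as unhandled, even though it matched. */
int setup_wrong(const char *arg)
{
	(void)arg;
	return 0;
}

const struct setup_entry setup_table[] = {
	{ "goodopt", setup_good },
	{ "wrongopt", setup_wrong },
};

/* Returns 1 if some matching handler claimed the parameter, else 0. */
int checksetup(const char *param)
{
	size_t i;

	for (i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
		size_t n = strlen(setup_table[i].str);

		if (strncmp(param, setup_table[i].str, n) == 0 &&
		    setup_table[i].setup_func(param + n))
			return 1;
	}
	return 0;
}

#define MAX_INIT_ARGS 8
const char *argv_init[MAX_INIT_ARGS + 1] = { "init" };
int argv_count = 1;

/* Unclaimed parameters end up in the argument list passed to init. */
void handle_param(const char *param)
{
	if (!checksetup(param) && argv_count < MAX_INIT_ARGS)
		argv_init[argv_count++] = param;
}
```

In this model, `handle_param("goodopt")` leaves `argv_init[]` untouched, while `handle_param("wrongopt")` appends the string because its handler returned 0. That leak into init's arguments is the kind of wrong usage the patch addresses.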
    • [PATCH] hugetlb: don't allow free hugetlb count fall below reserved count · 78c997a4
      Chen, Kenneth W committed
      With strict page reservation, I think the kernel should enforce that the
      number of free hugetlb pages doesn't fall below the reserved count.
      Currently this is possible in the sysctl path.  Add a proper check in
      sysctl to disallow that.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      78c997a4