1. 10 Mar, 2009 · 1 commit
    • percpu: make x86 addr <-> pcpu ptr conversion macros generic · e0100983
      Committed by Tejun Heo
      Impact: generic addr <-> pcpu ptr conversion macros
      
      There's nothing arch specific about x86 __addr_to_pcpu_ptr() and
      __pcpu_ptr_to_addr().  With proper __per_cpu_load and __per_cpu_start
      defined, they'll do the right thing regardless of actual layout.
      
      Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c
      and allow archs to override them as necessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 07 Mar, 2009 · 1 commit
    • percpu: finer grained locking to break deadlock and allow atomic free · ccea34b5
      Committed by Tejun Heo
      Impact: fix deadlock and allow atomic free
      
      Percpu allocation always used GFP_KERNEL, and the whole alloc/free
      paths were protected by a single mutex.  All percpu allocations have
      been from GFP_KERNEL-safe context and the original allocator had this
      assumption too.  However, by protecting both the alloc and free paths
      with the same mutex, the new allocator creates a free -> alloc ->
      GFP_KERNEL dependency which the original allocator didn't have.  This
      can lead to deadlock if free is called from FS or IO paths.  Also, in
      general, allocators are expected to allow free to be called from
      atomic context.
      
      This patch implements finer grained locking to break the deadlock and
      allow atomic free.  For details, please read the "Synchronization
      rules" comment.
      
      While at it, also add CONTEXT: to function comments to describe which
      context they expect to be called from and what they do to it.
      
      This problem was reported by Thomas Gleixner and Peter Zijlstra.
      
        http://thread.gmane.org/gmane.linux.kernel/802384
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
  3. 06 Mar, 2009 · 8 commits
    • percpu: move fully free chunk reclamation into a work · a56dbddf
      Committed by Tejun Heo
      Impact: code reorganization for later changes
      
      Reclaim fully free chunks using a work item.  This change prepares
      for upcoming locking changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: move chunk area map extension out of area allocation · 9f7dcf22
      Committed by Tejun Heo
      Impact: code reorganization for later changes
      
      Separate out chunk area map extension into a separate function -
      pcpu_extend_area_map() - and call it directly from pcpu_alloc() such
      that pcpu_alloc_area() is guaranteed to have enough area map slots on
      invocation.
      
      With this change, pcpu_alloc_area() does only area allocation, and
      its only failure mode is the chunk not having enough room, so there's
      no need to distinguish that from memory allocation failures.  Make it
      return -1 in such cases instead of the hacky -ENOSPC.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free() · 1880d93b
      Committed by Tejun Heo
      Impact: code reorganization for later changes
      
      With static map handling moved to pcpu_split_block(), pcpu_realloc()
      only clutters the code, and it's also unsuitable for the scheduled
      locking changes.  Implement and use pcpu_mem_alloc/free() instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Committed by Tejun Heo
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, the arch can ask to set aside a certain
      number of bytes right after the core static area which is available
      only through a separate reserved allocator.  This will be used
      primarily for module static percpu variables on architectures with a
      limited relocation range, to ensure that the module percpu symbols
      are inside the relocatable range.
      
      If a reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes a piggy-backed dynamic allocation area, a separate chunk
      mapping the same region is created to serve dynamic allocation.  The
      first is called the static first chunk and the second the dynamic
      first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If the arch doesn't set up a reserved area, reserved allocations are
      handled like any other allocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: add an indirection ptr for chunk page map access · 3e24aa58
      Committed by Tejun Heo
      Impact: allow sharing page map, no functional difference yet
      
      Make chunk->page access indirect by adding a pointer and renaming the
      actual array to page_ar.  This will be used by future changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Committed by Tejun Heo
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  That's okay for @unit_size, as 0 doesn't make sense
      there, but a dynamic reserve size of 0 is valid.  Also, if the arch's
      @dyn_size is calculated from other parameters, it might end up
      passing in 0 @dyn_size and malfunction when the size is
      automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and uses -1
      for auto-sizing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: improve first chunk initial area map handling · 61ace7fa
      Committed by Tejun Heo
      Impact: no functional change
      
      When the first chunk is created, its initial area map is not allocated
      because kmalloc isn't online yet.  The map is allocated and
      initialized on the first allocation request on the chunk.  This works
      fine but the scattering of initialization logic between the init
      function and allocation path is a bit confusing.
      
      This patch makes the first chunk initialize and use a minimal
      statically allocated map from pcpu_setup_first_chunk().  The map
      resizing path still needs to handle this specially, but it's more
      straightforward and gives more latitude to the init path.  This will
      ease future changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: cosmetic renames in pcpu_setup_first_chunk() · 2441d15c
      Committed by Tejun Heo
      Impact: cosmetic, preparation for future changes
      
      Make the following renames in pcpu_setup_first_chunk() in preparation
      for future changes.
      
      * s/free_size/dyn_size/
      * s/static_vm/first_vm/
      * s/static_chunk/schunk/
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 02 Mar, 2009 · 1 commit
    • x86, mm: dont use non-temporal stores in pagecache accesses · f1800536
      Committed by Ingo Molnar
      Impact: standardize IO on cached ops
      
      On modern CPUs it is almost always a bad idea to use non-temporal
      stores, as the regression fixed in this commit has shown:
      
        30d697fa: x86: fix performance regression in write() syscall
      
      The kernel simply has no good information about whether using non-temporal
      stores is a good idea or not - and trying to add heuristics only increases
      complexity and inserts fragility.
      
      The regression on cached write()s took very long to be found - over
      two years.  So don't take any chances and let the hardware decide how
      it makes use of its caches.
      
      The only exception is drivers/gpu/drm/i915/i915_gem.c: there we are
      absolutely sure that another entity (the GPU) will pick up the dirty
      data immediately and that the CPU will not touch that data before the
      GPU does.
      
      Also, keep the _nocache() primitives to make it easier for people to
      experiment with these details. There may be more clear-cut cases where
      non-cached copies can be used, outside of filemap.c.
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 01 Mar, 2009 · 2 commits
    • bootmem, x86: further fixes for arch-specific bootmem wrapping · d0c4f570
      Committed by Tejun Heo
      Impact: fix new breakages introduced by previous fix
      
      Commit c1329375 tried to clean up the
      bootmem arch wrapper but it wasn't quite correct.  Before the commit,
      the following were broken.
      
      * Low level interface functions prefixed with __ ignored arch
        preference.
      
      * reserve_bootmem(...) can't be mapped into
        reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
        not a preference here.  The region specified MUST fall inside the
        specified node's region; otherwise, it will panic.
      
      After the commit,
      
      * If allocation fails for the arch preferred node, it should fallback
        to whatever is available.  Instead, it simply failed allocation.
      
      There are too many internal details to allow generic wrapping and
      still keep things simple for archs.  Plus, all that arch wants is a
      way to prefer certain node over another.
      
      This patch drops the generic wrapping around alloc_bootmem_core() and
      adds alloc_bootmem_core() instead.  If necessary, the arch can define
      the bootmem_arch_preferred_node() macro or function, which takes all
      allocation information and returns the preferred node.  Bootmem
      generic code will always try the preferred node first and then fall
      back to other nodes as usual.
      
      Breakages noted and changes reviewed by Johannes Weiner.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • percpu: kill compile warning in pcpu_populate_chunk() · 02d51fdf
      Committed by Tejun Heo
      Impact: remove compile warning
      
      Mark local variable map_end in pcpu_populate_chunk() with
      uninitialized_var().  The variable is always used in tandem with
      map_start and guaranteed to be initialized before use but gcc doesn't
      understand that.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
  6. 28 Feb, 2009 · 2 commits
    • mm: fix lazy vmap purging (use-after-free error) · cbb76676
      Committed by Vegard Nossum
      I just got this new warning from kmemcheck:
      
          WARNING: kmemcheck: Caught 32-bit read from freed memory (c7806a60)
          a06a80c7ecde70c1a04080c700000000a06709c1000000000000000000000000
           f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
           ^
      
          Pid: 0, comm: swapper Not tainted (2.6.29-rc4 #230)
          EIP: 0060:[<c1096df7>] EFLAGS: 00000286 CPU: 0
          EIP is at __purge_vmap_area_lazy+0x117/0x140
          EAX: 00070f43 EBX: c7806a40 ECX: c1677080 EDX: 00027b66
          ESI: 00002001 EDI: c170df0c EBP: c170df00 ESP: c178830c
           DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
          CR0: 80050033 CR2: c7806b14 CR3: 01775000 CR4: 00000690
          DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
          DR6: 00004000 DR7: 00000000
           [<c1096f3e>] free_unmap_vmap_area_noflush+0x6e/0x70
           [<c1096f6a>] remove_vm_area+0x2a/0x70
           [<c1097025>] __vunmap+0x45/0xe0
           [<c10970de>] vunmap+0x1e/0x30
           [<c1008ba5>] text_poke+0x95/0x150
           [<c1008ca9>] alternatives_smp_unlock+0x49/0x60
           [<c171ef47>] alternative_instructions+0x11b/0x124
           [<c171f991>] check_bugs+0xbd/0xdc
           [<c17148c5>] start_kernel+0x2ed/0x360
           [<c171409e>] __init_begin+0x9e/0xa9
           [<ffffffff>] 0xffffffff
      
      It happened here:
      
          $ addr2line -e vmlinux -i c1096df7
          mm/vmalloc.c:540
      
      Code:
      
      	list_for_each_entry(va, &valist, purge_list)
      		__free_vmap_area(va);
      
      It's this instruction:
      
          mov    0x20(%ebx),%edx
      
      Which corresponds to a dereference of va->purge_list.next:
      
          (gdb) p ((struct vmap_area *) 0)->purge_list.next
          Cannot access memory at address 0x20
      
      It seems that we should use "safe" list traversal here, as the element
      is freed inside the loop. Please verify that this is the right fix.
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmap fix overflow · 7766970c
      Committed by Nick Piggin
      The new vmap allocator can wrap the address and get confused in the
      case of large allocations or VMALLOC_END near the end of the address
      space.
      
      Problem reported by Christoph Hellwig on a 32-bit XFS workload.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 26 Feb, 2009 · 1 commit
    • shmem: fix shared anonymous accounting · 0b0a0806
      Committed by Hugh Dickins
      Each time I exit Firefox, /proc/meminfo's Committed_AS goes down almost
      400 kB: OVERCOMMIT_NEVER would be allowing overcommits it should
      prohibit.
      
      Commit fc8744ad "Stop playing silly
      games with the VM_ACCOUNT flag" changed shmem_file_setup() to set the
      shmem file's VM_ACCOUNT flag according to VM_NORESERVE not being set in
      the vma flags; but did so only _after_ the shmem_acct_size(flags, size)
      call which is expected to pre-account a shared anonymous object.
      
      It's all clearer if we switch shmem.c over to use VM_NORESERVE
      throughout in place of !VM_ACCOUNT.
      
      But I very nearly sent in a patch which mistakenly removed the
      accounting from tmpfs files: shmem_get_inode()'s memset was good for not
      setting VM_ACCOUNT, but now it needs to set VM_NORESERVE.
      
      Rather than setting that by default, then perhaps clearing it again in
      shmem_file_setup(), let's pass it as a flag to shmem_get_inode(): that
      allows us to remove the #ifdef CONFIG_SHMEM from shmem_file_setup().
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 25 Feb, 2009 · 2 commits
  9. 24 Feb, 2009 · 6 commits
    • percpu: add __read_mostly to variables which are mostly read only · 40150d37
      Committed by Tejun Heo
      Most global variables in the percpu allocator are initialized during
      boot and are read only from that point on.  Add __read_mostly as per
      Rusty's suggestion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Committed by Tejun Heo
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching page
        pointers via a callback and adding optional @unit_size, @free_size
        and @base_addr arguments which allow archs to selectively take over
        parts of chunk initialization to their liking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: remove unit_size power-of-2 restriction · d9b55eeb
      Committed by Tejun Heo
      Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE
      
      In the dynamic percpu allocator, there is no reason the unit size
      should be a power of two.  Remove the restriction.

      A non-power-of-two unit size means that empty chunks can fall into
      the same slot index as lightly occupied chunks, which is bad for
      reclaiming, so reserve an extra slot for empty chunks.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • vmalloc: add @align to vm_area_register_early() · c0c0a293
      Committed by Tejun Heo
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
    • bootmem: clean up arch-specific bootmem wrapping · c1329375
      Committed by Tejun Heo
      Impact: cleaner and consistent bootmem wrapping
      
      By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
      arch-specific wrappers for bootmem allocation.  However, this is done
      a bit strangely in that only the high level convenience macros can be
      changed while lower level, but still exported, interface functions
      can't be wrapped.  This is not only messy but also leads to a strange
      situation where alloc_bootmem() does what the arch wants it to do but
      the equivalent __alloc_bootmem() call doesn't, although they should
      be usable interchangeably.
      
      This patch updates bootmem such that archs can override / wrap the
      backend function, alloc_bootmem_core(), instead of the high-level
      interface functions, to allow simpler and consistent wrapping.  Also,
      HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
    • percpu: fix pcpu_chunk_struct_size · cb83b42e
      Committed by Tejun Heo
      Impact: fix short allocation leading to memory corruption
      
      While dropping rvalue wrapping macros around global parameters,
      pcpu_chunk_struct_size was set incorrectly, resulting in a too-short
      page pointer array.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 22 Feb, 2009 · 3 commits
  11. 21 Feb, 2009 · 3 commits
  12. 20 Feb, 2009 · 5 commits
    • percpu: implement new dynamic percpu allocator · fbf59bc9
      Committed by Tejun Heo
      Impact: new scalable dynamic percpu allocator which allows dynamic
              percpu areas to be accessed the same way as static ones
      
      Implement scalable dynamic percpu allocator which can be used for both
      static and dynamic percpu areas.  This will allow static and dynamic
      areas to share faster direct access methods.  This feature is optional
      and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
      arch.  Please read comment on top of mm/percpu.c for details.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • vmalloc: add un/map_kernel_range_noflush() · 8fc48985
      Committed by Tejun Heo
      Impact: two more public map/unmap functions
      
      Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
      These functions respectively map and unmap an address range in the
      kernel VM area but don't do any vcache or TLB flushing.  They will be
      used by the new percpu allocator.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    • vmalloc: implement vm_area_register_early() · f0aa6617
      Committed by Tejun Heo
      Impact: allow multiple early vm areas
      
      There are places where a kernel VM area needs to be allocated before
      vmalloc is initialized.  This is done by allocating a static
      vm_struct, initializing several fields and linking it to vmlist;
      vmalloc initialization later picks these up from vmlist.  This is
      currently done manually, and if there's more than one such area,
      there's no defined way to arbitrate who gets which address.
      
      This patch implements vm_area_register_early(), which takes vm_area
      struct with flags and size initialized, assigns address to it and puts
      it on the vmlist.  This way, multiple early vm areas can determine
      which addresses they should use.  The only current user - alpha mm
      init - is converted to use it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: kill percpu_alloc() and friends · f2a8205c
      Committed by Tejun Heo
      Impact: kill unused functions
      
      percpu_alloc() and its friends never saw much action.  It was
      supposed to replace the cpu-mask-unaware __alloc_percpu() but that
      never happened, and in fact __percpu_alloc_mask() itself never really
      grew a proper up/down handling interface either (no exported
      interface for populate/depopulate).
      
      percpu allocation is about to go through major reimplementation and
      there's no reason to carry this unused interface around.  Replace it
      with __alloc_percpu() and free_percpu().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • vmalloc: call flush_cache_vunmap() from unmap_kernel_range() · 73426952
      Committed by Tejun Heo
      Impact: proper vcache flush on unmap_kernel_range()
      
      flush_cache_vunmap() should be called before pages are unmapped.  Add
      a call to it in unmap_kernel_range().
      Signed-off-by: Tejun Heo <tj@kernel.org>
  13. 19 Feb, 2009 · 4 commits
    • mm: fix memmap init for handling memory hole · cc2559bc
      Committed by KAMEZAWA Hiroyuki
      Currently, early_pfn_in_nid(PFN, NID) may return false if the PFN is
      a hole, in which case memmap initialization is not done.  This was a
      problem for sparc boot.

      To fix this, such PFNs should be initialized and marked as
      PG_reserved.  This patch changes early_pfn_in_nid() to return true if
      the PFN is a hole.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      Committed by KAMEZAWA Hiroyuki
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declaration of early_pfn_to_nid() is scattered over per-arch include
      files, and it's complicated to know which declaration is used.  I
      think that makes the fix for memmap init harder than it needs to be.
      
      This patch moves all declaration to include/linux/mm.h
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: task dirty accounting fix · 1cf6e7d8
      Committed by Nick Piggin
      YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for
      cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty).
      
      Additionally, there is some inconsistency about when task_dirty_inc is
      called.  It is used for dirty balancing, however it even gets called for
      __set_page_dirty_no_writeback.
      
      So rather than increment it in a set_page_dirty wrapper, move it down to
      exactly where the dirty page accounting stats are incremented.
      
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmalloc: add __get_vm_area_caller() · c2968612
      Committed by Benjamin Herrenschmidt
      We have get_vm_area_caller() and __get_vm_area() but not
      __get_vm_area_caller().
      
      On powerpc, I use __get_vm_area() to separate the ranges of addresses
      given to vmalloc vs.  ioremap (various good reasons for that) so in order
      to be able to implement the new caller tracking in /proc/vmallocinfo, I
      need a "_caller" variant of it.
      
      (akpm: needed for ongoing powerpc development, so merge it early)
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 18 Feb, 2009 · 1 commit