1. 14 8月, 2009 1 次提交
    • T
      percpu, sparc64: fix sparse possible cpu map handling · 74d46d6b
      Tejun Heo 提交于
      percpu code has been assuming num_possible_cpus() == nr_cpu_ids which
      is incorrect if cpu_possible_map contains holes.  This causes percpu
      code to access beyond allocated memories and vmalloc areas.  On a
      sparc64 machine with cpus 0 and 2 (u60), this triggers the following
      warning or fails boot.
      
       WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240()
       Modules linked in:
       Call Trace:
        [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240
        [00000000004b1840] map_vm_area+0x20/0x60
        [00000000004b1950] __vmalloc_area_node+0xd0/0x160
        [0000000000593434] deflate_init+0x14/0xe0
        [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0
        [00000000005844f0] crypto_alloc_base+0x50/0xa0
        [000000000058b898] alg_test_comp+0x18/0x80
        [000000000058dad4] alg_test+0x54/0x180
        [000000000058af00] cryptomgr_test+0x40/0x60
        [0000000000473098] kthread+0x58/0x80
        [000000000042b590] kernel_thread+0x30/0x60
        [0000000000472fd0] kthreadd+0xf0/0x160
       ---[ end trace 429b268a213317ba ]---
      
      This patch fixes generic percpu functions and sparc64
      setup_per_cpu_areas() so that they handle sparse cpu_possible_map
      properly.
      
      Please note that on x86, cpu_possible_map() doesn't contain holes and
      thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause
      any behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      74d46d6b
  2. 22 6月, 2009 2 次提交
    • T
      x86: implement percpu_alloc kernel parameter · fa8a7094
      Tejun Heo 提交于
      According to Andi, it isn't clear whether lpage allocator is worth the
      trouble as there are many processors where PMD TLB is far scarcer than
      PTE TLB.  The advantage or disadvantage probably depends on the actual
      size of percpu area and specific processor.  As performance
      degradation due to TLB pressure tends to be highly workload specific
      and subtle, it is difficult to decide which way to go without more
      data.
      
      This patch implements percpu_alloc kernel parameter to allow selecting
      which first chunk allocator to use to ease debugging and testing.
      
      While at it, make sure all the failure paths report why something
      failed to help determining why certain allocator isn't working.  Also,
      kill the "Great future plan" comment which had already been realized
      quite some time ago.
      
      [ Impact: allow explicit percpu first chunk allocator selection ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      fa8a7094
    • T
      percpu: fix too lazy vunmap cache flushing · 85ae87c1
      Tejun Heo 提交于
      In pcpu_unmap(), flushing virtual cache on vunmap can't be delayed as
      the page is going to be returned to the page allocator.  Only TLB
      flushing can be put off such that vmalloc code can handle it lazily.
      Fix it.
      
      [ Impact: fix subtle virtual cache flush bug ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      85ae87c1
  3. 09 4月, 2009 2 次提交
    • C
      percpu: remove rbtree and use page->index instead · e1b9aa3f
      Christoph Lameter 提交于
      Impact: use page->index for addr to chunk mapping instead of dedicated rbtree
      
      The rbtree is used to determine the chunk from the virtual address.
      However, we can already determine the page struct from a virtual
      address and there are several unused fields in page struct used by
      vmalloc.  Use the index field to store a pointer to the chunk. Then
      there is no need anymore for an rbtree.
      
      tj: * s/(set|get)_chunk/pcpu_\1_page_chunk/
      
          * Drop inline from the above two functions and moved them upwards
            so that they are with other simple helpers.
      
          * Initial pages might not (actually most of the time don't) live
            in the vmalloc area.  With the previous patch to manually
            reverse-map both first chunks, this is no longer an issue.
            Removed pcpu_set_chunk() call on initial pages.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: rusty@rustcorp.com.au
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: rmk@arm.linux.org.uk
      Cc: starvik@axis.com
      Cc: ralf@linux-mips.org
      Cc: davem@davemloft.net
      Cc: cooloney@kernel.org
      Cc: kyle@mcmartin.ca
      Cc: matthew@wil.cx
      Cc: grundler@parisc-linux.org
      Cc: takata@linux-m32r.org
      Cc: benh@kernel.crashing.org
      Cc: rth@twiddle.net
      Cc: ink@jurassic.park.msu.ru
      Cc: heiko.carstens@de.ibm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <49D43D58.4050102@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e1b9aa3f
    • T
      percpu: don't put the first chunk in reverse-map rbtree · ae9e6bc9
      Tejun Heo 提交于
      Impact: both first chunks don't use rbtree, no functional change
      
      There can be two first chunks - reserved and dynamic with the former
      one being optional.  Dynamic first chunk was linked on reverse-mapping
      rbtree while the reserved one was mapped manually using the start
      address and reserved offset limit.
      
      This patch makes both first chunks to be looked up manually without
      using the rbtree.  This is to help getting rid of the rbtree.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: rusty@rustcorp.com.au
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: rmk@arm.linux.org.uk
      Cc: starvik@axis.com
      Cc: ralf@linux-mips.org
      Cc: davem@davemloft.net
      Cc: cooloney@kernel.org
      Cc: kyle@mcmartin.ca
      Cc: matthew@wil.cx
      Cc: grundler@parisc-linux.org
      Cc: takata@linux-m32r.org
      Cc: benh@kernel.crashing.org
      Cc: rth@twiddle.net
      Cc: ink@jurassic.park.msu.ru
      Cc: heiko.carstens@de.ibm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      LKML-Reference: <49D43CEA.3040609@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ae9e6bc9
  4. 10 3月, 2009 3 次提交
    • T
      percpu: generalize embedding first chunk setup helper · 66c3a757
      Tejun Heo 提交于
      Impact: code reorganization
      
      Separate out embedding first chunk setup helper from x86 embedding
      first chunk allocator and put it in mm/percpu.c.  This will be used by
      the default percpu first chunk allocator and possibly by other archs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      66c3a757
    • T
      percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk() · 6074d5b0
      Tejun Heo 提交于
      Impact: cleanup, more flexibility for first chunk init
      
      Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto.
      This restriction stemmed from implementation detail and made things a
      bit less intuitive.  This patch allows @dyn_size to be specified
      regardless of @unit_size and swaps the positions of @dyn_size and
      @unit_size so that the parameter order makes more sense (static,
      reserved and dyn sizes followed by enclosing unit_size).
      
      While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6074d5b0
    • T
      percpu: make x86 addr <-> pcpu ptr conversion macros generic · e0100983
      Tejun Heo 提交于
      Impact: generic addr <-> pcpu ptr conversion macros
      
      There's nothing arch specific about x86 __addr_to_pcpu_ptr() and
      __pcpu_ptr_to_addr().  With proper __per_cpu_load and __per_cpu_start
      defined, they'll do the right thing regardless of actual layout.
      
      Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c
      and allow archs to override it as necessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e0100983
  5. 07 3月, 2009 1 次提交
    • T
      percpu: finer grained locking to break deadlock and allow atomic free · ccea34b5
      Tejun Heo 提交于
      Impact: fix deadlock and allow atomic free
      
      Percpu allocation always uses GFP_KERNEL and whole alloc/free paths
      were protected by single mutex.  All percpu allocations have been from
      GFP_KERNEL-safe context and the original allocator had this assumption
      too.  However, by protecting both alloc and free paths with the same
      mutex, the new allocator creates free -> alloc -> GFP_KERNEL
      dependency which the original allocator didn't have.  This can lead to
      deadlock if free is called from FS or IO paths.  Also, in general,
      allocators are expected to allow free to be called from atomic
      context.
      
      This patch implements finer grained locking to break the deadlock and
      allow atomic free.  For details, please read the "Synchronization
      rules" comment.
      
      While at it, also add CONTEXT: to function comments to describe which
      context they expect to be called from and what they do to it.
      
      This problem was reported by Thomas Gleixner and Peter Zijlstra.
      
        http://thread.gmane.org/gmane.linux.kernel/802384Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      ccea34b5
  6. 06 3月, 2009 8 次提交
    • T
      percpu: move fully free chunk reclamation into a work · a56dbddf
      Tejun Heo 提交于
      Impact: code reorganization for later changes
      
      Do fully free chunk reclamation using a work.  This change is to
      prepare for locking changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a56dbddf
    • T
      percpu: move chunk area map extension out of area allocation · 9f7dcf22
      Tejun Heo 提交于
      Impact: code reorganization for later changes
      
      Separate out chunk area map extension into a separate function -
      pcpu_extend_area_map() - and call it directly from pcpu_alloc() such
      that pcpu_alloc_area() is guaranteed to have enough area map slots on
      invocation.
      
      With this change, pcpu_alloc_area() does only area allocation and the
      only failure mode is when the chunk doens't have enough room, so
      there's no need to distinguish it from memory allocation failures.
      Make it return -1 on such cases instead of hacky -ENOSPC.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9f7dcf22
    • T
      percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free() · 1880d93b
      Tejun Heo 提交于
      Impact: code reorganization for later changes
      
      With static map handling moved to pcpu_split_block(), pcpu_realloc()
      only clutters the code and it's also unsuitable for scheduled locking
      changes.  Implement and use pcpu_mem_alloc/free() instead.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1880d93b
    • T
      percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo 提交于
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, arch can ask to set aside certain number
      of bytes right after the core static area which is available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range to ensure that the module perpcu symbols are inside
      the relocatable range.
      
      If reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called static first chunk and the second dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If arch doesn't setup reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      edcb4639
    • T
      percpu: add an indirection ptr for chunk page map access · 3e24aa58
      Tejun Heo 提交于
      Impact: allow sharing page map, no functional difference yet
      
      Make chunk->page access indirect by adding a pointer and renaming the
      actual array to page_ar.  This will be used by future changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      3e24aa58
    • T
      percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Tejun Heo 提交于
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
      dynamic reserve size is valid.  Alos, if arch @dyn_size is calculated
      from other parameters, it might end up passing in 0 @dyn_size and
      malfunction when the size is automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and use -1 for
      auto sizing.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      cafe8816
    • T
      percpu: improve first chunk initial area map handling · 61ace7fa
      Tejun Heo 提交于
      Impact: no functional change
      
      When the first chunk is created, its initial area map is not allocated
      because kmalloc isn't online yet.  The map is allocated and
      initialized on the first allocation request on the chunk.  This works
      fine but the scattering of initialization logic between the init
      function and allocation path is a bit confusing.
      
      This patch makes the first chunk initialize and use minimal statically
      allocated map from pcpu_setpu_first_chunk().  The map resizing path
      still needs to handle this specially but it's more straight-forward
      and gives more latitude to the init path.  This will ease future
      changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      61ace7fa
    • T
      percpu: cosmetic renames in pcpu_setup_first_chunk() · 2441d15c
      Tejun Heo 提交于
      Impact: cosmetic, preparation for future changes
      
      Make the following renames in pcpur_setup_first_chunk() in preparation
      for future changes.
      
      * s/free_size/dyn_size/
      * s/static_vm/first_vm/
      * s/static_chunk/schunk/
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2441d15c
  7. 01 3月, 2009 1 次提交
  8. 24 2月, 2009 5 次提交
    • T
      percpu: add __read_mostly to variables which are mostly read only · 40150d37
      Tejun Heo 提交于
      Most global variables in percpu allocator are initialized during boot
      and read only from that point on.  Add __read_mostly as per Rusty's
      suggestion.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      40150d37
    • T
      percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo 提交于
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching page
        pointer by callback and adding optional @unit_size, @free_size and
        @base_addr arguments which allow archs to selectively part of chunk
        initialization to their likings.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      8d408b4b
    • T
      percpu: remove unit_size power-of-2 restriction · d9b55eeb
      Tejun Heo 提交于
      Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE
      
      In dynamic percpu allocator, there is no reason the unit size should
      be power of two.  Remove the restriction.
      
      As non-power-of-two unit size means that empty chunks fall into the
      same slot index as lightly occupied chunks which is bad for reclaming.
      Reserve an extra slot for empty chunks.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d9b55eeb
    • T
      vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo 提交于
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      c0c0a293
    • T
      percpu: fix pcpu_chunk_struct_size · cb83b42e
      Tejun Heo 提交于
      Impact: fix short allocation leading to memory corruption
      
      While dropping rvalue wrapping macros around global parameters,
      pcpu_chunk_struct_size was set incorrectly resulting in shorter page
      pointer array.  Fix it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      cb83b42e
  9. 21 2月, 2009 1 次提交
    • T
      percpu: clean up size usage · cae3aeb8
      Tejun Heo 提交于
      Andrew was concerned about the unit of variables named or have suffix
      size.  Every usage in percpu allocator is in bytes but make it super
      clear by adding comments.
      
      While at it, make pcpu_depopulate_chunk() take int @off and @size like
      everyone else.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      cae3aeb8
  10. 20 2月, 2009 1 次提交
    • T
      percpu: implement new dynamic percpu allocator · fbf59bc9
      Tejun Heo 提交于
      Impact: new scalable dynamic percpu allocator which allows dynamic
              percpu areas to be accessed the same way as static ones
      
      Implement scalable dynamic percpu allocator which can be used for both
      static and dynamic percpu areas.  This will allow static and dynamic
      areas to share faster direct access methods.  This feature is optional
      and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
      arch.  Please read comment on top of mm/percpu.c for details.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      fbf59bc9