1. 19 Jun 2014, 1 commit
  2. 15 Apr 2014, 1 commit
    • percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() · 5a838c3b
      Jianyu Zhan authored
      pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
      	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)
      
      It could hardly ever be bigger than PAGE_SIZE even on a large-scale
      machine, but for consistency with its counterpart pcpu_mem_zalloc(),
      use pcpu_mem_free() instead.
      
      Commit b4916cb1 ("percpu: make pcpu_free_chunk() use
      pcpu_mem_free() instead of kfree()") addressed the same issue in
      pcpu_free_chunk(), but missed this spot.
      
      tj: commit message updated
      Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 099a19d9 ("percpu: allow limited allocation before slab is online")
      Cc: stable@vger.kernel.org
      5a838c3b
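      A minimal sketch of the resulting pairing, with allocation-map setup
      and error handling elided (not the exact kernel code):

          static struct pcpu_chunk *pcpu_alloc_chunk(void)
          {
                  struct pcpu_chunk *chunk;

                  /* may be kzalloc or vzalloc under the hood ... */
                  chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size);
                  if (!chunk)
                          return NULL;
                  /* ... allocation map setup elided ... */
                  return chunk;
          }

          static void pcpu_free_chunk(struct pcpu_chunk *chunk)
          {
                  if (!chunk)
                          return;
                  /* ... so the free side must use the matching helper */
                  pcpu_mem_free(chunk, pcpu_chunk_struct_size);
          }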
  3. 29 Mar 2014, 1 commit
  4. 18 Mar 2014, 1 commit
    • percpu: allocation size should be even · 2f69fa82
      Al Viro authored
      723ad1d9 ("percpu: store offsets instead of lengths in ->map[]")
      updated the percpu area allocator to use the lowest bit, instead of the
      sign, to signify whether an area is occupied, and forced the minimum
      alignment to 2; unfortunately, it forgot to force the allocation size
      to be even, causing malfunctions for the very rare odd-sized allocations.
      
      Always force the allocations to be even sized.
      
      tj: Wrote patch description.
      Original-patch-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      2f69fa82
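      A minimal sketch of the fix in the allocation path (placement
      illustrative): with bit 0 of each map entry reserved for the in-use
      flag, every stored offset must stay even, so the request is rounded
      up front.

          /* keep all offsets even so bit 0 stays free for the in-use flag */
          size = ALIGN(size, 2);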
  5. 07 Mar 2014, 3 commits
    • percpu: speed alloc_pcpu_area() up · 3d331ad7
      Al Viro authored
      If we know that the first N areas are all in use, we can obviously
      skip them when searching for a free one.  And that kind of hint is
      very easy to maintain.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      3d331ad7
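      A sketch of the hint, assuming hypothetical is_free()/fits() helpers
      (the real code open-codes these checks):

          /*
           * chunk->first_free records how many leading areas are known
           * to be in use; the scan starts past them, and the hint is
           * advanced whenever the area at the hint becomes occupied.
           */
          for (i = chunk->first_free; i < chunk->map_used; i++) {
                  if (!is_free(chunk->map[i]))
                          continue;
                  if (!fits(chunk, i, size, align))
                          continue;
                  /* ... carve the allocation out of area i ... */
                  break;
          }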
    • percpu: store offsets instead of lengths in ->map[] · 723ad1d9
      Al Viro authored
      Current code keeps +-length for each area in chunk->map[].  It has
      several unpleasant consequences:
      	* even if we know that the first 50 areas are all in use, allocation
      still needs to go through all those areas just to sum their sizes, just
      to get the offset of a free one.
      	* freeing needs to find the array entry referring to the area
      in question; again, we need to sum the sizes until we reach the offset
      we are interested in.  Note that offsets are monotonic, so a simple
      binary search would do here.
      
      	New data representation: an array of <offset, in-use flag> pairs.
      Each pair is represented by one int - we use offset|1 for <offset, in use>
      and offset for <offset, free> (we make sure that all offsets are even).
      At the end we put a sentinel entry - <total size, in use>.  The first
      entry is <0, flag>; it would be possible to store the flag for the Nth
      area together with the offset for the N+1st, but that leads to much
      hairier code.
      
      In other words, where the old variant would have
      	4, -8, -4, 4, -12, 100
      (4 bytes free, 8 in use, 4 in use, 4 free, 12 in use, 100 free) we store
      	<0,0>, <4,1>, <12,1>, <16,0>, <20,1>, <32,0>, <132,1>
      i.e.
      	0, 5, 13, 16, 21, 32, 133
      
      This commit switches to the new data representation and takes care of a
      couple of low-hanging fruits in free_pcpu_area() - one is the switch to
      binary search, the other is not doing two memmove() calls when one would
      do.  Speeding the alloc side up (by keeping track of how many areas at
      the beginning are known to be all in use) also becomes possible - that'll
      be done in the next commit.  A minimal sketch of the encoding follows
      this entry.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      723ad1d9
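      A minimal sketch of the encoding described above; the helper names are
      illustrative, not the kernel's:

          /* bit 0 of each map[] entry is the in-use flag, the rest is the
           * (always even) byte offset; map[nr] is the sentinel */
          static inline int  ent_off(int e)  { return e & ~1; }
          static inline bool ent_used(int e) { return e & 1;  }

          /* the length of area i falls out of adjacent offsets */
          static inline int ent_len(const int *map, int i)
          {
                  return ent_off(map[i + 1]) - ent_off(map[i]);
          }

          /* freeing binary-searches the monotonic offsets for its entry */
          static int ent_find(const int *map, int nr, int off)
          {
                  int lo = 0, hi = nr;        /* nr excludes the sentinel */

                  while (lo < hi) {
                          int mid = lo + (hi - lo) / 2;

                          if (ent_off(map[mid]) < off)
                                  lo = mid + 1;
                          else
                                  hi = mid;
                  }
                  return lo;                  /* entry whose offset == off */
          }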
    • percpu: fold pcpu_split_block() into the only caller · 706c16f2
      Al Viro authored
      ... and simplify the results a bit.  Makes the next step easier
      to deal with - we will be changing the data representation for
      chunk->map[] and it's easier to do if the code in question is
      not split between pcpu_alloc_area() and pcpu_split_block().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      706c16f2
  6. 22 Jan 2014, 1 commit
    • mm/percpu.c: use memblock apis for early memory allocations · 999c17e3
      Santosh Shilimkar authored
      Switch to memblock interfaces for the early memory allocator instead of
      the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.
      
      Archs already converted to NO_BOOTMEM now directly use memblock
      interfaces instead of bootmem wrappers built on top of memblock.  For
      the archs which still use bootmem, these new APIs simply fall back to
      the existing bootmem APIs.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      999c17e3
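      A hedged sketch of the kind of substitution this makes in mm/percpu.c's
      default allocation hook (call site simplified; the wrapper is from the
      memblock_virt_alloc* family this series introduced):

          /* before: early allocations went through bootmem */
          ptr = __alloc_bootmem_nopanic(size, align, goal);

          /* after: on NO_BOOTMEM archs this is native memblock; on
           * bootmem archs the wrapper falls back to bootmem */
          ptr = memblock_virt_alloc_from_nopanic(size, align, goal);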
  7. 21 Jan 2014, 1 commit
  8. 23 Sep 2013, 1 commit
    • percpu: fix bootmem error handling in pcpu_embed_first_chunk() · f851c8d8
      Michael Holzheu authored
      If a memory allocation in pcpu_embed_first_chunk() fails, the already
      allocated memory is not released correctly: the release loop also
      releases the non-allocated elements, which leads to the following
      kernel BUG on systems with very little memory:
      
      [    0.000000] kernel BUG at mm/bootmem.c:307!
      [    0.000000] illegal operation: 0001 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [    0.000000] Modules linked in:
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0 #22
      [    0.000000] task: 0000000000a20ae0 ti: 0000000000a08000 task.ti: 0000000000a08000
      [    0.000000] Krnl PSW : 0400000180000000 0000000000abda7a (__free+0x116/0x154)
      [    0.000000]            R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 EA:3
      ...
      [    0.000000]  [<0000000000abdce2>] mark_bootmem_node+0xde/0xf0
      [    0.000000]  [<0000000000abdd9c>] mark_bootmem+0xa8/0x118
      [    0.000000]  [<0000000000abcbba>] pcpu_embed_first_chunk+0xe7a/0xf0c
      [    0.000000]  [<0000000000abcc96>] setup_per_cpu_areas+0x4a/0x28c
      
      To fix the problem, release only the elements that were actually
      allocated.  The allocation failure then leads to the intended kernel
      panic:
      
      [    0.000000] Kernel panic - not syncing: Failed to initialize percpu areas.
      ...
      [    0.000000] Call Trace:
      [    0.000000] ([<000000000011307e>] show_trace+0x132/0x150)
      [    0.000000]  [<0000000000113160>] show_stack+0xc4/0xd4
      [    0.000000]  [<00000000007127dc>] dump_stack+0x74/0xd8
      [    0.000000]  [<00000000007123fe>] panic+0xea/0x264
      [    0.000000]  [<0000000000b14814>] setup_per_cpu_areas+0x5c/0x28c
      
      tj: Flipped if conditional so that it doesn't need "continue".
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      f851c8d8
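      The resulting error path looks roughly like this (a sketch based on
      the description above; free_fn is the caller-supplied release
      callback):

          out_free_areas:
                  for (group = 0; group < ai->nr_groups; group++)
                          /* flipped conditional: skip what was never
                           * allocated, no "continue" needed */
                          if (areas[group])
                                  free_fn(areas[group],
                                          ai->groups[group].nr_units *
                                          ai->unit_size);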
  9. 02 Dec 2012, 1 commit
  10. 29 Oct 2012, 1 commit
  11. 06 Oct 2012, 1 commit
  12. 10 May 2012, 2 commits
  13. 30 Mar 2012, 1 commit
  14. 16 Dec 2011, 1 commit
    • percpu: fix per_cpu_ptr_to_phys() handling of non-page-aligned addresses · 9f57bd4d
      Eugene Surovegin authored
      per_cpu_ptr_to_phys() incorrectly rounds its result for the
      non-kmalloc case down to the page boundary, which is bogus for any
      non-page-aligned address.
      
      This affects the only in-tree user of this function - sysfs handler
      for per-cpu 'crash_notes' physical address.  The trouble is that the
      crash_notes per-cpu variable is not page-aligned:
      
      crash_notes = 0xc08e8ed4
      PER-CPU OFFSET VALUES:
       CPU 0: 3711f000
       CPU 1: 37129000
       CPU 2: 37133000
       CPU 3: 3713d000
      
      So, the per-cpu addresses are:
       crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
       crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
       crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
       crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4
      
      However, /sys/devices/system/cpu/cpu*/crash_notes says:
       /sys/devices/system/cpu/cpu0/crash_notes: 36b57000
       /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
       /sys/devices/system/cpu/cpu2/crash_notes: 36b43000
       /sys/devices/system/cpu/cpu3/crash_notes: 36b39000
      
      As you can see, all values are rounded down to a page
      boundary. Consequently, this is where kexec sets up the NOTE segments,
      and thus where the secondary kernel is looking for them. However, when
      the first kernel crashes, it saves the notes to the unaligned
      addresses, where they are not found.
      
      Fix it by adding offset_in_page() to the translated page address.
      
      -tj: Combined Eugene's and Petr's commit messages.
      Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Petr Tesarik <ptesarik@suse.cz>
      Cc: stable@kernel.org
      9f57bd4d
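      A sketch of the fix for the vmalloc-backed case (surrounding
      first-chunk checks elided):

          /* translate the containing page, then add the offset within
           * the page instead of returning a page-aligned result */
          return page_to_phys(vmalloc_to_page(addr)) + offset_in_page(addr);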
  15. 03 Dec 2011, 1 commit
  16. 24 Nov 2011, 1 commit
  17. 23 Nov 2011, 2 commits
    • percpu: fix chunk range calculation · a855b84c
      Tejun Heo authored
      Percpu allocator recorded the cpus which map to the first and last
      units in pcpu_first/last_unit_cpu respectively and used them to
      determine the address range of a chunk - e.g. it assumed that the
      first unit has the lowest address in a chunk while the last unit has
      the highest address.
      
      This simply isn't true.  Groups in a chunk can have arbitrary positive
      or negative offsets from the previous one and there is no guarantee
      that the first unit occupies the lowest offset while the last one the
      highest.
      
      Fix it by actually comparing unit offsets to determine the cpus
      occupying the lowest and highest offsets.  Also, rename
      pcpu_first/last_unit_cpu to pcpu_low/high_unit_cpu to avoid confusion.
      
      The chunk address range is used to flush caches on vmalloc area
      map/unmap and to decide whether a given address is in the first chunk
      in per_cpu_ptr_to_phys(); the bug was discovered through an invalid
      per_cpu_ptr_to_phys() translation for crash_notes.
      
      Kudos to Dave Young for tracking down the problem.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: WANG Cong <xiyou.wangcong@gmail.com>
      Reported-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      LKML-Reference: <4EC21F67.10905@redhat.com>
      Cc: stable@kernel.org
      a855b84c
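      A sketch of the comparison described above (unit_off[] is the per-cpu
      unit offset table built during first-chunk setup; initialization of
      the two trackers elided):

          /* derive the chunk's low/high ends from actual unit offsets
           * instead of assuming the first/last units bound them */
          for_each_possible_cpu(cpu) {
                  if (unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
                          pcpu_low_unit_cpu = cpu;
                  if (unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
                          pcpu_high_unit_cpu = cpu;
          }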
    • percpu: rename pcpu_mem_alloc to pcpu_mem_zalloc · 90459ce0
      Bob Liu authored
      Currently pcpu_mem_alloc() is implemented to always return zeroed
      memory.  So rename it to pcpu_mem_zalloc() so that callers like
      pcpu_get_pages_and_bitmap() know they don't need to re-initialize the
      returned memory.
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      90459ce0
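      For reference, a sketch of what the helper does after the rename
      (early-boot guard elided; the 'z' advertises that both branches
      return zeroed memory):

          static void *pcpu_mem_zalloc(size_t size)
          {
                  if (size <= PAGE_SIZE)
                          return kzalloc(size, GFP_KERNEL);
                  return vzalloc(size);   /* vzalloc() zeroes as well */
          }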
  18. 31 Mar 2011, 1 commit
  19. 29 Mar 2011, 1 commit
    • percpu: Cast away printk format warning · 787e5b06
      Mike Frysinger authored
      On 32-bit systems which don't happen to implicitly define or cast
      VMALLOC_START and/or VMALLOC_END to long in their arch headers, the
      printk in the percpu code will cause a warning to be emitted:
      
      mm/percpu.c: In function 'pcpu_embed_first_chunk':
      mm/percpu.c:1648: warning: format '%lx' expects type 'long unsigned int',
              but argument 3 has type 'unsigned int'
      
      So add an explicit cast to unsigned long here.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      787e5b06
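      The shape of the fix (message text illustrative):

          /* cast so the arguments match %lx on every 32-bit arch,
           * however VMALLOC_START/END happen to be typed */
          pr_info("...vmalloc space %lx-%lx\n",
                  (unsigned long)VMALLOC_START,
                  (unsigned long)VMALLOC_END);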
  20. 28 Mar 2011, 1 commit
    • NOMMU: percpu should use is_vmalloc_addr(). · eac522ef
      David Howells authored
      per_cpu_ptr_to_phys() uses VMALLOC_START and VMALLOC_END to determine if an
      address is in the vmalloc() region or not.  This is incorrect on NOMMU as
      there is no real vmalloc() capability (vmalloc() is emulated by kmalloc()).
      
      The correct way to do this is to use is_vmalloc_addr().  This encapsulates the
      vmalloc() region test in MMU mode and just returns 0 in NOMMU mode.
      
      On FRV in NOMMU mode, the percpu compilation fails without this patch:
      
      mm/percpu.c: In function 'per_cpu_ptr_to_phys':
      mm/percpu.c:1011: error: 'VMALLOC_START' undeclared (first use in this function)
      mm/percpu.c:1011: error: (Each undeclared identifier is reported only once
      mm/percpu.c:1011: error: for each function it appears in.)
      mm/percpu.c:1012: error: 'VMALLOC_END' undeclared (first use in this function)
      mm/percpu.c:1018: warning: control reaches end of non-void function
      Signed-off-by: David Howells <dhowells@redhat.com>
      eac522ef
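      A sketch of the resulting test in per_cpu_ptr_to_phys() (simplified):

          /* is_vmalloc_addr() hides the MMU/NOMMU difference; on NOMMU
           * it returns 0, so this always takes the direct branch */
          if (!is_vmalloc_addr(addr))
                  return __pa(addr);
          return page_to_phys(vmalloc_to_page(addr));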
  21. 25 Mar 2011, 1 commit
    • percpu: Always align percpu output section to PAGE_SIZE · 0415b00d
      Tejun Heo authored
      The percpu allocator honors alignment requests up to PAGE_SIZE, and
      both the percpu addresses in the percpu address space and the
      translated kernel addresses should be aligned accordingly.  The
      calculation of the former depends on the alignment of the percpu
      output section in the kernel image.
      
      The linker script macros PERCPU_VADDR() and PERCPU() are used to
      define this output section, and the latter takes an @align parameter.
      Several architectures use an @align smaller than PAGE_SIZE, breaking
      percpu memory alignment.
      
      This patch removes the @align parameter from PERCPU(), renames it to
      PERCPU_SECTION() and makes it always align to PAGE_SIZE.  While at it,
      add PCPU_SETUP_BUG_ON() checks so that alignment problems are reliably
      detected, and remove the percpu alignment comment recently added in
      workqueue.c, as the condition would trigger a BUG way before reaching
      there.
      
      For um, this patch raises the alignment of percpu area.  As the area
      is in .init, there shouldn't be any noticeable difference.
      
      This problem was discovered by David Howells while debugging boot
      failure on mn10300.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Cc: uclinux-dist-devel@blackfin.uclinux.org
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      0415b00d
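      A sketch of the intended usage in an arch linker script (argument
      illustrative; after this change the macro pins the section itself to
      PAGE_SIZE, and the parameter only affects cacheline alignment of
      subsections):

          /* arch/.../vmlinux.lds.S */
          PERCPU_SECTION(L1_CACHE_BYTES)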
  22. 22 Dec 2010, 1 commit
  23. 07 Dec 2010, 1 commit
  24. 02 Nov 2010, 1 commit
  25. 02 Oct 2010, 1 commit
    • percpu: use percpu allocator on UP too · 9b8327bb
      Tejun Heo authored
      On UP, percpu allocations were redirected to kmalloc.  This has the
      following problems.
      
      * For a certain number of allocations (determined by
        PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), the percpu
        allocator can be used before the usual kernel memory allocator is
        brought online.  On SMP, this is used to initialize the kernel
        memory allocator.
      
      * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
        doesn't.  For example, workqueue makes use of larger alignments for
        cpu_workqueues.
      
      Currently, users of percpu allocators need to handle UP differently,
      which is somewhat fragile and ugly.  Other than a small amount of
      memory, there isn't much to lose by enabling the percpu allocator on UP.
      It can simply use kernel memory based chunk allocation which was added
      for SMP archs w/o MMUs.
      
      This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
      makes UP build use percpu-km.  As percpu addresses and kernel
      addresses are always identity mapped and static percpu variables don't
      need any special treatment, nothing is arch dependent and mm/percpu.c
      implements generic setup_per_cpu_areas() for UP.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      9b8327bb
  26. 21 Sep 2010, 1 commit
    • percpu: fix pcpu_last_unit_cpu · 46b30ea9
      Tejun Heo authored
      pcpu_first/last_unit_cpu are used to track which cpus have the first
      and last units assigned.  This in turn is used to determine the span
      of a chunk for map/unmap cache flushes and whether an address belongs
      to the first chunk or not in per_cpu_ptr_to_phys().
      
      When the number of possible CPUs isn't a power of two, a chunk may
      contain unassigned units towards the end.  The logic that determined
      pcpu_last_unit_cpu was incorrect when there was an unused unit at the
      end of a chunk: it failed to ignore the unused unit and assigned the
      unused marker NR_CPUS to pcpu_last_unit_cpu.
      
      This was discovered through kdump failure which was caused by
      malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
      CPUs by CAI Qian.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: CAI Qian <caiqian@redhat.com>
      Cc: stable@kernel.org
      46b30ea9
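      A sketch of the corrected tracking (loop shape illustrative;
      gi->cpu_map[] maps unit index to cpu within a group, with NR_CPUS
      marking an unassigned unit, as described above):

          for (i = 0; i < gi->nr_units; i++) {
                  cpu = gi->cpu_map[i];
                  if (cpu == NR_CPUS)     /* unused unit - skip it */
                          continue;
                  pcpu_last_unit_cpu = cpu;  /* last *assigned* unit wins */
          }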
  27. 10 Sep 2010, 2 commits
  28. 08 Sep 2010, 1 commit
    • percpu: use percpu allocator on UP too · bbddff05
      Tejun Heo authored
      On UP, percpu allocations were redirected to kmalloc.  This has the
      following problems.
      
      * For a certain number of allocations (determined by
        PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), the percpu
        allocator can be used before the usual kernel memory allocator is
        brought online.  On SMP, this is used to initialize the kernel
        memory allocator.
      
      * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
        doesn't.  For example, workqueue makes use of larger alignments for
        cpu_workqueues.
      
      Currently, users of percpu allocators need to handle UP differently,
      which is somewhat fragile and ugly.  Other than a small amount of
      memory, there isn't much to lose by enabling the percpu allocator on UP.
      It can simply use kernel memory based chunk allocation which was added
      for SMP archs w/o MMUs.
      
      This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
      makes UP build use percpu-km.  As percpu addresses and kernel
      addresses are always identity mapped and static percpu variables don't
      need any special treatment, nothing is arch dependent and mm/percpu.c
      implements generic setup_per_cpu_areas() for UP.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      bbddff05
  29. 27 Aug 2010, 2 commits
  30. 11 Aug 2010, 1 commit
  31. 28 Jun 2010, 2 commits
    • percpu: allow limited allocation before slab is online · 099a19d9
      Tejun Heo authored
      This patch updates the percpu allocator so that it can serve a limited
      amount of allocations before slab comes online.  This is primarily to
      allow slab to depend on a working percpu allocator.
      
      Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and
      PERCPU_DYNAMIC_EARLY_SLOTS, determine how much memory space and how
      many allocation map slots are reserved.  If this reserved area is
      exhausted, WARN_ON_ONCE() will trigger and allocation will fail until
      slab comes online.
      
      The following changes are made to implement early alloc.
      
      * pcpu_mem_alloc() now checks slab_is_available()
      
      * Chunks are allocated using pcpu_mem_alloc()
      
      * Init paths make sure ai->dyn_size is at least as large as
        PERCPU_DYNAMIC_EARLY_SIZE.
      
      * Initial alloc maps are allocated in __initdata and copied to
        kmalloc'd areas once slab is online.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      099a19d9
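      A sketch of the early-allocation pieces listed above (shapes follow
      the description; details simplified):

          /* reserved maps live in __initdata and are copied to
           * kmalloc'd areas once slab is online */
          static int smap[PERCPU_DYNAMIC_EARLY_SLOTS] __initdata;
          static int dmap[PERCPU_DYNAMIC_EARLY_SLOTS] __initdata;

          /* pcpu_mem_alloc() refuses to run before slab is up */
          static void *pcpu_mem_alloc(size_t size)
          {
                  if (WARN_ON_ONCE(!slab_is_available()))
                          return NULL;
                  if (size <= PAGE_SIZE)
                          return kzalloc(size, GFP_KERNEL);
                  return vzalloc(size);
          }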
    • percpu: make @dyn_size always mean min dyn_size in first chunk init functions · 4ba6ce25
      Tejun Heo authored
      In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
      ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
      size.  There's no use case for forcing 0 and the upcoming early alloc
      support always requires non-zero dynamic size.  Make @dyn_size always
      mean minimum dyn_size.
      
      While at it, make pcpu_build_alloc_info() static, as it doesn't have
      any external caller, as suggested by David Rientjes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      4ba6ce25
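      A sketch of the resulting declaration (parameter list assumed from
      the text above):

          /* @dyn_size is now unsigned and always means the minimum
           * dynamic size - no -1 auto-size or forced-0 special cases */
          static struct pcpu_alloc_info * __init
          pcpu_build_alloc_info(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn);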
  32. 18 Jun 2010, 1 commit
    • percpu: fix first chunk match in per_cpu_ptr_to_phys() · 9983b6f0
      Tejun Heo authored
      per_cpu_ptr_to_phys() determines whether the passed-in @addr belongs
      to the first chunk or not by matching the address only against the
      address range of the base unit (unit 0, used by cpu0).  When an
      address from another cpu was passed in, it would always determine that
      the address doesn't belong to the first chunk even when it does.  This
      makes the function return a bogus physical address which may lead to a
      crash.
      
      This problem was discovered by Cliff Wickman while investigating a
      crash during kdump on a SGI UV system.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Cliff Wickman <cpw@sgi.com>
      Tested-by: Cliff Wickman <cpw@sgi.com>
      Cc: stable@kernel.org
      9983b6f0
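      A sketch of the corrected membership test (hypothetical helper name;
      pcpu_base_addr, pcpu_unit_offsets[] and pcpu_unit_size are the
      allocator's first-chunk layout variables):

          static bool addr_in_first_chunk(void *addr)
          {
                  unsigned int cpu;

                  /* check every unit's range, not just unit 0's */
                  for_each_possible_cpu(cpu) {
                          void *start = pcpu_base_addr + pcpu_unit_offsets[cpu];

                          if (addr >= start && addr < start + pcpu_unit_size)
                                  return true;
                  }
                  return false;
          }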
  33. 17 Jun 2010, 1 commit