1. 13 Dec, 2016 1 commit
  2. 20 Oct, 2016 1 commit
    • Z
      percpu: ensure the requested alignment is power of two · 3ca45a46
      Committed by zijun_hu
      The percpu allocator assumes that the requested alignment is a
      power of two but hasn't been verifying the input.  If the specified
      alignment isn't a power of two, the allocator can malfunction.  Add
      the sanity check.
      
      The following is a detailed analysis of the effects of alignments
      which aren't a power of two.

       The alignment must at least be even, since the LSB of a chunk->map
       element is used as the free/in-use flag of an area; moreover, the
       alignment must be a power of 2, since ALIGN(), which
       pcpu_fit_in_area() uses, doesn't work correctly for other
       alignments.  IOW, the current allocator only works for area
       allocations aligned to a power of 2.
      
       See the counterexample below for why an odd alignment doesn't work.
       Assume area [16, 36) is free but the previous one is in use, and we
       want to allocate an area with @size == 8 and @align == 7.  The larger
       area [16, 36) is eventually split into three areas: [16, 21),
       [21, 29) and [29, 36).  However, because of how chunk->map elements
       are used, the actual offset of the target area [21, 29) is 21 but is
       recorded in the relevant element as 20; moreover, the residual free
       tail area [29, 36) is mistaken for in-use and is silently lost.
      
       Unlike the roundup() macro, ALIGN(x, a) doesn't work if @a isn't a
       power of 2.  For example, roundup(10, 6) == 12 but ALIGN(10, 6) == 10,
       and the latter result is obviously not the desired one.
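
The arithmetic above can be reproduced in userspace. The sketch below copies the usual mask-based ALIGN() and division-based roundup() definitions (named ROUNDUP here) and adds a power-of-two guard in the spirit of the sanity check. The helper name align_is_valid() is hypothetical and is not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace copies of the kernel-style macros, for illustration only. */
#define ALIGN(x, a)    (((x) + ((a) - 1)) & ~((size_t)(a) - 1))
#define ROUNDUP(x, y)  ((((x) + ((y) - 1)) / (y)) * (y))

/* True only when n has exactly one bit set. */
static bool is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical guard: reject alignments that the mask-based ALIGN()
 * cannot handle, i.e. anything that is not a power of two.
 */
static bool align_is_valid(size_t align)
{
	return is_power_of_2(align);
}
```

With these definitions, ALIGN(10, 6) leaves 10 unchanged because ~(6 - 1) is not a contiguous low-bit mask, while ROUNDUP(10, 6) yields 12.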
      
      tj: Code style and patch description updates.
      Signed-off-by: zijun_hu <zijun_hu@htc.com>
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 05 Oct, 2016 2 commits
    • Z
      mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk() · 9b739662
      Committed by zijun_hu
      In order to ensure the percpu group areas within a chunk aren't
      distributed too sparsely, pcpu_embed_first_chunk() goes to the error
      handling path when a chunk spans over 3/4 of the VMALLOC area.  However,
      during the error handling, it forgets to free the memory allocated for
      all percpu groups because it goes to label @out_free rather than
      @out_free_areas.

      This causes a memory leak if the rare case really happens.  To fix
      the issue, check the chunk's spanned area immediately after completing
      memory allocation for all percpu groups, and go to label
      @out_free_areas to free the memory and return if the check fails.

      To verify the approach, we dumped all memory allocated, forced the
      jump, then dumped all memory freed; the result is okay after checking
      that we free all the memory we allocate in this function.

      BTW, the approach was chosen after considering the scenarios below:
       - we don't go to label @out_free directly to fix this issue since we
         might free several allocated memory blocks twice
       - the aim of jumping after pcpu_setup_first_chunk() is to bypass
         freeing usable memory rather than to handle an error; moreover, the
         function does not return an error code in any case: it either panics
         due to BUG_ON() or returns 0.
      Signed-off-by: zijun_hu <zijun_hu@htc.com>
      Tested-by: zijun_hu <zijun_hu@htc.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • Z
      mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk() · 93c76b6b
      Committed by zijun_hu
      pcpu_embed_first_chunk() calculates the range a percpu chunk spans into
      @max_distance and uses it to ensure that a chunk is not too big compared
      to the total vmalloc area.  However, during the calculation, it uses an
      incorrect top address by adding a unit size to the highest group's base
      address.
      
      This can make the calculated max_distance slightly smaller than the actual
      distance although given the scale of values involved the error is very
      unlikely to have an actual impact.
      
      Fix this issue by adding the group's size instead of a unit size.
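
A minimal sketch of the difference between the two calculations follows. It assumes that a group's size equals its unit count times the unit size; the struct and function names are hypothetical and are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one percpu group. */
struct group_sketch {
	unsigned long base_offset;	/* offset from the lowest group base */
	unsigned int nr_units;		/* number of units in this group */
};

/* Index of the group with the highest base offset. */
static size_t highest_group(const struct group_sketch *g, size_t n)
{
	size_t hi = 0;

	for (size_t i = 1; i < n; i++)
		if (g[i].base_offset > g[hi].base_offset)
			hi = i;
	return hi;
}

/* Old calculation: highest base offset plus a single unit size. */
static unsigned long span_old(const struct group_sketch *g, size_t n,
			      unsigned long unit_size)
{
	return g[highest_group(g, n)].base_offset + unit_size;
}

/* Fixed calculation: highest base offset plus the whole group's size. */
static unsigned long span_fixed(const struct group_sketch *g, size_t n,
				unsigned long unit_size)
{
	const struct group_sketch *top = &g[highest_group(g, n)];

	return top->base_offset + (unsigned long)top->nr_units * unit_size;
}
```

For two groups of four 1024-byte units with base offsets 0 and 4096, the old calculation gives 5120 while the fixed one gives 8192, so the old value under-reports the span whenever the highest group holds more than one unit.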
      
      BTW, the type of variable max_distance is also changed from size_t to
      unsigned long, based on the considerations below:
       - type unsigned long usually has the same width as IP core registers
         and fits well here
       - it makes the type of @max_distance consistent with the operands it is
         calculated against, such as @ai->groups[i].base_offset and macro
         VMALLOC_TOTAL
       - type unsigned long is more universal than size_t; size_t is usually
         typedef'd to unsigned int or unsigned long depending on the ARCH
      Signed-off-by: zijun_hu <zijun_hu@htc.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 25 May, 2016 2 commits
  5. 18 Mar, 2016 4 commits
  6. 23 Jan, 2016 1 commit
  7. 06 Nov, 2015 1 commit
  8. 21 Jul, 2015 1 commit
  9. 25 Jun, 2015 1 commit
    • L
      mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() · 8a8c35fa
      Committed by Larry Finger
      Beginning at commit d52d3997 ("ipv6: Create percpu rt6_info"), the
      following INFO splat is logged:
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        4.1.0-rc7-next-20150612 #1 Not tainted
        -------------------------------
        kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
        other info that might help us debug this:
        rcu_scheduler_active = 1, debug_locks = 0
         3 locks held by systemd/1:
         #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
         #1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
         #2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
        stack backtrace:
        CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
        Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
        Call Trace:
          dump_stack+0x4c/0x6e
          lockdep_rcu_suspicious+0xe7/0x120
          ___might_sleep+0x1d5/0x1f0
          __might_sleep+0x4d/0x90
          kmem_cache_alloc+0x47/0x250
          create_object+0x39/0x2e0
          kmemleak_alloc_percpu+0x61/0xe0
          pcpu_alloc+0x370/0x630
      
      Additional backtrace lines are truncated.  In addition, the above splat
      is followed by several "BUG: sleeping function called from invalid
      context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
      these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
      uses GFP_KERNEL for its allocations, whereas it should follow the gfp
      from its callers.
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 25 Mar, 2015 1 commit
  11. 14 Feb, 2015 1 commit
  12. 29 Oct, 2014 1 commit
  13. 09 Oct, 2014 1 commit
    • T
      percpu: fix how @gfp is interpreted by the percpu allocator · 6ae833c7
      Committed by Tejun Heo
      When @gfp is specified, the percpu allocator is interested in whether
      it contains all of GFP_KERNEL or not.  If it does, the normal
      allocation path is taken; otherwise, the atomic allocation path.
      Unfortunately, pcpu_alloc() was incorrectly testing for whether @gfp
      contains any part of GFP_KERNEL.
      
      Fix it by testing "(gfp & GFP_KERNEL) != GFP_KERNEL" instead of
      "!(gfp & GFP_KERNEL)" to decide whether the allocation should be
      atomic or not.
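
The two tests differ only when @gfp carries some, but not all, of the bits that make up GFP_KERNEL. The sketch below assumes GFP_KERNEL is a combination of several flag bits; the SK_ names and bit values are made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in <linux/gfp.h>. */
#define SK_GFP_WAIT	0x10u
#define SK_GFP_IO	0x40u
#define SK_GFP_FS	0x80u
#define SK_GFP_KERNEL	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Old test: atomic only if @gfp shares NO bit with GFP_KERNEL. */
static bool is_atomic_old(unsigned int gfp)
{
	return !(gfp & SK_GFP_KERNEL);
}

/* Fixed test: atomic unless @gfp contains ALL bits of GFP_KERNEL. */
static bool is_atomic_fixed(unsigned int gfp)
{
	return (gfp & SK_GFP_KERNEL) != SK_GFP_KERNEL;
}
```

A mask such as SK_GFP_WAIT | SK_GFP_IO is treated as non-atomic by the old test but as atomic by the fixed one; both tests agree for a full SK_GFP_KERNEL mask and for a mask with none of its bits.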
      Signed-off-by: Tejun Heo <tj@kernel.org>
  14. 22 Sep, 2014 1 commit
  15. 09 Sep, 2014 1 commit
  16. 03 Sep, 2014 10 commits
    • T
      percpu: implement asynchronous chunk population · 1a4d7607
      Committed by Tejun Heo
      The percpu allocator now supports atomic allocations by only
      allocating from already populated areas, but the mechanism to ensure
      that there's an adequate amount of populated areas was missing.
      
      This patch expands pcpu_balance_work so that in addition to freeing
      excess free chunks it also populates chunks to maintain an adequate
      level of populated areas.  pcpu_alloc() schedules pcpu_balance_work if
      the amount of free populated areas is too low or after an atomic
      allocation failure.
      
      * PERCPU_DYNAMIC_RESERVE is increased by two pages to account for
        PCPU_EMPTY_POP_PAGES_LOW.
      
      * pcpu_async_enabled is added to gate both async jobs -
        chunk->map_extend_work and pcpu_balance_work - so that we don't end
        up scheduling them while the needed subsystems aren't up yet.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: rename pcpu_reclaim_work to pcpu_balance_work · fe6bd8c3
      Committed by Tejun Heo
      pcpu_reclaim_work will also be used to populate chunks asynchronously.
      Rename it to pcpu_balance_work in preparation.  pcpu_reclaim() is
      renamed to pcpu_balance_workfn() and some of its local variables are
      renamed too.
      
      This is a pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: implement pcpu_nr_empty_pop_pages and chunk->nr_populated · b539b87f
      Committed by Tejun Heo
      pcpu_nr_empty_pop_pages counts the number of empty populated pages
      across all chunks and chunk->nr_populated counts the number of
      populated pages in a chunk.  Both will be used to implement pre/async
      population for atomic allocations.
      
      pcpu_chunk_[de]populated() are added to update chunk->populated,
      chunk->nr_populated and pcpu_nr_empty_pop_pages together.  All
      successful chunk [de]populations should be followed by the
      corresponding pcpu_chunk_[de]populated() calls.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: make sure chunk->map array has available space · 9c824b6a
      Committed by Tejun Heo
      An allocation attempt may require extending chunk->map array which
      requires GFP_KERNEL context which isn't available for atomic
      allocations.  This patch ensures that chunk->map array usually keeps
      some amount of available space by directly allocating buffer space
      during GFP_KERNEL allocations and scheduling async extension during
      atomic ones.  This should make atomic allocation failures from map
      space exhaustion rare.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: implement [__]alloc_percpu_gfp() · 5835d96e
      Committed by Tejun Heo
      Now that pcpu_alloc_area() can allocate only from populated areas,
      it's easy to add atomic allocation support to [__]alloc_percpu().
      Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
      operations and allocates only from the populated areas if @gfp doesn't
      contain GFP_KERNEL.  New interface functions [__]alloc_percpu_gfp()
      are added.
      
      While this means that atomic allocations are possible, this isn't
      complete yet as there's no mechanism to ensure that a certain amount
      of populated areas is kept available, and atomic allocations may keep
      failing under certain conditions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: indent the population block in pcpu_alloc() · e04d3208
      Committed by Tejun Heo
      The next patch will conditionalize the population block in
      pcpu_alloc() which will end up making a rather large indentation
      change obfuscating the actual logic change.  This patch puts the block
      under "if (true)" so that the next patch can avoid indentation
      changes.  The definitions of the local variables which are used only in
      the block are moved into the block.
      
      This patch is purely cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: make pcpu_alloc_area() capable of allocating only from populated areas · a16037c8
      Committed by Tejun Heo
      Update pcpu_alloc_area() so that it can skip unpopulated areas if the
      new parameter @pop_only is true.  This is implemented by a new
      function, pcpu_fit_in_area(), which determines the amount of head
      padding considering the alignment and populated state.
      
      @pop_only is currently always false but this will be used to implement
      atomic allocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: restructure locking · b38d08f3
      Committed by Tejun Heo
      At first, the percpu allocator required a sleepable context for both
      alloc and free paths and used pcpu_alloc_mutex to protect everything.
      Later, pcpu_lock was introduced to protect the index data structure so
      that the free path can be invoked from atomic contexts.  The
      conversion only updated what's necessary and left most of the
      allocation path under pcpu_alloc_mutex.
      
      The percpu allocator is planned to add support for atomic allocation
      and this patch restructures locking so that the coverage of
      pcpu_alloc_mutex is further reduced.
      
      * pcpu_alloc() now grabs pcpu_alloc_mutex only while creating a new
        chunk and populating the allocated area.  Everything else is now
        protected solely by pcpu_lock.
      
        After this change, multiple instances of pcpu_extend_area_map() may
        race but the function already implements sufficient synchronization
        using pcpu_lock.
      
        This also allows multiple allocators to arrive at new chunk
        creation.  To avoid creating multiple empty chunks back-to-back, a
        new chunk is created iff there is no other empty chunk after
        grabbing pcpu_alloc_mutex.
      
      * pcpu_lock is now held while modifying chunk->populated bitmap.
        After this, all data structures are protected by pcpu_lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: move region iterations out of pcpu_[de]populate_chunk() · a93ace48
      Committed by Tejun Heo
      Previously, pcpu_[de]populate_chunk() were called with the range which
      may contain multiple target regions in it and
      pcpu_[de]populate_chunk() iterated over the regions.  This has the
      benefit of batching up cache flushes for all the regions; however,
      we're planning to add more bookkeeping logic around [de]population to
      support atomic allocations and this delegation of iterations gets in
      the way.
      
      This patch moves the region iterations out of
      pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
      pcpu_reclaim() - so that we can later add logic to track more states
      around them.  This change may make cache and tlb flushes more frequent
      but multi-region [de]populations are rare anyway and if this actually
      becomes a problem, it's not difficult to factor out cache flushes as
      separate callbacks which are directly invoked from percpu.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • T
      percpu: move common parts out of pcpu_[de]populate_chunk() · dca49645
      Committed by Tejun Heo
      percpu-vm and percpu-km implement separate versions of
      pcpu_[de]populate_chunk() and some parts which are or should be common
      are currently in the specific implementations.  Make the following
      changes.

      * Allocated area clearing is moved from the pcpu_populate_chunk()
        implementations to pcpu_alloc().  This makes percpu-km's version
        a noop.

      * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
        to their respective callers so that they are applied to percpu-km
        too.  This doesn't make any meaningful difference as both functions
        are noops for percpu-km; however, this is more consistent and will
        help in implementing atomic allocation support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  17. 16 Aug, 2014 1 commit
  18. 19 Jun, 2014 1 commit
  19. 15 Apr, 2014 1 commit
    • J
      percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() · 5a838c3b
      Committed by Jianyu Zhan
      pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
      	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)
      
      It could hardly ever be bigger than PAGE_SIZE even on a large-scale
      machine, but for consistency with its counterpart pcpu_mem_zalloc(),
      use pcpu_mem_free() instead.
      
      Commit b4916cb1 ("percpu: make pcpu_free_chunk() use
      pcpu_mem_free() instead of kfree()") addressed this problem, but
      missed this one.
      
      tj: commit message updated
      Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 099a19d9 ("percpu: allow limited allocation before slab is online")
      Cc: stable@vger.kernel.org
  20. 29 Mar, 2014 1 commit
  21. 18 Mar, 2014 1 commit
    • V
      percpu: allocation size should be even · 2f69fa82
      Committed by Viro
      723ad1d9 ("percpu: store offsets instead of lengths in ->map[]")
      updated percpu area allocator to use the lowest bit, instead of sign,
      to signify whether the area is occupied and forced min align to 2;
      unfortunately, it forgot to force the allocation size to be even
      causing malfunctions for the very rare odd-sized allocations.
      
      Always force the allocations to be even sized.
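
Why an odd size breaks the encoding can be seen from the offset of the area that follows it. This is a sketch only: the helper names are hypothetical and the rounding shown is one possible way to force an even size, not necessarily the patch's exact code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit 0 of every chunk->map entry is the in-use flag (see 723ad1d9). */
#define SK_IN_USE ((size_t)1)

/* Hypothetical helper: round a size up to the next even number. */
static size_t force_even(size_t size)
{
	return (size + 1) & ~SK_IN_USE;
}

/* True if an offset would collide with the in-use flag bit. */
static bool offset_clobbers_flag(size_t off)
{
	return (off & SK_IN_USE) != 0;
}
```

An area of size 7 at offset 16 would make the next area start at 23, which cannot be stored without disturbing bit 0; rounding the size up to 8 keeps the next offset at 24.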
      
      tj: Wrote patch description.
      Original-patch-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  22. 07 Mar, 2014 3 commits
    • A
      percpu: speed alloc_pcpu_area() up · 3d331ad7
      Committed by Al Viro
      If we know that first N areas are all in use, we can obviously skip
      them when searching for a free one.  And that kind of hint is very
      easy to maintain.
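
A minimal sketch of such a hint, assuming map entries whose bit 0 marks an area as in use (the offset|flag encoding of 723ad1d9). The function and the parameter name hint are hypothetical, not the kernel's code.

```c
#include <stddef.h>

/*
 * Return the index of the first free entry at or after `hint`, or -1 if
 * there is none.  `used` is the number of areas (the sentry is excluded).
 * `hint` stands for "the first `hint` areas are known to be in use".
 */
static int find_free_from(const int *map, int used, int hint)
{
	for (int i = hint; i < used; i++)
		if (!(map[i] & 1))
			return i;
	return -1;
}
```

Starting the scan at the hint returns the same entry as starting at 0 as long as the hint is accurate, which is why the allocator only has to keep the hint from overshooting the first free area.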
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • A
      percpu: store offsets instead of lengths in ->map[] · 723ad1d9
      Committed by Al Viro
      Current code keeps +-length for each area in chunk->map[].  It has
      several unpleasant consequences:
      	* even if we know that the first 50 areas are all in use, allocation
      	  still needs to go through all those areas just to sum their sizes,
      	  just to get the offset of a free one.
      	* freeing needs to find the array entry referring to the area in
      	  question; again, we need to sum the sizes until we reach the offset
      	  we are interested in.  Note that offsets are monotonic, so a simple
      	  binary search would do here.
      
      	New data representation: array of <offset,in-use flag> pairs.
      Each pair is represented by one int - we use offset|1 for <offset, in use>
      and offset for <offset, free> (we make sure that all offsets are even).
      In the end we put a sentry entry - <total size, in use>.  The first
      entry is <0, flag>; it would be possible to store together the flag
      for Nth area and offset for N+1st, but that leads to much hairier code.
      
      In other words, where the old variant would have
      	4, -8, -4, 4, -12, 100
      (4 bytes free, 8 in use, 4 in use, 4 free, 12 in use, 100 free) we store
      	<0,0>, <4,1>, <12,1>, <16,0>, <20,1>, <32,0>, <132,1>
      i.e.
      	0, 5, 13, 16, 21, 32, 133
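
The conversion in this example can be sketched as follows. The function names are hypothetical; the code only illustrates the two representations described above and the binary search that monotonic offsets make possible.

```c
#include <stddef.h>

/*
 * Convert the old signed-length form (positive = free, negative = in use)
 * into the new form: one int per area holding offset | in-use flag,
 * followed by a sentry entry <total size, in use>.  Returns the number of
 * entries written; `out` must have room for n + 1 ints.
 */
static size_t lengths_to_offsets(const int *len, size_t n, int *out)
{
	int off = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int in_use = len[i] < 0;

		out[i] = off | in_use;
		off += in_use ? -len[i] : len[i];
	}
	out[n] = off | 1;	/* sentry */
	return n + 1;
}

/* Binary search for the entry whose offset equals `off`; -1 if absent. */
static int find_area(const int *map, size_t entries, int off)
{
	size_t lo = 0, hi = entries;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cur = map[mid] & ~1;	/* strip the in-use flag */

		if (cur == off)
			return (int)mid;
		if (cur < off)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

Feeding it 4, -8, -4, 4, -12, 100 produces 0, 5, 13, 16, 21, 32, 133, and the area starting at offset 20 is found at index 4 without summing any lengths.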
      
      This commit switches to new data representation and takes care of a couple
      of low-hanging fruits in free_pcpu_area() - one is the switch to binary
      search, another is not doing two memmove() when one would do.  Speeding
      the alloc side up (by keeping track of how many areas in the beginning are
      known to be all in use) also becomes possible - that'll be done in the next
      commit.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • A
      percpu: fold pcpu_split_block() into the only caller · 706c16f2
      Committed by Al Viro
      ... and simplify the results a bit.  Makes the next step easier
      to deal with - we will be changing the data representation for
      chunk->map[] and it's easier to do if the code in question is
      not split between pcpu_alloc_area() and pcpu_split_block().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  23. 22 Jan, 2014 1 commit
    • S
      mm/percpu.c: use memblock apis for early memory allocations · 999c17e3
      Committed by Santosh Shilimkar
      Switch to memblock interfaces for the early memory allocator instead of
      the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.

      Archs already converted to NO_BOOTMEM now directly use memblock
      interfaces instead of bootmem wrappers built on top of memblock.  For
      the archs which still use bootmem, these new APIs just fall back to
      the existing bootmem APIs.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 21 Jan, 2014 1 commit