Commits · 5ccd30e40e731051f6d1eb02f7ac073c1ef9deba · openeuler / raspberrypi-kernel

21 Jun, 2017 1 commit

percpu: add missing lockdep_assert_held to func pcpu_free_area · 5ccd30e4

Authored by Dennis Zhou on Jun 19, 2017

Add a missing lockdep_assert_held for pcpu_lock to improve consistency
and safety throughout mm/percpu.c.
Signed-off-by: Dennis Zhou <dennisz@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

11 May, 2017 1 commit

mark most percpu globals as __ro_after_init · 1328710b

Authored by Daniel Micay on May 10, 2017

Moving pcpu_base_addr to this section comes from PaX where it's part of
KERNEXEC. This extends it to the rest of the globals only written by the
init code.
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

26 Mar, 2017 1 commit

lockdep: Fix per-cpu static objects · 8ce371f9

Authored by Peter Zijlstra on Mar 20, 2017

Since commit 383776fa ("locking/lockdep: Handle statically initialized
PER_CPU locks properly") we try to collapse per-cpu locks into a single
class by giving them all the same key. For this key we choose the canonical
address of the per-cpu object, which would be the offset into the per-cpu
area.

This has two problems:

 - there is a case where we run !0 lock->key through static_obj() and
   expect this to pass; it doesn't for canonical pointers.

 - 0 is a valid canonical address.

Cure both issues by redefining the canonical address as the address of the
per-cpu variable on the boot CPU.

Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
all, track the boot CPU in a variable.

Fixes: 383776fa ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-mm@kvack.org
Cc: wfg@linux.intel.com
Cc: kernel test robot <fengguang.wu@intel.com>
Cc: LKP <lkp@01.org>
Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

16 Mar, 2017 1 commit

locking/lockdep: Handle statically initialized PER_CPU locks properly · 383776fa

Authored by Thomas Gleixner on Feb 27, 2017

If a PER_CPU struct which contains a spin_lock is statically initialized
via:

DEFINE_PER_CPU(struct foo, bla) = {
	.lock = __SPIN_LOCK_UNLOCKED(bla.lock)
};

then lockdep assigns a separate key to each lock because the logic for
assigning a key to statically initialized locks is to use the address as
the key. With per CPU locks the address is obviously different on each CPU.

That's wrong, because all locks should have the same key.

To solve this the following modifications are required:

 1) Extend the is_kernel/module_percpu_addr() functions to hand back the
    canonical address of the per CPU address, i.e. the per CPU address
    minus the per CPU offset.

 2) Check the lock address with these functions and if the per CPU check
    matches use the returned canonical address as the lock key, so all per
    CPU locks have the same key.

 3) Move the static_obj(key) check into look_up_lock_class() so this check
    can be avoided for statically initialized per CPU locks.  That's
    required because the canonical address fails the static_obj(key) check
    for obvious reasons.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Merged Dan's fixups for !MODULES and !SMP into this patch. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
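
The idea behind steps 1 and 2 of 383776fa can be illustrated with a minimal userspace sketch. It assumes a simplified model in which CPU n's copy of a per-CPU object lives at the canonical address plus a per-CPU offset. The names `percpu_canonical_addr`, `NR_CPUS_DEMO`, and `demo_offsets` are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 4	/* hypothetical CPU count for the sketch */

/* Hypothetical per-CPU offsets: CPU n's copy sits at canonical + offset[n]. */
static const uintptr_t demo_offsets[NR_CPUS_DEMO] = {
	0x10000, 0x20000, 0x30000, 0x40000
};

/*
 * Map the address of one CPU's copy back to the canonical address
 * (per-CPU address minus that CPU's offset), so that every copy of
 * the same lock yields the same key.
 */
static uintptr_t percpu_canonical_addr(uintptr_t addr, unsigned int cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	return addr - demo_offsets[cpu];
}
```

In this model the copies at `0x20040` on CPU 1 and `0x30040` on CPU 2 both map to `0x40`, which is why a single lock class can cover all CPUs.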

07 Mar, 2017 1 commit

percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages · 320661b0

Authored by Tahsin Erdogan on Feb 25, 2017

Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done
without holding pcpu_lock. This can lead to bad updates to the variable.
Add missing lock calls.

Fixes: b539b87f ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.18+

28 Feb, 2017 1 commit

scripts/spelling.txt: add "followings" pattern and fix typo instances · 4091fb95

Authored by Masahiro Yamada on Feb 27, 2017

Fix typos and add the following to the scripts/spelling.txt:

  followings||following

While we are here, add a missing colon in the boilerplate in DT binding
documents.  The "you SoC" in allwinner,sunxi-pinctrl.txt was fixed as
well.

I reworded "as the followings:" to "as follows:" for
drivers/usb/gadget/udc/renesas_usb3.c.

Link: http://lkml.kernel.org/r/1481573103-11329-32-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Dec, 2016 1 commit

mm/percpu.c: fix panic triggered by BUG_ON() falsely · 8f606604

Authored by zijun_hu on Dec 12, 2016

As shown by pcpu_build_alloc_info(), the number of units within a percpu
group is deduced by rounding up the number of CPUs within the group to
the @upa boundary.  Therefore, the number of CPUs normally isn't equal to
the number of units if it isn't aligned to @upa.  However,
pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers are
equal, so the BUG_ON() may trigger a panic incorrectly.

To fix this issue, the number of CPUs is rounded up before being compared
with the number of units, and the BUG_ON() is replaced with a warning and
the return of an error code, to keep the system alive as much as
possible.
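
A minimal userspace sketch of the corrected comparison follows. The helper names `round_up_to` and `check_unit_count` are hypothetical, and the kernel's actual warning output is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Round n up to the next multiple of step (step > 0). */
static unsigned int round_up_to(unsigned int n, unsigned int step)
{
	return ((n + step - 1) / step) * step;
}

/*
 * Return 0 when the unit count matches the CPU count rounded up to
 * the @upa boundary; return -EINVAL instead of panicking otherwise.
 */
static int check_unit_count(unsigned int nr_cpus, unsigned int upa,
			    unsigned int nr_units)
{
	assert(upa > 0);
	if (round_up_to(nr_cpus, upa) != nr_units)
		return -EINVAL;	/* the fixed code also warns at this point */
	return 0;
}
```

With 6 CPUs and @upa == 4, the group holds 8 units, so comparing 6 against 8 directly would have tripped the old assertion even though the layout is valid.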

Link: http://lkml.kernel.org/r/57FCF07C.2020103@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Oct, 2016 1 commit

percpu: ensure the requested alignment is power of two · 3ca45a46

Authored by zijun_hu on Oct 14, 2016

The percpu allocator assumes that the requested alignment is a power of
two but hasn't been verifying the input.  If the specified alignment
isn't a power of two, the allocator can malfunction.  Add the sanity
check.

The following is a detailed analysis of the effects of alignments which
aren't a power of two.

 The alignment must be at least even, since the LSB of a chunk->map
 element is used as the free/in-use flag of an area; besides, the
 alignment must also be a power of 2, since ALIGN(), which
 pcpu_fit_in_area() relies on, doesn't always work for other
 alignments.  IOW, the current allocator only works well for
 power-of-2 aligned area allocation.

 See the counterexample below for why an odd alignment doesn't work.
 Let's assume area [16, 36) is free but its previous one is in-use, and
 we want to allocate a @size == 8 and @align == 7 area.  The larger area
 [16, 36) is eventually split into three areas [16, 21), [21, 29),
 [29, 36).  However, due to the usage of a chunk->map element, the
 actual offset of the target area [21, 29) is 21 but is recorded in the
 relevant element as 20; moreover, the residual tail free area [29,
 36) is mistaken as in-use and is silently lost.
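
The numbers in this example can be reproduced with a small userspace sketch. It assumes a simplified encoding in which a map element stores the area offset with the in-use flag in bit 0; the helper names are hypothetical, and the real chunk->map handling in mm/percpu.c may differ in detail.

```c
#include <assert.h>

/* Hypothetical encoding: area offset with the in-use flag kept in bit 0. */
static int map_encode(int offset, int in_use)
{
	return offset | (in_use ? 1 : 0);
}

/* Recover the offset by masking off the flag bit. */
static int map_offset(int entry)
{
	return entry & ~1;
}

/* Report whether the entry is flagged in-use. */
static int map_in_use(int entry)
{
	return entry & 1;
}
```

Under this encoding, the in-use area at offset 21 decodes as offset 20, and the free area at offset 29 decodes as in-use because its own low bit collides with the flag. Even offsets such as 16 round-trip cleanly.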

 Unlike macro roundup(), ALIGN(x, a) doesn't work if @a isn't a power
 of 2.  For example, roundup(10, 6) == 12 but ALIGN(10, 6) == 10, and
 the latter result obviously isn't desired.
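
The difference between the two macros can be checked in userspace with the sketch below. `DEMO_ALIGN` and `DEMO_ROUNDUP` are simplified stand-ins written for this illustration (the kernel definitions carry extra type handling), and `demo_is_power_of_2` is a hypothetical helper showing the kind of sanity check the commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask-based alignment; only correct when a is a power of two. */
#define DEMO_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))

/* Division-based rounding; correct for any positive y. */
#define DEMO_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* True when n is a nonzero power of two. */
static bool demo_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

For @a == 6 the mask `~(6 - 1)` clears bits 0 and 2 only, so 15 collapses back to 10 instead of advancing to 12. Rejecting such alignments up front avoids the problem.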

tj: Code style and patch description updates.
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

05 Oct, 2016 2 commits

mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk() · 9b739662

Authored by zijun_hu on Oct 05, 2016

In order to ensure the percpu group areas within a chunk aren't
distributed too sparsely, pcpu_embed_first_chunk() goes to the error
handling path when a chunk spans over 3/4 of the VMALLOC area.  However,
during the error handling, it forgets to free the memory allocated for
all percpu groups because it goes to label @out_free rather than
@out_free_areas.

This causes a memory leak if the rare scenario really happens.  To fix
the issue, we check the chunk's spanned area immediately after
completing memory allocation for all percpu groups, and we go to label
@out_free_areas to free the memory and then return if the check fails.

To verify the approach, we dump all memory allocated, then enforce the
jump, then dump all memory freed; the result is okay after checking
whether we free all memory we allocate in this function.

BTW, the approach was chosen after considering the scenarios below:
 - we don't go to label @out_free directly to fix this issue since we
   might free several allocated memory blocks twice
 - the aim of jumping after pcpu_setup_first_chunk() is to bypass freeing
   usable memory rather than to handle an error; moreover, the function
   does not return an error code in any case: it either panics due to
   BUG_ON() or returns 0.
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Tested-by: zijun_hu <zijun_hu@htc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk() · 93c76b6b

Authored by zijun_hu on Oct 05, 2016

pcpu_embed_first_chunk() calculates the range a percpu chunk spans into
@max_distance and uses it to ensure that a chunk is not too big compared
to the total vmalloc area. However, during calculation, it used incorrect
top address by adding a unit size to the highest group's base address.

This can make the calculated max_distance slightly smaller than the actual
distance although given the scale of values involved the error is very
unlikely to have an actual impact.

Fix this issue by adding the group's size instead of a unit size.
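
The effect of the fix can be sketched in userspace as below. The struct and function names are hypothetical, and the sketch assumes that a group's size is its unit count multiplied by the unit size.

```c
#include <assert.h>

/* Hypothetical, simplified view of one percpu group. */
struct demo_group {
	unsigned long base_offset;	/* offset of the group from the chunk base */
	unsigned long nr_units;		/* number of units in the group */
};

/* Old calculation: highest group's base plus a single unit. */
static unsigned long span_old(const struct demo_group *g,
			      unsigned long unit_size)
{
	return g->base_offset + unit_size;
}

/* Fixed calculation: highest group's base plus the whole group size. */
static unsigned long span_fixed(const struct demo_group *g,
				unsigned long unit_size)
{
	return g->base_offset + g->nr_units * unit_size;
}
```

For a group at offset 0x100000 with four units of 0x8000 bytes, the old form yields 0x108000 while the fixed form yields 0x120000, so the old value underestimates the span whenever the highest group holds more than one unit.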

BTW, the type of variable max_distance is changed from size_t to unsigned
long too, based on the considerations below:
 - type unsigned long usually has the same width as IP core registers and
   can be applied here very well
 - make the @max_distance type consistent with the operands calculated
   against it, such as @ai->groups[i].base_offset and macro VMALLOC_TOTAL
 - type unsigned long is more universal than size_t; size_t is usually
   type defined to unsigned int or unsigned long among various ARCHs
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

25 May, 2016 2 commits

percpu: fix synchronization between synchronous map extension and chunk destruction · 6710e594

Authored by Tejun Heo on May 25, 2016

For non-atomic allocations, pcpu_alloc() can try to extend the area
map synchronously after dropping pcpu_lock; however, the extension
wasn't synchronized against chunk destruction and the chunk might get
freed while extension is in progress.

This patch fixes the bug by putting most of non-atomic allocations
under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
is responsible for async chunk management including destruction.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # v3.18+
Fixes: 1a4d7607 ("percpu: implement asynchronous chunk population")

percpu: fix synchronization between chunk->map_extend_work and chunk destruction · 4f996e23

Authored by Tejun Heo on May 25, 2016

Atomic allocations can trigger async map extensions which are serviced
by chunk->map_extend_work.  pcpu_balance_work which is responsible for
destroying idle chunks wasn't synchronizing properly against
chunk->map_extend_work and may end up freeing the chunk while the work
item is still in flight.

This patch fixes the bug by rolling async map extension operations
into pcpu_balance_work.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # v3.18+
Fixes: 9c824b6a ("percpu: make sure chunk->map array has available space")

18 Mar, 2016 4 commits

mm: percpu: use pr_fmt to prefix output · 870d4b12

Authored by Joe Perches on Mar 17, 2016

Use the normal mechanism to make the logging output consistently
"percpu:" instead of a mix of "PERCPU:" and "percpu:"
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: convert printk(KERN_<LEVEL> to pr_<level> · 1170532b

Authored by Joe Perches on Mar 17, 2016

Most of the mm subsystem uses pr_<level> so make it consistent.

Miscellanea:

 - Realign arguments
 - Add missing newline to format
 - kmemleak-test.c has a "kmemleak: " prefix added to the
   "Kmemleak testing" logging message via pr_fmt
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>	[percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: coalesce split strings · 756a025f

Authored by Joe Perches on Mar 17, 2016

Kernel style prefers a single string over split strings when the string is
'user-visible'.

Miscellanea:

 - Add a missing newline
 - Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>	[percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: convert pr_warning to pr_warn · 598d8091

Authored by Joe Perches on Mar 17, 2016

There is a mixture of pr_warning and pr_warn uses in mm.  Use pr_warn
consistently.

Miscellanea:

 - Coalesce formats
 - Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>	[percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Jan, 2016 1 commit

tree wide: use kvfree() than conditional kfree()/vfree() · 1d5cfdb0

Authored by Tetsuo Handa on Jan 22, 2016

There are many locations that do

  if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
  else
    kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr().  Unless callers have special reasons, we can
replace this branch with kvfree().  Please check and reply if you find
problems.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

06 Nov, 2015 1 commit

mm/percpu: use offset_in_page macro · f09f1243

Authored by Alexander Kuleshov on Nov 05, 2015

linux/mm.h provides the offset_in_page() macro.  Let's use the already
predefined macro instead of (addr & ~PAGE_MASK).
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
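
What (addr & ~PAGE_MASK) computes in f09f1243 can be shown with a short userspace sketch. It assumes a 4096-byte page; the `DEMO_` macros and both function names are hypothetical stand-ins, and `demo_offset_in_page` uses a modulo only to show that the result is the byte offset within the page.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE	4096UL			/* assumed page size */
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))	/* clears the in-page bits */

/* Open-coded form that the commit replaces. */
static unsigned long offset_open_coded(unsigned long addr)
{
	return addr & ~DEMO_PAGE_MASK;
}

/* Stand-in for offset_in_page(): byte offset of addr within its page. */
static unsigned long demo_offset_in_page(unsigned long addr)
{
	return addr % DEMO_PAGE_SIZE;
}
```

Both forms return 0x345 for address 0x12345, and 0 for a page-aligned address such as 0x2000.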

21 Jul, 2015 1 commit

percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk · 292c24a0

Authored by Baoquan He on Jul 20, 2015

The original assignment is a little redundant.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

25 Jun, 2015 1 commit

mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() · 8a8c35fa

Authored by Larry Finger on Jun 24, 2015

Beginning at commit d52d3997 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.1.0-rc7-next-20150612 #1 Not tainted
  -------------------------------
  kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
   3 locks held by systemd/1:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
   #1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
   #2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
  stack backtrace:
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
  Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

Additional backtrace lines are truncated.  In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 Mar, 2015 1 commit

percpu: Fix trivial typos in comments · bffc4375

Authored by Yannick Guerrini on Mar 06, 2015

Change 'tranlated' to 'translated'
Change 'mutliples' to 'multiples'
Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: Tejun Heo <tj@kernel.org>

14 Feb, 2015 1 commit

percpu: use %*pb[l] to print bitmaps including cpumasks and nodemasks · 807de073

Authored by Tejun Heo on Feb 13, 2015

printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Oct, 2014 1 commit

percpu: off by one in BUG_ON() · 9f295664

Authored by Dan Carpenter on Oct 29, 2014

The unit_map[] array has "nr_cpu_ids" number of elements.  It's
allocated a few lines earlier in the function.  So this test should be
>= instead of >.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
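
The off-by-one in 9f295664 can be illustrated with a small userspace sketch. The function names are hypothetical; they only model the bounds test applied to an index into an array of nr_cpu_ids elements.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check: wrongly accepts cpu == nr_cpu_ids, one past the last element. */
static bool cpu_out_of_range_old(unsigned int cpu, unsigned int nr_cpu_ids)
{
	return cpu > nr_cpu_ids;
}

/* Fixed check: valid indices into unit_map[] are 0 .. nr_cpu_ids - 1. */
static bool cpu_out_of_range_fixed(unsigned int cpu, unsigned int nr_cpu_ids)
{
	return cpu >= nr_cpu_ids;
}
```

With nr_cpu_ids == 4, index 4 passes the old check even though it lies outside the array, while the fixed check rejects it.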

09 Oct, 2014 1 commit

percpu: fix how @gfp is interpreted by the percpu allocator · 6ae833c7

Authored by Tejun Heo on Oct 08, 2014

When @gfp is specified, the percpu allocator is interested in whether
it contains all of GFP_KERNEL or not.  If it does, the normal
allocation path is taken; otherwise, the atomic allocation path.
Unfortunately, pcpu_alloc() was incorrectly testing for whether @gfp
contains any part of GFP_KERNEL.

Fix it by testing "(gfp & GFP_KERNEL) != GFP_KERNEL" instead of
"!(gfp & GFP_KERNEL)" to decide whether the allocation should be
atomic or not.
Signed-off-by: Tejun Heo <tj@kernel.org>
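
The difference between the two tests in 6ae833c7 can be sketched in userspace as follows. The `DEMO_GFP_*` bit values are hypothetical placeholders (the real GFP flag values depend on the kernel version); the sketch assumes only that GFP_KERNEL is a combination of several bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real GFP values differ by kernel version. */
#define DEMO_GFP_A	0x1u
#define DEMO_GFP_B	0x2u
#define DEMO_GFP_C	0x4u
#define DEMO_GFP_KERNEL	(DEMO_GFP_A | DEMO_GFP_B | DEMO_GFP_C)

/* Old test: atomic only if gfp shares no bit with GFP_KERNEL. */
static bool is_atomic_old(unsigned int gfp)
{
	return !(gfp & DEMO_GFP_KERNEL);
}

/* Fixed test: atomic unless gfp contains every bit of GFP_KERNEL. */
static bool is_atomic_fixed(unsigned int gfp)
{
	return (gfp & DEMO_GFP_KERNEL) != DEMO_GFP_KERNEL;
}
```

A mask that carries only one of the GFP_KERNEL bits is treated as a normal (blocking) allocation by the old test but as atomic by the fixed test, which is the behavior the commit message describes.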

22 Sep, 2014 1 commit

Revert "percpu: free percpu allocation info for uniprocessor system" · bb2e226b

Authored by Guenter Roeck on Sep 21, 2014

This reverts commit 3189eddb ("percpu: free percpu allocation info for
uniprocessor system").

The commit causes a hang with a crisv32 image. This may be an architecture
problem, but at least for now the revert is necessary to be able to boot a
crisv32 image.

Cc: Tejun Heo <tj@kernel.org>
Cc: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3189eddb ("percpu: free percpu allocation info for uniprocessor system")
Cc: stable@vger.kernel.org # Please don't apply 3189eddb

09 Sep, 2014 1 commit

percpu: fix locking regression in the failure path of pcpu_alloc() · 23cb8981

Authored by Tejun Heo on Sep 09, 2014

While updating locking, b38d08f3 ("percpu: restructure locking")
broke pcpu_create_chunk() creation path in pcpu_alloc().  It returns
without releasing pcpu_alloc_mutex.  Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>

03 Sep, 2014 10 commits

percpu: implement asynchronous chunk population · 1a4d7607

Authored by Tejun Heo on Sep 02, 2014

The percpu allocator now supports atomic allocations by only
allocating from already populated areas but the mechanism to ensure
that there's an adequate amount of populated areas was missing.

This patch expands pcpu_balance_work so that in addition to freeing
excess free chunks it also populates chunks to maintain an adequate
level of populated areas.  pcpu_alloc() schedules pcpu_balance_work if
the amount of free populated areas is too low or after an atomic
allocation failure.

* PERCPU_DYNAMIC_RESERVE is increased by two pages to account for
  PCPU_EMPTY_POP_PAGES_LOW.

* pcpu_async_enabled is added to gate both async jobs -
  chunk->map_extend_work and pcpu_balance_work - so that we don't end
  up scheduling them while the needed subsystems aren't up yet.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: rename pcpu_reclaim_work to pcpu_balance_work · fe6bd8c3

Authored by Tejun Heo on Sep 02, 2014

pcpu_reclaim_work will also be used to populate chunks asynchronously.
Rename it to pcpu_balance_work in preparation.  pcpu_reclaim() is
renamed to pcpu_balance_workfn() and some of its local variables are
renamed too.

This is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated · b539b87f

Authored by Tejun Heo on Sep 02, 2014

pcpu_nr_empty_pop_pages counts the number of empty populated pages
across all chunks and chunk->nr_populated counts the number of
populated pages in a chunk.  Both will be used to implement pre/async
population for atomic allocations.

pcpu_chunk_[de]populated() are added to update chunk->populated,
chunk->nr_populated and pcpu_nr_empty_pop_pages together.  All
successful chunk [de]populations should be followed by the
corresponding pcpu_chunk_[de]populated() calls.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: make sure chunk->map array has available space · 9c824b6a

Authored by Tejun Heo on Sep 02, 2014

An allocation attempt may require extending chunk->map array which
requires GFP_KERNEL context which isn't available for atomic
allocations.  This patch ensures that chunk->map array usually keeps
some amount of available space by directly allocating buffer space
during GFP_KERNEL allocations and scheduling async extension during
atomic ones.  This should make atomic allocation failures from map
space exhaustion rare.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: implement [__]alloc_percpu_gfp() · 5835d96e

Authored by Tejun Heo on Sep 02, 2014

Now that pcpu_alloc_area() can allocate only from populated areas,
it's easy to add atomic allocation support to [__]alloc_percpu().
Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
operations and allocates only from the populated areas if @gfp doesn't
contain GFP_KERNEL.  New interface functions [__]alloc_percpu_gfp()
are added.

While this means that atomic allocations are possible, this isn't
complete yet as there's no mechanism to ensure that a certain amount of
populated areas is kept available and atomic allocations may keep
failing under certain conditions.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: indent the population block in pcpu_alloc() · e04d3208

Authored by Tejun Heo on Sep 02, 2014

The next patch will conditionalize the population block in
pcpu_alloc() which will end up making a rather large indentation
change obfuscating the actual logic change.  This patch puts the block
under "if (true)" so that the next patch can avoid indentation
changes.  The definitions of the local variables which are used only in
the block are moved into the block.

This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: make pcpu_alloc_area() capable of allocating only from populated areas · a16037c8

Authored by Tejun Heo on Sep 02, 2014

Update pcpu_alloc_area() so that it can skip unpopulated areas if the
new parameter @pop_only is true.  This is implemented by a new
function, pcpu_fit_in_area(), which determines the amount of head
padding considering the alignment and populated state.

@pop_only is currently always false but this will be used to implement
atomic allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: restructure locking · b38d08f3

Authored by Tejun Heo on Sep 02, 2014

At first, the percpu allocator required a sleepable context for both
alloc and free paths and used pcpu_alloc_mutex to protect everything.
Later, pcpu_lock was introduced to protect the index data structure so
that the free path can be invoked from atomic contexts.  The
conversion only updated what's necessary and left most of the
allocation path under pcpu_alloc_mutex.

The percpu allocator is planned to add support for atomic allocation
and this patch restructures locking so that the coverage of
pcpu_alloc_mutex is further reduced.

* pcpu_alloc() now grabs pcpu_alloc_mutex only while creating a new
  chunk and populating the allocated area.  Everything else is now
  protected solely by pcpu_lock.

  After this change, multiple instances of pcpu_extend_area_map() may
  race but the function already implements sufficient synchronization
  using pcpu_lock.

  This also allows multiple allocators to arrive at new chunk
  creation.  To avoid creating multiple empty chunks back-to-back, a
  new chunk is created iff there is no other empty chunk after
  grabbing pcpu_alloc_mutex.

* pcpu_lock is now held while modifying chunk->populated bitmap.
  After this, all data structures are protected by pcpu_lock.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: move region iterations out of pcpu_[de]populate_chunk() · a93ace48

Authored by Tejun Heo on Sep 02, 2014

Previously, pcpu_[de]populate_chunk() were called with the range which
may contain multiple target regions in it and
pcpu_[de]populate_chunk() iterated over the regions.  This has the
benefit of batching up cache flushes for all the regions; however,
we're planning to add more bookkeeping logic around [de]population to
support atomic allocations and this delegation of iterations gets in
the way.

This patch moves the region iterations out of
pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
pcpu_reclaim() - so that we can later add logic to track more states
around them.  This change may make cache and tlb flushes more frequent
but multi-region [de]populations are rare anyway and if this actually
becomes a problem, it's not difficult to factor out cache flushes as
separate callbacks which are directly invoked from percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: move common parts out of pcpu_[de]populate_chunk() · dca49645

Authored by Tejun Heo on Sep 02, 2014

percpu-vm and percpu-km implement separate versions of
pcpu_[de]populate_chunk() and some parts which are or should be common
are currently in the specific implementations.  Make the following
changes.

* Allocated area clearing is moved from the pcpu_populate_chunk()
  implementations to pcpu_alloc().  This makes percpu-km's version
  noop.

* Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
  to their respective callers so that they are applied to percpu-km
  too.  This doesn't make any meaningful difference as both functions
  are noop for percpu-km; however, this is more consistent and will
  help implementing atomic allocation support.
Signed-off-by: Tejun Heo <tj@kernel.org>

16 Aug, 2014 1 commit

percpu: free percpu allocation info for uniprocessor system · 3189eddb

Authored by Honggang Li on Aug 12, 2014

Currently, only SMP systems free the percpu allocation info.
Uniprocessor systems should free it too. For example, on one x86 UML
virtual machine with 256MB memory, the UML kernel wastes one page of memory.
Signed-off-by: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

19 Jun, 2014 1 commit

percpu: Use ALIGN macro instead of hand coding alignment calculation · fb009e3a

Authored by Christoph Lameter on Jun 19, 2014

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

15 Apr, 2014 1 commit

percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() · 5a838c3b

Authored by Jianyu Zhan on Apr 14, 2014

pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)

It could hardly ever be bigger than PAGE_SIZE even for large-scale machines,
but for consistency with its counterpart pcpu_mem_zalloc(),
use pcpu_mem_free() instead.

Commit b4916cb1 ("percpu: make pcpu_free_chunk() use
pcpu_mem_free() instead of kfree()") addressed this problem, but
missed this one.

tj: commit message updated
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 099a19d9 ("percpu: allow limited allocation before slab is online")
Cc: stable@vger.kernel.org

29 Mar, 2014 1 commit

percpu: renew the max_contig if we merge the head and previous block · 21ddfd38

Authored by Jianyu Zhan on Mar 28, 2014

During pcpu_alloc_area(), we might merge the current head with the
previous block. Since we have calculated the max_contig using the
size of previous block before we skip it, and now we update the size
of previous block, so we should renew the max_contig.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>