提交 · 05c5df31afd1092ca6322094d22aff6351fa67fe · openeuler / Kernel

28 5月, 2015 6 次提交

rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT · 05c5df31

由 Paul E. McKenney 提交于 4月 20, 2015

This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will
build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is
set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set
to 32 for 32-bit systems and 64 for 64-bit systems. This commit then
makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig
users won't be asked about CONFIG_RCU_FANOUT unless they want to be.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

05c5df31

rcu: Break dependency of RCU_FANOUT_LEAF on RCU_FANOUT · 8739c5cb

由 Paul E. McKenney 提交于 4月 20, 2015

RCU_FANOUT_LEAF's range and default values depend on the value of
RCU_FANOUT, which at the time seemed like a cute way to save two lines
of Kconfig code.  However, adding a dependency from both of these
Kconfig parameters on RCU_EXPERT requires that RCU_FANOUT_LEAF operate
correctly even if RCU_FANOUT is undefined.  This commit therefore
allows RCU_FANOUT_LEAF to take on the full range of permitted values,
even in cases where RCU_FANOUT is undefined.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Eliminate redundant "default" as suggested by Pranith Kumar. ]
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

8739c5cb

rcu: Create RCU_EXPERT Kconfig and hide booleans behind it · 78cae10b

由 Paul E. McKenney 提交于 4月 20, 2015

This commit creates an RCU_EXPERT Kconfig and hides the independent
boolean RCU-related user-visible Kconfig parameters behind it, namely
RCU_FAST_NO_HZ and RCU_BOOST. This prevents Kconfig from asking about
these parameters unless the user really wants to be asked.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

78cae10b

rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter · 7fa27001

由 Paul E. McKenney 提交于 4月 20, 2015

The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and
perhaps only) by rcutorture to verify that RCU works correctly in specific
rcu_node combining-tree configurations.  It therefore does not make
much sense have this as a question to people attempting to configure
their kernels.  So this commit creates an rcutree.rcu_fanout_exact=
boot parameter that rcutorture can use, and eliminates the original
CONFIG_RCU_FANOUT_EXACT Kconfig parameter.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

7fa27001

rcu: Directly drive RCU_USER_QS from Kconfig · 7db21edf

由 Paul E. McKenney 提交于 4月 20, 2015

Currently, Kconfig will ask the user whether RCU_USER_QS should be set.
This is silly because Kconfig already has all the information that it
needs to set this parameter.  This commit therefore directly drives
the value of RCU_USER_QS via NO_HZ_FULL's "select" statement.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>

7db21edf

rcu: Directly drive TASKS_RCU from Kconfig · 82d0f4c0

由 Paul E. McKenney 提交于 4月 20, 2015

Currently, Kconfig will ask the user whether TASKS_RCU should be set.
This is silly because Kconfig already has all the information that it
needs to set this parameter.  This commit therefore directly drives
the value of TASKS_RCU via "select" statements.  Which means that
as subsystems require TASKS_RCU, those subsystems will need to add
"select" statements of their own.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

82d0f4c0

06 5月, 2015 1 次提交

init: fix regression by supporting devices with major:minor:offset format · cb31ef48

由 Chen Yu 提交于 5月 03, 2015

Commit 283e7ad0 ("init: stricter checking of major:minor root=
values") was so strict that it exposed the fact that a previously
unknown device format was being used.

Distributions like Ubuntu uses klibc (rather than uswsusp) to resume
system from hibernation.  klibc expressed the swap partition/file in
the form of major:minor:offset.  For example, 8:3:0 represents a swap
partition in klibc, and klibc's resume process in initrd will finally
echo 8:3:0 to /sys/power/resume for manually resuming.  However, due
to commit 283e7ad0's stricter checking, 8:3:0 will be treated as an
invalid device format, and manual resuming from hibernation will fail.

Fix this by adding support for devices with major:minor:offset format
when resuming from hibernation.
Reported-by: NPrigent, Christophe <christophe.prigent@intel.com>
Signed-off-by: NChen Yu <yu.c.chen@intel.com>
Acked-by: NRafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

cb31ef48

17 4月, 2015 1 次提交

kernel/fork.c: new function for max_threads · ff691f6e

由 Heinrich Schuchardt 提交于 4月 16, 2015

PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g.  architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With this patch the buggy code is moved to a separate function
set_max_threads.  The error is not fixed.

After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.

Argument mempages of function fork_init() is removed as totalram_pages is
an exported symbol.

The creation of separate patches for refactoring to a new function and for
fixing the logic was suggested by Ingo Molnar.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ff691f6e

16 4月, 2015 3 次提交

kernel: conditionally support non-root users, groups and capabilities · 2813893f

由 Iulia Manda 提交于 4月 15, 2015

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root.  For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional.  It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build.  The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB.  (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu.  All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2813893f

init: stricter checking of major:minor root= values · 283e7ad0

由 Dan Ehrenberg 提交于 2月 10, 2015

In the kernel command-line, previously, root=1:2jakshflaksjdhfa would
be accepted and interpreted just like root=1:2. This patch adds
stricter checking so that additional characters after major:minor are
rejected by root=.

The goal of this change is to help in unifying DM's interpretation of
its block device argument by using existing kernel code (name_to_dev_t).
But DM rejects malformed major:minor pairs, it seems reasonable for
root= to reject them as well.
Signed-off-by: NDan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

283e7ad0

init: export name_to_dev_t and mark name argument as const · e6e20a7a

由 Dan Ehrenberg 提交于 2月 10, 2015

DM will switch its device lookup code to using name_to_dev_t() so it
must be exported.  Also, the @name argument should be marked const.
Signed-off-by: NDan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e6e20a7a

15 4月, 2015 1 次提交

lib/ioremap.c: add huge I/O map capability interfaces · 0ddab1d2

由 Toshi Kani 提交于 4月 14, 2015

Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
I/O mappings with pud/pmd are enabled on the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

A new kernel option "nohugeiomap" is also added, so that user can disable
the huge I/O map capabilities when necessary.
Signed-off-by: NToshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ddab1d2

13 4月, 2015 1 次提交

cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9

由 Paul E. McKenney 提交于 4月 12, 2015

Currently, smpboot_unpark_threads() is invoked before the incoming CPU
has been added to the scheduler's runqueue structures.  This might
potentially cause the unparked kthread to run on the wrong CPU, since the
correct CPU isn't fully set up yet.

That causes a sporadic, hard to debug boot crash triggering on some
systems, reported by Borislav Petkov, and bisected down to:

  2a442c9c ("x86: Use common outgoing-CPU-notification code")

This patch places smpboot_unpark_threads() in a CPU hotplug
notifier with priority set so that these kthreads are unparked just after
the CPU has been added to the runqueues.
Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

00df35f9

11 4月, 2015 1 次提交

Documentation/memcg: update memcg/kmem status · 19717542

由 Vladimir Davydov 提交于 4月 01, 2015

Memcg/kmem reclaim support has been finally merged. Reflect this in the
documentation.
Acked-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

19717542

02 4月, 2015 1 次提交

bpf: Fix the build on BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable · e1abf2cc

由 Ingo Molnar 提交于 4月 02, 2015

So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
dependency, it also depends on the tracing infrastructure and on kprobes,
without which it will fail to build with:

  In file included from kernel/trace/bpf_trace.c:14:0:
  kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
  kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
    unsigned int val = current->trace_recursion;
  [...]

It took quite some time to trigger this build failure, because right now
BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL
more configurable, not just under CONFIG_EXPERT.

If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
gateway as well.

We might want to make this an interactive option later on, although
I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
an indicator that the user wants BPF support.

Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e1abf2cc

07 3月, 2015 1 次提交

init/main: fix reset_device comment · fc454fdc

由 Frans Klaver 提交于 2月 23, 2015

Fix incorrect spelling of situation.
Signed-off-by: NFrans Klaver <fransklaver@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

fc454fdc

05 3月, 2015 1 次提交

cpuset: initialize cpuset a bit early · 695df213

由 Zefan Li 提交于 3月 04, 2015

Now we call ss->bind() in cgroup_init(), so cgroup_init() will
call cpuset_bind() and then the latter will access top_cpuset's
cpumask, which is NULL, because cpuset_init() is called after
cgroup_init()

The simplest fix is to swap cgroup_init() and cpuset_init().

Cc: Vladimir Davydov <vdavydov@parallels.com>
Fixes: 295458e6 ("cgroup: call cgroup_subsys->bind on cgroup subsys initialization")
Reported by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: NZefan Li <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVladimir Davydov <vdavydov@parallels.com>

695df213

27 2月, 2015 1 次提交

rcu: Add Kconfig option to expedite grace periods during boot · ee42571f

由 Paul E. McKenney 提交于 2月 19, 2015

This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
that emulates a very early boot rcu_expedite_gp().  A late-boot
call to rcu_end_inkernel_boot() will provide the corresponding
rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
should be made just before init is spawned.

According to Arjan:

> To show the boot time, I'm using the timestamp of the "Write protecting"
> line, that's pretty much the last thing we print prior to ring 3 execution.
>
> A kernel with default RCU behavior (inside KVM, only virtual devices)
> looks like this:
>
> [    0.038724] Write protecting the kernel read-only data: 10240k
>
> a kernel with expedited RCU (using the command line option, so that I
> don't have to recompile between measurements and thus am completely
> oranges-to-oranges)
>
> [    0.031768] Write protecting the kernel read-only data: 10240k
>
> which, in percentage, is an 18% improvement.
Reported-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NArjan van de Ven <arjan@linux.intel.com>

ee42571f

14 2月, 2015 1 次提交

init: remove CONFIG_INIT_FALLBACK · 5125991c

由 Andy Lutomirski 提交于 2月 13, 2015

CONFIG_INIT_FALLBACK adds config bloat without an obvious use case that
makes it worth keeping around.  Delete it.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5125991c

22 1月, 2015 1 次提交

init: Get rid of x86isms · 30b8b006

由 Thomas Gleixner 提交于 1月 15, 2015

The UP local API support can be set up from an early initcall. No need
for horrible hackery in the init code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

30b8b006

16 1月, 2015 1 次提交

rcu: Optionally run grace-period kthreads at real-time priority · a94844b2

由 Paul E. McKenney 提交于 12月 12, 2014

Recent testing has shown that under heavy load, running RCU's grace-period
kthreads at real-time priority can improve performance (according to 0day
test robot) and reduce the incidence of RCU CPU stall warnings. However,
most systems do just fine with the default non-realtime priorities for
these kthreads, and it does not make sense to expose the entire user
base to any risk stemming from this change, given that this change is
of use only to a few users running extremely heavy workloads.

Therefore, this commit allows users to specify realtime priorities
for the grace-period kthreads, but leaves them running SCHED_OTHER
by default. The realtime priority may be specified at build time
via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
rcutree.kthread_prio parameter. Either way, 0 says to continue the
default SCHED_OTHER behavior and values from 1-99 specify that priority
of SCHED_FIFO behavior. Note that a value of 0 is not permitted when
the RCU_BOOST Kconfig parameter is specified.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a94844b2

08 1月, 2015 1 次提交

kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE · 31a4af7f

由 Masahiro Yamada 提交于 8月 05, 2014

Other than GCC, we have another choice, Clang for building the kernel
these days.  It seems better to say "compiler" rather than "gcc".
Signed-off-by: NMasahiro Yamada <yamada.m@jp.panasonic.com>
Signed-off-by: NMichal Marek <mmarek@suse.cz>

31a4af7f

07 1月, 2015 3 次提交

kconfig: use bool instead of boolean for type definition attributes · 6341e62b

由 Christoph Jaeger 提交于 12月 20, 2014

Support for keyword 'boolean' will be dropped later on.

No functional change.

Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.comSigned-off-by: NChristoph Jaeger <cj@linux.com>
Signed-off-by: NMichal Marek <mmarek@suse.cz>

6341e62b

rcu: Make SRCU optional by using CONFIG_SRCU · 83fe27ea

由 Pranith Kumar 提交于 12月 05, 2014

SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.

The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.

If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

   text    data     bss     dec     hex filename
   2007       0       0    2007     7d7 kernel/rcu/srcu.o

Size of arch/powerpc/boot/zImage changes from

   text    data     bss     dec     hex filename
 831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
 829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after

so the savings are about ~2000 bytes.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]

83fe27ea

rcu: Remove "select IRQ_WORK" from config TREE_RCU · 5a43b88e

由 Lai Jiangshan 提交于 12月 24, 2014

The 48a7639c ("rcu: Make callers awaken grace-period kthread")
removed the irq_work_queue(), so the TREE_RCU doesn't need
irq work any more.  This commit therefore updates RCU's Kconfig and
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5a43b88e

17 12月, 2014 1 次提交

init: fix read-write root mount · 10975933

由 Miklos Szeredi 提交于 11月 20, 2014

If mount flags don't have MS_RDONLY, iso9660 returns EACCES without actually
checking if it's an iso image.

This tricks mount_block_root() into retrying with MS_RDONLY.  This results
in a read-only root despite the "rw" boot parameter if the actual
filesystem was checked after iso9660.

I believe the behavior of iso9660 is okay, while that of mount_block_root()
is not.  It should rather try all types without MS_RDONLY and only then
retry with MS_RDONLY.

This change also makes the code more robust against the case when EACCES is
returned despite MS_RDONLY, which would've resulted in a lockup.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

10975933

15 12月, 2014 1 次提交

tracing: Move enabling tracepoints to just after rcu_init() · 5f893b26

由 Steven Rostedt (Red Hat) 提交于 12月 12, 2014

Enabling tracepoints at boot up can be very useful. The tracepoint
can be initialized right after RCU has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after rcu_init().

This also allows trace_printk() to be used early too.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.orgReviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5f893b26

14 12月, 2014 1 次提交

mm/page_ext: resurrect struct page extending code for debugging · eefa864b

由 Joonsoo Kim 提交于 12月 12, 2014

When we debug something, we'd like to insert some information to every
page.  For this purpose, we sometimes modify struct page itself.  But,
this has drawbacks.  First, it requires re-compile.  This makes us
hesitate to use the powerful debug feature so development process is
slowed down.  And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency.  At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel.  Keeping this as it is would be better to reproduce errornous
situation.

This feature is intended to overcome above mentioned problems.  This
feature allocates memory for extended data per page in certain place
rather than the struct page itself.  This memory can be accessed by the
accessor functions provided by this code.  During the boot process, it
checks whether allocation of huge chunk of memory is needed or not.  If
not, it avoids allocating memory at all.  With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.

Until now, memcg uses this technique.  But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed.  I'd like to use this code to develop debug feature, so
this patch resurrect it.

To help these things to work well, this patch introduces two callbacks for
clients.  One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time.  The other is optional, init
callback, which is used to do proper initialization after memory is
allocated.  Detailed explanation about purpose of these functions is in
code comment.  Please refer it.

Others are completely same with previous extension code in memcg.
Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eefa864b

11 12月, 2014 8 次提交

take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b

由 Al Viro 提交于 11月 01, 2014

New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e149ed2b

init: allow CONFIG_INIT_FALLBACK=n to disable defaults if init= fails · 6ef4536e

由 Andy Lutomirski 提交于 12月 10, 2014

If a user puts init=/whatever on the command line and /whatever can't be
run, then the kernel will try a few default options before giving up.  If
init=/whatever came from a bootloader prompt, then this is unexpected but
probably harmless.  On the other hand, if it comes from a script (e.g.  a
tool like virtme or perhaps a future kselftest script), then the fallbacks
are likely to exist, but they'll do the wrong thing.  For example, they
might unexpectedly invoke systemd.

This adds a config option CONFIG_INIT_FALLBACK.  If unset, then a failure
to run the specified init= process be fatal.

The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Rob Landley <rob@landley.net>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ef4536e

mm: move page->mem_cgroup bad page handling into generic code · 9edad6ea

由 Johannes Weiner 提交于 12月 10, 2014

Now that the external page_cgroup data structure and its lookup is
gone, let the generic bad_page() check for page->mem_cgroup sanity.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9edad6ea

mm: embed the memcg pointer directly into struct page · 1306a85a

由 Johannes Weiner 提交于 12月 10, 2014

Memory cgroups used to have 5 per-page pointers.  To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them and
struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged.  The
complexity of runtime allocation and the runtime translation overhead is
no longer justified to save that *potential* 0.19% of memory.  With
CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
after the page->private member and doesn't even increase struct page,
and then this patch actually saves space.  Remaining users that care can
still compile their kernels without CONFIG_MEMCG.

     text    data     bss     dec     hex     filename
  8828345 1725264  983040 11536649 b00909  vmlinux.old
  8827425 1725264  966656 11519345 afc571  vmlinux.new

[mhocko@suse.cz: update Documentation/cgroups/memory.txt]
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: NKonstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1306a85a

mm/numa balancing: rearrange Kconfig entry · 6f7c97e8

由 Aneesh Kumar K.V 提交于 12月 10, 2014

Add the default enable config option after the NUMA_BALANCING option so
that it appears related in the nconfig interface.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6f7c97e8

kernel: res_counter: remove the unused API · 5b1efc02

由 Johannes Weiner 提交于 12月 10, 2014

All memory accounting and limiting has been switched over to the
lockless page counters.  Bye, res_counter!

[akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
[mhocko@suse.cz: ditch the last remainings of res_counter]
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b1efc02

mm: hugetlb_cgroup: convert to lockless page counters · 71f87bee

由 Johannes Weiner 提交于 12月 10, 2014

Abandon the spinlock-protected byte counters in favor of the unlocked
page counters in the hugetlb controller as well.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71f87bee

mm: memcontrol: lockless page counters · 3e32cb2e

由 Johannes Weiner 提交于 12月 10, 2014

Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page.  The
counter interface is also convoluted and does too many things.

Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it.  The translation from and to bytes then only
happens when interfacing with userspace.

The removed locking overhead is noticable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144-threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:

vanilla:

   18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
         1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
            24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
     1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
 1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
     1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )

     132.474343877 seconds time elapsed                                          ( +-  0.21% )

lockless:

   12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
           832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
            15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
     1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
 2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
     1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )

      91.369330729 seconds time elapsed                                          ( +-  0.45% )

On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.

Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become
  page_counter_try_charge() and page_counter_charge() resp. to match
  the more common kernel naming scheme of try_do()/do()

- res_counter_uncharge_until() is only ever used to cancel a local
  counter and never to uncharge bigger segments of a hierarchy, so
  it's replaced by the simpler page_counter_cancel()

- res_counter_set_limit() is replaced by page_counter_limit(), which
  expects its callers to serialize against themselves

- res_counter_memparse_write_strategy() is replaced by
  page_counter_limit(), which rounds down to the nearest page size -
  rather than up.  This is more reasonable for explicitely requested
  hard upper limits.

- to keep charging light-weight, page_counter_try_charge() charges
  speculatively, only to roll back if the result exceeds the limit.
  Because of this, a failing bigger charge can temporarily lock out
  smaller charges that would otherwise succeed.  The error is bounded
  to the difference between the smallest and the biggest possible
  charge size, so for memcg, this means that a failing THP charge can
  send base page charges into reclaim upto 2MB (4MB) before the limit
  would have been reached.  This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NVladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e32cb2e

05 12月, 2014 2 次提交

A
copy address of proc_ns_ops into ns_common · 33c42940
由 Al Viro 提交于 11月 01, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
33c42940

common object embedded into various struct ....ns · 435d5f4b

由 Al Viro 提交于 10月 31, 2014

for now - just move corresponding ->proc_inum instances over there
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

435d5f4b

18 11月, 2014 1 次提交

integrity: provide a hook to load keys when rootfs is ready · c9cd2ce2

由 Dmitry Kasatkin 提交于 11月 05, 2014

Keys can only be loaded once the rootfs is mounted. Initcalls
are not suitable for that. This patch defines a special hook
to load the x509 public keys onto the IMA keyring, before
attempting to access any file. The keys are required for
verifying the file's signature. The hook is called after the
root filesystem is mounted and before the kernel calls 'init'.

Changes in v3:
* added more explanation to the patch description (Mimi)

Changes in v2:
* Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
  keys by integrity subsystem.
* Hook patch moved after defining loading functions
Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>

c9cd2ce2

11 11月, 2014 1 次提交

param: fix crash on bad kernel arguments · 3438cf54

由 Daniel Thompson 提交于 11月 11, 2014

Currently if the user passes an invalid value on the kernel command line
then the kernel will crash during argument parsing. On most systems this
is very hard to debug because the console hasn't been initialized yet.

This is a regression due to commit 51e158c1 ("param: hand arguments
after -- straight to init") which, in response to the systemd debug
controversy, made it possible to explicitly pass arguments to init. To
achieve this parse_args() was extended from simply returning an error
code to returning a pointer. Regretably the new init args logic does not
perform a proper validity check on the pointer resulting in a crash.

This patch fixes the validity check. Should the check fail then no arguments
will be passed to init. This is reasonable and matches how the kernel treats
its own arguments (i.e. no error recovery).
Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

3438cf54

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功