提交 · 660fd04f9317172ae90f414c68b18a26ae88a829 · openeuler / Kernel

14 1月, 2020 2 次提交

lib/vdso: Prepare for time namespace support · 660fd04f

由 Thomas Gleixner 提交于 11月 12, 2019

To support time namespaces in the vdso with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide vdso data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the vdso data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

If VDSO time namespace support is disabled the whole magic is compiled out.

Initial testing shows that the disabled case is almost identical to the
host case which does not take the slow timens path. With the special timens
page installed the performance hit is constant time and in the range of
5-7%.

For the vdso functions which are not using the sequence count an
unconditional check for vdso_data->clock_mode is added which switches to
the real vdso when the clock_mode is VCLOCK_TIMENS.

[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
 pointer by CS_RAW offset.]
Suggested-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com

660fd04f

ns: Introduce Time Namespace · 769071ac

由 Andrei Vagin 提交于 11月 12, 2019

Time Namespace isolates clock values.

The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.

But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.

This scheme allows setting clock offsets for a namespace, before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the
  	changelog a bit. ]
Co-developed-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com

769071ac

03 1月, 2020 1 次提交

Revert "fs: remove ksys_dup()" · 74f1a299

由 Dominik Brodowski 提交于 1月 01, 2020

This reverts commit 8243186f ("fs: remove ksys_dup()") and the
subsequent fix for it in commit 2d3145f8 ("early init: fix error
handling when opening /dev/console").

Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls
caused more trouble than what is worth it: it requires accessing vfs
internals and it turns out there were other bugs in it too.

In particular, the file reference counting was wrong - because unlike
the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and
thus had an extra leaked file reference.

That in turn then caused odd problems with Androidx86 long after boot
becaue of how the extra reference to the console kept the session active
even after all file descriptors had been closed.
Reported-by: Nyouling 257 <youling257@gmail.com>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

74f1a299

18 12月, 2019 1 次提交

early init: fix error handling when opening /dev/console · 2d3145f8

由 Linus Torvalds 提交于 12月 17, 2019

The comment says "this should never fail", but it definitely can fail
when you have odd initial boot filesystems, or kernel configurations.

So get the error handling right: filp_open() returns an error pointer.
Reported-by: NJesse Barnes <jsbarnes@google.com>
Reported-by: Nyouling 257 <youling257@gmail.com>
Fixes: 8243186f ("fs: remove ksys_dup()")
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d3145f8

17 12月, 2019 1 次提交

Fix root mounting with no mount options · 7de7de7c

由 Linus Torvalds 提交于 12月 15, 2019

The "trivial conversion" in commit cccaa5e3 ("init: use do_mount()
instead of ksys_mount()") was totally broken, since it didn't handle the
case of a NULL mount data pointer.  And while I had "tested" it (and
presumably Dominik had too) that bug was hidden by me having options.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: NOndřej Jirman <megi@xff.cz>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
Tested-by: NChris Clayton <chris2553@googlemail.com>
Tested-by: NEric Biggers <ebiggers@kernel.org>
Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NGuido Günther <agx@sigxcpu.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7de7de7c

13 12月, 2019 2 次提交

fs: remove ksys_dup() · 8243186f

由 Dominik Brodowski 提交于 10月 23, 2018

ksys_dup() is used only at one place in the kernel, namely to duplicate
fd 0 of /dev/console to stdout and stderr. The same functionality can be
achieved by using functions already available within the kernel namespace.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

8243186f

init: unify opening /dev/console as stdin/stdout/stderr · b49a733d

由 Dominik Brodowski 提交于 10月 23, 2018

Merge the two instances where /dev/console is opened as
stdin/stdout/stderr.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

b49a733d

12 12月, 2019 3 次提交

init: use do_mount() instead of ksys_mount() · cccaa5e3

由 Dominik Brodowski 提交于 10月 23, 2018

In prepare_namespace(), do_mount() can be used instead of ksys_mount()
as the first and third argument are const strings in the kernel, the
second and fourth argument are passed through anyway, and the fifth
argument is NULL.

In do_mount_root(), ksys_mount() is called with the first and third
argument being already kernelspace strings, which do not need to be
copied over from userspace to kernelspace (again). The second and
fourth arguments are passed through to do_mount() anyway. The fifth
argument, while already residing in kernelspace, needs to be put into
a page of its own. Then, do_mount() can be used instead of
ksys_mount().

Once this is done, there are no in-kernel users to ksys_mount() left,
which can therefore be removed.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

cccaa5e3

initrd: use do_mount() instead of ksys_mount() · d4440aac

由 Dominik Brodowski 提交于 11月 27, 2019

All three calls to ksys_mount() in initrd-related kernel code can
be switched over to do_mount():
- the first and third arguments are const strings in the kernel,
  and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be,
  copied over from userspace;
- the second and fourth argument are passed through anyway.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

d4440aac

devtmpfs: use do_mount() instead of ksys_mount() · 5e787dbf

由 Dominik Brodowski 提交于 10月 23, 2018

In devtmpfs, do_mount() can be called directly instead of complex wrapping
by ksys_mount():
- the first and third arguments are const strings in the kernel,
  and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be
  copied over from userspace;
- the second and fourth argument are passed through anyway.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

5e787dbf

05 12月, 2019 1 次提交

init/Kconfig: fix indentation · e8cf4e9c

由 Krzysztof Kozlowski 提交于 12月 04, 2019

Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
	$ sed -e 's/^        /	/' -i */Kconfig

Link: http://lkml.kernel.org/r/1574306670-30234-1-git-send-email-krzk@kernel.orgSigned-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e8cf4e9c

27 11月, 2019 1 次提交

sysctl: Remove the sysctl system call · 61a47c1a

由 Eric W. Biederman 提交于 10月 01, 2019

This system call has been deprecated almost since it was introduced, and
in a survey of the linux distributions I can no longer find any of them
that enable CONFIG_SYSCTL_SYSCALL.  The only indication that I can find
that anyone might care is that a few of the defconfigs in the kernel
enable CONFIG_SYSCTL_SYSCALL.  However this appears in only 31 of 414
defconfigs in the kernel, so I suspect this symbols presence is simply
because it is harmless to include rather than because it is necessary.

As there appear to be no users of the sysctl system call, remove the
code.  As this removes one of the few uses of the internal kernel mount
of proc I hope this allows for even more simplifications of the proc
filesystem.

Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Anders Berg <anders.berg@lsi.com>
Cc: Apelete Seketeli <apelete@seketeli.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chee Nouk Phoon <cnphoon@altera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hongliang Tao <taohl@lemote.com>
Cc: Hua Yan <yanh@lemote.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jun Nie <jun.nie@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Kevin Wells <kevin.wells@nxp.com>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Roland Stigge <stigge@antcom.de>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Scott Telford <stelford@cadence.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Tanmay Inamdar <tinamdar@apm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

61a47c1a

17 11月, 2019 1 次提交

int128: move __uint128_t compiler test to Kconfig · c12d3362

由 Ard Biesheuvel 提交于 11月 08, 2019

In order to use 128-bit integer arithmetic in C code, the architecture
needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
and it requires a version of the toolchain that supports this at build
time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
whether __SIZEOF_INT128__ is defined, since this is only the case for
compilers that can support 128-bit integers.

Let's fold this additional test into the Kconfig declaration of
ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
e.g., to decide whether a certain object needs to be included in the
first place.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c12d3362

14 11月, 2019 1 次提交

kbuild: remove header compile test · fcbb8461

由 Masahiro Yamada 提交于 11月 07, 2019

There are both positive and negative options about this feature.
At first, I thought it was a good idea, but actually Linus stated a
negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it
is ugly and annoying.

The baseline I'd like to keep is the compile-test of uapi headers.
(Otherwise, kernel developers have no way to ensure the correctness
of the exported headers.)

I will maintain a small build rule in usr/include/Makefile.
Remove the other header test functionality.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

fcbb8461

30 10月, 2019 1 次提交

io_uring: replace workqueue usage with io-wq · 561fb04a

由 Jens Axboe 提交于 10月 24, 2019

Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel work interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

561fb04a

16 10月, 2019 1 次提交

arm64: use generic free_initrd_mem() · 899ee4af

由 Mike Rapoport 提交于 9月 28, 2019

arm64 calls memblock_free() for the initrd area in its implementation of
free_initrd_mem(), but this call has no actual effect that late in the boot
process. By the time initrd is freed, all the reserved memory is managed by
the page allocator and the memblock.reserved is unused, so the only purpose
of the memblock_free() call is to keep track of initrd memory for debugging
and accounting.

Without the memblock_free() call the only difference between arm64 and the
generic versions of free_initrd_mem() is the memory poisoning.

Move memblock_free() call to the generic code, enable it there
for the architectures that define ARCH_KEEP_MEMBLOCK and use the generic
implementation of free_initrd_mem() on arm64.

Tested-by: Anshuman Khandual <anshuman.khandual@arm.com>	#arm64
Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com>
Acked-by: NWill Deacon <will@kernel.org>
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

899ee4af

03 10月, 2019 1 次提交

init: Support mounting root file systems over SMB · 8902dd52

由 Paulo Alcantara (SUSE) 提交于 10月 01, 2019

Add a new virtual device named /dev/cifs (0xfe) to tell the kernel to
mount the root file system over the network by using SMB protocol.

cifs_root_data() will be responsible to retrieve the parsed
information of the new command-line option (cifsroot=) and then call
do_mount_root() with the appropriate mount options for cifs.ko.
Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8902dd52

25 9月, 2019 3 次提交

mm: use CPU_BITS_NONE to initialize init_mm.cpu_bitmask · 2286bf4e

由 Mike Rapoport 提交于 9月 23, 2019

Replace open-coded bitmap array initialization of init_mm.cpu_bitmask with
neat CPU_BITS_NONE macro.

And, since init_mm.cpu_bitmask is statically set to zero, there is no way
to clear it again in start_kernel().

Link: http://lkml.kernel.org/r/1565703815-8584-1-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2286bf4e

mm: consolidate pgtable_cache_init() and pgd_cache_init() · 782de70c

由 Mike Rapoport 提交于 9月 23, 2019

Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.

Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init().  Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.

Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().

Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>		[arm64]
Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

782de70c

mm: kmemleak: use the memory pool for early allocations · c5665868

由 Catalin Marinas 提交于 9月 23, 2019

Currently kmemleak uses a static early_log buffer to trace all memory
allocation/freeing before the slab allocator is initialised.  Such early
log is replayed during kmemleak_init() to properly initialise the kmemleak
metadata for objects allocated up that point.  With a memory pool that
does not rely on the slab allocator, it is possible to skip this early log
entirely.

In order to remove the early logging, consider kmemleak_enabled == 1 by
default while the kmem_cache availability is checked directly on the
object_cache and scan_area_cache variables.  The RCU callback is only
invoked after object_cache has been initialised as we wouldn't have any
concurrent list traversal before this.

In order to reduce the number of callbacks before kmemleak is fully
initialised, move the kmemleak_init() call to mm_init().

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: remove WARN_ON(), per Catalin]
Link: http://lkml.kernel.org/r/20190812160642.52134-4-catalin.marinas@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5665868

16 9月, 2019 2 次提交

um: Enable CONFIG_CONSTRUCTORS · 786b2384

由 Johannes Berg 提交于 8月 23, 2019

We do need to call the constructors for *modules*, and
at least for KASAN in the future, we must call even the
kernel constructors only later when the kernel has been
initialized.

Instead of relying on libc to call them, emit an empty
section for libc and let the kernel's CONSTRUCTORS code
do the rest of the job.

Tested that it indeed doesn't work in modules, and does
work after the fixes in both, with a few functions with
__attribute__((constructor)) in both dynamic and static
builds.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

786b2384

compiler-types.h: add asm_inline definition · eb111869

由 Rasmus Villemoes 提交于 9月 13, 2019

This adds an asm_inline macro which expands to "asm inline" [1] when
the compiler supports it. This is currently gcc 9.1+, gcc 8.3
and (once released) gcc 7.5 [2]. It expands to just "asm" for other
compilers.

Using asm inline("foo") instead of asm("foo") overrules gcc's
heuristic estimate of the size of the code represented by the asm()
statement, and makes gcc use the minimum possible size instead. That
can in turn affect gcc's inlining decisions.

I wasn't sure whether to make this a function-like macro or not - this
way, it can be combined with volatile as

  asm_inline volatile()

but perhaps we'd prefer to spell that

  asm_inline_volatile()

anyway.

The Kconfig logic is taken from an RFC patch by Masahiro Yamada [3].

[1] Technically, asm __inline, since both inline and __inline__
are macros that attach various attributes, making gcc barf if one
literally does "asm inline()". However, the third spelling __inline is
available for referring to the bare keyword.

[2] https://lore.kernel.org/lkml/20190907001411.GG9749@gate.crashing.org/

[3] https://lore.kernel.org/lkml/1544695154-15250-1-git-send-email-yamada.masahiro@socionext.com/Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>

eb111869

13 9月, 2019 1 次提交

vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API · f3235626

由 David Howells 提交于 3月 25, 2019

Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new
internal mount API as the old one will be obsoleted and removed.  This
allows greater flexibility in communication of mount parameters between
userspace, the VFS and the filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Note that tmpfs is slightly tricky as it can contain embedded commas, so it
can't be trivially split up using strsep() to break on commas in
generic_parse_monolithic().  Instead, tmpfs has to supply its own generic
parser.

However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers
around tmpfs or ramfs, must change too - and thus so must ramfs, so these
had to be converted also.

[AV: rewritten]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Hugh Dickins <hughd@google.com>
cc: linux-mm@kvack.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f3235626

12 9月, 2019 2 次提交

module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES · efd9763d

由 Masahiro Yamada 提交于 9月 09, 2019

When CONFIG_MODULES is disabled, CONFIG_UNUSED_SYMBOLS is pointless,
thus it should be invisible.

Instead of adding "depends on MODULES", I moved it to the sub-menu
"Enable loadable module support", which is a better fit. I put it
close to TRIM_UNUSED_KSYMS because it depends on !UNUSED_SYMBOLS.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NJessica Yu <jeyu@kernel.org>

efd9763d

module: remove redundant 'depends on MODULES' · d189c2a4

由 Masahiro Yamada 提交于 9月 09, 2019

These are located in the 'if MODULES' ... 'endif' block.

Remove the redundant dependencies.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NJessica Yu <jeyu@kernel.org>

d189c2a4

10 9月, 2019 1 次提交

module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS · 3d52ec5e

由 Matthias Maennich 提交于 9月 06, 2019

If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the
requirement for modules to import all namespaces that are used by
the module is relaxed.

Enabling this option effectively allows (invalid) modules to be loaded
while only a warning is emitted.

Disabling this option keeps the enforcement at module loading time and
loading is denied if the module's imports are not satisfactory.
Reviewed-by: NMartijn Coenen <maco@android.com>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NMatthias Maennich <maennich@google.com>
Signed-off-by: NJessica Yu <jeyu@kernel.org>

3d52ec5e

06 9月, 2019 2 次提交

make shmem_fill_super() static · 7e30d2a5

由 Al Viro 提交于 6月 01, 2019

... have callers use shmem_mount()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7e30d2a5

make ramfs_fill_super() static · df024502

由 Al Viro 提交于 6月 01, 2019

all users should just call ramfs_mount()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

df024502

04 9月, 2019 1 次提交

kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC · 15f5db60

由 Masahiro Yamada 提交于 8月 21, 2019

arch/arc/Makefile overrides -O2 with -O3. This is the only user of
ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS.
My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC
Makefile.

Currently, ARC has no way to enable -Wmaybe-uninitialized because both
-O3 and -Os disable it. Enabling it will be useful for compile-testing.
This commit allows allmodconfig (, which defaults to -O2) to enable it.

Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files
in arch/arc/configs/ in order to keep the current config settings.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: NVineet Gupta <vgupta@synopsys.com>

15f5db60

03 9月, 2019 1 次提交

sched/uclamp: Extend CPU's cgroup controller · 2480c093

由 Patrick Bellasi 提交于 8月 22, 2019

The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      minimum frequency which corresponds to the uclamp.min
	      utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      maximum frequency which corresponds to the uclamp.max
	      utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depends on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing to control and restrict task requests.

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.
Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NMichal Koutny <mkoutny@suse.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2480c093

29 8月, 2019 1 次提交

init/Kconfig: rework help of CONFIG_CC_OPTIMIZE_FOR_SIZE · ce3b487f

由 Masahiro Yamada 提交于 8月 21, 2019

CONFIG_CC_OPTIMIZE_FOR_SIZE was originally an independent boolean
option, but commit 877417e6 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") turned it into a choice between _PERFORMANCE and _SIZE.

The phrase "If unsure, say N." sounds like an independent option.
Reword the help text to make it appropriate for the choice menu.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

ce3b487f

28 8月, 2019 1 次提交

posix-cpu-timers: Move state tracking to struct posix_cputimers · 244d49e3

由 Thomas Gleixner 提交于 8月 21, 2019

Put it where it belongs and clean up the ifdeffery in fork completely.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de

244d49e3

22 8月, 2019 1 次提交

kbuild: add CONFIG_ASM_MODVERSIONS · 2ff2b7ec

由 Masahiro Yamada 提交于 8月 19, 2019

Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional
nesting in scripts/Makefile.build.

scripts/Makefile.build is run every time Kbuild descends into a
sub-directory. So, I want to avoid $(wildcard ...) evaluation
where possible although computing $(wildcard ...) is so cheap that
it may not make measurable performance difference.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>

2ff2b7ec

20 8月, 2019 3 次提交

Revert "init/Kconfig: Fix infinite Kconfig recursion on PPC" · 2d122942

由 Will Deacon 提交于 8月 20, 2019

This reverts commit 71c67a31.

Commit 117acf5c ("powerpc/Makefile: Always pass --synthetic to nm if
supported") removed the only conditional definition of $(NM), so we can
revert our temporary bodge to avoid Kconfig recursion and go back to
passing $(NM) through to the 'tools-support-relr.sh' when detecting
support for RELR relocations.
Signed-off-by: NWill Deacon <will@kernel.org>

2d122942

lockdown: Enforce module signatures if the kernel is locked down · 49fcf732

由 David Howells 提交于 8月 19, 2019

If the kernel is locked down, require that all modules have valid
signatures that we can verify.

I have adjusted the errors generated:

 (1) If there's no signature (ENODATA) or we can't check it (ENOPKG,
     ENOKEY), then:

     (a) If signatures are enforced then EKEYREJECTED is returned.

     (b) If there's no signature or we can't check it, but the kernel is
	 locked down then EPERM is returned (this is then consistent with
	 other lockdown cases).

 (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails
     the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we
     return the error we got.

Note that the X.509 code doesn't check for key expiry as the RTC might not
be valid or might not have been transferred to the kernel's clock yet.

 [Modified by Matthew Garrett to remove the IMA integration. This will
  be replaced with integration with the IMA architecture policy
  patchset.]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NMatthew Garrett <matthewgarrett@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

49fcf732

security: Support early LSMs · e6b1db98

由 Matthew Garrett 提交于 8月 19, 2019

The lockdown module is intended to allow for kernels to be locked down
early in boot - sufficiently early that we don't have the ability to
kmalloc() yet. Add support for early initialisation of some LSMs, and
then add them to the list of names when we do full initialisation later.
Early LSMs are initialised in link order and cannot be overridden via
boot parameters, and cannot make use of kmalloc() (since the allocator
isn't initialised yet).

(Fixed by Stephen Rothwell to include a stub to fix builds when
!CONFIG_SECURITY)
Signed-off-by: NMatthew Garrett <mjg59@google.com>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NJames Morris <jmorris@namei.org>

e6b1db98

14 8月, 2019 1 次提交

Kbuild: Handle PREEMPT_RT for version string and magic · 4b950bb9

由 Thomas Gleixner 提交于 7月 28, 2019

Update the build scripts and the version magic to reflect when
CONFIG_PREEMPT_RT is enabled in the same way as CONFIG_PREEMPT is treated.

The resulting version strings:

  Linux m 5.3.0-rc1+ #100 SMP Fri Jul 26 ...
  Linux m 5.3.0-rc1+ #101 SMP PREEMPT Fri Jul 26 ...
  Linux m 5.3.0-rc1+ #102 SMP PREEMPT_RT Fri Jul 26 ...

The module vermagic:

  5.3.0-rc1+ SMP mod_unload modversions
  5.3.0-rc1+ SMP preempt mod_unload modversions
  5.3.0-rc1+ SMP preempt_rt mod_unload modversions
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

4b950bb9

07 8月, 2019 1 次提交

init/Kconfig: Fix infinite Kconfig recursion on PPC · 71c67a31

由 Will Deacon 提交于 8月 07, 2019

Commit 5cf896fb ("arm64: Add support for relocating the kernel with
RELR relocations") introduced CONFIG_TOOLS_SUPPORT_RELR, which checks
for RELR support in the toolchain as part of the kernel configuration.
During this procedure, "$(NM)" is invoked to see if it supports the new
relocation format, however PowerPC conditionally overrides this variable
in the architecture Makefile in order to pass '--synthetic' when
targetting PPC64.

This conditional override causes Kconfig to recurse forever, since
CONFIG_TOOLS_SUPPORT_RELR cannot be determined without $(NM) being
defined, but that in turn depends on CONFIG_PPC64:

  $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  scripts/kconfig/conf  --syncconfig Kconfig
  [...]

In this particular case, it looks like PowerPC may be able to pass
'--synthetic' unconditionally to nm or even drop it altogether. While
that is being resolved, let's just bodge the RELR check by picking up
$(NM) directly from the environment in whatever state it happens to be
in.

Cc: Peter Collingbourne <pcc@google.com>
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NWill Deacon <will@kernel.org>

71c67a31

06 8月, 2019 1 次提交

MODSIGN: Export module signature definitions · c8424e77

由 Thiago Jung Bauermann 提交于 7月 04, 2019

IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.

Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.

s390 duplicated the definition of struct module_signature so now they can
use the new <linux/module_signature.h> header instead.
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.ibm.com>
Acked-by: NJessica Yu <jeyu@kernel.org>
Reviewed-by: NPhilipp Rudo <prudo@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMimi Zohar <zohar@linux.ibm.com>

c8424e77

05 8月, 2019 1 次提交

arm64: Add support for relocating the kernel with RELR relocations · 5cf896fb

由 Peter Collingbourne 提交于 7月 31, 2019

RELR is a relocation packing format for relative relocations.
The format is described in a generic-abi proposal:
https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion

The LLD linker can be instructed to pack relocations in the RELR
format by passing the flag --pack-dyn-relocs=relr.

This patch adds a new config option, CONFIG_RELR. Enabling this option
instructs the linker to pack vmlinux's relative relocations in the RELR
format, and causes the kernel to apply the relocations at startup along
with the RELA relocations. RELA relocations still need to be applied
because the linker will emit RELA relative relocations if they are
unrepresentable in the RELR format (i.e. address not a multiple of 2).

Enabling CONFIG_RELR reduces the size of a defconfig kernel image
with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5%
compressed (lz4).
Signed-off-by: NPeter Collingbourne <pcc@google.com>
Tested-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NWill Deacon <will@kernel.org>

5cf896fb

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功