1. 19 5月, 2021 1 次提交
    • I
      ipv4: Add a sysctl to control multipath hash fields · ce5c9c20
      Ido Schimmel 提交于
      A subsequent patch will add a new multipath hash policy where the packet
      fields used for multipath hash calculation are determined by user space.
      This patch adds a sysctl that allows user space to set these fields.
      
      The packet fields are represented using a bitmask and are common between
      IPv4 and IPv6 to allow user space to use the same numbering across both
      protocols. For example, to hash based on standard 5-tuple:
      
       # sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037
       net.ipv4.fib_multipath_hash_fields = 0x0037
      
      The kernel rejects unknown fields, for example:
      
       # sysctl -w net.ipv4.fib_multipath_hash_fields=0x1000
       sysctl: setting key "net.ipv4.fib_multipath_hash_fields": Invalid argument
      
      More fields can be added in the future, if needed.
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce5c9c20
  2. 18 5月, 2021 1 次提交
  3. 15 5月, 2021 3 次提交
  4. 07 5月, 2021 4 次提交
    • D
      drivers/char: remove /dev/kmem for good · bbcd53c9
      David Hildenbrand 提交于
      Patch series "drivers/char: remove /dev/kmem for good".
      
      Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
      memory ballooning, I started questioning the existence of /dev/kmem.
      
      Comparing it with the /proc/kcore implementation, it does not seem to be
      able to deal with things like
      
      a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
        -> kern_addr_valid(). virt_addr_valid() is not sufficient.
      
      b) Special cases like gart aperture memory that is not to be touched
        -> mem_pfn_is_ram()
      
      Unless I am missing something, it's at least broken in some cases and might
      fault/crash the machine.
      
      Looks like its existence has been questioned before in 2005 and 2010 [1],
      after ~11 additional years, it might make sense to revive the discussion.
      
      CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
      mistake?).  All distributions disable it: in Ubuntu it has been disabled
      for more than 10 years, in Debian since 2.6.31, in Fedora at least
      starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
      15sp2, and OpenSUSE has it disabled as well.
      
      1) /dev/kmem was popular for rootkits [2] before it got disabled
         basically everywhere. Ubuntu documents [3] "There is no modern user of
         /dev/kmem any more beyond attackers using it to load kernel rootkits.".
         RHEL documents in a BZ [5] "it served no practical purpose other than to
         serve as a potential security problem or to enable binary module drivers
         to access structures/functions they shouldn't be touching"
      
      2) /proc/kcore is a decent interface to have a controlled way to read
         kernel memory for debugging puposes. (will need some extensions to
         deal with memory offlining/unplug, memory ballooning, and poisoned
         pages, though)
      
      3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
         better fit, especially, to write random memory; harder to shoot
         yourself into the foot.
      
      4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
         to be incompatible with 64bit [1]. For educational purposes,
         /proc/kcore might be used to monitor value updates -- or older
         kernels can be used.
      
      5) It's broken on arm64, and therefore, completely disabled there.
      
      Looks like it's essentially unused and has been replaced by better
      suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
      just remove it.
      
      [1] https://lwn.net/Articles/147901/
      [2] https://www.linuxjournal.com/article/10505
      [3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
      [4] https://sourceforge.net/projects/kme/
      [5] https://bugzilla.redhat.com/show_bug.cgi?id=154796
      
      Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Corentin Labbe <clabbe@baylibre.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Gregory Clement <gregory.clement@bootlin.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: huang ying <huang.ying.caritas@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: James Troup <james.troup@canonical.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kairui Song <kasong@redhat.com>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Liviu Dudau <liviu.dudau@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Niklas Schnelle <schnelle@linux.ibm.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: openrisc@lists.librecores.org
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Pavel Machek (CIP)" <pavel@denx.de>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Cc: sparclinux@vger.kernel.org
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Theodore Dubois <tblodt@icloud.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: William Cohen <wcohen@redhat.com>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bbcd53c9
    • R
      init/initramfs.c: do unpacking asynchronously · e7cb072e
      Rasmus Villemoes 提交于
      Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3.
      
      These two patches are independent, but better-together.
      
      The second is a rather trivial patch that simply allows the developer to
      change "/sbin/modprobe" to something else - e.g.  the empty string, so
      that all request_module() during early boot return -ENOENT early, without
      even spawning a usermode helper, needlessly synchronizing with the
      initramfs unpacking.
      
      The first patch delegates decompressing the initramfs to a worker thread,
      allowing do_initcalls() in main.c to proceed to the device_ and late_
      initcalls without waiting for that decompression (and populating of
      rootfs) to finish.  Obviously, some of those later calls may rely on the
      initramfs being available, so I've added synchronization points in the
      firmware loader and usermodehelper paths - there might be other places
      that would need this, but so far no one has been able to think of any
      places I have missed.
      
      There's not much to win if most of the functionality needed during boot is
      only available as modules.  But systems with a custom-made .config and
      initramfs can boot faster, partly due to utilizing more than one cpu
      earlier, partly by avoiding known-futile modprobe calls (which would still
      trigger synchronization with the initramfs unpacking, thus eliminating
      most of the first benefit).
      
      This patch (of 2):
      
      Most of the boot process doesn't actually need anything from the
      initramfs, until of course PID1 is to be executed.  So instead of doing
      the decompressing and populating of the initramfs synchronously in
      populate_rootfs() itself, push that off to a worker thread.
      
      This is primarily motivated by an embedded ppc target, where unpacking
      even the rather modest sized initramfs takes 0.6 seconds, which is long
      enough that the external watchdog becomes unhappy that it doesn't get
      attention soon enough.  By doing the initramfs decompression in a worker
      thread, we get to do the device_initcalls and hence start petting the
      watchdog much sooner.
      
      Normal desktops might benefit as well.  On my mostly stock Ubuntu kernel,
      my initramfs is a 26M xz-compressed blob, decompressing to around 126M.
      That takes almost two seconds:
      
      [    0.201454] Trying to unpack rootfs image as initramfs...
      [    1.976633] Freeing initrd memory: 29416K
      
      Before this patch, these lines occur consecutively in dmesg.  With this
      patch, the timestamps on these two lines is roughly the same as above, but
      with 172 lines inbetween - so more than one cpu has been kept busy doing
      work that would otherwise only happen after the populate_rootfs()
      finished.
      
      Should one of the initcalls done after rootfs_initcall time (i.e., device_
      and late_ initcalls) need something from the initramfs (say, a kernel
      module or a firmware blob), it will simply wait for the initramfs
      unpacking to be done before proceeding, which should in theory make this
      completely safe.
      
      But if some driver pokes around in the filesystem directly and not via one
      of the official kernel interfaces (i.e.  request_firmware*(),
      call_usermodehelper*) that theory may not hold - also, I certainly might
      have missed a spot when sprinkling wait_for_initramfs().  So there is an
      escape hatch in the form of an initramfs_async= command line parameter.
      
      Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk
      Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7cb072e
    • B
      scripts/gdb: add lx_current support for arm64 · 526940e3
      Barry Song 提交于
      arm64 uses SP_EL0 to save the current task_struct address.  While running
      in EL0, SP_EL0 is clobbered by userspace.  So if the upper bit is not 1
      (not TTBR1), the current address is invalid.  This patch checks the upper
      bit of SP_EL0, if the upper bit is 1, lx_current() of arm64 will return
      the derefrence of current task.  Otherwise, lx_current() will tell users
      they are running in userspace(EL0).
      
      While arm64 is running in EL0, it is actually pointless to print current
      task as the memory of kernel space is not accessible in EL0.
      
      Link: https://lkml.kernel.org/r/20210314203444.15188-3-song.bao.hua@hisilicon.comSigned-off-by: NBarry Song <song.bao.hua@hisilicon.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      526940e3
    • B
      scripts/gdb: document lx_current is only supported by x86 · dc958682
      Barry Song 提交于
      Patch series "scripts/gdb: clarify the platforms supporting lx_current and add arm64 support", v2.
      
      lx_current depends on per_cpu current_task variable which exists on x86
      only.  so it actually works on x86 only.  the 1st patch documents this
      clearly; the 2nd patch adds support for arm64.
      
      This patch (of 2):
      
      x86 is the only architecture which has per_cpu current_task:
      
        arch$ git grep current_task | grep -i per_cpu
        x86/include/asm/current.h:DECLARE_PER_CPU(struct task_struct *, current_task);
        x86/kernel/cpu/common.c:DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
        x86/kernel/cpu/common.c:EXPORT_PER_CPU_SYMBOL(current_task);
        x86/kernel/cpu/common.c:DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
        x86/kernel/cpu/common.c:EXPORT_PER_CPU_SYMBOL(current_task);
        x86/kernel/smpboot.c:	per_cpu(current_task, cpu) = idle;
      
      On other architectures, lx_current() will lead to a python exception:
      
        (gdb) p $lx_current().pid
        Python Exception <class 'gdb.error'> No symbol "current_task" in current context.:
        Error occurred in Python: No symbol "current_task" in current context.
      
      To avoid more people struggling and wasting time in other architectures,
      document it.
      
      Link: https://lkml.kernel.org/r/20210314203444.15188-1-song.bao.hua@hisilicon.com
      Link: https://lkml.kernel.org/r/20210314203444.15188-2-song.bao.hua@hisilicon.comSigned-off-by: NBarry Song <song.bao.hua@hisilicon.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc958682
  5. 06 5月, 2021 6 次提交
  6. 05 5月, 2021 5 次提交
  7. 04 5月, 2021 15 次提交
  8. 03 5月, 2021 1 次提交
  9. 01 5月, 2021 4 次提交