1. 27 8月, 2018 1 次提交
  2. 24 8月, 2018 2 次提交
  3. 23 8月, 2018 1 次提交
    • A
      arch: enable relative relocations for arm64, power and x86 · 271ca788
      Ard Biesheuvel 提交于
      Patch series "add support for relative references in special sections", v10.
      
      This adds support for emitting special sections such as initcall arrays,
      PCI fixups and tracepoints as relative references rather than absolute
      references.  This reduces the size by 50% on 64-bit architectures, but
      more importantly, it removes the need for carrying relocation metadata for
      these sections in relocatable kernels (e.g., for KASLR) that needs to be
      fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
      such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
      4 byte relative reference)
      
      Patch #3 was sent out before as a single patch.  This series supersedes
      the previous submission.  This version makes relative ksymtab entries
      dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
      than trying to infer from kbuild test robot replies for which
      architectures it should be blacklisted.
      
      Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
      and sets it for the main architectures that are expected to benefit the
      most from this feature, i.e., 64-bit architectures or ones that use
      runtime relocations.
      
      Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
      ksymtab/kcrctab sections in decompressor and EFI stub objects when
      rebuilding existing C files to run in a different context.
      
      Patches #4 - #6 implement relative references for initcalls, PCI fixups
      and tracepoints, respectively, all of which produce sections with order
      ~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
      means we save about 28 KB of vmlinux space for each of these patches.
      
      [From the v7 series blurb, which included the jump_label patches as well]:
      
        For the arm64 kernel, all patches combined reduce the memory footprint
        of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
        KASLR enabled), of which ~1 MB is the size reduction of the RELA section
        in .init, and the remaining 300 KB is reduction of .text/.data.
      
      This patch (of 6):
      
      Before updating certain subsystems to use place relative 32-bit
      relocations in special sections, to save space and reduce the number of
      absolute relocations that need to be processed at runtime by relocatable
      kernels, introduce the Kconfig symbol and define it for some architectures
      that should be able to support and benefit from it.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
      Cc: James Morris <james.morris@microsoft.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      271ca788
  4. 02 8月, 2018 3 次提交
  5. 24 7月, 2018 1 次提交
    • A
      arm64: fix ACPI dependencies · 2c870e61
      Arnd Bergmann 提交于
      Kconfig reports a warning on x86 builds after the ARM64 dependency
      was added.
      
      drivers/acpi/Kconfig:6:error: recursive dependency detected!
      drivers/acpi/Kconfig:6:       symbol ACPI depends on EFI
      
      This rephrases the dependency to keep the ARM64 details out of the
      shared Kconfig file, so Kconfig no longer gets confused by it.
      
      For consistency, all three architectures that support ACPI now
      select ARCH_SUPPORTS_ACPI in exactly the configuration in which
      they allow it. We still need the 'default x86', as each one
      wants a different default: default-y on x86, default-n on arm64,
      and always-y on ia64.
      
      Fixes: 5bcd4408 ("drivers: acpi: add dependency of EFI for arm64")
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      2c870e61
  6. 16 7月, 2018 1 次提交
    • D
      x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling · 092b31aa
      Dan Williams 提交于
      All copy_to_user() implementations need to be prepared to handle faults
      accessing userspace. The __memcpy_mcsafe() implementation handles both
      mmu-faults on the user destination and machine-check-exceptions on the
      source buffer. However, the memcpy_mcsafe() wrapper may silently
      fallback to memcpy() depending on build options and cpu-capabilities.
      
      Force copy_to_user_mcsafe() to always use __memcpy_mcsafe() when
      available, and otherwise disable all of the copy_to_user_mcsafe()
      infrastructure when __memcpy_mcsafe() is not available, i.e.
      CONFIG_X86_MCE=n.
      
      This fixes crashes of the form:
          run fstests generic/323 at 2018-07-02 12:46:23
          BUG: unable to handle kernel paging request at 00007f0d50001000
          RIP: 0010:__memcpy+0x12/0x20
          [..]
          Call Trace:
           copyout_mcsafe+0x3a/0x50
           _copy_to_iter_mcsafe+0xa1/0x4a0
           ? dax_alive+0x30/0x50
           dax_iomap_actor+0x1f9/0x280
           ? dax_iomap_rw+0x100/0x100
           iomap_apply+0xba/0x130
           ? dax_iomap_rw+0x100/0x100
           dax_iomap_rw+0x95/0x100
           ? dax_iomap_rw+0x100/0x100
           xfs_file_dax_read+0x7b/0x1d0 [xfs]
           xfs_file_read_iter+0xa7/0xc0 [xfs]
           aio_read+0x11c/0x1a0
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Fixes: 8780356e ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
      Link: http://lkml.kernel.org/r/153108277790.37979.1486841789275803399.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      092b31aa
  7. 21 6月, 2018 2 次提交
    • J
      x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder · 6415b38b
      Jiri Slaby 提交于
      In SUSE, we need a reliable stack unwinder for kernel live patching, but
      we do not want to enable frame pointers for performance reasons. So
      after the previous patches to make the ORC reliable, mark ORC as a
      reliable stack unwinder on x86.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/lkml/20180518064713.26440-6-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6415b38b
    • T
      cpu/hotplug: Provide knobs to control SMT · 05736e4a
      Thomas Gleixner 提交于
      Provide a command line and a sysfs knob to control SMT.
      
      The command line options are:
      
       'nosmt':	Enumerate secondary threads, but do not online them
       		
       'nosmt=force': Ignore secondary threads completely during enumeration
       		via MP table and ACPI/MADT.
      
      The sysfs control file has the following states (read/write):
      
       'on':		 SMT is enabled. Secondary threads can be freely onlined
       'off':		 SMT is disabled. Secondary threads, even if enumerated
       		 cannot be onlined
       'forceoff':	 SMT is permanentely disabled. Writes to the control
       		 file are rejected.
       'notsupported': SMT is not supported by the CPU
      
      The command line option 'nosmt' sets the sysfs control to 'off'. This
      can be changed to 'on' to reenable SMT during runtime.
      
      The command line option 'nosmt=force' sets the sysfs control to
      'forceoff'. This cannot be changed during runtime.
      
      When SMT is 'on' and the control file is changed to 'off' then all online
      secondary threads are offlined and attempts to online a secondary thread
      later on are rejected.
      
      When SMT is 'off' and the control file is changed to 'on' then secondary
      threads can be onlined again. The 'off' -> 'on' transition does not
      automatically online the secondary threads.
      
      When the control file is set to 'forceoff', the behaviour is the same as
      setting it to 'off', but the operation is irreversible and later writes to
      the control file are rejected.
      
      When the control status is 'notsupported' then writes to the control file
      are rejected.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      05736e4a
  8. 15 6月, 2018 2 次提交
  9. 08 6月, 2018 2 次提交
    • M
      stack-protector: test compiler capability in Kconfig and drop AUTO mode · 2a61f474
      Masahiro Yamada 提交于
      Move the test for -fstack-protector(-strong) option to Kconfig.
      
      If the compiler does not support the option, the corresponding menu
      is automatically hidden.  If STRONG is not supported, it will fall
      back to REGULAR.  If REGULAR is not supported, it will be disabled.
      This means, AUTO is implicitly handled by the dependency solver of
      Kconfig, hence removed.
      
      I also turned the 'choice' into only two boolean symbols.  The use of
      'choice' is not a good idea here, because all of all{yes,mod,no}config
      would choose the first visible value, while we want allnoconfig to
      disable as many features as possible.
      
      X86 has additional shell scripts in case the compiler supports those
      options, but generates broken code.  I added CC_HAS_SANE_STACKPROTECTOR
      to test this.  I had to add -m32 to gcc-x86_32-has-stack-protector.sh
      to make it work correctly.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      2a61f474
    • L
      mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea
      Laurent Dufour 提交于
      Currently the PTE special supports is turned on in per architecture
      header files.  Most of the time, it is defined in
      arch/*/include/asm/pgtable.h depending or not on some other per
      architecture static definition.
      
      This patch introduce a new configuration variable to manage this
      directly in the Kconfig files.  It would later replace
      __HAVE_ARCH_PTE_SPECIAL.
      
      Here notes for some architecture where the definition of
      __HAVE_ARCH_PTE_SPECIAL is not obvious:
      
      arm
       __HAVE_ARCH_PTE_SPECIAL which is currently defined in
      arch/arm/include/asm/pgtable-3level.h which is included by
      arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
      So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
      
      powerpc
      __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
       - arch/powerpc/include/asm/book3s/64/pgtable.h
       - arch/powerpc/include/asm/pte-common.h
      The first one is included if (PPC_BOOK3S & PPC64) while the second is
      included in all the other cases.
      So select ARCH_HAS_PTE_SPECIAL all the time.
      
      sparc:
      __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
      defined(__arch64__) which are defined through the compiler in
      sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
      So select ARCH_HAS_PTE_SPECIAL if SPARC64
      
      There is no functional change introduced by this patch.
      
      Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.comSigned-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: NJerome Glisse <jglisse@redhat.com>
      Reviewed-by: NJerome Glisse <jglisse@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <albert@sifive.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Christophe LEROY <christophe.leroy@c-s.fr>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3010a5ea
  10. 06 6月, 2018 2 次提交
    • K
      x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME · 94d49eb3
      Kirill A. Shutemov 提交于
      AMD SME claims one bit from physical address to indicate whether the
      page is encrypted or not. To achieve that we clear out the bit from
      __PHYSICAL_MASK.
      
      The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
      For instance for upcoming Intel Multi-Key Total Memory Encryption.
      
      Factor it out into a separate feature with own Kconfig handle.
      
      It also helps with overhead of AMD SME. It saves more than 3k in .text
      on defconfig + AMD_MEM_ENCRYPT:
      
      	add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)
      
      We would need to return to this once we have infrastructure to patch
      constants in code. That's good candidate for it.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-mm@kvack.org
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com
      94d49eb3
    • M
      x86: Add support for restartable sequences · d6761b8f
      Mathieu Desnoyers 提交于
      Call the rseq_handle_notify_resume() function on return to userspace if
      TIF_NOTIFY_RESUME thread flag is set.
      
      Perform fixup on the pre-signal frame when a signal is delivered on top
      of a restartable sequence critical section.
      
      Check that system calls are not invoked from within rseq critical
      sections by invoking rseq_signal() from syscall_return_slowpath().
      With CONFIG_DEBUG_RSEQ, such behavior results in termination of the
      process with SIGSEGV.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: linux-api@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20180602124408.8430-7-mathieu.desnoyers@efficios.com
      d6761b8f
  11. 29 5月, 2018 1 次提交
    • M
      kconfig: reference environment variables directly and remove 'option env=' · 104daea1
      Masahiro Yamada 提交于
      To get access to environment variables, Kconfig needs to define a
      symbol using "option env=" syntax.  It is tedious to add a symbol entry
      for each environment variable given that we need to define much more
      such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
      in Kconfig.
      
      Adding '$' for symbol references is grammatically inconsistent.
      Looking at the code, the symbols prefixed with 'S' are expanded by:
       - conf_expand_value()
         This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
       - sym_expand_string_value()
         This is used to expand strings in 'source' and 'mainmenu'
      
      All of them are fixed values independent of user configuration.  So,
      they can be changed into the direct expansion instead of symbols.
      
      This change makes the code much cleaner.  The bounce symbols 'SRCARCH',
      'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.
      
      sym_init() hard-coding 'UNAME_RELEASE' is also gone.  'UNAME_RELEASE'
      should be replaced with an environment variable.
      
      ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
      without '$' prefix.
      
      The new syntax is addicted by Make.  The variable reference needs
      parentheses, like $(FOO), but you can omit them for single-letter
      variables, like $F.  Yet, in Makefiles, people tend to use the
      parenthetical form for consistency / clarification.
      
      At this moment, only the environment variable is supported, but I will
      extend the concept of 'variable' later on.
      
      The variables are expanded in the lexer so we can simplify the token
      handling on the parser side.
      
      For example, the following code works.
      
      [Example code]
      
        config MY_TOOLCHAIN_LIST
                string
                default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"
      
      [Result]
      
        $ make -s alldefconfig && tail -n 1 .config
        CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      104daea1
  12. 15 5月, 2018 1 次提交
  13. 09 5月, 2018 7 次提交
  14. 08 5月, 2018 1 次提交
  15. 04 5月, 2018 1 次提交
    • W
      bpf, x86_32: add eBPF JIT compiler for ia32 · 03f5781b
      Wang YanQing 提交于
      The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF
      only. Classic BPF is supported because of the conversion by BPF core.
      
      Almost all instructions from eBPF ISA supported except the following:
      BPF_ALU64 | BPF_DIV | BPF_K
      BPF_ALU64 | BPF_DIV | BPF_X
      BPF_ALU64 | BPF_MOD | BPF_K
      BPF_ALU64 | BPF_MOD | BPF_X
      BPF_STX | BPF_XADD | BPF_W
      BPF_STX | BPF_XADD | BPF_DW
      
      It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.
      
      IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use
      EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF
      ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others
      eBPF registers, R0-R10, are simulated through scratch space on stack.
      
      The reasons behind the hardware registers allocation policy are:
      1:MUL need EAX:EDX, shift operation need ECX, so they aren't fit
        for general eBPF 64bit register simulation.
      2:We need at least 4 registers to simulate most eBPF ISA operations
        on registers operands instead of on register&memory operands.
      3:We need to put BPF_REG_AX on hardware registers, or constant blinding
        will degrade jit performance heavily.
      
      Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
      Testing results on i5-5200U:
      1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
      2) test_progs: Summary: 83 PASSED, 0 FAILED.
      3) test_lpm: OK
      4) test_lru_map: OK
      5) test_verifier: Summary: 828 PASSED, 0 FAILED.
      
      Above tests are all done in following two conditions separately:
      1:bpf_jit_enable=1 and bpf_jit_harden=0
      2:bpf_jit_enable=1 and bpf_jit_harden=2
      
      Below are some numbers for this jit implementation:
      Note:
        I run test_progs in kselftest 100 times continuously for every condition,
        the numbers are in format: total/times=avg.
        The numbers that test_bpf reports show almost the same relation.
      
      a:jit_enable=0 and jit_harden=0            b:jit_enable=1 and jit_harden=0
        test_pkt_access:PASS:ipv4:15622/100=156    test_pkt_access:PASS:ipv4:10674/100=106
        test_pkt_access:PASS:ipv6:9130/100=91      test_pkt_access:PASS:ipv6:4855/100=48
        test_xdp:PASS:ipv4:240198/100=2401         test_xdp:PASS:ipv4:138912/100=1389
        test_xdp:PASS:ipv6:137326/100=1373         test_xdp:PASS:ipv6:68542/100=685
        test_l4lb:PASS:ipv4:61100/100=611          test_l4lb:PASS:ipv4:37302/100=373
        test_l4lb:PASS:ipv6:101000/100=1010        test_l4lb:PASS:ipv6:55030/100=550
      
      c:jit_enable=1 and jit_harden=2
        test_pkt_access:PASS:ipv4:10558/100=105
        test_pkt_access:PASS:ipv6:5092/100=50
        test_xdp:PASS:ipv4:131902/100=1319
        test_xdp:PASS:ipv6:77932/100=779
        test_l4lb:PASS:ipv4:38924/100=389
        test_l4lb:PASS:ipv6:57520/100=575
      
      The numbers show we get 30%~50% improvement.
      
      See Documentation/networking/filter.txt for more information.
      
      Changelog:
      
       Changes v5-v6:
       1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
         consistence reason.
       2:Clean up non-standard comments, reported by Daniel Borkmann.
       3:Fix a memory leak issue, repoted by Daniel Borkmann.
      
       Changes v4-v5:
       1:Delete is_on_stack, BPF_REG_AX is the only one
         on real hardware registers, so just check with
         it.
       2:Apply commit 1612a981 ("bpf, x64: fix JIT emission
         for dead code"), suggested by Daniel Borkmann.
      
       Changes v3-v4:
       1:Fix changelog in commit.
         I install llvm-6.0, then test_progs willn't report errors.
         I submit another patch:
         "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
         to fix another problem, after that patch, test_verifier willn't report errors too.
       2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
      
       Changes v2-v3:
       1:Move BPF_REG_AX to real hardware registers for performance reason.
       3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
       4:Delete partial codes in 1c2a088a, suggested by Daniel Borkmann.
       5:Some bug fixes and comments improvement.
      
       Changes v1-v2:
       1:Fix bug in emit_ia32_neg64.
       2:Fix bug in emit_ia32_arsh_r64.
       3:Delete filename in top level comment, suggested by Thomas Gleixner.
       4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
       5:Rewrite some words in changelog.
       6:CodingSytle improvement and a little more comments.
      Signed-off-by: NWang YanQing <udknight@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      03f5781b
  16. 25 4月, 2018 1 次提交
    • D
      x86/pti: Filter at vma->vm_page_prot population · 316d097c
      Dave Hansen 提交于
      commit ce9962bf7e22bb3891655c349faff618922d4a73
      
      0day reported warnings at boot on 32-bit systems without NX support:
      
      attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
      WARNING: CPU: 0 PID: 1 at
      arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0:
       check_pgprot at arch/x86/include/asm/pgtable.h:535
       (inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
       (inlined by) do_anonymous_page at mm/memory.c:3169
       (inlined by) handle_pte_fault at mm/memory.c:3961
       (inlined by) __handle_mm_fault at mm/memory.c:4087
       (inlined by) handle_mm_fault at mm/memory.c:4124
      
      The problem is that due to the recent commit which removed auto-massaging
      of page protections, filtering page permissions at PTE creation time is not
      longer done, so vma->vm_page_prot is passed unfiltered to PTE creation.
      
      Filter the page protections before they are installed in vma->vm_page_prot.
      
      Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com
      
      316d097c
  17. 14 4月, 2018 1 次提交
    • A
      kexec_file: make use of purgatory optional · b799a09f
      AKASHI Takahiro 提交于
      Patch series "kexec_file, x86, powerpc: refactoring for other
      architecutres", v2.
      
      This is a preparatory patchset for adding kexec_file support on arm64.
      
      It was originally included in a arm64 patch set[1], but Philipp is also
      working on their kexec_file support on s390[2] and some changes are now
      conflicting.
      
      So these common parts were extracted and put into a separate patch set
      for better integration.  What's more, my original patch#4 was split into
      a few small chunks for easier review after Dave's comment.
      
      As such, the resulting code is basically identical with my original, and
      the only *visible* differences are:
      
       - renaming of _kexec_kernel_image_probe() and  _kimage_file_post_load_cleanup()
      
       - change one of types of arguments at prepare_elf64_headers()
      
      Those, unfortunately, require a couple of trivial changes on the rest
      (#1, #6 to #13) of my arm64 kexec_file patch set[1].
      
      Patch #1 allows making a use of purgatory optional, particularly useful
      for arm64.
      
      Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
      verify_sig}() and arch_kimage_file_post_load_cleanup() across
      architectures.
      
      Patches #3-#7 are also intended to generalize parse_elf64_headers(),
      along with exclude_mem_range(), to be made best re-use of.
      
      [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
      [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html
      
      This patch (of 7):
      
      On arm64, crash dump kernel's usable memory is protected by *unmapping*
      it from kernel virtual space unlike other architectures where the region
      is just made read-only.  It is highly unlikely that the region is
      accidentally corrupted and this observation rationalizes that digest
      check code can also be dropped from purgatory.  The resulting code is so
      simple as it doesn't require a bit ugly re-linking/relocation stuff,
      i.e.  arch_kexec_apply_relocations_add().
      
      Please see:
      
         http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html
      
      All that the purgatory does is to shuffle arguments and jump into a new
      kernel, while we still need to have some space for a hash value
      (purgatory_sha256_digest) which is never checked against.
      
      As such, it doesn't make sense to have trampline code between old kernel
      and new kernel on arm64.
      
      This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
      allows related code to be compiled in only if necessary.
      
      [takahiro.akashi@linaro.org: fix trivial screwup]
        Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
      Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: NDave Young <dyoung@redhat.com>
      Tested-by: NDave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b799a09f
  18. 10 4月, 2018 1 次提交
    • A
      x86/olpc: Fix inconsistent MFD_CS5535 configuration · 92e830f2
      Arnd Bergmann 提交于
      This Kconfig warning appeared after a fix to the Kconfig validation.
      The GPIO_CS5535 driver depends on the MFD_CS5535 driver, but the former
      is selected in places where the latter is not:
      
      WARNING: unmet direct dependencies detected for GPIO_CS5535
        Depends on [m]: GPIOLIB [=y] && (X86 [=y] || MIPS || COMPILE_TEST [=y]) && MFD_CS5535 [=m]
        Selected by [y]:
        - OLPC_XO1_SCI [=y] && X86_32 [=y] && OLPC [=y] && OLPC_XO1_PM [=y] && INPUT [=y]=y
      
      The warning does seem appropriate, since the GPIO_CS5535 driver won't
      work unless MFD_CS5535 is also present. However, there is no link time
      dependency between the two, so this caused no problems during randconfig
      testing before.
      
      This changes the 'select GPIO_CS5535' to 'depends on GPIO_CS5535' to
      avoid the issue, at the expense of making it harder to configure the
      driver (one now has to select the dependencies first).
      
      The 'select MFD_CORE' part is completely redundant, since we already
      depend on MFD_CS5535 here, so I'm removing that as well.
      
      Ideally, the private symbols exported by that cs5535 gpio driver would
      just be converted to gpiolib interfaces so we could expletely avoid
      this dependency.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kbuild@vger.kernel.org
      Fixes: f622f827 ("kconfig: warn unmet direct dependency of tristate symbols selected by y")
      Link: http://lkml.kernel.org/r/20180404124539.3817101-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92e830f2
  19. 05 4月, 2018 3 次提交
    • D
      syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 · f8781c4a
      Dominik Brodowski 提交于
      Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
      ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
      several codepaths.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f8781c4a
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 · ebeb8c82
      Dominik Brodowski 提交于
      Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
      x86.
      
      For x32, all we need to do is to create an additional stub for each
      compat syscall which decodes the parameters in x86-64 ordering, e.g.:
      
      	asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      For i386 emulation, we need to teach compat_sys_*() to take struct
      pt_regs as its only argument, e.g.:
      
      	asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      In addition, we need to create additional stubs for common syscalls
      (that is, for syscalls which have the same parameters on 32-bit and
      64-bit), e.g.:
      
      	asmlinkage long __sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_sys_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ebeb8c82
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls · fa697140
      Dominik Brodowski 提交于
      Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
      
      Each syscall defines a stub which takes struct pt_regs as its only
      argument. It decodes just those parameters it needs, e.g:
      
      	asmlinkage long sys_xyzzy(const struct pt_regs *regs)
      	{
      		return SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      For example, for sys_recv() which is a 4-parameter syscall, the assembly
      now is (in slightly reordered fashion):
      
      	<sys_recv>:
      		callq	<__fentry__>
      
      		/* decode regs->di, ->si, ->dx and ->r10 */
      		mov	0x70(%rdi),%rdi
      		mov	0x68(%rdi),%rsi
      		mov	0x60(%rdi),%rdx
      		mov	0x38(%rdi),%rcx
      
      		[ SyS_recv() is automatically inlined by the compiler,
      		  as it is not [yet] used anywhere else ]
      		/* clear %r9 and %r8, the 5th and 6th args */
      		xor	%r9d,%r9d
      		xor	%r8d,%r8d
      
      		/* do the actual work */
      		callq	__sys_recvfrom
      
      		/* cleanup and return */
      		cltq
      		retq
      
      The only valid place in an x86-64 kernel which rightfully calls
      a syscall function on its own -- vsyscall -- needs to be modified
      to pass struct pt_regs onwards as well.
      
      To keep the syscall table generation working independent of
      SYSCALL_PTREGS being enabled, the stubs are named the same as the
      "original" syscall stubs, i.e. sys_*().
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
      and to update the vsyscall to the new calling convention.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa697140
  20. 27 3月, 2018 2 次提交
  21. 20 3月, 2018 3 次提交
  22. 08 3月, 2018 1 次提交