1. 27 Sep, 2018 1 commit
  2. 24 Aug, 2018 1 commit
    • P
      mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE · d86564a2
      Peter Zijlstra committed
      Jann reported that x86 was missing required TLB invalidates when he
      hit the !*batch slow path in tlb_remove_table().
      
      This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache)
      invalidates, the PowerPC-hash where this code originated and the
      Sparc-hash where this was subsequently used did not need that. ARM
      which later used this put an explicit TLB invalidate in their
      __p*_free_tlb() functions, and PowerPC-radix followed that example.
      
      But when we hooked up x86 we failed to consider this. Fix this by
      (optionally) hooking tlb_remove_table() into the TLB invalidate code.
      
      NOTE: s390 was also needing something like this and might now
            be able to use the generic code again.
      
      [ Modified to be on top of Nick's cleanups, which simplified this patch
        now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]
      
      Fixes: 9e52fc2b ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d86564a2
  3. 23 Aug, 2018 1 commit
    • A
      arch: enable relative relocations for arm64, power and x86 · 271ca788
      Ard Biesheuvel committed
      Patch series "add support for relative references in special sections", v10.
      
      This adds support for emitting special sections such as initcall arrays,
      PCI fixups and tracepoints as relative references rather than absolute
      references.  This reduces the size by 50% on 64-bit architectures, but
      more importantly, it removes the need for carrying relocation metadata for
      these sections in relocatable kernels (e.g., for KASLR) that needs to be
      fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
      such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
      4 byte relative reference)
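The 8x figure above follows from (8 + 24) / 4 = 8 bytes of footprint per reference. As a minimal user-space sketch of the idea, assuming a hypothetical `struct rel_entry` and helper names that do not come from the series: each entry stores a signed 32-bit distance from the entry itself to its target, and the absolute address is recovered by adding that distance back to the entry's own address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry: a signed 32-bit offset from the field itself
 * to the target, in place of an 8-byte absolute pointer. */
struct rel_entry {
    int32_t offset;
};

/* Store the distance from the entry to its target. */
static void rel_entry_set(struct rel_entry *e, const void *target)
{
    intptr_t delta = (intptr_t)target - (intptr_t)&e->offset;

    /* A 32-bit relative reference can only reach +/-2 GiB. */
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    e->offset = (int32_t)delta;
}

/* Recover the absolute address: address of the field plus its value. */
static const void *rel_entry_get(const struct rel_entry *e)
{
    return (const void *)((intptr_t)&e->offset + e->offset);
}

/* Demo data kept in one object so entries and targets are close together. */
static struct {
    char names[2][8];
    struct rel_entry table[2];
} demo = { .names = { "init_a", "init_b" } };

static void demo_fill(void)
{
    rel_entry_set(&demo.table[0], demo.names[0]);
    rel_entry_set(&demo.table[1], demo.names[1]);
}
```

Because the stored value is position independent, moving the entry and its target together (as a relocatable kernel does) leaves it valid without any boot-time fixup.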
      
      Patch #3 was sent out before as a single patch.  This series supersedes
      the previous submission.  This version makes relative ksymtab entries
      dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
      than trying to infer from kbuild test robot replies for which
      architectures it should be blacklisted.
      
      Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
      and sets it for the main architectures that are expected to benefit the
      most from this feature, i.e., 64-bit architectures or ones that use
      runtime relocations.
      
      Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
      ksymtab/kcrctab sections in decompressor and EFI stub objects when
      rebuilding existing C files to run in a different context.
      
      Patches #4 - #6 implement relative references for initcalls, PCI fixups
      and tracepoints, respectively, all of which produce sections with order
      ~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
      means we save about 28 KB of vmlinux space for each of these patches.
      
      [From the v7 series blurb, which included the jump_label patches as well]:
      
        For the arm64 kernel, all patches combined reduce the memory footprint
        of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
        KASLR enabled), of which ~1 MB is the size reduction of the RELA section
        in .init, and the remaining 300 KB is reduction of .text/.data.
      
      This patch (of 6):
      
      Before updating certain subsystems to use place-relative 32-bit
      relocations in special sections, to save space and reduce the number of
      absolute relocations that need to be processed at runtime by relocatable
      kernels, introduce the Kconfig symbol and define it for some architectures
      that should be able to support and benefit from it.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      271ca788
  4. 22 Aug, 2018 1 commit
    • P
      compiler.h: Allow arch-specific asm/compiler.h · 04f264d3
      Paul Burton committed
      We have a need to override the definition of
      barrier_before_unreachable() for MIPS, which means we either need to add
      architecture-specific code into linux/compiler-gcc.h or we need to allow
      the architecture to provide a header that can define the macro before
      the generic definition. The latter seems like the better approach.
      
      A straightforward approach to the per-arch header is to make use of
      asm-generic to provide a default empty header & adjust architectures
      which don't need anything specific to make use of that by adding the
      header to generic-y. Unfortunately this doesn't work so well due to
      commit 28128c61 ("kconfig.h: Include compiler types to avoid missed
      struct attributes") which caused linux/compiler_types.h to be included
      in the compilation of every C file via the -include linux/kconfig.h flag
      in c_flags.
      
      Because the -include flag is present for all C files we compile, we need
      the architecture-provided header to be present before any C files are
      compiled. If any C files can be compiled prior to the asm-generic header
      wrappers being generated then we hit a build failure due to missing
      header. Such cases do exist - one pointed out by the kbuild test robot
      is the compilation of arch/ia64/kernel/nr-irqs.c, which occurs as part
      of the archprepare target [1].
      
      This leaves us with a few options:
      
        1) Use generic-y & fix any build failures we find by enforcing
           ordering such that the asm-generic target occurs before any C
           compilation, such that linux/compiler_types.h can always include
           the generated asm-generic wrapper which in turn includes the empty
           asm-generic header. This would rely on us finding all the
           problematic cases - I don't know for sure that the ia64 issue is
           the only one.
      
        2) Add an actual empty header to each architecture, so that we don't
           need the generated asm-generic wrapper. This seems messy.
      
        3) Give up & add #ifdef CONFIG_MIPS or similar to
           linux/compiler_types.h. This seems messy too.
      
        4) Include the arch header only when it's actually needed, removing
           the need for the asm-generic wrapper for all other architectures.
      
      This patch allows us to use approach 4, by including an asm/compiler.h
      header from linux/compiler_types.h after the inclusion of the
      compiler-specific linux/compiler-*.h header(s). We do this
      conditionally, only when CONFIG_HAVE_ARCH_COMPILER_H is selected, in
      order to avoid the need for asm-generic wrappers & the associated build
      ordering issue described above. The asm/compiler.h header is included
      after the generic linux/compiler-*.h header(s) for consistency with the
      way linux/compiler-intel.h & linux/compiler-clang.h are included after
      the linux/compiler-gcc.h header that they override.
      
      [1] https://lists.01.org/pipermail/kbuild-all/2018-August/051175.html
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Patchwork: https://patchwork.linux-mips.org/patch/20269/
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      04f264d3
  5. 02 Aug, 2018 3 commits
  6. 25 Jul, 2018 1 commit
  7. 21 Jun, 2018 1 commit
    • T
      cpu/hotplug: Provide knobs to control SMT · 05736e4a
      Thomas Gleixner committed
      Provide a command line and a sysfs knob to control SMT.
      
      The command line options are:
      
       'nosmt':	Enumerate secondary threads, but do not online them
       		
       'nosmt=force': Ignore secondary threads completely during enumeration
       		via MP table and ACPI/MADT.
      
      The sysfs control file has the following states (read/write):
      
       'on':		 SMT is enabled. Secondary threads can be freely onlined
       'off':		 SMT is disabled. Secondary threads, even if enumerated
       		 cannot be onlined
       'forceoff':	 SMT is permanently disabled. Writes to the control
       		 file are rejected.
       'notsupported': SMT is not supported by the CPU
      
      The command line option 'nosmt' sets the sysfs control to 'off'. This
      can be changed to 'on' to reenable SMT during runtime.
      
      The command line option 'nosmt=force' sets the sysfs control to
      'forceoff'. This cannot be changed during runtime.
      
      When SMT is 'on' and the control file is changed to 'off' then all online
      secondary threads are offlined and attempts to online a secondary thread
      later on are rejected.
      
      When SMT is 'off' and the control file is changed to 'on' then secondary
      threads can be onlined again. The 'off' -> 'on' transition does not
      automatically online the secondary threads.
      
      When the control file is set to 'forceoff', the behaviour is the same as
      setting it to 'off', but the operation is irreversible and later writes to
      the control file are rejected.
      
      When the control status is 'notsupported' then writes to the control file
      are rejected.
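The write rules for the control file can be summarised as a small state machine. The sketch below is a user-space model only: the enum and function names are hypothetical, and treating 'notsupported' as a value that cannot be requested by a write is an assumption of the sketch, not something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the four control states described above. */
enum smt_control { SMT_ON, SMT_OFF, SMT_FORCEOFF, SMT_NOTSUPPORTED };

/* Model a write to the control file.
 * Returns true if the write is accepted and *state was updated. */
static bool smt_control_write(enum smt_control *state, enum smt_control req)
{
    assert(state != NULL);

    /* 'forceoff' is irreversible and 'notsupported' is fixed by the CPU. */
    if (*state == SMT_FORCEOFF || *state == SMT_NOTSUPPORTED)
        return false;

    /* Assumption: 'notsupported' is reported, never requested. */
    if (req == SMT_NOTSUPPORTED)
        return false;

    *state = req;
    return true;
}

/* Secondary threads may only be onlined while the control is 'on'. */
static bool smt_may_online_secondary(enum smt_control state)
{
    return state == SMT_ON;
}
```

Note that the model says nothing about which threads are online after an 'off' -> 'on' write; as described above, that transition only permits onlining again.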
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      05736e4a
  8. 16 Jun, 2018 1 commit
  9. 15 Jun, 2018 1 commit
  10. 14 Jun, 2018 1 commit
    • L
      Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables · 050e9baa
      Linus Torvalds committed
      The changes to automatically test for working stack protector compiler
      support in the Kconfig files removed the special STACKPROTECTOR_AUTO
      option that picked the strongest stack protector that the compiler
      supported.
      
      That was all a nice cleanup - it makes no sense to have the AUTO case
      now that the Kconfig phase can just determine the compiler support
      directly.
      
      HOWEVER.
      
      It also meant that doing "make oldconfig" would now _disable_ the strong
      stackprotector if you had AUTO enabled, because in a legacy config file,
      the sane stack protector configuration would look like
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_NONE is not set
        # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_STACKPROTECTOR_AUTO=y
      
      and when you ran this through "make oldconfig" with the Kbuild changes,
      it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
      been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
      CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
      used to be disabled (because it was really enabled by AUTO), and would
      disable it in the new config, resulting in:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      That's dangerously subtle - people could suddenly find themselves with
      the weaker stack protector setup without even realizing.
      
      The solution here is to just rename not just the old REGULAR stack
      protector option, but also the strong one.  This does that by just
      removing the CC_ prefix entirely for the user choices, because it really
      is not about the compiler support (the compiler support now instead
      automatically impacts _visibility_ of the options to users).
      
      This results in "make oldconfig" actually asking the user for their
      choice, so that we don't have any silent subtle security model changes.
      The end result would generally look like this:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_STACKPROTECTOR=y
        CONFIG_STACKPROTECTOR_STRONG=y
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      where the "CC_" versions really are about internal compiler
      infrastructure, not the user selections.
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      050e9baa
  11. 11 Jun, 2018 3 commits
    • M
      gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST · caa91ba5
      Masahiro Yamada committed
      We have enabled GCC_PLUGINS for COMPILE_TEST, but allmodconfig now
      produces new warnings.
      
        CC [M]  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.o
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function ‘wlc_phy_workarounds_nphy_rev7’:
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: warning: the frame size of 3128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
       }
       ^
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function ‘wlc_phy_workarounds_nphy_rev3’:
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: warning: the frame size of 2800 bytes is larger than 2048 bytes [-Wframe-larger-than=]
       }
       ^
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function ‘wlc_phy_cal_txiqlo_nphy’:
      drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: warning: the frame size of 2488 bytes is larger than 2048 bytes [-Wframe-larger-than=]
       }
       ^
      
      It looks like GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is causing this.
      Add "depends on !COMPILE_TEST" to not disturb the compile test.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      caa91ba5
    • M
      gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST · 1658dcee
      Masahiro Yamada committed
      Now that the compiler's plugin support is checked in Kconfig,
      all{yes,mod}config will not be bothered.
      
      Remove 'depends on !COMPILE_TEST' for GCC_PLUGINS.
      
      'depends on !COMPILE_TEST' for the following three are still kept:
        GCC_PLUGIN_CYC_COMPLEXITY
        GCC_PLUGIN_STRUCTLEAK_VERBOSE
        GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
      
      Kees suggested to do so because the first two are too noisy, and the
      last one would reduce the compile test coverage.  I commented the
      reasons in arch/Kconfig.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      1658dcee
    • M
      gcc-plugins: test plugin support in Kconfig and clean up Makefile · 59f53855
      Masahiro Yamada committed
      Run scripts/gcc-plugin.sh from Kconfig so that users can enable
      GCC_PLUGINS only when the compiler supports building plugins.
      
      Kconfig defines a new symbol, PLUGIN_HOSTCC.  This will contain
      the compiler (g++ or gcc) used for building plugins, or empty
      if the plugin can not be supported at all.
      
      This allows us to remove all ugly testing in Makefile.gcc-plugins.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      59f53855
  12. 08 Jun, 2018 1 commit
    • M
      stack-protector: test compiler capability in Kconfig and drop AUTO mode · 2a61f474
      Masahiro Yamada committed
      Move the test for -fstack-protector(-strong) option to Kconfig.
      
      If the compiler does not support the option, the corresponding menu
      is automatically hidden.  If STRONG is not supported, it will fall
      back to REGULAR.  If REGULAR is not supported, it will be disabled.
      This means, AUTO is implicitly handled by the dependency solver of
      Kconfig, hence removed.
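The fallback that the dependency solver now provides implicitly can be written out explicitly. This is a hedged sketch of the decision only, with hypothetical names; it assumes, as the paragraph above implies, that the strong variant is only reachable when the regular one is also available and selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type, ordered from weakest to strongest. */
enum stackprotector { SP_NONE, SP_REGULAR, SP_STRONG };

static_assert(SP_NONE < SP_REGULAR && SP_REGULAR < SP_STRONG,
              "levels must be ordered weakest to strongest");

/* cc_has_*: what the compiler supports (tested at Kconfig time).
 * want_*:   what the user selected.
 * Returns the strongest level that is both wanted and supported. */
static enum stackprotector pick_stackprotector(bool cc_has_regular,
                                               bool cc_has_strong,
                                               bool want_regular,
                                               bool want_strong)
{
    /* REGULAR unsupported or unselected: the feature is disabled. */
    if (!cc_has_regular || !want_regular)
        return SP_NONE;

    /* STRONG unsupported: fall back to REGULAR. */
    if (want_strong && cc_has_strong)
        return SP_STRONG;

    return SP_REGULAR;
}
```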
      
      I also turned the 'choice' into only two boolean symbols.  The use of
      'choice' is not a good idea here, because all of all{yes,mod,no}config
      would choose the first visible value, while we want allnoconfig to
      disable as many features as possible.
      
      X86 has additional shell scripts in case the compiler supports those
      options, but generates broken code.  I added CC_HAS_SANE_STACKPROTECTOR
      to test this.  I had to add -m32 to gcc-x86_32-has-stack-protector.sh
      to make it work correctly.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      2a61f474
  13. 06 Jun, 2018 1 commit
    • M
      rseq: Introduce restartable sequences system call · d7822b1e
      Mathieu Desnoyers committed
      Expose a new system call allowing each thread to register one userspace
      memory area to be used as an ABI between kernel and user-space for two
      purposes: user-space restartable sequences and quick access to read the
      current CPU number value from user-space.
      
      * Restartable sequences (per-cpu atomics)
      
      Restartable sequences allow user-space to perform update operations on
      per-cpu data without requiring heavy-weight atomic operations.
      
      The restartable critical sections (percpu atomics) work has been started
      by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
      critical sections. [1] [2] The re-implementation proposed here brings a
      few simplifications to the ABI which facilitates porting to other
      architectures and speeds up the user-space fast path.
      
      Here are benchmarks of various rseq use-cases.
      
      Test hardware:
      
      arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
      x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading
      
      The following benchmarks were all performed on a single thread.
      
      * Per-CPU statistic counter increment
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                344.0                 31.4          11.0
      x86-64:                15.3                  2.0           7.7
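For orientation, the "getcpu+atomic" column corresponds to a pattern along the following lines. This is only a sketch of that baseline, not the benchmark code and not an rseq implementation: it assumes Linux with glibc's `sched_getcpu()`, and the slot count and function names are hypothetical. The atomic add is needed because the thread may migrate between reading the CPU number and updating the slot, which is the cost rseq is meant to avoid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_SLOTS 1024  /* hypothetical upper bound on CPU slots */

static_assert(ATOMIC_LONG_LOCK_FREE == 2,
              "sketch assumes lock-free atomic counters");

/* One counter per CPU slot; zero-initialised as a static object. */
static atomic_ulong counters[MAX_SLOTS];

/* Increment the counter of the CPU the caller is (probably) running on. */
static void counter_inc(void)
{
    int cpu = sched_getcpu();

    /* Fall back to slot 0 if the CPU number is unavailable. */
    if (cpu < 0)
        cpu = 0;

    atomic_fetch_add_explicit(&counters[cpu % MAX_SLOTS], 1,
                              memory_order_relaxed);
}

/* Sum all slots to obtain the global value. */
static unsigned long counter_sum(void)
{
    unsigned long sum = 0;

    for (int i = 0; i < MAX_SLOTS; i++)
        sum += atomic_load_explicit(&counters[i], memory_order_relaxed);
    return sum;
}
```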
      
      * LTTng-UST: write event 32-bit header, 32-bit payload into tracer
                   per-cpu buffer
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:               2502.0                 2250.0         1.1
      x86-64:               117.4                   98.0         1.2
      
      * liburcu percpu: lock-unlock pair, dereference, read/compare word
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                751.0                 128.5          5.8
      x86-64:                53.4                  28.6          1.9
      
      * jemalloc memory allocator adapted to use rseq
      
      Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
      rseq 2016 implementation):
      
      The production workload response-time has 1-2% gain avg. latency, and
      the P99 overall latency drops by 2-3%.
      
      * Reading the current CPU number
      
      Speeding up reading the current CPU number on which the caller thread is
      running is done by keeping the current CPU number up to date within the
      cpu_id field of the memory area registered by the thread. This is done
      by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
      current thread. Upon return to user-space, a notify-resume handler
      updates the current CPU value within the registered user-space memory
      area. User-space can then read the current CPU number directly from
      memory.
      
      Keeping the current cpu id in a memory area shared between kernel and
      user-space improves on the mechanisms currently available to read the
      current CPU number. It has the following benefits over alternative
      approaches:
      
      - 35x speedup on ARM vs system call through glibc
      - 20x speedup on x86 compared to calling glibc, which calls vdso
        executing a "lsl" instruction,
      - 14x speedup on x86 compared to inlined "lsl" instruction,
      - Unlike vdso approaches, this cpu_id value can be read from an inline
        assembly, which makes it a useful building block for restartable
        sequences.
      - The approach of reading the cpu id through memory mapping shared
        between kernel and user-space is portable (e.g. ARM), which is not the
        case for the lsl-based x86 vdso.
      
      On x86, yet another possible approach would be to use the gs segment
      selector to point to user-space per-cpu data. This approach performs
      similarly to the cpu id cache, but it has two disadvantages: it is
      not portable, and it is incompatible with existing applications already
      using the gs segment selector for other purposes.
      
      Benchmarking various approaches for reading the current CPU number:
      
      ARMv7 Processor rev 4 (v7l)
      Machine model: Cubietruck
      - Baseline (empty loop):                                    8.4 ns
      - Read CPU from rseq cpu_id:                               16.7 ns
      - Read CPU from rseq cpu_id (lazy register):               19.8 ns
      - glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
      - getcpu system call:                                     234.9 ns
      
      x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
      - Baseline (empty loop):                                    0.8 ns
      - Read CPU from rseq cpu_id:                                0.8 ns
      - Read CPU from rseq cpu_id (lazy register):                0.8 ns
      - Read using gs segment selector:                           0.8 ns
      - "lsl" inline assembly:                                   13.0 ns
      - glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
      - getcpu system call:                                      53.9 ns
      
      - Speed (benchmark taken on v8 of patchset)
      
      Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
      expectations, that enabling CONFIG_RSEQ slightly accelerates the
      scheduler:
      
      Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
      2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
      saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
      kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
      restartable sequences series applied.
      
      * CONFIG_RSEQ=n
      
      avg.:      41.37 s
      std.dev.:   0.36 s
      
      * CONFIG_RSEQ=y
      
      avg.:      40.46 s
      std.dev.:   0.33 s
      
      - Size
      
      On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
      567 bytes, and the data size increase of vmlinux is 5696 bytes.
      
      [1] https://lwn.net/Articles/650333/
      [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-api@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
      Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
      Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
      d7822b1e
  14. 17 May, 2018 2 commits
  15. 12 May, 2018 1 commit
  16. 08 May, 2018 1 commit
  17. 19 Apr, 2018 2 commits
    • D
      time: Introduce CONFIG_COMPAT_32BIT_TIME · 17435e5f
      Deepa Dinamani committed
      Compat functions are now used to support 32 bit time_t in
      compat mode on 64 bit architectures and in native mode on
      32 bit architectures.
      
      Introduce COMPAT_32BIT_TIME to conditionally compile these
      functions.
      
      Note that turning off 32 bit time_t support requires more
      changes on the architecture side. For instance, architecture
      syscall tables need to be updated to drop support for 32 bit
      time_t syscalls.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      17435e5f
    • D
      time: Introduce CONFIG_64BIT_TIME in architectures · d4703dda
      Deepa Dinamani committed
      There are a total of 53 system calls (aside from ioctl) that pass a time_t
      or derived data structure as an argument, and in order to extend time_t
      to 64-bit, we have to replace them with new system calls and keep providing
      backwards compatibility.
      
      To avoid adding completely new and untested code for this purpose, we
      introduce a new CONFIG_64BIT_TIME symbol. Every architecture that supports
      new 64 bit time_t syscalls enables this config.
      
      After this is done for all architectures, the CONFIG_64BIT_TIME symbol
      will be deleted.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      d4703dda
  18. 26 Mar, 2018 1 commit
  19. 07 Feb, 2018 2 commits
  20. 16 Jan, 2018 1 commit
    • K
      fork: Provide usercopy whitelisting for task_struct · 5905429a
      Kees Cook committed
      While the blocked and saved_sigmask fields of task_struct are copied to
      userspace (via sigmask_to_save() and setup_rt_frame()), they are always
      copied with a static length (i.e. sizeof(sigset_t)).
      
      The only portion of task_struct that is potentially dynamically sized and
      may be copied to userspace is in the architecture-specific thread_struct
      at the end of task_struct.
      
      cache object allocation:
          kernel/fork.c:
              alloc_task_struct_node(...):
                  return kmem_cache_alloc_node(task_struct_cachep, ...);
      
              dup_task_struct(...):
                  ...
                  tsk = alloc_task_struct_node(node);
      
              copy_process(...):
                  ...
                  dup_task_struct(...)
      
              _do_fork(...):
                  ...
                  copy_process(...)
      
      example usage trace:
      
          arch/x86/kernel/fpu/signal.c:
              __fpu__restore_sig(...):
                  ...
                  struct task_struct *tsk = current;
                  struct fpu *fpu = &tsk->thread.fpu;
                  ...
                  __copy_from_user(&fpu->state.xsave, ..., state_size);
      
              fpu__restore_sig(...):
                  ...
                  return __fpu__restore_sig(...);
      
          arch/x86/kernel/signal.c:
              restore_sigcontext(...):
                  ...
                  fpu__restore_sig(...)
      
      This introduces arch_thread_struct_whitelist() to let an architecture
      declare specifically where the whitelist should be within thread_struct.
      If undefined, the entire thread_struct field is left whitelisted.
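The kind of check such a whitelist enables can be sketched as a range test: a copy is allowed only if it falls entirely inside the declared window. The code below is a user-space illustration under stated assumptions; `struct demo_thread`, `struct usercopy_window` and both helpers are hypothetical and do not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an architecture's thread_struct. */
struct demo_thread {
    long sp;
    char fpu_state[64];  /* the only region that may be copied */
    long flags;
};

/* Hypothetical whitelist: a byte window inside the structure. */
struct usercopy_window {
    size_t offset;
    size_t size;
};

/* Declare the window an architecture might expose for demo_thread. */
static struct usercopy_window demo_thread_window(void)
{
    struct usercopy_window w = {
        .offset = offsetof(struct demo_thread, fpu_state),
        .size = sizeof(((struct demo_thread *)0)->fpu_state),
    };
    return w;
}

/* True if [copy_off, copy_off + copy_len) lies entirely inside the window.
 * Written to avoid overflow in copy_off + copy_len. */
static bool copy_within_window(const struct usercopy_window *w,
                               size_t copy_off, size_t copy_len)
{
    assert(w != NULL);

    if (copy_off < w->offset)
        return false;

    size_t rel = copy_off - w->offset;

    return rel <= w->size && copy_len <= w->size - rel;
}
```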
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: "Mickaël Salaün" <mic@digikod.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      5905429a
  21. 13 Jan, 2018 1 commit
    • M
      error-injection: Separate error-injection from kprobe · 540adea3
      Masami Hiramatsu committed
      The error-injection framework is not limited to use by
      kprobes or bpf. Other kernel subsystems can use it
      freely for checking the safety of error injection, e.g.
      livepatch, ftrace etc.
      So this separates the error-injection framework from kprobes.
      
      Some differences have been made:
      
      - "kprobe" word is removed from any APIs/structures.
      - BPF_ALLOW_ERROR_INJECTION() is renamed to
        ALLOW_ERROR_INJECTION() since it is not limited to BPF either.
      - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
        feature. It is automatically enabled if the arch supports
        error injection feature for kprobe or ftrace etc.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      540adea3
  22. 10 Jan, 2018 2 commits
  23. 13 Dec, 2017 1 commit
  24. 11 Nov, 2017 2 commits
  25. 02 Nov, 2017 1 commit
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
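
      The header versus source distinction mentioned above can be
      sketched as follows. This is not the script that was actually used;
      it is a minimal illustration that assumes the kernel convention of
      a "//" comment for .c files and a "/* */" comment for .h files,
      and the function name sketch_spdx_line() is made up for the
      example. Other file types are simply rejected here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the SPDX tag line for a file into out.
 * Returns 0 on success, -1 for an unhandled file type or a short buffer.
 */
static int sketch_spdx_line(const char *filename, const char *license,
			    char *out, size_t out_size)
{
	const char *ext;
	int n;

	assert(filename && license && out);

	ext = strrchr(filename, '.');
	if (ext && strcmp(ext, ".c") == 0)
		n = snprintf(out, out_size,
			     "// SPDX-License-Identifier: %s", license);
	else if (ext && strcmp(ext, ".h") == 0)
		n = snprintf(out, out_size,
			     "/* SPDX-License-Identifier: %s */", license);
	else
		return -1;	/* scripts, Makefiles etc. are out of scope */

	return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

      For "foo.c" and "GPL-2.0" this yields a "//" style line, and for
      "foo.h" the same identifier wrapped in a block comment.
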
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  26. 20 Oct, 2017 1 commit
  27. 09 Oct, 2017 1 commit
  28. 17 Aug, 2017 1 commit
    • K
      locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e
      Kees Cook committed
      This implements refcount_t overflow protection on x86 without a noticeable
      performance impact, though without the fuller checking of REFCOUNT_FULL.
      
      This is done by duplicating the existing atomic_t refcount implementation
      but with normally a single instruction added to detect if the refcount
      has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
      the handler saturates the refcount_t to INT_MIN / 2. With this overflow
      protection, the erroneous reference release that would follow a wrap back
      to zero is blocked from happening, avoiding the class of refcount-overflow
      use-after-free vulnerabilities entirely.
      
      Only the overflow case of refcounting can be perfectly protected, since
      it can be detected and stopped before the reference is freed and left to
      be abused by an attacker. There isn't a way to block early decrements,
      and while REFCOUNT_FULL stops increment-from-zero cases (which would
      be the state _after_ an early decrement and stops potential double-free
      conditions), this fast implementation does not, since it would require
      the more expensive cmpxchg loops. Since the overflow case is much more
      common (e.g. missing a "put" during an error path), this protection
      provides real-world protection. For example, the two public refcount
      overflow use-after-free exploits published in 2016 would have been
      rendered unexploitable:
      
        http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
      
        http://cyseclabs.com/page?n=02012016
      
      This implementation does, however, notice an unchecked decrement to zero
      (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
      resulted in a zero). Decrements under zero are noticed (since they will
      have resulted in a negative value), though this only indicates that a
      use-after-free may have already happened. Such notifications are likely
      avoidable by an attacker that has already exploited a use-after-free
      vulnerability, but it's better to have them reported than allow such
      conditions to remain universally silent.
      
      On first overflow detection, the refcount value is reset to INT_MIN / 2
      (which serves as a saturation value) and a report and stack trace are
      produced. When operations detect only negative value results (such as
      changing an already saturated value), saturation still happens but no
      notification is performed (since the value was already saturated).
      
      On the matter of races, since the entire range beyond INT_MAX but before
      0 is negative, every operation at INT_MIN / 2 will trap, leaving no
      overflow-only race condition.
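
      The saturation behaviour described above can be approximated in
      portable user-space C. The sketch below is not the x86 assembly
      from this patch: it uses C11 atomics, checks the old value in C
      instead of branching on the sign flag, and produces no report or
      stack trace. All sketch_ names are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_REFCOUNT_SATURATED (INT_MIN / 2)

typedef struct {
	atomic_int refs;
} sketch_refcount_t;

/* Increment; returns false (and saturates) if the result went negative. */
static bool sketch_refcount_inc(sketch_refcount_t *r)
{
	int old;

	assert(r);
	/* C11 atomic arithmetic on signed types wraps without UB. */
	old = atomic_fetch_add(&r->refs, 1);

	/* old + 1 is negative iff old was INT_MAX (wrapped) or below -1. */
	if (old == INT_MAX || old < -1) {
		atomic_store(&r->refs, SKETCH_REFCOUNT_SATURATED);
		return false;
	}
	return true;
}

/* Decrement; returns true only when the count legitimately hit zero. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	int old;

	assert(r);
	old = atomic_fetch_sub(&r->refs, 1);

	/* A count that was already zero or negative is an error state. */
	if (old <= 0) {
		atomic_store(&r->refs, SKETCH_REFCOUNT_SATURATED);
		return false;
	}
	return old == 1;
}
```

      Once saturated, every later operation sees a negative value and
      saturates again, so the count never returns to zero and the
      object is never released.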
      
      As for performance, this implementation adds a single "js" instruction
      to the regular execution flow of a copy of the standard atomic_t refcount
      operations. (The non-"and_test" refcount_dec() function, which is uncommon
      in regular refcount design patterns, has an additional "jz" instruction
      to detect reaching exactly zero.) Since this is a forward jump, it is by
      default the non-predicted path, which will be reinforced by dynamic branch
      prediction. The result is this protection having virtually no measurable
      change in performance over standard atomic_t operations. The error path,
      located in .text.unlikely, saves the refcount location and then uses UD0
      to fire a refcount exception handler, which resets the refcount, handles
      reporting, and returns to regular execution. This keeps the changes to
      .text size minimal, avoiding return jumps and open-coded calls to the
      error reporting routine.
      
      Example assembly comparison:
      
      refcount_inc() before:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
      
      refcount_inc() after:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
        ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
        ...
        .text.unlikely:
        ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
        ffffffff816c36d7:       0f ff                   (bad)
      
      These are the cycle counts comparing a loop of refcount_inc() from 1
      to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
      unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
      (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
      
        2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
                               cycles    protections
        atomic_t          82249267387    none
        refcount_t-fast   82211446892    overflow, untested dec-to-zero
        refcount_t-full  144814735193    overflow, untested dec-to-zero, inc-from-zero
      
      This code is a modified version of the x86 PAX_REFCOUNT atomic_t
      overflow defense from the last public patch of PaX/grsecurity, based
      on my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code. Thanks
      to PaX Team for various suggestions for improvement for repurposing this
      code to be a refcount-only protection.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7a46ec0e
  29. 08 Aug, 2017 1 commit
    • A
      gcc-plugins: structleak: add option to init all vars used as byref args · f7dd2507
      Ard Biesheuvel committed
      In the Linux kernel, struct type variables are rarely passed by-value,
      and so functions that initialize such variables typically take an input
      reference to the variable rather than returning a value that can
      subsequently be used in an assignment.
      
      If the initialization function is not part of the same compilation unit,
      the lack of an assignment operation defeats any analysis the compiler
      can perform as to whether the variable may be used before having been
      initialized. This means we may end up passing on such variables
      uninitialized, resulting in potential information leaks.
      
      So extend the existing structleak GCC plugin so it will [optionally]
      apply to all struct type variables that have their address taken at any
      point, rather than only to variables of struct types that have a __user
      annotation.
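
      The pattern at issue can be illustrated in plain C. In the sketch
      below the memset() stands in for the zero-initialization the
      plugin option is described as adding; the struct, the partial
      initializer and all sketch_ names are invented for the example,
      and the initializer is kept in the same file only so that the
      example is self-contained.

```c
#include <assert.h>
#include <string.h>

struct sketch_info {
	int id;
	char name[12];
	int flags;
};

/*
 * Stand-in for an initializer living in another compilation unit: it
 * takes the struct by reference and only fills in some of the fields.
 */
static void sketch_fill_info(struct sketch_info *info)
{
	assert(info);
	info->id = 7;
}

static int sketch_read_flags(void)
{
	struct sketch_info info;

	/* The kind of zero-initialization the plugin option would insert. */
	memset(&info, 0, sizeof(info));

	sketch_fill_info(&info);	/* address taken: a "byref" use */
	return info.flags;		/* 0 rather than stale stack contents */
}
```

      Without the memset(), info.flags and most of info.name would hold
      whatever was previously on the stack when the struct is passed on.
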
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      f7dd2507
  30. 02 Aug, 2017 1 commit
  31. 13 Jul, 2017 1 commit
    • D
      include/linux/string.h: add the option of fortified string.h functions · 6974f0c4
      Daniel Micay committed
      This adds support for compiling with a rough equivalent to the glibc
      _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
      overflow checks for string.h functions when the compiler determines the
      size of the source or destination buffer at compile-time.  Unlike glibc,
      it covers buffer reads in addition to writes.
      
      GNU C __builtin_*_chk intrinsics are avoided because they would force a
      much more complex implementation.  They aren't designed to detect read
      overflows and offer no real benefit when using an implementation based
      on inline checks.  Inline checks don't add up to much code size and
      allow full use of the regular string intrinsics while avoiding the need
      for a bunch of _chk functions and per-arch assembly to avoid wrapper
      overhead.
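
      A minimal user-space sketch of such an inline check follows. It is
      not the kernel's implementation: the wrapper name, the failure
      counter and the decision to skip the copy on failure are all
      assumptions made for illustration. It relies on the GCC/Clang
      builtin __builtin_object_size(), which returns (size_t)-1 when the
      size cannot be determined, typically when optimization is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count of rejected calls; a hypothetical stand-in for a failure handler. */
static int sketch_fortify_failures;

/* Size check shared by the wrapper; (size_t)-1 means "size unknown". */
static bool sketch_size_ok(size_t n, size_t dst_size, size_t src_size)
{
	return n <= dst_size && n <= src_size;
}

/* Forced inline so the builtin can see the caller's actual buffers. */
static inline __attribute__((always_inline))
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	size_t dst_size = __builtin_object_size(dst, 0);
	size_t src_size = __builtin_object_size(src, 0);

	assert(dst && src);
	/* Checks the read side (src) as well as the write side (dst). */
	if (!sketch_size_ok(n, dst_size, src_size)) {
		sketch_fortify_failures++;
		return dst;	/* refuse the copy instead of overflowing */
	}
	return memcpy(dst, src, n);
}
```

      Because the check is ordinary inline C, the regular memcpy() is
      still what performs the copy, and no separate _chk function is
      needed.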
      
      This detects various overflows at compile-time in various drivers and
      some non-x86 core kernel code.  There will likely be issues caught in
      regular use at runtime too.
      
      Future improvements left out of initial implementation for simplicity,
      as it's all quite optional and can be done incrementally:
      
      * Some of the fortified string functions (strncpy, strcat), don't yet
        place a limit on reads from the source based on __builtin_object_size of
        the source buffer.
      
      * Extending coverage to more string functions like strlcat.
      
      * It should be possible to optionally use __builtin_object_size(x, 1) for
        some functions (C strings) to detect intra-object overflows (like
        glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
        approach to avoid likely compatibility issues.
      
      * The compile-time checks should be made available via a separate config
        option which can be enabled by default (or always enabled) once enough
        time has passed to get the issues it catches fixed.
      
      Kees said:
       "This is great to have. While it was out-of-tree code, it would have
        blocked at least CVE-2016-3858 from being exploitable (improper size
        argument to strlcpy()). I've sent a number of fixes for
        out-of-bounds-reads that this detected upstream already"
      
      [arnd@arndb.de: x86: fix fortified memcpy]
        Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
      [keescook@chromium.org: avoid panic() in favor of BUG()]
        Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
      [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
      Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
      Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
      Signed-off-by: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6974f0c4