1. 06 4月, 2019 18 次提交
    • A
      kprobes/x86: Blacklist non-attachable interrupt functions · d5813e77
      Andrea Righi 提交于
      [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ]
      
      These interrupt functions are already non-attachable by kprobes.
      Blacklist them explicitly so that they can show up in
      /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
      additional information.
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lkml.kernel.org/r/20181206095648.GA8249@DellSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d5813e77
    • R
      x86/build: Mark per-CPU symbols as absolute explicitly for LLD · 648b949b
      Rafael Ávila de Espíndola 提交于
      [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ]
      
      Accessing per-CPU variables is done by finding the offset of the
      variable in the per-CPU block and adding it to the address of the
      respective CPU's block.
      
      Section 3.10.8 of ld.bfd's documentation states:
      
        For expressions involving numbers, relative addresses and absolute
        addresses, ld follows these rules to evaluate terms:
      
        Other binary operations, that is, between two relative addresses
        not in the same section, or between a relative address and an
        absolute address, first convert any non-absolute term to an
        absolute address before applying the operator."
      
      Note that LLVM's linker does not adhere to the GNU ld's implementation
      and as such requires implicitly-absolute terms to be explicitly marked
      as absolute in the linker script. If not, it fails currently with:
      
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
        Makefile:1040: recipe for target 'vmlinux' failed
      
      This is not a functional change for ld.bfd which converts the term to an
      absolute symbol anyways as specified above.
      
      Based on a previous submission by Tri Vo <trong@android.com>.
      Reported-by: NDmitry Golovin <dima@golovin.in>
      Signed-off-by: NRafael Ávila de Espíndola <rafael@espindo.la>
      [ Update commit message per Boris' and Michael's suggestions. ]
      Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
      [ Massage commit message more, fix typos. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NDmitry Golovin <dima@golovin.in>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tri Vo <trong@android.com>
      Cc: dima@golovin.in
      Cc: morbo@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      648b949b
    • G
      x86/build: Specify elf_i386 linker emulation explicitly for i386 objects · d2053718
      George Rimar 提交于
      [ Upstream commit 927185c124d62a9a4d35878d7f6d432a166b74e3 ]
      
      The kernel uses the OUTPUT_FORMAT linker script command in it's linker
      scripts. Most of the time, the -m option is passed to the linker with
      correct architecture, but sometimes (at least for x86_64) the -m option
      contradicts the OUTPUT_FORMAT directive.
      
      Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
      files, but are linked with the -m elf_x86_64 linker flag when building
      for x86_64.
      
      The GNU linker manpage doesn't explicitly state any tie-breakers between
      -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
      overrides the emulation value specified with the -m option.
      
      LLVM lld has a different behavior, however. When supplied with
      contradicting -m and OUTPUT_FORMAT values it fails with the following
      error message:
      
        ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64
      
      Therefore, just add the correct -m after the incorrect one (it overrides
      it), so the linker invocation looks like this:
      
        ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
          realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf
      
      This is not a functional change for GNU ld, because (although not
      explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.
      
      Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
      it in QEMU.
      
       [ bp: massage and clarify text. ]
      Suggested-by: NDmitry Golovin <dima@golovin.in>
      Signed-off-by: NGeorge Rimar <grimar@accesssoftek.com>
      Signed-off-by: NTri Vo <trong@android.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NTri Vo <trong@android.com>
      Tested-by: NNick Desaulniers <ndesaulniers@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Matz <matz@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morbo@google.com
      Cc: ndesaulniers@google.com
      Cc: ruiu@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      d2053718
    • N
      powerpc/pseries: Perform full re-add of CPU for topology update post-migration · 06af7dda
      Nathan Fontenot 提交于
      [ Upstream commit 81b61324922c67f73813d8a9c175f3c153f6a1c6 ]
      
      On pseries systems, performing a partition migration can result in
      altering the nodes a CPU is assigned to on the destination system. For
      exampl, pre-migration on the source system CPUs are in node 1 and 3,
      post-migration on the destination system CPUs are in nodes 2 and 3.
      
      Handling the node change for a CPU can cause corruption in the slab
      cache if we hit a timing where a CPUs node is changed while cache_reap()
      is invoked. The corruption occurs because the slab cache code appears
      to rely on the CPU and slab cache pages being on the same node.
      
      The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c
      does not prevent us from hitting this scenario.
      
      Changing the device tree property update notification handler that
      recognizes an affinity change for a CPU to do a full DLPAR remove and
      add of the CPU instead of dynamically changing its node resolves this
      issue.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NMichael W. Bringmann <mwb@linux.vnet.ibm.com>
      Tested-by: NMichael W. Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      06af7dda
    • N
      powerpc/64s: Clear on-stack exception marker upon exception return · b52681e6
      Nicolai Stange 提交于
      [ Upstream commit eddd0b332304d554ad6243942f87c2fcea98c56b ]
      
      The ppc64 specific implementation of the reliable stacktracer,
      save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
      trace" whenever it finds an exception frame on the stack. Stack frames
      are classified as exception frames if the STACK_FRAME_REGS_MARKER
      magic, as written by exception prologues, is found at a particular
      location.
      
      However, as observed by Joe Lawrence, it is possible in practice that
      non-exception stack frames can alias with prior exception frames and
      thus, that the reliable stacktracer can find a stale
      STACK_FRAME_REGS_MARKER on the stack. It in turn falsely reports an
      unreliable stacktrace and blocks any live patching transition to
      finish. Said condition lasts until the stack frame is
      overwritten/initialized by function call or other means.
      
      In principle, we could mitigate this by making the exception frame
      classification condition in save_stack_trace_tsk_reliable() stronger:
      in addition to testing for STACK_FRAME_REGS_MARKER, we could also take
      into account that for all exceptions executing on the kernel stack
        - their stack frames's backlink pointers always match what is saved
          in their pt_regs instance's ->gpr[1] slot and that
        - their exception frame size equals STACK_INT_FRAME_SIZE, a value
          uncommonly large for non-exception frames.
      
      However, while these are currently true, relying on them would make
      the reliable stacktrace implementation more sensitive towards future
      changes in the exception entry code. Note that false negatives, i.e.
      not detecting exception frames, would silently break the live patching
      consistency model.
      
      Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
      rely on STACK_FRAME_REGS_MARKER as well.
      
      Make the exception exit code clear the on-stack
      STACK_FRAME_REGS_MARKER for those exceptions running on the "normal"
      kernel stack and returning to kernelspace: because the topmost frame
      is ignored by the reliable stack tracer anyway, returns to userspace
      don't need to take care of clearing the marker.
      
      Furthermore, as I don't have the ability to test this on Book 3E or 32
      bits, limit the change to Book 3S and 64 bits.
      
      Fixes: df78d3f6 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
      Reported-by: NJoe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: NNicolai Stange <nstange@suse.de>
      Signed-off-by: NJoe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b52681e6
    • R
      ARM: avoid Cortex-A9 livelock on tight dmb loops · 30d503ba
      Russell King 提交于
      [ Upstream commit 5388a5b82199facacd3d7ac0d05aca6e8f902fed ]
      
      machine_crash_nonpanic_core() does this:
      
      	while (1)
      		cpu_relax();
      
      because the kernel has crashed, and we have no known safe way to deal
      with the CPU.  So, we place the CPU into an infinite loop which we
      expect it to never exit - at least not until the system as a whole is
      reset by some method.
      
      In the absence of erratum 754327, this code assembles to:
      
      	b	.
      
      In other words, an infinite loop.  When erratum 754327 is enabled,
      this becomes:
      
      1:	dmb
      	b	1b
      
      It has been observed that on some systems (eg, OMAP4) where, if a
      crash is triggered, the system tries to kexec into the panic kernel,
      but fails after taking the secondary CPU down - placing it into one
      of these loops.  This causes the system to livelock, and the most
      noticable effect is the system stops after issuing:
      
      	Loading crashdump kernel...
      
      to the system console.
      
      The tested as working solution I came up with was to add wfe() to
      these infinite loops thusly:
      
      	while (1) {
      		cpu_relax();
      		wfe();
      	}
      
      which, without 754327 builds to:
      
      1:	wfe
      	b	1b
      
      or with 754327 is enabled:
      
      1:	dmb
      	wfe
      	b	1b
      
      Adding "wfe" does two things depending on the environment we're running
      under:
      - where we're running on bare metal, and the processor implements
        "wfe", it stops us spinning endlessly in a loop where we're never
        going to do any useful work.
      - if we're running in a VM, it allows the CPU to be given back to the
        hypervisor and rescheduled for other purposes (maybe a different VM)
        rather than wasting CPU cycles inside a crashed VM.
      
      However, in light of erratum 794072, Will Deacon wanted to see 10 nops
      as well - which is reasonable to cover the case where we have erratum
      754327 enabled _and_ we have a processor that doesn't implement the
      wfe hint.
      
      So, we now end up with:
      
      1:      wfe
              b       1b
      
      when erratum 754327 is disabled, or:
      
      1:      dmb
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              wfe
              b       1b
      
      when erratum 754327 is enabled.  We also get the dmb + 10 nop
      sequence elsewhere in the kernel, in terminating loops.
      
      This is reasonable - it means we get the workaround for erratum
      794072 when erratum 754327 is enabled, but still relinquish the dead
      processor - either by placing it in a lower power mode when wfe is
      implemented as such or by returning it to the hypervisior, or in the
      case where wfe is a no-op, we use the workaround specified in erratum
      794072 to avoid the problem.
      
      These as two entirely orthogonal problems - the 10 nops addresses
      erratum 794072, and the wfe is an optimisation that makes the system
      more efficient when crashed either in terms of power consumption or
      by allowing the host/other VMs to make use of the CPU.
      
      I don't see any reason not to use kexec() inside a VM - it has the
      potential to provide automated recovery from a failure of the VMs
      kernel with the opportunity for saving a crashdump of the failure.
      A panic() with a reboot timeout won't do that, and reading the
      libvirt documentation, setting on_reboot to "preserve" won't either
      (the documentation states "The preserve action for an on_reboot event
      is treated as a destroy".)  Surely it has to be a good thing to
      avoiding having CPUs spinning inside a VM that is doing no useful
      work.
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      30d503ba
    • V
      ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of · d8945878
      Vladimir Murzin 提交于
      [ Upstream commit 72cd4064fccaae15ab84d40d4be23667402df4ed ]
      
      ARMv8M introduces support for Security extension to M class, among
      other things it affects exception handling, especially, encoding of
      EXC_RETURN.
      
      The new bits have been added:
      
      Bit [6]	Secure or Non-secure stack
      Bit [5]	Default callee register stacking
      Bit [0]	Exception Secure
      
      which conflicts with hard-coded value of EXC_RETURN:
      
      In fact, we only care of few bits:
      
      Bit [3]	 Mode (0 - Handler, 1 - Thread)
      Bit [2]	 Stack pointer selection (0 - Main, 1 - Process)
      
      We can toggle only those bits and left other bits as they were on
      exception entry.
      
      It is basically, what patch does - saves EXC_RETURN when we do
      transition form Thread to Handler mode (it is first svc), so later
      saved value is used instead of EXC_RET_THREADMODE_PROCESSSTACK.
      Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d8945878
    • M
      ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation · 240a9050
      Mathieu Malaterre 提交于
      [ Upstream commit 3e3380d0675d5e20b0af067d60cb947a4348bf9b ]
      
      Improve the DTS files by removing all the leading "0x" and zeros to fix
      the following dtc warnings:
      
      Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
      
      and
      
      Warning (unit_address_format): Node /XXX unit name should not have leading 0s
      
      Converted using the following command:
      
      find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -i -e "s/@\([0-9a-fA-FxX\.;:#]+\)\s*{/@\L\1 {/g" -e "s/@0x\(.*\) {/@\1 {/g" -e "s/@0+\(.*\) {/@\1 {/g" {} +
      
      For simplicity, two sed expressions were used to solve each warnings
      separately.
      
      To make the regex expression more robust a few other issues were resolved,
      namely setting unit-address to lower case, and adding a whitespace before
      the opening curly brace:
      
      https://elinux.org/Device_Tree_Linux#Linux_conventions
      
      This will solve as a side effect warning:
      
      Warning (simple_bus_reg): Node /XXX@<UPPER> simple-bus unit address format error, expected "<lower>"
      
      This is a follow up to commit 4c9847b7 ("dt-bindings: Remove leading 0x from bindings notation")
      Reported-by: NDavid Daney <ddaney@caviumnetworks.com>
      Suggested-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      [vzapolskiy: fixed commit message to pass checkpatch.pl test]
      Signed-off-by: NVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      240a9050
    • M
      perf/aux: Make perf_event accessible to setup_aux() · efd85d83
      Mathieu Poirier 提交于
      [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]
      
      When pmu::setup_aux() is called the coresight PMU needs to know which
      sink to use for the session by looking up the information in the
      event's attr::config2 field.
      
      As such simply replace the cpu information by the complete perf_event
      structure and change all affected customers.
      Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: NSuzuki Poulouse <suzuki.poulose@arm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      efd85d83
    • M
      ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins · d21a63fc
      Martin Blumenstingl 提交于
      [ Upstream commit 29f0023d01f063feacfc404f0446905aee4f82ee ]
      
      According to the Odroid-C1+ schematics the Ethernet TXD1 signal is
      routed to GPIOH_5 and the TXD0 signal is routed to GPIOH_6.
      The public S805 datasheet shows that TXD0 can be routed to DIF_2_P and
      TXD1 can be routed to DIF_2_N instead.
      
      The pin groups eth_txd0_0 (GPIOH_6) and eth_txd0_1 (DIF_2_P) are both
      configured as Ethernet TXD0 and TXD1 data lines in meson8b.dtsi. At the
      same time eth_txd1_0 (GPIOH_5) and eth_txd1_1 (DIF_2_N) are configured
      as TXD0 and TXD1 data lines as well.
      This results in a bad Ethernet receive performance. Presumably this is
      due to the eth_txd0 and eth_txd1 signal being routed to the wrong pins.
      As a result of that data can only be transmitted on eth_txd2 and
      eth_txd3. However, I have no scope to fully confirm this assumption.
      
      The vendor u-boot sources for Odroid-C1 use the following Ethernet
      pinmux configuration:
        SET_CBUS_REG_MASK(PERIPHS_PIN_MUX_6, 0x3f4f);
        SET_CBUS_REG_MASK(PERIPHS_PIN_MUX_7, 0xf00000);
      This translates to the following pin groups in the mainline kernel:
      - register 6 bit  0: eth_rxd1 (DIF_0_P)
      - register 6 bit  1: eth_rxd0 (DIF_0_N)
      - register 6 bit  2: eth_rx_dv (DIF_1_P)
      - register 6 bit  3: eth_rx_clk (DIF_1_N)
      - register 6 bit  6: eth_tx_en (DIF_3_P)
      - register 6 bit  8: eth_ref_clk (DIF_3_N)
      - register 6 bit  9: eth_mdc (DIF_4_P)
      - register 6 bit 10: eth_mdio_en (DIF_4_N)
      - register 6 bit 11: eth_tx_clk (GPIOH_9)
      - register 6 bit 12: eth_txd2 (GPIOH_8)
      - register 6 bit 13: eth_txd3 (GPIOH_7)
      - register 7 bit 20: eth_txd0_0 (GPIOH_6)
      - register 7 bit 21: eth_txd1_0 (GPIOH_5)
      - register 7 bit 22: eth_rxd3 (DIF_2_P)
      - register 7 bit 23: eth_rxd2 (DIF_2_N)
      
      Drop the eth_txd0_1 and eth_txd1_1 groups from eth_rgmii_pins to fix the
      Ethernet transmit performance on Odroid-C1. Also add the eth_rxd2 and
      eth_rxd3 groups so we don't rely on the bootloader to set them up.
      
      iperf3 statistics before this change:
      - transmitting from Odroid-C1: 741 Mbits/sec (0 retries)
      - receiving on Odroid-C1: 199 Mbits/sec (1713 retries)
      
      iperf3 statistics after this change:
      - transmitting from Odroid-C1: 667 Mbits/sec (0 retries)
      - receiving on Odroid-C1: 750 Mbits/sec (0 retries)
      
      Fixes: b9644654 ("ARM: dts: meson8b: extend ethernet controller description")
      Signed-off-by: NMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Cc: Linus Lüssing <linus.luessing@c0d3.blue>
      Tested-by: NEmiliano Ingrassia <ingrassia@epigenesys.com>
      Reviewed-by: NEmiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: NKevin Hilman <khilman@baylibre.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d21a63fc
    • N
      ARM: 8833/1: Ensure that NEON code always compiles with Clang · d93fe5e6
      Nathan Chancellor 提交于
      [ Upstream commit de9c0d49d85dc563549972edc5589d195cd5e859 ]
      
      While building arm32 allyesconfig, I ran into the following errors:
      
        arch/arm/lib/xor-neon.c:17:2: error: You should compile this file with
        '-mfloat-abi=softfp -mfpu=neon'
      
        In file included from lib/raid6/neon1.c:27:
        /home/nathan/cbl/prebuilt/lib/clang/8.0.0/include/arm_neon.h:28:2:
        error: "NEON support not enabled"
      
      Building V=1 showed NEON_FLAGS getting passed along to Clang but
      __ARM_NEON__ was not getting defined. Ultimately, it boils down to Clang
      only defining __ARM_NEON__ when targeting armv7, rather than armv6k,
      which is the '-march' value for allyesconfig.
      
      >From lib/Basic/Targets/ARM.cpp in the Clang source:
      
        // This only gets set when Neon instructions are actually available, unlike
        // the VFP define, hence the soft float and arch check. This is subtly
        // different from gcc, we follow the intent which was that it should be set
        // when Neon instructions are actually available.
        if ((FPU & NeonFPU) && !SoftFloat && ArchVersion >= 7) {
          Builder.defineMacro("__ARM_NEON", "1");
          Builder.defineMacro("__ARM_NEON__");
          // current AArch32 NEON implementations do not support double-precision
          // floating-point even when it is present in VFP.
          Builder.defineMacro("__ARM_NEON_FP",
                              "0x" + Twine::utohexstr(HW_FP & ~HW_FP_DP));
        }
      
      Ard Biesheuvel recommended explicitly adding '-march=armv7-a' at the
      beginning of the NEON_FLAGS definitions so that __ARM_NEON__ always gets
      definined by Clang. This doesn't functionally change anything because
      that code will only run where NEON is supported, which is implicitly
      armv7.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/287Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Acked-by: NNicolas Pitre <nico@linaro.org>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: NStefan Agner <stefan@agner.ch>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d93fe5e6
    • A
      powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback · 6cf5f631
      Aneesh Kumar K.V 提交于
      [ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]
      
      After we ALIGN up the address we need to make sure we didn't overflow
      and resulted in zero address. In that case, we need to make sure that
      the returned address is greater than mmap_min_addr.
      
      This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
      run as non root user for
      
      mmap(-1, MAP_HUGETLB)
      
      The bug is that a non-root user requesting address -1 will be given address 0
      which will then fail, whereas they should have been given something else that
      would have succeeded.
      
      We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address
      with this change. So we think this is not a security issue, because it only affects
      whether we choose an address below mmap_min_addr, not whether we
      actually allow that address to be mapped. ie. there are existing capability
      checks to prevent a user mapping below mmap_min_addr and those will still be
      honoured even without this fix.
      
      Fixes: 48483760 ("powerpc/mm: Add radix support for hugetlb")
      Reviewed-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6cf5f631
    • S
      ARM: 8840/1: use a raw_spinlock_t in unwind · d81bdb3c
      Sebastian Andrzej Siewior 提交于
      [ Upstream commit 74ffe79ae538283bbf7c155e62339f1e5c87b55a ]
      
      Mostly unwind is done with irqs enabled however SLUB may call it with
      irqs disabled while creating a new SLUB cache.
      
      I had system freeze while loading a module which called
      kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
      interrupts and then
      
      ->new_slab_objects()
       ->new_slab()
        ->setup_object()
         ->setup_object_debug()
          ->init_tracking()
           ->set_track()
            ->save_stack_trace()
             ->save_stack_trace_tsk()
              ->walk_stackframe()
               ->unwind_frame()
                ->unwind_find_idx()
                 =>spin_lock_irqsave(&unwind_lock);
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d81bdb3c
    • N
      powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc · c70214d5
      Nathan Chancellor 提交于
      [ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]
      
      When building with -Wsometimes-uninitialized, Clang warns:
      
        arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
        uninitialized whenever 'if' condition is false
        [-Wsometimes-uninitialized]
          if (cpu_has_feature(CPU_FTRS_POWER9))
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
          if (opcode == NULL)
              ^~~~~~
        arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
        condition is always true
          if (cpu_has_feature(CPU_FTRS_POWER9))
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
        'opcode' to silence this warning
          const struct powerpc_opcode *opcode;
                                             ^
                                              = NULL
        1 warning generated.
      
      This warning seems to make no sense on the surface because opcode is set
      to NULL right below this statement. However, there is a comma instead of
      semicolon to end the dialect assignment, meaning that the opcode
      assignment only happens in the if statement. Properly terminate that
      line so that Clang no longer warns.
      
      Fixes: 5b102782 ("powerpc/xmon: Enable disassembly files (compilation changes)")
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c70214d5
    • A
      powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables · 4acf7974
      Alexey Kardashevskiy 提交于
      [ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]
      
      We store 2 multilevel tables in iommu_table - one for the hardware and
      one with the corresponding userspace addresses. Before allocating
      the tables, the iommu_table_group_ops::get_table_size() hook returns
      the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
      the locked_vm counter correctly. When the table is actually allocated,
      the amount of allocated memory is stored in iommu_table::it_allocated_size
      and used to decrement the locked_vm counter when we release the memory
      used by the table; .get_table_size() and .create_table() calculate it
      independently but the result is expected to be the same.
      
      However the allocator does not add the userspace table size to
      .it_allocated_size so when we destroy the table because of VFIO PCI
      unplug (i.e. VFIO container is gone but the userspace keeps running),
      we decrement locked_vm by just a half of size of memory we are
      releasing.
      
      To make things worse, since we enabled on-demand allocation of
      indirect levels, it_allocated_size contains only the amount of memory
      actually allocated at the table creation time which can just be a
      fraction. It is not a problem with incrementing locked_vm (as
      get_table_size() value is used) but it is with decrementing.
      
      As the result, we leak locked_vm and may not be able to allocate more
      IOMMU tables after few iterations of hotplug/unplug.
      
      This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
      hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
      have a single place which calculates the maximum memory a table can
      occupy. The original meaning of it_allocated_size is somewhat lost now
      though.
      
      We do not ditch it_allocated_size whatsoever here and we do not call
      get_table_size() from vfio_iommu_spapr_tce.c when decrementing
      locked_vm as we may have multiple IOMMU groups per container and even
      though they all are supposed to have the same get_table_size()
      implementation, there is a small chance for failure or confusion.
      
      Fixes: 090bad39 ("powerpc/powernv: Add indirect levels to it_userspace")
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4acf7974
    • K
      x86/hyperv: Fix kernel panic when kexec on HyperV · c3f28d59
      Kairui Song 提交于
      [ Upstream commit 179fb36abb097976997f50733d5b122a29158cba ]
      
      After commit 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments"),
      kexec fails with a kernel panic:
      
      kexec_core: Starting new kernel
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
      RIP: 0010:0xffffc9000001d000
      
      Call Trace:
       ? __send_ipi_mask+0x1c6/0x2d0
       ? hv_send_ipi_mask_allbutself+0x6d/0xb0
       ? mp_save_irq+0x70/0x70
       ? __ioapic_read_entry+0x32/0x50
       ? ioapic_read_entry+0x39/0x50
       ? clear_IO_APIC_pin+0xb8/0x110
       ? native_stop_other_cpus+0x6e/0x170
       ? native_machine_shutdown+0x22/0x40
       ? kernel_kexec+0x136/0x156
      
      That happens if hypercall based IPIs are used because the hypercall page is
      reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
      which invokes the hypercall and dereferences the unusable page.
      
      To fix his, reset hv_hypercall_pg to NULL before the page is reset to avoid
      any misuse, IPI sending will fall back to the non hypercall based
      method. This only happens on kexec / kdump so just setting the pointer to
      NULL is good enough.
      
      Fixes: 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments")
      Signed-off-by: NKairui Song <kasong@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: devel@linuxdriverproject.org
      Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      c3f28d59
    • M
      h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- · 56bb66c5
      Masahiro Yamada 提交于
      [ Upstream commit fc2b47b55f17fd996f7a01975ce1c33c2f2513f6 ]
      
      It believe it is a bad idea to hardcode a specific compiler prefix
      that may or may not be installed on a user's system. It is annoying
      when testing features that should not require compilers at all.
      
      For example, mrproper, headers_install, etc. should work without
      any compiler.
      
      They look like follows on my machine.
      
      $ make ARCH=h8300 mrproper
      ./scripts/gcc-version.sh: line 26: h8300-unknown-linux-gcc: command not found
      ./scripts/gcc-version.sh: line 27: h8300-unknown-linux-gcc: command not found
      make: h8300-unknown-linux-gcc: Command not found
      make: h8300-unknown-linux-gcc: Command not found
        [ a bunch of the same error messages continue ]
      
      $ make ARCH=h8300 headers_install
      ./scripts/gcc-version.sh: line 26: h8300-unknown-linux-gcc: command not found
      ./scripts/gcc-version.sh: line 27: h8300-unknown-linux-gcc: command not found
      make: h8300-unknown-linux-gcc: Command not found
        HOSTCC  scripts/basic/fixdep
      make: h8300-unknown-linux-gcc: Command not found
        WRAP    arch/h8300/include/generated/uapi/asm/kvm_para.h
        [ snip ]
      
      The solution is to delete this line, or to use cc-cross-prefix like
      some architectures do. I chose the latter as a moderate fixup.
      
      I added an alternative 'h8300-linux-' because it is available at:
      
      https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      56bb66c5
    • W
      arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals · bd62f1fe
      Will Deacon 提交于
      commit b9a4b9d084d978f80eb9210727c81804588b42ff upstream.
      
      FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by
      taking a hardware watchpoint. Unfortunately, if a debug handler returns
      a non-zero value, then we will propagate the UNKNOWN FAR value to
      userspace via the si_addr field of the SIGTRAP siginfo_t.
      
      Instead, let's set si_addr to take on the PC of the faulting instruction,
      which we have available in the current pt_regs.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      bd62f1fe
  2. 03 4月, 2019 19 次提交
    • S
      KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 3a18eaba
      Sean Christopherson 提交于
      commit 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 upstream.
      
      The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: NXiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a18eaba
    • S
      KVM: x86: update %rip after emulating IO · b9733a74
      Sean Christopherson 提交于
      commit 45def77ebf79e2e8942b89ed79294d97ce914fa0 upstream.
      
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corruping %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks becase re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to executing to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single step exits to userspace for are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison has negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcenment suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9733a74
    • T
      x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y · a0713e81
      Thomas Gleixner 提交于
      commit bebd024e4815b1a170fcd21ead9c2222b23ce9e6 upstream.
      
      The SMT disable 'nosmt' command line argument is not working properly when
      CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
      required to be brought up due to the MCE issues, cannot work. The CPUs are
      then kept in a half dead state.
      
      As the 'nosmt' functionality has become popular due to the speculative
      hardware vulnerabilities, the half torn down state is not a proper solution
      to the problem.
      
      Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
      possible.
      Reported-by: NTianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0713e81
    • M
      powerpc/64: Fix memcmp reading past the end of src/dest · c91d07ad
      Michael Ellerman 提交于
      commit d9470757398a700d9450a43508000bcfd010c7a4 upstream.
      
      Chandan reported that fstests' generic/026 test hit a crash:
      
        BUG: Unable to handle kernel data access at 0xc00000062ac40000
        Faulting instruction address: 0xc000000000092240
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
        CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1
        NIP:  c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
        REGS: c00000062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000842  XER: 20000000
        CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
        GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
        GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
        GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
        GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
        GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
        GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
        NIP memcmp+0x120/0x690
        LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
        Call Trace:
          xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
          xfs_da3_node_lookup_int+0x32c/0x5a0
          xfs_attr_node_addname+0x170/0x6b0
          xfs_attr_set+0x2ac/0x340
          __xfs_set_acl+0xf0/0x230
          xfs_set_acl+0xd0/0x160
          set_posix_acl+0xc0/0x130
          posix_acl_xattr_set+0x68/0x110
          __vfs_setxattr+0xa4/0x110
          __vfs_setxattr_noperm+0xac/0x240
          vfs_setxattr+0x128/0x130
          setxattr+0x248/0x600
          path_setxattr+0x108/0x120
          sys_setxattr+0x28/0x40
          system_call+0x5c/0x70
        Instruction dump:
        7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
        4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040
      
      The instruction dump decodes as:
        subfic  r6,r5,8
        rlwinm  r6,r6,3,0,28
        ldbrx   r9,0,r3
        ldbrx   r10,0,r4      <-
      
      Which shows us doing an 8 byte load from c00000062ac3fff9, which
      crosses the page boundary at c00000062ac40000 and faults.
      
      It's not OK for memcmp to read past the end of the source or
      destination buffers if that would cross a page boundary, because we
      don't know that the next page is mapped.
      
      As pointed out by Segher, we can read past the end of the source or
      destination as long as we don't cross a 4K boundary, because that's
      our minimum page size on all platforms.
      
      The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
      there we know that s1 is 8-byte aligned and we have at least 1 byte to
      read, so a single 8-byte load won't read past the end of s1 and cross
      a page boundary.
      
      But we have to be more careful with s2. So check if it's within 8
      bytes of a 4K boundary and if so go to the byte-by-byte loop.
      
      Fixes: 2d9ee327 ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
      Cc: stable@vger.kernel.org # v4.19+
      Reported-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NSegher Boessenkool <segher@kernel.crashing.org>
      Tested-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c91d07ad
    • G
      powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes · d7c00bbb
      Gautham R. Shenoy 提交于
      commit ce9afe08e71e3f7d64f337a6e932e50849230fc2 upstream.
      
      In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
      we currently use of_read_property() to obtain the pointer to the array
      corresponding to the property "ibm,drc-indexes". The elements of this
      array are of type __be32, but are accessed without any conversion to
      the OS-endianness, which is buggy on a Little Endian OS.
      
      Fix this by using of_property_read_u32_index() accessor function to
      safely read the elements of the array.
      
      Fixes: e83636ac ("pseries/drc-info: Search DRC properties for CPU indexes")
      Cc: stable@vger.kernel.org # v4.16+
      Reported-by: NPavithra R. Prakash <pavrampu@in.ibm.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      [mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7c00bbb
    • N
      powerpc: bpf: Fix generation of load/store DW instructions · 92d4ee2e
      Naveen N. Rao 提交于
      commit 86be36f6502c52ddb4b85938145324fd07332da1 upstream.
      
      Yauheni Kaliuta pointed out that PTR_TO_STACK store/load verifier test
      was failing on powerpc64 BE, and rightfully indicated that the PPC_LD()
      macro is not masking away the last two bits of the offset per the ISA,
      resulting in the generation of 'lwa' instruction instead of the intended
      'ld' instruction.
      
      Segher also pointed out that we can't simply mask away the last two bits
      as that will result in loading/storing from/to a memory location that
      was not intended.
      
      This patch addresses this by using ldx/stdx if the offset is not
      word-aligned. We load the offset into a temporary register (TMP_REG_2)
      and use that as the index register in a subsequent ldx/stdx. We fix
      PPC_LD() macro to mask off the last two bits, but enhance PPC_BPF_LL()
      and PPC_BPF_STL() to factor in the offset value and generate the proper
      instruction sequence. We also convert all existing users of PPC_LD() and
      PPC_STD() to use these macros. All existing uses of these macros have
      been audited to ensure that TMP_REG_2 can be clobbered.
      
      Fixes: 156d0e29 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: NYauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92d4ee2e
    • K
      ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time · 9397f0d9
      Kohji Okuno 提交于
      commit 91740fc8242b4f260cfa4d4536d8551804777fae upstream.
      
      In the current cpuidle implementation for i.MX6q, the CPU that sets
      'WAIT_UNCLOCKED' and the CPU that returns to 'WAIT_CLOCKED' are always
      the same. While the CPU that sets 'WAIT_UNCLOCKED' is in IDLE state of
      "WAIT", if the other CPU wakes up and enters IDLE state of "WFI"
      istead of "WAIT", this CPU can not wake up at expired time.
       Because, in the case of "WFI", the CPU must be waked up by the local
      timer interrupt. But, while 'WAIT_UNCLOCKED' is set, the local timer
      is stopped, when all CPUs execute "wfi" instruction. As a result, the
      local timer interrupt is not fired.
       In this situation, this CPU will wake up by IRQ different from local
      timer. (e.g. broacast timer)
      
      So, this fix changes CPU to return to 'WAIT_CLOCKED'.
      Signed-off-by: NKohji Okuno <okuno.kohji@jp.panasonic.com>
      Fixes: e5f9dec8 ("ARM: imx6q: support WAIT mode using cpuidle")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9397f0d9
    • M
      powerpc/security: Fix spectre_v2 reporting · a1df5db3
      Michael Ellerman 提交于
      commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream.
      
      When I updated the spectre_v2 reporting to handle software count cache
      flush I got the logic wrong when there's no software count cache
      enabled at all.
      
      The result is that on systems with the software count cache flush
      disabled we print:
      
        Mitigation: Indirect branch cache disabled, Software count cache flush
      
      Which correctly indicates that the count cache is disabled, but
      incorrectly says the software count cache flush is enabled.
      
      The root of the problem is that we are trying to handle all
      combinations of options. But we know now that we only expect to see
      the software count cache flush enabled if the other options are false.
      
      So split the two cases, which simplifies the logic and fixes the bug.
      We were also missing a space before "(hardware accelerated)".
      
      The result is we see one of:
      
        Mitigation: Indirect branch serialisation (kernel only)
        Mitigation: Indirect branch cache disabled
        Mitigation: Software count cache flush
        Mitigation: Software count cache flush (hardware accelerated)
      
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NMichael Neuling <mikey@neuling.org>
      Reviewed-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1df5db3
    • C
      powerpc/fsl: Fix the flush of branch predictor. · 986f0c65
      Christophe Leroy 提交于
      commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream.
      
      The commit identified below adds MC_BTB_FLUSH macro only when
      CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
      on some configs (seen several times with kisskb randconfig_defconfig)
      
      arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
      make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
      make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
      make[1]: *** [Makefile:1043: arch/powerpc] Error 2
      make: *** [Makefile:152: sub-make] Error 2
      
      This patch adds a blank definition of MC_BTB_FLUSH for other cases.
      
      Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
      Cc: Diana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      986f0c65
    • D
      powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup' · b848d19c
      Diana Craciun 提交于
      commit 039daac5526932ec731e4499613018d263af8b3e upstream.
      
      Fixed the following build warning:
      powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from
      `arch/powerpc/kernel/head_44x.o' being placed in section
      `__btb_flush_fixup'.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b848d19c
    • D
      powerpc/fsl: Update Spectre v2 reporting · 632d8392
      Diana Craciun 提交于
      commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream.
      
      Report branch predictor state flush as a mitigation for
      Spectre variant 2.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      632d8392
    • D
      powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used · 43f40620
      Diana Craciun 提交于
      commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream.
      
      If the user choses not to use the mitigations, replace
      the code sequence with nops.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43f40620
    • D
      powerpc/fsl: Flush branch predictor when entering KVM · a46a5038
      Diana Craciun 提交于
      commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream.
      
      Switching from the guest to host is another place
      where the speculative accesses can be exploited.
      Flush the branch predictor when entering KVM.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a46a5038
    • D
      powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit) · 3cb931c7
      Diana Craciun 提交于
      commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream.
      
      In order to protect against speculation attacks on
      indirect branches, the branch predictor is flushed at
      kernel entry to protect for the following situations:
      - userspace process attacking another userspace process
      - userspace process attacking the kernel
      Basically when the privillege level change (i.e.the kernel
      is entered), the branch predictor state is flushed.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cb931c7
    • D
      powerpc/fsl: Flush the branch predictor at each kernel entry (64bit) · cf72dad9
      Diana Craciun 提交于
      commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream.
      
      In order to protect against speculation attacks on
      indirect branches, the branch predictor is flushed at
      kernel entry to protect for the following situations:
      - userspace process attacking another userspace process
      - userspace process attacking the kernel
      Basically when the privillege level change (i.e. the
      kernel is entered), the branch predictor state is flushed.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf72dad9
    • D
      powerpc/fsl: Add nospectre_v2 command line argument · 020e5f13
      Diana Craciun 提交于
      commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream.
      
      When the command line argument is present, the Spectre variant 2
      mitigations are disabled.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      020e5f13
    • D
      powerpc/fsl: Emulate SPRN_BUCSR register · 4a6a2287
      Diana Craciun 提交于
      commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream.
      
      In order to flush the branch predictor the guest kernel performs
      writes to the BUCSR register which is hypervisor privilleged. However,
      the branch predictor is flushed at each KVM entry, so the branch
      predictor has been already flushed, so just return as soon as possible
      to guest.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      [mpe: Tweak comment formatting]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6a2287
    • D
      powerpc/fsl: Add macro to flush the branch predictor · 4944f1d4
      Diana Craciun 提交于
      commit 1cbf8990d79ff69da8ad09e8a3df014e1494462b upstream.
      
      The BUCSR register can be used to invalidate the entries in the
      branch prediction mechanisms.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4944f1d4
    • D
      powerpc/fsl: Add infrastructure to fixup branch predictor flush · d67ab3d9
      Diana Craciun 提交于
      commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream.
      
      In order to protect against speculation attacks (Spectre
      variant 2) on NXP PowerPC platforms, the branch predictor
      should be flushed when the privillege level is changed.
      This patch is adding the infrastructure to fixup at runtime
      the code sections that are performing the branch predictor flush
      depending on a boot arg parameter which is added later in a
      separate patch.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d67ab3d9
  3. 27 3月, 2019 3 次提交
    • J
      x86/unwind: Add hardcoded ORC entry for NULL · 206a76a6
      Jann Horn 提交于
      commit ac5ceccce5501e43d217c596e4ee859f2a3fef79 upstream.
      
      When the ORC unwinder is invoked for an oops caused by IP==0,
      it currently has no idea what to do because there is no debug information
      for the stack frame of NULL.
      
      But if RIP is NULL, it is very likely that the last successfully executed
      instruction was an indirect CALL/JMP, and it is possible to unwind out in
      the same way as for the first instruction of a normal function. Hardcode
      a corresponding ORC entry.
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       ? do_syscall_64+0x6d/0x160
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace looks like this:
      
      Call Trace:
       __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       do_syscall_64+0x6d/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      prctl_set_seccomp() still doesn't show up in the trace because for some
      reason, tail call optimization is only disabled in builds that use the
      frame pointer unwinder.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      206a76a6
    • J
      x86/unwind: Handle NULL pointer calls better in frame unwinder · 367ccafb
      Jann Horn 提交于
      commit f4f34e1b82eb4219d8eaa1c7e2e17ca219a6a2b5 upstream.
      
      When the frame unwinder is invoked for an oops caused by a call to NULL, it
      currently skips the parent function because BP still points to the parent's
      stack frame; the (nonexistent) current function only has the first half of
      a stack frame, and BP doesn't point to it yet.
      
      Add a special case for IP==0 that calculates a fake BP from SP, then uses
      the real BP for the next frame.
      
      Note that this handles first_frame specially: Return information about the
      parent function as long as the saved IP is >=first_frame, even if the fake
      BP points below it.
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace is:
      
      Call Trace:
       prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      367ccafb
    • M
      powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 · b8ea151a
      Michael Ellerman 提交于
      commit b5b4453e7912f056da1ca7572574cada32ecb60c upstream.
      
      Jakub Drnec reported:
        Setting the realtime clock can sometimes make the monotonic clock go
        back by over a hundred years. Decreasing the realtime clock across
        the y2k38 threshold is one reliable way to reproduce. Allegedly this
        can also happen just by running ntpd, I have not managed to
        reproduce that other than booting with rtc at >2038 and then running
        ntp. When this happens, anything with timers (e.g. openjdk) breaks
        rather badly.
      
      And included a test case (slightly edited for brevity):
        #define _POSIX_C_SOURCE 199309L
        #include <stdio.h>
        #include <time.h>
        #include <stdlib.h>
        #include <unistd.h>
      
        long get_time(void) {
          struct timespec tp;
          clock_gettime(CLOCK_MONOTONIC, &tp);
          return tp.tv_sec + tp.tv_nsec / 1000000000;
        }
      
        int main(void) {
          long last = get_time();
          while(1) {
            long now = get_time();
            if (now < last) {
              printf("clock went backwards by %ld seconds!\n", last - now);
            }
            last = now;
            sleep(1);
          }
          return 0;
        }
      
      Which when run concurrently with:
       # date -s 2040-1-1
       # date -s 2037-1-1
      
      Will detect the clock going backward.
      
      The root cause is that wtom_clock_sec in struct vdso_data is only a
      32-bit signed value, even though we set its value to be equal to
      tk->wall_to_monotonic.tv_sec which is 64-bits.
      
      Because the monotonic clock starts at zero when the system boots the
      wall_to_montonic.tv_sec offset is negative for current and future
      dates. Currently on a freshly booted system the offset will be in the
      vicinity of negative 1.5 billion seconds.
      
      However if the wall clock is set past the Y2038 boundary, the offset
      from wall to monotonic becomes less than negative 2^31, and no longer
      fits in 32-bits. When that value is assigned to wtom_clock_sec it is
      truncated and becomes positive, causing the VDSO assembly code to
      calculate CLOCK_MONOTONIC incorrectly.
      
      That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
      it is not meant to do. Worse, if the time is then set back before the
      Y2038 boundary CLOCK_MONOTONIC will jump backward.
      
      We can fix it simply by storing the full 64-bit offset in the
      vdso_data, and using that in the VDSO assembly code. We also shuffle
      some of the fields in vdso_data to avoid creating a hole.
      
      The original commit that added the CLOCK_MONOTONIC support to the VDSO
      did actually use a 64-bit value for wtom_clock_sec, see commit
      a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to
      32 bits kernel") (Nov 2005). However just 3 days later it was
      converted to 32-bits in commit 0c37ec2a ("[PATCH] powerpc: vdso
      fixes (take #2)"), and the bug has existed since then AFAICS.
      
      Fixes: 0c37ec2a ("[PATCH] powerpc: vdso fixes (take #2)")
      Cc: stable@vger.kernel.org # v2.6.15+
      Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.czReported-by: NJakub Drnec <jaydee@email.cz>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8ea151a