1. 08 5月, 2019 10 次提交
  2. 05 5月, 2019 2 次提交
  3. 04 5月, 2019 10 次提交
    • R
      x86/mm: Don't exceed the valid physical address space · 6a364b2e
      Ralph Campbell 提交于
      [ Upstream commit 92c77f7c4d5dfaaf45b2ce19360e69977c264766 ]
      
      valid_phys_addr_range() is used to sanity check the physical address range
      of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
      internally.
      
      If memory is populated at the end of the physical address space, then
      __pa(high_memory) is outside of the physical address space because:
      
         high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
      
      For the comparison in valid_phys_addr_range() this is not an issue, but if
      CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
      verifies that the resulting physical address is within the valid physical
      address space of the CPU. So in the case that memory is populated at the
      end of the physical address space, this is not true and triggers a
      VIRTUAL_BUG_ON().
      
      Use __pa(high_memory - 1) to prevent the conversion from going beyond
      the end of valid physical addresses.
      
      Fixes: be62a320 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Craig Bergstrom <craigb@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      
      Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.comSigned-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      6a364b2e
    • M
      x86/realmode: Don't leak the trampoline kernel address · fe71e625
      Matteo Croce 提交于
      [ Upstream commit b929a500d68479163c48739d809cbf4c1335db6f ]
      
      Since commit
      
        ad67b74d ("printk: hash addresses printed with %p")
      
      at boot "____ptrval____" is printed instead of the trampoline addresses:
      
        Base memory trampoline at [(____ptrval____)] 99000 size 24576
      
      Remove the print as we don't want to leak kernel addresses and this
      statement is not needed anymore.
      
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190326203046.20787-1-mcroce@redhat.comSigned-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      fe71e625
    • S
      ARM: davinci: fix build failure with allnoconfig · 6222f1c6
      Sekhar Nori 提交于
      [ Upstream commit 2dbed152e2d4c3fe2442284918d14797898b1e8a ]
      
      allnoconfig build with just ARCH_DAVINCI enabled
      fails because drivers/clk/davinci/* depends on
      REGMAP being enabled.
      
      Fix it by selecting REGMAP_MMIO when building in
      DaVinci support.
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Reviewed-by: NDavid Lechner <david@lechnology.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      6222f1c6
    • M
      ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi · abd76731
      Masanari Iida 提交于
      [ Upstream commit 41b37f4c0fa67185691bcbd30201cad566f2f0d1 ]
      
      This patch fixes a spelling typo.
      Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
      Fixes: cc42603d ("ARM: dts: imx6q-icore-rqs: Add Engicam IMX6 Q7 initial support")
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      abd76731
    • M
      ARM: dts: pfla02: increase phy reset duration · c6694e7c
      Marco Felsch 提交于
      [ Upstream commit 032f85c9360fb1a08385c584c2c4ed114b33c260 ]
      
      Increase the reset duration to ensure correct phy functionality. The
      reset duration is taken from barebox commit 52fdd510de ("ARM: dts:
      pfla02: use long enough reset for ethernet phy"):
      
        Use a longer reset time for ethernet phy Micrel KSZ9031RNX. Otherwise a
        small percentage of modules have 'transmission timeouts' errors like
      
        barebox@Phytec phyFLEX-i.MX6 Quad Carrier-Board:/ ifup eth0
        warning: No MAC address set. Using random address 7e:94:4d:02:f8:f3
        eth0: 1000Mbps full duplex link detected
        eth0: transmission timeout
        T eth0: transmission timeout
        T eth0: transmission timeout
        T eth0: transmission timeout
        T eth0: transmission timeout
      
      Cc: Stefan Christ <s.christ@phytec.de>
      Cc: Christian Hemp <c.hemp@phytec.de>
      Signed-off-by: NMarco Felsch <m.felsch@pengutronix.de>
      Fixes: 3180f956 ("ARM: dts: Phytec imx6q pfla02 and pbab01 support")
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      c6694e7c
    • M
      KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory · 0371fa03
      Marc Zyngier 提交于
      [ Upstream commit a6ecfb11bf37743c1ac49b266595582b107b61d4 ]
      
      When halting a guest, QEMU flushes the virtual ITS caches, which
      amounts to writing to the various tables that the guest has allocated.
      
      When doing this, we fail to take the srcu lock, and the kernel
      shouts loudly if running a lockdep kernel:
      
      [   69.680416] =============================
      [   69.680819] WARNING: suspicious RCU usage
      [   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
      [   69.682096] -----------------------------
      [   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
      [   69.683225]
      [   69.683225] other info that might help us debug this:
      [   69.683225]
      [   69.683975]
      [   69.683975] rcu_scheduler_active = 2, debug_locks = 1
      [   69.684598] 6 locks held by qemu-system-aar/4097:
      [   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
      [   69.686087]  #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
      [   69.686919]  #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.687698]  #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.688475]  #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.689978]  #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.690729]
      [   69.690729] stack backtrace:
      [   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
      [   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
      [   69.692831] Call trace:
      [   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
      [   69.694490]  gfn_to_memslot+0x174/0x190
      [   69.694853]  kvm_write_guest+0x50/0xb0
      [   69.695209]  vgic_its_save_tables_v0+0x248/0x330
      [   69.695639]  vgic_its_set_attr+0x298/0x3a0
      [   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
      [   69.696424]  kvm_device_ioctl+0x8c/0xf8
      [   69.696788]  do_vfs_ioctl+0xc8/0x960
      [   69.697128]  ksys_ioctl+0x8c/0xa0
      [   69.697445]  __arm64_sys_ioctl+0x28/0x38
      [   69.697817]  el0_svc_common+0xd8/0x138
      [   69.698173]  el0_svc_handler+0x38/0x78
      [   69.698528]  el0_svc+0x8/0xc
      
      The fix is to obviously take the srcu lock, just like we do on the
      read side of things since bf308242. One wonders why this wasn't
      fixed at the same time, but hey...
      
      Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      0371fa03
    • M
      KVM: arm64: Reset the PMU in preemptible context · 51a5d70a
      Marc Zyngier 提交于
      [ Upstream commit ebff0b0e3d3c862c16c487959db5e0d879632559 ]
      
      We've become very cautious to now always reset the vcpu when nothing
      is loaded on the physical CPU. To do so, we now disable preemption
      and do a kvm_arch_vcpu_put() to make sure we have all the state
      in memory (and that it won't be loaded behind out back).
      
      This now causes issues with resetting the PMU, which calls into perf.
      Perf itself uses mutexes, which clashes with the lack of preemption.
      It is worth realizing that the PMU is fully emulated, and that
      no PMU state is ever loaded on the physical CPU. This means we can
      perfectly reset the PMU outside of the non-preemptible section.
      
      Fixes: e761a927bc9a ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded")
      Reported-by: NJulien Grall <julien.grall@arm.com>
      Tested-by: NJulien Grall <julien.grall@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      51a5d70a
    • W
      ARM: imx51: fix a leaked reference by adding missing of_node_put · 2cbb465e
      Wen Yang 提交于
      [ Upstream commit 0c17e83fe423467e3ccf0a02f99bd050a73bbeb4 ]
      
      The call to of_get_next_child returns a node pointer with refcount
      incremented thus it must be explicitly decremented after the last
      usage.
      
      Detected by coccinelle with the following warnings:
      ./arch/arm/mach-imx/mach-imx51.c:64:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 57, but without a corresponding object release within this function.
      Signed-off-by: NWen Yang <wen.yang99@zte.com.cn>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
      Cc: Fabio Estevam <festevam@gmail.com>
      Cc: NXP Linux Team <linux-imx@nxp.com>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      2cbb465e
    • M
      s390: limit brk randomization to 32MB · a1e34e28
      Martin Schwidefsky 提交于
      [ Upstream commit cd479eccd2e057116d504852814402a1e68ead80 ]
      
      For a 64-bit process the randomization of the program break is quite
      large with 1GB. That is as big as the randomization of the anonymous
      mapping base, for a test case started with '/lib/ld64.so.1 <exec>'
      it can happen that the heap is placed after the stack. To avoid
      this limit the program break randomization to 32MB for 64-bit and
      keep 8MB for 31-bit.
      Reported-by: NStefan Liebler <stli@linux.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      a1e34e28
    • H
      ARM: dts: bcm283x: Fix hdmi hpd gpio pull · d52dfdf1
      Helen Koike 提交于
      [ Upstream commit 544e784188f1dd7c797c70b213385e67d92005b6 ]
      
      Raspberry pi board model B revison 2 have the hot plug detector gpio
      active high (and not low as it was in the dts).
      Signed-off-by: NHelen Koike <helen.koike@collabora.com>
      Fixes: 49ac67e0 ("ARM: bcm2835: Add VC4 to the device tree.")
      Reviewed-by: NEric Anholt <eric@anholt.net>
      Signed-off-by: NEric Anholt <eric@anholt.net>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      d52dfdf1
  4. 02 5月, 2019 8 次提交
    • S
      x86/fpu: Don't export __kernel_fpu_{begin,end}() · d4ff57d0
      Sebastian Andrzej Siewior 提交于
      commit 12209993e98c5fa1855c467f22a24e3d5b8be205 upstream.
      
      There is one user of __kernel_fpu_begin() and before invoking it,
      it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
      right away. The 32bit version of arch_efi_call_virt_setup() and
      arch_efi_call_virt_teardown() does this already.
      
      The comment above *kernel_fpu*() claims that before invoking
      __kernel_fpu_begin() preemption should be disabled and that KVM is a
      good example of doing it. Well, KVM doesn't do that since commit
      
        f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
      
      so it is not an example anymore.
      
      With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
      be made static and not exported anymore.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: linux-efi <linux-efi@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4ff57d0
    • D
      x86/retpolines: Disable switch jump tables when retpolines are enabled · e923c6b7
      Daniel Borkmann 提交于
      commit a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 upstream.
      
      Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases where gcc would only then start to emit jump tables, and therefore
      effectively disabling the emission of slow indirect calls in this area.
      
      After this has been brought to attention to gcc folks [0], Martin Liska
      has then fixed gcc to align with clang by avoiding to generate switch jump
      tables entirely under retpolines. This is taking effect in gcc starting
      from stable version 8.4.0. Given kernel supports compilation with older
      versions of gcc where the fix is not being available or backported anymore,
      we need to keep the extra KBUILD_CFLAGS around for some time and generally
      set the -fno-jump-tables to align with what more recent gcc is doing
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to align with gcc behavior for versions < 8.4.0 in
      order to have consistency across supported gcc versions. vmlinux size is
      slightly growing by 0.27% for older gcc. This flag is only set to work
      around affected gcc, no change for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952Suggested-by: NMartin Liska <mliska@suse.cz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel<bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e923c6b7
    • D
      x86, retpolines: Raise limit for generating indirect calls from switch-case · 6cfcff3c
      Daniel Borkmann 提交于
      commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1 upstream.
      
      From networking side, there are numerous attempts to get rid of indirect
      calls in fast-path wherever feasible in order to avoid the cost of
      retpolines, for example, just to name a few:
      
        * 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
        * aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
        * 028e0a476684 ("net: use indirect call wrappers at GRO transport layer")
        * 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
        * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
        * 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions")
        [...]
      
      Recent work on XDP from Björn and Magnus additionally found that manually
      transforming the XDP return code switch statement with more than 5 cases
      into if-else combination would result in a considerable speedup in XDP
      layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
      builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
      observed [0]. Aside from XDP, there are many other places later in the
      networking stack's critical path with similar switch-case
      processing. Rather than fixing every XDP-enabled driver and locations in
      stack by hand, it would be good to instead raise the limit where gcc would
      emit expensive indirect calls from the switch under retpolines and stick
      with the default as-is in case of !retpoline configured kernels. This would
      also have the advantage that for archs where this is not necessary, we let
      compiler select the underlying target optimization for these constructs and
      avoid potential slow-downs by if-else hand-rewrite.
      
      In case of gcc, this setting is controlled by case-values-threshold which
      has an architecture global default that selects 4 or 5 (latter if target
      does not have a case insn that compares the bounds) where some arch back
      ends like arm64 or s390 override it with their own target hooks, for
      example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
      branches") the threshold pretty much disables jump tables by limit of 20
      under retpoline builds.  Comparing gcc's and clang's default code
      generation on x86-64 under O2 level with retpoline build results in the
      following outcome for 5 switch cases:
      
      * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400be0 <+0>:     cmp    $0x4,%edi
         0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
         0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
         0x0000000000400bec <+12>:    mov    %edi,%edi
         0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
         0x0000000000400bf2 <+18>:    add    %rdx,%rax
         0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
         0x0000000000400bfa <+26>:    pause
         0x0000000000400bfc <+28>:    lfence
         0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
         0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
         0x0000000000400c05 <+37>:    retq
         0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
         0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
         0x0000000000400c15 <+53>:    nopl   (%rax)
         0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
         0x0000000000400c1d <+61>:    nopl   (%rax)
         0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
         0x0000000000400c25 <+69>:    nopl   (%rax)
         0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
         0x0000000000400c2d <+77>:    nopl   (%rax)
         0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
         0x0000000000400c35 <+85>:    push   %rax
         0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
        End of assembler dump.
      
      * clang with -mretpoline emitting search tree:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:     cmp    $0x1,%edi
         0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
         0x0000000000400b35 <+5>:     cmp    $0x2,%edi
         0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
         0x0000000000400b3a <+10>:    cmp    $0x3,%edi
         0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
         0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
         0x0000000000400b44 <+20>:    test   %edi,%edi
         0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
         0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
         0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
         0x0000000000400b52 <+34>:    cmp    $0x4,%edi
         0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
         0x0000000000400b5c <+44>:    cmp    $0x1,%edi
         0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
         0x0000000000400b66 <+54>:    push   %rax
         0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
        End of assembler dump.
      
        For sake of comparison, clang without -mretpoline:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:	cmp    $0x4,%edi
         0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
         0x0000000000400b35 <+5>:	mov    %edi,%eax
         0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
         0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
         0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
         0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
         0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
         0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
         0x0000000000400b57 <+39>:	push   %rax
         0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
        End of assembler dump.
      
      Raising the cases to a high number (e.g. 100) will still result in similar
      code generation pattern with clang and gcc as above, in other words clang
      generally turns off jump table emission by having an extra expansion pass
      under retpoline build to turn indirectbr instructions from their IR into
      switch instructions as a built-in -mno-jump-table lowering of a switch (in
      this case, even if IR input already contained an indirect branch).
      
      For gcc, adding --param=case-values-threshold=20 as in similar fashion as
      s390 in order to raise the limit for x86 retpoline enabled builds results
      in a small vmlinux size increase of only 0.13% (before=18,027,528
      after=18,051,192). For clang this option is ignored due to i) not being
      needed as mentioned and ii) not having above cmdline
      parameter. Non-retpoline-enabled builds with gcc continue to use the
      default case-values-threshold setting, so nothing changes here.
      
      [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
          and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cfcff3c
    • M
      powerpc/mm/radix: Make Radix require HUGETLB_PAGE · 087341c0
      Michael Ellerman 提交于
      commit 8adddf349fda0d3de2f6bb41ddf838cbf36a8ad2 upstream.
      
      Joel reported weird crashes using skiroot_defconfig, in his case we
      jumped into an NX page:
      
        kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
        BUG: Unable to handle kernel instruction fetch
        Faulting instruction address: 0xc000000002bff4f0
      
      Looking at the disassembly, we had simply branched to that address:
      
        c000000000c001bc  49fff335    bl     c000000002bff4f0
      
      But that didn't match the original kernel image:
      
        c000000000c001bc  4bfff335    bl     c000000000bff4f0 <kobject_get+0x8>
      
      When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
      call radix__change_memory_range() late in boot to change page
      protections. We do that both to mark rodata read only and also to mark
      init text no-execute. That involves walking the kernel page tables,
      and clearing _PAGE_WRITE or _PAGE_EXEC respectively.
      
      With radix we may use hugepages for the linear mapping, so the code in
      radix__change_memory_range() uses eg. pmd_huge() to test if it has
      found a huge mapping, and if so it stops the page table walk and
      changes the PMD permissions.
      
      However if the kernel is built without HUGETLBFS support, pmd_huge()
      is just a #define that always returns 0. That causes the code in
      radix__change_memory_range() to incorrectly interpret the PMD value as
      a pointer to a PTE page rather than as a PTE at the PMD level.
      
      We can see this using `dv` in xmon which also uses pmd_huge():
      
        0:mon> dv c000000000000000
        pgd  @ 0xc000000001740000
        pgdp @ 0xc000000001740000 = 0x80000000ffffb009
        pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
        pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f   <- this is a PTE
        ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d   <- kernel text
      
      The end result is we treat the value at 0xc000000000000100 as a PTE
      and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
      at that address.
      
      In Joel's specific case we cleared the sign bit in the offset of the
      branch, causing a backward branch to turn into a forward branch which
      caused us to branch into a non-executable page. However the exact
      nature of the crash depends on kernel version, compiler version, and
      other factors.
      
      We need to fix radix__change_memory_range() to not use accessors that
      depend on HUGETLBFS, but we also have radix memory hotplug code that
      uses pmd_huge() etc that will also need fixing. So for now just
      disallow the broken combination of Radix with HUGETLBFS disabled.
      
      The only defconfig we have that is affected is skiroot_defconfig, so
      turn on HUGETLBFS there so that it still gets Radix.
      
      Fixes: 566ca99a ("powerpc/mm/radix: Add dummy radix_enabled()")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      087341c0
    • A
      ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache · 478afe34
      Ard Biesheuvel 提交于
      commit e17b1af96b2afc38e684aa2f1033387e2ed10029 upstream.
      
      The EFI stub is entered with the caches and MMU enabled by the
      firmware, and once the stub is ready to hand over to the decompressor,
      we clean and disable the caches.
      
      The cache clean routines use CP15 barrier instructions, which can be
      disabled via SCTLR. Normally, when using the provided cache handling
      routines to enable the caches and MMU, this bit is enabled as well.
      However, but since we entered the stub with the caches already enabled,
      this routine is not executed before we call the cache clean routines,
      resulting in undefined instruction exceptions if the firmware never
      enabled this bit.
      
      So set the bit explicitly in the EFI entry code, but do so in a way that
      guarantees that the resulting code can still run on v6 cores as well
      (which are guaranteed to have CP15 barriers enabled)
      
      Cc: <stable@vger.kernel.org> # v4.9+
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      478afe34
    • H
      perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters · 37ecf31a
      Harry Pan 提交于
      commit 82c99f7a81f28f8c1be5f701c8377d14c4075b10 upstream.
      
      Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters.
      
      This patch updates the list of Kaby/Coffee Lake PMU event counters
      from the snb_cstates[] list of events to the hswult_cstates[]
      list of events, which keeps all previously supported events and
      also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters.
      
      This allows user space tools to profile them through the perf interface.
      Signed-off-by: NHarry Pan <harry.pan@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: gs0622@gmail.com
      Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37ecf31a
    • A
      MIPS: scall64-o32: Fix indirect syscall number load · 7f9c9d1d
      Aurelien Jarno 提交于
      commit 79b4a9cf0e2ea8203ce777c8d5cfa86c71eae86e upstream.
      
      Commit 4c21b8fd (MIPS: seccomp: Handle indirect system calls (o32))
      added indirect syscall detection for O32 processes running on MIPS64,
      but it did not work correctly for big endian kernel/processes. The
      reason is that the syscall number is loaded from ARG1 using the lw
      instruction while this is a 64-bit value, so zero is loaded instead of
      the syscall number.
      
      Fix the code by using the ld instruction instead. When running a 32-bit
      processes on a 64 bit CPU, the values are properly sign-extended, so it
      ensures the value passed to syscall_trace_enter is correct.
      
      Recent systemd versions with seccomp enabled whitelist the getpid
      syscall for their internal  processes (e.g. systemd-journald), but call
      it through syscall(SYS_getpid). This fix therefore allows O32 big endian
      systems with a 64-bit kernel to run recent systemd versions.
      Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
      Cc: <stable@vger.kernel.org> # v3.15+
      Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f9c9d1d
    • C
      powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64 · 1e0cab1b
      Christophe Leroy 提交于
      [ Upstream commit dd9a994fc68d196a052b73747e3366c57d14a09e ]
      
      Commit b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC
      inconsistencies across Y2038") changed the type of wtom_clock_sec
      to s64 on PPC64. Therefore, VDSO32 needs to read it with a 4 bytes
      shift in order to retrieve the lower part of it.
      
      Fixes: b5b4453e7912 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
      Reported-by: NChristian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1e0cab1b
  5. 27 4月, 2019 10 次提交