1. 24 Apr 2021, 1 commit
  2. 22 Apr 2021, 1 commit
  3. 21 Apr 2021, 1 commit
    • perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3 · 9d480158
      Committed by Kan Liang
      There may be a kernel panic on Haswell and Broadwell servers if
      snbep_pci2phy_map_init() returns an error.
      
      The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in cpu_init() to
      detect the existence of the SBOX, which is an MSR type of PMON unit.
      The uncore_extra_pci_dev is allocated in uncore_pci_init(). If
      snbep_pci2phy_map_init() returns an error, perf doesn't initialize the
      PCI type of PMON units, so uncore_extra_pci_dev is never allocated.
      But perf may continue initializing the MSR type of PMON units, and a
      NULL-pointer dereference then triggers a kernel panic.
      
      The sockets in a Haswell or Broadwell server are identical, so the
      existence of the SBOX only needs to be detected once.
      Currently, perf probes all available PCU devices and stores them in
      uncore_extra_pci_dev, which is unnecessary.
      Use pci_get_device() instead of uncore_extra_pci_dev and detect the
      existence of the SBOX only once, on the first available PCU device.
      
      Factor out hswep_has_limit_sbox(), since Haswell and Broadwell servers
      use the same method to detect the existence of the SBOX.
      
      Add some macros to replace the magic number.
      
      Fixes: 5306c31c ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes")
      Reported-by: Steve Wahl <steve.wahl@hpe.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Steve Wahl <steve.wahl@hpe.com>
      Link: https://lkml.kernel.org/r/1618521764-100923-1-git-send-email-kan.liang@linux.intel.com
      9d480158
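
      For reference, a minimal sketch of the hswep_has_limit_sbox() helper
      described above might look like the following. The PCU device ID, the
      CAPID4 config-space offset, and the "chop" bit field used here are
      illustrative assumptions, not values taken from the patch:

        #define HSWEP_PCU_DID            0x2fc0   /* assumed PCU device ID */
        #define HSWEP_PCU_CAPID4_OFFSET  0x94     /* assumed CAPID4 offset */

        static bool hswep_has_limit_sbox(unsigned int device)
        {
                struct pci_dev *dev = pci_get_device(PCI_VENDOR_ID_INTEL,
                                                     device, NULL);
                u32 capid4;

                if (!dev)
                        return false;

                pci_read_config_dword(dev, HSWEP_PCU_CAPID4_OFFSET, &capid4);
                pci_dev_put(dev);

                /* An all-zero "chop" field would indicate a part with only
                 * two SBOXes. */
                return !((capid4 >> 6) & 0x3);
        }
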
  4. 20 Apr 2021, 1 commit
    • x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access · 5849cdf8
      Committed by Mike Galbraith
      The commit in Fixes: added support for kexec-ing a kernel on panic using
      a new system call. As part of that, it prepares a memory map for the new
      kernel.
      
      However, while doing so, it wrongly accesses memory it has not
      allocated: it accesses the first element of the cmem->ranges[] array in
      memmap_exclude_ranges() but it has not allocated the memory for it in
      crash_setup_memmap_entries(). As KASAN reports:
      
        BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
        Write of size 8 at addr ffffc90000426008 by task kexec/1187
      
        (gdb) list *crash_setup_memmap_entries+0x17e
        0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
        317                                      unsigned long long mend)
        318     {
        319             unsigned long start, end;
        320
        321             cmem->ranges[0].start = mstart;
        322             cmem->ranges[0].end = mend;
        323             cmem->nr_ranges = 1;
        324
        325             /* Exclude elf header region */
        326             start = image->arch.elf_load_addr;
        (gdb)
      
      Make sure the allocation of cmem includes space for a single ranges[]
      element.
      
       [ bp: Write a proper commit message. ]
      
      Fixes: dd5f7260 ("kexec: support for kexec on panic using new system call")
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gmx.de
      5849cdf8
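
      A minimal sketch of the allocation fix described above, assuming
      struct crash_mem ends in a flexible ranges[] array and that the
      allocation lives in crash_setup_memmap_entries():

        struct crash_mem *cmem;

        /* Before: only the struct header was allocated, so writing
         * cmem->ranges[0] went out of bounds. */
        /* cmem = vzalloc(sizeof(struct crash_mem)); */

        /* After: reserve space for one ranges[] element as well.
         * struct_size() comes from <linux/overflow.h>. */
        cmem = vzalloc(struct_size(cmem, ranges, 1));
        if (!cmem)
                return -ENOMEM;
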
  5. 19 Apr 2021, 1 commit
  6. 17 Apr 2021, 1 commit
  7. 14 Apr 2021, 3 commits
  8. 13 Apr 2021, 2 commits
  9. 12 Apr 2021, 1 commit
  10. 10 Apr 2021, 2 commits
  11. 09 Apr 2021, 1 commit
  12. 08 Apr 2021, 7 commits
    • x86/sgx: Do not update sgx_nr_free_pages in sgx_setup_epc_section() · ae40aaf6
      Committed by Jarkko Sakkinen
      The commit in Fixes: changed the SGX EPC page sanitization to end up in
      sgx_free_epc_page() which puts clean and sanitized pages on the free
      list.
      
      This was done because it is best to keep the logic that assigns
      available-for-use EPC pages to the correct NUMA lists in a single
      location.
      
      sgx_nr_free_pages is also incremented by sgx_free_epc_page(), but the
      pages added per EPC section in sgx_setup_epc_section() do not belong on
      the free list yet because they haven't been sanitized: they land on the
      dirty list first, and the sanitization happens later, when ksgxd starts
      massaging them.
      
      So remove the increment from sgx_setup_epc_section() and have
      sgx_free_epc_page() do it solely.
      
       [ bp: Sanitize commit message too. ]
      
      Fixes: 51ab30eb ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20210408092924.7032-1-jarkko@kernel.org
      ae40aaf6
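
      A rough sketch of the resulting split of responsibilities; the list,
      node and lock names are assumptions for illustration, not lifted from
      the patch:

        /* sgx_setup_epc_section(): new pages are not usable yet, so they
         * only go on the dirty list for ksgxd to sanitize later, and
         * sgx_nr_free_pages is deliberately left untouched here. */
        list_add_tail(&page->list, &sgx_dirty_page_list);

        /* sgx_free_epc_page(): the single place that puts a clean page on
         * the per-node free list and bumps the counter. */
        spin_lock(&node->lock);
        list_add_tail(&page->list, &node->free_page_list);
        sgx_nr_free_pages++;
        spin_unlock(&node->lock);
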
    • bpf, x86: Validate computation of branch displacements for x86-32 · 26f55a59
      Committed by Piotr Krysiuk
      The branch displacement logic in the BPF JIT compilers for x86 assumes
      that, for any generated branch instruction, the distance cannot
      increase between optimization passes.
      
      But this assumption can be violated due to how the distances are
      computed. Specifically, whenever a backward branch is processed in
      do_jit(), the distance is computed by subtracting the positions in the
      machine code from different optimization passes. This is because part
      of addrs[] is already updated for the current optimization pass, before
      the branch instruction is visited.
      
      And so the optimizer can expand blocks of machine code in some cases.
      
      This can confuse the optimizer logic, where it assumes that a fixed
      point has been reached for all machine code blocks once the total
      program size stops changing. And then the JIT compiler can output
      abnormal machine code containing incorrect branch displacements.
      
      To mitigate this issue, we assert that a fixed point is reached while
      populating the output image. This rejects any problematic programs.
      The issue affects both x86-32 and x86-64. We mitigate separately to
      ease backporting.
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      26f55a59
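
      Both this patch and the x86-64 one below add the same kind of check; a
      minimal sketch of the assertion while emitting into the output image
      might look like this (the variable names follow common JIT conventions
      but are assumptions here):

        if (image) {
                /* Assert that the fixed point was reached: the emitted
                 * instruction must end exactly where the previous pass
                 * recorded it (addrs[i]) and must not overflow the image. */
                if (unlikely(proglen + ilen > oldproglen ||
                             proglen + ilen != addrs[i])) {
                        pr_err("bpf_jit: fatal error\n");
                        return -EFAULT;
                }
                memcpy(image + proglen, temp, ilen);
        }
        proglen += ilen;
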
    • bpf, x86: Validate computation of branch displacements for x86-64 · e4d4d456
      Committed by Piotr Krysiuk
      The branch displacement logic in the BPF JIT compilers for x86 assumes
      that, for any generated branch instruction, the distance cannot
      increase between optimization passes.
      
      But this assumption can be violated due to how the distances are
      computed. Specifically, whenever a backward branch is processed in
      do_jit(), the distance is computed by subtracting the positions in the
      machine code from different optimization passes. This is because part
      of addrs[] is already updated for the current optimization pass, before
      the branch instruction is visited.
      
      And so the optimizer can expand blocks of machine code in some cases.
      
      This can confuse the optimizer logic, where it assumes that a fixed
      point has been reached for all machine code blocks once the total
      program size stops changing. And then the JIT compiler can output
      abnormal machine code containing incorrect branch displacements.
      
      To mitigate this issue, we assert that a fixed point is reached while
      populating the output image. This rejects any problematic programs.
      The issue affects both x86-32 and x86-64. We mitigate separately to
      ease backporting.
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e4d4d456
    • KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp · 315f02c6
      Committed by Paolo Bonzini
      Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller
      will skip the TLB flush, which is wrong.  There are two ways to fix
      it:
      
      - since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush
        the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to
        use "flush |= ..."
      
      - or we can chain the flush argument through kvm_tdp_mmu_zap_sp down
        to __kvm_tdp_mmu_zap_gfn_range.  Note that kvm_tdp_mmu_zap_sp will
        neither yield nor flush, so flush would never go from true to
        false.
      
      This patch does the former to simplify application to stable kernels,
      and to make it clearer that kvm_tdp_mmu_zap_sp does not flush.
      
      Cc: seanjc@google.com
      Fixes: 048f4980 ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")
      Cc: <stable@vger.kernel.org> # 5.10.x: 048f4980: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
      Cc: <stable@vger.kernel.org> # 5.10.x: 33a31641: KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      315f02c6
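
      A minimal sketch of the first option at a caller of
      kvm_tdp_mmu_zap_sp(); the surrounding loop and the is_tdp_mmu_page()
      check are assumed context:

        if (is_tdp_mmu_page(sp))
                /* Accumulate rather than overwrite: a false return must
                 * not drop a flush that is already pending. */
                flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
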
    • x86/msr: Make locally used functions static · 3e7bbe15
      Committed by Zhao Xuehui
      The functions msr_read() and msr_write() are not used outside of msr.c,
      so make them static.
      
       [ bp: Massage commit message. ]
      Signed-off-by: Zhao Xuehui <zhaoxuehui1@huawei.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20210408095218.152264-1-zhaoxuehui1@huawei.com
      3e7bbe15
    • x86/cacheinfo: Remove unneeded dead-store initialization · dda451f3
      Committed by Yang Li
      $ make CC=clang clang-analyzer
      
      (needs clang-tidy installed on the system too)
      
      on x86_64 defconfig triggers:
      
        arch/x86/kernel/cpu/cacheinfo.c:880:24: warning: Value stored to 'this_cpu_ci' \
      	  during its initialization is never read [clang-analyzer-deadcode.DeadStores]
              struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
                                    ^
        arch/x86/kernel/cpu/cacheinfo.c:880:24: note: Value stored to 'this_cpu_ci' \
      	during its initialization is never read
      
      So simply remove this unneeded dead-store initialization.
      
      Compilers will detect this unneeded assignment and optimize it out
      anyway, so the resulting object code is identical before and after this
      change.
      
      No functional change. No change to object code.
      
       [ bp: Massage commit message. ]
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Link: https://lkml.kernel.org/r/1617177624-24670-1-git-send-email-yang.lee@linux.alibaba.com
      dda451f3
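
      The change the warning points at is simply dropping the initializer,
      roughly:

        /* Before: the value stored here is never read. */
        struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

        /* After: declare only; the variable is assigned where it is
         * actually needed. */
        struct cpu_cacheinfo *this_cpu_ci;
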
    • ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m · fa26d0c7
      Committed by Vitaly Kuznetsov
      Commit 8cdddd18 ("ACPI: processor: Fix CPU0 wakeup in
      acpi_idle_play_dead()") tried to fix CPU0 hotplug breakage by copying
      the wakeup_cpu0() + start_cpu0() logic from hlt_play_dead()/mwait_play_dead()
      into acpi_idle_play_dead(). The problem is that these functions are not
      exported to modules, so the build fails when CONFIG_ACPI_PROCESSOR=m.
      
      The issue could've been fixed by exporting both wakeup_cpu0()/start_cpu0()
      (the latter from assembly), but putting the whole pattern into a new
      function and exporting that instead seems better.
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 8cdddd18 ("ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()")
      Cc: <stable@vger.kernel.org> # 5.10+
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      fa26d0c7
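
      A minimal sketch of the exported helper described above; the helper
      name cond_wakeup_cpu0() and its placement in smpboot.c are assumptions
      for illustration:

        /* arch/x86/kernel/smpboot.c (illustrative location) */
        void cond_wakeup_cpu0(void)
        {
                /* If NMI wants to wake up CPU0, start CPU0. */
                if (smp_processor_id() == 0 && wakeup_cpu0())
                        start_cpu0();
        }
        EXPORT_SYMBOL_GPL(cond_wakeup_cpu0);
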
  13. 07 Apr 2021, 2 commits
  14. 06 Apr 2021, 8 commits
  15. 02 Apr 2021, 1 commit
    • crypto: poly1305 - fix poly1305_core_setkey() declaration · 8d195e7a
      Committed by Arnd Bergmann
      gcc-11 points out a mismatch between the declaration and the definition
      of poly1305_core_setkey():
      
      lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
         13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
            |                                                          ~~~~~~~~~^~~~~~~~~~~
      In file included from lib/crypto/poly1305-donna32.c:11:
      include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
         21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);
      
      This is harmless in principle, as the calling conventions are the same,
      but the more specific prototype allows better type checking in the
      caller.
      
      Change the declaration to match the actual function definition.
      poly1305_simd_init() is a bit suspicious here: it previously had a
      32-byte argument type, but it looks like it needs to take the 16-byte
      POLY1305_BLOCK_SIZE array instead.
      
      Fixes: 1c08a104 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      8d195e7a
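
      The resulting declaration, reconstructed from the error output above
      (the exact header layout is assumed):

        /* include/crypto/internal/poly1305.h: bound now matches the
         * definition in poly1305-donna32.c */
        void poly1305_core_setkey(struct poly1305_core_key *key,
                                  const u8 raw_key[16]);
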
  16. 01 Apr 2021, 6 commits
    • ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead() · 8cdddd18
      Committed by Vitaly Kuznetsov
      Commit 496121c0 ("ACPI: processor: idle: Allow probing on platforms
      with one ACPI C-state") broke CPU0 hotplug on certain systems; e.g.,
      I'm observing the following on AWS Nitro (e.g. r5b.xlarge, but other
      instance types are affected as well):
      
       # echo 0 > /sys/devices/system/cpu/cpu0/online
       # echo 1 > /sys/devices/system/cpu/cpu0/online
       <10 seconds delay>
       -bash: echo: write error: Input/output error
      
      In fact, the above-mentioned commit only revealed the problem and did
      not introduce it. On x86, an NMI is used to wake up a CPU, and the
      hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:
      
      	/*
      	 * If NMI wants to wake up CPU0, start CPU0.
      	 */
      	if (wakeup_cpu0())
      		start_cpu0();
      
      cpuidle_play_dead() -> acpi_idle_play_dead() (which is now called on
      systems where it wasn't called before the above-mentioned commit) serves
      the same purpose, but it doesn't have a path for CPU0. What happens now
      on wakeup is:
       - an NMI is sent to CPU0
       - wakeup_cpu0_nmi() works as expected
       - we get back to the while (1) loop in acpi_idle_play_dead()
       - safe_halt() puts CPU0 to sleep again.
      
      The straightforward/minimal fix is to add the special handling for CPU0
      on x86, and that is what this patch does.
      
      Fixes: 496121c0 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8cdddd18
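
      A minimal sketch of the special case added to the idle loop in
      acpi_idle_play_dead(), mirroring the hlt_play_dead()/mwait_play_dead()
      pattern quoted above (the surrounding code is abbreviated):

        while (1) {
                /* ... enter the dead-idle state (safe_halt() or an
                 * I/O-port based C-state) ... */

                /* If NMI wants to wake up CPU0, start CPU0. */
                if (wakeup_cpu0())
                        start_cpu0();
        }
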
    • KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update() · 77fcbe82
      Committed by Vitaly Kuznetsov
      When guest time is reset with KVM_SET_CLOCK(0), it is possible for
      'hv_clock->system_time' to become a small negative number. This happens
      because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
      on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
      kvm_guest_time_update() does (masterclock in use case):
      
      hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
      
      'master_kernel_ns' represents the last time the masterclock was updated,
      which can precede the KVM_SET_CLOCK() call. Normally this is not a
      problem, since the difference is very small, e.g. I'm observing
      hv_clock.system_time = -70 ns. The issue comes from the fact that
      'hv_clock.system_time' is stored as unsigned, so 'system_time / 100' in
      compute_tsc_page_parameters() becomes a very big number.
      
      Use 'master_kernel_ns' instead of get_kvmclock_ns() when the masterclock
      is in use, and get_kvmclock_base_ns() when it is not, to prevent
      'system_time' from going negative.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      77fcbe82
    • KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken · a83829f5
      Committed by Paolo Bonzini
      pvclock_gtod_sync_lock can be taken with interrupts disabled if the
      preempt notifier calls get_kvmclock_ns to update the Xen
      runstate information:
      
         spin_lock include/linux/spinlock.h:354 [inline]
         get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
         kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
         kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
         kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
         kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062
      
      So change the users of the spinlock to spin_lock_irqsave and
      spin_unlock_irqrestore.
      
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a83829f5
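
      A minimal sketch of the conversion, assuming 'ka' is the struct
      kvm_arch that owns pvclock_gtod_sync_lock:

        unsigned long flags;

        /* The irqsave variant is safe even when the caller already runs
         * with interrupts disabled, e.g. from the preempt notifier path. */
        spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
        /* ... read or update the kvmclock bookkeeping ... */
        spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
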
    • KVM: x86: reduce pvclock_gtod_sync_lock critical sections · c2c647f9
      Committed by Paolo Bonzini
      There is no need to include changes to vcpu->requests into
      the pvclock_gtod_sync_lock critical section.  The changes to
      the shared data structures (in pvclock_update_vm_gtod_copy)
      already occur under the lock.
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c2c647f9
    • KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit · 3c346c0c
      Committed by Paolo Bonzini
      Fixing nested_vmcb_check_save to avoid all TOC/TOU races
      is a bit harder in released kernels, so do the bare minimum
      and ensure that EFER.SVME is not cleared.  Clearing it is
      problematic because svm_set_efer frees the data structures for
      nested virtualization when EFER.SVME is cleared.
      
      Also check that EFER.SVME remains set after a nested vmexit;
      clearing it could happen if the bit is zero in the save area
      that is passed to KVM_SET_NESTED_STATE (the save area of the
      nested state corresponds to the nested hypervisor's state
      and is restored on the next nested vmexit).
      
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3c346c0c
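
      A minimal sketch of the kind of check described, in the save-area
      validation path (the surrounding function and its error handling are
      assumptions):

        /* EFER.SVME must remain set while nested state is in use; clearing
         * it would let svm_set_efer free the nested data structures. */
        if (!(save->efer & EFER_SVME))
                return false;
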
    • KVM: SVM: load control fields from VMCB12 before checking them · a58d9166
      Committed by Paolo Bonzini
      Avoid races between check and use of the nested VMCB controls.  This
      for example ensures that the VMRUN intercept is always reflected to the
      nested hypervisor, instead of being processed by the host.  Without this
      patch, it is possible to end up with svm->nested.hsave pointing to
      the MSR permission bitmap for nested guests.
      
      This bug is CVE-2021-29657.
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a58d9166
  17. 31 Mar 2021, 1 commit
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Committed by Sean Christopherson
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33a31641