1. 16 Sep 2019, 25 commits
    • B
      kvm: mmu: Fix overflow on kvm mmu page limit calculation · 163b24b1
      Committed by Ben Gardon
      [ Upstream commit bc8a3d8925a8fa09fa550e0da115d95851ce33c6 ]
      
      KVM bases its memory usage limits on the total number of guest pages
      across all memslots. However, those limits, and the calculations to
      produce them, use 32 bit unsigned integers. This can result in overflow
      if a VM has more guest pages than can be represented by a u32. As a
      result of this overflow, KVM can use a low limit on the number of MMU
      pages it will allocate. This makes KVM unable to map all of guest memory
      at once, prompting spurious faults.
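      
      As an aside for readers (not part of the original patch), a minimal userspace
      sketch of the wrap-around; the 20/1000 ratio mirrors KVM's per-mille sizing
      heuristic, and the constants here are purely illustrative:
      
        #include <stdio.h>
        #include <stdint.h>
      
        int main(void)
        {
                /* 2^31 guest pages of 4 KiB each, i.e. an 8 TiB guest */
                uint32_t npages = 1u << 31;
      
                /* 32-bit arithmetic: npages * 20 wraps to 0, so the limit collapses */
                uint32_t limit32 = npages * 20 / 1000;
      
                /* 64-bit arithmetic keeps the intended value (~42.9 million pages) */
                uint64_t limit64 = (uint64_t)npages * 20 / 1000;
      
                printf("u32 limit: %u\nu64 limit: %llu\n",
                       limit32, (unsigned long long)limit64);
                return 0;
        }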
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
      	introduced no new failures.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      163b24b1
    • D
      arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's · 37222eaf
      Committed by Dinh Nguyen
      [ Upstream commit 8efd6365417a044db03009724ecc1a9521524913 ]
      
      The gmac ethernet driver uses the "altr,sysmgr-syscon" property to
      configure phy settings for the gmac controller.
      
      Add the "altr,sysmgr-syscon" property to all gmac nodes.
      
      This patch fixes:
      
      [    0.917530] socfpga-dwmac ff800000.ethernet: No sysmgr-syscon node found
      [    0.924209] socfpga-dwmac ff800000.ethernet: Unable to parse OF data
      
      Cc: stable@vger.kernel.org
      Reported-by: Ley Foon Tan <ley.foon.tan@intel.com>
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      37222eaf
    • M
      powerpc/kvm: Save and restore host AMR/IAMR/UAMOR · 915c9d0a
      Committed by Michael Ellerman
      [ Upstream commit c3c7470c75566a077c8dc71dcf8f1948b8ddfab4 ]
      
      When the hash MMU is active the AMR, IAMR and UAMOR are used for
      pkeys. The AMR is directly writable by user space, and the UAMOR masks
      those writes, meaning both registers are effectively user register
      state. The IAMR is used to create an execute only key.
      
      Also we must maintain the value of at least the AMR when running in
      process context, so that any memory accesses done by the kernel on
      behalf of the process are correctly controlled by the AMR.
      
      Although we are correctly switching all registers when going into a
      guest, on returning to the host we just write 0 into all regs, except
      on Power9 where we restore the IAMR correctly.
      
      This could be observed by a user process if it writes the AMR, then
      runs a guest and we then return immediately to it without
      rescheduling. Because we have written 0 to the AMR that would have the
      effect of granting read/write permission to pages that the process was
      trying to protect.
      
      In addition, when using the Radix MMU, the AMR can prevent inadvertent
      kernel access to userspace data; writing 0 to the AMR disables that
      protection.
      
      So save and restore AMR, IAMR and UAMOR.
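      
      An illustrative, kernel-style sketch of the intent (the actual patch touches
      the HV guest entry/exit assembly; the helper names here are invented for
      clarity):
      
        struct host_pkey_sprs {
                unsigned long amr, iamr, uamor;
        };
      
        static void save_host_pkey_sprs(struct host_pkey_sprs *s)
        {
                s->amr   = mfspr(SPRN_AMR);     /* user-writable, masked by UAMOR */
                s->iamr  = mfspr(SPRN_IAMR);    /* execute-only key state */
                s->uamor = mfspr(SPRN_UAMOR);
        }
      
        static void restore_host_pkey_sprs(const struct host_pkey_sprs *s)
        {
                /* restore the saved host values instead of writing 0 */
                mtspr(SPRN_AMR, s->amr);
                mtspr(SPRN_IAMR, s->iamr);
                mtspr(SPRN_UAMOR, s->uamor);
        }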
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      915c9d0a
    • P
      x86/kvmclock: set offset for kvm unstable clock · 1d60902a
      Committed by Pavel Tatashin
      [ Upstream commit b5179ec4187251a751832193693d6e474d3445ac ]
      
      VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
      an unstable clock. The problem occurs when a VM is rebooted without exiting
      from qemu.
      
      The fix is to calculate the clock offset not only for the stable clock but for
      the unstable clock as well, and to use kvm_sched_clock_read(), which subtracts
      the offset for both clocks.
      
      This is safe, because pvclock_clocksource_read() does the right thing and
      makes sure that clock always goes forward, so once offset is calculated
      with unstable clock, we won't get new reads that are smaller than offset,
      and thus won't get negative results.
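      
      A rough sketch of the idea (helper names simplified; the offset capture and
      the sched_clock hook-up happen in the kvmclock init path):
      
        static u64 kvm_sched_clock_offset;      /* clock value captured once at boot */
      
        static u64 kvm_sched_clock_read(void)
        {
                /* pvclock reads only move forward, so this never goes negative */
                return kvm_clock_read() - kvm_sched_clock_offset;
        }
      
        static void kvm_sched_clock_init(bool stable)
        {
                kvm_sched_clock_offset = kvm_clock_read();
                /* register kvm_sched_clock_read() as the sched_clock callback for
                 * both the stable and the unstable case (previously stable only) */
        }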
      
      Thank you Jon DeVree for helping to reproduce this issue.
      
      Fixes: 857baa87 ("sched/clock: Enable sched clock early")
      Cc: stable@vger.kernel.org
      Reported-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d60902a
    • S
      KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run · cd490d44
      Committed by Sean Christopherson
      [ Upstream commit 61c08aa9606d4e48a8a50639c956448a720174c3 ]
      
      The vCPU-run asm blob does a manual comparison of a VMCS' launched
      status to execute the correct VM-Enter instruction, i.e. VMLAUNCH vs.
      VMRESUME.  The launched flag is a bool, which is a typedef of _Bool.
      C99 does not define an exact size for _Bool, stating only that it must
      be large enough to hold '0' and '1'.  Most, if not all, compilers use
      a single byte for _Bool, including gcc[1].
      
      Originally, 'launched' was of type 'int' and so the asm blob used 'cmpl'
      to check the launch status.  When 'launched' was moved to be stored on a
      per-VMCS basis, struct vcpu_vmx's "temporary" __launched flag was added
      in order to avoid having to pass the current VMCS into the asm blob.
      The new  '__launched' was defined as a 'bool' and not an 'int', but the
      'cmp' instruction was not updated.
      
      This has not caused any known problems, likely due to compilers aligning
      variables to 4-byte or 8-byte boundaries and KVM zeroing out struct
      vcpu_vmx during allocation.  I.e., vCPU-run accesses "junk" data; it just
      happens to always be zero and so doesn't affect the result.
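      
      For illustration (not from the patch), a small userspace program showing that
      _Bool is one byte and that a four-byte load starting at the flag also picks up
      whatever happens to follow it in memory:
      
        #include <stdio.h>
        #include <string.h>
      
        struct vmcs_like {
                _Bool launched;         /* one byte with gcc/clang */
                char  junk[3];          /* whatever happens to sit next to it */
        };
      
        int main(void)
        {
                struct vmcs_like s;
                unsigned char byte;
                unsigned int dword;
      
                memset(&s, 0xff, sizeof(s));    /* make the neighbours non-zero */
                s.launched = 1;
      
                memcpy(&byte, &s.launched, 1);  /* what a 'cmpb' would see */
                memcpy(&dword, &s.launched, 4); /* what a 'cmpl' would see */
      
                printf("sizeof(_Bool) = %zu\n", sizeof(_Bool));
                printf("1-byte view = 0x%02x, 4-byte view = 0x%08x\n", byte, dword);
                return 0;
        }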
      
      [1] https://gcc.gnu.org/ml/gcc-patches/2000-10/msg01127.html
      
      Fixes: d462b819 ("KVM: VMX: Keep list of loaded VMCSs, instead of vcpus")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cd490d44
    • V
      ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault · 8c6fb55a
      Committed by Vineet Gupta
      [ Upstream commit 4d447455e73b47c43dd35fcc38ed823d3182a474 ]
      
      do_page_fault() forgot to relinquish mmap_sem if a signal arrived while
      handle_mm_fault() was in progress - due to, say, a Ctrl+C or an OOM kill.
      This would later cause a deadlock by acquiring it twice.
      
      This came to light when running the libc testsuite's tst-tls3-malloc test, but
      is likely also the cause of previously seen LTP failures. Lockdep
      clearly showed what the issue was.
      
      | # while true; do ./tst-tls3-malloc ; done
      | Didn't expect signal from child: got `Segmentation fault'
      | ^C
      | ============================================
      | WARNING: possible recursive locking detected
      | 4.17.0+ #25 Not tainted
      | --------------------------------------------
      | tst-tls3-malloc/510 is trying to acquire lock:
      | 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
      |
      |but task is already holding lock:
      |606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
      |
      | other info that might help us debug this:
      |  Possible unsafe locking scenario:
      |
      |       CPU0
      |       ----
      |  lock(&mm->mmap_sem);
      |  lock(&mm->mmap_sem);
      |
      | *** DEADLOCK ***
      |
      
      ------------------------------------------------------------
      What the change does is not obvious (note to myself)
      
      prior code was
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_ERROR
      |           up_read
      |       if user_mode
      |          return	<-- lock still held, this was the BUG
      
      New code
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_RETRY
      |          return       <-- not same case as above, but still OK since
      |                           core mm already relinq lock for FAULT_RETRY
      |    ...
      |
      |   < Now falls through for bug case above >
      |
      |   up_read()		<-- lock relinquished
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8c6fb55a
    • V
      ARC: show_regs: lockdep: re-enable preemption · 96af7d92
      Committed by Vineet Gupta
      [ Upstream commit f731a8e89f8c78985707c626680f3e24c7a60772 ]
      
      The signal handling core calls show_regs() with preemption disabled, which
      on ARC takes mmap_sem for mm/vma access, causing a lockdep splat.
      
      | [ARCLinux]# ./segv-null-ptr
      | potentially unexpected fatal signal 11.
      | BUG: sleeping function called from invalid context at kernel/fork.c:1011
      | in_atomic(): 1, irqs_disabled(): 0, pid: 70, name: segv-null-ptr
      | no locks held by segv-null-ptr/70.
      | CPU: 0 PID: 70 Comm: segv-null-ptr Not tainted 4.18.0+ #69
      |
      | Stack Trace:
      |  arc_unwind_core+0xcc/0x100
      |  ___might_sleep+0x17a/0x190
      |  mmput+0x16/0xb8
      |  show_regs+0x52/0x310
      |  get_signal+0x5ee/0x610
      |  do_signal+0x2c/0x218
      |  resume_user_mode_begin+0x90/0xd8
      
      Work around this by re-enabling preemption temporarily.
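      
      A minimal sketch of the workaround (the real change lives in the ARC
      show_regs() implementation):
      
        void show_regs(struct pt_regs *regs)
        {
                /* the signal core called us with preemption disabled, but we are
                 * about to take mmap_sem, which may sleep */
                preempt_enable();
      
                /* ... take mm->mmap_sem and walk the vma to decode the fault ... */
      
                /* restore the state the caller expects */
                preempt_disable();
        }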
      
      Note that the preemption disabling in core code around show_regs()
      was introduced by commit 3a9f84d3 ("signals, debug: fix BUG: using
      smp_processor_id() in preemptible code in print_fatal_signal()")
      
      to silence a different lockdep splat seen on x86 back in 2009.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      96af7d92
    • R
      powerpc/pkeys: Fix handling of pkey state across fork() · cfbf227e
      Committed by Ram Pai
      [ Upstream commit 2cd4bd192ee94848695c1c052d87913260e10f36 ]
      
      Protection key tracking information is not copied over to the
      mm_struct of the child during fork(). This can cause the child to
      erroneously allocate keys that were already allocated. Any allocated
      execute-only key is lost as well.
      
      Add code, called by dup_mmap(), to copy the pkey state from parent to
      child explicitly.
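      
      A hedged sketch of the idea; the helper name and exact call site are assumed
      here, but the powerpc mm_context fields shown are the ones carrying pkey state:
      
        static inline void arch_dup_pkeys(struct mm_struct *oldmm,
                                          struct mm_struct *mm)
        {
                /* copy the parent's bookkeeping so the child cannot hand out
                 * keys the parent already owns */
                mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
                mm->context.execute_only_pkey   = oldmm->context.execute_only_pkey;
        }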
      
      This problem was originally found by Dave Hansen on x86, which turns
      out to be a problem on powerpc as well.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cfbf227e
    • P
      KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch · d3984e80
      Committed by Paul Mackerras
      [ Upstream commit 234ff0b729ad882d20f7996591a964965647addf ]
      
      Testing has revealed an occasional crash which appears to be caused
      by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv.
      The symptom is a NULL pointer dereference in __find_linux_pte() called
      from kvm_unmap_radix() with kvm->arch.pgtable == NULL.
      
      Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear
      kvm->arch.pgtable (via kvmppc_free_radix()) before setting
      kvm->arch.radix to NULL, and there is nothing to prevent
      kvm_unmap_hva_range_hv() or the other MMU callback functions from
      being called concurrently with kvmppc_switch_mmu_to_hpt() or
      kvmppc_switch_mmu_to_radix().
      
      This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock
      around the assignments to kvm->arch.radix, and makes sure that the
      partition-scoped radix tree or HPT is only freed after changing
      kvm->arch.radix.
      
      This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure
      that the clearing of each rmap array (one per memslot) doesn't happen
      concurrently with use of the array in the kvm_unmap_hva_range_hv()
      or the other MMU callbacks.
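      
      A simplified sketch of the ordering (not the full patch): flip kvm->arch.radix
      under the lock first, and free the old structures only afterwards.
      
        static void switch_mmu_to_hpt_sketch(struct kvm *kvm)
        {
                spin_lock(&kvm->mmu_lock);
                kvm->arch.radix = 0;            /* MMU callbacks now take the HPT path */
                spin_unlock(&kvm->mmu_lock);
      
                /* safe to free only now: no callback can still be walking the
                 * partition-scoped radix tree while kvm->arch.radix reads as set */
                kvmppc_free_radix(kvm);
        }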
      
      Fixes: 18c3640c ("KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d3984e80
    • B
      ARM: davinci: dm644x: define gpio interrupts as separate resources · a4f404af
      Committed by Bartosz Golaszewski
      [ Upstream commit adcf60ce14c8250761af9de907eb6c7d096c26d3 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
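      
      An illustrative sketch of the resource layout (the register address and IRQ
      numbers are placeholders, not the real dm644x values):
      
        static struct resource dm644x_gpio_resources[] = {
                DEFINE_RES_MEM(0x01c67000, 0x1000),     /* GPIO register window */
                DEFINE_RES_IRQ(48),                     /* one entry per bank IRQ ... */
                DEFINE_RES_IRQ(49),
                DEFINE_RES_IRQ(50),
                DEFINE_RES_IRQ(51),
                DEFINE_RES_IRQ(52),
                /* ... where previously a single resource spanned IRQs 48-52 */
        };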
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a4f404af
    • B
      ARM: davinci: dm355: define gpio interrupts as separate resources · 8d6b2b24
      Committed by Bartosz Golaszewski
      [ Upstream commit 27db7baab640ea28d7994eda943fef170e347081 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8d6b2b24
    • B
      ARM: davinci: dm646x: define gpio interrupts as separate resources · d31f2b61
      Committed by Bartosz Golaszewski
      [ Upstream commit 2c9c83491f30afbce25796e185cd4d5e36080e31 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d31f2b61
    • B
      ARM: davinci: dm365: define gpio interrupts as separate resources · 4883e9e6
      Committed by Bartosz Golaszewski
      [ Upstream commit 193c04374e281a56c7d4f96e66d329671945bebe ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4883e9e6
    • B
      ARM: davinci: da8xx: define gpio interrupts as separate resources · 0a6c3bda
      Committed by Bartosz Golaszewski
      [ Upstream commit 58a0afbf4c99ac355df16773af835b919b9432ee ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0a6c3bda
    • V
      x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit · 796469e3
      Committed by Vitaly Kuznetsov
      [ Upstream commit a7c42bb6da6b1b54b2e7bd567636d72d87b10a79 ]
      
      vcpu->arch.pv_eoi is accessible through both HV_X64_MSR_VP_ASSIST_PAGE and
      MSR_KVM_PV_EOI_EN so on migration userspace may try to restore them in any
      order. The values match; however, kvm_lapic_enable_pv_eoi() uses a different
      length: for the Hyper-V case it is the whole struct hv_vp_assist_page, for the
      KVM-native case it is 8. If we restore the KVM-native MSR last, the cache will
      be reinitialized with len=8, so trying to access the VP assist page beyond
      8 bytes with kvm_read_guest_cached() will fail.
      
      Check whether we are re-initializing the cache for the same address and
      preserve the length in case it was greater.
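      
      A hedged sketch of the check (simplified from the MSR handler; treat the names
      as approximate rather than the exact diff):
      
        int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
        {
                u64 addr = data & ~KVM_MSR_ENABLED;
                struct gfn_to_hva_cache *ghc = &vcpu->arch.pv_eoi.data;
                unsigned long new_len = len;
      
                /* same address, shorter request: keep the previously cached length */
                if (addr == ghc->gpa && len <= ghc->len)
                        new_len = ghc->len;
      
                vcpu->arch.pv_eoi.msr_val = data;
                return kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, addr, new_len);
        }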
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      796469e3
    • L
      KVM: hyperv: define VP assist page helpers · cdad0f65
      Committed by Ladi Prosek
      [ Upstream commit 72bbf9358c3676bd89dc4bd8fb0b1f2a11c288fc ]
      
      The state related to the VP assist page is still managed by the LAPIC
      code in the pv_eoi field.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cdad0f65
    • V
      KVM: x86: hyperv: keep track of mismatched VP indexes · b0d9043b
      Committed by Vitaly Kuznetsov
      [ Upstream commit 87ee613d076351950b74383215437f841ebbeb75 ]
      
      In most common cases the VP index of a vCPU matches its vcpu index. Userspace
      is, however, free to set any mapping it wishes, and we need to account for
      that when we need to find a vCPU with a particular VP index. To keep the search
      algorithms optimal in both cases, introduce a 'num_mismatched_vp_indexes'
      counter showing how many vCPUs with a mismatching VP index we have. When
      the counter is zero we can assume vp_index == vcpu_idx.
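      
      A hedged sketch of the bookkeeping; in the real code this logic lives in the
      HV_X64_MSR_VP_INDEX write handler, and the helper below is hypothetical:
      
        static void hv_track_vp_index(struct kvm_hv *hv, struct kvm_vcpu_hv *hv_vcpu,
                                      u32 vcpu_idx, u32 new_vp_index)
        {
                if (new_vp_index == hv_vcpu->vp_index)
                        return;                         /* nothing changes */
      
                /* vcpu_idx is the "matching" default: count moves away from it
                 * and moves back to it */
                if (hv_vcpu->vp_index == vcpu_idx)
                        hv->num_mismatched_vp_indexes++;
                else if (new_vp_index == vcpu_idx)
                        hv->num_mismatched_vp_indexes--;
      
                hv_vcpu->vp_index = new_vp_index;
        }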
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b0d9043b
    • V
      KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables · f031fd03
      Committed by Vitaly Kuznetsov
      [ Upstream commit 1779a39f786397760ae7a7cc03cf37697d8ae58d ]
      
      Rename 'hv' to 'hv_vcpu' in kvm_hv_set_msr/kvm_hv_get_msr(); 'hv' is
      'reserved' for 'struct kvm_hv' variables across the file.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f031fd03
    • V
      KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS · 0b535f7b
      Committed by Vitaly Kuznetsov
      [ Upstream commit 9170200ec0ebad70e5b9902bc93e2b1b11456a3b ]
      
      Hyper-V TLFS (5.0b) states:
      
      > Virtual processors are identified by using an index (VP index). The
      > maximum number of virtual processors per partition supported by the
      > current implementation of the hypervisor can be obtained through CPUID
      > leaf 0x40000005. A virtual processor index must be less than the
      > maximum number of virtual processors per partition.
      
      Forbid userspace from setting VP_INDEX above KVM_MAX_VCPUS. get_vcpu_by_vpidx()
      can now be optimized to bail out early when the supplied vpidx is >= KVM_MAX_VCPUS.
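      
      A simplified sketch of the resulting lookup (field access abbreviated):
      
        static struct kvm_vcpu *get_vcpu_by_vpidx_sketch(struct kvm *kvm, u32 vpidx)
        {
                struct kvm_vcpu *vcpu;
                int i;
      
                if (vpidx >= KVM_MAX_VCPUS)
                        return NULL;                    /* can never match */
      
                vcpu = kvm_get_vcpu(kvm, vpidx);        /* fast path: indexes match */
                if (vcpu && vcpu->arch.hyperv.vp_index == vpidx)
                        return vcpu;
      
                kvm_for_each_vcpu(i, vcpu, kvm)         /* slow path: scan */
                        if (vcpu->arch.hyperv.vp_index == vpidx)
                                return vcpu;
                return NULL;
        }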
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0b535f7b
    • Z
      x86, hibernate: Fix nosave_regions setup for hibernation · 4d970758
      Committed by Zhimin Gu
      [ Upstream commit cc55f7537db6af371e9c1c6a71161ee40f918824 ]
      
      On 32-bit systems, nosave_regions (non-RAM areas) located between
      max_low_pfn and max_pfn are currently not excluded from the hibernation
      snapshot, which may result in a machine check exception when these
      unsafe regions are accessed during hibernation:
      
      [  612.800453] Disabling lock debugging due to kernel taint
      [  612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136
      [  612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560}
      [  612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086
      [  612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24
      [  612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      [  612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt
      [  612.853380] Kernel panic - not syncing: Fatal machine check
      [  612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff)
      
      This is because on 32-bit systems pages above max_low_pfn are regarded
      as high memory, and accessing these unsafe pages can trigger the MCE above.
      On the problematic 32-bit system, there is reserved memory above low
      memory, which triggered the MCE:
      
      e820 memory mapping:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable
      
      Fix this problem by changing the pfn limit from max_low_pfn to max_pfn.
      This fix does not impact 64-bit systems because on 64-bit max_low_pfn
      is the same as max_pfn.
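      
      The change itself is essentially a one-liner; a hedged sketch of the call
      (the exact call site in the x86 setup code is assumed):
      
        /* was: e820__register_nosave_regions(max_low_pfn); */
        e820__register_nosave_regions(max_pfn);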
      Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4d970758
    • D
      riscv: remove unused variable in ftrace · 5f147150
      Committed by David Abdurachmanov
      [ Upstream commit 397182e0db56b8894a43631ce72de14d90a29834 ]
      
      Noticed while building kernel-4.20.0-0.rc5.git2.1.fc30 for
      Fedora 30/RISCV.
      
      [..]
      BUILDSTDERR: arch/riscv/kernel/ftrace.c: In function 'prepare_ftrace_return':
      BUILDSTDERR: arch/riscv/kernel/ftrace.c:135:6: warning: unused variable 'err' [-Wunused-variable]
      BUILDSTDERR:   int err;
      BUILDSTDERR:       ^~~
      [..]
      Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
      Fixes: e949b6db51dc1 ("riscv/function_graph: Simplify with function_graph_enter()")
      Reviewed-by: Olof Johansson <olof@lixom.net>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5f147150
    • D
      arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64 · 6c550a5d
      Committed by Dmitry Voytik
      [ Upstream commit 26e2d7b03ea7ff254bf78305aa44dda62e70b78e ]
      
      After commit ef05bcb60c1a, booting from USB drives is broken.
      Fix this problem by enabling the usb-host regulators at boot time.
      
      Fixes: ef05bcb60c1a ("arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Voytik <voytikd@gmail.com>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6c550a5d
    • C
      powerpc/64: mark start_here_multiplatform as __ref · 7f8b2360
      Committed by Christophe Leroy
      [ Upstream commit 9c4e4c90ec24652921e31e9551fcaedc26eec86d ]
      
      Otherwise, the following warning is encountered:
      
      WARNING: vmlinux.o(.text+0x3dc6): Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup()
      The function start_here_multiplatform() references
      the function __init .early_setup().
      This is often because start_here_multiplatform lacks a __init
      annotation or the annotation of .early_setup is wrong.
      
      Fixes: 56c46bba9bbf ("powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX")
      Cc: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7f8b2360
    • S
      x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace() · 85a24825
      Committed by Steven Rostedt (VMware)
      [ Upstream commit 745cfeaac09ce359130a5451d90cb0bd4094c290 ]
      
      Arnd reported the following compiler warning:
      
      arch/x86/kernel/ftrace.c:669:23: error: 'ftrace_jmp_replace' defined but not used [-Werror=unused-function]
      
      The ftrace_jmp_replace() function now only has a single user and should be
      simply moved by that user. But looking at the code, it shows that
      ftrace_jmp_replace() is similar to ftrace_call_replace() except that instead
      of using the opcode of 0xe8 it uses 0xe9. It makes more sense to consolidate
      that function into one implementation that both ftrace_jmp_replace() and
      ftrace_call_replace() use by passing in the op code separately.
      
      The structure in ftrace_code_union is also modified to replace the "e8"
      field with the more appropriate name "op".
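      
      A hedged sketch of the consolidated helper (struct trimmed down; the offset
      follows the usual rel32 encoding):
      
        union ftrace_code_union {
                char code[MCOUNT_INSN_SIZE];
                struct {
                        unsigned char op;       /* was hard-coded as "e8" */
                        int offset;
                } __attribute__((packed));
        };
      
        static unsigned char *ftrace_text_replace(unsigned char op,
                                                  unsigned long ip, unsigned long addr)
        {
                static union ftrace_code_union calc;
      
                calc.op = op;
                calc.offset = (int)(addr - (ip + MCOUNT_INSN_SIZE));
                return calc.code;
        }
      
        static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
        {
                return ftrace_text_replace(0xe8, ip, addr);     /* call rel32 */
        }
      
        static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
        {
                return ftrace_text_replace(0xe9, ip, addr);     /* jmp rel32 */
        }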
      
      Cc: stable@vger.kernel.org
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Link: http://lkml.kernel.org/r/20190304200748.1418790-1-arnd@arndb.de
      Fixes: d2a68c4effd8 ("x86/ftrace: Do not call function graph from dynamic trampolines")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      85a24825
    • G
      powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction · 47a0f70d
      Committed by Gustavo Romero
      commit 8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 upstream.
      
      When we take an FP unavailable exception in a transaction we have to
      account for the hardware FP TM checkpointed registers being
      incorrect. In this case for this process we know the current and
      checkpointed FP registers must be the same (since FP wasn't used
      inside the transaction) hence in the thread_struct we copy the current
      FP registers to the checkpointed ones.
      
      This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr
      to determine if FP was on when in userspace. thread->ckpt_regs.msr
      represents the state of the MSR when exiting userspace. This is setup
      by check_if_tm_restore_required().
      
      Unfortunately there is an optimisation in giveup_all() which returns
      early if tsk->thread.regs->msr (via local variable `usermsr`) has
      FP=VEC=VSX=SPE=0. This optimisation means that
      check_if_tm_restore_required() is not called and hence
      thread->ckpt_regs.msr is not updated and will contain an old value.
      
      This can happen if due to load_fp=255 we start a userspace process
      with MSR FP=1 and then we are context switched out. In this case
      thread->ckpt_regs.msr will contain FP=1. If that same process is then
      context switched in and load_fp overflows, MSR will have FP=0. If that
      process now enters a transaction and does an FP instruction, the FP
      unavailable will not update thread->ckpt_regs.msr (the bug) and MSR
      FP=1 will be retained in thread->ckpt_regs.msr.  tm_reclaim_thread()
      will then not perform the required memcpy and the checkpointed FP regs
      in the thread struct will contain the wrong values.
      
      The code path for this happening is:
      
             Userspace:                      Kernel
                         Start userspace
                          with MSR FP/VEC/VSX/SPE=0 TM=1
                            < -----
             ...
             tbegin
             bne
             fp instruction
                         FP unavailable
                             ---- >
                                              fp_unavailable_tm()
      					  tm_reclaim_current()
      					    tm_reclaim_thread()
      					      giveup_all()
      					        return early since FP/VMX/VSX=0
      						/* ckpt MSR not updated (Incorrect) */
      					      tm_reclaim()
      					        /* thread_struct ckpt FP regs contain junk (OK) */
                                                    /* Sees ckpt MSR FP=1 (Incorrect) */
      					      no memcpy() performed
      					        /* thread_struct ckpt FP regs not fixed (Incorrect) */
      					  tm_recheckpoint()
      					     /* Put junk in hardware checkpoint FP regs */
                                               ....
                            < -----
                         Return to userspace
                           with MSR TM=1 FP=1
                           with junk in the FP TM checkpoint
             TM rollback
             reads FP junk
      
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      
      This patch moves up check_if_tm_restore_required() in giveup_all() to
      ensure thread->ckpt_regs.msr is updated correctly.
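      
      A simplified sketch of the reordering inside giveup_all() (body abbreviated):
      
        void giveup_all(struct task_struct *tsk)
        {
                unsigned long usermsr;
      
                if (!tsk->thread.regs)
                        return;
      
                /* moved above the early return so ckpt_regs.msr is always updated */
                check_if_tm_restore_required(tsk);
      
                usermsr = tsk->thread.regs->msr;
                if ((usermsr & msr_all_available) == 0)
                        return;                 /* FP=VEC=VSX=SPE=0 fast path */
      
                /* ... give up FP/Altivec/VSX/SPE state as before ... */
        }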
      
      A simple testcase to replicate this will be posted to
      tools/testing/selftests/powerpc/tm/tm-poison.c
      
      Similarly for VMX.
      
      This fixes CVE-2019-15030.
      
      Fixes: f48e91e8 ("powerpc/tm: Fix FP and VMX register corruption")
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      47a0f70d
  2. 10 Sep 2019, 4 commits
  3. 06 Sep 2019, 8 commits
    • G
      x86/ptrace: fix up botched merge of spectrev1 fix · b307f99d
      Committed by Greg Kroah-Hartman
      I incorrectly merged commit 31a2fbb390fe ("x86/ptrace: Fix possible
      spectre-v1 in ptrace_get_debugreg()") when backporting it, as was
      graciously pointed out at
      https://grsecurity.net/teardown_of_a_failed_linux_lts_spectre_fix.php
      
      Resolve the upstream difference with the stable kernel merge to properly
      protect things.
      Reported-by: Brad Spengler <spender@grsecurity.net>
      Cc: Dianzhang Chen <dianzhangchen0@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <bp@alien8.de>
      Cc: <hpa@zytor.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b307f99d
    • A
      KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling · db1841a2
      Committed by Alexey Kardashevskiy
      [ Upstream commit ddfd151f3def9258397fcde7a372205a2d661903 ]
      
      H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from
      a guest. Although we verify correctness of TCEs before we do anything
      with the existing tables, there is a small window when a check in
      kvmppc_tce_validate might pass and right after that the guest alters
      the page of TCEs, causing an early exit from the handler and leaving
      srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap)
      (real mode) locked.
      
      This fixes the bug by jumping to the common exit code with an appropriate
      unlock.
      
      Cc: stable@vger.kernel.org # v4.11+
      Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      db1841a2
    • B
      x86/apic: Include the LDR when clearing out APIC registers · edc454cd
      Committed by Bandan Das
      commit 558682b5291937a70748d36fd9ba757fb25b99ae upstream.
      
      Although APIC initialization will typically clear out the LDR before
      setting it, the APIC cleanup code should reset the LDR.
      
      This was discovered with a 32-bit KVM guest jumping into a kdump
      kernel. The stale bits in the LDR triggered a bug in the KVM APIC
      implementation which caused the destination mapping for VCPUs to be
      corrupted.
      
      Note that this isn't intended to paper over the KVM APIC bug. The kernel
      has to clear the LDR when resetting the APIC registers except when X2APIC
      is enabled.
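      
      A minimal sketch of the added cleanup step (a fragment of the APIC
      register-clearing path):
      
        /* x2APIC mode has a read-only LDR derived from the APIC ID, so only
         * reset it in xAPIC mode */
        if (!x2apic_enabled())
                apic_write(APIC_LDR, 0);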
      
      This lacks a Fixes tag because the missing LDR clear goes way back into
      pre-git history.
      
      [ tglx: Made x2apic_enabled a function call as required ]
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      edc454cd
    • B
      x86/apic: Do not initialize LDR and DFR for bigsmp · 95983265
      Committed by Bandan Das
      commit bae3a8d3308ee69a7dbdf145911b18dfda8ade0d upstream.
      
      Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
      bigsmp APIC implementation uses physical destination mode, but it
      nevertheless initializes LDR and DFR. The LDR even incorrectly ends up with
      multiple bits set.
      
      This does not cause a functional problem because LDR and DFR are ignored
      when physical destination mode is active, but it triggered a problem on a
      32-bit KVM guest which jumps into a kdump kernel.
      
      The multiple bits set unearthed a bug in the KVM APIC implementation. The
      code which creates the logical destination map for VCPUs ignores the
      disabled state of the APIC and ends up overwriting an existing valid entry
      and as a result, APIC calibration hangs in the guest during kdump
      initialization.
      
      Remove the bogus LDR/DFR initialization.
      
      This is not intended to work around the KVM APIC bug. The LDR/DFR
      initialization is wrong on its own.
      
      The issue goes back into pre-git history. The Fixes tag is the commit
      in the bitkeeper import which introduced bigsmp support in 2003.
      
        git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      95983265
    • S
      uprobes/x86: Fix detection of 32-bit user mode · 941d875c
      Committed by Sebastian Mayr
      commit 9212ec7d8357ea630031e89d0d399c761421c83b upstream.
      
      32-bit processes running on a 64-bit kernel are not always detected
      correctly, causing the process to crash when uretprobes are installed.
      
      The reason for the crash is that in_ia32_syscall() is used to determine the
      process's mode, which only works correctly when called from a syscall.
      
      In the case of uretprobes, however, the function is called from an exception
      and always returns 'false' on a 64-bit kernel. As a consequence, this leads to
      corruption of the process's return address.
      
      Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
      is correct in any situation.
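      
      A sketch of the corrected size helper (the uprobes code sizes the saved return
      address from the register frame):
      
        static int sizeof_long(struct pt_regs *regs)
        {
                /* was: return in_ia32_syscall() ? 4 : 8;  -- only valid in syscalls */
                return user_64bit_mode(regs) ? 8 : 4;
        }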
      
      [ tglx: Add a comment and the following historical info ]
      
      This should have been detected by the rename which happened in commit
      
        abfb9498 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")
      
      which states in the changelog:
      
          The is_ia32_task()/is_x32_task() function names are a big misnomer: they
          suggests that the compat-ness of a system call is a task property, which
          is not true, the compatness of a system call purely depends on how it
          was invoked through the system call layer.
          .....
      
      and then it went and blindly renamed every call site.
      
      Sadly enough this was already mentioned here:
      
         8faaed1b ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and
      arch_uretprobe_hijack_return_addr()")
      
      where the changelog says:
      
          TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
          not necessarily mean 32bit. Fortunately syscall-like insns can't be
          probed so it actually works, but it would be better to rename and
          use is_ia32_frame().
      
      and goes all the way back to:
      
          0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      
      Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
      process on a 64bit kernel....
      
      Fixes: 0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      Signed-off-by: Sebastian Mayr <me@sam.st>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      941d875c
    • S
      KVM: x86: Don't update RIP or do single-step on faulting emulation · 3c2b4827
      Committed by Sean Christopherson
      commit 75ee23b30dc712d80d2421a9a547e7ab6e379b44 upstream.
      
      Don't advance RIP or inject a single-step #DB if emulation signals a
      fault.  This logic applies to all state updates that are conditional on
      clean retirement of the emulation instruction, e.g. updating RFLAGS was
      previously handled by commit 38827dbd ("KVM: x86: Do not update
      EFLAGS on faulting emulation").
      
      Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
      ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
      fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
      invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation
      overwriting the #UD with #DB and thus restarting the bad SYSCALL over
      and over.
      
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Fixes: 663f4c61 ("KVM: x86: handle singlestep during emulation")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c2b4827
    • R
      kvm: x86: skip populating logical dest map if apic is not sw enabled · 3ec35109
      Committed by Radim Krcmar
      commit b14c876b994f208b6b95c222056e1deb0a45de0e upstream.
      
      recalculate_apic_map() does not sanitize the LDR and it's possible that
      multiple bits are set. In that case, a previous valid entry
      can potentially be overwritten by an invalid one.
      
      This condition is hit when booting a 32 bit, >8 CPU, RHEL6 guest and then
      triggering a crash to boot a kdump kernel. This is the sequence of
      events:
      1. Linux boots in bigsmp mode and enables PhysFlat; however, it still
      writes to the LDR, which probably will never be used.
      2. However, when booting into kdump, the stale LDR values remain, as
      they are not cleared by the guest and there isn't an APIC reset.
      3. kdump boots with 1 cpu, and uses Logical Destination Mode but the
      logical map has been overwritten and points to an inactive vcpu.
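      
      A hedged sketch of the added guard (a fragment of the map-rebuild loop;
      surrounding code abbreviated):
      
        kvm_for_each_vcpu(i, vcpu, kvm) {
                struct kvm_lapic *apic = vcpu->arch.apic;
      
                if (!kvm_apic_present(vcpu))
                        continue;
                if (!kvm_apic_sw_enabled(apic))
                        continue;       /* a disabled APIC's stale LDR must not
                                         * overwrite a valid logical-map entry */
      
                /* ... fill the physical map, then derive the logical map ... */
        }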
      Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ec35109
    • W
      arm64: cpufeature: Don't treat granule sizes as strict · 8bd54268
      Committed by Will Deacon
      [ Upstream commit 5717fe5ab38f9ccb32718bcb03bea68409c9cce4 ]
      
      If a CPU doesn't support the page size for which the kernel is
      configured, then we will complain and refuse to bring it online. For
      secondary CPUs (and the boot CPU on a system booting with EFI), we will
      also print an error identifying the mismatch.
      
      Consequently, the only time that the cpufeature code can detect a
      granule size mismatch is for a granule other than the one that is
      currently being used. Although we would rather such systems didn't
      exist, we've unfortunately lost that battle and Kevin reports that
      on his amlogic S922X (odroid-n2 board) we end up warning and tainting
      with defconfig because 16k pages are not supported by all of the CPUs.
      
      In such a situation, we don't actually care about the feature mismatch,
      particularly now that KVM only exposes the sanitised view of the CPU
      registers (commit 93390c0a - "arm64: KVM: Hide unsupported AArch64
      CPU features from guests"). Treat the granule fields as non-strict and
      let Kevin run without a tainted kernel.
      
      Cc: Marc Zyngier <maz@kernel.org>
      Reported-by: Kevin Hilman <khilman@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      [catalin.marinas@arm.com: changelog updated with KVM sanitised regs commit]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8bd54268
  4. 29 Aug 2019, 3 commits