1. 19 9月, 2019 4 次提交
  2. 16 9月, 2019 36 次提交
    • G
      powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts · 569775bd
      Gustavo Romero 提交于
      [ Upstream commit a8318c13e79badb92bc6640704a64cc022a6eb97 ]
      
      When in userspace and MSR FP=0 the hardware FP state is unrelated to
      the current process. This is extended for transactions where if tbegin
      is run with FP=0, the hardware checkpoint FP state will also be
      unrelated to the current process. Due to this, we need to ensure this
      hardware checkpoint is updated with the correct state before we enable
      FP for this process.
      
      Unfortunately we get this wrong when returning to a process from a
      hardware interrupt. A process that starts a transaction with FP=0 can
      take an interrupt. When the kernel returns back to that process, we
      change to FP=1 but with hardware checkpoint FP state not updated. If
      this transaction is then rolled back, the FP registers now contain the
      wrong state.
      
      The process looks like this:
         Userspace:                      Kernel
      
                     Start userspace
                      with MSR FP=0 TM=1
                        < -----
         ...
         tbegin
         bne
                     Hardware interrupt
                         ---- >
                                          <do_IRQ...>
                                          ....
                                          ret_from_except
                                            restore_math()
      				        /* sees FP=0 */
                                              restore_fp()
                                                tm_active_with_fp()
      					    /* sees FP=1 (Incorrect) */
                                                load_fp_state()
                                              FP = 0 -> 1
                        < -----
                     Return to userspace
                       with MSR TM=1 FP=1
                       with junk in the FP TM checkpoint
         TM rollback
         reads FP junk
      
      When returning from the hardware exception, tm_active_with_fp() is
      incorrectly making restore_fp() call load_fp_state() which is setting
      FP=1.
      
      The fix is to remove tm_active_with_fp().
      
      tm_active_with_fp() is attempting to handle the case where FP state
      has been changed inside a transaction. In this case the checkpointed
      and transactional FP state is different and hence we must restore the
      FP state (ie. we can't do lazy FP restore inside a transaction that's
      used FP). It's safe to remove tm_active_with_fp() as this case is
      handled by restore_tm_state(). restore_tm_state() detects if FP has
      been using inside a transaction and will set load_fp and call
      restore_math() to ensure the FP state (checkpoint and transaction) is
      restored.
      
      This is a data integrity problem for the current process as the FP
      registers are corrupted. It's also a security problem as the FP
      registers from one process may be leaked to another.
      
      Similarly for VMX.
      
      A simple testcase to replicate this will be posted to
      tools/testing/selftests/powerpc/tm/tm-poison.c
      
      This fixes CVE-2019-15031.
      
      Fixes: a7771176 ("powerpc: Don't enable FP/Altivec if not checkpointed")
      Cc: stable@vger.kernel.org # 4.15+
      Signed-off-by: NGustavo Romero <gromero@linux.ibm.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      569775bd
    • B
      powerpc/tm: Remove msr_tm_active() · 052bc385
      Breno Leitao 提交于
      [ Upstream commit 5c784c8414fba11b62e12439f11e109fb5751f38 ]
      
      Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
      CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
      returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.
      
      This function is not necessary, since MSR_TM_ACTIVE() just do the same and
      could be used, removing the dualism and simplifying the code.
      
      This patchset remove every instance of msr_tm_active() and replaced it
      by MSR_TM_ACTIVE().
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      052bc385
    • S
      powerpc/mm: Limit rma_size to 1TB when running without HV mode · c4fc7cb9
      Suraj Jitindar Singh 提交于
      [ Upstream commit da0ef93310e67ae6902efded60b6724dab27a5d1 ]
      
      The virtual real mode addressing (VRMA) mechanism is used when a
      partition is using HPT (Hash Page Table) translation and performs real
      mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode
      effective address bits 0:23 are treated as zero (i.e. the access is
      aliased to 0) and the access is performed using an implicit 1TB SLB
      entry.
      
      The size of the RMA (Real Memory Area) is communicated to the guest as
      the size of the first memory region in the device tree. And because of
      the mechanism described above can be expected to not exceed 1TB. In
      the event that the host erroneously represents the RMA as being larger
      than 1TB, guest accesses in real mode to memory addresses above 1TB
      will be aliased down to below 1TB. This means that a memory access
      performed in real mode may differ to one performed in virtual mode for
      the same memory address, which would likely have unintended
      consequences.
      
      To avoid this outcome have the guest explicitly limit the size of the
      RMA to the current maximum, which is 1TB. This means that even if the
      first memory block is larger than 1TB, only the first 1TB should be
      accessed in real mode.
      
      Fixes: c610d65c ("powerpc/pseries: lift RTAS limit for hash")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Tested-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      c4fc7cb9
    • L
      ARM: dts: gemini: Set DIR-685 SPI CS as active low · bab0ff2d
      Linus Walleij 提交于
      [ Upstream commit f90b8fda3a9d72a9422ea80ae95843697f94ea4a ]
      
      The SPI to the display on the DIR-685 is active low, we were
      just saved by the SPI library enforcing active low on everything
      before, so set it as active low to avoid ambiguity.
      
      Link: https://lore.kernel.org/r/20190715202101.16060-1-linus.walleij@linaro.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      bab0ff2d
    • M
      KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation · 3a1b79ad
      Michael Neuling 提交于
      [ Upstream commit 3fefd1cd95df04da67c83c1cb93b663f04b3324f ]
      
      When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The
      code currently sets:
          CR0 <- 00 || MSR[TS]
      but according to the ISA it should be:
          CR0 <-  0 || MSR[TS] || 0
      
      This fixes the bit shift to put the bits in the correct location.
      
      This is a data integrity issue as CR0 is corrupted.
      
      Fixes: 4bb3c7a0 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
      Cc: stable@vger.kernel.org # v4.17+
      Tested-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3a1b79ad
    • P
      KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct · 3ac71806
      Paul Mackerras 提交于
      [ Upstream commit fd0944baad806dfb4c777124ec712c55b714ff51 ]
      
      When the 'regs' field was added to struct kvm_vcpu_arch, the code
      was changed to use several of the fields inside regs (e.g., gpr, lr,
      etc.) but not the ccr field, because the ccr field in struct pt_regs
      is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is
      only 32 bits.  This changes the code to use the regs.ccr field
      instead of cr, and changes the assembly code on 64-bit platforms to
      use 64-bit loads and stores instead of 32-bit ones.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3ac71806
    • W
      KVM: VMX: check CPUID before allowing read/write of IA32_XSS · beeeead9
      Wanpeng Li 提交于
      [ Upstream commit 4d763b168e9c5c366b05812c7bba7662e5ea3669 ]
      
      Raise #GP when guest read/write IA32_XSS, but the CPUID bits
      say that it shouldn't exist.
      
      Fixes: 20300099 (kvm: vmx: add MSR logic for XSAVES)
      Reported-by: NXiaoyao Li <xiaoyao.li@linux.intel.com>
      Reported-by: NTao Xu <tao3.xu@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      beeeead9
    • S
      KVM: VMX: Fix handling of #MC that occurs during VM-Entry · 891011ca
      Sean Christopherson 提交于
      [ Upstream commit beb8d93b3e423043e079ef3dda19dad7b28467a8 ]
      
      A previous fix to prevent KVM from consuming stale VMCS state after a
      failed VM-Entry inadvertantly blocked KVM's handling of machine checks
      that occur during VM-Entry.
      
      Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
      depending on when the #MC is recognoized.  As it pertains to this bug
      fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
      is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
      indicate the VM-Entry failed.
      
      If a machine-check event occurs during a VM entry, one of the following occurs:
       - The machine-check event is handled as if it occurred before the VM entry:
              ...
       - The machine-check event is handled after VM entry completes:
              ...
       - A VM-entry failure occurs as described in Section 26.7. The basic
         exit reason is 41, for "VM-entry failure due to machine-check event".
      
      Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
      vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
      Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
      in a sane fashion and also simplifies vmx_complete_atomic_exit() since
      VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
      
      Fixes: b060ca3b ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      891011ca
    • S
      KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value · 74ce1333
      Sean Christopherson 提交于
      [ Upstream commit d28f4290b53a157191ed9991ad05dffe9e8c0c89 ]
      
      The behavior of WRMSR is in no way dependent on whether or not KVM
      consumes the value.
      
      Fixes: 4566654b ("KVM: vmx: Inject #GP on invalid PAT CR")
      Cc: stable@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      74ce1333
    • P
      KVM: x86: optimize check for valid PAT value · 74fd8aae
      Paolo Bonzini 提交于
      [ Upstream commit 674ea351cdeb01d2740edce31db7f2d79ce6095d ]
      
      This check will soon be done on every nested vmentry and vmexit,
      "parallelize" it using bitwise operations.
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      74fd8aae
    • P
      kvm: Check irqchip mode before assign irqfd · d5f65393
      Peter Xu 提交于
      [ Upstream commit 654f1f13ea56b92bacade8ce2725aea0457f91c0 ]
      
      When assigning kvm irqfd we didn't check the irqchip mode but we allow
      KVM_IRQFD to succeed with all the irqchip modes.  However it does not
      make much sense to create irqfd even without the kernel chips.  Let's
      provide a arch-dependent helper to check whether a specific irqfd is
      allowed by the arch.  At least for x86, it should make sense to check:
      
      - when irqchip mode is NONE, all irqfds should be disallowed, and,
      
      - when irqchip mode is SPLIT, irqfds that are with resamplefd should
        be disallowed.
      
      For either of the case, previously we'll silently ignore the irq or
      the irq ack event if the irqchip mode is incorrect.  However that can
      cause misterious guest behaviors and it can be hard to triage.  Let's
      fail KVM_IRQFD even earlier to detect these incorrect configurations.
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Radim Krčmář <rkrcmar@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d5f65393
    • E
      ARC: mm: SIGSEGV userspace trying to access kernel virtual memory · cacbc853
      Eugeniy Paltsev 提交于
      [ Upstream commit a8c715b4dd73c26a81a9cc8dc792aa715d8b4bb2 ]
      
      As of today if userspace process tries to access a kernel virtual addres
      (0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
      exists, that process hangs instead of being killed with SIGSEGV
      
      Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
      in kernel mode.
      
      And given this, we can also simplify the code a bit. Now a vmalloc fault
      implies kernel mode so its failure (for some reason) can reuse the
      @no_context label and we can remove @bad_area_nosemaphore.
      
      Reproduce user test for original problem:
      
      ------------------------>8-----------------
       #include <stdlib.h>
       #include <stdint.h>
      
       int main(int argc, char *argv[])
       {
       	volatile uint32_t temp;
      
       	temp = *(uint32_t *)(0x70000000);
       }
      ------------------------>8-----------------
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cacbc853
    • E
      ARC: mm: fix uninitialised signal code in do_page_fault · 7edfa9c9
      Eugeniy Paltsev 提交于
      [ Upstream commit 121e38e5acdc8e1e4cdb750fcdcc72f94e420968 ]
      
      Commit 15773ae938d8 ("signal/arc: Use force_sig_fault where
      appropriate") introduced undefined behaviour by leaving si_code
      unitiailized and leaking random kernel values to user space.
      
      Fixes: 15773ae938d8 ("signal/arc: Use force_sig_fault where appropriate")
      Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7edfa9c9
    • E
      signal/arc: Use force_sig_fault where appropriate · 0828438e
      Eric W. Biederman 提交于
      [ Upstream commit 15773ae938d8d93d982461990bebad6e1d7a1830 ]
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0828438e
    • C
      ARM: dts: qcom: ipq4019: enlarge PCIe BAR range · 0a0176f9
      Christian Lamparter 提交于
      [ Upstream commit f3e35357cd460a8aeb48b8113dc4b761a7d5c828 ]
      
      David Bauer reported that the VDSL modem (attached via PCIe)
      on his AVM Fritz!Box 7530 was complaining about not having
      enough space in the BAR. A closer inspection of the old
      qcom-ipq40xx.dtsi pulled from the GL-iNet repository listed:
      
      | qcom,pcie@80000 {
      |	compatible = "qcom,msm_pcie";
      |	reg = <0x80000 0x2000>,
      |	      <0x99000 0x800>,
      |	      <0x40000000 0xf1d>,
      |	      <0x40000f20 0xa8>,
      |	      <0x40100000 0x1000>,
      |	      <0x40200000 0x100000>,
      |	      <0x40300000 0xd00000>;
      |	reg-names = "parf", "phy", "dm_core", "elbi",
      |			"conf", "io", "bars";
      
      Matching the reg-names with the listed reg leads to
      <0xd00000> as the size for the "bars".
      
      Cc: stable@vger.kernel.org
      BugLink: https://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg45212.htmlReported-by: NDavid Bauer <mail@david-bauer.net>
      Signed-off-by: NChristian Lamparter <chunkeey@gmail.com>
      Signed-off-by: NAndy Gross <agross@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0a0176f9
    • N
      ARM: dts: qcom: ipq4019: Fix MSI IRQ type · 445a78ea
      Niklas Cassel 提交于
      [ Upstream commit 97131f85c08e024df49480ed499aae8fb754067f ]
      
      The databook clearly states that the MSI IRQ (msi_ctrl_int) is a level
      triggered interrupt.
      
      The msi_ctrl_int will be high for as long as any MSI status bit is set,
      thus the IRQ type should be set to IRQ_TYPE_LEVEL_HIGH, causing the
      IRQ handler to keep getting called, as long as any MSI status bit is set.
      
      A git grep shows that ipq4019 is the only SoC using snps,dw-pcie that has
      configured this IRQ incorrectly.
      
      Not having the correct IRQ type defined will cause us to lose interrupts,
      which in turn causes timeouts in the PCIe endpoint drivers.
      Signed-off-by: NNiklas Cassel <niklas.cassel@linaro.org>
      Reviewed-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: NAndy Gross <andy.gross@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      445a78ea
    • M
      ARM: dts: qcom: ipq4019: fix PCI range · df1216d8
      Mathias Kresin 提交于
      [ Upstream commit da89f500cb55fb3f19c4b399b46d8add0abbd4d6 ]
      
      The PCI range is invalid and PCI attached devices doen't work.
      Signed-off-by: NMathias Kresin <dev@kresin.me>
      Signed-off-by: NJohn Crispin <john@phrozen.org>
      Signed-off-by: NAndy Gross <andy.gross@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      df1216d8
    • S
      KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · df5d4ea2
      Sean Christopherson 提交于
      [ Upstream commit b68f3cc7d978943fcf85148165b00594c38db776 ]
      
      Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
      trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
      rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
      thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
      
      KVM allows userspace to report long mode support via CPUID, even though
      the guest is all but guaranteed to crash if it actually tries to enable
      long mode.  But, a pure 32-bit guest that is ignorant of long mode will
      happily plod along.
      
      SMM complicates things as 64-bit CPUs use a different SMRAM save state
      area.  KVM handles this correctly for 64-bit kernels, e.g. uses the
      legacy save state map if userspace has hid long mode from the guest,
      but doesn't fare well when userspace reports long mode support on a
      32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
      
      Since the alternative is to crash the guest, e.g. by not loading state
      or explicitly requesting shutdown, unconditionally use the legacy SMRAM
      save state map for 32-bit KVM.  If a guest has managed to get far enough
      to handle SMIs when running under a weird/buggy userspace hypervisor,
      then don't deliberately crash the guest since there are no downsides
      (from KVM's perspective) to allow it to continue running.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      df5d4ea2
    • W
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 7a74d806
      WANG Chao 提交于
      [ Upstream commit 1811d979c71621aafc7b879477202d286f7e863b ]
      
      guest xcr0 could leak into host when MCE happens in guest mode. Because
      do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7a74d806
    • B
      kvm: mmu: Fix overflow on kvm mmu page limit calculation · 163b24b1
      Ben Gardon 提交于
      [ Upstream commit bc8a3d8925a8fa09fa550e0da115d95851ce33c6 ]
      
      KVM bases its memory usage limits on the total number of guest pages
      across all memslots. However, those limits, and the calculations to
      produce them, use 32 bit unsigned integers. This can result in overflow
      if a VM has more guest pages that can be represented by a u32. As a
      result of this overflow, KVM can use a low limit on the number of MMU
      pages it will allocate. This makes KVM unable to map all of guest memory
      at once, prompting spurious faults.
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
      	introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      163b24b1
    • D
      arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's · 37222eaf
      Dinh Nguyen 提交于
      [ Upstream commit 8efd6365417a044db03009724ecc1a9521524913 ]
      
      The gmac ethernet driver uses the "altr,sysmgr-syscon" property to
      configure phy settings for the gmac controller.
      
      Add the "altr,sysmgr-syscon" property to all gmac nodes.
      
      This patch fixes:
      
      [    0.917530] socfpga-dwmac ff800000.ethernet: No sysmgr-syscon node found
      [    0.924209] socfpga-dwmac ff800000.ethernet: Unable to parse OF data
      
      Cc: stable@vger.kernel.org
      Reported-by: NLey Foon Tan <ley.foon.tan@intel.com>
      Signed-off-by: NDinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      37222eaf
    • M
      powerpc/kvm: Save and restore host AMR/IAMR/UAMOR · 915c9d0a
      Michael Ellerman 提交于
      [ Upstream commit c3c7470c75566a077c8dc71dcf8f1948b8ddfab4 ]
      
      When the hash MMU is active the AMR, IAMR and UAMOR are used for
      pkeys. The AMR is directly writable by user space, and the UAMOR masks
      those writes, meaning both registers are effectively user register
      state. The IAMR is used to create an execute only key.
      
      Also we must maintain the value of at least the AMR when running in
      process context, so that any memory accesses done by the kernel on
      behalf of the process are correctly controlled by the AMR.
      
      Although we are correctly switching all registers when going into a
      guest, on returning to the host we just write 0 into all regs, except
      on Power9 where we restore the IAMR correctly.
      
      This could be observed by a user process if it writes the AMR, then
      runs a guest and we then return immediately to it without
      rescheduling. Because we have written 0 to the AMR that would have the
      effect of granting read/write permission to pages that the process was
      trying to protect.
      
      In addition, when using the Radix MMU, the AMR can prevent inadvertent
      kernel access to userspace data, writing 0 to the AMR disables that
      protection.
      
      So save and restore AMR, IAMR and UAMOR.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      915c9d0a
    • P
      x86/kvmclock: set offset for kvm unstable clock · 1d60902a
      Pavel Tatashin 提交于
      [ Upstream commit b5179ec4187251a751832193693d6e474d3445ac ]
      
      VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
      unstable clock. The problem is produced when VM is rebooted without exiting
      from qemu.
      
      The fix is to calculate clock offset not only for stable clock but for
      unstable clock as well, and use kvm_sched_clock_read() which substracts
      the offset for both clocks.
      
      This is safe, because pvclock_clocksource_read() does the right thing and
      makes sure that clock always goes forward, so once offset is calculated
      with unstable clock, we won't get new reads that are smaller than offset,
      and thus won't get negative results.
      
      Thank you Jon DeVree for helping to reproduce this issue.
      
      Fixes: 857baa87 ("sched/clock: Enable sched clock early")
      Cc: stable@vger.kernel.org
      Reported-by: NDominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1d60902a
    • S
      KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run · cd490d44
      Sean Christopherson 提交于
      [ Upstream commit 61c08aa9606d4e48a8a50639c956448a720174c3 ]
      
      The vCPU-run asm blob does a manual comparison of a VMCS' launched
      status to execute the correct VM-Enter instruction, i.e. VMLAUNCH vs.
      VMRESUME.  The launched flag is a bool, which is a typedef of _Bool.
      C99 does not define an exact size for _Bool, stating only that is must
      be large enough to hold '0' and '1'.  Most, if not all, compilers use
      a single byte for _Bool, including gcc[1].
      
      Originally, 'launched' was of type 'int' and so the asm blob used 'cmpl'
      to check the launch status.  When 'launched' was moved to be stored on a
      per-VMCS basis, struct vcpu_vmx's "temporary" __launched flag was added
      in order to avoid having to pass the current VMCS into the asm blob.
      The new  '__launched' was defined as a 'bool' and not an 'int', but the
      'cmp' instruction was not updated.
      
      This has not caused any known problems, likely due to compilers aligning
      variables to 4-byte or 8-byte boundaries and KVM zeroing out struct
      vcpu_vmx during allocation.  I.e. vCPU-run accesses "junk" data, it just
      happens to always be zero and so doesn't affect the result.
      
      [1] https://gcc.gnu.org/ml/gcc-patches/2000-10/msg01127.html
      
      Fixes: d462b819 ("KVM: VMX: Keep list of loaded VMCSs, instead of vcpus")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cd490d44
    • V
      ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault · 8c6fb55a
      Vineet Gupta 提交于
      [ Upstream commit 4d447455e73b47c43dd35fcc38ed823d3182a474 ]
      
      do_page_fault() forgot to relinquish mmap_sem if a signal came while
      handling handle_mm_fault() - due to say a ctl+c or oom etc.
      This would later cause a deadlock by acquiring it twice.
      
      This came to light when running libc testsuite tst-tls3-malloc test but
      is likely also the cause for prior seen LTP failures. Using lockdep
      clearly showed what the issue was.
      
      | # while true; do ./tst-tls3-malloc ; done
      | Didn't expect signal from child: got `Segmentation fault'
      | ^C
      | ============================================
      | WARNING: possible recursive locking detected
      | 4.17.0+ #25 Not tainted
      | --------------------------------------------
      | tst-tls3-malloc/510 is trying to acquire lock:
      | 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
      |
      |but task is already holding lock:
      |606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
      |
      | other info that might help us debug this:
      |  Possible unsafe locking scenario:
      |
      |       CPU0
      |       ----
      |  lock(&mm->mmap_sem);
      |  lock(&mm->mmap_sem);
      |
      | *** DEADLOCK ***
      |
      
      ------------------------------------------------------------
      What the change does is not obvious (note to myself)
      
      prior code was
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_ERROR
      |           up_read
      |       if user_mode
      |          return	<-- lock still held, this was the BUG
      
      New code
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_RETRY
      |          return       <-- not same case as above, but still OK since
      |                           core mm already relinq lock for FAULT_RETRY
      |    ...
      |
      |   < Now falls through for bug case above >
      |
      |   up_read()		<-- lock relinquished
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8c6fb55a
    • V
      ARC: show_regs: lockdep: re-enable preemption · 96af7d92
      Vineet Gupta 提交于
      [ Upstream commit f731a8e89f8c78985707c626680f3e24c7a60772 ]
      
      signal handling core calls show_regs() with preemption disabled which
      on ARC takes mmap_sem for mm/vma access, causing lockdep splat.
      
      | [ARCLinux]# ./segv-null-ptr
      | potentially unexpected fatal signal 11.
      | BUG: sleeping function called from invalid context at kernel/fork.c:1011
      | in_atomic(): 1, irqs_disabled(): 0, pid: 70, name: segv-null-ptr
      | no locks held by segv-null-ptr/70.
      | CPU: 0 PID: 70 Comm: segv-null-ptr Not tainted 4.18.0+ #69
      |
      | Stack Trace:
      |  arc_unwind_core+0xcc/0x100
      |  ___might_sleep+0x17a/0x190
      |  mmput+0x16/0xb8
      |  show_regs+0x52/0x310
      |  get_signal+0x5ee/0x610
      |  do_signal+0x2c/0x218
      |  resume_user_mode_begin+0x90/0xd8
      
      Workaround by re-enabling preemption temporarily.
      
      Note that the preemption disabling in core code around show_regs()
      was introduced by commit 3a9f84d3 ("signals, debug: fix BUG: using
      smp_processor_id() in preemptible code in print_fatal_signal()")
      
      to silence a differnt lockdep seen on x86 bakc in 2009.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      96af7d92
    • R
      powerpc/pkeys: Fix handling of pkey state across fork() · cfbf227e
      Ram Pai 提交于
      [ Upstream commit 2cd4bd192ee94848695c1c052d87913260e10f36 ]
      
      Protection key tracking information is not copied over to the
      mm_struct of the child during fork(). This can cause the child to
      erroneously allocate keys that were already allocated. Any allocated
      execute-only key is lost aswell.
      
      Add code; called by dup_mmap(), to copy the pkey state from parent to
      child explicitly.
      
      This problem was originally found by Dave Hansen on x86, which turns
      out to be a problem on powerpc aswell.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Reviewed-by: NThiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: NRam Pai <linuxram@us.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cfbf227e
    • P
      KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch · d3984e80
      Paul Mackerras 提交于
      [ Upstream commit 234ff0b729ad882d20f7996591a964965647addf ]
      
      Testing has revealed an occasional crash which appears to be caused
      by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv.
      The symptom is a NULL pointer dereference in __find_linux_pte() called
      from kvm_unmap_radix() with kvm->arch.pgtable == NULL.
      
      Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear
      kvm->arch.pgtable (via kvmppc_free_radix()) before setting
      kvm->arch.radix to NULL, and there is nothing to prevent
      kvm_unmap_hva_range_hv() or the other MMU callback functions from
      being called concurrently with kvmppc_switch_mmu_to_hpt() or
      kvmppc_switch_mmu_to_radix().
      
      This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock
      around the assignments to kvm->arch.radix, and makes sure that the
      partition-scoped radix tree or HPT is only freed after changing
      kvm->arch.radix.
      
      This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure
      that the clearing of each rmap array (one per memslot) doesn't happen
      concurrently with use of the array in the kvm_unmap_hva_range_hv()
      or the other MMU callbacks.
      
      Fixes: 18c3640c ("KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d3984e80
    • B
      ARM: davinci: dm644x: define gpio interrupts as separate resources · a4f404af
      Bartosz Golaszewski 提交于
      [ Upstream commit adcf60ce14c8250761af9de907eb6c7d096c26d3 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a4f404af
    • B
      ARM: davinci: dm355: define gpio interrupts as separate resources · 8d6b2b24
      Bartosz Golaszewski 提交于
      [ Upstream commit 27db7baab640ea28d7994eda943fef170e347081 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8d6b2b24
    • B
      ARM: davinci: dm646x: define gpio interrupts as separate resources · d31f2b61
      Bartosz Golaszewski 提交于
      [ Upstream commit 2c9c83491f30afbce25796e185cd4d5e36080e31 ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d31f2b61
    • B
      ARM: davinci: dm365: define gpio interrupts as separate resources · 4883e9e6
      Bartosz Golaszewski 提交于
      [ Upstream commit 193c04374e281a56c7d4f96e66d329671945bebe ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4883e9e6
    • B
      ARM: davinci: da8xx: define gpio interrupts as separate resources · 0a6c3bda
      Bartosz Golaszewski 提交于
      [ Upstream commit 58a0afbf4c99ac355df16773af835b919b9432ee ]
      
      Since commit eb3744a2 ("gpio: davinci: Do not assume continuous
      IRQ numbering") the davinci GPIO driver fails to probe if we boot
      in legacy mode from any of the board files. Since the driver now
      expects every interrupt to be defined as a separate resource, split
      the definition of IRQ resources instead of having a single continuous
      interrupt range.
      
      Fixes: eb3744a2 ("gpio: davinci: Do not assume continuous IRQ numbering")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0a6c3bda
    • V
      x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit · 796469e3
      Vitaly Kuznetsov 提交于
      [ Upstream commit a7c42bb6da6b1b54b2e7bd567636d72d87b10a79 ]
      
      vcpu->arch.pv_eoi is accessible through both HV_X64_MSR_VP_ASSIST_PAGE and
      MSR_KVM_PV_EOI_EN so on migration userspace may try to restore them in any
      order. Values match, however, kvm_lapic_enable_pv_eoi() uses different
      length: for Hyper-V case it's the whole struct hv_vp_assist_page, for KVM
      native case it is 8. In case we restore KVM-native MSR last cache will
      be reinitialized with len=8 so trying to access VP assist page beyond
      8 bytes with kvm_read_guest_cached() will fail.
      
      Check if we re-initializing cache for the same address and preserve length
      in case it was greater.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      796469e3
    • L
      KVM: hyperv: define VP assist page helpers · cdad0f65
      Ladi Prosek 提交于
      [ Upstream commit 72bbf9358c3676bd89dc4bd8fb0b1f2a11c288fc ]
      
      The state related to the VP assist page is still managed by the LAPIC
      code in the pv_eoi field.
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cdad0f65
    • V
      KVM: x86: hyperv: keep track of mismatched VP indexes · b0d9043b
      Vitaly Kuznetsov 提交于
      [ Upstream commit 87ee613d076351950b74383215437f841ebbeb75 ]
      
      In most common cases VP index of a vcpu matches its vcpu index. Userspace
      is, however, free to set any mapping it wishes and we need to account for
      that when we need to find a vCPU with a particular VP index. To keep search
      algorithms optimal in both cases introduce 'num_mismatched_vp_indexes'
      counter showing how many vCPUs with mismatching VP index we have. In case
      the counter is zero we can assume vp_index == vcpu_idx.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b0d9043b