1. 26 7月, 2018 1 次提交
    • S
      KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space · 1e175d2e
      Sam Bobroff 提交于
      It is not currently possible to create the full number of possible
      VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
      threads per core than its core stride (or "VSMT mode"). This is
      because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
      even though the VCPU ID is less than KVM_MAX_VCPU_ID.
      
      To address this, "pack" the VCORE ID and XIVE offsets by using
      knowledge of the way the VCPU IDs will be used when there are fewer
      guest threads per core than the core stride. The primary thread of
      each core will always be used first. Then, if the guest uses more than
      one thread per core, these secondary threads will sequentially follow
      the primary in each core.
      
      So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
      VCPUs are being spaced apart, so at least half of each core is empty,
      and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
      into the second half of each core (4..7, in an 8-thread core).
      
      Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
      each core is being left empty, and we can map down into the second and
      third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
      
      Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
      threads are being used and 7/8 of the core is empty, allowing use of
      the 1, 5, 3 and 7 thread slots.
      
      (Strides less than 8 are handled similarly.)
      
      This allows the VCORE ID or offset to be calculated quickly from the
      VCPU ID or XIVE server numbers, without access to the VCPU structure.
      
      [paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE
       to pr_devel, wrapped line, fixed id check.]
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1e175d2e
  2. 18 7月, 2018 4 次提交
    • N
      KVM: PPC: Book3S HV: Fix constant size warning · 0abb75b7
      Nicholas Mc Guire 提交于
      The constants are 64bit but not explicitly declared UL resulting
      in sparse warnings. Fix this by declaring the constants UL.
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0abb75b7
    • N
      KVM: PPC: Book3S HV: Add of_node_put() in success path · 51eaa08f
      Nicholas Mc Guire 提交于
      The call to of_find_compatible_node() is returning a pointer with
      incremented refcount so it must be explicitly decremented after the
      last use. As here it is only being used for checking of node presence
      but the result is not actually used in the success path it can be
      dropped immediately.
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Fixes: commit f725758b ("KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      51eaa08f
    • A
      KVM: PPC: Book3S: Fix matching of hardware and emulated TCE tables · 76346cd9
      Alexey Kardashevskiy 提交于
      When attaching a hardware table to LIOBN in KVM, we match table parameters
      such as page size, table offset and table size. However the tables are
      created via very different paths - VFIO and KVM - and the VFIO path goes
      through the platform code which has minimum TCE page size requirement
      (which is 4K but since we allocate memory by pages and cannot avoid
      alignment anyway, we align to 64k pages for powernv_defconfig).
      
      So when we match the tables, one might be bigger that the other which
      means the hardware table cannot get attached to LIOBN and DMA mapping
      fails.
      
      This removes the table size alignment from the guest visible table.
      This does not affect the memory allocation which is still aligned -
      kvmppc_tce_pages() takes care of this.
      
      This relaxes the check we do when attaching tables to allow the hardware
      table be bigger than the guest visible table.
      
      Ideally we want the KVM table to cover the same space as the hardware
      table does but since the hardware table may use multiple levels, and
      all levels must use the same table size (IODA2 design), the area it can
      actually cover might get very different from the window size which
      the guest requested, even though the guest won't map it all.
      
      Fixes: ca1fc489 "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages"
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      76346cd9
    • S
      KVM: PPC: Remove mmio_vsx_tx_sx_enabled in KVM MMIO emulation · 4eeb8556
      Simon Guo 提交于
      Originally PPC KVM MMIO emulation uses only 0~31#(5 bits) for VSR
      reg number, and use mmio_vsx_tx_sx_enabled field together for
      0~63# VSR regs.
      
      Currently PPC KVM MMIO emulation is reimplemented with analyse_instr()
      assistance.  analyse_instr() returns 0~63 for VSR register number, so
      it is not necessary to use additional mmio_vsx_tx_sx_enabled field
      any more.
      
      This patch extends related reg bits (expand io_gpr to u16 from u8
      and use 6 bits for VSR reg#), so that mmio_vsx_tx_sx_enabled can
      be removed.
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4eeb8556
  3. 24 6月, 2018 1 次提交
    • A
      efi/x86: Fix incorrect invocation of PciIo->Attributes() · 2e6eb40c
      Ard Biesheuvel 提交于
      The following commit:
      
        2c3625cb ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function")
      
      ... merged the two versions of __setup_efi_pciXX(), without taking into
      account that the 32-bit version used a rather dodgy trick to pass an
      immediate 0 constant as argument for a uint64_t parameter.
      
      The issue is caused by the fact that on x86, UEFI protocol method calls
      are redirected via struct efi_config::call(), which is a variadic function,
      and so the compiler has to infer the types of the parameters from the
      arguments rather than from the prototype.
      
      As the 32-bit x86 calling convention passes arguments via the stack,
      passing the unqualified constant 0 twice is the same as passing 0ULL,
      which is why the 32-bit code in __setup_efi_pci32() contained the
      following call:
      
        status = efi_early->call(pci->attributes, pci,
                                 EfiPciIoAttributeOperationGet, 0, 0,
                                 &attributes);
      
      to invoke this UEFI protocol method:
      
        typedef
        EFI_STATUS
        (EFIAPI *EFI_PCI_IO_PROTOCOL_ATTRIBUTES) (
          IN  EFI_PCI_IO_PROTOCOL                     *This,
          IN  EFI_PCI_IO_PROTOCOL_ATTRIBUTE_OPERATION Operation,
          IN  UINT64                                  Attributes,
          OUT UINT64                                  *Result OPTIONAL
          );
      
      After the merge, we inadvertently ended up with this version for both
      32-bit and 64-bit builds, breaking the latter.
      
      So replace the two zeroes with the explicitly typed constant 0ULL,
      which works as expected on both 32-bit and 64-bit builds.
      
      Wilfried tested the 64-bit build, and I checked the generated assembly
      of a 32-bit build with and without this patch, and they are identical.
      Reported-by: NWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Tested-by: NWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hdegoede@redhat.com
      Cc: linux-efi@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2e6eb40c
  4. 23 6月, 2018 6 次提交
  5. 22 6月, 2018 4 次提交
    • M
      kvm: vmx: Nested VM-entry prereqs for event inj. · 0447378a
      Marc Orr 提交于
      This patch extends the checks done prior to a nested VM entry.
      Specifically, it extends the check_vmentry_prereqs function with checks
      for fields relevant to the VM-entry event injection information, as
      described in the Intel SDM, volume 3.
      
      This patch is motivated by a syzkaller bug, where a bad VM-entry
      interruption information field is generated in the VMCS02, which causes
      the nested VM launch to fail. Then, KVM fails to resume L1.
      
      While KVM should be improved to correctly resume L1 execution after a
      failed nested launch, this change is justified because the existing code
      to resume L1 is flaky/ad-hoc and the test coverage for resuming L1 is
      sparse.
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMarc Orr <marcorr@google.com>
      [Removed comment whose parts were describing previous revisions and the
       rest was obvious from function/variable naming. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      0447378a
    • Z
      x86/microcode/intel: Fix memleak in save_microcode_patch() · 0218c766
      Zhenzhong Duan 提交于
      Free useless ucode_patch entry when it's replaced.
      
      [ bp: Drop the memfree_patch() two-liner. ]
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Srinivas REDDY Eeda <srinivas.eeda@oracle.com>
      Link: http://lkml.kernel.org/r/888102f0-fd22-459d-b090-a1bd8a00cb2b@default
      0218c766
    • T
      x86/mce: Fix incorrect "Machine check from unknown source" message · 40c36e27
      Tony Luck 提交于
      Some injection testing resulted in the following console log:
      
        mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
        mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
        mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
        mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
        mce: [Hardware Error]: Run the above through 'mcelog --ascii'
        Kernel panic - not syncing: Machine check from unknown source
      
      This confused everybody because the first line quite clearly shows
      that we found a logged error in "Bank 1", while the last line says
      "unknown source".
      
      The problem is that the Linux code doesn't do the right thing
      for a local machine check that results in a fatal error.
      
      It turns out that we know very early in the handler whether the
      machine check is fatal. The call to mce_no_way_out() has checked
      all the banks for the CPU that took the local machine check. If
      it says we must crash, we can do so right away with the right
      messages.
      
      We do scan all the banks again. This means that we might initially
      not see a problem, but during the second scan find something fatal.
      If this happens we print a slightly different message (so I can
      see if it actually every happens).
      
      [ bp: Remove unneeded severity assignment. ]
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: stable@vger.kernel.org # 4.2
      Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
      40c36e27
    • B
      x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() · 1f74c8a6
      Borislav Petkov 提交于
      mce_no_way_out() does a quick check during #MC to see whether some of
      the MCEs logged would require the kernel to panic immediately. And it
      passes a struct mce where MCi_STATUS gets written.
      
      However, after having saved a valid status value, the next iteration
      of the loop which goes over the MCA banks on the CPU, overwrites the
      valid status value because we're using struct mce as storage instead of
      a temporary variable.
      
      Which leads to MCE records with an empty status value:
      
        mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
        mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}
      
      In order to prevent the loss of the status register value, return
      immediately when severity is a panic one so that we can panic
      immediately with the first fatal MCE logged. This is also the intention
      of this function and not to noodle over the banks while a fatal MCE is
      already logged.
      
      Tony: read the rest of the MCA bank to populate the struct mce fully.
      Suggested-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de
      1f74c8a6
  6. 21 6月, 2018 13 次提交
  7. 20 6月, 2018 9 次提交
  8. 19 6月, 2018 2 次提交