1. 05 Feb 2020, 1 commit
    • kvm: x86: Introduce APICv inhibit reason bits · 4e19c36f
      Suravee Suthikulpanit authored
      There are several cases in which a VM needs to deactivate APICv,
      e.g. when APICv is disabled via a module parameter at load time, or
      when Hyper-V SynIC support is enabled. Additional inhibit reasons
      will be introduced later on when dynamic APICv is supported.

      Introduce KVM APICv inhibit reason bits along with a new variable,
      apicv_inhibit_reasons, to help keep track of the APICv state for
      each VM.

      Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate
      the case where APICv is disabled during KVM module load
      (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0).
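
      A minimal sketch of the shape this takes (names follow the commit
      text; the surrounding struct layout is abbreviated):

        /* Bit 0 marks "APICv disabled at module load". */
        #define APICV_INHIBIT_REASON_DISABLE    0

        struct kvm_arch {
            /* ... other fields elided ... */
            unsigned long apicv_inhibit_reasons;  /* active inhibit reasons */
        };

        /* APICv is usable only while no inhibit reason is set. */
        static inline bool kvm_apicv_activated(struct kvm *kvm)
        {
            return READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0;
        }
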
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4e19c36f
  2. 31 Jan 2020, 2 commits
  3. 28 Jan 2020, 3 commits
  4. 24 Jan 2020, 5 commits
    • KVM: x86: Move kvm_vcpu_init() invocation to common code · 987b2594
      Sean Christopherson authored
      Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate
      step to removing kvm_cpu_{un}init() altogether.
      
      Note, VMX's alloc_apic_access_page() and init_rmode_identity_map() are
      per-VM allocations and are intentionally kept if vCPU creation fails.
      They are freed by kvm_arch_destroy_vm().
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      987b2594
    • KVM: x86: Allocate vcpu struct in common x86 code · a9dd6f09
      Sean Christopherson authored
      Move allocation of VMX and SVM vcpus to common x86.  Although the struct
      being allocated is technically a VMX/SVM struct, it can be interpreted
      directly as a 'struct kvm_vcpu' because of the pre-existing requirement
      that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu
      struct.
      
      Remove the message from the build-time assertions regarding placement of
      the struct, as compatibility with the arch usercopy region is no longer
      the only thing that depends on 'struct kvm_vcpu' being at offset zero.
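
      For illustration, the invariant the common allocator relies on can be
      asserted at build time roughly like this (a sketch, not necessarily
      the exact assertions in the tree):

        /*
         * 'struct kvm_vcpu' must stay at offset 0 of the vendor struct so
         * that the common allocation can be interpreted as either type.
         */
        BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0);
        BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
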
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a9dd6f09
    • x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction · 232bb01b
      Dave Jiang authored
      With the introduction of the MOVDIR64B instruction, there is now an instruction
      that can write 64 bytes of data atomically.
      
      Quoting from Intel SDM:
      "There is no atomicity guarantee provided for the 64-byte load operation
      from source address, and processor implementations may use multiple
      load operations to read the 64-bytes. The 64-byte direct-store issued
      by MOVDIR64B guarantees 64-byte write-completion atomicity. This means
      that the data arrives at the destination in a single undivided 64-byte
      write transaction."
      
      We have identified at least 3 different use cases for this instruction in
      the format of func(dst, src, count):
      1) Clear poison / Initialize MKTME memory
         @dst is normal memory.
         @src is normal memory. Does not increment. (Copy same line to all
         targets)
         @count (to clear/init multiple lines)
      2) Submit command(s) to new devices
         @dst is a special MMIO region for a device. Does not increment.
         @src is normal memory. Increments.
         @count usually is 1, but can be multiple.
      3) Copy to iomem in big chunks
         @dst is iomem and increments
         @src is normal memory and increments
         @count is number of chunks to copy
      
      Add support for case #2 to support devices that accept commands via
      this instruction. We provide a @count in order to submit a batch of
      preprogrammed descriptors in virtually contiguous memory. This
      allows the caller to submit multiple descriptors to a device with a single
      submission. The special device requires the entire 64-byte descriptor to
      be written atomically and accepts the MOVDIR64B instruction.
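
      A sketch of the helper's shape for case #2 (destination is a fixed
      MMIO portal, source increments; treat as illustrative rather than
      the exact code):

        static inline void iosubmit_cmds512(void __iomem *dst, const void *src,
                                            size_t count)
        {
            const u8 *from = src;
            const u8 *end = from + count * 64;

            while (from < end) {
                /* MOVDIR64B: atomic 64-byte store from [rdx] (@from)
                 * to the portal whose address is in rax (@dst). */
                asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
                             : : "a" (dst), "d" (from) : "memory");
                from += 64;   /* @src increments, @dst does not */
            }
        }
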
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/157965022175.73301.10174614665472962675.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      232bb01b
    • x86/mpx: remove MPX from arch/x86 · 45fc24e8
      Dave Hansen authored
      From: Dave Hansen <dave.hansen@linux.intel.com>
      
      MPX is being removed from the kernel due to a lack of support
      in the toolchain going forward (gcc).
      
      This removes all the remaining (dead at this point) MPX handling
      code in the tree.  The only code left is the XSAVE support for MPX
      state, which is currently needed for KVM to handle VMs which might
      use MPX.
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      45fc24e8
    • mm: remove arch_bprm_mm_init() hook · 42222eae
      Dave Hansen authored
      From: Dave Hansen <dave.hansen@linux.intel.com>
      
      MPX is being removed from the kernel due to a lack of support
      in the toolchain going forward (gcc).
      
      arch_bprm_mm_init() is used at execve() time.  The only non-stub
      implementation is on x86 for MPX.  Remove the hook entirely from
      all architectures and generic code.
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-arch@vger.kernel.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      42222eae
  5. 23 Jan 2020, 8 commits
  6. 21 Jan 2020, 3 commits
    • KVM: x86: Add dedicated emulator helpers for querying CPUID features · 5ae78e95
      Sean Christopherson authored
      Add feature-specific helpers for querying guest CPUID support from the
      emulator instead of having the emulator do a full CPUID and perform its
      own bit tests.  The primary motivation is to eliminate the emulator's
      usage of bit() so that future patches can add more extensive build-time
      assertions on the usage of bit() without having to expose yet more code
      to the emulator.
      
      Note, providing a generic guest_cpuid_has() to the emulator doesn't work
      due to the existing build-time assertions in guest_cpuid_has(), which
      require the feature being checked to be a compile-time constant.
      
      No functional change intended.
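
      The helpers take roughly this shape (a sketch; the wrappers live in
      common x86 code, where the feature is a compile-time constant, and
      are exposed to the emulator through its ops table):

        static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
        {
            return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
        }

        static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
        {
            return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
        }
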
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5ae78e95
    • KVM: Fix some writing mistakes · 311497e0
      Miaohe Lin authored
      Fix some writing mistakes in the comments.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      311497e0
    • KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath · 1e9e2622
      Wanpeng Li authored
      In our product observation, ICR and TSCDEADLINE MSR writes cause the
      bulk of MSR-write vmexits; multicast IPIs are not as common as unicast
      IPIs such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR.

      This patch introduces a mechanism to handle certain performance-critical
      WRMSRs at a very early stage of the KVM VMExit handler.

      The mechanism is specifically used for accelerating writes to the x2APIC
      ICR that attempt to send a virtual IPI with physical destination mode,
      fixed delivery mode and a single target, which was found to be one of
      the main causes of VMExits for Linux workloads.

      The mechanism significantly reduces the latency of such virtual IPIs
      because the physical IPI is sent to the target vCPU at a very early
      stage of the KVM VMExit handler, before host interrupts are enabled and
      before expensive operations such as reacquiring KVM's SRCU lock.
      Latency is reduced even further when KVM is able to use the APICv
      posted-interrupt mechanism (which delivers the virtual IPI directly to
      the target vCPU without the need to kick it on the host).

      Testing on a Xeon Skylake server:

      The virtual IPI latency from send on the sender to receive on the
      receiver is reduced by more than 200 CPU cycles.
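
      A sketch of the fastpath test (condition bits as described above:
      fixed delivery mode, physical destination, no shorthand; treat as
      illustrative):

        /* Sketch: runs with host IRQs still disabled, right after VM-exit. */
        static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu,
                                                         u64 data)
        {
            if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic))
                return 1;

            if (((data & APIC_SHORT_MASK) == APIC_DEST_NOSHORT) &&
                ((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
                ((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {
                kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
                return kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
            }

            return 1;   /* not eligible: fall back to the full exit path */
        }
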
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1e9e2622
  7. 20 Jan 2020, 2 commits
    • efi/x86: Limit EFI old memory map to SGI UV machines · 1f299fad
      Ard Biesheuvel authored
      We carry a quirk in the x86 EFI code to switch back to an older
      method of mapping the EFI runtime services memory regions, because
      it was deemed risky at the time to implement a new method without
      providing a fallback to the old method in case problems arose.
      
      Such problems did arise, but they appear to be limited to SGI UV1
      machines, and so these are the only ones for which the fallback gets
      enabled automatically (via a DMI quirk). The fallback can be enabled
      manually as well, by passing efi=old_map, but there is very little
      evidence that suggests that this is something that is being relied
      upon in the field.
      
      Given that UV1 support is not enabled by default by the distros
      (Ubuntu, Fedora), there is no point in carrying this fallback code
      all the time if there are no other users. So let's move it into the
      UV support code, and document that efi=old_map now requires this
      support code to be enabled.
      
      Note that efi=old_map has been used in the past on other SGI UV
      machines to work around kernel regressions in production, so we
      keep the option to enable it by hand, but only if the kernel was
      built with UV support.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
      1f299fad
    • efi/libstub/x86: Use const attribute for efi_is_64bit() · 796eb8d2
      Ard Biesheuvel authored
      Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
      function with the 'const' attribute, which permits the compiler to
      optimize away any redundant calls. Since we have two different entry
      points for 32 and 64 bit firmware in the startup code, this also
      simplifies the C code since we'll enter it with the efi_is64 variable
      already set.
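
      Roughly, the effect is the following (a sketch under the assumptions
      above, not the exact code; the 'const' attribute tells the compiler
      the result never changes, so repeated calls can be folded):

        static __attribute__((const)) bool efi_is_64bit(void)
        {
            extern const bool efi_is64;  /* set by startup code before C entry */

            if (IS_ENABLED(CONFIG_EFI_MIXED))
                return efi_is64;
            return IS_ENABLED(CONFIG_X86_64);
        }
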
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org
      796eb8d2
  8. 17 Jan 2020, 1 commit
  9. 14 Jan 2020, 9 commits
    • x86/vdso: Add time namespace page · 550a77a7
      Dmitry Safonov authored
      To support time namespaces in the VDSO with a minimal impact on regular non
      time namespace affected tasks, the namespace handling needs to be hidden in
      a slow path.
      
      The most obvious place is vdso_seq_begin(). If a task belongs to a time
      namespace then the VVAR page which contains the system wide VDSO data is
      replaced with a namespace specific page which has the same layout as the
      VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
      and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
      namespace handling path.
      
      The extra check in the case that vdso_data->seq is odd, i.e. a concurrent
      update of the VDSO data is in progress, does not really affect regular
      tasks which are not part of a time namespace, as the task spin-waits
      for the update to finish and vdso_data->seq to become even again.
      
      If a time namespace task hits that code path, it invokes the corresponding
      time getter function which retrieves the real VVAR page, reads host time
      and then adds the offset for the requested clock which is stored in the
      special VVAR page.
      
      Allocate the time namespace page among VVAR pages and place vdso_data on
      it.  Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
      code-relative position of VVARs on that special page.
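
      A sketch of how the reader loop picks up the namespace case (generic
      vDSO style; do_hres_timens() is the namespace-handling getter described
      above, and the regular path is elided):

        static __always_inline int do_hres(const struct vdso_data *vd,
                                           clockid_t clk,
                                           struct __kernel_timespec *ts)
        {
            u32 seq;

            do {
                /*
                 * Spins while vd->seq is odd. The timens VVAR page pins
                 * seq to 1, so timens tasks always enter the slow path.
                 */
                seq = vdso_read_begin(vd);
                if (IS_ENABLED(CONFIG_TIME_NS) &&
                    vd->clock_mode == VCLOCK_TIMENS)
                    return do_hres_timens(vd, clk, ts);
                /* ... regular high-resolution read elided ... */
            } while (unlikely(vdso_read_retry(vd, seq)));

            return 0;
        }
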
      Co-developed-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com
      
      550a77a7
    • x86/vdso: Provide vdso_data offset on vvar_page · 64b302ab
      Dmitry Safonov authored
      VDSO support for time namespaces needs to set up a page with the same
      layout as VVAR. That timens page will be placed at the position of the
      VVAR page inside the namespace. That page has vdso_data->seq set to 1 to enforce
      the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce
      the time namespace handling path.
      
      To prepare the time namespace page the kernel needs to know the vdso_data
      offset.  Provide arch_get_vdso_data() helper for locating vdso_data on VVAR
      page.
      Co-developed-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
      
      64b302ab
    • x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK · 0b5c1233
      Vincenzo Frascino authored
      VDSO_HAS_32BIT_FALLBACK has been removed from the core since
      the architectures that support the generic vDSO library have
      been converted to support the 32 bit fallbacks.
      
      Remove unused VDSO_HAS_32BIT_FALLBACK from x86 vdso.
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com
      
      0b5c1233
    • perf/x86: Provide stubs of KVM helpers for non-Intel CPUs · 616c59b5
      Sean Christopherson authored
      Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when
      building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n.  Lack of
      stubs is not currently a problem as the only user, KVM_INTEL, takes a
      dependency on CPU_SUP_INTEL=y.  Provide the stubs for all CPUs so that
      KVM_INTEL can be built for any CPU with compatible hardware support,
      e.g. Centaur and Zhaoxin CPUs.
      
      Note, the existing stub for perf_guest_get_msrs() is essentially dead
      code as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees
      the full implementation is built.
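
      The stubs are straightforward (a sketch of the CPU_SUP_INTEL=n
      variants):

        #ifdef CONFIG_CPU_SUP_INTEL
        extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
        extern void intel_pt_handle_vmx(int on);
        #else
        static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
        {
            *nr = 0;
            return NULL;   /* no MSRs to switch on non-Intel CPUs */
        }
        static inline void intel_pt_handle_vmx(int on)
        {
        }
        #endif
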
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-19-sean.j.christopherson@intel.com
      616c59b5
    • KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits · b39033f5
      Sean Christopherson authored
      Define the VMCS execution control flags (consumed by KVM) using their
      associated VMX_FEATURE_* to provide a strong hint that new VMX features
      are expected to be added to VMX_FEATURE and considered for reporting via
      /proc/cpuinfo.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-18-sean.j.christopherson@intel.com
      b39033f5
    • x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured · 85c17291
      Sean Christopherson authored
      Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether
      IA32_FEAT_CTL has been initialized.  This will allow KVM, and any future
      subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures
      to query platform support, e.g. allows a future patch to remove KVM's
      manual IA32_FEAT_CTL MSR checks.
      
      Various features (on platforms that support IA32_FEAT_CTL) are dependent
      on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE.  The
      MSR is always configured during boot, but only if the CPU vendor is
      recognized by the kernel.  Because CPUID doesn't incorporate the current
      IA32_FEAT_CTL value in its reporting of relevant features, it's possible
      for a feature to be reported as supported in cpufeatures but not truly
      enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize
      the CPU.
      
      As a result, without the flag, KVM would see VMX as supported even if
      IA32_FEAT_CTL hasn't been initialized, and so would need to manually
      read the MSR and check the various enabling bits to avoid taking an
      unexpected #GP on VMXON.
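
      With the flag in place, a support check can rely purely on
      cpufeatures. A hypothetical caller fragment:

        /*
         * Hypothetical check: VMX is only trustworthy if the kernel also
         * configured and locked IA32_FEAT_CTL during boot.
         */
        if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
            !boot_cpu_has(X86_FEATURE_VMX))
            return -EOPNOTSUPP;
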
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com
      85c17291
    • x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs · b47ce1fe
      Sean Christopherson authored
      Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
      the capabilities during IA32_FEAT_CTL MSR initialization.
      
      Make the VMX capabilities dependent on IA32_FEAT_CTL and
      X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
      possibly support VMX, or when /proc/cpuinfo is not available.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
      b47ce1fe
    • x86/vmx: Introduce VMX_FEATURES_* · 15934878
      Sean Christopherson authored
      Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually
      supplant the synthetic VMX flags defined in cpufeatures word 8.  Use the
      Intel-defined layouts for the major VMX execution controls so that their
      word entries can be directly populated from their respective MSRs, and
      so that the VMX_FEATURE_* flags can be used to define the existing bit
      definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE
      flag when adding support for a new hardware feature.
      
      The majority of Intel's (and compatible CPU's) VMX capabilities are
      enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't
      naturally provide any insight into the virtualization capabilities of
      VMX enabled CPUs.  Commit
      
        e38e05a8 ("x86: extended "flags" to show virtualization HW feature
      		 in /proc/cpuinfo")
      
      attempted to address the issue by synthesizing select VMX features into
      a Linux-defined word in cpufeatures.
      
      Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic
      because there is no sane way for a user to query the capabilities of
      their platform, e.g. when trying to find a platform to test a feature or
      debug an issue that has a hardware dependency.  Lack of reporting is
      especially problematic when the user isn't familiar with VMX, e.g. the
      format of the MSRs is non-standard, existence of some MSRs is reported
      by bits in other MSRs, several "features" from KVM's point of view are
      enumerated as 3+ distinct features by hardware, etc...
      
      The synthetic cpufeatures approach has several flaws:
      
        - The set of synthesized VMX flags has become extremely stale with
          respect to the full set of VMX features, e.g. only one new flag
          (EPT A/D) has been added in the decade since the introduction of
          the synthetic VMX features.  Failure to keep the VMX flags up to
          date is likely due to the lack of a mechanism that forces developers
          to consider whether or not a new feature is worth reporting.
      
        - The synthetic flags may be misinterpreted as affecting kernel
          behavior; in reality, KVM, the kernel's sole consumer of VMX,
          completely ignores the synthetic flags.
      
        - New CPU vendors that support VMX have duplicated the hideous code
          that propagates VMX features from MSRs to cpufeatures.  Bringing the
          synthetic VMX flags up to date would exacerbate the copy+paste
          trainwreck.
      
      Define separate VMX_FEATURE flags to set the stage for enumerating VMX
      capabilities outside of the cpu_has() framework, and for adding
      functional usage of VMX_FEATURE_* to help ensure the features reported
      via /proc/cpuinfo are up to date with respect to kernel recognition of
      VMX capabilities.
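
      The flags are laid out per-word so that they can be populated directly
      from the control MSRs, roughly like this (a sketch; the bit positions
      shown follow the Intel-defined control layouts):

        /* Word 0: pin-based VM-execution controls */
        #define VMX_FEATURE_INTR_EXITING  ( 0*32 + 0)  /* VM-exit on external interrupt */
        #define VMX_FEATURE_NMI_EXITING   ( 0*32 + 3)  /* VM-exit on NMI */

        /* Word 2: secondary processor-based controls */
        #define VMX_FEATURE_EPT           ( 2*32 + 1)  /* Extended Page Tables */
        #define VMX_FEATURE_VPID          ( 2*32 + 5)  /* Virtual Processor IDs */
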
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com
      15934878
    • x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db
      Sean Christopherson authored
      As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
      are quite a mouthful, especially the VMX bits which must differentiate
      between enabling VMX inside and outside SMX (TXT) operation.  Rename the
      MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
      make them a little friendlier on the eyes.
      
      Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
      to match Intel's SDM, but a future patch will add a dedicated Kconfig,
      file and functions for the MSR. Using the full name for those assets is
      rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
      nomenclature is consistent throughout the kernel.
      
      Opportunistically, fix a few other annoyances with the defines:
      
        - Relocate the bit defines so that they immediately follow the MSR
          define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
        - Add whitespace around the block of feature control defines to make
          it clear they're all related.
        - Use BIT() instead of manually encoding the bit shift.
        - Use "VMX" instead of "VMXON" to match the SDM.
        - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
          be consistent with the kernel's verbiage used for all other feature
          control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
          likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
          the (literal) one-off usage of _ON, the SDM is simply "wrong".
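
      After the rename, the defines look roughly like this:

        #define MSR_IA32_FEAT_CTL                   0x0000003a
        #define FEAT_CTL_LOCKED                     BIT(0)
        #define FEAT_CTL_VMX_ENABLED_INSIDE_SMX     BIT(1)
        #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX    BIT(2)
        #define FEAT_CTL_LMCE_ENABLED               BIT(20)
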
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
      32ad73db
  10. 13 Jan 2020, 1 commit
    • x86/mce: Take action on UCNA/Deferred errors again · 8438b84a
      Jan H. Schönherr authored
      Commit
      
        fa92c586 ("x86, mce: Support memory error recovery for both UCNA
      		and Deferred error in machine_check_poll")
      
      added handling of UCNA and Deferred errors by adding them to the ring
      for SRAO errors.
      
      Later, commit
      
        fd4cf79f ("x86/mce: Remove the MCE ring for Action Optional errors")
      
      switched storage from the SRAO ring to the unified pool that is still
      in use today. In order to only act on the intended errors, a filter
      for MCE_AO_SEVERITY is used -- effectively removing handling of
      UCNA/Deferred errors again.
      
      Extend the severity filter to include UCNA/Deferred errors again.
      Also, generalize the naming of the notifier from SRAO to UC to capture
      the extended scope.
      
      Note that this change may cause a message like the following to appear,
      as the same address may be reported as SRAO and as UCNA:
      
       Memory failure: 0x5fe3284: already hardware poisoned
      
      Technically, this is a return to previous behavior.
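
      The extended filter amounts to roughly this (a sketch; in the severity
      table, MCE_UCNA_SEVERITY aliases MCE_DEFERRED_SEVERITY):

        /* Renamed from srao_decode_notifier: accept SRAO and UCNA/Deferred. */
        if (mce->severity != MCE_AO_SEVERITY &&
            mce->severity != MCE_DEFERRED_SEVERITY)
            return NOTIFY_DONE;
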
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
      8438b84a
  11. 11 Jan 2020, 5 commits
    • x86/nmi: Remove irq_work from the long duration NMI handler · 248ed510
      Changbin Du authored
      First, printk() is now NMI-context safe, since the safe printk() has
      been implemented; it already uses an irq_work internally to be
      NMI-context safe.

      Second, this NMI irq_work actually does not work if an NMI handler
      causes a panic by watchdog timeout: it has no chance to run in that
      case, while the safe printk() will flush its per-cpu buffers before
      panicking.
      
      While at it, repurpose the irq_work callback into a function which
      concentrates the NMI duration checking and makes the code easier to
      follow.
      
       [ bp: Massage. ]
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20200111125427.15662-1-changbin.du@gmail.com
      248ed510
    • efi: Allow disabling PCI busmastering on bridges during boot · 4444f854
      Matthew Garrett authored
      Add an option to disable the busmaster bit in the control register on
      all PCI bridges before calling ExitBootServices() and passing control
      to the runtime kernel. System firmware may configure the IOMMU to prevent
      malicious PCI devices from being able to attack the OS via DMA. However,
      since firmware can't guarantee that the OS is IOMMU-aware, it will tear
      down the IOMMU configuration when ExitBootServices() is called. This
      leaves a window in which a hostile device can still cause damage before
      Linux configures the IOMMU again.
      
      If CONFIG_EFI_DISABLE_PCI_DMA is enabled or "efi=disable_early_pci_dma"
      is passed on the command line, the EFI stub will clear the busmaster bit
      on all PCI bridges before ExitBootServices() is called. This will
      prevent any malicious PCI devices from being able to perform DMA until
      the kernel reenables busmastering after configuring the IOMMU.
      
      This option may cause failures with some poorly behaved hardware and
      should not be enabled without testing. The kernel commandline options
      "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be
      used to override the default. Note that PCI devices downstream from PCI
      bridges are disconnected from their drivers first, using the UEFI
      driver model API, so that DMA can be disabled safely at the bridge
      level.
      
      [ardb: disconnect PCI I/O handles first, as suggested by Arvind]
      Co-developed-by: Matthew Garrett <mjg59@google.com>
      Signed-off-by: Matthew Garrett <mjg59@google.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <matthewgarrett@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-18-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4444f854
    • efi/x86: Allow translating 64-bit arguments for mixed mode calls · ea7d87f9
      Arvind Sankar authored
      Introduce the ability to define macros to perform argument translation
      for the calls that need it, and define them for the boot services that
      we currently use.
      
      When calling 32-bit firmware methods in mixed mode, all output
      parameters that are 32-bit according to the firmware, but 64-bit in the
      kernel (i.e. OUT UINTN * or OUT VOID **) must be initialized in the
      kernel, or the upper 32 bits may contain garbage. Define macros that
      zero out the upper 32 bits of the output before invoking the firmware
      method.
      
      When a 32-bit EFI call takes 64-bit arguments, the mixed-mode call must
      push the two 32-bit halves as separate arguments onto the stack. This
      can be achieved by splitting the argument into its two halves when
      calling the assembler thunk. Define a macro to do this for the
      free_pages boot service.
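
      As an illustration of the splitting idea (hypothetical macro names,
      not necessarily those used in the tree):

        /*
         * Hypothetical example: split one 64-bit argument into the two
         * 32-bit stack slots a 32-bit firmware method expects, low half
         * first (the i386 calling convention for a 64-bit argument).
         */
        #define EFI64_SPLIT(val)   (u32)(u64)(val), (u32)((u64)(val) >> 32)

        /* e.g. free_pages(u64 phys_addr, unsigned long npages): */
        #define EFI64_ARGMAP_FREE_PAGES(addr, size)  EFI64_SPLIT(addr), (size)
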
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-17-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ea7d87f9
    • efi/x86: Check number of arguments to variadic functions · 14b864f4
      Arvind Sankar authored
      On x86 we need to thunk through assembler stubs to call the EFI services
      for mixed mode, and for runtime services in 64-bit mode. The assembler
      stubs have limits on how many arguments they handle. Introduce a few
      macros to check that we do not try to pass too many arguments to the
      stubs.
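
      The check can be built from a classic argument-counting macro; a
      sketch of the approach (macro names illustrative):

        /* Count the variadic arguments (up to 7, matching the stubs). */
        #define __efi_nargs(...)   __efi_nargs_(__VA_ARGS__)
        #define __efi_nargs_(...) \
            __efi_nargs__(0, ##__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0)
        #define __efi_nargs__(_0, _1, _2, _3, _4, _5, _6, _7, n, ...)  n

        /* Fail the build if a call site passes more arguments than allowed. */
        #define efi_check_nargs(...) \
            BUILD_BUG_ON(__efi_nargs(__VA_ARGS__) > 7)
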
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-16-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      14b864f4
    • efi/x86: Simplify mixed mode call wrapper · ea5e1919
      Ard Biesheuvel authored
      Calling 32-bit EFI runtime services from a 64-bit OS involves
      switching back to the flat mapping with a stack carved out of
      memory that is 32-bit addressable.
      
      There is no need to actually execute the 64-bit part of this
      routine from the flat mapping as well, as long as the entry
      and return address fit in 32 bits. There is also no need to
      preserve part of the calling context in global variables: we
      can simply push the old stack pointer value to the new stack,
      and keep the return address from the code32 section in EBX.
      
      While at it, move the conditional check whether to invoke
      the mixed mode version of SetVirtualAddressMap() into the
      64-bit implementation of the wrapper routine.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-11-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ea5e1919