1. 06 8月, 2018 24 次提交
  2. 18 7月, 2018 2 次提交
    • P
      kvmclock: fix TSC calibration for nested guests · e10f7805
      Peng Hao 提交于
      Inside a nested guest, access to hardware can be slow enough that
      tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
      to be called periodically and the nested guest to spend a lot of time
      reading the ACPI timer.
      
      However, if the TSC frequency is available from the pvclock page,
      we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
      'refine' operation.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
      [Commit message rewritten. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e10f7805
    • L
      KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c
      Liran Alon 提交于
      When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
      with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
      by MSR_IA32_VMX_BASIC.
      
      However, even though not explictly documented by TLFS, VMXArea passed
      as VMXON argument should still be marked with revision_id reported by
      physical CPU.
      
      This issue was found by the following setup:
      * L0 = KVM which expose eVMCS to it's L1 guest.
      * L1 = KVM which consume eVMCS reported by L0.
      This setup caused the following to occur:
      1) L1 execute hardware_enable().
      2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
      3) L0 intercept L1 VMXON and execute handle_vmon() which notes
      vmxarea->revision_id != VMCS12_REVISION and therefore fails with
      nested_vmx_failInvalid() which sets RFLAGS.CF.
      4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
      hardware_enable() continues as usual.
      5) L1 hardware_enable() then calls ept_sync_global() which executes
      INVEPT.
      6) L0 intercept INVEPT and execute handle_invept() which notes
      !vmx->nested.vmxon and thus raise a #UD to L1.
      7) Raised #UD caused L1 to panic.
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 773e8a04Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2307af1c
  3. 17 7月, 2018 1 次提交
  4. 16 7月, 2018 2 次提交
    • V
      x86/apm: Don't access __preempt_count with zeroed fs · 6f6060a5
      Ville Syrjälä 提交于
      APM_DO_POP_SEGS does not restore fs/gs which were zeroed by
      APM_DO_ZERO_SEGS. Trying to access __preempt_count with
      zeroed fs doesn't really work.
      
      Move the ibrs call outside the APM_DO_SAVE_SEGS/APM_DO_RESTORE_SEGS
      invocations so that fs is actually restored before calling
      preempt_enable().
      
      Fixes the following sort of oopses:
      [    0.313581] general protection fault: 0000 [#1] PREEMPT SMP
      [    0.313803] Modules linked in:
      [    0.314040] CPU: 0 PID: 268 Comm: kapmd Not tainted 4.16.0-rc1-triton-bisect-00090-gdd84441a #19
      [    0.316161] EIP: __apm_bios_call_simple+0xc8/0x170
      [    0.316161] EFLAGS: 00210016 CPU: 0
      [    0.316161] EAX: 00000102 EBX: 00000000 ECX: 00000102 EDX: 00000000
      [    0.316161] ESI: 0000530e EDI: dea95f64 EBP: dea95f18 ESP: dea95ef0
      [    0.316161]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      [    0.316161] CR0: 80050033 CR2: 00000000 CR3: 015d3000 CR4: 000006d0
      [    0.316161] Call Trace:
      [    0.316161]  ? cpumask_weight.constprop.15+0x20/0x20
      [    0.316161]  on_cpu0+0x44/0x70
      [    0.316161]  apm+0x54e/0x720
      [    0.316161]  ? __switch_to_asm+0x26/0x40
      [    0.316161]  ? __schedule+0x17d/0x590
      [    0.316161]  kthread+0xc0/0xf0
      [    0.316161]  ? proc_apm_show+0x150/0x150
      [    0.316161]  ? kthread_create_worker_on_cpu+0x20/0x20
      [    0.316161]  ret_from_fork+0x2e/0x38
      [    0.316161] Code: da 8e c2 8e e2 8e ea 57 55 2e ff 1d e0 bb 5d b1 0f 92 c3 5d 5f 07 1f 89 47 0c 90 8d b4 26 00 00 00 00 90 8d b4 26 00 00 00 00 90 <64> ff 0d 84 16 5c b1 74 7f 8b 45 dc 8e e0 8b 45 d8 8e e8 8b 45
      [    0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 SS:ESP: 0068:dea95ef0
      [    0.316161] ---[ end trace 656253db2deaa12c ]---
      
      Fixes: dd84441a ("x86/speculation: Use IBRS if available before calling into firmware")
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc:  David Woodhouse <dwmw@amazon.co.uk>
      Cc:  "H. Peter Anvin" <hpa@zytor.com>
      Cc:  x86@kernel.org
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180709133534.5963-1-ville.syrjala@linux.intel.com
      6f6060a5
    • D
      x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling · 092b31aa
      Dan Williams 提交于
      All copy_to_user() implementations need to be prepared to handle faults
      accessing userspace. The __memcpy_mcsafe() implementation handles both
      mmu-faults on the user destination and machine-check-exceptions on the
      source buffer. However, the memcpy_mcsafe() wrapper may silently
      fallback to memcpy() depending on build options and cpu-capabilities.
      
      Force copy_to_user_mcsafe() to always use __memcpy_mcsafe() when
      available, and otherwise disable all of the copy_to_user_mcsafe()
      infrastructure when __memcpy_mcsafe() is not available, i.e.
      CONFIG_X86_MCE=n.
      
      This fixes crashes of the form:
          run fstests generic/323 at 2018-07-02 12:46:23
          BUG: unable to handle kernel paging request at 00007f0d50001000
          RIP: 0010:__memcpy+0x12/0x20
          [..]
          Call Trace:
           copyout_mcsafe+0x3a/0x50
           _copy_to_iter_mcsafe+0xa1/0x4a0
           ? dax_alive+0x30/0x50
           dax_iomap_actor+0x1f9/0x280
           ? dax_iomap_rw+0x100/0x100
           iomap_apply+0xba/0x130
           ? dax_iomap_rw+0x100/0x100
           dax_iomap_rw+0x95/0x100
           ? dax_iomap_rw+0x100/0x100
           xfs_file_dax_read+0x7b/0x1d0 [xfs]
           xfs_file_read_iter+0xa7/0xc0 [xfs]
           aio_read+0x11c/0x1a0
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Fixes: 8780356e ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
      Link: http://lkml.kernel.org/r/153108277790.37979.1486841789275803399.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      092b31aa
  5. 15 7月, 2018 7 次提交
  6. 13 7月, 2018 1 次提交
  7. 12 7月, 2018 1 次提交
  8. 11 7月, 2018 1 次提交
    • A
      efi/x86: Fix mixed mode reboot loop by removing pointless call to PciIo->Attributes() · e2967018
      Ard Biesheuvel 提交于
      Hans de Goede reported that his mixed EFI mode Bay Trail tablet
      would not boot at all any more, but enter a reboot loop without
      any logs printed by the kernel.
      
      Unbreak 64-bit Linux/x86 on 32-bit UEFI:
      
      When it was first introduced, the EFI stub code that copies the
      contents of PCI option ROMs originally only intended to do so if
      the EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM attribute was *not* set.
      
      The reason was that the UEFI spec permits PCI option ROM images
      to be provided by the platform directly, rather than via the ROM
      BAR, and in this case, the OS can only access them at runtime if
      they are preserved at boot time by copying them from the areas
      described by PciIo->RomImage and PciIo->RomSize.
      
      However, it implemented this check erroneously, as can be seen in
      commit:
      
        dd5fc854 ("EFI: Stash ROMs if they're not in the PCI BAR")
      
      which introduced:
      
          if (!attributes & EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM)
                  continue;
      
      and given that the numeric value of EFI_PCI_IO_ATTRIBUTE_EMBEDDED_ROM
      is 0x4000, this condition never becomes true, and so the option ROMs
      were copied unconditionally.
      
      This was spotted and 'fixed' by commit:
      
        886d751a ("x86, efi: correct precedence of operators in setup_efi_pci")
      
      but inadvertently inverted the logic at the same time, defeating
      the purpose of the code, since it now only preserves option ROM
      images that can be read from the ROM BAR as well.
      
      Unsurprisingly, this broke some systems, and so the check was removed
      entirely in the following commit:
      
        73970188 ("x86, efi: remove attribute check from setup_efi_pci")
      
      It is debatable whether this check should have been included in the
      first place, since the option ROM image provided to the UEFI driver by
      the firmware may be different from the one that is actually present in
      the card's flash ROM, and so whatever PciIo->RomImage points at should
      be preferred regardless of whether the attribute is set.
      
      As this was the only use of the attributes field, we can remove
      the call to PciIo->Attributes() entirely, which is especially
      nice because its prototype involves uint64_t type by-value
      arguments which the EFI mixed mode has trouble dealing with.
      
      Any mixed mode system with PCI is likely to be affected.
      Tested-by: NWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Tested-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180711090235.9327-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e2967018
  9. 08 7月, 2018 1 次提交