1. 23 September 2016, 3 commits
    • KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive · f6e90f9e
      Committed by Wanpeng Li
      I observed that kvmvapic (used to optimize the flexpriority=N or AMD
      case) was being used to boost TPR access when running the
      kvm-unit-test eventinj.flat tpr case on my Haswell desktop (with
      flexpriority, without APICv). Commit 8d14695f ("x86, apicv: add
      virtual x2apic support") disables virtual x2apic mode completely when
      APICv is absent, and its author also told me that Windows guests could
      not enter x2apic mode when he developed the APICv feature several
      years ago. That is no longer true: Interrupt Remapping and a vIOMMU
      have since been added to QEMU, and Intel developers recently verified
      that Windows 8 works in x2apic mode with Interrupt Remapping enabled.
      
      This patch enables the TPR shadow for virtual x2apic mode so that
      Windows guests running in x2apic mode are accelerated even without
      APICv.
      
      The kvm-unit-test cases still pass.
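
      A rough illustration of the mechanism (a sketch, not the exact
      upstream diff; the control-bit names are the VMX secondary execution
      controls and x2apic_on is a stand-in condition): when the guest
      switches into x2apic mode, TPR virtualization stays active by
      selecting the "virtualize x2APIC mode" control instead of simply
      clearing APIC virtualization:

        u32 sec_exec = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);

        if (x2apic_on) {
                /* TPR now lives in MSR 0x808; let the CPU shadow it. */
                sec_exec &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
                sec_exec |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
        } else {
                sec_exec &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
                sec_exec |= SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
        }
        vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec);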
      Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
      Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wincy Van <fanwenyi0529@gmail.com>
      Cc: Yang Zhang <yang.zhang.wz@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: Fix reload apic access page warning · c83b6d15
      Committed by Wanpeng Li
      WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       __warn+0xd1/0xf0
       warn_slowpath_fmt+0x4f/0x60
       ? prepare_to_swait+0x39/0xa0
       ? prepare_to_swait+0x39/0xa0
       __might_sleep+0x7e/0x80
       __gfn_to_pfn_memslot+0x156/0x480 [kvm]
       gfn_to_pfn+0x2a/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.8.0-rc5+ #47 Not tainted
      -------------------------------
      ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-x86/4230:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
      
      stack backtrace:
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       lockdep_rcu_suspicious+0xe7/0x120
       gfn_to_memslot+0x12a/0x140 [kvm]
       gfn_to_pfn+0x12/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
       ? __fget+0xfd/0x210
       ? __lock_is_held+0x54/0x70
       do_vfs_ioctl+0x96/0x6a0
       ? __fget+0x11c/0x210
       ? __fget+0x5/0x210
       SyS_ioctl+0x79/0x90
       do_syscall_64+0x81/0x220
       entry_SYSCALL64_slow_path+0x25/0x25
      
      These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat
      
      The nested preemption timer is based on an hrtimer that is started on
      L2 entry, stopped on L2 exit, and evaluated via the check_nested_events
      hook. The current logic adds the vCPU to a simple waitqueue
      (TASK_INTERRUPTIBLE) when it needs to yield the pCPU, and it accesses
      memslots without holding the SRCU read lock; both can happen on the
      nested preemption timer evaluation path, which produces the warnings
      above.
      
      This patch fixes it by using a request bit to reload the APIC access
      page asynchronously before the next VM entry, instead of reloading it
      directly during nested preemption timer evaluation. This is safe
      because we are in a nested VM exit and vmcs01 is already loaded.
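
      A minimal sketch of the async-reload idea (using KVM's existing
      request machinery; the surrounding nested VM-exit context is implied):
      rather than calling kvm_vcpu_reload_apic_access_page() from a path
      that may sleep, only set the request bit and let vcpu_enter_guest()
      perform the actual reload before VM entry:

        /* In the nested VM-exit path: defer the blocking work. */
        kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);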
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • config: move x86 kvm_guest.config to a common location · bd6c9222
      Committed by Rob Herring
      kvm_guest.config is useful for KVM guests on other arches, and nothing
      in it appears to be x86 specific, so just move the whole file. Kbuild
      will find it in either location.
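
      As a usage illustration (assuming current Kbuild fragment handling,
      not part of the original message): the fragment is applied on top of a
      base configuration, e.g. something like "make defconfig
      kvm_guest.config", and the %.config rule looks the file up in either
      kernel/configs/ or arch/$(SRCARCH)/configs/, which is why the move is
      transparent to users.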
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Cc: kvm@vger.kernel.org
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  2. 20 September 2016, 5 commits
  3. 16 September 2016, 4 commits
  4. 08 September 2016, 11 commits
  5. 19 August 2016, 3 commits
  6. 18 August 2016, 5 commits
    • kvm: nVMX: fix nested tsc scaling · c95ba92a
      Committed by Peter Feiner
      When the host supports TSC scaling, L2 would use a TSC multiplier of
      0, which causes a VM entry failure. With this patch, L2's TSC uses the
      same multiplier as L1.
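
      A minimal sketch of the idea (assuming KVM's existing TSC-scaling
      plumbing; the field and variable names are approximate): when
      preparing vmcs02 for L2, program the TSC multiplier from the vCPU's
      current scaling ratio instead of leaving the field at 0, which is not
      a valid multiplier and fails VM entry:

        if (kvm_has_tsc_control)
                vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);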
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write · dccbfcf5
      Committed by Radim Krčmář
      If vmcs12 does not intercept APIC_BASE writes, then KVM will handle
      the write with vmcs02 as the current VMCS.
      This incorrectly applies modifications intended for vmcs01 to vmcs02,
      and L2 can exploit it to gain access to L0's x2APIC registers by
      disabling virtualized x2APIC while still using an MSR bitmap that
      assumes it is enabled.

      Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
      current VMCS.  An alternative solution would be to temporarily make
      vmcs01 the current VMCS, but that requires more care.
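
      A minimal sketch of the deferral (the flag name follows the fix as
      recalled and should be treated as an assumption): while L2 is running,
      vmcs02 is current, so record the requested change and replay it once
      vmcs01 is loaded again on nested VM exit:

        if (is_guest_mode(vcpu)) {
                /* vmcs02 is current; apply to vmcs01 from nested_vmx_vmexit(). */
                to_vmx(vcpu)->nested.change_vmcs01_virtual_x2apic_mode = true;
                return;
        }
        /* ... otherwise apply the x2APIC mode change to vmcs01 right away. */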
      
      Fixes: 8d14695f ("x86, apicv: add virtual x2apic support")
      Reported-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC · d048c098
      Committed by Radim Krčmář
      The MSR bitmap can be used to avoid a VM exit (interception) on guest
      MSR accesses.  In some configurations of VMX controls, the guest can
      even directly access the host's x2APIC MSRs.  See SDM 29.5
      VIRTUALIZING MSR-BASED APIC ACCESSES.
      
      L2 could read all of L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
      To do so, L1 would first trick KVM into disabling all possible
      interceptions by enabling APICv features and would then turn those
      features off; nested_vmx_merge_msr_bitmap() only ever disabled
      interceptions, so VMX would not intercept previously enabled MSRs even
      though they were no longer safe with the new configuration.
      
      Correctly re-enabling interceptions is not enough, as a second bug
      would still allow L1+L2 to access the host's MSRs: the MSR bitmap was
      shared across all VMCSs, so L1 could trigger a race to get the desired
      combination of MSR bitmap and VMX controls.
      
      This fix allocates an MSR bitmap for every L1 VCPU, allows only safe
      x2APIC MSRs from L1's MSR bitmap, and disables MSR bitmaps if they
      would have to intercept everything anyway.
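
      A sketch of the allocation side (field and variable names are
      approximations of the fix, not verbatim): each L1 vCPU gets its own
      bitmap page for nested runs, so a race between the VMX controls and
      the bitmap contents can no longer expose L0's x2APIC MSRs:

        /* One nested MSR bitmap per vCPU instead of a single shared page. */
        vmx->nested.msr_bitmap =
                (unsigned long *)__get_free_page(GFP_KERNEL);
        if (!vmx->nested.msr_bitmap)
                return -ENOMEM;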
      
      Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap")
      Reported-by: Jim Mattson <jmattson@google.com>
      Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • x86/smp: Fix __max_logical_packages value setup · 7b0501b1
      Committed by Jiri Olsa
      Frank reported a kernel panic after he disabled several cores in the
      BIOS via the following option:

        Core Disable Bitmap(Hex)   [0]

      setting it to 0xFFE, which leaves 16 CPUs in the system (out of 48).
      
      The kernel panic below goes along with following messages:
      
       smpboot: Max logical packages: 2
       smpboot: APIC(0) Converting physical 0 to logical package 0
       smpboot: APIC(20) Converting physical 1 to logical package 1
       smpboot: APIC(40) Package 2 exceeds logical package map
       smpboot: CPU 8 APICId 40 disabled
       smpboot: APIC(60) Package 3 exceeds logical package map
       smpboot: CPU 12 APICId 60 disabled
       ...
       general protection fault: 0000 [#1] SMP
       Modules linked in:
       CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1
       Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016
       task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000
       RIP: 0010:[<ffffffff81014d54>]  [<ffffffff81014d54>] uncore_change_context+0xd4/0x180
       ...
        [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70
        [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd
        [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17
        [<ffffffff81002190>] do_one_initcall+0x50/0x190
        [<ffffffff810ab193>] ? parse_args+0x293/0x480
        [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249
        [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12
        [<ffffffff816dc19e>] kernel_init+0xe/0x110
        [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40
        [<ffffffff816dc190>] ? rest_init+0x80/0x80
      
      The reason for the panic is a wrong value of __max_logical_packages,
      which leaves logical_package_map uninitialized while the uncore code
      relies on this map being properly initialized (maybe we should add
      some safety checks there as well).
      
      __max_logical_packages is computed as:

        DIV_ROUND_UP(total_cpus, ncpus);
        - ncpus being the number of cores per package

      With the above BIOS setup we get total_cpus == 16 and ncpus == 12, so
      DIV_ROUND_UP(16, 12) sets __max_logical_packages to 2.
      
      Once topology_update_package_map() processes a CPU whose logical
      package number is 2 or higher, we print the messages above and fail to
      initialize the physical_to_logical_pkg map, which makes the uncore
      code crash.
      
      The fix is to remove the logical_package_map bitmap completely and to
      keep and update a logical_packages count instead.

      After we enumerate all the present CPUs, we check whether the
      enumerated logical package count is within the maximum computed from
      BIOS data.

      If it is not, we set this maximum to the newly enumerated value and
      freeze any further addition of logical packages.

      The freeze is needed because a lot of init code such as uncore/rapl/cqm
      depends on the maximum logical package value being set in order to
      allocate its data, so we cannot change it later on.  A rough sketch of
      this approach follows.
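
      A compact sketch of the counter-based approach (an illustration only;
      names such as logical_packages and logical_packages_frozen are
      assumptions based on the description above): packages are counted as
      they are enumerated, the BIOS-derived maximum is fixed up afterwards,
      and further growth is frozen so per-package allocations stay valid:

        if (logical_packages > __max_logical_packages) {
                pr_warn("Detected more packages (%u) than the computed maximum (%u)\n",
                        logical_packages, __max_logical_packages);
                __max_logical_packages = logical_packages;
        }
        /* From now on uncore/rapl/cqm size their data off this value. */
        logical_packages_frozen = true;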
      
      Prarit Bhargava tested the patch and confirms that it solves
      the problem:
      
        From dmidecode:
                Core Count: 24
                Core Enabled: 24
                Thread Count: 48
      
      Orig kernel boot log:
      
       [    0.464981] smpboot: Max logical packages: 19
       [    0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
      
      1.  nr_cpus=8, should stop enumerating in package 0:
      
       [    0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.539596] smpboot: Max logical packages: 19
      
      2.  max_cpus=8, should still enumerate all packages:
      
       [    0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
       [    0.550524] smpboot: Max logical packages: 19
      
      3.  nr_cpus=49 (2 sockets + 1 core on the 3rd socket), should stop
          enumerating in package 2:
      
       [    0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.539368] smpboot: Max logical packages: 19
      
      4.  maxcpus=49, should still enumerate all packages:
      
       [    0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
       [    0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
       [    0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
       [    0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
       [    0.549624] smpboot: Max logical packages: 19
      
      5.  kdump (nr_cpus=1) works as well.
      Reported-by: Frank Ramsay <framsay@redhat.com>
      Tested-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Prarit Bhargava <prarit@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y · 88b2f634
      Committed by Borislav Petkov
      Similar to:
      
        efaad554 ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")
      
      ... fix microcode loading from the initrd on AMD by adding the
      randomization offset to the microcode patch container within the initrd.
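
      A sketch of the adjustment (mirroring the Intel-side fix referenced
      above; the variable holding the container address is illustrative):
      with CONFIG_RANDOMIZE_MEMORY the base of the direct mapping is
      randomized, so the initrd address recorded early in boot has to be
      shifted by the same delta before it is scanned for a microcode
      container:

        #ifdef CONFIG_RANDOMIZE_MEMORY
                /* Account for the randomized direct-mapping base. */
                container += PAGE_OFFSET - __PAGE_OFFSET_BASE;
        #endif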
      Reported-and-tested-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-tip-commits@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 16 August 2016, 1 commit
  8. 12 August 2016, 3 commits
    • perf/x86/intel/uncore: Add enable_box for client MSR uncore · 95f3be79
      Committed by Kan Liang
      There are bug reports about miscounting uncore counters on some
      client machines like Sandybridge, Broadwell and Skylake. It is
      very likely to be observed on idle systems.
      
      This is caused by a hardware issue: PERF_GLOBAL_CTL can be cleared
      after Package C7, after which nothing is counted.
      The related erratum (HSD 158) can be found in:
      
        www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
      
      This patch tries to work around the issue by re-enabling
      PERF_GLOBAL_CTL in ->enable_box(). The workaround does not cover all
      cases: it helps for new events started after returning from C7, but it
      cannot prevent C7 itself, so counters that are already active may
      still miscount.

      There is no drawback to leaving the control enabled, so no
      disable_box() counterpart is needed here.
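
      A sketch of what such an enable_box callback looks like (constant
      names are recalled from the SNB client uncore driver and should be
      checked against the source): it simply rewrites the global enable MSR
      so counting resumes after the package C-state cleared it:

        static void snb_uncore_msr_enable_box(struct intel_uncore_box *box)
        {
                wrmsrl(SNB_UNC_PERF_GLOBAL_CTL,
                       SNB_UNC_GLOBAL_CTL_EN | SNB_UNC_GLOBAL_CTL_CORE_ALL);
        }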
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Fix uncore num_counters · 10e9e7bd
      Committed by Kan Liang
      The num_counters values of some uncore boxes on Haswell server and
      Broadwell server are not correct (too large, off by one).

      The issue was found by comparing the code with the documentation.
      Although there is no bug report from users yet, accessing non-existent
      counters is dangerous and the behavior is undefined: it may cause
      miscounting or even crashes.

      This patch makes the values consistent with the uncore documentation.
      Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · 68187872
      Committed by Denys Vlasenko
      Since the instruction decoder now supports EVEX-encoded instructions,
      two fixes are needed to handle them correctly in uprobes.

      The extended bits for the MODRM.rm field need to be sanitized just as
      we do for VEX3, to avoid encoding the wrong register for
      register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now straightforward because the instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
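
      An illustrative extraction (a sketch, not the exact uprobes diff):
      EVEX keeps vvvv in the same place as VEX3, bits 6:3 of the third
      prefix byte, stored one's-complemented, so the same code path can
      recover the register once the decoder fills vex_prefix.bytes[2] for
      every flavor:

        /* vvvv is inverted in the encoding; a raw value of 1111b means "unused". */
        u8 vvvv = (~insn->vex_prefix.bytes[2] >> 3) & 0xf;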
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.1+
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 11 August 2016, 5 commits
    • x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe() · d52c0569
      Committed by Sebastian Andrzej Siewior
      I made a mistake while converting the driver to the hotplug state
      machine, and as a result x2apic_cluster_probe() was accessing
      cpus_in_cluster before allocating it.

      This patch fixes it by setting the cpumask only after the memory
      allocation has succeeded.

      While at it, I marked two functions that are only used within this
      file as static.
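
      A sketch of the corrected ordering (symbol names are recalled from the
      x2apic cluster driver and its hotplug conversion; treat them as
      assumptions): register the CPUHP_X2APIC_PREPARE state first, and only
      touch the per-CPU cluster mask once that setup, which allocates the
      mask, has succeeded:

        int cpu = smp_processor_id();

        if (cpuhp_setup_state(CPUHP_X2APIC_PREPARE, "X2APIC_PREPARE",
                              x2apic_prepare_cpu, x2apic_dead_cpu) < 0) {
                pr_err("Failed to register X2APIC_PREPARE\n");
                return 0;
        }
        /* Safe now: the prepare callback has allocated cpus_in_cluster. */
        cpumask_set_cpu(cpu, per_cpu(cpus_in_cluster, cpu));
        return 1;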
      Reported-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6b2c2847 ("x86/x2apic: Convert to CPU hotplug state machine")
      Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case · f72075c9
      Committed by Alex Thorlton
      This problem has actually been in the UV code for a while, but we didn't
      catch it until recently, because we had been relying on EFI_OLD_MEMMAP
      to allow our systems to boot for a period of time.  We noticed the issue
      when trying to kexec a recent community kernel, where we hit this NULL
      pointer dereference in efi_sync_low_kernel_mappings():
      
       [    0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
       [    0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0
      
      The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
      chunk of setup_efi_state() that sets the efi_loader_signature for the
      kexec'd kernel.  When the kexec'd kernel boots, it won't set EFI_BOOT in
      setup_arch, so we completely avoid the bug.
      
      We always kexec with noefi on the command line, so this shouldn't be an
      issue, but since we're not actually checking for efi_runtime_disabled()
      in uv_bios_init(), we end up trying to make EFI runtime callbacks when
      we shouldn't be. This patch just adds a check for efi_runtime_disabled()
      in uv_bios_init() so that we don't map in uv_systab when runtime
      services are disabled.
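
      A minimal sketch of the guard (the log text is illustrative;
      efi_runtime_disabled() is the existing helper referenced above): bail
      out of uv_bios_init() before touching uv_systab when runtime services
      are off, e.g. when booting with "noefi" or "efi=noruntime":

        if (efi_runtime_disabled()) {
                pr_info("UV: EFI runtime services disabled, skipping uv_systab mapping\n");
                return;
        }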
      Signed-off-by: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.7
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/efi: Allocate a trampoline if needed in efi_free_boot_services() · 5bc653b7
      Committed by Andy Lutomirski
      On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
      Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
      like:
      
       efi: mem00: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
       efi: mem01: [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
       efi: mem02: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
       efi: mem03: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
       efi: mem04: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
       efi: mem05: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
       efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
       efi: mem07: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
       efi: mem08: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
      
      My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
      up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
      in the loader data range at 0x28000.
      
      Without this patch, it panics due to a failure to allocate the
      trampoline.  With this patch, it works:
      
       [  +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Rework reserve_real_mode() to allow multiple tries · 5ff3e2c3
      Committed by Andy Lutomirski
      If reserve_real_mode() fails, panicking immediately means we're
      doomed.  Make it safe to try more than once to allocate the
      trampoline (a rough sketch follows the list below):
      
       - Degrade a failure from panic() to pr_info().  (If we make it to
         setup_real_mode() without reserving the trampoline, we'll panic
         there.)
      
       - Factor out helpers so that platform code can supply a specific
         address to try.
      
       - Warn if reserve_real_mode() is called after we're done with the
         memblock allocator.  If that were to happen, we would behave
         unpredictably.
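
      A compact sketch of the reworked shape (helper names such as
      set_real_mode_mem() and real_mode_size_needed() are assumptions based
      on the list above; the assumed helper both reserves the range and
      records it): a failed allocation only logs, so platform code can try
      again later with an address of its own choosing:

        void __init reserve_real_mode(void)
        {
                phys_addr_t mem;
                size_t size = real_mode_size_needed();

                if (!size)
                        return;

                /* Reserving after memblock is torn down would misbehave. */
                WARN_ON(slab_is_available());

                /* The trampoline must live below 1MB so APs can run it. */
                mem = memblock_find_in_range(0, 1 << 20, size, PAGE_SIZE);
                if (!mem)
                        pr_info("No sub-1M memory is available for the trampoline\n");
                else
                        set_real_mode_mem(mem, size);
        }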
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Defer setup_real_mode() to early_initcall time · d0de0f68
      Committed by Andy Lutomirski
      There's no need to run setup_real_mode() as early as we currently run
      it.  Defer it to the same early_initcall that sets up the page
      permissions for the real mode code.

      This should be a code size reduction.  More importantly, it gives us a
      longer window in which we can allocate the real mode trampoline.
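
      A sketch of the resulting initcall (function names are assumptions;
      the permission-setup call stands in for whatever that early_initcall
      already did): the real mode blob is now copied and its page
      permissions set from one early_initcall instead of from setup_arch():

        static int __init init_real_mode(void)
        {
                if (!real_mode_header)
                        panic("Real mode trampoline was not allocated");

                setup_real_mode();
                set_real_mode_permissions();
                return 0;
        }
        early_initcall(init_real_mode);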
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>