1. 05 8月, 2018 4 次提交
  2. 25 7月, 2018 1 次提交
  3. 20 7月, 2018 1 次提交
  4. 19 7月, 2018 1 次提交
    • N
      x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content · 288d152c
      Nicolai Stange 提交于
      The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
      to evict the L1d cache.
      
      However, these pages are never cleared and, in theory, their data could be
      leaked.
      
      More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
      to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
      the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.
      
      Fix this by initializing the individual vmx_l1d_flush_pages with a
      different pattern each.
      
      Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
      "flush_pages" to reflect this change.
      
      Fixes: a47dd5f0 ("x86/KVM/VMX: Add L1D flush algorithm")
      Signed-off-by: NNicolai Stange <nstange@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      288d152c
  5. 15 7月, 2018 1 次提交
    • J
      x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures · 6c26fcd2
      Jiri Kosina 提交于
      pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the 
      !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses 
      assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g. 
      ia64) and breaks build:
      
          include/asm-generic/pgtable.h: Assembler messages:
          include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
          include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
          include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
          include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
          arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
      
      Move those two static inlines into the !__ASSEMBLY__ section so that they 
      don't confuse the asm build pass.
      
      Fixes: 42e4089c ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6c26fcd2
  6. 13 7月, 2018 11 次提交
  7. 09 7月, 2018 1 次提交
  8. 05 7月, 2018 10 次提交
  9. 02 7月, 2018 2 次提交
    • T
      cpu/hotplug: Boot HT siblings at least once · 0cc3cd21
      Thomas Gleixner 提交于
      Due to the way Machine Check Exceptions work on X86 hyperthreads it's
      required to boot up _all_ logical cores at least once in order to set the
      CR4.MCE bit.
      
      So instead of ignoring the sibling threads right away, let them boot up
      once so they can configure themselves. After they came out of the initial
      boot stage check whether its a "secondary" sibling and cancel the operation
      which puts the CPU back into offline state.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NTony Luck <tony.luck@intel.com>
      0cc3cd21
    • T
      Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
      Thomas Gleixner 提交于
      Dave Hansen reported, that it's outright dangerous to keep SMT siblings
      disabled completely so they are stuck in the BIOS and wait for SIPI.
      
      The reason is that Machine Check Exceptions are broadcasted to siblings and
      the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
      logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
      reboots the machine. The MCE chapter in the SDM contains the following
      blurb:
      
          Because the logical processors within a physical package are tightly
          coupled with respect to shared hardware resources, both logical
          processors are notified of machine check errors that occur within a
          given physical processor. If machine-check exceptions are enabled when
          a fatal error is reported, all the logical processors within a physical
          package are dispatched to the machine-check exception handler. If
          machine-check exceptions are disabled, the logical processors enter the
          shutdown state and assert the IERR# signal. When enabling machine-check
          exceptions, the MCE flag in control register CR4 should be set for each
          logical processor.
      
      Reverting the commit which ignores siblings at enumeration time solves only
      half of the problem. The core cpuhotplug logic needs to be adjusted as
      well.
      
      This thoughtful engineered mechanism also turns the boot process on all
      Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
      before the secondary CPUs are brought up. Depending on the number of
      physical cores the window in which this situation can happen is smaller or
      larger. On a HSW-EX it's about 750ms:
      
      MCE is enabled on the boot CPU:
      
      [    0.244017] mce: CPU supports 22 MCE banks
      
      The corresponding sibling #72 boots:
      
      [    1.008005] .... node  #0, CPUs:    #72
      
      That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
      between these two points the machine is going to shutdown. At least it's a
      known safe state.
      
      It's obvious that the early boot can be hit by an MCE as well and then runs
      into the same situation because MCEs are not yet enabled on the boot CPU.
      But after enabling them on the boot CPU, it does not make any sense to
      prevent the kernel from recovering.
      
      Adjust the nosmt kernel parameter documentation as well.
      
      Reverts: 2207def7 ("x86/apic: Ignore secondary threads if nosmt=force")
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NTony Luck <tony.luck@intel.com>
      506a66f3
  10. 30 6月, 2018 1 次提交
  11. 27 6月, 2018 1 次提交
    • V
      x86/speculation/l1tf: Protect PAE swap entries against L1TF · 0d0f6249
      Vlastimil Babka 提交于
      The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
      offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
      offset. The lower word is zeroed, thus systems with less than 4GB memory are
      safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
      to L1TF; with even more memory, also the swap offfset influences the address.
      This might be a problem with 32bit PAE guests running on large 64bit hosts.
      
      By continuing to keep the whole swap entry in either high or low 32bit word of
      PTE we would limit the swap size too much. Thus this patch uses the whole PAE
      PTE with the same layout as the 64bit version does. The macros just become a
      bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      0d0f6249
  12. 22 6月, 2018 1 次提交
  13. 21 6月, 2018 5 次提交
    • K
      x86/cpufeatures: Add detection of L1D cache flush support. · 11e34e64
      Konrad Rzeszutek Wilk 提交于
      336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
      (IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.
      
      This new MSR "gives software a way to invalidate structures with finer
      granularity than other architectual methods like WBINVD."
      
      A copy of this document is available at
        https://bugzilla.kernel.org/show_bug.cgi?id=199511Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      11e34e64
    • V
      x86/speculation/l1tf: Extend 64bit swap file size limit · 1a7ed1ba
      Vlastimil Babka 提交于
      The previous patch has limited swap file size so that large offsets cannot
      clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.
      
      It assumed that offsets are encoded starting with bit 12, same as pfn. But
      on x86_64, offsets are encoded starting with bit 9.
      
      Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
      and 256TB with 46bit MAX_PA.
      
      Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      1a7ed1ba
    • T
      x86/apic: Ignore secondary threads if nosmt=force · 2207def7
      Thomas Gleixner 提交于
      nosmt on the kernel command line merely prevents the onlining of the
      secondary SMT siblings.
      
      nosmt=force makes the APIC detection code ignore the secondary SMT siblings
      completely, so they even do not show up as possible CPUs. That reduces the
      amount of memory allocations for per cpu variables and saves other
      resources from being allocated too large.
      
      This is not fully equivalent to disabling SMT in the BIOS because the low
      level SMT enabling in the BIOS can result in partitioning of resources
      between the siblings, which is not undone by just ignoring them. Some CPUs
      can use the full resources when their sibling is not onlined, but this is
      depending on the CPU family and model and it's not well documented whether
      this applies to all partitioned resources. That means depending on the
      workload disabling SMT in the BIOS might result in better performance.
      
      Linus analysis of the Intel manual:
      
        The intel optimization manual is not very clear on what the partitioning
        rules are.
      
        I find:
      
          "In general, the buffers for staging instructions between major pipe
           stages  are partitioned. These buffers include µop queues after the
           execution trace cache, the queues after the register rename stage, the
           reorder buffer which stages instructions for retirement, and the load
           and store buffers.
      
           In the case of load and store buffers, partitioning also provided an
           easier implementation to maintain memory ordering for each logical
           processor and detect memory ordering violations"
      
        but some of that partitioning may be relaxed if the HT thread is "not
        active":
      
          "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
           is statically partitioned to provide 28 entries for each logical
           processor,  irrespective of software executing in single thread or
           multiple threads. If one logical processor is not active in Intel
           microarchitecture code name Ivy Bridge, then a single thread executing
           on that processor  core can use the 56 entries in the micro-op queue"
      
        but I do not know what "not active" means, and how dynamic it is. Some of
        that partitioning may be entirely static and depend on the early BIOS
        disabling of HT, and even if we park the cores, the resources will just be
        wasted.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      2207def7
    • T
      x86/cpu/AMD: Evaluate smp_num_siblings early · 1e1d7e25
      Thomas Gleixner 提交于
      To support force disabling of SMT it's required to know the number of
      thread siblings early. amd_get_topology() cannot be called before the APIC
      driver is selected, so split out the part which initializes
      smp_num_siblings and invoke it from amd_early_init().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      1e1d7e25
    • B
      x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info · 119bff8a
      Borislav Petkov 提交于
      Old code used to check whether CPUID ext max level is >= 0x80000008 because
      that last leaf contains the number of cores of the physical CPU.  The three
      functions called there now do not depend on that leaf anymore so the check
      can go.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      119bff8a