1. 02 7月, 2019 1 次提交
  2. 28 6月, 2019 2 次提交
    • R
      x86/mtrr: Skip cache flushes on CPUs with cache self-snooping · fd329f27
      Ricardo Neri 提交于
      Programming MTRR registers in multi-processor systems is a rather lengthy
      process. Furthermore, all processors must program these registers in lock
      step and with interrupts disabled; the process also involves flushing
      caches and TLBs twice. As a result, the process may take a considerable
      amount of time.
      
      On some platforms, this can lead to a large skew of the refined-jiffies
      clock source. Early when booting, if no other clock is available (e.g.,
      booting with hpet=disabled), the refined-jiffies clock source is used to
      monitor the TSC clock source. If the skew of refined-jiffies is too large,
      Linux wrongly assumes that the TSC is unstable:
      
        clocksource: timekeeping watchdog on CPU1: Marking clocksource
                     'tsc-early' as unstable because the skew is too large:
        clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
                     fffedb90 mask: ffffffff
        clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
                     mask: ffffffffffffffff
        tsc: Marking TSC unstable due to clocksource watchdog
      
      As per measurements, around 98% of the time needed by the procedure to
      program MTRRs in multi-processor systems is spent flushing caches with
      wbinvd(). As per the Section 11.11.8 of the Intel 64 and IA 32
      Architectures Software Developer's Manual, it is not necessary to flush
      caches if the CPU supports cache self-snooping. Thus, skipping the cache
      flushes can reduce by several tens of milliseconds the time needed to
      complete the programming of the MTRR registers:
      
      Platform                      	Before	   After
      104-core (208 Threads) Skylake  1437ms      28ms
        2-core (  4 Threads) Haswell   114ms       2ms
      Reported-by: NMohammad Etemadi <mohammad.etemadi@intel.com>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Alan Cox <alan.cox@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jordan Borgner <mail@jordan-borgner.de>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com
      fd329f27
    • R
      x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata · 1e03bff3
      Ricardo Neri 提交于
      Processors which have self-snooping capability can handle conflicting
      memory type across CPUs by snooping its own cache. However, there exists
      CPU models in which having conflicting memory types still leads to
      unpredictable behavior, machine check errors, or hangs.
      
      Clear this feature on affected CPUs to prevent its use.
      Suggested-by: NAlan Cox <alan.cox@intel.com>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jordan Borgner <mail@jordan-borgner.de>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Mohammad Etemadi <mohammad.etemadi@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Link: https://lkml.kernel.org/r/1561689337-19390-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1e03bff3
  3. 24 6月, 2019 5 次提交
    • F
      Documentation/ABI: Document umwait control sysfs interfaces · 203dffac
      Fenghua Yu 提交于
      Since two new sysfs interface files are created for umwait control, add
      an ABI document entry for the files:
      
         /sys/devices/system/cpu/umwait_control/enable_c02
         /sys/devices/system/cpu/umwait_control/max_time
      
      [ tglx: Made the write value instructions readable ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAshok Raj <ashok.raj@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua.yu@intel.com
      203dffac
    • F
      x86/umwait: Add sysfs interface to control umwait maximum time · bd9a0c97
      Fenghua Yu 提交于
      IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
      that processor can stay in C0.1 or C0.2. A zero value means no maximum
      time.
      
      Each instruction sets its own deadline in the instruction's implicit
      input EDX:EAX value. The instruction wakes up if the time-stamp counter
      reaches or exceeds the specified deadline, or the umwait maximum time
      expires, or a store happens in the monitored address range in umwait.
      
      The administrator can write an unsigned 32-bit number to
      /sys/devices/system/cpu/umwait_control/max_time to change the default
      value. Note that a value of zero means there is no limit. The lower two
      bits of the value must be zero.
      
      [ tglx: Simplify the write function. Massage changelog ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAshok Raj <ashok.raj@intel.com>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
      bd9a0c97
    • F
      x86/umwait: Add sysfs interface to control umwait C0.2 state · ff4b353f
      Fenghua Yu 提交于
      C0.2 state in umwait and tpause instructions can be enabled or disabled
      on a processor through IA32_UMWAIT_CONTROL MSR register.
      
      By default, C0.2 is enabled and the user wait instructions results in
      lower power consumption with slower wakeup time.
      
      But in real time systems which require faster wakeup time although power
      savings could be smaller, the administrator needs to disable C0.2 and all
      umwait invocations from user applications use C0.1.
      
      Create a sysfs interface which allows the administrator to control C0.2
      state during run time.
      
      Andy Lutomirski suggested to turn off local irqs before writing the MSR to
      ensure the cached control value is not changed by a concurrent sysfs write
      from a different CPU via IPI.
      
      [ tglx: Simplified the update logic in the write function and got rid of
        	all the convoluted type casts. Added a shared update function and
      	made the namespace consistent. Moved the sysfs create invocation.
      	Massaged changelog ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAshok Raj <ashok.raj@intel.com>
      Reviewed-by: NTony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
      ff4b353f
    • F
      x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu 提交于
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state) for a period specified by
      the instruction or until the system time limit or until a store to the
      monitored address range in umwait.
      
      IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        	mask throughout the code.
      	Massaged comments and changelog ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAshok Raj <ashok.raj@intel.com>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
    • F
      x86/cpufeatures: Enumerate user wait instructions · 6dbbf5ec
      Fenghua Yu 提交于
      umonitor, umwait, and tpause are a set of user wait instructions.
      
      umonitor arms address monitoring hardware using an address. The
      address range is determined by using CPUID.0x5. A store to
      an address within the specified address range triggers the
      monitoring hardware to wake up the processor waiting in umwait.
      
      umwait instructs the processor to enter an implementation-dependent
      optimized state while monitoring a range of addresses. The optimized
      state may be either a light-weight power/performance optimized state
      (C0.1 state) or an improved power/performance optimized state
      (C0.2 state).
      
      tpause instructs the processor to enter an implementation-dependent
      optimized state C0.1 or C0.2 state and wake up when time-stamp counter
      reaches specified timeout.
      
      The three instructions may be executed at any privilege level.
      
      The instructions provide power saving method while waiting in
      user space. Additionally, they can allow a sibling hyperthread to
      make faster progress while this thread is waiting. One example of an
      application usage of umwait is when waiting for input data from another
      application, such as a user level multi-threaded packet processing
      engine.
      
      Availability of the user wait instructions is indicated by the presence
      of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
      
      Detailed information on the instructions and CPUID feature WAITPKG flag
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference and Intel 64 and IA-32
      Architectures Software Developer's Manual.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAshok Raj <ashok.raj@intel.com>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
      6dbbf5ec
  4. 22 6月, 2019 22 次提交
  5. 20 6月, 2019 4 次提交
    • F
      x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions · b302e4b1
      Fenghua Yu 提交于
      AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
      format (BF16) for deep learning optimization.
      
      BF16 is a short version of 32-bit single-precision floating-point
      format (FP32) and has several advantages over 16-bit half-precision
      floating-point format (FP16). BF16 keeps FP32 accumulation after
      multiplication without loss of precision, offers more than enough
      range for deep learning training tasks, and doesn't need to handle
      hardware exception.
      
      AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
      AVX512_BF16.
      
      CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
      word 12 as a pure features word to hold the feature bits including
      AVX512_BF16.
      
      Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference.
      
       [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: Robert Hoo <robert.hu@linux.intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
      b302e4b1
    • F
      x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · acec0ce0
      Fenghua Yu 提交于
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature bits words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies world 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
      acec0ce0
    • B
      x86/cpufeatures: Carve out CQM features retrieval · 45fc56e6
      Borislav Petkov 提交于
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
      45fc56e6
    • Q
      x86/cacheinfo: Fix a -Wtype-limits warning · 1b7aebf0
      Qian Cai 提交于
      cpuinfo_x86.x86_model is an unsigned type, so comparing against zero
      will generate a compilation warning:
      
        arch/x86/kernel/cpu/cacheinfo.c: In function 'cacheinfo_amd_init_llc_id':
        arch/x86/kernel/cpu/cacheinfo.c:662:19: warning: comparison is always true \
          due to limited range of data type [-Wtype-limits]
      
      Remove the unnecessary lower bound check.
      
       [ bp: Massage. ]
      
      Fixes: 68091ee7 ("x86/CPU/AMD: Calculate last level cache ID from number of sharing threads")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560954773-11967-1-git-send-email-cai@lca.pw
      1b7aebf0
  6. 14 6月, 2019 3 次提交
  7. 09 6月, 2019 3 次提交