1. 18 Dec 2016, 1 commit
  2. 17 Dec 2016, 1 commit
  3. 15 Dec 2016, 3 commits
    • x86/mm: Drop unused argument 'removed' from sync_global_pgds() · 5372e155
      Kirill A. Shutemov authored
      Since commit af2cf278 ("x86/mm/hotplug: Don't remove PGD entries in
      remove_pagetable()") there are no callers of sync_global_pgds() which set
      the 'removed' argument to 1.
      
      Remove the argument and the related conditionals in the function.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Link: http://lkml.kernel.org/r/20161214234403.137556-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5372e155
    • x86/tsc: Force TSC_ADJUST register to value >= zero · 5bae1562
      Thomas Gleixner authored
      Roland reported that his DELL T5810 sports a value-add BIOS which
      completely wrecks the TSC. The squirmware [(TM) Ingo Molnar] boots with
      random negative TSC_ADJUST values, different on all CPUs. That renders the
      TSC useless because the synchronization check fails.
      
      Roland tested the new TSC_ADJUST mechanism. While it manages to readjust
      the TSCs he needs to disable the TSC deadline timer, otherwise the machine
      just stops booting.
      
      Deeper investigation unearthed that the TSC deadline timer is sensitive to
      the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an
      interrupt storm caused by the TSC deadline timer.
      
      This does not make any sense and it's hard to imagine what kind of hardware
      wreckage is behind that misfeature, but it's reliably reproducible on other
      systems which have TSC_ADJUST and TSC deadline timer.
      
      While it would be understandable that a big enough negative value which
      moves the resulting TSC readout into the negative space could have the
      described effect, this happens even with an adjust value of -1, which keeps
      the TSC readout definitely in the positive space. The compare register for
      the TSC deadline timer is set to a positive value larger than the TSC, but
      despite not having reached the deadline the interrupt is raised
      immediately. If this happens on the boot CPU, then the machine dies
      silently because this setup happens before the NMI watchdog is armed.
      
      Further experiments showed that any other adjustment of TSC_ADJUST works as
      expected as long as it stays in the positive range. The direction of the
      adjustment has no influence either. See the lkml link for further analysis.
      
      Yet another proof for the theory that timers are designed by janitors and
      the underlying (obviously undocumented) mechanisms which allow BIOSes to
      wreckage them are considered a feature. Well done Intel - NOT!
      
      To address this wreckage add the following sanity measures:
      
      - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0
      
      - If the TSC_ADJUST value on any cpu is negative, set it to 0
      
      - Prevent the cross package synchronization mechanism from setting negative
        TSC_ADJUST values.
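      The three measures amount to a simple clamp. A minimal user-space sketch of that logic (hypothetical helper names; the MSR is modeled as a plain variable rather than real rdmsr/wrmsr accesses):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the IA32_TSC_ADJUST MSR of one CPU (hypothetical model). */
static int64_t tsc_adjust_msr;

static int64_t rdmsr_tsc_adjust(void) { return tsc_adjust_msr; }
static void wrmsr_tsc_adjust(int64_t v) { tsc_adjust_msr = v; }

/* Sanity rules from the changelog: the boot CPU is forced to 0,
 * and no CPU may carry a negative TSC_ADJUST value. */
static void sanitize_tsc_adjust(bool bootcpu)
{
    int64_t cur = rdmsr_tsc_adjust();

    if (bootcpu && cur != 0)
        wrmsr_tsc_adjust(0);
    else if (cur < 0)
        wrmsr_tsc_adjust(0);
}

/* The cross-package sync mechanism must never propagate a negative value. */
static int64_t clamp_sync_target(int64_t ref)
{
    return ref < 0 ? 0 : ref;
}
```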
      Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5bae1562
    • x86/tsc: Validate TSC_ADJUST after resume · 6a369583
      Thomas Gleixner authored
      Some 'feature' BIOSes fiddle with the TSC_ADJUST register during
      suspend/resume which renders the TSC unusable.
      
      Add sanity checks into the resume path and restore the
      original value if it was adjusted.
      Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6a369583
  4. 14 Dec 2016, 1 commit
  5. 11 Dec 2016, 1 commit
    • x86/paravirt: Fix bool return type for PVOP_CALL() · 11f254db
      Peter Zijlstra authored
      Commit:
      
        3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      
      introduced a paravirt op with a bool return type [*].
      
      It turns out that the PVOP_CALL*() macros miscompile when rettype is
      bool. Code that looked like:
      
         83 ef 01                sub    $0x1,%edi
         ff 15 32 a0 d8 00       callq  *0xd8a032(%rip)        # ffffffff81e28120 <pv_lock_ops+0x20>
         84 c0                   test   %al,%al
      
      ended up looking like so after PVOP_CALL1() was applied:
      
         83 ef 01                sub    $0x1,%edi
         48 63 ff                movslq %edi,%rdi
         ff 14 25 20 81 e2 81    callq  *0xffffffff81e28120
         48 85 c0                test   %rax,%rax
      
      Note how it tests the whole of %rax, even though a typical bool return
      function only sets %al, like:
      
        0f 95 c0                setne  %al
        c3                      retq
      
      This is because ____PVOP_CALL() does:
      
      		__ret = (rettype)__eax;
      
      and while regular integer type casts truncate the result, a cast to
      bool tests for any !0 value. Fix this by explicitly truncating to
      sizeof(rettype) before casting.
      
      [*] The actual bug should've been exposed in commit:
            446f3dc8 ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests")
          but that didn't properly implement the paravirt call.
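      The miscast can be reproduced in plain C. A minimal sketch (the register is simulated as an unsigned long whose upper bytes hold leftover garbage, since a bool-returning callee only defines %al):

```c
#include <assert.h>
#include <stdbool.h>

/* What the old PVOP_CALL cast did: tests the whole register. */
static bool broken_cast(unsigned long eax)
{
    return (bool)eax;
}

/* What the fix does: truncate to sizeof(rettype) first, then cast. */
static bool fixed_cast(unsigned long eax)
{
    return (bool)(unsigned char)eax;
}
```

      A cast to bool yields 1 for any nonzero value, so garbage in the upper bytes of the register flips the result; truncating to the return type's size first discards those bytes.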
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11f254db
  6. 10 Dec 2016, 5 commits
  7. 09 Dec 2016, 2 commits
    • tracing: Have the reg function allow to fail · 8cf868af
      Steven Rostedt (Red Hat) authored
      Some tracepoints have a registration function that gets called when the
      tracepoint is enabled. There may be cases where the registration function
      must fail (for example, it can't allocate enough memory). In this case, the
      tracepoint should also fail to register, otherwise the user would not know
      why the tracepoint is not working.
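      A minimal sketch of the idea (hypothetical names; `alloc_ok` stands in for a memory allocation succeeding): the reg callback now returns an errno, and the enable path propagates it instead of silently enabling a broken tracepoint.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static bool alloc_ok; /* stands in for an allocation succeeding */

/* A registration callback that may fail, e.g. out of memory. */
static int my_trace_reg(void)
{
    if (!alloc_ok)
        return -ENOMEM;
    return 0;
}

/* Enabling a tracepoint: the reg function is optional, and its
 * failure must now be reported to the caller. */
static int tracepoint_enable(int (*reg)(void))
{
    int ret = reg ? reg() : 0;
    /* On failure, the tracepoint is not reported as enabled. */
    return ret;
}
```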
      
      Cc: David Howells <dhowells@redhat.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8cf868af
    • x86: Make E820_X_MAX unconditionally larger than E820MAX · 9d2f86c6
      Alex Thorlton authored
      It's really not necessary to limit E820_X_MAX to 128 in the non-EFI
      case.  This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that
      E820_X_MAX is always at least slightly larger than E820MAX.
      
      The real motivation behind this is actually to prevent some issues in
      the Xen kernel, where the XENMEM_machine_memory_map hypercall can
      produce an e820 map larger than 128 entries, even on systems where the
      original e820 table was quite a bit smaller than that, depending on how
      many IOAPICs are installed on the system.
      Signed-off-by: Alex Thorlton <athorlton@sgi.com>
      Suggested-by: Ingo Molnar <mingo@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      9d2f86c6
  8. 08 Dec 2016, 5 commits
    • KVM: nVMX: check host CR3 on vmentry and vmexit · 1dc35dac
      Ladi Prosek authored
      This commit adds missing host CR3 checks. Before entering guest mode, the value
      of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is
      called to set the new CR3 value and check and load PDPTRs.
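      The reserved-bit check boils down to masking against the bits above the CPU's MAXPHYADDR. A simplified sketch (hypothetical helper; real KVM also has to account for PCID and long-mode specifics, which are omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the CR3 value sets physical-address bits the CPU
 * does not implement (bits >= MAXPHYADDR are reserved). */
static bool cr3_reserved_bits_set(uint64_t cr3, unsigned maxphyaddr)
{
    uint64_t rsvd = ~((1ULL << maxphyaddr) - 1);
    return (cr3 & rsvd) != 0;
}
```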
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      1dc35dac
    • KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry · 9ed38ffa
      Ladi Prosek authored
      Loading CR3 as part of emulating vmentry is different from regular CR3 loads,
      as implemented in kvm_set_cr3, in several ways.
      
      * different rules are followed to check CR3 and it is desirable for the caller
      to distinguish between the possible failures
      * PDPTRs are not loaded if PAE paging and nested EPT are both enabled
      * many MMU operations are not necessary
      
      This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of
      nested vmentry and vmexit, and makes use of it on the nested vmentry path.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      9ed38ffa
    • KVM: nVMX: support restore of VMX capability MSRs · 62cc6b9d
      David Matlack authored
      The VMX capability MSRs advertise the set of features the KVM virtual
      CPU can support. This set of features varies across different host CPUs
      and KVM versions. This patch aims to address both sources of
      differences, allowing VMs to be migrated across CPUs and KVM versions
      without guest-visible changes to these MSRs. Note that cross-KVM-
      version migration is only supported from this point forward.
      
      When the VMX capability MSRs are restored, they are audited to check
      that the set of features advertised are a subset of what KVM and the
      CPU support.
      
      Since the VMX capability MSRs are read-only, they do not need to be on
      the default MSR save/restore lists. The userspace hypervisor can set
      the values of these MSRs or read them from KVM at VCPU creation time,
      and restore the same value after every save/restore.
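      The audit described above is essentially a subset check on feature bits. A minimal sketch (hypothetical helper; the real code does this per MSR and also handles fields with fixed-0/fixed-1 semantics):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Every feature bit the userspace-restored value advertises must
 * also be advertised by the value KVM computed for this host/CPU. */
static bool vmx_msr_value_valid(uint64_t host_supported, uint64_t user)
{
    return (user & ~host_supported) == 0;
}
```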
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      62cc6b9d
    • KVM: x86: Add kvm_skip_emulated_instruction and use it. · 6affcbed
      Kyle Huey authored
      kvm_skip_emulated_instruction calls both
      kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
      skipping the emulated instruction and generating a trap if necessary.
      
      Replacing skip_emulated_instruction calls with
      kvm_skip_emulated_instruction is straightforward, except for:
      
      - ICEBP, which is already inside a trap, so avoid triggering another trap.
      - Instructions that can trigger exits to userspace, such as the IO insns,
        MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
        KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
        IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
        take precedence. The singlestep will be triggered again on the next
        instruction, which is the current behavior.
      - Task switch instructions which would require additional handling (e.g.
        the task switch bit) and are instead left alone.
      - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
        which do not trigger singlestep traps as mentioned previously.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6affcbed
    • KVM: x86: Add a return value to kvm_emulate_cpuid · 6a908b62
      Kyle Huey authored
      Once skipping the emulated instruction can potentially trigger an exit to
      userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to
      propagate a return value.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      6a908b62
  9. 06 Dec 2016, 1 commit
  10. 30 Nov 2016, 5 commits
    • x86/tsc: Fix broken CONFIG_X86_TSC=n build · b8365543
      Thomas Gleixner authored
      Add the missing return statement to the inline stub
      tsc_store_and_check_tsc_adjust() and add the other stubs to make a
      SMP=y,TSC=n build happy.
      
      While at it, remove the unused variable from the UP variant of
      tsc_store_and_check_tsc_adjust().
      
      Fixes: ba75fb646931 ("x86/tsc: Sync test only for the first cpu in a package")
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      b8365543
    • sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO · de966cf4
      Tim Chen authored
      Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
      to CONFIG_SCHED_MC_PRIO.  This makes the configuration extensible
      in the future to other architectures that wish to similarly establish
      CPU core priorities support in the scheduler.
      
      The description in Kconfig is updated to reflect this change with
      added details for better clarity.  The configuration explicitly
      defaults to y, to enable the feature on CPUs that have it.
      
      It has no effect on non-TBM3 CPUs.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@suse.de
      Cc: jolsa@redhat.com
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      de966cf4
    • x86/tsc: Sync test only for the first cpu in a package · a36f5136
      Thomas Gleixner authored
      If the TSC_ADJUST MSR is available all CPUs in a package are forced to the
      same value. So TSCs cannot be out of sync when the first CPU in the package
      was in sync.
      
      That allows skipping the sync test for all CPUs except the first
      CPU to start up in a package.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a36f5136
    • x86/tsc: Verify TSC_ADJUST from idle · 1d0095fe
      Thomas Gleixner authored
      When entering idle, it's a good opportunity to verify that the TSC_ADJUST
      MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
      detected, emit a warning and restore it to the previous value.
      
      This is especially important for machines which mark the TSC reliable
      because there is no watchdog clocksource available (SoCs).
      
      This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
      goes idle, but adding a timer to do the check periodically is not an option
      either. On a machine which has this issue, the check triggers right
      during boot, so there is a decent chance that the sysadmin will notice.
      
      Rate limit the check to once per second and warn only once per CPU.
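      The rate-limited, warn-once check can be sketched as follows (hypothetical per-CPU state and names; time is injected as a parameter so the logic is testable without a clock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-CPU state for the idle-time verification. */
struct tsc_adjust_state {
    int64_t  saved;        /* value stored at boot */
    uint64_t next_check;   /* earliest time for the next check, ns */
    bool     warned;       /* warn only once per CPU */
};

/* Returns true if tampering was detected (and the value restored). */
static bool tsc_verify_adjust(struct tsc_adjust_state *s, int64_t *msr,
                              uint64_t now_ns)
{
    if (now_ns < s->next_check)
        return false;                 /* checked less than 1s ago */
    s->next_check = now_ns + NSEC_PER_SEC;

    if (*msr == s->saved)
        return false;
    if (!s->warned)
        s->warned = true;             /* would pr_warn() here, once */
    *msr = s->saved;                  /* restore the boot value */
    return true;
}
```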
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1d0095fe
    • x86/tsc: Store and check TSC ADJUST MSR · 8b223bc7
      Thomas Gleixner authored
      The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
      in two respects:
      
      1) It allows detection of BIOS wreckage, where SMM code tries to 'hide' the
         cycles spent by storing the TSC value at SMM entry and restoring it at
         SMM exit. On affected machines the TSCs run slowly out of sync up to the
         point where the clocksource watchdog (if available) detects it.
      
         The TSC_ADJUST MSR allows the TSC modification to be detected before
         that and eventually reverted. This is also important for SoCs which have
         no watchdog clocksource and therefore TSC wreckage cannot be detected
         and acted upon.
      
      2) All threads in a package are required to have the same TSC_ADJUST
         value. Broken BIOSes break that and as a result the TSC synchronization
         check fails.
      
         The TSC_ADJUST MSR allows the deviation to be detected when a CPU comes
         online. If a deviation is detected, set the MSR to the value of an
         already online CPU in the same package. This also allows the number of
         sync tests to be reduced, because with that in place the test is only
         required for the first CPU in a package.
      
         In principle all CPUs in a system should have the same TSC_ADJUST value
         even across packages, but with physical CPU hotplug this assumption is
         not true because the TSC starts with power on, so physical hotplug has
         to do some trickery to bring the TSC into sync with already running
         packages, which requires using a TSC_ADJUST value different from CPUs
         which got powered on earlier.
      
         A final enhancement is the opportunity to compensate for unsynced TSCs
         across nodes at boot time and make the TSC usable that way. It won't
         help for TSCs which run apart due to frequency skew between packages,
         but this gets detected by the clocksource watchdog later.
      
      The first step toward this is to store the TSC_ADJUST value of a starting
      CPU and compare it with the value of an already online CPU in the same
      package. If they differ, emit a warning and adjust it to the reference
      value. The !SMP version just stores the boot value for later verification.
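      The store-and-compare step can be sketched like this (hypothetical per-package bookkeeping; the real code lives in arch/x86/kernel/tsc_sync.c and uses per-CPU data):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PKGS 4
static int64_t pkg_ref[MAX_PKGS];      /* reference TSC_ADJUST per package */
static bool    pkg_has_ref[MAX_PKGS];

/* First CPU of a package stores its value as the reference; later
 * siblings are compared and forced to the reference on mismatch.
 * Returns true if the CPU's value had to be corrected. */
static bool tsc_store_and_check(int pkg, int64_t *msr)
{
    if (!pkg_has_ref[pkg]) {
        pkg_has_ref[pkg] = true;
        pkg_ref[pkg] = *msr;     /* first CPU in package: store */
        return false;
    }
    if (*msr == pkg_ref[pkg])
        return false;
    *msr = pkg_ref[pkg];         /* mismatch: warn and adjust */
    return true;
}
```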
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8b223bc7
  11. 28 Nov 2016, 2 commits
    • crypto: glue_helper - Add skcipher xts helpers · 065ce327
      Herbert Xu authored
      This patch adds xts helpers that use the skcipher interface rather
      than blkcipher.  This will be used by aesni_intel.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      065ce327
    • x86/build: Remove three unneeded genhdr-y entries · 9190e217
      Paul Bolle authored
      In x86's include/asm/Kbuild three entries are appended to the genhdr-y make
      variable:
      
          genhdr-y += unistd_32.h
          genhdr-y += unistd_64.h
          genhdr-y += unistd_x32.h
      
      The same entries are also appended to that variable in
      include/uapi/asm/Kbuild. So commit:
      
        10b63956 ("UAPI: Plumb the UAPI Kbuilds into the user header installation and checking")
      
      ... removed these three entries from include/asm/Kbuild. But, apparently, some
      merge conflict resolution re-added them.
      
      The net effect is, in short, that the genhdr-y make variable contains these
      file names twice and, as a consequence, that the corresponding headers get
      installed twice. And so the build prints:
      
        INSTALL usr/include/asm/ (65 files)
      
      ... while in reality only 62 files are installed in that directory.
      
      Nothing breaks because of all that, but it's a good idea to finally remove
      these unneeded entries nevertheless.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1480077707-2837-1-git-send-email-pebolle@tiscali.nl
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9190e217
  12. 25 Nov 2016, 5 commits
  13. 24 Nov 2016, 1 commit
    • x86/coredump: Always use user_regs_struct for compat_elf_gregset_t · 7b2dd368
      Dmitry Safonov authored
      Commit:
      
        90954e7b ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag")
      
      changed the coredumping code to construct the ELF coredump file according
      to the register set size - and that's good: if a binary crashes with a
      32-bit code selector, generate a 32-bit ELF core, otherwise a 64-bit core.
      
      That was made for restoring 32-bit applications on x86_64: we want
      32-bit application after restore to generate 32-bit ELF dump on crash.
      
      All was quite good, and recently I started reworking the 32-bit application
      dumping part of CRIU: currently it has two parasites (32 and 64) for seizing
      compat/native tasks; after the rework it'll have one parasite, working in
      64-bit mode, to which a 32-bit prologue long-jumps during infection.
      
      While it worked on my work machine, during the rework I hit a problem in a
      VM without CONFIG_X86_X32_ABI: a segfault in a 32-bit binary that has
      long-jumped to 64-bit mode results in a dereference of garbage:
      
       32-victim[19266]: segfault at f775ef65 ip 00000000f775ef65 sp 00000000f776aa50 error 14
       BUG: unable to handle kernel paging request at ffffffffffffffff
       IP: [<ffffffff81332ce0>] strlen+0x0/0x20
       [...]
       Call Trace:
        [] elf_core_dump+0x11a9/0x1480
        [] do_coredump+0xa6b/0xe60
        [] get_signal+0x1a8/0x5c0
        [] do_signal+0x23/0x660
        [] exit_to_usermode_loop+0x34/0x65
        [] prepare_exit_to_usermode+0x2f/0x40
        [] retint_user+0x8/0x10
      
      That's because we have a 64-bit register set (with the corresponding total
      size) and we're writing it to elf_thread_core_info, which has a smaller
      size on !CONFIG_X86_X32_ABI. That led to overwriting the ELF notes part.
      
      Tested on 32- and 64-bit ELF crashes and on 32-bit binaries that have
      jumped with a 64-bit code selector - all is readable with gdb.
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: linux-mm@kvack.org
      Fixes: 90954e7b ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7b2dd368
  14. 23 Nov 2016, 2 commits
  15. 22 Nov 2016, 4 commits
    • x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() · 3cded417
      Peter Zijlstra authored
      Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted()
      when a paravirt-spinlock-enabled kernel is run on native hardware.
      
      Do this by patching out the CALL instruction with "XOR %RAX,%RAX",
      which has the same effect (0 return value).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3cded417
    • x86/kvm: Support the vCPU preemption check · 0b9f6c46
      Pan Xinhui authored
      Support the vcpu_is_preempted() functionality under KVM. This will
      enhance lock performance on overcommitted hosts (more runnable vCPUs
      than physical CPUs in the system) as doing busy waits for preempted
      vCPUs will hurt system performance far worse than early yielding.
      
      Use struct kvm_steal_time::preempted to indicate whether a vCPU
      is running or not.
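      Conceptually, the guest-side check just reads a flag the host sets in the shared steal-time area. A sketch under that assumption (the per-CPU array and field layout here are illustrative, not the exact ABI):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the shared steal-time record. */
struct kvm_steal_time {
    uint64_t steal;
    uint8_t  preempted;   /* set by the host when it preempts the vCPU */
};

#define NR_VCPUS 4
static struct kvm_steal_time steal_time[NR_VCPUS];

/* Guest-side check used by lock busy-loops to yield early
 * instead of spinning on a preempted vCPU. */
static bool kvm_vcpu_is_preempted(int cpu)
{
    return steal_time[cpu].preempted != 0;
}
```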
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: rkrcmar@redhat.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: xen-devel-request@lists.xenproject.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1478077718-37424-9-git-send-email-xinhui.pan@linux.vnet.ibm.com
      [ Typo fixes. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b9f6c46
    • locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests · 446f3dc8
      Pan Xinhui authored
      Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu)
      function on KVM and Xen platforms.
      
      Extend the pv_lock_ops interface accordingly and implement the callbacks
      on KVM and Xen.
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Translated to English. ]
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: rkrcmar@redhat.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: xen-devel-request@lists.xenproject.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      446f3dc8
    • x86/mce/AMD: Add system physical address translation for AMD Fam17h · f5382de9
      Yazen Ghannam authored
      The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
      in their MCA_ADDR registers. We need to convert that normalized address
      to a system physical address in order to support a few facilities:
      
      1) To offline poisoned pages in DRAM proactively in the deferred error
         handler.
      
      2) To print sysaddr and page info for DRAM ECC errors in EDAC.
      
      [ Boris: fixes/cleanups ontop:
      
        * hi_addr_offset = 0 - no need for that branch. Stick it all under the
          HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.
      
        * Move variables to the innermost scope they're used at so that we save
          on stack and not blow it up immediately on function entry.
      
        * Do not modify *sys_addr prematurely - we don't want to exit early
          having modified *sys_addr partially, which callers would then see. We
          either convert to a sys_addr or we don't do anything. And we signal
          that with the retval of the function.
      
        * Rename label out -> out_err - because it is the error path.
      
        * No need to pr_err in the conversion-failed case: imagine a
          sparsely-populated machine with UMCs which don't have DIMMs. Callers
          should look at the retval instead and issue a printk only when really
          necessary. No need for useless info in dmesg.
      
        * s/temp_reg/tmp/ and other variable names shortening => shorter code.
      
        * Use BIT() everywhere.
      
        * Make error messages more informative.
      
        * Small build fix for the !CONFIG_X86_MCE_AMD case.
      
        * ... and more minor cleanups.
      ]
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
      [ Typo fixes. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f5382de9
  16. 21 Nov 2016, 1 commit