1. 17 May 2017 (1 commit)
    • KVM: x86: lower default for halt_poll_ns · b401ee0b
      Authored by Paolo Bonzini
      In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
      increase heavily even in cases where the performance improvement was
      small.  In particular, bandwidth divided by CPU usage was as much as
      60% lower.
      
      To some extent this is the expected effect of the patch, and the
      additional CPU utilization is only visible when running the
      benchmarks.  However, halving the threshold also halves the extra
      CPU utilization (from +30-130% to +20-70%) and has no negative
      effect on performance.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b401ee0b
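      For reference, a minimal sketch of how this tunable is wired up (the define
      name and the value 200000 are assumptions consistent with "halving the
      threshold" from 400000; the module parameter path is the usual one for kvm):

        #include <linux/moduleparam.h>

        /* Assumed per-arch default, halved from 400000 as the message describes. */
        #define KVM_HALT_POLL_NS_DEFAULT 200000

        /* kvm_main.c-style module parameter; tunable at runtime via
         *   /sys/module/kvm/parameters/halt_poll_ns
         * e.g.  echo 400000 > /sys/module/kvm/parameters/halt_poll_ns  */
        unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
        module_param(halt_poll_ns, uint, 0644);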
  2. 10 May 2017 (1 commit)
  3. 09 May 2017 (3 commits)
  4. 08 May 2017 (1 commit)
    • x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Authored by Xunlei Pang
      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernel boot (the dominant use case) uses the former, while
      the kexec and hibernation code use kernel_ident_mapping_init().
      
      There's a subtle difference between these two ways of creating identity
      mappings: the current kernel_ident_mapping_init() code always creates
      them using 2MB pages (PMD level), while the native kernel boot path
      also utilizes gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page (PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with a large amount of memory, as it saves
      paging-structure allocations (around 4MB/TB with 2MB pages) when identity
      mappings are set up for all of memory; with 1GB pages, only about 8KB/TB
      is consumed.
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66aad4fd
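      A simplified sketch of the idea (field and helper names mirror the x86
      identity-mapping code, but this is an illustration rather than the literal
      patch): when the caller opts in to gbpages, each 1GB-aligned chunk is
      covered by a single PUD-level large-page entry instead of a freshly
      allocated PMD page:

        #include <asm/init.h>      /* struct x86_mapping_info (assumed) */
        #include <asm/pgtable.h>

        static int ident_pud_init_sketch(struct x86_mapping_info *info,
                                         pud_t *pud_page,
                                         unsigned long addr, unsigned long end)
        {
                unsigned long next;

                for (; addr < end; addr = next) {
                        pud_t *pud = pud_page + pud_index(addr);

                        next = (addr & PUD_MASK) + PUD_SIZE;
                        if (next > end)
                                next = end;

                        /* 1GB path: info->page_flag is assumed to carry the
                         * large-page bit (_PAGE_PSE), so one entry maps the
                         * whole 1GB range and no PMD page is allocated. */
                        if (info->direct_gbpages && !pud_present(*pud)) {
                                set_pud(pud, __pud((addr & PUD_MASK) | info->page_flag));
                                continue;
                        }

                        /* ...otherwise fall back to the existing 2MB (PMD) path... */
                }
                return 0;
        }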
  5. 05 May 2017 (1 commit)
    • x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility · 121843eb
      Authored by Matthias Kaehlcke
      The constraint "rm" allows the compiler to put mix_const into memory.
      When the input operand is a memory location then MUL needs an operand
      size suffix, since Clang can't infer the multiplication width from the
      operand.
      
      Add and use the _ASM_MUL macro which determines the operand size and
      resolves to the MUL instruction with the corresponding suffix.
      
      This fixes the following error when building with clang:
      
        CC      arch/x86/lib/kaslr.o
        /tmp/kaslr-dfe1ad.s: Assembler messages:
        /tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Cc: Grant Grundler <grundler@chromium.org>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Davidson <md@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      121843eb
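      The principle can be shown with a small user-space sketch; MY_ASM_MUL below
      is a hypothetical stand-in for the kernel's _ASM_MUL, and the mixing
      function only illustrates the "rm" constraint discussed in the message:

        #include <stdio.h>

        /* Pick the operand-size suffix at build time so the assembler can
         * size the instruction even when the operand lives in memory. */
        #ifdef __x86_64__
        # define MY_ASM_MUL "mulq "
        #else
        # define MY_ASM_MUL "mull "
        #endif

        static unsigned long mul_mix(unsigned long raw, unsigned long mix_const)
        {
                unsigned long hi;

                /* One-operand MUL multiplies the A register by the operand,
                 * leaving the low half in A and the high half in D; "rm"
                 * still lets the compiler keep mix_const in memory. */
                asm (MY_ASM_MUL "%2"
                     : "=d" (hi), "+a" (raw)
                     : "rm" (mix_const));
                return raw ^ hi;
        }

        int main(void)
        {
                printf("%lx\n", mul_mix(0x12345678UL, 0x3f39e593UL));
                return 0;
        }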
  6. 02 May 2017 (5 commits)
  7. 29 April 2017 (2 commits)
  8. 27 April 2017 (3 commits)
    • KVM: x86: fix emulation of RSM and IRET instructions · 6ed071f0
      Authored by Ladi Prosek
      On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
      on hflags is reverted later on in x86_emulate_instruction where hflags are
      overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
      as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.
      
      Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
      an instruction is emulated, this commit deletes emul_flags altogether and
      makes the emulator access vcpu->arch.hflags using two new accessors. This
      way all changes, on the emulator side as well as in functions called from
      the emulator and accessing vcpu state with emul_to_vcpu, are preserved.
      
      More details on the bug and its manifestation with Windows and OVMF:
      
        It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
        I believe that the SMM part explains why we started seeing this only with
        OVMF.
      
        KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
        the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
        later on in x86_emulate_instruction we overwrite arch.hflags with
        ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
        The AMD-specific hflag of interest here is HF_NMI_MASK.
      
        When rebooting the system, Windows sends an NMI IPI to all but the current
        cpu to shut them down. Only after all of them are parked in HLT will the
        initiating cpu finish the restart. If NMI is masked, other cpus never get
        the memo and the initiating cpu spins forever, waiting for
        hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.
      
      Fixes: a584539b ("KVM: x86: pass the whole hflags field to emulator and back")
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6ed071f0
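      The accessor approach can be sketched roughly as follows (names modelled on
      the KVM emulator callbacks; treat the exact signatures as assumptions):

        /* Sketch: the emulator reads and writes the live vcpu->arch.hflags
         * through callbacks instead of keeping a private ctxt->emul_flags
         * copy that later clobbers the vcpu state. */
        static unsigned int emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
        {
                return emul_to_vcpu(ctxt)->arch.hflags;
        }

        static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt,
                                        unsigned int hflags)
        {
                /* Because the emulator no longer carries its own copy, the
                 * effect of set_nmi_mask() during em_rsm()/emulate_iret_real()
                 * is not overwritten after x86_emulate_instruction() returns. */
                emul_to_vcpu(ctxt)->arch.hflags = hflags;
        }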
    • KVM: mark requests that need synchronization · 7a97cec2
      Authored by Paolo Bonzini
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7a97cec2
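      The waiting behaviour can be pictured with a short sketch (helper names are
      illustrative, not the patch itself): a request that carries the
      synchronization property makes the kick a waiting IPI, so the caller
      returns only after every targeted VCPU has run the acknowledgement handler:

        #include <linux/smp.h>
        #include <linux/cpumask.h>

        static void ack_kick(void *info)
        {
                /* Nothing to do; running this handler is the acknowledgement. */
        }

        static bool kick_vcpus_sketch(const struct cpumask *cpus, bool wait)
        {
                if (cpumask_empty(cpus))
                        return false;

                /* wait == true for requests that need synchronization:
                 * smp_call_function_many() then blocks until all targeted
                 * CPUs have executed ack_kick(). */
                smp_call_function_many(cpus, ack_kick, NULL, wait);
                return true;
        }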
    • KVM: mark requests that do not need a wakeup · 930f7fd6
      Authored by Radim Krčmář
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      The first 8 bits were arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      930f7fd6
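      Both of these request commits hinge on how a request constant is encoded;
      a sketch of that layout (the macro names follow the series, the concrete
      values are illustrative):

        #include <linux/bitops.h>   /* BIT(), GENMASK() */

        #define KVM_REQUEST_MASK        GENMASK(7, 0)  /* low 8 bits: request number */
        #define KVM_REQUEST_NO_WAKEUP   BIT(8)         /* don't wake a halted VCPU */
        #define KVM_REQUEST_WAIT        BIT(9)         /* wait until the kick is acked */

        /* Example: a request that must be synchronized but needn't wake the guest. */
        #define KVM_REQ_MMU_RELOAD_SKETCH \
                (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

        static inline bool request_needs_wakeup(unsigned int req)
        {
                /* 'req' is normally a compile-time constant, so this '&'
                 * folds away, as the message notes. */
                return !(req & KVM_REQUEST_NO_WAKEUP);
        }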
  9. 26 April 2017 (2 commits)
    • x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59
      Authored by Andy Lutomirski
      I was trying to figure out how flush_tlb_current_task() would
      possibly work correctly if current->mm != current->active_mm, but I
      realized I could spare myself the effort: it has no callers except
      the unused flush_tlb() macro.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29961b59
    • x86, dax, pmem: remove indirection around memcpy_from_pmem() · 6abccd1b
      Authored by Dan Williams
      memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
      serves no real benefit aside from affording a more generic function name
      than the x86-specific 'mcsafe'. However, this would not be the first time
      that x86 terminology has leaked into the global namespace. For lack of a
      better name, just use memcpy_mcsafe() directly.
      
      This conversion also catches a place where we should have been using
      plain memcpy, acpi_nfit_blk_single_io().
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      6abccd1b
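      For context, the indirection being removed amounted to a thin wrapper of
      roughly this shape (a reconstruction, not the exact header contents):

        #include <linux/string.h>   /* memcpy_mcsafe() */

        /* Thin alias whose only purpose was to hide the x86-specific name;
         * callers now invoke memcpy_mcsafe() directly. */
        static inline int memcpy_from_pmem(void *dst, const void *src, size_t n)
        {
                return memcpy_mcsafe(dst, src, n);  /* non-zero if a machine check was hit */
        }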
  10. 23 April 2017 (1 commit)
    • Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Authored by Ingo Molnar
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust-looking bisection
      points to this commit suggesting that there are unknown bugs in the
      conversion as well, revert it for the time being; we'll re-try in v4.13.
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6dd29b3d
  11. 21 April 2017 (1 commit)
  12. 20 April 2017 (1 commit)
  13. 19 April 2017 (2 commits)
  14. 18 April 2017 (1 commit)
    • Remove compat_sys_getdents64() · 2611dc19
      Authored by Al Viro
      Unlike normal compat syscall variants, it is needed only for
      biarch architectures that have different alignment requirements for
      u64 in the 32bit and 64bit ABIs *and* a __put_user() that won't handle
      a store of a 64bit value at a 32bit-aligned address.  We used to have one
      such architecture (ia64), but its biarch support has been gone since 2010
      (after being broken in 2008, which went unnoticed since nobody had been
      using it).
      
      It had escaped removal at the same time only because back in 2004
      a patch that switched several syscalls on amd64 from private wrappers to
      generic compat ones had switched to use of compat_sys_getdents64(), which
      hadn't needed (or used) a compat wrapper on amd64.
      
      Let's bury it - it's at least 7 years overdue.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2611dc19
  15. 17 April 2017 (1 commit)
  16. 15 April 2017 (3 commits)
  17. 14 April 2017 (11 commits)