1. 17 5月, 2017 1 次提交
    • P
      KVM: x86: lower default for halt_poll_ns · b401ee0b
      Paolo Bonzini 提交于
      In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
      increase heavily even in cases where the performance improvement was
      small.  In particular, bandwidth divided by CPU usage was as much as
      60% lower.
      
      To some extent this is the expected effect of the patch, and the
      additional CPU utilization is only visible when running the
      benchmarks.  However, halving the threshold also halves the extra
      CPU utilization (from +30-130% to +20-70%) and has no negative
      effect on performance.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b401ee0b
  2. 16 5月, 2017 1 次提交
  3. 15 5月, 2017 3 次提交
    • W
      KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled · fce6ac4c
      Wanpeng Li 提交于
      We can observe eptad kvm_intel module parameter is still Y
      even if ept is disabled which is weird. This patch will
      not enable EPT A/D feature if EPT feature is disabled.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      fce6ac4c
    • W
      KVM: x86: Fix load damaged SSEx MXCSR register · a575813b
      Wanpeng Li 提交于
      Reported by syzkaller:
      
         BUG: unable to handle kernel paging request at ffffffffc07f6a2e
         IP: report_bug+0x94/0x120
         PGD 348e12067
         P4D 348e12067
         PUD 348e14067
         PMD 3cbd84067
         PTE 80000003f7e87161
      
         Oops: 0003 [#1] SMP
         CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
         task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
         RIP: 0010:report_bug+0x94/0x120
         RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
          do_trap+0x156/0x170
          do_error_trap+0xa3/0x170
          ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? mark_held_locks+0x79/0xa0
          ? retint_kernel+0x10/0x10
          ? trace_hardirqs_off_thunk+0x1a/0x1c
          do_invalid_op+0x20/0x30
          invalid_op+0x1e/0x30
         RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
          kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
          kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? sched_clock+0x13/0x20
          ? __do_page_fault+0x2a0/0x550
          do_vfs_ioctl+0xa4/0x700
          ? up_read+0x1f/0x40
          ? __do_page_fault+0x2a0/0x550
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
      
      SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
      a 1 to any of these bits will cause a general-protection exception(#GP) to be
      generated". The syzkaller forks' testcase overrides xsave area w/ random values
      and steps on the reserved bits of MXCSR register. The damaged MXCSR register
      values of guest will be restored to SSEx MXCSR register before vmentry. This
      patch fixes it by catching userspace override MXCSR register reserved bits w/
      random values and bails out immediately.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a575813b
    • D
      kvm: nVMX: off by one in vmx_write_pml_buffer() · 4769886b
      Dan Carpenter 提交于
      There are PML_ENTITY_NUM elements in the pml_address[] array so the >
      should be >= or we write beyond the end of the array when we do:
      
      	pml_address[vmcs12->guest_pml_index--] = gpa;
      
      Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      4769886b
  4. 13 5月, 2017 1 次提交
  5. 11 5月, 2017 2 次提交
  6. 10 5月, 2017 3 次提交
    • N
      uapi: export all headers under uapi directories · fcc8487d
      Nicolas Dichtel 提交于
      Regularly, when a new header is created in include/uapi/, the developer
      forgets to add it in the corresponding Kbuild file. This error is usually
      detected after the release is out.
      
      In fact, all headers under uapi directories should be exported, thus it's
      useless to have an exhaustive list.
      
      After this patch, the following files, which were not exported, are now
      exported (with make headers_install_all):
      asm-arc/kvm_para.h
      asm-arc/ucontext.h
      asm-blackfin/shmparam.h
      asm-blackfin/ucontext.h
      asm-c6x/shmparam.h
      asm-c6x/ucontext.h
      asm-cris/kvm_para.h
      asm-h8300/shmparam.h
      asm-h8300/ucontext.h
      asm-hexagon/shmparam.h
      asm-m32r/kvm_para.h
      asm-m68k/kvm_para.h
      asm-m68k/shmparam.h
      asm-metag/kvm_para.h
      asm-metag/shmparam.h
      asm-metag/ucontext.h
      asm-mips/hwcap.h
      asm-mips/reg.h
      asm-mips/ucontext.h
      asm-nios2/kvm_para.h
      asm-nios2/ucontext.h
      asm-openrisc/shmparam.h
      asm-parisc/kvm_para.h
      asm-powerpc/perf_regs.h
      asm-sh/kvm_para.h
      asm-sh/ucontext.h
      asm-tile/shmparam.h
      asm-unicore32/shmparam.h
      asm-unicore32/ucontext.h
      asm-x86/hwcap2.h
      asm-xtensa/kvm_para.h
      drm/armada_drm.h
      drm/etnaviv_drm.h
      drm/vgem_drm.h
      linux/aspeed-lpc-ctrl.h
      linux/auto_dev-ioctl.h
      linux/bcache.h
      linux/btrfs_tree.h
      linux/can/vxcan.h
      linux/cifs/cifs_mount.h
      linux/coresight-stm.h
      linux/cryptouser.h
      linux/fsmap.h
      linux/genwqe/genwqe_card.h
      linux/hash_info.h
      linux/kcm.h
      linux/kcov.h
      linux/kfd_ioctl.h
      linux/lightnvm.h
      linux/module.h
      linux/nbd-netlink.h
      linux/nilfs2_api.h
      linux/nilfs2_ondisk.h
      linux/nsfs.h
      linux/pr.h
      linux/qrtr.h
      linux/rpmsg.h
      linux/sched/types.h
      linux/sed-opal.h
      linux/smc.h
      linux/smc_diag.h
      linux/stm.h
      linux/switchtec_ioctl.h
      linux/vfio_ccw.h
      linux/wil6210_uapi.h
      rdma/bnxt_re-abi.h
      
      Note that I have removed from this list the files which are generated in every
      exported directories (like .install or .install.cmd).
      
      Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
      subdirs with a pure makefile command.
      
      For the record, note that exported files for asm directories are a mix of
      files listed by:
       - include/uapi/asm-generic/Kbuild.asm;
       - arch/<arch>/include/uapi/asm/Kbuild;
       - arch/<arch>/include/asm/Kbuild.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: NMark Salter <msalter@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      fcc8487d
    • N
      x86: stop exporting msr-index.h to userland · 25dc1d6c
      Nicolas Dichtel 提交于
      Even if this file was not in an uapi directory, it was exported because
      it was listed in the Kbuild file.
      
      Fixes: b72e7464 ("x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers")
      Suggested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      25dc1d6c
    • B
      x86, pmem: Fix cache flushing for iovec write < 8 bytes · 8376efd3
      Ben Hutchings 提交于
      Commit 11e63f6d added cache flushing for unaligned writes from an
      iovec, covering the first and last cache line of a >= 8 byte write and
      the first cache line of a < 8 byte write.  But an unaligned write of
      2-7 bytes can still cover two cache lines, so make sure we flush both
      in that case.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 11e63f6d ("x86, pmem: fix broken __copy_user_nocache ...")
      Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      8376efd3
  7. 09 5月, 2017 12 次提交
  8. 08 5月, 2017 2 次提交
    • X
      x86/kexec/64: Use gbpages for identity mappings if available · 8638100c
      Xunlei Pang 提交于
      Kexec sets up all identity mappings before booting into the new
      kernel, and this will cause extra memory consumption for paging
      structures which is quite considerable on modern machines with
      huge memory sizes.
      
      E.g. on a 32TB machine that is kdumping, it could waste around
      128MB (around 4MB/TB) from the reserved memory after kexec sets
      all the identity mappings using the current 2MB page.
      
      Add to that the memory needed for the loaded kdump kernel, initramfs,
      etc., and it causes a kexec syscall -NOMEM failure.
      
      As a result, we had to enlarge reserved memory via "crashkernel=X"
      to work around this problem.
      
      This causes some trouble for distributions that use policies
      to evaluate the proper "crashkernel=X" value for users.
      
      So enable gbpages for kexec mappings.
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8638100c
    • X
      x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Xunlei Pang 提交于
      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernels (which is the dominant usecase) use the former,
      but the kexec and the hibernation code uses kernel_ident_mapping_init().
      
      There's a subtle difference between these two ways of how identity
      mappings are created, the current kernel_ident_mapping_init() code
      creates identity mappings always using 2MB page(PMD level) - while
      the native kernel boot path also utilizes gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with large number of memory to
      save paging structure allocations(around 4MB/TB using 2MB page)
      when setting identity mappings for all the memory, after using
      1GB page it will consume only 8KB/TB.
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )
      Signed-off-by: NXunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66aad4fd
  9. 07 5月, 2017 1 次提交
    • K
      x86/boot: Declare error() as noreturn · 60854a12
      Kees Cook 提交于
      The compressed boot function error() is used to halt execution, but it
      wasn't marked with "noreturn". This fixes that in preparation for
      supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation
      on panic, and calls error(). GCC would warn about a noreturn function
      calling a non-noreturn function:
      
        arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’:
        arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return
         }
       ^
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20170506045116.GA2879@beastSigned-off-by: NIngo Molnar <mingo@kernel.org>
      60854a12
  10. 05 5月, 2017 6 次提交
    • B
      xen/x86: Do not call xen_init_time_ops() until shared_info is initialized · d162809f
      Boris Ostrovsky 提交于
      Routines that are set by xen_init_time_ops() use shared_info's
      pvclock_vcpu_time_info area. This area is not properly available until
      shared_info is mapped in xen_setup_shared_info().
      
      This became especially problematic due to commit dd759d93 ("x86/timers:
      Add simple udelay calibration") where we end up reading tsc_to_system_mul
      from xen_dummy_shared_info (i.e. getting zero value) and then trying
      to divide by it in pvclock_tsc_khz().
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      d162809f
    • J
      x86/xen: fix xsave capability setting · 40f4ac0b
      Juergen Gross 提交于
      Commit 690b7f10b4f9f ("x86/xen: use capabilities instead of fake cpuid
      values for xsave") introduced a regression as it tried to make use of
      the fixup feature before it being available.
      
      Fall back to the old variant testing via cpuid().
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      40f4ac0b
    • J
      kvm: nVMX: Don't validate disabled secondary controls · 2e5b0bd9
      Jim Mattson 提交于
      According to the SDM, if the "activate secondary controls" primary
      processor-based VM-execution control is 0, no checks are performed on
      the secondary processor-based VM-execution controls.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2e5b0bd9
    • M
      x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility · 121843eb
      Matthias Kaehlcke 提交于
      The constraint "rm" allows the compiler to put mix_const into memory.
      When the input operand is a memory location then MUL needs an operand
      size suffix, since Clang can't infer the multiplication width from the
      operand.
      
      Add and use the _ASM_MUL macro which determines the operand size and
      resolves to the NUL instruction with the corresponding suffix.
      
      This fixes the following error when building with clang:
      
        CC      arch/x86/lib/kaslr.o
        /tmp/kaslr-dfe1ad.s: Assembler messages:
        /tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction
      Signed-off-by: NMatthias Kaehlcke <mka@chromium.org>
      Cc: Grant Grundler <grundler@chromium.org>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Davidson <md@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      121843eb
    • B
      x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() · fc5f9d5f
      Baoquan He 提交于
      Jeff Moyer reported that on his system with two memory regions 0~64G and
      1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR
      will make the system hang intermittently during boot. While adding 'nokaslr'
      won't.
      
      The back trace is:
      
       Oops: 0000 [#1] SMP
      
       RIP: memcpy_erms()
       [ .... ]
       Call Trace:
        pmem_rw_page()
        bdev_read_page()
        do_mpage_readpage()
        mpage_readpages()
        blkdev_readpages()
        __do_page_cache_readahead()
        force_page_cache_readahead()
        page_cache_sync_readahead()
        generic_file_read_iter()
        blkdev_read_iter()
        __vfs_read()
        vfs_read()
        SyS_read()
        entry_SYSCALL_64_fastpath()
      
      This crash happens because the for loop count calculation in sync_global_pgds()
      is not correct. When a mapping area crosses PGD entries, we should
      calculate the starting address of region which next PGD covers and assign
      it to next for loop count, but not add PGDIR_SIZE directly. The old
      code works right only if the mapping area is an exact multiple of PGDIR_SIZE,
      otherwize the end region could be skipped so that it can't be synchronized
      to all other processes from kernel PGD init_mm.pgd.
      
      In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than
      PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it
      makes this area be mapped inside one PGD entry. With KASLR enabled,
      this area could cross two PGD entries, then the next PGD entry won't
      be synced to all other processes. That is why we saw empty PGD.
      
      Fix it.
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc5f9d5f
    • J
      x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() · 42fc6c6c
      Josh Poimboeuf 提交于
      Andrey Konovalov reported the following warning while fuzzing the kernel
      with syzkaller:
      
        WARNING: kernel stack regs at ffff8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0
      
      The unwinder dump revealed that RBP had a bad value when an interrupt
      occurred in csum_partial_copy_generic().
      
      That function saves RBP on the stack and then overwrites it, using it as
      a scratch register.  That's problematic because it breaks stack traces
      if an interrupt occurs in the middle of the function.
      
      Replace the usage of RBP with another callee-saved register (R15) so
      stack traces are no longer affected.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: David S . Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: linux-sctp@vger.kernel.org
      Cc: netdev <netdev@vger.kernel.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      42fc6c6c
  11. 04 5月, 2017 2 次提交
  12. 03 5月, 2017 3 次提交
  13. 02 5月, 2017 3 次提交
    • D
      KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING · 5c0aea0e
      David Hildenbrand 提交于
      We needed the lock to avoid racing with creation of the irqchip on x86. As
      kvm_set_irq_routing() calls srcu_synchronize_expedited(), this lock
      might be held for a longer time.
      
      Let's introduce an arch specific callback to check if we can actually
      add irq routes. For x86, all we have to do is check if we have an
      irqchip in the kernel. We don't need kvm->lock at that point as the
      irqchip is marked as inititalized only when actually fully created.
      Reported-by: NSteve Rutherford <srutherford@google.com>
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Fixes: 1df6dded ("KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5c0aea0e
    • J
      xen: Implement EFI reset_system callback · e371fd76
      Julien Grall 提交于
      When rebooting DOM0 with ACPI on ARM64, the kernel is crashing with the stack
      trace [1].
      
      This is happening because when EFI runtimes are enabled, the reset code
      (see machine_restart) will first try to use EFI restart method.
      
      However, the EFI restart code is expecting the reset_system callback to
      be always set. This is not the case for Xen and will lead to crash.
      
      The EFI restart helper is used in multiple places and some of them don't
      not have fallback (see machine_power_off). So implement reset_system
      callback as a call to xen_reboot when using EFI Xen.
      
      [   36.999270] reboot: Restarting system
      [   37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP
      [   37.011460] Modules linked in:
      [   37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506
      [   37.023903] Hardware name: (null) (DT)
      [   37.027734] task: ffff800902068000 task.stack: ffff800902064000
      [   37.033739] PC is at 0x0
      [   37.036359] LR is at efi_reboot+0x94/0xd0
      [   37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5
      [   37.047920] sp : ffff800902067cf0
      [   37.051314] x29: ffff800902067cf0 x28: ffff800902068000
      [   37.056709] x27: ffff000008992000 x26: 000000000000008e
      [   37.062104] x25: 0000000000000123 x24: 0000000000000015
      [   37.067499] x23: 0000000000000000 x22: ffff000008e6e250
      [   37.072894] x21: ffff000008e6e000 x20: 0000000000000000
      [   37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010
      [   37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef
      [   37.089079] x15: 0000000000000006 x14: ffff000088f42bef
      [   37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0
      [   37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff
      [   37.105265] x9 : ffff800902067a50 x8 : 6974726174736552
      [   37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0
      [   37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000
      [   37.121453] x3 : 0000000000000000 x2 : 0000000000000000
      [   37.126845] x1 : 0000000000000000 x0 : 0000000000000000
      [   37.132239]
      [   37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000)
      [   37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000)
      [   37.146949] 7ce0:                                   ffff800902067d40 ffff000008085334
      [   37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0
      [   37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8
      [   37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567
      [   37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4
      [   37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000
      [   37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024
      [   37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004
      [   37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005
      [   37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001
      [   37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294
      [   37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388
      [   37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000
      [   37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000
      [   37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0
      [   37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000
      [   37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66
      [   37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101
      [   37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588
      [   37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000
      [   37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000
      [   37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000
      [   37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80
      [   37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e
      [   37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f
      [   37.344911] Call trace:
      [   37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50)
      [   37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000
      [   37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4
      [   37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0
      [   37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015
      [   37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000
      [   37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [   37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8
      [   37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000
      [   37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006
      [   37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470
      [   37.430190] [<          (null)>]           (null)
      [   37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70
      [   37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78
      [   37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228
      [   37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28
      [   37.456737] Code: bad PC value
      [   37.459891] ---[ end trace 76e2fc17e050aecd ]---
      Signed-off-by: NJulien Grall <julien.grall@arm.com>
      
      --
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      
      The x86 code has theoritically a similar issue, altought EFI does not
      seem to be the preferred method. I have only built test it on x86.
      
      This should also probably be fixed in stable tree.
      
          Changes in v2:
              - Implement xen_efi_reset_system using xen_reboot
              - Move xen_efi_reset_system in drivers/xen/efi.c
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      e371fd76
    • J
      xen: Export xen_reboot · 5d9404e1
      Julien Grall 提交于
      The helper xen_reboot will be called by the EFI code in a later patch.
      
      Note that the ARM version does not yet exist and will be added in a
      later patch too.
      Signed-off-by: NJulien Grall <julien.grall@arm.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      5d9404e1