1. 28 2月, 2019 1 次提交
  2. 26 2月, 2019 1 次提交
  3. 15 2月, 2019 1 次提交
  4. 14 2月, 2019 1 次提交
  5. 10 2月, 2019 1 次提交
    • J
      x86/mm: Make set_pmd_at() paravirt aware · 20e55bc1
      Juergen Gross 提交于
      set_pmd_at() calls native_set_pmd() unconditionally on x86. This was
      fine as long as only huge page entries were written via set_pmd_at(),
      as Xen pv guests don't support those.
      
      Commit 2c91bd4a ("mm: speed up mremap by 20x on large regions")
      introduced a usage of set_pmd_at() possible on pv guests, leading to
      failures like:
      
      BUG: unable to handle kernel paging request at ffff888023e26778
      #PF error: [PROT] [WRITE]
      RIP: e030:move_page_tables+0x7c1/0xae0
      move_vma.isra.3+0xd1/0x2d0
      __se_sys_mremap+0x3c6/0x5b0
       do_syscall_64+0x49/0x100
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Make set_pmd_at() paravirt aware by just letting it use set_pmd().
      
      Fixes: 2c91bd4a ("mm: speed up mremap by 20x on large regions")
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Cc: boris.ostrovsky@oracle.com
      Cc: sstabellini@kernel.org
      Cc: hpa@zytor.com
      Cc: bp@alien8.de
      Cc: torvalds@linux-foundation.org
      Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
      20e55bc1
  6. 02 2月, 2019 1 次提交
    • J
      x86/resctrl: Avoid confusion over the new X86_RESCTRL config · e6d42931
      Johannes Weiner 提交于
      "Resource Control" is a very broad term for this CPU feature, and a term
      that is also associated with containers, cgroups etc. This can easily
      cause confusion.
      
      Make the user prompt more specific. Match the config symbol name.
      
       [ bp: In the future, the corresponding ARM arch-specific code will be
         under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
         under the CPU_RESCTRL umbrella symbol. ]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
      e6d42931
  7. 29 1月, 2019 1 次提交
  8. 15 1月, 2019 1 次提交
    • D
      x86/pkeys: Properly copy pkey state at fork() · a31e184e
      Dave Hansen 提交于
      Memory protection key behavior should be the same in a child as it was
      in the parent before a fork.  But, there is a bug that resets the
      state in the child at fork instead of preserving it.
      
      The creation of new mm's is a bit convoluted.  At fork(), the code
      does:
      
        1. memcpy() the parent mm to initialize child
        2. mm_init() to initalize some select stuff stuff
        3. dup_mmap() to create true copies that memcpy() did not do right
      
      For pkeys two bits of state need to be preserved across a fork:
      'execute_only_pkey' and 'pkey_allocation_map'.
      
      Those are preserved by the memcpy(), but mm_init() invokes
      init_new_context() which overwrites 'execute_only_pkey' and
      'pkey_allocation_map' with "new" values.
      
      The author of the code erroneously believed that init_new_context is *only*
      called at execve()-time.  But, alas, init_new_context() is used at execve()
      and fork().
      
      The result is that, after a fork(), the child's pkey state ends up looking
      like it does after an execve(), which is totally wrong.  pkeys that are
      already allocated can be allocated again, for instance.
      
      To fix this, add code called by dup_mmap() to copy the pkey state from
      parent to child explicitly.  Also add a comment above init_new_context() to
      make it more clear to the next poor sod what this code is used for.
      
      Fixes: e8c24d3a ("x86/pkeys: Allocation/free syscalls")
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: mpe@ellerman.id.au
      Cc: will.deacon@arm.com
      Cc: luto@kernel.org
      Cc: jroedel@suse.de
      Cc: stable@vger.kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
      a31e184e
  9. 09 1月, 2019 1 次提交
  10. 06 1月, 2019 1 次提交
    • M
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada 提交于
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  11. 05 1月, 2019 6 次提交
    • L
      x86: re-introduce non-generic memcpy_{to,from}io · 170d13ca
      Linus Torvalds 提交于
      This has been broken forever, and nobody ever really noticed because
      it's purely a performance issue.
      
      Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions")
      Brian Gerst simplified the memory copies to and from iomem, since on
      x86, the instructions to access iomem are exactly the same as the
      regular instructions.
      
      That is technically true, and things worked, and nobody said anything.
      Besides, back then the regular memcpy was pretty simple and worked fine.
      
      Nobody noticed except for David Laight, that is.  David has a testing a
      TLP monitor he was writing for an FPGA, and has been occasionally
      complaining about how memcpy_toio() writes things one byte at a time.
      
      Which is completely unacceptable from a performance standpoint, even if
      it happens to technically work.
      
      The reason it's writing one byte at a time is because while it's
      technically true that accesses to iomem are the same as accesses to
      regular memory on x86, the _granularity_ (and ordering) of accesses
      matter to iomem in ways that they don't matter to regular cached memory.
      
      In particular, when ERMS is set, we default to using "rep movsb" for
      larger memory copies.  That is indeed perfectly fine for real memory,
      since the whole point is that the CPU is going to do cacheline
      optimizations and executes the memory copy efficiently for cached
      memory.
      
      With iomem? Not so much.  With iomem, "rep movsb" will indeed work, but
      it will copy things one byte at a time. Slowly and ponderously.
      
      Now, originally, back in 2010 when commit 6175ddf0 was done, we
      didn't use ERMS, and this was much less noticeable.
      
      Our normal memcpy() was simpler in other ways too.
      
      Because in fact, it's not just about using the string instructions.  Our
      memcpy() these days does things like "read and write overlapping values"
      to handle the last bytes of the copy.  Again, for normal memory,
      overlapping accesses isn't an issue.  For iomem? It can be.
      
      So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
      memset_io() functions.  It doesn't particularly optimize them, but it
      tries to at least not be horrid, or do overlapping accesses.  In fact,
      this uses the existing __inline_memcpy() function that we still had
      lying around that uses our very traditional "rep movsl" loop followed by
      movsw/movsb for the final bytes.
      
      Somebody may decide to try to improve on it, but if we've gone almost a
      decade with only one person really ever noticing and complaining, maybe
      it's not worth worrying about further, once it's not _completely_ broken?
      Reported-by: NDavid Laight <David.Laight@aculab.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      170d13ca
    • L
      Use __put_user_goto in __put_user_size() and unsafe_put_user() · a959dc88
      Linus Torvalds 提交于
      This actually enables the __put_user_goto() functionality in
      unsafe_put_user().
      
      For an example of the effect of this, this is the code generated for the
      
              unsafe_put_user(signo, &infop->si_signo, Efault);
      
      in the waitid() system call:
      
      	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]
      
      It's just one single store instruction, along with generating an
      exception table entry pointing to the Efault label case in case that
      instruction faults.
      
      Before, we would generate this:
      
      	xorl    %edx, %edx
      	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
              testl   %edx, %edx
              jne     .L309
      
      with the exception table generated for that 'mov' instruction causing us
      to jump to a stub that set %edx to -EFAULT and then jumped back to the
      'testl' instruction.
      
      So not only do we now get rid of the extra code in the normal sequence,
      we also avoid unnecessarily keeping that extra error register live
      across it all.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a959dc88
    • L
      x86 uaccess: Introduce __put_user_goto · 4a789213
      Linus Torvalds 提交于
      This is finally the actual reason for the odd error handling in the
      "unsafe_get/put_user()" functions, introduced over three years ago.
      
      Using a "jump to error label" interface is somewhat odd, but very
      convenient as a programming interface, and more importantly, it fits
      very well with simply making the target be the exception handler address
      directly from the inline asm.
      
      The reason it took over three years to actually do this? We need "asm
      goto" support for it, which only became the default on x86 last year.
      It's now been a year that we've forced asm goto support (see commit
      e501ce95 "x86: Force asm-goto"), and so let's just do it here too.
      
      [ Side note: this commit was originally done back in 2016. The above
        commentary about timing is obviously about it only now getting merged
        into my real upstream tree     - Linus ]
      
      Sadly, gcc still only supports "asm goto" with asms that do not have any
      outputs, so we are limited to only the put_user case for this.  Maybe in
      several more years we can do the get_user case too.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a789213
    • J
      mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Joel Fernandes (Google) 提交于
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since its unused.  This patch therefore removes this argument
      tree-wide resulting in a nice negative diff as well.  Also ensuring
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      Following fix ups were done manually:
      * Removal of address argument from  pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4cf58924
    • M
      fls: change parameter to unsigned int · 3fc2579e
      Matthew Wilcox 提交于
      When testing in userspace, UBSAN pointed out that shifting into the sign
      bit is undefined behaviour.  It doesn't really make sense to ask for the
      highest set bit of a negative value, so just turn the argument type into
      an unsigned int.
      
      Some architectures (eg ppc) already had it declared as an unsigned int,
      so I don't expect too many problems.
      
      Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3fc2579e
    • L
      make 'user_access_begin()' do 'access_ok()' · 594cc251
      Linus Torvalds 提交于
      Originally, the rule used to be that you'd have to do access_ok()
      separately, and then user_access_begin() before actually doing the
      direct (optimized) user access.
      
      But experience has shown that people then decide not to do access_ok()
      at all, and instead rely on it being implied by other operations or
      similar.  Which makes it very hard to verify that the access has
      actually been range-checked.
      
      If you use the unsafe direct user accesses, hardware features (either
      SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
      Access Never - on ARM) do force you to use user_access_begin().  But
      nothing really forces the range check.
      
      By putting the range check into user_access_begin(), we actually force
      people to do the right thing (tm), and the range check vill be visible
      near the actual accesses.  We have way too long a history of people
      trying to avoid them.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      594cc251
  12. 04 1月, 2019 1 次提交
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds 提交于
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  13. 29 12月, 2018 1 次提交
  14. 21 12月, 2018 13 次提交
    • R
      a0aea130
    • S
      KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup · e8143499
      Sean Christopherson 提交于
      ____kvm_handle_fault_on_reboot() provides a generic exception fixup
      handler that is used to cleanly handle faults on VMX/SVM instructions
      during reboot (or at least try to).  If there isn't a reboot in
      progress, ____kvm_handle_fault_on_reboot() treats any exception as
      fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
      a BUG() to get a stack trace and die.
      
      When it was originally added by commit 4ecac3fd ("KVM: Handle
      virtualization instruction #UD faults during reboot"), the "call" to
      kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
      is the RIP of the faulting instructing.
      
      The PUSH+JMP trickery is necessary because the exception fixup handler
      code lies outside of its associated function, e.g. right after the
      function.  An actual CALL from the .fixup code would show a slightly
      bogus stack trace, e.g. an extra "random" function would be inserted
      into the trace, as the return RIP on the stack would point to no known
      function (and the unwinder will likely try to guess who owns the RIP).
      
      Unfortunately, the JMP was replaced with a CALL when the macro was
      reworked to not spin indefinitely during reboot (commit b7c4145b
      "KVM: Don't spin on virt instruction faults during reboot").  This
      causes the aforementioned behavior where a bogus function is inserted
      into the stack trace, e.g. my builds like to blame free_kvm_area().
      
      Revert the CALL back to a JMP.  The changelog for commit b7c4145b
      ("KVM: Don't spin on virt instruction faults during reboot") contains
      nothing that indicates the switch to CALL was deliberate.  This is
      backed up by the fact that the PUSH <insn RIP> was left intact.
      
      Note that an alternative to the PUSH+JMP magic would be to JMP back
      to the "real" code and CALL from there, but that would require adding
      a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
      and would add no value, i.e. the stack trace would be the same.
      
      Using CALL:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
      R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
      FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
      Call Trace:
       free_kvm_area+0x1044/0x43ea [kvm_intel]
       ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace 9775b14b123b1713 ]---
      
      Using JMP:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
      R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
      FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
      Call Trace:
       vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace f9daedb85ab3ddba ]---
      
      Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e8143499
    • U
      KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams · ac5ffda2
      Uros Bizjak 提交于
      Recently the minimum required version of binutils was changed to 2.20,
      which supports all SVM instruction mnemonics. The patch removes
      all .byte #defines and uses real instruction mnemonics instead.
      Signed-off-by: NUros Bizjak <ubizjak@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ac5ffda2
    • L
      KVM: Make kvm_set_spte_hva() return int · 748c0e31
      Lan Tianyu 提交于
      The patch is to make kvm_set_spte_hva() return int and caller can
      check return value to determine flush tlb or not.
      Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      748c0e31
    • L
      x86/hyper-v: Add HvFlushGuestAddressList hypercall support · cc4edae4
      Lan Tianyu 提交于
      Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
      with specified ranges. This patch is to add the hypercall support.
      Reviewed-by: NMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cc4edae4
    • L
      KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops · a49b9635
      Lan Tianyu 提交于
      Add flush range call back in the kvm_x86_ops and platform can use it
      to register its associated function. The parameter "kvm_tlb_range"
      accepts a single range and flush list which contains a list of ranges.
      Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a49b9635
    • C
      KVM: x86: Add Intel Processor Trace cpuid emulation · 86f5201d
      Chao Peng 提交于
      Expose Intel Processor Trace to guest only when
      the PT works in Host-Guest mode.
      Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      86f5201d
    • C
      KVM: x86: Add Intel PT virtualization work mode · f99e3daf
      Chao Peng 提交于
      Intel Processor Trace virtualization can be work in one
      of 2 possible modes:
      
      a. System-Wide mode (default):
         When the host configures Intel PT to collect trace packets
         of the entire system, it can leave the relevant VMX controls
         clear to allow VMX-specific packets to provide information
         across VMX transitions.
         KVM guest will not aware this feature in this mode and both
         host and KVM guest trace will output to host buffer.
      
      b. Host-Guest mode:
         Host can configure trace-packet generation while in
         VMX non-root operation for guests and root operation
         for native executing normally.
         Intel PT will be exposed to KVM guest in this mode, and
         the trace output to respective buffer of host and guest.
         In this mode, tht status of PT will be saved and disabled
         before VM-entry and restored after VM-exit if trace
         a virtual machine.
      Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f99e3daf
    • L
      perf/x86/intel/pt: add new capability for Intel PT · e0018afe
      Luwei Kang 提交于
      This adds support for "output to Trace Transport subsystem"
      capability of Intel PT. It means that PT can output its
      trace to an MMIO address range rather than system memory buffer.
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e0018afe
    • L
      perf/x86/intel/pt: Add new bit definitions for PT MSRs · 69843a91
      Luwei Kang 提交于
      Add bit definitions for Intel PT MSRs to support trace output
      directed to the memeory subsystem and holds a count if packet
      bytes that have been sent out.
      
      These are required by the upcoming PT support in KVM guests
      for MSRs read/write emulation.
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      69843a91
    • L
      perf/x86/intel/pt: Introduce intel_pt_validate_cap() · 61be2998
      Luwei Kang 提交于
      intel_pt_validate_hw_cap() validates whether a given PT capability is
      supported by the hardware. It checks the PT capability array which
      reflects the capabilities of the hardware on which the code is executed.
      
      For setting up PT for KVM guests this is not correct as the capability
      array for the guest can be different from the host array.
      
      Provide a new function to check against a given capability array.
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      61be2998
    • C
      perf/x86/intel/pt: Export pt_cap_get() · f6d079ce
      Chao Peng 提交于
      pt_cap_get() is required by the upcoming PT support in KVM guests.
      
      Export it and move the capabilites enum to a global header.
      
      As a global functions, "pt_*" is already used for ptrace and
      other things, so it makes sense to use "intel_pt_*" as a prefix.
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f6d079ce
    • C
      perf/x86/intel/pt: Move Intel PT MSRs bit defines to global header · 887eda13
      Chao Peng 提交于
      The Intel Processor Trace (PT) MSR bit defines are in a private
      header. The upcoming support for PT virtualization requires these defines
      to be accessible from KVM code.
      
      Move them to the global MSR header file.
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      887eda13
  15. 20 12月, 2018 1 次提交
  16. 19 12月, 2018 8 次提交