1. 08 July 2016, 4 commits
    • x86/mm: Refactor KASLR entropy functions · d899a7d1
      Thomas Garnier authored
      Move the KASLR entropy functions into arch/x86/lib to be used in early
      kernel boot for KASLR memory randomization.
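
      For illustration, a minimal user-space sketch of the sort of entropy
      mixing such a helper performs (names are illustrative; the real
      kaslr_get_random_long() also falls back to the i8254 timer and mixes in
      boot parameters):

        #include <stdint.h>

        static int rdrand64(uint64_t *v)            /* assumes RDRAND support */
        {
                unsigned char ok;

                __asm__ __volatile__("rdrand %0; setc %1" : "=r"(*v), "=qm"(ok));
                return ok;
        }

        static uint64_t rdtsc64(void)
        {
                uint32_t lo, hi;

                __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
                return ((uint64_t)hi << 32) | lo;
        }

        static uint64_t kaslr_entropy_sketch(void)
        {
                uint64_t raw, random = 0;

                if (rdrand64(&raw))                 /* hardware RNG */
                        random ^= raw;
                random ^= rdtsc64();                /* always mix in the TSC */
                return random;
        }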
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d899a7d1
    • x86/KASLR: Fix boot crash with certain memory configurations · 6daa2ec0
      Baoquan He authored
      Ye Xiaolong reported this boot crash:
      
      |
      |  XZ-compressed data is corrupt
      |
      |   -- System halted
      |
      
      Fix the bug in mem_avoid_overlap() so that it finds the earliest overlap.
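
      As a hedged sketch of the idea (illustrative types and names, not the
      actual boot-code symbols), the scan has to report the overlapping avoid
      region that starts lowest, so the caller can split the candidate slot
      area at exactly that point:

        #include <stdint.h>

        struct mem_vector { uint64_t start, size; };

        static int mem_overlaps(const struct mem_vector *a, const struct mem_vector *b)
        {
                return a->start < b->start + b->size && b->start < a->start + a->size;
        }

        static int find_earliest_overlap(const struct mem_vector *img,
                                         const struct mem_vector *avoid, int n,
                                         struct mem_vector *earliest)
        {
                uint64_t best = UINT64_MAX;
                int hit = 0;

                for (int i = 0; i < n; i++) {
                        if (!mem_overlaps(img, &avoid[i]) || avoid[i].start >= best)
                                continue;
                        best = avoid[i].start;
                        *earliest = avoid[i];
                        hit = 1;
                }
                return hit;
        }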
      Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6daa2ec0
    • x86/vdso: Add mremap hook to vm_special_mapping · b059a453
      Dmitry Safonov authored
      Add possibility for 32-bit user-space applications to move
      the vDSO mapping.
      
      Previously, when a user-space app called mremap() for the vDSO
      address, in the syscall return path it would land on the previous
      address of the vDSO page, resulting in a segmentation violation.
      
      Now it lands fine and returns to userspace with a remapped vDSO.
      
      This will also fix the context.vdso pointer for 64-bit, which does
      not affect the user of vDSO after mremap() currently, but this
      may change in the future.
      
      As suggested by Andy, return -EINVAL for mremap() that would
      split the vDSO image: that operation cannot possibly result in
      a working system so reject it.
      
      Renamed and moved the text_mapping structure declaration inside
      map_vdso(), as it is used only there and now it complements the
      vvar_mapping variable.
      
      There is still a problem for remapping the vDSO in glibc
      applications: the linker relocates addresses for syscalls
      on the vDSO page, so you need to relink with the new
      addresses.
      
      Without that the next syscall through glibc may fail:
      
        Program received signal SIGSEGV, Segmentation fault.
        #0  0xf7fd9b80 in __kernel_vsyscall ()
        #1  0xf7ec8238 in _exit () from /usr/lib32/libc.so.6
      Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: 0x7f454c46@gmail.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b059a453
    • x86/mm/pat, /dev/mem: Remove superfluous error message · 39380b80
      Jiri Kosina authored
      Currently it's possible for broken (or malicious) userspace to flood a
      kernel log indefinitely with messages a-la
      
      	Program dmidecode tried to access /dev/mem between f0000->100000
      
      because range_is_allowed(), in case of CONFIG_STRICT_DEVMEM being turned
      on, dumps this information each and every time devmem_is_allowed() fails.
      
      Reportedly, userspace that is able to trigger a continuous flow of these
      messages exists.
      
      It would be possible to rate limit this message, but that'd have
      questionable value; the administrator wouldn't get information about all
      the failing accesses, so the information would be both superfluous and
      incomplete at the same time :)
      
      Returning EPERM (which is what is actually happening) is enough indication
      to userspace of what has happened; no need to log this particular error as
      some sort of special condition.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      39380b80
  2. 27 June 2016, 7 commits
    • KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode. · ff30ef40
      Quentin Casasnovas authored
      I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was
      getting a GP(0) on a rather unspecial vmread from Xen:
      
           (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=n  Not tainted ]----
           (XEN) CPU:    1
           (XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
           (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
           (XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
           (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
           (XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
           (XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
           (XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
           (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
           (XEN) cr3: 000000003fc79000   cr2: 0000000000000000
           (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
           (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
           (XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
           (XEN) Xen stack trace from rsp=ffff83003ffbfab0:
      
           ...
      
           (XEN) Xen call trace:
           (XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
           (XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
           (XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
           (XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
           (XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
           (XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
           (XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
           (XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
           (XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
           (XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
           (XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
           (XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
           (XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
           (XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
           (XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
           (XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
           (XEN)
           (XEN)
           (XEN) ****************************************
           (XEN) Panic on CPU 1:
           (XEN) GENERAL PROTECTION FAULT
           (XEN) [error_code=0000]
           (XEN) ****************************************
      
      Tracing my host KVM showed it was the one injecting the GP(0) when
      emulating the VMREAD and checking the destination segment permissions in
      get_vmx_mem_address():
      
           3)               |    vmx_handle_exit() {
           3)               |      handle_vmread() {
           3)               |        nested_vmx_check_permission() {
           3)               |          vmx_get_segment() {
           3)   0.074 us    |            vmx_read_guest_seg_base();
           3)   0.065 us    |            vmx_read_guest_seg_selector();
           3)   0.066 us    |            vmx_read_guest_seg_ar();
           3)   1.636 us    |          }
           3)   0.058 us    |          vmx_get_rflags();
           3)   0.062 us    |          vmx_read_guest_seg_ar();
           3)   3.469 us    |        }
           3)               |        vmx_get_cs_db_l_bits() {
           3)   0.058 us    |          vmx_read_guest_seg_ar();
           3)   0.662 us    |        }
           3)               |        get_vmx_mem_address() {
           3)   0.068 us    |          vmx_cache_reg();
           3)               |          vmx_get_segment() {
           3)   0.074 us    |            vmx_read_guest_seg_base();
           3)   0.068 us    |            vmx_read_guest_seg_selector();
           3)   0.071 us    |            vmx_read_guest_seg_ar();
           3)   1.756 us    |          }
           3)               |          kvm_queue_exception_e() {
           3)   0.066 us    |            kvm_multiple_exception();
           3)   0.684 us    |          }
           3)   4.085 us    |        }
           3)   9.833 us    |      }
           3) + 10.366 us   |    }
      
      Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
      Developer Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
      Control Structure", I found that we're enforcing that the destination
      operand is NOT located in a read-only data segment or any code segment when
      the L1 is in long mode - BUT that check should only happen when it is in
      protected mode.
      
      Shuffling the code a bit to make our emulation follow the specification
      allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
      without problems.
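
      A simplified sketch of the corrected ordering (the predicate names are
      illustrative, not KVM's actual helpers): in long mode only the canonical
      address matters, while the read-only data / code segment restriction
      applies in protected mode only:

        #include <stdbool.h>

        static bool vmread_dest_ok(bool long_mode, bool code_seg,
                                   bool writable_data_seg, bool addr_canonical)
        {
                if (long_mode)
                        return addr_canonical;      /* no segment-type check here */
                return !code_seg && writable_data_seg;
        }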
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ff30ef40
    • KVM: LAPIC: cap __delay at lapic_timer_advance_ns · b606f189
      Marcelo Tosatti authored
      The host timer which emulates the guest LAPIC TSC deadline
      timer has its expiration diminished by lapic_timer_advance_ns
      nanoseconds. Therefore if, at wait_lapic_expire, a difference
      larger than lapic_timer_advance_ns is encountered, delay at most
      lapic_timer_advance_ns.
      
      This fixes a problem where the guest can cause the host
      to delay for large amounts of time.
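
      A minimal sketch of the capping logic, with illustrative stand-ins for
      the real helpers (__delay(), nsec_to_cycles()); advance_cycles stands for
      lapic_timer_advance_ns already converted to TSC cycles:

        #include <stdint.h>

        static void delay_cycles(uint64_t cycles) { (void)cycles; /* busy-wait */ }

        static void wait_lapic_expire_sketch(uint64_t guest_tsc, uint64_t tsc_deadline,
                                             uint64_t advance_cycles)
        {
                if (guest_tsc < tsc_deadline) {
                        uint64_t wait = tsc_deadline - guest_tsc;

                        if (wait > advance_cycles)   /* the new cap */
                                wait = advance_cycles;
                        delay_cycles(wait);
                }
        }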
      Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b606f189
    • KVM: x86: move nsec_to_cycles from x86.c to x86.h · 8d93c874
      Marcelo Tosatti authored
      Move the inline function nsec_to_cycles from x86.c to x86.h, as
      the next patch uses it from lapic.c.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8d93c874
    • pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flags · ed911b43
      Minfei Huang authored
      There is a generic function, __pvclock_read_cycles(), that returns both
      the flags and the cycles value. For pvclock_read_flags() the cycles value
      is useless. To make the function more efficient, read the flags field
      directly in the function.
      Signed-off-by: Minfei Huang <mnghuan@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ed911b43
    • pvclock: Cleanup to remove function pvclock_get_nsec_offset · f7550d07
      Minfei Huang authored
      __pvclock_read_cycles() is short enough that there is no need for a
      separate pvclock_get_nsec_offset() function to calculate the TSC delta;
      it is better to fold that calculation into __pvclock_read_cycles().

      Also remove the useless variables in __pvclock_read_cycles().
      Signed-off-by: Minfei Huang <mnghuan@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f7550d07
    • pvclock: Add CPU barriers to get correct version value · 749d088b
      Minfei Huang authored
      Protocol for the "version" fields is: hypervisor raises it (making it
      uneven) before it starts updating the fields and raises it again (making
      it even) when it is done.  Thus the guest can make sure the time values
      it got are consistent by checking the version before and after reading
      them.
      
      Add CPU barriers after reading the version value, just like
      vread_pvclock() does, because all of the callees in this function are
      inlined.
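
      A user-space style sketch of the reader side of that protocol
      (illustrative struct and barrier macro; the patch adds the equivalent
      barriers to the kernel's pvclock reader):

        #include <stdint.h>

        #define barrier() __asm__ __volatile__("" ::: "memory")

        struct time_info_sketch {
                volatile uint32_t version;
                volatile uint64_t system_time;
        };

        static uint64_t read_system_time(const struct time_info_sketch *ti)
        {
                uint32_t version;
                uint64_t time;

                do {
                        version = ti->version;      /* snapshot before the reads */
                        barrier();
                        time = ti->system_time;
                        barrier();
                        /* odd: update in progress; changed: possibly torn read */
                } while ((version & 1) || version != ti->version);

                return time;
        }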
      
      Fixes: 502dfeff
      Cc: stable@vger.kernel.org
      Signed-off-by: Minfei Huang <mnghuan@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      749d088b
    • x86/boot/64: Add forgotten end of function marker · dbf984d8
      Borislav Petkov authored
      Add secondary_startup_64()'s ENDPROC() marker.
      
      No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160625112457.16930-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dbf984d8
  3. 26 June 2016, 6 commits
    • x86/KASLR: Allow randomization below the load address · e066cc47
      Yinghai Lu authored
      Currently the kernel image physical address randomization's lower
      boundary is the original kernel load address.
      
      For bootloaders that load kernels into very high memory (e.g. kexec),
      this means randomization takes place in a very small window at the
      top of memory, ignoring the large region of physical memory below
      the load address.
      
      Since mem_avoid[] is already correctly tracking the regions that must be
      avoided, this patch changes the minimum address to whichever is lower:
      512M (to conservatively avoid unknown things in lower memory) or the
      load address. Now, for example, if the kernel is loaded at 8G, [512M,
      8G) will be added to the list of possible physical memory positions.
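
      The new lower bound itself is a one-liner; a hedged sketch (the 512MB
      constant comes from the changelog, the helper name is illustrative):

        #define LOW_BOUND (512UL * 1024 * 1024)

        static unsigned long kaslr_min_addr(unsigned long load_addr)
        {
                return load_addr < LOW_BOUND ? load_addr : LOW_BOUND;
        }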
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Rewrote the changelog, refactored the code to use min(). ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
      [ Edited the changelog some more, plus the code comment as well. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e066cc47
    • x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G · ed9f007e
      Kees Cook authored
      We want the physical address to be randomized anywhere between
      16MB and the top of physical memory (up to 64TB).
      
      This patch exchanges the prior slots[] array for the new slot_areas[]
      array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
      address offset for 64-bit. As before, process_e820_entry() walks
      memory and populates slot_areas[], splitting on any detected mem_avoid
      collisions.
      
      Finally, since the slots[] array and its associated functions are no
      longer needed, they are removed.
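
      A hedged sketch of the slot_areas[] idea (illustrative names): keep
      (start, number-of-slots) ranges instead of one bounded slots[] array and
      pick one slot uniformly across all ranges:

        #include <stdint.h>

        struct slot_area_sketch { uint64_t addr; unsigned long num; };

        static uint64_t slots_fetch_sketch(const struct slot_area_sketch *sa, int n,
                                           unsigned long index, uint64_t align)
        {
                for (int i = 0; i < n; i++) {
                        if (index < sa[i].num)
                                return sa[i].addr + (uint64_t)index * align;
                        index -= sa[i].num;
                }
                return 0;       /* caller treats 0 as "no slot found" */
        }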
      
      Based on earlier patches by Baoquan He.
      
      Originally-from: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ed9f007e
    • x86/KASLR: Randomize virtual address separately · 8391c73c
      Baoquan He authored
      The current KASLR implementation randomizes the physical and virtual
      addresses of the kernel together (both are offset by the same amount). It
      calculates the delta of the physical address where vmlinux was linked
      to load and where it is finally loaded. If the delta is not equal to 0
      (i.e. the kernel was relocated), relocation handling needs be done.
      
      On 64-bit, this patch randomizes both the physical address where kernel
      is decompressed and the virtual address where kernel text is mapped and
      will execute from. We now have two values being chosen, so the function
      arguments are reorganized to pass by pointer so they can be directly
      updated. Since relocation handling only depends on the virtual address,
      we must check the virtual delta, not the physical delta for processing
      kernel relocations. This also populates the page table for the new
      virtual address range. 32-bit does not support a separate virtual address,
      so it continues to use the physical offset for its virtual offset.
      
      Additionally, this updates the sanity checks done on the resulting kernel
      addresses, since they can now be separate.
      
      [kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
      [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8391c73c
    • x86/KASLR: Clarify identity map interface · 11fdf97a
      Kees Cook authored
      This extracts the call to prepare_level4() into a top-level function
      that the user of the pagetable.c interface must call to initialize
      the new page tables. For clarity and to match the "finalize" function,
      it has been renamed to initialize_identity_maps(). This function also
      gains the initialization of mapping_info so we don't have to do it each
      time in add_identity_map().
      
      Additionally add copyright notice to the top, to make it clear that the
      bulk of the pagetable.c code was written by Yinghai, and that I just
      added bugs later. :)
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11fdf97a
    • x86/boot: Refuse to build with data relocations · 98f78525
      Kees Cook authored
      The compressed kernel is built with -fPIC/-fPIE so that it can run in any
      location a bootloader happens to put it. However, since ELF relocation
      processing is not happening (and all the relocation information has
      already been stripped at link time), none of the code can use data
      relocations (e.g. static assignments of pointers). This is already noted
      in a warning comment at the top of misc.c, but this adds an explicit
      check for the condition during the linking stage to block any such bugs
      from appearing.
      
      If this was in place with the earlier bug in pagetable.c, the build
      would fail like this:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
        error: arch/x86/boot/compressed/pagetable.o has data relocations!
        make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
        ...
      
      A clean build shows:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
          LD      arch/x86/boot/compressed/vmlinux
        ...
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      98f78525
    • x86/KASLR, x86/power: Remove x86 hibernation restrictions · 65fe935d
      Kees Cook authored
      With the following fix:
      
        70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
      
      ... there is no longer a problem with hibernation resuming a
      KASLR-booted kernel image, so remove the restriction.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux PM list <linux-pm@vger.kernel.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      65fe935d
  4. 25 June 2016, 4 commits
    • x86/efi: get rid of superfluous __GFP_REPEAT · f58f230a
      Michal Hocko authored
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.
      
      efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0
      page.  This means that this flag has never been actually useful here
      because it has always been used only for PAGE_ALLOC_COSTLY requests.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-4-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f58f230a
    • x86: get rid of superfluous __GFP_REPEAT · a3a9a59d
      Michal Hocko authored
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.
      
      PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
      flag is for more than order-0.  This means that this flag has never been
      actually useful here because it has always been used only for
      PAGE_ALLOC_COSTLY requests.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3a9a59d
    • tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Michal Hocko authored
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just
      copy&paste when code has been adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
      * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
      * _might_ fail.  This depends upon the particular VM implementation.

      while !costly requests have basically nofail semantic.  So one could
      reasonably expect that order-0 request with __GFP_REPEAT will not loop
      for ever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to about a third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch specific code here and I hope I got it right
      but as a matter of fact I even didn't compile test for some archs as I
      do not have cross compiler for them.  Patches should be quite trivial to
      review for stupid compile mistakes though.  The tricky parts are usually
      hidden by macro definitions and that's where I would appreciate help from
      arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow to
      identify place which really need allocator to retry harder and formulate
      a more specific semantic for what the flag is supposed to do actually.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32d6bd90
    • x86: fix up a few misc stack pointer vs thread_info confusions · aca9c293
      Linus Torvalds authored
      As the actual pointer value is the same for the thread stack allocation
      and the thread_info, code that confused the two worked fine, but will
      break when the thread info is moved away from the stack allocation.  It
      also looks very confusing.
      
      For example, the kprobe code wanted to know the current top of stack.
      To do that, it used this:
      
      	(unsigned long)current_thread_info() + THREAD_SIZE
      
      which did indeed give the correct value.  But it's not only a fairly
      nonsensical expression, it's also rather complex, especially since we
      actually have this:
      
      	static inline unsigned long current_top_of_stack(void)
      
      which not only gives us the value we are interested in, but happens to
      be how "current_thread_info()" is currently defined as:
      
      	(struct thread_info *)(current_top_of_stack() - THREAD_SIZE);
      
      so using current_thread_info() to figure out the top of the stack really
      is a very round-about thing to do.
      
      The other cases are just simpler confusion about task_thread_info() vs
      task_stack_page(), which currently return the same pointer - but if you
      want the stack page, you really should be using the latter one.
      
      And there was one entirely unused assignment of the current stack to a
      thread_info pointer.
      
      All cleaned up to make more sense today, and make it easier to move the
      thread_info away from the stack in the future.
      
      No semantic changes.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aca9c293
  5. 24 June 2016, 1 commit
  6. 23 June 2016, 2 commits
  7. 18 June 2016, 1 commit
    • isa: Allow ISA-style drivers on modern systems · 3a495511
      William Breathitt Gray authored
      Several modern devices, such as PC/104 cards, are expected to run on
      modern systems via an ISA bus interface. Since ISA is a legacy interface
      for most modern architectures, ISA support should remain disabled in
      general. Support for ISA-style drivers should be enabled on a per driver
      basis.
      
      To allow ISA-style drivers on modern systems, this patch introduces the
      ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
      build conditionally on the ISA_BUS_API Kconfig option, which defaults to
      the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
      ISA_BUS_API Kconfig option to be selected on architectures which do not
      enable ISA (e.g. X86_64).
      
      The ISA_BUS Kconfig option is currently only implemented for X86
      architectures. Other architectures may have their own ISA_BUS Kconfig
      options added as required.
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a495511
  8. 16 June 2016, 3 commits
  9. 14 June 2016, 1 commit
    • kprobes/x86: Clear TF bit in fault on single-stepping · dcfc4724
      Masami Hiramatsu authored
      Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
      the flags register in the case of a fault fixup on single-stepping.
      
      If we put a kprobe on the instruction which caused a
      page fault (e.g. actual mov instructions in copy_user_*),
      that fault happens on the single-stepping buffer. In this
      case, kprobes resets running instance so that the CPU can
      retry execution on the original ip address.
      
      However, current code forgets to reset the TF bit. Since this
      fault happens with TF bit set for enabling single-stepping,
      when it retries, it causes a debug exception and kprobes
      can not handle it because it already reset itself.
      
      On most x86-64 platforms, it can be easily reproduced
      by using the kprobe tracer, e.g.:
      
        # cd /sys/kernel/debug/tracing
        # echo p copy_user_enhanced_fast_string+5 > kprobe_events
        # echo 1 > events/kprobes/enable
      
      And you'll see a kernel panic on do_debug(), since the debug
      trap is not handled by kprobes.
      
      To fix this problem, we just need to clear the TF bit when
      resetting the running kprobe.
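
      A hedged sketch of that idea (X86_EFLAGS_TF is the real flag bit, the
      surrounding structure is illustrative): drop TF and restore whatever TF
      state the interrupted context originally had:

        #define X86_EFLAGS_TF 0x00000100UL

        struct regs_sketch { unsigned long flags; };

        static void fixup_ss_fault(struct regs_sketch *regs, unsigned long saved_flags)
        {
                regs->flags &= ~X86_EFLAGS_TF;              /* the missing reset */
                regs->flags |= saved_flags & X86_EFLAGS_TF; /* caller's original TF */
        }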
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: systemtap@sourceware.org
      Cc: stable@vger.kernel.org # All the way back to ancient kernels
      Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
      [ Updated the comments. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dcfc4724
  10. 10 June 2016, 2 commits
  11. 08 June 2016, 3 commits
    • x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models · 96685a55
      Borislav Petkov authored
      We need to reenable the topology extensions CPUID leafs on newer models
      too, if BIOS has disabled them, as we rely on them to get proper compute
      unit topology.
      
      Make the printk a once thing, while at it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rui Huang <ray.huang@amd.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-hwmon@vger.kernel.org
      Link: http://lkml.kernel.org/r/1464775468-23355-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      96685a55
    • x86/cpu/intel: Introduce macros for Intel family numbers · 970442c5
      Dave Hansen authored
      Problem:
      
      We have a boatload of open-coded family-6 model numbers.  Half of
      them have these model numbers in hex and the other half in
      decimal.  This makes grepping for them tons of fun, if you were
      to try.
      
      Solution:
      
      Consolidate all the magic numbers.  Put all the definitions in
      one header.
      
      The names here are closely derived from the comments describing
      the models from arch/x86/events/intel/core.c.  We could easily
      make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
      they seemed fine even with the longer versions to me.
      
      Do not take any of these names too literally, like "DESKTOP"
      or "MOBILE".  These are all colloquial names and not precise
      descriptions of everywhere a given model will show up.
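
      An illustrative excerpt of the kind of definitions being consolidated
      (model values quoted from memory here; the new header is the
      authoritative list) and of how a caller would use them:

        #define INTEL_FAM6_SANDYBRIDGE      0x2A
        #define INTEL_FAM6_IVYBRIDGE        0x3A
        #define INTEL_FAM6_HASWELL_CORE     0x3C
        #define INTEL_FAM6_SKYLAKE_DESKTOP  0x5E

        /* Replaces open-coded checks such as "case 0x3a:". */
        static int is_ivybridge(unsigned int fam, unsigned int model)
        {
                return fam == 6 && model == INTEL_FAM6_IVYBRIDGE;
        }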
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Eduardo Valentin <edubezval@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
      Cc: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vishwanath Somayaji <vishwanath.somayaji@intel.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: jacob.jun.pan@intel.com
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-edac@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: platform-driver-x86@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160603001927.F2A7D828@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      970442c5
    • x86, build: copy ldlinux.c32 to image.iso · 9c77679c
      H. Peter Anvin authored
      For newer versions of Syslinux, we need ldlinux.c32 in addition to
      isolinux.bin to reside on the boot disk, so if ldlinux.c32 is found,
      copy it, too, to the isoimage tree.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Linux Stable Tree <stable@vger.kernel.org>
      9c77679c
  12. 06 June 2016, 1 commit
  13. 03 June 2016, 1 commit
  14. 02 June 2016, 4 commits
    • KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · d14bdb55
      Paolo Bonzini authored
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
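
      A hedged sketch of the validation the ioctl path needs (field names are
      illustrative, not the exact KVM structures): reject reserved upper bits
      of DR6/DR7 at KVM_SET_DEBUGREGS time instead of letting the later
      MOV-to-DR fault in the host:

        #include <errno.h>
        #include <stdint.h>

        struct debugregs_sketch { uint64_t db[4], dr6, dr7; };

        static int set_debugregs_sketch(const struct debugregs_sketch *d)
        {
                if ((d->dr6 >> 32) || (d->dr7 >> 32))   /* bits 63:32 must be 0 */
                        return -EINVAL;
                /* ... otherwise commit the values to the vCPU ... */
                return 0;
        }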
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      d14bdb55
    • KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number · 78e546c8
      Paolo Bonzini authored
      This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return
      EINVAL.  It causes a WARN from exception_type:
      
          WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
          CPU: 3 PID: 16732 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
           ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
           [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
           [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
          ---[ end trace b1a0391266848f50 ]---
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          long r[31];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);
      
              struct kvm_vcpu_events ve = {
                      .exception.injected = 1,
                      .exception.nr = 0xd4
              };
              r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
              r[30] = ioctl(r[7], KVM_RUN, 0);
              return 0;
          }
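
      A hedged sketch of the added sanity check (illustrative helper; the real
      code validates the kvm_vcpu_events payload): vectors above 31 are not
      architectural exceptions and must be rejected:

        #include <errno.h>
        #include <stdint.h>

        static int check_injected_exception(int injected, uint8_t nr)
        {
                if (injected && nr > 31)
                        return -EINVAL;
                return 0;
        }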
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      78e546c8
    • KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID · 83676e92
      Paolo Bonzini authored
      This causes an ugly dmesg splat.  Beautified syzkaller testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_cpuid2 c = { 0 };
              r[2] = open("/dev/kvm", O_RDWR);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8);
              r[7] = ioctl(r[4], KVM_SET_CPUID, &c);
              return 0;
          }
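
      A hedged user-space analogue of the fix: only allocate when there is
      something to copy, so a zero entry count never reaches the allocator
      (vmalloc() in the kernel, calloc() here):

        #include <stddef.h>
        #include <stdlib.h>

        static void *alloc_entries(size_t nent, size_t entry_size)
        {
                if (!nent)
                        return NULL;    /* nothing to allocate or copy */
                return calloc(nent, entry_size);
        }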
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      83676e92
    • kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR · b21629da
      Paolo Bonzini authored
      Found by syzkaller:
      
          WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]()
          CPU: 3 PID: 15175 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2
           00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm]
           [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm]
           [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel]
           [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm]
           [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <string.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
              r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b21629da