1. 03 Feb 2017, 9 commits
    • KVM: MIPS: Remove duplicated ASIDs from vcpu · c550d539
      James Hogan committed
      The kvm_vcpu_arch structure contains both mm_structs for allocating MMU
      contexts (primarily the ASID) but it also copies the resulting ASIDs
      into guest_{user,kernel}_asid[] arrays which are referenced from uasm
      generated code.
      
      This duplication doesn't seem to serve any purpose, and it gets in the
      way of generalising the ASID handling across guest kernel/user modes, so
      let's just extract the ASID straight out of the mm_struct on demand, and
      in fact there are convenient cpu_context() and cpu_asid() macros for
      doing so.
      
      To reduce the verbosity of this code we do also add kern_mm and user_mm
      local variables where the kernel and user mm_structs are used.
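
      As an illustration of the approach (a sketch, not the exact diff; the
      guest_kernel_mm field name and surrounding code are assumed from
      context), the ASID is read on demand with the existing MIPS helpers:

          struct mm_struct *kern_mm = &vcpu->arch.guest_kernel_mm;

          /* previously: write_c0_entryhi(vcpu->arch.guest_kernel_asid[cpu]); */
          write_c0_entryhi(cpu_asid(cpu, kern_mm));  /* cpu_asid() masks cpu_context() down to the ASID bits */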
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      c550d539
    • KVM: MIPS/MMU: Move preempt/ASID handling to implementation · 1581ff3d
      James Hogan committed
      The MIPS KVM host and guest GVA ASIDs may need regenerating when
      scheduling a process in guest context, which is done from the
      kvm_arch_vcpu_load() / kvm_arch_vcpu_put() functions in mmu.c.
      
      However, this is a fairly implementation-specific detail. VZ, for example,
      may use GuestIDs instead of normal ASIDs to distinguish mappings
      belonging to different guests, and even on VZ without GuestID the root
      TLB will be used differently to trap & emulate.
      
      Trap & emulate GVA ASIDs only relate to the user part of the full
      address space, so can be left active during guest exit handling (guest
      context) to allow guest instructions to be easily read and translated.
      
      VZ root ASIDs however are for GPA mappings so can't be left active
      during normal kernel code. They also aren't useful for accessing guest
      virtual memory, and we should have CP0_BadInstr[P] registers available
      to provide encodings of trapping guest instructions anyway.
      
      Therefore move the ASID preemption handling into the implementation
      callback.
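
      A minimal sketch of the resulting shape (assuming the vcpu_load callback
      introduced in this series; the real kvm_arch_vcpu_load() also handles
      other state, such as the timer):

          void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
          {
                  unsigned long flags;

                  local_irq_save(flags);
                  vcpu->cpu = cpu;
                  /* the implementation decides: T&E juggles GVA ASIDs, VZ its root ASIDs/GuestIDs */
                  kvm_mips_callbacks->vcpu_load(vcpu, cpu);
                  local_irq_restore(flags);
          }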
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      1581ff3d
    • KVM: MIPS: Convert get/set_regs -> vcpu_load/put · a60b8438
      James Hogan committed
      Convert the get_regs() and set_regs() callbacks to vcpu_load() and
      vcpu_put(), which provide a cpu argument and more closely match the
      kvm_arch_vcpu_load() / kvm_arch_vcpu_put() that they are called by.
      
      This is in preparation for moving ASID management into the
      implementations.
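
      A sketch of the renamed hooks (other members of the ops structure are
      elided; exact prototypes may differ):

          struct kvm_mips_callbacks {
                  /* ... */
                  int (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);   /* replaces the old get/set_regs pair */
                  int (*vcpu_put)(struct kvm_vcpu *vcpu, int cpu);
          };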
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      a60b8438
    • KVM: MIPS/MMU: Simplify ASID restoration · 1534b396
      James Hogan committed
      KVM T&E uses an ASID for guest kernel mode and an ASID for guest user
      mode. The current ASID is saved when the guest is scheduled out, and
      restored when scheduling back in, with checks for whether the ASID needs
      to be regenerated.
      
      This isn't really necessary as the ASID can be easily determined by the
      current guest mode, so let's simplify it to just read the required ASID
      from guest_kernel_asid or guest_user_asid even if the ASID hasn't been
      regenerated.
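
      A sketch of the simplified restore path (KVM_GUEST_KERNEL_MODE() is the
      existing mode test; the save/check logic this replaces is omitted):

          /* pick the ASID from the current guest mode instead of a saved copy */
          if (KVM_GUEST_KERNEL_MODE(vcpu))
                  write_c0_entryhi(vcpu->arch.guest_kernel_asid[cpu]);
          else
                  write_c0_entryhi(vcpu->arch.guest_user_asid[cpu]);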
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      1534b396
    • KVM: MIPS: Drop partial KVM_NMI implementation · 00104b41
      James Hogan committed
      MIPS incompletely implements the KVM_NMI ioctl to supposedly perform a
      CPU reset, but all it actually does is invalidate the ASIDs. It doesn't
      expose the KVM_CAP_USER_NMI capability which is supposed to indicate the
      presence of the KVM_NMI ioctl, and no user software actually uses it on
      MIPS.
      
      Since this is dead code that would technically need updating for GVA
      page table handling in upcoming patches, remove it now. If we wanted to
      implement NMI injection later it can always be done properly along with
      the KVM_CAP_USER_NMI capability, and if we wanted to implement a proper
      CPU reset it would be better done with a separate ioctl.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      00104b41
    • MIPS: Add return errors to protected cache ops · 7170bdc7
      James Hogan committed
      The protected cache ops contain no out of line fixup code to return an
      error code in the event of a fault, with the cache op being skipped in
      that case. For KVM, however, we'd like to detect this case: page
      faulting will be disabled, so it could happen during normal operation if
      the GVA page tables were flushed, and it needs to be handled by the caller.
      
      Add the out-of-line fixup code to load the error value -EFAULT into the
      return variable, and adapt the protected cache line functions to pass
      the error back to the caller.
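
      From a caller's point of view the pattern becomes roughly the following
      (a sketch; the helper names mirror the existing protected_*_line
      functions, which now return 0 or -EFAULT):

          int err;

          err = protected_writeback_dcache_line(addr);   /* -EFAULT if the mapping faulted */
          if (!err)
                  err = protected_flush_icache_line(addr);
          if (err)
                  return err;   /* KVM can refault the GVA page table and retry */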
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      7170bdc7
    • MIPS: Export some tlbex internals for KVM to use · 722b4544
      James Hogan committed
      Export some of the TLB exception code generating functions so that KVM can
      construct a fast TLB refill handler for guest context without
      reinventing the wheel quite so much.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      722b4544
    • MIPS: uasm: Add include guards in asm/uasm.h · 93a93c24
      James Hogan committed
      Add include guards in asm/uasm.h to allow it to be safely used by a new
      header asm/tlbex.h in the next patch to expose TLB exception building
      functions for KVM to use.
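
      The change is the standard include-guard idiom (the guard name shown
      here is illustrative):

          #ifndef __ASM_UASM_H
          #define __ASM_UASM_H

          /* ... existing uasm declarations ... */

          #endif /* __ASM_UASM_H */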
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      93a93c24
    • MIPS: Export pgd/pmd symbols for KVM · ccf01516
      James Hogan committed
      Export pmd_init(), invalid_pmd_table and tlbmiss_handler_setup_pgd to
      GPL kernel modules so that MIPS KVM can use the inline page table
      management functions and switch between page tables:
      
      - pmd_init() will be used directly by KVM to initialise newly allocated
        pmd tables with invalid lower level table pointers.
      
      - invalid_pmd_table is used by pud_present(), pud_none(), and
        pud_clear(), which KVM will use to test and clear pud entries.
      
      - tlbmiss_handler_setup_pgd() will be called by KVM entry code to switch
        to the appropriate GVA page tables.
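
      In effect, the patch amounts to GPL-only exports next to the respective
      definitions, along the lines of:

          EXPORT_SYMBOL_GPL(pmd_init);
          EXPORT_SYMBOL_GPL(invalid_pmd_table);
          EXPORT_SYMBOL_GPL(tlbmiss_handler_setup_pgd);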
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      ccf01516
  2. 02 Feb 2017, 2 commits
  3. 21 Jan 2017, 1 commit
  4. 18 Jan 2017, 1 commit
    • kvm: x86: Expose Intel VPOPCNTDQ feature to guest · a17f3227
      Piotr Luc committed
      Vector population count instructions for dwords and qwords are to be
      used in future Intel Xeon & Xeon Phi processors. Bit 14 of
      CPUID[level:0x07, ECX] indicates that the new instructions are
      supported by a processor.
      
      The spec can be found in the Intel Software Developer Manual (SDM)
      or in the Instruction Set Extensions Programming Reference (ISE).
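
      For reference, the same bit can be queried from user space (a
      self-contained check, independent of this patch; it assumes the CPU
      supports CPUID leaf 7):

          #include <stdio.h>
          #include <cpuid.h>

          int main(void)
          {
                  unsigned int eax, ebx, ecx, edx;

                  __cpuid_count(7, 0, eax, ebx, ecx, edx);   /* CPUID.(EAX=07H,ECX=0) */
                  printf("AVX512_VPOPCNTDQ: %s\n", (ecx & (1u << 14)) ? "yes" : "no");
                  return 0;
          }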
      Signed-off-by: Piotr Luc <piotr.luc@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a17f3227
  5. 17 Jan 2017, 1 commit
  6. 14 Jan 2017, 5 commits
    • efi/x86: Prune invalid memory map entries and fix boot regression · 0100a3e6
      Peter Jones committed
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries. It also detects
      overflow in the displayed address range calculation, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
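
      A simplified sketch of the validity test applied per descriptor (not the
      kernel's exact code; it conservatively rejects zero-sized entries and
      ranges whose end would not fit in 64 bits):

          #include <stdbool.h>
          #include <stdint.h>

          static bool memmap_entry_valid(uint64_t phys_addr, uint64_t num_pages)
          {
                  if (num_pages == 0)
                          return false;                               /* e.g. the ThinkPad W541 entries */
                  if (num_pages > ((UINT64_MAX - phys_addr) >> 12))   /* EFI pages are 4 KiB */
                          return false;                               /* end address would overflow */
                  return true;
          }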
      Signed-off-by: Peter Jones <pjones@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0100a3e6
    • perf/x86: Reject non sampling events with precise_ip · 18e7a45a
      Jiri Olsa committed
      As Peter suggested [1], reject non-sampling PEBS events,
      because they don't make any sense and could cause bugs
      in the NMI handler [2].
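
      The check itself is small; a sketch of the idea at event-init time
      (placement and surrounding code assumed):

          if (event->attr.precise_ip) {
                  /* PEBS only makes sense for sampling events */
                  if (!is_sampling_event(event))
                          return -EINVAL;
          }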
      
        [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
        [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      18e7a45a
    • perf/x86/intel: Account interrupts for PEBS errors · 475113d9
      Jiri Olsa committed
      It's possible to set up PEBS events to get only errors and not
      any data, like on SNB-X (model 45) and IVB-EP (model 62)
      via 2 perf commands running simultaneously:
      
          taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
      
      This leads to a soft lockup, because the error path of
      intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
      for error PEBS interrupts, so in case you're getting ONLY
      errors you don't have a way to stop the event when it's over
      the max_samples_per_tick limit:
      
        NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
        ...
        RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
        ...
        Call Trace:
         ? trace_hardirqs_on_caller+0xf5/0x1b0
         ? perf_cgroup_attach+0x70/0x70
         perf_install_in_context+0x199/0x1b0
         ? ctx_resched+0x90/0x90
         SYSC_perf_event_open+0x641/0xf90
         SyS_perf_event_open+0x9/0x10
         do_syscall_64+0x6c/0x1f0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Add perf_event_account_interrupt(), which does the interrupt
      and frequency checks, and call it from intel_pmu_drain_pebs_nhm()'s
      error path.
      
      We keep the pending_kill and pending_wakeup logic only in the
      __perf_event_overflow() path, because they make sense only if
      there's any data to deliver.
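
      A sketch of the error-path change in intel_pmu_drain_pebs_nhm()
      (variable names assumed from context):

          if (error[bit]) {
                  perf_log_lost_samples(event, error[bit]);

                  /* new helper: interrupt + frequency accounting, nonzero when the event must be throttled */
                  if (perf_event_account_interrupt(event))
                          x86_pmu_stop(event, 0);
          }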
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      475113d9
    • x86/mpx: Use compatible types in comparison to fix sparse error · 45382862
      Tobias Klauser committed
      info->si_addr is of type void __user *, so it should be compared against
      something from the same address space.
      
      This fixes the following sparse error:
      
        arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)
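
      The general pattern (an illustration, not the exact mpx.c hunk; 'addr'
      and 'handle_match()' are hypothetical) is to cast the kernel-side value
      into the user address space before comparing:

          void __user *fault_addr = (void __user *)addr;   /* explicit cast into the __user space */

          if (info->si_addr == fault_addr)                 /* both operands now __user: sparse is happy */
                  handle_match();                          /* hypothetical caller-side action */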
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45382862
    • x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() · 695085b4
      Len Brown committed
      The Intel Denverton microserver uses a 25 MHz TSC crystal,
      so we can derive its exact [*] TSC frequency
      using CPUID and some arithmetic, e.g.:
      
        TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
      
      [*] 'exact' is only as good as the crystal, which should be +/- 20ppm
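
      The arithmetic, spelled out (the 216/3 ratio is the TSC/crystal ratio
      reported in CPUID leaf 0x15 as EBX/EAX; runnable as a standalone check
      of the numbers):

          #include <stdio.h>

          int main(void)
          {
                  unsigned long long crystal_hz = 25000000ULL;   /* Denverton: 25 MHz crystal */
                  unsigned long long num = 216, den = 3;          /* CPUID.15H: EBX / EAX */

                  printf("TSC: %llu MHz\n", crystal_hz * num / den / 1000000ULL);   /* -> 1800 */
                  return 0;
          }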
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      695085b4
  7. 13 Jan 2017, 1 commit
  8. 12 Jan 2017, 8 commits
    • KVM: x86: fix emulation of "MOV SS, null selector" · 33ab9110
      Paolo Bonzini committed
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3
      Cc: stable@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33ab9110
    • KVM: x86: fix NULL deref in vcpu_scan_ioapic · 546d87e5
      Wanpeng Li committed
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
	((char *)host_mem)[0] = 0xf4;	/* hlt; cast needed since host_mem is a void pointer */
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP if no irqchip has been created.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: stable@vger.kernel.org # 4.5+
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      546d87e5
    • KVM: x86: Introduce segmented_write_std · 129a72a0
      Steve Rutherford committed
      Introduces segmented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 96051572
      Fixes: 283c95d0
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Steve Rutherford <srutherford@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      129a72a0
    • KVM: x86: flush pending lapic jump label updates on module unload · cef84c30
      David Matlack committed
      KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
      These are implemented with delayed_work structs which can still be
      pending when the KVM module is unloaded. We've seen this cause kernel
      panics when the kvm_intel module is quickly reloaded.
      
      Use the new static_key_deferred_flush() API to flush pending updates on
      module unload.
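
      A sketch of the flush on module unload (the key names are the ones
      mentioned above; the exact exit-path function is assumed):

          void kvm_lapic_exit(void)
          {
                  static_key_deferred_flush(&apic_hw_disabled);
                  static_key_deferred_flush(&apic_sw_disabled);
          }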
      Signed-off-by: David Matlack <dmatlack@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cef84c30
    • x86/entry: Fix the end of the stack for newly forked tasks · ff3f7e24
      Josh Poimboeuf committed
      When unwinding a task, the end of the stack is always at the same offset
      right below the saved pt_regs, regardless of which syscall was used to
      enter the kernel.  That convention allows the unwinder to verify that a
      stack is sane.
      
      However, newly forked tasks don't always follow that convention, as
      reported by the following unwinder warning seen by Dave Jones:
      
        WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value           (null)
      
      The warning was due to the following call chain:
      
        (ftrace handler)
        call_usermodehelper_exec_async+0x5/0x140
        ret_from_fork+0x22/0x30
      
      The problem is that ret_from_fork() doesn't create a stack frame before
      calling other functions.  Fix that by carefully using the frame pointer
      macros.
      
      In addition to conforming to the end of stack convention, this also
      makes related stack traces more sensible by making it clear to the user
      that ret_from_fork() was involved.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ff3f7e24
    • x86/unwind: Include __schedule() in stack traces · 2c96b2fe
      Josh Poimboeuf committed
      In the following commit:
      
        0100301b ("sched/x86: Rewrite the switch_to() code")
      
      ... the layout of the 'inactive_task_frame' struct was designed to have
      a frame pointer header embedded in it, so that the unwinder could use
      the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or
      ret_from_fork() for newly forked tasks which haven't actually run yet).
      
      Finish the job by changing get_frame_pointer() to return a pointer to
      inactive_task_frame's 'bp' field rather than 'bp' itself.  This allows
      the unwinder to start one frame higher on the stack, so that it properly
      reports __schedule().
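
      A sketch of the resulting helper (the non-task branches are shown for
      context and may differ in detail):

          static inline unsigned long *get_frame_pointer(struct task_struct *task,
                                                         struct pt_regs *regs)
          {
                  if (regs)
                          return (unsigned long *)regs->bp;

                  if (task == current)
                          return __builtin_frame_address(0);

                  /* return the address of the 'bp' slot, not its value */
                  return &((struct inactive_task_frame *)task->thread.sp)->bp;
          }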
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c96b2fe
    • x86/unwind: Disable KASAN checks for non-current tasks · 84936118
      Josh Poimboeuf committed
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      In such cases, it's possible that the unwinder may read a KASAN-poisoned
      region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
      reading the stack of another task.
      
      Use READ_ONCE() when reading the stack of the current task, since KASAN
      warnings can still be useful for finding bugs in that case.
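
      A sketch of the accessor pattern (the macro name is illustrative):

          #define READ_ONCE_TASK_STACK(task, x)         \
          ({                                            \
                  unsigned long val;                    \
                  if (task == current)                  \
                          val = READ_ONCE(x);           \
                  else                                  \
                          val = READ_ONCE_NOCHECK(x);   \
                  val;                                  \
          })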
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      84936118
    • x86/unwind: Silence warnings for non-current tasks · 900742d8
      Josh Poimboeuf committed
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      Since stack "corruption" on another task's stack isn't necessarily a
      bug, silence the warnings when unwinding tasks other than current.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      900742d8
  9. 11 Jan 2017, 3 commits
  10. 10 Jan 2017, 5 commits
  11. 09 Jan 2017, 4 commits