1. 23 Jun 2017, 16 commits
  2. 19 Jun 2017, 1 commit
    • mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      The stack guard page is a useful feature that reduces the risk of the
      stack smashing into a different mapping. We have been using a single-page
      gap, which is sufficient to prevent the stack from becoming adjacent to a
      different mapping. But this seems insufficient in light of actual stack
      usage in userspace. E.g. glibc uses alloca() allocations as large as 64kB
      in many commonly used functions. Others use constructs like
      gid_t buffer[NGROUPS_MAX], which is 256kB, or stack strings sized by
      MAX_ARG_STRLEN.
      
      This is especially dangerous for suid binaries with the default unlimited
      stack size limit, because those applications can be tricked into consuming
      a large portion of the stack, after which a single glibc call can jump
      over the guard page. These attacks are not theoretical, unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somewhat inherent, but it should reduce the attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
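      
      A minimal sketch of the gap-aware helper, following the shape of the
      mainline patch (illustrative rather than the verbatim hunk):
      
      	static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
      	{
      		unsigned long vm_start = vma->vm_start;
      
      		if (vma->vm_flags & VM_GROWSDOWN) {
      			vm_start -= stack_guard_gap;
      			if (vm_start > vma->vm_start)	/* guard against underflow */
      				vm_start = 0;
      		}
      		return vm_start;
      	}
      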
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 Jun 2017, 2 commits
  4. 11 Jun 2017, 1 commit
    • KVM: async_pf: avoid async pf injection when in guest mode · 9bc1f09f
      Wanpeng Li authored
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM
      simultaneously and then putting stress on L1's memory. The hang can be
      observed on L1 once at least ~70% of the swap area on L0 is occupied.
      
      The cause is an async pf that was injected into L2 when it should have
      been injected into L1: the L2 guest starts receiving page faults with a
      bogus %cr2 (actually the apf token from the host), and the L1 guest starts
      accumulating tasks stuck in D state in kvm_async_pf_task_wait(), since the
      PAGE_READY async_pfs never arrive.
      
      This patch fixes the hang by doing async pf only while executing the L1
      guest.
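      
      A sketch of the fix: refuse async pf delivery while the vCPU runs a nested
      (L2) guest (illustrative, not the verbatim hunk):
      
      	static bool can_do_async_pf(struct kvm_vcpu *vcpu)
      	{
      		if (unlikely(!lapic_in_kernel(vcpu) ||
      			     kvm_event_needs_reinjection(vcpu) ||
      			     is_guest_mode(vcpu)))	/* new: not while in L2 */
      			return false;
      		...
      	}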
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 09 Jun 2017, 1 commit
  6. 08 Jun 2017, 2 commits
    • KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation · a3641631
      Wanpeng Li authored
      If "i" is the index of the last element in the vcpu->arch.cpuid_entries[]
      array, the loop below can potentially be exploited to read and write out
      of bounds. Luckily, the effect is small:
      
      	/* when no next entry is found, the current entry[i] is reselected */
      	for (j = i + 1; ; j = (j + 1) % nent) {
      		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
      		if (ej->function == e->function) {
      
      It reads ej->maxphyaddr, which is user controlled.  However...
      
      			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
      
      After cpuid_entries there is
      
      	int maxphyaddr;
      	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */
      
      So we have:
      
      - cpuid_entries at offset 1B50 (6992)
      - maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
      - padding at 27D4...27DF
      - emulate_ctxt at 27E0
      
      And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
      would have been much worse.
      
      This patch fixes it by taking the index modulo nent before the array is
      dereferenced, which avoids the out-of-bounds access. In the worst case,
      j wraps around to i, ej->function == e->function, and the loop bails out.
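      
      The fixed loop, roughly (illustrative): the index is advanced modulo nent
      before the first dereference, so it can never step past the array:
      
      	j = i;
      	do {
      		j = (j + 1) % nent;
      		ej = &vcpu->arch.cpuid_entries[j];
      	} while (ej->function != e->function);
      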
      Reported-by: Moguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Guofang Mo <moguofang@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/microcode/intel: Clear patch pointer before jettisoning the initrd · 5b0bc9ac
      Dominik Brodowski authored
      During early boot, load_ucode_intel_ap() uses __load_ucode_intel()
      to obtain a pointer to the relevant microcode patch (embedded in the
      initrd), and stores this value in 'intel_ucode_patch' to speed up the
      microcode patch application for subsequent CPUs.
      
      On resuming from suspend-to-RAM, however, load_ucode_ap() calls
      load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is
      long gone so the pointer stored in 'intel_ucode_patch' no longer points to
      a valid microcode patch.
      
      Clear that pointer so that we effectively fall back to the CPU hotplug
      notifier callbacks to update the microcode.
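      
      The gist of the fix, sketched (illustrative), in the path that runs just
      before the initrd is jettisoned:
      
      	int __init save_microcode_in_initrd_intel(void)
      	{
      		/* initrd is going away; the cached pointer would dangle */
      		intel_ucode_patch = NULL;
      		...
      	}
      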
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      [ Edit and massage commit message. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.10..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 06 Jun 2017, 2 commits
    • KVM: nVMX: Fix exception injection · d4912215
      Wanpeng Li authored
       WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
       RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       Call Trace:
        ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
        ? rcu_read_lock_sched_held+0x79/0x80
        vmx_queue_exception+0x104/0x160 [kvm_intel]
        ? vmx_queue_exception+0x104/0x160 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This is triggered occasionally by running both win7 and win2016 in L2 with
      EPT disabled on both L1 and L2. It is not easy to reproduce.
      
      Commit 0b6ac343 (KVM: nVMX: Correct handling of exception injection) noted
      that "KVM wants to inject page-faults which it got to the guest. This
      function assumes it is called with the exit reason in vmcs02 being a #PF
      exception". Commit e011c663 (KVM: nVMX: Check all exceptions for intercept
      during delivery to L2) extended this to check all exceptions for intercept
      during delivery to L2. However, there is no guarantee that the current
      exit reason is an exception: an external interrupt can arrive on the host
      (e.g. a host timer interrupt, which should not be injected into the guest)
      while an exception is queued somewhere; nested_vmx_check_exception() is
      then called, the vmexit emulation code tries to emulate the "Acknowledge
      interrupt on exit" behavior, and the warning triggers.
      
      Reusing the exit reason from the L2->L0 vmexit is wrong in this case: the
      reason must always be EXCEPTION_NMI when injecting an exception into L1 as
      a nested vmexit.
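      
      Sketched, the fix passes a fixed exit reason in
      nested_vmx_check_exception() instead of reusing the stale one from the
      last L2->L0 exit (illustrative):
      
      	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
      			  vmcs_read32(VM_EXIT_INTR_INFO),
      			  vmcs_readl(EXIT_QUALIFICATION));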
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Fixes: e011c663 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: async_pf: fix rcu_irq_enter() with irqs enabled · bbaf0e2b
      Paolo Bonzini authored
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
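      
      After the reorder, roughly (illustrative):
      
      	native_safe_halt();	/* enables interrupts while halted */
      	local_irq_disable();
      	rcu_irq_enter();	/* now called with irqs disabled */
      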
      Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  8. 05 Jun 2017, 1 commit
    • x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC · ae1d557d
      Christian Sünkenberg authored
      A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
      report an inverted Device ID in its DIR0 configuration register,
      specifically 0xb instead of the expected 0x4.
      
      Catch this presumably quirky version so it's properly recognized
      as GX1 and has its cache switched to write-back mode, which provides
      a significant performance boost in most workloads.
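      
      Sketched, the fix treats the inverted ID as a fall-through case in the
      DIR0 device-ID switch (illustrative):
      
      	case 4:		/* MediaGX/GXm or Geode GXM/GXLV/GX1 */
      	case 11:	/* SC1100 reporting an inverted Device ID */
      		...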
      
      SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
      states in section 1.1.7.1 "Device ID" that device identification
      values are specified in SC1100's device errata. These, however,
      seem to not have been publicly released.
      
      Wading through a number of boot logs and /proc/cpuinfo dumps found on
      pastebin and blogs, this patch should mostly be relevant for a number
      of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
      the latter being the platform this issue was discovered on.
      Performance impact was verified using "openssl speed", with
      write-back caching scaling throughput between -3% and +41%.
      Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 01 Jun 2017, 3 commits
    • Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT" · c08d5174
      Ingo Molnar authored
      This reverts commit cbed27cd.
      
      As Andy Lutomirski observed:
      
       "I think this patch is bogus. pat_enabled() sure looks like it's
        supposed to return true if PAT is *enabled*, and these days PAT is
        'enabled' even if there's no HW PAT support."
      Reported-by: Bernhard Held <berny156@gmx.de>
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • KVM: x86: Fix nmi injection failure when vcpu got blocked · 47a66eed
      ZhuangYanying authored
      When a spin_lock_irqsave() deadlock occurs inside the guest, the vcpu
      threads other than the lock-holding one enter the S state because of
      pvspinlock. If an NMI is then injected via the libvirt API "inject-nmi",
      the NMI cannot be delivered to the VM.
      
      The reason is:
      1. Calling ioctl KVM_NMI from qemu sets nmi_queued to 1, and
         do_inject_external_nmi() sets cpu->kvm_vcpu_dirty to true at the same
         time.
      2. Because cpu->kvm_vcpu_dirty is true, process_nmi() sets nmi_queued back
         to 0 before entering the guest.
      
      It is not enough to check only nmi_queued to decide whether to stay in
      vcpu_block(); an NMI should be injected immediately in any situation.
      Also check nmi_pending, and test KVM_REQ_NMI instead of nmi_queued in
      kvm_vcpu_has_events().
      
      Do the same change for SMIs.
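      
      A sketch of the reworked check (illustrative, not the verbatim hunk):
      
      	static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
      	{
      		...
      		if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
      		    (vcpu->arch.nmi_pending &&
      		     kvm_x86_ops->nmi_allowed(vcpu)))
      			return true;
      		...
      	}
      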
      Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: do not zero out segment attributes if segment is unusable or not present · d9c1b543
      Roman Pen authored
      This is a fix for the problem [1], where VMCB.CPL was set to 0 and an
      interrupt was taken on a userspace stack.  The root cause lies in specific
      AMD CPU behaviour which manifests itself as unusable segment attributes on
      SYSRET.  The corresponding workaround on the kernel side is:
      
      61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")
      
      In turn, the virtualization side treated the unusable segment incorrectly
      and restored CPL from the SS attributes, which had been zeroed out a few
      lines above.
      
      With this patch, only the P bit is cleared in the VMCB.save state, and
      segment attributes are no longer zeroed out when a segment is not present
      or is unusable, so CPL can be safely restored from the DPL field.
      
      This is only one part of the fix, since the QEMU side should be fixed
      accordingly not to zero out the attributes on its side.  A corresponding
      patch will follow.
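      
      Roughly, the fixed attribute handling builds the attributes
      unconditionally and folds "unusable" into the P bit only (illustrative):
      
      	s->attrib = (var->type & SVM_SELECTOR_TYPE_MASK);
      	s->attrib |= (var->s & 1) << SVM_SELECTOR_S_SHIFT;
      	s->attrib |= (var->dpl & 3) << SVM_SELECTOR_DPL_SHIFT;
      	s->attrib |= ((var->present & 1) && !var->unusable)
      			<< SVM_SELECTOR_P_SHIFT;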
      
      [1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 30 May 2017, 2 commits
  11. 29 May 2017, 2 commits
  12. 28 May 2017, 3 commits
    • x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled · 94133e46
      Baoquan He authored
      For EFI with the 'efi=old_map' kernel option specified, the kernel will panic
      when KASLR is enabled:
      
        BUG: unable to handle kernel paging request at 000000007febd57e
        IP: 0x7febd57e
        PGD 1025a067
        PUD 0
      
        Oops: 0010 [#1] SMP
        Call Trace:
         efi_enter_virtual_mode()
         start_kernel()
         x86_64_start_reservations()
         x86_64_start_kernel()
         start_cpu()
      
      The root cause is that the identity mapping is not built correctly
      in the 'efi=old_map' case.
      
      On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE
      aligned. We can borrow the PUD table from the direct mappings safely. Given a
      physical address X, we have pud_index(X) == pud_index(__va(X)).
      
      However, on KASLR kernels, PAGE_OFFSET is only PUD_SIZE aligned. For a
      given physical address X, pud_index(X) != pud_index(__va(X)). We can't
      just copy the PGD entry from the direct mapping to build the identity
      mapping; instead we need to copy the PUD entries one by one from the
      direct mapping.
      
      Fix it.
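      
      A rough sketch of the idea (illustrative; pud_ident() and pud_kernel() are
      hypothetical helper names, not the patch's actual ones):
      
      	/* alias each PUD of the direct mapping into the identity mapping,
      	 * since the two are no longer aligned at PGD granularity */
      	for (addr = start; addr < end; addr += PUD_SIZE)
      		set_pud(pud_ident(addr), *pud_kernel(__va(addr)));
      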
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Frank Ramsay <frank.ramsay@hpe.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-5-matt@codeblueprint.co.uk
      [ Fixed and reworded the changelog and code comments to be more readable. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map · 4e52797d
      Sai Praneeth authored
      Booting the kexec kernel with "efi=old_map" on the kernel command line
      hits the kernel panic shown below.
      
       BUG: unable to handle kernel paging request at ffff88007fe78070
       IP: virt_efi_set_variable.part.7+0x63/0x1b0
       PGD 7ea28067
       PUD 7ea2b067
       PMD 7ea2d067
       PTE 0
       [...]
       Call Trace:
        virt_efi_set_variable()
        efi_delete_dummy_variable()
        efi_enter_virtual_mode()
        start_kernel()
        x86_64_start_reservations()
        x86_64_start_kernel()
        start_cpu()
      
      [ efi=old_map was never intended to work with kexec. The problem with
        using efi=old_map is that the virtual addresses are assigned from the
        memory region used by other kernel mappings; vmalloc() space.
        Potentially there could be collisions when booting kexec if something
        else is mapped at the virtual address we allocated for runtime service
        regions in the initial boot - Matt Fleming ]
      
      Since kexec was never intended to work with efi=old_map, disable
      runtime services in kexec if booted with efi=old_map, so that we don't
      panic.
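      
      A sketch of the early-out (illustrative): bail out of the kexec EFI setup
      before any runtime-services mapping is attempted:
      
      	if (efi_enabled(EFI_OLD_MEMMAP)) {
      		pr_info("efi=old_map doesn't work with kexec\n");
      		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
      		return;
      	}
      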
      Tested-by: Lee Chun-Yi <jlee@suse.com>
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-4-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • efi: Don't issue error message when booted under Xen · 1ea34adb
      Juergen Gross authored
      When booted as Xen dom0 there won't be an EFI memmap allocated. Avoid
      issuing an error message in this case:
      
        [    0.144079] efi: Failed to allocate new EFI memmap
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-2-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 27 May 2017, 4 commits
    • x86/ftrace: Make sure that ftrace trampolines are not RWX · 6ee98ffe
      Thomas Gleixner authored
      ftrace uses module_alloc() to allocate trampoline pages. The mapping
      returned by module_alloc() is RWX, which makes sense as the memory is
      written to right after allocation. But nothing makes these pages RO after
      writing to them.
      
      Add proper set_memory_rw/ro() calls to protect the trampolines after
      modification.
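      
      Roughly (illustrative): keep the trampoline read-only and open a write
      window only around the update:
      
      	set_memory_rw((unsigned long)trampoline, npages);
      	/* ...patch the trampoline contents... */
      	set_memory_ro((unsigned long)trampoline, npages);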
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range() · a53276e2
      Steven Rostedt (VMware) authored
      With function tracing starting in early bootup and its trampoline pages
      now being read-only, the following bug triggered:
      
      kernel BUG at arch/x86/mm/pageattr.c:189!
      invalid opcode: 0000 [#1] SMP
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      task: ffffffffb4222500 task.stack: ffffffffb4200000
      RIP: 0010:change_page_attr_set_clr+0x269/0x302
      RSP: 0000:ffffffffb4203c88 EFLAGS: 00010046
      RAX: 0000000000000046 RBX: 0000000000000000 RCX: 00000001b6000000
      RDX: ffffffffb4203d40 RSI: 0000000000000000 RDI: ffffffffb4240d60
      RBP: ffffffffb4203d18 R08: 00000001b6000000 R09: 0000000000000001
      R10: ffffffffb4203aa8 R11: 0000000000000003 R12: ffffffffc029b000
      R13: ffffffffb4203d40 R14: 0000000000000001 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff9a639ea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff9a636b384000 CR3: 00000001ea21d000 CR4: 00000000000406b0
      Call Trace:
       change_page_attr_clear+0x1f/0x21
       set_memory_ro+0x1e/0x20
       arch_ftrace_update_trampoline+0x207/0x21c
       ? ftrace_caller+0x64/0x64
       ? 0xffffffffc029b000
       ftrace_startup+0xf4/0x198
       register_ftrace_function+0x26/0x3c
       function_trace_init+0x5e/0x73
       tracer_init+0x1e/0x23
       tracing_set_tracer+0x127/0x15a
       register_tracer+0x19b/0x1bc
       init_function_trace+0x90/0x92
       early_trace_init+0x236/0x2b3
       start_kernel+0x200/0x3f5
       x86_64_start_reservations+0x29/0x2b
       x86_64_start_kernel+0x17c/0x18f
       secondary_startup_64+0x9f/0x9f
       ? secondary_startup_64+0x9f/0x9f
      
      Interrupts should not be enabled this early in the boot process. It is
      also fine to leave interrupts disabled during this time: there is only one
      CPU running, and on_each_cpu() just runs on the current CPU.
      
      If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range()
      with interrupts disabled. Don't trigger a BUG_ON() in that case.
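      
      The relaxed assertion, roughly (illustrative):
      
      	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);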
      
      Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.home
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • kprobes/x86: Fix to set RWX bits correctly before releasing trampoline · c93f5cf5
      Masami Hiramatsu authored
      Fix kprobes to set (recover) the RWX bits correctly on the trampoline
      buffer before releasing it. Releasing a read-only page to module_memfree()
      crashes the kernel.
      
      Without this fix, if a kprobes user registers a bunch of kprobes in
      function bodies (since kprobes on function entry usually use ftrace) and
      unregisters them, the kernel hits a BUG and crashes.
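      
      A sketch of the fix (illustrative): make the page writable and
      non-executable again before handing it back:
      
      	void free_insn_page(void *page)
      	{
      		set_memory_nx((unsigned long)page & PAGE_MASK, 1);
      		set_memory_rw((unsigned long)page & PAGE_MASK, 1);
      		module_memfree(page);
      	}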
      
      Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: d0381c81 ("kprobes/x86: Set kprobes pages read-only")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • KVM: x86: Fix virtual wire mode · 52b54190
      Jan H. Schönherr authored
      The Intel SDM says that at most one LAPIC should be configured with ExtINT
      delivery. KVM configures all LAPICs this way. This causes pic_unlock() to
      kick the first available vCPU from the internal KVM data structures. If
      this vCPU is not the BSP, but some not-yet-booted AP, the BSP may never
      realize that there is an interrupt pending.
      
      Fix that by enabling ExtINT delivery only for the BSP.
      
      This allows booting a Linux guest without a TSC in the above situation.
      Otherwise the BSP gets stuck in calibrate_delay_converge().
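      
      The BSP-only wiring, sketched (illustrative), in the LAPIC reset path:
      
      	if (kvm_vcpu_is_bsp(vcpu))
      		kvm_lapic_set_reg(apic, APIC_LVT0,
      				  SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
      	else
      		kvm_lapic_set_reg(apic, APIC_LVT0, APIC_LVT_MASKED);
      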
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>