1. 03 June 2020, 7 commits
  2. 29 May 2020, 2 commits
    • x86/ioperm: Prevent a memory leak when fork fails · 4bfe6cce
      Committed by Jay Lang
      In the copy_process() routine called by _do_fork(), a failure to
      allocate a PID (or a failure further along in the function) triggers
      an invocation of exit_thread(). This is done to clean up from an
      earlier call to copy_thread_tls(). Naturally, the child task is
      passed into exit_thread(); however, during that cleanup,
      io_bitmap_exit() nullifies the parent's io_bitmap rather than the
      child's.
      
      As copy_thread_tls() has been called ahead of the failure, the reference
      count on the calling thread's io_bitmap is incremented as we would expect.
      However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
      it should trash the current thread's io_bitmap reference rather than the
      child's. This is pretty sneaky in practice, because in all instances but
      this one, exit_thread() is called with respect to the current task and
      everything works out.
      
      A determined attacker can issue an appropriate ioctl (e.g. KDENABIO) to
      get a bitmap allocated, and force a clone3() syscall to fail by passing
      in a zeroed clone_args structure. The kernel handles the erroneous struct
      and the buggy code path is followed, and even though the parent's reference
      to the io_bitmap is trashed, the child still holds a reference and thus
      the structure will never be freed.
      
      Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
      task_struct argument specifying the task to operate on.
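
      A minimal sketch of the reworked helper; the refcounting details are
      assumed from the surrounding io_bitmap code rather than copied from
      the diff:

        /* Sketch: operate on the task handed in by exit_thread(), not on
         * current, so the child's reference is the one that gets dropped. */
        void io_bitmap_exit(struct task_struct *tsk)
        {
                struct io_bitmap *iobm = tsk->thread.io_bitmap;

                tsk->thread.io_bitmap = NULL;
                if (iobm && refcount_dec_and_test(&iobm->refcnt))
                        kfree(iobm);
        }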
      
      Fixes: ea5f1cd7 ("x86/ioperm: Remove bitmap if all permissions dropped")
      Signed-off-by: Jay Lang <jaytlang@mit.edu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
    • x86/dma: Fix max PFN arithmetic overflow on 32 bit systems · 88743470
      Committed by Alexander Dahl
      The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
      4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.
      The patch does not change the later overall result of 0x100000 for
      MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new
      calculation yields the same result, but does not require 64 bit
      arithmetic.
      
      On 32 bit systems the old calculation suffers from an arithmetic
      overflow in that intermediate term in parentheses: 4UL, i.e. unsigned
      long int, is 4 bytes wide, so the 0x100000000 does not fit and the
      parenthesized result is truncated to zero. The following right shift
      does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit
      systems.
      
      That wrong value breaks the comparison against MAX_DMA32_PFN in the
      swiotlb init code, in pci_swiotlb_detect_4gb(), which decides whether
      swiotlb should be active: when compiled on 32 bit systems, that
      comparison yields the opposite result.
      
      This was not possible before
      
        1b7e03ef ("x86, NUMA: Enable emulation on 32bit too")
      
      which first made MAX_DMA32_PFN visible to x86_32 (a change which
      landed in v3.0).
      
      In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on
      x86-32.
      
      However if one has set CONFIG_IOMMU_INTEL, since
      
        c5a5dc4c ("iommu/vt-d: Don't switch off swiotlb if bounce page is used")
      
      there's a dependency on CONFIG_SWIOTLB, which was not necessarily
      active before. That landed in v5.4, where we noticed it in the fli4l
      Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64
      bit kernel configs there (I could not find out why, so let's just say
      historical reasons).
      
      The effect is that at boot time 64 MiB (the default size) is now
      allocated for bounce buffers, which is a noticeable amount of memory
      on small systems like the pcengines ALIX 2D3 with 256 MiB of memory,
      boards that are still frequently used as home routers.
      
      We noticed this effect when migrating from kernel v4.19 (LTS) to
      v5.4 (LTS) in fli4l and got kernel messages like these:
      
        Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
        …
        Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss, 78272K reserved, 0K cma-reserved, 0K highmem)
        …
        PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
        software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)
      
      The initial analysis and the suggested fix were done by user
      'sourcejedi' on the Unix & Linux Stack Exchange and explicitly marked
      as GPLv2 for inclusion in the Linux kernel:
      
        https://unix.stackexchange.com/a/520525/50007
      
      The new calculation, which does not suffer from that overflow, is the
      same as for arch/mips now as suggested by Robin Murphy.
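
      A sketch of the before/after definitions (the new one mirrors the
      arch/mips definition, as noted above):

        /* Before: the parenthesized product is 0x100000000, which overflows
         * the 4-byte unsigned long on 32 bit and truncates to 0 before the
         * shift ever happens:
         *
         *   #define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
         *
         * After: shift first, so no intermediate value needs more than
         * 32 bits; with PAGE_SHIFT == 12 this still yields 0x100000. */
        #define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))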
      
      The fix was tested by fli4l users on around two dozen different
      systems, including both 32 and 64 bit archs, on bare metal as well
      as on virtualized machines.
      
       [ bp: Massage commit message. ]
      
      Fixes: 1b7e03ef ("x86, NUMA: Enable emulation on 32bit too")
      Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
      Suggested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Alexander Dahl <post@lespocky.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: https://unix.stackexchange.com/q/520065/50007
      Link: https://web.nettworks.org/bugs/browse/FFL-2560
      Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
  3. 28 May 2020, 1 commit
  4. 27 May 2020, 1 commit
  5. 26 May 2020, 1 commit
  6. 24 May 2020, 1 commit
  7. 23 May 2020, 1 commit
    • x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks · 187b96db
      Committed by Josh Poimboeuf
      Normally, show_trace_log_lvl() scans the stack, looking for text
      addresses to print.  In parallel, it unwinds the stack with
      unwind_next_frame().  If the stack address matches the pointer returned
      by unwind_get_return_address_ptr() for the current frame, the text
      address is printed normally without a question mark.  Otherwise it's
      considered a breadcrumb (potentially from a previous call path) and it's
      printed with a question mark to indicate that the address is unreliable
      and typically can be ignored.
      
      Since the following commit:
      
        f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
      
      ... for inactive tasks, show_trace_log_lvl() prints *only* unreliable
      addresses (prepended with '?').
      
      That happens because, for the first frame of an inactive task,
      unwind_get_return_address_ptr() returns the wrong return address
      pointer: one word *below* the task stack pointer.  show_trace_log_lvl()
      starts scanning at the stack pointer itself, so it never finds the first
      'reliable' address, causing only guesses to be printed.
      
      The first frame of an inactive task isn't a normal stack frame.  It's
      actually just an instance of 'struct inactive_task_frame' which is left
      behind by __switch_to_asm().  Now that this inactive frame is actually
      exposed to callers, fix unwind_get_return_address_ptr() to interpret it
      properly.
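
      A sketch of the fixed lookup; the condition is an approximation based
      on the description above, with struct inactive_task_frame being the
      frame left behind by __switch_to_asm():

        unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
        {
                struct task_struct *task = state->task;

                if (unwind_done(state))
                        return NULL;

                if (state->regs)
                        return &state->regs->ip;

                /* First frame of an inactive task: not a normal stack
                 * frame, but a struct inactive_task_frame at the saved
                 * stack pointer. Point at its return address member. */
                if (task != current && state->sp == task->thread.sp) {
                        struct inactive_task_frame *frame = (void *)task->thread.sp;

                        return &frame->ret_addr;
                }

                return (unsigned long *)state->sp - 1;
        }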
      
      Fixes: f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
  8. 20 May 2020, 1 commit
  9. 16 May 2020, 1 commit
  10. 15 May 2020, 3 commits
    • bpf: Restrict bpf_probe_read{, str}() only to archs where they work · 0ebeea8c
      Committed by Daniel Borkmann
      Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
      with overlapping address ranges, we should really take the next step to
      disable them from BPF use there.
      
      To generally fix the situation, we've recently added new helper variants
      bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
      For details on them, see 6ae08ae3 ("bpf: Add probe_read_{user, kernel}
      and probe_read_{user,kernel}_str helpers").
      
      Given bpf_probe_read{,str}() have been around for ~5 years by now, there
      are plenty of users at least on x86 still relying on them today, so we
      cannot remove them entirely w/o breaking the BPF tracing ecosystem.
      
      However, their use should be restricted to archs with non-overlapping
      address ranges where they are working in their current form. Therefore,
      move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
      option and have x86, arm64 and arm select it (other archs supporting
      it can follow up as well).
      
      The remaining archs can work around this easily by relying on the
      feature probe from bpftool, which spills out defines that can be used
      from BPF C code to implement a drop-in replacement for old/new
      kernels via: bpftool feature probe macro
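
      A hypothetical drop-in from BPF C code; the HAVE_* guard below stands
      in for whatever define the bpftool feature probe emits on a given
      kernel, and its name here is an assumption for illustration:

        /* Assumption: fall back to the legacy helper on kernels where the
         * probe did not report bpf_probe_read_kernel(). */
        #ifndef HAVE_BPF_PROBE_READ_KERNEL
        #define bpf_probe_read_kernel(dst, size, src) bpf_probe_read(dst, size, src)
        #endif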

      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
    • x86: Fix early boot crash on gcc-10, third try · a9a3ed1e
      Committed by Borislav Petkov
      ... or the odyssey of trying to disable the stack protector for the
      function which generates the stack canary value.
      
      The whole story started with Sergei reporting a boot crash with a kernel
      built with gcc-10:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5-00235-gfffb08b3 #139
        Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
        Call Trace:
          dump_stack
          panic
          ? start_secondary
          __stack_chk_fail
          start_secondary
          secondary_startup_64
        ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
      
      This happens because gcc-10 tail-call optimizes the last function call
      in start_secondary() - cpu_startup_entry() - and thus emits a stack
      canary check which fails because the canary value changes after the
      boot_init_stack_canary() call.
      
      To fix that, the initial attempt was to mark the one function which
      generates the stack canary with:
      
        __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
      
      however, using the optimize attribute doesn't work cumulatively
      as the attribute does not add to but rather replaces previously
      supplied optimization options - roughly all -fxxx options.
      
      The key one among them is -fno-omit-frame-pointer, and dropping it
      leads to a missing frame pointer - a frame pointer which the kernel
      needs.
      
      The next attempt to prevent compilers from tail-call optimizing
      the last function call cpu_startup_entry(), shy of carving out
      start_secondary() into a separate compilation unit and building it with
      -fno-stack-protector, was to add an empty asm("").
      
      This solution was short and sweet and, reportedly, is supported by
      both compilers, but we didn't get very far this time: future (LTO?)
      optimization passes could potentially eliminate this, which leads us
      to the third attempt: having an actual memory barrier there which the
      compiler cannot ignore or move around etc.
      
      That should hold for a long time, but hey we said that about the other
      two solutions too so...
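
      A sketch of that third attempt, using the macro name from the commit
      and eliding the rest of start_secondary():

        /* include/linux/compiler.h: a full memory barrier the compiler
         * can neither ignore nor move around. */
        #define prevent_tail_call_optimization()        mb()

        static void notrace start_secondary(void *unused)
        {
                /* ... bring-up, including boot_init_stack_canary() ... */
                cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
                /* Keeps the call above from becoming a tail call; since
                 * cpu_startup_entry() never returns, the epilogue's stack
                 * canary check is never reached. */
                prevent_tail_call_optimization();
        }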

      Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Kalle Valo <kvalo@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
    • x86/unwind/orc: Fix error handling in __unwind_start() · 71c95825
      Committed by Josh Poimboeuf
      The unwind_state 'error' field is used to inform the reliable unwinding
      code that the stack trace can't be trusted.  Set this field for all
      errors in __unwind_start().
      
      Also, move the zeroing out of the unwind_state struct to before the ORC
      table initialization check, to prevent the caller from reading
      uninitialized data if the ORC table is corrupted.
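
      A sketch of the reordered flow; the labels and fields approximate the
      actual function:

        void __unwind_start(struct unwind_state *state, struct task_struct *task,
                            struct pt_regs *regs, unsigned long *first_frame)
        {
                /* Zero everything first, so an early error return cannot
                 * leave the caller looking at uninitialized data. */
                memset(state, 0, sizeof(*state));
                state->task = task;

                if (!orc_init)
                        goto err;

                /* ... locate the first frame and return on success ... */
                return;

        err:
                /* Tell reliable unwinders not to trust this trace. */
                state->error = true;
                state->stack_info.type = STACK_TYPE_UNKNOWN;
        }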
      
      Fixes: af085d90 ("stacktrace/x86: add function for detecting reliable stack traces")
      Fixes: d3a09104 ("x86/unwinder/orc: Dont bail on stack overflow")
      Fixes: 98d0c8eb ("x86/unwind/orc: Prevent unwinding before ORC initialization")
      Reported-by: Pavel Machek <pavel@denx.de>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
  11. 14 May 2020, 1 commit
    • x86/boot: Mark global variables as static · e78d334a
      Committed by Arvind Sankar
      Mike Lothian reports that after commit
        964124a9 ("efi/x86: Remove extra headroom for setup block")
      gcc 10.1.0 fails with
      
        HOSTCC  arch/x86/boot/tools/build
        /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
        error: linker defined: multiple definition of '_end'
        /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
        /tmp/ccEkW0jM.o: previous definition here
        collect2: error: ld returned 1 exit status
        make[1]: *** [scripts/Makefile.host:103: arch/x86/boot/tools/build] Error 1
        make: *** [arch/x86/Makefile:303: bzImage] Error 2
      
      The issue is with the _end variable that was added to hold the end of
      the compressed kernel from zoffsets.h (ZO__end). The name clashes with
      the linker-defined _end symbol that marks the end of the build program
      itself.
      
      Even when there is no compile-time error, this causes the build tool
      to use memory past the end of its .bss section.
      
      To solve this, mark _end as static, and for symmetry, mark the rest of
      the variables that keep track of symbols from the compressed kernel as
      static as well.
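
      A sketch of the change in arch/x86/boot/tools/build.c; the
      neighboring variable name is an assumption based on the description:

        /* "static" keeps these host-tool variables from colliding with
         * linker-defined symbols such as _end. */
        static unsigned long _ehead;
        static unsigned long _end;      /* ZO__end: end of the compressed kernel */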
      
      Fixes: 964124a9 ("efi/x86: Remove extra headroom for setup block")
      Reported-by: Mike Lothian <mike@fireburn.co.uk>
      Tested-by: Mike Lothian <mike@fireburn.co.uk>
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Link: https://lore.kernel.org/r/20200511225849.1311869-1-nivedita@alum.mit.edu
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
  12. 13 May 2020, 3 commits
    • KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      Committed by Babu Moger
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
      resource isn't. It can be read with XSAVE and written with XRSTOR.
      So, if we don't set the guest PKRU value here (kvm_load_guest_xsave_state),
      the guest can read the host value.
      
      In the case of kvm_load_host_xsave_state, a guest with CR4.PKE clear
      could potentially use XRSTOR to change the host PKRU value.
      
      While at it, move pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.
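
      A sketch of the common load path described above; the exact guards
      are approximated from the commit description:

        void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
        {
                /* ... XCR0/XSS handling ... */

                /* PKRU is switched via XSAVE/XRSTOR even when CR4.PKE = 0,
                 * so load the guest value whenever the guest could observe
                 * or modify it. */
                if (static_cpu_has(X86_FEATURE_PKU) &&
                    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
                     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
                    vcpu->arch.pkru != vcpu->arch.host_pkru)
                        __write_pkru(vcpu->arch.pkru);
        }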
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/hyperv: Properly suspend/resume reenlightenment notifications · 38dce419
      Committed by Vitaly Kuznetsov
      Errors during hibernation with reenlightenment notifications enabled were
      reported:
      
       [   51.730435] PM: hibernation entry
       [   51.737435] PM: Syncing filesystems ...
       ...
       [   54.102216] Disabling non-boot CPUs ...
       [   54.106633] smpboot: CPU 1 is now offline
       [   54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to
           write 0x47c72780000100ee) at rIP: 0xffffffff90062f24
           (native_write_msr+0x4/0x20)
       [   54.110006] Call Trace:
       [   54.110006]  hv_cpu_die+0xd9/0xf0
       ...
      
      Normally, hv_cpu_die() just reassigns reenlightenment notifications to
      some other CPU when the CPU receiving them goes offline. Upon
      hibernation, there is no other CPU which is still online, so
      cpumask_any_but(cpu_online_mask) returns >= nr_cpu_ids, and using that
      as an index into hv_vp_index is incorrect. Disable the feature when
      cpumask_any_but() fails.
      
      Also, as we now disable reenlightenment notifications upon
      hibernation, we need to restore them on resume. Check if
      hv_reenlightenment_cb was previously set and, if so, restore it from
      hv_resume().
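
      A sketch of hv_cpu_die() with the fix applied; the structure and
      names follow the existing reenlightenment code as described above,
      with unrelated teardown elided:

        static int hv_cpu_die(unsigned int cpu)
        {
                struct hv_reenlightenment_control re_ctrl;
                unsigned int new_cpu;

                if (hv_reenlightenment_cb == NULL)
                        return 0;

                rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
                if (re_ctrl.target_vp == hv_vp_index[cpu]) {
                        /* Reassign to some other online CPU ... */
                        new_cpu = cpumask_any_but(cpu_online_mask, cpu);

                        if (new_cpu < nr_cpu_ids)
                                re_ctrl.target_vp = hv_vp_index[new_cpu];
                        else
                                re_ctrl.enabled = 0;    /* ... or disable */

                        wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
                }
                return 0;
        }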

      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
      Link: https://lore.kernel.org/r/20200512160153.134467-1-vkuznets@redhat.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up · 59566b0b
      Committed by Steven Rostedt (VMware)
      Booting one of my machines, it triggered the following crash:
      
       Kernel/User page tables isolation: enabled
       ftrace: allocating 36577 entries in 143 pages
       Starting tracer 'function'
       BUG: unable to handle page fault for address: ffffffffa000005c
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0003) - permissions violation
       PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
       Oops: 0003 [#1] PREEMPT SMP PTI
       CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
       RIP: 0010:text_poke_early+0x4a/0x58
        Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff 0 41 57 49
       RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
       RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
       RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
       RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
       R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
       R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
       FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
       Call Trace:
        text_poke_bp+0x27/0x64
        ? mutex_lock+0x36/0x5d
        arch_ftrace_update_trampoline+0x287/0x2d5
        ? ftrace_replace_code+0x14b/0x160
        ? ftrace_update_ftrace_func+0x65/0x6c
        __register_ftrace_function+0x6d/0x81
        ftrace_startup+0x23/0xc1
        register_ftrace_function+0x20/0x37
        func_set_flag+0x59/0x77
        __set_tracer_option.isra.19+0x20/0x3e
        trace_set_options+0xd6/0x13e
        apply_trace_boot_options+0x44/0x6d
        register_tracer+0x19e/0x1ac
        early_trace_init+0x21b/0x2c9
        start_kernel+0x241/0x518
        ? load_ucode_intel_bsp+0x21/0x52
        secondary_startup_64+0xa4/0xb0
      
      I was able to trigger it on other machines as well, by adding both
      "ftrace=function" and "trace_options=func_stack_trace" to the kernel
      command line.
      
      The cause is that "ftrace=function" registers the function tracer
      and creates a trampoline, which it sets executable and read-only.
      Then "trace_options=func_stack_trace" updates the same trampoline
      to the stack-tracing version of the function tracer. But since the
      trampoline already exists, it is updated with text_poke_bp(). The
      problem is that when text_poke_bp() is called while system_state ==
      SYSTEM_BOOTING, it simply does a memcpy() and does not use the
      special page mapping, as it assumes the text is still read-write.
      But in this case it is not, and we take a fault and crash.
      
      Instead, let's keep the ftrace trampolines read-write during boot up,
      and then, when the kernel executable text is set to read-only, set
      the ftrace trampolines read-only as well.
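
      A sketch of both halves; the helper and flag names follow the commit
      description, and the internals are elided:

        /* When creating/updating a trampoline: only seal it after boot,
         * since early text_poke_bp() writes are plain memcpy()s that
         * need the mapping to stay read-write. */
        if (likely(system_state != SYSTEM_BOOTING))
                set_memory_ro((unsigned long)trampoline, npages);
        set_memory_x((unsigned long)trampoline, npages);

        /* Called once kernel text goes read-only (mark_rodata_ro()) to
         * walk the registered ftrace_ops and seal their trampolines: */
        void set_ftrace_ops_ro(void);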
      
      Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Fixes: 768ae440 ("x86/ftrace: Use text_poke()")
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  13. 08 May 2020, 7 commits
    • KVM: SVM: Disable AVIC before setting V_IRQ · 7d611233
      Committed by Suravee Suthikulpanit
      Commit 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window
      while running L2 with V_INTR_MASKING=1") introduced a WARN_ON
      which fires if AVIC is enabled when trying to set V_IRQ in the
      VMCB to enable the irq window.
      
      The following warning is triggered because the requesting vcpu (the
      one asking to deactivate AVIC) does not get to process the APICv
      update request for itself until the next #vmexit.
      
      WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd]
       RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd]
       Call Trace:
        kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm]
        ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
        ? _copy_to_user+0x26/0x30
        ? kvm_vm_ioctl+0xb3e/0xd90 [kvm]
        ? set_next_entity+0x78/0xc0
        kvm_vcpu_ioctl+0x236/0x610 [kvm]
        ksys_ioctl+0x8a/0xc0
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x58/0x210
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by sending the APICv update request to all other vcpus, and
      immediately updating APICv for the requesting vcpu itself.
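
      A sketch of that flow; the request constant exists in the APICv code,
      while the surrounding helper is approximated:

        /* Ask every *other* vcpu to refresh its APICv state at the next
         * opportunity, then apply the update to this vcpu synchronously
         * so AVIC is really off before V_IRQ is written. */
        kvm_make_all_cpus_request_except(vcpu->kvm, KVM_REQ_APICV_UPDATE, vcpu);
        kvm_vcpu_update_apicv(vcpu);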

      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lkml.org/lkml/2020/5/2/167
      Fixes: 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1")
      Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Introduce kvm_make_all_cpus_request_except() · 54163a34
      Committed by Suravee Suthikulpanit
      This allows making a request to all vcpus except the one specified
      as a parameter.
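
      A sketch of the new API's shape; the kick/IPI machinery of the
      existing kvm_make_all_cpus_request() is elided:

        bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req,
                                              struct kvm_vcpu *except)
        {
                struct kvm_vcpu *vcpu;
                int i;

                kvm_for_each_vcpu(i, vcpu, kvm) {
                        if (vcpu == except)
                                continue;       /* skip the excluded vcpu */
                        kvm_make_request(req, vcpu);
                        /* ... kick as kvm_make_all_cpus_request() does ... */
                }
                return true;
        }

      kvm_make_all_cpus_request() can then simply call this with a NULL
      "except" argument.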

      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: pass correct DR6 for GD userspace exit · 45981ded
      Committed by Paolo Bonzini
      When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
      DR6 was incorrectly copied from the value in the VM.  Instead,
      DR6.BD should be set in order to catch this case.
      
      On AMD this does not need any special code because the processor triggers
      a #DB exception that is intercepted.  However, the testcase would fail
      without the previous patch because both DR6.BS and DR6.BD would be set.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 · d67668e9
      Committed by Paolo Bonzini
      There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
      different handling of DR6 on intercepted #DB exceptions on Intel and AMD.
      
      On Intel, #DB exceptions transmit the DR6 value via the exit qualification
      field of the VMCS, and the exit qualification only contains the description
      of the precise event that caused a vmexit.
      
      On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception
      was to be injected into the guest.  This has two effects when guest debugging
      is in use:
      
      * the guest DR6 is clobbered
      
      * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
      than just the last one that happened (the testcase in the next patch covers
      this issue).
      
      This patch fixes both issues by emulating, so to speak, the Intel behavior
      on AMD processors.  The important observation is that (after the previous
      patches) the VMCB value of DR6 is only ever observable from the guest if
      KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
      to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
      will be if guest debugging is enabled.
      
      Therefore it is possible to enter the guest with an all-zero DR6,
      reconstruct the #DB payload from the DR6 we get at exit time, and let
      kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
      Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
      is set, but this is harmless.
      
      This may not be the most optimized way to deal with this, but it is
      simple and, being confined within SVM code, it gets rid of the set_dr6
      callback and kvm_update_dr6.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 · 5679b803
      Committed by Paolo Bonzini
      kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
      second argument.  Ensure that the VMCB value is synchronized to
      vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
      that the current value of DR6 is always available in vcpu->arch.dr6.
      The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.
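
      A sketch of the synchronization point in the SVM #DB intercept; the
      surrounding handler is elided:

        static int db_interception(struct vcpu_svm *svm)
        {
                struct kvm_vcpu *vcpu = &svm->vcpu;

                /* Whatever the hardware wrote into the VMCB on this #DB
                 * becomes visible in vcpu->arch.dr6 right away. */
                vcpu->arch.dr6 = svm->vmcb->save.dr6;

                /* ... existing #DB handling ... */
                return 1;
        }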

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h · 2aaba014
      Committed by Eric Biggers
      <linux/cryptohash.h> sounds very generic and important, like it's the
      header to include if you're doing cryptographic hashing in the kernel.
      But actually it only includes the library implementation of the SHA-1
      compression function (not even the full SHA-1).  This should basically
      never be used anymore; SHA-1 is no longer considered secure, and there
      are much better ways to do cryptographic hashing in the kernel.
      
      Most files that include this header don't actually need it.  So in
      preparation for removing it, remove all these unneeded includes of it.

      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() · 996ed22c
      Committed by Janakarajan Natarajan
      When trying to lock read-only pages, sev_pin_memory() fails because
      FOLL_WRITE is used as the flag for get_user_pages_fast().
      
      Commit 73b0140b ("mm/gup: change GUP fast to use flags rather than a
      write 'bool'") updated the get_user_pages_fast() call sites to use
      flags, but incorrectly updated the call in sev_pin_memory().  As the
      original coding of this call was correct, revert the change made by that
      commit.
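
      A sketch of the corrected call, restoring the pre-73b0140b semantics:

        /* Only request write access when the caller asked for a writable
         * mapping; read-only pages can then be pinned again. */
        npinned = get_user_pages_fast(uaddr, npages,
                                      write ? FOLL_WRITE : 0, pages);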
      
      Fixes: 73b0140b ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
      Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 07 May 2020, 8 commits
  15. 06 May 2020, 2 commits