1. 27 May, 2021 18 commits
    • KVM: selftests: add shmem backing source type · c9befd59
      Axel Rasmussen committed
      This lets us run the demand paging test on top of a shmem-backed area.
      In follow-up commits, we'll 1) leverage this new capability to create an
      alias mapping, and then 2) use the alias mapping to exercise UFFD minor
      faults.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-8-axelrasmussen@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: refactor vm_mem_backing_src_type flags · b3784bc2
      Axel Rasmussen committed
      Each struct vm_mem_backing_src_alias has a flags field, which denotes
      the flags used to mmap() an area of that type. Previously, this field
      never included MAP_PRIVATE | MAP_ANONYMOUS, because
      vm_userspace_mem_region_add assumed that *all* types would always use
      those flags, and so it hardcoded them.
      
      In a follow-up commit, we'll add a new type: shmem. Areas of this type
      must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have
      MAP_SHARED.
      
      So, refactor things. Make it so that the flags field of
      struct vm_mem_backing_src_alias really is a complete set of flags, and
      don't add in any extras in vm_userspace_mem_region_add. This will let us
      easily tack on shmem.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-7-axelrasmussen@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
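A minimal sketch of the idea in this commit, assuming a simplified table: each backing source type carries its complete set of mmap() flags, so the caller never adds any. The names `demo_backing_src_alias`, `demo_aliases`, and `demo_backing_src_flags` are hypothetical and are not the selftest's real identifiers.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the selftests' backing source alias table. */
struct demo_backing_src_alias {
    const char *name;
    int flags; /* complete mmap() flags; the caller adds nothing */
};

static const struct demo_backing_src_alias demo_aliases[] = {
    { "anonymous", MAP_PRIVATE | MAP_ANONYMOUS },
    { "shmem",     MAP_SHARED },
};

/* Returns the full mmap() flags for a named type, or -1 if unknown. */
int demo_backing_src_flags(const char *name)
{
    size_t n = sizeof(demo_aliases) / sizeof(demo_aliases[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(demo_aliases[i].name, name) == 0)
            return demo_aliases[i].flags;
    return -1;
}
```

With the flags stored per type, adding a shared-memory type only needs a new table row rather than a special case at the mmap() call site.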
    • KVM: selftests: allow different backing source types · 0368c2c1
      Axel Rasmussen committed
      Add an argument which lets us specify a different backing memory type
      for the test. The default is just to use anonymous, matching existing
      behavior.
      
      This is in preparation for testing UFFD minor faults. For that, we'll
      need to use a new backing memory type which is set up with MAP_SHARED.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-6-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: compute correct demand paging size · 32ffa4f7
      Axel Rasmussen committed
      This is a preparatory commit needed before we can use different kinds of
      backing pages for guest memory.
      
      Previously, we used perf_test_args.host_page_size, which is the host's
      native page size (commonly 4K). For VM_MEM_SRC_ANONYMOUS this turns out
      to be okay, but in a follow-up commit we want to allow using different
      kinds of backing memory.
      
      Take VM_MEM_SRC_ANONYMOUS_HUGETLB for example. Without this change, if
      we used that backing page type, when we issued a UFFDIO_COPY ioctl we'd
      only do so with 4K, rather than the full 2M of a backing hugepage. In
      this case, UFFDIO_COPY returns -EINVAL (__mcopy_atomic_hugetlb checks
      the size).
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-5-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
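The size mismatch described above can be modelled roughly as follows. This is a simplified userspace model of a length check, not the kernel's code; `demo_check_copy_len` and the two size constants are hypothetical names, and the backing page size is assumed to be a power of two.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_4K (4096ULL)
#define DEMO_PAGE_2M (2ULL * 1024 * 1024)

/*
 * Simplified model: a copy length is accepted only if it is a non-zero
 * multiple of the backing page size (assumed to be a power of two).
 */
int demo_check_copy_len(uint64_t len, uint64_t backing_page_size)
{
    if (len == 0 || (len & (backing_page_size - 1)))
        return -EINVAL;
    return 0;
}
```

Under this model, a 4K request against a 2M hugepage is rejected, while a full 2M request passes, which is why the test has to use the backing page size rather than the host's native page size.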
    • KVM: selftests: simplify setup_demand_paging error handling · 25408e5a
      Axel Rasmussen committed
      A small cleanup. Our caller writes:
      
        r = setup_demand_paging(...);
        if (r < 0) exit(-r);
      
      Since we're just going to exit anyway, instead of returning an error we
      can just re-use TEST_ASSERT. This makes the caller simpler, as well as
      the function itself - no need to write out branches, etc.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-3-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Print a message if /dev/kvm is missing · 2aab4b35
      David Matlack committed
      If a KVM selftest is run on a machine without /dev/kvm, it will exit
      silently. Make it easy to tell what's happening by printing an error
      message.
      
      Opportunistically consolidate all codepaths that open /dev/kvm into a
      single function so they all print the same message.
      
      This slightly changes the semantics of vm_is_unrestricted_guest() by
      changing a TEST_ASSERT() to exit(KSFT_SKIP). However,
      vm_is_unrestricted_guest() is only called in one place
      (x86_64/mmio_warning_test.c) and that is to determine if the test should
      be skipped or not.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210511202120.1371800-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: trivial comment/logging fixes · c887d6a1
      Axel Rasmussen committed
      Some trivial fixes I found while touching related code in this series,
      factored out into a separate commit for easier reviewing:
      
      - s/gor/got/ and add a newline in demand_paging_test.c
      - s/backing_src/src_type/ in a comment to be consistent with the real
        function signature in kvm_util.c
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-2-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix hang in hardware_disable_test · a10453c0
      David Matlack committed
      If /dev/kvm is not available then hardware_disable_test will hang
      indefinitely because the child process exits before posting to the
      semaphore for which the parent is waiting.
      
      Fix this by making the parent periodically check if the child has
      exited. We have to be careful to forward the child's exit status to
      preserve a KSFT_SKIP status.
      
      I considered just checking for /dev/kvm before creating the child
      process, but there are so many other reasons why the child could exit
      early that it seemed better to handle that as a general case.
      
      Tested:
      
      $ ./hardware_disable_test
      /dev/kvm not available, skipping test
      $ echo $?
      4
      $ modprobe kvm_intel
      $ ./hardware_disable_test
      $ echo $?
      0
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210514230521.2608768-1-dmatlack@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
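The "periodically check whether the child has exited" pattern can be sketched as below. This is not the test's actual code: a pipe is swapped in for the semaphore so the sketch needs only libc, and `demo_wait_ready_or_exit` and `demo_child_exits_early` are hypothetical names. The 100 ms poll interval is likewise an assumption.

```c
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait until the child signals readiness on ready_fd, checking at least
 * every 100 ms whether it has exited instead. Returns true if the child is
 * ready; returns false and stores the child's exit status if it exited first.
 */
bool demo_wait_ready_or_exit(int ready_fd, pid_t child, int *exit_status)
{
    for (;;) {
        struct pollfd pfd = { .fd = ready_fd, .events = POLLIN };
        int status;

        if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLIN))
            return true;

        if (waitpid(child, &status, WNOHANG) == child) {
            *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : 1;
            return false;
        }
    }
}

/* Demo: the child exits with `code` before ever signalling readiness. */
int demo_child_exits_early(int code)
{
    int fds[2], exit_status = -1;
    pid_t child;

    if (pipe(fds) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0)
        _exit(code);

    close(fds[1]); /* parent keeps only the read end */
    demo_wait_ready_or_exit(fds[0], child, &exit_status);
    close(fds[0]);
    return exit_status; /* forwarded so a skip code is preserved */
}
```

Forwarding the status unchanged is what lets a skip exit code such as the 4 shown in the "Tested" output above reach the caller instead of being turned into a hang or a generic failure.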
    • KVM: selftests: Ignore CPUID.0DH.1H in get_cpuid_test · 50bc913d
      David Matlack committed
      Similar to CPUID.0DH.0H this entry depends on the vCPU's XCR0 register
      and IA32_XSS MSR. Since this test does not control for either before
      assigning the vCPU's CPUID, these entries will not necessarily match
      the supported CPUID exposed by KVM.
      
      This fixes get_cpuid_test on Cascade Lake CPUs.
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210519211345.3944063-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn() · ef4c9f4f
      David Matlack committed
      vm_get_max_gfn() casts vm->max_gfn from a uint64_t to an unsigned int,
      which causes the upper 32 bits of the max_gfn to get truncated.
      
      Nobody noticed until now likely because vm_get_max_gfn() is only used
      as a mechanism to create a memslot in an unused region of the guest
      physical address space (the top), and the top of the 32-bit physical
      address space was always good enough.
      
      This fix reveals a bug in memslot_modification_stress_test which was
      trying to create a dummy memslot past the end of guest physical memory.
      Fix that by moving the dummy memslot lower.
      
      Fixes: 52200d0d ("KVM: selftests: Remove duplicate guest mode handling")
      Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210521173828.1180619-1-dmatlack@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
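The truncation itself is easy to reproduce in isolation. The sketch below assumes a platform where unsigned int is 32 bits wide; the two function names are hypothetical and only contrast the narrow and the wide return type.

```c
#include <stdint.h>

/* Narrow return type: the upper 32 bits of max_gfn are silently dropped. */
unsigned int demo_max_gfn_truncated(uint64_t max_gfn)
{
    return (unsigned int)max_gfn;
}

/* Wide return type: the value survives intact. */
uint64_t demo_max_gfn_full(uint64_t max_gfn)
{
    return max_gfn;
}
```

For a 40-bit frame number such as (1ULL << 40) - 1, the narrow version yields 0xFFFFFFFF, which still looks like a plausible "top of the address space" and so is easy to miss.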
    • KVM: selftests: add a memslot-related performance benchmark · cad347fa
      Maciej S. Szmigiero committed
      This benchmark contains the following tests:
      * Map test, where the host unmaps guest memory while the guest writes to
      it (maps it).
      
      The test is designed in a way to make the unmap operation on the host
      take a negligible amount of time in comparison with the mapping
      operation in the guest.
      
      The test area is actually split in two: the first half is being mapped
      by the guest while the second half is being unmapped by the host.
      Then a guest <-> host sync happens and the areas are reversed.
      
      * Unmap test which is broadly similar to the above map test, but it is
      designed in an opposite way: to make the mapping operation in the guest
      take a negligible amount of time in comparison with the unmap operation
      on the host.
      This test is available in two variants: with per-page unmap operation
      or a chunked one (using 2 MiB chunk size).
      
      * Move active area test which involves moving the last (highest gfn)
      memslot a bit back and forth on the host while the guest is
      concurrently writing around the area being moved (including over the
      moved memslot).
      
      * Move inactive area test which is similar to the previous move active
      area test, but now guest writes all happen outside of the area being
      moved.
      
      * Read / write test in which the guest writes to the beginning of each
      page of the test area while the host writes to the middle of each such
      page.
      Then each side checks the values the other side has written.
      This particular test is not expected to give different results depending
      on the particular memslots implementation; it is meant as a rough sanity
      check and to provide insight into the spread of test results expected.
      
      Each test performs its operation in a loop until a test period ends
      (this is 5 seconds by default, but it is configurable).
      Then the total count of loops done is divided by the actual elapsed
      time to give the test result.
      
      The tests have a configurable memslot cap with the "-s" test option; by
      default the system maximum is used.
      Each test is repeated a particular number of times (by default 20
      times); the best result achieved is printed.
      
      The test memory area is divided equally between memslots; the remainder
      is added to the last memslot.
      The test area size does not depend on the number of memslots in use.
      
      The tests also measure the time that it took to add all these memslots.
      The best result from the tests that use the whole test area is printed
      after all the requested tests are done.
      
      In general, these tests are designed to use as much memory as possible
      (within reason) while still doing 100+ loops even on high memslot counts
      with the default test length.
      Increasing the test runtime makes it increasingly more likely that some
      event will happen on the system during the test run, which might lower
      the test result.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Message-Id: <8d31bb3d92bc8fa33a9756fa802ee14266ab994e.1618253574.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
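Two pieces of arithmetic from the description above can be sketched as follows: splitting the test area equally between memslots with the remainder going to the last one, and turning a loop count into a result. Both function names are hypothetical, and the sketch assumes nslots is at least 1 and the elapsed time is positive.

```c
#include <stdint.h>

/* Pages assigned to memslot idx when total_pages is split across nslots. */
uint64_t demo_slot_pages(uint64_t total_pages, uint32_t nslots, uint32_t idx)
{
    uint64_t per_slot = total_pages / nslots;

    if (idx == nslots - 1)
        per_slot += total_pages % nslots; /* remainder goes to the last slot */
    return per_slot;
}

/* Test result: total loops divided by the actual elapsed time. */
double demo_loops_per_second(uint64_t loops, double elapsed_seconds)
{
    return (double)loops / elapsed_seconds;
}
```

Summing demo_slot_pages() over all indices always gives back total_pages, which matches the statement that the test area size does not depend on the number of memslots in use.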
    • KVM: selftests: Keep track of memslots more efficiently · 22721a56
      Maciej S. Szmigiero committed
      The KVM selftest framework was using a simple list for keeping track of
      the memslots currently in use.
      This resulted in lookups and adding a single memslot being O(n), the
      latter due to linear scanning of the existing memslot set to check for
      the presence of any conflicting entries.
      
      Before this change, benchmarking a high count of memslots was more or less
      impossible as pretty much all the benchmark time was spent in the
      selftest framework code.
      
      We can simply use a rbtree for keeping track of both gfn and hva.
      We don't need an interval tree for hva here as we can't have overlapping
      memslots because we allocate a completely new memory chunk for each new
      memslot.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Message-Id: <b12749d47ee860468240cf027412c91b76dbe3db.1618253574.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
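The reasoning above (an ordered tree is enough because regions never overlap) can be illustrated in userspace. This sketch swaps in POSIX tsearch()/tfind() for the kernel-style rbtree the commit refers to; POSIX does not mandate balancing, though glibc implements these with a red-black tree. The `demo_region` type and the function names are hypothetical.

```c
#include <search.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_region {
    uint64_t base_gfn; /* first guest frame number */
    uint64_t npages;   /* length in pages */
};

/* Orders non-overlapping regions; any overlap compares as "equal". */
static int demo_region_cmp(const void *a, const void *b)
{
    const struct demo_region *x = a, *y = b;

    if (x->base_gfn + x->npages <= y->base_gfn)
        return -1;
    if (y->base_gfn + y->npages <= x->base_gfn)
        return 1;
    return 0;
}

/* Returns true if inserted, false on a conflict or allocation failure. */
bool demo_region_add(void **root, struct demo_region *r)
{
    void *node = tsearch(r, root, demo_region_cmp);

    return node && *(struct demo_region **)node == r;
}

/* Finds the region containing gfn, or NULL if there is none. */
struct demo_region *demo_region_find(void **root, uint64_t gfn)
{
    struct demo_region key = { .base_gfn = gfn, .npages = 1 };
    void *node = tfind(&key, root, demo_region_cmp);

    return node ? *(struct demo_region **)node : NULL;
}
```

Because an overlapping region compares as equal to an existing one, the conflict check and the insert become a single tree descent instead of a linear scan.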
    • selftests: kvm: fix potential issue with ELF loading · a13534d6
      Paolo Bonzini committed
      vm_vaddr_alloc() sets up GVA to GPA mapping page by page; therefore, GPAs
      may not be contiguous if the same memslot is used for data and page table allocation.
      
      kvm_vm_elf_load() however expects a contiguous range of HVAs (and thus GPAs)
      because it does not try to read file data page by page.  Fix this mismatch
      by allocating memory in one step.
      Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • selftests: kvm: make allocation of extra memory take effect · 39fe2fc9
      Zhenzhong Duan committed
      The extra memory pages are not allocated during VM creation.
      perf_test_util and kvm_page_table_test currently rely on them to allocate
      extra memory.
      
      Fix it by adding extra_mem_pages to the total memory calculation before
      allocating.
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Message-Id: <20210512043107.30076-1-zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots() · da6d63a0
      Wanpeng Li committed
         WARNING: suspicious RCU usage
         5.13.0-rc1 #4 Not tainted
         -----------------------------
         ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
         1 lock held by hyperv_clock/8318:
          #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm]
      
        stack backtrace:
        CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4
        Call Trace:
         dump_stack+0x87/0xb7
         lockdep_rcu_suspicious+0xce/0xf0
         kvm_write_guest_page+0x1c1/0x1d0 [kvm]
         kvm_write_guest+0x50/0x90 [kvm]
         kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm]
         kvm_gen_update_masterclock+0x1d/0x110 [kvm]
         kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm]
         kvm_vm_ioctl+0x123/0x11d0 [kvm]
         __x64_sys_ioctl+0x3ed/0x9d0
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock.
      
      Fixes: e880c6ea (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs)
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1621339235-11131-4-git-send-email-wanpengli@tencent.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix vCPU preempted state from guest's point of view · 1eff0ada
      Wanpeng Li committed
      Commit 66570e96 (kvm: x86: only provide PV features if enabled in guest's
      CPUID) avoids accessing the host-side pv tlb shootdown logic when this pv
      feature is not exposed to the guest. However, kvm_steal_time.preempted is not
      only used by the pv tlb shootdown logic but also mitigates the lock holder
      preemption issue. From the guest's point of view, the vCPU is always preempted,
      since we lose the reset of kvm_steal_time.preempted before vmentry if the pv
      tlb shootdown feature is not exposed. This patch fixes it by clearing
      kvm_steal_time.preempted before vmentry.
      
      Fixes: 66570e96 (kvm: x86: only provide PV features if enabled in guest's CPUID)
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1621339235-11131-3-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Bail out of direct yield in case of under-committed scenarios · 72b268a8
      Wanpeng Li committed
      In under-committed scenarios, vCPUs can be scheduled easily;
      kvm_vcpu_yield_to adds extra overhead, and it is also common to see
      vcpu->ready being true but the yield later failing because p->state is
      TASK_RUNNING.
      
      Let's bail out in such scenarios by checking the length of the current CPU's
      runqueue, which can be treated as a hint of under-commitment rather than a
      guarantee of accuracy. 30%+ of directed-yield attempts can now avoid
      the expensive lookups in kvm_sched_yield() in an under-committed scenario.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1621339235-11131-2-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: exit halt polling on need_resched() · 6bd5b743
      Wanpeng Li committed
      This is inspired by commit 262de410 (kvm: exit halt polling on
      need_resched() as well). Because PPC implements arch-specific halt
      polling logic, we have to add the need_resched() check there as well. This
      patch adds a helper function that can be shared between book3s and generic
      halt-polling loops.
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Venkatesh Srinivas <venkateshs@chromium.org>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: David Matlack <dmatlack@google.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1621339235-11131-1-git-send-email-wanpengli@tencent.com>
      [Make the function inline. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 25 May, 2021 3 commits
  3. 17 May, 2021 1 commit
  4. 15 May, 2021 7 commits
  5. 10 May, 2021 9 commits
    • Linux 5.13-rc1 · 6efb943b
      Linus Torvalds committed
    • fbmem: fix horribly incorrect placement of __maybe_unused · 6dae40ae
      Linus Torvalds committed
      Commit b9d79e4c ("fbmem: Mark proc_fb_seq_ops as __maybe_unused")
      places the '__maybe_unused' in an entirely incorrect location between
      the "struct" keyword and the structure name.
      
      It's a wonder that gcc accepts that silently, but clang quite reasonably
      warns about it:
      
          drivers/video/fbdev/core/fbmem.c:736:21: warning: attribute declaration must precede definition [-Wignored-attributes]
          static const struct __maybe_unused seq_operations proc_fb_seq_ops = {
                              ^
      
      Fix it.
      
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
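A small userspace sketch of the placement issue, assuming GCC or Clang: the attribute belongs after the complete type and before the variable name, not between the "struct" keyword and the tag. The macro below is a stand-in for the kernel's definition, and `demo_seq_operations`, `demo_seq_ops`, `demo_show`, and `demo_call_show` are hypothetical names.

```c
/* Userspace stand-in for the kernel's __maybe_unused macro. */
#define __maybe_unused __attribute__((unused))

struct demo_seq_operations {
    int (*show)(void);
};

int demo_show(void)
{
    return 7;
}

/*
 * Accepted placement: the attribute follows the full type and precedes the
 * variable name, so it applies to the variable being declared.
 *
 * The placement criticised in the commit would instead read:
 *   static const struct __maybe_unused demo_seq_operations demo_seq_ops = ...
 */
static const struct demo_seq_operations __maybe_unused demo_seq_ops = {
    .show = demo_show,
};

/* Uses the table so the example has observable behaviour. */
int demo_call_show(void)
{
    return demo_seq_ops.show();
}
```

The attribute only suppresses the unused-variable warning in configurations where nothing references the table; it does not change how the table behaves when it is used.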
    • Merge tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm · efc58a96
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "Bit later than usual, I queued them all up on Friday then promptly
        forgot to write the pull request email. This is mainly amdgpu fixes,
        with some radeon/msm/fbdev and one i915 gvt fix thrown in.
      
        amdgpu:
         - MPO hang workaround
         - Fix for concurrent VM flushes on vega/navi
         - dcefclk is not adjustable on navi1x and newer
         - MST HPD debugfs fix
         - Suspend/resume fixes
         - Register VGA clients late in case driver fails to load
         - Fix GEM leak in user framebuffer create
         - Add support for polaris12 with 32 bit memory interface
         - Fix duplicate cursor issue when using overlay
         - Fix corruption with tiled surfaces on VCN3
         - Add BO size and stride check to fix BO size verification
      
        radeon:
         - Fix off-by-one in power state parsing
         - Fix possible memory leak in power state parsing
      
        msm:
         - NULL ptr dereference fix
      
        fbdev:
         - procfs disabled warning fix
      
        i915:
         - gvt: Fix a possible division by zero in vgpu display rate
           calculation"
      
      * tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: Use device specific BO size & stride check.
        drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
        drm/amd/pm: initialize variable
        drm/radeon: Avoid power table parsing memory leaks
        drm/radeon: Fix off-by-one power_state index heap overwrite
        drm/amd/display: Fix two cursor duplication when using overlay
        drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
        fbmem: Mark proc_fb_seq_ops as __maybe_unused
        drm/msm/dpu: Delete bonkers code
        drm/i915/gvt: Prevent divided by zero when calculating refresh rate
        amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
        drm/amdgpu: Register VGA clients after init can no longer fail
        drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
        drm/amdgpu: fix r initial values
        drm/amd/display: fix wrong statement in mst hpd debugfs
        amdgpu/pm: set pp_dpm_dcefclk to readonly on NAVI10 and newer gpus
        amdgpu/pm: Prevent force of DCEFCLK on NAVI10 and SIENNA_CICHLID
        drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
        drm/amd/display: Reject non-zero src_y and src_x for video planes
    • Merge tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block · 506c3079
      Linus Torvalds committed
      Pull block fix from Jens Axboe:
       "Turns out the bio max size change still has issues, so let's get it
        reverted for 5.13-rc1. We'll shake out the issues there and defer it
        to 5.14 instead"
      
      * tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block:
        Revert "bio: limit bio max size"
    • Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6 · 0a55a1fb
      Linus Torvalds committed
      Pull cifs fixes from Steve French:
       "Three small SMB3 multichannel related changesets (also for stable)
        from the SMB3 test event this week.
      
        The other fixes are still in review/testing"
      
      * tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: if max_channels set to more than one channel request multichannel
        smb3: do not attempt multichannel to server which does not support it
        smb3: when mounting with multichannel include it in requested capabilities
    • Merge tag 'sched-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9819f682
      Linus Torvalds committed
      Pull scheduler fixes from Thomas Gleixner:
       "A set of scheduler updates:
      
         - Prevent PSI state corruption when schedule() races with cgroup
           move.
      
           A recent commit combined two PSI callbacks to reduce the number of
           cgroup tree updates, but missed that schedule() can drop rq::lock
           for load balancing, which opens the race window for
           cgroup_move_task() which then observes half updated state.
      
           The fix is to solely use task::psi_flags instead of looking at the
           potentially mismatching scheduler state
      
         - Prevent an out-of-bounds access in uclamp caused by a rounding
           division which can lead to an off-by-one error exceeding the
           buckets array size.
      
         - Prevent unfairness caused by missing load decay when a task is
           attached to a cfs runqueue.
      
           The old load of the task was attached to the runqueue and never
           removed. Fix it by enforcing the load update through the hierarchy
           for unthrottled run queue instances.
      
         - A documentation fix for the 'sched_verbose' command line option"
      
      * tag 'sched-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix unfairness caused by missing load decay
        sched: Fix out-of-bound access in uclamp
        psi: Fix psi state corruption when schedule() races with cgroup move
        sched,doc: sched_debug_verbose cmdline should be sched_verbose
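The uclamp item in the merge message above describes a rounding division that produces an off-by-one bucket index. A minimal model of that arithmetic follows; the constants (a capacity scale of 1024 and 20 buckets) and all `demo_`/`DEMO_` names are assumptions chosen to make the overflow visible, and the clamp shown is one plausible way to bound the index rather than a copy of the kernel fix.

```c
#define DEMO_SCHED_CAPACITY_SCALE 1024u
#define DEMO_UCLAMP_BUCKETS 20u

/* Division rounded to the closest integer (for non-negative operands). */
#define DEMO_DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

/* Width of one bucket: 1024 / 20 rounds to 51, so 20 * 51 < 1024. */
#define DEMO_BUCKET_DELTA \
    DEMO_DIV_ROUND_CLOSEST(DEMO_SCHED_CAPACITY_SCALE, DEMO_UCLAMP_BUCKETS)

/* Unbounded index: the maximum clamp value maps one past the last bucket. */
unsigned int demo_bucket_id_unclamped(unsigned int clamp_value)
{
    return clamp_value / DEMO_BUCKET_DELTA;
}

/* Bounded index: never exceeds the last valid bucket. */
unsigned int demo_bucket_id_clamped(unsigned int clamp_value)
{
    unsigned int id = clamp_value / DEMO_BUCKET_DELTA;

    return id < DEMO_UCLAMP_BUCKETS - 1 ? id : DEMO_UCLAMP_BUCKETS - 1;
}
```

With these values the unbounded version returns 20 for a clamp value of 1024, which would index one element past a 20-entry array, while the bounded version returns 19.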
    • Merge tag 'locking-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 732a27a0
      Linus Torvalds committed
      Pull locking fixes from Thomas Gleixner:
       "A set of locking related fixes and updates:
      
         - Two fixes for the futex syscall related to the timeout handling.
      
           FUTEX_LOCK_PI does not support the FUTEX_CLOCK_REALTIME bit and
           because it's not set the time namespace adjustment for clock
           MONOTONIC is applied wrongly.
      
           FUTEX_WAIT cannot support the FUTEX_CLOCK_REALTIME bit because it's
           always a relative timeout.
      
         - Cleanups in the futex syscall entry points which became obvious
           when the two timeout handling bugs were fixed.
      
         - Cleanup of queued_write_lock_slowpath() as suggested by Linus
      
         - Fixup of the smp_call_function_single_async() prototype"
      
      * tag 'locking-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Make syscall entry points less convoluted
        futex: Get rid of the val2 conditional dance
        futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
        Revert 337f1304 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
        locking/qrwlock: Cleanup queued_write_lock_slowpath()
        smp: Fix smp_call_function_single_async prototype
    • Merge tag 'perf_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 85bbba1c
      Linus Torvalds committed
      Pull x86 perf fix from Borislav Petkov:
       "Handle power-gating of AMD IOMMU perf counters properly when they are
        used"
      
      * tag 'perf_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
    • Merge tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dd3e4012
      Linus Torvalds committed
      Pull x86 fixes from Borislav Petkov:
       "A bunch of things accumulated for x86 in the last two weeks:
      
         - Fix guest vtime accounting so that ticks happening while the guest
           is running can also be accounted to it. Along with a consolidation
           to the guest-specific context tracking helpers.
      
         - Provide for the host NMI handler running after a VMX VMEXIT to be
           able to run on the kernel stack correctly.
      
         - Initialize MSR_TSC_AUX when RDPID is supported and not RDTSCP (virt
           relevant - real hw supports both)
      
         - A code generation improvement to TASK_SIZE_MAX through the use of
           alternatives
      
         - The usual misc and related cleanups and improvements"
      
      * tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        KVM: x86: Consolidate guest enter/exit logic to common helpers
        context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
        context_tracking: Consolidate guest enter/exit wrappers
        sched/vtime: Move guest enter/exit vtime accounting to vtime.h
        sched/vtime: Move vtime accounting external declarations above inlines
        KVM: x86: Defer vtime accounting 'til after IRQ handling
        context_tracking: Move guest exit vtime accounting to separate helpers
        context_tracking: Move guest exit context tracking to separate helpers
        KVM/VMX: Invoke NMI non-IST entry instead of IST entry
        x86/cpu: Remove write_tsc() and write_rdtscp_aux() wrappers
        x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
        x86/resctrl: Fix init const confusion
        x86: Delete UD0, UD1 traces
        x86/smpboot: Remove duplicate includes
        x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant
  6. 09 May, 2021 2 commits