1. 13 8月, 2021 2 次提交
    • S
      KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs · 0103098f
      Sean Christopherson 提交于
      Set the min_level for the TDP iterator at the root level when zapping all
      SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
      SPTE will recursively zap all its children, thus there is no need for the
      iterator to attempt to step down.  This avoids rereading the top-level
      SPTEs after they are zapped by causing try_step_down() to short-circuit.
      
      In most cases, optimizing try_step_down() will be in the noise as the cost
      of zapping SPTEs completely dominates the overall time.  The optimization
      is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
      is zapping in response to multiple memslot updates when userspace is adding
      and removing read-only memslots for option ROMs.  In that case, the task
      doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
      for read and thus can be a noisy neighbor of sorts.
      Reviewed-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0103098f
    • S
      KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs · 524a1e4e
      Sean Christopherson 提交于
      Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
      really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
      SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
      not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
      paging, the "zap all" flows will skip top-level SPTEs whose range extends
      beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
      
      Use the current upper bound (based on host.MAXPHYADDR) to detect that the
      caller wants to zap all SPTEs, e.g. instead of using the max theoretical
      gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
      to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
      
      Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
      help future debuggers should KVM decide to leak SPTEs again.
      
      The bug is most easily reproduced by running (and unloading!) KVM in a
      VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
      
        =============================================================================
        BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
        Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
        CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack_lvl+0x45/0x59
         slab_err+0x95/0xc9
         __kmem_cache_shutdown.cold+0x3c/0x158
         kmem_cache_destroy+0x3d/0xf0
         kvm_mmu_module_exit+0xa/0x30 [kvm]
         kvm_arch_exit+0x5d/0x90 [kvm]
         kvm_exit+0x78/0x90 [kvm]
         vmx_exit+0x1a/0x50 [kvm_intel]
         __x64_sys_delete_module+0x13f/0x220
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      524a1e4e
  2. 05 8月, 2021 1 次提交
    • S
      KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds · d5aaad6f
      Sean Christopherson 提交于
      Take a signed 'long' instead of an 'unsigned long' for the number of
      pages to add/subtract to the total number of pages used by the MMU.  This
      fixes a zero-extension bug on 32-bit kernels that effectively corrupts
      the per-cpu counter used by the shrinker.
      
      Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
      kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
      an unsigned 32-bit value on 32-bit kernels.  As a result, the value used
      to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
      sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to
      4294967295 and effectively corrupts the counter.
      
      This was found by a staggering amount of sheer dumb luck when running
      kvm-unit-tests on a 32-bit KVM build.  The shrinker just happened to kick
      in while running tests and do_shrink_slab() logged an error about trying
      to free a negative number of objects.  The truly lucky part is that the
      kernel just happened to be a slightly stale build, as the shrinker no
      longer yells about negative objects as of commit 18bb473e ("mm:
      vmscan: shrink deferred objects proportional to priority").
      
       vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460
      
      Fixes: bc8a3d89 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210804214609.1096003-1-seanjc@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d5aaad6f
  3. 04 8月, 2021 4 次提交
    • M
      KVM: selftests: fix hyperv_clock test · 13c2c3cf
      Maxim Levitsky 提交于
      The test was mistakenly using addr_gpa2hva on a gva and that happened
      to work accidentally.  Commit 106a2e76 ("KVM: selftests: Lower the
      min virtual address for misc page allocations") revealed this bug.
      
      Fixes: 2c7f76b4 ("selftests: kvm: Add basic Hyper-V clocksources tests", 2021-03-18)
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210804112057.409498-1-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      13c2c3cf
    • M
      KVM: SVM: improve the code readability for ASID management · bb2baeb2
      Mingwei Zhang 提交于
      KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped
      because it is never used by VM. Thus, in existing code, ASID value and its
      bitmap postion always has an 'offset-by-1' relationship.
      
      Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range
      [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.
      
      Existing code mixes the usage of ASID value and its bitmap position by
      using the same variable called 'min_asid'.
      
      Fix the min_asid usage: ensure that its usage is consistent with its name;
      allocate extra size for ASID 0 to ensure that each ASID has the same value
      with its bitmap position. Add comments on ASID bitmap allocation to clarify
      the size change.
      Signed-off-by: NMingwei Zhang <mizhang@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Alper Gun <alpergun@google.com>
      Cc: Dionna Glaze <dionnaglaze@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Message-Id: <20210802180903.159381-1-mizhang@google.com>
      [Fix up sev_asid_free to also index by ASID, as suggested by Sean
       Christopherson, and use nr_asids in sev_cpu_init. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bb2baeb2
    • S
      KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB · 179c6c27
      Sean Christopherson 提交于
      Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
      freeing an SEV ASID.  The consumer, pre_sev_run(), indexes the array by
      the raw ASID, thus KVM could get a false negative when checking for a
      different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
      a new VM.
      
      Note, this cannot cause a functional issue _in the current code_, as
      pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
      last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
      guaranteed to mismatch on the first VMRUN.  However, prior to commit
      8a14fe4f ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
      last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
      last_cpu variable.  Thus it's theoretically possible that older versions
      of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
      and VMCB exactly match those of a prior VM.
      
      Fixes: 70cd94e6 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      179c6c27
    • P
      KVM: Do not leak memory for duplicate debugfs directories · 85cd39af
      Paolo Bonzini 提交于
      KVM creates a debugfs directory for each VM in order to store statistics
      about the virtual machine.  The directory name is built from the process
      pid and a VM fd.  While generally unique, it is possible to keep a
      file descriptor alive in a way that causes duplicate directories, which
      manifests as these messages:
      
        [  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
      
      Even though this should not happen in practice, it is more or less
      expected in the case of KVM for testcases that call KVM_CREATE_VM and
      close the resulting file descriptor repeatedly and in parallel.
      
      When this happens, debugfs_create_dir() returns an error but
      kvm_create_vm_debugfs() goes on to allocate stat data structs which are
      later leaked.  The slow memory leak was spotted by syzkaller, where it
      caused OOM reports.
      
      Since the issue only affects debugfs, do a lookup before calling
      debugfs_create_dir, so that the message is downgraded and rate-limited.
      While at it, ensure kvm->debugfs_dentry is NULL rather than an error
      if it is not created.  This fixes kvm_destroy_vm_debugfs, which was not
      checking IS_ERR_OR_NULL correctly.
      
      Cc: stable@vger.kernel.org
      Fixes: 536a6f88 ("KVM: Create debugfs dir and stat files for each VM")
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Suggested-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85cd39af
  4. 03 8月, 2021 4 次提交
  5. 30 7月, 2021 1 次提交
    • P
      KVM: x86: accept userspace interrupt only if no event is injected · fa7a549d
      Paolo Bonzini 提交于
      Once an exception has been injected, any side effects related to
      the exception (such as setting CR2 or DR6) have been taked place.
      Therefore, once KVM sets the VM-entry interruption information
      field or the AMD EVENTINJ field, the next VM-entry must deliver that
      exception.
      
      Pending interrupts are processed after injected exceptions, so
      in theory it would not be a problem to use KVM_INTERRUPT when
      an injected exception is present.  However, DOSEMU is using
      run->ready_for_interrupt_injection to detect interrupt windows
      and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
      interrupt manually.  For this to work, the interrupt window
      must be delayed after the completion of the previous event
      injection.
      
      Cc: stable@vger.kernel.org
      Reported-by: NStas Sergeev <stsp2@yandex.ru>
      Tested-by: NStas Sergeev <stsp2@yandex.ru>
      Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fa7a549d
  6. 28 7月, 2021 10 次提交
    • P
      KVM: add missing compat KVM_CLEAR_DIRTY_LOG · 8750f9bb
      Paolo Bonzini 提交于
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: NPeter Xu <peterx@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8750f9bb
    • L
      KVM: use cpu_relax when halt polling · 74775654
      Li RongQing 提交于
      SMT siblings share caches and other hardware, and busy halt polling
      will degrade its sibling performance if its sibling is working
      
      Sean Christopherson suggested as below:
      
      "Rather than disallowing halt-polling entirely, on x86 it should be
      sufficient to simply have the hardware thread yield to its sibling(s)
      via PAUSE.  It probably won't get back all performance, but I would
      expect it to be close.
      This compiles on all KVM architectures, and AFAICT the intended usage
      of cpu_relax() is identical for all architectures."
      Suggested-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Message-Id: <20210727111247.55510-1-lirongqing@baidu.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      74775654
    • M
      KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl · 5868b822
      Maxim Levitsky 提交于
      Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon
      as the guest CPUID is set.
      
      AVIC happens to be fully disabled on all vCPUs by the time any guest
      entry starts (if after migration the entry can be nested).
      
      The reason is that currently we disable avic right away on vCPU from which
      the kvm_request_apicv_update was called and for this case, it happens to be
      called on all vCPUs (by svm_vcpu_after_set_cpuid).
      
      After we stop doing this, AVIC will end up being disabled only when
      KVM_REQ_APICV_UPDATE is processed which is after we done switching to the
      nested guest.
      
      Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for avic
      (which is a right thing to do anyway).
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5868b822
    • M
      KVM: SVM: tweak warning about enabled AVIC on nested entry · feea0136
      Maxim Levitsky 提交于
      It is possible that AVIC was requested to be disabled but
      not yet disabled, e.g if the nested entry is done right
      after svm_vcpu_after_set_cpuid.
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      feea0136
    • M
      KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated · f1577ab2
      Maxim Levitsky 提交于
      It is possible for AVIC inhibit and AVIC active state to be mismatched.
      Currently we disable AVIC right away on vCPU which started the AVIC inhibit
      request thus this warning doesn't trigger but at least in theory,
      if svm_set_vintr is called at the same time on multiple vCPUs,
      the warning can happen.
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f1577ab2
    • C
      KVM: s390: restore old debugfs names · bb000f64
      Christian Borntraeger 提交于
      commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      did replace the old definitions with the binary ones. While doing that
      it missed that some files are names different than the counters. This
      is especially important for kvm_stat which does have special handling
      for counters named instruction_*.
      
      Fixes: commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      CC: Jing Zhang <jingzhangos@google.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bb000f64
    • P
      KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized · 3fa5e8fd
      Paolo Bonzini 提交于
      Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect
      dereference of vmcb->control.reserved_sw before the vmcb is checked
      for being non-NULL.  The compiler is usually sinking the dereference
      after the check; instead of doing this ourselves in the source,
      ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called
      with a non-NULL VMCB.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: Vineeth Pillai <viremana@linux.microsoft.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [Untested for now due to issues with my AMD machine. - Paolo]
      3fa5e8fd
    • D
      KVM: selftests: Introduce access_tracking_perf_test · c33e05d9
      David Matlack 提交于
      This test measures the performance effects of KVM's access tracking.
      Access tracking is driven by the MMU notifiers test_young, clear_young,
      and clear_flush_young. These notifiers do not have a direct userspace
      API, however the clear_young notifier can be triggered by marking a
      pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages
      that mechanism to enable access tracking on guest memory.
      
      To measure performance this test runs a VM with a configurable number of
      vCPUs that each touch every page in disjoint regions of memory.
      Performance is measured in the time it takes all vCPUs to finish
      touching their predefined region.
      
      Example invocation:
      
        $ ./access_tracking_perf_test -v 8
        Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
        guest physical test memory offset: 0xffdfffff000
      
        Populating memory             : 1.337752570s
        Writing to populated memory   : 0.010177640s
        Reading from populated memory : 0.009548239s
        Mark memory idle              : 23.973131748s
        Writing to idle memory        : 0.063584496s
        Mark memory idle              : 24.924652964s
        Reading from idle memory      : 0.062042814s
      
      Breaking down the results:
      
       * "Populating memory": The time it takes for all vCPUs to perform the
         first write to every page in their region.
      
       * "Writing to populated memory" / "Reading from populated memory": The
         time it takes for all vCPUs to write and read to every page in their
         region after it has been populated. This serves as a control for the
         later results.
      
       * "Mark memory idle": The time it takes for every vCPU to mark every
         page in their region as idle through page_idle.
      
       * "Writing to idle memory" / "Reading from idle memory": The time it
         takes for all vCPUs to write and read to every page in their region
         after it has been marked idle.
      
      This test should be portable across architectures but it is only enabled
      for x86_64 since that's all I have tested.
      Reviewed-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-7-dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c33e05d9
    • D
      KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing · 15b7b737
      David Matlack 提交于
      There is a missing break statement which causes a fallthrough to the
      next statement where optarg will be null and a segmentation fault will
      be generated.
      
      Fixes: 9e965bb7 ("KVM: selftests: Add backing src parameter to dirty_log_perf_test")
      Reviewed-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-6-dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      15b7b737
    • J
      x86/kvm: fix vcpu-id indexed array sizes · 76b4f357
      Juergen Gross 提交于
      KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
      of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
      elements.
      
      Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
      an odd number, resulting in always enough padding being available at
      the end of those arrays.
      
      Nevertheless this should be fixed in order to avoid rare problems in
      case someone is using an even number for KVM_MAX_VCPU_ID.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Message-Id: <20210701154105.23215-2-jgross@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      76b4f357
  7. 26 7月, 2021 5 次提交
  8. 19 7月, 2021 6 次提交
    • P
      Merge tag 'kvmarm-fixes-5.14-1' of... · 7025098a
      Paolo Bonzini 提交于
      Merge tag 'kvmarm-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 5.14, take #1
      
      - Fix MTE shared page detection
      
      - Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
      
      - Enable selftest's use of PMU registers when asked to
      7025098a
    • L
      Linux 5.14-rc2 · 2734d6c1
      Linus Torvalds 提交于
      2734d6c1
    • L
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of... · 8c25c447
      Linus Torvalds 提交于
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
         are offlined.
      
       - Fix 'perf test' problems related to the recently added hybrid
         (BIG/little) code.
      
       - Split ARM's coresight (hw tracing) decode by aux records to avoid
         fatal decoding errors.
      
       - Fix add event failure in 'perf probe' when running 32-bit perf in a
         64-bit kernel.
      
       - Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
      
       - Fix memory and refcount leaks detected by ASAn when running 'perf
         test', should be clean of warnings now.
      
       - Remove broken definition of __LITTLE_ENDIAN from tools'
         linux/kconfig.h, which was breaking the build in some systems.
      
       - Cast PTHREAD_STACK_MIN to int as it may turn into 'long
         sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some
         systems.
      
       - Fix libperf build error with LIBPFM4=1.
      
       - Sync UAPI files changed by the memfd_secret new syscall.
      
      * tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
        perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
        perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
        perf data: Close all files in close_dir()
        perf probe-file: Delete namelist in del_events() on the error path
        perf test bpf: Free obj_buf
        perf trace: Free strings in trace__parse_events_option()
        perf trace: Free syscall tp fields in evsel->priv
        perf trace: Free syscall->arg_fmt
        perf trace: Free malloc'd trace fields on exit
        perf lzma: Close lzma stream on exit
        perf script: Fix memory 'threads' and 'cpus' leaks on exit
        perf script: Release zstd data
        perf session: Cleanup trace_event
        perf inject: Close inject.output on exit
        perf report: Free generated help strings for sort option
        perf env: Fix memory leak of cpu_pmu_caps
        perf test maps__merge_in: Fix memory leak of maps
        perf dso: Fix memory leak in dso__new_map()
        perf test event_update: Fix memory leak of unit
        perf test event_update: Fix memory leak of evlist
        ...
      8c25c447
    • L
      Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f0eb870a
      Linus Torvalds 提交于
      Pull xfs fixes from Darrick Wong:
       "A few fixes for issues in the new online shrink code, additional
        corrections for my recent bug-hunt w.r.t. extent size hints on
        realtime, and improved input checking of the GROWFSRT ioctl.
      
        IOW, the usual 'I somehow got bored during the merge window and
        resumed auditing the farther reaches of xfs':
      
         - Fix shrink eligibility checking when sparse inode clusters enabled
      
         - Reset '..' directory entries when unlinking directories to prevent
           verifier errors if fs is shrinked later
      
         - Don't report unusable extent size hints to FSGETXATTR
      
         - Don't warn when extent size hints are unusable because the sysadmin
           configured them that way
      
         - Fix insufficient parameter validation in GROWFSRT ioctl
      
         - Fix integer overflow when adding rt volumes to filesystem"
      
      * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: detect misaligned rtinherit directory extent size hints
        xfs: fix an integer overflow error in xfs_growfs_rt
        xfs: improve FSGROWFSRT precondition checking
        xfs: don't expose misaligned extszinherit hints to userspace
        xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
        xfs: reset child dir '..' entry when unlinking child
        xfs: check for sparse inode clusters that cross new EOAG when shrinking
      f0eb870a
    • L
      Merge tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · fbf1bddc
      Linus Torvalds 提交于
      Pull iomap fixes from Darrick Wong:
       "A handful of bugfixes for the iomap code.
      
        There's nothing especially exciting here, just fixes for UBSAN (not
        KASAN as I erroneously wrote in the tag message) warnings about
        undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some
        reshuffling of per-page block state info to fix some problems with
        gfs2.
      
         - Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE
      
         - Fix assertion errors when using inlinedata files on gfs2"
      
      * tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
        iomap: Don't create iomap_page objects for inline files
        iomap: Permit pages without an iop to enter writeback
        iomap: remove the length variable in iomap_seek_hole
        iomap: remove the length variable in iomap_seek_data
      fbf1bddc
    • L
      Merge tag 'kbuild-fixes-v5.14' of... · 6750691a
      Linus Torvalds 提交于
      Merge tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Restore the original behavior of scripts/setlocalversion when
         LOCALVERSION is set to empty.
      
       - Show Kconfig prompts even for 'make -s'
      
       - Fix the combination of COFNIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y
         for older GNU Make versions
      
      * tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation: Fix intiramfs script name
        Kbuild: lto: fix module versionings mismatch in GNU make 3.X
        kbuild: do not suppress Kconfig prompts for silent build
        scripts/setlocalversion: fix a bug when LOCALVERSION is empty
      6750691a
  9. 18 7月, 2021 7 次提交
    • R
      Documentation: Fix intiramfs script name · 5e60f363
      Robert Richter 提交于
      Documentation was not changed when renaming the script in commit
      80e715a0 ("initramfs: rename gen_initramfs_list.sh to
      gen_initramfs.sh"). Fixing this.
      
      Basically does:
      
       $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
      
      Fixes: 80e715a0 ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
      Signed-off-by: NRobert Richter <rrichter@amd.com>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      5e60f363
    • L
      Kbuild: lto: fix module versionings mismatch in GNU make 3.X · 1d11053d
      Lecopzer Chen 提交于
      When building modules(CONFIG_...=m), I found some of module versions
      are incorrect and set to 0.
      This can be found in build log for first clean build which shows
      
      WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed,
      symbol will not be versioned.
      
      But in second build(incremental build), the WARNING disappeared and the
      module version becomes valid CRC and make someone who want to change
      modules without updating kernel image can't insert their modules.
      
      The problematic code is
      +	$(foreach n, $(filter-out FORCE,$^),				\
      +		$(if $(wildcard $(n).symversions),			\
      +			; cat $(n).symversions >> $@.symversions))
      
      For example:
        rm -f fs/notify/built-in.a.symversions    ; rm -f fs/notify/built-in.a; \
      llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \
      fs/notify/notification.o fs/notify/group.o ...
      
      `foreach n` shows nothing to `cat` into $(n).symversions because
      `if $(wildcard $(n).symversions)` return nothing, but actually
      they do exist during this line was executed.
      
      -rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o
      -rw-r--r-- 1 root root    111 Jun 13 19:10 fs/notify/fsnotify.o.symversions
      
      The reason is the $(n).symversions are generated at runtime, but
      Makefile wildcard function expends and checks the file exist or not
      during parsing the Makefile.
      
      Thus fix this by use `test` shell command to check the file
      existence in runtime.
      
      Rebase from both:
      1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/]
      2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/]
      
      Fixes: 38e89184 ("kbuild: lto: fix module versioning")
      Co-developed-by: NSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: NLecopzer Chen <lecopzer.chen@mediatek.com>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      1d11053d
    • M
      kbuild: do not suppress Kconfig prompts for silent build · d952cfaf
      Masahiro Yamada 提交于
      When a new CONFIG option is available, Kbuild shows a prompt to get
      the user input.
      
        $ make
        [ snip ]
        Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW)
      
      This is the only interactive place in the build process.
      
      Commit 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      suppressed Kconfig prompts as well because syncconfig is invoked by
      the 'cmd' macro. You cannot notice the fact that Kconfig is waiting
      for the user input.
      
      Use 'kecho' to show the equivalent short log without suppressing stdout
      from sub-make.
      
      Fixes: 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      d952cfaf
    • M
      scripts/setlocalversion: fix a bug when LOCALVERSION is empty · 5df99bec
      Mikulas Patocka 提交于
      The commit 042da426 ("scripts/setlocalversion: simplify the short
      version part") reduces indentation. Unfortunately, it also changes behavior
      in a subtle way - if the user has empty "LOCALVERSION" variable, the plus
      sign is appended to the kernel version. It wasn't appended before.
      
      This patch reverts to the old behavior - we append the plus sign only if
      the LOCALVERSION variable is not set.
      
      Fixes: 042da426 ("scripts/setlocalversion: simplify the short version part")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      5df99bec
    • Y
      perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set · b0f00855
      Yang Jihong 提交于
      The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user
      if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events.
      As a result, the command fails.
      
      Before:
      
        #perf sched record sleep 1
        event syntax error: 'sched:sched_stat_wait'
                             \___ unknown tracepoint
      
        Error:  File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
        Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
      
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      Solution:
        Check whether schedstat tracepoints are exposed. If no, these events are not recorded.
      
      After:
        # perf sched record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
        # perf sched report
        run measurement overhead: 4736 nsecs
        sleep measurement overhead: 9059979 nsecs
        the run test took 999854 nsecs
        the sleep test took 8945271 nsecs
        nr_run_events:        716
        nr_sleep_events:      785
        nr_wakeup_events:     0
        ...
        ------------------------------------------------------------
      
      Fixes: 2a09b5de ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
      Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b0f00855
    • Y
      perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel · 22a66551
      Yang Jihong 提交于
      The "address" member of "struct probe_trace_point" uses long data type.
      If kernel is 64-bit and perf program is 32-bit, size of "address"
      variable is 32 bits.
      
      As a result, upper 32 bits of address read from kernel are truncated, an
      error occurs during address comparison in kprobe_warn_out_range().
      
      Before:
      
        # perf probe -a schedule
        schedule is out of .text, skip it.
          Error: Failed to add events.
      
      Solution:
        Change data type of "address" variable to u64 and change corresponding
      address printing and value assignment.
      
      After:
      
        # perf.new.new probe -a schedule
        Added new event:
          probe:schedule       (on schedule)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:schedule -aR sleep 1
      
        # perf probe -l
          probe:schedule       (on schedule@kernel/sched/core.c)
        # perf record -e probe:schedule -aR sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'probe:schedule'
        # Event count (approx.): 1366
        #
        # Overhead  Command          Shared Object      Symbol
        # ........  ...............  .................  ............
        #
             6.22%  migration/0      [kernel.kallsyms]  [k] schedule
             6.22%  migration/1      [kernel.kallsyms]  [k] schedule
             6.22%  migration/2      [kernel.kallsyms]  [k] schedule
             6.22%  migration/3      [kernel.kallsyms]  [k] schedule
             6.15%  migration/10     [kernel.kallsyms]  [k] schedule
             6.15%  migration/11     [kernel.kallsyms]  [k] schedule
             6.15%  migration/12     [kernel.kallsyms]  [k] schedule
             6.15%  migration/13     [kernel.kallsyms]  [k] schedule
             6.15%  migration/14     [kernel.kallsyms]  [k] schedule
             6.15%  migration/15     [kernel.kallsyms]  [k] schedule
             6.15%  migration/4      [kernel.kallsyms]  [k] schedule
             6.15%  migration/5      [kernel.kallsyms]  [k] schedule
             6.15%  migration/6      [kernel.kallsyms]  [k] schedule
             6.15%  migration/7      [kernel.kallsyms]  [k] schedule
             6.15%  migration/8      [kernel.kallsyms]  [k] schedule
             6.15%  migration/9      [kernel.kallsyms]  [k] schedule
             0.22%  rcu_sched        [kernel.kallsyms]  [k] schedule
        ...
        #
        # (Cannot load tips.txt file, please install perf!)
        #
      Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jianlin Lv <jianlin.lv@arm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Li Huafei <lihuafei1@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      22a66551
    • R
      perf data: Close all files in close_dir() · d4b3eedc
      Riccardo Mancini 提交于
      When using 'perf report' in directory mode, the first file is not closed
      on exit, causing a memory leak.
      
      The problem is caused by the iterating variable never reaching 0.
      
      Fixes: 14552063 ("perf data: Add perf_data__(create_dir|close_dir) functions")
      Signed-off-by: NRiccardo Mancini <rickyman7@gmail.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d4b3eedc