1. March 31, 2021 · 2 commits
  2. March 25, 2021 · 2 commits
  3. March 23, 2021 · 2 commits
  4. March 20, 2021 · 2 commits
  5. March 19, 2021 · 4 commits
    • x86/ioapic: Ignore IRQ2 again · a501b048
      Thomas Gleixner committed
      Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
      the matrix allocator claimed to be out of vectors. He analyzed it down to
      the point that IRQ2, the PIC cascade interrupt, which is never supposed to
      be routed to the IO/APIC, ended up having an interrupt vector assigned
      which got moved during the unplug of CPU0.
      
      The underlying issue is that IRQ2, for various reasons (see commit
      af174783 ("x86: I/O APIC: Never configure IRQ2") for details), is treated
      as a reserved system vector by the vector core code and is not accounted
      as a regular vector. The Amazon BIOS has a routing entry of pin2 to IRQ2,
      which causes the IO/APIC setup to claim that interrupt; the claim is
      granted by the vector domain because there is no sanity check. As a
      consequence, the allocation counter of CPU0 underflows, which causes a
      subsequent unplug to fail with:
      
        [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
      
      There is another sanity check missing in the matrix allocator, but the
      underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
      during the conversion to irqdomains.
      
      For almost 6 years nobody complained about this wreckage, which might
      indicate that this requirement could be lifted. But for any system which
      actually has a PIC, IRQ2 is unusable by design, so any routing entry has
      no effect and the interrupt cannot be connected to a device anyway.
      
      Due to that, and for historically biased paranoia reasons, restore the IRQ2
      ignore logic and treat it as nonexistent despite a routing entry claiming
      otherwise.
      
      Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
      
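      A minimal sketch of the restored check, assuming it sits in the IO/APIC
      irqdomain allocation path (the exact function and surrounding context are
      assumptions, not taken from the message):

      	/*
      	 * IRQ2 is the PIC cascade interrupt; on any system with a PIC it
      	 * is unusable by design, so refuse to route it even when the BIOS
      	 * provides an IO/APIC routing entry for it.
      	 */
      	if (irq == 2)
      		return -EINVAL;
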
    • x86/kvm: Fix broken irq restoration in kvm_wait · f4e61f0c
      Wanpeng Li committed
      After commit 997acaf6 ("lockdep: report broken irq restoration"), the
      guest emits the splat below during boot:
      
       raw_local_irq_restore() called with IRQs enabled
       WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30
       Modules linked in: hid_generic usbhid hid
       CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25
       RIP: 0010:warn_bogus_irq_restore+0x26/0x30
       Call Trace:
        kvm_wait+0x76/0x90
        __pv_queued_spin_lock_slowpath+0x285/0x2e0
        do_raw_spin_lock+0xc9/0xd0
        _raw_spin_lock+0x59/0x70
        lockref_get_not_dead+0xf/0x50
        __legitimize_path+0x31/0x60
        legitimize_root+0x37/0x50
        try_to_unlazy_next+0x7f/0x1d0
        lookup_fast+0xb0/0x170
        path_openat+0x165/0x9b0
        do_filp_open+0x99/0x110
        do_sys_openat2+0x1f1/0x2e0
        do_sys_open+0x5c/0x80
        __x64_sys_open+0x21/0x30
        do_syscall_64+0x32/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The new consistency checking expects local_irq_save() and
      local_irq_restore() to be paired and sanely nested, and therefore expects
      local_irq_restore() to be called with irqs disabled.
      The irqflags handling in kvm_wait(), which ends up doing:
      
      	local_irq_save(flags);
      	safe_halt();
      	local_irq_restore(flags);
      
      instead triggers it, because safe_halt() returns with IRQs enabled.  Fix
      it by using local_irq_disable()/local_irq_enable() directly.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
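
      A sketch of the fixed kvm_wait() flow, following the message's description
      (the body is an approximation, not the verbatim patch):

      	static void kvm_wait(u8 *ptr, u8 val)
      	{
      		if (in_nmi())
      			return;

      		if (irqs_disabled()) {
      			/* Plain halt: IRQs stay disabled throughout. */
      			if (READ_ONCE(*ptr) == val)
      				halt();
      		} else {
      			/*
      			 * No local_irq_save()/restore() pair: safe_halt()
      			 * re-enables IRQs, so restoring the saved flags
      			 * would call local_irq_restore() with IRQs on.
      			 */
      			local_irq_disable();
      			if (READ_ONCE(*ptr) == val)
      				safe_halt();
      			local_irq_enable();
      		}
      	}
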
    • KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs · c2162e13
      Wanpeng Li committed
      In order to deal with noncoherent DMA, we should execute wbinvd on all
      dirty pCPUs when the guest's wbinvd exits, to maintain data consistency.
      smp_call_function_many() does not execute the provided function on the
      local core, so replace it with on_each_cpu_mask().
      Reported-by: Nadav Amit <namit@vmware.com>
      Cc: Nadav Amit <namit@vmware.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
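
      A sketch of the substitution in the wbinvd exit handler (the surrounding
      context is assumed from the message):

      	/* Before: skips the local CPU even if it is in the dirty mask. */
      	smp_call_function_many(vcpu->arch.wbinvd_dirty_mask, wbinvd_ipi, NULL, 1);

      	/* After: runs wbinvd_ipi on every CPU in the mask, local included. */
      	on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask, wbinvd_ipi, NULL, 1);
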
    • KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de
      Sean Christopherson committed
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
      TOCTOU bug [*], but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of an smp_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
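
      A sketch of the install/swap pattern the message describes (helper names
      and the exact locking context are assumptions):

      	/* Build and vet the new filter entirely outside of kvm->lock. */
      	new = kzalloc(sizeof(*new), GFP_KERNEL_ACCOUNT);
      	if (!new)
      		return -ENOMEM;
      	/* ... copy and validate the MSR ranges from userspace ... */

      	/* Publish atomically: readers see either the old or the new filter. */
      	mutex_lock(&kvm->lock);
      	old = rcu_replace_pointer(kvm->arch.msr_filter, new,
      				  mutex_is_locked(&kvm->lock));
      	mutex_unlock(&kvm->lock);

      	/* Wait for vCPU readers in their SRCU read-side sections, then free. */
      	synchronize_srcu(&kvm->srcu);
      	kfree(old);
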
  6. March 18, 2021 · 4 commits
    • KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment · 0469f2f7
      Vitaly Kuznetsov committed
      When a guest opts for re-enlightenment notifications upon migration, it is
      within its rights to assume that TSC page values never change (as they're
      only supposed to change upon migration, and the host has to keep things as
      they are before it receives confirmation from the guest). This is mostly
      true until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
      trigger a masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
      calling KVM_SET_CLOCK, ... and as the TSC value and the kvmclock reading
      drift apart (even slightly), the update causes TSC page values to change.
      
      The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
      TSC page values to compute the difference between its own clocksource
      (provided by KVM) and its guests' TSC pages to program synthetic timers,
      and in some cases, when the TSC page is updated, this puts all stimer
      expirations in the past. This, in turn, causes an interrupt storm and
      leaves L2 guests making little forward progress.
      
      Note, KVM doesn't fully implement re-enlightenment notification. Basically,
      the support for reenlightenment MSRs is just a stub and userspace is only
      expected to expose the feature when TSC scaling on the expected destination
      hosts is available. With TSC scaling, no real re-enlightenment is needed
      as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
      likely makes little sense to fully implement re-enlightenment in KVM.
      
      Prevent the TSC page from being updated after migration. When it's not the
      guest initiating the change and the TSC page is already enabled, just keep
      it as it is: the TSC value is supposed to be preserved across migration,
      and the TSC frequency can't change with re-enlightenment enabled. The
      guest is doomed anyway if any of this is not true.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210316143736.964151-5-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
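
      A sketch of the guard this implies, assuming the status tracking from the
      companion patch below (the helper name and exact conditions are
      assumptions):

      	/*
      	 * Host-initiated TSC page updates are unsafe once the page is set
      	 * up and the guest has re-enlightenment enabled: the values must
      	 * stay frozen until the guest acknowledges the migration.
      	 */
      	static bool tsc_page_update_unsafe(struct kvm_hv *hv)
      	{
      		return hv->hv_tsc_page_status != HV_TSC_PAGE_GUEST_CHANGED &&
      		       hv->hv_reenlightenment_control.enabled;
      	}
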
    • KVM: x86: hyper-v: Track Hyper-V TSC page status · cc9cfddb
      Vitaly Kuznetsov committed
      Create an infrastructure for tracking Hyper-V TSC page status, i.e. whether
      it was updated from the guest or host side, or whether we've failed to set
      it up (because, e.g., the guest wrote garbage to HV_X64_MSR_REFERENCE_TSC)
      and there's no need to retry.
      
      Also, in the hypothetical situation where we are in 'always catchup' mode
      for the TSC, we can now avoid contending on 'hv->hv_lock' on every guest
      entry by setting the state to HV_TSC_PAGE_BROKEN after
      compute_tsc_page_parameters() returns false.
      
      Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
      get_time_ref_counter() to properly handle the situation when we failed to
      write the updated TSC page values to the guest.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210316143736.964151-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
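
      A plausible shape for the state enum (only HV_TSC_PAGE_SET and
      HV_TSC_PAGE_BROKEN are named in the message; the other states are
      assumptions inferred from the description):

      	enum hv_tsc_page_status {
      		HV_TSC_PAGE_UNSET,		/* never set up */
      		HV_TSC_PAGE_GUEST_CHANGED,	/* guest wrote REFERENCE_TSC */
      		HV_TSC_PAGE_HOST_CHANGED,	/* host-side (migration) update */
      		HV_TSC_PAGE_SET,		/* valid and published */
      		HV_TSC_PAGE_BROKEN,		/* setup failed, don't retry */
      	};
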
    • bpf: Fix fexit trampoline. · e21aa341
      Alexei Starovoitov committed
      The fexit/fmod_ret programs can be attached to kernel functions that can
      sleep. synchronize_rcu_tasks() will not wait for such tasks to complete.
      In such a case the trampoline image will be freed, and when the task wakes
      up the return IP will point to freed memory, causing a crash. Solve this
      by adding percpu_ref_get/put for the duration of the trampoline call and
      by separating the lifetimes of the trampoline and its image.
      The "half page" optimization has to be removed, since
      first_half->second_half->first_half transition cannot be guaranteed to
      complete in deterministic time. Every trampoline update becomes a new image.
      The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
      call_rcu_tasks. Together they will wait for the original function and
      trampoline asm to complete. The trampoline is patched from nop to jmp to skip
      fexit progs. They are freed independently from the trampoline. The image with
      fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
      will wait for both sleepable and non-sleepable progs to complete.
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
      Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
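
      A sketch of the refcounting half of the fix (function and field names are
      assumptions patterned on the message): the generated trampoline asm calls
      an enter/exit pair that pins the image for the duration of the call, so
      the image cannot be freed while a sleepable task is still inside it.

      	void notrace __bpf_tramp_enter(struct bpf_tramp_image *im)
      	{
      		percpu_ref_get(&im->pcref);	/* pin the image */
      	}

      	void notrace __bpf_tramp_exit(struct bpf_tramp_image *im)
      	{
      		percpu_ref_put(&im->pcref);	/* unpin; last put allows free */
      	}
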
    • module: remove never implemented MODULE_SUPPORTED_DEVICE · 6417f031
      Leon Romanovsky committed
      MODULE_SUPPORTED_DEVICE was added in the pre-git era and was never
      implemented. We can safely remove it, because the kernel has grown
      many more reliable mechanisms to determine whether a device is
      supported or not.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. March 17, 2021 · 12 commits
  8. March 13, 2021 · 4 commits
    • KVM: LAPIC: Advancing the timer expiration on guest initiated write · 35737d2d
      Wanpeng Li committed
      Advancing the timer expiration should only be necessary on guest initiated
      writes. When we cancel the timer and clear .pending during state restore,
      clear expired_tscdeadline as well.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
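
      A sketch of the state-restore side the message describes (the surrounding
      function is assumed):

      	/* Restoring APIC state is not a guest-initiated write: cancel the
      	 * timer, drop any pending injection, and forget any previously
      	 * expired deadline so it is not advanced on the guest's behalf. */
      	hrtimer_cancel(&apic->lapic_timer.timer);
      	apic->lapic_timer.expired_tscdeadline = 0;
      	atomic_set(&apic->lapic_timer.pending, 0);
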
    • KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode · 8df9f1af
      Sean Christopherson committed
      If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
      REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
      removal.  The concurrent write protections provided by REMOVED_SPTE are
      not needed, there are no backing page side effects to record, and MMIO
      SPTEs can be left as is since they are protected by the memslot
      generation, not by ensuring that the MMIO SPTE is unreachable (which
      is racy with respect to lockless walks regardless of zapping behavior).
      
      Skipping !PRESENT drastically reduces the number of updates needed to
      tear down sparsely populated MMUs, e.g. when tearing down a 6GB VM that
      didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
      be skipped.
      
      Avoiding the write itself is likely close to a wash, but avoiding
      __handle_changed_spte() is a clear-cut win as that involves saving and
      restoring all non-volatile GPRs (it's a subtly big function), as well as
      several conditional branches before bailing out.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210310003029.1250571-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
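
      A sketch of the skip in the recursive zap (identifiers are assumptions
      patterned on the TDP MMU; the message only guarantees the logic):

      	/*
      	 * Holding mmu_lock for write means no concurrent modifications,
      	 * so a non-present child SPTE needs no REMOVED_SPTE marker and
      	 * no __handle_changed_spte() bookkeeping: just skip it.
      	 */
      	if (!shared && !is_shadow_present_pte(old_spte))
      		continue;
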
    • KVM: kvmclock: Fix vCPUs > 64 can't be online/hotplugged · d7eb79c6
      Wanpeng Li committed
      # lscpu
      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                88
      On-line CPU(s) list:   0-63
      Off-line CPU(s) list:  64-87
      
      # cat /proc/cmdline
      BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro
      rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US.UTF-8 no-kvmclock-vsyscall
      
      # echo 1 > /sys/devices/system/cpu/cpu76/online
      -bash: echo: write error: Cannot allocate memory
      
      The per-cpu vsyscall pvclock data pointer is assigned either an element of
      the static array hv_clock_boot (#vCPUs <= 64) or dynamically allocated
      memory hvclock_mem (#vCPUs > 64). The dynamic memory will not be allocated
      if the kvmclock vsyscall is disabled, which makes CPU hotplug fail in
      kvmclock_setup_percpu() with -ENOMEM. It's broken for the no-vsyscall case,
      and sometimes you end up with the vsyscall disabled because the host did
      something strange. Fix it by allocating this memory unconditionally, even
      if the vsyscall is disabled.
      
      Fixes: 6a1cac56 ("x86/kvm: Use __bss_decrypted attribute in shared variables")
      Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: stable@vger.kernel.org # v4.19-rc5+
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
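
      A sketch of the shape of the fix (the function split is an assumption;
      the message only states that the allocation becomes unconditional):

      	static int __init kvm_setup_vsyscall_timeinfo(void)
      	{
      		/* Allocate hvclock_mem for CPUs beyond the static
      		 * hv_clock_boot array even when the vsyscall is off,
      		 * so kvmclock_setup_percpu() can't hit -ENOMEM. */
      		kvmclock_init_mem();

      		if (!kvmclock_vsyscall)
      			return 0;

      		/* ... vsyscall-only setup continues here ... */
      		return 0;
      	}
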
    • kvm: x86: annotate RCU pointers · 6fcd9cbc
      Muhammad Usama Anjum committed
      Add __rcu annotations to fix the following sparse errors:
      arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map *
      arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map *
      arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter *
      arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map *
      arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map [noderef] __rcu *
      arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map *
      arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter *
      arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter *
      arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
      arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
      arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *
      Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
      Message-Id: <20210305191123.GA497469@LEGION>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
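
      A sketch of what the annotation looks like (field names are taken from
      the sparse output; the struct layout is otherwise assumed):

      	struct kvm_arch {
      		/* ... */
      		struct kvm_apic_map __rcu *apic_map;
      		struct kvm_pmu_event_filter __rcu *pmu_event_filter;
      	};

      	/* Readers then dereference under RCU, keeping sparse happy: */
      	map = rcu_dereference(kvm->arch.apic_map);
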
  9. March 12, 2021 · 1 commit
  10. March 11, 2021 · 2 commits
  11. March 10, 2021 · 2 commits
    • x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case · c8e2fe13
      Sean Christopherson committed
      Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
      case.  Patching in perf_guest_get_msrs_nop() during setup does not work
      if there is no PMU, as setup bails before updating the static calls,
      leaving x86_pmu.guest_get_msrs NULL and thus a complete nop.  Ultimately,
      this causes VMX abort on VM-Exit due to KVM putting random garbage from
      the stack into the MSR load list.
      
      Add a comment in KVM to note that nr_msrs is valid if and only if the
      return value is non-NULL.
      
      Fixes: abd562df ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
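
      A sketch of the consumer-side contract the comment documents (the call
      site is assumed to be KVM's VM-Enter MSR setup):

      	/* nr_msrs is valid if and only if the return value is non-NULL;
      	 * a NULL return means "no PMU", not "zero MSRs to switch". */
      	msrs = perf_guest_get_msrs(&nr_msrs);
      	if (!msrs)
      		return;
      	for (i = 0; i < nr_msrs; i++)
      		/* ... add msrs[i] to the VM-Enter load list ... */;
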
    • bpf, x86: Use kvmalloc_array instead of kmalloc_array in bpf_jit_comp · de920fc6
      Yonghong Song committed
      x86 bpf_jit_comp.c used kmalloc_array to store JITed addresses
      for each BPF insn. With a large BPF program, we have seen the
      following allocation failures on our production servers:
      
          page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                                   nodemask=(null),cpuset=/,mems_allowed=0"
          Call Trace:
          dump_stack+0x50/0x70
          warn_alloc.cold.120+0x72/0xd2
          ? __alloc_pages_direct_compact+0x157/0x160
          __alloc_pages_slowpath+0xcdb/0xd00
          ? get_page_from_freelist+0xe44/0x1600
          ? vunmap_page_range+0x1ba/0x340
          __alloc_pages_nodemask+0x2c9/0x320
          kmalloc_order+0x18/0x80
          kmalloc_order_trace+0x1d/0xa0
          bpf_int_jit_compile+0x1e2/0x484
          ? kmalloc_order_trace+0x1d/0xa0
          bpf_prog_select_runtime+0xc3/0x150
          bpf_prog_load+0x480/0x720
          ? __mod_memcg_lruvec_state+0x21/0x100
          __do_sys_bpf+0xc31/0x2040
          ? close_pdeo+0x86/0xe0
          do_syscall_64+0x42/0x110
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f2f300f7fa9
          Code: Bad RIP value.
      
      Dumped assembly:
      
          ffffffff810b6d70 <bpf_int_jit_compile>:
          ; {
          ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
          ffffffff810b6d75: 41 57                 pushq   %r15
          ...
          ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
          ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
          ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
          ;       if (!addrs) {
          ffffffff810b6f52: 48 85 c0              testq   %rax, %rax
      
      Change kmalloc_array() to kvmalloc_array() to avoid potential
      allocation failures with big BPF programs.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
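
      A sketch of the substitution (the allocation line appears in the dumped
      assembly above; the free side is an assumption that follows from the API):

      	/* kvmalloc_array falls back to vmalloc when a physically
      	 * contiguous order-5 allocation is unavailable. */
      	addrs = kvmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
      	if (!addrs)
      		goto out;
      	/* ... JIT passes fill addrs[] ... */
      	kvfree(addrs);	/* kvfree handles kmalloc- and vmalloc-backed memory */
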
  12. March 9, 2021 · 3 commits