1. 17 July 2021, 8 commits
    • bpf: Add ambient BPF runtime context stored in current · c7603cfa
      Authored by Andrii Nakryiko
      b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
      helper") fixed the problem with cgroup-local storage use in BPF by
      pre-allocating a per-CPU array of 8 cgroup storage pointers to accommodate
      possible BPF program preemptions and nested executions.
      
      While this seems to work well in practice, it introduces a new and unnecessary
      failure mode in which not all BPF programs might be executed if we fail to
      find an unused slot for cgroup storage, however unlikely that is. It might
      also become less unlikely if/when we allow sleepable cgroup BPF programs in
      the future.
      
      Further, the way that cgroup storage is implemented, as an ambiently-available
      property during the entire BPF program execution, is a convenient way to pass
      extra information to BPF programs and helpers without requiring user code to
      pass around extra arguments explicitly. So it would be good to have a generic
      solution that allows implementing this without arbitrary restrictions.
      Ideally, such a solution would work for both preemptable and sleepable BPF
      programs in exactly the same way.
      
      This patch introduces such a solution, bpf_run_ctx. It adds one pointer field
      (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
      macros in such a way that it always stays valid throughout BPF program
      execution. BPF program preemption is handled by remembering previous
      current->bpf_ctx value locally while executing nested BPF program and
      restoring old value after nested BPF program finishes. This is handled by two
      helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
      supposed to be used before and after BPF program runs, respectively.
      
      Restoring the old value of the pointer handles preemption, while the
      bpf_run_ctx pointer being a property of the current task_struct naturally
      solves this problem for sleepable BPF programs by "following" BPF program
      execution as it is scheduled in and out of a CPU. It would even allow CPU
      migration of BPF programs, even though that's not currently allowed by the
      BPF infra.
      
      This patch cleans up cgroup local storage handling as a first application. The
      design itself is generic, though, with bpf_run_ctx being an empty struct that
      is supposed to be embedded into a specific struct for a given BPF program type
      (bpf_cg_run_ctx in this case). Follow-up patches are planned that will expand
      this mechanism for other uses within tracing BPF programs.
      
      To verify that this change doesn't revert the fix to the original cgroup
      storage issue, I ran the same repro as in the original report ([0]) and didn't
      get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
      bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
      
        [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
      
      Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
      c7603cfa
    • netdevsim: Add multi-queue support · d4861fc6
      Authored by Peilin Ye
      Currently netdevsim only supports a single queue per port, which is
      insufficient for testing multi-queue TC schedulers e.g. sch_mq.  Extend
      the current sysfs interface so that users can create ports with multiple
      queues:
      
      $ echo "[ID] [PORT_COUNT] [NUM_QUEUES]" > /sys/bus/netdevsim/new_device
      
      As an example, echoing "2 4 8" creates 4 ports, with 8 queues per port.
      Note that this is backward compatible with the current interface, with the
      default number of queues set to 1.  For example, echoing "2 4" creates 4
      ports with 1 queue per port; echoing "2" simply creates 1 port with 1 queue.
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4861fc6
    • openvswitch: Introduce per-cpu upcall dispatch · b83d23a2
      Authored by Mark Gray
      The Open vSwitch kernel module uses the upcall mechanism to send
      packets from kernel space to user space when it misses in the kernel
      space flow table. The upcall sends packets via a Netlink socket.
      Currently, a Netlink socket is created for every vport. In this way,
      there is a 1:1 mapping between a vport and a Netlink socket.
      When a packet is received by a vport, if it needs to be sent to
      user space, it is sent via the corresponding Netlink socket.
      
      This mechanism, with various iterations of the corresponding user
      space code, has seen some limitations and issues:
      
      * On systems with a large number of vports, there is a correspondingly
      large number of Netlink sockets which can limit scaling.
      (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
      * Packet reordering on upcalls.
      (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
      * A thundering herd issue.
      (https://bugzilla.redhat.com/show_bug.cgi?id=1834444)
      
      This patch introduces an alternative, feature-negotiated, upcall
      mode using a per-cpu dispatch rather than a per-vport dispatch.
      
      In this mode, the Netlink socket to be used for the upcall is
      selected based on the CPU of the thread that is executing the upcall.
      In this way, it resolves the issues above as:
      
      a) The number of Netlink sockets scales with the number of CPUs
      rather than the number of vports.
      b) Ordering per-flow is maintained as packets are distributed to
      CPUs based on mechanisms such as RSS and flows are distributed
      to a single user space thread.
      c) Packets from a flow can only wake up one user space thread.
      
      The corresponding user space code can be found at:
      https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html
      
      Bugzilla: https://bugzilla.redhat.com/1844576
      Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
      Acked-by: Flavio Leitner <fbl@sysclose.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b83d23a2
    • bnx2x: remove unused variable 'cur_data_offset' · 919d5279
      Authored by Bill Wendling
      Fix the clang build warning:
      
        drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1862:13: error: variable 'cur_data_offset' set but not used [-Werror,-Wunused-but-set-variable]
              dma_addr_t cur_data_offset;
      Signed-off-by: Bill Wendling <morbo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      919d5279
    • net: switchdev: Simplify 'mlxsw_sp_mc_write_mdb_entry()' · a99f030b
      Authored by Christophe JAILLET
      Use 'bitmap_alloc()/bitmap_free()' instead of hand-writing it.
      This makes the code less verbose.
      
      Also, use 'bitmap_alloc()' instead of 'bitmap_zalloc()' because the bitmap
      is fully overwritten by a 'bitmap_copy()' call just after its allocation.
      
      While at it, remove an extra and unneeded space.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a99f030b
    • net/sched: Remove unnecessary if statement · f79a3bcb
      Authored by Yajun Deng
      The 'if (err)' statement has already been dealt with in rtnetlink_send()
      and rtnl_unicast(), so remove the unnecessary if statement.
      
      v2: use the raw name rtnetlink_send().
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f79a3bcb
    • rtnetlink: use nlmsg_notify() in rtnetlink_send() · cfdf0d9a
      Authored by Yajun Deng
      netlink_{broadcast,unicast}() do not handle the 'if (err > 0)' case, but
      nlmsg_{multicast,unicast}() do, and nlmsg_notify() wraps both. So use
      nlmsg_notify() instead, so that callers no longer have to handle the
      'if (err > 0)' case themselves.
      
      v2: use nlmsg_notify() instead.
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfdf0d9a
    • gve: fix the wrong AdminQ buffer overflow check · 63a9192b
      Authored by Haiyue Wang
      The 'tail' pointer is also a free-running count, so it needs to be masked
      as 'adminq_prod_cnt' is, to become an index into the AdminQ buffer.
      
      Fixes: 5cdad90d ("gve: Batch AQ commands for creating and destroying queues.")
      Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
      Reviewed-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63a9192b
  2. 16 July 2021, 32 commits