1. 22 Sep, 2020 2 commits
    •
      net/mlx5e: Use RCU to protect rq->xdp_prog · fe45386a
      Maxim Mikityanskiy authored
      Currently, the RQs are temporarily deactivated while hot-replacing the
      XDP program, and napi_synchronize is used to make sure rq->xdp_prog is
      not in use. However, napi_synchronize is not ideal: instead of waiting
      till the end of a NAPI cycle, it polls and waits until NAPI is not
      running, sleeping for 1ms between the periodic checks. Under heavy
      workloads, this loop may never end, which can even lead to a kernel
      panic if the kernel detects the hang. Such workloads include XSK TX
      and possibly also heavy RX (XSK or normal).
      
      The fix is inspired by commit 326fe02d ("net/mlx4_en: protect
      ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already
      protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the
      program, there is no need for additional synchronization if proper RCU
      functions are used to access the pointer. This patch converts all
      accesses to rq->xdp_prog to use RCU functions.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
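
As a rough userspace analogue of the pointer handling described in this commit (not the driver code), the sketch below publishes and reads a program pointer with C11 atomics. All names (`toy_prog`, `toy_rq`, `toy_rq_read_prog`, `toy_rq_replace_prog`) are hypothetical. It does not model RCU grace periods: in the kernel, the RCU accessors and the deferred free via call_rcu play the roles that the acquire load and the returned old pointer play here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the mlx5e or BPF structures. */
struct toy_prog { int id; };
struct toy_rq { _Atomic(struct toy_prog *) xdp_prog; };

/* Reader side: load the pointer exactly once and use only that snapshot. */
static struct toy_prog *toy_rq_read_prog(struct toy_rq *rq)
{
    return atomic_load_explicit(&rq->xdp_prog, memory_order_acquire);
}

/*
 * Writer side: swap in the new program without stopping the reader.
 * The old pointer is returned so the caller can release it later;
 * deciding when that is safe is the part RCU handles in the kernel
 * and is NOT modelled in this sketch.
 */
static struct toy_prog *toy_rq_replace_prog(struct toy_rq *rq,
                                            struct toy_prog *new_prog)
{
    return atomic_exchange_explicit(&rq->xdp_prog, new_prog,
                                    memory_order_acq_rel);
}
```

The point of the design is that the writer never waits for the reader to go idle, which is what the polling loop in napi_synchronize required.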
    •
      net/mlx5: Fix FTE cleanup · cefc2355
      Maor Gottlieb authored
      Currently, when an FTE is allocated, its refcount is decreased to 0
      so that it is not a stand-alone steering object; every rule
      (destination) of the FTE then increases the refcount.
      When mlx5_cleanup_fs is called before the steering users have deleted
      all of their rules, the FTE hits a refcount underflow: clean_tree
      calls tree_remove_node on it after the deleted rules have already
      decreased the refcount to 0.
      
      The FTE is no longer destroyed implicitly when the last rule
      (destination) is deleted. mlx5_del_flow_rules avoids that by increasing
      the refcount on the FTE and destroying it explicitly after all rules
      are deleted. So the refcount underflow can be avoided by making the FTE
      a stand-alone object. In addition, del_hw_func must be set on the FTE
      so that the HW object is destroyed when the FTE is deleted from the
      cleanup_tree flow.
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 2 PID: 15715 at lib/refcount.c:28 refcount_warn_saturate+0xd9/0xe0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       tree_put_node+0xf2/0x140 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       mlx5_cleanup_fs+0x26/0x270 [mlx5_core]
       mlx5_unload+0x2e/0xa0 [mlx5_core]
       mlx5_unload_one+0x51/0x120 [mlx5_core]
       mlx5_devlink_reload_down+0x51/0x90 [mlx5_core]
       devlink_reload+0x39/0x120
       ? devlink_nl_cmd_reload+0x43/0x220
       genl_rcv_msg+0x1e4/0x420
       ? genl_family_rcv_msg_attrs_parse+0x100/0x100
       netlink_rcv_skb+0x47/0x110
       genl_rcv+0x24/0x40
       netlink_unicast+0x217/0x2f0
       netlink_sendmsg+0x30f/0x430
       sock_sendmsg+0x30/0x40
       __sys_sendto+0x10e/0x140
       ? handle_mm_fault+0xc4/0x1f0
       ? do_page_fault+0x33f/0x630
       __x64_sys_sendto+0x24/0x30
       do_syscall_64+0x48/0x130
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 718ce4d6 ("net/mlx5: Consolidate update FTE for all removal changes")
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
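
The refcount arithmetic behind this underflow can be sketched with a toy model. This is a simplified illustration under stated assumptions, not the mlx5 steering code: `toy_fte`, `toy_put` and `toy_cleanup_underflows` are hypothetical names, and the model only counts references. Each rule takes one reference, cleanup drops one reference per leftover rule, and then cleanup drops one more for the FTE node itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object; not the mlx5 fs_fte structure. */
struct toy_fte {
    int refcount;
    bool underflow; /* set when a put is attempted at refcount 0 */
};

static void toy_put(struct toy_fte *fte)
{
    if (fte->refcount == 0) {
        fte->underflow = true; /* corresponds to the refcount_t warning */
        return;
    }
    fte->refcount--;
}

/*
 * initial_ref = 0 models the old scheme (FTE is not a stand-alone object).
 * initial_ref = 1 models the fixed scheme (FTE holds its own reference).
 * Returns true if the cleanup sequence underflows the refcount.
 */
static bool toy_cleanup_underflows(int initial_ref, int nrules)
{
    struct toy_fte fte = { .refcount = initial_ref, .underflow = false };

    for (int i = 0; i < nrules; i++)
        fte.refcount++;      /* each rule takes a reference */
    for (int i = 0; i < nrules; i++)
        toy_put(&fte);       /* cleanup removes the leftover rules */
    toy_put(&fte);           /* cleanup then removes the FTE node itself */

    return fte.underflow;
}
```

With two leftover rules, `toy_cleanup_underflows(0, 2)` reports an underflow and `toy_cleanup_underflows(1, 2)` does not.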
  2. 18 Sep, 2020 10 commits
  3. 17 Sep, 2020 3 commits
  4. 16 Sep, 2020 6 commits
    •
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d5d325ea
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-09-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 12 non-merge commits during the last 19 day(s) which contain
      a total of 10 files changed, 47 insertions(+), 38 deletions(-).
      
      The main changes are:
      
      1) docs/bpf fixes, from Andrii.
      
      2) ld_abs fix, from Daniel.
      
      3) socket casting helpers fix, from Martin.
      
      4) hash iterator fixes, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      bpf: Fix a rcu warning for bpffs map pretty-print · ce880cb8
      Yonghong Song authored
      Running the selftest
        ./test_btf -p
      the kernel printed the following warning:
        [   51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
        [   51.529217] Modules linked in:
        [   51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
        [   51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
        [   51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
        ...
        [   51.542826] Call Trace:
        [   51.543119]  map_seq_next+0x53/0x80
        [   51.543528]  seq_read+0x263/0x400
        [   51.543932]  vfs_read+0xad/0x1c0
        [   51.544311]  ksys_read+0x5f/0xe0
        [   51.544689]  do_syscall_64+0x33/0x40
        [   51.545116]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The related source code in kernel/bpf/hashtab.c:
        709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
        710 {
        711         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
        712         struct hlist_nulls_head *head;
        713         struct htab_elem *l, *next_l;
        714         u32 hash, key_size;
        715         int i = 0;
        716
        717         WARN_ON_ONCE(!rcu_read_lock_held());
      
      In kernel/bpf/inode.c, the bpffs map pretty-print code calls
      map->ops->map_get_next_key() without holding rcu_read_lock(), which
      triggers the above warning. To fix the issue, surround the
      map->ops->map_get_next_key() call with an RCU read lock.
      
      Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
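
The shape of this fix can be mocked in userspace C. The sketch below is an illustration only, not the code in kernel/bpf/inode.c: a depth counter stands in for the RCU read-side critical section, a warning counter stands in for WARN_ON_ONCE(!rcu_read_lock_held()), and every `toy_` name is hypothetical.

```c
#include <assert.h>

static int toy_rcu_depth; /* > 0 while inside the mock read-side section */
static int toy_warnings;  /* incremented on each unprotected call */

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* Mock of a get_next_key callback that expects the read lock to be held. */
static int toy_get_next_key(int key, int nkeys, int *next_key)
{
    if (toy_rcu_depth == 0)
        toy_warnings++; /* stands in for the kernel warning */
    if (key + 1 >= nkeys)
        return -1;      /* no more keys */
    *next_key = key + 1;
    return 0;
}

/* Before the fix: the caller invokes the callback with no read lock. */
static int toy_seq_next_unlocked(int key, int nkeys, int *next_key)
{
    return toy_get_next_key(key, nkeys, next_key);
}

/* After the fix: the same call wrapped in the read-side lock. */
static int toy_seq_next_locked(int key, int nkeys, int *next_key)
{
    int ret;

    toy_rcu_read_lock();
    ret = toy_get_next_key(key, nkeys, next_key);
    toy_rcu_read_unlock();
    return ret;
}
```

Only the unlocked variant bumps `toy_warnings`; the locked variant returns the same result without tripping the check.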
    •
      bpf: Bpf_skc_to_* casting helpers require a NULL check on sk · 8c33dadc
      Martin KaFai Lau authored
      The bpf_skc_to_* type casting helpers are available to
      BPF_PROG_TYPE_TRACING.  The traced PTR_TO_BTF_ID may be NULL.
      For example, skb->sk may be NULL.  Thus, these casting helpers
      also need to check "!sk"; this patch adds that check.
      
      Fixes: 0d4fad3e ("bpf: Add bpf_skc_to_udp6_sock() helper")
      Fixes: 478cfbdf ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
      Fixes: af7ec138 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200915182959.241101-1-kafai@fb.com
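
A casting helper with the extra "!sk" check might look like the following sketch. It is a plain C illustration under assumptions, not the BPF helper implementation: `toy_sock`, `toy_tcp_sock` and `toy_skc_to_tcp_sock` are hypothetical, and the protocol tag is a simplification of the checks the real helpers perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; not the kernel's struct sock / struct tcp_sock. */
enum toy_proto { TOY_TCP, TOY_UDP };

struct toy_sock { enum toy_proto proto; };

struct toy_tcp_sock {
    struct toy_sock sk; /* base object must be the first member */
    int snd_cwnd;
};

/*
 * Returns the TCP view of sk, or NULL when sk is NULL or is not TCP.
 * The "!sk" test comes first, so a NULL input is never dereferenced.
 */
static struct toy_tcp_sock *toy_skc_to_tcp_sock(struct toy_sock *sk)
{
    if (!sk)
        return NULL;
    if (sk->proto != TOY_TCP)
        return NULL;
    return (struct toy_tcp_sock *)sk;
}
```

Callers then only need to test the return value, whether the input was NULL or simply the wrong socket type.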
    •
      ipv4: Update exception handling for multipath routes via same device · 2fbc6e89
      David Ahern authored
      Kfir reported that pmtu exceptions are not created properly for
      deployments where multipath routes use the same device.
      
      After some digging I see 2 compounding problems:
      1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
         the route lookup. This is the second use case where this has
         been a problem (the first is related to use of vti devices with
         VRF). I cannot find any reason for the oif to be changed after the
         lookup; the code goes back to the start of git. It does not seem
         logical so remove it.
      
      2. fib_lookups for exceptions do not call fib_select_path to handle
         multipath route selection based on the hash.
      
      The end result is that the fib_lookup used to add the exception
      always creates it using the first leg of the route.
      
      An example topology showing the problem:
      
                       |  host1
                   +------+
                   | eth0 |  .209
                   +------+
                       |
                   +------+
           switch  | br0  |
                   +------+
                       |
             +---------+---------+
             | host2             |  host3
         +------+             +------+
         | eth0 | .250        | eth0 | 192.168.252.252
         +------+             +------+
      
         +-----+             +-----+
         | vti | .2          | vti | 192.168.247.3
         +-----+             +-----+
             \                  /
       =================================
       tunnels
               192.168.247.1/24
      
      for h in host1 host2 host3; do
              ip netns add ${h}
              ip -netns ${h} link set lo up
              ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
      done
      
      ip netns add switch
      ip -netns switch li set lo up
      ip -netns switch link add br0 type bridge stp 0
      ip -netns switch link set br0 up
      
      for n in 1 2 3; do
              ip -netns switch link add eth-sw type veth peer name eth-h${n}
              ip -netns switch li set eth-h${n} master br0 up
              ip -netns switch li set eth-sw netns host${n} name eth0
      done
      
      ip -netns host1 addr add 192.168.252.209/24 dev eth0
      ip -netns host1 link set dev eth0 up
      ip -netns host1 route add 192.168.247.0/24 \
              nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0
      
      ip -netns host2 addr add 192.168.252.250/24 dev eth0
      ip -netns host2 link set dev eth0 up
      
      ip -netns host3 addr add 192.168.252.252/24 dev eth0
      ip -netns host3 link set dev eth0 up
      
      ip netns add tunnel
      ip -netns tunnel li set lo up
      ip -netns tunnel li add br0 type bridge
      ip -netns tunnel li set br0 up
      for n in $(seq 11 20); do
              ip -netns tunnel addr add dev br0 192.168.247.${n}/24
      done
      
      for n in 2 3
      do
              ip -netns tunnel link add vti${n} type veth peer name eth${n}
              ip -netns tunnel link set eth${n} mtu 1360 master br0 up
              ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
              ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
      done
      ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3
      
      ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
      ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
      ip -netns host1 ro ls cache
      
      Before this patch the cache always shows exceptions against the first
      leg in the multipath route; 192.168.252.250 per this example. Since the
      hash has an initial random seed, you may need to vary the final octet
      more than what is listed. In my tests, using addresses between 11 and 19
      usually found 1 that used both legs.
      
      With this patch, the cache will have exceptions for both legs.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions")
      Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
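
The difference between always using the first leg and selecting a leg from the flow hash can be sketched as below. This is a toy model, not the kernel's fib_select_path: `toy_nexthop`, `toy_first_leg` and `toy_select_path` are hypothetical names, and the modulo mapping is an assumption made for brevity (the kernel's selection also accounts for nexthop weights).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nexthop record; not the kernel's fib_nh. */
struct toy_nexthop { const char *gw; };

/* Behaviour described as the bug: the exception always lands on leg 0. */
static const struct toy_nexthop *toy_first_leg(const struct toy_nexthop *nh,
                                               size_t n)
{
    (void)n; /* unused: the first leg is chosen regardless of leg count */
    return &nh[0];
}

/* Hash-based selection: the same flow hash always maps to the same leg. */
static const struct toy_nexthop *toy_select_path(const struct toy_nexthop *nh,
                                                 size_t n, unsigned int hash)
{
    return &nh[hash % n];
}
```

With the two gateways from the example topology, a flow whose hash maps to the second leg gets its exception recorded there only when the hash-based selection is used.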
    •
      net: tipc: kerneldoc fixes · 2e5117ba
      Lu Wei authored
      Fix parameter description of tipc_link_bc_create()
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 16ad3f40 ("tipc: introduce variable window congestion control")
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      ibmvnic: update MAINTAINERS · d3f2ef18
      Dany Madden authored
      Update supporters for IBM Power SRIOV Virtual NIC Device Driver.
      Thomas Falcon is moving on to other work. Dany Madden, Lijun Pan
      and Sukadev Bhattiprolu are the current supporters.
      Signed-off-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 15 Sep, 2020 15 commits
  6. 12 Sep, 2020 4 commits