1. 22 Aug, 2020 3 commits
  2. 21 Aug, 2020 15 commits
  3. 20 Aug, 2020 9 commits
  4. 19 Aug, 2020 13 commits
    • Y
      bpftool: Handle EAGAIN error code properly in pids collection · 00fa1d83
      Yonghong Song committed
      When the error code is EAGAIN, the kernel signals that user
      space should retry the read() operation for bpf iterators.
      Let us do it.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200818222312.2181675-1-yhs@fb.com
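The retry described in this commit can be pictured with a small user-space sketch. It is not the bpftool code: `read_retry_eagain`, the `read_fn` callback type, and `fake_read` are hypothetical names introduced here so the loop can be exercised without a real bpf iterator file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader callback with the same shape as read(2). */
typedef ssize_t (*read_fn)(int fd, void *buf, size_t len);

/* Call the reader again while it fails with EAGAIN; return any other
 * result (data, end of file, or a different error) unchanged. */
static ssize_t read_retry_eagain(read_fn do_read, int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(do_read != NULL && buf != NULL);
	do {
		n = do_read(fd, buf, len);
	} while (n < 0 && errno == EAGAIN);
	return n;
}

/* Stand-in reader (assumption for illustration): fails twice with
 * EAGAIN, then delivers four bytes. */
static int fake_calls;

static ssize_t fake_read(int fd, void *buf, size_t len)
{
	(void)fd;
	if (++fake_calls <= 2) {
		errno = EAGAIN;
		return -1;
	}
	if (len < 4) {
		errno = EINVAL;
		return -1;
	}
	memcpy(buf, "data", 4);
	return 4;
}
```

Because `read(2)` has the same signature as `read_fn`, the same wrapper could be called with `read` from `<unistd.h>` and a real descriptor.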
    • Y
      bpf: Avoid visit same object multiple times · e60572b8
      Yonghong Song committed
      Currently when traversing all tasks, the next tid
      is always increased by one. This may result in
      visiting the same task multiple times in a
      pid namespace.
      
      This patch fixes the issue by setting the next
      tid to pid_nr_ns(pid, ns) + 1, similar to
      function next_tgid().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lore.kernel.org/bpf/20200818222310.2181500-1-yhs@fb.com
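A toy model may help show why advancing the cursor by one revisits objects when ids are sparse. This is only a sketch of the behaviour described in the commit message, not the kernel code: `live_ids`, `find_ge`, and `count_visits` are hypothetical, and a lookup that returns the smallest live id at or above the cursor is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy id table (assumption): sorted ids of live objects, standing in
 * for the tasks visible in a pid namespace. */
static const unsigned int live_ids[] = { 3, 10, 11, 40 };
#define N_LIVE (sizeof(live_ids) / sizeof(live_ids[0]))

/* Return the smallest live id >= start, or 0 if there is none
 * (0 is not a valid id in this model). */
static unsigned int find_ge(unsigned int start)
{
	for (size_t i = 0; i < N_LIVE; i++)
		if (live_ids[i] >= start)
			return live_ids[i];
	return 0;
}

/* Walk the table and count how many times an object is visited.
 * advance_from_found == 0: cursor is always increased by one.
 * advance_from_found != 0: cursor becomes the found id plus one. */
static unsigned int count_visits(int advance_from_found)
{
	unsigned int cursor = 1, visits = 0, found;

	while ((found = find_ge(cursor)) != 0) {
		visits++;
		cursor = advance_from_found ? found + 1 : cursor + 1;
	}
	return visits;
}
```

With the four ids above, stepping the cursor by one visits an object for every cursor value from 1 to 40, while jumping past the found id visits each object exactly once.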
    • Y
      bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator · e679654a
      Yonghong Song committed
      In our production system, we observed rcu stalls when
      `bpftool prog` was running.
        rcu: INFO: rcu_sched self-detected stall on CPU
        rcu:         7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
                (t=21031 jiffies g=2534773 q=179750)
        NMI backtrace for cpu 7
        CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
        Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
        Call Trace:
        <IRQ>
        dump_stack+0x57/0x70
        nmi_cpu_backtrace.cold+0x14/0x53
        ? lapic_can_unplug_cpu.cold+0x39/0x39
        nmi_trigger_cpumask_backtrace+0xb7/0xc7
        rcu_dump_cpu_stacks+0xa2/0xd0
        rcu_sched_clock_irq.cold+0x1ff/0x3d9
        ? tick_nohz_handler+0x100/0x100
        update_process_times+0x5b/0x90
        tick_sched_timer+0x5e/0xf0
        __hrtimer_run_queues+0x12a/0x2a0
        hrtimer_interrupt+0x10e/0x280
        __sysvec_apic_timer_interrupt+0x51/0xe0
        asm_call_on_stack+0xf/0x20
        </IRQ>
        sysvec_apic_timer_interrupt+0x6f/0x80
        asm_sysvec_apic_timer_interrupt+0x12/0x20
        RIP: 0010:task_file_seq_get_next+0x71/0x220
        Code: 00 00 8b 53 1c 49 8b 7d 00 89 d6 48 8b 47 20 44 8b 18 41 39 d3 76 75 48 8b 4f 20 8b 01 39 d0 76 61 41 89 d1 49 39 c1 48 19 c0 <48> 8b 49 08 21 d0 48 8d 04 c1 4c 8b 08 4d 85 c9 74 46 49 8b 41 38
        RSP: 0018:ffffc90006223e10 EFLAGS: 00000297
        RAX: ffffffffffffffff RBX: ffff888f0d172388 RCX: ffff888c8c07c1c0
        RDX: 00000000000f017b RSI: 00000000000f017b RDI: ffff888c254702c0
        RBP: ffffc90006223e68 R08: ffff888be2a1c140 R09: 00000000000f017b
        R10: 0000000000000002 R11: 0000000000100000 R12: ffff888f23c24118
        R13: ffffc90006223e60 R14: ffffffff828509a0 R15: 00000000ffffffff
        task_file_seq_next+0x52/0xa0
        bpf_seq_read+0xb9/0x320
        vfs_read+0x9d/0x180
        ksys_read+0x5f/0xe0
        do_syscall_64+0x38/0x60
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f8815f4f76e
        Code: c0 e9 f6 fe ff ff 55 48 8d 3d 76 70 0a 00 48 89 e5 e8 36 06 02 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 52 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
        RSP: 002b:00007fff8f9df578 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
        RAX: ffffffffffffffda RBX: 000000000170b9c0 RCX: 00007f8815f4f76e
        RDX: 0000000000001000 RSI: 00007fff8f9df5b0 RDI: 0000000000000007
        RBP: 00007fff8f9e05f0 R08: 0000000000000049 R09: 0000000000000010
        R10: 00007f881601fa40 R11: 0000000000000246 R12: 00007fff8f9e05a8
        R13: 00007fff8f9e05a8 R14: 0000000001917f90 R15: 000000000000e22e
      
      Note that `bpftool prog` actually calls a task_file bpf iterator
      program to establish an association between prog/map/link/btf anon
      files and processes.
      
      In the case where the above rcu stall occurred, we had a process
      with 1587 tasks, each task having roughly 81305 files.
      This implied 129 million bpf prog invocations. Unfortunately none of
      these files were prog/map/link/btf files, so the bpf iterator/prog had
      to traverse all of them and could not return to user space,
      since the seq_file buffer never overflowed.
      
      This patch fixes the issue in bpf_seq_read() by limiting the number
      of visited objects. If the maximum number of visited objects is
      reached, no more objects will be visited in the current syscall.
      If nothing has been written to the seq_file buffer, -EAGAIN is
      returned to user space so that the caller can try again.
      
      The maximum number of visited objects is set at 1 million.
      On our Intel Xeon D-2191 2.3GHz 18-core server, bpf_seq_read()
      visiting 1 million files takes around 0.18 seconds.
      
      We did not use cond_resched() because for some iterators, e.g.,
      the netlink iterator, the rcu read_lock critical section spans
      consecutive seq_ops->next() calls, which makes it impossible to call
      cond_resched() in the key while loop of bpf_seq_read().
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/bpf/20200818222309.2181348-1-yhs@fb.com
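The bounded loop described in this commit can be sketched in user space. This is a simplified model, not `bpf_seq_read()` itself: `struct toy_iter`, `toy_seq_read`, and the small `MAX_VISITS_PER_CALL` value are assumptions made for illustration (the commit sets the real limit at 1 million, and 1587 tasks × 81305 files is about 129 million visits).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The commit uses 1 million; a smaller value keeps the model quick. */
#define MAX_VISITS_PER_CALL 1000

/* Hypothetical iterator state for the model. */
struct toy_iter {
	size_t pos;		/* index of the next object to visit */
	size_t total;		/* number of objects to traverse */
	size_t match_every;	/* every Nth object produces output; 0 = none */
};

/* One read call: visit objects until they run out or the per-call
 * limit is reached. Returns the number of bytes produced, 0 at the
 * end of the traversal, or -EAGAIN when the limit was hit with
 * nothing produced, so the caller knows to call again. */
static long toy_seq_read(struct toy_iter *it)
{
	size_t visited = 0;
	long written = 0;

	assert(it != NULL);
	while (it->pos < it->total) {
		if (visited == MAX_VISITS_PER_CALL)
			return written ? written : -EAGAIN;
		visited++;
		it->pos++;
		if (it->match_every && it->pos % it->match_every == 0)
			written += 1;	/* pretend one byte of output */
	}
	return written;
}
```

With 2500 objects and no matches, the first two calls return -EAGAIN and the third returns 0, so no single call visits more than the limit.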
    • C
      net: ipv4: remove duplicate "the the" phrase in Kconfig text · ad664118
      Colin Ian King committed
      The Kconfig help text contains the duplicated phrase "the the".
      Fix this and reformat the block of help text.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net: mscc: ocelot: remove duplicate "the the" phrase in Kconfig text · 17340552
      Colin Ian King committed
      The Kconfig help text contains the duplicated phrase "the the".
      Fix this.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'ethtool-netlink-bug-fixes' · 0df55a03
      David S. Miller committed
      Maxim Mikityanskiy says:
      
      ====================
      ethtool-netlink bug fixes
      
      This series contains a few bug fixes for ethtool-netlink. These bugs are
      specific to the netlink interface, and the legacy ioctl interface is
      not affected. These patches aim to have the same behavior in
      ethtool-netlink as in the legacy ethtool.
      
      Please also see the sibling series for the userspace tool.
      
      v2 changes: Added Fixes tags.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      ethtool: Don't omit the netlink reply if no features were changed · f01204ec
      Maxim Mikityanskiy committed
      The legacy ethtool userspace tool shows an error when no features could
      be changed. It's useful to have a netlink reply to be able to show this
      error when __netdev_update_features wasn't called, for example:
      
      1. ethtool -k eth0
         large-receive-offload: off
      2. ethtool -K eth0 rx-fcs on
      3. ethtool -K eth0 lro on
         Could not change any device features
         rx-lro: off [requested on]
      4. ethtool -K eth0 lro on
         # The output should be the same, but without this patch the kernel
         # doesn't send the reply, and ethtool is unable to detect the error.
      
      This commit makes ethtool-netlink always return a reply when requested,
      and it still avoids unnecessary calls to __netdev_update_features if the
      wanted features haven't changed.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      ethtool: Account for hw_features in netlink interface · 2847bfed
      Maxim Mikityanskiy committed
      ethtool-netlink ignores dev->hw_features and may confuse the drivers by
      asking them to enable features not in the hw_features bitmask. For
      example:
      
      1. ethtool -k eth0
         tls-hw-tx-offload: off [fixed]
      2. ethtool -K eth0 tls-hw-tx-offload on
         tls-hw-tx-offload: on
      3. ethtool -k eth0
         tls-hw-tx-offload: on [fixed]
      
      Filter out dev->hw_features from req_wanted to fix it and to resemble
      the legacy ethtool behavior.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
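One way to picture the hw_features filtering is a bitmask helper that only lets a request touch bits the device marks as changeable. This is a sketch modeled on the description above, not the kernel code: `features_t`, `apply_wanted`, and the bit assignments used with it are hypothetical.

```c
#include <stdint.h>

/* Hypothetical feature bitmask type for the sketch. */
typedef uint64_t features_t;

/* Apply a request to the wanted bits, but only for bits that are both
 * named in the request mask and present in hw_features. Bits outside
 * hw_features (shown as [fixed] by ethtool) are left as they were. */
static features_t apply_wanted(features_t old_wanted, features_t req_wanted,
			       features_t req_mask, features_t hw_features)
{
	features_t changeable = req_mask & hw_features;

	return (old_wanted & ~changeable) | (req_wanted & changeable);
}
```

For example, with a fixed feature on bit 0 and a changeable feature on bit 1 (so `hw_features` is `0x2`), a request to turn on bit 0 leaves the wanted bits unchanged, while a request to turn on bit 1 sets it.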
    • M
      ethtool: Fix preserving of wanted feature bits in netlink interface · 840110a4
      Maxim Mikityanskiy committed
      Currently, ethtool-netlink calculates new wanted bits as:
      (req_wanted & req_mask) | (old_active & ~req_mask)
      
      It completely discards the old wanted bits, so they are forgotten with
      the next ethtool command. Sample steps to reproduce:
      
      1. ethtool -k eth0
         tx-tcp-segmentation: on # TSO is on from the beginning
      2. ethtool -K eth0 tx off
         tx-tcp-segmentation: off [not requested]
      3. ethtool -k eth0
         tx-tcp-segmentation: off [requested on]
      4. ethtool -K eth0 rx off # Some change unrelated to TSO
      5. ethtool -k eth0
         tx-tcp-segmentation: off # "Wanted on" is forgotten
      
      This commit fixes it by changing the formula to:
      (req_wanted & req_mask) | (old_wanted & ~req_mask),
      where old_active was replaced by old_wanted to account for the wanted
      bits.
      
      The shortcut condition for the case where nothing was changed now
      compares wanted bitmasks, instead of wanted to active.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
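The two formulas quoted in this commit message can be compared directly. The sketch below assumes a hypothetical `features_t` bitmask type and made-up bit positions for TSO and RX; only the two expressions come from the commit message.

```c
#include <stdint.h>

/* Hypothetical feature bitmask type for the sketch. */
typedef uint64_t features_t;

/* Formula before the fix: bits outside the request mask are taken
 * from the currently active features. */
static features_t new_wanted_before(features_t req_wanted, features_t req_mask,
				    features_t old_active)
{
	return (req_wanted & req_mask) | (old_active & ~req_mask);
}

/* Formula after the fix: bits outside the request mask are taken
 * from the previously wanted features, so they are preserved. */
static features_t new_wanted_after(features_t req_wanted, features_t req_mask,
				   features_t old_wanted)
{
	return (req_wanted & req_mask) | (old_wanted & ~req_mask);
}
```

Take TSO as bit 0 and RX as bit 1, with TSO wanted but currently inactive. A request that only clears RX yields no wanted bits under the first formula, so the "wanted on" state of TSO is lost. Under the second formula the TSO bit survives.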
    • X
      ipv6: some fixes for ipv6_dev_find() · 4ef1a7cb
      Xin Long committed
      This patch does three things for ipv6_dev_find():
      
        As David A. noticed,
      
        - rt6_lookup() is not really needed. Different from __ip_dev_find(),
          ipv6_dev_find() doesn't have a compatibility problem, so remove it.
      
        As Hideaki suggested,
      
        - A "valid" (non-tentative) check for the address is also needed.
          ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which
          traverses the address hash list, but it is too heavy to call
          from inside ipv6_dev_find(). This patch reuses the code of
          ipv6_chk_addr_and_flags() for ipv6_dev_find().
      
        - A dev parameter is passed into ipv6_dev_find(), as link-local
          addresses from user space have sin6_scope_id set and the dev
          lookup needs it.
      
      Fixes: 81f6cb31 ("ipv6: add ipv6_dev_find()")
      Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Reported-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      bonding: fix active-backup failover for current ARP slave · 0410d071
      Jiri Wiesner committed
      When the ARP monitor is used for link detection, ARP replies are
      validated for all slaves (arp_validate=3) and fail_over_mac is set to
      active, two slaves of an active-backup bond may get stuck in a state
      where both of them are active and pass packets that they receive to
      the bond. This state makes IPv6 duplicate address detection fail. The
      state is reached thus:
      1. The current active slave goes down because the ARP target
         is not reachable.
      2. The current ARP slave is chosen and made active.
      3. A new slave is enslaved. This new slave becomes the current active
         slave and can reach the ARP target.
      As a result, the current ARP slave stays active after the enslave
      action has finished and the log is littered with "PROBE BAD" messages:
      > bond0: PROBE: c_arp ens10 && cas ens11 BAD
      The workaround is to remove the slave with "going back" status from
      the bond and re-enslave it. This issue was encountered when DPDK PMD
      interfaces were being enslaved to an active-backup bond.
      
      It would be possible to fix the issue in bond_enslave() or
      bond_change_active_slave() but the ARP monitor was fixed instead to
      keep most of the actions changing the current ARP slave in the ARP
      monitor code. The current ARP slave is set as inactive and backup
      during the commit phase. A new state, BOND_LINK_FAIL, has been
      introduced for slaves in the context of the ARP monitor. This allows
      administrators to see how slaves are rotated for sending ARP requests
      and attempts are made to find a new active slave.
      
      Fixes: b2220cad ("bonding: refactor ARP active-backup monitor")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net: handle the return value of pskb_carve_frag_list() correctly · eabe8618
      Miaohe Lin committed
      pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear().
      We should handle this correctly, or we would get a wrong sk_buff.
      
      Fixes: 6fa01ccd ("skbuff: Add pskb_extract() helper function")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
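The general pattern behind this fix, checking a fallible helper and undoing partial work before propagating the error, can be sketched as follows. Nothing here is the skbuff code: `struct toy_buf`, `toy_carve`, and `toy_rebuild` are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer with replaceable storage. */
struct toy_buf {
	void *data;
};

/* Stand-in for a helper that can fail with -ENOMEM (assumption). */
static int toy_carve(struct toy_buf *b, int simulate_oom)
{
	(void)b;
	return simulate_oom ? -ENOMEM : 0;
}

/* Allocate new storage, run the fallible step, and commit the new
 * storage only if that step succeeded. */
static int toy_rebuild(struct toy_buf *b, int simulate_oom)
{
	void *fresh = malloc(64);
	int err;

	if (!fresh)
		return -ENOMEM;

	err = toy_carve(b, simulate_oom);
	if (err) {
		/* Undo the allocation and report the failure instead of
		 * handing back a half-built buffer. */
		free(fresh);
		return err;
	}

	free(b->data);
	b->data = fresh;
	return 0;
}
```

Ignoring the return value of `toy_carve()` would let `toy_rebuild()` report success with inconsistent contents, which is the kind of outcome the commit message warns about.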
    • S
      net: gianfar: Add of_node_put() before goto statement · 989e4da0
      Sumera Priyadarsini committed
      Every iteration of for_each_available_child_of_node() decrements
      the reference count of the previous node. However, when control
      is transferred out of the middle of the loop, as in the case of
      a return, break, or goto, there is no decrement, which ultimately
      results in a memory leak.
      
      Fix a potential memory leak in gianfar.c by inserting of_node_put()
      before the goto statement.
      
      Issue found with Coccinelle.
      Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
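The reference-count behaviour described in this commit can be modeled with a toy iterator. The sketch is not the device-tree API: `struct toy_node`, `toy_get`, `toy_put`, `toy_next`, and `toy_scan` are hypothetical, and the iterator is assumed to drop the previous node's reference and take one on the next node at each step, as the commit message describes.

```c
#include <stddef.h>

/* Hypothetical node with a reference count. */
struct toy_node {
	int refs;
	int bad;	/* non-zero makes the scan bail out early */
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refs++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refs--;
}

/* Iterator step: drop the reference on prev, take one on the next. */
static struct toy_node *toy_next(struct toy_node *nodes, size_t count,
				 struct toy_node *prev)
{
	size_t idx = prev ? (size_t)(prev - nodes) + 1 : 0;

	toy_put(prev);
	return idx < count ? toy_get(&nodes[idx]) : NULL;
}

/* Scan the nodes and bail out with -1 on the first bad one.
 * put_on_exit selects whether the early exit drops the reference the
 * iterator still holds on the current node. */
static int toy_scan(struct toy_node *nodes, size_t count, int put_on_exit)
{
	struct toy_node *n;
	int err = 0;

	for (n = toy_next(nodes, count, NULL); n; n = toy_next(nodes, count, n)) {
		if (n->bad) {
			err = -1;
			if (put_on_exit)
				toy_put(n);
			goto out;
		}
	}
out:
	return err;
}
```

Without the put before the `goto`, the node that triggered the early exit keeps a reference count of one after the scan. With it, every count returns to zero.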