1. 30 Dec, 2021 1 commit
  2. 14 Dec, 2021 1 commit
  3. 13 Dec, 2021 1 commit
    • S
      net: bonding: debug: avoid printing debug logs when bond is not notifying peers · fee32de2
      Suresh Kumar committed
      Currently, "bond_should_notify_peers: slave ..." messages are printed whenever
      the bond_should_notify_peers() function is called.
      
      +++
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Received LACPDU on port 1
      Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Rx Machine: Port=1, Last State=6, Curr State=6
      Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): partner sync=1
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      ...
      Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Received LACPDU on port 2
      Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Rx Machine: Port=2, Last State=6, Curr State=6
      Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): partner sync=1
      Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
      +++
      
      This is confusing and can also clutter up debug logs.
      Print logs only when the peer notification happens.
      Signed-off-by: Suresh Kumar <suresh2514@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fee32de2
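The behaviour described above can be sketched roughly as follows. This is a minimal user-space Python model, not the driver code: `BondState` and its fields are hypothetical stand-ins for the driver's internal state, and the exact set of conditions the real function checks is an assumption here. The point it illustrates is only that the debug message is emitted after all early-return checks, so it appears only when a notification will actually be sent.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("bond0")


@dataclass
class BondState:
    # Hypothetical stand-ins for the driver's internal state.
    active_slave: Optional[str]  # name of the current active slave, if any
    send_peer_notif: int         # peer notifications still pending
    carrier_ok: bool             # whether the bond device has carrier


def should_notify_peers(bond: BondState) -> bool:
    """Return True when a peer notification should be sent."""
    if bond.active_slave is None or bond.send_peer_notif == 0 or not bond.carrier_ok:
        return False  # nothing to announce, so stay silent
    # Log only on the path that really notifies peers.
    log.debug("bond_should_notify_peers: slave %s", bond.active_slave)
    return True
```

With this ordering, repeated calls while no notification is pending produce no log lines at all.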
  4. 30 Nov, 2021 2 commits
    • H
      bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device · 94dd016a
      Hangbin Liu committed
      The kernel has VLAN PTP support (via get_ts_info), and the userspace tool
      linuxptp has bond support (by getting the active interface via a netlink
      message). But some users want to use PTP with VLAN over bond, which the
      current implementation cannot do.
      
      This patch passes get_ts_info and the SIOC[SG]HWTSTAMP ioctls to the active
      device in bond modes active-backup/tlb/alb. With this, users get kernel-native
      PTP support for bond and for VLAN over bond.
      
      Tested with ptp4l; it works with VLAN over bond after this patch:
      # ptp4l -m -i bond0.23
      ptp4l[53377.141]: selected /dev/ptp4 as PTP clock
      ptp4l[53377.142]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
      ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
      ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
      ptp4l[53384.127]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
      ptp4l[53384.127]: selected local clock e41d2d.fffe.123db0 as best master
      ptp4l[53384.127]: port 1: assuming the grand master role
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94dd016a
    • H
      Bonding: add arp_missed_max option · 5944b5ab
      Hangbin Liu committed
      Currently, a hard-coded number is used to verify whether we are within the
      arp_interval timeslice. But some users may want to shorten or extend that
      verification timeslice. With this option, which is similar to team's
      'missed_max', users can change that number to suit their own environment.
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5944b5ab
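As a rough illustration of what a configurable missed-interval limit means, the sketch below treats a link as down once more than `missed_max` ARP intervals have passed without traffic. It is a simplified model under stated assumptions: the function name, the millisecond parameters, and the default of 2 are choices made for this example, and the real ARP monitor applies additional mode-specific rules.

```python
def arp_link_is_down(now_ms: int, last_rx_ms: int,
                     arp_interval_ms: int, missed_max: int = 2) -> bool:
    """True once more than `missed_max` ARP intervals pass without traffic.

    `missed_max` replaces what used to be a fixed multiplier; the default
    used here is an assumption of this sketch.
    """
    return now_ms - last_rx_ms > missed_max * arp_interval_ms
```

Raising `missed_max` makes the monitor more tolerant of lost ARP replies; lowering it makes failover faster but more sensitive.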
  5. 22 Nov, 2021 1 commit
  6. 24 Oct, 2021 1 commit
  7. 09 Oct, 2021 1 commit
  8. 06 Sep, 2021 1 commit
  9. 05 Sep, 2021 1 commit
  10. 16 Aug, 2021 1 commit
  11. 14 Aug, 2021 1 commit
  12. 12 Aug, 2021 1 commit
  13. 10 Aug, 2021 2 commits
    • J
      net, bonding: Add XDP support to the bonding driver · 9e2ee5c7
      Jussi Maki committed
      XDP is implemented in the bonding driver by transparently delegating
      the XDP program loading, removal and xmit operations to the bonding
      slave devices. The overall goal of this work is that XDP programs
      can be attached to a bond device *without* any further changes (or
      awareness) necessary to the program itself, meaning the same XDP
      program can be attached to a native device but also a bonding device.
      
      Semantics of XDP_TX when attached to a bond are equivalent in such
      setting to the case when a tc/BPF program would be attached to the
      bond, meaning transmitting the packet out of the bond itself using one
      of the bond's configured xmit methods to select a slave device (rather
      than XDP_TX on the slave itself). Handling of XDP_TX to transmit
      using the configured bonding mechanism is therefore implemented by
      rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
      performance impact this check is guarded by a static key, which is
      incremented when an XDP program is loaded onto a bond device. This
      approach was chosen to avoid changes to drivers implementing XDP. If
      the slave device does not match the receive device, then XDP_REDIRECT
      is transparently used to perform the redirection in order to have
      the network driver release the packet from its RX ring. The bonding
      driver hashing functions have been refactored to allow reuse with
      xdp_buff's to avoid code duplication.
      
      The motivation for this change is to enable use of bonding (and
      802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
      XDP and also to transparently support bond devices for projects that
      use XDP given most modern NICs have dual port adapters. An alternative
      to this approach would be to implement 802.3ad in user-space and
      implement the bonding load-balancing in the XDP program itself, but that
      is a rather cumbersome endeavor in terms of slave device management
      (e.g. by watching netlink) and requires separate programs for native
      vs bond cases for the orchestrator. A native in-kernel implementation
      overcomes these issues and provides more flexibility.
      
      Below are benchmark results done on two machines with 100Gbit
      Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
      16-core 3950X on receiving machine. 64 byte packets were sent with
      pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
      ice driver, so the tests were performed with iommu=off and patch [2]
      applied. Additionally the bonding round robin algorithm was modified
      to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
      of cache misses were caused by the shared rr_tx_counter (see patch
      2/3). The statistics were collected using "sar -n dev -u 1 10". On top
      of that, for ice, further work is in progress on improving the XDP_TX
      numbers [4].
      
       -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
       without patch (1 dev):
         XDP_DROP:              3.15%      48.6Mpps
         XDP_TX:                3.12%      18.3Mpps     18.3Mpps
         XDP_DROP (RSS):        9.47%      116.5Mpps
         XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
       -----------------------
       with patch, bond (1 dev):
         XDP_DROP:              3.14%      46.7Mpps
         XDP_TX:                3.15%      13.9Mpps     13.9Mpps
         XDP_DROP (RSS):        10.33%     117.2Mpps
         XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
       -----------------------
       with patch, bond (2 devs):
         XDP_DROP:              6.27%      92.7Mpps
         XDP_TX:                6.26%      17.6Mpps     17.5Mpps
         XDP_DROP (RSS):       11.38%      117.2Mpps
         XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
       --------------------------------------------------------------
      
      RSS: Receive Side Scaling, i.e. the packets were sent to a range of
      destination IPs.
      
        [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
        [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
        [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
        [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t
      
      Signed-off-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
      9e2ee5c7
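The XDP_TX handling described in this commit message can be modelled in a few lines. This is a toy user-space sketch, not kernel code: `rewrite_xdp_action` and `select_slave` are hypothetical names, slaves are plain strings, and the modulo-based selection merely stands in for whichever xmit method the bond is configured with. The numeric action values follow the usual XDP action enumeration.

```python
# XDP action codes in their usual enumeration order.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)


def select_slave(slaves, flow_hash):
    """Stand-in for the bond's configured xmit method (assumption)."""
    return slaves[flow_hash % len(slaves)]


def rewrite_xdp_action(action, rx_dev, slaves, flow_hash):
    """Return (action, target device or None) after bond-aware rewriting."""
    if action != XDP_TX or rx_dev not in slaves:
        return action, None  # only XDP_TX on a bond slave is rewritten
    tx_dev = select_slave(slaves, flow_hash)
    if tx_dev == rx_dev:
        return XDP_TX, rx_dev  # same device: plain XDP_TX is enough
    # Different slave: redirect so the RX driver releases the packet.
    return XDP_REDIRECT, tx_dev
```

For example, with slaves `["eth0", "eth1"]`, a packet received on `eth0` whose hash selects `eth1` comes back as an `XDP_REDIRECT` to `eth1`, while other verdicts such as `XDP_DROP` pass through unchanged.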
    • J
      net, bonding: Refactor bond_xmit_hash for use with xdp_buff · a815bde5
      Jussi Maki committed
      In preparation for adding XDP support to the bonding driver,
      refactor the packet hashing functions to be able to work with
      any linear data buffer without an skb.
      Signed-off-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
      a815bde5
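To illustrate what hashing "any linear data buffer" can look like, the sketch below computes a layer-2 style hash directly from raw frame bytes, with no packet-metadata object involved. It is an illustrative model only: `eth_hash` and its `mac_offset` parameter are names chosen for this example, and the XOR of the last MAC bytes with the EtherType is a simplification rather than a statement of the driver's exact formula.

```python
ETH_HLEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def eth_hash(frame: bytes, mac_offset: int = 0) -> int:
    """Layer-2 style hash computed directly from raw frame bytes."""
    header = frame[mac_offset:mac_offset + ETH_HLEN]
    if len(header) < ETH_HLEN:
        return 0  # not enough linear data to read an Ethernet header
    dst, src = header[0:6], header[6:12]
    proto = int.from_bytes(header[12:14], "big")
    return dst[5] ^ src[5] ^ proto
```

Because the function only needs a byte buffer and an offset, the same code path could serve both a regular transmit path and an XDP buffer.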
  14. 03 Aug, 2021 1 commit
    • H
      bonding: add new option lacp_active · 3a755cd8
      Hangbin Liu committed
      Add an option lacp_active, which is similar to team's runner.active.
      This option specifies whether to send LACPDU frames periodically. If set
      to on, LACPDU frames are sent at the configured lacp_rate. If set to off,
      LACPDU frames are sent only in a "speak when spoken to" manner.
      
      Note that LACPDU state frames are still sent when a port is initialized or
      unbound.
      
      v2: remove module parameter
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a755cd8
  15. 02 Aug, 2021 1 commit
    • Y
      bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() · 220ade77
      Yufeng Mo committed
      Some time ago, I reported a calltrace issue,
      "did not find a suitable aggregator"; please see [1].
      After a period of analysis and reproduction, I found
      that this problem is caused by concurrency.
      
      Before the problem occurs, the bond structure is as follows:
      
      bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                            \
                              port0
            \
              slaver1(eth1) - agg1.lag_ports -> NULL
                            \
                              port1
      
      If we run 'ifenslave bond0 -d eth1', the sequence is as follows:
      
      executing __bond_release_one()
      |
      bond_upper_dev_unlink()[step1]
      |                       |                       |
      |                       |                       bond_3ad_lacpdu_recv()
      |                       |                       ->bond_3ad_rx_indication()
      |                       |                       spin_lock_bh()
      |                       |                       ->ad_rx_machine()
      |                       |                       ->__record_pdu()[step2]
      |                       |                       spin_unlock_bh()
      |                       |                       |
      |                       bond_3ad_state_machine_handler()
      |                       spin_lock_bh()
      |                       ->ad_port_selection_logic()
      |                       ->try to find free aggregator[step3]
      |                       ->try to find suitable aggregator[step4]
      |                       ->did not find a suitable aggregator[step5]
      |                       spin_unlock_bh()
      |                       |
      |                       |
      bond_3ad_unbind_slave() |
      spin_lock_bh()
      spin_unlock_bh()
      
      step1: slaver1(eth1) has already been removed from the list, but port1 remains
      step2: a LACPDU is received and port0 is updated
      step3: port0 is removed from agg0.lag_ports. The structure is now
             "agg0.lag_ports -> port1", and agg0 is not free. At the
             same time, slaver1/agg1 has been removed from the list by step1.
             So no free aggregator can be found.
      step4: no suitable aggregator can be found because of step2
      step5: a calltrace is triggered since port->aggregator is NULL
      
      To solve this concurrency problem, put bond_upper_dev_unlink()
      after bond_3ad_unbind_slave(). In this way, we can invalidate the port
      first and skip this port in bond_3ad_state_machine_handler(). This
      eliminates the situation in which the slaver has been removed from the
      list but the port is still valid.
      
      [1] https://lore.kernel.org/netdev/10374.1611947473@famine/
      
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      220ade77
  16. 28 Jul, 2021 3 commits
    • A
      net: bonding: move ioctl handling to private ndo operation · 3d9d00bd
      Arnd Bergmann committed
      All other user triggered operations are gone from ndo_ioctl, so move
      the SIOCBOND family into a custom operation as well.
      
      The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now,
      but there are still a few definitions in obsolete wireless drivers as well
      as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR
      helpers from inside the kernel.
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d9d00bd
    • A
      dev_ioctl: split out ndo_eth_ioctl · a7605370
      Arnd Bergmann committed
      Most users of ndo_do_ioctl are ethernet drivers that implement
      the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
      timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
      
      Separate these from the few drivers that use ndo_do_ioctl to
      implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
      
      This is a purely cosmetic change intended to help readers find
      their way through the implementation.
      
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Vladimir Oltean <olteanv@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7605370
    • A
      bonding: use siocdevprivate · 232ec98e
      Arnd Bergmann committed
      The bonding driver supports two command codes for each operation: one
      in the SIOCDEVPRIVATE range and another one with the same definition
      but a unique command code.
      
      Only the second set currently works in compat mode, as the ifr_data
      expansion overwrites part of the ifr_slave field.
      
      Move the private ones into ndo_siocdevprivate and change the
      implementation to call the other function. This makes both versions
      work correctly.
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      232ec98e
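The idea of routing the private command codes to the same handler as the unique ones can be sketched as a simple translation table. This is a user-space model, not the driver: `bond_do_ioctl` is a hypothetical stand-in for the shared handler, and the numeric constants are quoted from memory of the kernel's uapi headers (`linux/sockios.h`, `linux/if_bonding.h`), so treat them as assumptions and check them against your headers.

```python
import errno

SIOCDEVPRIVATE = 0x89F0  # start of the device-private ioctl range

# Unique bonding command codes (values assumed from linux/sockios.h).
SIOCBONDENSLAVE = 0x8990
SIOCBONDRELEASE = 0x8991
SIOCBONDSETHWADDR = 0x8992
SIOCBONDSLAVEINFOQUERY = 0x8993
SIOCBONDINFOQUERY = 0x8994
SIOCBONDCHANGEACTIVE = 0x8995

# Private-range aliases mapped onto their unique equivalents.
OLD_TO_NEW = {
    SIOCDEVPRIVATE + 0: SIOCBONDENSLAVE,
    SIOCDEVPRIVATE + 1: SIOCBONDRELEASE,
    SIOCDEVPRIVATE + 2: SIOCBONDSETHWADDR,
    SIOCDEVPRIVATE + 11: SIOCBONDSLAVEINFOQUERY,
    SIOCDEVPRIVATE + 12: SIOCBONDINFOQUERY,
    SIOCDEVPRIVATE + 13: SIOCBONDCHANGEACTIVE,
}


def bond_do_ioctl(cmd, arg):
    """Hypothetical shared handler; here it just echoes what it was given."""
    return cmd, arg


def bond_siocdevprivate(cmd, arg):
    """Translate a private-range command and delegate to the shared handler."""
    new_cmd = OLD_TO_NEW.get(cmd)
    if new_cmd is None:
        raise OSError(errno.EOPNOTSUPP, "unsupported private bonding ioctl")
    return bond_do_ioctl(new_cmd, arg)
```

Keeping a single implementation behind both entry points means a fix in the shared handler applies to both command sets.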
  17. 17 Jul, 2021 1 commit
    • M
      bonding: fix build issue · 5b69874f
      Mahesh Bandewar committed
      The commit 9a560550 ("bonding: Add struct bond_ipesc to manage SA") causes the
      following build error when XFRM is not selected in the kernel config.
      
      lld: error: undefined symbol: xfrm_dev_state_flush
      >>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453)
      >>>               net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a
      
      Fixes: 9a560550 ("bonding: Add struct bond_ipesc to manage SA")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      CC: Taehee Yoo <ap420073@gmail.com>
      CC: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b69874f
  18. 07 Jul, 2021 7 commits
    • T
      bonding: fix incorrect return value of bond_ipsec_offload_ok() · 168e696a
      Taehee Yoo committed
      bond_ipsec_offload_ok() is called to check whether the interface supports
      ipsec offload or not.
      A bonding interface supports ipsec offload only in active-backup mode.
      So, if a bond interface is not in active-backup mode, it should return
      false, but it returns true.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      168e696a
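The corrected return value can be summarised with a small truth-table style sketch. It is a toy model under stated assumptions: the function name, the string mode, and the `active_slave_offload` parameter are invented for illustration, and the handling of a missing active slave is an assumption of this example.

```python
from typing import Optional


def ipsec_offload_ok(bond_mode: str, active_slave_offload: Optional[bool]) -> bool:
    """Toy model of the check described above.

    `active_slave_offload` is None when the bond has no active slave;
    otherwise it says whether that slave can offload the SA.
    """
    if bond_mode != "active-backup":
        return False  # the case this fix addresses: it used to report True
    if active_slave_offload is None:
        return False  # no active slave to offload to (assumption of this sketch)
    return active_slave_offload
```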
    • T
      bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() · 955b785e
      Taehee Yoo committed
      bond->curr_active_slave is dereferenced with rcu_dereference().
      But neither this function nor its caller holds the RCU read lock, so a
      warning occurs. So add rcu_read_lock().
      
      Splat looks like:
      WARNING: suspicious RCU usage
      5.13.0-rc6+ #1179 Not tainted
      drivers/net/bonding/bond_main.c:571 suspicious
      rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ping/974:
       #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0},
      at: raw_sendmsg+0x1303/0x2cb0
      
      stack backtrace:
      CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_offload_ok+0x1f4/0x260 [bonding]
       xfrm_output+0x179/0x890
       xfrm4_output+0xfa/0x410
       ? __xfrm4_output+0x4b0/0x4b0
       ? __ip_make_skb+0xecc/0x2030
       ? xfrm4_udp_encap_rcv+0x800/0x800
       ? ip_local_out+0x21/0x3a0
       ip_send_skb+0x37/0xa0
       raw_sendmsg+0x1bfd/0x2cb0
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      955b785e
    • T
      bonding: Add struct bond_ipesc to manage SA · 9a560550
      Taehee Yoo committed
      bonding has supported ipsec offload for some time.
      When an SA is added, bonding just passes the SA to its current active real
      interface, but it doesn't manage the SA.
      So, when events (add/del real interface, active real interface change, etc.)
      occur, bonding can't handle them well because it doesn't manage the SA.
      As a result, problems (panic, UAF, refcnt leak) occur.
      
      To make it stable, bonding should manage the SA.
      That's the reason why struct bond_ipsec is added.
      When a new SA is added to a bonding interface, it is stored in the
      bond_ipsec list, and the SA is passed to the current active real interface.
      If events occur, bonding uses the bond_ipsec data to handle them.
      bond->ipsec_list is protected by bond->ipsec_lock.
      
      If the current active real interface changes, the following logic applies:
      1. Delete all SAs from the old active real interface.
      2. Add all SAs to the new active real interface.
      3. If the new active real interface doesn't support ipsec offload or the
      SA's options, set real_dev to NULL.
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a560550
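The three-step logic for an active-interface change can be modelled as below. This is a simplified user-space sketch, not the driver's data structures: `RealDev`, `BondIpsec`, and their methods are hypothetical names, SAs are plain strings, and a `threading.Lock` merely plays the role that the message assigns to bond->ipsec_lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RealDev:
    name: str
    supports_offload: bool = True
    sas: set = field(default_factory=set)  # SAs currently installed on the device


class BondIpsec:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for bond->ipsec_lock
        self.entries = {}             # SA -> device it is installed on, or None
        self.active = None

    def _install(self, sa, dev):
        """Install `sa` on `dev` if possible; return the device or None."""
        if dev is not None and dev.supports_offload:
            dev.sas.add(sa)
            return dev
        return None  # step 3: no usable real device for this SA

    def add_sa(self, sa):
        with self.lock:
            self.entries[sa] = self._install(sa, self.active)

    def change_active(self, new_dev):
        with self.lock:
            for sa, dev in self.entries.items():
                if dev is not None:
                    dev.sas.discard(sa)       # step 1: remove from old device
            self.active = new_dev
            for sa in list(self.entries):     # step 2: re-add on new device
                self.entries[sa] = self._install(sa, new_dev)
```

Because the bond keeps its own list of SAs, a failover never loses track of an SA even when the new device cannot offload it.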
    • T
      bonding: disallow setting nested bonding + ipsec offload · b1216933
      Taehee Yoo committed
      A bonding interface can be nested, and it supports ipsec offload.
      So, it allows setting up the nested bonding + ipsec offload scenario.
      But the code does not support this scenario.
      So, it should be disallowed.
      
      interface graph:
      bond2
         |
      bond1
         |
      eth0
      
      Nested bonding + ipsec offload is unlikely to be a real use case.
      So, disallowing this scenario is fine.
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1216933
    • T
      bonding: fix suspicious RCU usage in bond_ipsec_del_sa() · a22c39b8
      Taehee Yoo committed
      bond->curr_active_slave is dereferenced with rcu_dereference().
      But neither this function nor its caller holds the RCU read lock, so a
      warning occurs. So add rcu_read_lock().
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
      transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
          ip x s f
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check()
      usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ip/705:
       #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
       #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
      at: xfrm_state_delete+0x16/0x30
      
      stack backtrace:
      CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
       __xfrm_state_delete+0x51f/0x730
       xfrm_state_delete+0x1e/0x30
       xfrm_state_flush+0x22f/0x390
       xfrm_flush_sa+0xd8/0x260 [xfrm_user]
       ? xfrm_flush_policy+0x290/0x290 [xfrm_user]
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a22c39b8
    • T
      bonding: fix null dereference in bond_ipsec_add_sa() · 105cd17a
      Taehee Yoo committed
      If a bond doesn't have a real device, bond->curr_active_slave is NULL.
      But bond_ipsec_add_sa() dereferences bond->curr_active_slave without
      a NULL check.
      So, a null-ptr-deref would occur.
      
      Test commands:
          ip link add bond0 type bond
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
      0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding]
      Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14
      01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02
      00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48
      RSP: 0018:ffff88810946f508 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100
      RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11
      R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000
      R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330
      FS:  00007efc5552e680(0000) GS:ffff888119c00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ...]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      105cd17a
    • T
      bonding: fix suspicious RCU usage in bond_ipsec_add_sa() · b648eba4
      Taehee Yoo committed
      bond->curr_active_slave is dereferenced with rcu_dereference().
      But neither this function nor its caller holds the RCU read lock, so a
      warning occurs. So add rcu_read_lock().
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add bond0 type bond
          ip link set dummy0 master bond0
          ip link set dummy0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
      	    mode transport \
      	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
      	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
      	    dev bond0 dir in
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/684:
       #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
      
      stack backtrace:
      CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b648eba4
  19. 24 Jun, 2021 1 commit
  20. 23 Jun, 2021 1 commit
    • D
      bonding: avoid adding slave device with IFF_MASTER flag · 3c9ef511
      Di Zhu committed
      The following steps will definitely cause the kernel to crash:
      	ip link add vrf1 type vrf table 1
      	modprobe bonding.ko max_bonds=1
      	echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
      	rmmod bonding
      
      The root cause is as follows: adding the VRF device as a slave fails,
      and some cleanup is done. Because the VRF device has the IFF_MASTER flag,
      the cleanup process does not clear the IFF_BONDING flag.
      Then, when we unload the bonding module, unregister_netdevice_notifier()
      treats the VRF device as a bond master device and treats netdev_priv()
      as struct bonding{}, when it actually is struct net_vrf{}.
      
      Analysis of the processing logic of bond_enslave() suggests that adding a
      slave device with the IFF_MASTER flag is not allowed, so
      we need to add a code check for this situation.
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c9ef511
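The added check amounts to rejecting a would-be slave early when it carries the master flag. The sketch below is a minimal model: `check_enslave` is a hypothetical name, devices are reduced to a flags integer, and the error code returned is an assumption of this example. `IFF_MASTER` is given as 0x400, its usual value in the interface flag definitions.

```python
import errno

IFF_UP = 0x1
IFF_MASTER = 0x400  # device is itself a master of other devices


def check_enslave(slave_flags: int) -> int:
    """Return 0 if the device may be enslaved, or a negative errno."""
    if slave_flags & IFF_MASTER:
        # Refuse before any state is changed, so no cleanup is needed.
        return -errno.EPERM
    return 0
```

Rejecting the device before any flag is set avoids the partial cleanup path that left IFF_BONDING behind in the crash scenario above.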
  21. 16 Jun, 2021 1 commit
    • J
      net: bonding: Use per-cpu rr_tx_counter · 848ca918
      Jussi Maki committed
      The round-robin rr_tx_counter was shared across CPUs leading to
      significant cache thrashing at high packet rates. This patch switches
      the round-robin packet counter to use a per-cpu variable to decide
      the destination slave.
      
      On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
      (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
      this patch.
      
      "perf top -e cache_misses" before:
          12.31%  [bonding]       [k] bond_xmit_roundrobin_slave_get
          10.59%  [sch_fq_codel]  [k] fq_codel_dequeue
           9.34%  [kernel]        [k] skb_release_data
      after:
          15.42%  [sch_fq_codel]  [k] fq_codel_dequeue
          10.06%  [kernel]        [k] __memset
           9.12%  [kernel]        [k] skb_release_data
      Signed-off-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      848ca918
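A per-CPU round-robin counter can be pictured with the following toy model. It is not the driver code: `RoundRobinBond` and `xmit_slave` are hypothetical names, CPUs are list indices, and Python has no cache lines to contend on, so the sketch only shows the selection logic, not the performance effect.

```python
class RoundRobinBond:
    def __init__(self, slaves, nr_cpus):
        self.slaves = list(slaves)
        # One counter per CPU, so no CPU writes to another CPU's counter.
        self.rr_tx_counter = [0] * nr_cpus

    def xmit_slave(self, cpu):
        """Pick the destination slave using only this CPU's counter."""
        slave_id = self.rr_tx_counter[cpu] % len(self.slaves)
        self.rr_tx_counter[cpu] += 1
        return self.slaves[slave_id]
```

Each CPU cycles through the slaves independently, so the overall distribution is round-robin per CPU rather than strictly round-robin across the whole bond.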
  22. 04 Jun, 2021 1 commit
  23. 21 May, 2021 2 commits
  24. 18 May, 2021 1 commit
  25. 22 Apr, 2021 1 commit
    • J
      bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine · 83d686a6
      jinyiting committed
      When a bond in mode 4 that has negotiated normally is brought down and then
      up, there is a chance that bond->slave_arr ends up NULL.
      
      Test commands:
         ifconfig bond1 down
         ifconfig bond1 up
      
      The conflict occurs in the following process:
      
      __dev_open (CPU A)
      --bond_open
        --queue_delayed_work(bond->wq,&bond->ad_work,0);
        --bond_update_slave_arr
          --bond_3ad_get_active_agg_info
      
      ad_work(CPU B)
      --bond_3ad_state_machine_handler
        --ad_agg_selection_logic
      
      ad_work runs on CPU B. In the function ad_agg_selection_logic, all
      agg->is_active flags are cleared. If bond_3ad_get_active_agg_info runs on
      CPU A before the new active aggregator is selected on CPU B, it fails and
      bond->slave_arr is set to NULL. The best aggregator in
      ad_agg_selection_logic has not changed, so there is no need to update the
      slave array.
      
      The conflict occurs because ad_agg_selection_logic clears
      agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr
      inspects agg->is_active outside the lock.
      
      Also, it is normal for bond_update_slave_arr to potentially sleep when
      allocating memory, so replace the WARN_ON with a call to might_sleep.
      Signed-off-by: jinyiting <jinyiting@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83d686a6
  26. 30 Mar, 2021 1 commit
  27. 13 Mar, 2021 1 commit
  28. 09 Mar, 2021 1 commit
  29. 20 Jan, 2021 1 commit
    • J
      bonding: add a vlan+srcmac tx hashing option · 7b8fc010
      Jarod Wilson committed
      This comes from an end-user request, where they're running multiple VMs on
      hosts with bonded interfaces connected to some interesting switch topologies,
      where 802.3ad isn't an option. They're currently running a proprietary
      solution that effectively achieves load-balancing of VMs and bandwidth
      utilization improvements with a similar form of transmission algorithm.
      
      Basically, each VM has its own vlan, so it always sends its traffic out
      the same interface, unless that interface fails. Traffic gets split
      between the interfaces, maintaining a consistent path, with failover still
      available if an interface goes down.
      
      Unlike bond_eth_hash(), this hash function is using the full source MAC
      address instead of just the last byte, as there are so few components to
      the hash, and in the no-vlan case, we would be returning just the last
      byte of the source MAC as the hash value. It's entirely possible to have
      two NICs in a bond with the same last byte of their MAC, but not the same
      MAC, so this adjustment should guarantee distinct hashes in all cases.
      
      This has been rudimentarily tested to provide similar results to the
      proprietary solution it is aiming to replace. A patch for iproute2 is also
      posted, to properly support the new mode there as well.
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Thomas Davis <tadavis@lbl.gov>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7b8fc010
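A hash built from the VLAN ID and the full source MAC could look like the sketch below. It is an illustration under stated assumptions: `vlan_srcmac_hash` is a name chosen for this example, and folding the MAC into two 24-bit halves that are XORed together (and with the VLAN ID when present) is one plausible way to use every byte of the address, not a statement of the driver's exact arithmetic.

```python
from typing import Optional


def vlan_srcmac_hash(src_mac: bytes, vlan_id: Optional[int] = None) -> int:
    """Hash over the full source MAC, mixed with the VLAN ID when tagged."""
    if len(src_mac) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    vendor = int.from_bytes(src_mac[:3], "big")   # first three bytes
    device = int.from_bytes(src_mac[3:], "big")   # last three bytes
    result = vendor ^ device
    if vlan_id is not None:
        result ^= vlan_id  # untagged traffic hashes on the MAC alone
    return result
```

The transmitting slave would then be chosen as the hash modulo the number of usable slaves, so a given VM (VLAN plus source MAC) keeps using the same interface until that interface fails.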