提交 · b7f14132bf58256e841774ae07d3ffb7a841c2bc · openeuler / Kernel

05 9月, 2022 1 次提交

bonding: use unspecified address if no available link local address · b7f14132

由 Hangbin Liu 提交于 8月 30, 2022

When ns_ip6_target was set, the ipv6_dev_get_saddr() will be called to get
available source address and send IPv6 neighbor solicit message.

If the target is global address, ipv6_dev_get_saddr() will get any
available src address. But if the target is link local address,
ipv6_dev_get_saddr() will only get available address from our interface,
i.e. the corresponding bond interface.

But before bond interface up, all the address is tentative, while
ipv6_dev_get_saddr() will ignore tentative address. This makes we can't
find available link local src address, then bond_ns_send() will not be
called and no NS message was sent. Finally bond interface will keep in
down state.

Fix this by sending NS with unspecified address if there is no available
source address.
Reported-by: NLiLiang <liali@redhat.com>
Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7f14132

23 8月, 2022 1 次提交

bonding: 3ad: make ad_ticks_per_sec a const · f2e44dff

由 Jonathan Toppins 提交于 8月 19, 2022

The value is only ever set once in bond_3ad_initialize and only ever
read otherwise. There seems to be no reason to set the variable via
bond_3ad_initialize when setting the global variable will do. Change
ad_ticks_per_sec to a const to enforce its read-only usage.
Signed-off-by: NJonathan Toppins <jtoppins@redhat.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

f2e44dff

11 8月, 2022 1 次提交

net/tls: Use RCU API to access tls_ctx->netdev · 94ce3b64

由 Maxim Mikityanskiy 提交于 8月 10, 2022

Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.

Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.

Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.

The transition to RCU exposes existing issues, fixed by this commit:

1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.

2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.

Fixes: 89df6a81 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

94ce3b64

04 8月, 2022 1 次提交

net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS · 06799a90

由 Vladimir Oltean 提交于 7月 31, 2022

The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.

It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

06799a90

24 6月, 2022 1 次提交

Bonding: add per-port priority for failover re-selection · 0a2ff7cc

由 Hangbin Liu 提交于 6月 21, 2022

Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.

This option could only be configured via netlink.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Acked-by: NJonathan Toppins <jtoppins@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a2ff7cc

18 6月, 2022 1 次提交

bonding: ARP monitor spams NETDEV_NOTIFY_PEERS notifiers · 7a9214f3

由 Jay Vosburgh 提交于 6月 16, 2022

The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.

Correct this by decrementing the counter for each notification.
Reported-by: NJonathan Toppins <jtoppins@redhat.com>
Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Fixes: b0929915 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/Tested-by: NJonathan Toppins <jtoppins@redhat.com>
Reviewed-by: NJonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/9400.1655407960@famineSigned-off-by: NJakub Kicinski <kuba@kernel.org>

7a9214f3

10 6月, 2022 1 次提交

bonding: cleanup bond_create · 2fa3ee93

由 Jonathan Toppins 提交于 6月 08, 2022

Setting RLB_NULL_INDEX is not needed as this is done in bond_alb_initialize
which is called by bond_open.

Also reduce the number of rtnl_unlock calls by just using the standard
goto cleanup path.
Signed-off-by: NJonathan Toppins <jtoppins@redhat.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2fa3ee93

01 6月, 2022 1 次提交

bonding: guard ns_targets by CONFIG_IPV6 · c4caa500

由 Hangbin Liu 提交于 5月 31, 2022

Guard ns_targets in struct bond_params by CONFIG_IPV6, which could save
256 bytes if IPv6 not configed. Also add this protection for function
bond_is_ip6_target_ok() and bond_get_targets_ip6().

Remove the IS_ENABLED() check for bond_opts[] as this will make
BOND_OPT_NS_TARGETS uninitialized if CONFIG_IPV6 not enabled. Add
a dummy bond_option_ns_ip6_targets_set() for this situation.

Fixes: 4e24be01 ("bonding: add new parameter ns_targets")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Acked-by: NJonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220531063727.224043-1-liuhangbin@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>

c4caa500

20 5月, 2022 1 次提交

bonding: fix missed rcu protection · 9b80ccda

由 Hangbin Liu 提交于 5月 19, 2022

When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
discussed [1], I didn't notice it could be called via setsockopt,
which doesn't hold rcu lock, as syzbot pointed:

  stack backtrace:
  CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f46857 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
   bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
   bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
   __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
   ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
   sock_timestamping_bind_phc net/core/sock.c:869 [inline]
   sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
   sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
   __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
   __do_sys_setsockopt net/socket.c:2238 [inline]
   __se_sys_setsockopt net/socket.c:2235 [inline]
   __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f8902c8eb39

Fix it by adding rcu_read_lock and take a ref on the real_dev.
Since dev_hold() and dev_put() can take NULL these days, we can
skip checking if real_dev exist.

[1] https://lore.kernel.org/netdev/27565.1642742439@famine/

Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com
Fixes: aa603467 ("bonding: use rcu_dereference_rtnl when get bonding active slave")
Suggested-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220519020148.1058344-1-liuhangbin@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

9b80ccda

06 5月, 2022 1 次提交

net: make drivers set the TSO limit not the GSO limit · ee8b7a11

由 Jakub Kicinski 提交于 5月 05, 2022

Drivers should call the TSO setting helper, GSO is controllable
by user space.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee8b7a11

22 4月, 2022 1 次提交

ipv6: Use ipv6_only_sock() helper in condition. · 81ee0eb6

由 Kuniyuki Iwashima 提交于 4月 20, 2022

This patch replaces some sk_ipv6only tests with ipv6_only_sock().
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81ee0eb6

17 4月, 2022 1 次提交

bonding: do not discard lowest hash bit for non layer3+4 hashing · 49aefd13

由 suresh kumar 提交于 4月 16, 2022

Commit b5f86218 was introduced to discard lowest hash bit for layer3+4 hashing
but it also removes last bit from non layer3+4 hashing

Below script shows layer2+3 hashing will result in same slave to be used with above commit.
$ cat hash.py
#/usr/bin/python3.6

h_dests=[0xa0, 0xa1]
h_source=0xe3
hproto=0x8
saddr=0x1e7aa8c0
daddr=0x17aa8c0

for h_dest in h_dests:
    hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
    hash ^= hash >> 16
    hash ^= hash >> 8
    print(hash)

print("with last bit removed")
for h_dest in h_dests:
    hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
    hash ^= hash >> 16
    hash ^= hash >> 8
    hash = hash >> 1
    print(hash)

Output:
$ python3.6 hash.py
522133332
522133333   <-------------- will result in both slaves being used

with last bit removed
261066666
261066666   <-------------- only single slave used
Signed-off-by: Nsuresh kumar <suresh2514@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49aefd13

12 3月, 2022 1 次提交

net: add per-cpu storage and net->core_stats · 625788b5

由 Eric Dumazet 提交于 3月 10, 2022

Before adding yet another possibly contended atomic_long_t,
it is time to add per-cpu storage for existing ones:
 dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler

Because many devices do not have to increment such counters,
allocate the per-cpu storage on demand, so that dev_get_stats()
does not have to spend considerable time folding zero counters.

Note that some drivers have abused these counters which
were supposed to be only used by core networking stack.

v4: should use per_cpu_ptr() in dev_get_stats() (Jakub)
v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo)
v2: add a missing include (reported by kernel test robot <lkp@intel.com>)
    Change in netdev_core_stats_alloc() (Jakub)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Reviewed-by: NBrian Vazquez <brianvv@google.com>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

625788b5

21 2月, 2022 2 次提交

bonding: add new parameter ns_targets · 4e24be01

由 Hangbin Liu 提交于 2月 21, 2022

Add a new bonding parameter ns_targets to store IPv6 address.
Add required bond_ns_send/rcv functions first before adding
IPv6 address option setting.

Add two functions bond_send/rcv_validate so we can send/recv
ARP and NS at the same time.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e24be01

Bonding: split bond_handle_vlan from bond_arp_send · 1fcd5d44

由 Hangbin Liu 提交于 2月 21, 2022

Function bond_handle_vlan() is split from bond_arp_send() for later
IPv6 usage.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fcd5d44

18 2月, 2022 1 次提交

bonding: force carrier update when releasing slave · a6ab75ce

由 Zhang Changzhong 提交于 2月 16, 2022

In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.

Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.

Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0

Fixes: ff59c456 ("[PATCH] bonding: support carrier state for master")
Signed-off-by: NZhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a6ab75ce

09 2月, 2022 1 次提交

bonding: switch bond_net_exit() to batch mode · 16a41634

由 Eric Dumazet 提交于 2月 07, 2022

cleanup_net() is competing with other rtnl users.

Batching bond_net_exit() factorizes all rtnl acquistions
to a single one, giving chance for cleanup_net()
to progress much faster, holding rtnl a bit longer.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

16a41634

24 1月, 2022 1 次提交

bonding: use rcu_dereference_rtnl when get bonding active slave · aa603467

由 Hangbin Liu 提交于 1月 21, 2022

bond_option_active_slave_get_rcu() should not be used in rtnl_mutex as it
use rcu_dereference(). Replace to rcu_dereference_rtnl() so we also can use
this function in rtnl protected context.

With this update, we can rmeove the rcu_read_lock/unlock in
bonding .ndo_eth_ioctl and .get_ts_info.
Reported-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Fixes: 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa603467

17 1月, 2022 1 次提交

bonding: Fix extraction of ports from the packet headers · 429e3d12

由 Moshe Tal 提交于 1月 16, 2022

Wrong hash sends single stream to multiple output interfaces.

The offset calculation was relative to skb->head, fix it to be relative
to skb->data.

Fixes: a815bde5 ("net, bonding: Refactor bond_xmit_hash for use with
xdp_buff")
Reviewed-by: NJussi Maki <joamaki@gmail.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>
Reviewed-by: NGal Pressman <gal@nvidia.com>
Signed-off-by: NMoshe Tal <moshet@nvidia.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

429e3d12

12 1月, 2022 1 次提交

net: bonding: fix bond_xmit_broadcast return value error bug · 4e5bd03a

由 Jie Wang 提交于 1月 12, 2022

In Linux bonding scenario, one packet is copied to several copies and sent
by all slave device of bond0 in mode 3(broadcast mode). The mode 3 xmit
function bond_xmit_broadcast() only ueses the last slave device's tx result
as the final result. In this case, if the last slave device is down, then
it always return NET_XMIT_DROP, even though the other slave devices xmit
success. It may cause the tx statistics error, and cause the application
(e.g. scp) consider the network is unreachable.

For example, use the following command to configure server A.

echo 3 > /sys/class/net/bond0/bonding/mode
ifconfig bond0 up
ifenslave bond0 eth0 eth1
ifconfig bond0 192.168.1.125
ifconfig eth0 up
ifconfig eth1 down
The slave device eth0 and eth1 are connected to server B(192.168.1.107).
Run the ping 192.168.1.107 -c 3 -i 0.2 command, the following information
is displayed.

PING 192.168.1.107 (192.168.1.107) 56(84) bytes of data.
64 bytes from 192.168.1.107: icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from 192.168.1.107: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 192.168.1.107: icmp_seq=3 ttl=64 time=0.051 ms

 192.168.1.107 ping statistics
0 packets transmitted, 3 received

Actually, the slave device eth0 of the bond successfully sends three
ICMP packets, but the result shows that 0 packets are transmitted.

Also if we use scp command to get remote files, the command end with the
following printings.

ssh_exchange_identification: read: Connection timed out

So this patch modifies the bond_xmit_broadcast to return NET_XMIT_SUCCESS
if one slave device in the bond sends packets successfully. If all slave
devices send packets fail, the discarded packets stats is increased. The
skb is released when there is no slave device in the bond or the last slave
device is down.

Fixes: ae46f184 ("bonding: propagate transmit status")
Signed-off-by: NJie Wang <wangjie125@huawei.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e5bd03a

30 12月, 2021 2 次提交

Bonding: return HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify user space · cfe355c5

由 Hangbin Liu 提交于 12月 29, 2021

If the userspace program is distributed in binary form (distro package),
there is no way to know on which kernel versions it will run.

Let's only check if the flag was set when do SIOCSHWTSTAMP. And return
hwtstamp_config with flag HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify
userspace whether the new feature is supported or not.
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Fixes: 085d6100 ("Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

cfe355c5

net: Don't include filter.h from net/sock.h · b6459415

由 Jakub Kicinski 提交于 12月 28, 2021

sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: NStefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org

b6459415

14 12月, 2021 1 次提交

Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP · 085d6100

由 Hangbin Liu 提交于 12月 10, 2021

When there is a failover, the PHC index of bond active interface will be
changed. This may break the user space program if the author didn't aware.

By setting this flag, the user should aware that the PHC index get/set
by syscall is not stable. And the user space is able to deal with it.
Without this flag, the kernel will reject the request forwarding to
bonding.
Reported-by: NJakub Kicinski <kuba@kernel.org>
Fixes: 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

085d6100

13 12月, 2021 1 次提交

net: bonding: debug: avoid printing debug logs when bond is not notifying peers · fee32de2

由 Suresh Kumar 提交于 12月 13, 2021

Currently "bond_should_notify_peers: slave ..." messages are printed whenever
"bond_should_notify_peers" function is called.

+++
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Received LACPDU on port 1
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Rx Machine: Port=1, Last State=6, Curr State=6
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): partner sync=1
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
...
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Received LACPDU on port 2
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Rx Machine: Port=2, Last State=6, Curr State=6
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): partner sync=1
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
+++

This is confusing and can also clutter up debug logs.
Print logs only when the peer notification happens.
Signed-off-by: NSuresh Kumar <suresh2514@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fee32de2

30 11月, 2021 2 次提交

bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device · 94dd016a

由 Hangbin Liu 提交于 11月 30, 2021

We have VLAN PTP support(via get_ts_info) on kernel, and bond support(by
getting active interface via netlink message) on userspace tool linuxptp.
But there are always some users who want to use PTP with VLAN over bond,
which is not able to do with the current implementation.

This patch passed get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device
with bond mode active-backup/tlb/alb. With this users could get kernel native
bond or VLAN over bond PTP support.

Test with ptp4l and it works with VLAN over bond after this patch:
]# ptp4l -m -i bond0.23
ptp4l[53377.141]: selected /dev/ptp4 as PTP clock
ptp4l[53377.142]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53384.127]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[53384.127]: selected local clock e41d2d.fffe.123db0 as best master
ptp4l[53384.127]: port 1: assuming the grand master role
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94dd016a

Bonding: add arp_missed_max option · 5944b5ab

由 Hangbin Liu 提交于 11月 30, 2021

Currently, we use hard code number to verify if we are in the
arp_interval timeslice. But some user may want to reduce/extend
the verify timeslice. With the similar team option 'missed_max'
the uers could change that number based on their own environment.
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5944b5ab

22 11月, 2021 1 次提交

net: annotate accesses to dev->gso_max_segs · 6d872df3

由 Eric Dumazet 提交于 11月 19, 2021

dev->gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d872df3

24 10月, 2021 1 次提交

net: bonding: constify and use dev_addr_set() · 6f238100

由 Jakub Kicinski 提交于 10月 22, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Make sure local references to netdev->dev_addr are constant.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f238100

09 10月, 2021 1 次提交

net: use dev_addr_set() · ea52a0b5

由 Jakub Kicinski 提交于 10月 08, 2021

Use dev_addr_set() instead of writing directly to netdev->dev_addr
in various misc and old drivers.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea52a0b5

06 9月, 2021 1 次提交

bonding: Fix negative jump label count on nested bonding · 6d5f1ef8

由 Jussi Maki 提交于 9月 06, 2021

With nested bonding devices the nested bond device's ndo_bpf was
called without a program causing it to decrement the static key
without a prior increment leading to negative count.

Fix the issue by 1) only calling slave's ndo_bpf when there's a
program to be loaded and 2) only decrement the count when a program
is unloaded.

Fixes: 9e2ee5c7 ("net, bonding: Add XDP support to the bonding driver")
Reported-by: syzbot+30622fb04ddd72a4d167@syzkaller.appspotmail.com
Signed-off-by: NJussi Maki <joamaki@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d5f1ef8

05 9月, 2021 1 次提交

bonding: complain about missing route only once for A/B ARP probes · 0a4fd8df

由 David Decotigny 提交于 9月 03, 2021

On configs where there is no confirgured direct route to the target of
the ARP probes, these probes are still sent and may be replied to
properly, so no need to repeatedly complain about the missing route.
Signed-off-by: NDavid Decotigny <ddecotig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a4fd8df

16 8月, 2021 1 次提交

bonding: improve nl error msg when device can't be enslaved because of IFF_MASTER · 1b3f78df

由 Antoine Tenart 提交于 8月 16, 2021

Use a more user friendly netlink error message when a device can't be
enslaved because it has IFF_MASTER, by not referring directly to a
kernel internal flag.
Signed-off-by: NAntoine Tenart <atenart@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b3f78df

14 8月, 2021 1 次提交

net, bonding: Disallow vlan+srcmac with XDP · 39a0876d

由 Jussi Maki 提交于 8月 12, 2021

The new vlan+srcmac xmit policy is not implementable with XDP since
in many cases the 802.1Q payload is not present in the packet. This
can be for example due to hardware offload or in the case of veth
due to use of skbuffs internally.

This also fixes the NULL deref with the vlan+srcmac xmit policy
reported by Jonathan Toppins by additionally checking the skb
pointer.

Fixes: a815bde5 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff")
Reported-by: NJonathan Toppins <jtoppins@redhat.com>
Signed-off-by: NJussi Maki <joamaki@gmail.com>
Reviewed-by: NJonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

39a0876d

12 8月, 2021 1 次提交

bonding: combine netlink and console error messages · 6569fa2d

由 Jonathan Toppins 提交于 8月 10, 2021

There seems to be no reason to have different error messages between
netlink and printk. It also cleans up the function slightly.
Signed-off-by: NJonathan Toppins <jtoppins@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

6569fa2d

10 8月, 2021 2 次提交

net, bonding: Add XDP support to the bonding driver · 9e2ee5c7

由 Jussi Maki 提交于 7月 31, 2021

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when a XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring. The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters. An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10". On top
of that, for ice, further work is in progress on improving the XDP_TX
numbers [4].

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, e.g. the packets were sent to a range of
destination IPs.

  [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
  [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
  [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
  [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#tSigned-off-by: NJussi Maki <joamaki@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com

9e2ee5c7

net, bonding: Refactor bond_xmit_hash for use with xdp_buff · a815bde5

由 Jussi Maki 提交于 7月 31, 2021

In preparation for adding XDP support to the bonding driver
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.
Signed-off-by: NJussi Maki <joamaki@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com

a815bde5

03 8月, 2021 1 次提交

bonding: add new option lacp_active · 3a755cd8

由 Hangbin Liu 提交于 8月 02, 2021

Add an option lacp_active, which is similar with team's runner.active.
This option specifies whether to send LACPDU frames periodically. If set
on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".

Note, the LACPDU state frames still will be sent when init or unbind port.

v2: remove module parameter
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a755cd8

02 8月, 2021 1 次提交

bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() · 220ade77

由 Yufeng Mo 提交于 7月 30, 2021

Some time ago, I reported a calltrace issue
"did not find a suitable aggregator", please see[1].
After a period of analysis and reproduction, I find
that this problem is caused by concurrency.

Before the problem occurs, the bond structure is like follows:

bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                      \
                        port0
      \
        slaver1(eth1) - agg1.lag_ports -> NULL
                      \
                        port1

If we run 'ifenslave bond0 -d eth1', the process is like below:

excuting __bond_release_one()
|
bond_upper_dev_unlink()[step1]
|                       |                       |
|                       |                       bond_3ad_lacpdu_recv()
|                       |                       ->bond_3ad_rx_indication()
|                       |                       spin_lock_bh()
|                       |                       ->ad_rx_machine()
|                       |                       ->__record_pdu()[step2]
|                       |                       spin_unlock_bh()
|                       |                       |
|                       bond_3ad_state_machine_handler()
|                       spin_lock_bh()
|                       ->ad_port_selection_logic()
|                       ->try to find free aggregator[step3]
|                       ->try to find suitable aggregator[step4]
|                       ->did not find a suitable aggregator[step5]
|                       spin_unlock_bh()
|                       |
|                       |
bond_3ad_unbind_slave() |
spin_lock_bh()
spin_unlock_bh()

step1: already removed slaver1(eth1) from list, but port1 remains
step2: receive a lacpdu and update port0
step3: port0 will be removed from agg0.lag_ports. The struct is
       "agg0.lag_ports -> port1" now, and agg0 is not free. At the
	   same time, slaver1/agg1 has been removed from the list by step1.
	   So we can't find a free aggregator now.
step4: can't find suitable aggregator because of step2
step5: cause a calltrace since port->aggregator is NULL

To solve this concurrency problem, put bond_upper_dev_unlink()
after bond_3ad_unbind_slave(). In this way, we can invalid the port
first and skip this port in bond_3ad_state_machine_handler(). This
eliminates the situation that the slaver has been removed from the
list but the port is still valid.

[1]https://lore.kernel.org/netdev/10374.1611947473@famine/Signed-off-by: NYufeng Mo <moyufeng@huawei.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

220ade77

28 7月, 2021 2 次提交

net: bonding: move ioctl handling to private ndo operation · 3d9d00bd

由 Arnd Bergmann 提交于 7月 27, 2021

All other user triggered operations are gone from ndo_ioctl, so move
the SIOCBOND family into a custom operation as well.

The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now,
but there are still a few definitions in obsolete wireless drivers as well
as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR
helpers from inside the kernel.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d9d00bd

dev_ioctl: split out ndo_eth_ioctl · a7605370

由 Arnd Bergmann 提交于 7月 27, 2021

Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.

Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.

This is a purely cosmetic change intended to help readers find
their way through the implementation.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7605370

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功