提交 · e79c1055749e3183a2beee04a24da378623329c5 · openanolis / cloud-kernel

17 5月, 2018 1 次提交

bonding: allow use of tx hashing in balance-alb · e79c1055

由 Debabrata Banerjee 提交于 5月 14, 2018

The rx load balancing provided by balance-alb is not mutually
exclusive with using hashing for tx selection, and should provide a decent
speed increase because this eliminates spinlocks and cache contention.
Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e79c1055

12 5月, 2018 2 次提交

bonding: use the skb_get/set_queue_mapping · ae35c6f7

由 Tonghao Zhang 提交于 5月 11, 2018

Use the skb_get_queue_mapping, skb_set_queue_mapping
and skb_rx_queue_recorded for skb queue_mapping in bonding
driver, but not use it directly.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae35c6f7

bonding: replace the return value type · dbdc8a21

由 Tonghao Zhang 提交于 5月 11, 2018

The method ndo_start_xmit is defined as returning a
netdev_tx_t, which is a typedef for an enum type,
but the implementation in this driver returns an int.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dbdc8a21

11 5月, 2018 1 次提交

bonding: send learning packets for vlans on slave · 21706ee8

由 Debabrata Banerjee 提交于 5月 09, 2018

There was a regression at some point from the intended functionality of
commit f60c3704 ("bonding: Fix alb mode to only use first level
vlans.")

Given the return value vlan_get_encap_level() we need to store the nest
level of the bond device, and then compare the vlan's encap level to
this. Without this, this check always fails and learning packets are
never sent.

In addition, this same commit caused a regression in the behavior of
balance_alb, which requires learning packets be sent for all interfaces
using the slave's mac in order to load balance properly. For vlan's
that have not set a user mac, we can send after checking one bit.
Otherwise we need send the set mac, albeit defeating rx load balancing
for that vlan.
Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21706ee8

23 4月, 2018 1 次提交

bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave · ddea788c

由 Xin Long 提交于 4月 22, 2018

After Commit 8a8efa22 ("bonding: sync netpoll code with bridge"), it
would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
if bond->dev->npinfo was set.

However now slave_dev npinfo is set with bond->dev->npinfo before calling
slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
It causes that the lower dev of this slave dev can't set its npinfo.

One way to reproduce it:

  # modprobe bonding
  # brctl addbr br0
  # brctl addif br0 eth1
  # ifconfig bond0 192.168.122.1/24 up
  # ifenslave bond0 eth2
  # systemctl restart netconsole
  # ifenslave bond0 br0
  # ifconfig eth2 down
  # systemctl restart netconsole

The netpoll won't really work.

This patch is to remove that slave_dev npinfo setting in bond_enslave().

Fixes: 8a8efa22 ("bonding: sync netpoll code with bridge")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddea788c

28 3月, 2018 1 次提交

net: Drop pernet_operations::async · 2f635cee

由 Kirill Tkhai 提交于 3月 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f635cee

27 3月, 2018 3 次提交

bonding: process the err returned by dev_set_allmulti properly in bond_enslave · 9f5a90c1

由 Xin Long 提交于 3月 26, 2018

When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
dev_set_promiscuity(-1) should be done before going to the err path.
Otherwise, dev->promiscuity will leak.

Fixes: 7e1a1ac1 ("bonding: Check return of dev_set_promiscuity/allmulti")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f5a90c1

bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave · ae42cc62

由 Xin Long 提交于 3月 26, 2018

Beniamino found a crash when adding vlan as slave of bond which is also
the parent link:

  ip link add bond1 type bond
  ip link set bond1 up
  ip link add link bond1 vlan1 type vlan id 80
  ip link set vlan1 master bond1

The call trace is as below:

  [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
  [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
  [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
  [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
  [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
  [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
  [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
  [<ffffffffa8401909>] do_setlink+0x9c9/0xe50
  [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
  [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
  [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
  [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
  [<ffffffffa8424850>] netlink_unicast+0x170/0x210
  [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
  [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0

This is actually a dead lock caused by sync slave hwaddr from master when
the master is the slave's 'slave'. This dead loop check is actually done
by netdev_master_upper_dev_link. However, Commit 1f718f0f ("bonding:
populate neighbour's private on enslave") moved it after dev_mc_sync.

This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
so that this loop check would be earlier than dev_mc_sync. It also moves
if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
improvement.

Note team driver also has this issue, I will fix it in another patch.

Fixes: 1f718f0f ("bonding: populate neighbour's private on enslave")
Reported-by: NBeniamino Galvani <bgalvani@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae42cc62

bonding: fix the err path for dev hwaddr sync in bond_enslave · 5c78f6bf

由 Xin Long 提交于 3月 26, 2018

vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.

Fixes: 1ff412ad ("bonding: change the bond's vlan syncing functions with the standard ones")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c78f6bf

28 2月, 2018 1 次提交

net: Convert bond_net_ops · 6963ad69

由 Kirill Tkhai 提交于 2月 26, 2018

These pernet_operations populate/depopulate /proc and /sys
entries. Exit method unregisters all net bond devices, and
it seems another pernet_operations are not interested in
foreign net bond list. So, it's possible to mark them async.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6963ad69

08 11月, 2017 2 次提交

bonding: fix slave stuck in BOND_LINK_FAIL state · 055db695

由 Jay Vosburgh 提交于 11月 07, 2017

The bonding miimon logic has a flaw, in that a failure of the
rtnl_trylock can cause a slave to become permanently stuck in
BOND_LINK_FAIL state.

	The sequence of events to cause this is as follows:

	1) bond_miimon_inspect finds that a slave's link is down, and so
calls bond_propose_link_state, setting slave->new_link_state to
BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns
non-zero.

	2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is
rescheduled.  No change is committed.

	3) bond_miimon_inspect is called again, but this time the slave
from step 1 has recovered.  slave->new_link is reset to NOCHANGE, and, as
slave->link was never changed, the switch enters the BOND_LINK_UP case,
and does nothing.  The pending BOND_LINK_FAIL state from step 1 remains
pending, as new_link_state is not reset.

	4) The state from step 3 persists until another slave changes link
state and causes bond_miimon_inspect to return non-zero.  At this point,
the BOND_LINK_FAIL state change on the slave from steps 1-3 is committed,
and the slave will remain stuck in BOND_LINK_FAIL state even though it
is actually link up.

	The remedy for this is to initialize new_link_state on each entry
to bond_miimon_inspect, as is already done with new_link.

Fixes: fb9eb899 ("bonding: handle link transition from FAIL to UP correctly")
Reported-by: NAlex Sidorenko <alexandre.sidorenko@hpe.com>
Reviewed-by: NJarod Wilson <jarod@redhat.com>
Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

055db695

bonding: discard lowest hash bit for 802.3ad layer3+4 · b5f86218

由 Hangbin Liu 提交于 11月 06, 2017

After commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), we will try to use even ports for connect(). Then if an
application (seen clearly with iperf) opens multiple streams to the same
destination IP and port, each stream will be given an even source port.

So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
will always hash all these streams to the same interface. And the total
throughput will limited to a single slave.

Change the tcp code will impact the whole tcp behavior, only for bonding
usage. Paolo Abeni suggested fix this by changing the bonding code only,
which should be more reasonable, and less impact.

Fix this by discarding the lowest hash bit because it contains little entropy.
After the fix we can re-balance between slaves.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5f86218

25 10月, 2017 2 次提交

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05

由 Mark Rutland 提交于 10月 23, 2017

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6aa7de05

bonding: remove rtmsg_ifinfo called in bond_master_upper_dev_link · 4597efe3

由 Xin Long 提交于 10月 24, 2017

Since commit 42e52bf9 ("net: add netnotifier event for upper device
change"), netdev_master_upper_dev_link has generated NETDEV_CHANGEUPPER
event which would send a notification to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification any more.
So this patch is to remove it from bond_master_upper_dev_link as well
as bond_upper_dev_unlink to avoid the redundant notifications.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4597efe3

05 10月, 2017 3 次提交

net: bonding: Add extack messages for some enslave failures · 759088bd

由 David Ahern 提交于 10月 04, 2017

A number of bond_enslave errors are logged using the netdev_err API.
Return those messages to userspace via the extack facility.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

759088bd

net: Add extack to upper device linking · 42ab19ee

由 David Ahern 提交于 10月 04, 2017

Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42ab19ee

net: Add extack to ndo_add_slave · 33eaf2a6

由 David Ahern 提交于 10月 04, 2017

Pass extack to do_set_master and down to ndo_add_slave
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33eaf2a6

04 10月, 2017 1 次提交

bonding: speed/duplex update at NETDEV_UP event · 4d2c0cda

由 Mahesh Bandewar 提交于 9月 27, 2017

Some NIC drivers don't have correct speed/duplex settings at the
time they send NETDEV_UP notification and that messes up the
bonding state. Especially 802.3ad mode which is very sensitive
to these settings. In the current implementation we invoke
bond_update_speed_duplex() when we receive NETDEV_UP, however,
ignore the return value. If the values we get are invalid
(UNKNOWN), then slave gets removed from the aggregator with
speed and duplex set to UNKNOWN while link is still marked as UP.

This patch fixes this scenario. Also 802.3ad mode is sensitive to
these conditions while other modes are not, so making sure that it
doesn't change the behavior for other modes.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d2c0cda

29 9月, 2017 1 次提交

arp: make arp_hdr_len() return unsigned int · 6ade97da

由 Alexey Dobriyan 提交于 9月 26, 2017

Negative ARP header length are not a thing.

Constify arguments while I'm at it.

Space savings:

	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3)
	function                        old     new   delta
	arpt_do_table                  1163    1160      -3
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ade97da

13 9月, 2017 1 次提交

net: bonding: fix tlb_dynamic_lb default value · f13ad104

由 Nikolay Aleksandrov 提交于 9月 12, 2017

Commit 8b426dc5 ("bonding: remove hardcoded value") changed the
default value for tlb_dynamic_lb which lead to either broken ALB mode
(since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode
with tlb_dynamic_lb equal to 0.
The first issue was recently fixed by setting tlb_dynamic_lb to 1 always
when switching to ALB mode, but the default value is still wrong and
we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is
changed via netlink or sysfs. In order to restore the previous behaviour
and default value simply remove the mode check around the default param
initialization for tlb_dynamic_lb which will always set it to 1 as
before.

Fixes: 8b426dc5 ("bonding: remove hardcoded value")
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f13ad104

14 8月, 2017 1 次提交

bonding: ratelimit failed speed/duplex update warning · 11e9d782

由 Andreas Born 提交于 8月 12, 2017

bond_miimon_commit() handles the UP transition for each slave of a bond
in the case of MII. It is triggered 10 times per second for the default
MII Polling interval of 100ms. For device drivers that do not implement
__ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
fails persistently while the MII status could remain UP. That is, in
this and other cases where the speed/duplex update keeps failing over a
longer period of time while the MII state is UP, a warning is printed
every MII polling interval.

To address these excessive warnings net_ratelimit() should be used.
Printing a warning once would not be sufficient since the call to
bond_update_speed_duplex() could recover to succeed and fail again
later. In that case there would be no new indication what went wrong.

Fixes: b5bf0f5b (bonding: correctly update link status during mii-commit phase)
Signed-off-by: NAndreas Born <futur.andy@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11e9d782

12 8月, 2017 1 次提交

bonding: require speed/duplex only for 802.3ad, alb and tlb · ad729bc9

由 Andreas Born 提交于 8月 10, 2017

The patch c4adfc82 ("bonding: make speed, duplex setting consistent
with link state") puts the link state to down if
bond_update_speed_duplex() cannot retrieve speed and duplex settings.
Assumably the patch was written with 802.3ad mode in mind which relies
on link speed/duplex settings. For other modes like active-backup these
settings are not required. Thus, only for these other modes, this patch
reintroduces support for slaves that do not support reporting speed or
duplex such as wireless devices. This fixes the regression reported in
bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).

Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: NAndreas Born <futur.andy@googlemail.com>
Acked-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad729bc9

27 7月, 2017 1 次提交

bonding: commit link status change after propose · d94708a5

由 WANG Cong 提交于 7月 25, 2017

Commit de77ecd4 ("bonding: improve link-status update in mii-monitoring")
moves link status commitment into bond_mii_monitor(), but it still relies
on the return value of bond_miimon_inspect() as the hint. We need to return
non-zero as long as we propose a link status change.

Fixes: de77ecd4 ("bonding: improve link-status update in mii-monitoring")
Reported-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com>
Tested-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d94708a5

21 7月, 2017 1 次提交

net: bonding: Fix transmit load balancing in balance-alb mode · cbf5ecb3

由 Kosuke Tatsukawa 提交于 7月 20, 2017

balance-alb mode used to have transmit dynamic load balancing feature
enabled by default.  However, transmit dynamic load balancing no longer
works in balance-alb after commit 8b426dc5 ("bonding: remove
hardcoded value").

Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
send packets.  This function uses the parameter tlb_dynamic_lb.
tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
now the value is set to 0 except in balance-tlb.

Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
for balance-alb similar to balance-tlb.

Fixes: 8b426dc5 ("bonding: remove hardcoded value")
Signed-off-by: NKosuke Tatsukawa <tatsu@ab.jp.nec.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbf5ecb3

08 7月, 2017 1 次提交

bonding: avoid NETDEV_CHANGEMTU event when unregistering slave · f51048c3

由 WANG Cong 提交于 7月 06, 2017

As Hongjun/Nicolas summarized in their original patch:

"
When a device changes from one netns to another, it's first unregistered,
then the netns reference is updated and the dev is registered in the new
netns. Thus, when a slave moves to another netns, it is first
unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
the bonding driver. The driver calls bond_release(), which calls
dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
the old netns).
"

This is a very special case, because the device is being unregistered
no one should still care about the NETDEV_CHANGEMTU event triggered
at this point, we can avoid broadcasting this event on this path,
and avoid touching inetdev_event()/addrconf_notify() path.

It requires to export __dev_set_mtu() to bonding driver.
Reported-by: NHongjun Li <hongjun.li@6wind.com>
Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f51048c3

09 6月, 2017 1 次提交

net: Remove support for bridge bypass ndos from stacked devices · eca59f69

由 Arkadi Sharshevsky 提交于 6月 08, 2017

Remove support for bridge bypass ndos from stacked devices. At this point
no driver which supports stack device behavior offload supports operation
with SELF flag. The case for upper device is already taken care of in both
of the following cases:

1. FDB add/del - driver should check at the notification cb if the
                 stacked device contains his ports.

2. Port attribute - calls switchdev code directly which checks
                    for case of stack device.
Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eca59f69

08 6月, 2017 1 次提交

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

由 David S. Miller 提交于 5月 08, 2017

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf124db5

28 5月, 2017 1 次提交

bonding: Prevent duplicate userspace notification · 7a7e96e0

由 Vlad Yasevich 提交于 5月 27, 2017

Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
notificatin is generated which results in a rtnelink message to
be sent.  While runnig 'ip monitor', we can actually see 2 messages,
one a result of the event, and the other a result of state change
that is generated bo netdev_state_change().  However, this is not
always the case. If bonding changes were done via sysfs or ifenslave
(old ioctl interface), then only 1 message is seen.

This patch removes duplicate messages in the case of using netlink
to configure bonding.  It introduceds a separte function that
triggers a netdev event and uses that function in the syfs and ioctl
cases.

This was discovered while auditing all the different envents and
continues the effort of cleaning up duplicated netlink messages.

CC: David Ahern <dsa@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a7e96e0

26 5月, 2017 1 次提交

bonding: Don't update slave->link until ready to commit · 797a9364

由 Nithin Sujir 提交于 5月 24, 2017

In the loadbalance arp monitoring scheme, when a slave link change is
detected, the slave->link is immediately updated and slave_state_changed
is set. Later down the function, the rtnl_lock is acquired and the
changes are committed, updating the bond link state.

However, the acquisition of the rtnl_lock can fail. The next time the
monitor runs, since slave->link is already updated, it determines that
link is unchanged. This results in the bond link state permanently out
of sync with the slave link.

This patch modifies bond_loadbalance_arp_mon() to handle link changes
identical to bond_ab_arp_{inspect/commit}(). The new link state is
maintained in slave->new_link until we're ready to commit at which point
it's copied into slave->link.

NOTE: miimon_{inspect/commit}() has a more complex state machine
requiring the use of the bond_{propose,commit}_link_state() functions
which maintains the intermediate state in slave->link_new_state. The arp
monitors don't require that.

Testing: This bug is very easy to reproduce with the following steps.
1. In a loop, toggle a slave link of a bond slave interface.
2. In a separate loop, do ifconfig up/down of an unrelated interface to
create contention for rtnl_lock.
Within a few iterations, the bond link goes out of sync with the slave
link.
Signed-off-by: NNithin Nayak Sujir <nsujir@tintri.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

797a9364

23 5月, 2017 1 次提交

bonding: fix randomly populated arp target array · 72ccc471

由 Jarod Wilson 提交于 5月 19, 2017

In commit dc9c4d0f, the arp_target array moved from a static global
to a local variable. By the nature of static globals, the array used to
be initialized to all 0. At present, it's full of random data, which
that gets interpreted as arp_target values, when none have actually been
specified. Systems end up booting with spew along these lines:

[   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
[   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[   32.204892] lacp0: Setting MII monitoring interval to 100
[   32.211071] lacp0: Removing ARP target 216.124.228.17
[   32.216824] lacp0: Removing ARP target 218.160.255.255
[   32.222646] lacp0: Removing ARP target 185.170.136.184
[   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
[   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
[   32.243987] lacp0: Removing ARP target 56.125.228.17
[   32.249625] lacp0: Removing ARP target 218.160.255.255
[   32.255432] lacp0: Removing ARP target 15.157.233.184
[   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
[   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
[   32.276632] lacp0: Removing ARP target 16.0.0.0
[   32.281755] lacp0: Removing ARP target 218.160.255.255
[   32.287567] lacp0: Removing ARP target 72.125.228.17
[   32.293165] lacp0: Removing ARP target 218.160.255.255
[   32.298970] lacp0: Removing ARP target 8.125.228.17
[   32.304458] lacp0: Removing ARP target 218.160.255.255

None of these were actually specified as ARP targets, and the driver does
seem to clean up the mess okay, but it's rather noisy and confusing, leaks
values to userspace, and the 255.255.255.255 spew shows up even when debug
prints are disabled.

The fix: just zero out arp_target at init time.

While we're in here, init arp_all_targets_value in the right place.

Fixes: dc9c4d0f ("bonding: reduce scope of some global variables")
CC: Mahesh Bandewar <maheshb@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: NJarod Wilson <jarod@redhat.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72ccc471

29 4月, 2017 1 次提交

bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3

由 Paolo Abeni 提交于 4月 27, 2017

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
	ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
	skb_push+0x3d/0x40
	push_pseudo_header+0x17/0x30 [ib_ipoib]
	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
	arp_create+0x12f/0x220
	arp_send_dst.part.19+0x28/0x50
	arp_solicit+0x115/0x290
	neigh_probe+0x4d/0x70
	__neigh_event_send+0xa7/0x230
	neigh_resolve_output+0x12e/0x1c0
	ip_finish_output2+0x14b/0x390
	ip_finish_output+0x136/0x1e0
	ip_output+0x76/0xe0
	ip_local_out+0x35/0x40
	ip_send_skb+0x19/0x40
	ip_push_pending_frames+0x33/0x40
	raw_sendmsg+0x7d3/0xb50
	inet_sendmsg+0x31/0xb0
	sock_sendmsg+0x38/0x50
	SYSC_sendto+0x102/0x190
	SyS_sendto+0xe/0x10
	do_syscall_64+0x67/0x180
	entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
header").
Reported-by: NNorbert P <noe@physik.uzh.ch>
Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19cdead3

22 4月, 2017 1 次提交

bonding: fix wq initialization for links created via netlink · ea8ffc08

由 Mahesh Bandewar 提交于 4月 20, 2017

Earlier patch 4493b81b ("bonding: initialize work-queues during
creation of bond") moved the work-queue initialization from bond_open()
to bond_create(). However this caused the link those are created using
netlink 'create bond option' (ip link add bondX type bond); create the
new trunk without initializing work-queues. Prior to the above mentioned
change, ndo_open was in both paths and things worked correctly. The
consequence is visible in the report shared by Joe Stringer -

I've noticed that this patch breaks bonding within namespaces if
you're not careful to perform device cleanup correctly.

Here's my repro script, you can run on any net-next with this patch
and you'll start seeing some weird behaviour:

ip netns add foo
ip li add veth0 type veth peer name veth0+ netns foo
ip li add veth1 type veth peer name veth1+ netns foo
ip netns exec foo ip li add bond0 type bond
ip netns exec foo ip li set dev veth0+ master bond0
ip netns exec foo ip li set dev veth1+ master bond0
ip netns exec foo ip addr add dev bond0 192.168.0.1/24
ip netns exec foo ip li set dev bond0 up
ip li del dev veth0
ip li del dev veth1

The second to last command segfaults, last command hangs. rtnl is now
permanently locked. It's not a problem if you take bond0 down before
deleting veths, or delete bond0 before deleting veths. If you delete
either end of the veth pair as per above, either inside or outside the
namespace, it hits this problem.

Here's some kernel logs:
[ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
[ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
[ 1281.193863] bond0: Releasing backup interface veth0+
[ 1281.193866] bond0: the permanent HWaddr of veth0+ -
16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
veth0+ to a different address to avoid conflicts
[ 1281.193867] ------------[ cut here ]------------
[ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
__queue_delayed_work+0x13f/0x150
[ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
hid mptspi mptscsih e1000 mptbase ahci libahci
[ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G        W
4.10.0-bisect-bond-v0.14 #37
[ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 1281.193906] Call Trace:
[ 1281.193912]  dump_stack+0x63/0x89
[ 1281.193915]  __warn+0xd1/0xf0
[ 1281.193917]  warn_slowpath_null+0x1d/0x20
[ 1281.193918]  __queue_delayed_work+0x13f/0x150
[ 1281.193920]  queue_delayed_work_on+0x27/0x40
[ 1281.193929]  bond_change_active_slave+0x25b/0x670 [bonding]
[ 1281.193932]  ? synchronize_rcu_expedited+0x27/0x30
[ 1281.193935]  __bond_release_one+0x489/0x510 [bonding]
[ 1281.193939]  ? addrconf_notify+0x1b7/0xab0
[ 1281.193942]  bond_netdev_event+0x2c5/0x2e0 [bonding]
[ 1281.193944]  ? netconsole_netdev_event+0x124/0x190 [netconsole]
[ 1281.193947]  notifier_call_chain+0x49/0x70
[ 1281.193948]  raw_notifier_call_chain+0x16/0x20
[ 1281.193950]  call_netdevice_notifiers_info+0x35/0x60
[ 1281.193951]  rollback_registered_many+0x23b/0x3e0
[ 1281.193953]  unregister_netdevice_many+0x24/0xd0
[ 1281.193955]  rtnl_delete_link+0x3c/0x50
[ 1281.193956]  rtnl_dellink+0x8d/0x1b0
[ 1281.193960]  rtnetlink_rcv_msg+0x95/0x220
[ 1281.193962]  ? __kmalloc_node_track_caller+0x35/0x280
[ 1281.193964]  ? __netlink_lookup+0xf1/0x110
[ 1281.193966]  ? rtnl_newlink+0x830/0x830
[ 1281.193967]  netlink_rcv_skb+0xa7/0xc0
[ 1281.193969]  rtnetlink_rcv+0x28/0x30
[ 1281.193970]  netlink_unicast+0x15b/0x210
[ 1281.193971]  netlink_sendmsg+0x319/0x390
[ 1281.193974]  sock_sendmsg+0x38/0x50
[ 1281.193975]  ___sys_sendmsg+0x25c/0x270
[ 1281.193978]  ? mem_cgroup_commit_charge+0x76/0xf0
[ 1281.193981]  ? page_add_new_anon_rmap+0x89/0xc0
[ 1281.193984]  ? lru_cache_add_active_or_unevictable+0x35/0xb0
[ 1281.193985]  ? __handle_mm_fault+0x4e9/0x1170
[ 1281.193987]  __sys_sendmsg+0x45/0x80
[ 1281.193989]  SyS_sendmsg+0x12/0x20
[ 1281.193991]  do_syscall_64+0x6e/0x180
[ 1281.193993]  entry_SYSCALL64_slow_path+0x25/0x25
[ 1281.193995] RIP: 0033:0x7f6ec122f5a0
[ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
[ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
[ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
[ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
[ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
[ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---

Fixes: 4493b81b ("bonding: initialize work-queues during creation of bond")
Reported-by: NJoe Stringer <joe@ovn.org>
Tested-by: NJoe Stringer <joe@ovn.org>
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Acked-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea8ffc08

18 4月, 2017 1 次提交

bonding: deliver link-local packets with skb->dev set to link that packets arrived on · b89f04c6

由 Chonggang Li 提交于 4月 16, 2017

Bonding driver changes the skb->dev to the bonding-master before
passing the packet to stack for further processing. This, however
does not make sense for the link-local packets and it loses "the
link info" once its skb->dev is changed to bonding-master.  This
patch changes this behavior for link-local packets by not changing
the skb->dev to the bonding-master and maintaining it as it is,
i.e. the link on which the packet arrived.
Signed-off-by: NChonggang Li <chonggangli@google.com>
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b89f04c6

14 4月, 2017 1 次提交

bonding: handle link transition from FAIL to UP correctly · fb9eb899

由 Mahesh Bandewar 提交于 4月 11, 2017

When link transitions from LINK_FAIL to LINK_UP, the commit phase is
not called. This leads to an erroneous state causing slave-link state to
get stuck in "going down" state while its speed and duplex are perfectly
fine. This issue is a side-effect of splitting link-set into propose and
commit phases introduced by de77ecd4 ("bonding: improve link-status
update in mii-monitoring")

This patch fixes these issues by calling commit phase whenever link
state change is proposed.

Fixes: de77ecd4 ("bonding: improve link-status update in mii-monitoring")
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb9eb899

06 4月, 2017 1 次提交

bonding: attempt to better support longer hw addresses · faeeb317

由 Jarod Wilson 提交于 4月 04, 2017

People are using bonding over Infiniband IPoIB connections, and who knows
what else. Infiniband has a hardware address length of 20 octets
(INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
Various places in the bonding code are currently hard-wired to 6 octets
(ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
only alb is currently possible on Infiniband links right now anyway, due
to commit 1533e773, so the alb code is where most of the changes are.

One major component of this change is the addition of a bond_hw_addr_copy
function that takes a length argument, instead of using ether_addr_copy
everywhere that hardware addresses need to be copied about. The other
major component of this change is converting the bonding code from using
struct sockaddr for address storage to struct sockaddr_storage, as the
former has an address storage space of only 14, while the latter is 128
minus a few, which is necessary to support bonding over device with up to
MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes
up some memory corruption issues with the current code, where it's
possible to write an infiniband hardware address into a sockaddr declared
on the stack.

Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
hardware address now:

$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: mlx4_ib0 (primary_reselect always)
Currently Active Slave: mlx4_ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

Slave Interface: mlx4_ib0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
Slave queue ID: 0

Slave Interface: mlx4_ib1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
Slave queue ID: 0

Also tested with a standard 1Gbps NIC bonding setup (with a mix of
e1000 and e1000e cards), running LNST's bonding tests.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: NJarod Wilson <jarod@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

faeeb317

05 4月, 2017 1 次提交

bonding: fix active-backup transition · 3f3c278c

由 Mahesh Bandewar 提交于 4月 03, 2017

Earlier patch c4adfc82 ("bonding: make speed, duplex setting
consistent with link state") made an attempt to keep slave state
consistent with speed and duplex settings. Unfortunately link-state
transition is used to change the active link especially when used
in conjunction with mii-mon. The above mentioned patch broke that
logic. Also when speed and duplex settings for a link are updated
during a link-event, the link-status should not be changed to
invoke correct transition logic.

This patch fixes this issue by moving the link-state update outside
of the bond_update_speed_duplex() fn and to the places where this fn
is called and update link-state selectively.

Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Reviewed-by: NAndy Gospodarek <andy@greyhouse.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f3c278c

31 3月, 2017 1 次提交

bonding: refine bond_fold_stats() wrap detection · 142c6594

由 Eric Dumazet 提交于 3月 29, 2017

Some device drivers reset their stats at down/up events, possibly
fooling bonding stats, since they operate with relative deltas.

It is nearly not possible to fix drivers, since some of them compute the
tx/rx counters based on per rx/tx queue stats, and the queues can be
reconfigured (ethtool -L) between the down/up sequence.

Lets avoid accumulating 'negative' values that render bonding stats
useless.

It is better to lose small deltas, assuming the bonding stats are
fetched at a reasonable frequency.

Fixes: 5f0c5f73 ("bonding: make global bonding stats more reliable")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

142c6594

28 3月, 2017 3 次提交

bonding: correctly update link status during mii-commit phase · b5bf0f5b

由 Mahesh Bandewar 提交于 3月 27, 2017

bond_miimon_commit() marks the link UP after attempting to get the speed
and duplex settings for the link. There is a possibility that
bond_update_speed_duplex() could fail. This is another place where it
could result into an inconsistent bonding link state.

With this patch the link will be marked UP only if the speed and duplex
values retrieved have sane values and processed further.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5bf0f5b

bonding: make speed, duplex setting consistent with link state · c4adfc82

由 Mahesh Bandewar 提交于 3月 27, 2017

bond_update_speed_duplex() retrieves speed and duplex settings. There
is a possibility of failure in retrieving these values but caller has
to assume it's always successful. This leads to having inconsistent
slave link settings. If these (speed, duplex) values cannot be
retrieved, then keeping the link UP causes problems.

The updated bond_update_speed_duplex() returns 0 on success if it
retrieves sane values for speed and duplex. On failure it returns 1
and marks the link down.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4adfc82

bonding: improve link-status update in mii-monitoring · de77ecd4

由 Mahesh Bandewar 提交于 3月 27, 2017

The primary issue is that mii-inspect phase updates link-state and
expects changes to be committed during the mii-commit phase. After
the inspect phase if it fails to acquire rtnl-mutex, the commit
phase (bond_mii_commit) doesn't get to run. This partially updated
state stays and makes the internal-state inconsistent.

e.g. setup bond0 => slaves: eth1, eth2
eth1 goes DOWN -> UP
   mii_monitor()
	mii-inspect()
	    bond_set_slave_link_state(eth1, UP, DontNotify)
	rtnl_trylock() <- fails!

Next mii-monitor round
eth1: No change
   mii_monitor()
	mii-inspect()
	    eth1->link == current-status (ethtool_ops->get_link)
	    no-change-detected

End result:
    eth1:
      Link = BOND_LINK_UP
      Speed = 0xfffff  [SpeedUnknown]
      Duplex = 0xff    [DuplexUnknown]

This doesn't always happen but for some unlucky machines in a large set
of machines it creates problems.

The fix for this is to avoid making changes during inspect phase and
postpone them until acquiring the rtnl-mutex / invoking commit phase.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de77ecd4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功