1. 12 April 2019, 2 commits
  2. 30 March 2019, 1 commit
  3. 18 January 2019, 1 commit
    • net: Add extack argument to ndo_fdb_add() · 87b0984e
      Committed by Petr Machata
      Drivers may not be able to support certain FDB entries, and an error
      code alone is insufficient to give clear hints as to the reason for
      rejection.
      
      In order to make it possible to communicate the rejection reason, extend
      ndo_fdb_add() with an extack argument. Adapt the existing
      implementations of ndo_fdb_add() to take the parameter (and ignore it).
      Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
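      For reference, a sketch of the resulting callback in struct
      net_device_ops (argument names and order as I recall them; check the
      tree for the exact prototype):
      
          int (*ndo_fdb_add)(struct ndmsg *ndm, struct nlattr *tb[],
                             struct net_device *dev,
                             const unsigned char *addr, u16 vid,
                             u16 flags, struct netlink_ext_ack *extack);
      
      A driver that cannot support a given entry can then report why, e.g.
      with NL_SET_ERR_MSG_MOD(extack, "FDB entry type not supported") before
      returning an error (the message text is illustrative).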
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 09 January 2019, 1 commit
    • net: bridge: Fix VLANs memory leak · 27973793
      Committed by Ido Schimmel
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
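      A minimal sketch of the idea (the flag name follows the description
      above, but treat the exact identifiers as illustrative):
      
          /* On addition: remember that switchdev, not 8021q, accepted it. */
          if (added_by_switchdev)
                  v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
      
          /* On deletion: only go through the 8021q driver if that is how
           * the VLAN was added in the first place.
           */
          if (!(v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV))
                  vlan_vid_del(dev, br->vlan_proto, v->vid);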
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 17 December 2018, 1 commit
  6. 13 December 2018, 2 commits
  7. 06 December 2018, 4 commits
  8. 28 November 2018, 2 commits
    • net: bridge: add no_linklocal_learn bool option · 70e4272b
      Committed by Nikolay Aleksandrov
      Use the new boolopt API to add an option which disables learning from
      link-local packets. The default is kept as before and learning is
      enabled. This is a simple map from a boolopt bit to a bridge private
      flag that is tested before learning.
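      Roughly, the new boolopt maps to a per-bridge option bit which the
      link-local receive path checks before learning (identifiers from
      memory, so treat this as a sketch rather than the exact patch):
      
          case BR_BOOLOPT_NO_LL_LEARN:
                  br_opt_toggle(br, BROPT_NO_LL_LEARN, on);
                  break;
      
          /* in the link-local input path */
          if ((p->flags & BR_LEARNING) &&
              !br_opt_get(p->br, BROPT_NO_LL_LEARN) &&
              br_should_learn(p, skb, &vid))
                  br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid, false);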
      
      v2: pass NULL for extack via sysfs
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add support for user-controlled bool options · a428afe8
      Committed by Nikolay Aleksandrov
      We have been adding many new bridge options, a large number of which
      are boolean but still take up netlink attribute ids and waste space in
      the skb. Recently we discussed learning from link-local packets[1] and
      decided yet another new boolean option will be needed, so this API is
      introduced to save some bridge netlink space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct, which has an optmask (which options to
      set, one bit per option) and an optval (the options' new values).
      Future boolean options will only be added to the br_boolopt_id enum and
      then will have to be handled in br_boolopt_toggle/get. The API
      automatically adds the ability to change and export them via netlink;
      sysfs can use the single-boolopt function versions to do the same. The
      behaviour on failure/success is the same as with normal netlink option
      changes.
      
      If an option requires mapping to an internal kernel flag or needs
      special configuration to be enabled, then it should be handled in
      br_boolopt_toggle. It should also be possible to retrieve an option's
      current state via br_boolopt_get.
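      A sketch of the uapi side and how a caller might fill it in (the struct
      layout and the IFLA_BR_MULTI_BOOLOPT attribute name are as I recall
      them, so treat this as an assumption):
      
          struct br_boolopt_multi {
                  __u32 optval;   /* new values, one bit per option */
                  __u32 optmask;  /* which options to change */
          };
      
          /* e.g. enable a single option such as BR_BOOLOPT_NO_LL_LEARN */
          struct br_boolopt_multi bm = {
                  .optval  = 1 << BR_BOOLOPT_NO_LL_LEARN,
                  .optmask = 1 << BR_BOOLOPT_NO_LL_LEARN,
          };
          /* sent as the IFLA_BR_MULTI_BOOLOPT netlink attribute */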
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.html
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 18 November 2018, 1 commit
    • net: bridge: fix vlan stats use-after-free on destruction · 9d332e69
      Committed by Nikolay Aleksandrov
      Syzbot reported a use-after-free of the global vlan context on port vlan
      destruction. When I added per-port vlan stats I missed the fact that the
      global vlan context can be freed before the per-port vlan rcu callback.
      There are a few different ways to deal with this; I've chosen to add a
      new private flag that is set only when per-port stats are allocated, so
      we can directly check it on destruction without dereferencing the
      global context at all. The new field in net_bridge_vlan uses a hole.
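      The destruction path can then look roughly like this (a sketch; the
      flag and helper names follow the description above and may differ from
      the actual patch):
      
          static void nbp_vlan_rcu_free(struct rcu_head *rcu)
          {
                  struct net_bridge_vlan *v;
      
                  v = container_of(rcu, struct net_bridge_vlan, rcu);
                  /* free per-port stats only if we allocated them */
                  if (v->priv_flags & BR_VLFLAG_PER_PORT_STATS)
                          free_percpu(v->stats);
                  v->stats = NULL;
                  kfree(v);
          }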
      
      v2: cosmetic change, move the check to br_process_vlan_info where the
          other checks are done
      v3: add change log in the patch, add private (in-kernel only) flags in a
          hole in net_bridge_vlan struct and use that instead of mixing
          user-space flags with private flags
      
      Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats")
      Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 09 November 2018, 1 commit
  11. 18 October 2018, 1 commit
  12. 13 October 2018, 1 commit
    • net: bridge: add support for per-port vlan stats · 9163a0fc
      Committed by Nikolay Aleksandrov
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge, since we need to allocate the stats when the
      option is set and vlans are being added to ports (and free them when
      they are deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast path: it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
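      A sketch of the add-time decision (option and struct names are
      illustrative, not necessarily the ones in the patch):
      
          if (br_opt_get(br, BROPT_VLAN_STATS_PER_PORT)) {
                  /* option enabled: dedicated per-CPU counters for this
                   * port vlan
                   */
                  v->stats = netdev_alloc_pcpu_stats(struct br_vlan_stats);
                  if (!v->stats)
                          return -ENOMEM;
          } else {
                  /* default: share the global, bridge-wide vlan stats */
                  v->stats = masterv->stats;
          }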
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 27 September 2018, 9 commits
  14. 13 September 2018, 1 commit
  15. 24 July 2018, 1 commit
    • net: bridge: add support for backup port · 2756f68c
      Committed by Nikolay Aleksandrov
      This patch adds a new port attribute, IFLA_BRPORT_BACKUP_PORT, which
      allows setting a backup port to be used for known unicast traffic if
      the port has gone carrier down. The backup pointer is RCU protected and
      set only under RTNL; a counter is maintained so that when deleting a
      port we know how many other ports reference it as a backup and can
      remove it from all of them.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG,
      most notably that with thousands of fdbs one would need to change all
      of them when a port's carrier goes down, which takes too long and
      causes a storm of fdb notifications (and again when the port comes back
      up). In a Multi-chassis Link Aggregation setup, hosts are usually
      connected to two different switches which act as a single logical
      switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to fail over in case a host port goes down, and
      currently none of the existing solutions (like bond) can fulfill the
      requirements because the participating ports are actually the "master"
      devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted, this is normal
      practice in routing, called fast re-route, where a precalculated backup
      path is used when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device
      which is a backup of every port. Due to the nature of master devices
      it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the one below; the other
      switch has a similar config. eth0 is connected to one port on the
      server, and the server is connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
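      Based on the description above, the forwarding fast-path check looks
      roughly like this (a sketch, not necessarily the exact code):
      
          /* redirect to the backup link if the destination port is down */
          if (rcu_access_pointer(to->backup_port) &&
              !netif_carrier_ok(to->dev)) {
                  struct net_bridge_port *backup_port;
      
                  backup_port = rcu_dereference(to->backup_port);
                  if (unlikely(!backup_port))
                          return;         /* no usable port, drop */
                  to = backup_port;
          }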
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 21 July 2018, 1 commit
  17. 01 June 2018, 1 commit
  18. 26 May 2018, 1 commit
  19. 04 May 2018, 2 commits
  20. 01 May 2018, 1 commit
  21. 01 April 2018, 2 commits
  22. 24 March 2018, 1 commit
  23. 23 January 2018, 1 commit
  24. 14 December 2017, 1 commit
    • net: bridge: use rhashtable for fdbs · eb793583
      Committed by Nikolay Aleksandrov
      Before this patch the bridge used a fixed 256-element hash table, which
      was fine for small use cases (in my tests it starts to degrade above
      1000 entries) but wasn't enough for medium or large scale deployments.
      Modern setups have thousands of participants in a single bridge; even
      just enabling vlans and adding a few thousand vlan entries will cause a
      few thousand fdbs to be automatically inserted per participating port.
      So we need to scale the fdb table considerably to
      cope with modern workloads, and this patch converts it to use a
      rhashtable for its operations thus improving the bridge scalability.
      Tests show the following results (10 runs each): at up to 1000 entries
      rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
      is 2 times faster, and at 30000 it is 50 times faster.
      This is expected given the properties of the two constructs: rhashtable
      keeps pretty much constant lookup time even with 10000000 entries
      (tested), while the fixed hash table struggles considerably even above
      10000.
      As a side effect this also reduces the net_bridge struct size from 3248
      bytes to 1344 bytes. Also note that the key struct is 8 bytes.
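      A sketch of the 8-byte key and the rhashtable parameters (field and
      struct names from memory; treat them as illustrative):
      
          struct net_bridge_fdb_key {
                  mac_addr addr;   /* 6-byte MAC address */
                  u16 vlan_id;     /* + 2 bytes -> 8-byte key */
          };
      
          static const struct rhashtable_params br_fdb_rht_params = {
                  .head_offset = offsetof(struct net_bridge_fdb_entry, rhnode),
                  .key_offset  = offsetof(struct net_bridge_fdb_entry, key),
                  .key_len     = sizeof(struct net_bridge_fdb_key),
                  .automatic_shrinking = true,
          };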
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>