Commits · e5a55a898720096f43bc24938f8875c0a1b34cd7 · openeuler / Kernel

01 Nov, 2012 1 commit

net: create generic bridge ops · e5a55a89

Authored by John Fastabend on Oct 24, 2012

The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this is hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bridge attributes.

	ndo_bridge_setlink()
	ndo_bridge_getlink()

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5a55a89

11 Oct, 2012 1 commit

bridge: Pull ip header into skb->data before looking into ip header. · 6caab7b0

Authored by Sarveshwar Bandi on Oct 10, 2012

If the lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6caab7b0

02 Oct, 2012 1 commit

netlink: add attributes to fdb interface · edc7d573

Authored by stephen hemminger on Oct 01, 2012

Later changes need to be able to refer to neighbour attributes
when doing fdb_add.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edc7d573

20 Sep, 2012 1 commit

netdev: make address const in device address management · 6b6e2725

Authored by stephen hemminger on Sep 17, 2012

The internal functions for adding/deleting addresses don't change
their argument.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b6e2725

12 Sep, 2012 1 commit

netfilter: log: Fix log-level processing · 16af511a

Authored by Joe Perches on Sep 12, 2012

auto75914331@hushmail.com reports that iptables does not correctly
output the KERN_<level>.

$IPTABLES -A RULE_0_in  -j LOG  --log-level notice --log-prefix "DENY  in: "

result with linux 3.6-rc5
Sep 12 06:37:29 xxxxx kernel: <5>DENY  in: IN=eth0 OUT= MAC=.......

result with linux 3.5.3 and older:
Sep  9 10:43:01 xxxxx kernel: DENY  in: IN=eth0 OUT= MAC......

commit 04d2c8c8
("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern")
updated the syslog header style but did not update netfilter uses.

Do so.

Use KERN_SOH and string concatenation instead of "%c" KERN_SOH_ASCII
as suggested by Eric Dumazet.
Signed-off-by: Joe Perches <joe@perches.com>
cc: auto75914331@hushmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

16af511a
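
The "KERN_SOH and string concatenation" approach mentioned in the commit above can be illustrated with a small user-space sketch. KERN_SOH is reproduced here with the value the kernel gives it ("\001"); the helper name format_level_prefix() and the masking of the level to 0-7 are assumptions made for illustration and are not taken from the patch.

```c
#include <stdio.h>

/* Same value as the kernel's KERN_SOH: an ASCII Start Of Header byte. */
#define KERN_SOH "\001"

/*
 * Hypothetical helper: write the two-byte printk level header
 * (SOH followed by the level digit) into buf.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static inline int format_level_prefix(char *buf, size_t len, unsigned int level)
{
	/* Adjacent string literals are concatenated: "\001" "%c" is "\001%c". */
	/* Masking to 0-7 is an assumption of this sketch (syslog levels). */
	return snprintf(buf, len, KERN_SOH "%c", '0' + (int)(level & 7));
}
```

With level 5 (notice) the buffer holds the SOH byte followed by '5', which is the two-byte pattern that printk strips before logging.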

11 Sep, 2012 1 commit

netlink: Rename pid to portid to avoid confusion · 15e47304

Authored by Eric W. Biederman on Sep 07, 2012

It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15e47304

09 Sep, 2012 1 commit

netlink: hide struct module parameter in netlink_kernel_create · 9f00d977

Authored by Pablo Neira Ayuso on Sep 08, 2012

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f00d977

16 Aug, 2012 1 commit

bridge: fix rcu dereference outside of rcu_read_lock · c03307ea

Authored by Stephen Hemminger on Aug 14, 2012

Alternative solution for problem found by Linux Driver Verification
project (linuxtesting.org).

As noted in the comment before the br_handle_frame_finish
function, this function should be called under rcu_read_lock.

The problem callgraph:
br_dev_xmit -> br_nf_pre_routing_finish_bridge_slow ->
 -> br_handle_frame_finish -> br_port_get_rcu -> rcu_dereference

And in this case there is no read-lock section.
Reported-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c03307ea

15 Aug, 2012 5 commits

netpoll: check netpoll tx status on the right device · e15c3c22

Authored by Amerigo Wang on Aug 10, 2012

Although this doesn't matter actually, because netpoll_tx_running()
doesn't use the parameter, the code will be more readable.

For team_dev_queue_xmit() we have to move it down to avoid
compile errors.

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e15c3c22

bridge: use list_for_each_entry() in netpoll functions · 4e3828c4

Authored by Amerigo Wang on Aug 10, 2012

We don't delete 'p' from the list in the loop,
so we can just use list_for_each_entry().

Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e3828c4

bridge: add some comments for NETDEV_RELEASE · d30362c0

Authored by Amerigo Wang on Aug 10, 2012

Add comments on why we don't notify NETDEV_RELEASE.

Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d30362c0

netpoll: make __netpoll_cleanup non-block · 38e6bc18

Authored by Amerigo Wang on Aug 10, 2012

Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup()
may be called with read_lock() held too, so we should make them
non-blocking, by moving the cleanup and kfree() to call_rcu_bh() callbacks.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38e6bc18

netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() · 47be03a2

Authored by Amerigo Wang on Aug 10, 2012

slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so they should use GFP_ATOMIC to allocate
memory. Eric suggested passing gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47be03a2

14 Aug, 2012 1 commit

netfilter: PTR_RET can be used · 19e303d6

Authored by Wu Fengguang on Jul 28, 2012

This quiets the coccinelle warnings:

net/bridge/netfilter/ebtable_filter.c:107:1-3: WARNING: PTR_RET can be used
net/bridge/netfilter/ebtable_nat.c:107:1-3: WARNING: PTR_RET can be used
net/ipv6/netfilter/ip6table_filter.c:65:1-3: WARNING: PTR_RET can be used
net/ipv6/netfilter/ip6table_mangle.c:100:1-3: WARNING: PTR_RET can be used
net/ipv6/netfilter/ip6table_raw.c:44:1-3: WARNING: PTR_RET can be used
net/ipv6/netfilter/ip6table_security.c:62:1-3: WARNING: PTR_RET can be used
net/ipv4/netfilter/iptable_filter.c:72:1-3: WARNING: PTR_RET can be used
net/ipv4/netfilter/iptable_mangle.c:107:1-3: WARNING: PTR_RET can be used
net/ipv4/netfilter/iptable_raw.c:51:1-3: WARNING: PTR_RET can be used
net/ipv4/netfilter/iptable_security.c:70:1-3: WARNING: PTR_RET can be used
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

19e303d6
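
For readers unfamiliar with the idiom behind the warning above, here is a minimal user-space sketch of what PTR_RET stands for. The helpers are simplified stand-ins for the kernel's error-pointer macros (the kernel uses long where this sketch uses intptr_t for portability), and ret_open_coded() is a hypothetical name for the pattern that coccinelle flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

/* Decode the errno value carried by an error pointer. */
static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

/* True when ptr lies in the top MAX_ERRNO addresses, i.e. encodes an errno. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The open-coded form that the coccinelle script reports. */
static inline int ret_open_coded(const void *ptr)
{
	if (IS_ERR(ptr))
		return (int)PTR_ERR(ptr);
	return 0;
}

/* The equivalent one-liner: errno for an error pointer, 0 otherwise. */
static inline int PTR_RET(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}
```

Both functions return the same value for any pointer, which is why the replacement is purely cosmetic.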

10 Aug, 2012 1 commit

time: jiffies_delta_to_clock_t() helper to the rescue · a399a805

Authored by Eric Dumazet on Aug 08, 2012

Various /proc/net files sometimes report crazy timer values, expressed
in clock_t units.

This happens when an expired timer delta (expires - jiffies) is passed
to jiffies_to_clock_t().

This function has an overflow in:

return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

commit cbbc719f (time: Change jiffies_to_clock_t() argument type
to unsigned long) only worked around the problem.

As we can't output negative values in /proc/net/tcp without breaking
various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
that caps the negative delta to a 0 value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: hank <pyu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a399a805
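
The clamping idea described in the commit above can be sketched in user space as follows. HZ and USER_HZ are assumed values chosen for the example, and jiffies_to_clock_t() is a simplified stand-in (a plain division) rather than the kernel's implementation; only the wrapper's "cap negative deltas at 0" behaviour is taken from the commit message.

```c
#include <stdint.h>

/* Assumed tick rates for this sketch; a real kernel fixes HZ at build time. */
#define HZ      1000
#define USER_HZ 100

/* Simplified stand-in: valid only when HZ is a multiple of USER_HZ. */
static inline uint64_t jiffies_to_clock_t(unsigned long x)
{
	return x / (HZ / USER_HZ);
}

/* Cap an already-expired (negative) timer delta at 0 before converting. */
static inline uint64_t jiffies_delta_to_clock_t(long delta)
{
	return jiffies_to_clock_t(delta > 0 ? (unsigned long)delta : 0UL);
}
```

Passing a negative delta straight to jiffies_to_clock_t() would reinterpret it as a very large unsigned value, which is the source of the "crazy timer values" mentioned above; the wrapper reports 0 instead.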

31 Jul, 2012 1 commit

bridge: make port attributes const · 5a0d513b

Authored by stephen hemminger on Jul 30, 2012

Simple table that can be marked const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a0d513b

23 Jul, 2012 1 commit

net: fix race condition in several drivers when reading stats · e3906486

Authored by Kevin Groeneveld on Jul 21, 2012

Fix race condition in several network drivers when reading stats on 32bit
UP architectures.  These drivers update their stats in a BH context and
therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
stats.
Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3906486

18 Jul, 2012 1 commit

netpoll: move np->dev and np->dev_name init into __netpoll_setup() · 30fdd8a0

Authored by Jiri Pirko on Jul 17, 2012

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

30fdd8a0

17 Jul, 2012 2 commits

net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270

Authored by David S. Miller on Jul 17, 2012

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller <davem@davemloft.net>

6700c270

bridge: Fix enforcement of multicast hash_max limit · 036be6db

Authored by Thomas Graf on Jul 10, 2012

The hash size is doubled when it needs to grow and compared against
hash_max. The >= comparison will limit the hash table size to half
of what is expected, i.e. the default hash_max of 512 will not allow
the hash table to grow larger than 256.

Also print the hash table limit instead of the desired size when
the limit is reached.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

036be6db
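
The off-by-one in the limit check described above is easy to see in isolation. The following sketch uses hypothetical helper names (grow_with_ge, grow_with_gt) and reduces the logic to the comparison alone; it is not the bridge code itself.

```c
/*
 * Hypothetical helpers: return the doubled table size,
 * or 0 when the doubled size is rejected by the limit check.
 */

/* Old check: rejects a doubled size that is exactly equal to hash_max. */
static inline unsigned int grow_with_ge(unsigned int size, unsigned int hash_max)
{
	unsigned int next = size * 2;

	return next >= hash_max ? 0 : next;
}

/* Fixed check: only rejects sizes strictly above hash_max. */
static inline unsigned int grow_with_gt(unsigned int size, unsigned int hash_max)
{
	unsigned int next = size * 2;

	return next > hash_max ? 0 : next;
}
```

With hash_max set to 512, the old comparison refuses to grow a 256-entry table, while the fixed comparison allows 512 and refuses only the next doubling.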

12 Jul, 2012 1 commit

net: Add dummy dst_ops->redirect method where needed. · b587ee3b

Authored by David S. Miller on Jul 12, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

b587ee3b

11 Jul, 2012 1 commit

bridge: fix endian · 4715213d

Authored by Li RongQing on Jul 09, 2012

mld->mld_maxdelay is net endian, so we should use ntohs, not htons

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4715213d

05 Jul, 2012 2 commits

br_netfilter: Convert to dst_neigh_lookup_skb(). · f9d75166

Authored by David S. Miller on Jul 02, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

f9d75166

net: Add optional SKB arg to dst_ops->neigh_lookup(). · f894cbf8

Authored by David S. Miller on Jul 02, 2012

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).
Signed-off-by: David S. Miller <davem@davemloft.net>

f894cbf8

01 Jul, 2012 1 commit

netfilter: use kfree_skb() not kfree() · f7eadafb

Authored by Dan Carpenter on Jun 30, 2012

This should be a kfree_skb() here to free the sk_buff pointer.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7eadafb

30 Jun, 2012 1 commit

netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17

Authored by Pablo Neira Ayuso on Jun 29, 2012

This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a31f2d17

27 Jun, 2012 2 commits

netfilter: ebt_ulog: Move away from NLMSG_PUT(). · 62566ca5

Authored by David S. Miller on Jun 26, 2012

And use nlmsg_data() while we're here too.

Also, free and NULL out skb when nlmsg_put() fails and remove
pointless kernel log message.
Signed-off-by: David S. Miller <davem@davemloft.net>

62566ca5

bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2) · 149ddd83

Authored by stephen hemminger on Jun 26, 2012

This ensures that bridges created with brctl(8) or ioctl(2) directly
also carry IFLA_LINKINFO when dumped over netlink. This also makes it
possible to create a bridge with ioctl(2) and delete it with RTM_DELLINK.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

149ddd83

07 Jun, 2012 1 commit

netfilter: bridge: switch hook PFs to nfproto · aa740f46

Authored by Alban Crequy on May 14, 2012

This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

aa740f46

18 May, 2012 1 commit

ipv6: correct the ipv6 option name - Pad0 to Pad1 · 1de5a71c

Authored by Eldad Zack on May 17, 2012

The padding destination or hop-by-hop option is called Pad1 and not Pad0.

See RFC2460 (4.2) or the IANA ipv6-parameters registry:
http://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xml
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1de5a71c

10 May, 2012 2 commits

bridge: Convert compare_ether_addr to ether_addr_equal · 9a7b6ef9

Authored by Joe Perches on May 08, 2012

Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.

Done via cocci script:

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
-	!compare_ether_addr(a, b)
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	compare_ether_addr(a, b)
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) == 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) != 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) == 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) != 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!!ether_addr_equal(a, b)
+	ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a7b6ef9
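
To make the motivation for the conversion above concrete, here is a simplified user-space sketch of the two helpers. The bodies are memcmp-based stand-ins written for this example, not the kernel's optimized implementations; what they preserve is the return convention, which is the point of the change.

```c
#include <stdbool.h>
#include <string.h>

/* Length of an Ethernet MAC address in bytes. */
#define ETH_ALEN 6

/*
 * Old-style helper: returns 0 when the addresses are EQUAL and non-zero
 * otherwise. The result carries no ordering, so it cannot be used for sorting.
 */
static inline unsigned int compare_ether_addr(const unsigned char *a,
					      const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) != 0;
}

/* Bool helper: true when the addresses are equal, so call sites need no '!'. */
static inline bool ether_addr_equal(const unsigned char *a,
				    const unsigned char *b)
{
	return !compare_ether_addr(a, b);
}
```

The first two cocci rules in the script correspond directly to this relationship: `!compare_ether_addr(a, b)` becomes `ether_addr_equal(a, b)`, and a bare `compare_ether_addr(a, b)` becomes `!ether_addr_equal(a, b)`.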

bridge: netfilter: Convert compare_ether_addr to ether_addr_equal · 171fe5ef

Authored by Joe Perches on May 08, 2012

Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.

Done via cocci script:

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
-	!compare_ether_addr(a, b)
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	compare_ether_addr(a, b)
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) == 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!ether_addr_equal(a, b) != 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) == 0
+	!ether_addr_equal(a, b)

@@
expression a,b;
@@
-	ether_addr_equal(a, b) != 0
+	ether_addr_equal(a, b)

@@
expression a,b;
@@
-	!!ether_addr_equal(a, b)
+	ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

171fe5ef

09 May, 2012 1 commit

netfilter: bridge: optionally set indev to vlan · 4981682c

Authored by Pablo Neira Ayuso on May 08, 2012

If the net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.

When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.

This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.

Also update Documentation with current brnf default settings.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4981682c

01 May, 2012 1 commit

bridge: Fix fatal typo in setup of multicast_querier_expired · bb63f1f8

Authored by Herbert Xu on Apr 30, 2012

Unfortunately it seems that I didn't properly test the case of
an expired external querier in the recent multicast bridge series.

The setup of the timer in that case is completely broken and leads
to a NULL-pointer dereference.  This patch fixes it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb63f1f8

24 Apr, 2012 1 commit

set fake_rtable's dst to NULL to avoid kernel Oops · a881e963

Authored by Peter Huang (Peng) on Apr 19, 2012

bridge: set fake_rtable's dst to NULL to avoid kernel Oops

When a bridge is deleted before the tap/vif device is deleted, the kernel
may encounter an oops because of a NULL reference to fake_rtable's dst.
Setting fake_rtable's dst to NULL before sending packets out solves
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
  implementation -DaveM ]
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a881e963

21 Apr, 2012 2 commits

net: Convert all sysctl registrations to register_net_sysctl · ec8f23ce

Authored by Eric W. Biederman on Apr 19, 2012

This results in code with less boilerplate that is a bit easier
to read.

Additionally, this stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec8f23ce

net: Move all of the network sysctls without a namespace into init_net. · 5dd3df10

Authored by Eric W. Biederman on Apr 19, 2012

This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5dd3df10

16 Apr, 2012 3 commits

net: add generic PF_BRIDGE:RTM_ FDB hooks · 77162022

Authored by John Fastabend on Apr 15, 2012

This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also without any flags set the command
will be handled by the master device as well so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_SELF and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev' this allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication in the case with both flags set
when an error occurs. To resolve this the rtnl handler clears
the NTF_ flag in the netlink ack to indicate which sets completed
successfully. The add/del handlers will abort as soon as any
error occurs.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no required changes in user space
to support the current bridge behavior.

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Example session using the 'br'[1] tool to add, dump and then
delete a mac address with a new "embedded" option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
# br fdb add 22:35:19:ac:60:59 embedded dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
# br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It
is my opinion that the merit of this patch is now embedded and SW
bridges can both be modeled correctly in user space using very nearly
the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77162022
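
The NTF_MASTER / NTF_SELF routing rules described in the commit above can be sketched as a small user-space model. The flag values match the kernel's neighbour flags; everything else (the fdb_handler type, fdb_ok(), fdb_fail(), fdb_dispatch(), and the -22 default standing in for -EINVAL) is hypothetical scaffolding for illustration and not the rtnetlink code.

```c
#include <stddef.h>

/* Flag values as defined for neighbour entries in the kernel headers. */
#define NTF_SELF   0x02
#define NTF_MASTER 0x04

/* Hypothetical handler type standing in for a device's FDB add/del hook. */
typedef int (*fdb_handler)(void);

/* Hypothetical handlers used to exercise the sketch. */
static inline int fdb_ok(void)   { return 0; }
static inline int fdb_fail(void) { return -1; }

/*
 * Route one FDB command. A flag bit is cleared once its target has handled
 * the command, so the caller can tell from *flags which steps completed.
 * Processing aborts on the first error.
 */
static inline int fdb_dispatch(unsigned char *flags, fdb_handler master,
			       fdb_handler self)
{
	int err = -22; /* stands in for -EINVAL when no target handles the command */

	/* No flags at all behaves like NTF_MASTER, so existing tools keep working. */
	if ((*flags == 0 || (*flags & NTF_MASTER)) && master != NULL) {
		err = master();
		if (err)
			return err;
		*flags &= (unsigned char)~NTF_MASTER;
	}

	/* NTF_SELF pushes the command to the device's own (embedded) bridge. */
	if ((*flags & NTF_SELF) && self != NULL) {
		err = self();
		if (err)
			return err;
		*flags &= (unsigned char)~NTF_SELF;
	}

	return err;
}
```

If both flags are set and the device handler fails after the master handler succeeded, the call returns the error and leaves only NTF_SELF set, which mirrors the "clear the NTF_ flag in the ack" behaviour the commit message describes.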

bridge: Add multicast_querier toggle and disable queries by default · c5c23260

Authored by Herbert Xu on Apr 13, 2012

Sending general queries was implemented as an optimisation to speed
up convergence on start-up.  In order to prevent interference with
multicast routers a zero source address has to be used.

Unfortunately these packets appear to cause some multicast-aware
switches to misbehave, e.g., by disrupting multicast packets to us.

Since the multicast snooping feature still functions without sending
our own queries, this patch will change the default to not send
queries.

For those that need queries in order to speed up convergence on start-up,
a toggle is provided to restore the previous behaviour.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5c23260

bridge: Restart queries when last querier expires · c83b8fab

Authored by Herbert Xu on Apr 13, 2012

As it stands, when we discover a real querier (one that queries
with a non-zero source address) we stop querying.  However, even
after said querier has fallen off the edge of the earth, we will
never restart querying (unless the bridge itself is restarted).

This patch fixes this by kicking our own querier into gear when
the timer for other queriers expires.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c83b8fab
