Commits · 7c1508e5f64a784988be4659dd4d6b791c008bbf · openeuler / Kernel

21 Mar, 2019 · 3 commits

net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce

Committed by Paolo Abeni on Mar 20, 2019

After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also cleans up the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads   vanilla (kpps)   patched (kpps)
1         2334             2428
2         4166             4278
4         7895             8100

 v1 -> v2:
 - rebased after helper's name change
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
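
As a rough illustration of the change described in this commit, the sketch below contrasts a select-queue hook that receives the fallback as a function pointer with one that calls the helper directly. All `toy_*` names and types are hypothetical user-space stand-ins, not the kernel's definitions, and the hash-modulo queue choice is an assumption made only to keep the example small.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct toy_qdev { uint16_t real_num_tx_queues; };
struct toy_skb  { uint32_t hash; };

/* Stand-in for the core helper netdev_pick_tx(): map a hash to a TX queue. */
static uint16_t toy_netdev_pick_tx(const struct toy_qdev *dev,
                                   const struct toy_skb *skb)
{
    return (uint16_t)(skb->hash % dev->real_num_tx_queues);
}

/* Old shape: the caller hands the fallback down as a function pointer. */
typedef uint16_t (*toy_fallback_t)(const struct toy_qdev *,
                                   const struct toy_skb *);

static uint16_t toy_select_queue_old(const struct toy_qdev *dev,
                                     const struct toy_skb *skb,
                                     toy_fallback_t fallback)
{
    return fallback(dev, skb);            /* indirect call per packet */
}

/* New shape: no fallback argument, the helper is called directly. */
static uint16_t toy_select_queue_new(const struct toy_qdev *dev,
                                     const struct toy_skb *skb)
{
    return toy_netdev_pick_tx(dev, skb);  /* direct call */
}
```

Both variants pick the same queue; the second simply avoids the extra indirection when every caller would have passed the same helper anyway.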

packet: rework packet_pick_tx_queue() to use common code selection · b71b5837

Committed by Paolo Abeni on Mar 20, 2019

Currently packet_pick_tx_queue() is the only caller of
ndo_select_queue() using a fallback argument other than
netdev_pick_tx.

Leveraging rx queue, we can obtain a similar queue selection
behavior using core helpers. After this change, ndo_select_queue()
is always invoked with netdev_pick_tx() as fallback.
We can change ndo_select_queue() signature in a followup patch,
dropping an indirect call per transmitted packet in some scenarios
(e.g. TCP syn and XDP generic xmit)

This slightly changes how AF_PACKET queue selection happens when
PACKET_QDISC_BYPASS is set. It's now more similar to plain dev_queue_xmit(),
taking into account both XPS and TC mapping.

 v1  -> v2:
  - rebased after helper name change
 RFC -> v1:
  - initialize sender_cpu to the expected value
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: rename queue selection helpers. · 4bd97d51

Committed by Paolo Abeni on Mar 20, 2019

With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear that it is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.
Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Feb, 2019 · 2 commits

net: Remove switchdev_ops · 3d705f07

Committed by Florian Fainelli on Feb 27, 2019

Now that we have converted all possible callers to using a switchdev
notifier for attributes we do not have a need for implementing
switchdev_ops anymore, and this can be removed from all drivers and the
net_device structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: Use unsigned integer as an argument to left-shift · f4d7b3e2

Committed by Andy Shevchenko on Feb 27, 2019

1 << 31 is Undefined Behaviour according to the C standard.
Use U type modifier to avoid theoretical overflow.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
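
The point of this commit can be shown with a small user-space sketch. The helper name `toy_bit` is hypothetical; the example assumes a platform where `int` is 32 bits wide, which is the case the commit message is concerned with.

```c
#include <stdint.h>

/*
 * Build a 32-bit mask with only bit n set, for n in 0..31.
 * The literal 1U is unsigned, so shifting it into bit 31 is well defined.
 * With a plain (signed) 1, the same shift would overflow a 32-bit int,
 * which the C standard leaves undefined.
 */
static uint32_t toy_bit(unsigned int n)
{
    return (uint32_t)1U << n;
}
```

For example, `toy_bit(31)` yields `0x80000000` without relying on any signed-overflow behaviour.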

27 Feb, 2019 · 1 commit

devlink: create a special NDO for getting the devlink instance · b473b0d2

Committed by Jakub Kicinski on Feb 25, 2019

Instead of iterating over all devlink ports, add an NDO which
will return the devlink instance from the driver.

v2: add the netdev_to_devlink() helper (Michal)
v3: check that devlink has ops (Florian)
v4: hold devlink_mutex (Jiri)
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2019 · 1 commit

net: dev: add generic protodown handler · b5899679

Committed by Andy Roulin on Feb 22, 2019

Introduce dev_change_proto_down_generic, a generic ndo_change_proto_down
implementation, which sets the netdev carrier state according to proto_down.

This adds the ability to set protodown on vxlan and macvlan devices in a
generic way for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
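
A minimal sketch of the behaviour this commit describes (carrier state follows proto_down) might look as follows. The struct and function names are hypothetical user-space stand-ins; in the kernel the carrier would be toggled through the usual carrier helpers rather than a plain boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct net_device. */
struct toy_netdev {
    bool carrier_ok;  /* models the carrier state */
    bool proto_down;  /* models the protodown flag */
};

/* Generic handler: carrier goes off when proto_down is set, on otherwise. */
static int toy_change_proto_down_generic(struct toy_netdev *dev,
                                         bool proto_down)
{
    dev->carrier_ok = !proto_down;
    dev->proto_down = proto_down;
    return 0;
}
```

A device type without hardware-specific protodown handling could point its handler at such a generic routine instead of writing its own.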

23 Feb, 2019 · 1 commit

net: Introduce parse_protocol header_ops callback · e78b2915

Committed by Maxim Mikityanskiy on Feb 21, 2019

Introduce a new optional header_ops callback called parse_protocol and a
wrapper function dev_parse_header_protocol, similar to dev_parse_header.

The new callback's purpose is to extract the protocol number from the L2
header, the format of which is known to the driver, but not to the upper
layers of the stack.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
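
To make the idea of an optional callback plus wrapper concrete, here is a user-space sketch using an Ethernet-style header. The `toy_*` names are hypothetical, and for readability the protocol is returned in host byte order, which is a simplification of this sketch rather than a statement about the kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN 14  /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Hypothetical ops table with one optional callback. */
struct toy_header_ops {
    /* Returns the L2 protocol number, or 0 if it cannot be determined. */
    uint16_t (*parse_protocol)(const uint8_t *frame, size_t len);
};

/* Link-layer specific parser: read the EtherType at bytes 12..13. */
static uint16_t toy_eth_parse_protocol(const uint8_t *frame, size_t len)
{
    if (len < TOY_ETH_HLEN)
        return 0;
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Wrapper used by upper layers: tolerate a missing callback. */
static uint16_t toy_parse_header_protocol(const struct toy_header_ops *ops,
                                          const uint8_t *frame, size_t len)
{
    if (!ops || !ops->parse_protocol)
        return 0;
    return ops->parse_protocol(frame, len);
}
```

The wrapper lets callers stay ignorant of the header layout: they only see a protocol number, or 0 when the link layer provides no parser.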

15 Feb, 2019 · 1 commit

net: bpf: remove XDP_QUERY_XSK_UMEM enumerator · f8ebfaf6

Committed by Jan Sokolowski on Feb 13, 2019

Commit c9b47cc1 ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") moved the umem query code to the AF_XDP
core, and therefore removed the need to query the netdevice for a
umem.

This patch removes XDP_QUERY_XSK_UMEM and all code that implements that
behavior, which is just dead code.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

07 Feb, 2019 · 1 commit

net: Introduce ndo_get_port_parent_id() · d6abc596

Committed by Florian Fainelli on Feb 06, 2019

In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.

We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.

Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Feb, 2019 · 1 commit

netdevice.h: Add __cold to netdev_<level> logging functions · ce3fdb69

Committed by Joe Perches on Feb 02, 2019

Add __cold to the netdev_<level> logging functions similar to
the use of __cold in the generic printk function.

Using __cold moves all the netdev_<level> logging functions
out-of-line possibly improving code locality and runtime
performance.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Jan, 2019 · 1 commit

ipvlan, l3mdev: fix broken l3s mode wrt local routes · d5256083

Committed by Daniel Borkmann on Jan 30, 2019

While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them does in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s solely distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

Minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jan, 2019 · 1 commit

net: introduce a knob to control whether to inherit devconf config · 856c395c

Committed by Cong Wang on Jan 17, 2019

There have been many people complaining about the inconsistent
behaviors of IPv4 and IPv6 devconf when creating new network
namespaces.  Currently, for IPv4, we inherit all current settings
from init_net, but for IPv6 we reset all settings to default.

This patch introduces a new /proc file
/proc/sys/net/core/devconf_inherit_init_net to control
whether to inherit the current sysctl settings from init_net.
This file itself is only available in init_net.

As demonstrated below:

Initial setup in init_net:
 # cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Default value 0 (current behavior):
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to 1 (inherit from init_net):
 # echo 1 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 2
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 1

Set to 2 (reset to default):
 # echo 2 > /proc/sys/net/core/devconf_inherit_init_net
 # ip netns del test
 # ip netns add test
 # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
 0
 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
 0

Set to a value out of range (invalid):
 # echo 3 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument
 # echo -1 > /proc/sys/net/core/devconf_inherit_init_net
 -bash: echo: write error: Invalid argument
Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
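
The three knob values demonstrated in this commit can be summarised as a small decision function. This is only a sketch of the observable behaviour shown in the shell sessions; the function, enum and parameter names are hypothetical and do not mirror the kernel implementation.

```c
#include <errno.h>

enum toy_family { TOY_IPV4, TOY_IPV6 };

/*
 * Compute the initial devconf value of a new network namespace.
 *   knob 0: IPv4 inherits from init_net, IPv6 resets to default
 *   knob 1: both families inherit from init_net
 *   knob 2: both families reset to default
 * Any other knob value is rejected, as in the "out of range" demo.
 */
static int toy_initial_devconf(int knob, enum toy_family family,
                               int init_net_value, int default_value,
                               int *out)
{
    switch (knob) {
    case 0:
        *out = (family == TOY_IPV4) ? init_net_value : default_value;
        return 0;
    case 1:
        *out = init_net_value;
        return 0;
    case 2:
        *out = default_value;
        return 0;
    default:
        return -EINVAL;
    }
}
```

Feeding in the values from the demonstration (rp_filter 2 and accept_dad 1 in init_net, 0 as the reset value for both) reproduces the outputs listed for each setting.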

18 Jan, 2019 · 1 commit

net: Add extack argument to ndo_fdb_add() · 87b0984e

Committed by Petr Machata on Jan 16, 2019

Drivers may not be able to support certain FDB entries, and an error
code is insufficient to give clear hints as to the reasons of rejection.

In order to make it possible to communicate the rejection reason, extend
ndo_fdb_add() with an extack argument. Adapt the existing
implementations of ndo_fdb_add() to take the parameter (and ignore it).
Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
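
The commit only threads the extack parameter through; existing implementations ignore it. As a sketch of how a driver could later use such a parameter, the example below attaches a message to a rejection. Everything here is hypothetical: the `toy_extack` struct, the function, and the VLAN range check are illustration only.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an extended-ack object. */
struct toy_extack {
    const char *msg;
};

/* Hypothetical FDB add: an error code plus an optional textual reason. */
static int toy_fdb_add(unsigned int vlan, struct toy_extack *extack)
{
    if (vlan > 4094) {
        if (extack)  /* callers without an extack pass NULL */
            extack->msg = "VLAN ID out of range";
        return -EINVAL;
    }
    return 0;
}
```

The error code alone would only say "invalid argument"; the message tells the user which argument and why.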

17 Dec, 2018 · 1 commit

net: rtnetlink: support for fdb get · 5b2f94b2

Committed by Roopa Prabhu on Dec 15, 2018

This patch adds support for fdb get similar to
route get. Arguments can be any of the following (similar to fdb add/del/dump):
[bridge, mac, vlan] or
[bridge_port, mac, vlan, flags=[NTF_MASTER]] or
[dev, mac, [vni|vlan], flags=[NTF_SELF]]
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Dec, 2018 · 3 commits

net: dev: Issue NETDEV_PRE_CHANGEADDR · d59cdf94

Committed by Petr Machata on Dec 13, 2018

When a device address is about to be changed, or an address added to the
list of device HW addresses, it is necessary to ensure that all
interested parties can support the address. Therefore, send the
NETDEV_PRE_CHANGEADDR notification, and if anyone bails on it, do not
change the address.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: Add NETDEV_PRE_CHANGEADDR · 1570415f

Committed by Petr Machata on Dec 13, 2018

The NETDEV_CHANGEADDR notification is emitted after a device address
changes. Extending this message to allow vetoing is certainly possible,
but several other notification types have instead adopted a simple
two-stage approach: first a "pre" notification is sent to make sure all
interested parties are OK with a change that's about to be done. Then
the change is done, and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore adopt it for NETDEV_CHANGEADDR as well.

To that end, add NETDEV_PRE_CHANGEADDR and an info structure to go along
with it.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
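
The two-stage "pre" then "post" pattern described in this commit can be modelled in a few lines of user-space C. The notifier array, the event enum and the multicast-rejecting listener are all hypothetical; the sketch only shows why a veto in the "pre" stage needs no rollback.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ALEN 6

enum toy_event { TOY_PRE_CHANGEADDR, TOY_CHANGEADDR };

typedef int (*toy_notifier_fn)(enum toy_event ev, const unsigned char *addr);

/* Hypothetical device with a tiny fixed-size notifier list. */
struct toy_macdev {
    unsigned char addr[TOY_ALEN];
    toy_notifier_fn notifiers[4];
    size_t n;
};

/* Call listeners in order; stop at the first one that objects. */
static int toy_call_notifiers(struct toy_macdev *dev, enum toy_event ev,
                              const unsigned char *addr)
{
    for (size_t i = 0; i < dev->n; i++) {
        int err = dev->notifiers[i](ev, addr);
        if (err)
            return err;
    }
    return 0;
}

static int toy_set_mac_address(struct toy_macdev *dev,
                               const unsigned char *addr)
{
    /* Stage 1: ask first. On veto nothing has changed yet. */
    int err = toy_call_notifiers(dev, TOY_PRE_CHANGEADDR, addr);
    if (err)
        return err;

    /* Stage 2: apply the change, then announce it. */
    memcpy(dev->addr, addr, TOY_ALEN);
    toy_call_notifiers(dev, TOY_CHANGEADDR, addr);
    return 0;
}

/* Example listener: refuse addresses with the multicast bit set. */
static int toy_reject_multicast(enum toy_event ev, const unsigned char *addr)
{
    if (ev == TOY_PRE_CHANGEADDR && (addr[0] & 1))
        return -1;
    return 0;
}
```

With `toy_reject_multicast` registered, setting a multicast address fails and the device keeps its previous address untouched.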

net: dev: Add extack argument to dev_set_mac_address() · 3a37a963

Committed by Petr Machata on Dec 13, 2018

A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Dec, 2018 · 1 commit

net: ndo_bridge_setlink: Add extack · 2fd527b7

Committed by Petr Machata on Dec 12, 2018

Drivers may not be able to implement a VLAN addition or reconfiguration.
In those cases it's desirable to explain to the user that it was
rejected (and why).

To that end, add extack argument to ndo_bridge_setlink. Adapt all users
to that change.

Following patches will use the new argument in the bridge driver.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Dec, 2018 · 3 commits

net: core: dev: Add extack argument to __dev_change_flags() · 6d040321

Committed by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. The last missing API is __dev_change_flags().

Therefore extend __dev_change_flags() with an extra extack argument and
update the two existing users.

Since the function declaration line is changed anyway, name the struct
net_device argument to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dev: Add extack argument to dev_change_flags() · 567c5e13

Committed by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().

Therefore extend dev_change_flags() with an extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dev: Add extack argument to dev_open() · 00f54e68

Committed by Petr Machata on Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().

Therefore extend dev_open() with an extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Nov, 2018 · 1 commit

net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue · 620344c4

Committed by Heiner Kallweit on Nov 25, 2018

Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Nov, 2018 · 1 commit

net: fixup type in netdev_start_xmit() · 2183435c

Committed by Alexey Dobriyan on Nov 24, 2018

Return code should be formally "netdev_tx_t".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Nov, 2018 · 1 commit

net: sched: gred: add basic Qdisc offload · 890d8d23

Committed by Jakub Kicinski on Nov 19, 2018

Add basic offload for the GRED Qdisc.  Inform the drivers any
time Qdisc or virtual queue configuration changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Nov, 2018 · 1 commit

net: align pcpu_sw_netstats and pcpu_lstats structs · 9a5ee462

Committed by Eric Dumazet on Nov 16, 2018

Do not risk spanning these small structures on two cache lines,
it is absolutely not worth it.

For 32bit arches, the hint might not be enough, but we do not
really care anymore.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
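
Why an alignment hint keeps a small struct on one cache line can be checked with a short sketch. The struct below is a hypothetical two-counter stand-in, the 64-byte line size is an assumption, and aligning to the struct's own size (16 bytes) is chosen here for illustration.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical per-CPU counters, aligned to their own total size. */
struct toy_lstats {
    alignas(2 * sizeof(uint64_t)) uint64_t packets;
    uint64_t bytes;
};

/* True if an object of `size` bytes at `addr` straddles two cache lines. */
static int toy_spans_two_lines(uintptr_t addr, size_t size)
{
    return addr / TOY_CACHE_LINE != (addr + size - 1) / TOY_CACHE_LINE;
}
```

Since 16 divides 64, any 16-byte-aligned placement of the 16-byte struct stays inside a single line, whereas an unaligned placement such as offset 56 would straddle two.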

16 Nov, 2018 · 1 commit

net: dump more useful information in netdev_rx_csum_fault() · 7fe50ac8

Committed by Cong Wang on Nov 12, 2018

Currently netdev_rx_csum_fault() only shows a device name;
we need more information about the skb for debugging csum
failures.

Sample output:

 ens3: hw csum failure
 dev features: 0x0000000000014b89
 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0

Note, I use pr_err() just to be consistent with the existing one.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Nov, 2018 · 1 commit

net: sched: provide notification for graft on root · 98b0e5f6

Committed by Jakub Kicinski on Nov 12, 2018

Drivers are currently not notified when a Qdisc is grafted as root.
This requires special casing Qdiscs added with parent = TC_H_ROOT in
the driver.  Also there is no notification sent to the driver when
an existing Qdisc is grafted as root.

Add these very simple notifications; drivers should now be able to
track their Qdisc tree fully.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2018 · 3 commits

bpf: pass destroy() as a callback and remove its ndo_bpf subcommand · eb911947

Committed by Quentin Monnet on Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: pass translate() as a callback and remove its ndo_bpf subcommand · b07ade27

Committed by Quentin Monnet on Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: call verifier_prep from its callback in struct bpf_offload_dev · 00db12c3

Committed by Quentin Monnet on Nov 09, 2018

In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

09 Nov, 2018 · 1 commit

net: core: dev_addr_lists: add auxiliary func to handle reference address updates · e7946760

Committed by Ivan Khoronzhuk on Nov 08, 2018

To avoid updating the whole table, and only remove or add the addresses
that changed, there is an auxiliary function named __hw_addr_sync_dev().
It lets the end driver do nothing when nothing has changed, and add or
remove an address only when it is first added or last removed. But it
does not cover cases where an address of a real device or vlan is reused
by other vlans or vlan/macvlan devices.

To handle events where an address is reused or unreused, this patch adds
a new auxiliary routine, __hw_addr_ref_sync_dev(). It does nothing when
nothing has changed and performs updates only for an address being
added/reused/deleted/unreused. Thus, clone address changes for
vlans can be mirrored in the table. The function is mutually exclusive
with __hw_addr_sync_dev(). It is the responsibility of the end driver to
identify the vlan device an address belongs to, if it needs to.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2018 · 1 commit

net: bql: add __netdev_tx_sent_queue() · 3e59020a

Committed by Eric Dumazet on Oct 31, 2018

When qdisc_run() tries to use BQL budget to bulk-dequeue a batch
of packets, GSO can later transform this list into another list
of skbs, and each skb is sent to device ndo_start_xmit(),
one at a time, with skb->xmit_more being set to one for all but
the last skb.

Problem is that very often, BQL limit is hit in the middle of
the packet train, forcing dev_hard_start_xmit() to stop the
bulk send and requeue the end of the list.

BQL role is to avoid head of line blocking, making sure
a qdisc can deliver high priority packets before low priority ones.

But there is no way requeued packets can be bypassed by fresh
packets in the qdisc.

Aborting the bulk send increases TX softirqs, and hot cache
lines (after skb_segment()) are wasted.

Note that for TSO packets, we never split a packet in the middle
because of BQL limit being hit.

Drivers should be able to update BQL counters without
flipping/caring about BQL status, if the current skb
has xmit_more set.

Upper layers are ultimately responsible to stop sending another
packet train when BQL limit is hit.

Code template in a driver might look like the following :

	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);

Note that __netdev_tx_sent_queue() use is not mandatory,
since following patch will change dev_hard_start_xmit()
to not care about BQL status.

But it is highly recommended so that the full benefits of xmit_more
can be reached (fewer doorbells sent, and fewer atomic operations as well)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
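
The doorbell decision sketched in the driver template of this commit can be modelled in user space. This is a deliberately simplified toy: the byte counter, the fixed limit and the `stopped` flag are hypothetical stand-ins for the real BQL state, and the function is not the kernel helper itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a TX queue with a byte limit. */
struct toy_txq {
    unsigned int queued;  /* bytes handed to the device so far */
    unsigned int limit;   /* byte budget before the queue is stopped */
    bool stopped;
};

/*
 * Account `bytes` as sent and report whether the driver should ring
 * the doorbell now.
 * - xmit_more set: only update the counter and skip the limit check;
 *   a doorbell is needed only if the queue is already stopped.
 * - xmit_more clear: last packet of the train, so do the full limit
 *   check and always ring the doorbell.
 */
static bool toy_tx_sent_queue(struct toy_txq *q, unsigned int bytes,
                              bool xmit_more)
{
    q->queued += bytes;
    if (xmit_more)
        return q->stopped;
    if (q->queued > q->limit)
        q->stopped = true;
    return true;
}
```

In this model a three-packet train triggers one doorbell on the final packet instead of being cut short when the budget is crossed mid-train.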

16 Oct, 2018 · 1 commit

FDDI: defza: Support capturing outgoing SMT traffic · 9f9a742d

Committed by Maciej W. Rozycki on Oct 09, 2018

DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate
SMT frames with adapter's firmware.  Any SMT frame received from the RMC
via the Rx queue is queued back by the driver to the SMT Rx queue for
the firmware to process.  Similarly the firmware uses the SMT Tx queue
to supply the driver with SMT frames which are queued back to the Tx
queue for the RMC to send to the ring.

When a network tap is attached to an FDDI interface handled by `defza'
any incoming SMT frames captured are queued to our usual processing of
network data received, which in turn delivers them to any listening
taps.

However the outgoing SMT frames produced by the firmware bypass our
network protocol stack and are therefore not delivered to taps.  This in
turn means that taps are missing a part of network traffic sent by the
adapter, which may make it more difficult to track down network problems
or do general traffic analysis.

Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that
a network tap is attached, with a newly-created `dev_nit_active' helper
wrapping the usual condition used in the transmit path.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Oct, 2018 · 1 commit

net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce

Committed by Sabrina Dubroca on Oct 09, 2018

Since commit 5aad1de5 ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.

As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
 - if the local MTU is now lower than the PMTU, that PMTU is now
   incorrect
 - if the local MTU was the lowest value in the path, and is increased,
   we might discover a higher PMTU

Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
cases.

If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.

To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_u32() function.

Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
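
The update rules listed in this commit message can be written down as a small function. The sketch follows the prose only (lower the PMTU when the local MTU drops below it, adopt the new MTU when the old local MTU was the bottleneck, and unlock a locked exception when the new MTU is below its PMTU); the struct and names are hypothetical and this is not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a next-hop exception entry. */
struct toy_exception {
    uint32_t pmtu;
    bool mtu_locked;
};

/* Refresh a cached PMTU after the local link MTU changed old -> new. */
static void toy_update_pmtu(struct toy_exception *e,
                            uint32_t old_mtu, uint32_t new_mtu)
{
    if (e->mtu_locked) {
        /* Let PMTU discovery decide again whether locking is needed. */
        if (new_mtu < e->pmtu) {
            e->pmtu = new_mtu;
            e->mtu_locked = false;
        }
    } else if (new_mtu < e->pmtu || old_mtu == e->pmtu) {
        /* Either the cached PMTU is now too high, or the local link
         * was the bottleneck and a different PMTU may be found. */
        e->pmtu = new_mtu;
    }
}
```

Note that the function needs the old MTU as an input, which is why the commit adds it to the notifier information.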

05 Oct, 2018 · 1 commit

net: add umem reference in netdev{_rx}_queue · 661b8d1b

Committed by Magnus Karlsson on Oct 01, 2018

These references to the umem will be used to store information
on what kind of AF_XDP umem is bound to a queue id, if any.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

27 Sep, 2018 · 1 commit

net: core: add member wol_enabled to struct net_device · 61941143

Committed by Heiner Kallweit on Sep 24, 2018

Add flag wol_enabled to struct net_device indicating whether
Wake-on-LAN is enabled. As the first user, phy_suspend() will use it to
decide whether the PHY can be suspended or not.

Fixes: f1e911d5 ("r8169: add basic phylib support")
Fixes: e8cfd9d6c772 ("net: phy: call state machine synchronously in phy_stop")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Sep, 2018 · 1 commit

veth: rename pcpu_vstats as pcpu_lstats · 14d73416

Committed by Li RongQing on Sep 17, 2018

struct pcpu_vstats and pcpu_lstats have the same members and
usage, and pcpu_lstats is used in many files, so rename
pcpu_vstats as pcpu_lstats to reduce duplicate definitions
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Sep, 2018 · 1 commit

net: move definition of pcpu_lstats to header file · 52bb6677

Committed by Li RongQing on Sep 14, 2018

pcpu_lstats is defined in several files, so unify the definitions into
one and move it to a header file
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Sep, 2018 · 1 commit

packet: add sockopt to ignore outgoing packets · fa788d98

Committed by Vincent Whitchurch on Sep 03, 2018

Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter. With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them. So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets. (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this. Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
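
From user space, enabling such an option on a packet socket would look roughly like the sketch below. The commit message does not name the option; the sketch assumes it is the `PACKET_IGNORE_OUTGOING` socket option at level `SOL_PACKET`, and the fallback define with value 23 is an assumption for older headers. The helper name is hypothetical.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Assumed option value, used only if the installed headers predate it. */
#ifndef PACKET_IGNORE_OUTGOING
#define PACKET_IGNORE_OUTGOING 23
#endif

/*
 * Ask the kernel not to deliver outgoing packets to this AF_PACKET
 * socket. Returns 0 on success, -1 with errno set on failure.
 */
static int toy_ignore_outgoing(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING,
                      &one, sizeof(one));
}
```

A capture tool that only cares about received frames would call this right after creating its `AF_PACKET` socket (which itself typically requires `CAP_NET_RAW`), instead of filtering outgoing frames with BPF.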
