提交 · 148a9c4f29c75eb8136d2d6b38ed072dff51970d · openeuler / raspberrypi-kernel

27 12月, 2019 3 次提交

kabi: reserve space for network subsystem related structure · 148a9c4f

由 Tan Xiaojun 提交于 4月 10, 2019

hulk inclusion
category: feature
bugzilla: 13276
CVE: NA

-------------------------------

Reserve space for the structure in network subsystem.
Signed-off-by: NTan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

148a9c4f

net: dev: Use unsigned integer as an argument to left-shift · b92b47fc

由 Andy Shevchenko 提交于 3月 19, 2019

mainline inclusion
from mainline-5.0
commit f4d7b3e23d259c44f1f1c39645450680fcd935d6
category: bugfix
bugzilla: 11090
CVE: NA

-------------------------------------------------
1 << 31 is Undefined Behaviour according to the C standard.
Use U type modifier to avoid theoretical overflow.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
(cherry picked from commit f4d7b3e23d259c44f1f1c39645450680fcd935d6)
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

b92b47fc

ipvlan, l3mdev: fix broken l3s mode wrt local routes · 15ef6c67

由 Daniel Borkmann 提交于 1月 30, 2019

[ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]

While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net->loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s soley distinguishes itself by
'de-confusing' netfilter through switching skb->dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev->priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

15ef6c67

11 10月, 2018 1 次提交

net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce

由 Sabrina Dubroca 提交于 10月 09, 2018

Since commit 5aad1de5 ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.

As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
 - if the local MTU is now lower than the PMTU, that PMTU is now
   incorrect
 - if the local MTU was the lowest value in the path, and is increased,
   we might discover a higher PMTU

Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
cases.

If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.

To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_u32() function.

Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af7d6cce

27 9月, 2018 1 次提交

net: core: add member wol_enabled to struct net_device · 61941143

由 Heiner Kallweit 提交于 9月 24, 2018

Add flag wol_enabled to struct net_device indicating whether
Wake-on-LAN is enabled. As first user phy_suspend() will use it to
decide whether PHY can be suspended or not.

Fixes: f1e911d5 ("r8169: add basic phylib support")
Fixes: e8cfd9d6c772 ("net: phy: call state machine synchronously in phy_stop")
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61941143

11 8月, 2018 1 次提交

net: Provide stub for __netif_set_xps_queue if there is no CONFIG_XPS · c9fbb2d2

由 Krzysztof Kozlowski 提交于 8月 10, 2018

Building virtio_net driver without CONFIG_XPS fails with:

    drivers/net/virtio_net.c: In function ‘virtnet_set_affinity’:
    drivers/net/virtio_net.c:1910:3: error: implicit declaration of function ‘__netif_set_xps_queue’ [-Werror=implicit-function-declaration]
       __netif_set_xps_queue(vi->dev, mask, i, false);
       ^
Fixes: 4d99f660 ("net: allow to call netif_reset_xps_queues() under cpus_read_lock")
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9fbb2d2

01 8月, 2018 2 次提交

xsk: don't allow umem replace at stack level · 84c6b868

由 Jakub Kicinski 提交于 7月 30, 2018

Currently drivers have to check if they already have a umem
installed for a given queue and return an error if so.  Make
better use of XDP_QUERY_XSK_UMEM and move this functionality
to the core.

We need to keep rtnl across the calls now.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NBjörn Töpel <bjorn.topel@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84c6b868

net: update real_num_rx_queues even when !CONFIG_SYSFS · c29c2ebd

由 Jakub Kicinski 提交于 7月 30, 2018

We used to depend on real_num_rx_queues as a upper bound for sanity
checks.  For AF_XDP socket validation it's useful if the check behaves
the same regardless of CONFIG_SYSFS setting.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c29c2ebd

30 7月, 2018 1 次提交

net: report invalid mtu value via netlink extack · 7a4c53be

由 Stephen Hemminger 提交于 7月 27, 2018

If an invalid MTU value is set through rtnetlink return extra error
information instead of putting message in kernel log. For other cases
where there is no visible API, keep the error report in the log.

Example:
	# ip li set dev enp12s0 mtu 10000
	Error: mtu greater than device maximum.

	# ifconfig enp12s0 mtu 10000
	SIOCSIFMTU: Invalid argument
	# dmesg | tail -1
	[ 2047.795467] enp12s0: mtu greater than device maximum
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a4c53be

17 7月, 2018 1 次提交

net: convert gro_count to bitmask · d9f37d01

由 Li RongQing 提交于 7月 13, 2018

gro_hash size is 192 bytes, and uses 3 cache lines, if there is few
flows, gro_hash may be not fully used, so it is unnecessary to iterate
all gro_hash in napi_gro_flush(), to occupy unnecessary cacheline.

convert gro_count to a bitmask, and rename it as gro_bitmask, each bit
represents a element of gro_hash, only flush a gro_hash element if the
related bit is set, to speed up napi_gro_flush().

and update gro_bitmask only if it will be changed, to reduce cache
update
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9f37d01

16 7月, 2018 1 次提交

net: Add TLS rx resync NDO · 16e4edc2

由 Boris Pismenny 提交于 7月 13, 2018

Add new netdev tls op for resynchronizing HW tls context
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16e4edc2

14 7月, 2018 2 次提交

xdp: support simultaneous driver and hw XDP attachment · a25717d2

由 Jakub Kicinski 提交于 7月 11, 2018

Split the query of HW-attached program from the software one.
Introduce new .ndo_bpf command to query HW-attached program.
This will allow drivers to install different programs in HW
and SW at the same time.  Netlink can now also carry multiple
programs on dump (in which case mode will be set to
XDP_ATTACHED_MULTI and user has to check per-attachment point
attributes, IFLA_XDP_PROG_ID will not be present).  We reuse
IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size()
doesn't need to be updated.

Note that the installation side is still not there, since all
drivers currently reject installing more than one program at
the time.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

a25717d2

xdp: don't make drivers report attachment mode · 6b867589

由 Jakub Kicinski 提交于 7月 11, 2018

prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw).  Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports.  Remove
prog_attached member.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6b867589

10 7月, 2018 5 次提交

net: allow fallback function to pass netdev · 8ec56fc3

由 Alexander Duyck 提交于 7月 09, 2018

For most of these calls we can just pass NULL through to the fallback
function as the sb_dev. The only cases where we cannot are the cases where
we might be dealing with either an upper device or a driver that would
have configured things to support an sb_dev itself.

The only driver that has any significant change in this patch set should be
ixgbe as we can drop the redundant functionality that existed in both the
ndo_select_queue function and the fallback function that was passed through
to us.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

8ec56fc3

net: allow ndo_select_queue to pass netdev · 4f49dec9

由 Alexander Duyck 提交于 7月 09, 2018

This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

4f49dec9

net: Add generic ndo_select_queue functions · a4ea8a3d

由 Alexander Duyck 提交于 7月 09, 2018

This patch adds a generic version of the ndo_select_queue functions for
either returning 0 or selecting a queue based on the processor ID. This is
generally meant to just reduce the number of functions we have to change
in the future when we have to deal with ndo_select_queue changes.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

a4ea8a3d

net: Add support for subordinate traffic classes to netdev_pick_tx · eadec877

由 Alexander Duyck 提交于 7月 09, 2018

This change makes it so that we can support the concept of subordinate
device traffic classes to the core networking code. In doing this we can
start pulling out the driver specific bits needed to support selecting a
queue based on an upper device.

The solution at is currently stands is only partially implemented. I have
the start of some XPS bits in here, but I would still need to allow for
configuration of the XPS maps on the queues reserved for the subordinate
devices. For now I am using the reference to the sb_dev XPS map as just a
way to skip the lookup of the lower device XPS map for now as that would
result in the wrong queue being picked.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

eadec877

net: Add support for subordinate device traffic classes · ffcfe25b

由 Alexander Duyck 提交于 7月 09, 2018

This patch is meant to provide the basic tools needed to allow us to create
subordinate device traffic classes. The general idea here is to allow
subdividing the queues of a device into queue groups accessible through an
upper device such as a macvlan.

The idea here is to enforce the idea that an upper device has to be a
single queue device, ideally with IFF_NO_QUQUE set. With that being the
case we can pretty much guarantee that the tc_to_txq mappings and XPS maps
for the upper device are unused. As such we could reuse those in order to
support subdividing the lower device and distributing those queues between
the subordinate devices.

In order to distinguish between a regular set of traffic classes and if a
device is carrying subordinate traffic classes I changed num_tc from a u8
to a s16 value and use the negative values to represent the subordinate
pool values. So starting at -1 and running to -32768 we can encode those as
pool values, and the existing values of 0 to 15 can be maintained.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

ffcfe25b

05 7月, 2018 1 次提交

net: limit each hash list length to MAX_GRO_SKBS · 6312fe77

由 Li RongQing 提交于 7月 05, 2018

After commit 07d78363 ("net: Convert NAPI gro list into a small hash
table.")' there is 8 hash buckets, which allows more flows to be held for
merging.  but MAX_GRO_SKBS, the total held skb for merging, is 8 skb still,
limit the hash table performance.

keep MAX_GRO_SKBS as 8 skb, but limit each hash list length to 8 skb, not
the total 8 skb
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6312fe77

04 7月, 2018 3 次提交

net/sched: Introduce the ETF Qdisc · 25db26a9

由 Vinicius Costa Gomes 提交于 7月 03, 2018

The ETF (Earliest TxTime First) qdisc uses the information added
earlier in this series (the socket option SO_TXTIME and the new
role of sk_buff->tstamp) to schedule packets transmission based
on absolute time.

For some workloads, just bandwidth enforcement is not enough, and
precise control of the transmission of packets is necessary.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
           clockid CLOCK_TAI

In this example, the Qdisc will provide SW best-effort for the control
of the transmission time to the network adapter, the time stamp in the
socket will be in reference to the clockid CLOCK_TAI and packets
will leave the qdisc "delta" (100000) nanoseconds before its transmission
time.

The ETF qdisc will buffer packets sorted by their txtime. It will drop
packets on enqueue() if their skbuff clockid does not match the clock
reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped
if it expires while being enqueued.

The qdisc also supports the SO_TXTIME deadline mode. For this mode, it
will dequeue a packet as soon as possible and change the skb timestamp
to 'now' during etf_dequeue().

Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid
to be configured, but it's been decided that usage of CLOCK_TAI should
be enforced until we decide to allow for other clockids to be used.
The rationale here is that PTP times are usually in the TAI scale, thus
no other clocks should be necessary. For now, the qdisc will return
EINVAL if any clocks other than CLOCK_TAI are used.
Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25db26a9

net: ipv4: listified version of ip_rcv · 17266ee9

由 Edward Cree 提交于 7月 02, 2018

Also involved adding a way to run a netfilter hook over a list of packets.
Rather than attempting to make netfilter know about lists (which would be
a major project in itself) we just let it call the regular okfn (in this
case ip_rcv_finish()) for any packets it steals, and have it give us back
a list of packets it's synchronously accepted (which normally NF_HOOK
would automatically call okfn() on, but we want to be able to potentially
pass the list to a listified version of okfn().)
The netfilter hooks themselves are indirect calls that still happen per-
packet (see nf_hook_entry_hookfn()), but again, changing that can be left
for future work.

There is potential for out-of-order receives if the netfilter hook ends up
synchronously stealing packets, as they will be processed before any
accepts earlier in the list. However, it was already possible for an
asynchronous accept to cause out-of-order receives, so presumably this is
considered OK.
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17266ee9

net: core: trivial netif_receive_skb_list() entry point · f6ad8c1b

由 Edward Cree 提交于 7月 02, 2018

Just calls netif_receive_skb() in a loop.
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6ad8c1b

02 7月, 2018 2 次提交

net: fix use-after-free in GRO with ESP · 603d4cf8

由 Sabrina Dubroca 提交于 6月 30, 2018

Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.

Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.

This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.

Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

603d4cf8

net: Refactor XPS for CPUs and Rx queues · 80d19669

由 Amritha Nambiar 提交于 6月 29, 2018

Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.
Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80d19669

26 6月, 2018 2 次提交

net: Convert NAPI gro list into a small hash table. · 07d78363

由 David Miller 提交于 6月 24, 2018

Improve the performance of GRO receive by splitting flows into
multiple hash chains.
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07d78363

net: Convert GRO SKB handling to list_head. · d4546c25

由 David Miller 提交于 6月 24, 2018

Manage pending per-NAPI GRO packets via list_head.

Return an SKB pointer from the GRO receive handlers.  When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.

Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4546c25

05 6月, 2018 3 次提交

net: added netdevice operation for Tx · e3760c7e

由 Magnus Karlsson 提交于 6月 04, 2018

Added ndo_xsk_async_xmit. This ndo "kicks" the netdev to start to pull
userland AF_XDP Tx frames from a NAPI context.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

e3760c7e

net: xdp: added bpf_netdev_command XDP_{QUERY, SETUP}_XSK_UMEM · 74515c57

由 Björn Töpel 提交于 6月 04, 2018

Extend ndo_bpf with two new commands used for query zero-copy support
and register an UMEM to a queue_id of a netdev.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

74515c57

net: remove net_device operation ndo_xdp_flush · 189454e8

由 Jesper Dangaard Brouer 提交于 6月 05, 2018

All drivers are cleaned up and no references to ndo_xdp_flush
are left in drivers, it is time to remove the net_device_ops
operation ndo_xdp_flush.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

189454e8

03 6月, 2018 1 次提交

xdp: add flags argument to ndo_xdp_xmit API · 42b33468

由 Jesper Dangaard Brouer 提交于 5月 31, 2018

This patch only change the API and reject any use of flags. This is an
intermediate step that allows us to implement the flush flag operation
later, for each individual driver in a separate patch.

The plan is to implement flush operation via XDP_XMIT_FLUSH flag
and then remove XDP_XMIT_FLAGS_NONE when done.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

42b33468

29 5月, 2018 2 次提交

net: sched: mq: add simple offload notification · f971b132

由 Jakub Kicinski 提交于 5月 25, 2018

mq offload is trivial, we just need to let the device know
that the root qdisc is mq.  Alternative approach would be
to export qdisc_lookup() and make drivers check the root
type themselves, but notification via ndo_setup_tc is more
in line with other qdiscs.

Note that mq doesn't hold any stats on it's own, it just
adds up stats of its children.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f971b132

net: Introduce generic failover module · 30c8bd5a

由 Sridhar Samudrala 提交于 5月 24, 2018

The failover module provides a generic interface for paravirtual drivers
to register a netdev and a set of ops with a failover instance. The ops
are used as event handlers that get called to handle netdev register/
unregister/link change/name change events on slave pci ethernet devices
with the same mac address as the failover netdev.

This enables paravirtual drivers to use a VF as an accelerated low latency
datapath. It also allows migration of VMs with direct attached VFs by
failing over to the paravirtual datapath when the VF is unplugged.
Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30c8bd5a

25 5月, 2018 2 次提交

net: include hash policy in LAG changeupper info · f44aa9ef

由 John Hurley 提交于 5月 23, 2018

LAG upper event notifiers contain the tx type used by the LAG device.
Extend this to also include the hash policy used for tx types that
utilize hashing.
Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f44aa9ef

xdp: change ndo_xdp_xmit API to support bulking · 735fc405

由 Jesper Dangaard Brouer 提交于 5月 24, 2018

This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.

When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.

Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.

V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
V4: Isolated ndo, driver changes and callers.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

735fc405

04 5月, 2018 1 次提交

dev: packet: make packet_direct_xmit a common function · 865b03f2

由 Magnus Karlsson 提交于 5月 02, 2018

The new dev_direct_xmit will be used by AF_XDP in later commits.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

865b03f2

02 5月, 2018 1 次提交

net: core: Inline netdev_features_size_check() · e283de3a

由 Florian Fainelli 提交于 4月 30, 2018

We do not require this inline function to be used in multiple different
locations, just inline it where it gets used in register_netdevice().
Suggested-by: NDavid Miller <davem@davemloft.net>
Suggested-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e283de3a

01 5月, 2018 1 次提交

net: Add TLS offload netdev ops · a5c37c63

由 Ilya Lesokhin 提交于 4月 30, 2018

Add new netdev ops to add and delete tls context
Signed-off-by: NIlya Lesokhin <ilyal@mellanox.com>
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NAviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5c37c63

30 4月, 2018 2 次提交

net: core: Assert the size of netdev_featres_t · 3ac305c3

由 Florian Fainelli 提交于 4月 27, 2018

We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ac305c3

net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash · 1b837d48

由 Alexander Duyck 提交于 4月 27, 2018

I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b837d48

27 4月, 2018 1 次提交

udp: add gso support to virtual devices · 83aa025f

由 Willem de Bruijn 提交于 4月 26, 2018

Virtual devices such as tunnels and bonding can handle large packets.
Only segment packets when reaching a physical or loopback device.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83aa025f