- 12 Feb 2021, 21 commits
-
-
Submitted by Felix Fietkau

This was added to mitigate the effects of too much sampling on devices that use a static global fallback table instead of configurable multi-rate retry. Now that the sampling algorithm is improved, this code path no longer performs any better than the standard probing on affected devices.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-6-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Felix Fietkau

This makes it easier to see what rates are going to be tested next.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Felix Fietkau

The biggest flaw in the current minstrel_ht is that it needs far too many probing packets to quickly find the best rate. Depending on the wifi hardware and operating mode, this can significantly reduce throughput when not operating at the highest available data rate.

In order to significantly reduce the amount of rate sampling, we need a much smarter selection of probing rates. The new approach introduced by this patch maintains a limited set of available rates to be tested during a statistics window. They are split into distinct categories:

- MINSTREL_SAMPLE_TYPE_INC - incremental rate upgrade: pick the next rate group and find the first rate that is faster than the current max. throughput rate
- MINSTREL_SAMPLE_TYPE_JUMP - random testing of higher rates: pick a random rate from the next group that is faster than the current max throughput rate. This allows faster adaptation when the link changes significantly
- MINSTREL_SAMPLE_TYPE_SLOW - test a rate between max_prob, max_tp2 and max_tp in order to reduce the gap between them

In order to prioritize sampling, every 6 attempts are split into 3x INC, 2x JUMP, 1x SLOW (see the sketch below). Available rates are checked and refilled on every stats window update.

With this approach, we finally get a very small delta in throughput when comparing setting the optimal data rate as a fixed rate vs. normal rate control operation.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-4-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
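As a rough illustration (not the actual mac80211 code), the 3x INC / 2x JUMP / 1x SLOW split over six attempts could be expressed as a fixed rotation table; the enum and table names below are assumptions:

```c
/* Illustrative sketch of the 6-slot sampling rotation described above:
 * three incremental upgrades, two random jumps and one slow-rate probe
 * per six attempts. Names are assumptions, not the kernel's code.
 */
enum sample_type { SAMPLE_INC, SAMPLE_JUMP, SAMPLE_SLOW };

static const enum sample_type sample_seq[6] = {
	SAMPLE_INC, SAMPLE_JUMP, SAMPLE_INC,
	SAMPLE_SLOW, SAMPLE_INC, SAMPLE_JUMP,
};

static enum sample_type next_sample_type(unsigned int *counter)
{
	enum sample_type type = sample_seq[*counter % 6];

	(*counter)++;
	return type;
}
```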
-
Submitted by Felix Fietkau

In order to be able to fall back to lower rates more gracefully, without too many throughput fluctuations, initialize all untested rates below tested ones to the maximum probability of the higher rates. Usually this leads to untested lower rates getting initialized with a probability value of 100%, making them better candidates for fallback without having to rely on random probing.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
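A minimal sketch of that initialization, assuming a rate table ordered from lowest to highest rate and illustrative field names:

```c
/* Walk from the highest rate down, carrying the best probability seen
 * so far into every untested lower rate. Field names are illustrative.
 */
u16 max_prob = 0;
int i;

for (i = n_rates - 1; i >= 0; i--) {
	if (rates[i].attempts)
		max_prob = max(max_prob, rates[i].prob_avg);
	else
		rates[i].prob_avg = max_prob;	/* inherit higher-rate probability */
}
```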
-
Submitted by Felix Fietkau

Keep the update in one place and prepare for further rework.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Felix Fietkau

Get rid of a lot of divisions and modulo operations. This reduces code size and improves performance.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
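One common way to achieve this, sketched below under the assumption of a power-of-two group size; the constant names and values are illustrative, not the actual minstrel_ht layout:

```c
/* With a power-of-two number of rates per group, the group and the
 * in-group offset can be derived with shift/mask instead of / and %.
 * Constants are illustrative assumptions.
 */
#define GROUP_RATES		16
#define GROUP_RATES_SHIFT	4

static inline unsigned int rate_group(unsigned int rate_idx)
{
	return rate_idx >> GROUP_RATES_SHIFT;	/* instead of rate_idx / GROUP_RATES */
}

static inline unsigned int rate_offset(unsigned int rate_idx)
{
	return rate_idx & (GROUP_RATES - 1);	/* instead of rate_idx % GROUP_RATES */
}
```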
-
Submitted by Luca Coelho

Sparse started warning on this function because we can potentially return an uninitialized value. The reason is that if the caller passes a min_bw value that is higher than the last value in bws[], we will not go into the loop and reg_rule will remain uninitialized. This cannot happen because the only caller of this function uses either 1 or 20 in min_bw, but the function will be more robust if we pre-initialize the value.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210204154439.6c884ea7281c.I257278d03b0c1ae0aa6631672cfa48f1a95d5996@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
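The pattern being hardened looks roughly like the sketch below; lookup_rule() is a hypothetical helper standing in for the real lookup:

```c
/* If no entry in bws[] satisfies min_bw, the loop body never runs;
 * pre-initializing reg_rule guarantees a defined return value.
 */
const struct ieee80211_reg_rule *reg_rule = ERR_PTR(-ERANGE);
int i;

for (i = 0; i < ARRAY_SIZE(bws); i++) {
	if (bws[i] < min_bw)
		continue;
	reg_rule = lookup_rule(center_freq, bws[i]);	/* hypothetical */
	break;
}
return reg_rule;
```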
-
Submitted by Colin Ian King

The multiplication of the u32 variables tx_time and estimated_retx is performed using a 32-bit multiplication and the result is stored in a u64. This has a potential u32 overflow issue, so avoid it by casting tx_time to a u64 to force a 64-bit multiply.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 050ac52c ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210205175352.208841-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
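A minimal sketch of the fix, using the variable names from the text above:

```c
/* Widening one operand forces the whole multiplication to be done in
 * 64 bits, so the intermediate product can no longer wrap at 2^32.
 */
u32 tx_time, estimated_retx;
u64 result;

result = (u64)tx_time * estimated_retx;
/* instead of: result = tx_time * estimated_retx; (32-bit multiply, may overflow) */
```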
-
Submitted by Markus Theil

This patch unifies sending control port frames over nl80211 and AF_PACKET sockets a little more. Before this patch, EAPOL frames got QoS prioritization only when using AF_PACKET sockets. __ieee80211_select_queue only selects a QoS-enabled queue for control port frames when the control port protocol is set correctly on the skb. For the AF_PACKET path this works, but the nl80211 path used ETH_P_802_3. Another check for injected frames in wme.c then prevented the QoS TID from being copied into the frame. In order to fix this, get rid of the frame injection marking for the nl80211 control port and set the correct ethernet protocol.

Please note: an earlier version of this patch tried to prevent frame aggregation for control port frames in order to speed up the initial connection setup a little. This seemed to cause issues on my older Intel dvm-based hardware and was therefore removed again. Future commits which try to reintroduce this have to check carefully how the hardware behaves with aggregated and non-aggregated traffic for the same TID. My NIC: Intel(R) Centrino(R) Ultimate-N 6300 AGN, REV=0x74

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20210206115112.567881-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Matteo Croce
The ieee80211 class registers a callback which actually does nothing. Given that the callback is optional, and all its accesses are protected by a NULL check, remove it entirely.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Link: https://lore.kernel.org/r/20210208113356.4105-1-mcroce@linux.microsoft.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Arjun Roy

Explicitly define the reserved field and require it, and any subsequent fields, to be zero-valued for now. Additionally, limit the valid CMSG flags that tcp_zerocopy_receive accepts.

Fixes: 7eeba170 ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
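The validation pattern described above looks roughly like this sketch, where zc stands for the tcp_zerocopy_receive argument and the flag-mask name is an assumption:

```c
/* Reject requests that set reserved bits or unknown cmsg flags so the
 * bits can safely be given a meaning in a later kernel release.
 */
if (zc.reserved)
	return -EINVAL;				/* must be zero for now */

if (zc.msg_flags & ~VALID_ZC_MSG_FLAGS)		/* assumed mask of known flags */
	return -EINVAL;
```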
-
Submitted by Cong Wang

dev_ifsioc_locked() is called with only the RCU read lock, so when there is a parallel writer changing the mac address, it could get a partially updated mac address, as shown below:

  Thread 1                                      Thread 2
  // eth_commit_mac_addr_change()
  memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
                                                // dev_ifsioc_locked()
                                                memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr,...);

Close this race condition by guarding them with an RW semaphore, like netdev_get_name(). We cannot use a seqlock here as it does not allow blocking. The writers already take RTNL anyway, so this does not affect the slow path. To avoid bothering existing dev_set_mac_address() callers in drivers, introduce a new wrapper just for user-facing callers on the ioctl and rtnetlink paths. Note, bonding also changes slave mac addresses, but that requires a separate patch due to the complexity of the bonding code.

Fixes: 3710becf ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
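A minimal sketch of the locking scheme, assuming a semaphore named dev_addr_sem and using the existing dev_set_mac_address() API on the write side; the function names are illustrative, not the actual wrapper:

```c
/* Readers of dev->dev_addr on the ioctl path take the semaphore shared;
 * the RTNL-holding writer takes it exclusive, so readers never observe
 * a half-copied address.
 */
static DECLARE_RWSEM(dev_addr_sem);

static void read_hwaddr(struct net_device *dev, struct ifreq *ifr)
{
	down_read(&dev_addr_sem);
	memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr, ETH_ALEN);
	up_read(&dev_addr_sem);
}

static int set_hwaddr_user(struct net_device *dev, struct sockaddr *sa)
{
	int err;

	down_write(&dev_addr_sem);
	err = dev_set_mac_address(dev, sa, NULL);
	up_write(&dev_addr_sem);
	return err;
}
```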
-
Submitted by Vlad Buslov

The function fib6_walk_continue() cannot return a positive value when called from register_fib_notifier(), but ignoring this causes static analyzers to generate warnings in users of register_fib_notifier() that try to convert the returned error code to a pointer with ERR_PTR(). Handle such a case by explicitly checking for positive error values and converting them to -EINVAL in fib6_tables_dump().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
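The conversion amounts to a small guard of the kind sketched here; the walk helper name is illustrative:

```c
/* A walker may report a positive "stop" value; callers that feed the
 * result into ERR_PTR() expect only 0 or negative errno values.
 */
int err;

err = fib6_tables_walk(net, w);		/* illustrative walk helper */
if (err > 0)
	err = -EINVAL;
return err;
```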
-
Submitted by Vladimir Oltean

Due to the fact that ic_dev->dev is kept open in ic_close_dev, I had thought that ic_dev would not be freed either. But that is not the case: instead, "everybody dies" when ipconfig cleans up, and only the net_device behind ic_dev->dev remains allocated, not ic_dev itself.

This is a problem because in ic_close_devs, for every net device that we're about to close, we compare it against the list of lower interfaces of ic_dev to figure out whether we should close it or not. But since ic_dev itself is subject to freeing, this means that at some point in the middle of the list of ipconfig interfaces, ic_dev will have been freed, and we would still be attempting to iterate through its list of lower interfaces while checking whether to bring down the remaining ipconfig interfaces.

There are multiple ways to avoid the use-after-free: we could delay freeing ic_dev until the very end (outside the while loop). Or an even simpler one: we can observe that we don't need ic_dev when iterating through its lowers, only ic_dev->dev, a structure which isn't ever freed. So, by keeping ic_dev->dev in a variable assigned prior to freeing ic_dev, we can avoid all use-after-free issues.

Fixes: 46acf7bd ("Revert "net: ipv4: handle DSA enabled master network devices"")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
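The save-before-free pattern the commit describes, as a minimal sketch; is_lower_of() and bring_down() are hypothetical helpers:

```c
/* Capture the net_device pointer, which outlives ic_dev, before any
 * list entry (including ic_dev) is freed; afterwards only dereference
 * the saved pointer.
 */
struct net_device *selected_dev = ic_dev->dev;	/* not freed with ic_dev */
struct ic_device *d, *next;

for (d = ic_first_dev; d; d = next) {
	next = d->next;
	if (d->dev != selected_dev && !is_lower_of(selected_dev, d->dev))
		bring_down(d->dev);		/* hypothetical helpers */
	kfree(d);				/* may free ic_dev itself */
}
```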
-
Submitted by Eric Dumazet

It is simpler to make net->net_cookie a plain u64 written once in setup_net() instead of looping and using atomic64 helpers. Lorenz Bauer wants to add a SO_NETNS_COOKIE socket option and this patch would make his patch series simpler.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by George McCollister

Add offloading for HSR/PRP (IEC 62439-3) tag insertion, tag removal, forwarding and duplication supported by the xrs7000 series switches. Only HSR v1 and PRP v1 are supported by the xrs7000 series switches (HSR v0 is not).

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by George McCollister

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion, tag removal, duplicate generation and forwarding on DSA switches. Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE, which trigger calls to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch. The DSA switch driver should then set netdev feature flags for the HSR/PRP operations that it offloads (see the sketch after this list):

- NETIF_F_HW_HSR_TAG_INS
- NETIF_F_HW_HSR_TAG_RM
- NETIF_F_HW_HSR_FWD
- NETIF_F_HW_HSR_DUP

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
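A hedged sketch of how a driver's .port_hsr_join might advertise those flags; the function body and the way the slave netdev is reached are assumptions, not a specific driver:

```c
/* On successful join, expose the HSR offload capabilities on the user
 * port's net_device so the HSR core can skip the corresponding work.
 */
static int example_port_hsr_join(struct dsa_switch *ds, int port,
				 struct net_device *hsr)
{
	struct net_device *slave = dsa_to_port(ds, port)->slave;

	slave->features |= NETIF_F_HW_HSR_TAG_INS |
			   NETIF_F_HW_HSR_TAG_RM |
			   NETIF_F_HW_HSR_FWD |
			   NETIF_F_HW_HSR_DUP;
	return 0;
}
```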
-
Submitted by George McCollister

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion, tag removal, duplicate generation and forwarding. For HSR, insertion involves the switch adding a 6-byte HSR header after the 14-byte Ethernet header. For PRP it adds a 6-byte trailer. Tag removal involves automatically stripping the HSR/PRP header/trailer in the switch. This is possible when the switch also performs auto deduplication using the HSR/PRP header/trailer (making it no longer required). Forwarding involves automatically forwarding between redundant ports in an HSR ring. This is crucial because delay is accumulated as a frame passes through each node in the ring. Duplication involves the switch automatically sending a single frame from the CPU port to both redundant ports. This is required because the inserted HSR/PRP header/trailer must contain the same sequence number on the frames sent out both redundant ports.

Export is_hsr_master so DSA can tell HSR masters apart from other devices in dsa_slave_changeupper.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by George McCollister

For a switch to offload insertion of HSR/PRP tags, frames must not be sent to the CPU-facing switch port with a tag. Generate supervision frames (eth type ETH_P_PRP) without the HSR v1 (ETH_P_HSR)/PRP tag and rely on create_tagged_frame, which inserts it later. This will allow skipping the tag insertion for all outgoing frames in the future, which is required for HSR v1/PRP tag insertion to be offloaded. HSR v0 supervision frames always contain tag information, so insertion of the tag can't be offloaded. IEC 62439-3 Ed.2.0 (HSR v1) specifically notes that this was changed since v0 to allow offloading.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash Port Selection Algorithm), a patient attacker could still be able to collect enough state from an otherwise idle host. The idea of this patch is to inject some noise in the cases where __inet_hash_connect() found a candidate on the first attempt. This noise should not significantly reduce the collision avoidance, and should be zero if the connection table is already well used. Note that this is not implementing RFC 6056 3.3.5, because we think Algorithm 5 could hurt typical workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

RFC 6056 (Recommendations for Transport-Protocol Port Randomization) provides a good summary of why source port selection needs extra care. David Dworken reminded us that Linux implements Algorithm 3 as described in RFC 6056 3.3.3.

Quoting David:

  In the context of the web, this creates an interesting info leak where websites can count how many TCP connections a user's computer is establishing over time. For example, this allows a website to count exactly how many subresources a third party website loaded. This also allows:
  - Distinguishing between different users behind a VPN based on distinct source port ranges.
  - Tracking users over time across multiple networks.
  - Covert communication channels between different browsers/browser profiles running on the same computer.
  - Tracking what applications are running on a computer based on the pattern of how fast source ports are getting incremented.

Section 3.3.4 describes an enhancement that reduces an attacker's ability to use the basic information currently stored in the shared 'u32 hint'. This change also decreases the collision rate when multiple applications need to connect() to different destinations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
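For orientation, RFC 6056 3.3.4 (Algorithm 4) replaces the single shared offset with a keyed offset plus a small table of per-destination counters. The sketch below is a simplified illustration, not the kernel's __inet_hash_connect() code; hash_f() and hash_g() stand in for two keyed hash functions:

```c
/* Simplified double-hash port selection: one hash picks a per-tuple
 * starting offset, a second hash indexes a table of counters so that
 * different destinations no longer share a single global hint.
 */
#define TABLE_LENGTH 256

static u32 counters[TABLE_LENGTH];

static u16 pick_source_port(u32 daddr, u16 dport, u32 saddr,
			    u16 port_lo, u16 port_range)
{
	u32 offset = hash_f(daddr, dport, saddr);		 /* keyed hash #1 */
	u32 index  = hash_g(daddr, dport, saddr) % TABLE_LENGTH; /* keyed hash #2 */
	u16 port   = port_lo + (offset + counters[index]) % port_range;

	counters[index]++;	/* advance for the next connection attempt */
	return port;
}
```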
-
- 11 Feb 2021, 1 commit
-
-
Submitted by David Howells

The changes to make rxrpc create the udp socket missed the bit that adds a Kconfig dependency on the udp tunnel code. Fix this by making AF_RXRPC select NET_UDP_TUNNEL.

Fixes: 1a9b86c9 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
cc: alaa@dev.mellanox.co.il
cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Feb 2021, 4 commits
-
-
Submitted by Stefano Garzarella

In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock, and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it there.

Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
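The shape of the change, as a minimal sketch; the locked helper is hypothetical:

```c
/* Hold the socket lock across the whole shutdown path so 'state',
 * 'sk_flags' and 'vsk->transport' cannot change underneath us.
 */
static int vsock_shutdown(struct socket *sock, int mode)
{
	struct sock *sk = sock->sk;
	int err;

	lock_sock(sk);
	err = __vsock_shutdown_locked(sk, mode);	/* hypothetical helper */
	release_sock(sk);

	return err;
}
```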
-
Submitted by Wei Wang

This patch adds a new sysfs attribute to the network device class. Said attribute provides a per-device control to enable/disable the threaded mode for all the napi instances of the given network device, without the need for a device up/down. The user sets it to 1 or 0 to enable or disable threaded mode.

Note: when switching between threaded and the current softirq-based mode for a napi instance, it will not immediately take effect if the napi is currently being polled. The mode switch will happen the next time napi_schedule() is called.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Wei Wang

This patch allows running each napi poll loop inside its own kernel thread. The kthread is created during netif_napi_add() if dev->threaded is set, and threaded mode is enabled in napi_enable(). We will provide a way to set dev->threaded and enable threaded mode without a device up/down in the following patch.

Once threaded mode is enabled and the kthread is started, napi_schedule() will wake up that thread instead of scheduling the softirq. The threaded poll loop behaves quite like net_rx_action, but it does not have to manipulate local irqs and uses an explicit scheduling point based on netdev_budget.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
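A heavily reduced sketch of such a per-napi kthread loop, assuming __napi_poll() takes a repoll flag and with all weight/budget handling omitted; this is an illustration, not the actual kernel function:

```c
/* The kthread sleeps until napi_schedule() sets NAPI_STATE_SCHED and
 * wakes it, then polls with bottom halves disabled and yields at an
 * explicit scheduling point instead of relying on softirq pressure.
 */
static int napi_kthread_fn(void *data)
{
	struct napi_struct *napi = data;
	bool repoll;

	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (!test_bit(NAPI_STATE_SCHED, &napi->state)) {
			schedule();		/* wait for napi_schedule() */
			continue;
		}
		__set_current_state(TASK_RUNNING);

		local_bh_disable();
		__napi_poll(napi, &repoll);	/* weight/budget handling omitted */
		local_bh_enable();

		cond_resched();			/* explicit scheduling point */
	}
	return 0;
}
```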
-
Submitted by Felix Fietkau

This commit introduces a new function __napi_poll() which contains the main logic of the existing napi_poll() function and will be called by other functions in later commits. This idea and implementation are by Felix Fietkau <nbd@nbd.name> and were proposed as part of the patch to move napi work to workqueue context. This commit by itself is a code restructure.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Wei Wang <weiwan@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Feb 2021, 13 commits
-
-
Submitted by Amit Cohen

Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. A separate value is added for such notifications because there are fewer of them, so they do not impact performance, and some users will find them more important.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Amit Cohen

After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware.

To avoid such cases, a previous patch set added the ability to emit RTM_NEWROUTE notifications whenever the RTM_F_OFFLOAD/RTM_F_TRAP flags are changed; this behavior is controlled by a sysctl. With the above-mentioned behavior, it is possible to know from user space if the route was offloaded, but if the offload fails there is no indication to user space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that users will have better visibility into the offload process. 'struct fib6_info' is extended with a new field that indicates whether route offload failed. Note that the new field is added using an unused bit and therefore there is no need to increase the struct size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
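The kind of change this describes, sketched with an illustrative (not verbatim) field layout:

```c
/* Reusing a spare bit in the existing flag byte keeps the structure
 * size unchanged while exposing the new state to notifications.
 */
struct fib6_info_flags_sketch {
	u8	offload:1,
		trap:1,
		offload_failed:1,	/* new: hardware failed to install the route */
		unused:5;
};
```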
-
Submitted by Amit Cohen

Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. A separate value is added for such notifications because there are fewer of them, so they do not impact performance, and some users will find them more important.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Amit Cohen

After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware.

To avoid such cases, a previous patch set added the ability to emit RTM_NEWROUTE notifications whenever the RTM_F_OFFLOAD/RTM_F_TRAP flags are changed; this behavior is controlled by a sysctl. With the above-mentioned behavior, it is possible to know from user space if the route was offloaded, but if the offload fails there is no indication to user space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' and 'struct fib_rt_info' are extended with a new field that indicates whether route offload failed. Note that the new field is added using an unused bit and therefore there is no need to increase the struct sizes.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Horatiu Vultur

The function br_mrp_port_switchdev_set_state was called both with the MRP port state and the STP port state, which is an issue because they don't match exactly. Therefore, update the function to be used only with the STP port state and use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE. The reason for choosing STP over MRP is that the drivers already implement SWITCHDEV_ATTR_ID_PORT_STP_STATE and, in software, we already update the port STP state.

Fixes: 9a9f26e8 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: fadd4091 ("bridge: switchdev: mrp: Implement MRP API for switchdev")
Fixes: 2f1a11ae ("bridge: mrp: Add MRP interface.")
Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Pablo Neira Ayuso

Restore the original behaviour where users are allowed to add an element with any stateful expression if the set definition specifies no stateful expressions. Make sure the upper limit of NFT_SET_EXPR_MAX stateful expressions is not exceeded.

Fixes: 8cfd9b0f ("netfilter: nftables: generalize set expressions support")
Fixes: 48b0ae04 ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Vladimir Oltean

Looking through patchwork I don't see that there was any consensus to use switchdev notifiers only in the case of netlink-provided port flags but not sysfs (as a sort of deprecation, punishment or anything like that), so we should probably keep the user interface consistent in terms of functionality.

http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/
http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/

Fixes: 3922285d ("net: bridge: Add support for offloading port attributes")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Xin Long

In rxrpc_open_socket(), sock_create_kern() and kernel_bind() are currently used to create a udp tunnel socket, together with other kernel APIs to set it up. This code can be replaced with the udp tunnel APIs udp_sock_create() and setup_udp_tunnel_sock(), which simplifies rxrpc_open_socket(). Note that with this patch, the udp tunnel socket will always bind to a random port if a transport is not provided by users, which was suggested by David Howells, thanks!

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
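For reference, typical use of these two APIs looks roughly like the sketch below; the encap type and receive callback are illustrative placeholders, not rxrpc's actual configuration:

```c
/* udp_sock_create() creates and binds the kernel UDP socket from a
 * udp_port_cfg; setup_udp_tunnel_sock() installs the encap callbacks.
 */
struct udp_port_cfg cfg = {
	.family          = AF_INET,
	.local_ip.s_addr = htonl(INADDR_ANY),
	.local_udp_port  = 0,			/* 0 = pick a random port */
};
struct udp_tunnel_sock_cfg tuncfg = {
	.encap_type = 1,			/* assumed encap type */
	.encap_rcv  = my_encap_rcv,		/* hypothetical receive callback */
};
struct socket *sock;
int err;

err = udp_sock_create(net, &cfg, &sock);
if (err < 0)
	return err;

setup_udp_tunnel_sock(net, sock, &tuncfg);
```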
-
Submitted by Alexander Duyck

In order to access the subordinate dev for a device, we should be holding the rtnl_lock when outside of the transmit path. The existing code was not doing that for the sysfs dump function, and as a result we were open to a possible race. To resolve that, take the rtnl lock prior to accessing the sb_dev field of the Tx queue and release it after we have retrieved the tc for the queue.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
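A sketch of the locking pattern described, with illustrative surrounding variables:

```c
/* Hold RTNL while reading the queue's subordinate device and mapping
 * it to a traffic class from sysfs (non-transmit) context.
 */
rtnl_lock();
sb_dev = queue->sb_dev ? queue->sb_dev : dev;
tc = netdev_txq_to_tc(sb_dev, index);
rtnl_unlock();

if (tc < 0)
	return -EINVAL;
```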
-
Submitted by Florian Westphal

The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones', where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided.

Fixes: 4e35c1cb ("netfilter: nf_nat: skip nat clash resolution for same-origin entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Submitted by Stefano Garzarella
If the socket is closed or is being released, some resources used by virtio_transport_space_update() such as 'vsk->trans' may be released. To avoid a use after free bug we should only update the available credit when we are sure the socket is still open and we have the lock held.

Fixes: 06a8fc78 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Submitted by Andrea Mayer

The set of required attributes for a given SRv6 behavior has been identified using a bitmap stored in an unsigned long since the initial design of SRv6 networking in Linux. Recently the same approach has been used for identifying the optional attributes. However, the number of attributes supported by SRv6 behaviors depends on the size of the unsigned long type, which changes with the architecture. Indeed, on a 64-bit architecture an SRv6 behavior can support up to 64 attributes, while on a 32-bit architecture it can support at most 32 attributes.

To fool-proof the processing of SRv6 behaviors, we verify at compile time that the set of all supported SRv6 attributes can be encoded into a bitmap stored in an unsigned long. Otherwise, the kernel build fails, forcing developers to reconsider adding a new attribute or to extend the total number of attributes supported by the SRv6 behaviors. Moreover, we replace all patterns (1 << i) with the macro SEG6_F_ATTR(i) in order to address potential overflow issues caused by 32-bit signed arithmetic.

Thanks to Colin Ian King for catching the overflow problem, providing a solution and inspiring this patch. Thanks to Jakub Kicinski for his useful suggestions during the design of this patch.

v2:
- remove SEG6_LOCAL_MAX_SUPP, which is not strictly needed: it can be derived from the unsigned long type. Thanks to David Ahern for pointing it out.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210206170934.5982-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
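The two changes can be pictured roughly as follows; the macro body and the location of the compile-time check are assumptions, not the exact patch:

```c
/* An unsigned-long bit avoids the 32-bit signed overflow that (1 << i)
 * can hit for large i on 32-bit builds.
 */
#define SEG6_F_ATTR(i)	BIT(i)		/* BIT() expands to (1UL << (i)) */

static int __init seg6_local_init(void)
{
	/* Fail the build if the attribute set no longer fits into the
	 * unsigned long bitmap used for required/optional attribute masks.
	 */
	BUILD_BUG_ON(SEG6_LOCAL_MAX + 1 > BITS_PER_TYPE(unsigned long));

	return 0;
}
```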
-
Submitted by NeilBrown

The sctp transport seq_file iterators take a reference to the transport in the ->start and ->next functions and release the reference in the ->show function. The preferred handling for such resources is to release them in the subsequent ->next or ->stop function call. Since commit 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak references. So move the sctp_transport_put() call to ->next and ->stop.

Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
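The resulting seq_file pattern, as a minimal sketch; the element getter is a hypothetical stand-in:

```c
/* Drop the reference taken for the previous element in ->next and
 * ->stop, never in ->show, since ->show may not run for every element.
 */
static void *transport_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
	if (v && v != SEQ_START_TOKEN)
		sctp_transport_put(v);			/* release previous element */

	return get_next_transport(seq->private, pos);	/* hypothetical getter */
}

static void transport_seq_stop(struct seq_file *seq, void *v)
{
	if (v && v != SEQ_START_TOKEN)
		sctp_transport_put(v);
}
```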
-
- 07 Feb 2021, 1 commit
-
-
Submitted by Norbert Slusarek

A possible locking issue in vsock_connect_timeout() was recognized by Eric Dumazet, which might cause a null pointer dereference in vsock_transport_cancel_pkt(). This patch ensures that vsock_transport_cancel_pkt() will be called within the lock, so a race condition that could result in vsk->transport being set to NULL won't occur.

Fixes: 380feae0 ("vsock: cancel packets when failing to connect")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-