提交 · fd1740b6abac39f68ce12e201697f106e0f1d519 · openeuler / Kernel

17 12月, 2021 2 次提交

bpf: Remove the cgroup -> bpf header dependecy · fd1740b6

由 Jakub Kicinski 提交于 3年前

Remove the dependency from cgroup-defs.h to bpf-cgroup.h and bpf.h.
This reduces the incremental build size of x86 allmodconfig after
bpf.h was touched from ~17k objects rebuilt to ~5k objects.
bpf.h is 2.2kLoC and is modified relatively often.

We need a new header with just the definition of struct cgroup_bpf
and enum cgroup_bpf_attach_type, this is akin to cgroup-defs.h.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20211216025538.1649516-4-kuba@kernel.org

fd1740b6

add includes masked by cgroup -> bpf dependency · f7ea534a

由 Jakub Kicinski 提交于 3年前

cgroup pulls in BPF which pulls in a lot of includes.
We're about to break that chain so fix those who were
depending on it.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211216025538.1649516-2-kuba@kernel.org

f7ea534a

14 12月, 2021 3 次提交

xsk: Wipe out dead zero_copy_allocator declarations · d27a6622

由 Maciej Fijalkowski 提交于 3年前

zero_copy_allocator has been removed back when Bjorn Topel introduced
xsk_buff_pool. Remove references to it that were dangling in the tree.
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20211210171511.11574-1-maciej.fijalkowski@intel.com

d27a6622

bpf: Let bpf_warn_invalid_xdp_action() report more info · c8064e5b

由 Paolo Abeni 提交于 3年前

In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com

c8064e5b

bpf: Add get_func_[arg|ret|arg_cnt] helpers · f92c1e18

由 Jiri Olsa 提交于 3年前

Adding following helpers for tracing programs:

Get n-th argument of the traced function:
  long bpf_get_func_arg(void *ctx, u32 n, u64 *value)

Get return value of the traced function:
  long bpf_get_func_ret(void *ctx, u64 *value)

Get arguments count of the traced function:
  long bpf_get_func_arg_cnt(void *ctx)

The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument's position.

Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored on
ctx-16 address if it's needed.

All helpers above are inlined by verifier.

Also bit unrelated small change - using newly added function
bpf_prog_has_trampoline in check_get_func_ip.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org

f92c1e18

12 12月, 2021 1 次提交

bpf: Add bpf_strncmp helper · c5fb1993

由 Hou Tao 提交于 3年前

The helper compares two strings: one string is a null-terminated
read-only string, and another string has const max storage size
but doesn't need to be null-terminated. It can be used to compare
file name in tracing or LSM program.
Signed-off-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211210141652.877186-2-houtao1@huawei.com

c5fb1993

11 12月, 2021 6 次提交

net: ocelot: add FDMA support · 753a026c

由 Clément Léger 提交于 3年前

Ethernet frames can be extracted or injected autonomously to or from
the device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list
data structures in memory are used for injecting or extracting Ethernet
frames. The FDMA generates interrupts when frame extraction or
injection is done and when the linked lists need updating.

The FDMA is shared between all the ethernet ports of the switch and
uses a linked list of descriptors (DCB) to inject and extract packets.
Before adding descriptors, the FDMA channels must be stopped. It would
be inefficient to do that each time a descriptor would be added so the
channels are restarted only once they stopped.

Both channels uses ring-like structure to feed the DCBs to the FDMA.
head and tail are never touched by hardware and are completely handled
by the driver. On top of that, page recycling has been added and is
mostly taken from gianfar driver.
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Co-developed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: NClément Léger <clement.leger@bootlin.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

753a026c

net: ocelot: add and export ocelot_ptp_rx_timestamp() · b471a71e

由 Clément Léger 提交于 3年前

In order to support PTP in FDMA, PTP handling code is needed. Since
this is the same as for register-based extraction, export it with
a new ocelot_ptp_rx_timestamp() function.
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NClément Léger <clement.leger@bootlin.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b471a71e

net: ocelot: export ocelot_ifh_port_set() to setup IFH · e5150f00

由 Clément Léger 提交于 3年前

FDMA will need this code to prepare the injection frame header when
sending SKBs. Move this code into ocelot_ifh_port_set() and add
conditional IFH setting for vlan and rew op if they are not set.
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NClément Léger <clement.leger@bootlin.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e5150f00

net: ocelot: fix missed include in the vsc7514_regs.h file · 840ece19

由 Colin Foster 提交于 3年前

commit 32ecd22b ("net: mscc: ocelot: split register definitions to a
separate file") left out an include for <soc/mscc/ocelot_vcap.h>. It was
missed because the only consumer was ocelot_vsc7514.h, which already
included ocelot_vcap.

Fixes: 32ecd22b ("net: mscc: ocelot: split register definitions to a separate file")
Signed-off-by: NColin Foster <colin.foster@in-advantage.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211209074010.1813010-1-colin.foster@in-advantage.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

840ece19

sock: Use sock_owned_by_user_nocheck() instead of sk_lock.owned. · 33d60fbd

由 Kuniyuki Iwashima 提交于 3年前

This patch moves sock_release_ownership() down in include/net/sock.h and
replaces some sk_lock.owned tests with sock_owned_by_user_nocheck().
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20211208062158.54132-1-kuniyu@amazon.co.jpSigned-off-by: NJakub Kicinski <kuba@kernel.org>

33d60fbd

xfrm: add net device refcount tracker to struct xfrm_state_offload · e1b539bd

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Link: https://lore.kernel.org/r/20211209154451.4184050-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e1b539bd

10 12月, 2021 7 次提交

E
net: sched: add netns refcount tracker to struct tcf_exts · dbdcda63
由 Eric Dumazet 提交于 3年前
```
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
```
dbdcda63
E
net: add netns refcount tracker to struct seq_net_private · 04a931e5
由 Eric Dumazet 提交于 3年前
```
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
```
04a931e5

net: add netns refcount tracker to struct sock · ffa84b5f

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ffa84b5f

net: add networking namespace refcount tracker · 9ba74e6c

由 Eric Dumazet 提交于 3年前

We have 100+ syzbot reports about netns being dismantled too soon,
still unresolved as of today.

We think a missing get_net() or an extra put_net() is the root cause.

In order to find the bug(s), and be able to spot future ones,
this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers
to precisely pair all put_net() with corresponding get_net().

To use these helpers, each data structure owning a refcount
should also use a "netns_tracker" to pair the get and put.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

9ba74e6c

skbuff: Extract list pointers to silence compiler warnings · 1a2fb220

由 Kees Cook 提交于 3年前

Under both -Warray-bounds and the object_size sanitizer, the compiler is
upset about accessing prev/next of sk_buff when the object it thinks it
is coming from is sk_buff_head. The warning is a false positive due to
the compiler taking a conservative approach, opting to warn at casting
time rather than access time.

However, in support of enabling -Warray-bounds globally (which has
found many real bugs), arrange things for sk_buff so that the compiler
can unambiguously see that there is no intention to access anything
except prev/next.  Introduce and cast to a separate struct sk_buff_list,
which contains _only_ the first two fields, silencing the warnings:

In file included from ./include/net/net_namespace.h:39,
                 from ./include/linux/netdevice.h:37,
                 from net/core/netpoll.c:17:
net/core/netpoll.c: In function 'refill_skbs':
./include/linux/skbuff.h:2086:9: warning: array subscript 'struct sk_buff[0]' is partly outside array bounds of 'struct sk_buff_head[1]' [-Warray-bounds]
 2086 |         __skb_insert(newsk, next->prev, next, list);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/netpoll.c:49:28: note: while referencing 'skb_pool'
   49 | static struct sk_buff_head skb_pool;
      |                            ^~~~~~~~

This change results in no executable instruction differences.
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211207062758.2324338-1-keescook@chromium.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

1a2fb220

net: phylink: use legacy_pre_march2020 · 001f4261

由 Russell King (Oracle) 提交于 3年前

Use the legacy flag to indicate whether we should operate in legacy
mode. This allows us to stop using the presence of a PCS as an
indicator to the age of the phylink user, and make PCS presence
optional.

Legacy mode involves:
1) calling mac_config() whenever the link comes up
2) calling mac_config() whenever the inband advertisement changes,
   possibly followed by a call to mac_an_restart()
3) making use of mac_an_restart()
4) making use of mac_pcs_get_state()

All the above functionality was moved to a seperate "PCS" block of
operations in March 2020.

Update the documents to indicate that the differences that this flag
makes.
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

001f4261

net: phylink: add legacy_pre_march2020 indicator · 3e5b1fec

由 Russell King (Oracle) 提交于 3年前

Add a boolean to phylink_config to indicate whether a driver has not
been updated for the changes in commit 7cceb599 ("net: phylink:
avoid mac_config calls"), and thus are reliant on the old behaviour.

We were currently keying the phylink behaviour on the presence of a
PCS, but this is sub-optimal for modern drivers that may not have a
PCS.

This commit merely introduces the new flag, but does not add any use,
since we need all legacy drivers to set this flag before it can be
used. Once these legacy drivers have been updated, we can remove this
flag.
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

3e5b1fec

09 12月, 2021 8 次提交

net: wwan: make debugfs optional · 283e6f5a

由 Sergey Ryazanov 提交于 3年前

Debugfs interface is optional for the regular modem use. Some distros
and users will want to disable this feature for security or kernel
size reasons. So add a configuration option that allows to completely
disable the debugfs interface of the WWAN devices.

A primary considered use case for this option was embedded firmwares.
For example, in OpenWrt, you can not completely disable debugfs, as a
lot of wireless stuff can only be configured and monitored with the
debugfs knobs. At the same time, reducing the size of a kernel and
modules is an essential task in the world of embedded software.
Disabling the WWAN and IOSM debugfs interfaces allows us to save 50K
(x86-64 build) of space for module storage. Not much, but already
considerable when you only have 16MB of storage.

So it is hard to just disable whole debugfs. Users need some fine
grained set of options to control which debugfs interface is important
and should be available and which is not.

The new configuration symbol is enabled by default and is hidden under
the EXPERT option. So a regular user would not be bothered by another
one configuration question. While an embedded distro maintainer will be
able to a little more reduce the final image size.
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NLoic Poulain <loic.poulain@linaro.org>
Acked-by: NM Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

283e6f5a

net: dsa: eliminate dsa_switch_ops :: port_bridge_tx_fwd_{,un}offload · 857fdd74

由 Vladimir Oltean 提交于 3年前

We don't really need new switch API for these, and with new switches
which intend to add support for this feature, it will become cumbersome
to maintain.

The change consists in restructuring the two drivers that implement this
offload (sja1105 and mv88e6xxx) such that the offload is enabled and
disabled from the ->port_bridge_{join,leave} methods instead of the old
->port_bridge_tx_fwd_{,un}offload.

The only non-trivial change is that mv88e6xxx_map_virtual_bridge_to_pvt()
has been moved to avoid a forward declaration, and the
mv88e6xxx_reg_lock() calls from inside it have been removed, since
locking is now done from mv88e6xxx_port_bridge_{join,leave}.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

857fdd74

net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join · b079922b

由 Vladimir Oltean 提交于 3年前

This is a preparation patch for the removal of the DSA switch methods
->port_bridge_tx_fwd_offload() and ->port_bridge_tx_fwd_unoffload().
The plan is for the switch to report whether it offloads TX forwarding
directly as a response to the ->port_bridge_join() method.

This change deals with the noisy portion of converting all existing
function prototypes to take this new boolean pointer argument.
The bool is placed in the cross-chip notifier structure for bridge join,
and a reference to it is provided to drivers. In the next change, DSA
will then actually look at this value instead of calling
->port_bridge_tx_fwd_offload().
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b079922b

net: dsa: keep the bridge_dev and bridge_num as part of the same structure · d3eed0e5

由 Vladimir Oltean 提交于 3年前

The main desire behind this is to provide coherent bridge information to
the fast path without locking.

For example, right now we set dp->bridge_dev and dp->bridge_num from
separate code paths, it is theoretically possible for a packet
transmission to read these two port properties consecutively and find a
bridge number which does not correspond with the bridge device.

Another desire is to start passing more complex bridge information to
dsa_switch_ops functions. For example, with FDB isolation, it is
expected that drivers will need to be passed the bridge which requested
an FDB/MDB entry to be offloaded, and along with that bridge_dev, the
associated bridge_num should be passed too, in case the driver might
want to implement an isolation scheme based on that number.

We already pass the {bridge_dev, bridge_num} pair to the TX forwarding
offload switch API, however we'd like to remove that and squash it into
the basic bridge join/leave API. So that means we need to pass this
pair to the bridge join/leave API.

During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we
call the driver's .port_bridge_leave with what used to be our
dp->bridge_dev, but provided as an argument.

When bridge_dev and bridge_num get folded into a single structure, we
need to preserve this behavior in dsa_port_bridge_leave: we need a copy
of what used to be in dp->bridge.

Switch drivers check bridge membership by comparing dp->bridge_dev with
the provided bridge_dev, but now, if we provide the struct dsa_bridge as
a pointer, they cannot keep comparing dp->bridge to the provided
pointer, since this only points to an on-stack copy. To make this
obvious and prevent driver writers from forgetting and doing stupid
things, in this new API, the struct dsa_bridge is provided as a full
structure (not very large, contains an int and a pointer) instead of a
pointer. An explicit comparison function needs to be used to determine
bridge membership: dsa_port_offloads_bridge().
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

d3eed0e5

net: dsa: export bridging offload helpers to drivers · 6a43cba3

由 Vladimir Oltean 提交于 3年前

Move the static inline helpers from net/dsa/dsa_priv.h to
include/net/dsa.h, so that drivers can call functions such as
dsa_port_offloads_bridge_dev(), which will be necessary after the
transition to a more complex bridge structure.

More functions than are needed right now are being moved, but this is
done for uniformity.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

6a43cba3

net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpers · 36cbf39b

由 Vladimir Oltean 提交于 3年前

The location of the bridge device pointer and number is going to change.
It is not going to be kept individually per port, but in a common
structure allocated dynamically and which will have lockdep validation.

Create helpers to access these elements so that we have a migration path
to the new organization.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

36cbf39b

net: dsa: assign a bridge number even without TX forwarding offload · 947c8746

由 Vladimir Oltean 提交于 3年前

The service where DSA assigns a unique bridge number for each forwarding
domain is useful even for drivers which do not implement the TX
forwarding offload feature.

For example, drivers might use the dp->bridge_num for FDB isolation.

So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and
calculate a unique bridge_num for all drivers that set this value.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

947c8746

net: dsa: make dp->bridge_num one-based · 3f9bb030

由 Vladimir Oltean 提交于 3年前

I have seen too many bugs already due to the fact that we must encode an
invalid dp->bridge_num as a negative value, because the natural tendency
is to check that invalid value using (!dp->bridge_num). Latest example
can be seen in commit 1bec0f05 ("net: dsa: fix bridge_num not
getting cleared after ports leaving the bridge").

Convert the existing users to assume that dp->bridge_num == 0 is the
encoding for invalid, and valid bridge numbers start from 1.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

3f9bb030

08 12月, 2021 12 次提交

can: bittiming: replace CAN units with the generic ones from linux/units.h · 330c6d3b

由 Vincent Mailhol 提交于 3年前

In [1], we introduced a set of units in linux/can/bittiming.h. Since
then, generic SI prefixes were added to linux/units.h in [2]. Those
new prefixes can perfectly replace CAN specific ones.

This patch replaces all occurrences of the CAN units with their
corresponding prefix (from linux/units) and the unit (as a comment)
according to below table.

 CAN units	SI metric prefix (from linux/units) + unit (as a comment)
 ------------------------------------------------------------------------
 CAN_KBPS	KILO /* BPS */
 CAN_MBPS	MEGA /* BPS */
 CAM_MHZ	MEGA /* Hz */

The definition are then removed from linux/can/bittiming.h

[1] commit 1d775076 ("can: bittiming: add CAN_KBPS, CAN_MBPS and
CAN_MHZ macros")

[2] commit 26471d4a ("units: Add SI metric prefix definitions")

Link: https://lore.kernel.org/all/20211124014536.782550-1-mailhol.vincent@wanadoo.frSuggested-by: NJimmy Assarsson <extja@kvaser.com>
Suggested-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

330c6d3b

net: mscc: ocelot: split register definitions to a separate file · 32ecd22b

由 Colin Foster 提交于 3年前

Move these to a separate file will allow them to be shared to other
drivers.
Signed-off-by: NColin Foster <colin.foster@in-advantage.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

32ecd22b

net: phy: Remove unnecessary indentation in the comments of phy_device · a97770cc

由 Yanteng Si 提交于 3年前

Fix warning as:

linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.
Suggested-by: NAkira Yokosawa <akiyks@gmail.com>
Signed-off-by: NYanteng Si <siyanteng@loongson.cn>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

a97770cc

E
net: sched: act_mirred: add net device refcount tracker · ada066b2
由 Eric Dumazet 提交于 3年前
```
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
```
ada066b2

llc: add net device refcount tracker · 615d069d

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

615d069d

ax25: add net device refcount tracker · 66ce07f7

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

66ce07f7

E
inet: add net device refcount tracker to struct fib_nh_common · e44b14eb
由 Eric Dumazet 提交于 3年前
```
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
```
e44b14eb

net: watchdog: add net device refcount tracker · f12bf6f3

由 Eric Dumazet 提交于 3年前

Add a netdevice_tracker inside struct net_device, to track
the self reference when a device has an active watchdog timer.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

f12bf6f3

vlan: add net device refcount tracker · 19c9ebf6

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

19c9ebf6

net: eql: add net device refcount tracker · 08f0b22d

由 Eric Dumazet 提交于 3年前

Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

08f0b22d

netfilter: conntrack: annotate data-races around ct->timeout · 802a7dc5

由 Eric Dumazet 提交于 3年前

(struct nf_conn)->timeout can be read/written locklessly,
add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.

BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get

write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
 __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push_pending_frames include/net/tcp.h:1897 [inline]
 tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 __sys_sendto+0x21e/0x2c0 net/socket.c:2036
 __do_sys_sendto net/socket.c:2048 [inline]
 __se_sys_sendto net/socket.c:2044 [inline]
 __x64_sys_sendto+0x74/0x90 net/socket.c:2044
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
 nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
 ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
 __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
 tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
 rds_send_worker+0x43/0x200 net/rds/threads.c:200
 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
 worker_thread+0x616/0xa70 kernel/workqueue.c:2445
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x00027cc2 -> 0x00000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: krdsd rds_send_worker

Note: I chose an arbitrary commit for the Fixes: tag,
because I do not think we need to backport this fix to very old kernels.

Fixes: e37542ba ("netfilter: conntrack: avoid possible false sharing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

802a7dc5

tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelay · 6fadaa56

由 Maxim Galaganov 提交于 3年前

Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in
MPTCP setsockopt code -- namely for syncing MPTCP socket options with
subflows inside sync_socket_options() while already holding the subflow
socket lock.
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NMaxim Galaganov <max@internet.ru>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

6fadaa56

07 12月, 2021 1 次提交

net: fix recent csum changes · 45cac675

由 Eric Dumazet 提交于 3年前

Vladimir reported csum issues after my recent change in skb_postpull_rcsum()

Issue here is the following:

initial skb->csum is the csum of

[part to be pulled][rest of packet]

Old code:
 skb->csum = csum_sub(skb->csum, csum_partial(pull, pull_length, 0));

New code:
 skb->csum = ~csum_partial(pull, pull_length, ~skb->csum);

This is broken if the csum of [pulled part]
happens to be equal to skb->csum, because end
result of skb->csum is 0 in new code, instead
of being 0xffffffff

David Laight suggested to use

skb->csum = -csum_partial(pull, pull_length, -skb->csum);

I based my patches on existing code present in include/net/seg6.h,
update_csum_diff4() and update_csum_diff16() which might need
a similar fix.

I guess that my tests, mostly pulling 40 bytes of IPv6 header
were not providing enough entropy to hit this bug.

v2: added wsum_negate() to make sparse happy.

Fixes: 29c30026 ("net: optimize skb_postpull_rcsum()")
Fixes: 0bd28476 ("gro: optimize skb_gro_postpull_rcsum()")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: NDavid Laight <David.Laight@ACULAB.COM>
Cc: David Lebrun <dlebrun@google.com>
Tested-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211204045356.3659278-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

45cac675

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功