提交 · 772412176fb98493158929b220fe250127f611af · openeuler / Kernel

28 1月, 2021 2 次提交

bpf: Allow rewriting to ports under ip_unprivileged_port_start · 77241217

由 Stanislav Fomichev 提交于 1月 27, 2021

At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
to the privileged ones (< ip_unprivileged_port_start), but it will
be rejected later on in the __inet_bind or __inet6_bind.

Let's add another return value to indicate that CAP_NET_BIND_SERVICE
check should be ignored. Use the same idea as we currently use
in cgroup/egress where bit #1 indicates CN. Instead, for
cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should
be bypassed.

v5:
- rename flags to be less confusing (Andrey Ignatov)
- rework BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY to work on flags
  and accept BPF_RET_SET_CN (no behavioral changes)

v4:
- Add missing IPv6 support (Martin KaFai Lau)

v3:
- Update description (Martin KaFai Lau)
- Fix capability restore in selftest (Martin KaFai Lau)

v2:
- Switch to explicit return code (Martin KaFai Lau)
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAndrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20210127193140.3170382-1-sdf@google.com

77241217

skmsg: Make sk_psock_destroy() static · 8063e184

由 Cong Wang 提交于 1月 27, 2021

sk_psock_destroy() is a RCU callback, I can't see any reason why
it could be used outside.
Signed-off-by: NCong Wang <cong.wang@bytedance.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210127221501.46866-1-xiyou.wangcong@gmail.com

8063e184

21 1月, 2021 4 次提交

bpf: Split cgroup_bpf_enabled per attach type · a9ed15da

由 Stanislav Fomichev 提交于 1月 15, 2021

When we attach any cgroup hook, the rest (even if unused/unattached) start
to contribute small overhead. In particular, the one we want to avoid is
__cgroup_bpf_run_filter_skb which does two redirections to get to
the cgroup and pushes/pulls skb.

Let's split cgroup_bpf_enabled to be per-attach to make sure
only used attach types trigger.

I've dropped some existing high-level cgroup_bpf_enabled in some
places because BPF_PROG_CGROUP_XXX_RUN macros usually have another
cgroup_bpf_enabled check.

I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for
GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type]
has to be constant and known at compile time.
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NSong Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com

a9ed15da

bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt · 20f2505f

由 Stanislav Fomichev 提交于 1月 15, 2021

When we attach a bpf program to cgroup/getsockopt any other getsockopt()
syscall starts incurring kzalloc/kfree cost.

Let add a small buffer on the stack and use it for small (majority)
{s,g}etsockopt values. The buffer is small enough to fit into
the cache line and cover the majority of simple options (most
of them are 4 byte ints).

It seems natural to do the same for setsockopt, but it's a bit more
involved when the BPF program modifies the data (where we have to
kmalloc). The assumption is that for the majority of setsockopt
calls (which are doing pure BPF options or apply policy) this
will bring some benefit as well.

Without this patch (we remove about 1% __kmalloc):
     3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
            |
             --3.30%--__cgroup_bpf_run_filter_getsockopt
                       |
                        --0.81%--__kmalloc
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com

20f2505f

bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE · 9cacf81f

由 Stanislav Fomichev 提交于 1月 15, 2021

Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
call in do_tcp_getsockopt using the on-stack data. This removes
3% overhead for locking/unlocking the socket.

Without this patch:
     3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
            |
             --3.30%--__cgroup_bpf_run_filter_getsockopt
                       |
                        --0.81%--__kmalloc

With the patch applied:
     0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern

Note, exporting uapi/tcp.h requires removing netinet/tcp.h
from test_progs.h because those headers have confliciting
definitions.
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com

9cacf81f

net: usb: cdc_ncm: don't spew notifications · de658a19

由 Grant Grundler 提交于 1月 19, 2021

RTL8156 sends notifications about every 32ms.
Only display/log notifications when something changes.

This issue has been reported by others:
	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
	https://lkml.org/lkml/2020/8/27/1083

...
[785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
[785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
[785962.929954] usb 1-1: Manufacturer: Realtek
[785962.929956] usb 1-1: SerialNumber: 000000001
[785962.991755] usbcore: registered new interface driver cdc_ether
[785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
[785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
[785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
[785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
[785963.019211] usbcore: registered new interface driver cdc_ncm
[785963.023856] usbcore: registered new interface driver cdc_wdm
[785963.025461] usbcore: registered new interface driver cdc_mbim
[785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
[785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
...

This is about 2KB per second and will overwrite all contents of a 1MB
dmesg buffer in under 10 minutes rendering them useless for debugging
many kernel problems.

This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering
the majority of those logs useless too.

When the link is up (expected state), spew amount is >2x higher:
...
[786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
...

Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: NGrant Grundler <grundler@chromium.org>
Reviewed-by: NHayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

de658a19

20 1月, 2021 4 次提交

bonding: add a vlan+srcmac tx hashing option · 7b8fc010

由 Jarod Wilson 提交于 1月 18, 2021

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interest switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has it's own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimetarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Thomas Davis <tadavis@lbl.gov>
Signed-off-by: NJarod Wilson <jarod@redhat.com>
Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

7b8fc010

net: add inline function skb_csum_is_sctp · fa821170

由 Xin Long 提交于 1月 16, 2021

This patch is to define a inline function skb_csum_is_sctp(), and
also replace all places where it checks if it's a SCTP CSUM skb.
This function would be used later in many networking drivers in
the following patches.
Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

fa821170

mdio-bitbang: Export mdiobb_{read,write}() · 8eed01b5

由 Geert Uytterhoeven 提交于 1月 18, 2021

Export mdiobb_read() and mdiobb_write(), so Ethernet controller drivers
can call them from their MDIO read/write wrappers.
Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Tested-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8eed01b5

mdio, phy: fix -Wshadow warnings triggered by nested container_of() · 7eab14de

由 Alexander Lobakin 提交于 1月 16, 2021

container_of() macro hides a local variable '__mptr' inside. This
becomes a problem when several container_of() are nested in each
other within single line or plain macros.
As C preprocessor doesn't support generating random variable names,
the sole solution is to avoid defining macros that consist only of
container_of() calls, or they will self-shadow '__mptr' each time:

In file included from ./include/linux/bitmap.h:10,
                 from drivers/net/phy/phy_device.c:12:
drivers/net/phy/phy_device.c: In function ‘phy_device_release’:
./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow]
  693 |  void *__mptr = (void *)(ptr);     \
      |        ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                          ^~~~~~~~~~~~
./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’
   52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
      |                           ^~~~~~~~~~~~
./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                                       ^~~~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
  217 |  kfree(to_phy_device(dev));
      |        ^~~~~~~~~~~~~
./include/linux/kernel.h:693:8: note: shadowed declaration is here
  693 |  void *__mptr = (void *)(ptr);     \
      |        ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                          ^~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
  217 |  kfree(to_phy_device(dev));
      |        ^~~~~~~~~~~~~

As they are declared in header files, these warnings are highly
repetitive and very annoying (along with the one from linux/pci.h).

Convert the related macros from linux/{mdio,phy}.h to static inlines
to avoid self-shadowing and potentially improve bug-catching.
No functional changes implied.
Signed-off-by: NAlexander Lobakin <alobakin@pm.me>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210116161246.67075-1-alobakin@pm.meSigned-off-by: NJakub Kicinski <kuba@kernel.org>

7eab14de

19 1月, 2021 1 次提交

net: netdevice: Add operation ndo_sk_get_lower_dev · 719a402c

由 Tariq Toukan 提交于 1月 17, 2021

ndo_sk_get_lower_dev returns the lower netdev that corresponds to
a given socket.
Additionally, we implement a helper netdev_sk_get_lowest_dev() to get
the lowest one in chain.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NBoris Pismenny <borisp@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

719a402c

15 1月, 2021 8 次提交

compiler.h: Raise minimum version of GCC to 5.1 for arm64 · dca5244d

由 Will Deacon 提交于 1月 12, 2021

GCC versions >= 4.9 and < 5.1 have been shown to emit memory references
beyond the stack pointer, resulting in memory corruption if an interrupt
is taken after the stack pointer has been adjusted but before the
reference has been executed. This leads to subtle, infrequent data
corruption such as the EXT4 problems reported by Russell King at the
link below.

Life is too short for buggy compilers, so raise the minimum GCC version
required by arm64 to 5.1.
Reported-by: NRussell King <linux@armlinux.org.uk>
Suggested-by: NArnd Bergmann <arnd@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>
Tested-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>

dca5244d

bpf: Add size arg to build_id_parse function · 921f88fc

由 Jiri Olsa 提交于 1月 14, 2021

It's possible to have other build id types (other than default SHA1).
Currently there's also ld support for MD5 build id.

Adding size argument to build_id_parse function, that returns (if defined)
size of the parsed build id, so we can recognize the build id type.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-3-jolsa@kernel.org

921f88fc

bpf: Move stack_map_get_build_id into lib · bd7525da

由 Jiri Olsa 提交于 1月 14, 2021

Moving stack_map_get_build_id into lib with
declaration in linux/buildid.h header:

  int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);

This function returns build id for given struct vm_area_struct.
There is no functional change to stack_map_get_build_id function.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NSong Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-2-jolsa@kernel.org

bd7525da

bpf: Add bitwise atomic instructions · 981f94c3

由 Brendan Jackman 提交于 1月 14, 2021

This adds instructions for

atomic[64]_[fetch_]and
atomic[64]_[fetch_]or
atomic[64]_[fetch_]xor

All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence being a single commit.

The main interesting thing here is that x86 doesn't directly support
the fetch_ version these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires the use of two temporary registers,
IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
Signed-off-by: NBrendan Jackman <jackmanb@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com

981f94c3

bpf: Add instructions for atomic_[cmp]xchg · 5ffa2550

由 Brendan Jackman 提交于 1月 14, 2021

This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - To solve the issue that this operation fundamentally has 3
   operands, but we only have two register fields. Therefore the
   operand we compare against (the kernel's API calls it 'old') is
   hard-coded to be R0. x86 has similar design (and A64 doesn't
   have this problem).

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.
Signed-off-by: NBrendan Jackman <jackmanb@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com

5ffa2550

bpf: Add BPF_FETCH field / create atomic_fetch_add instruction · 5ca419f2

由 Brendan Jackman 提交于 1月 14, 2021

The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.
Suggested-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NBrendan Jackman <jackmanb@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com

5ca419f2

bpf: Rename BPF_XADD and prepare to encode other atomics in .imm · 91c960b0

由 Brendan Jackman 提交于 1月 14, 2021

A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.
Signed-off-by: NBrendan Jackman <jackmanb@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NBjörn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com

91c960b0

net: phy: Add 100 base-x mode · b1ae3587

由 Bjarni Jonasson 提交于 1月 13, 2021

Sparx-5 supports this mode and it is missing in the PHY core.
Signed-off-by: NBjarni Jonasson <bjarni.jonasson@microchip.com>
Reviewed-by: NRussell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b1ae3587

14 1月, 2021 7 次提交

can: dev: can_rx_offload_get_echo_skb(): extend to return can frame length · 99842c96

由 Marc Kleine-Budde 提交于 1月 11, 2021

In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.

To avoid to calculate this length twice, extend can_rx_offload_get_echo_skb()
to return that value. Convert all users of this function, too.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-15-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

99842c96

can: dev: can_get_echo_skb(): extend to return can frame length · 9420e1d4

由 Marc Kleine-Budde 提交于 1月 11, 2021

In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.

To avoid to calculate this length twice, extend can_get_echo_skb() to return
that value. Convert all users of this function, too.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-14-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

9420e1d4

can: dev: can_put_echo_skb(): extend to handle frame_len · 1dcb6e57

由 Vincent Mailhol 提交于 1月 11, 2021

Add a frame_len argument to can_put_echo_skb() which is used to save length of
the CAN frame into field frame_len of struct can_skb_priv so that it can be
later used after transmission completion. Convert all users of this function,
too.

Drivers which implement BQL call can_put_echo_skb() with the output of
can_skb_get_frame_len(skb) and drivers which do not simply pass zero as an
input (in the same way that NULL would be given to can_get_echo_skb()). This
way, we have a nice symmetry between the two echo functions.

Link: https://lore.kernel.org/r/20210111061335.39983-1-mailhol.vincent@wanadoo.frSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-13-mkl@pengutronix.deSigned-off-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>

1dcb6e57

can: dev: extend struct can_skb_priv to hold CAN frame length · f0ef72fe

由 Marc Kleine-Budde 提交于 1月 11, 2021

In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.

To avoid to calculate this length twice, extend the struct can_skb_priv to hold
the length of the CAN frame and extend __can_get_echo_skb() to return that
value.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-12-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

f0ef72fe

can: length: can_skb_get_frame_len(): introduce function to get data length of... · 85d99c3e

由 Vincent Mailhol 提交于 1月 11, 2021

can: length: can_skb_get_frame_len(): introduce function to get data length of frame in data link layer

This patch adds the function can_skb_get_frame_len() which returns the length
of a CAN frame on the data link layer, including Start-of-frame, Identifier,
various other bits, the actual data, the CRC, the End-of-frame, the Inter frame
spacing.
Co-developed-by: NArunachalam Santhanam <arunachalam.santhanam@in.bosch.com>
Signed-off-by: NArunachalam Santhanam <arunachalam.santhanam@in.bosch.com>
Co-developed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Acked-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Co-developed-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20210111141930.693847-11-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

85d99c3e

can: length: canfd_sanitize_len(): add function to sanitize CAN-FD data length · 99b7beb0

由 Marc Kleine-Budde 提交于 1月 11, 2021

The data field in CAN-FD frames have specifig frame length (0, 1, 2, 3, 4, 5,
6, 7, 8, 12, 16, 20, 24, 32, 48, 64). This function "rounds" up a given length
to the next valid CAN-FD frame length.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-10-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

99b7beb0

net/mlx5: Add HW definition of reg_c_preserve · 838b00a2

由 Paul Blakey 提交于 1月 11, 2021

Add capability bit to test whether reg_c value is preserved on
recirculation.
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Signed-off-by: NMaor Dickman <maord@nvidia.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

838b00a2

13 1月, 2021 11 次提交

Revert "arm64: Enable perf events based hard lockup detector" · b90d72a6

由 Will Deacon 提交于 1月 12, 2021

This reverts commit 367c820e.

lockup_detector_init() makes heavy use of per-cpu variables and must be
called with preemption disabled. Usually, it's handled early during boot
in kernel_init_freeable(), before SMP has been initialised.

Since we do not know whether or not our PMU interrupt can be signalled
as an NMI until considerably later in the boot process, the Arm PMU
driver attempts to re-initialise the lockup detector off the back of a
device_initcall(). Unfortunately, this is called from preemptible
context and results in the following splat:

  | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
  | caller is debug_smp_processor_id+0x20/0x2c
  | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |   dump_backtrace+0x0/0x3c0
  |   show_stack+0x20/0x6c
  |   dump_stack+0x2f0/0x42c
  |   check_preemption_disabled+0x1cc/0x1dc
  |   debug_smp_processor_id+0x20/0x2c
  |   hardlockup_detector_event_create+0x34/0x18c
  |   hardlockup_detector_perf_init+0x2c/0x134
  |   watchdog_nmi_probe+0x18/0x24
  |   lockup_detector_init+0x44/0xa8
  |   armv8_pmu_driver_init+0x54/0x78
  |   do_one_initcall+0x184/0x43c
  |   kernel_init_freeable+0x368/0x380
  |   kernel_init+0x1c/0x1cc
  |   ret_from_fork+0x10/0x30

Rather than bodge this with raw_smp_processor_id() or randomly disabling
preemption, simply revert the culprit for now until we figure out how to
do this properly.
Reported-by: NLecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: NWill Deacon <will@kernel.org>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com
Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>

b90d72a6

can: dev: move netlink related code into seperate file · 0a042c6e

由 Marc Kleine-Budde 提交于 1月 11, 2021

This patch moves the netlink related code of the CAN device infrastructure into
a separate file.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-7-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

0a042c6e

can: dev: move skb related into seperate file · 18f2dbfd

由 Marc Kleine-Budde 提交于 1月 11, 2021

This patch moves the skb related code of the CAN device infrastructure into a
separate file.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-6-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

18f2dbfd

can: dev: move length related code into seperate file · bdd2e413

由 Marc Kleine-Budde 提交于 1月 11, 2021

This patch moves all CAN frame length related code of the CAN device
infrastructure into a separate file.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-5-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

bdd2e413

can: dev: move bittiming related code into seperate file · 5a9d5ecd

由 Marc Kleine-Budde 提交于 1月 11, 2021

This patch moves the bittiming related code of the CAN device infrastructure
into a separate file.
Reviewed-by: NVincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-4-mkl@pengutronix.deSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

5a9d5ecd

arm/kasan: fix the array size of kasan_early_shadow_pte[] · 29970dc2

由 Hailong Liu 提交于 1月 12, 2021

The size of kasan_early_shadow_pte[] now is PTRS_PER_PTE which defined
to 512 for arm.  This means that it only covers the prev Linux pte
entries, but not the HWTABLE pte entries for arm.

The reason it currently works is that the symbol kasan_early_shadow_page
immediately following kasan_early_shadow_pte in memory is page aligned,
which makes kasan_early_shadow_pte look like a 4KB size array.  But we
can't ensure the order is always right with different compiler/linker,
or if more bss symbols are introduced.

We had a test with QEMU + vexpress：put a 512KB-size symbol with
attribute __section(".bss..page_aligned") after kasan_early_shadow_pte,
and poisoned it after kasan_early_init().  Then enabled CONFIG_KASAN, it
failed to boot up.

Link: https://lkml.kernel.org/r/20210109044622.8312-1-hailongliiu@yeah.netSigned-off-by: NHailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: NZiliang Guo <guo.ziliang@zte.com.cn>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29970dc2

mm/memcontrol: fix warning in mem_cgroup_page_lruvec() · 7ea510b9

由 Hugh Dickins 提交于 1月 12, 2021

Boot a CONFIG_MEMCG=y kernel with "cgroup_disabled=memory" and you are
met by a series of warnings from the VM_WARN_ON_ONCE_PAGE(!memcg, page)
recently added to the inline mem_cgroup_page_lruvec().

An earlier attempt to place that warning, in mem_cgroup_lruvec(), had
been careful to do so after weeding out the mem_cgroup_disabled() case;
but was itself invalid because of the mem_cgroup_lruvec(NULL, pgdat) in
clear_pgdat_congested() and age_active_anon().

Warning in mem_cgroup_page_lruvec() was once useful in detecting a KSM
charge bug, so may be worth keeping: but skip if mem_cgroup_disabled().

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101032056260.1093@eggly.anvils
Fixes: 9a1ac228 ("mm/memcontrol:rewrite mem_cgroup_page_lruvec()")
Signed-off-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Acked-by: NChris Down <chris@chrisdown.name>
Reviewed-by: NBaoquan He <bhe@redhat.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Hui Su <sh_def@163.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ea510b9

bpf: Support BPF ksym variables in kernel modules · 541c3bad

由 Andrii Nakryiko 提交于 1月 11, 2021

Add support for directly accessing kernel module variables from BPF programs
using special ldimm64 instructions. This functionality builds upon vmlinux
ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
specifying kernel module BTF's FD in insn[1].imm field.

During BPF program load time, verifier will resolve FD to BTF object and will
take reference on BTF object itself and, for module BTFs, corresponding module
as well, to make sure it won't be unloaded from under running BPF program. The
mechanism used is similar to how bpf_prog keeps track of used bpf_maps.

One interesting change is also in how per-CPU variable is determined. The
logic is to find .data..percpu data section in provided BTF, but both vmlinux
and module each have their own .data..percpu entries in BTF. So for module's
case, the search for DATASEC record needs to look at only module's added BTF
types. This is implemented with custom search function.
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Acked-by: NHao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org

541c3bad

bpf: Declare __bpf_free_used_maps() unconditionally · 936f8946

由 Andrii Nakryiko 提交于 1月 11, 2021

__bpf_free_used_maps() is always defined in kernel/bpf/core.c, while
include/linux/bpf.h is guarding it behind CONFIG_BPF_SYSCALL. Move it out of
that guard region and fix compiler warning.

Fixes: a2ea0746 ("bpf: Fix missing prog untrack in release_maps")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-4-andrii@kernel.org

936f8946

bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args · 6943c2b0

由 Andrii Nakryiko 提交于 1月 11, 2021

BPF interpreter uses extra input argument, so re-casts __bpf_call_base into
__bpf_call_base_args. Avoid compiler warning about incompatible function
prototypes by casting to void * first.

Fixes: 1ea47e01 ("bpf: add support for bpf_call to interpreter")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-3-andrii@kernel.org

6943c2b0

bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h · a643bff7

由 Andrii Nakryiko 提交于 1月 11, 2021

Add bpf_patch_call_args() prototype. This function is called from BPF verifier
and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes compiler
warning about missing prototype in some kernel configurations.

Fixes: 1ea47e01 ("bpf: add support for bpf_call to interpreter")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org

a643bff7

12 1月, 2021 2 次提交

net: compound page support in skb_seq_read · 97550f6f

由 Willem de Bruijn 提交于 1月 09, 2021

skb_seq_read iterates over an skb, returning pointer and length of
the next data range with each call.

It relies on kmap_atomic to access highmem pages when needed.

An skb frag may be backed by a compound page, but kmap_atomic maps
only a single page. There are not enough kmap slots to always map all
pages concurrently.

Instead, if kmap_atomic is needed, iterate over each page.

As this increases the number of calls, avoid this unless needed.
The necessary condition is captured in skb_frag_must_loop.

I tried to make the change as obvious as possible. It should be easy
to verify that nothing changes if skb_frag_must_loop returns false.

Tested:
  On an x86 platform with
    CONFIG_HIGHMEM=y
    CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
    CONFIG_NETFILTER_XT_MATCH_STRING=y

  Run
    ip link set dev lo mtu 1500
    iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT
    dd if=/dev/urandom of=in bs=1M count=20
    nc -l -p 8000 > /dev/null &
    nc -w 1 -q 0 localhost 8000 < in
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

97550f6f

net: support kmap_local forced debugging in skb_frag_foreach · 29766bcf

由 Willem de Bruijn 提交于 1月 09, 2021

Skb frags may be backed by highmem and/or compound pages. Highmem
pages need kmap_atomic mappings to access. But kmap_atomic maps a
single page, not the entire compound page.

skb_foreach_page iterates over an skb frag, in one step in the common
case, page by page only if kmap_atomic must be called for each page.
The decision logic is captured in skb_frag_must_loop.

CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP extends kmap from highmem to all
pages, to increase code coverage.

Extend skb_frag_must_loop to this new condition.

Link: https://lore.kernel.org/linux-mm/20210106180132.41dc249d@gandalf.local.home/
Fixes: 0e91a0c6 ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Reported-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Tested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

29766bcf

10 1月, 2021 1 次提交

net-gro: remove GRO_DROP · 1d11fa69

由 Eric Dumazet 提交于 1月 08, 2021

GRO_DROP can only be returned from napi_gro_frags()
if the skb has not been allocated by a prior napi_get_frags()

Since drivers must use napi_get_frags() and test its result
before populating the skb with metadata, we can safely remove
GRO_DROP since it offers no practical use.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: NEdward Cree <ecree.xilinx@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

1d11fa69

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功