提交 · 0f7ba7bc46fa0b574ccacf5672991b321e028492 · openeuler / Kernel

28 12月, 2020 1 次提交

netfilter: nftables: add set expression flags · b4e70d8d

由 Pablo Neira Ayuso 提交于 12月 27, 2020

The set flag NFT_SET_EXPR provides a hint to the kernel that userspace
supports for multiple expressions per set element. In the same
direction, NFT_DYNSET_F_EXPR specifies that dynset expression defines
multiple expressions per set element.

This allows new userspace software with old kernels to bail out with
EOPNOTSUPP. This update is similar to ef516e86 ("netfilter:
nf_tables: reintroduce the NFT_SET_CONCAT flag"). The NFT_SET_EXPR flag
needs to be set on when the NFTA_SET_EXPRESSIONS attribute is specified.
The NFT_SET_EXPR flag is not set on with NFTA_SET_EXPR to retain
backward compatibility in old userspace binaries.

Fixes: 48b0ae04 ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

b4e70d8d

22 12月, 2020 1 次提交

ppp: Fix PPPIOCUNBRIDGECHAN request number · bcce55f5

由 Guillaume Nault 提交于 12月 19, 2020

PPPIOCGL2TPSTATS already uses 54. This shouldn't be a problem in
practice, but let's keep the logical decreasing assignment scheme.

Fixes: 4cf476ce ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
Signed-off-by: NGuillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/e3a4c355e3820331d8e1fffef8522739aae58b57.1608380117.git.gnault@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

bcce55f5

20 12月, 2020 1 次提交

epoll: wire up syscall epoll_pwait2 · b0a0c261

由 Willem de Bruijn 提交于 12月 18, 2020

Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.comSigned-off-by: NWillem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b0a0c261

19 12月, 2020 2 次提交

uapi: virtio_ids: add missing device type IDs from OASIS spec · be618636

由 Enrico Weigelt, metux IT consult 提交于 12月 02, 2020

The OASIS virtio spec (1.1) defines several IDs that aren't reflected
in the header yet. Fixing this by adding the missing IDs, even though
they're not yet used by the kernel yet.
Signed-off-by: NEnrico Weigelt, metux IT consult <info@metux.net>
Link: https://lore.kernel.org/r/20201202111931.31953-2-info@metux.netSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

be618636

uapi: virtio_ids.h: consistent indentions · 1e38f003

由 Enrico Weigelt, metux IT consult 提交于 12月 02, 2020

Fixing the differing indentions to be consistent and properly aligned.
Signed-off-by: NEnrico Weigelt, metux IT consult <info@metux.net>
Link: https://lore.kernel.org/r/20201202111931.31953-1-info@metux.netSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

1e38f003

17 12月, 2020 1 次提交

devlink: use _BITUL() macro instead of BIT() in the UAPI header · 75f4d454

由 Tobias Klauser 提交于 12月 15, 2020

The BIT() macro is not available for the UAPI headers. Moreover, it can
be defined differently in user space headers. Thus, replace its usage
with the _BITUL() macro which is already used in other macro definitions
in <linux/devlink.h>.

Fixes: dc64cc7c ("devlink: Add devlink reload limit option")
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20201215102531.16958-1-tklauser@distanz.chSigned-off-by: NJakub Kicinski <kuba@kernel.org>

75f4d454

16 12月, 2020 2 次提交

userfaultfd: add UFFD_USER_MODE_ONLY · 37cd0575

由 Lokesh Gidra 提交于 12月 14, 2020

Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.comSigned-off-by: NDaniel Colascione <dancol@google.com>
Signed-off-by: NLokesh Gidra <lokeshgidra@google.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37cd0575

uapi: move constants from <linux/kernel.h> to <linux/const.h> · a85cbe61

由 Petr Vorel 提交于 12月 14, 2020

and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>.

The reason is to avoid indirect <linux/sysinfo.h> include when using
some network headers: <linux/netlink.h> or others -> <linux/kernel.h>
-> <linux/sysinfo.h>.

This indirect include causes on MUSL redefinition of struct sysinfo when
included both <sys/sysinfo.h> and some of UAPI headers:

    In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5,
                     from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5,
                     from ../include/tst_netlink.h:14,
                     from tst_crypto.c:13:
    x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo'
     struct sysinfo {
            ^~~~~~~
    In file included from ../include/tst_safe_macros.h:15,
                     from ../include/tst_test.h:93,
                     from tst_crypto.c:11:
    x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here

Link: https://lkml.kernel.org/r/20201015190013.8901-1-petr.vorel@gmail.comSigned-off-by: NPetr Vorel <petr.vorel@gmail.com>
Suggested-by: NRich Felker <dalias@aerifal.cx>
Acked-by: NRich Felker <dalias@libc.org>
Cc: Peter Korsgaard <peter@korsgaard.com>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a85cbe61

15 12月, 2020 2 次提交

vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag · caaf95e0

由 Andra Paraschiv 提交于 12月 14, 2020

Add VMADDR_FLAG_TO_HOST vsock flag that is used to setup a vsock
connection where all the packets are forwarded to the host.

Then, using this type of vsock channel, vsock communication between
sibling VMs can be built on top of it.

Changelog

v3 -> v4

* Update the "VMADDR_FLAG_TO_HOST" value, as the size of the field has
  been updated to 1 byte.

v2 -> v3

* Update comments to mention when the flag is set in the connect and
  listen paths.

v1 -> v2

* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".
Signed-off-by: NAndra Paraschiv <andraprs@amazon.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

caaf95e0

vm_sockets: Add flags field in the vsock address data structure · dc8eeef7

由 Andra Paraschiv 提交于 12月 14, 2020

vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.

In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.

Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, can distinguish between different use
cases such as nested VMs and sibling VMs.

This field can be set when initializing the vsock address variable used
for the connect() call.

Changelog

v3 -> v4

* Update the size of "svm_flags" field to be 1 byte instead of 2 bytes.

v2 -> v3

* Add "svm_flags" as a new field, not reusing "svm_reserved1".

v1 -> v2

* Update the field name to "svm_flags".
* Split the current patch in 2 patches.
Signed-off-by: NAndra Paraschiv <andraprs@amazon.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

dc8eeef7

14 12月, 2020 3 次提交

cifs: Set witness notification handler for messages from userspace daemon · fed979a7

由 Samuel Cabrero 提交于 11月 30, 2020

+ Set a handler for the witness notification messages received from the
  userspace daemon.

+ Handle the resource state change notification. When the resource
  becomes unavailable or available set the tcp status to
  CifsNeedReconnect for all channels.
Signed-off-by: NSamuel Cabrero <scabrero@suse.de>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

fed979a7

cifs: Send witness register and unregister commands to userspace daemon · bf80e5d4

由 Samuel Cabrero 提交于 11月 30, 2020

+ Define the generic netlink family commands and message attributes to
  communicate with the userspace daemon

+ The register and unregister commands are sent when connecting or
  disconnecting a tree. The witness registration keeps a pointer to
  the tcon and has the same lifetime.

+ Each registration has an id allocated by an IDR. This id is sent to the
  userspace daemon in the register command, and will be included in the
  notification messages from the userspace daemon to retrieve from the
  IDR the matching registration.

+ The authentication information is bundled in the register message.
  If kerberos is used the message just carries a flag.
Signed-off-by: NSamuel Cabrero <scabrero@suse.de>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

bf80e5d4

cifs: Register generic netlink family · 06f08dab

由 Samuel Cabrero 提交于 11月 30, 2020

Register a new generic netlink family to talk to the witness service
userspace daemon.
Signed-off-by: NSamuel Cabrero <scabrero@suse.de>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

06f08dab

13 12月, 2020 1 次提交

netfilter: nftables: netlink support for several set element expressions · 48b0ae04

由 Pablo Neira Ayuso 提交于 12月 07, 2020

This patch adds three new netlink attributes to encapsulate a list of
expressions per set elements:

- NFTA_SET_EXPRESSIONS: this attribute provides the set definition in
  terms of expressions. New set elements get attached the list of
  expressions that is specified by this new netlink attribute.
- NFTA_SET_ELEM_EXPRESSIONS: this attribute allows users to restore (or
  initialize) the stateful information of set elements when adding an
  element to the set.
- NFTA_DYNSET_EXPRESSIONS: this attribute specifies the list of
  expressions that the set element gets when it is inserted from the
  packet path.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

48b0ae04

12 12月, 2020 2 次提交

RDMA/rxe: Use acquire/release for memory ordering · d21a1240

由 Bob Pearson 提交于 12月 10, 2020

Change work and completion queues to use smp_load_acquire() and
smp_store_release() to synchronize between driver and users. This commit
goes with a matching series of commits in the rxe user space provider.

Link: https://lore.kernel.org/r/20201210174258.5234-1-rpearson@hpe.comSigned-off-by: NBob Pearson <rpearson@hpe.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

d21a1240

bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers · b7906b70

由 Andrii Nakryiko 提交于 12月 11, 2020

Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.

Fixes: eaa6bcb7 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7906b70

11 12月, 2020 6 次提交

dmaengine: idxd: add IAX configuration support in the IDXD driver · f25b4638

由 Dave Jiang 提交于 11月 17, 2020

Add support to allow configuration of Intel Analytics Accelerator (IAX) in
addition to the Intel Data Streaming Accelerator (DSA). The IAX hardware
has the same configuration interface as DSA. The main difference
is the type of operations it performs. We can support the DSA and
IAX devices on the same driver with some tweaks.

IAX has a 64B completion record that needs to be 64B aligned, as opposed to
a 32B completion record that is 32B aligned for DSA. IAX also does not
support token management.
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/160564555488.1834439.4261958859935360473.stgit@djiang5-desk3.ch.intel.comSigned-off-by: NVinod Koul <vkoul@kernel.org>

f25b4638

nl80211: add common API to configure SAR power limitations · 6bdb68ce

由 Carl Huang 提交于 12月 03, 2020

NL80211_CMD_SET_SAR_SPECS is added to configure SAR from
user space. NL80211_ATTR_SAR_SPEC is used to pass the SAR
power specification when used with NL80211_CMD_SET_SAR_SPECS.

Wireless driver needs to register SAR type, supported frequency
ranges to wiphy, so user space can query it. The index in
frequency range is used to specify which sub band the power
limitation applies to. The SAR type is for compatibility, so later
other SAR mechanism can be implemented without breaking the user
space SAR applications.

Normal process is user space queries the SAR capability, and
gets the index of supported frequency ranges and associates the
power limitation with this index and sends to kernel.

Here is an example of message send to kernel:
8c 00 00 00 08 00 01 00 00 00 00 00 38 00 2b 81
08 00 01 00 00 00 00 00 2c 00 02 80 14 00 00 80
08 00 02 00 00 00 00 00 08 00 01 00 38 00 00 00
14 00 01 80 08 00 02 00 01 00 00 00 08 00 01 00
48 00 00 00

NL80211_CMD_SET_SAR_SPECS:  0x8c
NL80211_ATTR_WIPHY:     0x01(phy idx is 0)
NL80211_ATTR_SAR_SPEC:  0x812b (NLA_NESTED)
NL80211_SAR_ATTR_TYPE:  0x00 (NL80211_SAR_TYPE_POWER)
NL80211_SAR_ATTR_SPECS: 0x8002 (NLA_NESTED)
freq range 0 power: 0x38 in 0.25dbm unit (14dbm)
freq range 1 power: 0x48 in 0.25dbm unit (18dbm)
Signed-off-by: NCarl Huang <cjhuang@codeaurora.org>
Reviewed-by: NBrian Norris <briannorris@chromium.org>
Reviewed-by: NAbhishek Kumar <kuabhs@chromium.org>
Link: https://lore.kernel.org/r/20201203103728.3034-2-cjhuang@codeaurora.org
[minor edits, NLA parse cleanups]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6bdb68ce

cfg80211: support immediate reconnect request hint · 3bb02143

由 Johannes Berg 提交于 12月 06, 2020

There are cases where it's necessary to disconnect, but an
immediate reconnection is desired. Support a hint to userspace
that this is the case, by including a new attribute in the
deauth or disassoc event.
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201206145305.58d33941fb9d.I0e7168c205c7949529c8e3b86f3c9b12c01a7017@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

3bb02143

cfg80211: include block-tx flag in channel switch started event · 669b8413

由 Johannes Berg 提交于 11月 29, 2020

In the NL80211_CMD_CH_SWITCH_STARTED_NOTIFY event, include the
NL80211_ATTR_CH_SWITCH_BLOCK_TX flag attribute if block-tx was
requested by the AP.
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201129172929.8953ef22cc64.Ifee9cab337a4369938545920ba5590559e91327a@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

669b8413

rfkill: add a reason to the HW rfkill state · 14486c82

由 Emmanuel Grumbach 提交于 11月 04, 2020

The WLAN device may exist yet not be usable. This can happen
when the WLAN device is controllable by both the host and
some platform internal component.
We need some arbritration that is vendor specific, but when
the device is not available for the host, we need to reflect
this state towards the user space.

Add a reason field to the rfkill object (and event) so that
userspace can know why the device is in rfkill: because some
other platform component currently owns the device, or
because the actual hw rfkill signal is asserted.

Capable userspace can now determine the reason for the rfkill
and possibly do some negotiation on a side band channel using
a proprietary protocol to gain ownership on the device in case
the device is owned by some other component. When the host
gains ownership on the device, the kernel can remove the
RFKILL_HARD_BLOCK_NOT_OWNER reason and the hw rfkill state
will be off. Then, the userspace can bring the device up and
start normal operation.

The rfkill_event structure is enlarged to include the additional
byte, it is now 9 bytes long. Old user space will ask to read
only 8 bytes so that the kernel can know not to feed them with
more data. When the user space writes 8 bytes, new kernels will
just read what is present in the file descriptor. This new byte
is read only from the userspace standpoint anyway.

If a new user space uses an old kernel, it'll ask to read 9 bytes
but will get only 8, and it'll know that it didn't get the new
state. When it'll write 9 bytes, the kernel will again ignore
this new byte which is read only from the userspace standpoint.
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://lore.kernel.org/r/20201104134641.28816-1-emmanuel.grumbach@intel.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

14486c82

ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls · 4cf476ce

由 Tom Parkin 提交于 12月 10, 2020

This new ioctl pair allows two ppp channels to be bridged together:
frames arriving in one channel are transmitted in the other channel
and vice versa.

The practical use for this is primarily to support the L2TP Access
Concentrator use-case.  The end-user session is presented as a ppp
channel (typically PPPoE, although it could be e.g. PPPoA, or even PPP
over a serial link) and is switched into a PPPoL2TP session for
transmission to the LNS.  At the LNS the PPP session is terminated in
the ISP's network.

When a PPP channel is bridged to another it takes a reference on the
other's struct ppp_file.  This reference is dropped when the channels
are unbridged, which can occur either explicitly on userspace calling
the PPPIOCUNBRIDGECHAN ioctl, or implicitly when either channel in the
bridge is unregistered.

In order to implement the channel bridge, struct channel is extended
with a new field, 'bridge', which points to the other struct channel
making up the bridge.

This pointer is RCU protected to avoid adding another lock to the data
path.

To guard against concurrent writes to the pointer, the existing struct
channel lock 'upl' coverage is extended rather than adding a new lock.

The 'upl' lock is used to protect the existing unit pointer.  Since the
bridge effectively replaces the unit (they're mutually exclusive for a
channel) it makes coding easier to use the same lock to cover them
both.
Signed-off-by: NTom Parkin <tparkin@katalix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cf476ce

10 12月, 2020 7 次提交

can: isotp: add SF_BROADCAST support for functional addressing · 921ca574

由 Oliver Hartkopp 提交于 12月 06, 2020

When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP
socket is switched into functional addressing mode, where only single frame
(SF) protocol data units can be send on the specified CAN interface and the
given tp.tx_id after bind().

In opposite to normal and extended addressing this socket does not register a
CAN-ID for reception which would be needed for a 1-to-1 ISOTP connection with a
segmented bi-directional data transfer.

Sending SFs on this socket is therefore a TX-only 'broadcast' operation.
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NThomas Wagner <thwa1@web.de>
Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.netSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

921ca574

io_uring: add timeout update · 9c8e11b3

由 Pavel Begunkov 提交于 11月 30, 2020

Support timeout updates through IORING_OP_TIMEOUT_REMOVE with passed in
IORING_TIMEOUT_UPDATE. Updates doesn't support offset timeout mode.
Oirignal timeout.off will be ignored as well.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
[axboe: remove now unused 'ret' variable]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9c8e11b3

io_uring: add timeout support for io_uring_enter() · c73ebb68

由 Hao Xu 提交于 11月 03, 2020

Now users who want to get woken when waiting for events should submit a
timeout command first. It is not safe for applications that split SQ and
CQ handling between two threads, such as mysql. Users should synchronize
the two threads explicitly to protect SQ and that will impact the
performance.

This patch adds support for timeout to existing io_uring_enter(). To
avoid overloading arguments, it introduces a new parameter structure
which contains sigmask and timeout.

I have tested the workloads with one thread submiting nop requests
while the other reaping the cqe with timeout. It shows 1.8~2x faster
when the iodepth is 16.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
[axboe: various cleanups/fixes, and name change to SIG_IS_DATA]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c73ebb68

io_uring: add support for IORING_OP_UNLINKAT · 14a1143b

由 Jens Axboe 提交于 9月 28, 2020

IORING_OP_UNLINKAT behaves like unlinkat(2) and takes the same flags
and arguments.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

14a1143b

io_uring: add support for IORING_OP_RENAMEAT · 80a261fd

由 Jens Axboe 提交于 9月 28, 2020

IORING_OP_RENAMEAT behaves like renameat2(), and takes the same flags
etc.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

80a261fd

io_uring: allow non-fixed files with SQPOLL · 28cea78a

由 Jens Axboe 提交于 9月 14, 2020

The restriction of needing fixed files for SQPOLL is problematic, and
prevents/inhibits several valid uses cases. With the referenced
files_struct that we have now, it's trivially supportable.

Treat ->files like we do the mm for the SQPOLL thread - grab a reference
to it (and assign it), and drop it when we're done.

This feature is exposed as IORING_FEAT_SQPOLL_NONFIXED.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

28cea78a

btrfs: calculate inline extent buffer page size based on page size · deb67895

由 Qu Wenruo 提交于 12月 02, 2020

Btrfs only support 64K as maximum node size, thus for 4K page system, we
would have at most 16 pages for one extent buffer.

For a system using 64K page size, we would really have just one page.

While we always use 16 pages for extent_buffer::pages, this means for
systems using 64K pages, we are wasting memory for 15 page pointers
which will never be used.

Calculate the array size based on page size and the node size maximum.

- for systems using 4K page size, it will stay 16 pages
- for systems using 64K page size, it will be 1 page

Move the definition of BTRFS_MAX_METADATA_BLOCKSIZE to btrfs_tree.h, to
avoid circular inclusion of ctree.h.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

deb67895

09 12月, 2020 1 次提交

binder: add flag to clear buffer on txn complete · 0f966cba

由 Todd Kjos 提交于 11月 20, 2020

Add a per-transaction flag to indicate that the buffer
must be cleared when the transaction is complete to
prevent copies of sensitive data from being preserved
in memory.
Signed-off-by: NTodd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20201120233743.3617529-1-tkjos@google.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0f966cba

08 12月, 2020 2 次提交

btrfs: introduce ZONED feature flag · 7b3d5a90

由 Naohiro Aota 提交于 11月 10, 2020

This patch introduces the ZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by
host-managed zoned block devices (aligned chunk allocation, append-only
updates, reset zone after filled).

As the zoned support will happen incrementally due to enhancing some
core infrastructure like super block writes, tree-log, raid support, the
feature will appear in sysfs only on debug builds. It will be enabled
once the support is feature complete and applications can reliably check
whether zoned support is present or not.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

7b3d5a90

RDMA/hns: Move capability flags of QP and CQ to hns-abi.h · 53ef4999

由 Weihang Li 提交于 12月 02, 2020

These flags will be returned to the userspace through ABI, so they should
be defined in hns-abi.h. Furthermore, there is no need to include
hns-abi.h in every source files, it just needs to be included in the
common header file.

Link: https://lore.kernel.org/r/1606872560-17823-1-git-send-email-liweihang@huawei.comReported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NWeihang Li <liweihang@huawei.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

53ef4999

07 12月, 2020 4 次提交

media: doc: pixfmt-rgb: Clarify naming scheme for RGB formats · e9a66489

由 Laurent Pinchart 提交于 12月 07, 2020

The naming scheme for the RGB pixel formats has been developed
organically, and isn't consistent between formats using less than 8 bits
per pixels (mostly stored in 1 or 2 bytes per pixel, except for RGB666
that uses 4 bytes per pixel) and formats with 8 bits per pixel (stored
in 3 or 4 bytes). For the latter category, the names use a components
order convention that is the opposite of the first category, and the
opposite of DRM pixel formats. This has led to lots of confusion in the
past, and would really benefit from being explained more precisely. Do
so, which also prepares for the addition of additional RGB pixels
formats.
Signed-off-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>

e9a66489

media: videodev2.h: Move HM12 format to YUV semi-planar section · 473dbed5

由 Laurent Pinchart 提交于 12月 07, 2020

V4L2_PIX_FMT_HM12 is a YUV semi-planar macro-block format. Move it from
the packed YUV formats section where it was misplaced to the YUV
semi-planar formats section.
Signed-off-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>

473dbed5

media: videodev2.h: Move HI240 format to vendor-specific section · 0a078e0d

由 Laurent Pinchart 提交于 12月 07, 2020

V4L2_PIX_FMT_HI240 is a 8-bit dithered RGB format specific to BTTV. Move
it from the packed YUV formats section where it was misplaced to the
vendor-specific formats section.
Signed-off-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>

0a078e0d

media: videodev2.h: Remove unneeded comment about 4CC value · 3771c031

由 Laurent Pinchart 提交于 12月 07, 2020

The V4L2_PIX_FMT_BGRA444 format has a comment that explains why its 4CC
value is GA12. This explains the development history and isn't of much
interest to readers, it should have been part of a commit message
instead. Drop the comment, anyone interested in history can turn to git.
Signed-off-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>

3771c031

06 12月, 2020 1 次提交

gpiolib: cdev: allow edge event timestamps to be configured as REALTIME · 26d060e4

由 Kent Gibson 提交于 10月 15, 2020

Using CLOCK_REALTIME as the source for event timestamps is crucial for
some specific applications, particularly those requiring timetamps
relative to a PTP clock, so provide an option to switch the event
timestamp source from the default CLOCK_MONOTONIC to CLOCK_REALTIME.

Note that CLOCK_REALTIME was the default source clock for GPIO until
Linux 5.7 when it was changed to CLOCK_MONOTONIC due to issues with the
shifting of the realtime clock.
Providing this option maintains the CLOCK_MONOTONIC as the default,
while also providing a path forward for those dependent on the pre-5.7
behaviour.
Suggested-by: NJack Winch <sunt.un.morcov@gmail.com>
Signed-off-by: NKent Gibson <warthog618@gmail.com>
Link: https://lore.kernel.org/r/20201014231158.34117-2-warthog618@gmail.comSigned-off-by: NLinus Walleij <linus.walleij@linaro.org>

26d060e4

05 12月, 2020 3 次提交

net-zerocopy: Defer vm zap unless actually needed. · 94ab9eb9

由 Arjun Roy 提交于 12月 02, 2020

Zapping pages is required only if we are calling vm_insert_page into a
region where pages had previously been mapped. Receive zerocopy allows
reusing such regions, and hitherto called zap_page_range() before
calling vm_insert_page() in that range.

zap_page_range() can also be triggered from userspace with
madvise(MADV_DONTNEED). If userspace is configured to call this before
reusing a segment, or if there was nothing mapped at this virtual
address to begin with, we can avoid calling zap_page_range() under the
socket lock. That said, if userspace does not do that, then we are
still responsible for calling zap_page_range().

This patch adds a flag that the user can use to hint to the kernel
that a zap is not required. If the flag is not set, or if an older
user application does not have a flags field at all, then the kernel
calls zap_page_range as before. Also, if the flag is set but a zap is
still required, the kernel performs that zap as necessary. Thus
incorrectly indicating that a zap can be avoided does not change the
correctness of operation. It also increases the batchsize for
vm_insert_pages and prefetches the page struct for the batch since
we're about to bump the refcount.

An alternative mechanism could be to not have a flag, assume by
default a zap is not needed, and fall back to zapping if needed.
However, this would harm performance for older applications for which
a zap is necessary, and thus we implement it with an explicit flag
so newer applications can opt in.

When using RPC-style traffic with medium sized (tens of KB) RPCs, this
change yields an efficency improvement of about 30% for QPS/CPU usage.
Signed-off-by: NArjun Roy <arjunroy@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

94ab9eb9

net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy. · 18fb76ed

由 Arjun Roy 提交于 12月 02, 2020

When TCP receive zerocopy does not successfully map the entire
requested space, it outputs a 'hint' that the caller should recvmsg().

Augment zerocopy to accept a user buffer that it tries to copy this
hint into - if it is possible to copy the entire hint, it will do so.
This elides a recvmsg() call for received traffic that isn't exactly
page-aligned in size.

This was tested with RPC-style traffic of arbitrary sizes. Normally,
each received message required at least one getsockopt() call, and one
recvmsg() call for the remaining unaligned data.

With this change, almost all of the recvmsg() calls are eliminated,
leading to a savings of about 25%-50% in number of system calls
for RPC-style workloads.
Signed-off-by: NArjun Roy <arjunroy@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

18fb76ed

bpf: Add a bpf_sock_from_file helper · 4f19cab7

由 Florent Revest 提交于 12月 04, 2020

While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also other program types
inheriting this set of helpers such as iterators or LSM programs.
Signed-off-by: NFlorent Revest <revest@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NKP Singh <kpsingh@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201204113609.1850150-2-revest@google.com

4f19cab7

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功