提交 · 74e4b90988b25d7bb60cf072b0f1b1afc1af27d5 · openeuler / Kernel

09 7月, 2021 1 次提交

drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE · fe4751c3

由 Jason Ekstrand 提交于 7月 08, 2021

This reverts commit 88be76cd ("drm/i915: Allow userspace to specify
ringsize on construction"). This API was originally added for OpenCL
but the compute-runtime PR has sat open for a year without action so we
can still pull it out if we want. I argue we should drop it for three
reasons:

1. If the compute-runtime PR has sat open for a year, this clearly
isn't that important.

2. It's a very leaky API. Ring size is an implementation detail of the
current execlist scheduler and really only makes sense there. It
can't apply to the older ring-buffer scheduler on pre-execlist
hardware because that's shared across all contexts and it won't
apply to the GuC scheduler that's in the pipeline.

3. Having userspace set a ring size in bytes is a bad solution to the
problem of having too small a ring. There is no way that userspace
has the information to know how to properly set the ring size so
it's just going to detect the feature and always set it to the
maximum of 512K. This is what the compute-runtime PR does. The
scheduler in i915, on the other hand, does have the information to
make an informed choice. It could detect if the ring size is a
problem and grow it itself. Or, if that's too hard, we could just
increase the default size from 16K to 32K or even 64K instead of
relying on userspace to do it.

Let's drop this API for now and, if someone decides they really care
about solving this problem, they can do it properly.
Signed-off-by: NJason Ekstrand <jason@jlekstrand.net>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-2-jason@jlekstrand.net

fe4751c3

21 6月, 2021 1 次提交

drm/i915: Document the Virtual Engine uAPI · 57772953

由 Tvrtko Ursulin 提交于 6月 18, 2021

A little bit of documentation covering the topics of engine discovery,
context engine maps and virtual engines. It is not very detailed but
supposed to be a starting point of giving a brief high level overview of
general principles and intended use cases.

v2:
 * Have the text in uapi header and link from there.

v4:
 * Link from driver-uapi.rst.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618150036.2507653-1-tvrtko.ursulin@linux.intel.com

57772953

14 6月, 2021 1 次提交

drm/i915: Fix busy ioctl commentary · c649432e

由 Tvrtko Ursulin 提交于 6月 11, 2021

Just tidy one instance of incorrect context parameter name and a stray
sentence ending from before reporting was converted to be class based.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210611132221.1055650-1-tvrtko.ursulin@linux.intel.com

c649432e

05 6月, 2021 1 次提交

drm/amdgpu: add uapi to define yellow carp series · 90a187d2

由 Aaron Liu 提交于 6月 19, 2020

Add a flag to define yellow carp series.
Signed-off-by: NAaron Liu <aaron.liu@amd.com>
Reviewed-by: NHuang Rui <ray.huang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

90a187d2

04 6月, 2021 1 次提交

drm/panfrost: Add AFBC_FEATURES parameter · 3e2926f8

由 Alyssa Rosenzweig 提交于 6月 04, 2021

The value of the AFBC_FEATURES register is required by userspace to
determine AFBC support on Bifrost. A user on our IRC channel (#panfrost)
reported a workload that raised a fault on one system's Mali G31 but
worked flawlessly with another system's Mali G31. We determined the
cause to be missing AFBC support on one vendor's Mali implementation --
it turns out AFBC is optional on Bifrost!

Whether AFBC is supported or not is exposed in the AFBC_FEATURES
register on Bifrost, which reads back as 0 on Midgard. A zero value
indicates AFBC is fully supported, provided the architecture itself
supports AFBC, allowing backwards-compatibility with Midgard. Bits 0 and
15 indicate that AFBC support is absent for texturing and rendering
respectively.

The user experiencing the fault reports that AFBC_FEATURES reads back
0x10001 on their system, confirming the architectural lack of AFBC.
Userspace needs this parameter to know to disable AFBC on that
chip, and perhaps others.

v2: Fix typo from copy-paste fail.

v3: Bump the UABI version. This commit was cherry-picked from another
series so chalking this up to a rebase fail.
Signed-off-by: NAlyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: NSteven Price <steven.price@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: NSteven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210604130011.3203-1-alyssa.rosenzweig@collabora.com

3e2926f8

02 6月, 2021 1 次提交

drm/amdgpu: Add vbios info ioctl interface · 29b4c589

由 Jiawei Gu 提交于 4月 14, 2021

Add AMDGPU_INFO_VBIOS_INFO subquery id for detailed vbios info.

Provides a way for the user application to get the VBIOS
information without having to parse the binary.
It is useful for the user to be able to display in a simple way the VBIOS
version in their system if they happen to encounter an issue.

V2:
Use numeric serial.
Parse and expose vbios version string.

V3:
Remove redundant data in drm_amdgpu_info_vbios struct.

V4:
64 bit alignment in drm_amdgpu_info_vbios.

v5: squash together all the reverts, etc. (Alex)
Signed-off-by: NJiawei Gu <Jiawei.Gu@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

29b4c589

01 6月, 2021 3 次提交

drm: document minimum kernel version for DRM_CLIENT_CAP_* · 2e290c8d

由 Simon Ser 提交于 5月 18, 2021

The kernel versions including the following commits are referenced:

DRM_CLIENT_CAP_STEREO_3D
61d8e328 ("drm: Add a STEREO_3D capability to the SET_CLIENT_CAP ioctl")

DRM_CLIENT_CAP_UNIVERSAL_PLANES
681e7ec7 ("drm: Allow userspace to ask for universal plane list (v2)")
c7dbc6c9 ("drm: Remove command line guard for universal planes")

DRM_CLIENT_CAP_ATOMIC
88a48e29 ("drm: add atomic properties")
8b72ce15 ("drm: Always enable atomic API")

DRM_CLIENT_CAP_ASPECT_RATIO
7595bda2 ("drm: Add DRM client cap for aspect-ratio")

DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
d67b6a20 ("drm: writeback: Add client capability for exposing writeback connectors")
Signed-off-by: NSimon Ser <contact@emersion.fr>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NDaniel Stone <daniels@collabora.com>
Acked-by: NPekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/434202/

2e290c8d

drm: clarify and linkify DRM_CLIENT_CAP_WRITEBACK_CONNECTORS docs · bbf4627b

由 Simon Ser 提交于 5月 18, 2021

Make it clear that the client is responsible for enabling ATOMIC
prior to enabling WRITEBACK_CONNECTORS. Linkify the reference to
ATOMIC.
Signed-off-by: NSimon Ser <contact@emersion.fr>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NDaniel Stone <daniels@collabora.com>
Acked-by: NPekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/434200/

bbf4627b

drm: reference mode flags in DRM_CLIENT_CAP_* docs · 88938bf3

由 Simon Ser 提交于 5月 18, 2021

In the docs for DRM_CLIENT_CAP_STEREO_3D and
DRM_CLIENT_CAP_ASPECT_RATIO, reference the DRM_MODE_FLAG_* defines
that get set when the cap is enabled.
Signed-off-by: NSimon Ser <contact@emersion.fr>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NDaniel Stone <daniels@collabora.com>
Acked-by: NPekka Paalanen <pekka.paalanen@collabora.com>
Link: https://patchwork.freedesktop.org/patch/434201/

88938bf3

28 5月, 2021 1 次提交

drm/fourcc: Add 16 bpc fixed point framebuffer formats. · ff92ecf5

由 Mario Kleiner 提交于 3月 19, 2021

These are 16 bits per color channel unsigned normalized formats.
They are supported by at least AMD display hw, and suitable for
direct scanout of Vulkan swapchain images in the format
VK_FORMAT_R16G16B16A16_UNORM.
Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: NMario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ff92ecf5

22 5月, 2021 1 次提交

drm/amdgpu: Add new placement for preemptible SG BOs · b453e42a

由 Felix Kuehling 提交于 5月 05, 2021

SG BOs such as dmabuf imports and userptr BOs do not consume system
resources directly. Instead they point to resources owned elsewhere.
They typically get evicted by DMABuf move notifiers of MMU notifiers.
If those notifiers don't need to wait for hardware fences (i.e. the SG
BOs are used in a preemptible context), then we don't need to limit
them to the GTT size and we don't need TTM to evict them.

Create a new placement for such preemptible SG BOs that does not impose
artificial size limits and TTM evictions.
Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b453e42a

19 5月, 2021 3 次提交

signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo · 922e3013

由 Eric W. Biederman 提交于 5月 03, 2021

With the addition of ssi_perf_data and ssi_perf_type struct signalfd_siginfo
is dangerously close to running out of space. All that remains is just
enough space for two additional 64bit fields. A practice of adding all
possible siginfo_t fields into struct singalfd_siginfo can not be supported
as adding the missing fields ssi_lower, ssi_upper, and ssi_pkey would
require two 64bit fields and one 32bit fields. In practice the fields
ssi_perf_data and ssi_perf_type can never be used by signalfd as the signal
that generates them always delivers them synchronously to the thread that
triggers them.

Therefore until someone actually needs the fields ssi_perf_data and
ssi_perf_type in signalfd_siginfo remove them. This leaves a bit more room
for future expansion.

v1: https://lkml.kernel.org/r/20210503203814.25487-12-ebiederm@xmission.com
v2: https://lkml.kernel.org/r/20210505141101.11519-12-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20210517195748.8880-5-ebiederm@xmission.comReviewed-by: NMarco Elver <elver@google.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

922e3013

signal: Deliver all of the siginfo perf data in _perf · 0683b531

由 Eric W. Biederman 提交于 5月 02, 2021

Don't abuse si_errno and deliver all of the perf data in _perf member
of siginfo_t.

Note: The data field in the perf data structures in a u64 to allow a
pointer to be encoded without needed to implement a 32bit and 64bit
version of the same structure. There already exists a 32bit and 64bit
versions siginfo_t, and the 32bit version can not include a 64bit
member as it only has 32bit alignment. So unsigned long is used in
siginfo_t instead of a u64 as unsigned long can encode a pointer on
all architectures linux supports.

v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org
v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com
v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.comReviewed-by: NMarco Elver <elver@google.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

0683b531

siginfo: Move si_trapno inside the union inside _si_fault · add0b32e

由 Eric W. Biederman 提交于 4月 30, 2021

It turns out that linux uses si_trapno very sparingly, and as such it
can be considered extra information for a very narrow selection of
signals, rather than information that is present with every fault
reported in siginfo.

As such move si_trapno inside the union inside of _si_fault. This
results in no change in placement, and makes it eaiser
to extend _si_fault in the future as this reduces the number of
special cases. In particular with si_trapno included in the union it
is no longer a concern that the union must be pointer aligned on most
architectures because the union follows immediately after si_addr
which is a pointer.

This change results in a difference in siginfo field placement on
sparc and alpha for the fields si_addr_lsb, si_lower, si_upper,
si_pkey, and si_perf. These architectures do not implement the
signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and
si_perf. Further these architecture have not yet implemented the
userspace that would use si_perf.

The point of this change is in fact to correct these placement issues
before sparc or alpha grow userspace that cares. This change was
discussed[1] and the agreement is that this change is currently safe.

[1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.comAcked-by: NMarco Elver <elver@google.com>
v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org
v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.comSigned-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

add0b32e

10 5月, 2021 1 次提交

block: uapi: fix comment about block device ioctl · 63c8af56

由 Damien Le Moal 提交于 5月 10, 2021

Fix the comment mentioning ioctl command range used for zoned block
devices to reflect the range of commands actually implemented.
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210509234806.3000-1-damien.lemoal@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

63c8af56

08 5月, 2021 1 次提交

habanalabs: expose ASIC specific PLL index · 285c0fad

由 Bharat Jauhari 提交于 3月 25, 2021

Currently the user cannot interpret the PLL information based on index
as its exposed as an integer.

This commit exposes ASIC specific PLL indexes and maps it to a generic
FW compatible index.
Signed-off-by: NBharat Jauhari <bjauhari@habana.ai>
Reviewed-by: NOded Gabbay <ogabbay@kernel.org>
Signed-off-by: NOded Gabbay <ogabbay@kernel.org>

285c0fad

07 5月, 2021 1 次提交

treewide: remove editor modelines and cruft · fa60ce2c

由 Masahiro Yamada 提交于 5月 06, 2021

The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently receive a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.orgSigned-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>	[auxdisplay]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa60ce2c

06 5月, 2021 4 次提交

mm/vmscan: move RECLAIM* bits to uapi header · b6676de8

由 Dave Hansen 提交于 5月 04, 2021

It is currently not obvious that the RECLAIM_* bits are part of the uapi
since they are defined in vmscan.c.  Move them to a uapi header to make it
obvious.

This should have no functional impact.

Link: https://lkml.kernel.org/r/20210219172557.08074910@viggo.jf.intel.comSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: NBen Widawsky <ben.widawsky@intel.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b6676de8

userfaultfd: add UFFDIO_CONTINUE ioctl · f6191471

由 Axel Rasmussen 提交于 5月 04, 2021

This ioctl is how userspace ought to resolve "minor" userfaults.  The
idea is, userspace is notified that a minor fault has occurred.  It
might change the contents of the page using its second non-UFFD mapping,
or not.  Then, it calls UFFDIO_CONTINUE to tell the kernel "I have
ensured the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs.  ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`.  We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case.  (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)

Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.comSigned-off-by: NAxel Rasmussen <axelrasmussen@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6191471

userfaultfd: add minor fault registration mode · 7677f7fd

由 Axel Rasmussen 提交于 5月 04, 2021

Patch series "userfaultfd: add minor fault handling", v9.

Overview
========

This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults.  By "minor" fault, I mean the following
situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory).  One of the mappings is registered with userfaultfd (in minor
mode), and the other is not.  Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents.  The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault.  As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...).  In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to the
   non-UFFD mapping, using the setup described above. The VM is still running
   (and therefore its memory is likely changing), so this may be repeated
   several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
   During this gap, the VM's user(s) will *see* a pause, so it is desirable to
   minimize this window.

3. Between the last time any page was copied from the source to the target, and
   when the VM was paused, the contents of that page may have changed - and
   therefore the copy we have on the target machine is out of date. Although we
   can keep track of which pages are out of date, for VMs with large amounts of
   memory, it is "slow" to transfer this information to the target machine. We
   want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
   touches its memory (via the UFFD-registered mapping), userspace wants to
   intercept this fault. Userspace checks whether or not the page is up to date,
   and if not, copies the updated page from the source machine, via the non-UFFD
   mapping. Finally, whether a copy was performed or not, userspace issues a
   UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
   are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.

Interaction with Existing APIs
==============================

Because this is a feature, a registered VMA could potentially receive both
missing and minor faults.  I spent some time thinking through how the
existing API interacts with the new feature:

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated.  This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
  in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).

Future Work
===========

This series only supports hugetlbfs.  I have a second series in flight to
support shmem as well, extending the functionality.  This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.

This patch (of 6):

This feature allows userspace to intercept "minor" faults.  By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not.  Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents.  The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault.  As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered.  In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].

However, doing it this was requires we allocate a VM_* flag for the new
registration mode.  On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

[peterx@redhat.com: fix minor fault page leak]
  Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com

Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.comSigned-off-by: NAxel Rasmussen <axelrasmussen@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7677f7fd

vfio/pci: Revert nvlink removal uAPI breakage · 77b8aeb9

由 Alex Williamson 提交于 5月 04, 2021

Revert the uAPI changes from the below commit with notice that these
regions and capabilities are no longer provided.

Fixes: b392a198 ("vfio/pci: remove vfio_pci_nvlink2")
Reported-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Tested-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Message-Id: <162014341432.3807030.11054087109120670135.stgit@omen>

77b8aeb9

04 5月, 2021 4 次提交

drm/i915/uapi: implement object placement extension · 2459e56f

由 Matthew Auld 提交于 4月 29, 2021

Add new extension to support setting an immutable-priority-list of
potential placements, at creation time.

If we use the normal gem_create or gem_create_ext without the
extensions/placements then we still get the old behaviour with only
placing the object in system memory.

v2(Daniel & Jason):
    - Add a bunch of kernel-doc
    - Simplify design for placements extension

Testcase: igt/gem_create/create-ext-placement-sanity-check
Testcase: igt/gem_create/create-ext-placement-each
Testcase: igt/gem_create/create-ext-placement-all
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NCQ Tang <cq.tang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Reviewed-by: NKenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-6-matthew.auld@intel.com

2459e56f

drm/i915/uapi: introduce drm_i915_gem_create_ext · ebcb4029

由 Matthew Auld 提交于 4月 29, 2021

Same old gem_create but with now with extensions support. This is needed
to support various upcoming usecases.

v2:(Chris)
    - Use separate ioctl number for gem_create_ext, instead of hijacking
      the existing gem_create ioctl, otherwise we run into the issue
      with being unable to detect if the kernel supports the new extension
      behaviour.
    - We now have gem_create_ext.flags, which should be zeroed.
    - I915_GEM_CREATE_EXT_SETPARAM value is now zero, since this is the
      index into our array of extensions.
    - Setup a "vanilla" object which we can directly apply our extensions
      to.
v3:(Daniel & Jason)
    - drop I915_GEM_CREATE_EXT_SETPARAM. Instead just have each extension
      do one thing only, instead of generic setparam which can cover
      various use cases.
    - add some kernel-doc.
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NCQ Tang <cq.tang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Reviewed-by: NKenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-5-matthew.auld@intel.com

ebcb4029

drm/i915/query: Expose memory regions through the query uAPI · 71021729

由 Abdiel Janulgue 提交于 4月 29, 2021

Returns the available memory region areas supported by the HW.

v2(Daniel & Jason):
    - Add some kernel-doc, including example usage.
    - Drop all the extra rsvd
v3(Jason & Tvrtko)
    - add back rsvd
Signed-off-by: NAbdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Reviewed-by: NKenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-3-matthew.auld@intel.com

71021729

netfilter: xt_SECMARK: add new revision to fix structure layout · c7d13358

由 Pablo Neira Ayuso 提交于 4月 30, 2021

This extension breaks when trying to delete rules, add a new revision to
fix this.

Fixes: 5e6874cd ("[SECMARK]: Add xtables SECMARK target")
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c7d13358

03 5月, 2021 1 次提交

drm/connector: demote connector force-probes for non-master clients · 8f86c82a

由 Simon Ser 提交于 4月 02, 2021

Force-probing a connector can be slow and cause flickering. As this
affects the global KMS state, let's make it so only the DRM master
can force-probe a connector.

Non-master DRM clients won't be able to force-probe a connector
anymore. Instead, KMS will perform a regular read-only connector
query.
Signed-off-by: NSimon Ser <contact@emersion.fr>
Acked-by: NPekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210402112212.5625-1-contact@emersion.fr

8f86c82a

30 4月, 2021 1 次提交

seg6: add counters support for SRv6 Behaviors · 94604548

由 Andrea Mayer 提交于 4月 27, 2021

This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed;

In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.

Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.

Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.

When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
an non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

Troubleshooting using SRv6 Behavior counters
--------------------------------------------

Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:

  (i) arrived at the node;
 (ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.

The error (iii) could be caused by different reasons, such as wrong route
settings on the node or due to an invalid SID List carried by the SRv6
packet. Anyway, the error counter is used to exclude that the packet did
not arrive at the node or it has not been processed by the behavior at
all.

Turning on/off counters for SRv6 Behaviors
------------------------------------------

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Impact of counters for SRv6 Behaviors on performance
====================================================

To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.

We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.

Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:

 1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
    instance of an SRv6 End.DX2 Behavior;
 2) patched kernel with SRv6 Behavior counters and a single instance of
    an SRv6 End.DX2 Behavior with counters turned off;
 3) patched kernel with SRv6 Behavior counters and a single instance of
    SRv6 End.DX2 Behavior with counters turned on.

All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.

Results of tests are shown in the following table:

Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps
Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps
Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps

As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).

Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.

[2] https://www.cloudlab.usSigned-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94604548

29 4月, 2021 1 次提交

icmp: standardize naming of RFC 8335 PROBE constants · e542d29c

由 Andreas Roeseler 提交于 4月 27, 2021

The current definitions of constants for PROBE, currently defined only
in the net-next kernel branch, are inconsistent, with
some beginning with ICMP and others with simply EXT. This patch
attempts to standardize the naming conventions of the constants for
PROBE before their release into a stable Kernel, and to update the
relevant definitions in net/ipv4/icmp.c.

Similarly, the definitions for the code field (previously
ICMP_EXT_MAL_QUERY, etc) use the same prefixes as the type field. This
patch adds _CODE_ to the prefix to clarify the distinction of these
constants.
Signed-off-by: NAndreas Roeseler <andreas.a.roeseler@gmail.com>
Acked-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210427153635.2591-1-andreas.a.roeseler@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e542d29c

28 4月, 2021 2 次提交

RDMA/nldev: Add copy-on-fork attribute to get sys command · 6cc9e215

由 Gal Pressman 提交于 4月 18, 2021

The new attribute indicates that the kernel copies DMA pages on fork,
hence libibverbs' fork support through madvise and MADV_DONTFORK is not
needed.

The introduced attribute is always reported as supported since the kernel
has the patch that added the copy-on-fork behavior. This allows the
userspace library to identify older vs newer kernel versions. Extra care
should be taken when backporting this patch as it relies on the fact that
the copy-on-fork patch is merged, hence no check for support is added.

Don't backport this patch unless you also have the following series:
commit 70e806e4 ("mm: Do early cow for pinned pages during fork() for
ptes") and commit 4eae4efa ("hugetlb: do early cow when page pinned on
src mm").

Fixes: 70e806e4 ("mm: Do early cow for pinned pages during fork() for ptes")
Fixes: 4eae4efa ("hugetlb: do early cow when page pinned on src mm")
Link: https://lore.kernel.org/r/20210418121025.66849-1-galpress@amazon.comSigned-off-by: NGal Pressman <galpress@amazon.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

6cc9e215

netfilter: nftables: add catch-all set element support · aaa31047

由 Pablo Neira Ayuso 提交于 4月 27, 2021

This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.

This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.

The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same of the set
element type to reuse the existing set element codebase.

The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.

The check for valid set element flags hava been updates to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

aaa31047

26 4月, 2021 7 次提交

RISC-V: Add EM_RISCV to kexec UAPI header · d83e682e

由 Nick Kossifidis 提交于 4月 19, 2021

Add RISC-V to the list of supported kexec architectures, we need to
add the definition early-on so that later patches can use it.

EM_RISCV is 243 as per ELF psABI specification here:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.mdSigned-off-by: NNick Kossifidis <mick@ics.forth.gr>
Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>

d83e682e

macvlan: Add nodst option to macvlan type source · 427f0c8c

由 Jethro Beekman 提交于 4月 25, 2021

The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

v2: netdev wants non-standard line length
Signed-off-by: NJethro Beekman <kernel@jbeekman.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

427f0c8c

netfilter: nft_socket: add support for cgroupsv2 · e0bb96db

由 Pablo Neira Ayuso 提交于 4月 21, 2021

Allow to match on the cgroupsv2 id from ancestor level.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e0bb96db

io_uring: add full-fledged dynamic buffers support · 634d00df

由 Pavel Begunkov 提交于 4月 25, 2021

Hook buffers into all rsrc infrastructure, including tagging and
updates.
Suggested-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

634d00df

io_uring: add generic rsrc update with tags · c3bdad02

由 Pavel Begunkov 提交于 4月 25, 2021

Add IORING_REGISTER_RSRC_UPDATE, which also supports passing in rsrc
tags. Implement it for registered files.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d4dc66df204212f64835ffca2c4eb5e8363f2f05.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c3bdad02

io_uring: add IORING_REGISTER_RSRC · 792e3582

由 Pavel Begunkov 提交于 4月 25, 2021

Add a new io_uring_register() opcode for rsrc registeration. Instead of
accepting a pointer to resources, fds or iovecs, it @arg is now pointing
to a struct io_uring_rsrc_register, and the second argument tells how
large that struct is to make it easily extendible by adding new fields.

All that is done mainly to be able to pass in a pointer with tags. Pass
it in and enable CQE posting for file resources. Doesn't support setting
tags on update yet.

A design choice made here is to not post CQEs on rsrc de-registration,
but only when we updated-removed it by rsrc dynamic update.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c498aaec32a4bb277b2406b9069662c02cdda98c.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

792e3582

io_uring: enumerate dynamic resources · fdecb662

由 Pavel Begunkov 提交于 4月 25, 2021

As resources are getting more support and common parts, it'll be more
convenient to index resources and use it for indexing.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0be63e9310212d5601d36277c2946ff7a040485.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

fdecb662

24 4月, 2021 1 次提交

drm/amdgpu: remove AMDGPU_GEM_CREATE_SHADOW flag · 42daecfc

由 Nirmoy Das 提交于 4月 22, 2021

Remove unused AMDGPU_GEM_CREATE_SHADOW flag.
Userspace is never allowed to use this.
Signed-off-by: NNirmoy Das <nirmoy.das@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

42daecfc

23 4月, 2021 2 次提交

signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures · 3ddb3fd8

由 Marco Elver 提交于 4月 22, 2021

The alignment of a structure is that of its largest member. On
architectures like 32-bit Arm (but not e.g. 32-bit x86) 64-bit integers
will require 64-bit alignment and not its natural word size.

This means that there is no portable way to add 64-bit integers to
siginfo_t on 32-bit architectures without breaking the ABI, because
siginfo_t does not yet (and therefore likely never will) contain 64-bit
fields on 32-bit architectures. Adding a 64-bit integer could change the
alignment of the union after the 3 initial int si_signo, si_errno,
si_code, thus introducing 4 bytes of padding shifting the entire union,
which would break the ABI.

One alternative would be to use the __packed attribute, however, it is
non-standard C. Given siginfo_t has definitions outside the Linux kernel
in various standard libraries that can be compiled with any number of
different compilers (not just those we rely on), using non-standard
attributes on siginfo_t should be avoided to ensure portability.

In the case of the si_perf field, word size is sufficient since there is
no exact requirement on size, given the data it contains is user-defined
via perf_event_attr::sig_data. On 32-bit architectures, any excess bits
of perf_event_attr::sig_data will therefore be truncated when copying
into si_perf.

Since si_perf is intended to disambiguate events (e.g. encoding relevant
information if there are more events of the same type), 32 bits should
provide enough entropy to do so on 32-bit architectures.

For 64-bit architectures, no change is intended.

Fixes: fb6cc127 ("signal: Introduce TRAP_PERF si_code and si_perf to siginfo")
Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Reported-by: NJon Hunter <jonathanh@nvidia.com>
Signed-off-by: NMarco Elver <elver@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Tested-by: NJon Hunter <jonathanh@nvidia.com>
Link: https://lkml.kernel.org/r/20210422191823.79012-1-elver@google.com

3ddb3fd8

landlock: Enable user space to infer supported features · 3532b0b4

由 Mickaël Salaün 提交于 4月 22, 2021

Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
landlock_create_ruleset(2). This enables to retreive a Landlock ABI
version that is useful to efficiently follow a best-effort security
approach. Indeed, it would be a missed opportunity to abort the whole
sandbox building, because some features are unavailable, instead of
protecting users as much as possible with the subset of features
provided by the running kernel.

This new flag enables user space to identify the minimum set of Landlock
features supported by the running kernel without relying on a filesystem
interface (e.g. /proc/version, which might be inaccessible) nor testing
multiple syscall argument combinations (i.e. syscall bisection). New
Landlock features will be documented and tied to a minimum version
number (greater than 1). The current version will be incremented for
each new kernel release supporting new Landlock features. User space
libraries can leverage this information to seamlessly restrict processes
as much as possible while being compatible with newer APIs.

This is a much more lighter approach than the previous
landlock_get_features(2): the complexity is pushed to user space
libraries. This flag meets similar needs as securityfs versions:
selinux/policyvers, apparmor/features/*/version* and tomoyo/version.

Supporting this flag now will be convenient for backward compatibility.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: NMickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.netSigned-off-by: NJames Morris <jamorris@linux.microsoft.com>

3532b0b4

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功