1. 07 Dec 2022, 1 commit
  2. 19 Nov 2022, 1 commit
  3. 05 Nov 2022, 1 commit
    • lsm: make security_socket_getpeersec_stream() sockptr_t safe · b10b9c34
      Authored by Paul Moore
      Commit 4ff09db1 ("bpf: net: Change sk_getsockopt() to take the
      sockptr_t argument") made it possible to call sk_getsockopt()
      with both user and kernel address space buffers through the use of
      the sockptr_t type.  Unfortunately at the time of conversion the
      security_socket_getpeersec_stream() LSM hook was written to only
      accept userspace buffers, and in a desire to avoid having to change
      the LSM hook the commit author simply passed the sockptr_t's
      userspace buffer pointer.  Since the only sk_getsockopt() callers
      at the time of conversion which used kernel sockptr_t buffers did
      not allow SO_PEERSEC, and hence the
      security_socket_getpeersec_stream() hook, this was acceptable but
      also very fragile as future changes presented the possibility of
      silently passing kernel space pointers to the LSM hook.
      
      There are several ways to protect against this, including careful
      code review of future commits, but since relying on code review to
      catch bugs is a recipe for disaster and the upstream eBPF maintainer
      is "strongly against defensive programming", this patch updates the
      LSM hook, and all of the implementations to support sockptr_t and
      safely handle both user and kernel space buffers.
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Acked-by: John Johansen <john.johansen@canonical.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      b10b9c34
  4. 26 Oct 2022, 1 commit
  5. 18 Oct 2022, 1 commit
  6. 17 Oct 2022, 1 commit
  7. 16 Oct 2022, 1 commit
  8. 15 Oct 2022, 1 commit
  9. 13 Oct 2022, 7 commits
  10. 12 Oct 2022, 4 commits
    • mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page · fac35ba7
      Authored by Baolin Wang
      Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb;
      that is, in addition to PMD/PUD size hugetlb (2M and 1G), they also
      support CONT-PTE/PMD sizes (64K and 32M) when a 4K page size is used.
      
      So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
      use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
      hugetlb in follow_page_pte().  However, this pte entry lock is incorrect
      for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
      the correct lock, which is mm->page_table_lock.
      
      That means the pte entry of the CONT-PTE size hugetlb is unstable
      under the pte lock taken in follow_page_pte(): the entry can still be
      migrated or poisoned concurrently, which can cause potential race
      issues even though the 'pte lock' is held.
      
      For example, suppose thread A is trying to look up a CONT-PTE size
      hugetlb page via the move_pages() syscall under the lock, while another
      thread B migrates the CONT-PTE hugetlb page at the same time; thread A
      then gets an incorrect page, and if thread A also wants to do page
      migration, a data inconsistency error occurs.
      
      Moreover we have the same issue for CONT-PMD size hugetlb in
      follow_huge_pmd().
      
      To fix the above issues, rename follow_huge_pmd() to
      follow_huge_pmd_pte() and make it handle both PMD and PTE level size
      hugetlb, using huge_pte_lock() to take the correct pte entry lock so
      that the pte entry stays stable.
      
      Mike said:
      
      Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb:
      refactor find_num_contig()").  Patch series "Support for contiguous pte
      hugepages", v4.  However, I do not believe these code paths were
      executed until migration support was added with 5480280d ("arm64/mm:
      enable HugeTLB migration for contiguous bit HugeTLB pages").  I would
      go with 5480280d for the Fixes: target.
      
      Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
      Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fac35ba7
    • include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype · 6a961bff
      Authored by Tiezhu Yang
      The has_signal argument of arch_do_signal_or_restart() was removed in
      commit 8ba62d37 ("task_work: Call tracehook_notify_signal from
      get_signal on all architectures"); remove the now-stale comment that
      still mentions it.
      
      Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn
      Fixes: 8ba62d37 ("task_work: Call tracehook_notify_signal from get_signal on all architectures")
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6a961bff
    • prandom: remove unused functions · de492c83
      Authored by Jason A. Donenfeld
      With no callers left of prandom_u32() and prandom_bytes(), as well as
      get_random_int(), remove these deprecated wrappers in favor of
      get_random_u32() and get_random_bytes().
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      de492c83
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Authored by Jason A. Donenfeld
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81895a65
  11. 10 Oct 2022, 2 commits
  12. 07 Oct 2022, 4 commits
    • vfio: Add vfio_file_is_group() · 4b22ef04
      Authored by Jason Gunthorpe
      This replaces uses of vfio_file_iommu_group() which were only detecting if
      the file is a VFIO file with no interest in the actual group.
      
      The only remaining user of vfio_file_iommu_group() is in KVM for the SPAPR
      stuff. It passes the iommu_group into the arch code through kvm for some
      reason.
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Link: https://lore.kernel.org/r/1-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      4b22ef04
    • vdpa: device feature provisioning · 90fea5a8
      Authored by Jason Wang
      This patch allows the device features to be provisioned through
      netlink. A new attribute is introduced so that userspace can pass a
      64-bit device feature mask when adding a device.
      
      This provides several advantages:
      
      - Allows provisioning a subset of the features to ease cross-vendor
        live migration.
      - Better debuggability for the vDPA framework and parent.
      Reviewed-by: Eli Cohen <elic@nvidia.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20220927074810.28627-2-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      90fea5a8
    • virtio: drop vp_legacy_set_queue_size · cdbd952b
      Authored by Michael S. Tsirkin
      There's actually no way to set queue size on legacy virtio pci.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Message-Id: <20220815220447.155860-1-mst@redhat.com>
      cdbd952b
    • wifi: wext: use flex array destination for memcpy() · e3e6e1d1
      Authored by Hawkins Jiawei
      Syzkaller reports a buffer-overflow false positive as follows:
      ------------[ cut here ]------------
      memcpy: detected field-spanning write (size 8) of single field
      	"&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4)
      WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623
      	wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623
      Modules linked in:
      CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted
      	6.0.0-rc6-next-20220921-syzkaller #0
      [...]
      Call Trace:
       <TASK>
       ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022
       wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955
       wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline]
       wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline]
       wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049
       sock_ioctl+0x285/0x640 net/socket.c:1220
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      
      Wireless events are sent on the appropriate channels in
      wireless_send_event(). Different wireless events may have different
      payload structures and sizes, so the kernel uses the **len** and
      **cmd** fields in struct __compat_iw_event as the common LCP part of
      a wireless event, and uses **pointer** as a label marking the
      position of the remaining, event-specific part.
      
      The problem is that **pointer** is a compat_caddr_t, which may be
      smaller than the structure placed at the same position. So when
      wireless_send_event() parses a wireless event payload, it may trigger
      memcpy()'s run-time destination bounds checking when the structure's
      data is copied to the position marked by **pointer**.
      
      This patch solves it by introducing a flexible-array field,
      **ptr_bytes**, to mark the position of the wireless event's remaining
      part next to the LCP part. In addition, it adds a **ptr_len**
      variable in wireless_send_event() to improve maintainability.
      
      Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      e3e6e1d1
  13. 06 Oct 2022, 8 commits
  14. 05 Oct 2022, 7 commits