Commits · e457e43c459e8ecc1679f5f356f11c84ada79bbf · openeuler / Kernel

28 Jul, 2022 22 commits

mm: hugetlb_vmemmap: introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP · e457e43c

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.19-rc1
commit 2e4ec02b
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e4ec02bbcc05b8905d65c763ebde6bc85508e90

--------------------------------

The feature of minimizing the overhead of struct page associated with each
HugeTLB page is implemented on x86_64.  However, the infrastructure of this
feature is already in place, so it can easily be enabled for other
architectures.  Introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP so that other
architectures can enable the feature simply by selecting this config.

Link: https://lkml.kernel.org/r/20220331065640.5777-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Tested-by: Barry Song <baohua@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


Revert "arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB" · 0a5d68bb

Committed by Liu Shixin on Jul 28, 2022

hulk inclusion
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO

--------------------------------

There is a formal solution to support hugetlb vmemmap feature on arm64.

This reverts commit 5838d235.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP · 22f969cf

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.18-rc1
commit e5408417
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e54084173487804f5e2f23facf107fd9336e637e

--------------------------------

The vmemmap_remap_free/alloc functions are relevant only to HugeTLB, so move
them into the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.

Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


selftests: vm: add a hugetlb test case · 7d9a273f

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.18-rc1
commit b147c89c
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b147c89cd429321a59147368378c8aba17c8480f

--------------------------------

Since the head vmemmap page frame associated with each HugeTLB page is
reused, we should hide the PG_head flag of tail struct pages from the
user.  Add a test case to check whether this works properly.  The test
steps are as follows.

  1) alloc 2MB hugeTLB
  2) get each page frame
  3) apply those APIs in each page frame
  4) Those APIs work completely the same as before.

Reading the flags of a page by /proc/kpageflags is done in
stable_page_flags(), which has invoked PageHead(), PageTail(),
PageCompound() and compound_head().

If those APIs work properly, the head page must have bits 15 and 17 set,
and tail pages must have bits 16 and 17 set with bit 15 unset.  Those
flags are checked in check_page_flags().
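
The expected bit patterns can be sketched in user-space C.  This is only an
illustration under stated assumptions, not the selftest's code: the macro and
function names below are hypothetical, and only the bit numbers 15, 16 and
17 come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers are taken from the commit message; the names are illustrative. */
#define FLAG_COMPOUND_HEAD (UINT64_C(1) << 15)
#define FLAG_COMPOUND_TAIL (UINT64_C(1) << 16)
#define FLAG_HUGE          (UINT64_C(1) << 17)

/* Head page: bits 15 and 17 must both be set. */
static bool looks_like_head(uint64_t flags)
{
	uint64_t want = FLAG_COMPOUND_HEAD | FLAG_HUGE;

	return (flags & want) == want;
}

/* Tail page: bits 16 and 17 must be set and bit 15 must be clear. */
static bool looks_like_tail(uint64_t flags)
{
	uint64_t want = FLAG_COMPOUND_TAIL | FLAG_HUGE;

	return (flags & want) == want && !(flags & FLAG_COMPOUND_HEAD);
}
```

A caller would pass each 64-bit entry read from /proc/kpageflags to these
helpers; a tail page that leaked PG_head would fail looks_like_tail().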

Link: https://lkml.kernel.org/r/20211101031651.75851-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	tools/testing/selftests/vm/Makefile
	tools/testing/selftests/vm/run_vmtests.sh
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


mm: sparsemem: use page table lock to protect kernel pmd operations · b31b84bf

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.18-rc1
commit d8d55f56
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8d55f5616cf3b900a23a72dd24e7b07211e7859

--------------------------------

The init_mm.page_table_lock is used to protect kernel page tables, so we
can use it to serialize splitting vmemmap PMD mappings instead of the mmap
write lock, which can increase the concurrency of vmemmap_remap_free().

In practice, this increases the concurrency between allocations of HugeTLB
pages.  But that is not the only benefit.  There are many users of the
mmap read lock of init_mm.  The mmap write lock is held throughout
vmemmap_remap_free(), so removing that usage means it no longer affects
other users of the mmap read lock.  The change does not make anything
worse and is always a win.

Now the kernel page table walker does not hold the page_table_lock when
walking pmd entries.  There may be a consistency issue with a pmd entry,
because a pmd entry might change from a huge pmd entry to a PTE page
table.  There is only one user of the kernel page table walker, namely
ptdump.  ptdump already accounts for this by using a local variable to
cache the value of the pmd entry.  But we also need to update ->action
to ACTION_CONTINUE to make sure the walker does not walk every pte entry
again when a concurrent thread has split the huge pmd.

Link: https://lkml.kernel.org/r/20211101031651.75851-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/sparse-vmemmap.c
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key · 9b4aa808

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.18-rc1
commit a6b40850
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a6b40850c442bf996e729e1d441d3dbc37cea171

--------------------------------

page_fixed_fake_head() is used throughout memory management, and its
conditional check requires reading a global variable.  Although the
overhead of this check may be small, it increases when the memory cache
comes under pressure.  Also, the global variable is not modified after
system boot, so it is a good fit for the static key mechanism.

Link: https://lkml.kernel.org/r/20211101031651.75851-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	mm/memory_hotplug.c
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page · fe0aed02

Committed by Muchun Song on Jul 28, 2022

mainline inclusion
from mainline-v5.18-rc1
commit e7d32485
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e7d324850bfcb30df563d144c0363cc44595277d

--------------------------------

Patch series "Free the 2nd vmemmap page associated with each HugeTLB
page", v7.

This series can minimize the overhead of struct page for 2MB HugeTLB
pages significantly.  It further reduces the overhead of struct page by
12.5% for a 2MB HugeTLB compared to the previous approach, which means
2GB per 1TB HugeTLB.  It is a nice gain.  Comments and reviews are
welcome.  Thanks.

For the main implementation and details, refer to the commit log of patch
1.  In this series, I have changed the following four helpers; the table
below shows the impact on the overhead of those helpers.

	+------------------+-----------+-----------+
	|       APIs       | head page | tail page |
	+------------------+-----------+-----------+
	|    PageHead()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|    PageTail()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|  PageCompound()  |     N     |     N     |
	+------------------+-----------+-----------+
	|  compound_head() |     Y     |     N     |
	+------------------+-----------+-----------+

	Y: Overhead is increased.
	N: Overhead is _NOT_ increased.

It shows that the overhead of those helpers on a tail page doesn't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off".  But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()).  So I believe that Matthew Wilcox's folio series
will help with this.

There are far fewer users of PageHead() and PageTail() than of
compound_head(), and most users of PageTail() are VM_BUG_ON(), so I have
done some tests on the overhead of compound_head() on head pages.

I have tested the overhead of calling compound_head() on a head page,
which is 2.11ns (measured by timing 10 million calls to compound_head()
and averaging).

For a head page whose address is not aligned with PAGE_SIZE or a
non-compound page, the overhead of compound_head() is 2.54ns which is
increased by 20%.  For a head page whose address is aligned with
PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
40%.  Most pages are the former.  I do not think the overhead is
significant since the overhead of compound_head() itself is low.

This patch (of 5):

This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly.  It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB (2MB type).
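
The 12.5% and 2GB figures can be reproduced with a little arithmetic.  The
sketch below assumes a 4KB base page and a 64-byte struct page (typical for
x86_64, but not stated in this commit message); the macro and function names
are hypothetical.

```c
#include <stdint.h>

#define BASE_PAGE_SIZE   UINT64_C(4096)       /* assumed base page size */
#define STRUCT_PAGE_SIZE UINT64_C(64)         /* assumed sizeof(struct page) */
#define HUGE_PAGE_SIZE   (UINT64_C(2) << 20)  /* 2MB HugeTLB page */

/* vmemmap pages needed to hold the struct pages of one 2MB HugeTLB page. */
static uint64_t vmemmap_pages_per_hugepage(void)
{
	uint64_t nr_struct_pages = HUGE_PAGE_SIZE / BASE_PAGE_SIZE; /* 512 */

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE; /* 8 */
}

/* Bytes saved when one more vmemmap page is freed per HugeTLB page. */
static uint64_t extra_saving_bytes(uint64_t hugetlb_bytes)
{
	return hugetlb_bytes / HUGE_PAGE_SIZE * BASE_PAGE_SIZE;
}
```

Under these assumptions each 2MB HugeTLB page has 8 vmemmap pages, so freeing
one more page is 1/8 = 12.5% of the struct page overhead, and 1TB of 2MB
HugeTLB pages (524288 pages) saves 524288 * 4KB = 2GB.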

After the feature of "Free some vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.

     HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped.  However, the 2nd vmemmap page frame can also be freed to
the buddy allocator, and then we can change the mapping from the figure
above to the figure below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0).  In other words, there is more than one page
struct with PG_head associated with each HugeTLB page.  We __know__ that
there is only one head page struct; the tail page structs with PG_head are
fake head page structs.  We need an approach to distinguish between those
two different types of page structs so that compound_head(), PageHead()
and PageTail() can work properly if the parameter is a tail page struct
with PG_head set.

The following code snippet describes how to distinguish between real and
fake head page struct.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the field of @page[1] when @page has PG_head because
@page is a compound page composed of at least two contiguous pages.
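
The check above can be modelled in user space.  The following is a minimal
sketch, not kernel code: struct page_model, PG_HEAD_BIT and both functions
are hypothetical, and the remapped vmemmap is imitated by copying a real
head/tail pair to a second address so that the copy carries PG_head but sits
at the "wrong" address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PG_HEAD_BIT 0  /* hypothetical bit position, for illustration only */

struct page_model {
	uintptr_t flags;
	uintptr_t compound_head;  /* bit 0 set => tail; other bits => head address */
};

/* True only for the real head; false for fake heads and for pages without PG_head. */
static bool is_real_head(const struct page_model *page)
{
	if (!(page->flags & ((uintptr_t)1 << PG_HEAD_BIT)))
		return false;

	uintptr_t head = page[1].compound_head;

	if (head & 1)
		return head == (uintptr_t)page + 1;  /* does page[1] point back at us? */
	return true;
}

/* Build a real head/tail pair, then alias its contents at a second address. */
static void build_pair(struct page_model real[2], struct page_model fake[2])
{
	memset(real, 0, 2 * sizeof(*real));
	real[0].flags = (uintptr_t)1 << PG_HEAD_BIT;
	real[1].compound_head = (uintptr_t)&real[0] + 1;
	memcpy(fake, real, 2 * sizeof(*real));
}
```

After build_pair(), both real[0] and fake[0] have PG_head set, but only
real[0] is reported as a head, because fake[1].compound_head still encodes
the address of real[0].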

[songmuchun@bytedance.com: restore lost comment changes]

Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	include/linux/page-flags.h
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


mm: make compound_head const-preserving · 2722aa58

Committed by Matthew Wilcox (Oracle) on Jul 28, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 0f2317e3
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f2317e34e2c7b97efd4600122115410795ebeea

--------------------------------

If you pass a const pointer to compound_head(), you get a const pointer
back; if you pass a mutable pointer, you get a mutable pointer back.  Also
remove an unnecessary forward definition of struct page; we're about to
dereference page->compound_head, so it must already have been defined.
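
One way to get this const-preserving behaviour is to do the lookup on an
integer and cast the result back to the caller's own pointer type.  The
sketch below is a user-space illustration under that assumption, not the
kernel source: struct page_model, compound_head_addr() and
compound_head_model() are hypothetical names, and __typeof__ is a GCC/Clang
extension.

```c
#include <stdint.h>

struct page_model {
	uintptr_t compound_head;  /* bit 0 set => tail; other bits => head address */
};

/* Untyped lookup: returns the head's address as an integer. */
static inline uintptr_t compound_head_addr(const struct page_model *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;      /* tail page: strip the tag bit */
	return (uintptr_t)page;       /* head or non-compound page */
}

/* Cast back to the argument's own type, so const in means const out. */
#define compound_head_model(page) \
	((__typeof__(page))compound_head_addr(page))
```

With this macro, passing a `const struct page_model *` yields a const
pointer, and passing a plain `struct page_model *` yields a mutable one,
without needing two separate functions.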

Link: https://lkml.kernel.org/r/20210416231531.2521383-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


jump_label: Provide CONFIG-driven build state defaults · 2706bcc0

Committed by Kees Cook on Jul 28, 2022

mainline inclusion
from mainline-v5.13-rc1
commit 0d66ccc1
category: feature
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d66ccc1627013c95f1e7ef10b95b8451cd7834e

--------------------------------

As shown in the comment in jump_label.h, choosing the initial state of
static branches changes the assembly layout. If the condition is expected
to be likely it's inline, and if unlikely it is out of line via a jump.

A few places in the kernel use (or could be using) a CONFIG to choose the
default state, which would give a small performance benefit to their
compile-time declared default. Provide the infrastructure to do this.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210401232347.2791257-2-keescook@chromium.org
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


Revert "mm/dynamic_hugetlb: disable dynamic hugetlb if hugetlb_vmemmap is enabled" · 156e60a4

Committed by Liu Shixin on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO
CVE: NA

--------------------------------

hugetlb_vmemmap will be disabled when dynamic hugetlb is enabled in a later patch.

This reverts commit c7ae7c0d.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


ubifs: Fix AA deadlock when setting xattr for encrypted file · baf97eb4

Committed by Zhihao Cheng on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: 187250, https://gitee.com/openeuler/kernel/issues/I5HSMS
CVE: NA

-------------------------------------------------

The following call chain:
vfs_setxattr(host)
  ubifs_xattr_set
    down_write(host_ui->xattr_sem)   <- lock first time
      create_xattr
        ubifs_new_inode(host)
          fscrypt_prepare_new_inode(host)
            fscrypt_policy_to_inherit(host)
              if (IS_ENCRYPTED(inode))
                fscrypt_require_key(host)
                  fscrypt_get_encryption_info(host)
                    ubifs_xattr_get(host)
                      down_read(host_ui->xattr_sem) <- AA deadlock

may trigger an AA deadlock problem:

[  102.620871] INFO: task setfattr:1599 blocked for more than 10 seconds.
[  102.625298]       Not tainted 5.19.0-rc7-00001-gb666b6823ce0-dirty #711
[  102.628732] task:setfattr        state:D stack:    0 pid: 1599
[  102.628749] Call Trace:
[  102.628753]  <TASK>
[  102.628776]  __schedule+0x482/0x1060
[  102.629964]  schedule+0x92/0x1a0
[  102.629976]  rwsem_down_read_slowpath+0x287/0x8c0
[  102.629996]  down_read+0x84/0x170
[  102.630585]  ubifs_xattr_get+0xd1/0x370 [ubifs]
[  102.630730]  ubifs_crypt_get_context+0x1f/0x30 [ubifs]
[  102.630791]  fscrypt_get_encryption_info+0x7d/0x1c0
[  102.630810]  fscrypt_policy_to_inherit+0x56/0xc0
[  102.630817]  fscrypt_prepare_new_inode+0x35/0x160
[  102.630830]  ubifs_new_inode+0xcc/0x4b0 [ubifs]
[  102.630873]  ubifs_xattr_set+0x591/0x9f0 [ubifs]
[  102.630961]  xattr_set+0x8c/0x3e0 [ubifs]
[  102.631003]  __vfs_setxattr+0x71/0xc0
[  102.631026]  vfs_setxattr+0x105/0x270
[  102.631034]  do_setxattr+0x6d/0x110
[  102.631041]  setxattr+0xa0/0xd0
[  102.631087]  __x64_sys_setxattr+0x2f/0x40

A reproducer is available at [Link].

Just as ext4 skips encryption for inodes with the EXT4_EA_INODE_FL
flag, stop encrypting xattr inodes for ubifs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216260
Fixes: f4e3634a ("ubifs: Fix races between xattr_{set|get} ...")
Fixes: d475a507 ("ubifs: Add skeleton for fscrypto")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


ubifs: Fix the issue that UBIFS be read-only due to truncate in the encrypted directory · 6658b33b

Committed by ZhaoLong Wang on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: 187163, https://gitee.com/openeuler/kernel/issues/I5GBC4
CVE: NA

--------------------------------

The ubifs_compress() function does not compress the data when the
data length is shorter than 128 bytes or the compressed data length
is not ideal.  As a result, the compressed length of the truncated
data in the truncate_data_node() function may be greater than the
length of the raw data read from the flash.

These two lengths are passed to the ubifs_encrypt() function as
parameters.  This may cause an assertion to fail, after which the
file system becomes read-only.

This patch uses the actual length of the data in memory as the
input parameter for the assertion comparison, which avoids the problem.
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


lockdown: Fix kexec lockdown bypass with ima policy · 89e9ad6d

Committed by Eric Snowberg on Jul 28, 2022

mainline inclusion
from mainline-v5.19-rc8
commit 543ce63b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5I0FP
CVE: CVE-2022-21505

Reference: https://seclists.org/oss-sec/2022/q3/57
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=543ce63b664e2c2f9533d089a4664b559c3e6b5b

--------------------------------

The lockdown LSM is primarily used in conjunction with UEFI Secure Boot.
This LSM may also be used on machines without UEFI.  It can also be
enabled when UEFI Secure Boot is disabled.  One of lockdown's features
is to prevent kexec from loading untrusted kernels.  Lockdown can be
enabled through a bootparam or after the kernel has booted through
securityfs.

If IMA appraisal is used with the "ima_appraise=log" boot param,
lockdown can be defeated with kexec on any machine when Secure Boot is
disabled or unavailable.  IMA prevents setting "ima_appraise=log" from
the boot param when Secure Boot is enabled, but this does not cover
cases where lockdown is used without Secure Boot.

To defeat lockdown, boot without Secure Boot and add ima_appraise=log to
the kernel command line; then:

  $ echo "integrity" > /sys/kernel/security/lockdown
  $ echo "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig" > \
    /sys/kernel/security/ima/policy
  $ kexec -ls unsigned-kernel

Add a call to verify ima appraisal is set to "enforce" whenever lockdown
is enabled.  This fixes CVE-2022-21505.

Cc: stable@vger.kernel.org
Fixes: 29d3c1c8 ("kexec: Allow kexec_file() with appropriate IMA policy when locked down")
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


fbmem: Check virtual screen sizes in fb_set_var() · 9ba05bbc

Committed by Helge Deller on Jul 28, 2022

stable inclusion
from stable-v5.10.130
commit b81212828ad19ab3eccf00626cd04099215060bf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5IQ4M
CVE: CVE-2021-33655

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b81212828ad19ab3eccf00626cd04099215060bf

--------------------------------

commit 6c11df58 upstream.

Verify that the fbdev or drm driver correctly adjusted the virtual
screen sizes. On failure report the failing driver and reject the screen
size change.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


fbcon: Prevent that screen size is smaller than font size · e664e980

Committed by Helge Deller on Jul 28, 2022

stable inclusion
from stable-v5.10.130
commit cecb806c766c78e1be62b6b7b1483ef59bbaeabe
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5IQ4M
CVE: CVE-2021-33655

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cecb806c766c78e1be62b6b7b1483ef59bbaeabe

--------------------------------

commit e64242ca upstream.

We need to prevent users from configuring a screen size that is smaller than
the currently selected font size. Otherwise, rendering chars on the screen will
access memory outside the graphics memory region.

This patch adds a new function fbcon_modechange_possible() which
implements this check and which later may be extended with other checks
if necessary.  The new function is called from the FBIOPUT_VSCREENINFO
ioctl handler in fbmem.c, which will return -EINVAL if userspace asked
for a screen size that is too small.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


fbcon: Disallow setting font bigger than screen size · c248fb22

Committed by Helge Deller on Jul 28, 2022

stable inclusion
from stable-v5.10.130
commit b727561ddc9360de9631af2d970d8ffed676a750
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5IQ4M
CVE: CVE-2021-33655

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b727561ddc9360de9631af2d970d8ffed676a750

--------------------------------

commit 65a01e60 upstream.

Prevent users from setting a font size that is bigger than the physical screen.
This is unlikely to happen (because screens are usually much larger than the
fonts and each font char is limited to 32x32 pixels), but it may happen on
smaller screens/LCD displays.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>


inotify: show inotify mask flags in proc fdinfo · 77eabe0b

Committed by Amir Goldstein on Jul 28, 2022

mainline inclusion
from mainline-v5.19-rc1
commit a32e697c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IHD1
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a32e697cda27679a0327ae2cafdad8c7170f548f

--------------------------------

The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal
to kernel" and should be exposed in procfs fdinfo so CRIU can restore
them.

Fixes: 69335996 ("inotify: hide internal kernel bits from fdinfo")
Link: https://lore.kernel.org/r/20220422120327.3459282-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
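
As a rough illustration of the masking this message talks about, the sketch below contrasts two filters using the userspace constants from `<sys/inotify.h>`. Both function names are hypothetical, and the "events only" variant is an assumption about the earlier behaviour, not a copy of kernel code.

```c
#include <stdint.h>
#include <sys/inotify.h>

/* Assumed earlier behaviour: only the event bits survive in fdinfo. */
uint32_t fdinfo_mask_events_only(uint32_t mark_mask)
{
    return mark_mask & IN_ALL_EVENTS;
}

/* Sketch of the described behaviour: the two user-visible flags are kept. */
uint32_t fdinfo_mask_with_flags(uint32_t mark_mask)
{
    return mark_mask & (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK);
}
```

With the second filter, a tool such as CRIU that reads the mask back can see that a watch was created with IN_ONESHOT or IN_EXCL_UNLINK.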

77eabe0b

block: prevent lockdep false positive warning about 'bd_mutex' · 67b9d277

Committed by Yu Kuai on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB
CVE: NA

--------------------------------

Patch ("block: fix that part scan is disabled in device_add_disk()")
confuses lockdep into producing the following warning:

=====================================================
WARNING: possible circular locking dependency detected
4.18.0+ #2 Tainted: G                 ---------r-  -
------------------------------------------------------
syz-executor.0/4652 is trying to acquire lock:
00000000ad5f5a19 (&mddev->open_mutex){+.+.}, at: md_open+0x13a/0x260 home/install/linux-rh-3-10/drivers/md/md.c:7626

but task is already holding lock:
000000005c3a3fea (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x156/0x1490 home/install/linux-rh-3-10/fs/block_dev.c:1583

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&bdev->bd_mutex){+.+.}:
       __mutex_lock_common home/install/linux-rh-3-10/kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x105/0x1270 home/install/linux-rh-3-10/kernel/locking/mutex.c:1072
       __blkdev_get+0x156/0x1490 home/install/linux-rh-3-10/fs/block_dev.c:1583
       blkdev_get+0x33c/0xac0 home/install/linux-rh-3-10/fs/block_dev.c:1735
       disk_init_partition home/install/linux-rh-3-10/block/blk-sysfs.c:972 [inline]
       blk_register_queue+0x5ed/0x6c0 home/install/linux-rh-3-10/block/blk-sysfs.c:1055
       __device_add_disk+0xab5/0xd70 home/install/linux-rh-3-10/block/genhd.c:729
       sd_probe_async+0x447/0x852 home/install/linux-rh-3-10/drivers/scsi/sd.c:3249
       async_run_entry_fn+0xe1/0x700 home/install/linux-rh-3-10/kernel/async.c:127
       process_one_work+0x9cf/0x1940 home/install/linux-rh-3-10/kernel/workqueue.c:2175
       worker_thread+0x91/0xc50 home/install/linux-rh-3-10/kernel/workqueue.c:2321
       kthread+0x33a/0x400 home/install/linux-rh-3-10/kernel/kthread.c:257
       ret_from_fork+0x3a/0x50 home/install/linux-rh-3-10/arch/x86/entry/entry_64.S:355

-> #1 (&q->sysfs_dir_lock){+.+.}:
       __mutex_lock_common home/install/linux-rh-3-10/kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x105/0x1270 home/install/linux-rh-3-10/kernel/locking/mutex.c:1072
       blk_register_queue+0x143/0x6c0 home/install/linux-rh-3-10/block/blk-sysfs.c:1010
       __device_add_disk+0xab5/0xd70 home/install/linux-rh-3-10/block/genhd.c:729
       add_disk home/install/linux-rh-3-10/./include/linux/genhd.h:447 [inline]
       md_alloc+0xb06/0x10d0 home/install/linux-rh-3-10/drivers/md/md.c:5525
       md_probe+0x32/0x60 home/install/linux-rh-3-10/drivers/md/md.c:5554
       kobj_lookup+0x2d2/0x450 home/install/linux-rh-3-10/drivers/base/map.c:152
       get_gendisk+0x3b/0x360 home/install/linux-rh-3-10/block/genhd.c:860
       bdev_get_gendisk home/install/linux-rh-3-10/fs/block_dev.c:1181 [inline]
       __blkdev_get+0x3b6/0x1490 home/install/linux-rh-3-10/fs/block_dev.c:1578
       blkdev_get+0x33c/0xac0 home/install/linux-rh-3-10/fs/block_dev.c:1735
       blkdev_open+0x1c2/0x250 home/install/linux-rh-3-10/fs/block_dev.c:1923
       do_dentry_open+0x686/0xf50 home/install/linux-rh-3-10/fs/open.c:777
       do_last home/install/linux-rh-3-10/fs/namei.c:3449 [inline]
       path_openat+0x92f/0x28c0 home/install/linux-rh-3-10/fs/namei.c:3578
       do_filp_open+0x1aa/0x2b0 home/install/linux-rh-3-10/fs/namei.c:3613
       do_sys_open+0x307/0x490 home/install/linux-rh-3-10/fs/open.c:1075
       do_syscall_64+0xca/0x5c0 home/install/linux-rh-3-10/arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x6a/0xdf

-> #0 (&mddev->open_mutex){+.+.}:
       lock_acquire+0x10b/0x3a0 home/install/linux-rh-3-10/kernel/locking/lockdep.c:3868
       __mutex_lock_common home/install/linux-rh-3-10/kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x105/0x1270 home/install/linux-rh-3-10/kernel/locking/mutex.c:1072
       md_open+0x13a/0x260 home/install/linux-rh-3-10/drivers/md/md.c:7626
       __blkdev_get+0x2dc/0x1490 home/install/linux-rh-3-10/fs/block_dev.c:1599
       blkdev_get+0x33c/0xac0 home/install/linux-rh-3-10/fs/block_dev.c:1735
       blkdev_open+0x1c2/0x250 home/install/linux-rh-3-10/fs/block_dev.c:1923
       do_dentry_open+0x686/0xf50 home/install/linux-rh-3-10/fs/open.c:777
       do_last home/install/linux-rh-3-10/fs/namei.c:3449 [inline]
       path_openat+0x92f/0x28c0 home/install/linux-rh-3-10/fs/namei.c:3578
       do_filp_open+0x1aa/0x2b0 home/install/linux-rh-3-10/fs/namei.c:3613
       do_sys_open+0x307/0x490 home/install/linux-rh-3-10/fs/open.c:1075
       do_syscall_64+0xca/0x5c0 home/install/linux-rh-3-10/arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x6a/0xdf

other info that might help us debug this:

Chain exists of:
  &mddev->open_mutex --> &q->sysfs_dir_lock --> &bdev->bd_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&bdev->bd_mutex);
                               lock(&q->sysfs_dir_lock);
                               lock(&bdev->bd_mutex);
  lock(&mddev->open_mutex);

 *** DEADLOCK ***

Since 'bd_mutex' and 'sysfs_dir_lock' are different for each device,
a deadlock between md_open() and sd_probe_async() is impossible. However,
lockdep treats 'bd_mutex' and 'sysfs_dir_lock' from different devices
as the same locks, and patch "block: fix that part scan is disabled in
device_add_disk()" takes 'bd_mutex' inside 'sysfs_dir_lock',
which causes the false positive warning.

Fix the false positive warning by not grabbing 'bd_mutex' inside
'sysfs_dir_lock'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

67b9d277

block: fix that part scan is disabled in device_add_disk() · d7c2ddc8

Committed by Yu Kuai on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB
CVE: NA

--------------------------------

Patch ("block: Fix warning in bd_link_disk_holder()") moves the
setting of the 'GENHD_FL_UP' flag after blkdev_get(), which
disables the partition scan:

device_add_disk
 register_disk
  blkdev_get
   __blkdev_get
    bdev_get_gendisk
     get_gendisk -> failed because 'GENHD_FL_UP' is not set

And this will cause tests block/017, block/018 and scsi/004 to fail.

Fix the problem by moving part scan as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d7c2ddc8

block: Fix warning in bd_link_disk_holder() · a51eb65b

Committed by Luo Meng on Jul 28, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB
CVE: NA

--------------------------------

Warning reports as follows:

 WARNING: CPU: 3 PID: 674 at fs/block_dev.c:1272 bd_link_disk_holder+0xcd/0x270
 Modules linked in: null_blk(+)
 CPU: 3 PID: 674 Comm: dmsetup Not tainted 5.10.0-16691-gf6076432827d-dirty #158
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
 RIP: 0010:bd_link_disk_holder+0xcd/0x270
 Code: 69 73 ee 00 44 89 e8 5b 48 83 05 c5 bf 6d 0c 01 5d 41 5c 41 5d 41 5e 41 8
 RSP: 0018:ffffc9000049bbb8 EFLAGS: 00010202
 RAX: ffff888104e39038 RBX: ffff888104185000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: ffffffffaa085692 RDI: 0000000000000000
 RBP: ffff88810cc2ae00 R08: ffffffffa853659b R09: 0000000000000000
 R10: ffffc9000049bbb0 R11: 720030626c6c756e R12: ffff88810e800000
 R13: ffff88810e800090 R14: ffff888103570c98 R15: ffff888103570c80
 FS:  00007fb49dc13dc0(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ff994ebde70 CR3: 000000010d54a000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  dm_get_table_device+0x175/0x300
  dm_get_device+0x238/0x360
  linear_ctr+0xee/0x170
  dm_table_add_target+0x199/0x4b0
  table_load+0x18c/0x480
  ? table_clear+0x190/0x190
  ctl_ioctl+0x21d/0x640
  ? check_preemption_disabled+0x140/0x150
  dm_ctl_ioctl+0x12/0x20
  __se_sys_ioctl+0xb1/0x100
  __x64_sys_ioctl+0x1e/0x30
  do_syscall_64+0x45/0x70
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by running the following operations concurrently:
	1. modprobe null_blk
	2. echo -e "0 10000 linear /dev/nullb0 0" > table
	   dmsetup create xxx table

t1: create disk a                   |     t2: dm setup
                                    |
device_add_disk                     |
 dev->devt = devt                   |
                                    | dm_get_table_device
                                    | open_table_device
                                    | blkdev_get_by_dev -> succeed
                                    | bd_link_disk_holder
                                    |  -> holder_dir is still NULL
 register_disk -> create holder_dir |
  kobject_create_and_add            |
device_add_disk() sets devt before creating holder_dir, which
leaves a window in which dm_get_table_device() can find the disk by
devt while its holder_dir is still NULL.

So move GENHD_FL_UP in blk_register_queue() to avoid this warning and
fix a NULL pointer dereference in __blk_mq_sched_bio_merge().

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a51eb65b

ucounts: add missing data type changes · 92561eff

Committed by Sven Schnelle on Jul 28, 2022

mainline inclusion
from mainline-v5.14-rc6
commit f153c224
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IDIC
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f153c2246783ba210493054d99c66353f56423c9

--------------------------------

commit f9c82a4e ("Increase size of ucounts to atomic_long_t")
changed the data type of ucounts/ucounts_max to long, but failed to
adjust a few other places. This is noticeable from user space on big
endian platforms, because the /proc/sys/user/max_*_names files all
contain 0.

v4 - Made the min and max constants long so the sysctl values
     are actually settable on little endian machines.
     -- EWB

Fixes: f9c82a4e ("Increase size of ucounts to atomic_long_t")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Alexey Gladkov <legion@kernel.org>
v1: https://lkml.kernel.org/r/20210721115800.910778-1-svens@linux.ibm.com
v2: https://lkml.kernel.org/r/20210721125233.1041429-1-svens@linux.ibm.com
v3: https://lkml.kernel.org/r/20210730062854.3601635-1-svens@linux.ibm.com
Link: https://lkml.kernel.org/r/8735rijqlv.fsf_-_@disp2133
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Conflict:
  fs/notify/fanotify/fanotify_user.c
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
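
Why a width mismatch shows up as 0 only on big endian machines can be reproduced in plain C. The sketch below is a userspace simulation with hypothetical helper names; it lays a 64-bit value out in memory order and then decodes only the first four bytes, as code that still believes the variable is a 32-bit int would.

```c
#include <stdint.h>

/* Lay out a 64-bit value in memory order for the given endianness. */
static void store_u64(uint64_t v, int big_endian, unsigned char out[8])
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 56 - 8 * i : 8 * i;
        out[i] = (unsigned char)(v >> shift);
    }
}

/* Decode only the first four bytes, as an int-sized reader would. */
static uint32_t load_u32(const unsigned char in[4], int big_endian)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        v |= (uint32_t)in[i] << shift;
    }
    return v;
}

/* What an int-sized reader sees when the variable is really 64 bits wide. */
uint32_t misread_as_int(uint64_t stored, int big_endian)
{
    unsigned char bytes[8];

    store_u64(stored, big_endian, bytes);
    return load_u32(bytes, big_endian);
}
```

On the little endian layout the low half of the value happens to sit in the first four bytes, so the mistake stays hidden for small values; on the big endian layout those bytes are the high half, which is zero.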

92561eff

bpf: Don't redirect packets with invalid pkt_len · 03a99552

Committed by Zhengchao Shao on Jul 28, 2022

mainline inclusion
from mainline-v5.19-rc6
commit fd189422
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5HWKR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fd1894224407

--------------------------------

Syzbot found an issue [1]: fq_codel_drop() tries to drop a flow without any
skbs, that is, flow->head is NULL.
The root cause, as [2] says, is that bpf_prog_test_run_skb()
runs a bpf prog which redirects empty skbs.
So we should check whether the length of a packet modified by a bpf
prog, or by others such as bpf_prog_test, is valid before forwarding it directly.

LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

03a99552

Jul 26, 2022 · 18 commits

Revert "net: micrel: fix KS8851_MLL Kconfig" · 156b297a

Committed by Marek Vasut on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 7992fdb045fbc7cb0e34eba464b73044585c0638
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7992fdb045fbc7cb0e34eba464b73044585c0638

--------------------------------

This reverts commit 1ff5359afa5ec0dd09fe76183dc4fa24b50e4125 which is
commit c3efcedd upstream.

The upstream commit c3efcedd ("net: micrel: fix KS8851_MLL Kconfig")
depends on e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies")
which is not part of Linux 5.10.y . Revert the aforementioned commit to
prevent breakage in 5.10.y .

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

156b297a

block/compat_ioctl: fix range check in BLKGETSIZE · 9a1fc0d8

Committed by Khazhismel Kumykov on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 8bedbc8f7f35f533000b347644b6bf1f62524676
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8bedbc8f7f35f533000b347644b6bf1f62524676

--------------------------------

commit ccf16413 upstream.

The kernel's ulong and compat_ulong_t may not have the same width. Use the
type directly to eliminate mismatches.

The mismatch would result in truncation rather than EFBIG in 32-bit mode for
large disks.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220414224056.2875681-1-khazhy@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
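
The effect of checking against the wrong width can be shown with a userspace sketch. The typedefs and function names below are hypothetical stand-ins, assuming a 64-bit native ulong and a 32-bit compat ulong; this is not the kernel's ioctl code.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t compat_ulong_sketch_t;   /* stand-in for a 32-bit compat ulong */
typedef uint64_t native_ulong_sketch_t;   /* stand-in for a 64-bit kernel ulong */

/* Mismatched check: the limit uses the native width, so it never trips. */
int64_t getsize_native_limit(uint64_t sectors)
{
    if (sectors > ~(native_ulong_sketch_t)0)
        return -EFBIG;
    return (compat_ulong_sketch_t)sectors;   /* silently truncated */
}

/* Matching check: the limit uses the type that is actually written out. */
int64_t getsize_compat_limit(uint64_t sectors)
{
    if (sectors > ~(compat_ulong_sketch_t)0)
        return -EFBIG;
    return (compat_ulong_sketch_t)sectors;
}
```

For a sector count of 0x100000001, the first function returns 1 (the truncated value) while the second returns -EFBIG.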

9a1fc0d8

staging: ion: Prevent incorrect reference counting behavour · dd76b5fe

Committed by Lee Jones on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit fea24b07edfc348c67a019b6e17b39c0698e631f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fea24b07edfc348c67a019b6e17b39c0698e631f

--------------------------------

Supply additional check in order to prevent unexpected results.

Fixes: b892bf75 ("ion: Switch ion to use dma-buf")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

dd76b5fe

spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller · 0c5ab04a

Committed by Tudor Ambarus on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit dccee748af17fc087ff5017152e532ef8e18c8c0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dccee748af17fc087ff5017152e532ef8e18c8c0

--------------------------------

commit 8c235cc2 upstream.

Use the spi_mem_default_supports_op() core helper in order to take into
account the buswidth specified by the user in device tree.

Cc: <stable@vger.kernel.org>
Fixes: 0e6aae08 ("spi: Add QuadSPI driver for Atmel SAMA5D2")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20220406133604.455356-1-tudor.ambarus@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

0c5ab04a

can: isotp: stop timeout monitoring when no first frame was sent · f120da4c

Committed by Oliver Hartkopp on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 50aac44273600cb0ae1efd010bb1de7701444a41
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=50aac44273600cb0ae1efd010bb1de7701444a41

--------------------------------

commit d7349708 upstream.

The first attempt to fix the 'impossible' WARN_ON_ONCE(1) in
isotp_tx_timer_handler() focussed on the identical CAN IDs created by
the syzbot reproducer and led to upstream fix/commit 3ea56642
("can: isotp: sanitize CAN ID checks in isotp_bind()"). But this did
not catch the root cause of the wrong tx.state in the tx_timer handler.

In the isotp 'first frame' case, timeout monitoring needs to be started
before the 'first frame' is sent. But when this sending fails, the timeout
monitoring for this specific frame has to be disabled too.

Otherwise the tx_timer fires with the 'warn me' tx.state of ISOTP_IDLE.

Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220405175112.2682-1-socketcan@hartkopp.net
Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

f120da4c

ext4: force overhead calculation if the s_overhead_cluster makes no sense · fd9c2ff8

Committed by Theodore Ts'o on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit e1e96e37272156d691203a3725b876787f38c8f2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e1e96e37272156d691203a3725b876787f38c8f2

--------------------------------

commit 85d825db upstream.

If the file system does not use bigalloc, calculating the overhead is
cheap, so force the recalculation of the overhead so we don't have to
trust the precalculated overhead in the superblock.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

fd9c2ff8

ext4: fix overhead calculation to account for the reserved gdt blocks · 27d40cc6

Committed by Theodore Ts'o on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 4789149b9ea2a1893c62d816742f1a76514fc901
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4789149b9ea2a1893c62d816742f1a76514fc901

--------------------------------

commit 10b01ee9 upstream.

The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks.  With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

27d40cc6

ext4, doc: fix incorrect h_reserved size · cf4de22f

Committed by wangjianjian (C) on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 0c54b093766becb9c83317232c93290cf612b8ff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c54b093766becb9c83317232c93290cf612b8ff

--------------------------------

commit 7102ffe4 upstream.

According to the documentation and the code, ext4_xattr_header is 32 bytes
in size, so the h_reserved size should be 3.

Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com>
Link: https://lore.kernel.org/r/92fcc3a6-7d77-8c09-4126-377fcb4c46a5@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
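
The arithmetic behind the value 3 can be checked with a struct sketch. It assumes that five 32-bit fields precede h_reserved, with names following my recollection of the kernel header, and it uses plain uint32_t in place of the on-disk little-endian types; the struct name itself is hypothetical.

```c
#include <stdint.h>

/* Sketch only: plain uint32_t stands in for the on-disk field types. */
struct xattr_header_sketch {
    uint32_t h_magic;
    uint32_t h_refcount;
    uint32_t h_blocks;
    uint32_t h_hash;
    uint32_t h_checksum;
    uint32_t h_reserved[3];   /* 5 * 4 + 3 * 4 = 32 bytes */
};

/* Fails at compile time if the layout does not add up to 32 bytes. */
_Static_assert(sizeof(struct xattr_header_sketch) == 32,
               "header sketch should be 32 bytes");
```

With any other array length the total would no longer be 32 bytes, which is the inconsistency the documentation fix removes.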

cf4de22f

ext4: limit length to bitmap_maxbytes - blocksize in punch_hole · aaf9e2fa

Committed by Tadeusz Struk on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 22c450d39f8922ae26de459cf4f83b2b294f207e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=22c450d39f8922ae26de459cf4f83b2b294f207e

--------------------------------

commit 2da37622 upstream.

Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul
and offset 0x1000000ul, which, when added together, exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function,
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered, it points to the last
block. Modify the ext4_punch_hole() function and add a constraint that
caps the length to satisfy the one-block-before-the-last requirement.

LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000

Fixes: a4bb6b64 ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
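
The cap named in the commit title (bitmap_maxbytes minus one block) can be sketched as a small helper. `cap_punch_length` and its parameters are hypothetical names, and the sketch assumes the offset already lies below the limit; it is not the code from ext4_punch_hole().

```c
#include <stdint.h>

/*
 * Hypothetical helper: cap `length` so that offset + length never exceeds
 * max_bytes - block_size. Assumes offset < max_bytes - block_size.
 */
uint64_t cap_punch_length(uint64_t offset, uint64_t length,
                          uint64_t max_bytes, uint64_t block_size)
{
    uint64_t max_end = max_bytes - block_size;

    if (length > max_end - offset)      /* written this way to avoid overflow */
        length = max_end - offset;
    return length;
}
```

With made-up limits of max_bytes = 0x10000 and block_size = 0x1000, a request at offset 0x2000 for 0xF000 bytes is capped to 0xD000, so the hole ends one block short of the limit.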

aaf9e2fa

ext4: fix fallocate to use file_modified to update permissions consistently · 97cada6b

Committed by Darrick J. Wong on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit f6038d43b25bba1cd50d2a77e207f6550aee9954
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f6038d43b25bba1cd50d2a77e207f6550aee9954

--------------------------------

commit ad5cd4f4 upstream.

Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files.  This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range).  Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.

The VFS function file_modified() does all this for us if we pass it a
locked inode, so let's make fallocate drop permissions correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

97cada6b

perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event · ce66add4

Committed by Leo Yan on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 19590bbc691d81f03d2a24a3ec30c399ebe071e0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=19590bbc691d81f03d2a24a3ec30c399ebe071e0

--------------------------------

[ Upstream commit ccb17cae ]

Since commit bb30acae ("perf report: Bail out --mem-mode if mem
info is not available"), "perf mem report" and "perf report --mem-mode"
report no results if the PERF_SAMPLE_DATA_SRC bit is missing from the
sample type.

The commit ffab4870 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue.  It adds the PERF_SAMPLE_DATA_SRC
bit for the Arm SPE event, which allows a perf data file generated by
kernel v5.18-rc1 or later to be reported properly.

On the other hand, the perf tool still fails to be backward compatible
with a data file recorded by an older version of perf which contains Arm
SPE trace data.  This patch is a workaround in the reporting phase: when
it detects an Arm SPE PMU event without the PERF_SAMPLE_DATA_SRC bit, it
forces the bit to be set in the sample type and prints a warning.

Fixes: bb30acae ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20220414123201.842754-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

ce66add4

powerpc/perf: Fix power9 event alternatives · a0820bde

Committed by Athira Rajeev on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit e012f9d1af54ca3c24ca0e9ec03a1a212972771c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e012f9d1af54ca3c24ca0e9ec03a1a212972771c

--------------------------------

[ Upstream commit 0dcad700 ]

When scheduling a group of events, constraint checks are done to
make sure all events can go in a group. For example, one of the criteria is
that events in a group cannot use the same PMC. But the platform-specific
PMU supports alternative events for some of the event codes. During
perf_event_open(), if any event group doesn't match the constraint check
criteria, a further lookup is done to find an alternative event.

By current design, the array of alternative events in the PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criterion is based on an event code
comparison, i.e. "event < ev_alt[i][0]". This optimisation is there
since find_alternative() can be called multiple times. In the power9 PMU
code, the alternative event array is not sorted properly and hence
finding alternative events is broken.

To work with the existing logic, fix the alternative event array to be
sorted by column 0 for power9-pmu.c.

Results:

With alternative events, multiplexing can be avoided. That is, for
example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event,
PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be
programmed in a different PMC.

Before:

# perf stat -e r3e054,r300fc

Performance counter stats for 'system wide':

1057860 r3e054 (50.21%)
379 r300fc (49.79%)

0.944329741 seconds time elapsed

Since both the events are using PMC3 in this case, they are
multiplexed here.

After:

# perf stat -e r3e054,r300fc

Performance counter stats for 'system wide':

1006948 r3e054
182 r300fc

Fixes: 91e0bd1e ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
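
Why the early-exit comparison requires a sorted table can be seen in a small sketch. The lookup below is written from the description in the message, not copied from the kernel; `find_alternative_sketch`, `MAX_ALT_SKETCH` and both tables are hypothetical, and only the 0x3e054/0x400f0 pair comes from the message.

```c
#define MAX_ALT_SKETCH 2

/* Row 0 uses the codes quoted in the message; the other row is made up. */
static const unsigned int sorted_alt[][MAX_ALT_SKETCH] = {
    { 0x3e054, 0x400f0 },
    { 0x50000, 0x50001 },   /* hypothetical event codes */
};

static const unsigned int unsorted_alt[][MAX_ALT_SKETCH] = {
    { 0x50000, 0x50001 },   /* hypothetical event codes */
    { 0x3e054, 0x400f0 },
};

/* Return the row that contains `event`, or -1. Relies on column 0 being sorted. */
int find_alternative_sketch(unsigned int event,
                            const unsigned int ev_alt[][MAX_ALT_SKETCH],
                            int size)
{
    for (int i = 0; i < size; i++) {
        if (event < ev_alt[i][0])
            break;                       /* early exit: later rows are larger */
        for (int j = 0; j < MAX_ALT_SKETCH && ev_alt[i][j]; j++)
            if (event == ev_alt[i][j])
                return i;
    }
    return -1;
}
```

Looking up 0x3e054 succeeds in the sorted table but returns -1 in the unsorted one, because the loop gives up at the first row whose column 0 is larger than the event.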

a0820bde

drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage · c0d651a4

Committed by Miaoqian Lin on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 0a2cef65b32919af7df4df979c5eede5f7825f17
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0a2cef65b32919af7df4df979c5eede5f7825f17

--------------------------------

[ Upstream commit 3d0b93d9 ]

If the device is already in a runtime PM enabled state,
pm_runtime_get_sync() will return 1.

Also, we need to call pm_runtime_put_noidle() when pm_runtime_get_sync()
fails, so use pm_runtime_resume_and_get() instead. This function
will handle this.

Fixes: 4078f575 ("drm/vc4: Add DSI driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420135008.2757-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
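
The reference-count difference between the two calls can be modelled in userspace. Everything below is a mock with hypothetical names (`mock_dev`, `mock_get_sync`, and so on); it imitates the contract described in the message and does not call the real runtime PM API.

```c
/* Mock device: a usage counter plus flags that steer the mock's result. */
struct mock_dev {
    int usage;
    int resume_fails;
    int already_active;
};

/* Imitates get_sync: the count is raised even when resume fails. */
int mock_get_sync(struct mock_dev *d)
{
    d->usage++;
    if (d->resume_fails)
        return -1;
    return d->already_active ? 1 : 0;
}

/* Imitates put_noidle: drop the count without any other side effect. */
void mock_put_noidle(struct mock_dev *d)
{
    d->usage--;
}

/* Imitates resume_and_get: undo the count on failure, return 0 on success. */
int mock_resume_and_get(struct mock_dev *d)
{
    int ret = mock_get_sync(d);

    if (ret < 0) {
        mock_put_noidle(d);
        return ret;
    }
    return 0;
}

/* Usage count left behind after one failed attempt with each helper. */
int usage_after_failed_get_sync(void)
{
    struct mock_dev d = { 0, 1, 0 };

    mock_get_sync(&d);
    return d.usage;
}

int usage_after_failed_resume_and_get(void)
{
    struct mock_dev d = { 0, 1, 0 };

    mock_resume_and_get(&d);
    return d.usage;
}
```

In the mock, a failed mock_get_sync() leaves the counter at 1 unless the caller balances it, while mock_resume_and_get() leaves it at 0 and also folds the "already active" return value of 1 into 0.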

c0d651a4

KVM: PPC: Fix TCE handling for VFIO · d4248c33

Committed by Alexey Kardashevskiy on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit f8f8b3124b899867a18a3f63e538c791e21252ac
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f8f8b3124b899867a18a3f63e538c791e21252ac

--------------------------------

[ Upstream commit 26a62b75 ]

The LoPAPR spec defines a guest visible IOMMU with a variable page size.
Currently QEMU advertises 4K, 64K, 2M, 16MB pages; a Linux VM picks
the biggest (16MB). In the case of a passed through PCI device, there is
a hardware IOMMU which does not support all page sizes from the above -
P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated
16M IOMMU page we may create several smaller mappings ("TCEs") in
the hardware IOMMU.

The code wrongly uses the emulated TCE index instead of hardware TCE
index in error handling. The problem is easier to see on POWER8 with
multi-level TCE tables (when only the first level is preallocated)
as hash mode uses real mode TCE hypercalls handlers.
The kernel starts using indirect tables when VMs get bigger than 128GB
(depends on the max page order).
The very first real mode hcall is going to fail with H_TOO_HARD as
in the real mode we cannot allocate memory for TCEs (we can in the virtual
mode) but on the way out the code attempts to clear hardware TCEs using
emulated TCE indexes which corrupts random kernel memory because
it_offset==1<<59 is subtracted from those indexes and the resulting index
is out of the TCE table bounds.

This fixes kvmppc_clear_tce() to use the correct TCE indexes.

While at it, this fixes TCE cache invalidation, which uses emulated TCE
indexes instead of the hardware ones. This went unnoticed because 64-bit
DMA is used these days: VMs map all RAM in one go and only then do DMA,
and this is when the TCE cache gets populated.

Potentially this could slow down mapping. However, 16MB emulated pages
are normally backed by 64K hardware pages, so it is one write to
the "TCE Kill" per 256 updates, which is not that bad considering the size
of the cache (1024 TCEs or so).

Fixes: ca1fc489 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

d4248c33

drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare · 27f402d6

Committed by Dave Stevenson on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 405d98427416849cf37c84c0c70bd5008b686a1e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=405d98427416849cf37c84c0c70bd5008b686a1e

--------------------------------

[ Upstream commit 5f18c078 ]

The panel has a prepare call which is before video starts, and an
enable call which is after.
The Toshiba bridge should be configured before video, so move
the relevant power and initialisation calls to prepare.

Fixes: 2f733d61 ("drm/panel: Add support for the Raspberry Pi 7" Touchscreen.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220415162513.42190-3-stefan.wahren@i2se.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

27f402d6

drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised · 5f70f8f0

Committed by Dave Stevenson on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 231381f5211620cec836b921f1c7a2cf702b3e8a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=231381f5211620cec836b921f1c7a2cf702b3e8a

--------------------------------

[ Upstream commit f92055ae ]

If a call to rpi_touchscreen_i2c_write from rpi_touchscreen_probe
fails before mipi_dsi_device_register_full is called, then,
in trying to log the error message, it uses ts->dsi->dev while
ts->dsi is still NULL.

Use ts->i2c->dev instead, which is initialised earlier in probe.
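
The idea can be sketched with mock types. Everything below is hypothetical (the `mock_*` structs and `log_dev()` are stand-ins, not the driver's definitions); it only illustrates picking the device pointer that is already valid early in probe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's device structures. */
struct mock_dev { const char *name; };
struct mock_i2c { struct mock_dev dev; };
struct mock_dsi { struct mock_dev dev; };

struct mock_ts {
	struct mock_i2c *i2c; /* set at the start of probe */
	struct mock_dsi *dsi; /* NULL until the DSI device is registered */
};

/*
 * Device to log against. The I2C client exists for the whole of probe,
 * so this never dereferences the possibly-NULL dsi pointer.
 */
static const struct mock_dev *log_dev(const struct mock_ts *ts)
{
	assert(ts != NULL && ts->i2c != NULL);
	return &ts->i2c->dev;
}
```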

Fixes: 2f733d61 ("drm/panel: Add support for the Raspberry Pi 7" Touchscreen.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220415162513.42190-2-stefan.wahren@i2se.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

5f70f8f0

perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled · 47edbfbf

Committed by Zhipeng Xie on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 51d9cbbb0f5a175e5b4c4a25d2ec995363304860
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=51d9cbbb0f5a175e5b4c4a25d2ec995363304860

--------------------------------

[ Upstream commit 60490e79 ]

This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on
both x86_64 and aarch64 when using sysdig -B (which uses eBPF) [1].
sysdig -B works fine after rebuilding the kernel with
CONFIG_PERF_USE_VMALLOC disabled.

I tracked it down to the condition event->rb->nr_pages != nr_pages
in perf_mmap, which is true when CONFIG_PERF_USE_VMALLOC is enabled:
event->rb->nr_pages = 1 and nr_pages = 2048, so perf_mmap
returns -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is
enabled, rb->nr_pages is always equal to 1.

Arch with CONFIG_PERF_USE_VMALLOC enabled by default:
	arc/arm/csky/mips/sh/sparc/xtensa

Arch with CONFIG_PERF_USE_VMALLOC disabled by default:
	x86_64/aarch64/...

Fix this problem by using data_page_nr()
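
A userspace sketch of why comparing rb->nr_pages directly fails in the vmalloc-backed case. The `mock_rb` struct and the `mock_*`/`*_check_matches` functions are hypothetical; the `page_order` field is an assumption about how the vmalloc-backed buffer records its real size, and the shift is only meant to mirror the idea behind data_page_nr().

```c
#include <assert.h>

/* Hypothetical stand-in for the ring buffer bookkeeping. */
struct mock_rb {
	int nr_pages;   /* always 1 in the vmalloc-backed case described above */
	int page_order; /* assumed: log2 of the number of data pages */
};

/* Number of data pages actually backing the buffer. */
static int mock_data_page_nr(const struct mock_rb *rb)
{
	assert(rb->page_order >= 0 && rb->page_order < 31);
	return rb->nr_pages << rb->page_order;
}

/* Old check: compares the raw nr_pages field with the mmap request. */
static int old_check_matches(const struct mock_rb *rb, int nr_pages)
{
	return rb->nr_pages == nr_pages;
}

/* New check: compares the real data page count with the mmap request. */
static int new_check_matches(const struct mock_rb *rb, int nr_pages)
{
	return mock_data_page_nr(rb) == nr_pages;
}
```

With nr_pages = 1 and an assumed page_order of 11, the old check rejects a 2048-page mmap request while the new one accepts it.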

[1] https://github.com/draios/sysdig

Fixes: 906010b2 ("perf_event: Provide vmalloc() based mmap() backing")
Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

47edbfbf

sched/pelt: Fix attach_entity_load_avg() corner case · 8109e161

Committed by kuyo chang on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 88fcfd6ee6c5a617e712b346e9c15fc3057e532e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=88fcfd6ee6c5a617e712b346e9c15fc3057e532e

--------------------------------

[ Upstream commit 40f5aa4c ]

The warning in cfs_rq_is_decayed() triggered:

    SCHED_WARN_ON(cfs_rq->avg.load_avg ||
		  cfs_rq->avg.util_avg ||
		  cfs_rq->avg.runnable_avg)

There exists a corner case in attach_entity_load_avg() which will
cause load_sum to be zero while load_avg will not be.

Consider a se_weight of 88761, as per the sched_prio_to_weight[] table.
Further assume get_pelt_divider() returns 47742, and that
se->avg.load_avg is 1.

However, calculating load_sum:

  se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se));
  se->avg.load_sum = 1*47742/88761 = 0.

Then enqueue_load_avg() adds this to the cfs_rq totals:

  cfs_rq->avg.load_avg += se->avg.load_avg;
  cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;

This results in load_avg being 1 while load_sum is 0, which will trigger
the WARN.
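
The truncation can be reproduced with plain integer arithmetic. In the sketch below, `load_sum_truncating()` follows the computation quoted above; `load_sum_guarded()` is an assumed illustration of one way to keep load_sum non-zero (the changelog itself does not spell out the fix), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Computation quoted above: load_avg * divider / weight, truncated. */
static uint64_t load_sum_truncating(uint64_t load_avg, uint64_t divider,
				    uint64_t weight)
{
	assert(weight != 0);
	return load_avg * divider / weight;
}

/*
 * Assumed guard: only divide when the quotient is at least 1,
 * otherwise clamp to 1, so the result is never 0.
 */
static uint64_t load_sum_guarded(uint64_t load_avg, uint64_t divider,
				 uint64_t weight)
{
	uint64_t sum = load_avg * divider;

	assert(weight != 0);
	if (weight < sum)
		return sum / weight;
	return 1;
}
```

For load_avg = 1, divider = 47742 and weight = 88761, the truncating version yields 0 while the guarded version yields 1, so load_avg and load_sum stay consistent.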

Fixes: f207934f ("sched/fair: Align PELT windows between cfs_rq and its se")
Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
[peterz: massage changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20220414090229.342-1-kuyo.chang@mediatek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

8109e161
