- 06 March 2023, 10 commits
-
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

The unshare process for k2task can be normalized with k2spg, since a local sp group exists for k2task.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

Rename sp_group_drop[_locked] to sp_group_put[_locked].
Rename __sp_find_spg[_locked] to sp_group_get[_locked].

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

The process for k2task can be normalized with k2spg, since a local sp group exists for k2task.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

1. Extract a function that initializes all the members of a newly allocated sp_group, simply to reduce the function size.
2. Move the idr_alloc to the end of the function, since we should not add an uninitialized sp_group to the global idr.
3. Rename the file used for the hugetlb map.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Xu Qiang
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------------

Add two helper functions, sp_add_group_master and sp_del_group_master, to manipulate master_list.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
-
Submitted by Xu Qiang
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------------

spa_overview_show, spg_info_show and spg_overview_show contain similar code. Extract the differences into a function macro.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
-
Submitted by Zhou Guanghui
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

--------------------------------------------

The sharepool statistics record the sharepool memory used by all containers in the system. Processes inside a container should not be able to query the sharepool memory requested by processes in other containers. Fix this by making the sharepool statistics unqueryable from within a container.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

If we delete a task that has not been added to any group from a specified group, a NULL pointer dereference occurs.

[  162.566615] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[  162.567699] Mem abort info:
[  162.567971]   ESR = 0x96000006
[  162.568187]   EC = 0x25: DABT (current EL), IL = 32 bits
[  162.568508]   SET = 0, FnV = 0
[  162.568670]   EA = 0, S1PTW = 0
[  162.568794] Data abort info:
[  162.568906]   ISV = 0, ISS = 0x00000006
[  162.569032]   CM = 0, WnR = 0
[  162.569314] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001029e0000
[  162.569516] [0000000000000008] pgd=00000001026da003, p4d=00000001026da003, pud=0000000102a90003, pmd=0000000000000000
[  162.570346] Internal error: Oops: 96000006 [#1] SMP
[  162.570524] CPU: 0 PID: 880 Comm: test_sp_group_d Tainted: G W O 5.10.0+ #1
[  162.570868] Hardware name: linux,dummy-virt (DT)
[  162.571053] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[  162.571370] pc : mg_sp_group_del_task+0x164/0x488
[  162.571511] lr : mg_sp_group_del_task+0x158/0x488
[  162.571644] sp : ffff8000127d3ca0
[  162.571749] x29: ffff8000127d3ca0 x28: ffff372281b8c140
[  162.571922] x27: 0000000000000000 x26: ffff372280b261c0
[  162.572090] x25: ffffd075db9a9000 x24: ffffd075db9a90f8
[  162.572259] x23: ffffd075db9a90e0 x22: 0000000000000371
[  162.572425] x21: ffff372280826b00 x20: 0000000000000000
[  162.572592] x19: ffffd075db12b000 x18: 0000000000000000
[  162.572756] x17: 0000000000000000 x16: ffffd075da51e60c
[  162.572923] x15: 0000ffffdcf1a540 x14: 0000000000000000
[  162.573087] x13: 0000000000000000 x12: 0000000000000000
[  162.573250] x11: 0000000000000040 x10: ffffd075db5f1908
[  162.573415] x9 : ffffd075db5f1900 x8 : ffff3722816f54b0
[  162.573579] x7 : 0000000000000000 x6 : 0000000000000000
[  162.573741] x5 : ffff3722816f5488 x4 : 0000000000000000
[  162.573906] x3 : ffff372280b2620c x2 : ffff37228036b4a0
[  162.574069] x1 : 0000000000000000 x0 : ffff372280b261c0
[  162.574239] Call trace:
[  162.574336]  mg_sp_group_del_task+0x164/0x488
[  162.575262]  dev_ioctl+0x10cc/0x2478 [sharepool_dev]
[  162.575443]  __arm64_sys_ioctl+0xb4/0xf0
[  162.575585]  el0_svc_common.constprop.0+0xe4/0x2d4
[  162.575726]  do_el0_svc+0x34/0xa8
[  162.575838]  el0_svc+0x1c/0x28
[  162.575941]  el0_sync_handler+0x90/0xf0
[  162.576060]  el0_sync+0x168/0x180
[  162.576391] Code: 97f4d4bf aa0003fa b4001580 f9420c01 (f8408c20)

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Chen Jun
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64Y5Y

-------------------------------

If local_group_add_task() fails in init_local_group(), ida frees the same id twice:

init_local_group
    local_group_add_task    // fails
    goto free_spg

free_spg:
    free_sp_group_locked
        free_sp_group_id    // frees spg->id
free_spg_id:
    free_new_spg_id         // frees spg->id again: double free

To fix it, return before calling free_new_spg_id.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
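A minimal sketch of the corrected error path, in C. Only the function and label names from the changelog above are real; the arguments, variables and return convention around them are assumed:

    /* Sketch only: control flow of the fix, surrounding sharepool
     * code is assumed. */
    ret = local_group_add_task(mm, spg);
    if (ret < 0)
            goto free_spg;
    return 0;

    free_spg:
            /* free_sp_group_locked() already releases spg->id via
             * free_sp_group_id(), so return from here ... */
            free_sp_group_locked(spg);
            return ret;
    free_spg_id:
            /* ... instead of falling through to this label, which
             * would free the same id a second time. */
            free_new_spg_id(new, spg_id);
            return ret;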
-
Submitted by Zhang Zekun
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------

Use CONFIG_EXTEND_HUGEPAGE_MAPPING to isolate the code introduced in commit a3425d41. Besides, use tabs instead of spaces to match the Kconfig formatting.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
- 25 February 2023, 1 commit
-
-
Submitted by Rik van Riel
mainline inclusion from mainline-v6.1-rc2
commit 12df140f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6EVPO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=12df140f0bdfae5dcfc81800970dd7f6f632e00c

--------------------------------

The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock. This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems.

Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race.

Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glen McCready <gkmccready@meta.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Zhang Peng <zhangpeng362@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
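In code, the pattern the changelog describes looks roughly like this. A sketch, not the verbatim upstream diff; the exact lock flavor differs between kernel versions:

    /* Racy: h->resv_huge_pages is a hugetlb_lock-protected counter,
     * but this alloc_huge_page() corner case updated it unlocked. */
    h->resv_huge_pages--;

    /* Fixed: take hugetlb_lock around the decrement, like every
     * other h->*_huge_pages update. */
    spin_lock_irq(&hugetlb_lock);
    h->resv_huge_pages--;
    spin_unlock_irq(&hugetlb_lock);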
-
- 17 February 2023, 6 commits
-
-
Submitted by Yang Shi
stable inclusion from stable-v5.10.148
commit 377c60dd32d3289788bdb3d8840382f79d42139b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0WL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=377c60dd32d3289788bdb3d8840382f79d42139b

--------------------------------

commit 70cbc3cc upstream.

Since general RCU GUP fast was introduced in commit 2667f50e ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases, it only handles traditional IPI-based GUP-fast correctly. On architectures that send an IPI broadcast on TLB flush, it works as expected. But on the architectures that do not use IPI to broadcast TLB flush, it may have the below race:

   CPU A                                        CPU B
THP collapse                                  fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                              gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                              pin the page
                                              check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                              return the stale page
discard the stale page

The race can be fixed by checking whether PMD is changed or not after taking the page pin in fast GUP, just like what it does for PTE. If the PMD is changed it means there may be parallel THP collapse, so GUP should back off.

Also update the stale comment about serializing against fast GUP in khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
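The back-off described above amounts to re-reading the PMD after taking the page pin in gup_pte_range(). A hedged sketch; the pin-release helper name varies across kernel versions and is an assumption here:

    /* After pinning: if a parallel THP collapse cleared or changed
     * the PMD, the PTE we worked on may be stale - drop the pin and
     * back off, just as is already done when the PTE itself changed. */
    if (unlikely(pmd_val(pmd) != pmd_val(READ_ONCE(*pmdp)))) {
            put_compound_head(head, 1, flags);  /* assumed helper */
            goto pte_unmap;
    }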
-
Submitted by Minchan Kim
stable inclusion from stable-v5.10.147
commit be2cd261ca510be5cb51b198c602c516207dd32b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=be2cd261ca510be5cb51b198c602c516207dd32b

--------------------------------

commit 58d426a7 upstream.

MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from isolate_lru_page below. Fix it by checking PageLRU in advance.

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Han Tianshuo <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
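As a sketch, the advance check in the MADV_PAGEOUT walk looks like this; the surrounding loop over PTEs is assumed:

    /* Only LRU pages can be isolated; skip anything else (such as
     * the offending THP tail pages) before isolate_lru_page() gets
     * a chance to warn about them. */
    if (!PageLRU(page) || isolate_lru_page(page))
            continue;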
-
Submitted by Alistair Popple
stable inclusion from stable-v5.10.147
commit 1002d5fef406c8dc685679aedb16938ad57d968c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1002d5fef406c8dc685679aedb16938ad57d968c

--------------------------------

commit 60bae737 upstream.

When clearing a PTE the TLB should be flushed whilst still holding the PTL to avoid a potential race with madvise/munmap/etc. For example consider the following sequence:

CPU0                            CPU1
----                            ----
migrate_vma_collect_pmd()
pte_unmap_unlock()
                                madvise(MADV_DONTNEED)
                                -> zap_pte_range()
                                pte_offset_map_lock()
                                [ PTE not present, TLB not flushed ]
                                pte_unmap_unlock()
                                [ page is still accessible via stale TLB ]
flush_tlb_range()

In this case the page may still be accessed via the stale TLB entry after madvise returns. Fix this by flushing the TLB while holding the PTL.

Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Link: https://lkml.kernel.org/r/9f801e9d8d830408f2ca27821f606e09aa856899.1662078528.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
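A sketch of the reordering in migrate_vma_collect_pmd(); the essential point is only that flush_tlb_range() moves before pte_unmap_unlock(), the surrounding variables are assumed:

    /* Flush the cleared PTEs while the PTL is still held, so a
     * concurrent zap_pte_range() can never see "PTE not present"
     * while a stale TLB entry still maps the page. */
    if (unmapped)
            flush_tlb_range(walk->vma, start, end);

    arch_leave_lazy_mmu_mode();
    pte_unmap_unlock(ptep - 1, ptl);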
-
Submitted by Maurizio Lombardi
stable inclusion from stable-v5.10.147
commit a54fc536911371bd5863afdcb2418ab948484b75
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a54fc536911371bd5863afdcb2418ab948484b75

--------------------------------

commit dac22531 upstream.

A number of drivers call page_frag_alloc() with a fragment's size > PAGE_SIZE.

In low memory conditions, __page_frag_cache_refill() may fail the order 3 cache allocation and fall back to order 0; In this case, the cache will be smaller than the fragment, causing memory corruptions.

Prevent this from happening by checking if the newly allocated cache is large enough for the fragment; if not, the allocation will fail and page_frag_alloc() will return NULL.

Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com
Fixes: b63ae8ca ("mm/net: Rename and move page fragment handling from net/ to mm/")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Cc: Chen Lin <chen45464546@163.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
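A hedged sketch of the check inside page_frag_alloc(); variable names follow the usual page_frag code, and the exact placement in the real function may differ:

    offset = size - fragsz;
    if (unlikely(offset < 0)) {
            /* The refill fell back to an order-0 page that is
             * smaller than the fragment: fail the allocation
             * instead of handing out an offset past the end of
             * the cache and corrupting adjacent memory. */
            return NULL;
    }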
-
Submitted by Mel Gorman
stable inclusion from stable-v5.10.147
commit 466a26af2d105666a427e4892b2be67cfcc55f4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=466a26af2d105666a427e4892b2be67cfcc55f4f

--------------------------------

commit 3d36424b upstream.

Patrick Daly reported the following problem;

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - before offline operation
[0] - ZONE_MOVABLE
[1] - ZONE_NORMAL
[2] - NULL

For a GFP_KERNEL allocation, alloc_pages_slowpath() will save the offset of ZONE_NORMAL in ac->preferred_zoneref. If a concurrent memory_offline operation removes the last page from ZONE_MOVABLE, build_all_zonelists() & build_zonerefs_node() will update node_zonelists as shown below. Only populated zones are added.

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - after offline operation
[0] - ZONE_NORMAL
[1] - NULL
[2] - NULL

The race is simple -- page allocation could be in progress when a memory hot-remove operation triggers a zonelist rebuild that removes zones. The allocation request will still have a valid ac->preferred_zoneref that is now pointing to NULL and triggers an OOM kill.

This problem probably always existed but may be slightly easier to trigger due to 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") which distinguishes between zones that are completely unpopulated versus zones that have valid pages not managed by the buddy allocator (e.g. reserved, memblock, ballooning etc).

Memory hotplug had multiple stages with timing considerations around managed/present page updates, the zonelist rebuild and the zone span updates. As David Hildenbrand puts it, memory offlining adjusts managed+present pages of the zone essentially in one go. If after the adjustments, the zone is no longer populated (present==0), we rebuild the zone lists. Once that's done, we try shrinking the zone (start+spanned pages) -- which results in zone_start_pfn == 0 if there are no more pages. That happens *after* rebuilding the zonelists via remove_pfn_range_from_zone().

The only requirement to fix the race is that a page allocation request identifies when a zonelist rebuild has happened since the allocation request started and no page has yet been allocated. Use a seqlock_t to track zonelist updates, with a lockless read-side of the zonelist and protecting the rebuild and update of the counter with a spinlock.

[akpm@linux-foundation.org: make zonelist_update_seq static]
Link: https://lkml.kernel.org/r/20220824110900.vh674ltxmzb3proq@techsingularity.net
Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Patrick Daly <quic_pdaly@quicinc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
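A sketch of the seqlock pattern the changelog describes; the hook points in the real allocation slow path are more involved, and the cookie variable name is an assumption:

    static seqlock_t zonelist_update_seq;  /* write side taken by the rebuild */

    /* Allocation slow path, lockless read side: */
    unsigned int cookie = read_seqbegin(&zonelist_update_seq);

    page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
    if (!page && read_seqretry(&zonelist_update_seq, cookie))
            goto restart;   /* zonelists were rebuilt under us: recompute
                             * ac->preferred_zoneref instead of OOMing */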
-
Submitted by Chao Yu
stable inclusion from stable-v5.10.146
commit 379ac7905ff3f0a6a4e507d3e9f710ec4fab9124
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=379ac7905ff3f0a6a4e507d3e9f710ec4fab9124

--------------------------------

commit 7e9c323c upstream.

In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to out-of-memory, if it fails, return errno correctly rather than triggering panic via BUG_ON();

kernel BUG at mm/slub.c:5893!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

Call trace:
 sysfs_slab_add+0x258/0x260 mm/slub.c:5973
 __kmem_cache_create+0x60/0x118 mm/slub.c:4899
 create_cache mm/slab_common.c:229 [inline]
 kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335
 kmem_cache_create+0x1c/0x28 mm/slab_common.c:390
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline]
 f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808
 f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149
 mount_bdev+0x1b8/0x210 fs/super.c:1400
 f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512
 legacy_get_tree+0x30/0x74 fs/fs_context.c:610
 vfs_get_tree+0x40/0x140 fs/super.c:1530
 do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040
 path_mount+0x358/0x914 fs/namespace.c:3370
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount fs/namespace.c:3568 [inline]
 __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568

Cc: <stable@kernel.org>
Fixes: 81819f0f ("SLUB core")
Reported-by: syzbot+81684812ea68216e08c5@syzkaller.appspotmail.com
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
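The shape of the fix in create_unique_id() (mm/slub.c), as a sketch; the error is then propagated by the sysfs_slab_add() caller:

    char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);

    if (!name)
            return ERR_PTR(-ENOMEM);        /* was: BUG_ON(!name); */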
-
- 13 February 2023, 2 commits
-
-
Submitted by Jann Horn
stable inclusion from stable-v5.10.144
commit 891f03f688de8418f44b32b88f6b4faed5b2aa81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0V7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=891f03f688de8418f44b32b88f6b4faed5b2aa81

--------------------------------

This is a stable-specific patch.

I botched the stable-specific rewrite of commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"): As Hugh pointed out, unmap_region() actually operates on a list of VMAs, and the variable "vma" merely points to the first VMA in that list. So if we want to check whether any of the VMAs we're operating on is PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Yee Lee
stable inclusion from stable-v5.10.143
commit a14f1799ce373b435a2a6253747dedc23ea35abc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a14f1799ce373b435a2a6253747dedc23ea35abc

--------------------------------

This reverts commit 23c2d497.

Commit 23c2d497 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()") brought false leak alarms on some archs like arm64 that do not init the pfn boundary in early booting. The final solution lands on linux-6.0: commit 0c24e061 ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA").

Revert this commit before linux-6.0. The original issue of invalid PA can be mitigated by an additional check in devicetree.

The false alarm report is as follows:

Kmemleak output: (Qemu/arm64)
unreferenced object 0xffff0000c0170a00 (size 128):
  comm "swapper/0", pid 1, jiffies 4294892404 (age 126.208s)
  hex dump (first 32 bytes):
    62 61 73 65 00 00 00 00 00 00 00 00 00 00 00 00  base............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<(____ptrval____)>] __kmalloc_track_caller+0x1b0/0x2e4
    [<(____ptrval____)>] kstrdup_const+0x8c/0xc4
    [<(____ptrval____)>] kvasprintf_const+0xbc/0xec
    [<(____ptrval____)>] kobject_set_name_vargs+0x58/0xe4
    [<(____ptrval____)>] kobject_add+0x84/0x100
    [<(____ptrval____)>] __of_attach_node_sysfs+0x78/0xec
    [<(____ptrval____)>] of_core_init+0x68/0x104
    [<(____ptrval____)>] driver_init+0x28/0x48
    [<(____ptrval____)>] do_basic_setup+0x14/0x28
    [<(____ptrval____)>] kernel_init_freeable+0x110/0x178
    [<(____ptrval____)>] kernel_init+0x20/0x1a0
    [<(____ptrval____)>] ret_from_fork+0x10/0x20

This patch is also applicable to linux-5.17.y/linux-5.18.y/linux-5.19.y

Cc: <stable@vger.kernel.org>
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 09 February 2023, 3 commits
-
-
Submitted by Steven Price
stable inclusion from stable-v5.10.142
commit 47a73e5e6ba42e42db2d1400478b2e132e25ceb4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6CSFH
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47a73e5e6ba42e42db2d1400478b2e132e25ceb4

--------------------------------

[ Upstream commit 8782fb61 ]

The mmap lock protects the page walker from changes to the page tables during the walk. However a read lock is insufficient to protect those areas which don't have a VMA as munmap() detaches the VMAs before downgrading to a read lock and actually tearing down PTEs/page tables.

For users of walk_page_range() the solution is to simply call pte_hole() immediately without checking the actual page tables when a VMA is not present. We now never call __walk_page_range() without a valid vma.

For walk_page_range_novma() the locking requirements are tightened to require the mmap write lock to be taken, and then walking the pgd directly with 'no_vma' set.

This in turn means that all page walkers either have a valid vma, or it's that special 'novma' case for page table debugging. As a result, all the odd '(!walk->vma && !walk->no_vma)' tests can be removed.

Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF
CVE: NA

--------------------------------

syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock, because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock.

The problem mentioned above has been fixed by patch [2], but the same problem exists in the memcg_memfs_info feature too. Following patch [2], fix it by removing the allocation from mem_cgroup_print_memfs_info() completely, and passing a static buffer when calling from the memcg OOM path.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp [2]
Fixes: 6b1d4d3a ("mm/memcg_memfs_info: show files that having pages charged in mem_cgroup")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit d2218535)
-
Submitted by Tetsuo Handa
mainline inclusion from mainline-v6.0-rc1
commit 68aaee14
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68aaee147e597b495622b7c9038e5922c7c61f57

--------------------------------

syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock.

Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path.

Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp
Fixes: c8713d0b ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/memcontrol.c

Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit ee2d7b76)
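The pattern shared by this fix and the memcg_memfs_info one above, sketched; signatures vary across kernel versions, and it is oom_lock's serialization that makes the static buffer safe:

    void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
    {
            /* Called with oom_lock held: a single static buffer can
             * replace a GFP_KERNEL allocation, which could itself
             * need the global OOM killer and livelock the system. */
            static char buf[PAGE_SIZE];

            lockdep_assert_held(&oom_lock);
            memory_stat_format(memcg, buf, sizeof(buf));
            pr_info("%s", buf);
    }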
-
- 31 January 2023, 2 commits
-
-
Submitted by Alistair Popple
mainline inclusion from mainline-v6.1-rc7
commit 4a955bed
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6BG56
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a955bed882e734807024afd8f53213d4c61ff97

--------------------------------

The migrate_to_ram() callback should always succeed, but in rare cases can fail usually returning VM_FAULT_SIGBUS. Commit 16ce101d ("mm/memory.c: fix race when faulting a device private page") incorrectly stopped passing the return code up the stack. Fix this by setting the ret variable, restoring the previous behaviour on migrate_to_ram() failure.

Link: https://lkml.kernel.org/r/20221114115537.727371-1-apopple@nvidia.com
Fixes: 16ce101d ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 1254c416)
-
Submitted by Yuanzheng Song
mainline inclusion from mainline-v5.16-rc1
commit 7e6ec49c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AW65
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e6ec49c18988f1b8dab0677271dafde5f8d9a43

--------------------------------

When reading memcg->socket_pressure in mem_cgroup_under_socket_pressure() and writing memcg->socket_pressure in vmpressure() at the same time, the following data-race occurs:

BUG: KCSAN: data-race in __sk_mem_reduce_allocated / vmpressure

write to 0xffff8881286f4938 of 8 bytes by task 24550 on cpu 3:
 vmpressure+0x218/0x230 mm/vmpressure.c:307
 shrink_node_memcgs+0x2b9/0x410 mm/vmscan.c:2658
 shrink_node+0x9d2/0x11d0 mm/vmscan.c:2769
 shrink_zones+0x29f/0x470 mm/vmscan.c:2972
 do_try_to_free_pages+0x193/0x6e0 mm/vmscan.c:3027
 try_to_free_mem_cgroup_pages+0x1c0/0x3f0 mm/vmscan.c:3345
 reclaim_high mm/memcontrol.c:2440 [inline]
 mem_cgroup_handle_over_high+0x18b/0x4d0 mm/memcontrol.c:2624
 tracehook_notify_resume include/linux/tracehook.h:197 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
 exit_to_user_mode_prepare+0x110/0x170 kernel/entry/common.c:191
 syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266
 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:289

read to 0xffff8881286f4938 of 8 bytes by interrupt on cpu 1:
 mem_cgroup_under_socket_pressure include/linux/memcontrol.h:1483 [inline]
 sk_under_memory_pressure include/net/sock.h:1314 [inline]
 __sk_mem_reduce_allocated+0x1d2/0x270 net/core/sock.c:2696
 __sk_mem_reclaim+0x44/0x50 net/core/sock.c:2711
 sk_mem_reclaim include/net/sock.h:1490 [inline]
 ......
 net_rx_action+0x17a/0x480 net/core/dev.c:6864
 __do_softirq+0x12c/0x2af kernel/softirq.c:298
 run_ksoftirqd+0x13/0x20 kernel/softirq.c:653
 smpboot_thread_fn+0x33f/0x510 kernel/smpboot.c:165
 kthread+0x1fc/0x220 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Fix it by using READ_ONCE() and WRITE_ONCE() to read and write memcg->socket_pressure.

Link: https://lkml.kernel.org/r/20211025082843.671690-1-songyuanzheng@huawei.com
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit bbc0f30d)
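The fix itself is the standard annotation pair the changelog names; a sketch of the two sides:

    /* vmpressure() - write side: */
    WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);

    /* mem_cgroup_under_socket_pressure() - read side: */
    if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
            return true;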
-
- 19 January 2023, 2 commits
-
-
Submitted by Guo Mengqi
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I69JDF
CVE: NA

-------------------------------

Delete the svm driver, as it is specially designed for the Hisilicon platform.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: chenweilong <chenweilong@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 056e261b)
-
Submitted by Qi Zheng
mainline inclusion from mainline-v6.1-rc7
commit ea4452de
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I69VVC
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea4452de2ae987342fadbdd2c044034e6480daad

--------------------------------

When we specify __GFP_NOWARN, we only expect that no warnings will be issued for the current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it.

[akpm@linux-foundation.org: unexport should_fail_ex()]
Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com
Fixes: 3f913fc5 ("mm: fix missing handler for __GFP_NOWARN")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ye Weihua <yeweihua4@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 49457cf7)
-
- 16 January 2023, 1 commit
-
-
Submitted by Tong Tiangen
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AQE8
CVE: NA

-------------------------------

The function memcpy_mcs() is designed to copy memory, and should be supported by KASAN.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
-
- 04 January 2023, 4 commits
-
-
Submitted by Kefeng Wang
mainline inclusion from mainline-v6.2-rc1
commit de2e5171
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I67BWQ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de2e5171433126d340573cb7d0d4fcac084ab2a0

--------------------------------

When handling MADV_WILLNEED in madvise(), a softlockup may occur in swapin_walk_pmd_entry() if swapping in lots of memory on a slow device. Add a cond_resched() to avoid the possible softlockup.

Link: https://lkml.kernel.org/r/20221205140327.72304-1-wangkefeng.wang@huawei.com
Fixes: 1998cc04 ("mm: make madvise(MADV_WILLNEED) support swap file prefetch")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/madvise.c

Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
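The fix is a single rescheduling point in the swap-in walk; a sketch of the loop tail in swapin_walk_pmd_entry(), with the surrounding walk context assumed:

    page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
                                 vma, index, false);
    if (page)
            put_page(page);
    /* A large MADV_WILLNEED range on a slow swap device can spin
     * in this loop for a long time; give the scheduler a chance. */
    cond_resched();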
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA
CVE: NA

--------------------------------

Since free_pages_prepare() clears PagePool without the lock in free_page_to_dhugetlb_pool() and free_page_list_to_dhugetlb_pool(), it is unreliable to check whether a page is freed via PagePool in hpool_merge_page(). Move free_pages_prepare() after ClearPagePool(), which guarantees that all allocated pages have the PagePool flag.

Fixes: 71197c63 ("mm/dynamic_hugetlb: free pages to dhugetlb_pool")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA
CVE: NA

--------------------------------

The percpu pool is cleared by clear_percpu_pools(), which then checks whether all pages are already freed. If some pages are not freed, we first isolate the freed pages and then migrate the used pages. Since we missed taking the percpu_pool lock, used pages can be freed to the percpu_pool while the isolation is in progress. In that case, the list operations become unreliable. To fix this problem, take all related locks sequentially and clear the percpu_pool again before isolating the freed pages.

Fixes: cdbeee51 ("mm/dynamic_hugetlb: add migration function")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chen Wandun
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI

-------------------------------

Commit 46d673d7 ("mm/swapfile: fix broken kabi in swap_info_struct") uses KABI_EXTEND to fix the KABI breakage, but this is not the safest way, so fix the problem in a new way. Introduce a new struct swap_extend_info that contains the extended swap info, and use KABI_USE to take over the memory space reserved by KABI_RESERVE.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: zhangjialin <zhangjialin11@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 08 December 2022, 6 commits
-
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I641XX
CVE: NA

--------------------------------

Patch 1378a5ee ("mm: store compound_nr as well as compound_order") adds a new member compound_nr to struct page, and uses this new member instead of compound_order in hugetlb_cgroup_move_parent() to compute nr_pages. In free_hugepage_to_hugetlb(), we reset page->mapping to NULL for each subpage. Since page->mapping and page->compound_nr are in a union, we unexpectedly reset page->compound_nr too. This ultimately makes nr_pages incorrect in hugetlb_cgroup_move_parent(), so the hugetlb_cgroup can't be released. Fix this problem by restoring page->compound_nr using set_compound_order().

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 2efa33fc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2efa33fc7f6ec94a3a538c1a264273c889be2b36

--------------------------------

When I was investigating the swap code, I found the below possible race window:

CPU 1                                   CPU 2
-----                                   -----
shmem_swapin
  swap_cluster_readahead
    if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) {
                                        swapoff
                                          ..
                                          si->swap_file = NULL;
                                          ..
    struct inode *inode = si->swap_file->f_mapping->host;
    [oops!]

Close this race window by using get/put_swap_device() to guard against concurrent swapoff.

Link: https://lkml.kernel.org/r/20210426123316.806267-5-linmiaohe@huawei.com
Fixes: 8fd2e0b5 ("mm: swap: check if swap backing device is congested or not")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 2799e775
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2799e77529c2a25492a4395db93996e3dacd762d

--------------------------------

When I was investigating the swap code, I found the below possible race window:

CPU 1                                           CPU 2
-----                                           -----
do_swap_page
  if (data_race(si->flags & SWP_SYNCHRONOUS_IO)
  swap_readpage
    if (data_race(sis->flags & SWP_FS_OPS)) {
                                                swapoff
                                                  ..
                                                  p->swap_file = NULL;
                                                  ..
    struct file *swap_file = sis->swap_file;
    struct address_space *mapping = swap_file->f_mapping;
    [oops!]

Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked.

Fix this race by using get/put_swap_device() to guard against concurrent swapoff.

Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com
Fixes: 0bcac06f ("mm,swap: skip swapcache for swapin of synchronous device")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/memory.c

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
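Both races above are closed with the same guard; a sketch of the pattern, with error handling elided:

    struct swap_info_struct *si;

    /* Take a reference that blocks a concurrent swapoff() from
     * tearing down si->swap_file and friends underneath us. */
    si = get_swap_device(entry);
    if (!si)
            goto out;       /* raced with swapoff: treat the entry as stale */

    /* ... safe to inspect si->flags, si->swap_file here ... */

    put_swap_device(si);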
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 63d8620e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63d8620ecf93b5d8d0a254471184d08f8e8f538d

--------------------------------

Patch series "close various race windows for swap", v6.

When I was investigating the swap code, I found some possible race windows. This series aims to fix all these races. But using current get/put_swap_device() to guard against concurrent swapoff for swap_readpage() looks terrible because swap_readpage() may take really long time. And to reduce the performance overhead on the hot-path as much as possible, it appears we can use the percpu_ref to close this race window (as suggested by Huang, Ying).

The patch 1 adds percpu_ref support for swap and most of the remaining patches try to use this to close various race windows. More details can be found in the respective changelogs.

This patch (of 4):

Using current get/put_swap_device() to guard against concurrent swapoff for some swap ops, e.g. swap_readpage(), looks terrible because they might take really long time. This patch adds the percpu_ref support to serialize against concurrent swapoff (as suggested by Huang, Ying). Also we remove the SWP_VALID flag because it's used together with RCU solution.

Link: https://lkml.kernel.org/r/20210426123316.806267-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210426123316.806267-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chen Wandun
mainline inclusion from mainline-v6.1-rc7
commit de1ccfb6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de1ccfb648243a031cfbdc2d5571dfdaf5023106

--------------------------------

A softlockup occurs in scan free swap slot under huge memory pressure. The test scenario is: 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB.

LATENCY_LIMIT is used to prevent softlockups in scan_swap_map_slots(), but the real loop number would be more than LATENCY_LIMIT because of "goto checks" and "goto scan" repeating without decreasing the latency limit. In order to fix it, decrease latency_ration in advance.

There is also a suspicious place that will cause softlockups in get_swap_pages(). In this function, the "goto start_over" may result in continuous scanning of the swap partition. If there is no cond_resched in scan_swap_map_slots(), it would cause a softlockup (I am not sure about this).

WARN: soft lockup - CPU#11 stuck for 11s! [kswapd0:466]
CPU: 11 PID: 466 Comm: kswapd@ Kdump: loaded Tainted: G
dump backtrace+0x0/0x1le4
show stack+0x20/@x2c
dump_stack+0xd8/0x140
watchdog print_info+0x48/0x54
watchdog_process_before_softlockup+0x98/0xa0
watchdog_timer_fn+0xlac/0x2d0
hrtimer_rum_queues+0xb0/0x130
hrtimer_interrupt+0x13c/0x3c0
arch_timer_handler_virt+0x3c/0x50
handLe_percpu_devid_irq+0x90/0x1f4
handle domain irq+0x84/0x100
gic_handle_irq+0x88/0x2b0
e11 ira+0xhB/Bx140
scan_swap_map_slots+0x678/0x890
get_swap_pages+0x29c/0x440
get_swap_page+0x120/0x2e0
add_to_swap+UX2U/0XyC
shrink_page_list+0x5d0/0x152c
shrink_inactive_list+0xl6c/Bx500
shrink_lruvec+0x270/0x304

WARN: soft lockup - CPU#32 stuck for 11s! [stress-ng:309915]
watchdog_timer_fn+0x1ac/0x2d0
__run_hrtimer+0x98/0x2a0
__hrtimer_run_queues+0xb0/0x130
hrtimer_interrupt+0x13c/0x3c0
arch_timer_handler_virt+0x3c/0x50
handle_percpu_devid_irq+0x90/0x1f4
__handle_domain_irq+0x84/0x100
gic_handle_irq+0x88/0x2b0
el1_irq+0xb8/0x140
get_swap_pages+0x1e8/0x440
get_swap_page+0x1c8/0x2e0
add_to_swap+0x20/0x9c
shrink_page_list+0x5d0/0x152c
reclaim_pages+0x160/0x310
madvise_cold_or_pageout_pte_range+0x7bc/0xe3c
walk_pmd_range.isra.0+0xac/0x22c
walk_pud_range+0xfc/0x1c0
walk_pgd_range+0x158/0x1b0
__walk_page_range+0x64/0x100
walk_page_range+0x104/0x150

Link: https://lkml.kernel.org/r/20221118133850.3360369-1-chenwandun@huawei.com
Fixes: 048c27fd ("[PATCH] swap: scan_swap_map latency breaks")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <xialonglong1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/swapfile.c

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
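For reference, the latency bound in scan_swap_map_slots() is this existing pattern; the fix described above applies it (decrementing in advance) on the "goto checks"/"goto scan" re-entry paths too, so the loop count is genuinely bounded:

    if (unlikely(--latency_ration < 0)) {
            cond_resched();
            latency_ration = LATENCY_LIMIT;
    }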
-
Submitted by Alistair Popple
mainline inclusion from mainline-v5.17-rc1
commit ffa65753
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I614X1
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffa65753c43142f3b803486442813744da71cff2

--------------------------------

This fixes the FIXME in migrate_vma_check_page().

Before migrating a page migration code will take a reference and check there are no unexpected page references, failing the migration if there are. When a thread faults on a migration entry it will take a temporary reference to the page to wait for the page to become unlocked signifying the migration entry has been removed.

This reference is dropped just prior to waiting on the page lock, however the extra reference can cause migration failures so it is desirable to avoid taking it.

As migration code already has a reference to the migrating page an extra reference to wait on PG_locked is unnecessary so long as the reference can't be dropped whilst setting up the wait.

When faulting on a migration entry the ptl is taken to check the migration entry. Removing a migration entry also requires the ptl, and migration code won't drop its page reference until after the migration entry has been removed. Therefore retaining the ptl of a migration entry is sufficient to ensure the page has a reference. Reworking migration_entry_wait() to hold the ptl until the wait setup is complete means the extra page reference is no longer needed.

[apopple@nvidia.com: v5]
Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com
Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	include/linux/migrate.h.rej
	mm/filemap.c.rej
	mm/migrate.c.rej

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 02 December 2022, 2 commits
-
-
Submitted by David Hildenbrand

stable inclusion from stable-v5.10.140
commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef

--------------------------------

commit f96f7a40 upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two cases where we could currently get such faults and would erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees. The first fix needs a small adjustment.

This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

static void clear_softdirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	const char *ctrl = "4";
	int ret;

	if (fd < 0) {
		fprintf(stderr, "open(clear_refs) failed\n");
		exit(1);
	}
	ret = write(fd, ctrl, strlen(ctrl));
	if (ret != strlen(ctrl)) {
		fprintf(stderr, "write(clear_refs) failed\n");
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	char *map;
	int fd;

	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
	if (!fd) {
		fprintf(stderr, "open() failed\n");
		return -errno;
	}
	if (ftruncate(fd, HUGETLB_SIZE)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	*map = 0;

	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
		fprintf(stderr, "mmprotect() failed\n");
		return -errno;
	}

	clear_softdirty();

	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
		fprintf(stderr, "mmprotect() failed\n");
		return -errno;
	}

	*map = 0;

	return 0;
}
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.

# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics:

# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total:       2
HugePages_Free:        1
HugePages_Rsvd:    18446744073709551615
HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking.

Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
-
Submitted by Miaohe Lin
stable inclusion from stable-v5.10.140
commit c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f

--------------------------------

[ Upstream commit a44f89dc ]

It's more recommended to use the helper function migration_entry_to_page() to get the page via a migration entry. We can also enjoy the PageLocked() check there.

Link: https://lkml.kernel.org/r/20210318122722.13135-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Thomas Hellström (Intel) <thomas_os@shipmail.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: yuleixzhang <yulei.kernel@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
-
- 01 December 2022, 1 commit
-
-
Submitted by Zhou Guanghui
mainline inclusion from mainline
commit c9bb62634fa772e80617cbe0cd09797e46c5df38
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63SDZ
CVE: NA

-------------------------------

In a system (Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs, and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are not reported, or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks.

For example, if the EFI_UNUSABLE_MEMORY type is reported:

...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As the result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost.

Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGIONS in defining the size of the static memblock.memory array.

Allow overriding the memblock.memory array size with an architecture-defined INIT_MEMBLOCK_MEMORY_REGIONS, and make arm64 set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.

Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Darren Hart <darren@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Xu Qiang <xuqiang36@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
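Sketched from the changelog, the mm/memblock.c side of the change; the exact location of the arm64 override is taken from the changelog only, not the diff:

    /* mm/memblock.c - let an architecture size the static memory
     * array independently of the reserved array: */
    #ifndef INIT_MEMBLOCK_MEMORY_REGIONS
    #define INIT_MEMBLOCK_MEMORY_REGIONS	INIT_MEMBLOCK_REGIONS
    #endif

    static struct memblock_region
    memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;

    /* arm64, when CONFIG_EFI is enabled, per the changelog: */
    #define INIT_MEMBLOCK_MEMORY_REGIONS	1024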
-