- 06 March 2023, 10 commits
-
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

The unshare process for k2task can be normalized with k2spg, since a local sp group exists for k2task.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

Rename sp_group_drop[_locked] to sp_group_put[_locked].
Rename __sp_find_spg[_locked] to sp_group_get[_locked].

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

The process for k2task can be normalized with k2spg, since a local sp group exists for k2task.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

1. Extract a function that initializes all the members of a newly allocated sp_group, simply to reduce the function size.
2. Move the idr_alloc to the end of the function, since we should not add an uninitialized sp_group to the global idr.
3. Rename the file used for the hugetlb map.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Xu Qiang
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------------

Add two helper functions, sp_add_group_master and sp_del_group_master, to manipulate master_list.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
-
Submitted by Xu Qiang
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------------

spa_overview_show, spg_info_show and spg_overview_show contain similar code. Extract the differences into a function macro.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
-
Submitted by Zhou Guanghui
ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

--------------------------------------------

The sharepool statistics record the sharepool memory used by all containers in the system. Processes inside a container should not be able to query the sharepool memory requested by processes in other containers. Fix this by making the sharepool statistics unqueryable from within a container.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Wang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I650K6

--------------------------------

If we delete a task that has not been added to any group from a specified group, a NULL pointer dereference occurs.

[  162.566615] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[  162.567699] Mem abort info:
[  162.567971]   ESR = 0x96000006
[  162.568187]   EC = 0x25: DABT (current EL), IL = 32 bits
[  162.568508]   SET = 0, FnV = 0
[  162.568670]   EA = 0, S1PTW = 0
[  162.568794] Data abort info:
[  162.568906]   ISV = 0, ISS = 0x00000006
[  162.569032]   CM = 0, WnR = 0
[  162.569314] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001029e0000
[  162.569516] [0000000000000008] pgd=00000001026da003, p4d=00000001026da003, pud=0000000102a90003, pmd=0000000000000000
[  162.570346] Internal error: Oops: 96000006 [#1] SMP
[  162.570524] CPU: 0 PID: 880 Comm: test_sp_group_d Tainted: G W O 5.10.0+ #1
[  162.570868] Hardware name: linux,dummy-virt (DT)
[  162.571053] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[  162.571370] pc : mg_sp_group_del_task+0x164/0x488
[  162.571511] lr : mg_sp_group_del_task+0x158/0x488
[  162.571644] sp : ffff8000127d3ca0
[  162.571749] x29: ffff8000127d3ca0 x28: ffff372281b8c140
[  162.571922] x27: 0000000000000000 x26: ffff372280b261c0
[  162.572090] x25: ffffd075db9a9000 x24: ffffd075db9a90f8
[  162.572259] x23: ffffd075db9a90e0 x22: 0000000000000371
[  162.572425] x21: ffff372280826b00 x20: 0000000000000000
[  162.572592] x19: ffffd075db12b000 x18: 0000000000000000
[  162.572756] x17: 0000000000000000 x16: ffffd075da51e60c
[  162.572923] x15: 0000ffffdcf1a540 x14: 0000000000000000
[  162.573087] x13: 0000000000000000 x12: 0000000000000000
[  162.573250] x11: 0000000000000040 x10: ffffd075db5f1908
[  162.573415] x9 : ffffd075db5f1900 x8 : ffff3722816f54b0
[  162.573579] x7 : 0000000000000000 x6 : 0000000000000000
[  162.573741] x5 : ffff3722816f5488 x4 : 0000000000000000
[  162.573906] x3 : ffff372280b2620c x2 : ffff37228036b4a0
[  162.574069] x1 : 0000000000000000 x0 : ffff372280b261c0
[  162.574239] Call trace:
[  162.574336]  mg_sp_group_del_task+0x164/0x488
[  162.575262]  dev_ioctl+0x10cc/0x2478 [sharepool_dev]
[  162.575443]  __arm64_sys_ioctl+0xb4/0xf0
[  162.575585]  el0_svc_common.constprop.0+0xe4/0x2d4
[  162.575726]  do_el0_svc+0x34/0xa8
[  162.575838]  el0_svc+0x1c/0x28
[  162.575941]  el0_sync_handler+0x90/0xf0
[  162.576060]  el0_sync+0x168/0x180
[  162.576391] Code: 97f4d4bf aa0003fa b4001580 f9420c01 (f8408c20)

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Chen Jun
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64Y5Y

-------------------------------

If local_group_add_task() fails in init_local_group(), ida frees the same id twice:

init_local_group
    local_group_add_task    // fails
    goto free_spg

free_spg:
    free_sp_group_locked
        free_sp_group_id    // frees spg->id
free_spg_id:
    free_new_spg_id         // frees spg->id again: double free

To fix it, return before calling free_new_spg_id.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
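A minimal sketch of the corrected error path, in C. Only the function and label names from the changelog above are real; the arguments, variables and return convention around them are assumed:

    /* Sketch only: control flow of the fix, surrounding sharepool
     * code is assumed. */
    ret = local_group_add_task(mm, spg);
    if (ret < 0)
            goto free_spg;
    return 0;

    free_spg:
            /* free_sp_group_locked() already releases spg->id via
             * free_sp_group_id(), so return from here ... */
            free_sp_group_locked(spg);
            return ret;
    free_spg_id:
            /* ... instead of falling through to this label, which
             * would free the same id a second time. */
            free_new_spg_id(new, spg_id);
            return ret;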
-
Submitted by Zhang Zekun
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HRGK

----------------------------------------

Use CONFIG_EXTEND_HUGEPAGE_MAPPING to isolate the code introduced in commit a3425d41. Besides, use tabs instead of spaces to match the Kconfig formatting.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
- 25 February 2023, 1 commit
-
-
Submitted by Rik van Riel
mainline inclusion from mainline-v6.1-rc2
commit 12df140f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6EVPO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=12df140f0bdfae5dcfc81800970dd7f6f632e00c

--------------------------------

The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock. This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems.

Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race.

Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glen McCready <gkmccready@meta.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Zhang Peng <zhangpeng362@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
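In code, the pattern the changelog describes looks roughly like this. A sketch, not the verbatim upstream diff; the exact lock flavor differs between kernel versions:

    /* Racy: h->resv_huge_pages is a hugetlb_lock-protected counter,
     * but this alloc_huge_page() corner case updated it unlocked. */
    h->resv_huge_pages--;

    /* Fixed: take hugetlb_lock around the decrement, like every
     * other h->*_huge_pages update. */
    spin_lock_irq(&hugetlb_lock);
    h->resv_huge_pages--;
    spin_unlock_irq(&hugetlb_lock);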
-
- 17 February 2023, 6 commits
-
-
Submitted by Yang Shi
stable inclusion from stable-v5.10.148
commit 377c60dd32d3289788bdb3d8840382f79d42139b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0WL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=377c60dd32d3289788bdb3d8840382f79d42139b

--------------------------------

commit 70cbc3cc upstream.

Since general RCU GUP fast was introduced in commit 2667f50e ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases, it only handles traditional IPI-based GUP-fast correctly. On architectures that send an IPI broadcast on TLB flush, it works as expected. But on the architectures that do not use IPI to broadcast TLB flush, it may have the below race:

   CPU A                                        CPU B
THP collapse                                  fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                              gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                              pin the page
                                              check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                              return the stale page
discard the stale page

The race can be fixed by checking whether PMD is changed or not after taking the page pin in fast GUP, just like what it does for PTE. If the PMD is changed it means there may be parallel THP collapse, so GUP should back off.

Also update the stale comment about serializing against fast GUP in khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
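The back-off described above amounts to re-reading the PMD after taking the page pin in gup_pte_range(). A hedged sketch; the pin-release helper name varies across kernel versions and is an assumption here:

    /* After pinning: if a parallel THP collapse cleared or changed
     * the PMD, the PTE we worked on may be stale - drop the pin and
     * back off, just as is already done when the PTE itself changed. */
    if (unlikely(pmd_val(pmd) != pmd_val(READ_ONCE(*pmdp)))) {
            put_compound_head(head, 1, flags);  /* assumed helper */
            goto pte_unmap;
    }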
-
Submitted by Minchan Kim
stable inclusion from stable-v5.10.147
commit be2cd261ca510be5cb51b198c602c516207dd32b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=be2cd261ca510be5cb51b198c602c516207dd32b

--------------------------------

commit 58d426a7 upstream.

MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from isolate_lru_page below. Fix it by checking PageLRU in advance.

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
Modules linked in:
CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:isolate_lru_page+0x130/0x140

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Han Tianshuo <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
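As a sketch, the advance check in the MADV_PAGEOUT walk looks like this; the surrounding loop over PTEs is assumed:

    /* Only LRU pages can be isolated; skip anything else (such as
     * the offending THP tail pages) before isolate_lru_page() gets
     * a chance to warn about them. */
    if (!PageLRU(page) || isolate_lru_page(page))
            continue;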
-
Submitted by Alistair Popple
stable inclusion from stable-v5.10.147
commit 1002d5fef406c8dc685679aedb16938ad57d968c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1002d5fef406c8dc685679aedb16938ad57d968c

--------------------------------

commit 60bae737 upstream.

When clearing a PTE the TLB should be flushed whilst still holding the PTL to avoid a potential race with madvise/munmap/etc. For example consider the following sequence:

CPU0                            CPU1
----                            ----
migrate_vma_collect_pmd()
pte_unmap_unlock()
                                madvise(MADV_DONTNEED)
                                -> zap_pte_range()
                                pte_offset_map_lock()
                                [ PTE not present, TLB not flushed ]
                                pte_unmap_unlock()
                                [ page is still accessible via stale TLB ]
flush_tlb_range()

In this case the page may still be accessed via the stale TLB entry after madvise returns. Fix this by flushing the TLB while holding the PTL.

Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Link: https://lkml.kernel.org/r/9f801e9d8d830408f2ca27821f606e09aa856899.1662078528.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
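A sketch of the reordering in migrate_vma_collect_pmd(); the essential point is only that flush_tlb_range() moves before pte_unmap_unlock(), the surrounding variables are assumed:

    /* Flush the cleared PTEs while the PTL is still held, so a
     * concurrent zap_pte_range() can never see "PTE not present"
     * while a stale TLB entry still maps the page. */
    if (unmapped)
            flush_tlb_range(walk->vma, start, end);

    arch_leave_lazy_mmu_mode();
    pte_unmap_unlock(ptep - 1, ptl);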
-
Submitted by Maurizio Lombardi
stable inclusion from stable-v5.10.147
commit a54fc536911371bd5863afdcb2418ab948484b75
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a54fc536911371bd5863afdcb2418ab948484b75

--------------------------------

commit dac22531 upstream.

A number of drivers call page_frag_alloc() with a fragment's size > PAGE_SIZE.

In low memory conditions, __page_frag_cache_refill() may fail the order 3 cache allocation and fall back to order 0; In this case, the cache will be smaller than the fragment, causing memory corruptions.

Prevent this from happening by checking if the newly allocated cache is large enough for the fragment; if not, the allocation will fail and page_frag_alloc() will return NULL.

Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com
Fixes: b63ae8ca ("mm/net: Rename and move page fragment handling from net/ to mm/")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Cc: Chen Lin <chen45464546@163.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
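A hedged sketch of the check inside page_frag_alloc(); variable names follow the usual page_frag code, and the exact placement in the real function may differ:

    offset = size - fragsz;
    if (unlikely(offset < 0)) {
            /* The refill fell back to an order-0 page that is
             * smaller than the fragment: fail the allocation
             * instead of handing out an offset past the end of
             * the cache and corrupting adjacent memory. */
            return NULL;
    }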
-
Submitted by Mel Gorman
stable inclusion from stable-v5.10.147
commit 466a26af2d105666a427e4892b2be67cfcc55f4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0W8
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=466a26af2d105666a427e4892b2be67cfcc55f4f

--------------------------------

commit 3d36424b upstream.

Patrick Daly reported the following problem;

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - before offline operation
[0] - ZONE_MOVABLE
[1] - ZONE_NORMAL
[2] - NULL

For a GFP_KERNEL allocation, alloc_pages_slowpath() will save the offset of ZONE_NORMAL in ac->preferred_zoneref. If a concurrent memory_offline operation removes the last page from ZONE_MOVABLE, build_all_zonelists() & build_zonerefs_node() will update node_zonelists as shown below. Only populated zones are added.

NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - after offline operation
[0] - ZONE_NORMAL
[1] - NULL
[2] - NULL

The race is simple -- page allocation could be in progress when a memory hot-remove operation triggers a zonelist rebuild that removes zones. The allocation request will still have a valid ac->preferred_zoneref that is now pointing to NULL and triggers an OOM kill.

This problem probably always existed but may be slightly easier to trigger due to 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") which distinguishes between zones that are completely unpopulated versus zones that have valid pages not managed by the buddy allocator (e.g. reserved, memblock, ballooning etc).

Memory hotplug had multiple stages with timing considerations around managed/present page updates, the zonelist rebuild and the zone span updates. As David Hildenbrand puts it, memory offlining adjusts managed+present pages of the zone essentially in one go. If after the adjustments, the zone is no longer populated (present==0), we rebuild the zone lists. Once that's done, we try shrinking the zone (start+spanned pages) -- which results in zone_start_pfn == 0 if there are no more pages. That happens *after* rebuilding the zonelists via remove_pfn_range_from_zone().

The only requirement to fix the race is that a page allocation request identifies when a zonelist rebuild has happened since the allocation request started and no page has yet been allocated. Use a seqlock_t to track zonelist updates, with a lockless read-side of the zonelist and protecting the rebuild and update of the counter with a spinlock.

[akpm@linux-foundation.org: make zonelist_update_seq static]
Link: https://lkml.kernel.org/r/20220824110900.vh674ltxmzb3proq@techsingularity.net
Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Patrick Daly <quic_pdaly@quicinc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
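A sketch of the seqlock pattern the changelog describes; the hook points in the real allocation slow path are more involved, and the cookie variable name is an assumption:

    static seqlock_t zonelist_update_seq;  /* write side taken by the rebuild */

    /* Allocation slow path, lockless read side: */
    unsigned int cookie = read_seqbegin(&zonelist_update_seq);

    page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
    if (!page && read_seqretry(&zonelist_update_seq, cookie))
            goto restart;   /* zonelists were rebuilt under us: recompute
                             * ac->preferred_zoneref instead of OOMing */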
-
Submitted by Chao Yu
stable inclusion from stable-v5.10.146
commit 379ac7905ff3f0a6a4e507d3e9f710ec4fab9124
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=379ac7905ff3f0a6a4e507d3e9f710ec4fab9124

--------------------------------

commit 7e9c323c upstream.

In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to out-of-memory, if it fails, return errno correctly rather than triggering panic via BUG_ON();

kernel BUG at mm/slub.c:5893!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

Call trace:
 sysfs_slab_add+0x258/0x260 mm/slub.c:5973
 __kmem_cache_create+0x60/0x118 mm/slub.c:4899
 create_cache mm/slab_common.c:229 [inline]
 kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335
 kmem_cache_create+0x1c/0x28 mm/slab_common.c:390
 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline]
 f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808
 f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149
 mount_bdev+0x1b8/0x210 fs/super.c:1400
 f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512
 legacy_get_tree+0x30/0x74 fs/fs_context.c:610
 vfs_get_tree+0x40/0x140 fs/super.c:1530
 do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040
 path_mount+0x358/0x914 fs/namespace.c:3370
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount fs/namespace.c:3568 [inline]
 __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568

Cc: <stable@kernel.org>
Fixes: 81819f0f ("SLUB core")
Reported-by: syzbot+81684812ea68216e08c5@syzkaller.appspotmail.com
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
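The shape of the fix in create_unique_id() (mm/slub.c), as a sketch; the error is then propagated by the sysfs_slab_add() caller:

    char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);

    if (!name)
            return ERR_PTR(-ENOMEM);        /* was: BUG_ON(!name); */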
-
- 13 February 2023, 2 commits
-
-
Submitted by Jann Horn
stable inclusion from stable-v5.10.144
commit 891f03f688de8418f44b32b88f6b4faed5b2aa81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0V7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=891f03f688de8418f44b32b88f6b4faed5b2aa81

--------------------------------

This is a stable-specific patch.

I botched the stable-specific rewrite of commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"): As Hugh pointed out, unmap_region() actually operates on a list of VMAs, and the variable "vma" merely points to the first VMA in that list. So if we want to check whether any of the VMAs we're operating on is PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Yee Lee
stable inclusion from stable-v5.10.143
commit a14f1799ce373b435a2a6253747dedc23ea35abc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a14f1799ce373b435a2a6253747dedc23ea35abc

--------------------------------

This reverts commit 23c2d497.

Commit 23c2d497 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()") brought false leak alarms on some archs like arm64 that do not init the pfn boundary in early booting. The final solution lands on linux-6.0: commit 0c24e061 ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA").

Revert this commit before linux-6.0. The original issue of invalid PA can be mitigated by an additional check in devicetree.

The false alarm report is as follows:

Kmemleak output: (Qemu/arm64)
unreferenced object 0xffff0000c0170a00 (size 128):
  comm "swapper/0", pid 1, jiffies 4294892404 (age 126.208s)
  hex dump (first 32 bytes):
    62 61 73 65 00 00 00 00 00 00 00 00 00 00 00 00  base............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<(____ptrval____)>] __kmalloc_track_caller+0x1b0/0x2e4
    [<(____ptrval____)>] kstrdup_const+0x8c/0xc4
    [<(____ptrval____)>] kvasprintf_const+0xbc/0xec
    [<(____ptrval____)>] kobject_set_name_vargs+0x58/0xe4
    [<(____ptrval____)>] kobject_add+0x84/0x100
    [<(____ptrval____)>] __of_attach_node_sysfs+0x78/0xec
    [<(____ptrval____)>] of_core_init+0x68/0x104
    [<(____ptrval____)>] driver_init+0x28/0x48
    [<(____ptrval____)>] do_basic_setup+0x14/0x28
    [<(____ptrval____)>] kernel_init_freeable+0x110/0x178
    [<(____ptrval____)>] kernel_init+0x20/0x1a0
    [<(____ptrval____)>] ret_from_fork+0x10/0x20

This patch is also applicable to linux-5.17.y/linux-5.18.y/linux-5.19.y

Cc: <stable@vger.kernel.org>
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 09 February 2023, 3 commits
-
-
Submitted by Steven Price
stable inclusion from stable-v5.10.142
commit 47a73e5e6ba42e42db2d1400478b2e132e25ceb4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6CSFH
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47a73e5e6ba42e42db2d1400478b2e132e25ceb4

--------------------------------

[ Upstream commit 8782fb61 ]

The mmap lock protects the page walker from changes to the page tables during the walk. However a read lock is insufficient to protect those areas which don't have a VMA as munmap() detaches the VMAs before downgrading to a read lock and actually tearing down PTEs/page tables.

For users of walk_page_range() the solution is to simply call pte_hole() immediately without checking the actual page tables when a VMA is not present. We now never call __walk_page_range() without a valid vma.

For walk_page_range_novma() the locking requirements are tightened to require the mmap write lock to be taken, and then walking the pgd directly with 'no_vma' set.

This in turn means that all page walkers either have a valid vma, or it's that special 'novma' case for page table debugging. As a result, all the odd '(!walk->vma && !walk->no_vma)' tests can be removed.

Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF
CVE: NA

--------------------------------

syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock, because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock.

The problem mentioned above has been fixed by patch [2], but the same problem exists in the memcg_memfs_info feature too. Following patch [2], fix it by removing the allocation from mem_cgroup_print_memfs_info() completely, and passing a static buffer when calling from the memcg OOM path.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp [2]
Fixes: 6b1d4d3a ("mm/memcg_memfs_info: show files that having pages charged in mem_cgroup")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit d2218535)
-
Submitted by Tetsuo Handa
mainline inclusion from mainline-v6.0-rc1
commit 68aaee14
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68aaee147e597b495622b7c9038e5922c7c61f57

--------------------------------

syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock.

Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path.

Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation.

Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1]
Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp
Fixes: c8713d0b ("mm: memcontrol: dump memory.stat during cgroup OOM")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/memcontrol.c

Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit ee2d7b76)
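The pattern shared by this fix and the memcg_memfs_info one above, sketched; signatures vary across kernel versions, and it is oom_lock's serialization that makes the static buffer safe:

    void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
    {
            /* Called with oom_lock held: a single static buffer can
             * replace a GFP_KERNEL allocation, which could itself
             * need the global OOM killer and livelock the system. */
            static char buf[PAGE_SIZE];

            lockdep_assert_held(&oom_lock);
            memory_stat_format(memcg, buf, sizeof(buf));
            pr_info("%s", buf);
    }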
-
- 31 January 2023, 2 commits
-
-
Submitted by Alistair Popple
mainline inclusion from mainline-v6.1-rc7
commit 4a955bed
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6BG56
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a955bed882e734807024afd8f53213d4c61ff97

--------------------------------

The migrate_to_ram() callback should always succeed, but in rare cases can fail usually returning VM_FAULT_SIGBUS. Commit 16ce101d ("mm/memory.c: fix race when faulting a device private page") incorrectly stopped passing the return code up the stack. Fix this by setting the ret variable, restoring the previous behaviour on migrate_to_ram() failure.

Link: https://lkml.kernel.org/r/20221114115537.727371-1-apopple@nvidia.com
Fixes: 16ce101d ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 1254c416)
-
Submitted by Yuanzheng Song
mainline inclusion from mainline-v5.16-rc1
commit 7e6ec49c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AW65
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e6ec49c18988f1b8dab0677271dafde5f8d9a43

--------------------------------

When reading memcg->socket_pressure in mem_cgroup_under_socket_pressure() and writing memcg->socket_pressure in vmpressure() at the same time, the following data-race occurs:

BUG: KCSAN: data-race in __sk_mem_reduce_allocated / vmpressure

write to 0xffff8881286f4938 of 8 bytes by task 24550 on cpu 3:
 vmpressure+0x218/0x230 mm/vmpressure.c:307
 shrink_node_memcgs+0x2b9/0x410 mm/vmscan.c:2658
 shrink_node+0x9d2/0x11d0 mm/vmscan.c:2769
 shrink_zones+0x29f/0x470 mm/vmscan.c:2972
 do_try_to_free_pages+0x193/0x6e0 mm/vmscan.c:3027
 try_to_free_mem_cgroup_pages+0x1c0/0x3f0 mm/vmscan.c:3345
 reclaim_high mm/memcontrol.c:2440 [inline]
 mem_cgroup_handle_over_high+0x18b/0x4d0 mm/memcontrol.c:2624
 tracehook_notify_resume include/linux/tracehook.h:197 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
 exit_to_user_mode_prepare+0x110/0x170 kernel/entry/common.c:191
 syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266
 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:289

read to 0xffff8881286f4938 of 8 bytes by interrupt on cpu 1:
 mem_cgroup_under_socket_pressure include/linux/memcontrol.h:1483 [inline]
 sk_under_memory_pressure include/net/sock.h:1314 [inline]
 __sk_mem_reduce_allocated+0x1d2/0x270 net/core/sock.c:2696
 __sk_mem_reclaim+0x44/0x50 net/core/sock.c:2711
 sk_mem_reclaim include/net/sock.h:1490 [inline]
 ......
 net_rx_action+0x17a/0x480 net/core/dev.c:6864
 __do_softirq+0x12c/0x2af kernel/softirq.c:298
 run_ksoftirqd+0x13/0x20 kernel/softirq.c:653
 smpboot_thread_fn+0x33f/0x510 kernel/smpboot.c:165
 kthread+0x1fc/0x220 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Fix it by using READ_ONCE() and WRITE_ONCE() to read and write memcg->socket_pressure.

Link: https://lkml.kernel.org/r/20211025082843.671690-1-songyuanzheng@huawei.com
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit bbc0f30d)
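The fix itself is the standard annotation pair the changelog names; a sketch of the two sides:

    /* vmpressure() - write side: */
    WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);

    /* mem_cgroup_under_socket_pressure() - read side: */
    if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
            return true;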
-
- 19 January 2023, 2 commits
-
-
Submitted by Guo Mengqi
hulk inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I69JDF
CVE: NA

-------------------------------

Delete the svm driver, as it is specially designed for the Hisilicon platform.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: chenweilong <chenweilong@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 056e261b)
-
Submitted by Qi Zheng
mainline inclusion from mainline-v6.1-rc7
commit ea4452de
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I69VVC
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea4452de2ae987342fadbdd2c044034e6480daad

--------------------------------

When we specify __GFP_NOWARN, we only expect that no warnings will be issued for the current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it.

[akpm@linux-foundation.org: unexport should_fail_ex()]
Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com
Fixes: 3f913fc5 ("mm: fix missing handler for __GFP_NOWARN")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ye Weihua <yeweihua4@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 49457cf7)
-
- 16 January 2023, 1 commit
-
-
Submitted by Tong Tiangen
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AQE8
CVE: NA

-------------------------------

The function memcpy_mcs() is designed to copy memory, and should be supported by KASAN.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
-
- 04 January 2023, 4 commits
-
-
Submitted by Kefeng Wang
mainline inclusion from mainline-v6.2-rc1
commit de2e5171
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I67BWQ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de2e5171433126d340573cb7d0d4fcac084ab2a0

--------------------------------

When handling MADV_WILLNEED in madvise(), a softlockup may occur in swapin_walk_pmd_entry() if swapping in lots of memory on a slow device. Add a cond_resched() to avoid the possible softlockup.

Link: https://lkml.kernel.org/r/20221205140327.72304-1-wangkefeng.wang@huawei.com
Fixes: 1998cc04 ("mm: make madvise(MADV_WILLNEED) support swap file prefetch")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/madvise.c

Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
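The fix is a single rescheduling point in the swap-in walk; a sketch of the loop tail in swapin_walk_pmd_entry(), with the surrounding walk context assumed:

    page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
                                 vma, index, false);
    if (page)
            put_page(page);
    /* A large MADV_WILLNEED range on a slow swap device can spin
     * in this loop for a long time; give the scheduler a chance. */
    cond_resched();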
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA
CVE: NA

--------------------------------

Since free_pages_prepare() clears PagePool without the lock in free_page_to_dhugetlb_pool() and free_page_list_to_dhugetlb_pool(), it is unreliable to check whether a page is freed via PagePool in hpool_merge_page(). Move free_pages_prepare() after ClearPagePool(), which guarantees that all allocated pages have the PagePool flag.

Fixes: 71197c63 ("mm/dynamic_hugetlb: free pages to dhugetlb_pool")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA
CVE: NA

--------------------------------

The percpu pool is cleared by clear_percpu_pools(), which then checks whether all pages are already freed. If some pages are not freed, we first isolate the freed pages and then migrate the used pages. Since we missed taking the percpu_pool lock, used pages can be freed to the percpu_pool while the isolation is in progress. In that case, the list operations become unreliable. To fix this problem, take all related locks sequentially and clear the percpu_pool again before isolating the freed pages.

Fixes: cdbeee51 ("mm/dynamic_hugetlb: add migration function")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chen Wandun
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI

-------------------------------

Commit 46d673d7 ("mm/swapfile: fix broken kabi in swap_info_struct") uses KABI_EXTEND to fix the KABI breakage, but this is not the safest way, so fix the problem in a new way. Introduce a new struct swap_extend_info that contains the extended swap info, and use KABI_USE to take over the memory space reserved by KABI_RESERVE.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: zhangjialin <zhangjialin11@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 08 December 2022, 6 commits
-
-
Submitted by Liu Shixin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I641XX
CVE: NA

--------------------------------

Patch 1378a5ee ("mm: store compound_nr as well as compound_order") adds a new member compound_nr to struct page, and uses this new member instead of compound_order in hugetlb_cgroup_move_parent() to compute nr_pages. In free_hugepage_to_hugetlb(), we reset page->mapping to NULL for each subpage. Since page->mapping and page->compound_nr are in a union, we unexpectedly reset page->compound_nr too. This ultimately makes nr_pages incorrect in hugetlb_cgroup_move_parent(), so the hugetlb_cgroup can't be released. Fix this problem by restoring page->compound_nr using set_compound_order().

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 2efa33fc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2efa33fc7f6ec94a3a538c1a264273c889be2b36

--------------------------------

When I was investigating the swap code, I found the below possible race window:

CPU 1                                   CPU 2
-----                                   -----
shmem_swapin
  swap_cluster_readahead
    if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) {
                                        swapoff
                                          ..
                                          si->swap_file = NULL;
                                          ..
    struct inode *inode = si->swap_file->f_mapping->host;
    [oops!]

Close this race window by using get/put_swap_device() to guard against concurrent swapoff.

Link: https://lkml.kernel.org/r/20210426123316.806267-5-linmiaohe@huawei.com
Fixes: 8fd2e0b5 ("mm: swap: check if swap backing device is congested or not")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 2799e775
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2799e77529c2a25492a4395db93996e3dacd762d

--------------------------------

When I was investigating the swap code, I found the below possible race window:

CPU 1                                           CPU 2
-----                                           -----
do_swap_page
  if (data_race(si->flags & SWP_SYNCHRONOUS_IO)
  swap_readpage
    if (data_race(sis->flags & SWP_FS_OPS)) {
                                                swapoff
                                                  ..
                                                  p->swap_file = NULL;
                                                  ..
    struct file *swap_file = sis->swap_file;
    struct address_space *mapping = swap_file->f_mapping;
    [oops!]

Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked.

Fix this race by using get/put_swap_device() to guard against concurrent swapoff.

Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com
Fixes: 0bcac06f ("mm,swap: skip swapcache for swapin of synchronous device")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/memory.c

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
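Both races above are closed with the same guard; a sketch of the pattern, with error handling elided:

    struct swap_info_struct *si;

    /* Take a reference that blocks a concurrent swapoff() from
     * tearing down si->swap_file and friends underneath us. */
    si = get_swap_device(entry);
    if (!si)
            goto out;       /* raced with swapoff: treat the entry as stale */

    /* ... safe to inspect si->flags, si->swap_file here ... */

    put_swap_device(si);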
-
Submitted by Miaohe Lin
mainline inclusion from mainline-v5.14-rc1
commit 63d8620e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63d8620ecf93b5d8d0a254471184d08f8e8f538d

--------------------------------

Patch series "close various race windows for swap", v6.

When I was investigating the swap code, I found some possible race windows. This series aims to fix all these races. But using current get/put_swap_device() to guard against concurrent swapoff for swap_readpage() looks terrible because swap_readpage() may take really long time. And to reduce the performance overhead on the hot-path as much as possible, it appears we can use the percpu_ref to close this race window (as suggested by Huang, Ying).

The patch 1 adds percpu_ref support for swap and most of the remaining patches try to use this to close various race windows. More details can be found in the respective changelogs.

This patch (of 4):

Using current get/put_swap_device() to guard against concurrent swapoff for some swap ops, e.g. swap_readpage(), looks terrible because they might take really long time. This patch adds the percpu_ref support to serialize against concurrent swapoff (as suggested by Huang, Ying). Also we remove the SWP_VALID flag because it's used together with RCU solution.

Link: https://lkml.kernel.org/r/20210426123316.806267-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210426123316.806267-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chen Wandun
mainline inclusion from mainline-v6.1-rc7
commit de1ccfb6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de1ccfb648243a031cfbdc2d5571dfdaf5023106

--------------------------------

A softlockup occurs in scan free swap slot under huge memory pressure. The test scenario is: 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB.

LATENCY_LIMIT is used to prevent softlockups in scan_swap_map_slots(), but the real loop number would be more than LATENCY_LIMIT because of "goto checks" and "goto scan" repeating without decreasing the latency limit. In order to fix it, decrease latency_ration in advance.

There is also a suspicious place that will cause softlockups in get_swap_pages(). In this function, the "goto start_over" may result in continuous scanning of the swap partition. If there is no cond_resched in scan_swap_map_slots(), it would cause a softlockup (I am not sure about this).

WARN: soft lockup - CPU#11 stuck for 11s! [kswapd0:466]
CPU: 11 PID: 466 Comm: kswapd@ Kdump: loaded Tainted: G
dump backtrace+0x0/0x1le4
show stack+0x20/@x2c
dump_stack+0xd8/0x140
watchdog print_info+0x48/0x54
watchdog_process_before_softlockup+0x98/0xa0
watchdog_timer_fn+0xlac/0x2d0
hrtimer_rum_queues+0xb0/0x130
hrtimer_interrupt+0x13c/0x3c0
arch_timer_handler_virt+0x3c/0x50
handLe_percpu_devid_irq+0x90/0x1f4
handle domain irq+0x84/0x100
gic_handle_irq+0x88/0x2b0
e11 ira+0xhB/Bx140
scan_swap_map_slots+0x678/0x890
get_swap_pages+0x29c/0x440
get_swap_page+0x120/0x2e0
add_to_swap+UX2U/0XyC
shrink_page_list+0x5d0/0x152c
shrink_inactive_list+0xl6c/Bx500
shrink_lruvec+0x270/0x304

WARN: soft lockup - CPU#32 stuck for 11s! [stress-ng:309915]
watchdog_timer_fn+0x1ac/0x2d0
__run_hrtimer+0x98/0x2a0
__hrtimer_run_queues+0xb0/0x130
hrtimer_interrupt+0x13c/0x3c0
arch_timer_handler_virt+0x3c/0x50
handle_percpu_devid_irq+0x90/0x1f4
__handle_domain_irq+0x84/0x100
gic_handle_irq+0x88/0x2b0
el1_irq+0xb8/0x140
get_swap_pages+0x1e8/0x440
get_swap_page+0x1c8/0x2e0
add_to_swap+0x20/0x9c
shrink_page_list+0x5d0/0x152c
reclaim_pages+0x160/0x310
madvise_cold_or_pageout_pte_range+0x7bc/0xe3c
walk_pmd_range.isra.0+0xac/0x22c
walk_pud_range+0xfc/0x1c0
walk_pgd_range+0x158/0x1b0
__walk_page_range+0x64/0x100
walk_page_range+0x104/0x150

Link: https://lkml.kernel.org/r/20221118133850.3360369-1-chenwandun@huawei.com
Fixes: 048c27fd ("[PATCH] swap: scan_swap_map latency breaks")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <xialonglong1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/swapfile.c

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
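For reference, the latency bound in scan_swap_map_slots() is this existing pattern; the fix described above applies it (decrementing in advance) on the "goto checks"/"goto scan" re-entry paths too, so the loop count is genuinely bounded:

    if (unlikely(--latency_ration < 0)) {
            cond_resched();
            latency_ration = LATENCY_LIMIT;
    }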
-
Submitted by Alistair Popple
mainline inclusion from mainline-v5.17-rc1
commit ffa65753
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I614X1
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffa65753c43142f3b803486442813744da71cff2

--------------------------------

This fixes the FIXME in migrate_vma_check_page().

Before migrating a page migration code will take a reference and check there are no unexpected page references, failing the migration if there are. When a thread faults on a migration entry it will take a temporary reference to the page to wait for the page to become unlocked signifying the migration entry has been removed.

This reference is dropped just prior to waiting on the page lock, however the extra reference can cause migration failures so it is desirable to avoid taking it.

As migration code already has a reference to the migrating page an extra reference to wait on PG_locked is unnecessary so long as the reference can't be dropped whilst setting up the wait.

When faulting on a migration entry the ptl is taken to check the migration entry. Removing a migration entry also requires the ptl, and migration code won't drop its page reference until after the migration entry has been removed. Therefore retaining the ptl of a migration entry is sufficient to ensure the page has a reference. Reworking migration_entry_wait() to hold the ptl until the wait setup is complete means the extra page reference is no longer needed.

[apopple@nvidia.com: v5]
Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com
Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	include/linux/migrate.h.rej
	mm/filemap.c.rej
	mm/migrate.c.rej

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 02 December 2022, 2 commits
-
-
Submitted by David Hildenbrand

stable inclusion from stable-v5.10.140
commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef

--------------------------------

commit f96f7a40 upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two cases where we could currently get such faults and would erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees. The first fix needs a small adjustment.

This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

static void clear_softdirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	const char *ctrl = "4";
	int ret;

	if (fd < 0) {
		fprintf(stderr, "open(clear_refs) failed\n");
		exit(1);
	}
	ret = write(fd, ctrl, strlen(ctrl));
	if (ret != strlen(ctrl)) {
		fprintf(stderr, "write(clear_refs) failed\n");
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	char *map;
	int fd;

	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
	if (!fd) {
		fprintf(stderr, "open() failed\n");
		return -errno;
	}
	if (ftruncate(fd, HUGETLB_SIZE)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	*map = 0;

	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
		fprintf(stderr, "mmprotect() failed\n");
		return -errno;
	}

	clear_softdirty();

	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
		fprintf(stderr, "mmprotect() failed\n");
		return -errno;
	}

	*map = 0;

	return 0;
}
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.

# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics:

# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total:       2
HugePages_Free:        1
HugePages_Rsvd:    18446744073709551615
HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking.

Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
-
Submitted by Miaohe Lin
stable inclusion from stable-v5.10.140
commit c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f

--------------------------------

[ Upstream commit a44f89dc ]

It's more recommended to use the helper function migration_entry_to_page() to get the page via a migration entry. We can also enjoy the PageLocked() check there.

Link: https://lkml.kernel.org/r/20210318122722.13135-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Thomas Hellström (Intel) <thomas_os@shipmail.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: yuleixzhang <yulei.kernel@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
-
- 01 December 2022, 1 commit
-
-
Submitted by Zhou Guanghui
mainline inclusion from mainline
commit c9bb62634fa772e80617cbe0cd09797e46c5df38
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63SDZ
CVE: NA

-------------------------------

In a system (Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs, and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are not reported, or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks.

For example, if the EFI_UNUSABLE_MEMORY type is reported:

...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As the result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost.

Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGIONS in defining the size of the static memblock.memory array.

Allow overriding the memblock.memory array size with an architecture-defined INIT_MEMBLOCK_MEMORY_REGIONS, and make arm64 set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.

Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Darren Hart <darren@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org> [arm64]
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Xu Qiang <xuqiang36@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
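Sketched from the changelog, the mm/memblock.c side of the change; the exact location of the arm64 override is taken from the changelog only, not the diff:

    /* mm/memblock.c - let an architecture size the static memory
     * array independently of the reserved array: */
    #ifndef INIT_MEMBLOCK_MEMORY_REGIONS
    #define INIT_MEMBLOCK_MEMORY_REGIONS	INIT_MEMBLOCK_REGIONS
    #endif

    static struct memblock_region
    memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;

    /* arm64, when CONFIG_EFI is enabled, per the changelog: */
    #define INIT_MEMBLOCK_MEMORY_REGIONS	1024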
-