- 29 3月, 2023 5 次提交
-
-
由 Miaohe Lin 提交于
mainline inclusion from mainline-v6.3-rc1 commit 3109de30 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6POXN CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3109de308987ceae413ee015038d51e2a86c7806 -------------------------------- It's possible that kcompactd_run could fail to run kcompactd for a hot added node and leave pgdat->kcompactd as NULL. So pgdat->kcompactd should be checked here to avoid possible NULL pointer dereference. Link: https://lkml.kernel.org/r/20220418141253.24298-10-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NZe Zuo <zuoze1@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6LD0S CVE: NA ---------------------------------------- This reverts commit 49ed1f1e. It will have a great impact on the product if we can't use vmalloc to alloc high-order physical pages. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Longlong Xia 提交于
mainline inclusion from mainline-v6.2-rc7 commit 7717fc1a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7717fc1a12f88701573f9ed897cc4f6699c661e3 ---------------------------------------- The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup. Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.comSigned-off-by: NLonglong Xia <xialonglong1@huawei.com> Reviewed-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 ZhangPeng 提交于
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PKGM Reference: https://lore.kernel.org/linux-mm/20220824071909.192535-1-wangkefeng.wang@huawei.com/ -------------------------------- The pgdat->kswapd could be accessed concurrently by kswapd_run() and kcompactd(), it don't be protected by any lock, which could leads to data races, adding READ/WRITE_ONCE() to slince it. Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Conflicts: mm/compaction.c mm/vmscan.c Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 ZhangPeng 提交于
maillist inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6PKGM Reference: https://lore.kernel.org/linux-mm/20220824071909.192535-1-wangkefeng.wang@huawei.com/ -------------------------------- wapd_run/stop() will set pgdat->kswapd to NULL, which could race with kswapd_is_running() in kcompactd(), kswapd_run/stop() kcompactd() kswapd_is_running() if (pgdat->kswapd) // load non-NULL pgdat->kswapd pgdat->kswapd = NULL task_is_running(pgdat->kswapd) // Null pointer derefence The KASAN report the null-ptr-deref shown below, vmscan: Failed to start kswapd on node 0 ... BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504 Read of size 8 at addr 0000000000000024 by task kcompactd0/37 CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G OE 5.10.60 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x394 show_stack+0x34/0x4c dump_stack+0x158/0x1e4 __kasan_report+0x138/0x140 kasan_report+0x44/0xdc __asan_load8+0x94/0xd0 kcompactd+0x440/0x504 kthread+0x1a4/0x1f0 ret_from_fork+0x10/0x18 For race between kswapd_run() and kcompactd(), adding a temporary value when create a kthread, and only set it to pgdat->kswapd if kthread_run() return successful task_struct to fix the issue. For race between kswapd_stop() and kcompactd(), let's call kcompactd_stop() before kswapd_stop() to fix the issue. Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Conflicts: mm/vmscan.c Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 22 3月, 2023 3 次提交
-
-
由 David Hildenbrand 提交于
mainline inclusion from mainline-v5.18-rc1 commit d4c47097 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6NK0S CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4c470970d45c863fafc757521a82be2f80b1232 -------------------------------- For example, if a page just got swapped in via a read fault, the LRU pagevecs might still hold a reference to the page. If we trigger a write fault on such a page, the additional reference from the LRU pagevecs will prohibit reusing the page. Let's conditionally drain the local LRU pagevecs when we stumble over a !PageLRU() page. We cannot easily drain remote LRU pagevecs and it might not be desirable performance-wise. Consequently, this will only avoid copying in some cases. Add a simple "page_count(page) > 3" check first but keep the "page_count(page) > 1 + PageSwapCache(page)" check in place, as we want to minimize cases where we remove a page from the swapcache but won't be able to reuse it, for example, because another process has it mapped R/O, to not affect reclaim. We cannot easily handle the following cases and we will always have to copy: (1) The page is referenced in the LRU pagevecs of other CPUs. We really would have to drain the LRU pagevecs of all CPUs -- most probably copying is much cheaper. (2) The page is already PageLRU() but is getting moved between LRU lists, for example, for activation (e.g., mark_page_accessed()), deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to drain mostly unconditionally, which might be bad performance-wise. Most probably this won't happen too often in practice. Note that there are other reasons why an anon page might temporarily not be PageLRU(): for example, compaction and migration have to isolate LRU pages from the LRU lists first (isolate_lru_page()), moving them to temporary local lists and clearing PageLRU() and holding an additional reference on the page. In that case, we'll always copy. This change seems to be fairly effective with the reproducer [1] shared by Nadav, as long as writeback is done synchronously, for example, using zram. However, with asynchronous writeback, we'll usually fail to free the swapcache because the page is still under writeback: something we cannot easily optimize for, and maybe it's not really relevant in practice. [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com Link: https://lkml.kernel.org/r/20220131162940.210846-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Rientjes <rientjes@google.com> Cc: Don Dutile <ddutile@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Liang Zhang <zhangliang5@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 David Hildenbrand 提交于
mainline inclusion from mainline-v5.18-rc1 commit 53a05ad9 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6NK0S CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53a05ad9f21d858d24f76d12b3e990405f2036d1 -------------------------------- Patch series "mm: COW fixes part 1: fix the COW security issue for THP and swap", v3. This series attempts to optimize and streamline the COW logic for ordinary anon pages and THP anon pages, fixing two remaining instances of CVE-2020-29374 in do_swap_page() and do_huge_pmd_wp_page(): information can leak from a parent process to a child process via anonymous pages shared during fork(). This issue, including other related COW issues, has been summarized in [2]: "1. Observing Memory Modifications of Private Pages From A Child Process Long story short: process-private memory might not be as private as you think once you fork(): successive modifications of private memory regions in the parent process can still be observed by the child process, for example, by smart use of vmsplice()+munmap(). The core problem is that pinning pages readable in a child process, such as done via the vmsplice system call, can result in a child process observing memory modifications done in the parent process the child is not supposed to observe. [1] contains an excellent summary and [2] contains further details. This issue was assigned CVE-2020-29374 [9]. For this to trigger, it's required to use a fork() without subsequent exec(), for example, as used under Android zygote. Without further details about an application that forks less-privileged child processes, one cannot really say what's actually affected and what's not -- see the details section the end of this mail for a short sshd/openssh analysis. While commit 17839856 ("gup: document and work around "COW can break either way" issue") fixed this issue and resulted in other problems (e.g., ptrace on pmem), commit 09854ba9 ("mm: do_wp_page() simplification") re-introduced part of the problem unfortunately. The original reproducer can be modified quite easily to use THP [3] and make the issue appear again on upstream kernels. I modified it to use hugetlb [4] and it triggers as well. The problem is certainly less severe with hugetlb than with THP; it merely highlights that we still have plenty of open holes we should be closing/fixing. Regarding vmsplice(), the only known workaround is to disallow the vmsplice() system call ... or disable THP and hugetlb. But who knows what else is affected (RDMA? O_DIRECT?) to achieve the same goal -- in the end, it's a more generic issue" This security issue was first reported by Jann Horn on 27 May 2020 and it currently affects anonymous pages during swapin, anonymous THP and hugetlb. This series tackles anonymous pages during swapin and anonymous THP: - do_swap_page() for handling COW on PTEs during swapin directly - do_huge_pmd_wp_page() for handling COW on PMD-mapped THP during write faults With this series, we'll apply the same COW logic we have in do_wp_page() to all swappable anon pages: don't reuse (map writable) the page in case there are additional references (page_count() != 1). All users of reuse_swap_page() are remove, and consequently reuse_swap_page() is removed. In general, we're struggling with the following COW-related issues: (1) "missed COW": we miss to copy on write and reuse the page (map it writable) although we must copy because there are pending references from another process to this page. The result is a security issue. (2) "wrong COW": we copy on write although we wouldn't have to and shouldn't: if there are valid GUP references, they will become out of sync with the pages mapped into the page table. We fail to detect that such a page can be reused safely, especially if never more than a single process mapped the page. The result is an intra process memory corruption. (3) "unnecessary COW": we copy on write although we wouldn't have to: performance degradation and temporary increases swap+memory consumption can be the result. While this series fixes (1) for swappable anon pages, it tries to reduce reported cases of (3) first as good and easy as possible to limit the impact when streamlining. The individual patches try to describe in which cases we will run into (3). This series certainly makes (2) worse for THP, because a THP will now get PTE-mapped on write faults if there are additional references, even if there was only ever a single process involved: once PTE-mapped, we'll copy each and every subpage and won't reuse any subpage as long as the underlying compound page wasn't split. I'm working on an approach to fix (2) and improve (3): PageAnonExclusive to mark anon pages that are exclusive to a single process, allow GUP pins only on such exclusive pages, and allow turning exclusive pages shared (clearing PageAnonExclusive) only if there are no GUP pins. Anon pages with PageAnonExclusive set never have to be copied during write faults, but eventually during fork() if they cannot be turned shared. The improved reuse logic in this series will essentially also be the logic to reset PageAnonExclusive. This work will certainly take a while, but I'm planning on sharing details before having code fully ready. cleanups related to reuse_swap_page(). Notes: * For now, I'll leave hugetlb code untouched: "unnecessary COW" might easily break existing setups because hugetlb pages are a scarce resource and we could just end up having to crash the application when we run out of hugetlb pages. We have to be very careful and the security aspect with hugetlb is most certainly less relevant than for unprivileged anon pages. * Instead of lru_add_drain() we might actually just drain the lru_add list or even just remove the single page of interest from the lru_add list. This would require a new helper function, and could be added if the conditional lru_add_drain() turn out to be a problem. * I extended the test case already included in [1] to also test for the newly found do_swap_page() case. I'll send that out separately once/if this part was merged. [1] https://lkml.kernel.org/r/20211217113049.23850-1-david@redhat.com [2] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com This patch (of 9): Liang Zhang reported [1] that the current COW logic in do_wp_page() is sub-optimal when it comes to swap+read fault+write fault of anonymous pages that have a single user, visible via a performance degradation in the redis benchmark. Something similar was previously reported [2] by Nadav with a simple reproducer. After we put an anon page into the swapcache and unmapped it from a single process, that process might read that page again and refault it read-only. If that process then writes to that page, the process is actually the exclusive user of the page, however, the COW logic in do_co_page() won't be able to reuse it due to the additional reference from the swapcache. Let's optimize for pages that have been added to the swapcache but only have an exclusive user. Try removing the swapcache reference if there is hope that we're the exclusive user. We will fail removing the swapcache reference in two scenarios: (1) There are additional swap entries referencing the page: copying instead of reusing is the right thing to do. (2) The page is under writeback: theoretically we might be able to reuse in some cases, however, we cannot remove the additional reference and will have to copy. Note that we'll only try removing the page from the swapcache when it's highly likely that we'll be the exclusive owner after removing the page from the swapache. As we're about to map that page writable and redirty it, that should not affect reclaim but is rather the right thing to do. Further, we might have additional references from the LRU pagevecs, which will force us to copy instead of being able to reuse. We'll try handling such references for some scenarios next. Concurrent writeback cannot be handled easily and we'll always have to copy. While at it, remove the superfluous page_mapcount() check: it's implicitly covered by the page_count() for ordinary anon pages. [1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com [2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com Link: https://lkml.kernel.org/r/20220131162940.210846-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reported-by: NLiang Zhang <zhangliang5@huawei.com> Reported-by: NNadav Amit <nadav.amit@gmail.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Don Dutile <ddutile@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Nicholas Piggin 提交于
mainline inclusion from mainline-v5.18-rc4 commit 3b8000ae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6LD0S CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b8000ae185cb068adbda5f966a3835053c85fd4 -------------------------------- Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page [sic]" (remap_vmalloc_range). However a similar problem exists for other struct page fields callers use, for example fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but uses ->lru, ->mapping, ->index. This is not compatible with compound sub-pages, and can cause bad page state issues like BUG: Bad page state in process swapper/0 pfn:00743 page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743 flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff) raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: corrupted mapping in tail page Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810 Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) bad_page+0x12c/0x170 free_tail_pages_check+0xe8/0x190 free_pcp_prepare+0x31c/0x4e0 free_unref_page+0x40/0x1b0 __vunmap+0x1d8/0x420 ... The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually-allocated order-0 pages. Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Song Liu <songliubraving@fb.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> conflicts: mm/vmalloc.c Signed-off-by: NZhangPeng <zhangpeng362@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 22 2月, 2023 1 次提交
-
-
由 Rik van Riel 提交于
mainline inclusion from mainline-v6.1-rc2 commit 12df140f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6EVPO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=12df140f0bdfae5dcfc81800970dd7f6f632e00c -------------------------------- The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock. This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems. Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race. Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: NRik van Riel <riel@surriel.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Glen McCready <gkmccready@meta.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NZhang Peng <zhangpeng362@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 08 2月, 2023 2 次提交
-
-
由 Liu Shixin 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF CVE: NA -------------------------------- syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock. The problem mentioned above has been fixed by patch[2]. The is the same problem in memcg_memfs_info feature too. Refer to the patch[2], fix it by removing the allocation from mem_cgroup_print_memfs_info() completely, and pass static buffer when calling from memcg OOM path. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp [2] Fixes: 6b1d4d3a ("mm/memcg_memfs_info: show files that having pages charged in mem_cgroup") Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Tetsuo Handa 提交于
mainline inclusion from mainline-v6.0-rc1 commit 68aaee14 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6ADCF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68aaee147e597b495622b7c9038e5922c7c61f57 -------------------------------- syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock. Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path. Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp Fixes: c8713d0b ("mm: memcontrol: dump memory.stat during cgroup OOM") Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Nsyzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com> Suggested-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> conflicts: mm/memcontrol.c Signed-off-by: NCai Xinchen <caixinchen1@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 31 1月, 2023 2 次提交
-
-
由 Yuanzheng Song 提交于
mainline inclusion from mainline-v5.16-rc1 commit 7e6ec49c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6AW65 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e6ec49c18988f1b8dab0677271dafde5f8d9a43 -------------------------------- When reading memcg->socket_pressure in mem_cgroup_under_socket_pressure() and writing memcg->socket_pressure in vmpressure() at the same time, the following data-race occurs: BUG: KCSAN: data-race in __sk_mem_reduce_allocated / vmpressure write to 0xffff8881286f4938 of 8 bytes by task 24550 on cpu 3: vmpressure+0x218/0x230 mm/vmpressure.c:307 shrink_node_memcgs+0x2b9/0x410 mm/vmscan.c:2658 shrink_node+0x9d2/0x11d0 mm/vmscan.c:2769 shrink_zones+0x29f/0x470 mm/vmscan.c:2972 do_try_to_free_pages+0x193/0x6e0 mm/vmscan.c:3027 try_to_free_mem_cgroup_pages+0x1c0/0x3f0 mm/vmscan.c:3345 reclaim_high mm/memcontrol.c:2440 [inline] mem_cgroup_handle_over_high+0x18b/0x4d0 mm/memcontrol.c:2624 tracehook_notify_resume include/linux/tracehook.h:197 [inline] exit_to_user_mode_loop kernel/entry/common.c:164 [inline] exit_to_user_mode_prepare+0x110/0x170 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:289 read to 0xffff8881286f4938 of 8 bytes by interrupt on cpu 1: mem_cgroup_under_socket_pressure include/linux/memcontrol.h:1483 [inline] sk_under_memory_pressure include/net/sock.h:1314 [inline] __sk_mem_reduce_allocated+0x1d2/0x270 net/core/sock.c:2696 __sk_mem_reclaim+0x44/0x50 net/core/sock.c:2711 sk_mem_reclaim include/net/sock.h:1490 [inline] ...... net_rx_action+0x17a/0x480 net/core/dev.c:6864 __do_softirq+0x12c/0x2af kernel/softirq.c:298 run_ksoftirqd+0x13/0x20 kernel/softirq.c:653 smpboot_thread_fn+0x33f/0x510 kernel/smpboot.c:165 kthread+0x1fc/0x220 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Fix it by using READ_ONCE() and WRITE_ONCE() to read and write memcg->socket_pressure. Link: https://lkml.kernel.org/r/20211025082843.671690-1-songyuanzheng@huawei.comSigned-off-by: NYuanzheng Song <songyuanzheng@huawei.com> Reviewed-by: NMuchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alex Shi <alexs@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NCai Xinchen <caixinchen1@huawei.com> Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Alistair Popple 提交于
mainline inclusion from mainline-v6.1-rc7 commit 4a955bed category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6BG56 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a955bed882e734807024afd8f53213d4c61ff97 -------------------------------- The migrate_to_ram() callback should always succeed, but in rare cases can fail usually returning VM_FAULT_SIGBUS. Commit 16ce101d ("mm/memory.c: fix race when faulting a device private page") incorrectly stopped passing the return code up the stack. Fix this by setting the ret variable, restoring the previous behaviour on migrate_to_ram() failure. Link: https://lkml.kernel.org/r/20221114115537.727371-1-apopple@nvidia.com Fixes: 16ce101d ("mm/memory.c: fix race when faulting a device private page") Signed-off-by: NAlistair Popple <apopple@nvidia.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 18 1月, 2023 7 次提交
-
-
由 Qi Zheng 提交于
mainline inclusion from mainline-v6.1-rc7 commit ea4452de category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I69VVC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea4452de2ae987342fadbdd2c044034e6480daad -------------------------------- When we specify __GFP_NOWARN, we only expect that no warnings will be issued for current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it. [akpm@linux-foundation.org: unexport should_fail_ex()] Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com Fixes: 3f913fc5 ("mm: fix missing handler for __GFP_NOWARN") Signed-off-by: NQi Zheng <zhengqi.arch@bytedance.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NYe Weihua <yeweihua4@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Guo Mengqi 提交于
hulk inclusion category: other bugzilla: https://gitee.com/openeuler/kernel/issues/I69JDF CVE: NA ------------------------------- Delete svm driver, as it is specially designed for Hisilicon platform. Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-v5.16-rc1 commit d417b49f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6110W CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d417b49fff3e2f21043c834841e8623a6098741d -------------------------------- It is not safe to check page->index without holding the page lock. It can be changed if the page is moved between the swap cache and the page cache for a shmem file, for example. There is a VM_BUG_ON below which checks page->index is correct after taking the page lock. Link: https://lkml.kernel.org/r/20210818144932.940640-1-willy@infradead.org Fixes: 5c211ba2 ("mm: add and use find_lock_entries") Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reported-by: <syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Ma Wupeng 提交于
mm: oom_kill: fix KABI broken by "oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup" hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP CVE: NA ------------------------------- Move oom_reaper_timer from task_struct to task_struct_resvd to fix KABI broken. Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: NNanyong Sun <sunnanyong@huawei.com> Reviewed-by: Nchenhui <judy.chenhui@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Nico Pache 提交于
mainline inclusion from mainline-v5.18-rc4 commit e4a38402 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e4a38402c36e42df28eb1a5394be87e6571fb48a -------------------------------- The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: NJoel Savitz <jsavitz@redhat.com> Signed-off-by: NNico Pache <npache@redhat.com> Co-developed-by: NJoel Savitz <jsavitz@redhat.com> Suggested-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: NNanyong Sun <sunnanyong@huawei.com> Reviewed-by: Nchenhui <judy.chenhui@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Hugh Dickins 提交于
mainline inclusion from mainline-v5.18-rc3 commit 1bdec44b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1bdec44b1eee32e311b44b5b06144bb7d9b33938 -------------------------------- Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change. Whilst nfsd_splice_action() does contain some questionable handling of repeated pages, and Chuck was able to work around there, history from Mark Hemment makes clear that there might be similar dangers elsewhere: it was not a good idea for me to pass ZERO_PAGE down to unknown actors. Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when iter_is_iovec(); in other cases, use the more natural iov_iter_zero() instead of copy_page_to_iter(). We would use iov_iter_zero() throughout, but the x86 clear_user() is not nearly so well optimized as copy to user (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds). And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)): which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com Fixes: 56a8c8eb ("tmpfs: do not allocate pages on read") Signed-off-by: NHugh Dickins <hughd@google.com> Reported-by: NPatrice CHOTARD <patrice.chotard@foss.st.com> Reported-by: NChuck Lever III <chuck.lever@oracle.com> Tested-by: NChuck Lever III <chuck.lever@oracle.com> Cc: Mark Hemment <markhemm@googlemail.com> Cc: Patrice CHOTARD <patrice.chotard@foss.st.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: NNanyong Sun <sunnanyong@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Hugh Dickins 提交于
mainline inclusion from mainline-v5.18-rc1 commit 56a8c8eb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=56a8c8eb1eaf21261be8cdc4e3715239ac087342 -------------------------------- Mikulas asked in "Do we still need commit a0ee5ec5 ('tmpfs: allocate on read when stacked')?" in [1] Lukas noticed this unusual behavior of loop device backed by tmpfs in [2]. Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes; but if it looks like it might be a read for "a stacking filesystem", it allocates actual pages to the page cache, and even marks them as dirty. And reads from the loop device do satisfy the test that is used. This oddity was added for an old version of unionfs, to help to limit its usage to the limited size of the tmpfs mount involved; but about the same time as the tmpfs mod went in (2.6.25), unionfs was reworked to proceed differently; and the mod kept just in case others needed it. Do we still need it? I cannot answer with more certainty than "Probably not". It's nasty enough that we really should try to delete it; but if a regression is reported somewhere, then we might have to revert later. It's not quite as simple as just removing the test (as Mikulas did): xfstests generic/013 hung because splice from tmpfs failed on page not up-to-date and page mapping unset. That can be fixed just by marking the ZERO_PAGE as Uptodate, which of course it is: do so in pagecache_init() - it might be useful to others than tmpfs. My intention, though, was to stop using the ZERO_PAGE here altogether: surely iov_iter_zero() is better for this case? Sadly not: it relies on clear_user(), and the x86 clear_user() is slower than its copy_user() [3]. But while we are still using the ZERO_PAGE, let's stop dirtying its struct page cacheline with unnecessary get_page() and put_page(). Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/ [1] Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2] Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/ [3] Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.comSigned-off-by: NHugh Dickins <hughd@google.com> Reported-by: NMikulas Patocka <mpatocka@redhat.com> Reported-by: NLukas Czerner <lczerner@redhat.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Zdenek Kabelac <zkabelac@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 04 1月, 2023 2 次提交
-
-
由 Liu Shixin 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA CVE: NA -------------------------------- Since free_pages_prepare() will clear the PagePool without lock in free_page_to_dhugetlb_pool() and free_page_list_to_dhugetlb_pool(), it is unreliable to check whether a page is freed by PagePool in hpool_merge_page(). Move free_pages_prepare() after ClearPagePool(), which can guarantee all allocated page has PagePool flag. Fixes: 71197c63 ("mm/dynamic_hugetlb: free pages to dhugetlb_pool") Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Liu Shixin 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66OCA CVE: NA -------------------------------- The percpu pool will be cleared by clear_percpu_pools(), and then check whether all pages are already freed. If some pages are not freed, we will firstly isolate the freed pages and then migrate the used pages. Since we missed to get lock of percpu_pool, the used pages can be free to percpu_pool while the isolation is going. In such case, the list operation will be unreliable. To fix this problem, we need to get all related locks sequentially and clear the perpcu_pool again before isolate the freed pages. Fixes: cdbeee51 ("mm/dynamic_hugetlb: add migration function") Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 13 12月, 2022 3 次提交
-
-
由 Liu Shixin 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I641XX CVE: NA -------------------------------- Patch 1378a5ee ("mm: store compound_nr as well as compound_order") add a new member compound_nr in struct page, and use this new member insteal of compound_order in hugetlb_cgroup_move_parent() to compute the nr_pages. In free_hugepage_to_hugetlb(), we reset page->mapping to NULL for each subpage. Since page->mapping and page->compound_nr is union, we reset page->compound_nr too unexpectly. This will finally result the nr_pages incorrect in hugetlb_cgroup_move_parent() and can't release hugetlb_cgroup. Fix this problem by reset page->compound_nr using set_compound_order(). Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NNanyong Sun <sunnanyong@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 David Hildenbrand 提交于
stable inclusion from stable-v5.10.140 commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef -------------------------------- commit f96f7a40 upstream. Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2. I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two case where we could currently get such faults and would erroneously map an anon page into a shared mapping. Reproducers part of the patches. I propose to backport both fixes to stable trees. The first fix needs a small adjustment. This patch (of 2): Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable. Looks like we were wrong: -------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h> #define HUGETLB_SIZE (2 * 1024 * 1024u) static void clear_softdirty(void) { int fd = open("/proc/self/clear_refs", O_WRONLY); const char *ctrl = "4"; int ret; if (fd < 0) { fprintf(stderr, "open(clear_refs) failed\n"); exit(1); } ret = write(fd, ctrl, strlen(ctrl)); if (ret != strlen(ctrl)) { fprintf(stderr, "write(clear_refs) failed\n"); exit(1); } close(fd); } int main(int argc, char **argv) { char *map; int fd; fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; } map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; } *map = 0; if (mprotect(map, HUGETLB_SIZE, PROT_READ)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } clear_softdirty(); if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } *map = 0; return 0; } -------------------------------------------------------------------------- Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped) And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking. Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Peter Feiner <pfeiner@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Miaohe Lin 提交于
stable inclusion from stable-v5.10.140 commit c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c7c77185fa3e9f8c3358426c2584a5b1dc1fdf0f -------------------------------- [ Upstream commit a44f89dc ] It's more recommended to use helper function migration_entry_to_page() to get the page via migration entry. We can also enjoy the PageLocked() check there. Link: https://lkml.kernel.org/r/20210318122722.13135-7-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Thomas Hellstrm (Intel) <thomas_os@shipmail.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: yuleixzhang <yulei.kernel@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 07 12月, 2022 2 次提交
-
-
由 Liu Shixin 提交于
stable inclusion from stable-v5.10.150 commit 45c33966759ea1b4040c08dacda99ef623c0ca29 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I62WRY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=45c33966759ea1b4040c08dacda99ef623c0ca29 -------------------------------- commit 958f32ce upstream. The vma_lock and hugetlb_fault_mutex are dropped before handling userfault and reacquire them again after handle_userfault(), but reacquire the vma_lock could lead to UAF[1,2] due to the following race, hugetlb_fault hugetlb_no_page /*unlock vma_lock */ hugetlb_handle_userfault handle_userfault /* unlock mm->mmap_lock*/ vm_mmap_pgoff do_mmap mmap_region munmap_vma_range /* clean old vma */ /* lock vma_lock again <--- UAF */ /* unlock vma_lock */ Since the vma_lock will unlock immediately after hugetlb_handle_userfault(), let's drop the unneeded lock and unlock in hugetlb_handle_userfault() to fix the issue. [1] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/ [2] https://lore.kernel.org/linux-mm/20220921014457.1668-1-liuzixian4@huawei.com/ Link: https://lkml.kernel.org/r/20220923042113.137273-1-liushixin2@huawei.com Fixes: 1a1aad8a ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reported-by: syzbot+193f9cee8638750b23cf@syzkaller.appspotmail.com Reported-by: NLiu Zixian <liuzixian4@huawei.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: mm/hugetlb.c Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Qi Zheng 提交于
Offering: HULK mainline inclusion from mainline-v5.19-rc1 commit 3f913fc5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I610B5 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3f913fc5f9745613088d3c569778c9813ab9c129 -------------------------------- We expect no warnings to be issued when we specify __GFP_NOWARN, but currently in paths like alloc_pages() and kmalloc(), there are still some warnings printed, fix it. But for some warnings that report usage problems, we don't deal with them. If such warnings are printed, then we should fix the usage problems. Such as the following case: WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); [zhengqi.arch@bytedance.com: v2] Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.comSigned-off-by: NQi Zheng <zhengqi.arch@bytedance.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflict: mm/internal.h mm/page_alloc.c Signed-off-by: NYe Weihua <yeweihua4@huawei.com> Reviewed-by: NKuohai Xu <xukuohai@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 29 11月, 2022 1 次提交
-
-
由 Feiyang Chen 提交于
LoongArch inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OHOB -------------------------------- Add sparse memory vmemmap support for LoongArch. SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise pfn_to_page and page_to_pfn operations. This is the most efficient option when sufficient kernel resources are available. Signed-off-by: NMin Zhou <zhoumin@loongson.cn> Signed-off-by: NFeiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: NHuacai Chen <chenhuacai@loongson.cn>
-
- 18 11月, 2022 1 次提交
-
-
由 Miaohe Lin 提交于
stable inclusion from stable-v5.10.137 commit 4ffa6cecb53d46af8f869cc7a5a376341ebef79f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4ffa6cecb53d46af8f869cc7a5a376341ebef79f -------------------------------- [ Upstream commit 7f82f922 ] Since the beginning, charged is set to 0 to avoid calling vm_unacct_memory twice because vm_unacct_memory will be called by above unmap_region. But since commit 4f74d2c8 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces"), unmap_region doesn't call vm_unacct_memory anymore. So charged shouldn't be set to 0 now otherwise the calling to paired vm_unacct_memory will be missed and leads to imbalanced account. Link: https://lkml.kernel.org/r/20220618082027.43391-1-linmiaohe@huawei.com Fixes: 4f74d2c8 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 10 11月, 2022 2 次提交
-
-
由 Jaewon Kim 提交于
stable inclusion from stable-v5.10.135 commit 2670f76a563124478d0d14e603b38b73b99c389c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2670f76a563124478d0d14e603b38b73b99c389c -------------------------------- commit 9282012f upstream. There was a report that a task is waiting at the throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was increasing. This is a bug where zone_watermark_fast returns true even when the free is very low. The commit f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast") changed the watermark fast to consider highatomic reserve. But it did not handle a negative value case which can be happened when reserved_highatomic pageblock is bigger than the actual free. If watermark is considered as ok for the negative value, allocating contexts for order-0 will consume all free pages without direct reclaim, and finally free page may become depleted except highatomic free. Then allocating contexts may fall into throttle_direct_reclaim. This symptom may easily happen in a system where wmark min is low and other reclaimers like kswapd does not make free pages quickly. Handle the negative case by using MIN. Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com Fixes: f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast") Signed-off-by: NJaewon Kim <jaewon31.kim@samsung.com> Reported-by: NGyeongHwan Hong <gh21.hong@samsung.com> Acked-by: NMel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kerenl.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Wang Cheng 提交于
stable inclusion from stable-v5.10.134 commit ddb3f0b68863bd1c5f43177eea476bce316d4993 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ddb3f0b68863bd1c5f43177eea476bce316d4993 -------------------------------- commit 018160ad upstream. mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when pol->mode is MPOL_LOCAL. Check pol->mode before access pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c). BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline] BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368 mpol_rebind_policy mm/mempolicy.c:352 [inline] mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368 cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline] cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278 cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515 cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline] cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804 __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520 cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539 cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852 kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0x1318/0x2030 fs/read_write.c:590 ksys_write+0x28b/0x510 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:524 [inline] slab_alloc_node mm/slub.c:3251 [inline] slab_alloc mm/slub.c:3259 [inline] kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264 mpol_new mm/mempolicy.c:293 [inline] do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853 kernel_set_mempolicy mm/mempolicy.c:1504 [inline] __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline] __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507 __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae KMSAN: uninit-value in mpol_rebind_task (2) https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc This patch seems to fix below bug too. KMSAN: uninit-value in mpol_rebind_mm (2) https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy(). When syzkaller reproducer runs to the beginning of mpol_new(), mpol_new() mm/mempolicy.c do_mbind() mm/mempolicy.c kernel_mbind() mm/mempolicy.c `mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags` is 0. Then mode = MPOL_LOCAL; ... policy->mode = mode; policy->flags = flags; will be executed. So in mpol_set_nodemask(), mpol_set_nodemask() mm/mempolicy.c do_mbind() kernel_mbind() pol->mode is 4 (MPOL_LOCAL), that `nodemask` in `pol` is not initialized, which will be accessed in mpol_rebind_policy(). Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomainSigned-off-by: NWang Cheng <wanngchenng@gmail.com> Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com> Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 02 11月, 2022 3 次提交
-
-
由 Baolin Wang 提交于
mainline inclusion from mainline-v6.1-rc1 commit fac35ba7 category: bugfix bugzilla: 187864, https://gitee.com/src-openeuler/kernel/issues/I5X1Z9 CVE: CVE-2022-3623 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fac35ba763ed07ba93154c95ffc0c4a55023707f -------------------------------- On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified. So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock. That means the pte entry of the CONT-PTE size hugetlb under current pte lock is unstable in follow_page_pte(), we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause some potential race issues, even though they are under the 'pte lock'. For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by move_pages() syscall under the lock, however antoher thread B can migrate the CONT-PTE hugetlb page at the same time, which will cause thread A to get an incorrect page, if thread A also wants to do page migration, then data inconsistency error occurs. Moreover we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd(). To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to get the correct pte entry lock to make the pte entry stable. Mike said: Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") I would go with 5480280d for the Fixes: targe. Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: mm/hugetlb.c Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: Nanyong Sun<sunnanyong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Alistair Popple 提交于
mainline inclusion from mainline-v6.1-rc1 commit 16ce101d category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5VZ0L CVE: CVE-2022-3523 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16ce101db85db694a91380aa4c89b25530871d33 -------------------------------- Patch series "Fix several device private page reference counting issues", v2 This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed. During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory. This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those so any help there would be appreciated. The changes mimic what is done in for both Nouveau and hmm-tests though so I doubt they will cause problems. This patch (of 8): When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages(). Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not. [mpe@ellerman.id.au: fix build] Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.comSigned-off-by: NAlistair Popple <apopple@nvidia.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Lyude Paul <lyude@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: arch/powerpc/kvm/book3s_hv_uvmem.c include/linux/migrate.h lib/test_hmm.c mm/migrate.c Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Gowans, James 提交于
stable inclusion from stable-v5.10.132 commit 931dbcc2e02f0409c095b11e35490cade9ac14af category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=931dbcc2e02f0409c095b11e35490cade9ac14af -------------------------------- commit 14c99d65 upstream. Currently the implementation will split the PUD when a fallback is taken inside the create_huge_pud function. This isn't where it should be done: the splitting should be done in wp_huge_pud, just like it's done for PMDs. Reason being that if a callback is taken during create, there is no PUD yet so nothing to split, whereas if a fallback is taken when encountering a write protection fault there is something to split. It looks like this was the original intention with the commit where the splitting was introduced, but somehow it got moved to the wrong place between v1 and v2 of the patch series. Rebase mistake perhaps. Link: https://lkml.kernel.org/r/6f48d622eb8bce1ae5dd75327b0b73894a2ec407.camel@amazon.com Fixes: 327e9fd4 ("mm: Split huge pages on write-notify or COW") Signed-off-by: NJames Gowans <jgowans@amazon.com> Reviewed-by: NThomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 18 10月, 2022 1 次提交
-
-
由 Jann Horn 提交于
stable inclusion from stable-v5.10.141 commit 98f401d36396134c0c86e9e3bd00b6b6b028b521 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5USOP CVE: CVE-2022-42703 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=98f401d36396134c0c86e9e3bd00b6b6b028b521 -------------------------------- commit 2555283e upstream. anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma. anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, meaning that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma. This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone() - an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA. This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks. Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma. Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy") Cc: stable@kernel.org Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NMa Wupeng <mawupeng1@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 27 9月, 2022 3 次提交
-
-
由 Jann Horn 提交于
stable inclusion from stable-v5.10.142 commit 895428ee124ad70b9763259308354877b725c31d category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PE9S CVE: CVE-2022-39188 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=895428ee124ad70b9763259308354877b725c31d -------------------------------- commit b67fbebd upstream. Some drivers rely on having all VMAs through which a PFN might be accessible listed in the rmap for correctness. However, on X86, it was possible for a VMA with stale TLB entries to not be listed in the rmap. This was fixed in mainline with commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"), but that commit relies on preceding refactoring in commit 18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation") and commit 1e9fdf21 ("mmu_gather: Remove per arch tlb_{start,end}_vma()"). This patch provides equivalent protection without needing that refactoring, by forcing a TLB flush between removing PTEs in unmap_vmas() and the call to unlink_file_vma() in free_pgtables(). [This is a stable-specific rewrite of the upstream commit!] Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Nze zuo <zuoze1@huawei.com> Reviewed-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Mike Kravetz 提交于
stable inclusion from stable-v5.10.121 commit 63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf -------------------------------- commit 48381273 upstream. The routine huge_pmd_unshare() is passed a pointer to an address associated with an area which may be unshared. If unshare is successful this address is updated to 'optimize' callers iterating over huge page addresses. For the optimization to work correctly, address should be updated to the last huge page in the unmapped/unshared area. However, in the common case where the passed address is PUD_SIZE aligned, the address is incorrectly updated to the address of the preceding huge page. That wastes CPU cycles as the unmapped/unshared range is scanned twice. Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com Fixes: 39dde65c ("shared page table for hugetlb page") Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMuchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Rei Yamamoto 提交于
stable inclusion from stable-v5.10.121 commit 7994d890123a6cad033f2842ff0177a9bda1cb23 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7994d890123a6cad033f2842ff0177a9bda1cb23 -------------------------------- commit bbe832b9 upstream. At present, pages not in the target zone are added to cc->migratepages list in isolate_migratepages_block(). As a result, pages may migrate between nodes unintentionally. This would be a serious problem for older kernels without commit a984226f ("mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec"), because it can corrupt the lru list by handling pages in list without holding proper lru_lock. Avoid returning a pfn outside the target zone in the case that it is not aligned with a pageblock boundary. Otherwise isolate_migratepages_block() will handle pages not in the target zone. Link: https://lkml.kernel.org/r/20220511044300.4069-1-yamamoto.rei@jp.fujitsu.com Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: NRei Yamamoto <yamamoto.rei@jp.fujitsu.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Acked-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: Don Dutile <ddutile@redhat.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 20 9月, 2022 2 次提交
-
-
由 liubo 提交于
euleros inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A CVE: NA ------------------------------------------------- etmem, the memory vertical expansion technology, The existing memory expansion tool etmem swaps out all pages that can be swapped out for the process by default, unless the page is marked with lock flag. The function of swapping out specified pages is added. The process adds VM_SWAPFLAG flags for pages to be swapped out. The etmem adds filters to the scanning module and swaps out only these pages. Signed-off-by: Nliubo <liubo254@huawei.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 liubo 提交于
euleros inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A CVE: NA ------------------------------------------------- etmem, the memory vertical expansion technology, In the current etmem process, memory page swapping is implemented by invoking shrink_page_list. When this interface is invoked for the first time, pages are added to the swap cache and written to disks.The swap cache page is reclaimed only when this interface is invoked for the second time and no process accesses the page.However, in the etmem process, the user mode scans pages that have been accessed, and the migration is not delivered to pages that are not accessed by processes. Therefore, the swap cache may always be occupied. To solve the preceding problem, add the logic for actively reclaiming the swap cache.When the swap cache occupies a large amount of memory, the system proactively scans the LRU linked list and reclaims the swap cache to save memory within the specified range. Signed-off-by: Nliubo <liubo254@huawei.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-