- 01 5月, 2021 17 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Looking at perf-report and ASM-code for __alloc_pages_bulk() it is clear that the code activated is suboptimal. The compiler guesses wrong and places unlikely code at the beginning. Due to the use of WARN_ON_ONCE() macro the UD2 asm instruction is added to the code, which confuse the I-cache prefetcher in the CPU. [mgorman@techsingularity.net: minor changes and rebasing] Link: https://lkml.kernel.org/r/20210325114228.27719-5-mgorman@techsingularity.netSigned-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NAlexander Lobakin <alobakin@pm.me> Acked-By: NVlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
The proposed callers for the bulk allocator store pages from the bulk allocator in an array. This patch adds an array-based interface to the API to avoid multiple list iterations. The page list interface is preserved to avoid requiring all users of the bulk API to allocate and manage enough storage to store the pages. [akpm@linux-foundation.org: remove now unused local `allocated'] Link: https://lkml.kernel.org/r/20210325114228.27719-4-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NAlexander Lobakin <alobakin@pm.me> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
This patch adds a new page allocator interface via alloc_pages_bulk, and __alloc_pages_bulk_nodemask. A caller requests a number of pages to be allocated and added to a list. The API is not guaranteed to return the requested number of pages and may fail if the preferred allocation zone has limited free memory, the cpuset changes during the allocation or page debugging decides to fail an allocation. It's up to the caller to request more pages in batch if necessary. Note that this implementation is not very efficient and could be improved but it would require refactoring. The intent is to make it available early to determine what semantics are required by different callers. Once the full semantics are nailed down, it can be refactored. [mgorman@techsingularity.net: fix alloc_pages_bulk() return type, per Matthew] Link: https://lkml.kernel.org/r/20210325123713.GQ3697@techsingularity.net [mgorman@techsingularity.net: fix uninit var warning] Link: https://lkml.kernel.org/r/20210330114847.GX3697@techsingularity.net [mgorman@techsingularity.net: fix comment, per Vlastimil] Link: https://lkml.kernel.org/r/20210412110255.GV3697@techsingularity.net Link: https://lkml.kernel.org/r/20210325114228.27719-3-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NAlexander Lobakin <alobakin@pm.me> Tested-by: NColin Ian King <colin.king@canonical.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Patch series "Introduce a bulk order-0 page allocator with two in-tree users", v6. This series introduces a bulk order-0 page allocator with sunrpc and the network page pool being the first users. The implementation is not efficient as semantics needed to be ironed out first. If no other semantic changes are needed, it can be made more efficient. Despite that, this is a performance-related for users that require multiple pages for an operation without multiple round-trips to the page allocator. Quoting the last patch for the high-speed networking use-case Kernel XDP stats CPU pps Delta Baseline XDP-RX CPU total 3,771,046 n/a List XDP-RX CPU total 3,940,242 +4.49% Array XDP-RX CPU total 4,249,224 +12.68% Via the SUNRPC traces of svc_alloc_arg() Single page: 25.007 us per call over 532,571 calls Bulk list: 6.258 us per call over 517,034 calls Bulk array: 4.590 us per call over 517,442 calls Both potential users in this series are corner cases (NFS and high-speed networks) so it is unlikely that most users will see any benefit in the short term. Other potential other users are batch allocations for page cache readahead, fault around and SLUB allocations when high-order pages are unavailable. It's unknown how much benefit would be seen by converting multiple page allocation calls to a single batch or what difference it may make to headline performance. Light testing of my own running dbench over NFS passed. Chuck and Jesper conducted their own tests and details are included in the changelogs. Patch 1 renames a variable name that is particularly unpopular Patch 2 adds a bulk page allocator Patch 3 adds an array-based version of the bulk allocator Patches 4-5 adds micro-optimisations to the implementation Patches 6-7 SUNRPC user Patches 8-9 Network page_pool user This patch (of 9): Review feedback of the bulk allocator twice found problems with "alloced" being a counter for pages allocated. The naming was based on the API name "alloc" and was based on the idea that verbal communication about malloc tends to use the fake word "malloced" instead of the fake word mallocated. To be consistent, this preparation patch renames alloced to allocated in rmqueue_bulk so the bulk allocator and per-cpu allocator use similar names when the bulk allocator is introduced. Link: https://lkml.kernel.org/r/20210325114228.27719-1-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210325114228.27719-2-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NAlexander Lobakin <alobakin@pm.me> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhouchuangao 提交于
linux/vmalloc.h is repeatedly in the file page_alloc.c Link: https://lkml.kernel.org/r/1616468751-80656-1-git-send-email-zhouchuangao@vivo.comSigned-off-by: Nzhouchuangao <zhouchuangao@vivo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kefeng Wang 提交于
The start_pfn and end_pfn are already available in move_freepages_block(), there is no need to go back and forth between page and pfn in move_freepages and move_freepages_block, and pfn_valid_within() should validate pfn first before touching the page. Link: https://lkml.kernel.org/r/20210323131215.934472-1-liushixin2@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Currently, debugging CMA allocation failures is quite limited. The most common source of these failures seems to be page migration which doesn't provide any useful information on the reason of the failure by itself. alloc_contig_range can report those failures as it holds a list of migrate-failed pages. The information logged by dump_page() has already proven helpful for debugging allocation issues, like identifying long-term pinnings on ZONE_MOVABLE or MIGRATE_CMA. Let's use the dynamic debugging infrastructure, such that we avoid flooding the logs and creating a lot of noise on frequent alloc_contig_range() calls. This information is helpful for debugging only. There are two ifdefery conditions to support common dyndbg options: - CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE It aims for supporting the feature with only specific file with adding ccflags. - CONFIG_DYNAMIC_DEBUG It aims for supporting the feature with system wide globally. A simple example to enable the feature: Admin could enable the dump like this(by default, disabled) echo "func alloc_contig_dump_pages +p" > control Admin could disable it. echo "func alloc_contig_dump_pages =_" > control Detail goes Documentation/admin-guide/dynamic-debug-howto.rst A concern is utility functions in dump_page use inconsistent loglevels. In the future, we might want to make the loglevels used inside dump_page() consistent and eventually rework the way we log the information here. See [1]. [1] https://lore.kernel.org/linux-mm/YEh4doXvyuRl5BDB@google.com/ Link: https://lkml.kernel.org/r/20210311194042.825152-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
There are only two callers of __alloc_pages() so prune the thicket of alloc_page variants by combining the two functions together. Current callers of __alloc_pages() simply add an extra 'NULL' parameter and current callers of __alloc_pages_nodemask() call __alloc_pages() instead. Link: https://lkml.kernel.org/r/20210225150642.2582252-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Shorten some overly-long lines by renaming this identifier. Link: https://lkml.kernel.org/r/20210225150642.2582252-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Rationalise __alloc_pages wrappers", v3. I was poking around the __alloc_pages variants trying to understand why they each exist, and couldn't really find a good justification for keeping __alloc_pages and __alloc_pages_nodemask as separate functions. That led to getting rid of alloc_pages_current() and then I noticed the documentation was bad, and then I noticed the mempolicy documentation wasn't included. Anyway, this is all cleanups & doc fixes. This patch (of 7): We have two masks involved -- the nodemask and the gfp mask, so alloc_mask is an unclear name. Link: https://lkml.kernel.org/r/20210225150642.2582252-2-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
__alloc_contig_migrate_range already has lru_add_drain_all call via migrate_prep. It's necessary to move LRU taget pages into LRU list to be able to isolated. However, lru_add_drain_all call after __alloc_contig_migrate_range is pointless since it has changed source page freeing from putback_lru_pages to put_page[1]. This patch removes it. [1] c6c919eb, ("mm: use put_page() to free page instead of putback_lru_page()" Link: https://lkml.kernel.org/r/20210303204512.2863087-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
The information that some PFNs are busy is: a) not helpful for ordinary users: we don't even know *who* called alloc_contig_range(). This is certainly not worth a pr_info.*(). b) not really helpful for debugging: we don't have any details *why* these PFNs are busy, and that is what we usually care about. c) not complete: there are other cases where we fail alloc_contig_range() using different paths that are not getting recorded. For example, we reach this path once we succeeded in isolating pageblocks, but failed to migrate some pages - which can happen easily on ZONE_NORMAL (i.e., has_unmovable_pages() is racy) but also on ZONE_MOVABLE i.e., we would have to retry longer to migrate). For example via virtio-mem when unplugging memory, we can create quite some noise (especially with ZONE_NORMAL) that is not of interest to users - it's expected that some allocations may fail as memory is busy. Let's just drop that pr_info_ratelimit() and rather implement a dynamic debugging mechanism in the future that can give us a better reason why alloc_contig_range() failed on specific pages. Link: https://lkml.kernel.org/r/20210301150945.77012-1-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMinchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kefeng Wang 提交于
mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> [powerpc] Acked-by: NDavid Hildenbrand <david@redhat.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> [sparc64] Acked-by: Russell King <rmk+kernel@armlinux.org.uk> [arm] Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Guo Ren <guoren@kernel.org> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
This change uses the previously added memory initialization feature of HW_TAGS KASAN routines for page_alloc memory when init_on_alloc/free is enabled. With this change, kernel_init_free_pages() is no longer called when both HW_TAGS KASAN and init_on_alloc/free are enabled. Instead, memory is initialized in KASAN runtime. To avoid discrepancies with which memory gets initialized that can be caused by future changes, both KASAN and kernel_init_free_pages() hooks are put together and a warning comment is added. This patch changes the order in which memory initialization and page poisoning hooks are called. This doesn't lead to any side-effects, as whenever page poisoning is enabled, memory initialization gets disabled. Combining setting allocation tags with memory initialization improves HW_TAGS KASAN performance when init_on_alloc/free is enabled. [andreyknvl@google.com: fix for "integrate page_alloc init with HW_TAGS"] Link: https://lkml.kernel.org/r/65b6028dea2e9a6e8e2cb779b5115c09457363fc.1617122211.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/e77f0d5b1b20658ef0b8288625c74c2b3690e725.1615296150.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NMarco Elver <elver@google.com> Tested-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NSergei Trofimovich <slyfox@gentoo.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
During boot, all non-reserved memblock memory is exposed to page_alloc via memblock_free_pages->__free_pages_core(). This results in kasan_free_pages() being called, which poisons that memory. Poisoning all that memory lengthens boot time. The most noticeable effect is observed with the HW_TAGS mode. A boot-time impact may potentially also affect systems with large amount of RAM. This patch changes the tag-based modes to not poison the memory during the memblock->page_alloc transition. An exception is made for KASAN_GENERIC. Since it marks all new memory as accessible, not poisoning the memory released from memblock will lead to KASAN missing invalid boot-time accesses to that memory. With KASAN_SW_TAGS, as it uses the invalid 0xFE tag as the default tag for all memory, it won't miss bad boot-time accesses even if the poisoning of memblock memory is removed. With KASAN_HW_TAGS, the default memory tags values are unspecified. Therefore, if memblock poisoning is removed, this KASAN mode will miss the mentioned type of boot-time bugs with a 1/16 probability. This is taken as an acceptable trafe-off. Internally, the poisoning is removed as follows. __free_pages_core() is used when exposing fresh memory during system boot and when onlining memory during hotplug. This patch adds a new FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through free_pages_prepare() from __free_pages_core(). If FPI_SKIP_KASAN_POISON is set, kasan_free_pages() is not called. All memory allocated normally when the boot is over keeps getting poisoned as usual. Link: https://lkml.kernel.org/r/a0570dc1e3a8f39a55aa343a1fc08cd5c2d4cad6.1613692950.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC enables support on architectures that define HAVE_ARCH_HUGE_VMAP and supports PMD sized vmap mappings. vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or larger, and fall back to small pages if that was unsuccessful. Architectures must ensure that any arch specific vmalloc allocations that require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx) use the VM_NOHUGE flag to inhibit larger mappings. This can result in more internal fragmentation and memory overhead for a given allocation, an option nohugevmalloc is added to disable at boot. [colin.king@canonical.com: fix read of uninitialized pointer area] Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
page_memcg() is not suitable for use by page_expected_state() and page_bad_reason(). Because it can BUG_ON() for the slab pages when CONFIG_DEBUG_VM is enabled. As neither lru, nor kmem, nor slab page should have anything left in there by the time the page is freed, what we care about is whether the value of page->memcg_data is 0. So just directly access page->memcg_data here. Link: https://lkml.kernel.org/r/20210319163821.20704-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 4月, 2021 1 次提交
-
-
由 Kees Cook 提交于
The state of CONFIG_INIT_ON_ALLOC_DEFAULT_ON (and ...ON_FREE...) did not change the assembly ordering of the static branches: they were always out of line. Use the new jump_label macros to check the CONFIG settings to default to the "expected" state, which slightly optimizes the resulting assembly code. Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAlexander Potapenko <glider@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20210401232347.2791257-3-keescook@chromium.org
-
- 14 3月, 2021 3 次提交
-
-
由 Zhou Guanghui 提交于
As described in the split_page() comment, for the non-compound high order page, the sub-pages must be freed individually. If the memcg of the first page is valid, the tail pages cannot be uncharged when be freed. For example, when alloc_pages_exact is used to allocate 1MB continuous physical memory, 2MB is charged(kmemcg is enabled and __GFP_ACCOUNT is set). When make_alloc_exact free the unused 1MB and free_pages_exact free the applied 1MB, actually, only 4KB(one page) is uncharged. Therefore, the memcg of the tail page needs to be set when splitting a page. Michel: There are at least two explicit users of __GFP_ACCOUNT with alloc_exact_pages added recently. See 7efe8ef2 ("KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c4196218 ("KVM: s390: Add memcg accounting to KVM allocations"), so this is not just a theoretical issue. Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.comSigned-off-by: NZhou Guanghui <zhouguanghui1@huawei.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Weilong Chen <chenweilong@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called after debug_pagealloc_unmap_pages(). This causes a crash when debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an unmapped page. This patch puts kasan_free_nondeferred_pages() before debug_pagealloc_unmap_pages() and arch_free_page(), which can also make the page unavailable. Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com Fixes: 94ab5b61 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Signed-off-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields of this page are set to default values and it is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. Before commit 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") the holes inside a zone were re-initialized during memmap_init() and got their zone/node links right. However, after that commit nothing updates the struct pages representing such holes. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); in set_pfnblock_flags_mask() when called with a struct page from a range other than E820_TYPE_RAM because there are pages in the range of ZONE_DMA32 but the unset zone link in struct page makes them appear as a part of ZONE_DMA. Interleave initialization of the unavailable pages with the normal initialization of memory map, so that zone and node information will be properly set on struct pages that are not backed by the actual memory. With this change the pages for holes inside a zone will get proper zone/node links and the pages that are not spanned by any node will get links to the adjacent zone/node. The holes between nodes will be prepended to the zone/node above the hole and the trailing pages in the last section that will be appended to the zone/node below. [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64] Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Reported-by: NQian Cai <cai@lca.pw> Reported-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NBaoquan He <bhe@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Łukasz Majczak <lma@semihalf.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2021 1 次提交
-
-
由 David Hildenbrand 提交于
Let's count the number of CMA pages per zone and print them in /proc/zoneinfo. Having access to the total number of CMA pages per zone is helpful for debugging purposes to know where exactly the CMA pages ended up, and to figure out how many pages of a zone might behave differently, even after some of these pages might already have been allocated. As one example, CMA pages part of a kernel zone cannot be used for ordinary kernel allocations but instead behave more like ZONE_MOVABLE. For now, we are only able to get the global nr+free cma pages from /proc/meminfo and the free cma pages per zone from /proc/zoneinfo. Example after this patch when booting a 6 GiB QEMU VM with "hugetlb_cma=2G": # cat /proc/zoneinfo | grep cma cma 0 nr_free_cma 0 cma 0 nr_free_cma 0 cma 524288 nr_free_cma 493016 cma 0 cma 0 # cat /proc/meminfo | grep Cma CmaTotal: 2097152 kB CmaFree: 1972064 kB Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way, one can be sure when spotting "cma 0", that there are definetly no CMA pages located in a zone. [david@redhat.com: v2] Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com [david@redhat.com: v3] Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2021 8 次提交
-
-
由 David Hildenbrand 提交于
adjust_managed_page_count() as called by free_reserved_page() properly handles pages in a highmem zone, so we can reuse it for free_highmem_page(). We can now get rid of totalhigh_pages_inc() and simplify free_reserved_page(). Link: https://lkml.kernel.org/r/20210126182113.19892-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
Local variable 'zone_start_pfn' is not needed since there's only one call site in free_area_init_core(). Let's remove it and pass zone->zone_start_pfn directly to init_currently_empty_zone(). Link: https://lkml.kernel.org/r/20210122135956.5946-6-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
Parameter 'zone' has got needed information, let's remove other unnecessary parameters. Link: https://lkml.kernel.org/r/20210122135956.5946-5-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
As David suggested, simply passing 'struct zone *zone' is enough. We can get all needed information from 'struct zone*' easily. Link: https://lkml.kernel.org/r/20210122135956.5946-4-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com> Suggested-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
The current memmap_init_zone() only handles memory region inside one zone, actually memmap_init() does the memmap init of one zone. So rename both of them accordingly. Link: https://lkml.kernel.org/r/20210122135956.5946-3-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_PMDMAPPED account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-6-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_THPS account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-5-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_ANON_THPS account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 2月, 2021 1 次提交
-
-
由 Kevin Hao 提交于
In the current implementation of page_frag_alloc(), it doesn't have any align guarantee for the returned buffer address. But for some hardwares they do require the DMA buffer to be aligned correctly, so we would have to use some workarounds like below if the buffers allocated by the page_frag_alloc() are used by these hardwares for DMA. buf = page_frag_alloc(really_needed_size + align); buf = PTR_ALIGN(buf, align); These codes seems ugly and would waste a lot of memories if the buffers are used in a network driver for the TX/RX. So introduce page_frag_alloc_align() to make sure that an aligned buffer address is returned. Signed-off-by: NKevin Hao <haokexin@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 27 1月, 2021 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit d3921cb8. Chris Wilson reports that it causes boot problems: "We have half a dozen or so different machines in CI that are silently failing to boot, that we believe is bisected to this patch" and the CI team confirmed that a revert fixed the issues. The cause is unknown for now, so let's revert it. Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/Reported-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 1月, 2021 2 次提交
-
-
由 Andrey Konovalov 提交于
A previous commit added resetting KASAN page tags to kernel_init_free_pages() to avoid false-positives due to accesses to metadata with the hardware tag-based mode. That commit did reset page tags before the metadata access, but didn't restore them after. As the result, KASAN fails to detect bad accesses to page_alloc allocations on some configurations. Fix this by recovering the tag after the metadata access. Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata") Signed-off-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields if this page are set to default values and the page is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in struct page) in the same pageblock. Update init_unavailable_mem() to use zone constraints defined by an architecture to properly setup the zone link and use node ID of the adjacent range in memblock.memory to set the node link. Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Reported-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 1月, 2021 1 次提交
-
-
由 Hailong liu 提交于
The trace point *trace_mm_page_alloc_zone_locked()* in __rmqueue() does not currently cover all branches. Add the missing tracepoint and check the page before do that. [akpm@linux-foundation.org: use IS_ENABLED() to suppress warning] Link: https://lkml.kernel.org/r/20201228132901.41523-1-carver4lio@163.comSigned-off-by: NHailong liu <liu.hailong6@zte.com.cn> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 12月, 2020 1 次提交
-
-
由 Baoquan He 提交于
VMware observed a performance regression during memmap init on their platform, and bisected to commit 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") causing it. Before the commit: [0.033176] Normal zone: 1445888 pages used for memmap [0.033176] Normal zone: 89391104 pages, LIFO batch:63 [0.035851] ACPI: PM-Timer IO Port: 0x448 With commit [0.026874] Normal zone: 1445888 pages used for memmap [0.026875] Normal zone: 89391104 pages, LIFO batch:63 [2.028450] ACPI: PM-Timer IO Port: 0x448 The root cause is the current memmap defer init doesn't work as expected. Before, memmap_init_zone() was used to do memmap init of one whole zone, to initialize all low zones of one numa node, but defer memmap init of the last zone in that numa node. However, since commit 73a6e474, function memmap_init() is adapted to iterater over memblock regions inside one zone, then call memmap_init_zone() to do memmap init for each region. E.g, on VMware's system, the memory layout is as below, there are two memory regions in node 2. The current code will mistakenly initialize the whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to iniatialize only one memmory section on the 2nd region [mem 0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's only one memory section's memmap initialized. That's why more time is costed at the time. [ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] [ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] [ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff] [ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff] [ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff] [ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff] Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass down the real zone end pfn so that defer_init() can use it to judge whether defer need be taken in zone wide. Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com Fixes: commit 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: NBaoquan He <bhe@redhat.com> Reported-by: NRahul Gopakumar <gopakumarr@vmware.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2020 2 次提交
-
-
由 Andrey Konovalov 提交于
Kernel allocator code accesses metadata for slab objects, that may lie out-of-bounds of the object itself, or be accessed when an object is freed. Such accesses trigger tag faults and lead to false-positive reports with hardware tag-based KASAN. Software KASAN modes disable instrumentation for allocator code via KASAN_SANITIZE Makefile macro, and rely on kasan_enable/disable_current() annotations which are used to ignore KASAN reports. With hardware tag-based KASAN neither of those options are available, as it doesn't use compiler instrumetation, no tag faults are ignored, and MTE is disabled after the first one. Instead, reset tags when accessing metadata (currently only for SLUB). Link: https://lkml.kernel.org/r/a0f3cefbc49f34c843b664110842de4db28179d0.1606161801.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: NMarco Elver <elver@google.com> Reviewed-by: NAlexander Potapenko <glider@google.com> Tested-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vincenzo Frascino 提交于
free_reserved_area() memsets the pages belonging to a given memory area. As that memory hasn't been allocated via page_alloc, the KASAN tags that those pages have are 0x00. As the result the memset might result in a tag mismatch. Untag the address to avoid spurious faults. Link: https://lkml.kernel.org/r/ebef6425f4468d063e2f09c1b62ccbb2236b71d3.1606161801.git.andreyknvl@google.com Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NAlexander Potapenko <glider@google.com> Tested-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2020 2 次提交
-
-
由 Alex Shi 提交于
This patch moves per node lru_lock into lruvec, thus bring a lru_lock for each of memcg per node. So on a large machine, each of memcg don't have to suffer from per node pgdat->lru_lock competition. They could go fast with their self lru_lock. After move memcg charge before lru inserting, page isolation could serialize page's memcg, then per memcg lruvec lock is stable and could replace per node lru lock. In isolate_migratepages_block(), compact_unlock_should_abort and lock_page_lruvec_irqsave are open coded to work with compact_control. Also add a debug func in locking which may give some clues if there are sth out of hands. Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ Hugh Dickins helped on the patch polish, thanks! [alex.shi@linux.alibaba.com: fix comment typo] Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com [alex.shi@linux.alibaba.com: use page_memcg()] Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: NAlex Shi <alex.shi@linux.alibaba.com> Acked-by: NHugh Dickins <hughd@google.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mauro Carvalho Chehab 提交于
Kernel-doc markups should use this format: identifier - description Fix some issues on mm files: 1) The definition for get_user_pages_locked() doesn't follow it. Also, it expects a short descrpition at the header, followed by a long one, after the parameters. Fix it. 2) Kernel-doc requires that a kernel-doc markup to be immediately below the function prototype, as otherwise it will rename it. So, move get_pfnblock_flags_mask() description to the right place. 3) Make invalidate_mapping_pagevec() to also follow the expected kernel-doc format. While here, fix a few minor English syntax issues, as suggested by Matthew: will used -> will be used similar with -> similar to Link: https://lkml.kernel.org/r/80e85dddc92d333bc2159ee8a2294921612e8745.1605521731.git.mchehab+huawei@kernel.orgSigned-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org> Suggested-by: Mattew Wilcox <willy@infradead.org> [English fixes] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-