- 17 10月, 2020 15 次提交
-
-
由 David Hildenbrand 提交于
On the memory onlining path, we want to start with MIGRATE_ISOLATE, to un-isolate the pages after memory onlining is complete. Let's allow passing in the migratetype. Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Mel Gorman <mgorman@techsingularity.net> Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
offline_pages() is the only user. __offline_isolated_pages() never gets called with ranges that contain memory holes and we no longer care about the return value. Drop the return value handling and all pfn_valid() checks. Update the documentation. Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
memory_failure() is supposed to call action_result() when it handles a memory error event, but there's one missing case. So let's add it. I find that include/ras/ras_event.h has some other MF_MSG_* undefined, so this patch also adds them. Signed-off-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-13-osalvador@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
This patch changes the way we set and handle in-use poisoned pages. Until now, poisoned pages were released to the buddy allocator, trusting that the checks that take place at allocation time would act as a safe net and would skip that page. This has proved to be wrong, as we got some pfn walkers out there, like compaction, that all they care is the page to be in a buddy freelist. Although this might not be the only user, having poisoned pages in the buddy allocator seems a bad idea as we should only have free pages that are ready and meant to be used as such. Before explaining the taken approach, let us break down the kind of pages we can soft offline. - Anonymous THP (after the split, they end up being 4K pages) - Hugetlb - Order-0 pages (that can be either migrated or invalited) * Normal pages (order-0 and anon-THP) - If they are clean and unmapped page cache pages, we invalidate then by means of invalidate_inode_page(). - If they are mapped/dirty, we do the isolate-and-migrate dance. Either way, do not call put_page directly from those paths. Instead, we keep the page and send it to page_handle_poison to perform the right handling. page_handle_poison sets the HWPoison flag and does the last put_page. Down the chain, we placed a check for HWPoison page in free_pages_prepare, that just skips any poisoned page, so those pages do not end up in any pcplist/freelist. After that, we set the refcount on the page to 1 and we increment the poisoned pages counter. If we see that the check in free_pages_prepare creates trouble, we can always do what we do for free pages: - wait until the page hits buddy's freelists - take it off, and flag it The downside of the above approach is that we could race with an allocation, so by the time we want to take the page off the buddy, the page has been already allocated so we cannot soft offline it. But the user could always retry it. * Hugetlb pages - We isolate-and-migrate them After the migration has been successful, we call dissolve_free_huge_page, and we set HWPoison on the page if we succeed. Hugetlb has a slightly different handling though. While for non-hugetlb pages we cared about closing the race with an allocation, doing so for hugetlb pages requires quite some additional and intrusive code (we would need to hook in free_huge_page and some other places). So I decided to not make the code overly complicated and just fail normally if the page we allocated in the meantime. We can always build on top of this. As a bonus, because of the way we handle now in-use pages, we no longer need the put-as-isolation-migratetype dance, that was guarding for poisoned pages to end up in pcplists. Signed-off-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
When trying to soft-offline a free page, we need to first take it off the buddy allocator. Once we know is out of reach, we can safely flag it as poisoned. take_page_off_buddy will be used to take a page meant to be poisoned off the buddy allocator. take_page_off_buddy calls break_down_buddy_pages, which splits a higher-order page in case our page belongs to one. Once the page is under our control, we call page_handle_poison to set it as poisoned and grab a refcount on it. Signed-off-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
After commit 4e41a30c ("mm: hwpoison: adjust for new thp refcounting"), put_hwpoison_page got reduced to a put_page. Let us just use put_page instead. Signed-off-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-7-osalvador@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
Since get_hwpoison_page is only used in memory-failure code now, let us un-export it and make it private to that code. Signed-off-by: NOscar Salvador <osalvador@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-5-osalvador@suse.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Reimplement page_cache_sync_readahead() and page_cache_async_readahead() as wrappers around versions of the function which take a readahead_control in preparation for making do_sync_mmap_readahead() pass down an RAC struct. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-8-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Define it in the callers instead of in page_cache_ra_unbounded(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Readahead patches for 5.9/5.10". These are infrastructure for both the THP patchset and for the fscache rewrite, For both pieces of infrastructure being build on top of this patchset, we want the ractl to be available higher in the call-stack. For David's work, he wants to add the 'critical page' to the ractl so that he knows which page NEEDS to be brought in from storage, and which ones are nice-to-have. We might want something similar in block storage too. It used to be simple -- the first page was the critical one, but then mmap added fault-around and so for that usecase, the middle page is the critical one. Anyway, I don't have any code to show that yet, we just know that the lowest point in the callchain where we have that information is do_sync_mmap_readahead() and so the ractl needs to start its life there. For THP, we havew the code that needs it. It's actually the apex patch to the series; the one which finally starts to allocate THPs and present them to consenting filesystems: http://git.infradead.org/users/willy/pagecache.git/commitdiff/798bcf30ab2eff278caad03a9edca74d2f8ae760 This patch (of 8): Allow for a more concise definition of a struct readahead_control. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Eric Biggers <ebiggers@google.com> Cc: David Howells <dhowells@redhat.com> Link: https://lkml.kernel.org/r/20200903140844.14194-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200903140844.14194-3-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The nr_thps counter is to support THPs in the page cache when the filesystem doesn't understand THPs. Eventually it will be removed, but we should still support filesystems which do not understand THPs yet. Move the nr_thp manipulation functions to filemap.h since they're page-cache specific. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Link: https://lkml.kernel.org/r/20200916032717.22917-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The page cache needs to know whether the filesystem supports THPs so that it doesn't send THPs to filesystems which can't handle them. Dave Chinner points out that getting from the page mapping to the filesystem type is too many steps (mapping->host->i_sb->s_type->fs_flags) so cache that information in the address space flags. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Link: https://lkml.kernel.org/r/20200916032717.22917-1-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The implementation of split_page_owner() prefers a count rather than the old order of the page. When we support a variable size THP, we won't have the order at this point, but we will have the number of pages. So change the interface to what the caller and callee would prefer. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NSeongJae Park <sjpark@amazon.de> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.kernel.org/r/20200908195539.25896-4-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future. [akpm@linux-foundation.org: export xas_split_alloc() to modules] [willy@infradead.org: fix xarray split] Link: https://lkml.kernel.org/r/20200910175450.GV6583@casper.infradead.org [willy@infradead.org: fix xarray] Link: https://lkml.kernel.org/r/20201001233943.GW20115@casper.infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Qian Cai <cai@lca.pw> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/r/20200903183029.14930-3-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Fix read-only THP for non-tmpfs filesystems". As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache which occupies 512 consecutive entries. Users are running into this if they enable the READ_ONLY_THP_FOR_FS config option; see https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also reported it here: https://lore.kernel.org/lkml/20200616013309.GB815@lca.pw/ This is a rather intrusive way of fixing the problem, but has the advantage that I've actually been testing it with the THP patches, which means that it sees far more use than it does upstream -- indeed, Song has been entirely unable to reproduce it. It also has the advantage that it removes a few patches from my gargantuan backlog of THP patches. This patch (of 3): This function returns the order of the entry at the index. We need this because there isn't space in the shadow entry to encode its order. [akpm@linux-foundation.org: export xa_get_order to modules] Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Qian Cai <cai@lca.pw> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/r/20200903183029.14930-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200903183029.14930-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 10月, 2020 25 次提交
-
-
由 Suren Baghdasaryan 提交于
Currently __set_oom_adj loops through all processes in the system to keep oom_score_adj and oom_score_adj_min in sync between processes sharing their mm. This is done for any task with more that one mm_users, which includes processes with multiple threads (sharing mm and signals). However for such processes the loop is unnecessary because their signal structure is shared as well. Android updates oom_score_adj whenever a tasks changes its role (background/foreground/...) or binds to/unbinds from a service, making it more/less important. Such operation can happen frequently. We noticed that updates to oom_score_adj became more expensive and after further investigation found out that the patch mentioned in "Fixes" introduced a regression. Using Pixel 4 with a typical Android workload, write time to oom_score_adj increased from ~3.57us to ~362us. Moreover this regression linearly depends on the number of multi-threaded processes running on the system. Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj update should be synchronized between multiple processes. To prevent races between clone() and __set_oom_adj(), when oom_score_adj of the process being cloned might be modified from userspace, we use oom_adj_mutex. Its scope is changed to global. The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for the case of vfork(). To prevent performance regressions of vfork(), we skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is specified. Clearing the MMF_MULTIPROCESS flag (when the last process sharing the mm exits) is left out of this patch to keep it simple and because it is believed that this threading model is rare. Should there ever be a need for optimizing that case as well, it can be done by hooking into the exit path, likely following the mm_update_next_owner pattern. With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being quite rare, the regression is gone after the change is applied. [surenb@google.com: v3] Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com Fixes: 44a70ade ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj") Reported-by: NTim Murray <timmurray@google.com> Suggested-by: NMichal Hocko <mhocko@kernel.org> Signed-off-by: NSuren Baghdasaryan <surenb@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NChristian Brauner <christian.brauner@ubuntu.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Christian Kellner <christian@kellner.me> Cc: Adrian Reber <areber@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Cc: Michel Lespinasse <walken@google.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: John Johansen <john.johansen@canonical.com> Cc: Yafang Shao <laoar.shao@gmail.com> Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.comDebugged-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
for_each_memblock() is used to iterate over memblock.memory in a few places that use data from memblock_region rather than the memory ranges. Introduce separate for_each_mem_region() and for_each_reserved_mem_region() to improve encapsulation of memblock internals from its users. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86] Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS] Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format] Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Iteration over memblock.reserved with for_each_reserved_mem_region() used __next_reserved_mem_region() that implemented a subset of __next_mem_region(). Use __for_each_mem_range() and, essentially, __next_mem_region() with appropriate parameters to reduce code duplication. While on it, rename for_each_reserved_mem_region() to for_each_reserved_mem_range() for consistency. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format] Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
The only user of memblock_mem_size() was x86 setup code, it is gone now and memblock_mem_size() funciton can be removed. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-16-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Currently for_each_mem_range() and for_each_mem_range_rev() iterators are the most generic way to traverse memblock regions. As such, they have 8 parameters and they are hardly convenient to users. Most users choose to utilize one of their wrappers and the only user that actually needs most of the parameters is memblock itself. To avoid yet another naming for memblock iterators, rename the existing for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add a new for_each_mem_range[_rev]() wrappers with only index, start and end parameters. The new wrapper nicely fits into init_unavailable_mem() and will be used in upcoming changes to simplify memblock traversals. Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS] Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
The only user of memblock_dbg() outside memblock was s390 setup code and it is converted to use pr_debug() instead. This allows to stop exposing memblock_debug and memblock_dbg() to the rest of the kernel. [akpm@linux-foundation.org: make memblock_dbg() safer and neater] Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
for_each_memblock_type() is not used outside mm/memblock.c, move it there from include/linux/memblock.h Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBaoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-9-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
No one use this macro anymore. Also fix code style of policy_node(). Signed-off-by: NWei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200921021401.84508-1-richard.weiyang@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mateusz Nosek 提交于
The enum value 'COMPACT_INACTIVE' is never used so can be removed. Signed-off-by: NMateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200917110750.12015-1-mateusznosek0@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
There is a general understanding that GFP_ATOMIC/GFP_NOWAIT are to be used from atomic contexts. E.g. from within a spin lock or from the IRQ context. This is correct but there are some atomic contexts where the above doesn't hold. One of them would be an NMI context. Page allocator has never supported that and the general fear of this context didn't let anybody to actually even try to use the allocator there. Good, but let's be more specific about that. Another such a context, and that is where people seem to be more daring, is raw_spin_lock. Mostly because it simply resembles regular spin lock which is supported by the allocator and there is not any implementation difference with !RT kernels in the first place. Be explicit that such a context is not supported by the allocator. The underlying reason is that zone->lock would have to become raw_spin_lock as well and that has turned out to be a problem for RT (http://lkml.kernel.org/r/87mu305c1w.fsf@nanos.tec.linutronix.de). Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NUladzislau Rezki <urezki@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20200929123010.5137-1-mhocko@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mateusz Nosek 提交于
Previously 'for_next_zone_zonelist_nodemask' macro parameter 'zlist' was unused so this patch removes it. Signed-off-by: NMateusz Nosek <mateusznosek0@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200917211906.30059-1-mateusznosek0@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
Let's document what ZONE_MOVABLE means, how it's used, and which special cases we have regarding unmovable pages (memory offlining vs. migration / allocations). Signed-off-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200816125333.7434-7-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Patricia Alfonso 提交于
Integrate KASAN into KUnit testing framework. - Fail tests when KASAN reports an error that is not expected - Use KUNIT_EXPECT_KASAN_FAIL to expect a KASAN error in KASAN tests - Expected KASAN reports pass tests and are still printed when run without kunit_tool (kunit_tool still bypasses the report due to the test passing) - KUnit struct in current task used to keep track of the current test from KASAN code Make use of "[PATCH v3 kunit-next 1/2] kunit: generalize kunit_resource API beyond allocated resources" and "[PATCH v3 kunit-next 2/2] kunit: add support for named resources" from Alan Maguire [1] - A named resource is added to a test when a KASAN report is expected - This resource contains a struct for kasan_data containing booleans representing if a KASAN report is expected and if a KASAN report is found [1] (https://lore.kernel.org/linux-kselftest/1583251361-12748-1-git-send-email-alan.maguire@oracle.com/T/#t) Signed-off-by: NPatricia Alfonso <trishalfonso@google.com> Signed-off-by: NDavid Gow <davidgow@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NBrendan Higgins <brendanhiggins@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Shuah Khan <shuah@kernel.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200915035828.570483-3-davidgow@google.com Link: https://lkml.kernel.org/r/20200910070331.3358048-3-davidgow@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Patricia Alfonso 提交于
Patch series "KASAN-KUnit Integration", v14. This patchset contains everything needed to integrate KASAN and KUnit. KUnit will be able to: (1) Fail tests when an unexpected KASAN error occurs (2) Pass tests when an expected KASAN error occurs Convert KASAN tests to KUnit with the exception of copy_user_test because KUnit is unable to test those. Add documentation on how to run the KASAN tests with KUnit and what to expect when running these tests. This patch (of 5): In order to integrate debugging tools like KASAN into the KUnit framework, add KUnit struct to the current task to keep track of the current KUnit test. Signed-off-by: NPatricia Alfonso <trishalfonso@google.com> Signed-off-by: NDavid Gow <davidgow@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NAndrey Konovalov <andreyknvl@google.com> Reviewed-by: NBrendan Higgins <brendanhiggins@google.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Shuah Khan <shuah@kernel.org> Link: https://lkml.kernel.org/r/20200915035828.570483-1-davidgow@google.com Link: https://lkml.kernel.org/r/20200915035828.570483-2-davidgow@google.com Link: https://lkml.kernel.org/r/20200910070331.3358048-1-davidgow@google.com Link: https://lkml.kernel.org/r/20200910070331.3358048-2-davidgow@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 yuleixzhang 提交于
As mincore_huge_pmd() was dropped, remove the declaration from the header file. Signed-off-by: NYulei Zhang <yuleixzhang@tencent.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Link: https://lkml.kernel.org/r/20200922083423.15074-1-yuleixzhang@tencent.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Xu 提交于
Both of the mm pointers are not needed after commit 7a4830c3 ("mm/fork: Pass new vma pointer into copy_page_range()"). Jason Gunthorpe also reported that the ordering of copy_page_range() is odd. Since working at it, reorder the parameters to be logical, by (1) always put the dst_* fields to be before src_* fields, and (2) keep the same type of parameters together. [peterx@redhat.com: further reorder some parameters and line format, per Jason] Link: https://lkml.kernel.org/r/20201002192647.7161-1-peterx@redhat.com [peterx@redhat.com: fix warnings] Link: https://lkml.kernel.org/r/20201006200138.GA6026@xz-x1Reported-by: NKirill A. Shutemov <kirill@shutemov.name> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJason Gunthorpe <jgg@nvidia.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20200930204950.6668-1-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Introduce the new page policy of PF_SECOND which lets us use the normal pageflags generation machinery to create the various DoubleMap manipulation functions. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Link: https://lkml.kernel.org/r/20200629151933.15671-3-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Fix PageDoubleMap". This is a purely theoretical problem for now as none of the filesystems which use PG_private_2 (ie PG_fscache) are being converted at this time, but it's confusing to leave it like this. This patch (of 2): PG_private_2 is defined as being PF_ANY (applicable to tail pages as well as regular & head pages). That means that the first tail page of a double-map page will appear to have Private2 set. Use the Workingset bit instead which is defined as PF_HEAD so any attempt to access the Workingset bit on a tail page will redirect to the head page's Workingset bit. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NZi Yan <ziy@nvidia.com> Link: https://lkml.kernel.org/r/20200629151933.15671-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200629151933.15671-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chinwen Chang 提交于
Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4. Recently, we have observed some janky issues caused by unpleasantly long contention on mmap_lock which is held by smaps_rollup when probing large processes. To address the problem, we let smaps_rollup detect if anyone wants to acquire mmap_lock for write attempts. If yes, just release the lock temporarily to ease the contention. smaps_rollup is a procfs interface which allows users to summarize the process's memory usage without the overhead of seq_* calls. Android uses it to sample the memory usage of various processes to balance its memory pool sizes. If no one wants to take the lock for write requests, smaps_rollup with this patch will behave like the original one. Although there are on-going mmap_lock optimizations like range-based locks, the lock applied to smaps_rollup would be the coarse one, which is hard to avoid the occurrence of aforementioned issues. So the detection and temporary release for write attempts on mmap_lock in smaps_rollup is still necessary. This patch (of 3): Add new API to query if someone wants to acquire mmap_lock for write attempts. Using this instead of rwsem_is_contended makes it more tolerant of future changes to the lock type. Signed-off-by: NChinwen Chang <chinwen.chang@mediatek.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NSteven Price <steven.price@arm.com> Acked-by: NMichel Lespinasse <walken@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Huang Ying <ying.huang@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jimmy Assarsson <jimmyassarsson@gmail.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/1597715898-3854-1-git-send-email-chinwen.chang@mediatek.com Link: http://lkml.kernel.org/r/1597715898-3854-2-git-send-email-chinwen.chang@mediatek.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
We account the PTE level of the page tables to the process in order to make smarter OOM decisions and help diagnose why memory is fragmented. For these same reasons, we should account pages allocated for PMDs. With larger process address spaces and ASLR, the number of PMDs in use is higher than it used to be so the inaccuracy is starting to matter. [rppt@linux.ibm.com: arm: __pmd_free_tlb(): call page table destructor] Link: https://lkml.kernel.org/r/20200825111303.GB69694@linux.ibm.comSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Link: http://lkml.kernel.org/r/20200627184642.GF25039@casper.infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
The swap page counter is v2 only while memsw is v1 only. As v1 and v2 controllers cannot be active at the same time, there is no point to keep both swap and memsw page counters in mem_cgroup. The previous patch has made sure that memsw page counter is updated and accessed only when in v1 code paths. So it is now safe to alias the v1 memsw page counter to v2 swap page counter. This saves 14 long's in the size of mem_cgroup. This is a saving of 112 bytes for 64-bit archs. While at it, also document which page counters are used in v1 and/or v2. Signed-off-by: NWaiman Long <longman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Link: https://lkml.kernel.org/r/20200914024452.19167-4-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
enable_swap_slots_cache() always return zero and its return value is just ignored by the caller. So make enable_swap_slots_cache() void. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200924113554.50614-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
We don't initially add anon pages to active lruvec after commit b518154e ("mm/vmscan: protect the workingset on anonymous LRU"). Remove activate_page() from unuse_pte(), which seems to be missed by the commit. And make the function static while we are at it. Before the commit, we called lru_cache_add_active_or_unevictable() to add new ksm pages to active lruvec. Therefore, activate_page() wasn't necessary for them in the first place. Signed-off-by: NYu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gao Xiang 提交于
SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS for now. Otherwise it will directly submit IO to blockdev according to swapfile extents reported by filesystems in advance. As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's rename to SWP_FS_OPS. [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.orgSuggested-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NGao Xiang <hsiangkao@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yafang Shao 提交于
Our users reported that there're some random latency spikes when their RT process is running. Finally we found that latency spike is caused by FADV_DONTNEED. Which may call lru_add_drain_all() to drain LRU cache on remote CPUs, and then waits the per-cpu work to complete. The wait time is uncertain, which may be tens millisecond. That behavior is unreasonable, because this process is bound to a specific CPU and the file is only accessed by itself, IOW, there should be no pagecache pages on a per-cpu pagevec of a remote CPU. That unreasonable behavior is partially caused by the wrong comparation of the number of invalidated pages and the number of the target. For example, if (count < (end_index - start_index + 1)) The count above is how many pages were invalidated in the local CPU, and (end_index - start_index + 1) is how many pages should be invalidated. The usage of (end_index - start_index + 1) is incorrect, because they are virtual addresses, which may not mapped to pages. Besides that, there may be holes between start and end. So we'd better check whether there are still pages on per-cpu pagevec after drain the local cpu, and then decide whether or not to call lru_add_drain_all(). After I applied it with a hotfix to our production environment, most of the lru_add_drain_all() can be avoided. Suggested-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20200923133318.14373-1-laoar.shao@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-