- 15 3月, 2022 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Straightforward conversion. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Tested-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
-
- 18 10月, 2021 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
This nets us 178 bytes of savings from removing calls to compound_head. The three callers all grow a little, but each of them will be converted to use folios soon, so that's fine. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Howells <dhowells@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz>
-
- 21 8月, 2021 1 次提交
-
-
由 Yang Shi 提交于
Due to the change about how block layer detects congestion the justification of commit 8fd2e0b5 ("mm: swap: check if swap backing device is congested or not") doesn't stand anymore, so the commit could be just reverted in order to solve the race reported by commit 2efa33fc ("mm/shmem: fix shmem_swapin() race with swapoff"). The fix was reverted by the previous patch. Link: https://lkml.kernel.org/r/20210810202936.2672-3-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Suggested-by: NHugh Dickins <hughd@google.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 6月, 2021 4 次提交
-
-
由 Huang Ying 提交于
To check whether all pages and shadow entries in swap cache has been removed before swap cache is freed. Link: https://lkml.kernel.org/r/20210608005121.511140-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
With commit 09854ba9 ("mm: do_wp_page() simplification"), after COW, the idle swap cache page (neither the page nor the corresponding swap entry is mapped by any process) will be left in the LRU list, even if it's in the active list or the head of the inactive list. So, the page reclaimer may take quite some overhead to reclaim these actually unused pages. To help the page reclaiming, in this patch, after COW, the idle swap cache page will be tried to be freed. To avoid to introduce much overhead to the hot COW code path, a) there's almost zero overhead for non-swap case via checking PageSwapCache() firstly. b) the page lock is acquired via trylock only. To test the patch, we used pmbench memory accessing benchmark with working-set larger than available memory on a 2-socket Intel server with a NVMe SSD as swap device. Test results shows that the pmbench score increases up to 23.8% with the decreased size of swap cache and swapin throughput. Link: https://lkml.kernel.org/r/20210601053143.1380078-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> [use free_swap_cache()] Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
Since commit 55c653b71e8c ("mm: stop accounting shadow entries"), nr_shadows is not used anymore. Link: https://lkml.kernel.org/r/20210520134022.1370406-3-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Miaohe Lin 提交于
The non_swap_entry() was used for working with VMA based swap readahead via commit ec560175 ("mm, swap: VMA based swap readahead"). At that time, the non_swap_entry() checking is necessary because the function is called before checking that in do_swap_page(). Then it's moved to swap_ra_info() since commit eaf649eb ("mm: swap: clean up swap readahead"). After that, the non_swap_entry() checking is unnecessary, because swap_ra_info() is called after non_swap_entry() has been checked already. The resulting code is confusing as the non_swap_entry() check looks racy now because while we released the pte lock, somebody else might have faulted in this pte. So we should check whether it's swap pte first to guard against such race or swap_type will be unexpected. But the race isn't important because it will not cause problem. We would have enough checking when we really operate the PTE entries later. So we remove the non_swap_entry() check here to avoid confusion. Link: https://lkml.kernel.org/r/20210426123316.806267-4-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: N"Huang, Ying" <ying.huang@intel.com> Cc: Alex Shi <alexs@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 5月, 2021 1 次提交
-
-
由 Shijie Luo 提交于
fix some typos and code style problems in mm. gfp.h: s/MAXNODES/MAX_NUMNODES mmzone.h: s/then/than rmap.c: s/__vma_split()/__vma_adjust() swap.c: s/__mod_zone_page_stat/__mod_zone_page_state, s/is is/is swap_state.c: s/whoes/whose z3fold.c: code style problem fix in z3fold_unregister_migration zsmalloc.c: s/of/or, s/give/given Link: https://lkml.kernel.org/r/20210419083057.64820-1-luoshijie1@huawei.comSigned-off-by: NShijie Luo <luoshijie1@huawei.com> Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 5月, 2021 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
We no longer need to keep track of how many shadow entries are present in a mapping. This saves a few writes to the inode and memory barriers. Link: https://lkml.kernel.org/r/20201026151849.24232-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Tested-by: NVishal Verma <vishal.l.verma@intel.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 5月, 2021 1 次提交
-
-
由 Shakeel Butt 提交于
Currently the kernel adds the page, allocated for swapin, to the swapcache before charging the page. This is fine but now we want a per-memcg swapcache stat which is essential for folks who wants to transparently migrate from cgroup v1's memsw to cgroup v2's memory and swap counters. In addition charging a page before exposing it to other parts of the kernel is a step in the right direction. To correctly maintain the per-memcg swapcache stat, this patch has adopted to charge the page before adding it to swapcache. One challenge in this option is the failure case of add_to_swap_cache() on which we need to undo the mem_cgroup_charge(). Specifically undoing mem_cgroup_uncharge_swap() is not simple. To resolve the issue, this patch decouples the charging for swapin pages from mem_cgroup_charge(). Two new functions are introduced, mem_cgroup_swapin_charge_page() for just charging the swapin page and mem_cgroup_swapin_uncharge_swap() for uncharging the swap slot once the page has been successfully added to the swapcache. [shakeelb@google.com: set page->private before calling swap_readpage] Link: https://lkml.kernel.org/r/20210318015959.2986837-1-shakeelb@google.com Link: https://lkml.kernel.org/r/20210305212639.775498-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NHugh Dickins <hughd@google.com> Tested-by: NHeiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2021 2 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
The functionality of find_lock_entry() and find_get_entry() can be provided by pagecache_get_page(), which lets us delete find_lock_entry() and make find_get_entry() static. Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
There's no need to get a reference to the page, just load the entry and see if it's a shadow entry. Link: https://lkml.kernel.org/r/20201112212641.27837-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2021 3 次提交
-
-
由 Shakeel Butt 提交于
This patch adds swapcache stat for the cgroup v2. The swapcache represents the memory that is accounted against both the memory and the swap limit of the cgroup. The main motivation behind exposing the swapcache stat is for enabling users to gracefully migrate from cgroup v1's memsw counter to cgroup v2's memory and swap counters. Cgroup v1's memsw limit allows users to limit the memory+swap usage of a workload but without control on the exact proportion of memory and swap. Cgroup v2 provides separate limits for memory and swap which enables more control on the exact usage of memory and swap individually for the workload. With some little subtleties, the v1's memsw limit can be switched with the sum of the v2's memory and swap limits. However the alternative for memsw usage is not yet available in cgroup v2. Exposing per-cgroup swapcache stat enables that alternative. Adding the memory usage and swap usage and subtracting the swapcache will approximate the memsw usage. This will help in the transparent migration of the workloads depending on memsw usage and limit to v2' memory and swap counters. The reasons these applications are still interested in this approximate memsw usage are: (1) these applications are not really interested in two separate memory and swap usage metrics. A single usage metric is more simple to use and reason about for them. (2) The memsw usage metric hides the underlying system's swap setup from the applications. Applications with multiple instances running in a datacenter with heterogeneous systems (some have swap and some don't) will keep seeing a consistent view of their usage. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Link: https://lkml.kernel.org/r/20210108155813.2914586-3-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
We are capable of SetPageWorkingset based on refault distances after commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU"). This is done by workingset_refault(), which is right above the unconditional SetPageWorkingset deleted by this patch. The unconditional SetPageWorkingset miscategorizes pages that are read ahead or never belonged to the working set (e.g., tmpfs pages accessed only once by fd). When those pages are swapped in (after they were swapped out) for the first time, they skew PSI (when using async swap). When this happens again, depending on their refault distances, they might skew workingset_restore_anon counter in addition to PSI because their shadows indicate they were part of the working set. Historically, SetPageWorkingset was added as part of the PSI series, and Johannes said: "It was meant to mark incoming pages under IO with SetPageWorkingset when waiting for them constituted a memory stall. On the page cache side, because we HAVE workingset detection, this was specific to recently evicted pages that had been active in their previous life. On the anon side, the aging algorithm had no distinction between workingset and sporadically used pages. Given the choice between a) no swapin stalls are pressure and b) all swapin stalls are pressure, I went with the latter in order to detect swap storms. The false positive case - high rate of swapin without severe memory pressure - was relatively unlikely, because we tried to avoid swapping until everything was completely on fire in the first place." Link: https://lkml.kernel.org/r/20201209012400.1771150-1-yuzhao@google.com Link: https://lkml.kernel.org/r/20201214231253.62313-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rikard Falkeborn 提交于
The only usage of swap_attr_group is to pass its address to sysfs_create_group() which takes a pointer to const attribute_group. Make it const to allow the compiler to put it in read-only memory. Link: https://lkml.kernel.org/r/20210201233254.91809-1-rikard.falkeborn@gmail.comSigned-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com> Reviewed-by: NAmy Parker <enbyamy@gmail.com> Acked-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2020 2 次提交
-
-
由 Joe Perches 提交于
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2. Use the new sysfs_emit family and not the sprintf family. This patch (of 5): Use the sysfs_emit function instead of the sprintf family. Done with cocci script as in commit 3c6bff3c ("RDMA: Convert sysfs kobject * show functions to use sysfs_emit()") Link: https://lkml.kernel.org/r/cover.1605376435.git.joe@perches.com Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
swap_ra_info() may leave ra_info untouched in non_swap_entry() case as page table lock is not held. In this case, we have ra_info.nr_pte == 0 and it is meaningless to continue with swap cache readahead. Skip such ops by init ra_info.win = 1. [akpm@linux-foundation.org: clean up struct init] Link: https://lkml.kernel.org/r/20201009133059.58407-1-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 10月, 2020 1 次提交
-
-
由 Miaohe Lin 提交于
Fix some broken comments including typo, grammar error and wrong function name. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 10月, 2020 3 次提交
-
-
由 Gao Xiang 提交于
SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS for now. Otherwise it will directly submit IO to blockdev according to swapfile extents reported by filesystems in advance. As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's rename to SWP_FS_OPS. [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.orgSuggested-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NGao Xiang <hsiangkao@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Matthew Wilcox (Oracle) 提交于
There are only four callers remaining of find_get_entry(). get_shadow_from_swap_cache() only wants to see shadow entries and doesn't care about which page is returned. Push the find_subpage() call into find_lock_entry(), find_get_incore_page() and pagecache_get_page(). [willy@infradead.org: fix oops] Link: https://lkml.kernel.org/r/20200914112738.GM6583@casper.infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-7-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Return head pages from find_*_entry", v2. This patch series started out as part of the THP patch set, but it has some nice effects along the way and it seems worth splitting it out and submitting separately. Currently find_get_entry() and find_lock_entry() return the page corresponding to the requested index, but the first thing most callers do is find the head page, which we just threw away. As part of auditing all the callers, I found some misuses of the APIs and some plain inefficiencies that I've fixed. The diffstat is unflattering, but I added more kernel-doc and a new wrapper. This patch (of 8); Provide this functionality from the swap cache. It's useful for more than just mincore(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 8月, 2020 2 次提交
-
-
由 Qian Cai 提交于
swap_cache_info.* could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101: lookup_swap_cache+0x12e/0x460 lookup_swap_cache at mm/swap_state.c:322 do_swap_page+0x112/0xeb0 __handle_mm_fault+0xc7a/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100: lookup_swap_cache+0x117/0x460 lookup_swap_cache at mm/swap_state.c:322 shmem_swapin_page+0xc7/0x9e0 shmem_getpage_gfp+0x2ca/0x16c0 shmem_fault+0xef/0x3c0 __do_fault+0x9e/0x220 do_fault+0x4a0/0x920 __handle_mm_fault+0xc69/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G W O L 5.5.0-next-20200204+ #6 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87: __delete_from_swap_cache+0x681/0x8b0 __delete_from_swap_cache at mm/swap_state.c:178 read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53: __delete_from_swap_cache+0x66e/0x8b0 __delete_from_swap_cache at mm/swap_state.c:178 Both the read and write are done as lockless. Since swap_cache_info.* are only used to print out counter information, even if any of them missed a few incremental due to data races, it will be harmless, so just mark it as an intentional data race using the data_race() macro. While at it, fix a checkpatch.pl warning, WARNING: Single statement macros should not use a do {} while (0) loop Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Matthew Wilcox (Oracle) 提交于
The thp prefix is more frequently used than hpage and we should be consistent between the various functions. [akpm@linux-foundation.org: fix mm/migrate.c] Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2020 2 次提交
-
-
由 Joonsoo Kim 提交于
This patch implements workingset detection for anonymous LRU. All the infrastructure is implemented by the previous patches so this patch just activates the workingset detection by installing/retrieving the shadow entry and adding refault calculation. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Link: http://lkml.kernel.org/r/1595490560-15117-6-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
Workingset detection for anonymous page will be implemented in the following patch and it requires to store the shadow entries into the swapcache. This patch implements an infrastructure to store the shadow entry in the swapcache. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/1595490560-15117-5-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 8月, 2020 1 次提交
-
-
由 Krzysztof Kozlowski 提交于
Fix W=1 compile warnings (invalid kerneldoc): mm/swap_state.c:742: warning: Function parameter or member 'fentry' not described in 'swap_vma_readahead' mm/swap_state.c:742: warning: Excess function parameter 'entry' description in 'swap_vma_readahead' Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200728171109.28687-2-krzk@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 6月, 2020 1 次提交
-
-
由 Hugh Dickins 提交于
Chris Murphy reports that a slightly overcommitted load, testing swap and zram along with i915, splats and keeps on splatting, when it had better fail less noisily: gnome-shell: page allocation failure: order:0, mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1 Call Trace: dump_stack+0x64/0x88 warn_alloc.cold+0x75/0xd9 __alloc_pages_slowpath.constprop.0+0xcfa/0xd30 __alloc_pages_nodemask+0x2df/0x320 alloc_slab_page+0x195/0x310 allocate_slab+0x3c5/0x440 ___slab_alloc+0x40c/0x5f0 __slab_alloc+0x1c/0x30 kmem_cache_alloc+0x20e/0x220 xas_nomem+0x28/0x70 add_to_swap_cache+0x321/0x400 __read_swap_cache_async+0x105/0x240 swap_cluster_readahead+0x22c/0x2e0 shmem_swapin+0x8e/0xc0 shmem_swapin_page+0x196/0x740 shmem_getpage_gfp+0x3a2/0xa60 shmem_read_mapping_page_gfp+0x32/0x60 shmem_get_pages+0x155/0x5e0 [i915] __i915_gem_object_get_pages+0x68/0xa0 [i915] i915_vma_pin+0x3fe/0x6c0 [i915] eb_add_vma+0x10b/0x2c0 [i915] i915_gem_do_execbuffer+0x704/0x3430 [i915] i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915] drm_ioctl_kernel+0x86/0xd0 [drm] drm_ioctl+0x206/0x390 [drm] ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported on 5.7, but it goes back really to 3.1: when shmem_read_mapping_page_gfp() was implemented for use by i915, and allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but missed swapin's "& GFP_KERNEL" mask for page tree node allocation in __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits from what page cache uses, but GFP_RECLAIM_MASK is now what's needed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085 Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils Fixes: 68da9f05 ("tmpfs: pass gfp to shmem_getpage_gfp") Signed-off-by: NHugh Dickins <hughd@google.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reported-by: NChris Murphy <lists@colorremedies.com> Analyzed-by: NVlastimil Babka <vbabka@suse.cz> Analyzed-by: NMatthew Wilcox <willy@infradead.org> Tested-by: NChris Murphy <lists@colorremedies.com> Cc: <stable@vger.kernel.org> [3.1+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 6月, 2020 2 次提交
-
-
由 Michel Lespinasse 提交于
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2020 5 次提交
-
-
由 Johannes Weiner 提交于
The VM tries to balance reclaim pressure between anon and file so as to reduce the amount of IO incurred due to the memory shortage. It already counts refaults and swapins, but in addition it should also count writepage calls during reclaim. For swap, this is obvious: it's IO that wouldn't have occurred if the anonymous memory hadn't been under memory pressure. From a relative balancing point of view this makes sense as well: even if anon is cold and reclaimable, a cache that isn't thrashing may have equally cold pages that don't require IO to reclaim. For file writeback, it's trickier: some of the reclaim writepage IO would have likely occurred anyway due to dirty expiration. But not all of it - premature writeback reduces batching and generates additional writes. Since the flushers are already woken up by the time the VM starts writing cache pages one by one, let's assume that we'e likely causing writes that wouldn't have happened without memory pressure. In addition, the per-page cost of IO would have probably been much cheaper if written in larger batches from the flusher thread rather than the single-page-writes from kswapd. For our purposes - getting the trend right to accelerate convergence on a stable state that doesn't require paging at all - this is sufficiently accurate. If we later wanted to optimize for sustained thrashing, we can still refine the measurements. Count all writepage calls from kswapd as IO cost toward the LRU that the page belongs to. Why do this dynamically? Don't we know in advance that anon pages require IO to reclaim, and so could build in a static bias? First, scanning is not the same as reclaiming. If all the anon pages are referenced, we may not swap for a while just because we're scanning the anon list. During this time, however, it's important that we age anonymous memory and the page cache at the same rate so that their hot-cold gradients are comparable. Everything else being equal, we still want to reclaim the coldest memory overall. Second, we keep copies in swap unless the page changes. If there is swap-backed data that's mostly read (tmpfs file) and has been swapped out before, we can reclaim it without incurring additional IO. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Since the LRUs were split into anon and file lists, the VM has been balancing between page cache and anonymous pages based on per-list ratios of scanned vs. rotated pages. In most cases that tips page reclaim towards the list that is easier to reclaim and has the fewest actively used pages, but there are a few problems with it: 1. Refaults and LRU rotations are weighted the same way, even though one costs IO and the other costs a bit of CPU. 2. The less we scan an LRU list based on already observed rotations, the more we increase the sampling interval for new references, and rotations become even more likely on that list. This can enter a death spiral in which we stop looking at one list completely until the other one is all but annihilated by page reclaim. Since commit a528910e ("mm: thrash detection-based file cache sizing") we have refault detection for the page cache. Along with swapin events, they are good indicators of when the file or anon list, respectively, is too small for its workingset and needs to grow. For example, if the page cache is thrashing, the cache pages need more time in memory, while there may be colder pages on the anonymous list. Likewise, if swapped pages are faulting back in, it indicates that we reclaim anonymous pages too aggressively and should back off. Replace LRU rotations with refaults and swapins as the basis for relative reclaim cost of the two LRUs. This will have the VM target list balances that incur the least amount of IO on aggregate. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200520232525.798933-12-hannes@cmpxchg.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
They're the same function, and for the purpose of all callers they are equivalent to lru_cache_add(). [akpm@linux-foundation.org: fix it for local_lock changes] Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NRik van Riel <riel@surriel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMinchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Swapin faults were the last event to charge pages after they had already been put on the LRU list. Now that we charge directly on swapin, the lrucare portion of the charge code is unused. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Right now, users that are otherwise memory controlled can easily escape their containment and allocate significant amounts of memory that they're not being charged for. That's because swap readahead pages are not being charged until somebody actually faults them into their page table. This can be exploited with MADV_WILLNEED, which triggers arbitrary readahead allocations without charging the pages. There are additional problems with the delayed charging of swap pages: 1. To implement refault/workingset detection for anonymous pages, we need to have a target LRU available at swapin time, but the LRU is not determinable until the page has been charged. 2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be stable when the page is isolated from the LRU; otherwise, the locks change under us. But swapcache gets charged after it's already on the LRU, and even if we cannot isolate it ourselves (since charging is not exactly optional). The previous patch ensured we always maintain cgroup ownership records for swap pages. This patch moves the swapcache charging point from the fault handler to swapin time to fix all of the above problems. v2: simplify swapin error checking (Joonsoo) [hughd@google.com: fix livelock in __read_swap_cache_async()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvilsSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 6月, 2020 1 次提交
-
-
由 Qian Cai 提交于
"prev_offset" is a static variable in swapin_nr_pages() that can be accessed concurrently with only mmap_sem held in read mode as noticed by KCSAN, BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17: swap_cluster_readahead+0x2a6/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by (dnf)/14795: #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 do_user_addr_fault at arch/x86/mm/fault.c:1405 (inlined by) do_page_fault at arch/x86/mm/fault.c:1535 irq event stamp: 83493 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22: swap_cluster_readahead+0xfd/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by systemd/1: #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 irq event stamp: 43530289 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2020 1 次提交
-
-
由 Wei Yang 提交于
add_to_swap_cache() and delete_from_swap_cache() are counterparts, while currently they use different ways to count pages. It doesn't break anything because we only have two sizes for PageAnon, but this is confusing and not good practice. This patch corrects it by making both functions use hpage_nr_pages(). Signed-off-by: NWei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Link: http://lkml.kernel.org/r/20200315012920.2687-1-richard.weiyang@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 9月, 2019 2 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
Transparent Huge Pages are currently stored in i_pages as pointers to consecutive subpages. This patch changes that to storing consecutive pointers to the head page in preparation for storing huge pages more efficiently in i_pages. Large parts of this are "inspired" by Kirill's patch https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/ Kirill and Huang Ying contributed several fixes. [willy@infradead.org: use compound_nr, squish uninit-var warning] Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NJan Kara <jack@suse.cz> Reviewed-by: NKirill Shutemov <kirill@shutemov.name> Reviewed-by: NSong Liu <songliubraving@fb.com> Tested-by: NSong Liu <songliubraving@fb.com> Tested-by: NWilliam Kucharski <william.kucharski@oracle.com> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com> Tested-by: NQian Cai <cai@lca.pw> Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability. Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 7月, 2019 2 次提交
-
-
由 Huang Ying 提交于
total_swapcache_pages() may race with swapper_spaces[] allocation and freeing. Previously, this is protected with a swapper_spaces[] specific RCU mechanism. To simplify the logic/code complexity, it is replaced with get/put_swap_device(). The code line number is reduced too. Although not so important, the swapoff() performance improves too because one synchronize_rcu() call during swapoff() is deleted. [ying.huang@intel.com: fix bad swap file entry warning] Link: http://lkml.kernel.org/r/20190531024102.21723-1-ying.huang@intel.com Link: http://lkml.kernel.org/r/20190527082714.12151-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
When swapin is performed, after getting the swap entry information from the page table, system will swap in the swap entry, without any lock held to prevent the swap device from being swapoff. This may cause the race like below, CPU 1 CPU 2 ----- ----- do_swap_page swapin_readahead __read_swap_cache_async swapoff swapcache_prepare p->swap_map = NULL __swap_duplicate p->swap_map[?] /* !!! NULL pointer access */ Because swapoff is usually done when system shutdown only, the race may not hit many people in practice. But it is still a race need to be fixed. To fix the race, get_swap_device() is added to check whether the specified swap entry is valid in its swap device. If so, it will keep the swap entry valid via preventing the swap device from being swapoff, until put_swap_device() is called. Because swapoff() is very rare code path, to make the normal path runs as fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of reference count is used to implement get/put_swap_device(). >From get_swap_device() to put_swap_device(), RCU reader side is locked, so synchronize_rcu() in swapoff() will wait until put_swap_device() is called. In addition to swap_map, cluster_info, etc. data structure in the struct swap_info_struct, the swap cache radix tree will be freed after swapoff, so this patch fixes the race between swap cache looking up and swapoff too. Races between some other swap cache usages and swapoff are fixed too via calling synchronize_rcu() between clearing PageSwapCache() and freeing swap cache data structure. Another possible method to fix this is to use preempt_off() + stop_machine() to prevent the swap device from being swapoff when its data structure is being accessed. The overhead in hot-path of both methods is similar. The advantages of RCU based method are, 1. stop_machine() may disturb the normal execution code path on other CPUs. 2. File cache uses RCU to protect its radix tree. If the similar mechanism is used for swap cache too, it is easier to share code between them. 3. RCU is used to protect swap cache in total_swapcache_pages() and exit_swap_address_space() already. The two mechanisms can be merged to simplify the logic. Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com Fixes: 235b6217 ("mm/swap: add cluster lock") Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com> Not-nacked-by: NHugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-