- 14 10月, 2020 40 次提交
-
-
由 Muchun Song 提交于
In the cgroup v1, we have a numa_stat interface. This is useful for providing visibility into the numa locality information within an memcg since the pages are allowed to be allocated from any physical node. One of the use cases is evaluating application performance by combining this information with the application's CPU allocation. But the cgroup v2 does not. So this patch adds the missing information. Suggested-by: NShakeel Butt <shakeelb@google.com> Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Zefan Li <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20200916100030.71698-2-songmuchun@bytedance.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
The swap page counter is v2 only while memsw is v1 only. As v1 and v2 controllers cannot be active at the same time, there is no point to keep both swap and memsw page counters in mem_cgroup. The previous patch has made sure that memsw page counter is updated and accessed only when in v1 code paths. So it is now safe to alias the v1 memsw page counter to v2 swap page counter. This saves 14 long's in the size of mem_cgroup. This is a saving of 112 bytes for 64-bit archs. While at it, also document which page counters are used in v1 and/or v2. Signed-off-by: NWaiman Long <longman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Link: https://lkml.kernel.org/r/20200914024452.19167-4-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
mem_cgroup_get_max() used to get memory+swap max from both the v1 memsw and v2 memory+swap page counters & return the maximum of these 2 values. This is redundant and it is more efficient to just get either the v1 or the v2 values depending on which one is currently in use. [longman@redhat.com: v4] Link: https://lkml.kernel.org/r/20200914150928.7841-1-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <laoar.shao@gmail.com> Link: https://lkml.kernel.org/r/20200914024452.19167-3-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
Patch series "mm/memcg: Miscellaneous cleanups and streamlining", v2. This patch (of 3): Since commit 0a31bc97 ("mm: memcontrol: rewrite uncharge API") and commit 00501b53 ("mm: memcontrol: rewrite charge API") in v3.17, the enum charge_type was no longer used anywhere. However, the enum itself was not removed at that time. Remove the obsolete enum charge_type now. Signed-off-by: NWaiman Long <longman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NChris Down <chris@chrisdown.name> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Yafang Shao <laoar.shao@gmail.com> Link: https://lkml.kernel.org/r/20200914024452.19167-1-longman@redhat.com Link: https://lkml.kernel.org/r/20200914024452.19167-2-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
Since commit bbec2e15 ("mm: rename page_counter's count/limit into usage/max"), the arg @reclaim has no priority field anymore. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: https://lkml.kernel.org/r/20200913094129.44558-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roman Gushchin 提交于
mem_cgroup_from_obj() checks the lowest bit of the page->mem_cgroup pointer to determine if the page has an attached obj_cgroup vector instead of a regular memcg pointer. If it's not set, it simple returns the page->mem_cgroup value as a struct mem_cgroup pointer. The commit 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") changed the moment when this bit is set: if previously it was set on the allocation of the slab page, now it can be set well after, when the first accounted object is allocated on this page. It opened a race: if page->mem_cgroup is set concurrently after the first page_has_obj_cgroups(page) check, a pointer to the obj_cgroups array can be returned as a memory cgroup pointer. A simple check for page->mem_cgroup pointer for NULL before the page_has_obj_cgroups() check fixes the race. Indeed, if the pointer is not NULL, it's either a simple mem_cgroup pointer or a pointer to obj_cgroup vector. The pointer can be asynchronously changed from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed from a valid memcg pointer to objcg vector or back. If the object passed to mem_cgroup_from_obj() is a slab object and page->mem_cgroup is NULL, it means that the object is not accounted, so the function must return NULL. I've discovered the race looking at the code, so far I haven't seen it in the wild. Fixes: 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") Signed-off-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lkml.kernel.org/r/20200910022435.2773735-1-guro@fb.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gustavo A. R. Silva 提交于
Use the preferred form for passing the size of a structure type. The alternative form where the structure type is spelled out hurts readability and introduces an opportunity for a bug when the object type is changed but the corresponding object identifier to which the sizeof operator is applied is not. Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: https://lkml.kernel.org/r/773e013ff2f07fe2a0b47153f14dea054c0c04f1.1596214831.git.gustavoars@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gustavo A. R. Silva 提交于
Make use of the flex_array_size() helper to calculate the size of a flexible array member within an enclosing structure. This helper offers defense-in-depth against potential integer overflows, while at the same time makes it explicitly clear that we are dealing with a flexible array member. Also, remove unnecessary braces. Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: https://lkml.kernel.org/r/ddd60dae2d9aea1ccdd2be66634815c93696125e.1596214831.git.gustavoars@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ira Weiny 提交于
While reviewing Protection Key Supervisor support it was pointed out that using a counter to track static branch enable was an anti-pattern which was better solved using the provided static_branch_{inc,dec} functions.[1] Fix up devmap_managed_key to work the same way. Also this should be safer because there is a very small (very unlikely) race when multiple callers try to enable at the same time. [1] https://lore.kernel.org/lkml/20200714194031.GI5523@worktop.programming.kicks-ass.net/Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Link: https://lkml.kernel.org/r/20200810235319.2796597-1-ira.weiny@intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
If we failed to drain inode, we would forget to free the swap address space allocated by init_swap_address_space() above. Fixes: dc617f29 ("vfs: don't allow writes to swap files") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Link: https://lkml.kernel.org/r/20200930101803.53884-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
It's unnecessary to goto the out label while out label is just below. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200930102549.1885-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
Since commit 9c4e6b1a ("mm, mlock, vmscan: no more skipping pagevecs"), unevictable pages do not goes directly back onto zone's unevictable list. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Shakeel Butt <shakeelb@google.com> Link: https://lkml.kernel.org/r/20200927122209.59328-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
The out label is only used in one place and return ret directly without something like resource cleanup or lock release and so on. So we should remove this jump label and do some cleanup. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200927124032.22521-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
enable_swap_slots_cache() always return zero and its return value is just ignored by the caller. So make enable_swap_slots_cache() void. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200924113554.50614-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
Since commit 07d80269 ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages"), we have renamed the func put_devmap_managed_page() to page_is_devmap_managed(). Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Link: https://lkml.kernel.org/r/20200905084453.19353-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
To activate a page, mark_page_accessed() always holds a reference on it. It either gets a new reference when adding a page to lru_pvecs.activate_page or reuses an existing one it previously got when it added a page to lru_pvecs.lru_add. So it doesn't call SetPageActive() on a page that doesn't have any reference left. Therefore, the race is impossible these days (I didn't brother to dig into its history). For other paths, namely reclaim and migration, a reference count is always held while calling SetPageActive() on a page. SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to LRU pages. Signed-off-by: NYu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
We don't initially add anon pages to active lruvec after commit b518154e ("mm/vmscan: protect the workingset on anonymous LRU"). Remove activate_page() from unuse_pte(), which seems to be missed by the commit. And make the function static while we are at it. Before the commit, we called lru_cache_add_active_or_unevictable() to add new ksm pages to active lruvec. Therefore, activate_page() wasn't necessary for them in the first place. Signed-off-by: NYu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NYang Shi <shy828301@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gao Xiang 提交于
SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS for now. Otherwise it will directly submit IO to blockdev according to swapfile extents reported by filesystems in advance. As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's rename to SWP_FS_OPS. [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.orgSuggested-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NGao Xiang <hsiangkao@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Hubbard 提交于
As suggested by Dan Carpenter, fortify unpin_user_pages() just a bit, against a typical caller mistake: check if the npages arg is really a -ERRNO value, which would blow up the unpinning loop: WARN and return. If this new WARN_ON() fires, then the system *might* be leaking pages (by leaving them pinned), but probably not. More likely, gup/pup returned a hard -ERRNO error to the caller, who erroneously passed it here. Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Link: https://lkml.kernel.org/r/20200917065706.409079-1-jhubbard@nvidia.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Barry Song 提交于
gup prohibits users from calling get_user_pages() with FOLL_PIN. But it allows users to call get_user_pages() with FOLL_LONGTERM only. It seems insensible. Since FOLL_LONGTERM is a stricter case of FOLL_PIN, we should prohibit users from calling get_user_pages() with FOLL_LONGTERM while not with FOLL_PIN. mm/gup_benchmark.c used to be the only user who did this improperly. But it has been fixed by moving to use pin_user_pages(). [akpm@linux-foundation.org: fix CONFIG_MMU=n build] Link: https://lkml.kernel.org/r/CA+G9fYuNS3k0DVT62twfV746pfNhCSrk5sVMcOcQ1PGGnEseyw@mail.gmail.comSigned-off-by: NBarry Song <song.bao.hua@hisilicon.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: http://lkml.kernel.org/r/20200819110100.23504-1-song.bao.hua@hisilicon.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Barry Song 提交于
According to Documentation/core-api/pin_user_pages.rst, FOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is, FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN. Almost all kernel modules are using pin_user_pages() with FOLL_LONGTERM, mm/gup_benchmark.c seems to the only exception in which FOLL_PIN is not a prerequisite to FOLL_LONGTERM. Signed-off-by: NBarry Song <song.bao.hua@hisilicon.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200815122056.29508-1-song.bao.hua@hisilicon.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Barry Song 提交于
In the beginning, mm/gup_benchmark.c supported get_user_pages_fast() only, but right now, it supports the benchmarking of a couple of get_user_pages() related calls like: * get_user_pages_fast() * get_user_pages() * pin_user_pages_fast() * pin_user_pages() The documentation is confusing and needs update. Signed-off-by: NBarry Song <song.bao.hua@hisilicon.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20200821032546.19992-1-song.bao.hua@hisilicon.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yafang Shao 提交于
Our users reported that there're some random latency spikes when their RT process is running. Finally we found that latency spike is caused by FADV_DONTNEED. Which may call lru_add_drain_all() to drain LRU cache on remote CPUs, and then waits the per-cpu work to complete. The wait time is uncertain, which may be tens millisecond. That behavior is unreasonable, because this process is bound to a specific CPU and the file is only accessed by itself, IOW, there should be no pagecache pages on a per-cpu pagevec of a remote CPU. That unreasonable behavior is partially caused by the wrong comparation of the number of invalidated pages and the number of the target. For example, if (count < (end_index - start_index + 1)) The count above is how many pages were invalidated in the local CPU, and (end_index - start_index + 1) is how many pages should be invalidated. The usage of (end_index - start_index + 1) is incorrect, because they are virtual addresses, which may not mapped to pages. Besides that, there may be holes between start and end. So we'd better check whether there are still pages on per-cpu pagevec after drain the local cpu, and then decide whether or not to call lru_add_drain_all(). After I applied it with a hotfix to our production environment, most of the lru_add_drain_all() can be avoided. Suggested-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20200923133318.14373-1-laoar.shao@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
We dereference page->mapping and page->index directly after calling find_subpage() and these fields are not valid for tail pages. While commit 4101196b ("mm: page cache: store only head pages in i_pages") introduced the call to find_subpage(), the problem existed prior to this; I'm going to suggest all the way back to when THPs first existed. The user-visible effects of this are almost negligible. To hit it, you have to mmap a tmpfs file at an unaligned address and then it's only a disabled optimisation causing page faults to happen more frequently than they otherwise would. Fix this by keeping both head and page pointers and checking the appropriate one. We could use page_mapping() and page_to_index(), but that's higher overhead. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200911012532.24761-1-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Add a new FGP_HEAD flag which avoids calling find_subpage() and add a convenience wrapper for it. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-9-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Convert shmem_getpage_gfp() (the only remaining caller of find_lock_entry()) to cope with a head page being returned instead of the subpage for the index. [willy@infradead.org: fix BUG()s] Link https://lore.kernel.org/linux-mm/20200912032042.GA6583@casper.infradead.org/Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-8-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
There are only four callers remaining of find_get_entry(). get_shadow_from_swap_cache() only wants to see shadow entries and doesn't care about which page is returned. Push the find_subpage() call into find_lock_entry(), find_get_incore_page() and pagecache_get_page(). [willy@infradead.org: fix oops] Link: https://lkml.kernel.org/r/20200914112738.GM6583@casper.infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-7-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
i915 does not want to see value entries. Switch it to use find_lock_page() instead, and remove the export of find_lock_entry(). Move find_lock_entry() and find_get_entry() to mm/internal.h to discourage any future use. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-6-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Avoid bumping the refcount on pages when we're only interested in the swap entries. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-5-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Instead of calling find_get_entry() for every page index, use an XArray iterator to skip over NULL entries, and avoid calling get_page(), because we only want the swap entries. [willy@infradead.org: fix LTP soft lockups] Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Qian Cai <cai@redhat.com> Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The current code does not protect against swapoff of the underlying swap device, so this is a bug fix as well as a worthwhile reduction in code complexity. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200910183318.20139-3-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Patch series "Return head pages from find_*_entry", v2. This patch series started out as part of the THP patch set, but it has some nice effects along the way and it seems worth splitting it out and submitting separately. Currently find_get_entry() and find_lock_entry() return the page corresponding to the requested index, but the first thing most callers do is find the head page, which we just threw away. As part of auditing all the callers, I found some misuses of the APIs and some plain inefficiencies that I've fixed. The diffstat is unflattering, but I added more kernel-doc and a new wrapper. This patch (of 8); Provide this functionality from the swap cache. It's useful for more than just mincore(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Hubbard 提交于
Rename head_pincount() --> head_compound_pincount(). These names are more accurate (or less misleading) than the original ones. Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Link: https://lkml.kernel.org/r/20200807183358.105097-1-jhubbard@nvidia.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
__dump_page() checks i_dentry is fetchable and i_ino is earlier in the struct than i_ino, so it ought to work fine, but it's possible that struct randomisation has reordered i_ino after i_dentry and the pointer is just wild enough that i_dentry is fetchable and i_ino isn't. Also print the inode number if the dentry is invalid. Reported-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Link: https://lkml.kernel.org/r/20200819185710.28180-1-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joao Martins 提交于
Add a sysfs attribute which denotes a range from the dax region to be allocated. It's an write only @mapping sysfs attribute in the format of '<start>-<end>' to allocate a range. @start and @end use hexadecimal values and the @pgoff is implicitly ordered wrt to previous writes to @mapping sysfs e.g. a write of a range of length 1G the pgoff is 0..1G(-4K), a second write will use @pgoff for 1G+4K..<size>. This range mapping interface is useful for: 1) Application which want to implement its own allocation logic, and thus pick the desired ranges from dax_region. 2) For use cases like VMM fast restart[0] where after kexec we want to the same gpa<->phys mappings (as originally created before kexec). [0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdfSigned-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joao Martins 提交于
Introduce a new module parameter for dax_hmem which initializes all region devices as free, rather than allocating a pagemap for the region by default. All hmem devices created with dax_hmem.region_idle=1 will have full available size for creating dynamic dax devices. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
Introduce a device align attribute. While doing so, rename the region align attribute to be more explicitly named as so, but keep it named as @align to retain the API for tools like daxctl. Changes on align may not always be valid, when say certain mappings were created with 2M and then we switch to 1G. So, we validate all ranges against the new value being attempted, post resizing. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joao Martins 提交于
Introduce @align to struct dev_dax. When creating a new device, we still initialize to the default dax_region @align. Child devices belonging to a region may wish to keep a different alignment property instead of a global region-defined one. Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
In support of interrogating the physical address layout of a device with dis-contiguous ranges, introduce a sysfs directory with 'start', 'end', and 'page_offset' attributes. The alternative is trying to parse /proc/iomem, and that file will not reflect the extent layout until the device is enabled. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
Break the requirement that device-dax instances are physically contiguous. With this constraint removed it allows fragmented available capacity to be fully allocated. This capability is useful to mitigate the "noisy neighbor" problem with memory-side-cache management for virtual machines, or any other scenario where a platform address boundary also designates a performance boundary. For example a direct mapped memory side cache might rotate cache colors at 1GB boundaries. With dis-contiguous allocations a device-dax instance could be configured to contain only 1 cache color. It also satisfies Joao's use case (see link) for partitioning memory for exclusive guest access. It allows for a future potential mode where the host kernel need not allocate 'struct page' capacity up-front. Reported-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/ Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-