- 13 December 2016, 40 commits
-
Committed by Christoph Hellwig
The purge_lock spinlock causes high latencies on non-RT kernels. This has been reported multiple times on lkml [1] [2] and affects applications like audio. This patch replaces it with a mutex to allow preemption while holding the lock. Thanks to Joel Fernandes for the detailed report and analysis as well as an earlier attempt at fixing this issue.

[1] http://lists.openwall.net/linux-kernel/2016/03/23/29
[2] https://lkml.org/lkml/2016/10/9/59

Link: http://lkml.kernel.org/r/1479474236-4139-10-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
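A minimal sketch of the conversion, assuming the identifiers mm/vmalloc.c used around this series (illustrative, not a verbatim diff):

    static DEFINE_MUTEX(vmap_purge_lock);  /* was: static DEFINE_SPINLOCK(purge_lock) */

    /* Opportunistic purge: with a mutex the holder may sleep or be
     * preempted, so long lazy-purge runs no longer inflate latency. */
    static void try_purge_vmap_area_lazy(void)
    {
            if (mutex_trylock(&vmap_purge_lock)) {
                    __purge_vmap_area_lazy(ULONG_MAX, 0);
                    mutex_unlock(&vmap_purge_lock);
            }
    }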
-
Committed by Christoph Hellwig
We will take a sleeping lock later in this series, so this adds the proper safeguards.

Link: http://lkml.kernel.org/r/1479474236-4139-9-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrey Ryabinin
vfree() is going to use a sleeping lock. free_ldt_struct() may be called with preemption disabled, therefore we must use vfree_atomic() here. E.g. call trace:

    vfree()
    free_ldt_struct()
    destroy_context_ldt()
    __mmdrop()
    finish_task_switch()
    schedule_tail()
    ret_from_fork()

Link: http://lkml.kernel.org/r/1479474236-4139-7-git-send-email-hch@lst.de
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrey Ryabinin
vfree() is going to use a sleeping lock. The thread stack can be freed in atomic context, therefore we must use vfree_atomic() here.

Link: http://lkml.kernel.org/r/1479474236-4139-6-git-send-email-hch@lst.de
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrey Ryabinin
We are going to use a sleeping lock for freeing vmap. However some vfree() users want to free memory from atomic (but not from interrupt) context. For this we add vfree_atomic() - a deferred variant of vfree() which can be used in any atomic context (except NMIs).

[akpm@linux-foundation.org: tweak comment grammar]
[aryabinin@virtuozzo.com: use raw_cpu_ptr() instead of this_cpu_ptr()]
Link: http://lkml.kernel.org/r/1481553981-3856-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1479474236-4139-5-git-send-email-hch@lst.de
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
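A hedged sketch of the shape of the implementation (structure and names as described in the message, but verify against the tree): the freed address itself is reused as an llist node, so queueing needs no allocation and is safe in atomic context.

    struct vfree_deferred {
            struct llist_head list;
            struct work_struct wq;  /* worker drains the list via vfree() */
    };
    static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);

    void vfree_atomic(const void *addr)
    {
            struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);

            BUG_ON(in_nmi());       /* llist is safe anywhere but NMI */
            if (!addr)
                    return;
            /* Reuse the memory being freed as the list node itself. */
            if (llist_add((struct llist_node *)addr, &p->list))
                    schedule_work(&p->wq);
    }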
-
Committed by Christoph Hellwig
Move the purge_lock synchronization to the callers, move the call to purge_fragmented_blocks_allcpus at the beginning of the function to the callers that need it, move the force_flush behavior to the caller that needs it, and pass start and end by value instead of by reference. No change in behavior.

Link: http://lkml.kernel.org/r/1479474236-4139-4-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
Just inline it into the only caller.

Link: http://lkml.kernel.org/r/1479474236-4139-3-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
Patch series "reduce latency in __purge_vmap_area_lazy", v2.

This patch (of 10):

Sort out the long lock hold times in __purge_vmap_area_lazy. It is based on a patch from Joel. Inline free_unmap_vmap_area_noflush() into the only caller.

Link: http://lkml.kernel.org/r/1479474236-4139-2-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Since commit 59dc76b0 ("mm: vmscan: reduce size of inactive file list") the size of the active file list is no longer limited to half of memory. Increase the shadow node limit accordingly to avoid throwing out shadow entries that might still result in eligible refaults.

The exact size of the active list now depends on the overall size of the page cache, but converges toward taking up most of the space. From mm/vmscan.c::inactive_list_is_low():

    total      target     max
    memory     ratio      inactive
    -------------------------------
      10MB        1           5MB
     100MB        1          50MB
       1GB        3         250MB
      10GB       10         0.9GB
     100GB       31           3GB
       1TB      101          10GB
      10TB      320          32GB

It would be possible to apply the same precise ratios when determining the limit for radix tree nodes containing shadow entries, but since it is merely an approximation of the oldest refault distances in the wild and the code also makes assumptions about the node population density, keep it simple and always target the full cache size.

While at it, clarify the comment and the formula for memory footprint.

Link: http://lkml.kernel.org/r/20161117214701.29000-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Shadow entries in the page cache used to be accounted behind the radix tree implementation's back in the upper bits of node->count, and the radix tree code extending a single-entry tree with a shadow entry in root->rnode would corrupt that counter. As a result, we could not put shadow entries at index 0 if the tree didn't have any other entries, and that means no refault detection for any single-page file.

Now that the shadow entries are tracked natively in the radix tree's exceptional counter, this is no longer necessary. Extending and shrinking the tree from and to single entries in root->rnode now does the right thing when the entry is exceptional; remove that limitation.

Link: http://lkml.kernel.org/r/20161117193244.GF23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Currently, we track the shadow entries in the page cache in the upper bits of the radix_tree_node->count, behind the back of the radix tree implementation. Because the radix tree code has no awareness of them, we rely on random subtleties throughout the implementation (such as the node->count != 1 check in the shrinking code, which is meant to exclude multi-entry nodes but also happens to skip nodes with only one shadow entry, as that's accounted in the upper bits). This is error prone and has, in fact, caused the bug fixed in d3798ae8 ("mm: filemap: don't plant shadow entries without radix tree node").

To remove these subtleties, this patch moves shadow entry tracking from the upper bits of node->count to the existing counter for exceptional entries. node->count goes back to being a simple counter of valid entries in the tree node and can be shrunk to a single byte.

This vastly simplifies the page cache code. All accounting happens natively inside the radix tree implementation, and maintaining the LRU linkage of shadow nodes is consolidated into a single function in the workingset code that is called for leaf nodes affected by a change in the page cache tree.

This also removes the last user of the __radix_tree_delete_node() return value. Eliminate it.

Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
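The consolidated LRU-linkage helper ends up looking roughly like this; a hedged sketch (list and field names are assumptions to check against the tree). A node sits on the shadow LRU exactly when every entry it holds is a shadow (exceptional) entry.

    static inline void workingset_update_node(struct radix_tree_node *node)
    {
            /* Only nodes containing nothing but shadow entries are
             * candidates for shadow reclaim; link/unlink accordingly. */
            if (node->count && node->count == node->exceptional) {
                    if (list_empty(&node->private_list))
                            list_lru_add(&workingset_shadow_nodes,
                                         &node->private_list);
            } else {
                    if (!list_empty(&node->private_list))
                            list_lru_del(&workingset_shadow_nodes,
                                         &node->private_list);
            }
    }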
-
Committed by Johannes Weiner
Support handing __radix_tree_replace() a callback that gets invoked for all leaf nodes that change or get freed as a result of the slot replacement, to assist users tracking nodes with node->private_list. This prepares for putting page cache shadow entries into the radix tree root again and drastically simplifying the shadow tracking.

Link: http://lkml.kernel.org/r/20161117193134.GD23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Page cache shadow entry handling will be a lot simpler when it can use a single generic replacement function for pages, shadow entries, and emptying slots.

Make __radix_tree_replace() properly account insertions and deletions in node->count and garbage collect nodes as they become empty. Then re-implement radix_tree_delete() on top of it.

Link: http://lkml.kernel.org/r/20161117193058.GC23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
The bug in khugepaged fixed earlier in this series shows that radix tree slot replacement is fragile, and it will become more so when not only NULL<->!NULL transitions need to be caught but transitions from and to exceptional entries as well. We need checks.

Re-implement radix_tree_replace_slot() on top of the sanity-checked __radix_tree_replace(). This requires existing callers to also pass the radix tree root, but it'll warn us when somebody replaces slots with contents that need proper accounting (transitions between NULL entries, real entries, exceptional entries) and where a replacement through the slot pointer would corrupt the radix tree node counts.

Link: http://lkml.kernel.org/r/20161117193021.GB23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
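A sketch of the resulting call-site change for a page-cache style user (locking elided; new_entry is a stand-in name):

    void **slot = radix_tree_lookup_slot(&mapping->page_tree, index);

    /* before: radix_tree_replace_slot(slot, new_entry); */
    radix_tree_replace_slot(&mapping->page_tree, slot, new_entry);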
-
Committed by Johannes Weiner
The way the page cache is sneaking shadow entries of evicted pages into the radix tree past the node entry accounting and tracking them manually in the upper bits of node->count is fraught with problems. These shadow entries are marked in the tree as exceptional entries, which are a native concept to the radix tree.

Maintain an explicit counter of exceptional entries in the radix tree node. Subsequent patches will switch shadow entry tracking over to that counter.

DAX and shmem are the other users of exceptional entries. Since slot replacements that change the entry type from regular to exceptional must now be accounted, introduce a __radix_tree_replace() function that does replacement and accounting, and switch DAX and shmem over.

The increase in radix tree node size is temporary. A followup patch switches the shadow tracking to this new scheme and we'll no longer need the upper bits in node->count and shrink that back to one byte.

Link: http://lkml.kernel.org/r/20161117192945.GA23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
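A hedged sketch of how the accounting can work inside the replacement helper (the helper name and exact signature are assumptions; only the idea is taken from the text): compute +1/-1 deltas from the old and new slot contents and apply them to the node counters.

    static void replace_slot(struct radix_tree_root *root,
                             struct radix_tree_node *node,
                             void **slot, void *item)
    {
            void *old = rcu_dereference_raw(*slot);
            /* +1/-1 deltas for entry presence and exceptional-ness */
            int count = !!item - !!old;
            int exceptional = !!radix_tree_exceptional_entry(item) -
                              !!radix_tree_exceptional_entry(old);

            if (node) {
                    node->count += count;
                    node->exceptional += exceptional;
            }
            rcu_assign_pointer(*slot, item);
    }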
-
Committed by Johannes Weiner
When the shadow page shrinker tries to reclaim a radix tree node but finds it in an unexpected state (it should contain no pages and a non-zero number of shadow entries), there is no need to kill the executing task or even the entire system. Warn about the invalid state, then leave that tree node be. Simply don't put it back on the shadow LRU for future reclaim and move on.

Link: http://lkml.kernel.org/r/20161117191138.22769-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
The radix tree counts valid entries in each tree node. Entries stored in the tree cannot be removed by simply storing NULL in the slot, or the internal counters will be off and the node never gets freed again.

When collapsing a shmem page fails, restore the holes that were filled with radix_tree_insert() with a proper radix tree deletion.

Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Link: http://lkml.kernel.org/r/20161117191138.22769-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Patch series "mm: workingset: radix tree subtleties & single-page file refaults", v3.

This is another revision of the radix tree / workingset patches based on feedback from Jan and Kirill.

This is a follow-up to d3798ae8 ("mm: filemap: don't plant shadow entries without radix tree node"). That patch fixed an issue that was caused mainly by the page cache sneaking special shadow page entries into the radix tree and relying on subtleties in the radix tree code to make that work. The fix also had to stop tracking refaults for single-page files because shadow pages stored as direct pointers in radix_tree_root->rnode weren't properly handled during tree extension.

These patches make the radix tree code explicitly support and track such special entries, to eliminate the subtleties and to restore the thrash detection for single-page files.

This patch (of 9):

When a radix tree iteration drops the tree lock, another thread might swoop in and free the node holding the current slot. The iteration needs to do another tree lookup from the current index to continue.

[kirill.shutemov@linux.intel.com: re-lookup for replacement]
Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Link: http://lkml.kernel.org/r/20161117191138.22769-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrew Morton
Move the 4-byte `capabilities' field next to other 4-byte things. Shrinks sizeof(backing_dev_info) by 8 bytes on x86_64.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Jens Axboe
We ran into a funky issue, where someone doing 256K buffered reads saw 128K requests at the device level. Turns out it is read-ahead capping the request size, since we use 128K as the default setting. This doesn't make a lot of sense - if someone is issuing 256K reads, they should see 256K reads, regardless of the read-ahead setting, if the underlying device can support a 256K read in a single command.

This patch introduces a bdi hint, io_pages. This is the soft max IO size for the lower level, I've hooked it up to the bdev settings here. Read-ahead is modified to issue the maximum of the user request size, and the read-ahead max size, but capped to the max request size on the device side. The latter is done to avoid reading ahead too much, if the application asks for a huge read. With this patch, the kernel behaves like the application expects.

Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
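A hedged sketch of the sizing logic in the read-ahead path (variable names approximate mm/readahead.c): expand the window for a large sequential read, but never past the device's soft limit.

    struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
    unsigned long max_pages = ra->ra_pages;   /* read-ahead setting, 128K default */

    /* A request larger than the ra window may grow it, capped at
     * what the device can take in one IO (bdi->io_pages). */
    if (req_size > max_pages && bdi->io_pages > max_pages)
            max_pages = min(req_size, bdi->io_pages);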
-
Committed by Jérémy Lefaure
Compiling shmem.c with SHMEM and TRANSPARENT_HUGE_PAGECACHE enabled raises warnings on two unused functions when CONFIG_TMPFS and CONFIG_SYSFS are both disabled:

    mm/shmem.c:390:20: warning: `shmem_format_huge' defined but not used [-Wunused-function]
     static const char *shmem_format_huge(int huge)
                        ^~~~~~~~~~~~~~~~~
    mm/shmem.c:373:12: warning: `shmem_parse_huge' defined but not used [-Wunused-function]
     static int shmem_parse_huge(const char *str)
                ^~~~~~~~~~~~~~~~

A conditional compilation on tmpfs or sysfs removes the warnings.

Link: http://lkml.kernel.org/r/20161118055749.11313-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
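The fix amounts to guarding both helpers with the configs that actually reference them; a sketch with one helper's body abbreviated:

    #if defined(CONFIG_SYSFS) || defined(CONFIG_TMPFS)
    static int shmem_parse_huge(const char *str)
    {
            if (!strcmp(str, "never"))
                    return SHMEM_HUGE_NEVER;
            if (!strcmp(str, "always"))
                    return SHMEM_HUGE_ALWAYS;
            /* ... remaining modes elided ... */
            return -EINVAL;
    }
    /* shmem_format_huge() sits under the same guard */
    #endif /* CONFIG_SYSFS || CONFIG_TMPFS */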
-
Committed by Tahsin Erdogan
The b_more_io non-empty check is already preceded by an opposite check.

Link: http://lkml.kernel.org/r/1478591249-30641-1-git-send-email-tahsin@google.com
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kirill A. Shutemov
Unlike THP, hugetlb pages are represented by one entry in the radix-tree.

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20161110163640.126124-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Thierry Reding
The header uses types and definitions from the linux/init.h as well as linux/types.h headers without explicitly including them. This causes a failure to compile if they are not implicitly pulled in by includers.

Link: http://lkml.kernel.org/r/20161115133235.13387-1-thierry.reding@gmail.com
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dan Williams
Mark dax vmas as not migratable to exclude them from task_numa_work(). This is especially relevant for device-dax which wants to ensure predictable access latency and not incur periodic faults.

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/147892450132.22062.16875659431109209179.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
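A hedged sketch of where the exclusion plausibly lands, in vma_migratable() (include/linux/mempolicy.h); treat the exact placement and the surrounding checks as assumptions:

    static inline bool vma_migratable(struct vm_area_struct *vma)
    {
            if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                    return false;

            /*
             * DAX device mappings want predictable access latency,
             * so keep NUMA balancing's periodic faults away from them.
             */
            if (vma_is_dax(vma))
                    return false;

            /* ... remaining checks unchanged ... */
            return true;
    }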
-
Committed by Heiko Carstens
Having code for the pkey_mprotect, pkey_alloc and pkey_free system calls only makes sense if ARCH_HAS_PKEYS is selected. If not selected, these system calls will always return -ENOSPC or -EINVAL.

To simplify things and generate less code, build the pkey system call code only if ARCH_HAS_PKEYS is selected.

For architectures which have already wired up the system calls, but do not select ARCH_HAS_PKEYS, this will result in less generated code and a different return code: the three system calls will now always return -ENOSYS, using the cond_syscall mechanism.

For architectures which have not wired up the system calls, less unreachable code will be generated.

Link: http://lkml.kernel.org/r/20161114111251.70084-1-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
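With the syscall bodies compiled out, the cond_syscall stubs supply the -ENOSYS fallback; a sketch of the kernel/sys_ni.c side (exact placement assumed):

    /* kernel/sys_ni.c - memory protection keys */
    cond_syscall(sys_pkey_mprotect);
    cond_syscall(sys_pkey_alloc);
    cond_syscall(sys_pkey_free);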
-
Committed by Reza Arbab
Summarize the "hotpluggable" property of dt memory nodes.

Link: http://lkml.kernel.org/r/1479160961-25840-6-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alistair Popple <apopple@au1.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Reza Arbab
When movable nodes are enabled, any node containing only hotpluggable memory is made movable at boot time. On x86, hotpluggable memory is discovered by parsing the ACPI SRAT, making corresponding calls to memblock_mark_hotplug(). If we introduce a dt property to describe memory as hotpluggable, configs supporting early fdt may then also do this marking and use movable nodes.

Link: http://lkml.kernel.org/r/1479160961-25840-5-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alistair Popple <apopple@au1.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Reza Arbab
To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of the following must be true:

1. This config has the capability to identify movable nodes at boot. Right now, only x86 can do this.

2. Our config supports memory hotplug, which means that a movable node can be created by hotplugging all of its memory into ZONE_MOVABLE.

Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently recognizes (1), but not (2).

Link: http://lkml.kernel.org/r/1479160961-25840-4-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alistair Popple <apopple@au1.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Reza Arbab
In commit c5320926 ("mem-hotplug: introduce movable_node boot option"), the memblock allocation direction is changed to bottom-up and then back to top-down like this:

1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().

2. memblock_set_bottom_up(false), called by x86's numa_init().

Even though (1) occurs in generic mm code, it is wrapped by #ifdef CONFIG_MOVABLE_NODE, which depends on X86_64. This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches, things will be unbalanced. (1) will happen for them, but (2) will not.

This toggle was added in the first place because x86 has a delay between adding memblocks and marking them as hotpluggable. Since other arches do this marking either immediately or not at all, they do not require the bottom-up toggle.

So, resolve things by moving (1) from cmdline_parse_movable_node() to x86's setup_arch(), immediately after the movable_node parameter has been parsed.

Link: http://lkml.kernel.org/r/1479160961-25840-3-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alistair Popple <apopple@au1.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
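A hedged sketch of the x86 side after the move (arch/x86/kernel/setup.c, right after early parameter parsing; the comment text is mine):

    parse_early_param();

    #ifdef CONFIG_MEMORY_HOTPLUG
    /*
     * x86 only marks hotpluggable memblocks later, from SRAT parsing,
     * so allocate bottom-up in the meantime to keep early allocations
     * out of memory that may turn out to be hotpluggable.
     */
    if (movable_node_is_enabled())
            memblock_set_bottom_up(true);
    #endif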
-
Committed by Reza Arbab
Patch series "enable movable nodes on non-x86 configs", v7.

This patchset allows more configs to make use of movable nodes. When CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such nodes into the system:

1. Discover movable nodes at boot. Currently this is only possible on x86, but we will enable configs supporting fdt to do the same.

2. Hotplug and online all of a node's memory using online_movable. This is already possible on any config supporting memory hotplug, not just x86, but the Kconfig doesn't say so. We will fix that.

We'll also remove some cruft on power which would prevent (2).

This patch (of 5):

Remove the check which prevents us from hotplugging into an empty node. The original commit b226e462 ("[PATCH] powerpc: don't add memory to empty node/zone") states that this was intended to be a temporary measure. It is a workaround for an oops which no longer occurs.

Link: http://lkml.kernel.org/r/1479160961-25840-2-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alistair Popple <apopple@au1.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Piotr Kwapulinski
The MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES flags are irrelevant when set for the MPOL_LOCAL NUMA memory policy via set_mempolicy or mbind. Return -EINVAL ("invalid argument") from set_mempolicy and mbind whenever either of these flags is passed along with MPOL_LOCAL. This is consistent with how MPOL_PREFERRED is handled when passed an empty nodemask.

It slightly shortens the execution time in paths where these flags are used, e.g. when trying to rebind the NUMA nodes for changes in cgroups cpuset mems (mpol_rebind_preferred()) or when just printing the mempolicy structure (/proc/PID/numa_maps). Isolated tests were done.

Link: http://lkml.kernel.org/r/20161027163037.4089-1-kwapulinski.piotr@gmail.com
Signed-off-by: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Liang Chen <liangchen.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
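A hedged sketch of the check as it would read in mpol_new() (mm/mempolicy.c); MPOL_LOCAL carries no nodemask, so the nodemask-relative flags have nothing to apply to:

    /* inside mpol_new(), before the policy is allocated */
    if (mode == MPOL_LOCAL) {
            if (!nodes_empty(*nodes) ||
                (flags & MPOL_F_STATIC_NODES) ||
                (flags & MPOL_F_RELATIVE_NODES))
                    return ERR_PTR(-EINVAL);
            mode = MPOL_PREFERRED;  /* local alloc is preferred-with-no-node */
    }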
-
Committed by Lorenzo Stoakes
In the previous round of get_user_pages* changes, the comments attached to __get_user_pages_unlocked() and get_user_pages_unlocked() were rendered incorrect; this patch corrects them.

In addition, the get_user_pages_unlocked() comment seems to have already been outdated, as it referred to tsk, mm parameters which were removed in c12d2da5 ("mm/gup: Remove the macro overload API migration helpers from the get_user*() APIs"); this patch fixes this also.

Link: http://lkml.kernel.org/r/20161025233435.5338-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Aneesh Kumar K.V
Now that we check for page size change early in the loop, we can partially revert e9d55e15 ("mm: change the interface for __tlb_remove_page"). This simplifies the code considerably, by removing the need to track the last address with which we adjusted the range. We also go back to the older way of filling the mmu_gather array, i.e., we add an entry and then check whether the gather batch is full.

Link: http://lkml.kernel.org/r/20161026084839.27299-6-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Aneesh Kumar K.V
With commit e77b0852 ("mm/mmu_gather: track page size with mmu gather and force flush if page size change") we added the ability to force a TLB flush when the page size changes in an mmu_gather loop. We did that by checking for a page size change every time we added a page to the mmu_gather for lazy flush/remove. We can improve that by moving the page size change check earlier, and not doing it every time we add a page.

This also helps us to do a TLB flush when invalidating a range covering a dax mapping. W.r.t. dax mappings we don't have a backing struct page and hence we don't call tlb_remove_page, which earlier forced the TLB flush on page size change. Moving the page size change check earlier means we will do the same even for dax mappings.

We also avoid doing this check on architectures other than powerpc. In a later patch we will remove the page size check from tlb_remove_page().

Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Aneesh Kumar K.V
This adds tlb_remove_hugetlb_entry, similar to tlb_remove_pmd_tlb_entry.

Link: http://lkml.kernel.org/r/20161026084839.27299-4-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
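A sketch of the helper as an mmu_gather macro in asm-generic/tlb.h; the merged name may be tlb_remove_huge_tlb_entry rather than the one in this message, so treat the identifier as hedged. It widens the gather range by the hstate's huge page size before recording the entry:

    #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)             \
            do {                                                         \
                    __tlb_adjust_range(tlb, address, huge_page_size(h)); \
                    __tlb_remove_tlb_entry(tlb, ptep, address);          \
            } while (0)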
-
Committed by Aneesh Kumar K.V
We use __tlb_adjust_range to update the range covered by the mmu_gather struct. We later use the 'start' and 'end' to do a mmu_notifier_invalidate_range in tlb_flush_mmu_tlbonly(). Update the 'end' correctly in __tlb_adjust_range so that we call mmu_notifier_invalidate_range with the correct range values.

W.r.t. tlbflush, this should not have any impact, because a flush with the correct start address will flush the TLB mapping for the range.

Also add a comment w.r.t. updating the range when we free pagetable pages. For now we don't support a range-based page table cache flush.

Link: http://lkml.kernel.org/r/20161026084839.27299-3-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
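A sketch of the corrected helper, consistent with the description (the range_size parameter is how the series passes the mapping size; signature hedged):

    static inline void __tlb_adjust_range(struct mmu_gather *tlb,
                                          unsigned long address,
                                          unsigned int range_size)
    {
            tlb->start = min(tlb->start, address);
            /* extend the end by the mapping size, not a fixed PAGE_SIZE,
             * so mmu_notifier_invalidate_range() sees the full range */
            tlb->end = max(tlb->end, address + range_size);
    }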
-
Committed by Aneesh Kumar K.V
We are removing a pmd hugepage here. Use the correct page size.

Link: http://lkml.kernel.org/r/20161026084839.27299-2-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Arnd Bergmann
After enabling -Wmaybe-uninitialized warnings, we get a false-positive warning for shmem:

    mm/shmem.c: In function `shmem_getpage_gfp':
    include/linux/spinlock.h:332:21: error: `info' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This can be easily avoided, since the correct 'info' pointer is known at the time we first enter the function, so we can simply move the initialization up. Moving it before the first label avoids the warning and lets us remove two later initializations.

Note that the function is so hard to read that it not only confuses the compiler, but also most readers, and without this patch it could easily break if one of the 'goto's changed.

Link: https://www.spinics.net/lists/kernel/msg2368133.html
Link: http://lkml.kernel.org/r/20161024205725.786455-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
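A sketch of the hoisted initialization in shmem_getpage_gfp() (mm/shmem.c; the signature is abbreviated from the 4.9-era code and should be treated as approximate):

    static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
                    struct page **pagep, enum sgp_type sgp, gfp_t gfp,
                    struct mm_struct *fault_mm, int *fault_type)
    {
            struct address_space *mapping = inode->i_mapping;
            /* initialized once on entry, before any label, so every
             * goto path sees a valid pointer */
            struct shmem_inode_info *info = SHMEM_I(inode);
            /* ... rest of the function unchanged ... */
    }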
-
Committed by Ming Ling
Since commit bda807d4 ("mm: migrate: support non-lru movable page migration") isolate_migratepages_block() can isolate !PageLRU pages which acct_isolated() would account as NR_ISOLATED_*. Accounting these non-lru pages as NR_ISOLATED_{ANON,FILE} doesn't make any sense, and it can misguide heuristics based on those counters such as pgdat_reclaimable_pages resp. too_many_isolated, which would lead to unexpected stalls during the direct reclaim without any good reason. Note that __alloc_contig_migrate_range can isolate a lot of pages at once.

On mobile devices such as a 512M RAM android phone, it may use a big zram swap. In some cases zram (zsmalloc) uses too many non-lru but migratable pages, such as:

    MemTotal: 468148 kB
    Normal free: 5620 kB
    Free swap: 4736 kB
    Total swap: 409596 kB
    ZRAM: 164616 kB (zsmalloc non-lru pages)
    active_anon: 60700 kB
    inactive_anon: 60744 kB
    active_file: 34420 kB
    inactive_file: 37532 kB

Fix this by only accounting lru pages to NR_ISOLATED_* in isolate_migratepages_block(), right after they were isolated, while we still know they were on the LRU. Drop acct_isolated() because it is called after the fact and we've lost that information. Batching the per-cpu counter doesn't make much improvement anyway. Also make sure that we uncharge only LRU pages when putting them back on the LRU in putback_movable_pages resp. when unmap_and_move migrates the page.

[mhocko@suse.com: replace acct_isolated() with direct counting]
Fixes: bda807d4 ("mm: migrate: support non-lru movable page migration")
Link: http://lkml.kernel.org/r/20161019080240.9682-1-mhocko@kernel.org
Signed-off-by: Ming Ling <ming.ling@spreadtrum.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
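A hedged sketch of the relevant hunk in isolate_migratepages_block() (mm/compaction.c): only pages coming off the LRU bump NR_ISOLATED_*, while non-lru movable pages branch straight to the success label and skip the counter.

            /* Successfully isolated from the LRU */
            del_page_from_lru_list(page, lruvec, page_lru(page));
            inc_node_page_state(page,
                            NR_ISOLATED_ANON + page_is_file_cache(page));

    isolate_success:
            list_add(&page->lru, &cc->migratepages);
            cc->nr_migratepages++;
            nr_isolated++;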
-