- 26 7月, 2017 13 次提交
-
-
由 Dennis Zhou (Facebook) 提交于
The area map allocator only used a bitmap for the backing page state. The new bitmap allocator will use bitmaps to manage the allocation region in addition to this. This patch generalizes the bitmap iterators so they can be reused with the bitmap allocator. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
This patch increases the minimum allocation size of percpu memory to 4-bytes. This change will help minimize the metadata overhead associated with the bitmap allocator. The assumption is that most allocations will be of objects or structs greater than 2 bytes with integers or longs being used rather than shorts. The first chunk regions are now aligned with the minimum allocation size. The reserved region is expected to be set as a multiple of the minimum allocation size. The static region is aligned up and the delta is removed from the dynamic size. This works because the dynamic size is increased to be page aligned. If the static size is not minimum allocation size aligned, then there must be a gap that is added to the dynamic size. The dynamic size will never be smaller than the set value. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
pcpu_nr_empty_pop_pages is used to ensure there are a handful of free pages around to serve atomic allocations. A new field, nr_empty_pop_pages, is added to the pcpu_chunk struct to keep track of the number of empty pages. This field is needed as the number of empty populated pages is globally tracked and deltas are used to update in the bitmap allocator. Pages that contain a hidden area are not considered to be empty. This new field is exposed in percpu_stats. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
The populated bitmap represents the state of the pages the chunk serves. Prior, the bitmap was marked completely used as the first chunk was allocated and immutable. This is misleading because the first chunk may not be completely filled. Additionally, with moving the base_addr up in the previous patch, the population check no longer corresponds to what was being checked. This patch modifies the population map to be only the number of pages the region serves and to make what it was checking correspond correctly again. The change is to remove any misunderstanding between the size of the populated bitmap and the actual size of it. The work function page iterators now use nr_pages for the check rather than pcpu_unit_pages because nr_populated is now chunk specific. Without this, the work function would try to populate the remainder of these chunks despite it not serving any more than nr_pages when nr_pages is set less than pcpu_unit_pages. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
The percpu address checks for the reserved and dynamic region chunks are now specific to each region. The address checking logic can be combined taking advantage of the global references to the dynamic and static region chunks. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
Originally, the first chunk was served by one or two chunks, each given a region they are responsible for. Despite this, the arithmetic was based off of the true base_addr of the chunk making it be overly inclusive. This patch moves the base_addr of chunks that are responsible for the first chunk. The base_addr must remain page aligned to keep the address alignment correct, so it is the beginning of the region served page aligned down. start_offset holds where the region served begins from this new base_addr. The corresponding percpu address checks are modified to be more specific as a result. The first chunk considers only the dynamic region and both first chunk and reserved chunk checks ignore the static region. The static region addresses should never be passed into the allocator. There is no impact here besides distinguishing the first chunk and making the checks specific. The percpu pointer to physical address is left intact as addresses are not given out in the non-allocated portion of percpu memory. nr_pages is added to pcpu_chunk to keep track of the size of the entire region served containing both start_offset and end_offset. This variable will be used to manage the bitmap allocator. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
There is no need to have the static chunk and dynamic chunk be named separately as the allocations are sequential. This preemptively solves the misnomer problem with the base_addrs being moved up in the following patch. It also removes a ternary operation deciding the first chunk. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
The area map allocator manages the first chunk area by hiding all but the region it is responsible for serving in the area map. To align this with the populated page bitmap, end_offset is introduced to keep track of the delta to end page aligned. The area map is appended with the page aligned end when necessary to be in line with how the bitmap allocator requires the ending to be aligned with the LCM of PAGE_SIZE and the size of each bitmap block. percpu_stats is updated to ignore this region when present. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
Create a common allocator for first chunk initialization, pcpu_alloc_first_chunk. Comments for this function will be added in a later patch once the bitmap allocator is added. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
There is logic for setting variables in the static chunk init code that could be consolidated with the dynamic chunk init code. This combines this logic to setup for combining the allocation paths. reserved_size is used as the conditional as a dynamic region will always exist. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
Prior this variable was used to manage statistics when the first chunk had a reserved region. The previous patch introduced start_offset to keep track of the offset by value rather than boolean. Therefore, has_reserved can be removed. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
The reserved chunk arithmetic uses a global variable pcpu_reserved_chunk_limit that is set in the first chunk init code to hide a portion of the area map. The bitmap allocator to come will eventually move the base_addr up and require both the reserved chunk and static chunk to maintain this offset. pcpu_reserved_chunk_limit is removed and start_offset is added. The first chunk that is circulated and is pcpu_first_chunk serves the dynamic region, the region following the reserved region. The reserved chunk address check will temporarily use the first chunk to identify its address range. A following patch will increase the base_addr and remove this. If there is no reserved chunk, this will check the static region and return false because those values should never be passed into the allocator. Lastly, when linking in the first chunk, make sure to count the right free region for the number of empty populated pages. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
The first chunk is handled as a special case as it is composed of the static, reserved, and dynamic regions. The code handles each case individually. The next several patches will merge these code paths and lay the foundation for the bitmap allocator. This patch modifies logic to enforce that a dynamic region exists and changes the area map to account for that. This brings the logic closer to the dynamic chunk's init logic. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 17 7月, 2017 4 次提交
-
-
由 Dennis Zhou (Facebook) 提交于
The header comment for percpu memory is a little hard to parse and is not super clear about how the first chunk is managed. This adds a little more clarity to the situation. There is also quite a bit of tricky logic in the pcpu_build_alloc_info. This adds a restructure of a comment to add a little more information. Unfortunately, you will still have to piece together a handful of other comments too, but should help direct you to the meaningful comments. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
Percpu memory holds a minimum threshold of pages that are populated in order to serve atomic percpu memory requests. This change makes it easier to verify that there are a minimum number of populated pages lying around. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
This makes the debugfs output for percpu_stats a little easier to read by changing the spacing of the output to be consistent. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Dennis Zhou (Facebook) 提交于
Changes the use of a void buffer to an int buffer for clarity. Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 15 7月, 2017 1 次提交
-
-
由 Helge Deller 提交于
Jörn Engel noticed that the expand_upwards() function might not return -ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE. Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa which all define TASK_SIZE as 0xffffffff, but since none of those have an upwards-growing stack we currently have no actual issue. Nevertheless let's fix this just in case any of the architectures with an upward-growing stack (currently parisc, metag and partly ia64) define TASK_SIZE similar. Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box Fixes: bd726c90 ("Allow stack to grow up to address space limit") Signed-off-by: NHelge Deller <deller@gmx.de> Reported-by: NJörn Engel <joern@purestorage.com> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 7月, 2017 5 次提交
-
-
由 Nikolay Borisov 提交于
Currently the writeback statistics code uses a percpu counters to hold various statistics. Furthermore we have 2 families of functions - those which disable local irq and those which doesn't and whose names begin with double underscore. However, they both end up calling __add_wb_stats which in turn calls percpu_counter_add_batch which is already irq-safe. Exploiting this fact allows to eliminated the __wb_* functions since they don't add any further protection than we already have. Furthermore, refactor the wb_* function to call __add_wb_stat directly without the irq-disabling dance. This will likely result in better runtime of code which deals with modifying the stat counters. While at it also document why percpu_counter_add_batch is in fact preempt and irq-safe since at least 3 people got confused. Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.comSigned-off-by: NNikolay Borisov <nborisov@suse.com> Acked-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Josef Bacik <jbacik@fb.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Page migration (for memory hotplug, soft_offline_page or mbind) needs to allocate a new memory. This can trigger an oom killer if the target memory is depleated. Although quite unlikely, still possible, especially for the memory hotplug (offlining of memoery). Up to now we didn't really have reasonable means to back off. __GFP_NORETRY can fail just too easily and __GFP_THISNODE sticks to a single node and that is not suitable for all callers. But now that we have __GFP_RETRY_MAYFAIL we should use it. It is preferable to fail the migration than disrupt the system by killing some processes. Link: http://lkml.kernel.org/r/20170623085345.11304-7-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Alex Belits <alex.belits@cavium.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Now that __GFP_RETRY_MAYFAIL has a reasonable semantic regardless of the request size we can drop the hackish implementation for !costly orders. __GFP_RETRY_MAYFAIL retries as long as the reclaim makes a forward progress and backs of when we are out of memory for the requested size. Therefore we do not need to enforce__GFP_NORETRY for !costly orders just to silent the oom killer anymore. Link: http://lkml.kernel.org/r/20170623085345.11304-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Alex Belits <alex.belits@cavium.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to the page allocator. This has been true but only for allocations requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for smaller sizes. This is a bit unfortunate because there is no way to express the same semantic for those requests and they are considered too important to fail so they might end up looping in the page allocator for ever, similarly to GFP_NOFAIL requests. Now that the whole tree has been cleaned up and accidental or misled usage of __GFP_REPEAT flag has been removed for !costly requests we can give the original flag a better name and more importantly a more useful semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that the allocator would try really hard but there is no promise of a success. This will work independent of the order and overrides the default allocator behavior. Page allocator users have several levels of guarantee vs. cost options (take GFP_KERNEL as an example) - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ attempt to free memory at all. The most light weight mode which even doesn't kick the background reclaim. Should be used carefully because it might deplete the memory and the next user might hit the more aggressive reclaim - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic allocation without any attempt to free memory from the current context but can wake kswapd to reclaim memory if the zone is below the low watermark. Can be used from either atomic contexts or when the request is a performance optimization and there is another fallback for a slow path. - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non sleeping allocation with an expensive fallback so it can access some portion of memory reserves. Usually used from interrupt/bh context with an expensive slow path fallback. - GFP_KERNEL - both background and direct reclaim are allowed and the _default_ page allocator behavior is used. That means that !costly allocation requests are basically nofail but there is no guarantee of that behavior so failures have to be checked properly by callers (e.g. OOM killer victim is allowed to fail currently). - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and all allocation requests fail early rather than cause disruptive reclaim (one round of reclaim in this implementation). The OOM killer is not invoked. - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior and all allocation requests try really hard. The request will fail if the reclaim cannot make any progress. The OOM killer won't be triggered. - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. This might be really dangerous especially for larger orders. Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because they already had their semantic. No new users are added. __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if there is no progress and we have already passed the OOM point. This means that all the reclaim opportunities have been exhausted except the most disruptive one (the OOM killer) and a user defined fallback behavior is more sensible than keep retrying in the page allocator. [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c] [mhocko@suse.com: semantic fix] Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz [mhocko@kernel.org: address other thing spotted by Vlastimil] Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Alex Belits <alex.belits@cavium.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Geert Uytterhoeven 提交于
With gcc 4.1.2: mm/memory.o: In function `create_huge_pmd': memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page' Interestingly, create_huge_pmd() is emitted in the assembler output, but never called. Converting transparent_hugepage_enabled() from a macro to a static inline function reduced the ability of the compiler to remove unused code. Fix this by marking create_huge_pmd() inline. Fixes: 16981d76 ("mm: improve readability of transparent_hugepage_enabled()") Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.orgSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 7月, 2017 17 次提交
-
-
由 Colin Ian King 提交于
The helper function get_wild_bug_type() does not need to be in global scope, so make it static. Cleans up sparse warning: "symbol 'get_wild_bug_type' was not declared. Should it be static?" Link: http://lkml.kernel.org/r/20170622090049.10658-1-colin.king@canonical.comSigned-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NDmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
They return positive value, that is, true, if non-zero value is found. Rename them to reduce confusion. Link: http://lkml.kernel.org/r/20170516012350.GA16015@js1304-desktopSigned-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
KASAN doesn't happen work with memory hotplug because hotplugged memory doesn't have any shadow memory. So any access to hotplugged memory would cause a crash on shadow check. Use memory hotplug notifier to allocate and map shadow memory when the hotplugged memory is going online and free shadow after the memory offlined. Link: http://lkml.kernel.org/r/20170601162338.23540-4-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
For some unaligned memory accesses we have to check additional byte of the shadow memory. Currently we load that byte speculatively to have only single load + branch on the optimistic fast path. However, this approach has some downsides: - It's unaligned access, so this prevents porting KASAN on architectures which doesn't support unaligned accesses. - We have to map additional shadow page to prevent crash if speculative load happens near the end of the mapped memory. This would significantly complicate upcoming memory hotplug support. I wasn't able to notice any performance degradation with this patch. So these speculative loads is just a pain with no gain, let's remove them. Link: http://lkml.kernel.org/r/20170601162338.23540-1-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: NDmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
There is missing optimization in zero_p4d_populate() that can save some memory when mapping zero shadow. Implement it like as others. Link: http://lkml.kernel.org/r/1494829255-23946-1-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jerome Marchand 提交于
Commit 40f9fb8c ("mm/zsmalloc: support allocating obj with size of ZS_MAX_ALLOC_SIZE") fixes a size calculation error that prevented zsmalloc to allocate an object of the maximal size (ZS_MAX_ALLOC_SIZE). I think however the fix is unneededly complicated. This patch replaces the dynamic calculation of zs_size_classes at init time by a compile time calculation that uses the DIV_ROUND_UP() macro already used in get_size_class_index(). [akpm@linux-foundation.org: use min_t] Link: http://lkml.kernel.org/r/20170630114859.1979-1-jmarchan@redhat.comSigned-off-by: NJerome Marchand <jmarchan@redhat.com> Acked-by: NMinchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Mahendran Ganesh <opensource.ganesh@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
Andrey reported a potential deadlock with the memory hotplug lock and the cpu hotplug lock. The reason is that memory hotplug takes the memory hotplug lock and then calls stop_machine() which calls get_online_cpus(). That's the reverse lock order to get_online_cpus(); get_online_mems(); in mm/slub_common.c The problem has been there forever. The reason why this was never reported is that the cpu hotplug locking had this homebrewn recursive reader writer semaphore construct which due to the recursion evaded the full lock dep coverage. The memory hotplug code copied that construct verbatim and therefor has similar issues. Three steps to fix this: 1) Convert the memory hotplug locking to a per cpu rwsem so the potential issues get reported proper by lockdep. 2) Lock the online cpus in mem_hotplug_begin() before taking the memory hotplug rwsem and use stop_machine_cpuslocked() in the page_alloc code to avoid recursive locking. 3) The cpu hotpluck locking in #2 causes a recursive locking of the cpu hotplug lock via __offline_pages() -> lru_add_drain_all(). Solve this by invoking lru_add_drain_all_cpuslocked() instead. Link: http://lkml.kernel.org/r/20170704093421.506836322@linutronix.deReported-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
The rework of the cpu hotplug locking unearthed potential deadlocks with the memory hotplug locking code. The solution for these is to rework the memory hotplug locking code as well and take the cpu hotplug lock before the memory hotplug lock in mem_hotplug_begin(), but this will cause a recursive locking of the cpu hotplug lock when the memory hotplug code calls lru_add_drain_all(). Split out the inner workings of lru_add_drain_all() into lru_add_drain_all_cpuslocked() so this function can be invoked from the memory hotplug code with the cpu hotplug lock held. Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Reported-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Krzysztof Opasiak 提交于
Use rlimit() helper instead of manually writing whole chain from current task to rlim_cur. Link: http://lkml.kernel.org/r/20170705172811.8027-1-k.opasiak@samsung.comSigned-off-by: NKrzysztof Opasiak <k.opasiak@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sahitya Tummala 提交于
list_lru_count_node() iterates over all memcgs to get the total number of entries on the node but it can race with memcg_drain_all_list_lrus(), which migrates the entries from a dead cgroup to another. This can return incorrect number of entries from list_lru_count_node(). Fix this by keeping track of entries per node and simply return it in list_lru_count_node(). Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.orgSigned-off-by: NSahitya Tummala <stummala@codeaurora.org> Acked-by: NVladimir Davydov <vdavydov.dev@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Alexander Polakov <apolyakov@beget.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
expand_stack(vma) fails if address < stack_guard_gap even if there is no vma->vm_prev. I don't think this makes sense, and we didn't do this before the recent commit 1be7107f ("mm: larger stack guard gap, between vmas"). We do not need a gap in this case, any address is fine as long as security_mmap_addr() doesn't object. This also simplifies the code, we know that address >= prev->vm_end and thus underflow is not possible. Link: http://lkml.kernel.org/r/20170628175258.GA24881@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Commit 1be7107f ("mm: larger stack guard gap, between vmas") has introduced a regression in some rust and Java environments which are trying to implement their own stack guard page. They are punching a new MAP_FIXED mapping inside the existing stack Vma. This will confuse expand_{downwards,upwards} into thinking that the stack expansion would in fact get us too close to an existing non-stack vma which is a correct behavior wrt safety. It is a real regression on the other hand. Let's work around the problem by considering PROT_NONE mapping as a part of the stack. This is a gros hack but overflowing to such a mapping would trap anyway an we only can hope that usespace knows what it is doing and handle it propely. Fixes: 1be7107f ("mm: larger stack guard gap, between vmas") Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com> Debugged-by: NVlastimil Babka <vbabka@suse.cz> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhenwei.pi 提交于
presently pages in the balloon device have random value, and these pages will be scanned by ksmd on the host. They usually cannot be merged. Enqueue zero pages will resolve this problem. Link: http://lkml.kernel.org/r/1498698637-26389-1-git-send-email-zhenwei.pi@youruncloud.comSigned-off-by: Nzhenwei.pi <zhenwei.pi@youruncloud.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Doug Berger 提交于
The align_offset parameter is used by bitmap_find_next_zero_area_off() to represent the offset of map's base from the previous alignment boundary; the function ensures that the returned index, plus the align_offset, honors the specified align_mask. The logic introduced by commit b5be83e3 ("mm: cma: align to physical address, not CMA region position") has the cma driver calculate the offset to the *next* alignment boundary. In most cases, the base alignment is greater than that specified when making allocations, resulting in a zero offset whether we align up or down. In the example given with the commit, the base alignment (8MB) was half the requested alignment (16MB) so the math also happened to work since the offset is 8MB in both directions. However, when requesting allocations with an alignment greater than twice that of the base, the returned index would not be correctly aligned. Also, the align_order arguments of cma_bitmap_aligned_mask() and cma_bitmap_aligned_offset() should not be negative so the argument type was made unsigned. Fixes: b5be83e3 ("mm: cma: align to physical address, not CMA region position") Link: http://lkml.kernel.org/r/20170628170742.2895-1-opendmb@gmail.comSigned-off-by: NAngus Clark <angus@angusclark.org> Signed-off-by: NDoug Berger <opendmb@gmail.com> Acked-by: NGregory Fong <gregory.0xf0@gmail.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Angus Clark <angus@angusclark.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Shiraz Hashim <shashim@codeaurora.org> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Hubbard 提交于
__remove_zone() sets up up zone_type, but never uses it for anything. This does not cause a warning, due to the (necessary) use of -Wno-unused-but-set-variable. However, it's noise, so just delete it. Link: http://lkml.kernel.org/r/20170624043421.24465-2-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
get_cpu_var() disables preemption and returns the per-CPU version of the variable. Disabling preemption is useful to ensure atomic access to the variable within the critical section. In this case however, after the per-CPU version of the variable is obtained the ->free_lock is acquired. For that reason it seems the raw accessor could be used. It only seems that ->slots_ret should be retested (because with disabled preemption this variable can not be set to NULL otherwise). This popped up during PREEMPT-RT testing because it tries to take spinlocks in a preempt disabled section. In RT, spinlocks can sleep. Link: http://lkml.kernel.org/r/20170623114755.2ebxdysacvgxzott@linutronix.deSigned-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ying Huang <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
Since current_order starts as MAX_ORDER-1 and is then only decremented, the second half of the loop condition seems superfluous. However, if order is 0, we may decrement current_order past 0, making it UINT_MAX. This is obviously too subtle ([1], [2]). Since we need to add some comment anyway, change the two variables to signed, making the counting-down for loop look more familiar, and apparently also making gcc generate slightly smaller code. [1] https://lkml.org/lkml/2016/6/20/493 [2] https://lkml.org/lkml/2017/6/19/345 [akpm@linux-foundation.org: fix up reject fixupping] Link: http://lkml.kernel.org/r/20170621185529.2265-1-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: NHao Lee <haolee.swjtu@gmail.com> Acked-by: NWei Yang <weiyang@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-