- 18 12月, 2019 5 次提交
-
-
由 Yang Shi 提交于
Since commit 0a432dcb ("mm: shrinker: make shrinker not depend on memcg kmem"), shrinkers' idr is protected by CONFIG_MEMCG instead of CONFIG_MEMCG_KMEM, so it makes no sense to protect shrinker idr replace with CONFIG_MEMCG_KMEM. And in the CONFIG_MEMCG && CONFIG_SLOB case, shrinker_idr contains only shrinker, and it is deferred_split_shrinker. But it is never actually called, since idr_replace() is never compiled due to the wrong #ifdef. The deferred_split_shrinker all the time is staying in half-registered state, and it's never called for subordinate mem cgroups. Link: http://lkml.kernel.org/r/1575486978-45249-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 0a432dcb ("mm: shrinker: make shrinker not depend on memcg kmem") Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Axtens 提交于
syzkaller and the fault injector showed that I was wrong to assume that we could ignore percpu shadow allocation failures. Handle failures properly. Merge all the allocated areas back into the free list and release the shadow, then clean up and return NULL. The shadow is released unconditionally, which relies upon the fact that the release function is able to tolerate pages not being present. Also clean up shadows in the recovery path - currently they are not released, which leaks a bit of memory. Link: http://lkml.kernel.org/r/20191205140407.1874-3-dja@axtens.net Fixes: 3c5c3cfb ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: NDaniel Axtens <dja@axtens.net> Reported-by: syzbot+82e323920b78d54aaed5@syzkaller.appspotmail.com Reported-by: syzbot+59b7daa4315e07a994f1@syzkaller.appspotmail.com Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Axtens 提交于
kasan_release_vmalloc uses apply_to_page_range to release vmalloc shadow. Unfortunately, apply_to_page_range can allocate memory to fill in page table entries, which is not what we want. Also, kasan_release_vmalloc is called under free_vmap_area_lock, so if apply_to_page_range does allocate memory, we get a sleep in atomic bug: BUG: sleeping function called from invalid context at mm/page_alloc.c:4681 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 15087, name: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x199/0x216 lib/dump_stack.c:118 ___might_sleep.cold.97+0x1f5/0x238 kernel/sched/core.c:6800 __might_sleep+0x95/0x190 kernel/sched/core.c:6753 prepare_alloc_pages mm/page_alloc.c:4681 [inline] __alloc_pages_nodemask+0x3cd/0x890 mm/page_alloc.c:4730 alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2211 alloc_pages include/linux/gfp.h:532 [inline] __get_free_pages+0xc/0x40 mm/page_alloc.c:4786 __pte_alloc_one_kernel include/asm-generic/pgalloc.h:21 [inline] pte_alloc_one_kernel include/asm-generic/pgalloc.h:33 [inline] __pte_alloc_kernel+0x1d/0x200 mm/memory.c:459 apply_to_pte_range mm/memory.c:2031 [inline] apply_to_pmd_range mm/memory.c:2068 [inline] apply_to_pud_range mm/memory.c:2088 [inline] apply_to_p4d_range mm/memory.c:2108 [inline] apply_to_page_range+0x77d/0xa00 mm/memory.c:2133 kasan_release_vmalloc+0xa7/0xc0 mm/kasan/common.c:970 __purge_vmap_area_lazy+0xcbb/0x1f30 mm/vmalloc.c:1313 try_purge_vmap_area_lazy mm/vmalloc.c:1332 [inline] free_vmap_area_noflush+0x2ca/0x390 mm/vmalloc.c:1368 free_unmap_vmap_area mm/vmalloc.c:1381 [inline] remove_vm_area+0x1cc/0x230 mm/vmalloc.c:2209 vm_remove_mappings mm/vmalloc.c:2236 [inline] __vunmap+0x223/0xa20 mm/vmalloc.c:2299 __vfree+0x3f/0xd0 mm/vmalloc.c:2356 __vmalloc_area_node mm/vmalloc.c:2507 [inline] __vmalloc_node_range+0x5d5/0x810 mm/vmalloc.c:2547 __vmalloc_node mm/vmalloc.c:2607 [inline] __vmalloc_node_flags mm/vmalloc.c:2621 [inline] vzalloc+0x6f/0x80 mm/vmalloc.c:2666 alloc_one_pg_vec_page net/packet/af_packet.c:4233 [inline] alloc_pg_vec net/packet/af_packet.c:4258 [inline] packet_set_ring+0xbc0/0x1b50 net/packet/af_packet.c:4342 packet_setsockopt+0xed7/0x2d90 net/packet/af_packet.c:3695 __sys_setsockopt+0x29b/0x4d0 net/socket.c:2117 __do_sys_setsockopt net/socket.c:2133 [inline] __se_sys_setsockopt net/socket.c:2130 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:2130 do_syscall_64+0xfa/0x780 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Switch to using the apply_to_existing_page_range() helper instead, which won't allocate memory. [akpm@linux-foundation.org: s/apply_to_existing_pages/apply_to_existing_page_range/] Link: http://lkml.kernel.org/r/20191205140407.1874-2-dja@axtens.net Fixes: 3c5c3cfb ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: NDaniel Axtens <dja@axtens.net> Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Axtens 提交于
apply_to_page_range() takes an address range, and if any parts of it are not covered by the existing page table hierarchy, it allocates memory to fill them in. In some use cases, this is not what we want - we want to be able to operate exclusively on PTEs that are already in the tables. Add apply_to_existing_page_range() for this. Adjust the walker functions for apply_to_page_range to take 'create', which switches them between the old and new modes. This will be used in KASAN vmalloc. [akpm@linux-foundation.org: reduce code duplication] [akpm@linux-foundation.org: s/apply_to_existing_pages/apply_to_existing_page_range/] [akpm@linux-foundation.org: initialize __apply_to_page_range::err] Link: http://lkml.kernel.org/r/20191205140407.1874-1-dja@axtens.netSigned-off-by: NDaniel Axtens <dja@axtens.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Qian Cai <cai@lca.pw> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
With CONFIG_KASAN_VMALLOC=y any use of memory obtained via vm_map_ram() will crash because there is no shadow backing that memory. Instead of sprinkling additional kasan_populate_vmalloc() calls all over the vmalloc code, move it into alloc_vmap_area(). This will fix vm_map_ram() and simplify the code a bit. [aryabinin@virtuozzo.com: v2] Link: http://lkml.kernel.org/r/20191205095942.1761-1-aryabinin@virtuozzo.comLink: http://lkml.kernel.org/r/20191204204534.32202-1-aryabinin@virtuozzo.com Fixes: 3c5c3cfb ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NUladzislau Rezki (Sony) <urezki@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Alexander Potapenko <glider@google.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Qian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 12月, 2019 6 次提交
-
-
由 Mike Rapoport 提交于
There are no architectures that use include/asm-generic/4level-fixup.h therefore it can be removed along with __ARCH_HAS_4LEVEL_HACK define. Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.orgSigned-off-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Anatoly Pugachev <matorola@gmail.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Peter Rosin <peda@axentia.se> Cc: Richard Weinberger <richard@nod.at> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
For hugely mapped thp, we use is_huge_zero_pmd() to check if it's zero page or not. We do fill ptes with my_zero_pfn() when we split zero thp pmd, but this is not what we have in vm_normal_page_pmd() -- pmd_trans_huge_lock() makes sure of it. This is a trivial fix for /proc/pid/numa_maps, and AFAIK nobody complains about it. Gerald Schaefer asked: : Maybe the description could also mention the symptom of this bug? : I would assume that it affects anon/dirty accounting in gather_pte_stats(), : for huge mappings, if zero page mappings are not correctly recognized. I came across this while I was looking at the code, so I'm not aware of any symptom. Link: http://lkml.kernel.org/r/20191108192629.201556-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Use common names from vmstat array when possible. This gives not much difference in code size for now, but should help in keeping interfaces consistent. add/remove: 0/2 grow/shrink: 2/0 up/down: 70/-72 (-2) Function old new delta memory_stat_format 984 1050 +66 memcg_stat_show 957 961 +4 memcg1_event_names 32 - -32 mem_cgroup_lru_names 40 - -40 Total: Before=14485337, After=14485335, chg -0.00% Link: http://lkml.kernel.org/r/157113012508.453.80391533767219371.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Statistics in vmstat is combined from counters with different structure, but names for them are merged into one array. This patch adds trivial helpers to get name for each item: const char *zone_stat_name(enum zone_stat_item item); const char *numa_stat_name(enum numa_stat_item item); const char *node_stat_name(enum node_stat_item item); const char *writeback_stat_name(enum writeback_stat_item item); const char *vm_event_name(enum vm_event_item item); Names for enum writeback_stat_item are folded in the middle of vmstat_text so this patch moves declaration into header to calculate offset of following items. Also this patch reuses piece of node stat names for lru list names: const char *lru_list_name(enum lru_list lru); This returns common lru list names: "inactive_anon", "active_anon", "inactive_file", "active_file", "unevictable". [khlebnikov@yandex-team.ru: do not use size of vmstat_text as count of /proc/vmstat items] Link: http://lkml.kernel.org/r/157152151769.4139.15423465513138349343.stgit@buzz Link: https://lore.kernel.org/linux-mm/cd1c42ae-281f-c8a8-70ac-1d01d417b2e1@infradead.org/T/#u Link: http://lkml.kernel.org/r/157113012325.453.562783073839432766.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roman Gushchin 提交于
Christian reported a warning like the following obtained during running some KVM-related tests on s390: WARNING: CPU: 8 PID: 208 at lib/percpu-refcount.c:108 percpu_ref_exit+0x50/0x58 Modules linked in: kvm(-) xt_CHECKSUM xt_MASQUERADE bonding xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_na> CPU: 8 PID: 208 Comm: kworker/8:1 Not tainted 5.2.0+ #66 Hardware name: IBM 2964 NC9 712 (LPAR) Workqueue: events sysfs_slab_remove_workfn Krnl PSW : 0704e00180000000 0000001529746850 (percpu_ref_exit+0x50/0x58) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 00000000ffff8808 0000001529746740 000003f4e30e8e18 0036008100000000 0000001f00000000 0035008100000000 0000001fb3573ab8 0000000000000000 0000001fbdb6de00 0000000000000000 0000001529f01328 0000001fb3573b00 0000001fbb27e000 0000001fbdb69300 000003e009263d00 000003e009263cd0 Krnl Code: 0000001529746842: f0a0000407fe srp 4(11,%r0),2046,0 0000001529746848: 47000700 bc 0,1792 #000000152974684c: a7f40001 brc 15,152974684e >0000001529746850: a7f4fff2 brc 15,1529746834 0000001529746854: 0707 bcr 0,%r7 0000001529746856: 0707 bcr 0,%r7 0000001529746858: eb8ff0580024 stmg %r8,%r15,88(%r15) 000000152974685e: a738ffff lhi %r3,-1 Call Trace: ([<000003e009263d00>] 0x3e009263d00) [<00000015293252ea>] slab_kmem_cache_release+0x3a/0x70 [<0000001529b04882>] kobject_put+0xaa/0xe8 [<000000152918cf28>] process_one_work+0x1e8/0x428 [<000000152918d1b0>] worker_thread+0x48/0x460 [<00000015291942c6>] kthread+0x126/0x160 [<0000001529b22344>] ret_from_fork+0x28/0x30 [<0000001529b2234c>] kernel_thread_starter+0x0/0x10 Last Breaking-Event-Address: [<000000152974684c>] percpu_ref_exit+0x4c/0x58 ---[ end trace b035e7da5788eb09 ]--- The problem occurs because kmem_cache_destroy() is called immediately after deleting of a memcg, so it races with the memcg kmem_cache deactivation. flush_memcg_workqueue() at the beginning of kmem_cache_destroy() is supposed to guarantee that all deactivation processes are finished, but failed to do so. It waits for an rcu grace period, after which all children kmem_caches should be deactivated. During the deactivation percpu_ref_kill() is called for non root kmem_cache refcounters, but it requires yet another rcu grace period to finish the transition to the atomic (dead) state. So in a rare case when not all children kmem_caches are destroyed at the moment when the root kmem_cache is about to be gone, we need to wait another rcu grace period before destroying the root kmem_cache. This issue can be triggered only with dynamically created kmem_caches which are used with memcg accounting. In this case per-memcg child kmem_caches are created. They are deactivated from the cgroup removing path. If the destruction of the root kmem_cache is racing with the removal of the cgroup (both are quite complicated multi-stage processes), the described issue can occur. The only known way to trigger it in the real life, is to unload some kernel module which creates a dedicated kmem_cache, used from different memory cgroups with GFP_ACCOUNT flag. If the unloading happens immediately after calling rmdir on the corresponding cgroup, there is some chance to trigger the issue. Link: http://lkml.kernel.org/r/20191129025011.3076017-1-guro@fb.com Fixes: f0a3a24b ("mm: memcg/slab: rework non-root kmem_cache lifecycle management") Signed-off-by: NRoman Gushchin <guro@fb.com> Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com> Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhong jiang 提交于
I hit the following compile error in arch/x86/ mm/kasan/common.c: In function kasan_populate_vmalloc: mm/kasan/common.c:797:2: error: implicit declaration of function flush_cache_vmap; did you mean flush_rcu_work? [-Werror=implicit-function-declaration] flush_cache_vmap(shadow_start, shadow_end); ^~~~~~~~~~~~~~~~ flush_rcu_work cc1: some warnings being treated as errors Link: http://lkml.kernel.org/r/1575363013-43761-1-git-send-email-zhongjiang@huawei.com Fixes: 3c5c3cfb ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Nzhong jiang <zhongjiang@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Axtens <dja@axtens.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 12月, 2019 29 次提交
-
-
由 Minchan Kim 提交于
If a block device supports rw_page operation, it doesn't submit bios so the annotation in submit_bio() for refault stall doesn't work. It happens with zram in android, especially swap read path which could consume CPU cycle for decompress. It is also a problem for zswap which uses frontswap. Annotate swap_readpage() to account the synchronous IO overhead to prevent underreport memory pressure. [akpm@linux-foundation.org: add comment, per Johannes] Link: http://lkml.kernel.org/r/20191010152134.38545-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@google.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
End a Kconfig help text sentence with a period (aka full stop). Link: http://lkml.kernel.org/r/c17f2c75-dc2a-42a4-2229-bb6b489addf2@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Krzysztof Kozlowski 提交于
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ / /' -i */Kconfig Link: http://lkml.kernel.org/r/1574306437-28837-1-git-send-email-krzk@kernel.orgSigned-off-by: NKrzysztof Kozlowski <krzk@kernel.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Kosina <trivial@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Souptick Joarder 提交于
__online_page_set_limits() is a dummy function - remove it and all callers. Link: http://lkml.kernel.org/r/8e1bc9d3b492f6bde16e95ebc1dee11d6aefabd7.1567889743.git.jrdr.linux@gmail.com Link: http://lkml.kernel.org/r/854db2cf8145d9635249c95584d9a91fd774a229.1567889743.git.jrdr.linux@gmail.com Link: http://lkml.kernel.org/r/9afe6c5a18158f3884a6b302ac2c772f3da49ccc.1567889743.git.jrdr.linux@gmail.comSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
There are several places emphasise the effect of __SetPageUptodate(), while the comment seems to have a typo in two places. Link: http://lkml.kernel.org/r/20190926023705.7226-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chen Jun 提交于
In 64bit system. sb->s_maxbytes of shmem filesystem is MAX_LFS_FILESIZE, which equal LLONG_MAX. If offset > LLONG_MAX - PAGE_SIZE, offset + len < LLONG_MAX in shmem_fallocate, which will pass the checking in vfs_fallocate. /* Check for wrap through zero too */ if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0)) return -EFBIG; loff_t unmap_start = round_up(offset, PAGE_SIZE) in shmem_fallocate causes a overflow. Syzkaller reports a overflow problem in mm/shmem: UBSAN: Undefined behaviour in mm/shmem.c:2014:10 signed integer overflow: '9223372036854775807 + 1' cannot be represented in type 'long long int' CPU: 0 PID:17076 Comm: syz-executor0 Not tainted 4.1.46+ #1 Hardware name: linux, dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2c8 arch/arm64/kernel/traps.c:100 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:238 __dump_stack lib/dump_stack.c:15 [inline] ubsan_epilogue+0x18/0x70 lib/ubsan.c:164 handle_overflow+0x158/0x1b0 lib/ubsan.c:195 shmem_fallocate+0x6d0/0x820 mm/shmem.c:2104 vfs_fallocate+0x238/0x428 fs/open.c:312 SYSC_fallocate fs/open.c:335 [inline] SyS_fallocate+0x54/0xc8 fs/open.c:239 The highest bit of unmap_start will be appended with sign bit 1 (overflow) when calculate shmem_falloc.start: shmem_falloc.start = unmap_start >> PAGE_SHIFT. Fix it by casting the type of unmap_start to u64, when right shifted. This bug is found in LTS Linux 4.1. It also seems to exist in mainline. Link: http://lkml.kernel.org/r/1573867464-5107-1-git-send-email-chenjun102@huawei.comSigned-off-by: NChen Jun <chenjun102@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Shi 提交于
The shmem_writepage() uses GFP_ATOMIC to allocate swap cache. GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM. However, shmem_writepage() should write out to swap only in response to memory pressure, so __GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself or in direct reclaim already. In addition, XArray node allocations from PF_MEMALLOC contexts could completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency reserves from being allocated. Here just copy the gfp flags used by add_to_swap(). Hugh: "a cleanup to make the two calls look the same when they don't need to be different (whereas the call from __read_swap_cache_async() rightly uses a lower priority gfp)". Link: http://lkml.kernel.org/r/1572991351-86061-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Colin Ian King 提交于
Don't populate the array 'values' on the stack but instead make it static const. Makes the object code smaller by 111 bytes. Before: text data bss dec hex filename 108612 11169 512 120293 1d5e5 mm/shmem.o After: text data bss dec hex filename 108437 11233 512 120182 1d576 mm/shmem.o (gcc version 9.2.1, amd64) Link: http://lkml.kernel.org/r/20190906143012.28698-1-colin.king@canonical.comSigned-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
When doing UFFDIO_COPY, it is necessary to find the correct destination vma and make sure fault range is in it. Since there are two places need to do the same task, just wrap those common check into an inlined function. Link: http://lkml.kernel.org/r/20190927070032.2129-3-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
These warning here is to make sure address(dst_addr) and length(len - copied) are huge page size aligned. While this is ensured by: dst_start and len is huge page size aligned dst_addr equals to dst_start and increase huge page size each time copied increase huge page size each time This means these warnings will never be triggered. Link: http://lkml.kernel.org/r/20190927070032.2129-2-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
In __mcopy_atomic_hugetlb() we use two variables to deal with huge page size: vma_hpagesize and huge_page_size. Since they are the same, it is not necessary to use two different mechanism. This patch makes it consistent by all using vma_hpagesize. Link: http://lkml.kernel.org/r/20190927070032.2129-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
Improve readability, no functional change. Link: http://lkml.kernel.org/r/20191118032857.22683-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yunfeng Ye 提交于
page_size() is supported after the commit a50b854e ("mm: introduce page_size()"). Use page_size() in madvise_inject_error() for readability. [akpm@linux-foundation.org: use ulong for `size', per David] Link: http://lkml.kernel.org/r/29dce60c-38d6-0220-f292-e298f0c78c4d@huawei.comSigned-off-by: NYunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
Case 1/6, 2/7 and 3/8 have the same pattern and we handle them in the same logic. Rearrange the comment to make it a little easy for audience to understand. Link: http://lkml.kernel.org/r/20191030012445.16944-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Michel Lespinasse <walken@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhong jiang 提交于
It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file operation rather than DEFINE_SIMPLE_ATTRIBUTE. Link: http://lkml.kernel.org/r/1572403660-44718-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
In auto NUMA balancing page table scanning, if the pte_protnone() is true, the PTE needs not to be changed because it's in target state already. So other checking on corresponding struct page is unnecessary too. So, if we check pte_protnone() firstly for each PTE, we can avoid unnecessary struct page accessing, so that reduce the cache footprint of NUMA balancing page table scanning. In the performance test of pmbench memory accessing benchmark with 80:20 read/write ratio and normal access address distribution on a 2 socket Intel server with Optance DC Persistent Memory, perf profiling shows that the autonuma page table scanning time reduces from 1.23% to 0.97% (that is, reduced 21%) with the patch. Link: http://lkml.kernel.org/r/20191101075727.26683-3-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
When zone_watermark_ok() is called in migrate_balanced_pgdat() to check migration target node, the parameter classzone_idx (for requested zone) is specified as 0 (ZONE_DMA). But when allocating memory for autonuma in alloc_misplaced_dst_page(), the requested zone from GFP flags is ZONE_MOVABLE. That is, the requested zone is different. The size of lowmem_reserve for the different requested zone is different. And this may cause some issues. For example, in the zoneinfo of a test machine as below, Node 0, zone DMA32 pages free 61592 min 29 low 454 high 879 spanned 1044480 present 442306 managed 425921 protection: (0, 0, 62457, 62457, 62457) The free page number of ZONE_DMA32 is greater than "high watermark + lowmem_reserve[ZONE_DMA]", but less than "high watermark + lowmem_reserve[ZONE_MOVABLE]". And because __alloc_pages_node() in alloc_misplaced_dst_page() requests ZONE_MOVABLE, the zone_watermark_ok() on ZONE_DMA32 in migrate_balanced_pgdat() may always return true. So, autonuma may not stop even when memory pressure in node 0 is heavy. To fix the issue, ZONE_MOVABLE is used as parameter to call zone_watermark_ok() in migrate_balanced_pgdat(). This makes it same as requested zone in alloc_misplaced_dst_page(). So that migrate_balanced_pgdat() returns false when memory pressure is heavy. Link: http://lkml.kernel.org/r/20191101075727.26683-2-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhong jiang 提交于
It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file operation rather than DEFINE_SIMPLE_ATTRIBUTE. Link: http://lkml.kernel.org/r/1572348687-9951-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Yue Hu <huyue2@yulong.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yunfeng Ye 提交于
kzalloc() is used for cma bitmap allocation in cma_activate_area(), switch to bitmap_zalloc() for clarity. Link: http://lkml.kernel.org/r/895d4627-f115-c77a-d454-c0a196116426@huawei.comSigned-off-by: NYunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Yue Hu <huyue2@yulong.com> Cc: Peng Fan <peng.fan@nxp.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ryohei Suzuki <ryh.szk.cmnty@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Song Liu 提交于
For non-shmem file THPs, khugepaged only collapses read only .text mapping (VM_DENYWRITE). These pages should not be dirty except the case where the file hasn't been flushed since first write. Call filemap_flush() in collapse_file() to accelerate the write back in such cases. Link: http://lkml.kernel.org/r/20191106060930.2571389-3-songliubraving@fb.comSigned-off-by: NSong Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Adding fully unmapped pages into deferred split queue is not productive: these pages are about to be freed or they are pinned and cannot be split anyway. Link: http://lkml.kernel.org/r/20190913091849.11151-1-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Shi 提交于
When doing migration if the freed page is met, we just return without migrating it since it is pointless to migrate a freed page. But, the current code allocates target page unconditionally before handling freed page, if the page is freed, the newly allocated will be just freed. It doesn't make too much sense and is just a waste of time although migrating freed page is rare. So, handle freed page at the before that to avoid unnecessary page allocation and free. Link: http://lkml.kernel.org/r/1573755869-106954-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 zhong jiang 提交于
split_huge_pages_fops is used for debugfs file. hence, it is more clear to use DEFINE_DEBUGFS_ATTRIBUTE. Link: http://lkml.kernel.org/r/1572347674-8111-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhigang Lu 提交于
When mmapping an existing hugetlbfs file with MAP_POPULATE, we find it is very time consuming. For example, mmapping a 128GB file takes about 50 milliseconds. Sampling with perfevent shows it spends 99% time in the same_page loop in follow_hugetlb_page(). samples: 205 of event 'cycles', Event count (approx.): 136686374 - 99.04% test_mmap_huget [kernel.kallsyms] [k] follow_hugetlb_page follow_hugetlb_page __get_user_pages __mlock_vma_pages_range __mm_populate vm_mmap_pgoff sys_mmap_pgoff sys_mmap system_call_fastpath __mmap64 follow_hugetlb_page() is called with pages=NULL and vmas=NULL, so for each hugepage, we run into the same_page loop for pages_per_huge_page() times, but doing nothing. With this change, it takes less then 1 millisecond to mmap a 128GB file in hugetlbfs. Link: http://lkml.kernel.org/r/1567581712-5992-1-git-send-email-totty.lu@gmail.comSigned-off-by: NZhigang Lu <tonnylu@tencent.com> Reviewed-by: NHaozhong Zhang <hzhongzhang@tencent.com> Reviewed-by: NZongming Zhang <knightzhang@tencent.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
The first parameter hstate in function hugetlb_fault_mutex_hash() is not used anymore. This patch removes it. [akpm@linux-foundation.org: various build fixes] [cai@lca.pw: fix a GCC compilation warning] Link: http://lkml.kernel.org/r/1570544108-32331-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20191005003302.785-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Signed-off-by: NQian Cai <cai@lca.pw> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mina Almasry 提交于
Remove duplicated code between region_chg and region_add, and refactor it into a common function, add_reservation_in_range. This is mostly done because there is a follow up change in another series that disables region coalescing in region_add, and I want to make that change in one place only. It should improve maintainability anyway on its own. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190919200428.188797-3-almasrymina@google.comSigned-off-by: NMina Almasry <almasrymina@google.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mina Almasry 提交于
Current behavior is that region_chg provides both a cache entry in resv->region_cache, AND a placeholder entry in resv->regions. region_add first tries to use the placeholder, and if it finds that the placeholder has been deleted by a racing region_del call, it uses the cache entry. This behavior is completely unnecessary and is removed in this patch for a couple of reasons: 1. region_add needs to either find a cached file_region entry in resv->region_cache, or find an entry in resv->regions to expand. It does not need both. 2. region_chg adding a placeholder entry in resv->regions opens up a possible race with region_del, where region_chg adds a placeholder region in resv->regions, and this region is deleted by a racing call to region_del during region_chg execution or before region_add is called. Removing the race makes the code easier to reason about and maintain. In addition, a follow up patch in another series that disables region coalescing, which would be further complicated if the race with region_del exists. Link: http://lkml.kernel.org/r/20190919200428.188797-2-almasrymina@google.comSigned-off-by: NMina Almasry <almasrymina@google.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
A customer with large SMP systems (up to 16 sockets) with application that uses large amount of static hugepages (~500-1500GB) are experiencing random multisecond delays. These delays were caused by the long time it took to scan the VMA interval tree with mmap_sem held. The sharing of huge PMD does not require changes to the i_mmap at all. Therefore, we can just take the read lock and let other threads searching for the right VMA share it in parallel. Once the right VMA is found, either the PMD lock (2M huge page for x86-64) or the mm->page_table_lock will be acquired to perform the actual PMD sharing. Lock contention, if present, will happen in the spinlock. That is much better than contention in the rwsem where the time needed to scan the the interval tree is indeterminate. With this patch applied, the customer is seeing significant performance improvement over the unpatched kernel. Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com> Suggested-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
A new clang diagnostic (-Wsizeof-array-div) warns about the calculation to determine the number of u32's in an array of unsigned longs. Suppress warning by adding parentheses. While looking at the above issue, noticed that the 'address' parameter to hugetlb_fault_mutex_hash is no longer used. So, remove it from the definition and all callers. No functional change. Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reported-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Ilie Halip <ilie.halip@gmail.com> Cc: David Bolvansky <david.bolvansky@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-