- 21 4月, 2022 1 次提交
-
-
由 yanghui 提交于
mainline inclusion from mainline-v5.15-rc1 commit 276aeee1 category: bugfix bugzilla: 181417 https://gitee.com/openeuler/kernel/issues/I53CSV backport: openEuler-22.03-LTS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=276aeee1c5fc00df700f0782060beae126600472 ------------------------------- Servers happened below panic: Kernel version:5.4.56 BUG: unable to handle page fault for address: 0000000000002c48 RIP: 0010:__next_zones_zonelist+0x1d/0x40 Call Trace: __alloc_pages_nodemask+0x277/0x310 alloc_page_interleave+0x13/0x70 handle_mm_fault+0xf99/0x1390 __do_page_fault+0x288/0x500 do_page_fault+0x30/0x110 page_fault+0x3e/0x50 The reason for the panic is that MAX_NUMNODES is passed in the third parameter in __alloc_pages_nodemask(preferred_nid). So access to zonelist->zoneref->zone_idx in __next_zones_zonelist will cause a panic. In offset_il_node(), first_node() returns nid from pol->v.nodes, after this other threads may chang pol->v.nodes before next_node(). This race condition will let next_node return MAX_NUMNODES. So put pol->nodes in a local variable. The race condition is between offset_il_node and cpuset_change_task_nodemask: CPU0: CPU1: alloc_pages_vma() interleave_nid(pol,) offset_il_node(pol,) first_node(pol->v.nodes) cpuset_change_task_nodemask //nodes==0xc mpol_rebind_task mpol_rebind_policy mpol_rebind_nodemask(pol,nodes) //nodes==0x3 next_node(nid, pol->v.nodes)//return MAX_NUMNODES Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.comSigned-off-by: Nyanghui <yanghui.def@bytedance.com> Reviewed-by: NMuchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 276aeee1) conflicts: mm/mempolicy.c Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 26 1月, 2022 1 次提交
-
-
由 Andrey Ryabinin 提交于
stable inclusion from stable-v5.10.89 commit ee6f34215c5dfa2257298cc362cd79e14af5a25a bugzilla: 186140 https://gitee.com/openeuler/kernel/issues/I4S8HA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ee6f34215c5dfa2257298cc362cd79e14af5a25a -------------------------------- alloc_pages_vma() may try to allocate THP page on the local NUMA node first: page = __alloc_pages_node(hpage_node, gfp | __GFP_THISNODE | __GFP_NORETRY, order); And if the allocation fails it retries allowing remote memory: if (!page && (gfp & __GFP_DIRECT_RECLAIM)) page = __alloc_pages_node(hpage_node, gfp, order); However, this retry allocation completely ignores memory policy nodemask allowing allocation to escape restrictions. The first appearance of this bug seems to be the commit ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings"). The bug disappeared later in the commit 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and reappeared again in slightly different form in the commit 76e654cc ("mm, page_alloc: allow hugepage fallback to remote nodes when madvised") Fix this by passing correct nodemask to the __alloc_pages() call. The demonstration/reproducer of the problem: $ mount -oremount,size=4G,huge=always /dev/shm/ $ echo always > /sys/kernel/mm/transparent_hugepage/defrag $ cat mbind_thp.c #include <unistd.h> #include <sys/mman.h> #include <sys/stat.h> #include <fcntl.h> #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <numaif.h> #define SIZE 2ULL << 30 int main(int argc, char **argv) { int fd; unsigned long long i; char *addr; pid_t pid; char buf[100]; unsigned long nodemask = 1; fd = open("/dev/shm/test", O_RDWR|O_CREAT); assert(fd > 0); assert(ftruncate(fd, SIZE) == 0); addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0); for (i = 0; i < SIZE; i+=4096) { addr[i] = 1; } pid = getpid(); snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid); system(buf); sleep(10000); return 0; } $ gcc mbind_thp.c -o mbind_thp -lnuma $ numactl -H available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 1918 MB node 0 free: 1595 MB node 1 cpus: 1 3 node 1 size: 2014 MB node 1 free: 1731 MB node distances: node 0 1 0: 10 20 1: 20 10 $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp 7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4 Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com Fixes: ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") Signed-off-by: NAndrey Ryabinin <arbn@yandex-team.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 29 11月, 2021 3 次提交
-
-
由 Lijun Fang 提交于
ascend inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4JMLR CVE: NA ----------------- CDM nodes should not be part of mems_allowed, However, It must be allowed to alloc from CDM node, when mpol->mode was MPOL_BIND. Signed-off-by: NLijun Fang <fanglijun3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Anshuman Khandual 提交于
ascend inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4JMLR CVE: NA ------------------- CDM nodes need a way of explicit memory allocation mechanism from the user space. After the previous FALLBACK zonelist rebuilding process changes, the mbind(MPOL_BIND) based allocation request fails on the CDM node. This is because allocation requesting local node's FALLBACK zonelist is selected for further nodemask processing targeted at MPOL_BIND implementation. As the CDM node's zones are not part of any other regular node's FALLBACK zonelist, the allocation simply fails without getting any valid zone. The allocation requesting node is always going to be different than the CDM node which does not have any CPU. Hence MPOL_MBIND implementation must choose given CDM node's FALLBACK zonelist instead of the requesting local node's FALLBACK zonelist. This implements that change. Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NLijun Fang <fanglijun3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Anshuman Khandual 提交于
ascend inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4JMLR CVE: NA ------------------- Mark all the applicable VMAs with VM_CDM explicitly during mbind(MPOL_BIND) call if the user provided nodemask has a CDM node. Mark the corresponding VMA with VM_CDM flag if the allocated page happens to be from a CDM node. This can be expensive from performance stand point. There are multiple checks to avoid an expensive page_to_nid lookup but it can be optimized further. Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NLijun Fang <fanglijun3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 14 7月, 2021 6 次提交
-
-
由 Wei Li 提交于
hulk inclusion category: feature bugzilla: 169576 CVE: NA ------------------------------------------------- Enabling CNA is controlled via a new configuration option (NUMA_AWARE_SPINLOCKS). Add it for arm64. Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-5.13-rc1 commit 5f076944 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f076944f06988391a6dbd458fc6485a71088e57 ------------------------------------------------- Sphinx interprets the Return section as a list and complains about it. Turn it into a sentence and move it to the end of the kernel-doc to fit the kernel-doc style. Link: https://lkml.kernel.org/r/20210225150642.2582252-8-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 5f076944) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> (fix conflicts: add "kernel-doc:: mm/mempolicy.c" at the end of mm-api.rst) Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-5.13-rc1 commit eb350739 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb35073960510762dee417574589b7a8971c68ab ------------------------------------------------- The current formatting doesn't quite work with kernel-doc. Link: https://lkml.kernel.org/r/20210225150642.2582252-7-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit eb350739) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-5.13-rc1 commit 6421ec76 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6421ec764a62c51f810c5dc40cd45eeb15801ad9 ------------------------------------------------- Document alloc_pages() for both NUMA and non-NUMA cases as kernel-doc doesn't care. Link: https://lkml.kernel.org/r/20210225150642.2582252-6-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6421ec76) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-5.13-rc1 commit d7f946d0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7f946d0faf90014547ee5d090e9d05018278c7a ------------------------------------------------- When CONFIG_NUMA is enabled, alloc_pages() is a wrapper around alloc_pages_current(). This is pointless, just implement alloc_pages() directly. Link: https://lkml.kernel.org/r/20210225150642.2582252-5-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d7f946d0) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Matthew Wilcox (Oracle) 提交于
mainline inclusion from mainline-5.13-rc1 commit 84172f4b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84172f4bb752424415756351a40f8da5714e1554 ------------------------------------------------- There are only two callers of __alloc_pages() so prune the thicket of alloc_page variants by combining the two functions together. Current callers of __alloc_pages() simply add an extra 'NULL' parameter and current callers of __alloc_pages_nodemask() call __alloc_pages() instead. Link: https://lkml.kernel.org/r/20210225150642.2582252-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 84172f4b) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 03 11月, 2020 1 次提交
-
-
由 Shijie Luo 提交于
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to pte_unmap_unlock seems like not a good idea. queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't migrate misplaced pages but returns with EIO when encountering such a page. Since commit a7f40cfe ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") and early break on the first pte in the range results in pte_unmap_unlock on an underflow pte. This can lead to lockups later on when somebody tries to lock the pte resp. page_table_lock again.. Fixes: a7f40cfe ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") Signed-off-by: NShijie Luo <luoshijie1@huawei.com> Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Cc: Shijie Luo <luoshijie1@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 10月, 2020 2 次提交
-
-
由 Wei Yang 提交于
No one use this macro anymore. Also fix code style of policy_node(). Signed-off-by: NWei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200921021401.84508-1-richard.weiyang@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
It is not necessary to hold the lock of current when setting nodemask of a new policy. Signed-off-by: NWei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200921040416.86185-1-richard.weiyang@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 8月, 2020 1 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
The thp prefix is more frequently used than hpage and we should be consistent between the various functions. [akpm@linux-foundation.org: fix mm/migrate.c] Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com> Reviewed-by: NZi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2020 5 次提交
-
-
由 Joonsoo Kim 提交于
There is a well-defined migration target allocation callback. Use it. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Roman Gushchin <guro@fb.com> Link: http://lkml.kernel.org/r/1594622517-20681-7-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
There is no difference between two migration callback functions, alloc_huge_page_node() and alloc_huge_page_nodemask(), except __GFP_THISNODE handling. It's redundant to have two almost similar functions in order to handle this flag. So, this patch tries to remove one by introducing a new argument, gfp_mask, to alloc_huge_page_nodemask(). After introducing gfp_mask argument, it's caller's job to provide correct gfp_mask. So, every callsites for alloc_huge_page_nodemask() are changed to provide gfp_mask. Note that it's safe to remove a node id check in alloc_huge_page_node() since there is no caller passing NUMA_NO_NODE as a node id. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Roman Gushchin <guro@fb.com> Link: http://lkml.kernel.org/r/1594622517-20681-4-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wenchao Hao 提交于
Previous implementatoin calls untagged_addr() before error check, while if the error check failed and return EINVAL, the untagged_addr() call is just useless work. Signed-off-by: NWenchao Hao <haowenchao22@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200801090825.5597-1-haowenchao22@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Krzysztof Kozlowski 提交于
Fix W=1 compile warnings (invalid kerneldoc): mm/mempolicy.c:137: warning: Function parameter or member 'node' not described in 'numa_map_to_online_node' mm/mempolicy.c:137: warning: Excess function parameter 'nid' description in 'numa_map_to_online_node' Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200728171109.28687-3-krzk@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
In the reservation routine, we only check whether the cpuset meets the memory allocation requirements. But we ignore the mempolicy of MPOL_BIND case. If someone mmap hugetlb succeeds, but the subsequent memory allocation may fail due to mempolicy restrictions and receives the SIGBUS signal. This can be reproduced by the follow steps. 1) Compile the test case. cd tools/testing/selftests/vm/ gcc map_hugetlb.c -o map_hugetlb 2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the system. Each node will pre-allocate one huge page. echo 2 > /proc/sys/vm/nr_hugepages 3) Run test case(mmap 4MB). We receive the SIGBUS signal. numactl --membind=3D0 ./map_hugetlb 4 With this patch applied, the mmap will fail in the step 3) and throw "mmap: Cannot allocate memory". [akpm@linux-foundation.org: include sched.h for `current'] Reported-by: NJianchao Guo <guojianchao@bytedance.com> Suggested-by: NMichal Hocko <mhocko@kernel.org> Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michel Lespinasse <walken@google.com> Cc: Baoquan He <bhe@redhat.com> Link: http://lkml.kernel.org/r/20200728034938.14993-1-songmuchun@bytedance.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 7月, 2020 1 次提交
-
-
由 Kees Cook 提交于
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 10 6月, 2020 3 次提交
-
-
由 Michel Lespinasse 提交于
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2020 1 次提交
-
-
由 Michal Hocko 提交于
ba841078 ("mm/mempolicy: Allow lookup_node() to handle fatal signal") has added a special casing for 0 return value because that was a possible gup return value when interrupted by fatal signal. This has been fixed by ae46d2aa ("mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal") in the mean time so ba841078 can be reverted. This patch however doesn't go all the way to revert it because the check for 0 is wrong and confusing here. Firstly it is inherently unsafe to access the page when get_user_pages_locked returns 0 (aka no page returned). Fortunatelly this will not happen because get_user_pages_locked will not return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not the case here. Document this potential error code in gup code while we are at it. Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Link: http://lkml.kernel.org/r/20200421071026.18394-1-mhocko@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 4月, 2020 6 次提交
-
-
由 Peter Xu 提交于
lookup_node() uses gup to pin the page and get node information. It checks against ret>=0 assuming the page will be filled in. However it's also possible that gup will return zero, for example, when the thread is quickly killed with a fatal signal. Teach lookup_node() to gracefully return an error -EFAULT if it happens. Meanwhile, initialize "page" to NULL to avoid potential risk of exploiting the pointer. Fixes: 4426e945 ("mm/gup: allow VM_FAULT_RETRY for multiple times") Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Convert the various /* fallthrough */ comments to the pseudo-keyword fallthrough; Done via script: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jules Irenge 提交于
Sparse reports a warning at queue_pages_pmd() context imbalance in queue_pages_pmd() - unexpected unlock The root cause is the missing annotation at queue_pages_pmd() Add the missing __releases(ptl) Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200214204741.94112-8-jbi.octave@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Xu 提交于
change_protection() was used by either the NUMA or mprotect() code, there's one parameter for each of the callers (dirty_accountable and prot_numa). Further, these parameters are passed along the calls: - change_protection_range() - change_p4d_range() - change_pud_range() - change_pmd_range() - ... Now we introduce a flag for change_protect() and all these helpers to replace these parameters. Then we can avoid passing multiple parameters multiple times along the way. More importantly, it'll greatly simplify the work if we want to introduce any new parameters to change_protection(). In the follow up patches, a new parameter for userfaultfd write protection will be introduced. No functional change at all. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NJerome Glisse <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-7-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
Some comments for MADV_FREE is revised and added to help people understand the MADV_FREE code, especially the page flag, PG_swapbacked. This makes page_is_file_cache() isn't consistent with its comments. So the function is renamed to page_is_file_lru() to make them consistent again. All these are put in one patch as one logical change. Suggested-by: NDavid Hildenbrand <david@redhat.com> Suggested-by: NJohannes Weiner <hannes@cmpxchg.org> Suggested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: N"Huang, Ying" <ying.huang@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NMichal Hocko <mhocko@kernel.org> Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/20200317100342.2730705-1-ying.huang@intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it available for general use. While here, this replaces all remaining open encodings for VMA access check with vma_is_accessible(). Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NGuo Ren <guoren@kernel.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2020 4 次提交
-
-
由 Randy Dunlap 提交于
Using an empty (malformed) nodelist that is not caught during mount option parsing leads to a stack-out-of-bounds access. The option string that was used was: "mpol=prefer:,". However, MPOL_PREFERRED requires a single node number, which is not being provided here. Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's nodeid. Fixes: 095f1fc4 ("mempolicy: rework shmem mpol parsing and display") Reported-by: NEntropy Moe <3ntr0py1337@gmail.com> Reported-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Link: http://lkml.kernel.org/r/89526377-7eb6-b662-e1d8-4430928abde9@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Shi 提交于
The VM_BUG_ON() is already used by queue_pages_test_walk(), it sounds better to dump more debug information by using VM_BUG_ON_VMA() to help debugging. Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Li Xinhai" <lixinhai.lxh@gmail.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/1579068565-110432-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Xinhai 提交于
vma_migratable() is called to check if pages in vma can be migrated before go ahead to further actions. Currently it is used in below code path: - task_numa_work - mbind - move_pages For hugetlb mapping, whether vma is migratable or not is determined by: - CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION - arch_hugetlb_migration_supported Issue: current code only checks for CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION alone, and no code should use it directly. (note that current code in vma_migratable don't cause failure or bug because unmap_and_move_huge_page() will catch unsupported hugepage and handle it properly) This patch checks the two factors by hugepage_migration_supported for impoving code logic and robustness. It will enable early bail out of hugepage migration procedure, but because currently all architecture supporting hugepage migration is able to support all page size, we would not see performance gain with this patch applied. vma_migratable() is moved to mm/mempolicy.c, because of the circular reference of mempolicy.h and hugetlb.h cause defining it as inline not feasible. Signed-off-by: NLi Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Link: http://lkml.kernel.org/r/1579786179-30633-1-git-send-email-lixinhai.lxh@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Xinhai 提交于
MPOL_MF_STRICT is used in mbind() for purposes: (1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check if there is misplaced page and return -EIO; (2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check if there is misplaced page which is failed to isolate, or page is success on isolate but failed to move, and return -EIO. For non hugepage mapping, (1) and (2) are implemented as expectation. For hugepage mapping, (1) is not implemented. And in (2), the part about failed to isolate and report -EIO is not implemented. This patch implements the missed parts for hugepage mapping. Benefits with it applied: - User space can apply same code logic to handle mbind() on hugepage and non hugepage mapping; - Reliably using MPOL_MF_STRICT alone to check whether there is misplaced page or not when bind policy on address range, especially for address range which contains both hugepage and non hugepage mapping. Analysis of potential impact to existing users: - If MPOL_MF_STRICT alone was previously used, hugetlb pages not following the memory policy would not cause an EIO error. After this change, hugetlb pages are treated like all other pages. If MPOL_MF_STRICT alone is used and hugetlb pages do not follow memory policy an EIO error will be returned. - For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will not be changed by this patch, because failed to isolate and failed to move have same effects to users, so their existing code will not be impacted. In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page mappings' can be removed after this patch is applied. Mike: : The current behavior with MPOL_MF_STRICT and hugetlb pages is inconsistent : and does not match documentation (as described above). The special : behavior for hugetlb pages ideally should have been removed when hugetlb : page migration was introduced. It is unlikely that anyone relies on : today's inconsistent behavior, and removing one more case of special : handling for hugetlb pages is a good thing. Signed-off-by: NLi Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-man <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/1581559627-6206-1-git-send-email-lixinhai.lxh@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 2月, 2020 2 次提交
-
-
由 Dan Williams 提交于
Update numa_map_to_online_node() to stop falling back to numa node 0 when the input is NUMA_NO_NODE. Also, skip the lookup if @node is online. This makes the routine compatible with other arch node mapping routines. Reported-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157401275716.43284.13185549705765009174.stgit@dwillia2-desk3.amr.corp.intel.comReviewed-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/158188325316.894464.15650888748083329531.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Dan Williams 提交于
The acpi_map_pxm_to_online_node() helper is used to find the closest online node to a given proximity domain. This is used to map devices in a proximity domain with no online memory or cpus to the closest online node and populate a device's 'numa_node' property. The numa_node property allows applications to be migrated "close" to a resource. In preparation for providing a generic facility to optionally map an address range to its closest online node, or the node the range would represent were it to be onlined (target_node), up-level the core of acpi_map_pxm_to_online_node() to a generic mm/numa helper. Cc: Michal Hocko <mhocko@suse.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/158188324802.894464.13128795207831894206.stgit@dwillia2-desk3.amr.corp.intel.com
-
- 01 2月, 2020 1 次提交
-
-
由 Dan Carpenter 提交于
What we are trying to do is change the '=' character to a NUL terminator and then at the end of the function we restore it back to an '='. The problem is there are two error paths where we jump to the end of the function before we have replaced the '=' with NUL. We end up putting the '=' in the wrong place (possibly one element before the start of the buffer). Link: http://lkml.kernel.org/r/20200115055426.vdjwvry44nfug7yy@kili.mountain Reported-by: syzbot+e64a13c5369a194d67df@syzkaller.appspotmail.com Fixes: 095f1fc4 ("mempolicy: rework shmem mpol parsing and display") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Dmitry Vyukov <dvyukov@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2020 1 次提交
-
-
由 Vlastimil Babka 提交于
THP page faults now attempt a __GFP_THISNODE allocation first, which should only compact existing free memory, followed by another attempt that can allocate from any node using reclaim/compaction effort specified by global defrag setting and madvise. This patch makes the following changes to the scheme: - Before the patch, the first allocation relies on a check for pageblock order and __GFP_IO to prevent excessive reclaim. This however affects also the second attempt, which is not limited to single node. Instead of that, reuse the existing check for costly order __GFP_NORETRY allocations, and make sure the first THP attempt uses __GFP_NORETRY. As a side-effect, all costly order __GFP_NORETRY allocations will bail out if compaction needs reclaim, while previously they only bailed out when compaction was deferred due to previous failures. This should be still acceptable within the __GFP_NORETRY semantics. - Before the patch, the second allocation attempt (on all nodes) was passing __GFP_NORETRY. This is redundant as the check for pageblock order (discussed above) was stronger. It's also contrary to madvise(MADV_HUGEPAGE) which means some effort to allocate THP is requested. After this patch, the second attempt doesn't pass __GFP_THISNODE nor __GFP_NORETRY. To sum up, THP page faults now try the following attempts: 1. local node only THP allocation with no reclaim, just compaction. 2. for madvised VMA's or when synchronous compaction is enabled always - THP allocation from any node with effort determined by global defrag setting and VMA madvise 3. fallback to base pages on any node Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 12月, 2019 1 次提交
-
-
由 Li Xinhai 提交于
mbind() is required to report EFAULT if range, specified by addr and len, contains unmapped holes. In current implementation, below rules are applied for this checking: 1: Unmapped holes at any part of the specified range should be reported as EFAULT if mbind() for none MPOL_DEFAULT cases; 2: Unmapped holes at any part of the specified range should be ignored (do not reprot EFAULT) if mbind() for MPOL_DEFAULT case; 3: The whole range in an unmapped hole should be reported as EFAULT; Note that rule 2 does not fullfill the mbind() API definition, but since that behavior has existed for long days (the internal flag MPOL_MF_DISCONTIG_OK is for this purpose), this patch does not plan to change it. In current code, application observed inconsistent behavior on rule 1 and rule 2 respectively. That inconsistency is fixed as below details. Cases of rule 1: - Hole at head side of range. Current code reprot EFAULT, no change by this patch. [ vma ][ hole ][ vma ] [ range ] - Hole at middle of range. Current code report EFAULT, no change by this patch. [ vma ][ hole ][ vma ] [ range ] - Hole at tail side of range. Current code do not report EFAULT, this patch fixes it. [ vma ][ hole ][ vma ] [ range ] Cases of rule 2: - Hole at head side of range. Current code reports EFAULT, this patch fixes it. [ vma ][ hole ][ vma ] [ range ] - Hole at middle of range. Current code does not report EFAULT, no change by this patch. [ vma ][ hole ][ vma] [ range ] - Hole at tail side of range. Current code does not report EFAULT, no change by this patch. [ vma ][ hole ][ vma] [ range ] This patch has no changes to rule 3. The unmapped hole checking can also be handled by using .pte_hole(), instead of .test_walk(). But .pte_hole() is called for holes inside and outside vma, which causes more cost, so this patch keeps the original design with .test_walk(). Link: http://lkml.kernel.org/r/1573218104-11021-3-git-send-email-lixinhai.lxh@gmail.com Fixes: 6f4576e3 ("mempolicy: apply page table walker on queue_pages_range()") Signed-off-by: NLi Xinhai <lixinhai.lxh@gmail.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: linux-man <linux-man@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-