1. 09 9月, 2021 1 次提交
    • A
      mm: simplify compat numa syscalls · e130242d
      Arnd Bergmann 提交于
      The compat implementations for mbind, get_mempolicy, set_mempolicy and
      migrate_pages are just there to handle the subtly different layout of
      bitmaps on 32-bit hosts.
      
      The compat implementation however lacks some of the checks that are
      present in the native one, in particular for checking that the extra bits
      are all zero when user space has a larger mask size than the kernel.
      Worse, those extra bits do not get cleared when copying in or out of the
      kernel, which can lead to incorrect data as well.
      
      Unify the implementation to handle the compat bitmap layout directly in
      the get_nodes() and copy_nodes_to_user() helpers.  Splitting out the
      get_bitmap() helper from get_nodes() also helps readability of the native
      case.
      
      On x86, two additional problems are addressed by this: compat tasks can
      pass a bitmap at the end of a mapping, causing a fault when reading across
      the page boundary for a 64-bit word.  x32 tasks might also run into
      problems with get_mempolicy corrupting data when an odd number of 32-bit
      words gets passed.
      
      On parisc the migrate_pages() system call apparently had the wrong calling
      convention, as big-endian architectures expect the words inside of a
      bitmap to be swapped.  This is not a problem though since parisc has no
      NUMA support.
      
      [arnd@arndb.de: fix mempolicy crash]
        Link: https://lkml.kernel.org/r/20210730143417.3700653-1-arnd@kernel.org
        Link: https://lore.kernel.org/lkml/YQPLG20V3dmOfq3a@osiris/
      
      Link: https://lkml.kernel.org/r/20210727144859.4150043-5-arnd@kernel.orgSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e130242d
  2. 04 9月, 2021 8 次提交
  3. 01 7月, 2021 5 次提交
  4. 30 6月, 2021 2 次提交
  5. 07 5月, 2021 2 次提交
  6. 06 5月, 2021 3 次提交
  7. 01 5月, 2021 5 次提交
  8. 25 2月, 2021 2 次提交
    • M
      mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() · ce33135c
      Miaohe Lin 提交于
      The helper range_in_vma() is introduced via commit 017b1660 ("mm:
      migration: fix migration of huge PMD shared pages"). But we forgot to
      use it in queue_pages_test_walk().
      
      Link: https://lkml.kernel.org/r/20210130091352.20220-1-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce33135c
    • H
      numa balancing: migrate on fault among multiple bound nodes · bda420b9
      Huang Ying 提交于
      Now, NUMA balancing can only optimize the page placement among the NUMA
      nodes if the default memory policy is used.  Because the memory policy
      specified explicitly should take precedence.  But this seems too strict in
      some situations.  For example, on a system with 4 NUMA nodes, if the
      memory of an application is bound to the node 0 and 1, NUMA balancing can
      potentially migrate the pages between the node 0 and 1 to reduce
      cross-node accessing without breaking the explicit memory binding policy.
      
      So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to
      set_mempolicy() when mode is MPOL_BIND.  With the flag specified, NUMA
      balancing will be enabled within the thread to optimize the page placement
      within the constrains of the specified memory binding policy.  With the
      newly added flag, the NUMA balancing control mechanism becomes,
      
       - sysctl knob numa_balancing can enable/disable the NUMA balancing
         globally.
      
       - even if sysctl numa_balancing is enabled, the NUMA balancing will be
         disabled for the memory areas or applications with the explicit
         memory policy by default.
      
       - MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for
         the applications when specifying the explicit memory policy
         (MPOL_BIND).
      
      Various page placement optimization based on the NUMA balancing can be
      done with these flags.  As the first step, in this patch, if the memory of
      the application is bound to multiple nodes (MPOL_BIND), and in the hint
      page fault handler the accessing node are in the policy nodemask, the page
      will be tried to be migrated to the accessing node to reduce the
      cross-node accessing.
      
      If the newly added MPOL_F_NUMA_BALANCING flag is specified by an
      application on an old kernel version without its support, set_mempolicy()
      will return -1 and errno will be set to EINVAL.  The application can use
      this behavior to run on both old and new kernel versions.
      
      And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than
      MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL
      as before.  Because we don't support optimization based on the NUMA
      balancing for these modes.
      
      In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for
      mbind().  But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good
      API/ABI for the purpose of the patch.
      
      And because it's not clear whether it's necessary to enable NUMA balancing
      for a specific memory area inside an application, so we only add the flag
      at the thread level (set_mempolicy()) instead of the memory area level
      (mbind()).  We can do that when it become necessary.
      
      To test the patch, we run a test case as follows on a 4-node machine with
      192 GB memory (48 GB per node).
      
      1. Change pmbench memory accessing benchmark to call set_mempolicy()
         to bind its memory to node 1 and 3 and enable NUMA balancing.  Some
         related code snippets are as follows,
      
           #include <numaif.h>
           #include <numa.h>
      
      	struct bitmask *bmp;
      	int ret;
      
      	bmp = numa_parse_nodestring("1,3");
      	ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
      			    bmp->maskp, bmp->size + 1);
      	/* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */
      	if (ret < 0 && errno == EINVAL)
      		ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1);
      	if (ret < 0) {
      		perror("Failed to call set_mempolicy");
      		exit(-1);
      	}
      
      2. Run a memory eater on node 3 to use 40 GB memory before running pmbench.
      
      3. Run pmbench with 64 processes, the working-set size of each process
         is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB.  The
         CPU and the memory (as in step 1.) of all pmbench processes is bound
         to node 1 and 3. So, after CPU usage is balanced, some pmbench
         processes run on the CPUs of the node 3 will access the memory of
         the node 1.
      
      4. After the pmbench processes run for 100 seconds, kill the memory
         eater.  Now it's possible for some pmbench processes to migrate
         their pages from node 1 to node 3 to reduce cross-node accessing.
      
      Test results show that, with the patch, the pages can be migrated from
      node 1 to node 3 after killing the memory eater, and the pmbench score
      can increase about 17.5%.
      
      Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bda420b9
  9. 13 1月, 2021 1 次提交
  10. 16 12月, 2020 1 次提交
  11. 03 11月, 2020 1 次提交
  12. 14 10月, 2020 2 次提交
  13. 15 8月, 2020 1 次提交
  14. 13 8月, 2020 5 次提交
  15. 17 7月, 2020 1 次提交