1. 19 5月, 2018 1 次提交
  2. 26 3月, 2018 1 次提交
  3. 23 2月, 2018 1 次提交
  4. 01 2月, 2018 1 次提交
  5. 18 11月, 2017 1 次提交
  6. 09 9月, 2017 6 次提交
  7. 07 9月, 2017 1 次提交
  8. 11 7月, 2017 1 次提交
  9. 07 7月, 2017 2 次提交
    • M
      mm, memory_hotplug: drop CONFIG_MOVABLE_NODE · f70029bb
      Michal Hocko 提交于
      Commit 20b2f52b ("numa: add CONFIG_MOVABLE_NODE for
      movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
      good explanation on why it is actually useful.
      
      It makes a lot of sense to make movable node semantic opt in but we
      already have that because the feature has to be explicitly enabled on
      the kernel command line.  A config option on top only makes the
      configuration space larger without a good reason.  It also adds an
      additional ifdefery that pollutes the code.
      
      Just drop the config option and make it de-facto always enabled.  This
      shouldn't introduce any change to the semantic.
      
      Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f70029bb
    • H
      mm, THP, swap: delay splitting THP during swap out · 38d8b4e6
      Huang Ying 提交于
      Patch series "THP swap: Delay splitting THP during swapping out", v11.
      
      This patchset is to optimize the performance of Transparent Huge Page
      (THP) swap.
      
      Recently, the performance of the storage devices improved so fast that
      we cannot saturate the disk bandwidth with single logical CPU when do
      page swap out even on a high-end server machine.  Because the
      performance of the storage device improved faster than that of single
      logical CPU.  And it seems that the trend will not change in the near
      future.  On the other hand, the THP becomes more and more popular
      because of increased memory size.  So it becomes necessary to optimize
      THP swap performance.
      
      The advantages of the THP swap support include:
      
       - Batch the swap operations for the THP to reduce lock
         acquiring/releasing, including allocating/freeing the swap space,
         adding/deleting to/from the swap cache, and writing/reading the swap
         space, etc. This will help improve the performance of the THP swap.
      
       - The THP swap space read/write will be 2M sequential IO. It is
         particularly helpful for the swap read, which are usually 4k random
         IO. This will improve the performance of the THP swap too.
      
       - It will help the memory fragmentation, especially when the THP is
         heavily used by the applications. The 2M continuous pages will be
         free up after THP swapping out.
      
       - It will improve the THP utilization on the system with the swap
         turned on. Because the speed for khugepaged to collapse the normal
         pages into the THP is quite slow. After the THP is split during the
         swapping out, it will take quite long time for the normal pages to
         collapse back into the THP after being swapped in. The high THP
         utilization helps the efficiency of the page based memory management
         too.
      
      There are some concerns regarding THP swap in, mainly because possible
      enlarged read/write IO size (for swap in/out) may put more overhead on
      the storage device.  To deal with that, the THP swap in should be turned
      on only when necessary.  For example, it can be selected via
      "always/never/madvise" logic, to be turned on globally, turned off
      globally, or turned on only for VMA with MADV_HUGEPAGE, etc.
      
      This patchset is the first step for the THP swap support.  The plan is
      to delay splitting THP step by step, finally avoid splitting THP during
      the THP swapping out and swap out/in the THP as a whole.
      
      As the first step, in this patchset, the splitting huge page is delayed
      from almost the first step of swapping out to after allocating the swap
      space for the THP and adding the THP into the swap cache.  This will
      reduce lock acquiring/releasing for the locks used for the swap cache
      management.
      
      With the patchset, the swap out throughput improves 15.5% (from about
      3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
      with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
      device used is a RAM simulated PMEM (persistent memory) device.  To test
      the sequential swapping out, the test case creates 8 processes, which
      sequentially allocate and write to the anonymous pages until the RAM and
      part of the swap device is used up.
      
      This patch (of 5):
      
      In this patch, splitting huge page is delayed from almost the first step
      of swapping out to after allocating the swap space for the THP
      (Transparent Huge Page) and adding the THP into the swap cache.  This
      will batch the corresponding operation, thus improve THP swap out
      throughput.
      
      This is the first step for the THP swap optimization.  The plan is to
      delay splitting the THP step by step and avoid splitting the THP
      finally.
      
      In this patch, one swap cluster is used to hold the contents of each THP
      swapped out.  So, the size of the swap cluster is changed to that of the
      THP (Transparent Huge Page) on x86_64 architecture (512).  For other
      architectures which want such THP swap optimization,
      ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
      the architecture.  In effect, this will enlarge swap cluster size by 2
      times on x86_64.  Which may make it harder to find a free cluster when
      the swap space becomes fragmented.  So that, this may reduce the
      continuous swap space allocation and sequential write in theory.  The
      performance test in 0day shows no regressions caused by this.
      
      In the future of THP swap optimization, some information of the swapped
      out THP (such as compound map count) will be recorded in the
      swap_cluster_info data structure.
      
      The mem cgroup swap accounting functions are enhanced to support charge
      or uncharge a swap cluster backing a THP as a whole.
      
      The swap cluster allocate/free functions are added to allocate/free a
      swap cluster for a THP.  A fair simple algorithm is used for swap
      cluster allocation, that is, only the first swap device in priority list
      will be tried to allocate the swap cluster.  The function will fail if
      the trying is not successful, and the caller will fallback to allocate a
      single swap slot instead.  This works good enough for normal cases.  If
      the difference of the number of the free swap clusters among multiple
      swap devices is significant, it is possible that some THPs are split
      earlier than necessary.  For example, this could be caused by big size
      difference among multiple swap devices.
      
      The swap cache functions is enhanced to support add/delete THP to/from
      the swap cache as a set of (HPAGE_PMD_NR) sub-pages.  This may be
      enhanced in the future with multi-order radix tree.  But because we will
      split the THP soon during swapping out, that optimization doesn't make
      much sense for this first step.
      
      The THP splitting functions are enhanced to support to split THP in swap
      cache during swapping out.  The page lock will be held during allocating
      the swap cluster, adding the THP into the swap cache and splitting the
      THP.  So in the code path other than swapping out, if the THP need to be
      split, the PageSwapCache(THP) will be always false.
      
      The swap cluster is only available for SSD, so the THP swap optimization
      in this patchset has no effect for HDD.
      
      [ying.huang@intel.com: fix two issues in THP optimize patch]
        Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com
      [hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size]
      Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option]
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h]
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38d8b4e6
  10. 02 7月, 2017 1 次提交
  11. 21 6月, 2017 1 次提交
  12. 13 6月, 2017 1 次提交
  13. 01 5月, 2017 1 次提交
  14. 23 4月, 2017 1 次提交
    • I
      Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar 提交于
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust looking bisection
      points to this commit suggesting that are unknown bugs in the conversion
      as well, revert it for the time being - we'll re-try in v4.13.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6dd29b3d
  15. 18 3月, 2017 1 次提交
  16. 13 12月, 2016 3 次提交
  17. 28 10月, 2016 1 次提交
  18. 27 8月, 2016 1 次提交
  19. 05 8月, 2016 1 次提交
  20. 29 7月, 2016 1 次提交
  21. 27 7月, 2016 1 次提交
  22. 28 5月, 2016 1 次提交
  23. 27 5月, 2016 1 次提交
  24. 21 5月, 2016 2 次提交
  25. 20 5月, 2016 2 次提交
    • V
      memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE · 8604d9e5
      Vitaly Kuznetsov 提交于
      This patchset continues the work I started with commit 31bc3858
      ("memory-hotplug: add automatic onlining policy for the newly added
      memory").
      
      Initially I was going to stop there and bring the policy setting logic
      to userspace.  I met two issues on this way:
      
       1) It is possible to have memory hotplugged at boot (e.g.  with QEMU).
          These blocks stay offlined if we turn the onlining policy on by
          userspace.
      
       2) My attempt to bring this policy setting to systemd failed, systemd
          maintainers suggest to change the default in kernel or ...  to use
          tmpfiles.d to alter the policy (which looks like a hack to me):
              https://github.com/systemd/systemd/pull/2938
      
      Here I suggest to add a config option to set the default value for the
      policy and a kernel command line parameter to make the override.
      
      This patch (of 2):
      
      Introduce config option to set the default value for memory hotplug
      onlining policy (/sys/devices/system/memory/auto_online_blocks).  The
      reason one would want to turn this option on are to have early onlining
      for hotpluggable memory available at boot and to not require any
      userspace actions to make memory hotplug work.
      
      [akpm@linux-foundation.org: tweak Kconfig text]
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8604d9e5
    • Y
      mm: slab: remove ZONE_DMA_FLAG · a3187e43
      Yang Shi 提交于
      Now we have IS_ENABLED helper to check if a Kconfig option is enabled or
      not, so ZONE_DMA_FLAG sounds no longer useful.
      
      And, the use of ZONE_DMA_FLAG in slab looks pointless according to the
      comment [1] from Johannes Weiner, so remove them and ORing passed in
      flags with the cache gfp flags has been done in kmem_getpages().
      
      [1] https://lkml.org/lkml/2014/9/25/553
      
      Link: http://lkml.kernel.org/r/1462381297-11009-1-git-send-email-yang.shi@linaro.orgSigned-off-by: NYang Shi <yang.shi@linaro.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3187e43
  26. 18 3月, 2016 3 次提交
  27. 19 2月, 2016 1 次提交
    • D
      mm/core, x86/mm/pkeys: Add arch_validate_pkey() · 66d37570
      Dave Hansen 提交于
      The syscall-level code is passed a protection key and need to
      return an appropriate error code if the protection key is bogus.
      We will be using this in subsequent patches.
      
      Note that this also begins a series of arch-specific calls that
      we need to expose in otherwise arch-independent code.  We create
      a linux/pkeys.h header where we will put *all* the stubs for
      these functions.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66d37570
  28. 18 2月, 2016 1 次提交
    • D
      mm/core, x86/mm/pkeys: Store protection bits in high VMA flags · 63c17fb8
      Dave Hansen 提交于
      vma->vm_flags is an 'unsigned long', so has space for 32 flags
      on 32-bit architectures.  The high 32 bits are unused on 64-bit
      platforms.  We've steered away from using the unused high VMA
      bits for things because we would have difficulty supporting it
      on 32-bit.
      
      Protection Keys are not available in 32-bit mode, so there is
      no concern about supporting this feature in 32-bit mode or on
      32-bit CPUs.
      
      This patch carves out 4 bits from the high half of
      vma->vm_flags and allows architectures to set config option
      to make them available.
      
      Sparse complains about these constants unless we explicitly
      call them "UL".
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63c17fb8