1. 16 1月, 2016 33 次提交
  2. 15 1月, 2016 7 次提交
    • E
      mm: add tracepoint for scanning pages · 7d2eba05
      Ebru Akagunduz 提交于
      This patch series makes swapin readahead up to a certain number to gain
      more thp performance and adds tracepoint for khugepaged_scan_pmd,
      collapse_huge_page, __collapse_huge_page_isolate.
      
      This patch series was written to deal with programs that access most,
      but not all, of their memory after they get swapped out.  Currently
      these programs do not get their memory collapsed into THPs after the
      system swapped their memory out, while they would get THPs before
      swapping happened.
      
      This patch series was tested with a test program, it allocates 400MB of
      memory, writes to it, and then sleeps.  I force the system to swap out
      all.  Afterwards, the test program touches the area by writing and
      leaves a piece of it without writing.  This shows how much swap in
      readahead made by the patch.
      
      Test results:
      
                              After swapped out
      -------------------------------------------------------------------
                    | Anonymous | AnonHugePages | Swap      | Fraction  |
      -------------------------------------------------------------------
      With patch    | 90076 kB    | 88064 kB    | 309928 kB |    %99    |
      -------------------------------------------------------------------
      Without patch | 194068 kB | 192512 kB     | 205936 kB |    %99    |
      -------------------------------------------------------------------
      
                              After swapped in
      -------------------------------------------------------------------
                    | Anonymous | AnonHugePages | Swap      | Fraction  |
      -------------------------------------------------------------------
      With patch    | 201408 kB | 198656 kB     | 198596 kB |    %98    |
      -------------------------------------------------------------------
      Without patch | 292624 kB | 192512 kB     | 107380 kB |    %65    |
      -------------------------------------------------------------------
      
      This patch (of 3):
      
      Using static tracepoints, data of functions is recorded.  It is good to
      automatize debugging without doing a lot of changes in the source code.
      
      This patch adds tracepoint for khugepaged_scan_pmd, collapse_huge_page
      and __collapse_huge_page_isolate.
      
      [dan.carpenter@oracle.com: add a missing tab]
      Signed-off-by: NEbru Akagunduz <ebru.akagunduz@gmail.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d2eba05
    • N
    • K
      mm: rework virtual memory accounting · 84638335
      Konstantin Khlebnikov 提交于
      When inspecting a vague code inside prctl(PR_SET_MM_MEM) call (which
      testing the RLIMIT_DATA value to figure out if we're allowed to assign
      new @start_brk, @brk, @start_data, @end_data from mm_struct) it's been
      commited that RLIMIT_DATA in a form it's implemented now doesn't do
      anything useful because most of user-space libraries use mmap() syscall
      for dynamic memory allocations.
      
      Linus suggested to convert RLIMIT_DATA rlimit into something suitable
      for anonymous memory accounting.  But in this patch we go further, and
      the changes are bundled together as:
      
       * keep vma counting if CONFIG_PROC_FS=n, will be used for limits
       * replace mm->shared_vm with better defined mm->data_vm
       * account anonymous executable areas as executable
       * account file-backed growsdown/up areas as stack
       * drop struct file* argument from vm_stat_account
       * enforce RLIMIT_DATA for size of data areas
      
      This way code looks cleaner: now code/stack/data classification depends
      only on vm_flags state:
      
       VM_EXEC & ~VM_WRITE            -> code  (VmExe + VmLib in proc)
       VM_GROWSUP | VM_GROWSDOWN      -> stack (VmStk)
       VM_WRITE & ~VM_SHARED & !stack -> data  (VmData)
      
      The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called
      "shared", but that might be strange beast like readonly-private or VM_IO
      area.
      
       - RLIMIT_AS            limits whole address space "VmSize"
       - RLIMIT_STACK         limits stack "VmStk" (but each vma individually)
       - RLIMIT_DATA          now limits "VmData"
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Kees Cook <keescook@google.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      84638335
    • G
      mm: move lru_to_page to mm_inline.h · d72ee911
      Geliang Tang 提交于
      Move lru_to_page() from internal.h to mm_inline.h.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d72ee911
    • V
      memory-hotplug: don't BUG() in register_memory_resource() · 6f754ba4
      Vitaly Kuznetsov 提交于
      Out of memory condition is not a bug and while we can't add new memory
      in such case crashing the system seems wrong.  Propagating the return
      value from register_memory_resource() requires interface change.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NIgor Mammedov <imammedo@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Sheng Yong <shengyong1@huawei.com>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f754ba4
    • P
      hugetlb: make mm and fs code explicitly non-modular · 3e89e1c5
      Paul Gortmaker 提交于
      The Kconfig currently controlling compilation of this code is:
      
      config HUGETLBFS
              bool "HugeTLB file system support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Lets remove the modular code that is essentially orphaned, so that when
      reading the driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular case,
      the init ordering gets moved to earlier levels when we use the more
      appropriate initcalls here.
      
      Originally I had the fs part and the mm part as separate commits, just
      by happenstance of the nature of how I detected these non-modular use
      cases.  But that can possibly introduce regressions if the patch merge
      ordering puts the fs part 1st -- as the 0-day testing reported a splat
      at mount time.
      
      Investigating with "initcall_debug" showed that the delta was
      init_hugetlbfs_fs being called _before_ hugetlb_init instead of after.  So
      both the fs change and the mm change are here together.
      
      In addition, it worked before due to luck of link order, since they were
      both in the same initcall category.  So we now have the fs part using
      fs_initcall, and the mm part using subsys_initcall, which puts it one
      bucket earlier.  It now passes the basic sanity test that failed in
      earlier 0-day testing.
      
      We delete the MODULE_LICENSE tag and capture that information at the top
      of the file alongside author comments, etc.
      
      We don't replace module.h with init.h since the file already has that.
      Also note that MODULE_ALIAS is a no-op for non-modular code.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Reported-by: Nkernel test robot <ying.huang@linux.intel.com>
      Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e89e1c5
    • G
      mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations · 0d576d20
      Geliang Tang 提交于
      Use list_for_each_entry_safe() instead of list_for_each_safe() to
      simplify the code.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d576d20