1. 10 10月, 2014 1 次提交
    • J
      topology: add support for node_to_mem_node() to determine the fallback node · ad2c8144
      Joonsoo Kim 提交于
      Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
      on ppc LPARs with memoryless nodes, a large amount of memory was consumed
      by slabs and was marked unreclaimable.  He tracked it down to slab
      deactivations in the SLUB core when we allocate remotely, leading to poor
      efficiency always when memoryless nodes are present.
      
      After much discussion, Joonsoo provided a few patches that help
      significantly.  They don't resolve the problem altogether:
      
       - memory hotplug still needs testing, that is when a memoryless node
         becomes memory-ful, we want to dtrt
       - there are other reasons for going off-node than memoryless nodes,
         e.g., fully exhausted local nodes
      
      Neither case is resolved with this series, but I don't think that should
      block their acceptance, as they can be explored/resolved with follow-on
      patches.
      
      The series consists of:
      
      [1/3] topology: add support for node_to_mem_node() to determine the
            fallback node
      
      [2/3] slub: fallback to node_to_mem_node() node if allocating on
            memoryless node
      
            - Joonsoo's patches to cache the nearest node with memory for each
              NUMA node
      
      [3/3] Partial revert of 81c98869 (""kthread: ensure locality of
            task_struct allocations")
      
       - At Tejun's request, keep the knowledge of memoryless node fallback
         to the allocator core.
      
      This patch (of 3):
      
      We need to determine the fallback node in slub allocator if the allocation
      target node is memoryless node.  Without it, the SLUB wrongly select the
      node which has no memory and can't use a partial slab, because of node
      mismatch.  Introduced function, node_to_mem_node(X), will return a node Y
      with memory that has the nearest distance.  If X is memoryless node, it
      will return nearest distance node, but, if X is normal node, it will
      return itself.
      
      We will use this function in following patch to determine the fallback
      node.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad2c8144
  2. 03 10月, 2014 1 次提交
  3. 07 8月, 2014 10 次提交
  4. 31 7月, 2014 1 次提交
    • D
      mm, thp: do not allow thp faults to avoid cpuset restrictions · b104a35d
      David Rientjes 提交于
      The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
      should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
      should be restricted only to the set of allowed cpuset mems.
      
      Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
      the fault path from using memory compaction or direct reclaim.  Thus, it
      is unfairly able to allocate outside of its cpuset mems restriction as a
      side-effect.
      
      This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
      truly GFP_ATOMIC by verifying it is also not a thp allocation.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NAlex Thorlton <athorlton@sgi.com>
      Tested-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b104a35d
  5. 30 7月, 2014 1 次提交
    • R
      mm: fix page_alloc.c kernel-doc warnings · 1aab4d77
      Randy Dunlap 提交于
      Fix kernel-doc warnings and function name in mm/page_alloc.c:
      
        Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
        Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
        Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
        Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
        Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
        Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1aab4d77
  6. 04 7月, 2014 1 次提交
    • M
      mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER · dc78327c
      Michal Nazarewicz 提交于
      With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
      the following is triggered at early boot:
      
        SMP: Total of 8 processors activated.
        devtmpfs: initialized
        Unable to handle kernel NULL pointer dereference at virtual address 00000008
        pgd = fffffe0000050000
        [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
        Internal error: Oops: 96000006 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
        task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
        PC is at __list_add+0x10/0xd4
        LR is at free_one_page+0x270/0x638
        ...
        Call trace:
          __list_add+0x10/0xd4
          free_one_page+0x26c/0x638
          __free_pages_ok.part.52+0x84/0xbc
          __free_pages+0x74/0xbc
          init_cma_reserved_pageblock+0xe8/0x104
          cma_init_reserved_areas+0x190/0x1e4
          do_one_initcall+0xc4/0x154
          kernel_init_freeable+0x204/0x2a8
          kernel_init+0xc/0xd4
      
      This happens because init_cma_reserved_pageblock() calls
      __free_one_page() with pageblock_order as page order but it is bigger
      than MAX_ORDER.  This in turn causes accesses past zone->free_list[].
      
      Fix the problem by changing init_cma_reserved_pageblock() such that it
      splits pageblock into individual MAX_ORDER pages if pageblock is bigger
      than a MAX_ORDER page.
      
      In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
      architectures expect for ia64, powerpc and tile at the moment, the
      “pageblock_order > MAX_ORDER” condition will be optimised out since both
      sides of the operator are constants.  In cases where pageblock size is
      variable, the performance degradation should not be significant anyway
      since init_cma_reserved_pageblock() is called only at boot time at most
      MAX_CMA_AREAS times which by default is eight.
      Signed-off-by: NMichal Nazarewicz <mina86@mina86.com>
      Reported-by: NMark Salter <msalter@redhat.com>
      Tested-by: NMark Salter <msalter@redhat.com>
      Tested-by: NChristopher Covington <cov@codeaurora.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc78327c
  7. 24 6月, 2014 1 次提交
    • D
      mm, pcp: allow restoring percpu_pagelist_fraction default · 7cd2b0a3
      David Rientjes 提交于
      Oleg reports a division by zero error on zero-length write() to the
      percpu_pagelist_fraction sysctl:
      
          divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
          CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
          Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
          task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
          RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
          RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
          RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
          RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
          R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
          R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
          FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
          Call Trace:
            proc_sys_call_handler+0xb3/0xc0
            proc_sys_write+0x14/0x20
            vfs_write+0xba/0x1e0
            SyS_write+0x46/0xb0
            tracesys+0xe1/0xe6
      
      However, if the percpu_pagelist_fraction sysctl is set by the user, it
      is also impossible to restore it to the kernel default since the user
      cannot write 0 to the sysctl.
      
      This patch allows the user to write 0 to restore the default behavior.
      It still requires a fraction equal to or larger than 8, however, as
      stated by the documentation for sanity.  If a value in the range [1, 7]
      is written, the sysctl will return EINVAL.
      
      This successfully solves the divide by zero issue at the same time.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NOleg Drokin <green@linuxhacker.ru>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7cd2b0a3
  8. 07 6月, 2014 1 次提交
  9. 05 6月, 2014 21 次提交
  10. 08 4月, 2014 2 次提交
新手
引导
客服 返回
顶部