1. 16 Jan 2014, 1 commit
  2. 07 Jan 2014, 1 commit
  3. 19 Dec 2013, 2 commits
    • x86/mm/numa: Fix 32-bit kernel NUMA boot · f3d815cb
      Authored by Lans Zhang
      When booting a 32-bit x86 kernel on a NUMA machine, node data
      cannot be allocated from the local node if node 0's memory
      accounts for the entire low memory space:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000] Cannot find 4096 bytes in node 1
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
      
      To fix this issue, allow node data to be allocated from other
      nodes if the local node's memory has not been mapped yet. The
      expected result looks like this:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000]   NODE_DATA [mem 0x367ec000-0x367ecfff]
        [    0.000000]     NODE_DATA(1) on node 0
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
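      A minimal sketch of the fallback idea (illustrative only; the real
      change lives in the node-data setup path in arch/x86/mm/numa.c, and
      try_alloc_on_node() below is a hypothetical stand-in for the actual
      memblock calls):

        /* Try the local node first; if its memory is not mapped yet,
         * fall back to any node that can satisfy the allocation. */
        static phys_addr_t alloc_node_data_pa(int nid, size_t size)
        {
                phys_addr_t pa = try_alloc_on_node(nid, size);

                if (!pa)
                        pa = try_alloc_on_node(NUMA_NO_NODE, size);
                return pa;
        }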
      Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
      Cc: <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1386303510-18574-1-git-send-email-jia.zhang@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f3d815cb
    • mm: numa: serialise parallel get_user_page against THP migration · 2b4847e7
      Authored by Mel Gorman
      Base pages are unmapped and flushed from cache and TLB during normal
      page migration and replaced with a migration entry that causes any
      parallel NUMA hinting fault or gup to block until migration completes.
      
      THP does not unmap pages due to a lack of support for migration entries
      at a PMD level.  This allows races with get_user_pages and
      get_user_pages_fast which commit 3f926ab9 ("mm: Close races between
      THP migration and PMD numa clearing") made worse by introducing a
      pmd_clear_flush().
      
      This patch forces get_user_page (fast and normal) on a pmd_numa page to
      go through the slow get_user_page path where it will serialise against
      THP migration and properly account for the NUMA hinting fault.  On the
      migration side the page table lock is taken for each PTE update.
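      The shape of the fast-gup bail-out, as a hedged sketch (the actual
      hunks are in the PMD walk in arch/x86/mm/gup.c; surrounding context
      is elided):

        /* In the fast gup walk: treat a pmd_numa PMD like an unstable
         * one and return 0, which makes get_user_pages_fast() fall back
         * to the slow path that serialises against THP migration. */
        if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
                return 0;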
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2b4847e7
  4. 22 Nov 2013, 1 commit
  5. 21 Nov 2013, 1 commit
  6. 19 Nov 2013, 1 commit
    • x86/mm: Implement ASLR for hugetlb mappings · fd8526ad
      Authored by Kirill A. Shutemov
      Matthew noticed that hugetlb mappings don't participate in ASLR on x86-64:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
      
      /proc/PID/maps entries for the mapping are always the same
      (except inode number):
      
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 8200              /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 256               /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 7180              /anon_hugepage (deleted)
      
      The reason is the generic hugetlb_get_unmapped_area() function,
      which is used on x86-64.  It doesn't support randomization and uses
      bottom-up unmapped area lookup instead of the usual top-down lookup
      on x86-64.
      
      x86 has arch-specific hugetlb_get_unmapped_area(), but it's used
      only on x86-32.
      
      Let's use the arch-specific hugetlb_get_unmapped_area() on x86-64
      too. That adds ASLR and switches hugetlb mappings to top-down
      unmapped area lookup:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x7f4f08a00000
        Returned address is 0x7fdda4200000
        Returned address is 0x7febe0000000
      
      /proc/PID/maps entries:
      
        7f4f08a00000-7f4f18a00000 rw-p 00000000 00:0c 1168              /anon_hugepage (deleted)
        7fdda4200000-7fddb4200000 rw-p 00000000 00:0c 7092              /anon_hugepage (deleted)
        7febe0000000-7febf0000000 rw-p 00000000 00:0c 7183              /anon_hugepage (deleted)
      
      The unmapped area lookup policy for hugetlb mappings is now
      consistent with normal mappings; the only difference is the
      alignment requirement for huge pages.
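      For illustration, the top-down, ASLR-aware lookup boils down to a
      vm_unmapped_area() call roughly like this (a simplified sketch of
      hugetlb_get_unmapped_area_topdown(), not the exact code; h, len and
      addr are assumed from the surrounding function):

        struct vm_unmapped_area_info info;

        info.flags = VM_UNMAPPED_AREA_TOPDOWN;
        info.length = len;
        info.low_limit = PAGE_SIZE;
        info.high_limit = current->mm->mmap_base;  /* randomized by ASLR */
        info.align_mask = PAGE_MASK & ~huge_page_mask(h);
        info.align_offset = 0;
        addr = vm_unmapped_area(&info);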
      
      The libhugetlbfs test suite didn't detect any regressions with the
      patch applied (although it shows a few failures on my machine
      regardless of the patch).
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Link: http://lkml.kernel.org/r/20131119131750.EA45CE0090@blue.fi.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd8526ad
  7. 15 Nov 2013, 2 commits
    • x86: handle pgtable_page_ctor() fail · cecbd1b5
      Authored by Kirill A. Shutemov
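      As a hedged illustration of what handling the pgtable_page_ctor()
      failure means in practice (simplified from the x86 allocation path
      in arch/x86/mm/pgtable.c; the GFP flags here are a sketch, and the
      constructor can now fail because the split-ptl lock may be
      allocated dynamically):

        pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
        {
                struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);

                if (!page)
                        return NULL;
                if (!pgtable_page_ctor(page)) {
                        __free_page(page);
                        return NULL;
                }
                return page;
        }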
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cecbd1b5
    • x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds · 09ef4939
      Authored by Kirill A. Shutemov
      In the split page table lock case, we embed spinlock_t into struct
      page.  For obvious reasons, we don't want to increase the size of
      struct page if spinlock_t is too big, as with DEBUG_SPINLOCK or
      DEBUG_LOCK_ALLOC or on -rt kernels.  So we disable the split page
      table lock if spinlock_t is too big.
      
      This patchset allows the lock to be allocated dynamically if
      spinlock_t is big.  In this case, page->ptl is used to store a
      pointer to the spinlock instead of the spinlock itself.  It costs an
      additional cache line for the indirect access, but fixes page fault
      scalability for multi-threaded applications.
      
      LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernels enabling
      LOCK_STAT to analyse scalability issues breaks scalability.  ;)
      
      The patchset mostly fixes this.  Results for ./thp_memscale -c 80 -b 512M
      on 4-socket machine:
      
      baseline, no CONFIG_LOCK_STAT:	9.115460703 seconds time elapsed
      baseline, CONFIG_LOCK_STAT=y:	53.890567123 seconds time elapsed
      patched, no CONFIG_LOCK_STAT:	8.852250368 seconds time elapsed
      patched, CONFIG_LOCK_STAT=y:	11.069770759 seconds time elapsed
      
      Patch count is scary, but most of them are trivial. Overview:

       Patches 1-4    A few bug fixes. No dependencies on other patches.
                      Probably should be applied as soon as possible.

       Patch 5        Changes the signature of pgtable_page_ctor(). We will
                      use it for dynamic lock allocation, so it can fail.

       Patches 6-8    Add missing constructor/destructor calls on a few
                      archs. This fixes NR_PAGETABLE accounting and
                      prepares for using the split ptl.

       Patches 9-33   Add pgtable_page_ctor() fail handling to all archs.

       Patch 34       Finally adds support for a dynamically-allocated
                      page->ptl. Also contains documentation for the split
                      page table lock.
      
      This patch (of 34):
      
      I've missed that we preallocate a few pmds in pgd_alloc() if X86_PAE
      is enabled.  Let's add the missed constructor/destructor calls.

      I hadn't noticed it during testing since prep_new_page() clears
      page->mapping and therefore page->ptl.  That is effectively equal to
      spin_lock_init(&page->ptl).
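      An illustrative sketch of the missed calls being added (simplified;
      the real hunks are in the preallocate/free paths of
      arch/x86/mm/pgtable.c, and PGALLOC_GFP is that file's local GFP
      define, quoted from memory):

        /* Allocation side: run the ctor, and back out if it fails. */
        pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
        if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
                free_page((unsigned long)pmd);
                pmd = NULL;
        }

        /* Free side: run the dtor before returning the page. */
        pgtable_pmd_page_dtor(virt_to_page(pmd));
        free_page((unsigned long)pmd);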
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09ef4939
  8. 13 Nov 2013, 5 commits
    • mem-hotplug: introduce movable_node boot option · c5320926
      Authored by Tang Chen
      The Hot Pluggable field in SRAT specifies which memory is
      hotpluggable.  As we mentioned before, if hotpluggable memory is
      used by the kernel, it cannot be hot-removed.  So memory hotplug
      users may want to place all hotpluggable memory in ZONE_MOVABLE so
      that the kernel won't use it.
      
      Memory hotplug users may also set a node as a movable node, which
      has only ZONE_MOVABLE, so that the whole node can be hot-removed.

      But the kernel cannot use memory in ZONE_MOVABLE, so with this setup
      the kernel cannot use memory in movable nodes at all.  This degrades
      NUMA performance, and other users may be unhappy.

      So we need a way to allow users to enable and disable this
      functionality.  In this patch, we introduce the movable_node boot
      option to let users choose not to consume hotpluggable memory at
      early boot time, so that it can later be set as ZONE_MOVABLE.
      
      To achieve this, the movable_node boot option controls the memblock
      allocation direction.  That is, after memblock is ready and before
      SRAT is parsed, we should allocate memory near the kernel image, as
      explained in the previous patches.  So if the movable_node boot
      option is set, the kernel does the following:
      
      1. After memblock is ready, make memblock allocate memory bottom up.
      2. After SRAT is parsed, make memblock behave as default, allocate memory
         top down.
      
      Users can specify "movable_node" on the kernel command line to
      enable this functionality.  For those who don't use memory hotplug
      or who don't want to lose their NUMA performance, just don't specify
      anything.  The kernel will work as before.
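      A sketch of the plumbing, under stated assumptions (the option is
      wired up via early_param(); the exact config guard and flag names
      follow the patch as best remembered and may differ):

        static int __init cmdline_parse_movable_node(char *p)
        {
        #ifdef CONFIG_MOVABLE_NODE
                /* Remember the request; memblock is flipped to bottom-up
                 * allocation until SRAT has been parsed. */
                movable_node_enabled = true;
        #else
                pr_warn("movable_node option not supported\n");
        #endif
                return 0;
        }
        early_param("movable_node", cmdline_parse_movable_node);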
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5320926
    • x86/mem-hotplug: support initialize page tables in bottom-up · b959ed6c
      Authored by Tang Chen
      The Linux kernel cannot migrate pages used by the kernel.  As a result,
      kernel pages cannot be hot-removed.  So we cannot allocate hotpluggable
      memory for the kernel.
      
      In a memory hotplug system, any NUMA node the kernel resides in
      should be unhotpluggable.  And for a modern server, each node could
      have at least 16GB of memory.  So memory around the kernel image is
      highly likely to be unhotpluggable.
      
      ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
      info.  But before SRAT is parsed, memblock has already started to allocate
      memory for the kernel.  So we need to prevent memblock from doing this.
      
      Direct memory mapping page table setup is one such case:
      init_mem_mapping() is called before SRAT is parsed.  To prevent page
      tables from being allocated within hotpluggable memory, we use the
      bottom-up direction, allocating page tables from the end of the
      kernel image toward higher memory.
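      In outline, and hedged as a simplification of the patched boot flow
      (memblock_set_bottom_up() is the toggle this series introduces):

        /* 1. Before SRAT is available, allocate bottom-up so early page
         *    tables land just above the kernel image (unhotpluggable). */
        memblock_set_bottom_up(true);
        init_mem_mapping();

        /* 2. Once SRAT is parsed and hotpluggable ranges are known,
         *    restore the default top-down allocation direction. */
        memblock_set_bottom_up(false);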
      
      Note:
      As for allocating page tables in lower memory, TJ said:
      
      : This is an optional behavior which is triggered by a very specific kernel
      : boot param, which I suspect is gonna need to stick around to support
      : memory hotplug in the current setup unless we add another layer of address
      : translation to support memory hotplug.
      
      As for page tables may occupy too much lower memory if using 4K mapping
      (CONFIG_DEBUG_PAGEALLOC and CONFIG_KMEMCHECK both disable using >4k
      pages), TJ said:
      
      : But as I said in the same paragraph, parsing SRAT earlier doesn't solve
      : the problem in itself either.  Ignoring the option if 4k mapping is
      : required and memory consumption would be prohibitive should work, no?
      : Something like that would be necessary if we're gonna worry about cases
      : like this no matter how we implement it, but, frankly, I'm not sure this
      : is something worth worrying about.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b959ed6c
    • x86/mm: factor out of top-down direct mapping setup · 0167d7d8
      Authored by Tang Chen
      Create a new function memory_map_top_down() to factor out the
      top-down direct memory mapping pagetable setup.  This is also
      preparation for the following patch, which will introduce bottom-up
      memory mapping.  That is, we put the two ways of pagetable setup
      into separate functions and choose which one to use in
      init_mem_mapping(), which makes the code clearer.
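      After the factoring, init_mem_mapping() reads roughly like this (a
      sketch; the end-of-memory computation is simplified and the comment
      anticipates the next patch):

        void __init init_mem_mapping(void)
        {
                unsigned long end = max_pfn << PAGE_SHIFT;

                init_memory_mapping(0, ISA_END_ADDRESS);  /* [0, 1M) */

                /* Top-down today; the next patch adds a bottom-up twin
                 * selected here when movable_node is in effect. */
                memory_map_top_down(ISA_END_ADDRESS, end);
        }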
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0167d7d8
    • x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Authored by Jiri Slaby
      Consider a kernel crash in a module, simulated in the following way:
      
       #include <linux/module.h>

       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;       /* write through a bogus pointer => page fault */
               return 0;
       }
       module_init(my_init);
       MODULE_LICENSE("GPL");
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that we print IP in
      the BUG report using %pB (from printk_address). And %pB
      decrements the pointer by one to fix printing addresses of
      functions with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pB only for stack address printouts (via the
      newly added printk_stack_address) and %pS for regs->ip (via
      printk_address).  I.e. we revert to the old behaviour for everything
      except call stacks.  And since 'reliable' is always 1 in the
      remaining callers, we remove that parameter from printk_address.
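      The resulting split, closely following the patch (simplified from
      arch/x86/kernel/dumpstack.c):

        static void printk_stack_address(unsigned long address, int reliable)
        {
                /* %pB compensates for tail calls; right for stack traces. */
                pr_cont(" [<%p>] %s%pB\n",
                        (void *)address, reliable ? "" : "? ", (void *)address);
        }

        void printk_address(unsigned long address)
        {
                /* regs->ip is exact, so print it with %pS. */
                pr_cont(" [<%p>] %pS\n", (void *)address);
        }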
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5f01c988
  9. 12 Nov 2013, 1 commit
  10. 09 Nov 2013, 2 commits
  11. 02 Nov 2013, 8 commits
  12. 29 Oct 2013, 1 commit
    • perf/x86: Further optimize copy_from_user_nmi() · e00b12e6
      Authored by Peter Zijlstra
      Now that we can deal with nested NMIs (due to IRET re-enabling NMIs)
      and can deal with faults from NMI by making sure we preserve CR2
      across NMIs, we can in fact simply access user-space memory from NMI
      context.
      
      So rewrite copy_from_user_nmi() to use __copy_from_user_inatomic() and
      rework the fault path to do the minimal required work before taking
      the in_atomic() fault handler.
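      A hedged sketch of the rewritten function (simplified from the x86
      usercopy code; the return convention follows
      __copy_from_user_inatomic(), i.e. bytes left uncopied):

        unsigned long
        copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
        {
                unsigned long ret;

                if (__range_not_ok(from, n, TASK_SIZE))
                        return 0;

                /* Disable pagefaults so behaviour stays consistent even
                 * if this is ever called outside NMI/IRQ context. */
                pagefault_disable();
                ret = __copy_from_user_inatomic(to, from, n);
                pagefault_enable();

                return ret;
        }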
      
      In particular avoid perf_sw_event() which would make perf recurse on
      itself (it should be harmless as our recursion protections should be
      able to deal with this -- but why tempt fate).
      
      Also rename notify_page_fault() to kprobes_fault(), as that is a
      much better name; there is no notifier in it and it's specific to
      kprobes.
      
      Don measured that his worst-case NMI path shrank from ~300K cycles
      to ~150K cycles.
      
      Cc: Stephane Eranian <eranian@google.com>
      Cc: jmario@redhat.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: dave.hansen@linux.intel.com
      Tested-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20131024105206.GM2490@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e00b12e6
  13. 13 Oct 2013, 1 commit
  14. 13 Sep 2013, 2 commits
  15. 12 Sep 2013, 3 commits
    • mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Authored by Naoya Horiguchi
      Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing), so we had better not enable
      migration of other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
      page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (softoffline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture-dependent check of whether hugepages are implemented on
      a pmd basis or not.  And on some architectures multiple sizes of
      hugepages are available, so hugepage_migration_support() also checks
      the hugepage size.
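      A minimal sketch of the check (close to the helper the patch
      introduces; the pmd-size comparison is the essential part):

        /* Only pmd-based hugepages are migratable for now; larger
         * (e.g. pud-sized) hugepages are rejected. */
        static inline int hugepage_migration_support(struct hstate *h)
        {
                return huge_page_shift(h) == PMD_SHIFT;
        }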
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83467efb
    • mm: vmstats: track TLB flush stats on UP too · 6df46865
      Authored by Dave Hansen
      The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
      counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
      for SMP.
      
      UP systems do not do remote TLB flushes, so compile those counters out on
      UP.
      
      arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
      probably an optimization since both the mtrr code and __flush_tlb() write
      cr4.  It would probably be safe to make that a flush_tlb_all() (and then
      get these statistics), but the mtrr code is ancient and I'm hesitant to
      touch it other than to just stick in the counters.
      
      [akpm@linux-foundation.org: tweak comments]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6df46865
    • mm: vmstats: tlb flush counters · 9824cf97
      Authored by Dave Hansen
      I was investigating some TLB flush scaling issues and realized that we do
      not have any good methods for figuring out how many TLB flushes we are
      doing.
      
      It would be nice to be able to do these in generic code, but the
      arch-independent calls don't explicitly specify whether we actually need
      to do remote flushes or not.  In the end, we really need to know if we
      actually _did_ global vs.  local invalidations, so that leaves us with few
      options other than to muck with the counters from arch-specific code.
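      Illustratively, the arch-specific hooks reduce to counter bumps next
      to the existing flush calls (the event name follows the patch's
      vm_event_item additions; the exact placement is a sketch):

        /* In arch/x86/mm/tlb.c, next to a local full flush: */
        count_vm_event(NR_TLB_LOCAL_FLUSH_ALL);
        local_flush_tlb();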
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9824cf97
  16. 10 Sep 2013, 1 commit
  17. 23 Aug 2013, 2 commits
  18. 20 Aug 2013, 1 commit
    • x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM · 527bf129
      Authored by Yinghai Lu
      Dave Hansen reported that systems with between 500G and 600G of RAM
      crash early if DEBUG_PAGEALLOC is selected.
      
       > [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
       > [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
       > [    0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
       > [    0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
       > [    0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
       > [    0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
       > [    0.000000]  [mem 0xe80ee00000-0xe80effffff] page 4k
       > [    0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
       > [    0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
       > [    0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory
      
      It turns out that we missed increasing the number of pages needed in
      BRK for mapping the initial 2M and [0,1M) when we switched to using
      the #PF handler to set up memory mappings:
      
       > commit 8170e6be
       > Author: H. Peter Anvin <hpa@zytor.com>
       > Date:   Thu Jan 24 12:19:52 2013 -0800
       >
       >     x86, 64bit: Use a #PF handler to materialize early mappings on demand
      
      Before that, we had the mapping of [0,512M) in head_64.S and could
      spare two pages for [0,1M).  After that change, we cannot reuse
      pages anymore.

      When we have more than 512G of RAM, we need an extra page for the
      pgd-level page covering [512G,1024G).

      Increase the pages reserved in BRK for page tables to solve the boot
      crash.
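      The shape of the fix in arch/x86/mm/init.c, hedged (the exact page
      count is the point of the patch and is quoted from memory; treat the
      constant as illustrative):

        /* Reserve one more page of early-page-table BRK space so the
         * #PF-time allocator can hand out an extra pgd-level page when
         * RAM extends beyond 512G. */
        #define INIT_PGT_BUF_SIZE	(6 * PAGE_SIZE)

        RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);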
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Bisected-by: Dave Hansen <dave.hansen@intel.com>
      Tested-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org> # v3.9 and later
      Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      527bf129
  19. 15 Aug 2013, 1 commit
  20. 14 Aug 2013, 1 commit
  21. 13 Aug 2013, 1 commit
  22. 15 Jul 2013, 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Authored by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense some
      time ago when RAM was more constrained, but now the savings do not
      offset the cost and complications.  The fix in commit 5e427ec2
      ("x86: Fix bit corruption at CPU resume time") is a good example of
      the nasty type of bugs that can be created with improper use of the
      various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from
      arch-specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8