1. 13 12月, 2012 1 次提交
    • J
      mm: introduce new field "managed_pages" to struct zone · 9feedc9d
      Jiang Liu 提交于
      Currently a zone's present_pages is calcuated as below, which is
      inaccurate and may cause trouble to memory hotplug.
      
      	spanned_pages - absent_pages - memmap_pages - dma_reserve.
      
      During fixing bugs caused by inaccurate zone->present_pages, we found
      zone->present_pages has been abused.  The field zone->present_pages may
      have different meanings in different contexts:
      
      1) pages existing in a zone.
      2) pages managed by the buddy system.
      
      For more discussions about the issue, please refer to:
        http://lkml.org/lkml/2012/11/5/866
        https://patchwork.kernel.org/patch/1346751/
      
      This patchset tries to introduce a new field named "managed_pages" to
      struct zone, which counts "pages managed by the buddy system".  And revert
      zone->present_pages to count "physical pages existing in a zone", which
      also keep in consistence with pgdat->node_present_pages.
      
      We will set an initial value for zone->managed_pages in function
      free_area_init_core() and will adjust it later if the initial value is
      inaccurate.
      
      For DMA/normal zones, the initial value is set to:
      
      	(spanned_pages - absent_pages - memmap_pages - dma_reserve)
      
      Later zone->managed_pages will be adjusted to the accurate value when the
      bootmem allocator frees all free pages to the buddy system in function
      free_all_bootmem_node() and free_all_bootmem().
      
      The bootmem allocator doesn't touch highmem pages, so highmem zones'
      managed_pages is set to the accurate value "spanned_pages - absent_pages"
      in function free_area_init_core() and won't be updated anymore.
      
      This patch also adds a new field "managed_pages" to /proc/zoneinfo
      and sysrq showmem.
      
      [akpm@linux-foundation.org: small comment tweaks]
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
      Tested-by: NChris Clayton <chris2553@googlemail.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9feedc9d
  2. 18 11月, 2012 1 次提交
  3. 17 11月, 2012 1 次提交
    • A
      revert "mm: fix-up zone present pages" · 5576646f
      Andrew Morton 提交于
      Revert commit 7f1290f2 ("mm: fix-up zone present pages")
      
      That patch tried to fix a issue when calculating zone->present_pages,
      but it caused a regression on 32bit systems with HIGHMEM.  With that
      change, reset_zone_present_pages() resets all zone->present_pages to
      zero, and fixup_zone_present_pages() is called to recalculate
      zone->present_pages when the boot allocator frees core memory pages into
      buddy allocator.  Because highmem pages are not freed by bootmem
      allocator, all highmem zones' present_pages becomes zero.
      
      Various options for improving the situation are being discussed but for
      now, let's return to the 3.6 code.
      
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Tested-by: NChris Clayton <chris2553@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5576646f
  4. 09 10月, 2012 2 次提交
    • J
      mm: fix-up zone present pages · 7f1290f2
      Jianguo Wu 提交于
      I think zone->present_pages indicates pages that buddy system can management,
      it should be:
      
      	zone->present_pages = spanned pages - absent pages - bootmem pages,
      
      but is now:
      	zone->present_pages = spanned pages - absent pages - memmap pages.
      
      spanned pages: total size, including holes.
      absent pages: holes.
      bootmem pages: pages used in system boot, managed by bootmem allocator.
      memmap pages: pages used by page structs.
      
      This may cause zone->present_pages less than it should be.  For example,
      numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other
      bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
      present_pages should be spanned pages - absent pages, but now it also
      minus memmap pages(free_area_init_core), which are actually allocated from
      ZONE_MOVABLE.  When offlining all memory of a zone, this will cause
      zone->present_pages less than 0, because present_pages is unsigned long
      type, it is actually a very large integer, it indirectly caused
      zone->watermark[WMARK_MIN] becomes a large
      integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a
      large integer(calculate_totalreserve_pages()), and finally cause memory
      allocating failure when fork process(__vm_enough_memory()).
      
      [root@localhost ~]# dmesg
      -bash: fork: Cannot allocate memory
      
      I think the bug described in
      
        http://marc.info/?l=linux-mm&m=134502182714186&w=2
      
      is also caused by wrong zone present pages.
      
      This patch intends to fix-up zone->present_pages when memory are freed to
      buddy system on x86_64 and IA64 platforms.
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Reported-by: NPetr Tesarik <ptesarik@suse.cz>
      Tested-by: NPetr Tesarik <ptesarik@suse.cz>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f1290f2
    • W
      mm/memblock: cleanup early_node_map[] related comments · f2d52fe5
      Wanpeng Li 提交于
      Commit 0ee332c1 ("memblock: Kill early_node_map[]") removed
      early_node_map[].  Clean up the comments to comply with that change.
      Signed-off-by: NWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2d52fe5
  5. 12 7月, 2012 2 次提交
    • Y
      memblock: free allocated memblock_reserved_regions later · 29f67386
      Yinghai Lu 提交于
      memblock_free_reserved_regions() calls memblock_free(), but
      memblock_free() would double reserved.regions too, so we could free the
      old range for reserved.regions.
      
      Also tj said there is another bug which could be related to this.
      
      | I don't think we're saving any noticeable
      | amount by doing this "free - give it to page allocator - reserve
      | again" dancing.  We should just allocate regions aligned to page
      | boundaries and free them later when memblock is no longer in use.
      
      in that case, when DEBUG_PAGEALLOC, will get panic:
      
           memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
        BUG: unable to handle kernel paging request at ffff88102febd948
        IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
        PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
        RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
      
      See the discussion at https://lkml.org/lkml/2012/6/13/469
      
      So try to allocate with PAGE_SIZE alignment and free it later.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29f67386
    • Y
      mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19
      Yinghai Lu 提交于
      After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
      in alloc_bootmem_section"), usemap allocations may easily be placed
      outside the optimal section that holds the node descriptor, even if
      there is space available in that section.  This results in unnecessary
      hotplug dependencies that need to have the node unplugged before the
      section holding the usemap.
      
      The reason is that the bootmem allocator doesn't guarantee a linear
      search starting from the passed allocation goal but may start out at a
      much higher address absent an upper limit.
      
      Fix this by trying the allocation with the limit at the section end,
      then retry without if that fails.  This keeps the fix from f5bf18fa
      of not panicking if the allocation does not fit in the section, but
      still makes sure to try to stay within the section at first.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.3.x, 3.4.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99ab7b19
  6. 30 5月, 2012 3 次提交
  7. 11 5月, 2012 1 次提交
  8. 26 4月, 2012 1 次提交
    • D
      mm: nobootmem: Correct alloc_bootmem semantics. · 4e1c2b28
      David Miller 提交于
      The comments above __alloc_bootmem_node() claim that the code will
      first try the allocation using 'goal' and if that fails it will
      try again but with the 'goal' requirement dropped.
      
      Unfortunately, this is not what the code does, so fix it to do so.
      
      This is important for nobootmem conversions to architectures such
      as sparc where MAX_DMA_ADDRESS is infinity.
      
      On such architectures all of the allocations done by generic spots,
      such as the sparse-vmemmap implementation, will pass in:
      
      	__pa(MAX_DMA_ADDRESS)
      
      as the goal, and with the limit given as "-1" this will always fail
      unless we add the appropriate fallback logic here.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e1c2b28
  9. 31 10月, 2011 1 次提交
  10. 15 7月, 2011 4 次提交
  11. 14 7月, 2011 1 次提交
  12. 25 5月, 2011 1 次提交
  13. 31 3月, 2011 1 次提交
  14. 24 3月, 2011 1 次提交
    • O
      crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr,... · 93a72052
      Olaf Hering 提交于
      crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn
      
      The Xen PV drivers in a crashed HVM guest can not connect to the dom0
      backend drivers because both frontend and backend drivers are still in
      connected state.  To run the connection reset function only in case of a
      crashdump, the is_kdump_kernel() function needs to be available for the PV
      driver modules.
      
      Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
      kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel()
      usable for modules.
      
      Leave 'elfcorehdr' as early_param().  This changes powerpc from __setup()
      to early_param().  It adds an address range check from x86 also on ia64
      and powerpc.
      
      [akpm@linux-foundation.org: additional #includes]
      [akpm@linux-foundation.org: remove elfcorehdr_addr export]
      [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
      Signed-off-by: NOlaf Hering <olaf@aepfle.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93a72052
  15. 24 2月, 2011 3 次提交