1. 30 4月, 2013 2 次提交
  2. 03 3月, 2013 1 次提交
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  3. 24 2月, 2013 2 次提交
  4. 30 1月, 2013 1 次提交
  5. 12 1月, 2013 1 次提交
  6. 25 10月, 2012 1 次提交
  7. 09 10月, 2012 2 次提交
  8. 05 9月, 2012 1 次提交
  9. 01 8月, 2012 1 次提交
  10. 12 7月, 2012 1 次提交
    • Y
      memblock: free allocated memblock_reserved_regions later · 29f67386
      Yinghai Lu 提交于
      memblock_free_reserved_regions() calls memblock_free(), but
      memblock_free() would double reserved.regions too, so we could free the
      old range for reserved.regions.
      
      Also tj said there is another bug which could be related to this.
      
      | I don't think we're saving any noticeable
      | amount by doing this "free - give it to page allocator - reserve
      | again" dancing.  We should just allocate regions aligned to page
      | boundaries and free them later when memblock is no longer in use.
      
      in that case, when DEBUG_PAGEALLOC, will get panic:
      
           memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
        BUG: unable to handle kernel paging request at ffff88102febd948
        IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
        PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
        RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
      
      See the discussion at https://lkml.org/lkml/2012/6/13/469
      
      So try to allocate with PAGE_SIZE alignment and free it later.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29f67386
  11. 21 6月, 2012 2 次提交
  12. 08 6月, 2012 1 次提交
  13. 30 5月, 2012 2 次提交
    • G
      mm/memblock: fix memory leak on extending regions · 181eb394
      Gavin Shan 提交于
      The overall memblock has been organized into the memory regions and
      reserved regions.  Initially, the memory regions and reserved regions are
      stored in the predetermined arrays of "struct memblock _region".  It's
      possible for the arrays to be enlarged when we have newly added regions,
      but no free space left there.  The policy here is to create double-sized
      array either by slab allocator or memblock allocator.  Unfortunately, we
      didn't free the old array, which might be allocated through slab allocator
      before.  That would cause memory leak.
      
      The patch introduces 2 variables to trace where (slab or memblock) the
      memory and reserved regions come from.  The memory for the memory or
      reserved regions will be deallocated by kfree() if that was allocated by
      slab allocator.  Thus to fix the memory leak issue.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      181eb394
    • G
      mm/memblock: cleanup on duplicate VA/PA conversion · 4e2f0775
      Gavin Shan 提交于
      The overall memblock has been organized into the memory regions and
      reserved regions.  Initially, the memory regions and reserved regions are
      stored in the predetermined arrays of "struct memblock _region".  It's
      possible for the arrays to be enlarged when we have newly added regions
      for them, but no enough space there.  Under the situation, We will created
      double-sized array to meet the requirement.  However, the original
      implementation converted the VA (Virtual Address) of the newly allocated
      array of regions to PA (Physical Address), then translate back when we
      allocates the new array from slab.  That's actually unnecessary.
      
      The patch removes the duplicate VA/PA conversion.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e2f0775
  14. 21 4月, 2012 1 次提交
  15. 01 3月, 2012 1 次提交
    • T
      memblock: Fix size aligning of memblock_alloc_base_nid() · 847854f5
      Tejun Heo 提交于
      memblock allocator aligns @size to @align to reduce the amount
      of fragmentation.  Commit:
      
       7bd0b0f0 ("memblock: Reimplement memblock allocation using reverse free area iterator")
      
      Broke it by incorrectly relocating @size aligning to
      memblock_find_in_range_node().  As the aligned size is not
      propagated back to memblock_alloc_base_nid(), the actually
      reserved size isn't aligned.
      
      While this increases memory use for memblock reserved array,
      this shouldn't cause any critical failure; however, it seems
      that the size aligning was hiding a use-beyond-allocation bug in
      sparc64 and losing the aligning causes boot failure.
      
      The underlying problem is currently being debugged but this is a
      proper fix in itself, it's already pretty late in -rc cycle for
      boot failures and reverting the change for debugging isn't
      difficult. Restore the size aligning moving it to
      memblock_alloc_base_nid().
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120228205621.GC3252@dhcp-172-17-108-109.mtv.corp.google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <alpine.SOC.1.00.1202130942030.1488@math.ut.ee>
      847854f5
  16. 16 1月, 2012 1 次提交
  17. 09 12月, 2011 14 次提交
    • T
      memblock: Reimplement memblock allocation using reverse free area iterator · 7bd0b0f0
      Tejun Heo 提交于
      Now that all early memory information is in memblock when enabled, we
      can implement reverse free area iterator and use it to implement NUMA
      aware allocator which is then wrapped for simpler variants instead of
      the confusing and inefficient mending of information in separate NUMA
      aware allocator.
      
      Implement for_each_free_mem_range_reverse(), use it to reimplement
      memblock_find_in_range_node() which in turn is used by all allocators.
      
      The visible allocator interface is inconsistent and can probably use
      some cleanup too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      7bd0b0f0
    • T
      memblock: Kill early_node_map[] · 0ee332c1
      Tejun Heo 提交于
      Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
      there's no user of early_node_map[] left.  Kill early_node_map[] and
      replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
      relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
      as page_alloc.c would no longer host an alternative implementation.
      
      This change is ultimately one to one mapping and shouldn't cause any
      observable difference; however, after the recent changes, there are
      some functions which now would fit memblock.c better than page_alloc.c
      and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
      doesn't make much sense on some of them.  Further cleanups for
      functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.
      
      -v2: Fix compile bug introduced by mis-spelling
       CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
       mmzone.h.  Reported by Stephen Rothwell.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      0ee332c1
    • T
      memblock: Implement memblock_add_node() · 7fb0bc3f
      Tejun Heo 提交于
      Implement memblock_add_node() which can add a new memblock memory
      region with specific node ID.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      7fb0bc3f
    • T
      memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users · 1aadc056
      Tejun Heo 提交于
      The only function of memblock_analyze() is now allowing resize of
      memblock region arrays.  Rename it to memblock_allow_resize() and
      update its users.
      
      * The following users remain the same other than renaming.
      
        arm/mm/init.c::arm_memblock_init()
        microblaze/kernel/prom.c::early_init_devtree()
        powerpc/kernel/prom.c::early_init_devtree()
        openrisc/kernel/prom.c::early_init_devtree()
        sh/mm/init.c::paging_init()
        sparc/mm/init_64.c::paging_init()
        unicore32/mm/init.c::uc32_memblock_init()
      
      * In the following users, analyze was used to update total size which
        is no longer necessary.
      
        powerpc/kernel/machine_kexec.c::reserve_crashkernel()
        powerpc/kernel/prom.c::early_init_devtree()
        powerpc/mm/init_32.c::MMU_init()
        powerpc/mm/tlb_nohash.c::__early_init_mmu()  
        powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
        powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
        sh/kernel/machine_kexec.c::reserve_crashkernel()
      
      * x86/kernel/e820.c::memblock_x86_fill() was directly setting
        memblock_can_resize before populating memblock and calling analyze
        afterwards.  Call memblock_allow_resize() before start populating.
      
      memblock_can_resize is now static inside memblock.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      1aadc056
    • T
      memblock: Track total size of regions automatically · 1440c4e2
      Tejun Heo 提交于
      Total size of memory regions was calculated by memblock_analyze()
      requiring explicitly calling the function between operations which can
      change memory regions and possible users of total size, which is
      cumbersome and fragile.
      
      This patch makes each memblock_type track total size automatically
      with minor modifications to memblock manipulation functions and remove
      requirements on calling memblock_analyze().  [__]memblock_dump_all()
      now also dumps the total size of reserved regions.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      1440c4e2
    • T
      memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove() · c0ce8fef
      Tejun Heo 提交于
      With recent updates, the basic memblock operations are robust enough
      that there's no reason for memblock_enfore_memory_limit() to directly
      manipulate memblock region arrays.  Reimplement it using
      __memblock_remove().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      c0ce8fef
    • T
      memblock: Make memblock functions handle overflowing range @size · eb18f1b5
      Tejun Heo 提交于
      Allow memblock users to specify range where @base + @size overflows
      and automatically cap it at maximum.  This makes the interface more
      robust and specifying till-the-end-of-memory easier.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      eb18f1b5
    • T
      memblock: Reimplement __memblock_remove() using memblock_isolate_range() · 71936180
      Tejun Heo 提交于
      __memblock_remove()'s open coded region manipulation can be trivially
      replaced with memblock_islate_range().  This increases code sharing
      and eases improving region tracking.
      
      This pulls memblock_isolate_range() out of HAVE_MEMBLOCK_NODE_MAP.
      Make it use memblock_get_region_node() instead of assuming rgn->nid is
      available.
      
      -v2: Fixed build failure on !HAVE_MEMBLOCK_NODE_MAP caused by direct
           rgn->nid access.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      71936180
    • T
      memblock: Separate out memblock_isolate_range() from memblock_set_node() · 6a9ceb31
      Tejun Heo 提交于
      memblock_set_node() operates in three steps - break regions crossing
      boundaries, set nid and merge back regions.  This patch separates the
      first part into a separate function - memblock_isolate_range(), which
      breaks regions crossing range boundaries and returns range index range
      for regions properly contained in the specified memory range.
      
      This doesn't introduce any behavior change and will be used to further
      unify region handling.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      6a9ceb31
    • T
      memblock: Kill memblock_init() · fe091c20
      Tejun Heo 提交于
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
    • T
      memblock: Kill sentinel entries at the end of static region arrays · c5a1cb28
      Tejun Heo 提交于
      memblock no longer depends on having one more entry at the end during
      addition making the sentinel entries at the end of region arrays not
      too useful.  Remove the sentinels.  This eases further updates.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      c5a1cb28
    • T
      memblock: Add __memblock_dump_all() · 4ff7b82f
      Tejun Heo 提交于
      Add __memblock_dump_all() which dumps memblock configuration whether
      memblock_debug is enabled or not.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      4ff7b82f
    • T
      memblock: Use memblock_reserve() in memblock internal functions · 9c8c27e2
      Tejun Heo 提交于
      Make memblock_double_array(), __memblock_alloc_base() and
      memblock_alloc_nid() use memblock_reserve() instead of calling
      memblock_add_region() with reserved array directly.  This eases
      debugging and updates to memblock_add_region().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      9c8c27e2
    • T
      memblock: Make memblock_{add|remove|free|reserve}() return int and update prototypes · 581adcbe
      Tejun Heo 提交于
      memblock_{add|remove|free|reserve}() return either 0 or -errno but had
      long as return type.  Chage it to int.  Also, drop 'extern' from all
      prototypes in memblock.h - they are unnecessary and used
      inconsistently (especially if mm.h is included in the picture).
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      581adcbe
  18. 01 11月, 2011 3 次提交
  19. 26 7月, 2011 1 次提交
  20. 15 7月, 2011 1 次提交