1. 09 Dec 2010: 1 commit
  2. 29 Nov 2010: 1 commit
  3. 13 Oct 2010: 1 commit
    • memblock, bootmem: Round pfn properly for memory and reserved regions · c7fc2de0
      Committed by Yinghai Lu
      We need to round memory regions correctly -- specifically, we need to
      round reserved region in the more expansive direction (lower limit
      down, upper limit up) whereas usable memory regions need to be rounded
      in the more restrictive direction (lower limit up, upper limit down).
      
      This introduces two sets of inlines:
      
      	memblock_region_memory_base_pfn()
      	memblock_region_memory_end_pfn()
      	memblock_region_reserved_base_pfn()
      	memblock_region_reserved_end_pfn()
      
      Although they are antisymmetric (and therefore are technically
      duplicates) the use of the different inlines explicitly documents the
      programmer's intention.
      
      The lack of proper rounding caused a bug on ARM, which was then found
      to also affect other architectures.
      Reported-by: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB4CDFD.4020105@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
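The two rounding directions described in this commit can be illustrated with a small userspace sketch. The `region_*` helpers, `struct sketch_region` and the 4 KiB page size are assumptions made for illustration. They mirror the intent of the memblock inlines but are not the kernel's code. End pfns are exclusive.

```c
#include <stdint.h>

/* Assumed 4 KiB pages for this sketch only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for a memblock region: bytes [base, base + size). */
struct sketch_region {
	uint64_t base;
	uint64_t size;
};

static uint64_t pfn_down(uint64_t addr) { return addr >> SKETCH_PAGE_SHIFT; }

static uint64_t pfn_up(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Usable memory: restrictive rounding (lower limit up, upper limit down). */
static uint64_t region_memory_base_pfn(const struct sketch_region *r)
{
	return pfn_up(r->base);
}

static uint64_t region_memory_end_pfn(const struct sketch_region *r)
{
	return pfn_down(r->base + r->size);
}

/* Reserved memory: expansive rounding (lower limit down, upper limit up). */
static uint64_t region_reserved_base_pfn(const struct sketch_region *r)
{
	return pfn_down(r->base);
}

static uint64_t region_reserved_end_pfn(const struct sketch_region *r)
{
	return pfn_up(r->base + r->size);
}
```

Take a region covering bytes [0x1800, 0x3800). The memory helpers yield pfns [2, 3), which is only the page fully inside the region. The reserved helpers yield [1, 4), which is every page the region touches.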
  4. 04 Aug 2010: 1 commit
  5. 23 Jul 2010: 1 commit
  6. 14 Jul 2010: 1 commit
  7. 09 Jul 2010: 1 commit
    • powerpc/numa: Use form 1 affinity to setup node distance · 41eab6f8
      Committed by Anton Blanchard
      Form 1 affinity allows multiple entries in ibm,associativity-reference-points
      which represent affinity domains in decreasing order of importance. The
      Linux concept of a node is always the first entry, but using the other
      values as an input to node_distance() allows the memory allocator to make
      better decisions on which node to go first when local memory has been
      exhausted.
      
      We keep things simple and create an array indexed by NUMA node, capped at
      4 entries. Each time we look up an associativity property we initialise
      the array, which is overkill, but since we should only hit this path during
      boot it didn't seem worth adding a per-node valid bit.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
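The sketch below shows roughly how a per-node table of associativity domains, capped at 4 levels, could feed node_distance(). The following are all assumptions for illustration, not a quote of the powerpc code:

* the table;
* the constants;
* the rule that the distance doubles for each leading level at which two nodes differ.

```c
/* Hypothetical limits for the sketch; the commit caps the levels at 4. */
#define SKETCH_MAX_NODES      8
#define SKETCH_MAX_REF_POINTS 4
#define SKETCH_LOCAL_DISTANCE 10

/* Hypothetical table: associativity domain id of each node at each
 * reference-point level, most important level first. */
static int lookup[SKETCH_MAX_NODES][SKETCH_MAX_REF_POINTS];
static int ref_points_depth = SKETCH_MAX_REF_POINTS;

/* Start from the local distance and double it for every leading level at
 * which the two nodes are in different domains; stop at the first shared
 * level (assumed rule). */
static int sketch_node_distance(int a, int b)
{
	int distance = SKETCH_LOCAL_DISTANCE;
	int i;

	for (i = 0; i < ref_points_depth; i++) {
		if (lookup[a][i] == lookup[b][i])
			break;
		distance *= 2;
	}
	return distance;
}
```

Suppose the table is set so that nodes 0 and 1 differ only at the first level. The sketch then returns 20 for that pair, and 10 for any node compared with itself. A node that also differs at the second level comes out at 40.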
  8. 21 May 2010: 1 commit
  9. 06 May 2010: 1 commit
  10. 28 Apr 2010: 1 commit
    • powerpc/numa: Add form 1 NUMA affinity · 4b83c330
      Committed by Anton Blanchard
      Firmware changed the way it represents memory and cpu affinity on POWER7.
      Unfortunately the old method now caps the topology to work around issues
      with legacy operating systems. For Linux to get the correct topology we
      need to use the new form 1 affinity information.
      
      We set the form 1 field in the client architecture, and if we see "1" in the
      ibm,associativity-form property, firmware supports form 1 affinity and
      we should look at the first field in the ibm,associativity-reference-points
      array. If not we use the second field as we always have.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
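The index selection described in this commit reduces to choosing one entry of ibm,associativity-reference-points. This sketch makes three assumptions:

* the `form1` flag has already been derived from firmware (how that is detected is not modeled);
* the cells are already in host byte order;
* `sketch_node_ref_point()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the reference-points entry that defines a
 * Linux node. Form 1 affinity uses the first entry; otherwise the second
 * entry is used, as before. Returns -1 if the array is too short.
 */
static int64_t sketch_node_ref_point(int form1, const uint32_t *ref_points,
				     size_t nr_points)
{
	size_t index = form1 ? 0 : 1;

	if (index >= nr_points)
		return -1;
	return ref_points[index];
}
```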
  11. 07 Mar 2010: 1 commit
  12. 09 Jun 2009: 1 commit
  13. 23 Feb 2009: 1 commit
    • powerpc/numa: Cleanup hot_add_scn_to_nid · 0f16ef7f
      Committed by Nathan Fontenot
      This patch reworks hot_add_scn_to_nid() and its supporting functions
      to make them easier to understand.  There are no functional changes in
      this patch, and it has been tested on machines with memory represented in the
      device tree as memory nodes and in the ibm,dynamic-memory property.
      
      My previous patch that introduced support for hotplug memory add on
      systems whose memory was represented by the ibm,dynamic-memory property
      of the device tree only left the code more unintelligible.  This
      will hopefully make things easier to understand.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  14. 13 Feb 2009: 1 commit
    • powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Committed by Dave Hansen
      Fix the powerpc NUMA reserve bootmem page selection logic.
      
      commit 8f64e1f2 (powerpc: Reserve
      in bootmem lmb reserved regions that cross NUMA nodes) changed
      the logic for how the powerpc LMB reserved regions were converted
      to bootmem reserved regions.  As the following discussion reports,
      the new logic was not correct.
      
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      cannot reserve things on other nodes, as they may not have
      bootmem structures allocated yet.
      
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      
      The reservation may only span 2 bytes, but those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statement, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
      
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      
      0x1000 may not be on the same node as 0xfff.  Oops.
      
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> PAGE_SHIFT) will make this consistent with the other VM
      code.
      
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
      
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
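The worked example in this commit (2 bytes at 0x0fff, with the active region ending at pfn 0x1) can be reproduced with a small sketch. `sketch_reserve_size()` is hypothetical, and 4 KiB pages are assumed. Both variants compute end_pfn with PFN_UP-style rounding, so the trim branch is taken. `use_physbase` switches between two subtractions:

* 0: the start_pfn-based subtraction that the message says can be too large;
* 1: the physbase-based subtraction that it recommends.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; SKETCH_PFN_UP rounds up like the kernel's PFN_UP(). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PFN_UP(x)  (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/*
 * Bytes to reserve for [physbase, physbase + size) inside an active region
 * whose exclusive end is region_end_pfn. use_physbase = 0 reproduces the
 * start_pfn-based arithmetic; 1 uses physbase as the message suggests.
 */
static uint64_t sketch_reserve_size(uint64_t physbase, uint64_t size,
				    uint64_t region_end_pfn, int use_physbase)
{
	uint64_t start_pfn = physbase >> SKETCH_PAGE_SHIFT;
	uint64_t end_pfn = SKETCH_PFN_UP(physbase + size); /* exclusive */

	if (end_pfn <= region_end_pfn)
		return size; /* fits: reserve exactly the LMB */

	/* Reservation runs past the active region: trim it. */
	if (use_physbase)
		return (region_end_pfn << SKETCH_PAGE_SHIFT) - physbase;
	return (region_end_pfn << SKETCH_PAGE_SHIFT) -
	       (start_pfn << SKETCH_PAGE_SHIFT);
}
```

For that example, the start_pfn-based variant returns 0x1000, which reaches into the next node. The physbase-based variant returns 0x1, the single byte that actually lies in this active region.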
  15. 11 Feb 2009: 2 commits
  16. 08 Jan 2009: 4 commits
  17. 16 Dec 2008: 1 commit
    • powerpc: Fix bootmem reservation on uninitialized node · a4c74ddd
      Committed by Dave Hansen
      careful_allocation() was calling into the bootmem allocator for
      nodes which had not been fully initialized and caused a previous
      bug:  http://patchwork.ozlabs.org/patch/10528/  So, I merged a
      few broken-out loops in do_init_bootmem() to fix it.  That changed
      the code ordering.
      
      I think this bug is triggered by having reserved areas for a node
      which are spanned by another node's contents.  In the
      mark_reserved_regions_for_nid() code, we attempt to reserve the
      area for a node before we have allocated the NODE_DATA() for that
      nid.  We do this since I reordered that loop.  I suck.
      
      This is causing crashes at bootup on some systems, as reported
      by Jon Tollefson.
      
      This may only present on some systems that have 16GB pages
      reserved.  But, it can probably happen on any system that is
      trying to reserve large swaths of memory that happen to span other
      nodes' contents.
      
      This commit ensures that we do not touch bootmem for any node which
      has not been initialized, and also removes a compile warning about
      an unused variable.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
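This message does not show the check itself. The sketch below assumes it amounts to two conditions:

* only reserve the piece of a reservation that belongs to the node currently being set up;
* only do so once that node's data exists.

`sketch_node_data[]`, `struct sketch_node` and `sketch_reserve_piece()` are hypothetical stand-ins for NODE_DATA() and the bootmem call.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 4 /* assumption for the sketch */

/* Hypothetical per-node bootmem state; NULL until the node is initialized. */
struct sketch_node {
	uint64_t reserved_bytes;
};
static struct sketch_node *sketch_node_data[SKETCH_MAX_NODES];

/*
 * While node `nid` is being set up, reserve a piece that lies on node
 * `piece_nid` only if it belongs to `nid` and nid's data exists.
 * Returns 1 if the piece was reserved, 0 if it was skipped.
 */
static int sketch_reserve_piece(int nid, int piece_nid, uint64_t bytes)
{
	if (piece_nid != nid || sketch_node_data[nid] == NULL)
		return 0;
	sketch_node_data[nid]->reserved_bytes += bytes;
	return 1;
}
```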
  18. 01 Dec 2008: 1 commit
    • powerpc: Fix boot freeze on machine with empty memory node · 4a618669
      Committed by Dave Hansen
      I got a bug report about a distro kernel not booting on a particular
      machine.  It would freeze during boot:
      
      > ...
      > Could not find start_pfn for node 1
      > [boot]0015 Setup Done
      > Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
      > Policy zone: DMA
      > Kernel command line:
      > [boot]0020 XICS Init
      > [boot]0021 XICS Done
      > PID hash table entries: 4096 (order: 12, 32768 bytes)
      > clocksource: timebase mult[7d0000] shift[22] registered
      > Console: colour dummy device 80x25
      > console handover: boot [udbg0] -> real [hvc0]
      > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
      > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
      > freeing bootmem node 0
      
      I've reproduced this on 2.6.27.7.  It is caused by commit
      8f64e1f2 ("powerpc: Reserve in bootmem
      lmb reserved regions that cross NUMA nodes").
      
      The problem is that Jon took a loop which was (in pseudocode):
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      		reserve_node_bootmem(nid);
      
      and broke it up into:
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      	for_each_node(nid)
      		reserve_node_bootmem(nid);
      
      The issue comes in when the 'careful_alloc()' is called on a node with
      no memory.  It falls back to using bootmem from a previously-initialized
      node.  But, bootmem has not yet been reserved when Jon's patch is
      applied.  It gives back bogus memory (0xc000000000000000) and pukes
      later in boot.
      
      The following patch collapses the loop back together.  It also breaks
      the mark_reserved_regions_for_nid() code out into a function and adds
      some comments.  I think a huge part of why this bug was introduced is that
      the for loop was too long and hard to read.
      
      The actual bug fix here is the:
      
      +		if (end_pfn <= node->node_start_pfn ||
      +		    start_pfn >= node_end_pfn)
      +			continue;
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
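The fix quoted in this commit skips reserved ranges that do not overlap the node. If both are treated as half-open pfn ranges, the overlap test it negates can be written as a standalone helper. `sketch_ranges_overlap()` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/*
 * Half-open pfn ranges [start, end). Returns nonzero when the reserved
 * range shares at least one page with the node's span; this is the
 * negation of the quoted `continue` condition.
 */
static int sketch_ranges_overlap(uint64_t res_start, uint64_t res_end,
				 uint64_t node_start, uint64_t node_end)
{
	return !(res_end <= node_start || res_start >= node_end);
}
```

Because the ranges are half-open, a reservation that ends exactly at node_start_pfn does not count as overlapping. Neither does one that starts exactly at node_end_pfn.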
  19. 21 Oct 2008: 2 commits
  20. 10 Oct 2008: 1 commit
    • powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes · 8f64e1f2
      Committed by Jon Tollefson
      If there are multiple reserved memory blocks via lmb_reserve() that are
      contiguous addresses and on different NUMA nodes we are losing track of which
      address ranges to reserve in bootmem on which node.  I discovered this
      when I recently got to try 16GB huge pages on a system with more than 2 nodes.
      
      When scanning the device tree in early boot we call lmb_reserve() with
      the addresses of the 16G pages that we find so that the memory doesn't
      get used for something else.  For example, the addresses for the pages
      could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
      one on each of eight nodes.  In the lmb after all the pages have been
      reserved it will look something like the following:
      
      lmb_dump_all:
          memory.cnt            = 0x2
          memory.size           = 0x3e80000000
          memory.region[0x0].base       = 0x0
                            .size     = 0x1e80000000
          memory.region[0x1].base       = 0x4000000000
                            .size     = 0x2000000000
          reserved.cnt          = 0x5
          reserved.size         = 0x3e80000000
          reserved.region[0x0].base       = 0x0
                            .size     = 0x7b5000
          reserved.region[0x1].base       = 0x2a00000
                            .size     = 0x78c000
          reserved.region[0x2].base       = 0x328c000
                            .size     = 0x43000
          reserved.region[0x3].base       = 0xf4e8000
                            .size     = 0xb18000
          reserved.region[0x4].base       = 0x4000000000
                            .size     = 0x2000000000
      
      The reserved.region[0x4] contains the 16G pages.  In
      arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the
      node numbers looking for the reserved regions that belong to the
      particular node.  It is not able to identify region 0x4 as being a part
      of each of the 8 nodes.  It is assuming that a reserved region is only
      on a single node.
      
      This patch takes out the reserved region loop from inside
      the loop that goes over each node.  It looks up the active region containing
      the start of the reserved region.  If it extends past that active region then
      it adjusts the size and gets the next active region containing it.
      Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
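The splitting step in this commit has three parts: find the active region containing the start of the reservation, trim the reservation to it, then move on to the next region. It can be sketched as follows. The structures, `sketch_split_reservation()` and the 4 KiB page size are assumptions. Regions are assumed to be sorted and non-overlapping, and the walk stops if the current start falls in a hole.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Hypothetical active region: pfns [start_pfn, end_pfn) on node nid. */
struct sketch_active_region {
	uint64_t start_pfn, end_pfn;
	int nid;
};

/* One per-node piece of a reservation, in bytes. */
struct sketch_piece {
	uint64_t base, size;
	int nid;
};

/*
 * Split [base, base + size) into one piece per active region it touches.
 * Returns the number of pieces written to out (at most max).
 */
static size_t sketch_split_reservation(const struct sketch_active_region *regions,
				       size_t nr_regions, uint64_t base,
				       uint64_t size, struct sketch_piece *out,
				       size_t max)
{
	uint64_t end = base + size;
	size_t i, n = 0;

	for (i = 0; i < nr_regions && base < end && n < max; i++) {
		uint64_t r_start = regions[i].start_pfn << SKETCH_PAGE_SHIFT;
		uint64_t r_end = regions[i].end_pfn << SKETCH_PAGE_SHIFT;

		if (base < r_start || base >= r_end)
			continue; /* this region does not contain the current start */

		/* Trim the piece to the end of this active region. */
		uint64_t piece_end = end < r_end ? end : r_end;
		out[n].base = base;
		out[n].size = piece_end - base;
		out[n].nid = regions[i].nid;
		n++;
		base = piece_end; /* carry on with the remainder */
	}
	return n;
}
```

Consider eight 16 GB active regions starting at 0x4000000000, one per node as in the example in this commit, and the 0x2000000000-byte reservation at 0x4000000000. The sketch produces eight 16 GB pieces, one attributed to each node.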
  21. 16 Sep 2008: 1 commit
    • powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels · cf00085d
      Committed by Chandru
      The kdump kernel needs to use only those memory regions that it is allowed
      to use (crashkernel, rtas, tce, etc.).  Each of these regions has
      its own size, and they are currently added under the 'linux,usable-memory'
      property under each memory@xxx node of the device tree.
      
      The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
      node (on POWER6) now stores in it the representation for most of the
      logical memory blocks with the size of each memory block being a
      constant (lmb_size).  If any of the above-mentioned regions (or part of
      one) lies within one of the LMBs in the ibm,dynamic-memory property,
      those regions need to be identified within that LMB.
      
      This makes the kernel recognize a new 'linux,drconf-usable-memory'
      property added by kexec-tools.  Each entry in this property is of the
      form of a count followed by that many (base, size) pairs for the above
      mentioned regions.  The number of cells in the count value is given by
      the #size-cells property of the root node.
      Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
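The sketch below walks one entry of the property format described in this commit: a count, then that many (base, size) pairs. The count width follows #size-cells, as the commit states. The following are assumptions:

* using #address-cells for base and #size-cells for size;
* the helper names;
* host-byte-order cells (real device-tree cells are big-endian).

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_range {
	uint64_t base, size;
};

/* Combine n 32-bit cells (most significant first) into one value and
 * advance *cellp. Cells are assumed to be in host byte order already. */
static uint64_t sketch_read_n_cells(int n, const uint32_t **cellp)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *(*cellp)++;
	return v;
}

/*
 * Parse one LMB's entry: a count, then `count` (base, size) pairs.
 * Stores up to max ranges in out, always consumes the whole entry, and
 * returns the number of ranges stored.
 */
static size_t sketch_parse_usable_entry(const uint32_t **cellp, int addr_cells,
					int size_cells, struct sketch_range *out,
					size_t max)
{
	uint64_t count = sketch_read_n_cells(size_cells, cellp);
	size_t n = 0;

	for (; count > 0; count--) {
		uint64_t base = sketch_read_n_cells(addr_cells, cellp);
		uint64_t size = sketch_read_n_cells(size_cells, cellp);

		if (n < max) {
			out[n].base = base;
			out[n].size = size;
			n++;
		}
	}
	return n;
}
```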
  22. 25 Jul 2008: 1 commit
  23. 03 Jul 2008: 2 commits
  24. 24 Apr 2008: 1 commit
  25. 14 Feb 2008: 1 commit
  26. 08 Feb 2008: 1 commit
    • Introduce flags for reserve_bootmem() · 72a7fe39
      Committed by Bernhard Walle
      This patchset adds a flags variable to reserve_bootmem() and uses the
      BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
      between crashkernel area and already used memory.
      
      This patch:
      
      Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
      If that flag is set, the function returns -EBUSY if the memory has already
      been reserved.  This is to avoid conflicts.
      
      Because that code runs before SMP initialisation, there's no race condition
      inside reserve_bootmem_core().
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix powerpc build]
      Signed-off-by: Bernhard Walle <bwalle@suse.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
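The BOOTMEM_EXCLUSIVE semantics can be illustrated with a toy per-page reservation map. The flag names and values, the page array and `sketch_reserve_bootmem()` are hypothetical. This sketch checks the whole range before marking anything, which may differ from how reserve_bootmem_core() handles partial overlaps. The -EINVAL range check is also an addition of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags modeled on the description; values are made up. */
#define SKETCH_BOOTMEM_DEFAULT   0
#define SKETCH_BOOTMEM_EXCLUSIVE 1

#define SKETCH_NR_PAGES 64
static bool sketch_reserved[SKETCH_NR_PAGES]; /* one flag per page frame */

/*
 * Reserve page frames [start, end). With SKETCH_BOOTMEM_EXCLUSIVE, return
 * -EBUSY and reserve nothing if any page is already reserved. No locking:
 * like the code described above, it assumes it runs before SMP bring-up.
 */
static int sketch_reserve_bootmem(size_t start, size_t end, int flags)
{
	size_t i;

	if (start > end || end > SKETCH_NR_PAGES)
		return -EINVAL;

	if (flags & SKETCH_BOOTMEM_EXCLUSIVE) {
		for (i = start; i < end; i++)
			if (sketch_reserved[i])
				return -EBUSY;
	}
	for (i = start; i < end; i++)
		sketch_reserved[i] = true;
	return 0;
}
```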
  27. 07 Feb 2008: 1 commit
    • [POWERPC] Fake NUMA emulation for PowerPC · 1daa6d08
      Committed by Balbir Singh
      Here's a dumb, simple implementation of fake NUMA nodes for PowerPC.
      Fake NUMA nodes can be specified using the following command line
      option:
      
      numa=fake=<node range>
      
      node range is of the format <range1>,<range2>,...<rangeN>
      
      Each of the rangeX parameters is parsed using memparse().  I find the
      patch useful for fake NUMA emulation on my simple PowerPC machine.
      I've tested it on a numa box with the following arguments:
      
      numa=fake=512M
      numa=fake=512M,768M
      numa=fake=256M,512M mem=512M
      numa=fake=1G mem=768M
      numa=fake=
      without any numa= argument
      
      The other side-effect introduced by this patch is that, in the case
      where we don't have NUMA information, we now set a node online after
      adding each LMB.  This node could very well be node 0, but in the case
      that we enable fake NUMA nodes, when we cross node boundaries, we need
      to set the new node online.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
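The numa=fake= argument is a comma-separated list, and each entry is handed to memparse(). The sketch below uses a simplified, hypothetical stand-in for memparse(): an integer with an optional K/M/G suffix. It only splits the list into sizes. How the patch turns those values into node boundaries is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for memparse(): a number (base
 * auto-detected by strtoull) with an optional K/M/G suffix. *retp is set
 * just past what was consumed. */
static uint64_t sketch_memparse(const char *s, const char **retp)
{
	char *end;
	uint64_t v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	*retp = end;
	return v;
}

/* Split a value such as "512M,768M" into up to max sizes. Stops at the
 * first empty or zero field. Returns the number of sizes parsed. */
static size_t sketch_parse_fake_numa(const char *arg, uint64_t *out, size_t max)
{
	size_t n = 0;

	while (*arg && n < max) {
		const char *next;
		uint64_t v = sketch_memparse(arg, &next);

		if (next == arg || v == 0)
			break;
		out[n++] = v;
		arg = next;
		if (*arg == ',')
			arg++;
	}
	return n;
}
```

With this sketch, "512M,768M" yields two sizes, 512 MiB and 768 MiB. An empty value, as in `numa=fake=`, yields none.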
  28. 26 Jan 2008: 1 commit
    • Revert "[POWERPC] Fake NUMA emulation for PowerPC" · 55852bed
      Committed by Paul Mackerras
      This reverts commit 5c3f5892,
      basically because it changes behaviour even when no fake NUMA
      information is specified on the kernel command line.
      
      Firstly, it changes the nid, thus destroying the real NUMA
      information.  Secondly, it also changes behaviour in that if a node
      ends up with no memory in it because of the memory limit, we used to
      set it online and now we don't.
      
      Also, in the non-NUMA case with no fake NUMA information, we do
      node_set_online once for each LMB now, whereas previously we only did
      it once.  I don't know if that is actually a problem, but it does seem
      unnecessary.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  29. 20 Dec 2007: 1 commit
    • [POWERPC] Fake NUMA emulation for PowerPC · 5c3f5892
      Committed by Balbir Singh
      Here's a dumb, simple implementation of fake NUMA nodes for PowerPC.
      Fake NUMA nodes can be specified using the following command line option:
      
      numa=fake=<node range>
      
      node range is of the format <range1>,<range2>,...<rangeN>
      
      Each of the rangeX parameters is parsed using memparse().  I find this
      useful for fake NUMA emulation on my simple PowerPC machine.  I've
      tested it on a non-numa box with the following arguments:
      
      numa=fake=1G
      numa=fake=1G,2G
      numa=fake=1G,512M,2G
      numa=fake=1500M,2800M mem=3500M
      numa=fake=1G mem=512M
      numa=fake=1G mem=1G
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  30. 03 Aug 2007: 1 commit
  31. 10 May 2007: 1 commit
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Committed by Rafael J. Wysocki
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
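In the pattern this commit describes, hotplug-aware subsystems receive suspend-specific notifications but, for now, handle them like the normal ones. That can be sketched as a callback with paired case labels. The SKETCH_CPU_* constants are hypothetical and their values are illustrative only. They model the idea of OR-ing a "tasks frozen" bit into the normal action code.

```c
/* Hypothetical action codes; values are illustrative only. */
#define SKETCH_CPU_ONLINE        0x0002
#define SKETCH_CPU_DEAD          0x0007
#define SKETCH_CPU_TASKS_FROZEN  0x0010
#define SKETCH_CPU_ONLINE_FROZEN (SKETCH_CPU_ONLINE | SKETCH_CPU_TASKS_FROZEN)
#define SKETCH_CPU_DEAD_FROZEN   (SKETCH_CPU_DEAD | SKETCH_CPU_TASKS_FROZEN)

static int online_events, dead_events;

/* A hotplug-aware callback that handles the suspend/resume ("frozen")
 * variants exactly like the corresponding normal events. */
static void sketch_cpu_callback(unsigned long action)
{
	switch (action) {
	case SKETCH_CPU_ONLINE:
	case SKETCH_CPU_ONLINE_FROZEN:
		online_events++;
		break;
	case SKETCH_CPU_DEAD:
	case SKETCH_CPU_DEAD_FROZEN:
		dead_events++;
		break;
	default:
		break;
	}
}
```

A subsystem that later needs to tell the two apart could test `action & SKETCH_CPU_TASKS_FROZEN` instead of sharing a case label.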
  32. 13 Apr 2007: 3 commits