1. 16 December 2008, 2 commits
    • powerpc: Fix bootmem reservation on uninitialized node · a4c74ddd
      Committed by Dave Hansen
      careful_allocation() was calling into the bootmem allocator for
      nodes which had not been fully initialized, which caused an earlier
      bug:  http://patchwork.ozlabs.org/patch/10528/  To fix it, I merged
      a few broken-out loops in do_init_bootmem(), which changed the
      code ordering.
      
      I think this bug is triggered by having reserved areas for a node
      which are spanned by another node's contents.  In the
      mark_reserved_regions_for_nid() code, we attempt to reserve the
      area for a node before we have allocated the NODE_DATA() for that
      nid.  That has been the case since I reordered that loop.  I suck.
      
      This is causing crashes at bootup on some systems, as reported
      by Jon Tollefson.
      
      This may only show up on systems that have 16GB pages reserved,
      but it can probably happen on any system that tries to reserve
      large swaths of memory that happen to span other nodes' contents.
      
      This commit ensures that we do not touch bootmem for any node which
      has not been initialized, and also removes a compile warning about
      an unused variable.
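      
      As a rough sketch, the guard this describes amounts to something
      like the following (the helper is illustrative, not the literal
      patch; reserve_bootmem_node() stands in for the reservation path):
      
      	/* Sketch only: skip bootmem work for a node whose NODE_DATA()
      	 * has not been set up yet. */
      	static void __init careful_reserve(int nid, unsigned long start_pfn,
      					   unsigned long end_pfn)
      	{
      		if (!node_online(nid) || !NODE_DATA(nid))
      			return;	/* node not initialized: leave it alone */
      
      		reserve_bootmem_node(NODE_DATA(nid), PFN_PHYS(start_pfn),
      				     PFN_PHYS(end_pfn - start_pfn),
      				     BOOTMEM_DEFAULT);
      	}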
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area · 48f797de
      Committed by Brian King
      It looks like most of the hugetlb code is doing the correct thing if
      hugepages are not supported, but the mmap code is not.  If we get into
      the mmap code when hugepages are not supported, such as in an LPAR
      which is running Active Memory Sharing, we can oops the kernel.  This
      fixes the oops being seen in this path.
      
      oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=1024 NUMA pSeries
      Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
      dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
      scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
      Supported: No
      NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
      REGS: c000000077e7b830 TRAP: 0300   Tainted: G
      (2.6.27.5-bz50170-2-ppc64)
      MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000448  XER: 20000001
      DAR: c000002000af90a8, DSISR: 0000000040000000
      TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
      GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
      GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
      GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
      GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
      GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
      GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
      GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
      NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
      LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      Call Trace:
      [c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      [c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
      [c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
      [c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
      [c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
      Instruction dump:
      fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
      fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0
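      
      A sketch of the early bail-out described above, modeled on the
      2.6.28-era mmap path (treat the exact names, e.g. mmu_huge_psizes[]
      and the slice_get_unmapped_area() arguments, as assumptions):
      
      	unsigned long hugetlb_get_unmapped_area(struct file *file,
      						unsigned long addr,
      						unsigned long len,
      						unsigned long pgoff,
      						unsigned long flags)
      	{
      		struct hstate *hstate = hstate_file(file);
      		int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
      
      		/* The fix: refuse the mmap when this hugepage size is not
      		 * supported, instead of oopsing in slice_get_unmapped_area(). */
      		if (!mmu_huge_psizes[mmu_psize])
      			return -EINVAL;
      
      		return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1, 0);
      	}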
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  2. 01 December 2008, 2 commits
    • powerpc: Fix boot freeze on machine with empty memory node · 4a618669
      Committed by Dave Hansen
      I got a bug report about a distro kernel not booting on a particular
      machine.  It would freeze during boot:
      
      > ...
      > Could not find start_pfn for node 1
      > [boot]0015 Setup Done
      > Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
      > Policy zone: DMA
      > Kernel command line:
      > [boot]0020 XICS Init
      > [boot]0021 XICS Done
      > PID hash table entries: 4096 (order: 12, 32768 bytes)
      > clocksource: timebase mult[7d0000] shift[22] registered
      > Console: colour dummy device 80x25
      > console handover: boot [udbg0] -> real [hvc0]
      > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
      > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
      > freeing bootmem node 0
      
      I've reproduced this on 2.6.27.7.  It is caused by commit
      8f64e1f2 ("powerpc: Reserve in bootmem
      lmb reserved regions that cross NUMA nodes").
      
      The problem is that Jon took a loop which was (in pseudocode):
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      		reserve_node_bootmem(nid);
      
      and broke it up into:
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      	for_each_node(nid)
      		reserve_node_bootmem(nid);
      
      The issue comes in when 'careful_alloc()' is called on a node with
      no memory.  It falls back to using bootmem from a previously-initialized
      node.  But with Jon's patch applied, that bootmem has not yet been
      reserved.  It hands back bogus memory (0xc000000000000000) and the
      kernel pukes later in boot.
      
      The following patch collapses the loop back together.  It also breaks
      the mark_reserved_regions_for_nid() code out into a function and adds
      some comments.  I think a huge part of introducing this bug is that
      the for loop was too long and hard to read.
      
      The actual bug fix here is the:
      
      +		if (end_pfn <= node->node_start_pfn ||
      +		    start_pfn >= node_end_pfn)
      +			continue;
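      
      In context, that test is an intersection check; as a self-contained
      sketch (helper name hypothetical):
      
      	/* Does the reserved range [start_pfn, end_pfn) touch this node? */
      	static int pfns_intersect_node(pg_data_t *node,
      				       unsigned long start_pfn,
      				       unsigned long end_pfn)
      	{
      		unsigned long node_end_pfn = node->node_start_pfn +
      					     node->node_spanned_pages;
      
      		return !(end_pfn <= node->node_start_pfn ||
      			 start_pfn >= node_end_pfn);
      	}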
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc set_huge_psize() false positive · 4ea8fb9c
      Committed by Al Viro
      set_huge_psize() is called only from __init code and itself calls an
      __init function, so it should be marked __init.  Incidentally, it
      ought to be static in that file.
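      
      The shape of the fix, illustratively (signature assumed):
      
      	/* A function called only from __init code, which itself calls an
      	 * __init function, must be __init too -- and a file-local helper
      	 * should be static. */
      	static void __init set_huge_psize(int psize)
      	{
      		/* ...validate the hugepage size, as before... */
      	}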
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 November 2008, 1 commit
    • powerpc/40x: Limit allocable DRAM during early mapping · 5907630f
      Committed by Grant Erickson
      If the size of DRAM is not an exact power of two, we may not have
      covered DRAM in its entirety with large 16 and 4 MiB pages.  If that
      is the case, we can get non-recoverable page faults when doing the
      final PTE mappings for the non-large page PTEs.
      
      Consequently, we restrict the top end of DRAM currently allocable
      by updating '__initial_memory_limit_addr' so that calls to the LMB to
      allocate PTEs for "tail" coverage with normal-sized pages (or other
      reasons) do not attempt to allocate outside the allowed range.
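      
      Sketched in code (map_lowmem_with_large_pages() is hypothetical;
      'mapped' is however many bytes the pinned large pages actually cover):
      
      	unsigned long __init mmu_mapin_ram(void)
      	{
      		unsigned long mapped = map_lowmem_with_large_pages();
      
      		/* Cap early LMB allocations at the end of large-page
      		 * coverage so "tail" PTE allocations stay inside mapped
      		 * DRAM. */
      		__initial_memory_limit_addr = memstart_addr + mapped;
      
      		return mapped;
      	}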
      Signed-off-by: Grant Erickson <gerickson@nuovations.com>
      Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
  4. 22 October 2008, 1 commit
  5. 21 October 2008, 2 commits
  6. 20 October 2008, 1 commit
    • mm: cleanup to make remove_memory() arch-neutral · 71088785
      Committed by Badari Pulavarty
      There is nothing architecture-specific about remove_memory().  The
      remove_memory() function is common to all architectures that support
      hotplug memory removal.  Instead of duplicating it in every
      architecture, collapse the copies into an arch-neutral function.
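      
      The resulting generic function is small; roughly (the 120*HZ offline
      timeout is an assumption from the era's offline_pages() signature):
      
      	int remove_memory(u64 start, u64 size)
      	{
      		unsigned long start_pfn = start >> PAGE_SHIFT;
      		unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
      
      		/* Purely generic: convert the physical range to pfns and
      		 * hand it to the common offline path. */
      		return offline_pages(start_pfn, end_pfn, 120 * HZ);
      	}
      	EXPORT_SYMBOL_GPL(remove_memory);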
      
      [akpm@linux-foundation.org: fix the export]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 14 October 2008, 1 commit
  8. 10 October 2008, 1 commit
    • powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes · 8f64e1f2
      Committed by Jon Tollefson
      If there are multiple reserved memory blocks via lmb_reserve() that
      are at contiguous addresses and on different NUMA nodes, we lose track
      of which address ranges to reserve in bootmem on which node.  I
      discovered this when I recently got to try 16GB huge pages on a
      system with more than 2 nodes.
      
      When scanning the device tree in early boot we call lmb_reserve() with
      the addresses of the 16G pages that we find so that the memory doesn't
      get used for something else.  For example the addresses for the pages
      could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
      one on each of eight nodes.  In the lmb after all the pages have been
      reserved it will look something like the following:
      
      lmb_dump_all:
          memory.cnt            = 0x2
          memory.size           = 0x3e80000000
          memory.region[0x0].base       = 0x0
                            .size     = 0x1e80000000
          memory.region[0x1].base       = 0x4000000000
                            .size     = 0x2000000000
          reserved.cnt          = 0x5
          reserved.size         = 0x3e80000000
          reserved.region[0x0].base       = 0x0
                            .size     = 0x7b5000
          reserved.region[0x1].base       = 0x2a00000
                            .size     = 0x78c000
          reserved.region[0x2].base       = 0x328c000
                            .size     = 0x43000
          reserved.region[0x3].base       = 0xf4e8000
                            .size     = 0xb18000
          reserved.region[0x4].base       = 0x4000000000
                            .size     = 0x2000000000
      
      The reserved.region[0x4] entry contains the 16G pages.  In
      arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the
      node numbers looking for the reserved regions that belong to the
      particular node.  The code cannot identify region 0x4 as being part
      of each of the 8 nodes; it assumes a reserved region sits on a
      single node.
      
      This patch moves the reserved-region loop out of the loop that goes
      over each node.  It looks up the active region containing the start
      of the reserved region.  If the reserved region extends past that
      active region, it adjusts the size and gets the next active region
      containing it.
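      
      As a sketch of that walk (node_end_pfn_of() is a stand-in for the
      active-region lookup):
      
      	void __init reserve_across_nodes(unsigned long start_pfn,
      					 unsigned long end_pfn)
      	{
      		while (start_pfn < end_pfn) {
      			/* Which node owns the start of what is left? */
      			int nid = early_pfn_to_nid(start_pfn);
      			unsigned long chunk_end = min(end_pfn,
      						      node_end_pfn_of(nid));
      
      			reserve_bootmem_node(NODE_DATA(nid),
      					     PFN_PHYS(start_pfn),
      					     PFN_PHYS(chunk_end - start_pfn),
      					     BOOTMEM_DEFAULT);
      			start_pfn = chunk_end;	/* spill into the next node */
      		}
      	}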
      Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 07 October 2008, 1 commit
    • powerpc: Avoid integer overflow in page_is_ram() · a880e762
      Committed by Roland Dreier
      Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns
      instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow
      for addresses above 4G on 32-bit kernels.  However, arch/powerpc's
      page_is_ram() is missing the same fix -- it computes a physical address
      by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page
      above 4G.
      
      In particular this causes pages above 4G to be mapped with the wrong
      caching attribute; for example many ppc440-based SoCs have PCI space
      above 4G, and mmap()ing MMIO space may end up with a mapping that has
      caching enabled.
      
      Fix this by working with the pfn and avoiding the conversion to a
      physical address that causes the overflow.  This patch compares the
      pfn to max_pfn, which is a semantic change from the old code -- that
      code compared the physical address to high_memory, which corresponds
      to max_low_pfn.  However, I think that was another bug, since
      highmem pages are still RAM.
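      
      The fixed 32-bit test then reduces to a pfn comparison, along these
      lines:
      
      	int page_is_ram(unsigned long pfn)
      	{
      		/* The old form, "(pfn << PAGE_SHIFT) < __pa(high_memory)",
      		 * overflows for pfns above 4G on 32-bit.  Comparing pfns
      		 * cannot overflow, and max_pfn (all RAM) rather than
      		 * max_low_pfn keeps highmem counted as RAM. */
      		return pfn < max_pfn;
      	}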
      Reported-by: vb <vb@vsbe.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  10. 25 September 2008, 1 commit
    • POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical · 4ee7084e
      Committed by Becky Bruce
      This rearranges a bit of code, and adds support for
      36-bit physical addressing for configs that use a
      hashed page table.  The 36b physical support is not
      enabled by default on any config - it must be
      explicitly enabled via the config system.
      
      This patch *only* expands the page table code to accommodate
      large physical addresses on 32-bit systems and enables the
      PHYS_64BIT config option for 86xx.  It does *not*
      allow you to boot a board with more than about 3.5GB of
      RAM - for that, SWIOTLB support is also required (and
      coming soon).
      Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  11. 16 September 2008, 5 commits
    • powerpc/85xx: fix build warning, remove silly cast · 82331ab1
      Committed by Becky Bruce
      This fixes a build warning when PHYS_64BIT is enabled, and removes an
      unnecessary cast to phys_addr_t (the variable being cast is already
      a phys_addr_t).
      Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
    • powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages · 0b26425c
      Committed by David Gibson
      There is a small bug in the handling of 16G hugepages recently added
      to the kernel.  This doesn't cause a crash or other user-visible
      problems, but it does mean that more levels of pagetable are allocated
      than makes sense for 16G pages.  The hugepage pagetables for the 16G
      pages are allocated much lower in the pagetable tree than they should
      be, with the intervening levels allocated with full pmd and pud pages
      which will only ever have one entry filled in.
      
      This corrects the problem, and at the same time cleans up the
      handling of the level at which 64k versus 16M hugepage pagetables
      are allocated.  The new way of formatting the tests should be more
      robust against changes in pagetable structure and any newly added
      hugepage sizes.
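      
      Illustratively, the more robust form of the test picks the allocation
      level from the hugepage shift (the enum and helper below are
      hypothetical):
      
      	enum hugepd_level { HPD_AT_PGD, HPD_AT_PUD, HPD_AT_PMD };
      
      	static enum hugepd_level hugepd_level_for_shift(unsigned int pshift)
      	{
      		if (pshift >= PGDIR_SHIFT)
      			return HPD_AT_PGD;	/* e.g. 16G pages */
      		if (pshift >= PUD_SHIFT)
      			return HPD_AT_PUD;
      		return HPD_AT_PMD;		/* e.g. 64k and 16M pages */
      	}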
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc: Rename PTE_SIZE to HPTE_SIZE · aaf4a9b0
      Committed by Becky Bruce
      It's the size of the hardware PTE; make that clear in the name.
      Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc: Make the 64-bit kernel as a position-independent executable · 549e8152
      Committed by Paul Mackerras
      This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
      a position-independent executable (PIE) when it is set.  This involves
      processing the dynamic relocations in the image in the early stages of
      booting, even if the kernel is being run at the address it is linked at,
      since the linker does not necessarily fill in words in the image for
      which there are dynamic relocations.  (In fact the linker does fill in
      such words for 64-bit executables, though not for 32-bit executables,
      so in principle we could avoid calling relocate() entirely when we're
      running a 64-bit kernel at the linked address.)
      
      The dynamic relocations are processed by a new function relocate(addr),
      where the addr parameter is the virtual address where the image will be
      run.  In fact we call it twice; once before calling prom_init, and again
      when starting the main kernel.  This means that reloc_offset() returns
      0 in prom_init (since it has been relocated to the address it is running
      at), which necessitated a few adjustments.
      
      This also changes __va and __pa to use an equivalent definition that is
      simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
      constants (for 64-bit) whereas PHYSICAL_START is a variable (and
      KERNELBASE ideally should be too, but isn't yet).
      
      With this, relocatable kernels still copy themselves down to physical
      address 0 and run there.
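      
      For the __va/__pa change, the constant-based definitions look roughly
      like this (a sketch consistent with the description, not the verbatim
      patch):
      
      	/* PAGE_OFFSET and MEMORY_START are compile-time constants on
      	 * 64-bit, so these work wherever the kernel ends up running. */
      	#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET - MEMORY_START))
      	#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)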
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels · cf00085d
      Committed by Chandru
      The kdump kernel needs to use only those memory regions that it is
      allowed to use (crashkernel, rtas, tce, etc.).  Each of these regions
      has its own size and is currently added under the
      'linux,usable-memory' property under each memory@xxx node of the
      device tree.
      
      The ibm,dynamic-memory property of the ibm,dynamic-reconfiguration-memory
      node (on POWER6) now stores the representation for most of the
      logical memory blocks, with the size of each memory block being a
      constant (lmb_size).  If one or more of the above-mentioned regions
      (or part of one) lies within one of the lmbs from the
      ibm,dynamic-memory property, those regions need to be identified
      within the given lmb.
      
      This makes the kernel recognize a new 'linux,drconf-usable-memory'
      property added by kexec-tools.  Each entry in this property consists
      of a count followed by that many (base, size) pairs for the above-
      mentioned regions.  The number of cells in the count value is given
      by the #size-cells property of the root node.
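      
      A sketch of walking one such entry (the read_n_cells()-style helpers
      and the *_cells variables are assumptions):
      
      	static void __init walk_drconf_usable(const unsigned int **usm)
      	{
      		/* Entry format: a count, then count x (base, size) pairs. */
      		unsigned long count = read_n_cells(n_root_size_cells, usm);
      
      		while (count--) {
      			u64 base = read_n_cells(n_mem_addr_cells, usm);
      			u64 size = read_n_cells(n_mem_size_cells, usm);
      
      			lmb_add(base, size);	/* only the usable piece */
      		}
      	}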
      Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  12. 03 September 2008, 1 commit
    • powerpc: Only make kernel text pages of linear mapping executable · 9e88ba4e
      Committed by Paul Mackerras
      Commit bc033b63 ("powerpc/mm: Fix
      attribute confusion with htab_bolt_mapping()") moved the check for
      whether we should make pages of the linear mapping executable from
      htab_bolt_mapping into its callers, including htab_initialize.
      A side-effect of this is that the decision is now made once for
      each contiguous section in the LMB array rather than for each page
      individually.  This can often mean that the whole of the linear
      mapping ends up being executable.
      
      This reverts to the previous behaviour, where individual pages are
      checked for being part of the kernel text or not, by moving the check
      back down into htab_bolt_mapping.
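      
      Inside htab_bolt_mapping(), the per-page decision looks roughly like
      this sketch (loop body abbreviated):
      
      	static void __init bolt_range(unsigned long vstart, unsigned long vend,
      				      unsigned long prot, unsigned long step)
      	{
      		unsigned long vaddr;
      
      		for (vaddr = vstart; vaddr < vend; vaddr += step) {
      			unsigned long tprot = prot;
      
      			/* Only pages overlapping kernel text may execute. */
      			if (overlaps_kernel_text(vaddr, vaddr + step))
      				tprot &= ~HPTE_R_N;	/* clear no-execute */
      
      			/* ...insert the bolted HPTE using tprot... */
      		}
      	}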
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  13. 20 August 2008, 1 commit
  14. 11 August 2008, 1 commit
    • powerpc/mm: Fix attribute confusion with htab_bolt_mapping() · bc033b63
      Committed by Benjamin Herrenschmidt
      The function htab_bolt_mapping() is used to create permanent
      mappings in the MMU hash table, for example, in order to create
      the linear mapping of vmemmap.  It's also used by early boot
      ioremap (before mem_init_done).
      
      However, the way ioremap uses it is incorrect as it passes it the
      protection flags in the "linux PTE" form while htab_bolt_mapping()
      expects them in the hash table format.  This is made more confusing by
      the fact that some of those flags are actually in the same position in
      both cases.
      
      This fixes it all by making htab_bolt_mapping() take normal linux
      protection flags instead, and use a little helper to convert them to
      htab flags. Callers can now use the usual PAGE_* definitions safely.
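      
      An abbreviated sketch of such a helper (the real conversion covers
      more bits than shown here):
      
      	static unsigned long htab_convert_pte_flags(unsigned long pteflags)
      	{
      		unsigned long rflags = 0;
      
      		/* Translate linux PTE bits into HPTE flag encoding. */
      		if (pteflags & _PAGE_NO_CACHE)
      			rflags |= HPTE_R_I;	/* cache-inhibited */
      		if (pteflags & _PAGE_GUARDED)
      			rflags |= HPTE_R_G;	/* guarded */
      		if (!(pteflags & _PAGE_EXEC))
      			rflags |= HPTE_R_N;	/* no-execute */
      
      		return rflags;
      	}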
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/include/asm/mmu-hash64.h |    2 -
       arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
       arch/powerpc/mm/init_64.c             |    9 +---
       3 files changed, 44 insertions(+), 32 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  15. 04 August 2008, 3 commits
  16. 30 July 2008, 1 commit
  17. 28 July 2008, 1 commit
  18. 27 July 2008, 2 commits
  19. 25 July 2008, 10 commits
  20. 10 July 2008, 2 commits