1. 01 7月, 2008 2 次提交
  2. 30 6月, 2008 3 次提交
  3. 18 6月, 2008 1 次提交
    • P
      [POWERPC] Clear sub-page HPTE present bits when demoting page size · 65ba6cdc
      Paul Mackerras 提交于
      When we demote a slice from 64k to 4k, and we are about to insert an
      HPTE for a 4k subpage and we notice that there is an existing 64k
      HPTE, we first invalidate that HPTE before inserting the new 4k
      subpage HPTE.  Since the bits that encode which hash bucket the old
      HPTE was in overlap with the bits that encode which of the 16 subpages
      have HPTEs, we need to clear out the subpage HPTE-present bits before
      starting to insert HPTEs for the 4k subpages.  If we don't do that, we
      can erroneously think that a subpage already has an HPTE when it
      doesn't.
      
      That in itself wouldn't be such a problem except that when we go to
      update the HPTE that we think is present on machines with a
      hypervisor, the hypervisor can tell us that the HPTE we think is there
      is actually there even though it isn't, which can lead to a process
      getting stuck in a loop, continually faulting.  The reason for the
      confusion is that the AVPN (abbreviated virtual page number) we are
      looking for in the HPTE for a 4k subpage can actually match the AVPN
      in a stale HPTE for another 64k page.  For example, the HPTE for
      the 4k subpage at 0x84000f000 will be in the same hash bucket and have
      the same AVPN as the HPTE for the 64k page at 0x8400f0000.
      
      This fixes the code to clear out the subpage HPTE-present bits.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      65ba6cdc
  4. 09 6月, 2008 1 次提交
  5. 23 5月, 2008 1 次提交
  6. 20 5月, 2008 1 次提交
  7. 15 5月, 2008 1 次提交
    • B
      [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
      Benjamin Herrenschmidt 提交于
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pseries, or PS3's.
      
      In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
      tends to prevent hotplugging the HV's "additional" memory, thus limiting
      the available memory even more, from my experience down to something
      like 80M total, which makes it really not very useable.
      
      The logic used by my match to choose the vmemmap page size is:
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      one point, as I simplify the SLB miss handler, but that will be for a
      later patch.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cec08e7a
  8. 14 5月, 2008 4 次提交
  9. 02 5月, 2008 2 次提交
    • P
      [POWERPC] Bolt in SLB entry for kernel stack on secondary cpus · 3b575064
      Paul Mackerras 提交于
      This fixes a regression reported by Kamalesh Bulabel where a POWER4
      machine would crash because of an SLB miss at a point where the SLB
      miss exception was unrecoverable.  This regression is tracked at:
      
      http://bugzilla.kernel.org/show_bug.cgi?id=10082
      
      SLB misses at such points shouldn't happen because the kernel stack is
      the only memory accessed other than things in the first segment of the
      linear mapping (which is mapped at all times by entry 0 of the SLB).
      The context switch code ensures that SLB entry 2 covers the kernel
      stack, if it is not already covered by entry 0.  None of entries 0
      to 2 are ever replaced by the SLB miss handler.
      
      Where this went wrong is that the context switch code assumes it
      doesn't have to write to SLB entry 2 if the new kernel stack is in the
      same segment as the old kernel stack, since entry 2 should already be
      correct.  However, when we start up a secondary cpu, it calls
      slb_initialize, which doesn't set up entry 2.  This is correct for
      the boot cpu, where we will be using a stack in the kernel BSS at this
      point (i.e. init_thread_union), but not necessarily for secondary
      cpus, whose initial stack can be allocated anywhere.  This doesn't
      cause any immediate problem since the SLB miss handler will just
      create an SLB entry somewhere else to cover the initial stack.
      
      In fact it's possible for the cpu to go quite a long time without SLB
      entry 2 being valid.  Eventually, though, the entry created by the SLB
      miss handler will get overwritten by some other entry, and if the next
      access to the stack is at an unrecoverable point, we get the crash.
      
      This fixes the problem by making slb_initialize create a suitable
      entry for the kernel stack, if we are on a secondary cpu and the stack
      isn't covered by SLB entry 0.  This requires initializing the
      get_paca()->kstack field earlier, so I do that in smp_create_idle
      where the current field is initialized.  This also abstracts a bit of
      the computation that mk_esid_data in slb.c does so that it can be used
      in slb_initialize.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3b575064
    • G
      [POWERPC] Fix slb.c compile warnings · bbea3460
      Geoff Levand 提交于
      Arrange for a syntax check to always be done on the powerpc/mm/slb.c
      DBG() macro by defining it to pr_debug() for non-debug builds.
      
      Also, fix these related compile warnings:
      
        slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int
        slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
      Signed-off-by: NGeoff Levand <geoffrey.levand@am.sony.com>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bbea3460
  10. 29 4月, 2008 1 次提交
  11. 28 4月, 2008 1 次提交
  12. 24 4月, 2008 4 次提交
    • K
      [POWERPC] Clean up access to thread_info in assembly · f608600e
      Kumar Gala 提交于
      Use (31-THREAD_SHIFT) to get to thread_info from stack pointer.  This makes
      the code a bit easier to read and more robust if we ever change THREAD_SHIFT.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      f608600e
    • K
      [POWERPC] Port fixmap from x86 and use for kmap_atomic · 2c419bde
      Kumar Gala 提交于
      The fixmap code from x86 allows us to have compile time virtual addresses
      that we change the physical addresses of at run time.
      
      This is useful for applications like kmap_atomic, PCI config that is done
      via direct memory map, kexec/kdump.
      
      We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal
      location for PKMAP_BASE based on where the fixmap addresses start and
      working back from there.
      
      Additionally, the kmap code in asm-powerpc/highmem.h always had debug
      enabled.  Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should
      have the extra debug checking.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      2c419bde
    • K
      [POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero) · 37dd2bad
      Kumar Gala 提交于
      Added support to allow an 85xx kernel to be run from a non-zero physical
      address (useful for cooperative asymmetric multiprocessing situations and
      kdump).  The support can be configured at compile time by setting
      CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
      desired.
      
      Alternatively, the kernel build can set CONFIG_RELOCATABLE.  Setting this
      config option causes the kernel to determine at runtime the physical
      addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START.  If
      CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
      However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
      header physical address field in the resulting ELF image.
      
      Currently we are limited to running at a physical address that is a
      multiple of 256M.  This is due to how we map TLBs to cover
      lowmem.  This should be fixed to allow 64M or maybe even 16M alignment
      in the future.  It is considered an error to try and run a kernel at a
      non-aligned physical address.
      
      All the magic for this support is accomplished by proper initialization
      of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
      
      The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
      ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
      
      /dev/mem continues to allow access to any physical address in the system
      regardless of how CONFIG_PHYSICAL_START is set.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      37dd2bad
    • M
      [POWERPC] Add include of linux/of.h to numa.c · 6df1646e
      Michael Ellerman 提交于
      numa.c requires routines declared in linux/of.h, so should include it.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      6df1646e
  13. 18 4月, 2008 1 次提交
  14. 17 4月, 2008 5 次提交
  15. 07 4月, 2008 1 次提交
  16. 03 4月, 2008 1 次提交
  17. 01 4月, 2008 4 次提交
  18. 24 3月, 2008 1 次提交
    • P
      [POWERPC] Don't use 64k pages for ioremap on pSeries · cfe666b1
      Paul Mackerras 提交于
      On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
      adapter using 64k pages, and thus the ehea driver will fail if 64k
      pages are configured.  This works around the problem by always
      using 4k pages for ioremap on pSeries (but not on other platforms).
      A better fix would be to check whether the partition could ever
      have an eHEA adapter, and only force 4k pages if it could, but this
      will do for 2.6.25.
      
      This is based on an earlier patch by Tony Breeds.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cfe666b1
  19. 20 3月, 2008 1 次提交
  20. 13 3月, 2008 1 次提交
    • M
      [POWERPC] Fix large hash table allocation on Cell blades · 31bf1119
      Michael Ellerman 提交于
      My recent hack to allocate the hash table under 1GB on cell was poorly
      tested, *cough*. It turns out on blades with large amounts of memory we
      fail to allocate the hash table at all. This is because RTAS has been
      instantiated just below 768MB, and 0-x MB are used by the kernel,
      leaving no areas that are both large enough and also naturally-aligned.
      
      For the cell IOMMU hack the page tables must be under 2GB, so use that
      as the limit instead. This has been tested on real hardware and boots
      happily.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      31bf1119
  21. 26 2月, 2008 1 次提交
  22. 14 2月, 2008 1 次提交
  23. 09 2月, 2008 1 次提交
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd