1. 04 May 2014, 3 commits
    • sparc64: Fix hex values in comment above pte_modify(). · c2e4e676
      David S. Miller authored
      When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the
      comment was not updated.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Fix bugs in get_user_pages_fast() wrt. THP. · 04df419d
      David S. Miller authored
      The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to
      decide if it needs to bail and return 0.
      
      pmd_large() should therefore just check _PAGE_PMD_HUGE.
      
      Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we
      just need to add a valid bit check.
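
      As a rough, self-contained sketch of the check described above (the bit
      positions and helper names below are illustrative stand-ins, not the
      kernel's actual definitions):

      #include <stdint.h>
      #include <stdio.h>

      #define _PAGE_VALID     (1ULL << 63)   /* illustrative position */
      #define _PAGE_PMD_HUGE  (1ULL << 8)    /* illustrative position */

      /* Per the commit, testing _PAGE_PMD_HUGE alone decides "large". */
      static int pmd_large(uint64_t pmd)
      {
              return (pmd & _PAGE_PMD_HUGE) != 0;
      }

      /* The guarded huge-PMD path bails with 0 when _PAGE_VALID is clear;
       * checking _PAGE_PRESENT here was the bug being fixed. */
      static int gup_huge_pmd_ok(uint64_t pmd)
      {
              return (pmd & _PAGE_VALID) != 0;
      }

      int main(void)
      {
              uint64_t invalid_huge = _PAGE_PMD_HUGE;
              uint64_t valid_huge   = _PAGE_PMD_HUGE | _PAGE_VALID;

              printf("invalid huge pmd: large=%d gup_ok=%d\n",
                     pmd_large(invalid_huge), gup_huge_pmd_ok(invalid_huge));
              printf("valid huge pmd:   large=%d gup_ok=%d\n",
                     pmd_large(valid_huge), gup_huge_pmd_ok(valid_huge));
              return 0;
      }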
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Fix huge PMD invalidation. · 51e5ef1b
      David S. Miller authored
      On sparc64 "present" and "valid" are seperate PTE bits, this allows us to
      naturally distinguish between the user explicitly asking for PROT_NONE
      with mprotect() and other situations.
      
      However we weren't handling this properly in the huge PMD paths.
      
      First of all, the page table walker in the TSB miss path only checks
      for _PAGE_PMD_HUGE.  So the generic pmdp_invalidate() would clear
      _PAGE_PRESENT but the TLB miss paths would still load it into the TLB
      as a valid huge PMD.
      
      Fix this by clearing the valid bit in pmdp_invalidate(), and also
      checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez"
      since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts.
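
      Since _PAGE_VALID sits in the sign bit, the "brgez" (branch if register
      greater than or equal to zero) test is just a signed comparison; a small
      hedged C model of that check:

      #include <stdint.h>
      #include <stdio.h>

      #define _PAGE_VALID (1ULL << 63)   /* bit 63 in both sun4u and sun4v layouts */

      /* A signed ">= 0" test fires exactly when the valid bit is clear,
       * which is what the TSB miss path wants to catch after pmdp_invalidate()
       * has cleared _PAGE_VALID. */
      static int pmd_rejected(uint64_t pmd)
      {
              return (int64_t)pmd >= 0;
      }

      int main(void)
      {
              uint64_t live = _PAGE_VALID | 0x1234;
              uint64_t invalidated = live & ~_PAGE_VALID;

              printf("live pmd rejected: %d\n", pmd_rejected(live));
              printf("invalidated pmd rejected: %d\n", pmd_rejected(invalidated));
              return 0;
      }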
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Dec 2013, 1 commit
    • mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel authored
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time; however, compaction or the NUMA migration
      code may come in and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory
      may still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
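
      A minimal sketch of the shape of that fix (the types, flag values and the
      pending-flush flag are simplified stand-ins for the real mm_struct
      plumbing):

      #include <stdbool.h>
      #include <stdint.h>

      #define _PAGE_PRESENT   (1u << 0)   /* illustrative */
      #define _PAGE_PROTNONE  (1u << 8)   /* covers PROT_NONE/NUMA in this sketch */

      struct mm_sketch {
              int tlb_flush_pending;      /* set before clearing PTEs, cleared after the flush */
      };

      /* A non-present PROT_NONE/NUMA PTE must still count as accessible while
       * a TLB flush for this mm is pending, because other CPUs may keep
       * writing through stale TLB entries until the flush completes. */
      static bool pte_accessible_sketch(const struct mm_sketch *mm, uint32_t pte)
      {
              if (pte & _PAGE_PRESENT)
                      return true;
              if ((pte & _PAGE_PROTNONE) && mm->tlb_flush_pending)
                      return true;
              return false;
      }

      int main(void)
      {
              struct mm_sketch mm = { .tlb_flush_pending = 1 };
              return pte_accessible_sketch(&mm, _PAGE_PROTNONE) ? 0 : 1;
      }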
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 14 Nov 2013, 1 commit
    • sparc64: Encode huge PMDs using PTE encoding. · a7b9403f
      David S. Miller authored
      Now that we have 64-bits for PMDs we can stop using special encodings
      for the huge PMD values, and just put real PTEs in there.
      
      We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
      huge ones.  It is the same for both 4U and 4V PTE layouts.
      
      We also use _PAGE_SPECIAL to indicate the splitting state, since a
      huge PMD cannot also be special.
      
      All of the PMD --> PTE translation code disappears, and most of the
      huge PMD bit modifications and tests just degenerate into the PTE
      operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
      trivial.
      
      As a side effect, normal PMDs don't shift the physical address around.
      This also speeds up the page table walks in the TLB miss paths since
      they don't have to do the shifts any more.
      
      Another non-trivial aspect is that pte_modify() has to be changed
      to preserve the _PAGE_PMD_HUGE bits as well as the page size field
      of the pte.
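
      A hedged sketch of what that preservation looks like (the mask names and
      bit positions are invented for the illustration; only the requirement
      that _PAGE_PMD_HUGE, _PAGE_SPECIAL and the page size field survive comes
      from the commit messages in this log):

      #include <stdint.h>

      #define _PAGE_PADDR     0x0000fffffffff000ULL  /* invented span: physical address */
      #define _PAGE_SZ_FIELD  0x0000000000000007ULL  /* invented span: page size field */
      #define _PAGE_PMD_HUGE  (1ULL << 8)            /* invented position */
      #define _PAGE_SPECIAL   (1ULL << 9)            /* invented position */

      /* pte_modify() swaps the protection bits but must keep everything that
       * identifies what is mapped and at which size, so a huge PMD stays huge
       * across an mprotect()-style protection change. */
      static uint64_t pte_modify_sketch(uint64_t pte, uint64_t newprot)
      {
              const uint64_t keep = _PAGE_PADDR | _PAGE_SZ_FIELD |
                                    _PAGE_PMD_HUGE | _PAGE_SPECIAL;

              return (pte & keep) | (newprot & ~keep);
      }

      int main(void)
      {
              uint64_t huge = _PAGE_PMD_HUGE | 0x2000;

              return (pte_modify_sketch(huge, 0) & _PAGE_PMD_HUGE) ? 0 : 1;
      }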
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 13 Nov 2013, 2 commits
    • sparc64: Move to 64-bit PGDs and PMDs. · 2b77933c
      David S. Miller authored
      To make the page tables compact, we were using 32-bit PGDs and PMDs.
      We only had to support <= 43 bits of physical addresses so this was
      quite feasible.
      
      In order to support larger physical addresses we have to move to
      64-bit PGDs and PMDs.
      
      Most of the changes are straight-forward:
      
      1) {pgd,pmd}_t --> unsigned long
      
      2) Anything that tries to use plain "unsigned int" types with pgd/pmd
         values needs to be adjusted.  In particular things like "0U" become
         "0UL".
      
      3) {PGDIR,PMD}_BITS decrease by one.
      
      4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
         and adjust the low bit masks to clear out the low 3 bits instead of
         just the low 2 bits during pgd/pmd address formation.
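
      A toy model of the address formation change in item 4 (the field widths
      and shifts are invented; the point is only the 8-byte scaling and the
      low 3-bit mask that replaces the old low 2-bit mask):

      #include <stdint.h>
      #include <stdio.h>

      #define PMD_SHIFT_SKETCH  23
      #define PMD_INDEX_MASK    0x3ffULL

      /* With 64-bit entries the walker loads 8 bytes ("ldxa" instead of
       * "lduwa"), scales the index by 8 (shift left by 3) and keeps the low
       * 3 bits of the resulting entry address clear. */
      static uint64_t pmd_entry_addr(uint64_t pmd_table, uint64_t vaddr)
      {
              uint64_t index = (vaddr >> PMD_SHIFT_SKETCH) & PMD_INDEX_MASK;

              return (pmd_table & ~7ULL) + (index << 3);
      }

      int main(void)
      {
              printf("entry at %#llx\n",
                     (unsigned long long)pmd_entry_addr(0x40000000ULL, 0x12345678ULL));
              return 0;
      }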
      
      Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
      swapper_{pg_dir,low_pmd_dir} arrays.
      
      This patch does not try to take advantage of having 64-bits in the
      PMDs to simplify the hugepage code, that will come in a subsequent
      change.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      David S. Miller authored
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we take fewer
      spinlocks in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTE's physical address field.
      That takes care of the transparent huge pages case.
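
      A hedged model of that bit-22 trick (the PTE layout here is invented;
      only the use of bit 22 to select the upper or lower 4MB half comes from
      the text above):

      #include <stdint.h>
      #include <stdio.h>

      #define REAL_HPAGE_SIZE  (4UL * 1024 * 1024)   /* 4MB hardware page */
      #define HPAGE_HALF_BIT   (1ULL << 22)          /* selects which 4MB half */

      /* The stored huge PTE describes the low 4MB half of the 8MB page; on a
       * TLB miss, bit 22 of the faulting virtual address is copied into the
       * physical address field so the upper half maps 4MB further on. */
      static uint64_t tlb_load_value(uint64_t huge_pte, uint64_t miss_vaddr)
      {
              return huge_pte | (miss_vaddr & HPAGE_HALF_BIT);
      }

      int main(void)
      {
              uint64_t pte = 0x80000000ULL;   /* pretend physical base of the 8MB page */
              uint64_t lo = tlb_load_value(pte, 0x10000000ULL);
              uint64_t hi = tlb_load_value(pte, 0x10000000ULL + REAL_HPAGE_SIZE);

              printf("lower half: %#llx\n", (unsigned long long)lo);
              printf("upper half: %#llx (4MB further)\n", (unsigned long long)hi);
              return 0;
      }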
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 29 Jun 2013, 1 commit
  6. 20 Jun 2013, 1 commit
  7. 20 Apr 2013, 1 commit
    • sparc64: Fix race in TLB batch processing. · f36391d2
      David S. Miller authored
      As reported by Dave Kleikamp, when we emit cross calls to do batched
      TLB flush processing we have a race because we do not synchronize on
      the sibling cpus completing the cross call.
      
      So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
      and either flushes are missed or flushes will flush the wrong
      addresses.
      
      Fix this by using generic infrastructure to synchronize on the
      completion of the cross call.
      
      This first required getting the flush_tlb_pending() call out from
      switch_to() which operates with locks held and interrupts disabled.
      The problem is that smp_call_function_many() cannot be invoked with
      IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
      
      We get the batch processing outside of locked IRQ disabled sections by
      using some ideas from the powerpc port. Namely, we only batch inside
      of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
      region, we flush TLBs synchronously.
      
      1) Get rid of xcall_flush_tlb_pending and per-cpu type
         implementations.
      
      2) Do TLB batch cross calls instead via:
      
      	smp_call_function_many()
      		tlb_pending_func()
      			__flush_tlb_pending()
      
      3) Batch only in lazy mmu sequences:
      
      	a) Add 'active' member to struct tlb_batch
      	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
      	c) Set 'active' in arch_enter_lazy_mmu_mode()
      	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
      	e) Check 'active' in tlb_batch_add_one() and do a synchronous
                 flush if it's clear.
      
      4) Add infrastructure for synchronous TLB page flushes.
      
      	a) Implement __flush_tlb_page and per-cpu variants, patch
      	   as needed.
      	b) Likewise for xcall_flush_tlb_page.
      	c) Implement smp_flush_tlb_page() to invoke the cross-call.
      	d) Wire up global_flush_tlb_page() to the right routine based
                 upon CONFIG_SMP
      
      5) It turns out that singleton batches are very common, 2 out of every
         3 batch flushes have only a single entry in them.
      
         The batch flush waiting is very expensive, both because of the poll
         on sibling cpu completion, as well as because passing the tlb batch
         pointer to the sibling cpus invokes a shared memory dereference.
      
         Therefore, in flush_tlb_pending(), if there is only one entry in
         the batch perform a completely asynchronous global_flush_tlb_page()
         instead.
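
      A toy, single-threaded model of points 3) and 5) above (the real code
      keeps one struct tlb_batch per cpu and the flushes are cross calls,
      neither of which is modelled here):

      #include <stdbool.h>
      #include <stdio.h>

      #define TLB_BATCH_NR 192   /* illustrative capacity */

      struct tlb_batch {
              bool active;                        /* inside lazy mmu mode? */
              unsigned long nr;
              unsigned long vaddrs[TLB_BATCH_NR];
      };

      static struct tlb_batch batch;              /* stand-in for the per-cpu instance */

      static void flush_one_page(unsigned long vaddr)
      {
              printf("flush page %#lx\n", vaddr);
      }

      static void flush_tlb_pending(void)
      {
              if (batch.nr == 1)
                      /* Singleton batches are the common case: flush just the
                       * one page (the real code fires an asynchronous
                       * global_flush_tlb_page() instead of a waited-on batch). */
                      flush_one_page(batch.vaddrs[0]);
              else if (batch.nr > 1)
                      printf("cross-call flush of %lu pages\n", batch.nr);
              batch.nr = 0;
      }

      static void arch_enter_lazy_mmu_mode(void)
      {
              batch.active = true;
      }

      static void arch_leave_lazy_mmu_mode(void)
      {
              flush_tlb_pending();
              batch.active = false;
      }

      static void tlb_batch_add_one(unsigned long vaddr)
      {
              if (!batch.active) {
                      flush_one_page(vaddr);      /* not batching: flush synchronously */
                      return;
              }
              batch.vaddrs[batch.nr++] = vaddr;
              if (batch.nr == TLB_BATCH_NR)
                      flush_tlb_pending();
      }

      int main(void)
      {
              tlb_batch_add_one(0x1000);          /* outside lazy mode: immediate flush */

              arch_enter_lazy_mmu_mode();
              tlb_batch_add_one(0x2000);
              tlb_batch_add_one(0x3000);
              arch_leave_lazy_mmu_mode();         /* runs the queued batch */
              return 0;
      }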
      Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
  8. 14 Feb 2013, 1 commit
  9. 19 Dec 2012, 1 commit
  10. 09 Oct 2012, 4 commits
    • sparc64: Support transparent huge pages. · 9e695d2e
      David Miller authored
      This is relatively easy since PMDs now cover exactly 4MB of memory.
      
      Our PMD entries are 32-bits each, so we use a special encoding.  The
      lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
      because sparc64's page tables are purely software entities so we can use
      whatever encoding scheme we want.  We just have to make the TLB miss
      assembler page table walkers aware of the layout.
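
      A minimal sketch of that kind of encoding (only the low PMD_ISHUGE bit
      is named by the commit; everything else here is invented):

      #include <stdint.h>
      #include <stdio.h>

      #define PMD_ISHUGE  0x00000001U   /* lowest bit selects the interpretation */

      static int pmd_is_huge(uint32_t pmd)
      {
              return pmd & PMD_ISHUGE;
      }

      int main(void)
      {
              uint32_t table_pmd = 0x00400000U;              /* refers to a PTE table */
              uint32_t huge_pmd  = 0x00800000U | PMD_ISHUGE; /* encodes a 4MB huge mapping */

              printf("table pmd huge? %d\n", pmd_is_huge(table_pmd));
              printf("huge pmd huge?  %d\n", pmd_is_huge(huge_pmd));
              return 0;
      }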
      
      set_pmd_at() works much like set_pte_at() but it has to operate in two
      regimes.  In the first we are going from non-huge-page to huge-page,
      replacing a PMD that pointed to a table of non-huge PTEs, so we have to
      queue up TLB flushes based upon what mappings are valid in the PTE table.
      In the second regime we are going from huge-page to non-huge-page, and in
      that case we need only queue up a single TLB flush to push out the huge
      page mapping.
      
      We still have 5 bits remaining in the huge PMD encoding so we can very
      likely support any new pieces of THP state tracking that might get added
      in the future.
      
      With lots of help from Johannes Weiner.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc64: Document PGD and PMD layout. · dbc9fdf0
      David Miller authored
      We're going to be messing around with the PMD interpretation and layout
      for the sake of transparent huge pages, so we better clearly document what
      we're starting with.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc64: Halve the size of PTE tables · 56a70b8c
      David Miller authored
      The reason we want to do this is to facilitate transparent huge page
      support.
      
      Right now PMDs cover 8MB of address space, and our huge page size is 4MB.
      The current transparent hugepage support is not able to handle
      HPAGE_SIZE != PMD_SIZE.
      
      So make PTE tables be sized to half of a page instead of a full page.
      
      We can still properly map the whole supported virtual address range, which
      on sparc64 requires 44 bits.  Add a compile-time CPP test which ensures
      that this requirement is always met.
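
      The kind of compile-time CPP test being described might look roughly like
      this (the macro names and the exact arithmetic are illustrative, not the
      kernel's):

      #define PAGE_SHIFT_SKETCH  13   /* 8KB base pages */
      #define PTE_BITS_SKETCH    10   /* half-page PTE tables in this sketch */
      #define PMD_BITS_SKETCH    10
      #define PGDIR_BITS_SKETCH  11

      #if (PAGE_SHIFT_SKETCH + PTE_BITS_SKETCH + PMD_BITS_SKETCH + PGDIR_BITS_SKETCH) < 44
      #error "Page table layout no longer covers the required 44-bit address space"
      #endif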
      
      There is a minor inefficiency added by this change.  We only use half of
      the page for PTE tables.  It's not trivial to use only half of the page
      yet still get all of the pgtable_page_{ctor,dtor}() stuff working
      properly.  It is doable, and that will come in a subsequent change.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc64: Only support 4MB huge pages and 8KB base pages. · 15b9350a
      David Miller authored
      Narrowing the scope of the page size configurations will make the
      transparent hugepage changes much simpler.
      
      In the end what we really want to do is have the kernel support multiple
      huge page sizes and use whatever is appropriate as the context dictates.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 14 May 2012, 1 commit
    • sparc: Kill mmu_{un,}lockarea(). · 679bea5e
      David S. Miller authored
      These were used on sun4c during floppy data transfers, since on that
      chip we had to lock the cpu mappings into the TLB because we could not
      take a TLB miss during the assembler floppy interrupt handler that
      does the data transfer.
      
      That is no longer necessary since we've removed sun4c support, thus
      this stuff can disappear completely.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 02 Apr 2012, 1 commit
    • sparc: pgtable_64: change include order · 2533e824
      Aaro Koskinen authored
      Fix the following build breakage in v3.4-rc1:
      
        CC      arch/sparc/kernel/cpu.o
      In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:15:0,
                       from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4,
                       from arch/sparc/kernel/cpu.c:15:
      include/asm-generic/pgtable-nopud.h:13:16: error: unknown type name 'pgd_t'
      include/asm-generic/pgtable-nopud.h:25:28: error: unknown type name 'pgd_t'
      include/asm-generic/pgtable-nopud.h:26:27: error: unknown type name 'pgd_t'
      include/asm-generic/pgtable-nopud.h:27:31: error: unknown type name 'pgd_t'
      include/asm-generic/pgtable-nopud.h:28:30: error: unknown type name 'pgd_t'
      include/asm-generic/pgtable-nopud.h:38:34: error: unknown type name 'pgd_t'
      In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:783:0,
                       from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4,
                       from arch/sparc/kernel/cpu.c:15:
      include/asm-generic/pgtable.h: In function 'pgd_none_or_clear_bad':
      include/asm-generic/pgtable.h:258:2: error: implicit declaration of function 'pgd_none' [-Werror=implicit-function-declaration]
      include/asm-generic/pgtable.h:260:2: error: implicit declaration of function 'pgd_bad' [-Werror=implicit-function-declaration]
      include/asm-generic/pgtable.h: In function 'pud_none_or_clear_bad':
      include/asm-generic/pgtable.h:269:6: error: request for member 'pgd' in something not a structure or union
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 29 Mar 2012, 1 commit
  14. 18 Nov 2011, 1 commit
  15. 26 Jul 2011, 1 commit
  16. 25 May 2011, 1 commit
    • sparc: mmu_gather rework · 90f08e39
      Peter Zijlstra authored
      Rework the sparc mmu_gather usage to conform to the new world order :-)
      
      Sparc mmu_gather does two things:
       - tracks vaddrs to unhash
       - tracks pages to free
      
      Split these two things like powerpc has done and keep the vaddrs
      in per-cpu data structures and flush them on context switch.
      
      The remaining bits can then use the generic mmu_gather.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 22 Apr 2011, 1 commit
  18. 27 Oct 2010, 1 commit
  19. 21 Feb 2010, 1 commit
    • MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself · 4b3073e1
      Russell King authored
      On VIVT ARM, when we have multiple shared mappings of the same file
      in the same MM, we need to ensure that we have coherency across all
      copies.  We do this via make_coherent() by making the pages
      uncacheable.
      
      This used to work fine, until we allowed highmem with highpte - we
      now have a page table which is mapped as required, and is not available
      for modification via update_mmu_cache().
      
      Ralf Baechle suggested getting rid of the PTE value passed to
      update_mmu_cache():
      
        On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
        to construct a pointer to the pte again.  Passing a pte_t * is much
        more elegant.  Maybe we might even replace the pte argument with the
        pte_t?
      
      Ben Herrenschmidt would also like the pte pointer for PowerPC:
      
        Passing the ptep in there is exactly what I want.  I want that
        -instead- of the PTE value, because I have issue on some ppc cases,
        for I$/D$ coherency, where set_pte_at() may decide to mask out the
        _PAGE_EXEC.
      
      So, pass in the mapped page table pointer into update_mmu_cache(), and
      remove the PTE value, updating all implementations and call sites to
      suit.
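
      In rough terms the interface change looks like this (the struct and
      typedef below are reduced to stand-ins so the prototypes stay
      self-contained):

      struct vm_area_struct;
      typedef unsigned long pte_t;   /* stand-in; the real pte_t is per-arch */

      /* Before: implementations received the PTE by value, which may already
       * be stale relative to what set_pte_at() actually wrote. */
      void update_mmu_cache_old(struct vm_area_struct *vma,
                                unsigned long address, pte_t pte);

      /* After: they receive a pointer to the mapped PTE, so PowerPC can see
       * what set_pte_at() really stored (e.g. with _PAGE_EXEC masked out) and
       * VIVT ARM can reach the entry even when the table lives in highmem. */
      void update_mmu_cache(struct vm_area_struct *vma,
                            unsigned long address, pte_t *ptep);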
      
      Includes a fix from Stephen Rothwell:
      
        sparc: fix fallout from update_mmu_cache API change
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  20. 29 Sep 2009, 1 commit
  21. 26 Aug 2009, 1 commit
    • sparc64: Validate linear D-TLB misses. · d8ed1d43
      David S. Miller authored
      When page alloc debugging is not enabled, we essentially accept any
      virtual address for linear kernel TLB misses.  But with kgdb, kernel
      address probing, and other facilities we can try to access arbitrary
      crap.
      
      So, make sure the address we miss on will translate to physical memory
      that actually exists.
      
      In order to make this work we have to embed the valid address bitmap
      into the kernel image.  And in order to make that less expensive we
      make an adjustment, in that the max physical memory address is
      decreased to "1 << 41", even on the chips that support a 42-bit
      physical address space.  We can do this because bit 41 indicates
      "I/O space" and thus covers non-memory ranges.
      
      The result of this is that:
      
      1) kpte_linear_bitmap shrinks from 2K to 1K in size
      
      2) we need 64K more for the valid address bitmap
      
      We can't let the valid address bitmap be dynamically allocated
      once we start using it to validate TLB misses, otherwise we have
      crazy issues to deal with wrt. recursive TLB misses and such.
      
      If we're in a TLB miss it could be the deepest trap level that's legal
      inside of the cpu.  So if we TLB miss referencing the bitmap, the cpu
      will be out of trap levels and enter RED state.
      
      To guard against out-of-range accesses to the bitmap, we have to check
      to make sure no bits in the physical address above bit 40 are set.  We
      could export and use last_valid_pfn for this check, but that's just an
      unnecessary extra memory reference.
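
      A hedged sketch of that range guard (the constant name is invented; the
      1 << 41 limit and the "no bits above bit 40" rule come from the text
      above):

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define MAX_PHYS_BITS_SKETCH 41   /* bitmap covers physical addresses below 1 << 41 */

      static bool paddr_in_bitmap_range(uint64_t paddr)
      {
              /* Reject any address with bits above bit 40 set before the
               * valid-address bitmap is ever dereferenced. */
              return (paddr >> MAX_PHYS_BITS_SKETCH) == 0;
      }

      int main(void)
      {
              printf("%d\n", paddr_in_bitmap_range(0x000001ffffffffffULL)); /* 1 */
              printf("%d\n", paddr_in_bitmap_range(0x0000020000000000ULL)); /* 0 */
              return 0;
      }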
      
      On the plus side of all this, since we load all of these translations
      into the special 4MB mapping TSB, and we check the TSB first for TLB
      misses, there should be absolutely no real cost for these new checks
      in the TLB miss path.
      
      Reported-by: heyongli@gmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 12 Sep 2008, 1 commit
  23. 28 Jul 2008, 1 commit
    • sparc, sparc64: use arch/sparc/include · a439fe51
      Sam Ravnborg authored
      The majority of this patch was created by the following script:
      
      ***
      ASM=arch/sparc/include/asm
      mkdir -p $ASM
      git mv include/asm-sparc64/ftrace.h $ASM
      git rm include/asm-sparc64/*
      git mv include/asm-sparc/* $ASM
      sed -ie 's/asm-sparc64/asm/g' $ASM/*
      sed -ie 's/asm-sparc/asm/g' $ASM/*
      ***
      
      The rest was an update of the top-level Makefile to use sparc
      for header files when sparc64 is being built.
      And a small fixlet to pick up the correct unistd.h from
      sparc64 code.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  24. 18 Jul 2008, 2 commits
    • sparc64: Remove 4MB and 512K base page size options. · f7fe9334
      David S. Miller authored
      Adrian Bunk reported that enabling 4MB page size breaks the build.
      The problem is that MAX_ORDER combined with the page shift exceeds the
      SECTION_SIZE_BITS we use in asm-sparc64/sparsemem.h
      
      There are several ways I suppose we could work around this.  For one
      we could define a CONFIG_FORCE_MAX_ZONEORDER to decrease MAX_ORDER in
      these higher page size cases.
      
      But I also know that these page size cases are broken wrt. TLB miss
      handling especially on pre-hypervisor systems, and there isn't an easy
      way to fix that.
      
      These options were meant to be fun experimental hacks anyways, and
      only 8K and 64K make any sense to support.
      
      So remove 512K and 4M base page size support.  Of course, we still
      support these page sizes for huge pages.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc: join the remaining header files · f5e706ad
      Sam Ravnborg authored
      With this commit all sparc64 header files are moved to asm-sparc.
      The remaining files (71 files) were too different to be trivially
      merged, so they were split into a _32.h and a _64.h file, which
      are both included from the file with no bit-size suffix.
      
      The following script were used:
      cd include
      FILES=`wc -l asm-sparc64/*h | grep -v '^     1' | cut -b 20-`
      
      for FILE in ${FILES}; do
        echo $FILE:
        BASE=`echo $FILE | cut -d '.' -f 1`
        FN32=${BASE}_32.h
        FN64=${BASE}_64.h
        GUARD=___ASM_SPARC_`echo $BASE | tr '-' '_' | tr [:lower:] [:upper:]`_H
        git mv asm-sparc/$FILE asm-sparc/$FN32
        git mv asm-sparc64/$FILE asm-sparc/$FN64
        echo git mv done
        printf "#ifndef %s\n" $GUARD                             >   asm-sparc/$FILE
        printf "#define %s\n" $GUARD                             >>  asm-sparc/$FILE
        printf "#if defined(__sparc__) && defined(__arch64__)\n" >>  asm-sparc/$FILE
        printf "#include <asm-sparc/%s>\n" $FN64                 >>  asm-sparc/$FILE
        printf "#else\n"                                         >>  asm-sparc/$FILE
        printf "#include <asm-sparc/%s>\n" $FN32                 >>  asm-sparc/$FILE
        printf "#endif\n"                                        >>  asm-sparc/$FILE
        printf "#endif\n"                                        >>  asm-sparc/$FILE
        git add asm-sparc/$FILE
        echo new file done
        printf "#include <asm-sparc/%s>\n" $FILE                 >  asm-sparc64/$FILE
        git add asm-sparc64/$FILE
        echo sparc64 file done
      done
      
      The guard contains three '_' to avoid conflict with existing guards.
      In addition the two Kbuild files are emptied to avoid breaking
      headers_* targets.
      We will reintroduce the exported header files when the necessary
      kbuild changes are merged.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 20 May 2008, 1 commit
  26. 28 Apr 2008, 1 commit
    • mm: introduce pte_special pte bit · 7e675137
      Nick Piggin authored
      s390, for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
      model (which is more dynamic than most).  Instead, they had proposed to
      implement it with an additional path through vm_normal_page(), using a bit in
      the pte to determine whether or not the page should be refcounted:
      
      vm_normal_page()
      {
      	...
              if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
                      if (vma->vm_flags & VM_MIXEDMAP) {
      #ifdef s390
      			if (!mixedmap_refcount_pte(pte))
      				return NULL;
      #else
                              if (!pfn_valid(pfn))
                                      return NULL;
      #endif
                              goto out;
                      }
      	...
      }
      
      This is fine, however if we are allowed to use a bit in the pte to determine
      refcountedness, we can use that to _completely_ replace all the vma based
      schemes.  So instead of adding more cases to the already complex vma-based
      scheme, we can have a clearly separate and simple pte-based scheme (and get
      slightly better code generation in the process):
      
      vm_normal_page()
      {
      #ifdef s390
      	if (!mixedmap_refcount_pte(pte))
      		return NULL;
      	return pte_page(pte);
      #else
      	...
      #endif
      }
      
      And finally, we may rather make this concept usable by any architecture rather
      than making it s390 only, so implement a new type of pte state for this.
      Unfortunately the old vma based code must stay, because some architectures may
      not be able to spare pte bits.  This makes vm_normal_page a little bit more
      ugly than we would like, but the 2 cases are clearly separate.
      
      So introduce a pte_special pte state, and use it in mm/memory.c.  It is
      currently a noop for all architectures, so this doesn't actually result in any
      compiled code changes to mm/memory.o.
      
      BTW:
      I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
      The reason is that, regardless of where vm_normal_page is actually
      implemented, the *abstraction* is still exactly the same. Also, while it
      depends on whether the architecture has pte_special or not, those are the
      only two possible cases, and it really isn't an arch specific function --
      the role of the arch code should be to provide primitive functions and
      accessors with which to build the core code; pte_special does that. We do
      not want architectures to know or care about vm_normal_page itself, and
      we definitely don't want them being able to invent something new there
      out of sight of mm/ code. If we made vm_normal_page an arch function, then
      we have to make vm_insert_mixed (next patch) an arch function too. So I
      don't think moving it to arch code fundamentally improves any abstractions,
      while it does practically make the code more difficult to follow, for both
      mm and arch developers, and easier to misuse.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Carsten Otte <cotte@de.ibm.com>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 26 Mar 2008, 2 commits
  28. 17 Oct 2007, 1 commit
  29. 17 Jul 2007, 1 commit
  30. 09 May 2007, 1 commit
  31. 26 Apr 2007, 2 commits