1. 09 10月, 2012 2 次提交
    • D
      sparc64: Eliminate PTE table memory wastage. · c460bec7
      David Miller 提交于
      We've split up the PTE tables so that they take up half a page instead of
      a full page.  This is in order to facilitate transparent huge page
      support, which works much better if our PMDs cover 4MB instead of 8MB.
      
      What we do is have a one-behind cache for PTE table allocations in the
      mm struct.
      
      This logic triggers only on allocations.  For example, we don't try to
      keep track of free'd up page table blocks in the style that the s390 port
      does.
      
      There were only two slightly annoying aspects to this change:
      
      1) Changing pgtable_t to be a "pte_t *".  There's all of this special
         logic in the TLB free paths that needed adjustments, as did the
         PMD populate interfaces.
      
      2) init_new_context() needs to zap the pointer, since the mm struct
         just gets copied from the parent on fork.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c460bec7
    • D
      sparc64: Only support 4MB huge pages and 8KB base pages. · 15b9350a
      David Miller 提交于
      Narrowing the scope of the page size configurations will make the
      transparent hugepage changes much simpler.
      
      In the end what we really want to do is have the kernel support multiple
      huge page sizes and use whatever is appropriate as the context dictactes.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15b9350a
  2. 03 10月, 2012 1 次提交
  3. 07 9月, 2012 3 次提交
    • D
      sparc64: Use cpu_pgsz_mask for linear kernel mapping config. · c69ad0a3
      David S. Miller 提交于
      This required a little bit of reordering of how we set up the memory
      management early on.
      
      We now only know the final values of kern_linear_pte_xor[] after we
      take over the trap table and start processing TLB misses ourselves.
      
      So once we fill those values in we re-clear the kernel's 4M TSB and
      flush the TLBs.  That way if we find we support larger than 4M pages
      we won't have any stale smaller page size entries in the TSB.
      
      SUN4U Panther support for larger page sizes should now be extremely
      trivial but I have no hardware on which to test it and I believe
      that some of the sun4u TLB miss assembler needs to be audited first
      to make sure it really can handle larger than 4M PTEs properly.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c69ad0a3
    • D
      sparc64: Probe cpu page size support more portably. · ce33fdc5
      David S. Miller 提交于
      On sun4v, interrogate the machine description.  This code is extremely
      defensive in nature, and a lot of the checks can probably be removed.
      
      On sun4u things are a lot simpler.  There are the page sizes all chips
      support, and then Panther adds 32MB and 256MB pages.
      
      Report the probed value in /proc/cpuinfo
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce33fdc5
    • D
      sparc64: Support 2GB and 16GB page sizes for kernel linear mappings. · 4f93d21d
      David S. Miller 提交于
      SPARC-T4 supports 2GB pages.
      
      So convert kpte_linear_bitmap into an array of 2-bit values which
      index into kern_linear_pte_xor.
      
      Now kern_linear_pte_xor is used for 4 page size aligned regions,
      4MB, 256MB, 2GB, and 16GB respectively.
      
      Enabling 2GB pages is currently hardcoded using a check against
      sun4v_chip_type.  In the future this will be done more cleanly
      by interrogating the machine description which is the correct
      way to determine this kind of thing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f93d21d
  4. 15 8月, 2012 1 次提交
    • D
      sparc64: Be less verbose during vmemmap population. · 2856cc2e
      David S. Miller 提交于
      On a 2-node machine with 256GB of ram we get 512 lines of
      console output, which is just too much.
      
      This mimicks Yinghai Lu's x86 commit c2b91e2e
      (x86_64/mm: check and print vmemmap allocation continuous) except that
      we aren't ever going to get contiguous block pointers in between calls
      so just print when the virtual address or node changes.
      
      This decreases the output by an order of 16.
      
      Also demote this to KERN_DEBUG.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2856cc2e
  5. 10 5月, 2012 1 次提交
    • P
      sparc: fix build fail in mm/init_64.c when NEED_MULTIPLE_NODES is off · aa6f0790
      Paul Gortmaker 提交于
      Commit 625d693e (linux-next)
      
          "sparc64: Convert over to NO_BOOTMEM."
      
      causes the following compile failure for sparc64 allnoconfig:
      
        arch/sparc/mm/init_64.c:822:16: error: unused variable 'paddr'
        arch/sparc/mm/init_64.c:1759:7: error: unused variable 'node'
        arch/sparc/mm/init_64.c:809:12: error: 'memblock_nid_range' defined but not used
      
      The paddr decl can easily be shuffled within the ifdef.  The
      memblock_nid_range is just a stub function for when NEED_MULTIPLE_NODES
      is off, but the only caller is within a NEED_MULTIPLE_NODES enabled
      section, so we can simply delete it.
      
      The unused "node" is slightly more interesting.  In the case of
      "# CONFIG_NEED_MULTIPLE_NODES is not set" we no longer get the
      definition of:
      
       #define NODE_DATA(nid)          (node_data[nid])
      
      from arch/sparc/include/asm/mmzone.h - but instead we get:
      
       #define NODE_DATA(nid)          (&contig_page_data)
      
      from include/linux/mmzone.h -- and since the arg is ignored,
      the thing really is unused.  Rather than put in a confusing
      looking __maybe_unused, simply splitting the declaration
      from the assignment seemed to me to be the least offensive.
      
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa6f0790
  6. 28 4月, 2012 1 次提交
  7. 27 4月, 2012 2 次提交
  8. 29 3月, 2012 1 次提交
  9. 09 12月, 2011 3 次提交
    • T
      sparc: Use HAVE_MEMBLOCK_NODE_MAP · 2a4814df
      Tejun Heo 提交于
      sparc doesn't access early_node_map[] directly and enabling
      HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
      with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
      enough.
      
      -v2: Use select in Kconfig instead as suggested by Sam Ravnborg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: sparclinux@vger.kernel.org
      2a4814df
    • T
      memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users · 1aadc056
      Tejun Heo 提交于
      The only function of memblock_analyze() is now allowing resize of
      memblock region arrays.  Rename it to memblock_allow_resize() and
      update its users.
      
      * The following users remain the same other than renaming.
      
        arm/mm/init.c::arm_memblock_init()
        microblaze/kernel/prom.c::early_init_devtree()
        powerpc/kernel/prom.c::early_init_devtree()
        openrisc/kernel/prom.c::early_init_devtree()
        sh/mm/init.c::paging_init()
        sparc/mm/init_64.c::paging_init()
        unicore32/mm/init.c::uc32_memblock_init()
      
      * In the following users, analyze was used to update total size which
        is no longer necessary.
      
        powerpc/kernel/machine_kexec.c::reserve_crashkernel()
        powerpc/kernel/prom.c::early_init_devtree()
        powerpc/mm/init_32.c::MMU_init()
        powerpc/mm/tlb_nohash.c::__early_init_mmu()  
        powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
        powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
        sh/kernel/machine_kexec.c::reserve_crashkernel()
      
      * x86/kernel/e820.c::memblock_x86_fill() was directly setting
        memblock_can_resize before populating memblock and calling analyze
        afterwards.  Call memblock_allow_resize() before start populating.
      
      memblock_can_resize is now static inside memblock.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      1aadc056
    • T
      memblock: Kill memblock_init() · fe091c20
      Tejun Heo 提交于
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
  10. 30 9月, 2011 1 次提交
    • D
      sparc64: Force the execute bit in OpenFirmware's translation entries. · f4142cba
      David S. Miller 提交于
      In the OF 'translations' property, the template TTEs in the mappings
      never specify the executable bit.  This is the case even though some
      of these mappings are for OF's code segment.
      
      Therefore, we need to force the execute bit on in every mapping.
      
      This problem can only really trigger on Niagara/sun4v machines and the
      history behind this is a little complicated.
      
      Previous to sun4v, the sun4u TTE entries lacked a hardware execute
      permission bit.  So OF didn't have to ever worry about setting
      anything to handle executable pages.  Any valid TTE loaded into the
      I-TLB would be respected by the chip.
      
      But sun4v Niagara chips have a real hardware enforced executable bit
      in their TTEs.  So it has to be set or else the I-TLB throws an
      instruction access exception with type code 6 (protection violation).
      
      We've been extremely fortunate to not get bitten by this in the past.
      
      The best I can tell is that the OF's mappings for it's executable code
      were mapped using permanent locked mappings on sun4v in the past.
      Therefore, the fact that we didn't have the exec bit set in the OF
      translations we would use did not matter in practice.
      
      Thanks to Greg Onufer for helping me track this down.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4142cba
  11. 06 8月, 2011 1 次提交
  12. 05 8月, 2011 1 次提交
  13. 15 7月, 2011 1 次提交
    • T
      memblock: Don't allow archs to override memblock_nid_range() · f9b18db3
      Tejun Heo 提交于
      memblock_nid_range() is used to implement memblock_[try_]alloc_nid().
      The generic version determines the range by walking early_node_map
      with for_each_mem_pfn_range().  The generic version is defined __weak
      to allow arch override.
      
      Currently, only sparc overrides it; however, with the previous update
      to the generic implementation, there isn't much to be gained with arch
      override.  Sparc would behave exactly the same with the generic
      implementation.
      
      This patch disallows arch override for memblock_nid_range() and make
      both generic and sparc versions static.
      
      sparc is only compile tested.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/1310460395-30913-6-git-send-email-tj@kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      f9b18db3
  14. 08 6月, 2011 1 次提交
  15. 17 5月, 2011 1 次提交
  16. 13 10月, 2010 1 次提交
    • Y
      memblock, bootmem: Round pfn properly for memory and reserved regions · c7fc2de0
      Yinghai Lu 提交于
      We need to round memory regions correctly -- specifically, we need to
      round reserved region in the more expansive direction (lower limit
      down, upper limit up) whereas usable memory regions need to be rounded
      in the more restrictive direction (lower limit up, upper limit down).
      
      This introduces two set of inlines:
      
      	memblock_region_memory_base_pfn()
      	memblock_region_memory_end_pfn()
      	memblock_region_reserved_base_pfn()
      	memblock_region_reserved_end_pfn()
      
      Although they are antisymmetric (and therefore are technically
      duplicates) the use of the different inlines explicitly documents the
      programmer's intention.
      
      The lack of proper rounding caused a bug on ARM, which was then found
      to also affect other architectures.
      Reported-by: NRussell King <rmk@arm.linux.org.uk>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB4CDFD.4020105@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      c7fc2de0
  17. 09 10月, 2010 1 次提交
  18. 05 8月, 2010 2 次提交
  19. 04 8月, 2010 2 次提交
  20. 14 7月, 2010 1 次提交
  21. 04 4月, 2010 1 次提交
  22. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  23. 21 2月, 2010 1 次提交
    • R
      MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself · 4b3073e1
      Russell King 提交于
      On VIVT ARM, when we have multiple shared mappings of the same file
      in the same MM, we need to ensure that we have coherency across all
      copies.  We do this via make_coherent() by making the pages
      uncacheable.
      
      This used to work fine, until we allowed highmem with highpte - we
      now have a page table which is mapped as required, and is not available
      for modification via update_mmu_cache().
      
      Ralf Beache suggested getting rid of the PTE value passed to
      update_mmu_cache():
      
        On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
        to construct a pointer to the pte again.  Passing a pte_t * is much
        more elegant.  Maybe we might even replace the pte argument with the
        pte_t?
      
      Ben Herrenschmidt would also like the pte pointer for PowerPC:
      
        Passing the ptep in there is exactly what I want.  I want that
        -instead- of the PTE value, because I have issue on some ppc cases,
        for I$/D$ coherency, where set_pte_at() may decide to mask out the
        _PAGE_EXEC.
      
      So, pass in the mapped page table pointer into update_mmu_cache(), and
      remove the PTE value, updating all implementations and call sites to
      suit.
      
      Includes a fix from Stephen Rothwell:
      
        sparc: fix fallout from update_mmu_cache API change
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      4b3073e1
  24. 12 10月, 2009 1 次提交
  25. 26 8月, 2009 1 次提交
    • D
      sparc64: Validate linear D-TLB misses. · d8ed1d43
      David S. Miller 提交于
      When page alloc debugging is not enabled, we essentially accept any
      virtual address for linear kernel TLB misses.  But with kgdb, kernel
      address probing, and other facilities we can try to access arbitrary
      crap.
      
      So, make sure the address we miss on will translate to physical memory
      that actually exists.
      
      In order to make this work we have to embed the valid address bitmap
      into the kernel image.  And in order to make that less expensive we
      make an adjustment, in that the max physical memory address is
      decreased to "1 << 41", even on the chips that support a 42-bit
      physical address space.  We can do this because bit 41 indicates
      "I/O space" and thus covers non-memory ranges.
      
      The result of this is that:
      
      1) kpte_linear_bitmap shrinks from 2K to 1K in size
      
      2) we need 64K more for the valid address bitmap
      
      We can't let the valid address bitmap be dynamically allocated
      once we start using it to validate TLB misses, otherwise we have
      crazy issues to deal with wrt. recursive TLB misses and such.
      
      If we're in a TLB miss it could be the deepest trap level that's legal
      inside of the cpu.  So if we TLB miss referencing the bitmap, the cpu
      will be out of trap levels and enter RED state.
      
      To guard against out-of-range accesses to the bitmap, we have to check
      to make sure no bits in the physical address above bit 40 are set.  We
      could export and use last_valid_pfn for this check, but that's just an
      unnecessary extra memory reference.
      
      On the plus side of all this, since we load all of these translations
      into the special 4MB mapping TSB, and we check the TSB first for TLB
      misses, there should be absolutely no real cost for these new checks
      in the TLB miss path.
      
      Reported-by: heyongli@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8ed1d43
  26. 19 6月, 2009 1 次提交
  27. 16 6月, 2009 5 次提交
  28. 08 4月, 2009 1 次提交