1. 30 4月, 2013 1 次提交
    • J
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner 提交于
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Tested-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0aad818b
  2. 09 4月, 2013 1 次提交
  3. 21 3月, 2013 1 次提交
  4. 24 2月, 2013 2 次提交
  5. 21 2月, 2013 2 次提交
    • D
      sparc64: Fix tsb_grow() in atomic context. · 0fbebed6
      David S. Miller 提交于
      If our first THP installation for an MM is via the set_pmd_at() done
      during khugepaged's collapsing we'll end up in tsb_grow() trying to do
      a GFP_KERNEL allocation with several locks held.
      
      Simply using GFP_ATOMIC in this situation is not the best option
      because we really can't have this fail, so we'd really like to keep
      this an order 0 GFP_KERNEL allocation if possible.
      
      Also, doing the TSB allocation from khugepaged is a really bad idea
      because we'll allocate it potentially from the wrong NUMA node in that
      context.
      
      So what we do is defer the hugepage TSB allocation until the first TLB
      miss we take on a hugepage.  This is slightly tricky because we have
      to handle two unusual cases:
      
      1) Taking the first hugepage TLB miss in the window trap handler.
         We'll call the winfix_trampoline when that is detected.
      
      2) An initial TSB allocation via TLB miss races with a hugetlb
         fault on another cpu running the same MM.  We handle this by
         unconditionally loading the TSB we see into the current cpu
         even if it's non-NULL at hugetlb_setup time.
      Reported-by: NMeelis Roos <mroos@ut.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fbebed6
    • D
      sparc64: Handle hugepage TSB being NULL. · bcd896ba
      David S. Miller 提交于
      Accomodate the possibility that the TSB might be NULL at
      the point that update_mmu_cache() is invoked.  This is
      necessary because we will sometimes need to defer the TSB
      allocation to the first fault that happens in the 'mm'.
      
      Seperate out the hugepage PTE test into a seperate function
      so that the logic is clearer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcd896ba
  6. 13 1月, 2013 1 次提交
  7. 04 1月, 2013 1 次提交
    • G
      SPARC: drivers: remove __dev* attributes. · 7c9503b8
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c9503b8
  8. 18 11月, 2012 1 次提交
  9. 15 10月, 2012 1 次提交
  10. 09 10月, 2012 3 次提交
    • D
      sparc64: Support transparent huge pages. · 9e695d2e
      David Miller 提交于
      This is relatively easy since PMD's now cover exactly 4MB of memory.
      
      Our PMD entries are 32-bits each, so we use a special encoding.  The
      lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
      because sparc64's page tables are purely software entities so we can use
      whatever encoding scheme we want.  We just have to make the TLB miss
      assembler page table walkers aware of the layout.
      
      set_pmd_at() works much like set_pte_at() but it has to operate in two
      page from a table of non-huge PTEs, so we have to queue up TLB flushes
      based upon what mappings are valid in the PTE table.  In the second regime
      we are going from huge-page to non-huge-page, and in that case we need
      only queue up a single TLB flush to push out the huge page mapping.
      
      We still have 5 bits remaining in the huge PMD encoding so we can very
      likely support any new pieces of THP state tracking that might get added
      in the future.
      
      With lots of help from Johannes Weiner.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e695d2e
    • D
      sparc64: Eliminate PTE table memory wastage. · c460bec7
      David Miller 提交于
      We've split up the PTE tables so that they take up half a page instead of
      a full page.  This is in order to facilitate transparent huge page
      support, which works much better if our PMDs cover 4MB instead of 8MB.
      
      What we do is have a one-behind cache for PTE table allocations in the
      mm struct.
      
      This logic triggers only on allocations.  For example, we don't try to
      keep track of free'd up page table blocks in the style that the s390 port
      does.
      
      There were only two slightly annoying aspects to this change:
      
      1) Changing pgtable_t to be a "pte_t *".  There's all of this special
         logic in the TLB free paths that needed adjustments, as did the
         PMD populate interfaces.
      
      2) init_new_context() needs to zap the pointer, since the mm struct
         just gets copied from the parent on fork.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c460bec7
    • D
      sparc64: Only support 4MB huge pages and 8KB base pages. · 15b9350a
      David Miller 提交于
      Narrowing the scope of the page size configurations will make the
      transparent hugepage changes much simpler.
      
      In the end what we really want to do is have the kernel support multiple
      huge page sizes and use whatever is appropriate as the context dictactes.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15b9350a
  11. 03 10月, 2012 1 次提交
  12. 07 9月, 2012 3 次提交
    • D
      sparc64: Use cpu_pgsz_mask for linear kernel mapping config. · c69ad0a3
      David S. Miller 提交于
      This required a little bit of reordering of how we set up the memory
      management early on.
      
      We now only know the final values of kern_linear_pte_xor[] after we
      take over the trap table and start processing TLB misses ourselves.
      
      So once we fill those values in we re-clear the kernel's 4M TSB and
      flush the TLBs.  That way if we find we support larger than 4M pages
      we won't have any stale smaller page size entries in the TSB.
      
      SUN4U Panther support for larger page sizes should now be extremely
      trivial but I have no hardware on which to test it and I believe
      that some of the sun4u TLB miss assembler needs to be audited first
      to make sure it really can handle larger than 4M PTEs properly.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c69ad0a3
    • D
      sparc64: Probe cpu page size support more portably. · ce33fdc5
      David S. Miller 提交于
      On sun4v, interrogate the machine description.  This code is extremely
      defensive in nature, and a lot of the checks can probably be removed.
      
      On sun4u things are a lot simpler.  There are the page sizes all chips
      support, and then Panther adds 32MB and 256MB pages.
      
      Report the probed value in /proc/cpuinfo
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce33fdc5
    • D
      sparc64: Support 2GB and 16GB page sizes for kernel linear mappings. · 4f93d21d
      David S. Miller 提交于
      SPARC-T4 supports 2GB pages.
      
      So convert kpte_linear_bitmap into an array of 2-bit values which
      index into kern_linear_pte_xor.
      
      Now kern_linear_pte_xor is used for 4 page size aligned regions,
      4MB, 256MB, 2GB, and 16GB respectively.
      
      Enabling 2GB pages is currently hardcoded using a check against
      sun4v_chip_type.  In the future this will be done more cleanly
      by interrogating the machine description which is the correct
      way to determine this kind of thing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f93d21d
  13. 15 8月, 2012 1 次提交
    • D
      sparc64: Be less verbose during vmemmap population. · 2856cc2e
      David S. Miller 提交于
      On a 2-node machine with 256GB of ram we get 512 lines of
      console output, which is just too much.
      
      This mimicks Yinghai Lu's x86 commit c2b91e2e
      (x86_64/mm: check and print vmemmap allocation continuous) except that
      we aren't ever going to get contiguous block pointers in between calls
      so just print when the virtual address or node changes.
      
      This decreases the output by an order of 16.
      
      Also demote this to KERN_DEBUG.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2856cc2e
  14. 10 5月, 2012 1 次提交
    • P
      sparc: fix build fail in mm/init_64.c when NEED_MULTIPLE_NODES is off · aa6f0790
      Paul Gortmaker 提交于
      Commit 625d693e (linux-next)
      
          "sparc64: Convert over to NO_BOOTMEM."
      
      causes the following compile failure for sparc64 allnoconfig:
      
        arch/sparc/mm/init_64.c:822:16: error: unused variable 'paddr'
        arch/sparc/mm/init_64.c:1759:7: error: unused variable 'node'
        arch/sparc/mm/init_64.c:809:12: error: 'memblock_nid_range' defined but not used
      
      The paddr decl can easily be shuffled within the ifdef.  The
      memblock_nid_range is just a stub function for when NEED_MULTIPLE_NODES
      is off, but the only caller is within a NEED_MULTIPLE_NODES enabled
      section, so we can simply delete it.
      
      The unused "node" is slightly more interesting.  In the case of
      "# CONFIG_NEED_MULTIPLE_NODES is not set" we no longer get the
      definition of:
      
       #define NODE_DATA(nid)          (node_data[nid])
      
      from arch/sparc/include/asm/mmzone.h - but instead we get:
      
       #define NODE_DATA(nid)          (&contig_page_data)
      
      from include/linux/mmzone.h -- and since the arg is ignored,
      the thing really is unused.  Rather than put in a confusing
      looking __maybe_unused, simply splitting the declaration
      from the assignment seemed to me to be the least offensive.
      
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa6f0790
  15. 28 4月, 2012 1 次提交
  16. 27 4月, 2012 2 次提交
  17. 29 3月, 2012 1 次提交
  18. 09 12月, 2011 3 次提交
    • T
      sparc: Use HAVE_MEMBLOCK_NODE_MAP · 2a4814df
      Tejun Heo 提交于
      sparc doesn't access early_node_map[] directly and enabling
      HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
      with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
      enough.
      
      -v2: Use select in Kconfig instead as suggested by Sam Ravnborg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: sparclinux@vger.kernel.org
      2a4814df
    • T
      memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users · 1aadc056
      Tejun Heo 提交于
      The only function of memblock_analyze() is now allowing resize of
      memblock region arrays.  Rename it to memblock_allow_resize() and
      update its users.
      
      * The following users remain the same other than renaming.
      
        arm/mm/init.c::arm_memblock_init()
        microblaze/kernel/prom.c::early_init_devtree()
        powerpc/kernel/prom.c::early_init_devtree()
        openrisc/kernel/prom.c::early_init_devtree()
        sh/mm/init.c::paging_init()
        sparc/mm/init_64.c::paging_init()
        unicore32/mm/init.c::uc32_memblock_init()
      
      * In the following users, analyze was used to update total size which
        is no longer necessary.
      
        powerpc/kernel/machine_kexec.c::reserve_crashkernel()
        powerpc/kernel/prom.c::early_init_devtree()
        powerpc/mm/init_32.c::MMU_init()
        powerpc/mm/tlb_nohash.c::__early_init_mmu()  
        powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
        powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
        sh/kernel/machine_kexec.c::reserve_crashkernel()
      
      * x86/kernel/e820.c::memblock_x86_fill() was directly setting
        memblock_can_resize before populating memblock and calling analyze
        afterwards.  Call memblock_allow_resize() before start populating.
      
      memblock_can_resize is now static inside memblock.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      1aadc056
    • T
      memblock: Kill memblock_init() · fe091c20
      Tejun Heo 提交于
      memblock_init() initializes arrays for regions and memblock itself;
      however, all these can be done with struct initializers and
      memblock_init() can be removed.  This patch kills memblock_init() and
      initializes memblock with struct initializer.
      
      The only difference is that the first dummy entries don't have .nid
      set to MAX_NUMNODES initially.  This doesn't cause any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      fe091c20
  19. 30 9月, 2011 1 次提交
    • D
      sparc64: Force the execute bit in OpenFirmware's translation entries. · f4142cba
      David S. Miller 提交于
      In the OF 'translations' property, the template TTEs in the mappings
      never specify the executable bit.  This is the case even though some
      of these mappings are for OF's code segment.
      
      Therefore, we need to force the execute bit on in every mapping.
      
      This problem can only really trigger on Niagara/sun4v machines and the
      history behind this is a little complicated.
      
      Previous to sun4v, the sun4u TTE entries lacked a hardware execute
      permission bit.  So OF didn't have to ever worry about setting
      anything to handle executable pages.  Any valid TTE loaded into the
      I-TLB would be respected by the chip.
      
      But sun4v Niagara chips have a real hardware enforced executable bit
      in their TTEs.  So it has to be set or else the I-TLB throws an
      instruction access exception with type code 6 (protection violation).
      
      We've been extremely fortunate to not get bitten by this in the past.
      
      The best I can tell is that the OF's mappings for it's executable code
      were mapped using permanent locked mappings on sun4v in the past.
      Therefore, the fact that we didn't have the exec bit set in the OF
      translations we would use did not matter in practice.
      
      Thanks to Greg Onufer for helping me track this down.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4142cba
  20. 06 8月, 2011 1 次提交
  21. 05 8月, 2011 1 次提交
  22. 15 7月, 2011 1 次提交
    • T
      memblock: Don't allow archs to override memblock_nid_range() · f9b18db3
      Tejun Heo 提交于
      memblock_nid_range() is used to implement memblock_[try_]alloc_nid().
      The generic version determines the range by walking early_node_map
      with for_each_mem_pfn_range().  The generic version is defined __weak
      to allow arch override.
      
      Currently, only sparc overrides it; however, with the previous update
      to the generic implementation, there isn't much to be gained with arch
      override.  Sparc would behave exactly the same with the generic
      implementation.
      
      This patch disallows arch override for memblock_nid_range() and make
      both generic and sparc versions static.
      
      sparc is only compile tested.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/1310460395-30913-6-git-send-email-tj@kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      f9b18db3
  23. 08 6月, 2011 1 次提交
  24. 17 5月, 2011 1 次提交
  25. 13 10月, 2010 1 次提交
    • Y
      memblock, bootmem: Round pfn properly for memory and reserved regions · c7fc2de0
      Yinghai Lu 提交于
      We need to round memory regions correctly -- specifically, we need to
      round reserved region in the more expansive direction (lower limit
      down, upper limit up) whereas usable memory regions need to be rounded
      in the more restrictive direction (lower limit up, upper limit down).
      
      This introduces two set of inlines:
      
      	memblock_region_memory_base_pfn()
      	memblock_region_memory_end_pfn()
      	memblock_region_reserved_base_pfn()
      	memblock_region_reserved_end_pfn()
      
      Although they are antisymmetric (and therefore are technically
      duplicates) the use of the different inlines explicitly documents the
      programmer's intention.
      
      The lack of proper rounding caused a bug on ARM, which was then found
      to also affect other architectures.
      Reported-by: NRussell King <rmk@arm.linux.org.uk>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB4CDFD.4020105@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      c7fc2de0
  26. 09 10月, 2010 1 次提交
  27. 05 8月, 2010 2 次提交
  28. 04 8月, 2010 2 次提交
  29. 14 7月, 2010 1 次提交