1. 13 11月, 2013 5 次提交
    • D
      sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      David S. Miller 提交于
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we have less
      spinlocks taken in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTEs physical address field.
      That takes care of the transparent huge pages case.
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37b3a8ff
    • D
      sparc64: Make PAGE_OFFSET variable. · b2d43834
      David S. Miller 提交于
      Choose PAGE_OFFSET dynamically based upon cpu type.
      
      Original UltraSPARC-I (spitfire) chips only supported a 44-bit
      virtual address space.
      
      Newer chips (T4 and later) support 52-bit virtual addresses
      and up to 47-bits of physical memory space.
      
      Therefore we have to adjust PAGE_SIZE dynamically based upon
      the capabilities of the chip.
      
      Note that this change alone does not allow us to support > 43-bit
      physical memory, to do that we need to re-arrange our page table
      support.  The current encodings of the pmd_t and pgd_t pointers
      restricts us to "32 + 11" == 43 bits.
      
      This change can waste quite a bit of memory for the various tables.
      In particular, a future change should work to size and allocate
      kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
      This isn't easy as we really cannot take a TLB miss when accessing
      kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      b2d43834
    • D
      sparc64: Document the shift counts used to validate linear kernel addresses. · bb7b4353
      David S. Miller 提交于
      This way we can see exactly what they are derived from, and in particular
      how they would change if we were to use a different PAGE_OFFSET value.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      bb7b4353
    • D
      sparc64: Use PAGE_OFFSET instead of a magic constant. · 922631b9
      David S. Miller 提交于
      This pertains to all of the computations of the kernel fast
      TLB miss xor values.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      922631b9
    • D
      sparc64: Clean up 64-bit mmap exclusion defines. · c920745e
      David S. Miller 提交于
      Older UltraSPARC chips had an address space hole due to the MMU only
      supporting 44-bit virtual addresses.
      
      The top end of this hole also has the same value as the current
      definition of PAGE_OFFSET, so this can be confusing.
      
      Consolidate the defines for the userspace mmap exclusion range into
      page_64.h and use them in sys_sparc_64.c and hugetlbpage.c
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      c920745e
  2. 13 9月, 2013 1 次提交
  3. 12 9月, 2013 1 次提交
    • N
      mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Naoya Horiguchi 提交于
      Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing,) so we had better not enable migration of
      other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
      page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (softoffline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture dependent check of whether hugepage are implemented on a pmd
      basis or not.  And on some architecture multiple sizes of hugepages are
      available, so hugepage_migration_support() also checks hugepage size.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83467efb
  4. 15 7月, 2013 1 次提交
    • P
      sparc: delete __cpuinit/__CPUINIT usage from all users · 2066aadd
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/sparc uses of the __cpuinit macros from
      C files and removes __CPUINIT from assembly files.  Note that even
      though arch/sparc/kernel/trampoline_64.S has instances of ".previous"
      in it, they are all paired off against explicit ".section" directives,
      and not implicitly paired with __CPUINIT (unlike mips and arm were).
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      2066aadd
  5. 11 7月, 2013 1 次提交
  6. 04 7月, 2013 3 次提交
    • J
      mm/SPARC: prepare for removing num_physpages and simplify mem_init() · dceccbe9
      Jiang Liu 提交于
      Prepare for removing num_physpages and simplify mem_init().
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NSam Ravnborg <sam@ravnborg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dceccbe9
    • J
      mm: concentrate modification of totalram_pages into the mm core · 0c988534
      Jiang Liu 提交于
      Concentrate code to modify totalram_pages into the mm core, so the arch
      memory initialized code doesn't need to take care of it.  With these
      changes applied, only following functions from mm core modify global
      variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
      free_all_bootmem_node(), adjust_managed_page_count().
      
      With this patch applied, it will be much more easier for us to keep
      totalram_pages and zone->managed_pages in consistence.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0c988534
    • J
      mm: change signature of free_reserved_area() to fix building warnings · 11199692
      Jiang Liu 提交于
      Change signature of free_reserved_area() according to Russell King's
      suggestion to fix following build warnings:
      
        arch/arm/mm/init.c: In function 'mem_init':
        arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
          free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
          ^
        In file included from include/linux/mman.h:4:0,
                         from arch/arm/mm/init.c:15:
        include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
         extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
      
         mm/page_alloc.c: In function 'free_reserved_area':
      >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
         In file included from arch/mips/include/asm/page.h:49:0,
                          from include/linux/mmzone.h:20,
                          from include/linux/gfp.h:4,
                          from include/linux/mm.h:8,
                          from mm/page_alloc.c:18:
         arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
         mm/page_alloc.c: In function 'free_area_init_nodes':
         mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
      
      Also address some minor code review comments.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11199692
  7. 20 6月, 2013 1 次提交
  8. 19 6月, 2013 2 次提交
  9. 08 5月, 2013 1 次提交
  10. 30 4月, 2013 2 次提交
  11. 25 4月, 2013 1 次提交
  12. 20 4月, 2013 1 次提交
    • D
      sparc64: Fix race in TLB batch processing. · f36391d2
      David S. Miller 提交于
      As reported by Dave Kleikamp, when we emit cross calls to do batched
      TLB flush processing we have a race because we do not synchronize on
      the sibling cpus completing the cross call.
      
      So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
      and either flushes are missed or flushes will flush the wrong
      addresses.
      
      Fix this by using generic infrastructure to synchonize on the
      completion of the cross call.
      
      This first required getting the flush_tlb_pending() call out from
      switch_to() which operates with locks held and interrupts disabled.
      The problem is that smp_call_function_many() cannot be invoked with
      IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
      
      We get the batch processing outside of locked IRQ disabled sections by
      using some ideas from the powerpc port. Namely, we only batch inside
      of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
      region, we flush TLBs synchronously.
      
      1) Get rid of xcall_flush_tlb_pending and per-cpu type
         implementations.
      
      2) Do TLB batch cross calls instead via:
      
      	smp_call_function_many()
      		tlb_pending_func()
      			__flush_tlb_pending()
      
      3) Batch only in lazy mmu sequences:
      
      	a) Add 'active' member to struct tlb_batch
      	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
      	c) Set 'active' in arch_enter_lazy_mmu_mode()
      	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
      	e) Check 'active' in tlb_batch_add_one() and do a synchronous
                 flush if it's clear.
      
      4) Add infrastructure for synchronous TLB page flushes.
      
      	a) Implement __flush_tlb_page and per-cpu variants, patch
      	   as needed.
      	b) Likewise for xcall_flush_tlb_page.
      	c) Implement smp_flush_tlb_page() to invoke the cross-call.
      	d) Wire up global_flush_tlb_page() to the right routine based
                 upon CONFIG_SMP
      
      5) It turns out that singleton batches are very common, 2 out of every
         3 batch flushes have only a single entry in them.
      
         The batch flush waiting is very expensive, both because of the poll
         on sibling cpu completeion, as well as because passing the tlb batch
         pointer to the sibling cpus invokes a shared memory dereference.
      
         Therefore, in flush_tlb_pending(), if there is only one entry in
         the batch perform a completely asynchronous global_flush_tlb_page()
         instead.
      Reported-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      f36391d2
  13. 09 4月, 2013 1 次提交
  14. 01 4月, 2013 2 次提交
    • A
      sparc/iommu: fix typo s/265KB/256KB/ · 9a0ac1b6
      Akinobu Mita 提交于
      IOMMU_NPTES is 64K PTEs, so the size is 256KB (= 64K * sizeof(iopte_t))
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a0ac1b6
    • A
      sparc/srmmu: clear trailing edge of bitmap properly · 54df2db3
      Akinobu Mita 提交于
      srmmu_nocache_bitmap is cleared by bit_map_init().  But bit_map_init()
      attempts to clear by memset(), so it can't clear the trailing edge of
      bitmap properly on big-endian architecture if the number of bits is not
      a multiple of BITS_PER_LONG.
      
      Actually, the number of bits in srmmu_nocache_bitmap is not always
      a multiple of BITS_PER_LONG.  It is calculated as below:
      
              bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;
      
      srmmu_nocache_size is decided proportionally by the amount of system RAM
      and it is rounded to a multiple of PAGE_SIZE.  SRMMU_NOCACHE_BITMAP_SHIFT
      is defined as (PAGE_SHIFT - 4).  So it can only be said that bitmap_bits
      is a multiple of 16.
      
      This fixes the problem by using bitmap_clear() instead of memset()
      in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
      size at bitmap allocation time.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54df2db3
  15. 21 3月, 2013 1 次提交
  16. 24 2月, 2013 3 次提交
    • S
      swap: add per-partition lock for swapfile · ec8acf20
      Shaohua Li 提交于
      swap_lock is heavily contended when I test swap to 3 fast SSD (even
      slightly slower than swap to 2 such SSD).  The main contention comes
      from swap_info_get().  This patch tries to fix the gap with adding a new
      per-partition lock.
      
      Global data like nr_swapfiles, total_swap_pages, least_priority and
      swap_list are still protected by swap_lock.
      
      nr_swap_pages is an atomic now, it can be changed without swap_lock.  In
      theory, it's possible get_swap_page() finds no swap pages but actually
      there are free swap pages.  But sounds not a big problem.
      
      Accessing partition specific data (like scan_swap_map and so on) is only
      protected by swap_info_struct.lock.
      
      Changing swap_info_struct.flags need hold swap_lock and
      swap_info_struct.lock, because scan_scan_map() will check it.  read the
      flags is ok with either the locks hold.
      
      If both swap_lock and swap_info_struct.lock must be hold, we always hold
      the former first to avoid deadlock.
      
      swap_entry_free() can change swap_list.  To delete that code, we add a
      new highest_priority_index.  Whenever get_swap_page() is called, we
      check it.  If it's valid, we use it.
      
      It's a pity get_swap_page() still holds swap_lock().  But in practice,
      swap_lock() isn't heavily contended in my test with this patch (or I can
      say there are other much more heavier bottlenecks like TLB flush).  And
      BTW, looks get_swap_page() doesn't really need the lock.  We never free
      swap_info[] and we check SWAP_WRITEOK flag.  The only risk without the
      lock is we could swapout to some low priority swap, but we can quickly
      recover after several rounds of swap, so sounds not a big deal to me.
      But I'd prefer to fix this if it's a real problem.
      
      "swap: make each swap partition have one address_space" improved the
      swapout speed from 1.7G/s to 2G/s.  This patch further improves the
      speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
      so TLB flush isn't the biggest bottleneck before the patches.
      
      [arnd@arndb.de: fix it for nommu]
      [hughd@google.com: add missing unlock]
      [minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec8acf20
    • T
      memory-hotplug: remove memmap of sparse-vmemmap · 0197518c
      Tang Chen 提交于
      Introduce a new API vmemmap_free() to free and remove vmemmap
      pagetables.  Since pagetable implements are different, each architecture
      has to provide its own version of vmemmap_free(), just like
      vmemmap_populate().
      
      Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.
      
      [mhocko@suse.cz: fix implicit declaration of remove_pagetable]
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0197518c
    • Y
      memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap · 46723bfa
      Yasuaki Ishimatsu 提交于
      For removing memmap region of sparse-vmemmap which is allocated bootmem,
      memmap region of sparse-vmemmap needs to be registered by
      get_page_bootmem().  So the patch searches pages of virtual mapping and
      registers the pages by get_page_bootmem().
      
      NOTE: register_page_bootmem_memmap() is not implemented for ia64,
            ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
            and revert register_page_bootmem_info_node() when platform doesn't
            support it.
      
            It's implemented by adding a new Kconfig option named
            CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
            by memory-hotplug feature fully supported archs(currently only on
            x86_64).
      
            Since we have 2 config options called MEMORY_HOTPLUG and
            MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
            and codes in function register_page_bootmem_info_node() are only
            used for collecting infomation for hot-remove, so reside it under
            MEMORY_HOTREMOVE.
      
            Besides page_isolation.c selected by MEMORY_ISOLATION under
            MEMORY_HOTPLUG is also such case, move it too.
      
      [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
      [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
      [mhocko@suse.cz: remove the arch specific functions without any implementation]
      [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
      [rientjes@google.com: fix defined but not used warning]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NWu Jianguo <wujianguo@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46723bfa
  17. 21 2月, 2013 3 次提交
    • D
      sparc64: Fix tsb_grow() in atomic context. · 0fbebed6
      David S. Miller 提交于
      If our first THP installation for an MM is via the set_pmd_at() done
      during khugepaged's collapsing we'll end up in tsb_grow() trying to do
      a GFP_KERNEL allocation with several locks held.
      
      Simply using GFP_ATOMIC in this situation is not the best option
      because we really can't have this fail, so we'd really like to keep
      this an order 0 GFP_KERNEL allocation if possible.
      
      Also, doing the TSB allocation from khugepaged is a really bad idea
      because we'll allocate it potentially from the wrong NUMA node in that
      context.
      
      So what we do is defer the hugepage TSB allocation until the first TLB
      miss we take on a hugepage.  This is slightly tricky because we have
      to handle two unusual cases:
      
      1) Taking the first hugepage TLB miss in the window trap handler.
         We'll call the winfix_trampoline when that is detected.
      
      2) An initial TSB allocation via TLB miss races with a hugetlb
         fault on another cpu running the same MM.  We handle this by
         unconditionally loading the TSB we see into the current cpu
         even if it's non-NULL at hugetlb_setup time.
      Reported-by: NMeelis Roos <mroos@ut.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fbebed6
    • D
      sparc64: Handle hugepage TSB being NULL. · bcd896ba
      David S. Miller 提交于
      Accomodate the possibility that the TSB might be NULL at
      the point that update_mmu_cache() is invoked.  This is
      necessary because we will sometimes need to defer the TSB
      allocation to the first fault that happens in the 'mm'.
      
      Seperate out the hugepage PTE test into a seperate function
      so that the logic is clearer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcd896ba
    • D
      sparc64: Fix gfp_flags setting in tsb_grow(). · a55ee1ff
      David S. Miller 提交于
      We should "|= more_flags" rather than "= more_flags".
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a55ee1ff
  18. 14 2月, 2013 1 次提交
  19. 13 1月, 2013 1 次提交
  20. 04 1月, 2013 1 次提交
    • G
      SPARC: drivers: remove __dev* attributes. · 7c9503b8
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c9503b8
  21. 12 12月, 2012 1 次提交
  22. 18 11月, 2012 1 次提交
  23. 17 10月, 2012 1 次提交
  24. 15 10月, 2012 1 次提交
  25. 11 10月, 2012 1 次提交
    • D
      sparc64: Fix deficiencies in sun4v error reporting. · f88620b9
      David S. Miller 提交于
      Missing error types, attributes, and report fields.  Pad out
      to 64-bytes.
      
      Make string reporting cleaner and easier to extend in the future using
      "const char *" arrays that index by either bit position, or absolute
      field value.
      
      Report the raw 64-byte error report as a sequence of u64s before the
      annotated version.
      
      Only report fields which are valid, given the context and the
      attribute bits which are set.
      
      For shutdown requests, use the local copy of the error report not the
      one we just freed up back to the queue.  Also, use orderly_poweroff()
      just like the Domain Services shutdown request code does.
      
      If the real-address reported is "-1" (unknown) try to disassemble the
      instruction to report the effective address of the access.  Only do
      this in privileged mode.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f88620b9
  26. 09 10月, 2012 2 次提交
    • D
      sparc64: Support transparent huge pages. · 9e695d2e
      David Miller 提交于
      This is relatively easy since PMD's now cover exactly 4MB of memory.
      
      Our PMD entries are 32-bits each, so we use a special encoding.  The
      lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
      because sparc64's page tables are purely software entities so we can use
      whatever encoding scheme we want.  We just have to make the TLB miss
      assembler page table walkers aware of the layout.
      
      set_pmd_at() works much like set_pte_at() but it has to operate in two
      page from a table of non-huge PTEs, so we have to queue up TLB flushes
      based upon what mappings are valid in the PTE table.  In the second regime
      we are going from huge-page to non-huge-page, and in that case we need
      only queue up a single TLB flush to push out the huge page mapping.
      
      We still have 5 bits remaining in the huge PMD encoding so we can very
      likely support any new pieces of THP state tracking that might get added
      in the future.
      
      With lots of help from Johannes Weiner.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e695d2e
    • D
      sparc64: Eliminate PTE table memory wastage. · c460bec7
      David Miller 提交于
      We've split up the PTE tables so that they take up half a page instead of
      a full page.  This is in order to facilitate transparent huge page
      support, which works much better if our PMDs cover 4MB instead of 8MB.
      
      What we do is have a one-behind cache for PTE table allocations in the
      mm struct.
      
      This logic triggers only on allocations.  For example, we don't try to
      keep track of free'd up page table blocks in the style that the s390 port
      does.
      
      There were only two slightly annoying aspects to this change:
      
      1) Changing pgtable_t to be a "pte_t *".  There's all of this special
         logic in the TLB free paths that needed adjustments, as did the
         PMD populate interfaces.
      
      2) init_new_context() needs to zap the pointer, since the mm struct
         just gets copied from the parent on fork.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c460bec7