1. 18 3月, 2014 1 次提交
  2. 20 2月, 2014 1 次提交
    • P
      sparc32: make copy_to/from_user_page() usable from modular code · a56b072f
      Paul Gortmaker 提交于
      While copy_to/from_user_page() users are uncommon, there is one in
      drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads
      to the following:
      
      ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!
      
      during routine allmodconfig build coverage.  The reason this happens
      is as follows:
      
      In arch/sparc/include/asm/cacheflush_32.h we have:
      
       #define flush_cache_page(vma,addr,pfn) \
              sparc32_cachetlb_ops->cache_page(vma, addr)
      
       #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
              do {                                                    \
                      flush_cache_page(vma, vaddr, page_to_pfn(page));\
                      memcpy(dst, src, len);                          \
              } while (0)
       #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
              do {                                                    \
                      flush_cache_page(vma, vaddr, page_to_pfn(page));\
                      memcpy(dst, src, len);                          \
              } while (0)
      
      However, sparc32_cachetlb_ops isn't exported and hence the error.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a56b072f
  3. 29 1月, 2014 1 次提交
  4. 22 1月, 2014 1 次提交
    • T
      memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen 提交于
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
  5. 19 11月, 2013 1 次提交
  6. 15 11月, 2013 3 次提交
  7. 14 11月, 2013 1 次提交
    • D
      sparc64: Encode huge PMDs using PTE encoding. · a7b9403f
      David S. Miller 提交于
      Now that we have 64-bits for PMDs we can stop using special encodings
      for the huge PMD values, and just put real PTEs in there.
      
      We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
      huge ones.  It is the same for both 4U and 4V PTE layouts.
      
      We also use _PAGE_SPECIAL to indicate the splitting state, since a
      huge PMD cannot also be special.
      
      All of the PMD --> PTE translation code disappears, and most of the
      huge PMD bit modifications and tests just degenerate into the PTE
      operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
      trivial.
      
      As a side effect, normal PMDs don't shift the physical address around.
      This also speeds up the page table walks in the TLB miss paths since
      they don't have to do the shifts any more.
      
      Another non-trivial aspect is that pte_modify() has to be changed
      to preserve the _PAGE_PMD_HUGE bits as well as the page size field
      of the pte.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7b9403f
  8. 13 11月, 2013 6 次提交
    • D
      sparc64: Move to 64-bit PGDs and PMDs. · 2b77933c
      David S. Miller 提交于
      To make the page tables compact, we were using 32-bit PGDs and PMDs.
      We only had to support <= 43 bits of physical addresses so this was
      quite feasible.
      
      In order to support larger physical addresses we have to move to
      64-bit PGDs and PMDs.
      
      Most of the changes are straight-forward:
      
      1) {pgd,pmd}_t --> unsigned long
      
      2) Anything that tries to use plain "unsigned int" types with pgd/pmd
         values needs to be adjusted.  In particular things like "0U" become
         "0UL".
      
      3) {PGDIR,PMD}_BITS decrease by one.
      
      4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
         and adjust the low bit masks to clear out the low 3 bits instead of
         just the low 2 bits during pgd/pmd address formation.
      
      Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
      swapper_{pg_dir,low_pmd_dir} arrays.
      
      This patch does not try to take advantage of having 64-bits in the
      PMDs to simplify the hugepage code, that will come in a subsequent
      change.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b77933c
    • D
      sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      David S. Miller 提交于
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we have less
      spinlocks taken in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTEs physical address field.
      That takes care of the transparent huge pages case.
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37b3a8ff
    • D
      sparc64: Make PAGE_OFFSET variable. · b2d43834
      David S. Miller 提交于
      Choose PAGE_OFFSET dynamically based upon cpu type.
      
      Original UltraSPARC-I (spitfire) chips only supported a 44-bit
      virtual address space.
      
      Newer chips (T4 and later) support 52-bit virtual addresses
      and up to 47-bits of physical memory space.
      
      Therefore we have to adjust PAGE_SIZE dynamically based upon
      the capabilities of the chip.
      
      Note that this change alone does not allow us to support > 43-bit
      physical memory, to do that we need to re-arrange our page table
      support.  The current encodings of the pmd_t and pgd_t pointers
      restricts us to "32 + 11" == 43 bits.
      
      This change can waste quite a bit of memory for the various tables.
      In particular, a future change should work to size and allocate
      kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
      This isn't easy as we really cannot take a TLB miss when accessing
      kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      b2d43834
    • D
      sparc64: Document the shift counts used to validate linear kernel addresses. · bb7b4353
      David S. Miller 提交于
      This way we can see exactly what they are derived from, and in particular
      how they would change if we were to use a different PAGE_OFFSET value.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      bb7b4353
    • D
      sparc64: Use PAGE_OFFSET instead of a magic constant. · 922631b9
      David S. Miller 提交于
      This pertains to all of the computations of the kernel fast
      TLB miss xor values.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      922631b9
    • D
      sparc64: Clean up 64-bit mmap exclusion defines. · c920745e
      David S. Miller 提交于
      Older UltraSPARC chips had an address space hole due to the MMU only
      supporting 44-bit virtual addresses.
      
      The top end of this hole also has the same value as the current
      definition of PAGE_OFFSET, so this can be confusing.
      
      Consolidate the defines for the userspace mmap exclusion range into
      page_64.h and use them in sys_sparc_64.c and hugetlbpage.c
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      c920745e
  9. 13 9月, 2013 1 次提交
  10. 12 9月, 2013 1 次提交
    • N
      mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Naoya Horiguchi 提交于
      Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing,) so we had better not enable migration of
      other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
      page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (softoffline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture dependent check of whether hugepage are implemented on a pmd
      basis or not.  And on some architecture multiple sizes of hugepages are
      available, so hugepage_migration_support() also checks hugepage size.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83467efb
  11. 15 7月, 2013 1 次提交
    • P
      sparc: delete __cpuinit/__CPUINIT usage from all users · 2066aadd
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/sparc uses of the __cpuinit macros from
      C files and removes __CPUINIT from assembly files.  Note that even
      though arch/sparc/kernel/trampoline_64.S has instances of ".previous"
      in it, they are all paired off against explicit ".section" directives,
      and not implicitly paired with __CPUINIT (unlike mips and arm were).
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      2066aadd
  12. 11 7月, 2013 1 次提交
  13. 04 7月, 2013 3 次提交
    • J
      mm/SPARC: prepare for removing num_physpages and simplify mem_init() · dceccbe9
      Jiang Liu 提交于
      Prepare for removing num_physpages and simplify mem_init().
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NSam Ravnborg <sam@ravnborg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dceccbe9
    • J
      mm: concentrate modification of totalram_pages into the mm core · 0c988534
      Jiang Liu 提交于
      Concentrate code to modify totalram_pages into the mm core, so the arch
      memory initialized code doesn't need to take care of it.  With these
      changes applied, only following functions from mm core modify global
      variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
      free_all_bootmem_node(), adjust_managed_page_count().
      
      With this patch applied, it will be much more easier for us to keep
      totalram_pages and zone->managed_pages in consistence.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0c988534
    • J
      mm: change signature of free_reserved_area() to fix building warnings · 11199692
      Jiang Liu 提交于
      Change signature of free_reserved_area() according to Russell King's
      suggestion to fix following build warnings:
      
        arch/arm/mm/init.c: In function 'mem_init':
        arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
          free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
          ^
        In file included from include/linux/mman.h:4:0,
                         from arch/arm/mm/init.c:15:
        include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
         extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
      
         mm/page_alloc.c: In function 'free_reserved_area':
      >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
         In file included from arch/mips/include/asm/page.h:49:0,
                          from include/linux/mmzone.h:20,
                          from include/linux/gfp.h:4,
                          from include/linux/mm.h:8,
                          from mm/page_alloc.c:18:
         arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
         mm/page_alloc.c: In function 'free_area_init_nodes':
         mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
      
      Also address some minor code review comments.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11199692
  14. 20 6月, 2013 1 次提交
  15. 19 6月, 2013 2 次提交
  16. 08 5月, 2013 1 次提交
  17. 30 4月, 2013 2 次提交
  18. 25 4月, 2013 1 次提交
  19. 20 4月, 2013 1 次提交
    • D
      sparc64: Fix race in TLB batch processing. · f36391d2
      David S. Miller 提交于
      As reported by Dave Kleikamp, when we emit cross calls to do batched
      TLB flush processing we have a race because we do not synchronize on
      the sibling cpus completing the cross call.
      
      So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
      and either flushes are missed or flushes will flush the wrong
      addresses.
      
      Fix this by using generic infrastructure to synchonize on the
      completion of the cross call.
      
      This first required getting the flush_tlb_pending() call out from
      switch_to() which operates with locks held and interrupts disabled.
      The problem is that smp_call_function_many() cannot be invoked with
      IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
      
      We get the batch processing outside of locked IRQ disabled sections by
      using some ideas from the powerpc port. Namely, we only batch inside
      of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
      region, we flush TLBs synchronously.
      
      1) Get rid of xcall_flush_tlb_pending and per-cpu type
         implementations.
      
      2) Do TLB batch cross calls instead via:
      
      	smp_call_function_many()
      		tlb_pending_func()
      			__flush_tlb_pending()
      
      3) Batch only in lazy mmu sequences:
      
      	a) Add 'active' member to struct tlb_batch
      	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
      	c) Set 'active' in arch_enter_lazy_mmu_mode()
      	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
      	e) Check 'active' in tlb_batch_add_one() and do a synchronous
                 flush if it's clear.
      
      4) Add infrastructure for synchronous TLB page flushes.
      
      	a) Implement __flush_tlb_page and per-cpu variants, patch
      	   as needed.
      	b) Likewise for xcall_flush_tlb_page.
      	c) Implement smp_flush_tlb_page() to invoke the cross-call.
      	d) Wire up global_flush_tlb_page() to the right routine based
                 upon CONFIG_SMP
      
      5) It turns out that singleton batches are very common, 2 out of every
         3 batch flushes have only a single entry in them.
      
         The batch flush waiting is very expensive, both because of the poll
         on sibling cpu completeion, as well as because passing the tlb batch
         pointer to the sibling cpus invokes a shared memory dereference.
      
         Therefore, in flush_tlb_pending(), if there is only one entry in
         the batch perform a completely asynchronous global_flush_tlb_page()
         instead.
      Reported-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      f36391d2
  20. 09 4月, 2013 1 次提交
  21. 01 4月, 2013 2 次提交
    • A
      sparc/iommu: fix typo s/265KB/256KB/ · 9a0ac1b6
      Akinobu Mita 提交于
      IOMMU_NPTES is 64K PTEs, so the size is 256KB (= 64K * sizeof(iopte_t))
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a0ac1b6
    • A
      sparc/srmmu: clear trailing edge of bitmap properly · 54df2db3
      Akinobu Mita 提交于
      srmmu_nocache_bitmap is cleared by bit_map_init().  But bit_map_init()
      attempts to clear by memset(), so it can't clear the trailing edge of
      bitmap properly on big-endian architecture if the number of bits is not
      a multiple of BITS_PER_LONG.
      
      Actually, the number of bits in srmmu_nocache_bitmap is not always
      a multiple of BITS_PER_LONG.  It is calculated as below:
      
              bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;
      
      srmmu_nocache_size is decided proportionally by the amount of system RAM
      and it is rounded to a multiple of PAGE_SIZE.  SRMMU_NOCACHE_BITMAP_SHIFT
      is defined as (PAGE_SHIFT - 4).  So it can only be said that bitmap_bits
      is a multiple of 16.
      
      This fixes the problem by using bitmap_clear() instead of memset()
      in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
      size at bitmap allocation time.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54df2db3
  22. 21 3月, 2013 1 次提交
  23. 24 2月, 2013 3 次提交
    • S
      swap: add per-partition lock for swapfile · ec8acf20
      Shaohua Li 提交于
      swap_lock is heavily contended when I test swap to 3 fast SSD (even
      slightly slower than swap to 2 such SSD).  The main contention comes
      from swap_info_get().  This patch tries to fix the gap with adding a new
      per-partition lock.
      
      Global data like nr_swapfiles, total_swap_pages, least_priority and
      swap_list are still protected by swap_lock.
      
      nr_swap_pages is an atomic now, it can be changed without swap_lock.  In
      theory, it's possible get_swap_page() finds no swap pages but actually
      there are free swap pages.  But sounds not a big problem.
      
      Accessing partition specific data (like scan_swap_map and so on) is only
      protected by swap_info_struct.lock.
      
      Changing swap_info_struct.flags need hold swap_lock and
      swap_info_struct.lock, because scan_scan_map() will check it.  read the
      flags is ok with either the locks hold.
      
      If both swap_lock and swap_info_struct.lock must be hold, we always hold
      the former first to avoid deadlock.
      
      swap_entry_free() can change swap_list.  To delete that code, we add a
      new highest_priority_index.  Whenever get_swap_page() is called, we
      check it.  If it's valid, we use it.
      
      It's a pity get_swap_page() still holds swap_lock().  But in practice,
      swap_lock() isn't heavily contended in my test with this patch (or I can
      say there are other much more heavier bottlenecks like TLB flush).  And
      BTW, looks get_swap_page() doesn't really need the lock.  We never free
      swap_info[] and we check SWAP_WRITEOK flag.  The only risk without the
      lock is we could swapout to some low priority swap, but we can quickly
      recover after several rounds of swap, so sounds not a big deal to me.
      But I'd prefer to fix this if it's a real problem.
      
      "swap: make each swap partition have one address_space" improved the
      swapout speed from 1.7G/s to 2G/s.  This patch further improves the
      speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
      so TLB flush isn't the biggest bottleneck before the patches.
      
      [arnd@arndb.de: fix it for nommu]
      [hughd@google.com: add missing unlock]
      [minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec8acf20
    • T
      memory-hotplug: remove memmap of sparse-vmemmap · 0197518c
      Tang Chen 提交于
      Introduce a new API vmemmap_free() to free and remove vmemmap
      pagetables.  Since pagetable implements are different, each architecture
      has to provide its own version of vmemmap_free(), just like
      vmemmap_populate().
      
      Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.
      
      [mhocko@suse.cz: fix implicit declaration of remove_pagetable]
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0197518c
    • Y
      memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap · 46723bfa
      Yasuaki Ishimatsu 提交于
      For removing memmap region of sparse-vmemmap which is allocated bootmem,
      memmap region of sparse-vmemmap needs to be registered by
      get_page_bootmem().  So the patch searches pages of virtual mapping and
      registers the pages by get_page_bootmem().
      
      NOTE: register_page_bootmem_memmap() is not implemented for ia64,
            ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
            and revert register_page_bootmem_info_node() when platform doesn't
            support it.
      
            It's implemented by adding a new Kconfig option named
            CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
            by memory-hotplug feature fully supported archs(currently only on
            x86_64).
      
            Since we have 2 config options called MEMORY_HOTPLUG and
            MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
            and codes in function register_page_bootmem_info_node() are only
            used for collecting infomation for hot-remove, so reside it under
            MEMORY_HOTREMOVE.
      
            Besides page_isolation.c selected by MEMORY_ISOLATION under
            MEMORY_HOTPLUG is also such case, move it too.
      
      [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
      [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
      [mhocko@suse.cz: remove the arch specific functions without any implementation]
      [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
      [rientjes@google.com: fix defined but not used warning]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NWu Jianguo <wujianguo@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46723bfa
  24. 21 2月, 2013 3 次提交
    • D
      sparc64: Fix tsb_grow() in atomic context. · 0fbebed6
      David S. Miller 提交于
      If our first THP installation for an MM is via the set_pmd_at() done
      during khugepaged's collapsing we'll end up in tsb_grow() trying to do
      a GFP_KERNEL allocation with several locks held.
      
      Simply using GFP_ATOMIC in this situation is not the best option
      because we really can't have this fail, so we'd really like to keep
      this an order 0 GFP_KERNEL allocation if possible.
      
      Also, doing the TSB allocation from khugepaged is a really bad idea
      because we'll allocate it potentially from the wrong NUMA node in that
      context.
      
      So what we do is defer the hugepage TSB allocation until the first TLB
      miss we take on a hugepage.  This is slightly tricky because we have
      to handle two unusual cases:
      
      1) Taking the first hugepage TLB miss in the window trap handler.
         We'll call the winfix_trampoline when that is detected.
      
      2) An initial TSB allocation via TLB miss races with a hugetlb
         fault on another cpu running the same MM.  We handle this by
         unconditionally loading the TSB we see into the current cpu
         even if it's non-NULL at hugetlb_setup time.
      Reported-by: NMeelis Roos <mroos@ut.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fbebed6
    • D
      sparc64: Handle hugepage TSB being NULL. · bcd896ba
      David S. Miller 提交于
      Accomodate the possibility that the TSB might be NULL at
      the point that update_mmu_cache() is invoked.  This is
      necessary because we will sometimes need to defer the TSB
      allocation to the first fault that happens in the 'mm'.
      
      Seperate out the hugepage PTE test into a seperate function
      so that the logic is clearer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcd896ba
    • D
      sparc64: Fix gfp_flags setting in tsb_grow(). · a55ee1ff
      David S. Miller 提交于
      We should "|= more_flags" rather than "= more_flags".
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a55ee1ff