1. 06 8月, 2014 1 次提交
  2. 05 8月, 2014 2 次提交
    • D
      sparc64: Guard against flushing openfirmware mappings. · 4ca9a237
      David S. Miller 提交于
      Based almost entirely upon a patch by Christopher Alexander Tobias
      Schulze.
      
      In commit db64fe02 ("mm: rewrite vmap
      layer") lazy VMAP tlb flushing was added to the vmalloc layer.  This
      causes problems on sparc64.
      
      Sparc64 has two VMAP mapped regions and they are not contiguous with
      eachother.  First we have the malloc mapping area, then another
      unrelated region, then the vmalloc region.
      
      This "another unrelated region" is where the firmware is mapped.
      
      If the lazy TLB flushing logic in the vmalloc code triggers after
      we've had both a module unload and a vfree or similar, it will pass an
      address range that goes from somewhere inside the malloc region to
      somewhere inside the vmalloc region, and thus covering the
      openfirmware area entirely.
      
      The sparc64 kernel learns about openfirmware's dynamic mappings in
      this region early in the boot, and then services TLB misses in this
      area.  But openfirmware has some locked TLB entries which are not
      mentioned in those dynamic mappings and we should thus not disturb
      them.
      
      These huge lazy TLB flush ranges causes those openfirmware locked TLB
      entries to be removed, resulting in all kinds of problems including
      hard hangs and crashes during reboot/reset.
      
      Besides causing problems like this, such huge TLB flush ranges are
      also incredibly inefficient.  A plea has been made with the author of
      the VMAP lazy TLB flushing code, but for now we'll put a safety guard
      into our flush_tlb_kernel_range() implementation.
      
      Since the implementation has become non-trivial, stop defining it as a
      macro and instead make it a function in a C source file.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ca9a237
    • D
      sparc64: Do not insert non-valid PTEs into the TSB hash table. · 18f38132
      David S. Miller 提交于
      The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
      only be called when the PTE being installed will be accessible by the user.
      
      This is not true for code paths originating from remove_migration_pte().
      
      There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
      miss frramework assumes thatwhen a TSB entry matches we can just load it into
      the TLB and return from the TLB miss trap.
      
      So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
      and over, never satisfying the miss.
      
      Just exit early from update_mmu_cache() and friends in this situation.
      
      Based upon a report and patch from Christopher Alexander Tobias Schulze.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18f38132
  3. 22 7月, 2014 1 次提交
  4. 05 6月, 2014 1 次提交
  5. 19 5月, 2014 7 次提交
  6. 09 5月, 2014 1 次提交
    • D
      sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus. · b18eb2d7
      David S. Miller 提交于
      Access to the TSB hash tables during TLB misses requires that there be
      an atomic 128-bit quad load available so that we fetch a matching TAG
      and DATA field at the same time.
      
      On cpus prior to UltraSPARC-III only virtual address based quad loads
      are available.  UltraSPARC-III and later provide physical address
      based variants which are easier to use.
      
      When we only have virtual address based quad loads available this
      means that we have to lock the TSB into the TLB at a fixed virtual
      address on each cpu when it runs that process.  We can't just access
      the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
      take a recursive TLB miss inside of the TLB miss handler without
      risking running out of hardware trap levels (some trap combinations
      can be deep, such as those generated by register window spill and fill
      traps).
      
      Without huge pages it's working perfectly fine, but when the huge TSB
      got added another chunk of fixed virtual address space was not
      allocated for this second TSB mapping.
      
      So we were mapping both the 8K and 4MB TSBs to the same exact virtual
      address, causing multiple TLB matches which gives undefined behavior.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b18eb2d7
  7. 07 5月, 2014 1 次提交
    • D
      sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses. · e5c460f4
      David S. Miller 提交于
      This was found using Dave Jone's trinity tool.
      
      When a user process which is 32-bit performs a load or a store, the
      cpu chops off the top 32-bits of the effective address before
      translating it.
      
      This is because we run 32-bit tasks with the PSTATE_AM (address
      masking) bit set.
      
      We can't run the kernel with that bit set, so when the kernel accesses
      userspace no address masking occurs.
      
      Since a 32-bit process will have no mappings in that region we will
      properly fault, so we don't try to handle this using access_ok(),
      which can safely just be a NOP on sparc64.
      
      Real faults from 32-bit processes should never generate such addresses
      so a bug check was added long ago, and it barks in the logs if this
      happens.
      
      But it also barks when a kernel user access causes this condition, and
      that _can_ happen.  For example, if a pointer passed into a system call
      is "0xfffffffc" and the kernel access 4 bytes offset from that pointer.
      
      Just handle such faults normally via the exception entries.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5c460f4
  8. 04 5月, 2014 5 次提交
  9. 29 4月, 2014 8 次提交
  10. 18 3月, 2014 1 次提交
  11. 20 2月, 2014 1 次提交
    • P
      sparc32: make copy_to/from_user_page() usable from modular code · a56b072f
      Paul Gortmaker 提交于
      While copy_to/from_user_page() users are uncommon, there is one in
      drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads
      to the following:
      
      ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!
      
      during routine allmodconfig build coverage.  The reason this happens
      is as follows:
      
      In arch/sparc/include/asm/cacheflush_32.h we have:
      
       #define flush_cache_page(vma,addr,pfn) \
              sparc32_cachetlb_ops->cache_page(vma, addr)
      
       #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
              do {                                                    \
                      flush_cache_page(vma, vaddr, page_to_pfn(page));\
                      memcpy(dst, src, len);                          \
              } while (0)
       #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
              do {                                                    \
                      flush_cache_page(vma, vaddr, page_to_pfn(page));\
                      memcpy(dst, src, len);                          \
              } while (0)
      
      However, sparc32_cachetlb_ops isn't exported and hence the error.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a56b072f
  12. 29 1月, 2014 1 次提交
  13. 22 1月, 2014 1 次提交
    • T
      memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen 提交于
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
  14. 19 11月, 2013 1 次提交
  15. 15 11月, 2013 3 次提交
  16. 14 11月, 2013 1 次提交
    • D
      sparc64: Encode huge PMDs using PTE encoding. · a7b9403f
      David S. Miller 提交于
      Now that we have 64-bits for PMDs we can stop using special encodings
      for the huge PMD values, and just put real PTEs in there.
      
      We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
      huge ones.  It is the same for both 4U and 4V PTE layouts.
      
      We also use _PAGE_SPECIAL to indicate the splitting state, since a
      huge PMD cannot also be special.
      
      All of the PMD --> PTE translation code disappears, and most of the
      huge PMD bit modifications and tests just degenerate into the PTE
      operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
      trivial.
      
      As a side effect, normal PMDs don't shift the physical address around.
      This also speeds up the page table walks in the TLB miss paths since
      they don't have to do the shifts any more.
      
      Another non-trivial aspect is that pte_modify() has to be changed
      to preserve the _PAGE_PMD_HUGE bits as well as the page size field
      of the pte.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7b9403f
  17. 13 11月, 2013 4 次提交
    • D
      sparc64: Move to 64-bit PGDs and PMDs. · 2b77933c
      David S. Miller 提交于
      To make the page tables compact, we were using 32-bit PGDs and PMDs.
      We only had to support <= 43 bits of physical addresses so this was
      quite feasible.
      
      In order to support larger physical addresses we have to move to
      64-bit PGDs and PMDs.
      
      Most of the changes are straight-forward:
      
      1) {pgd,pmd}_t --> unsigned long
      
      2) Anything that tries to use plain "unsigned int" types with pgd/pmd
         values needs to be adjusted.  In particular things like "0U" become
         "0UL".
      
      3) {PGDIR,PMD}_BITS decrease by one.
      
      4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
         and adjust the low bit masks to clear out the low 3 bits instead of
         just the low 2 bits during pgd/pmd address formation.
      
      Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
      swapper_{pg_dir,low_pmd_dir} arrays.
      
      This patch does not try to take advantage of having 64-bits in the
      PMDs to simplify the hugepage code, that will come in a subsequent
      change.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b77933c
    • D
      sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      David S. Miller 提交于
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we have less
      spinlocks taken in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTEs physical address field.
      That takes care of the transparent huge pages case.
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37b3a8ff
    • D
      sparc64: Make PAGE_OFFSET variable. · b2d43834
      David S. Miller 提交于
      Choose PAGE_OFFSET dynamically based upon cpu type.
      
      Original UltraSPARC-I (spitfire) chips only supported a 44-bit
      virtual address space.
      
      Newer chips (T4 and later) support 52-bit virtual addresses
      and up to 47-bits of physical memory space.
      
      Therefore we have to adjust PAGE_SIZE dynamically based upon
      the capabilities of the chip.
      
      Note that this change alone does not allow us to support > 43-bit
      physical memory, to do that we need to re-arrange our page table
      support.  The current encodings of the pmd_t and pgd_t pointers
      restricts us to "32 + 11" == 43 bits.
      
      This change can waste quite a bit of memory for the various tables.
      In particular, a future change should work to size and allocate
      kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
      This isn't easy as we really cannot take a TLB miss when accessing
      kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      b2d43834
    • D
      sparc64: Document the shift counts used to validate linear kernel addresses. · bb7b4353
      David S. Miller 提交于
      This way we can see exactly what they are derived from, and in particular
      how they would change if we were to use a different PAGE_OFFSET value.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      bb7b4353