1. 08 Aug, 2016 (3 commits)
  2. 16 Jun, 2016 (1 commit)
    • locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}() · 3a1adb23
      Committed by Peter Zijlstra
      Implement FETCH-OP atomic primitives. These are very similar to the
      OP-RETURN primitives we already have, except they return the value of
      the atomic variable _before_ modification.
      
      This is especially useful for irreversible operations -- such as
      bitops (because it becomes impossible to reconstruct the state prior
      to modification).
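      The semantic difference can be illustrated outside the kernel with the
      C11 atomics (a minimal user-space sketch, not the sparc implementation):

      #include <stdatomic.h>
      #include <stdio.h>

      int main(void)
      {
              atomic_int v = 8;

              /* FETCH-OP: returns the value *before* the modification.  */
              int before = atomic_fetch_add(&v, 1);    /* before == 8, v == 9  */

              /* OP-RETURN semantics: value *after* the modification.    */
              int after = atomic_fetch_add(&v, 1) + 1; /* after == 10, v == 10 */

              printf("before=%d after=%d final=%d\n", before, after, atomic_load(&v));
              return 0;
      }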
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: James Y Knight <jyknight@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3a1adb23
  3. 25 Dec, 2015 (1 commit)
    • sparc64: fix FP corruption in user copy functions · a7c5724b
      Committed by Rob Gardner
      Short story: Exception handlers used by some copy_to_user() and
      copy_from_user() functions do not diligently clean up floating point
      register usage, and this can result in a user process seeing invalid
      values in floating point registers. This sometimes makes the process
      fail.
      
      Long story: Several cpu-specific (NG4, NG2, U1, U3) memcpy functions
      use floating point registers and VIS alignaddr/faligndata to
      accelerate data copying when source and dest addresses don't align
      well. Linux uses a lazy scheme for saving floating point registers; it
      is not done upon entering the kernel since it's a very expensive
      operation. Rather, it is done only when needed. If the kernel ends up
      not using FP regs during the course of some trap or system call, then
      it can return to user space without saving or restoring them.
      
      The various memcpy functions begin their FP code with VISEntry (or a
      variation thereof), which saves the FP regs. They conclude their FP
      code with VISExit (or a variation) which essentially marks the FP regs
      "clean", ie, they contain no unsaved values. fprs.FPRS_FEF is turned
      off so that a lazy restore will be triggered when/if the user process
      accesses floating point regs again.
      
      The bug is that the user copy variants of memcpy, copy_from_user() and
      copy_to_user(), employ an exception handling mechanism to detect faults
      when accessing user space addresses. When this handler is invoked, an
      immediate return from the function is forced and VISExit is not
      executed, leaving the fprs register in an indeterminate state, often
      with fprs.FPRS_FEF set and one or more dirty bits. This results in a
      return to user space with invalid values in the FP regs, and since
      fprs.FPRS_FEF is on, no lazy restore occurs.
      
      This bug affects copy_to_user() and copy_from_user() for NG4, NG2,
      U3, and U1. All are fixed by using a new exception handler for those
      loads and stores that are done during the time between VISEntry and
      VISExit.
      
      n.b. In NG4memcpy, the problematic code can be triggered by a copy
      size greater than 128 bytes and an unaligned source address.  This bug
      is known to be the cause of random user process memory corruptions
      while perf is running with the callgraph option (ie, perf record -g).
      This occurs because perf uses copy_from_user() to read user stacks,
      and may fault when it follows a stack frame pointer off to an
      invalid page. Validation checks on the stack address just obscure
      the underlying problem.
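      A purely illustrative C model of the failure mode (VISEntry/VISExit and
      FPRS_FEF are the names from the text; the helpers are hypothetical
      stand-ins, not the kernel code):

      #include <stdbool.h>
      #include <stdio.h>

      #define FPRS_FEF 0x4                 /* FP-enable bit in %fprs             */

      static unsigned int fprs;            /* models the %fprs register          */

      static void vis_entry(void) { fprs |= FPRS_FEF;  /* save regs, enable FPU */ }
      static void vis_exit(void)  { fprs &= ~FPRS_FEF; /* mark FP regs clean    */ }

      /* Models a user-copy routine: returns bytes NOT copied, like copy_from_user(). */
      static long copy_model(bool fault)
      {
              vis_entry();
              if (fault)
                      return 1;            /* BUG: early return skips vis_exit() */
              vis_exit();
              return 0;
      }

      int main(void)
      {
              copy_model(true);
              /* FEF is still set, so no lazy restore happens on return to user
               * space and the process sees stale values in its FP registers.   */
              printf("fprs after fault = %#x\n", fprs);
              return 0;
      }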
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7c5724b
  4. 08 Aug, 2015 (1 commit)
  5. 07 Aug, 2015 (1 commit)
    • sparc64: Fix userspace FPU register corruptions. · 44922150
      Committed by David S. Miller
      If we have a series of events from userspace, with %fprs=FPRS_FEF,
      like follows:
      
      ETRAP
      	ETRAP
      		VIS_ENTRY(fprs=0x4)
      		VIS_EXIT
      		RTRAP (kernel FPU restore with fpu_saved=0x4)
      	RTRAP
      
      We will not restore the user registers that were clobbered by the FPU
      using kernel code in the inner-most trap.
      
      Traps allocate FPU save slots in the thread struct, and FPU using
      sequences save the "dirty" FPU registers only.
      
      This works at the initial trap level because all of the registers
      get recorded into the top-level FPU save area, and we'll return
      to userspace with the FPU disabled so that any FPU use by the user
      will take an FPU disabled trap wherein we'll load the registers
      back up properly.
      
      But this is not how trap returns from kernel to kernel operate.
      
      The simplest fix for this bug is to always save all FPU register state
      for anything other than the top-most FPU save area.
      
      Getting rid of the optimized inner-slot FPU saving code ends up
      making VISEntryHalf degenerate into plain VISEntry.
      
      Longer term we need to do something smarter to reinstate the partial
      save optimizations.  Perhaps the fundamental error is having trap entry
      and exit allocate FPU save slots and restore register state.  Instead,
      the VISEntry et al. calls should be doing that work.
      
      This bug is about two decades old.
      Reported-by: James Y Knight <jyknight@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44922150
  6. 27 Jul, 2015 (1 commit)
  7. 24 Mar, 2015 (1 commit)
    • sparc64: Fix several bugs in memmove(). · 2077cef4
      Committed by David S. Miller
      Firstly, handle zero length calls properly.  Believe it or not there
      are a few of these happening during early boot.
      
      Next, we can't just drop to a memcpy() call in the forward copy case
      where dst <= src.  The reason is that the cache initializing stores
      used in the Niagara memcpy() implementations can end up clearing out
      cache lines before we've sourced their original contents completely.
      
      For example, considering NG4memcpy, the main unrolled loop begins like
      this:
      
           load   src + 0x00
           load   src + 0x08
           load   src + 0x10
           load   src + 0x18
           load   src + 0x20
           store  dst + 0x00
      
      Assume dst is 64 byte aligned and let's say that dst is src - 8 for
      this memcpy() call.  That store at the end there is the first store
      into the cache line; it clears the whole line, and thus clobbers
      "src + 0x28" before it even gets loaded.
      
      To avoid this, just fall through to a simple copy only mildly
      optimized for the case where src and dst are 8 byte aligned and the
      length is a multiple of 8 as well.  We could get fancy and call
      GENmemcpy() but this is good enough for how this thing is actually
      used.
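      A sketch of the kind of mildly optimized fallback described above: a
      plain forward copy that moves 8 bytes at a time when src, dst and length
      are all 8-byte aligned, and single bytes otherwise (illustrative C, not
      the sparc assembly):

      #include <stddef.h>
      #include <stdint.h>

      /* Forward copy with no cache-initializing stores, so it is safe for the
       * overlapping dst <= src case; zero-length calls fall straight through. */
      static void *simple_forward_copy(void *dst, const void *src, size_t len)
      {
              char *d = dst;
              const char *s = src;

              if ((((uintptr_t)d | (uintptr_t)s | len) & 7) == 0) {
                      while (len) {
                              *(uint64_t *)d = *(const uint64_t *)s;
                              d += 8; s += 8; len -= 8;
                      }
              } else {
                      while (len--)
                              *d++ = *s++;
              }
              return dst;
      }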
      Reported-by: David Ahern <david.ahern@oracle.com>
      Reported-by: Bob Picco <bpicco@meloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2077cef4
  8. 08 Nov, 2014 (1 commit)
  9. 15 Oct, 2014 (1 commit)
    • sparc64: Fix FPU register corruption with AES crypto offload. · f4da3628
      Committed by David S. Miller
      The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
      key material is preloaded into the FPU registers, and then we loop
      over and over doing the crypt operation, reusing those pre-cooked key
      registers.
      
      There are intervening blkcipher*() calls between the crypt operation
      calls.  And those might perform memcpy() and thus also try to use the
      FPU.
      
      The sparc64 kernel FPU usage mechanism is designed to allow such
      recursive uses, but with a catch.
      
      There has to be a trap between the two FPU using threads of control.
      
      The mechanism works by, when the FPU is already in use by the kernel,
      allocating a slot for FPU saving at trap time.  Then if, within the
      trap handler, we try to use the FPU registers, the pre-trap FPU
      register state is saved into the slot.  Then at trap return time we
      notice this and restore the pre-trap FPU state.
      
      Over the long term there are various more involved ways we can make
      this work, but for a quick fix let's take advantage of the fact that
      the situation where this happens is very limited.
      
      All sparc64 chips that support the crypto instructions also use
      the Niagara4 memcpy routine, and that routine only uses the FPU for
      large copies where we can't get the source aligned properly to a
      multiple of 8 bytes.
      
      We look to see if the FPU is already in use in this context, and if so
      we use the non-large copy path which only uses integer registers.
      
      Furthermore, we also limit this special logic to when we are doing a
      kernel copy, rather than a user copy.
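      Conceptually the quick fix is a dispatch like the following; fpu_in_use()
      and the copy-path names are hypothetical stand-ins for the %fprs check and
      the NG4memcpy code paths, which really live in assembly:

      #include <stdbool.h>
      #include <string.h>

      #define LARGE_COPY_THRESHOLD 128     /* small copies never touch the FPU  */

      static bool fpu_in_use(void) { return false; }  /* stand-in for reading %fprs */
      static void fpu_copy(void *d, const void *s, unsigned long n)     { memcpy(d, s, n); }
      static void integer_copy(void *d, const void *s, unsigned long n) { memcpy(d, s, n); }

      static void ng4_copy_sketch(void *dst, const void *src, unsigned long len,
                                  bool to_user)
      {
              /* Only kernel copies get the special check, per the text.
               * (The real routine also requires a misaligned source before it
               * picks the FP path at all.)                                     */
              bool avoid_fpu = !to_user && fpu_in_use();

              if (len >= LARGE_COPY_THRESHOLD && !avoid_fpu)
                      fpu_copy(dst, src, len);       /* VIS/FP fast path        */
              else
                      integer_copy(dst, src, len);   /* integer registers only  */
      }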
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4da3628
  10. 10 Sep, 2014 (2 commits)
  11. 14 Aug, 2014 (1 commit)
  12. 22 Jul, 2014 (1 commit)
    • sparc64: update IO access functions in PeeCeeI · 6b8b5507
      Committed by Sam Ravnborg
      The PeeCeeI.c code used in*() + out*() for IO access.
      But these return little-endian values, while the native (big) endian
      result was required, which led to some bit-shifting.
      Shift the code over to use the __raw_*() variants throughout.
      
      This simplifies the code as we can drop the calls
      to le16_to_cpu() and le32_to_cpu().
      And it should be a little faster too.
      
      With this change we now use the same type of IO access functions
      throughout the file.
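      A hedged sketch of the idea for one of the string-read routines, written
      against the generic __raw_readl() accessor (insl_sketch is a made-up name,
      not the actual PeeCeeI.c function):

      #include <linux/io.h>
      #include <linux/types.h>

      static void insl_sketch(void __iomem *port, void *dst, unsigned long count)
      {
              u32 *p = dst;

              /* __raw_readl() does no byte swapping, so on a big-endian CPU the
               * data can be stored as-is; no le32_to_cpu() plus shifting needed. */
              while (count--)
                      *p++ = __raw_readl(port);
      }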
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b8b5507
  13. 19 Jul, 2014 (1 commit)
  14. 18 May, 2014 (1 commit)
  15. 02 May, 2014 (1 commit)
  16. 13 Nov, 2013 (1 commit)
    • sparc64: Make PAGE_OFFSET variable. · b2d43834
      Committed by David S. Miller
      Choose PAGE_OFFSET dynamically based upon cpu type.
      
      Original UltraSPARC-I (spitfire) chips only supported a 44-bit
      virtual address space.
      
      Newer chips (T4 and later) support 52-bit virtual addresses
      and up to 47 bits of physical memory space.
      
      Therefore we have to adjust PAGE_OFFSET dynamically based upon
      the capabilities of the chip.
      
      Note that this change alone does not allow us to support > 43-bit
      physical memory, to do that we need to re-arrange our page table
      support.  The current encodings of the pmd_t and pgd_t pointers
      restricts us to "32 + 11" == 43 bits.
      
      This change can waste quite a bit of memory for the various tables.
      In particular, a future change should work to size and allocate
      kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
      This isn't easy as we really cannot take a TLB miss when accessing
      kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.
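      For illustration, placing the linear mapping in the top (1UL <<
      max_phys_bits) bytes of the virtual address space gives a PAGE_OFFSET
      like the following (a hedged user-space sketch, not the kernel macros):

      #include <stdio.h>

      int main(void)
      {
              unsigned long max_phys_bits = 47;      /* e.g. T4 and later       */
              unsigned long page_offset = -(1UL << max_phys_bits);

              /* The current pmd_t/pgd_t encoding still caps supported physical
               * memory at 32 + 11 == 43 bits, as noted above.                  */
              printf("PAGE_OFFSET for %lu-bit physmem: %#lx\n",
                     max_phys_bits, page_offset);
              return 0;
      }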
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
      b2d43834
  17. 06 Sep, 2013 (1 commit)
  18. 01 May, 2013 (1 commit)
    • Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 446f24d1
      Committed by Stephen Boyd
      The help text for this config is duplicated across the x86, parisc, and
      s390 Kconfig.debug files.  Arnd Bergmann noted that the help text was
      slightly misleading and should be fixed to state that enabling this
      option isn't a problem when using pre-4.4 gcc.
      
      To simplify the rewording, consolidate the text into lib/Kconfig.debug
      and modify it there to be more explicit about when you should say N to
      this config.
      
      Also, make the text a bit more generic by stating that this option
      enables compile time checks so we can cover architectures which emit
      warnings vs.  ones which emit errors.  The details of how an
      architecture decided to implement the checks isn't as important as the
      concept of compile time checking of copy_from_user() calls.
      
      While we're doing this, remove all the copy_from_user_overflow() code
      that's duplicated many times and place it into lib/ so that any
      architecture supporting this option can get the function for free.
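      The consolidated helper is tiny; from memory it looks roughly like this
      in lib/ (a sketch, not a verbatim copy of the patch):

      #include <linux/export.h>
      #include <linux/bug.h>

      void copy_from_user_overflow(void)
      {
              WARN(1, "Buffer overflow detected!\n");
      }
      EXPORT_SYMBOL(copy_from_user_overflow);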
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Helge Deller <deller@gmx.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      446f24d1
  19. 01 Apr, 2013 (1 commit)
    • sparc/srmmu: clear trailing edge of bitmap properly · 54df2db3
      Committed by Akinobu Mita
      srmmu_nocache_bitmap is cleared by bit_map_init().  But bit_map_init()
      attempts to clear it with memset(), which cannot clear the trailing edge
      of the bitmap properly on a big-endian architecture if the number of
      bits is not a multiple of BITS_PER_LONG.
      
      Actually, the number of bits in srmmu_nocache_bitmap is not always
      a multiple of BITS_PER_LONG.  It is calculated as below:
      
              bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;
      
      srmmu_nocache_size is decided proportionally by the amount of system RAM
      and it is rounded to a multiple of PAGE_SIZE.  SRMMU_NOCACHE_BITMAP_SHIFT
      is defined as (PAGE_SHIFT - 4).  So it can only be said that bitmap_bits
      is a multiple of 16.
      
      This fixes the problem by using bitmap_clear() instead of memset()
      in bit_map_init(), and by using BITS_TO_LONGS() to calculate the
      correct size at bitmap allocation time.
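      In plain kernel C, the fix amounts to something like this (a sketch using
      the generic bitmap helpers named above, not the exact bit_map_init() diff):

      #include <linux/bitmap.h>
      #include <linux/bitops.h>

      static void bit_map_clear_sketch(unsigned long *map, int nbits)
      {
              /* On big-endian, memset() over nbits/8 bytes does not cover the
               * bytes holding the low-order bits of the final, partial long;
               * bitmap_clear() clears exactly nbits bits regardless.           */
              bitmap_clear(map, 0, nbits);
      }

      /* And at allocation time, size the array in longs rather than bits/8:   */
      /*   map = kzalloc(BITS_TO_LONGS(nbits) * sizeof(long), GFP_KERNEL);     */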
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54df2db3
  20. 10 Nov, 2012 (1 commit)
  21. 06 Oct, 2012 (1 commit)
    • sparc64: Niagara-4 bzero/memset, plus use MRU stores in page copy. · 9f825962
      Committed by David S. Miller
      This adds optimized memset/bzero/page-clear routines for Niagara-4.
      
      We basically can do what powerpc has been able to do for a decade (via
      the "dcbz" instruction), which is use cache line clearing stores for
      bzero and memsets with a 'c' argument of zero.
      
      As long as we make the cache initializing store to each 32-byte
      subblock of the L2 cache line, it works.
      
      As with other Niagara-4 optimized routines, the key is to make sure to
      avoid any usage of the %asi register, as reads and writes to it cost
      at least 50 cycles.
      
      For the user clear cases, we don't use these new routines, we use the
      Niagara-1 variants instead.  Those have to use %asi in an unavoidable
      way.
      
      A Niagara-4 8K page clear costs just under 600 cycles.
      
      Add definitions of the MRU variants of the cache initializing store
      ASIs.  By default, cache initializing stores install the line as Least
      Recently Used.  If we know we're going to use the data immediately
      (which is true for page copies and clears) we can use the Most
      Recently Used variant, to decrease the likelihood of the lines being
      evicted before they get used.
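      In outline, the dispatch condition is simple; a purely illustrative C
      skeleton (the real routine is hand-written sparc assembly issuing the
      block-init/MRU stores, which memset() below merely stands in for):

      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      #define L2_LINE 64

      static void memset_sketch(void *dst, int c, size_t len)
      {
              uintptr_t p = (uintptr_t)dst;

              if (c == 0 && len >= L2_LINE && (p & (L2_LINE - 1)) == 0) {
                      size_t bulk = len & ~(size_t)(L2_LINE - 1);

                      /* Here the assembly issues a cache-initializing (MRU)
                       * store to each 32-byte subblock of every full line.     */
                      memset(dst, 0, bulk);
                      dst = (char *)dst + bulk;
                      len -= bulk;
              }
              memset(dst, c, len);          /* ordinary fill for the remainder  */
      }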
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f825962
  22. 29 Sep, 2012 (1 commit)
  23. 28 Sep, 2012 (1 commit)
  24. 27 Sep, 2012 (2 commits)
  25. 21 Aug, 2012 (1 commit)
  26. 27 Jun, 2012 (1 commit)
  27. 27 May, 2012 (1 commit)
  28. 25 May, 2012 (3 commits)
  29. 24 May, 2012 (1 commit)
    • sparc: Optimize strncpy_from_user() zero byte search. · 4efcac3a
      Committed by David S. Miller
      Compute a mask that will only have 0x80 in the bytes which
      had a zero in them.  The formula is:
      
      	~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)
      
      In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
      part, so we can reuse that in the above calculation.
      
      Once we have this mask, we perform divide and conquer to find the
      highest 0x80 location.
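      The mask is easy to check in plain C (a user-space sketch of the formula
      above, using 32-bit words for brevity):

      #include <stdint.h>
      #include <stdio.h>

      /* 0x80 in exactly the bytes of x that were zero, 0x00 everywhere else.  */
      static uint32_t zero_byte_mask(uint32_t x)
      {
              uint32_t or7f = x | 0x7f7f7f7f;   /* the reusable "x | 0x7f7f7f7f" part */

              return ~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | or7f);
      }

      int main(void)
      {
              uint32_t w = 0x61620064;          /* "ab\0d": byte 2 is zero            */

              printf("mask = %#010x\n", zero_byte_mask(w));   /* prints 0x00008000   */
              return 0;
      }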
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4efcac3a
  30. 23 May, 2012 (1 commit)
  31. 20 May, 2012 (2 commits)
  32. 16 May, 2012 (1 commit)
  33. 14 May, 2012 (1 commit)