1. 16 Aug, 2017 1 commit
  2. 11 Aug, 2017 1 commit
  3. 05 Aug, 2017 2 commits
  4. 10 May, 2017 1 commit
  5. 28 Mar, 2017 1 commit
    • arch/sparc: Avoid DCTI Couples · 0ae2d26f
      Babu Moger committed
      Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated.
      Also address the "Programming Note" for optimal performance.
      
      Here is the complete text from Oracle SPARC Architecture Specs.
      
      6.3.4.7 DCTI Couples
      "A delayed control transfer instruction (DCTI) in the delay slot of
      another DCTI is referred to as a “DCTI couple”. The use of DCTI couples
      is deprecated in the Oracle SPARC Architecture; no new software should
      place a DCTI in the delay slot of another DCTI, because on future Oracle
      SPARC Architecture implementations DCTI couples may execute either
      slowly or differently than the programmer assumes it will.
      
      SPARC V8 and SPARC V9 Compatibility Note
      The SPARC V8 architecture left behavior undefined for a DCTI couple. The
      SPARC V9 architecture defined behavior in that case, but as of
      UltraSPARC Architecture 2005, use of DCTI couples was deprecated.
      Software should not expect high performance from DCTI couples, and
      performance of DCTI couples should be expected to decline further in
      future processors.
      
      Programming Note
      As noted in TABLE 6-5 on page 115, an annulled branch-always
      (branch-always with a = 1) instruction is not architecturally a DCTI.
      However, since not all implementations make that distinction, for
      optimal performance, a DCTI should not be placed in the instruction word
      immediately following an annulled branch-always instruction (BA,A or
      BPA,A)."
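The two rules quoted above can be sketched as a small checker over a straight-line instruction listing. This is only an illustration: the `insn_kind` classes and the `first_dcti_hazard` function are hypothetical names, not part of the kernel or of any assembler, and the model assumes that the instruction immediately following a DCTI in the listing is its delay slot.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction classes for illustration only. */
enum insn_kind {
    INSN_OTHER,    /* any non-control-transfer instruction, e.g. nop */
    INSN_DCTI,     /* delayed control transfer: branch, call, jmpl, ... */
    INSN_BA_ANNUL  /* annulled branch-always (BA,A or BPA,A) */
};

/*
 * Return the index of the first DCTI that sits either in the delay slot
 * of another DCTI (a "DCTI couple") or directly after an annulled
 * branch-always (the case in the Programming Note). Return -1 if the
 * listing contains neither pattern.
 */
static int first_dcti_hazard(const enum insn_kind *code, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (code[i] != INSN_DCTI)
            continue;
        if (code[i - 1] == INSN_DCTI || code[i - 1] == INSN_BA_ANNUL)
            return (int)i;
    }
    return -1;
}
```

A listing such as `{INSN_DCTI, INSN_OTHER, INSN_DCTI, INSN_OTHER}` passes, because each branch has a non-branch instruction in its delay slot.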
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 25 Oct, 2016 3 commits
  7. 08 Aug, 2016 1 commit
  8. 28 Apr, 2016 1 commit
    • sparc64: Fix bootup regressions on some Kconfig combinations. · 49fa5230
      David S. Miller committed
      The system call tracing bug fix mentioned in the Fixes tag
      below increased the amount of assembler code in the sequence
      of assembler files included by head_64.S.
      
      This caused the total set of code to exceed 0x4000 bytes in
      size, which overflows the expression in head_64.S that works
      to place swapper_tsb at address 0x408000.
      
      When this is violated, the TSB is not properly aligned, and
      neither is the trap table.  Together, these result in failed
      boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
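The size budget behind both steps can be sketched in C. This only mirrors the arithmetic described in the message; the actual check is a linker script assertion, and the helper names below are hypothetical. The 4-byte saving per branch relies on every SPARC instruction being 4 bytes wide, so replacing a `ba` plus `nop` pair with a single `ba,a` drops one instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size budget named in the commit message: the early assembler code
 * must not exceed 0x4000 bytes, or swapper_tsb no longer lands at
 * 0x408000. */
#define EARLY_CODE_LIMIT 0x4000u

/* Hypothetical helper: true while the early code fits the budget. */
static bool early_code_fits(uint32_t code_bytes)
{
    return code_bytes <= EARLY_CODE_LIMIT;
}

/* Hypothetical helper: bytes recovered by rewriting `pairs` occurrences
 * of "ba; nop" (two 4-byte instructions) as "ba,a" (one instruction). */
static uint32_t bytes_saved(uint32_t pairs)
{
    return pairs * 4u;
}
```

With these helpers, code that is one instruction over the limit (0x4004 bytes) fails the check, and converting a single pair is enough to bring it back to 0x4000.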
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 22 Apr, 2016 1 commit
  10. 09 Feb, 2016 1 commit
  11. 25 Dec, 2015 1 commit
    • sparc64: fix FP corruption in user copy functions · a7c5724b
      Rob Gardner committed
      Short story: Exception handlers used by some copy_to_user() and
      copy_from_user() functions do not diligently clean up floating point
      register usage, and this can result in a user process seeing invalid
      values in floating point registers. This sometimes makes the process
      fail.
      
      Long story: Several cpu-specific (NG4, NG2, U1, U3) memcpy functions
      use floating point registers and VIS alignaddr/faligndata to
      accelerate data copying when source and dest addresses don't align
      well. Linux uses a lazy scheme for saving floating point registers; it
      is not done upon entering the kernel since it's a very expensive
      operation. Rather, it is done only when needed. If the kernel ends up
      not using FP regs during the course of some trap or system call, then
      it can return to user space without saving or restoring them.
      
      The various memcpy functions begin their FP code with VISEntry (or a
      variation thereof), which saves the FP regs. They conclude their FP
      code with VISExit (or a variation) which essentially marks the FP regs
      "clean", i.e., they contain no unsaved values. fprs.FPRS_FEF is turned
      off so that a lazy restore will be triggered when/if the user process
      accesses floating point regs again.
      
      The bug is that the user copy variants of memcpy, copy_from_user() and
      copy_to_user(), employ an exception handling mechanism to detect faults
      when accessing user space addresses, and when this handler is invoked,
      an immediate return from the function is forced, and VISExit is not
      executed, thus leaving the fprs register in an indeterminate state,
      but often with fprs.FPRS_FEF set and one or more dirty bits. This
      results in a return to user space with invalid values in the FP regs,
      and since fprs.FPRS_FEF is on, no lazy restore occurs.
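The failure mode can be illustrated with a toy model of the fprs register. This is a sketch under stated assumptions, not the kernel's code: `struct fp_model`, `vis_entry`, `vis_exit`, `user_copy`, and `lazy_restore_will_trigger` are hypothetical stand-ins for the real assembler macros and trap handling. Only the bit layout (DL, DU, FEF) follows the SPARC V9 definition of FPRS.

```c
#include <stdbool.h>

/* FPRS bit layout as defined by SPARC V9. */
#define FPRS_DL  0x1u /* lower FP registers dirty */
#define FPRS_DU  0x2u /* upper FP registers dirty */
#define FPRS_FEF 0x4u /* FPU enabled */

/* Hypothetical model of the per-CPU fprs state. */
struct fp_model {
    unsigned int fprs;
};

/* Stand-in for VISEntry: user FP state is saved, FPU is enabled. */
static void vis_entry(struct fp_model *m) { m->fprs = FPRS_FEF; }

/* Stand-in for VISExit: registers marked clean, FEF turned off. */
static void vis_exit(struct fp_model *m) { m->fprs = 0; }

/* A lazy restore only happens if FEF is off when user code touches FP. */
static bool lazy_restore_will_trigger(const struct fp_model *m)
{
    return (m->fprs & FPRS_FEF) == 0;
}

/*
 * Model of a user copy. If `fault` is set, the exception handler forces
 * an early return; `fixed_handler` selects whether that handler runs the
 * VISExit step first (the fix) or skips it (the bug).
 * Returns true when the copy completed.
 */
static bool user_copy(struct fp_model *m, bool fault, bool fixed_handler)
{
    vis_entry(m);
    m->fprs |= FPRS_DL; /* FP registers used as copy buffers are dirty */
    if (fault) {
        if (fixed_handler)
            vis_exit(m);
        return false;
    }
    vis_exit(m);
    return true;
}
```

In this model, a faulting copy with the old handler leaves FEF set, so `lazy_restore_will_trigger` is false and the process would resume with the copy routine's values in its FP registers.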
      
      This bug affects copy_to_user() and copy_from_user() for NG4, NG2,
      U3, and U1. All are fixed by using a new exception handler for those
      loads and stores that are done during the time between VISEntry and
      VISExit.
      
      n.b. In NG4memcpy, the problematic code can be triggered by a copy
      size greater than 128 bytes and an unaligned source address.  This bug
      is known to be the cause of random user process memory corruptions
      while perf is running with the callgraph option (i.e., perf record -g).
      This occurs because perf uses copy_from_user() to read user stacks,
      and may fault when it follows a stack frame pointer off to an
      invalid page. Validation checks on the stack address just obscure
      the underlying problem.
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 25 Oct, 2014 1 commit
    • sparc64: Fix register corruption in top-most kernel stack frame during boot. · ef3e035c
      David S. Miller committed
      Meelis Roos reported that kernels built with gcc-4.9 do not boot. We
      eventually narrowed this down to only impacting machines using
      UltraSPARC-III and derivative cpus.
      
      The crash happens right when the first user process is spawned:
      
      [   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      [   54.451346]
      [   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96
      [   54.666431] Call Trace:
      [   54.698453]  [0000000000762f8c] panic+0xb0/0x224
      [   54.759071]  [000000000045cf68] do_exit+0x948/0x960
      [   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
      [   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
      [   54.978662] Press Stop-A (L1-A) to return to the boot prom
      [   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      
      Further investigation showed that compiling only per_cpu_patch() with
      an older compiler fixes the boot.
      
      Detailed analysis showed that the function is not being miscompiled by
      gcc-4.9, but it is using a different register allocation ordering.
      
      With the gcc-4.9 compiled function, something during the code patching
      causes some of the %i* input registers to get corrupted.  Perhaps
      we have a TLB miss path into the firmware that is deep enough to
      cause a register window spill and subsequent restore when we get
      back from the TLB miss trap.
      
      Let's plug this up by doing two things:
      
      1) Stop using the firmware stack for client interface calls into
         the firmware.  Just use the kernel's stack.
      
      2) As soon as we can, call into a new function "start_early_boot()"
         to put a one-register-window buffer between the firmware's
         deepest stack frame and the top-most initial kernel one.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 10 Sep, 2014 1 commit
  14. 04 May, 2014 1 commit
  15. 11 Mar, 2013 1 commit
  16. 06 Oct, 2012 1 commit
    • sparc64: Niagara-4 bzero/memset, plus use MRU stores in page copy. · 9f825962
      David S. Miller committed
      This adds optimized memset/bzero/page-clear routines for Niagara-4.
      
      We basically can do what powerpc has been able to do for a decade (via
      the "dcbz" instruction), which is use cache line clearing stores for
      bzero and memsets with a 'c' argument of zero.
      
      As long as we make the cache initializing store to each 32-byte
      subblock of the L2 cache line, it works.
      
      As with other Niagara-4 optimized routines, the key is to make sure to
      avoid any usage of the %asi register, as reads and writes to it cost
      at least 50 cycles.
      
      For the user clear cases, we don't use these new routines, we use the
      Niagara-1 variants instead.  Those have to use %asi in an unavoidable
      way.
      
      A Niagara-4 8K page clear costs just under 600 cycles.
      
      Add definitions of the MRU variants of the cache initializing store
      ASIs.  By default, cache initializing stores install the line as Least
      Recently Used.  If we know we're going to use the data immediately
      (which is true for page copies and clears) we can use the Most
      Recently Used variant, to decrease the likelihood of the lines being
      evicted before they get used.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 27 Sep, 2012 1 commit
  18. 20 May, 2012 1 commit
  19. 17 Sep, 2011 1 commit
  20. 03 Aug, 2011 1 commit
  21. 28 Jul, 2011 1 commit
    • sparc: Detect and handle UltraSPARC-T3 cpu types. · 4ba991d3
      David S. Miller committed
      The cpu compatible string we look for is "SPARC-T3".
      
      As far as memset/memcpy optimizations go, we treat this chip the same
      as Niagara-T2/T2+.  Use cache initializing stores for memset, and use
      prefetch, FPU block loads, cache initializing stores, and block stores
      for copies.
      
      We use the Niagara-T2 perf support, since T3 is a close relative in
      this regard.  Later we'll add support for the new events T3 can
      report, plus enable T3's new "sample" mode.
      
      For now I haven't added any new ELF hwcap flags.  We probably need
      to add a couple, for example:
      
      T2 and T3 both support the population count instruction in hardware.
      
      T3 supports VIS3 instructions, including support (finally) for
      partitioned shift.  One can also now move directly between float
      and integer registers.
      
      T3 supports instructions meant to help with Galois Field and other HPC
      calculations, such as XOR multiply.  Also there are "OP and negate"
      instructions, for example "fnmul" which is multiply-and-negate.
      
      T3 recognizes the transactional memory opcodes, but since
      transactional memory isn't supported: 1) 'commit' behaves as a NOP,
      2) 'chkpt' always branches, 3) 'rdcps' returns all zeros, and
      4) 'wrcps' behaves as a NOP.
      
      So we'll need about 3 new elf capability flags in the end to represent
      all of these things.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 31 Mar, 2011 1 commit
  23. 16 Jun, 2009 1 commit
  24. 28 Apr, 2009 1 commit
  25. 30 Mar, 2009 1 commit
  26. 09 Feb, 2009 1 commit
    • sparc64: Kill .fixup section bloat. · 40bdac7d
      David S. Miller committed
      This is an implementation of a suggestion made by Chris Torek:
      --------------------
      Something else I noticed in passing: the EX and EX_LD/EX_ST macros
      scattered throughout the various .S files make a fair bit of .fixup
      code, all of which does the same thing.  At the cost of one symbol
      in copy_in_user.S, you could just have one common two-instruction
      retl-and-mov-1 fixup that they all share.
      --------------------
      
      The following is with a defconfig build:
      
         text	   data	    bss	    dec	    hex	filename
      3972767	 344024	 584449	4901240	 4ac978	vmlinux.orig
      3968887	 344024	 584449	4897360	 4aba50	vmlinux
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 05 Dec, 2008 2 commits
  28. 01 Sep, 2008 1 commit
  29. 28 Apr, 2008 1 commit
  30. 22 Mar, 2008 1 commit
    • [SPARC64]: Remove most limitations to kernel image size. · 64658743
      David S. Miller committed
      Currently kernel images are limited to 8MB in size, and this causes
      problems especially when enabling features that take up a lot of
      kernel image space such as lockdep.
      
      The code now will align the kernel image size up to 4MB and map that
      many locked TLB entries.  So, the only practical limitation is the
      number of available locked TLB entries which is 16 on Cheetah and 64
      on pre-Cheetah sparc64 cpus.  Niagara cpus don't actually have hw
      locked TLB entry support.  Rather, the hypervisor transparently
      provides support for "locked" TLB entries since it runs with physical
      addressing and does the initial TLB miss processing.
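The sizing rule above amounts to rounding the image size up to a 4MB boundary and counting 4MB chunks. A minimal sketch follows, assuming one locked TLB entry per 4MB chunk; the macro and function names are hypothetical, and the entry limits passed in would be the 16 (Cheetah) or 64 (pre-Cheetah) figures from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mapping granule described in the message: 4MB. */
#define KERNEL_MAP_UNIT (4ULL << 20)

/* Round the kernel image size up to the next 4MB boundary. */
static uint64_t align_image_size(uint64_t bytes)
{
    return (bytes + KERNEL_MAP_UNIT - 1) & ~(KERNEL_MAP_UNIT - 1);
}

/* Number of locked TLB entries needed, assuming one entry per 4MB. */
static unsigned int locked_entries_needed(uint64_t bytes)
{
    return (unsigned int)(align_image_size(bytes) / KERNEL_MAP_UNIT);
}

/* True if the image can be mapped with `available` locked entries. */
static bool image_fits(uint64_t bytes, unsigned int available)
{
    return locked_entries_needed(bytes) <= available;
}
```

Under this model a 9MB image rounds up to 12MB and needs three entries, while anything above 64MB would not fit in 16 entries.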
      
      Fully utilizing this change requires some help from SILO, a patch for
      which will be submitted to the maintainer.  Essentially, SILO
      currently maps only up to 8MB for the kernel image, and that needs to
      be increased.
      
      Note that neither this patch nor the SILO bits will help with network
      booting.  The openfirmware code will only map up to a certain amount
      of kernel image during a network boot and there isn't much we can do
      about that other than to implement a layered network booting
      facility.  Solaris has this, and calls it "wanboot" and we may
      implement something similar at some point.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 07 Feb, 2008 1 commit
  32. 17 Sep, 2007 1 commit
    • [SPARC64]: Fix lockdep, particularly on SMP. · 301feb65
      David S. Miller committed
      As noted by Al Viro, when we try to call prom_set_trap_table()
      in the SMP trampoline code we try to take the PROM call spinlock
      which doesn't work because the current thread pointer isn't
      valid yet and lockdep depends upon that being correct.
      
      Furthermore, we cannot set the current thread pointer register
      because it can't be properly dereferenced until we return from
      prom_set_trap_table().  Kernel TLB misses only work after that
      call.
      
      So do the PROM call to set the trap table directly instead of
      going through the OBP library C code, and thus avoid the lock
      altogether.
      
      These calls are guaranteed to be serialized fully.
      
      Since there are now no calls to the prom_set_trap_table{_sun4v}()
      library functions, they can be deleted.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 16 Aug, 2007 2 commits
  34. 09 Aug, 2007 1 commit
  35. 25 Jul, 2007 1 commit
    • [SPARC64]: Mark most of initial bootup asm as .text.init.ref_ok · 1966287d
      David S. Miller committed
      We can't mark the whole thing init because there are dependencies
      in bootloaders that assume that _start, or whatever the image
      entry value, is 2 instructions before the "HdrS" signature.
      
      In fact, TILO assumes this entry is always at 0x4000, yikes!
      
      Also, right after the bootloader info area there are OBP strings and
      values that get used later in the boot process, and those are not all
      provably .init yet.
      Signed-off-by: David S. Miller <davem@davemloft.net>