1. 08 Feb, 2010 5 commits
    •
      sh: Optimise FDE/CIE lookup by using red-black trees · 858918b7
      Matt Fleming committed
      Now that the DWARF unwinder is being used to provide perf callstacks,
      unwinding speed is an issue. It is no longer used only in exceptional
      circumstances where we don't care about runtime performance, e.g. when
      panicking, so it makes sense to improve performance where possible.
      
      With this patch I saw a 42% improvement in unwind time when calling
      return_address(1). Greater improvements will be seen as the number of
      levels unwound increases as each unwind is now cheaper.
      
      Note that insertion time has doubled but that's just the price we pay
      for keeping the trees balanced. However, this is a one-time cost for
      kernel boot/module load and so the improvements in lookup time dominate
      the extra time we spend keeping the trees balanced.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      858918b7
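The lookup this commit speeds up amounts to finding the FDE whose address range contains a given PC. A minimal sketch of that range lookup in an ordered binary tree follows. The `fde_node` structure and both function names are hypothetical and are not the kernel's; ranges are assumed not to overlap, and the recolouring and rotation that keep a red-black tree balanced are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FDE record covering [start, start + len). */
struct fde_node {
    uintptr_t start;
    uintptr_t len;
    struct fde_node *left, *right;
};

/*
 * Insert a node ordered by start address. A red-black tree would
 * recolour and rotate after linking the node so the tree stays
 * balanced; that step is omitted in this sketch.
 */
static void fde_insert(struct fde_node **root, struct fde_node *n)
{
    assert(n->len > 0);
    n->left = n->right = NULL;
    while (*root)
        root = (n->start < (*root)->start) ? &(*root)->left
                                           : &(*root)->right;
    *root = n;
}

/* Return the node whose range contains pc, or NULL if none does. */
static struct fde_node *fde_lookup(struct fde_node *root, uintptr_t pc)
{
    while (root) {
        if (pc < root->start)
            root = root->left;
        else if (pc - root->start >= root->len)
            root = root->right;
        else
            return root;
    }
    return NULL;
}
```

With balancing in place, each lookup visits O(log n) nodes, which is consistent with the trade-off the message describes: slower insertion at boot or module load in exchange for cheaper lookups on every unwind.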
    •
      sh: Remove superfluous setup_frame_reg call · 1af0b2fc
      Matt Fleming committed
      There's no need to set up the frame pointer again in
      call_handle_tlbmiss. The frame pointer will already have been set up in
      handle_interrupt.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      1af0b2fc
    •
      sh: Don't continue unwinding across interrupts · 944a3438
      Matt Fleming committed
      Unfortunately, due to poor DWARF info in current toolchains, unwinding
      through interrupts cannot be done reliably. The problem is that the
      DWARF info for function epilogues is wrong.
      
      Take this standard epilogue sequence,
      
      80003cc4:       e3 6f           mov     r14,r15
      80003cc6:       26 4f           lds.l   @r15+,pr
      80003cc8:       f6 6e           mov.l   @r15+,r14
      						<---- interrupt here
      80003cca:       f6 6b           mov.l   @r15+,r11
      80003ccc:       f6 6a           mov.l   @r15+,r10
      80003cce:       f6 69           mov.l   @r15+,r9
      80003cd0:       0b 00           rts
      
      If we take an interrupt at the highlighted point, the DWARF info will
      bogusly claim that the return address can be found at some offset from
      the frame pointer, even though the frame pointer was just restored. The
      worst part is if the unwinder finds a text address at the bogus stack
      address - unwinding will continue, for a bit, until it finally comes
      across an unexpected address on the stack and blows up.
      
      The only solution is to stop unwinding once we've calculated the
      function that was executing when the interrupt occurred. This PC can be
      easily calculated from pt_regs->pc.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      944a3438
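The policy described in this commit can be illustrated with a small frame walker that records the interrupted PC and then stops. This is only a sketch: `struct frame`, its `from_interrupt` flag and `unwind_sketch()` are hypothetical stand-ins and do not reflect the real unwinder's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record used only for this illustration. */
struct frame {
    uintptr_t pc;
    int from_interrupt;        /* set when pc came from saved pt_regs->pc */
    const struct frame *prev;  /* next outer (caller) frame */
};

/*
 * Walk outwards from f, storing each pc in out[]. The frame that was
 * executing when the interrupt hit is still recorded, but nothing
 * beyond it is trusted, so the walk ends there.
 */
static size_t unwind_sketch(const struct frame *f, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (f && n < max) {
        out[n++] = f->pc;
        if (f->from_interrupt)
            break;
        f = f->prev;
    }
    return n;
}
```

The trace is shorter than it could be with correct epilogue DWARF info, but every address in it is one the unwinder can stand behind.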
    •
      sh: Setup frame pointer in handle_exception path · 1dca56f1
      Matt Fleming committed
      In order to allow the DWARF unwinder to unwind through exceptions we
      need to set up the frame pointer register (r14).
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      1dca56f1
    •
      sh: Correct the offset of the return address in ret_from_exception · 14269828
      Matt Fleming committed
      The address that ret_from_exception and ret_from_irq will return to is
      found in the stack slot for SPC, not PR. This error was causing the
      DWARF unwinder to pick up the wrong return address on the stack and then
      unwind using the unwind tables for the wrong function.
      
      While I'm here I might as well add CFI annotations for the other
      registers since they could be useful when unwinding.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      14269828
  2. 02 Feb, 2010 3 commits
  3. 30 Jan, 2010 1 commit
    •
      Split 'flush_old_exec' into two functions · 221af7f8
      Linus Torvalds committed
      'flush_old_exec()' is the point of no return when doing an execve(), and
      it is pretty badly misnamed.  It doesn't just flush the old executable
      environment, it also starts up the new one.
      
      Which is very inconvenient for things like setting up the new
      personality, because we want the new personality to affect the starting
      of the new environment, but at the same time we do _not_ want the new
      personality to take effect if flushing the old one fails.
      
      As a result, the x86-64 '32-bit' personality is actually done using this
      insane "I'm going to change the ABI, but I haven't done it yet" bit
      (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
      personality, but just the "pending" bit, so that "flush_thread()" can do
      the actual personality magic.
      
      This patch in no way changes any of that insanity, but it does split the
      'flush_old_exec()' function up into a preparatory part that can fail
      (still called flush_old_exec()), and a new part that will actually set
      up the new exec environment (setup_new_exec()).  All callers are changed
      to trivially comply with the new world order.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      221af7f8
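The value of the split is the ordering it makes possible: a step that can fail, then a change of personality, then a step that builds the new environment. The sketch below models that ordering in plain C. All names (`exec_state`, `flush_old`, `setup_new`, `exec_sketch`) are hypothetical, and the commit itself states that it does not yet change how the personality is set, so this shows the pattern rather than the patched code.

```c
#include <errno.h>

/* Hypothetical state used only to illustrate the ordering. */
struct exec_state {
    int personality;      /* personality currently in effect */
    int env_personality;  /* personality the new environment was built with */
};

/* Preparatory step: may fail, and leaves the state untouched if it does. */
static int flush_old(struct exec_state *s, int fail)
{
    (void)s;
    return fail ? -ENOMEM : 0;
}

/* Past the point of no return: builds the new environment. */
static void setup_new(struct exec_state *s)
{
    s->env_personality = s->personality;
}

static int exec_sketch(struct exec_state *s, int new_personality, int fail)
{
    int err = flush_old(s, fail);

    if (err)
        return err;                    /* old personality still in effect */
    s->personality = new_personality;  /* safe: flushing succeeded */
    setup_new(s);                      /* new environment sees it */
    return 0;
}
```

With a single combined function there is no gap between the two steps in which the caller could switch personality, which is the inconvenience the message describes.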
  4. 27 Jan, 2010 2 commits
  5. 26 Jan, 2010 1 commit
    •
      sh: Mass ctrl_in/outX to __raw_read/writeX conversion. · 9d56dd3b
      Paul Mundt committed
      The old ctrl in/out routines are non-portable and unsuitable for
      cross-platform use. While drivers/sh has already been sanitized, there
      is still quite a lot of code that is not. This converts the arch/sh/ bits
      over, which permits us to flag the routines as deprecated whilst still
      building with -Werror for the architecture code, and to ensure that
      future users are not added.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      9d56dd3b
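Raw MMIO accessors of the kind this conversion moves to are, in essence, volatile loads and stores of a fixed width with no byte swapping or barriers. The user-space sketch below shows the 32-bit case; the `sketch_` names are hypothetical, and an ordinary variable stands in for a device register.

```c
#include <stdint.h>

/* Read a 32-bit value with a single volatile load. */
static inline uint32_t sketch_raw_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Write a 32-bit value with a single volatile store (value first). */
static inline void sketch_raw_writel(uint32_t value, volatile void *addr)
{
    *(volatile uint32_t *)addr = value;
}
```

The `volatile` qualifier is what stops the compiler from caching, merging or dropping the access, which matters when the address is a hardware register rather than memory.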
  6. 21 Jan, 2010 4 commits
    •
      sh: Kill off the special uncached section and fixmap. · 2dc2f8e0
      Paul Mundt committed
      Now that cached_to_uncached works as advertized in 32-bit mode and we're
      never going to be able to map < 16MB anyways, there's no need for the
      special uncached section. Kill it off.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      2dc2f8e0
    •
      sh: Track the uncached mapping size. · 3125ee72
      Paul Mundt committed
      This provides a variable for tracking the uncached mapping size, and uses
      it for pretty printing the uncached lowmem range. Beyond this, we'll also
      be building on top of this for figuring out from where the remainder of
      P2 becomes usable when constructing unrelated mappings.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      3125ee72
    •
      sh: Rework P2 to only include kernel text. · 2023b843
      Paul Mundt committed
      This effectively neutralizes P2 by getting rid of P1 identity mapping
      for all available memory and instead only establishes a single unbuffered
      PMB entry (16MB -- the smallest available) that covers the kernel.
      
      As using segmentation for abusing caching attributes in drivers is no
      longer supported (and there are no drivers that can be enabled in 32-bit
      mode that do this), this covers all of the uncached access needs of
      the kernel itself.
      
      Drivers and their ilk need to specify their caching attributes when
      remapping through page tables, as usual.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      2023b843
    •
      sh: initial PMB mapping iteration by helper macro. · 77c2019f
      Paul Mundt committed
      All of the cached/uncached mapping setup is duplicated for each size, and
      also misses out on the 16MB case. Rather than duplicating the same iter
      code for that, we just consolidate it into a helper macro that builds an
      iter for each size. The 16MB case is then trivially bolted on at the end.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      77c2019f
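The consolidation described here is a familiar macro pattern: one parameterised loop body, expanded once per mapping size. The C sketch below counts how many entries of each size would be consumed, largest first. The macro and function names are hypothetical, the size list of 512MB, 128MB, 64MB and 16MB is an assumption, and the real change may be written quite differently (for example in assembly).

```c
#include <stdint.h>

#define MB(x) ((uint32_t)(x) << 20)

/*
 * Hypothetical helper: expands to one loop that consumes entries of
 * size_mb megabytes for as long as they still fit in `remaining`.
 */
#define MAP_ITER_BY_SIZE(size_mb)                 \
    while (remaining >= MB(size_mb)) {            \
        remaining -= MB(size_mb);                 \
        entries++;                                \
    }

/* Count the entries needed to cover `remaining` bytes, largest size first. */
static unsigned int map_entries_needed(uint32_t remaining)
{
    unsigned int entries = 0;

    MAP_ITER_BY_SIZE(512)
    MAP_ITER_BY_SIZE(128)
    MAP_ITER_BY_SIZE(64)
    MAP_ITER_BY_SIZE(16)   /* the extra size is one more line */

    return entries;
}
```

Adding a size becomes a one-line change, which matches the message's point that the 16MB case can be bolted on at the end.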
  7. 20 Jan, 2010 2 commits
  8. 19 Jan, 2010 7 commits
  9. 18 Jan, 2010 2 commits
    •
      sh: Need IRQs enabled for init_fpu(). · 4291b730
      Paul Mundt committed
      This tosses in a local_irq_enable()/disable() pair around the init_fpu()
      callsite in the FPU state restore exception handler. Fixes up a slab BUG
      triggered by making a slab cache allocation that can sleep whilst
      irqs_disabled(). This follows the behaviour undertaken by the x86
      implementation.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      4291b730
    •
      sh: Setup early PMB mappings. · 3d467676
      Matt Fleming committed
      More and more boards are going to start shipping that boot with the MMU
      in 32BIT mode by default. Previously we relied on the bootloader to
      set up PMB mappings for use by the kernel, but we also need to cater for
      boards whose bootloaders don't set them up.
      
      If CONFIG_PMB_LEGACY is not enabled we have full control over our PMB
      mappings and can compress our address space. Usually, the distance
      between the cached and uncached mappings of RAM is 512MB; however,
      we can compress the distance to be the amount of RAM on the board.
      
      pmb_init() now becomes much simpler. It no longer has to calculate any
      mappings, it just has to synchronise the software PMB table with the
      hardware.
      
      Tested on SDK7786 and SH7785LCR.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      3d467676
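The address-space compression mentioned in this commit comes down to which offset separates the cached and uncached views of RAM. The sketch below works through that arithmetic. The function names and the `legacy` parameter are hypothetical, and the example addresses are illustrative values chosen to be 512MB apart.

```c
#include <stdint.h>

/* Conventional distance between the cached and uncached views: 512MB. */
#define LEGACY_UNCACHED_OFFSET 0x20000000u

/*
 * Hypothetical helper: with legacy mappings the offset is fixed at
 * 512MB; otherwise it can shrink to the amount of RAM on the board.
 */
static uint32_t uncached_offset(int legacy, uint32_t ram_size)
{
    return legacy ? LEGACY_UNCACHED_OFFSET : ram_size;
}

/* Translate a cached address to its uncached alias. */
static uint32_t to_uncached(uint32_t cached, int legacy, uint32_t ram_size)
{
    return cached + uncached_offset(legacy, ram_size);
}
```

For a board with 128MB of RAM, for instance, the uncached alias in this model sits 128MB above the cached address instead of 512MB, leaving the rest of the range free for other mappings.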
  10. 16 Jan, 2010 1 commit
    •
      sh: Add fixed ioremap support · 4d35b93a
      Matt Fleming committed
      Some devices need to be ioremap'd and accessed very early in the boot
      process. It is not possible to use the standard ioremap() function in
      this case because that requires kmalloc()'ing some virtual address space
      and kmalloc() may not be available so early in boot.
      
      This patch provides fixmap mappings that allow physical address ranges
      to be remapped into the kernel address space during the early boot
      stages.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      4d35b93a
  11. 15 Jan, 2010 1 commit
  12. 13 Jan, 2010 4 commits
    •
      sh: Fix up L2 cache comment typo. · 88f73d22
      Paul Mundt committed
      Valid sizes include 256kB, not 258kB.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      88f73d22
    •
      sh: fixed PMB mode refactoring. · a0ab3668
      Paul Mundt committed
      This introduces some much overdue chainsawing of the fixed PMB support.
      Fixed PMB was introduced initially to work around the fact that dynamic
      PMB mode was relatively broken, though they were never intended to
      converge. The main areas where there are differences are whether the
      system is booted in 29-bit mode or 32-bit mode, and whether legacy
      mappings are to be preserved. Any system booting in true 32-bit mode will
      not care about legacy mappings, so these are roughly decoupled.
      
      Regardless of the entry point, PMB and 32BIT are directly related as far
      as the kernel is concerned, so we also switch back to having one select
      the other.
      
      With legacy mappings iterated through and applied in the initialization
      path, it's now possible to finally merge the two implementations and
      permit dynamic remapping on top of remaining entries regardless of
      whether boot mappings are crafted by hand or inherited from the boot
      loader.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      a0ab3668
    •
      sh: PVR detection for 2nd cut SH7786. · 7f33306e
      Matt Fleming committed
      The mass-produced cuts use an updated PVR value; add it to the list.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      7f33306e
    •
      sh: Move over to dynamically allocated FPU context. · 0ea820cf
      Paul Mundt committed
      This follows the x86 xstate changes and implements a task_xstate slab
      cache that is dynamically sized to match one of hard FP/soft FP/FPU-less.
      
      This also tidies up and consolidates some of the SH-2A/SH-4 FPU
      fragmentation. FPU state restorers are now commonly defined, with the
      init_fpu()/fpu_init() mess reworked to follow the x86 convention.
      The fpu_init() register initialization has been replaced by xstate setup
      followed by writing out to hardware via the standard restore path.
      
      As init_fpu() now performs a slab allocation, a secondary lighter-weight
      restorer is also introduced for the context switch.
      
      In the future the DSP state will be rolled in here, too.
      
      More work remains for math emulation and the SH-5 FPU, which presently
      uses its own special (UP-only) interfaces.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      0ea820cf
  13. 12 Jan, 2010 7 commits