1. 10 May, 2013: 4 commits
  2. 07 May, 2013: 6 commits
    • ARC: [mm] Lazy D-cache flush (non aliasing VIPT) · eacd0e95
      Vineet Gupta committed
      flush_dcache_page() is the MM hook that ensures a page has consistent
      views between kernel and userspace. It is therefore called when:
      
      * the kernel writes to a page which could later be mapped into
        userspace (so the kernel mapping needs to be flushed and invalidated)
      * the kernel is about to read from a page with possible userspace
        mappings (so the userspace mappings need to be made coherent with the
        kernel one)
      
      However, for a non-aliasing VIPT dcache, any userspace mapping will always
      be congruent to the kernel mapping. Thus the d-cache need not be flushed
      at all (or the flush can be delayed indefinitely).
      
      The only case where it does need to be flushed is when mapping code pages.
      Since the icache doesn't snoop the dcache, the dirty dcache lines need to
      be written back to memory and the corresponding icache lines invalidated,
      so that an icache line fetch gets the right data.
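      A minimal user-space model of the lazy scheme described above, assuming a
      per-page "D-cache clean" flag that flush_dcache_page() clears and the
      mapping path checks. model_page, dc_clean and the model_* functions are
      hypothetical names for illustration, not the ARC kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of lazy D-cache flushing on a non-aliasing VIPT cache. */
struct model_page {
    bool dc_clean;   /* true: no kernel-written dirty lines pending */
};

static int dcache_flushes;   /* dcache write-back operations performed */
static int icache_syncs;     /* icache invalidates performed */

/* Kernel wrote to the page. With a non-aliasing cache any user mapping hits
 * the same cache lines, so only record that the page is now dirty. */
static void model_flush_dcache_page(struct model_page *pg)
{
    pg->dc_clean = false;
}

/* Page is being mapped into userspace. Only executable mappings need the
 * deferred work, because the icache does not snoop the dcache. */
static void model_update_mmu_cache(struct model_page *pg, bool exec)
{
    if (exec && !pg->dc_clean) {
        dcache_flushes++;    /* write dirty dcache lines back to memory */
        icache_syncs++;      /* invalidate stale icache lines */
        pg->dc_clean = true; /* later exec mappings of this page are free */
    }
}
```

      Mapping a data page never triggers a flush; mapping a code page flushes
      once, and further mappings of the same unmodified page cost nothing.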
      
      This gives decent gains on the LMBench fork/exec/sh and file I/O micro-benchmarks.
      
      (1) FPGA @ 80 MHz
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc6-a Linux 3.9.0-r   80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
      3.9-rc6-b Linux 3.9.0-r   80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
      3.9-rc7-c Linux 3.9.0-r   80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
                                                                      ^^^^ ^^^^ ^^^
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc6-a Linux 3.9.0-r  317.8  204.2 1122.3  375.1 3522.0 4.288     20.7 126.8
      3.9-rc6-b Linux 3.9.0-r  298.7  223.0 1141.6  367.8 3531.0 4.866     20.9 126.4
      3.9-rc7-c Linux 3.9.0-r  278.4  179.2  862.1  339.3 3705.0 3.223     20.3 126.6
                               ^^^^^  ^^^^^  ^^^^^  ^^^^
      
      (2) Customer Silicon @ 500 MHz (166 MHz mem)
      
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      abilis-ba Linux 3.9.0-r  497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
      abilis-ca Linux 3.9.0-r  497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
                                                                      ^^^^ ^^^^ ^^^
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: [mm] consolidate icache/dcache sync code · 94bad1af
      Vineet Gupta committed
      Now that the same helper is used for all icache invalidates (i.e. an
      exact, vaddr+paddr based line invalidate), consolidate the open-coded
      calls into one place.
      
      Also rename flush_icache_range_vaddr => __sync_icache_dcache
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: [mm] optimise icache flush for user mappings · 24603fdd
      Vineet Gupta committed
      The ARC icache doesn't snoop the dcache, so executable pages need to be
      made coherent in flush_icache_page() before being mapped into userspace.
      
      However, the ARC700 CDU (hardware cache flush module) requires both the
      vaddr (index in cache) and the paddr (tag match) to correctly identify a
      line in the VIPT cache. A typical ARC700 SoC has an aliasing icache, so
      the paddr-only flush_icache_page() API couldn't be implemented
      efficiently. It had to loop through all possible alias indexes and
      perform the invalidate operation at each one (of course the cache op
      would only succeed at the index(es) where the tag matches, typically only
      one, but the cost of visiting all the cache bins had to be paid
      nevertheless).
      
      It turns out, however, that the vaddr (along with the paddr) is available
      in update_mmu_cache(), which therefore better suits the ARC icache flush
      semantics. With both vaddr+paddr, exactly one flush operation per line
      is done.
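      To make the cost difference concrete, the sketch below counts
      line-invalidate operations needed for one page with and without the
      vaddr. The cache geometry (32 KB, 2-way, 64-byte lines, 8 KB pages) and
      all names are hypothetical and do not describe any particular ARC700 SoC.

```c
#include <assert.h>

/* Hypothetical icache geometry, all sizes in bytes. */
struct icache_geom {
    unsigned size;   /* total cache size */
    unsigned ways;   /* associativity */
    unsigned line;   /* cache line size */
    unsigned page;   /* MMU page size */
};

/* Number of cache "colours" one physical page can occupy: the index bits
 * above the page offset come from the virtual address, so a way larger than
 * a page makes the cache aliasing. */
static unsigned num_aliases(const struct icache_geom *g)
{
    unsigned way_size = g->size / g->ways;
    return way_size > g->page ? way_size / g->page : 1;
}

/* Line-invalidate operations needed to flush one page. */
static unsigned flush_ops(const struct icache_geom *g, int have_vaddr)
{
    unsigned lines = g->page / g->line;
    /* paddr only: the right index is unknown, so every alias is visited;
     * vaddr + paddr: index and tag are both known, one op per line. */
    return have_vaddr ? lines : lines * num_aliases(g);
}
```

      With two aliases, the paddr-only flush visits twice as many cache bins
      as the vaddr+paddr flush; the ratio grows with the number of aliases.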
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: [mm] optimize needless full mm TLB flush on munmap · 8d56bec2
      Vineet Gupta committed
      munmap ends up calling tlb_flush(), which for ARC was flushing the entire
      TLB unconditionally (by moving the MMU to a new ASID):
      
      do_munmap
        unmap_region
          unmap_vmas
            unmap_single_vma
               unmap_page_range
                  tlb_start_vma
                  zap_pud_range
                  tlb_end_vma()
        tlb_finish_mmu
          tlb_flush()  ---> unconditional flush_tlb_mm()
      
      So even a single-page munmap, a frequent operation when the uClibc dynamic
      linker (ldso) is loading dependent shared libraries, would move the ASID
      multiple times, needlessly invalidating the pre-faulted TLB entries (and
      increasing the rate of ASID wraparound + full TLB flush).
      
      This is now optimised so that the full flush is only done if
      tlb->full_mm is set (i.e. for the exit/execve cases). For those cases,
      flush_tlb_mm() is already optimised to be a no-op for mm->mm_users == 0.
      
      So essentially there are no more full mm flushes, except for fork, which
      needs one anyhow to properly COW the parent address space.
      
      munmap now needs to do a TLB range flush, which is implemented in
      tlb_end_vma().
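      The resulting policy can be sketched as a small model. struct
      model_mmu_gather, the counters and the model_* functions are hypothetical
      stand-ins for the kernel's mmu_gather machinery, and the
      mm_users == 0 shortcut in flush_tlb_mm() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-unmap flush policy. */
struct model_mmu_gather {
    bool full_mm;             /* set for exit/execve teardown */
    unsigned long start, end; /* range covered by the current VMA */
};

static unsigned asid_moves;     /* full-mm flushes (MMU moved to a new ASID) */
static unsigned range_flushes;  /* per-range TLB flushes */

static void model_flush_tlb_mm(void) { asid_moves++; }

static void model_flush_tlb_range(unsigned long s, unsigned long e)
{
    (void)s;
    (void)e;
    range_flushes++;
}

/* Per-VMA hook: flush only the unmapped range unless the whole mm goes. */
static void model_tlb_end_vma(struct model_mmu_gather *tlb)
{
    if (!tlb->full_mm)
        model_flush_tlb_range(tlb->start, tlb->end);
}

/* Reached from tlb_finish_mmu(): previously unconditional, now only for
 * full-mm teardown. */
static void model_tlb_flush(struct model_mmu_gather *tlb)
{
    if (tlb->full_mm)
        model_flush_tlb_mm();
}
```

      In this model a single-page munmap costs one range flush and no ASID
      move; only full-mm teardown still goes through the full flush.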
      
      Results
      -------
      1. The ASID now consistently moves by 4 during a simple ls (as opposed to
         5 or 7 before).
      
      2. The LMBench micro-benchmarks also show improvements:
      
      Basic system parameters
      ------------------------------------------------------------------------------
      Host                 OS Description              Mhz  tlb  cache  mem scal
                                                           pages line   par load
                                                                 bytes
      --------- ------------- ----------------------- ---- ----- ----- ------ ----
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
      3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590 20.1 126.6
      3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910 18.8 110.4
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: [TB10x] Add support for TB10x platform · 072eb693
      Christian Ruppert committed
      Infrastructure required to make the Linux kernel compile and boot on the
      Abilis Systems TB10x series of SOCs based on ARC700 CPUs:
        - Kmake related files (Kconfig, Makefile, tb10x_defconfig)
        - TB10x platform initialisation
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Prepare interrupt code for external controllers · a37cdacc
      Christian Ruppert committed
      This patch adds some room for CPU-external interrupt controllers in the
      Linux interrupt space. Until now, only the 32 CPU-internal interrupt lines
      were supported, which did not allow for external interrupt controllers
      such as GPIO modules.
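      One simple way to picture "room for external controllers" is a Linux IRQ
      number space in which 0..31 stay with the CPU-internal lines and higher
      numbers are handed out to external controllers. The linear allocator,
      MODEL_NR_IRQS = 128 and all model_* names below are assumptions for
      illustration, not how the kernel actually assigns these numbers.

```c
#include <assert.h>

/* Hypothetical layout: 0..31 are the CPU-internal interrupt lines. */
#define MODEL_NR_CPU_IRQS 32
#define MODEL_NR_IRQS     128   /* assumed total size of the IRQ space */

static int next_free = MODEL_NR_CPU_IRQS;

/* Reserve a contiguous block of Linux IRQ numbers for an external
 * controller (e.g. a GPIO module). Returns the first number of the block,
 * or -1 if the request is invalid or the space is exhausted. */
static int model_alloc_irq_range(int count)
{
    int base;

    if (count <= 0 || next_free + count > MODEL_NR_IRQS)
        return -1;
    base = next_free;
    next_free += count;
    return base;
}
```

      Without the extra room (MODEL_NR_IRQS equal to 32), every request would
      fail, which is the limitation the patch removes.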
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  3. 09 April, 2013: 1 commit
    • ARC: Add implicit compiler barrier to raw_local_irq* functions · 79e5f05e
      Christian Ruppert committed
      The ARC irqsave/restore macros were missing a compiler barrier. As a
      result, a value loaded in an irq-enabled region could be reused, stale,
      in the irq-safe region despite having been changed in between, because
      the register holding it was still live.
      
      The problem manifested as random crashes in timer code when stress
      testing ARCLinux (3.9-rc3) on a !SMP && !PREEMPT_COUNT configuration.
      
      Here's the exact sequence which caused this:
       (0) tv1[x] <----> t1 <---> t2
       (1) mod_timer(t1) is interrupted after it calls timer_pending()
       (2) mod_timer(t2) completes
       (3) mod_timer(t1) resumes but messes up the list
       (4) __run_timers() uses a bogus timer_list entry / crashes in
           timer->function
      
      Essentially mod_timer() was racing against itself: while the spinlock
      serialized the tv1[] timer linked list, timer_pending(), called outside
      the spinlock, cached a timer linked-list element in a register.
      With low register pressure (and a deep register file), and with no
      barrier in raw_local_irqsave() or in preempt_disable() (the !PREEMPT_COUNT
      version), there was nothing to force gcc to reload across the spinlock.
      The stale value in the register was then used for linked-list
      manipulation, resulting in corruption.
      
      ARCompact disassembly showing the culprit generated code:
      
      mod_timer:
          push_s blink
          mov_s r13,r0        # timer, timer
      ..
          ###### timer_pending( )
          ld_s r3,[r13]       # <------ <variable>.entry.next LOADED
          brne r3, 0, @.L163
      
      .L163:
      ..
          ###### spin_lock_irq( )
          lr  r5, [status32]  # flags
          bic r4, r5, 6       # temp, flags,
          and.f 0, r5, 6      # flags,
          flag.nz r4
      
          ###### detach_if_pending( ) begins
      
          tst_s r3,r3  <--------------
                              # timer_pending( ) checks timer->entry.next
                              # r3 is NOT reloaded by gcc, using stale value
          beq.d @.L169
          mov.eq r0,0

          #####  detach_timer( ): __list_del( )

          ld r4,[r13,4]       # <variable>.entry.prev, D.31439
          st r4,[r3,4]        # <variable>.prev, D.31439
          st r3,[r4]          # <variable>.next, D.30246
      
      We initially tried to fix this by adding barrier() to the preempt_* macros
      for !PREEMPT_COUNT, but Linus clarified that this was not the right place
      for the fix:
      http://www.spinics.net/lists/kernel/msg1512709.html
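      The fix amounts to putting a compiler barrier (an asm "memory" clobber)
      inside the irq save/restore sequence. The portable sketch below models
      that: model_status32 stands in for the STATUS32 register, the 0x6 mask
      follows the `bic r4, r5, 6` in the disassembly, and all model_* names are
      hypothetical. Because of the barrier in model_irq_save(), the compiler
      must reload n->next after "interrupts are disabled" instead of reusing
      the earlier load, which is the reuse that broke mod_timer().

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier, same form as the kernel's generic barrier(): no
 * instruction is emitted, but the "memory" clobber makes GCC/Clang assume
 * memory may have changed, so values cached in registers are reloaded. */
#define model_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the ARC STATUS32 register. */
static unsigned long model_status32 = 0x6;

/* Sketch of raw_local_irq_save() with the fix applied. */
static unsigned long model_irq_save(void)
{
    unsigned long flags = model_status32;
    model_status32 = flags & ~0x6UL;  /* clear enable bits, as bic r4, r5, 6 */
    model_barrier();                  /* the previously missing barrier */
    return flags;
}

static void model_irq_restore(unsigned long flags)
{
    model_barrier();
    model_status32 = flags;
}

struct model_node {
    struct model_node *next;
};

/* Reads n->next before and after disabling interrupts; the second read
 * cannot reuse the first load because of the memory clobber. */
static int model_check_pending(struct model_node *n)
{
    int early = n->next != NULL;
    unsigned long flags = model_irq_save();
    int locked = n->next != NULL;     /* reloaded from memory */
    model_irq_restore(flags);
    return early && locked;
}
```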
      
      [vgupta: updated commitlog]
      
      Reported-by/Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Cc: Christian Ruppert <christian.ruppert@abilis.com>
      Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Debugged-by/Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 20 March, 2013: 1 commit
    • ARC: Fix the typo in event identifier flags used by ptrace · 367f3fcd
      Vineet Gupta committed
      orig_r8_IS_EXCPN and orig_r8_IS_BRKPT had the same value due to a
      copy/paste error. Although this looks bad and is wrong, it doesn't
      actually affect gdb.
      
      orig_r8_IS_BRKPT is the one relevant to debugging (breakpoints), since
      it is used to provide EFA vs. ERET to a ptrace "stop_pc" request.
      
      So when gdb has inserted a breakpoint, orig_r8_IS_BRKPT is already set,
      and anything else (i.e. orig_r8_IS_EXCPN) having the same value doesn't
      hurt gdb. The converse case could be nasty, but nobody uses the ptrace
      "stop_pc" request in that case.
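      A sketch of the stop_pc selection described above, using hypothetical
      marker values (the real constants are not given in this message) and
      hypothetical model_* names. With distinct values an exception entry
      reports ERET; had MODEL_ORIG_R8_IS_EXCPN equalled MODEL_ORIG_R8_IS_BRKPT,
      it would wrongly report EFA instead.

```c
#include <assert.h>

/* Hypothetical orig_r8 markers; the only requirement is that they differ. */
#define MODEL_ORIG_R8_IS_BRKPT 0xfffdUL
#define MODEL_ORIG_R8_IS_EXCPN 0xfffcUL

struct model_regs {
    unsigned long orig_r8;  /* which event caused kernel entry */
    unsigned long efa;      /* exception fault address */
    unsigned long eret;     /* exception return address */
};

/* ptrace "stop_pc": a breakpoint reports EFA, any other entry reports ERET. */
static unsigned long model_stop_pc(const struct model_regs *r)
{
    return r->orig_r8 == MODEL_ORIG_R8_IS_BRKPT ? r->efa : r->eret;
}
```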
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  5. 19 March, 2013: 1 commit
  6. 18 March, 2013: 1 commit
  7. 11 March, 2013: 2 commits
    • ARC: ABIv3: fork/vfork wrappers not needed in "no-legacy-syscall" ABI · 180d406e
      Vineet Gupta committed
      When switching to the clone()-only ABI, I missed pruning the low-level
      asm syscall wrappers.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: make allyesconfig build breakages · 1540c85b
      Vineet Gupta committed
        CC      drivers/mmc/host/mmc_spi.o
      drivers/mmc/host/mmc_spi.c:118: error: redefinition of 'struct scratch'
      make[3]: *** [drivers/mmc/host/mmc_spi.o] Error 1
      make[2]: *** [drivers/mmc/host] Error 2
      make[1]: *** [drivers/mmc] Error 2
      make: *** [drivers] Error 2
      
        CC      arch/arc/kernel/kgdb.o
      In file included from include/linux/kgdb.h:20,
                       from arch/arc/kernel/kgdb.c:11:
      /home/vineetg/arc/k.org/arc-port/arch/arc/include/asm/kgdb.h:34:
      warning: 'struct pt_regs' declared inside parameter list
      /home/vineetg/arc/k.org/arc-port/arch/arc/include/asm/kgdb.h:34:
      warning: its scope is only this definition or declaration, which is
      probably not what you want
      arch/arc/kernel/kgdb.c:172: error: conflicting types for 'kgdb_trap'
      
        CC      arch/arc/kernel/kgdb.o
      arch/arc/kernel/kgdb.c: In function 'pt_regs_to_gdb_regs':
      arch/arc/kernel/kgdb.c:62: error: dereferencing pointer to incomplete
      type
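      The first kgdb warning is the classic symptom of a prototype that
      mentions struct pt_regs before any declaration of it is in scope: the
      struct then has prototype scope and no longer matches the real one,
      which typically also produces a "conflicting types" error. A common
      remedy, sketched below under that assumption (the message does not show
      the actual fix), is a forward declaration. model_kgdb_trap and the
      one-field struct pt_regs are hypothetical.

```c
#include <assert.h>

/* Forward declaration: gives struct pt_regs file scope, so the prototype
 * below refers to the same type as the later definition. */
struct pt_regs;

/* Hypothetical prototype modelled on the kgdb.h warning above. */
void model_kgdb_trap(struct pt_regs *regs);

/* Hypothetical, minimal register layout for the example. */
struct pt_regs {
    unsigned long ret;
};

void model_kgdb_trap(struct pt_regs *regs)
{
    regs->ret += 4;   /* arbitrary body so the effect is observable */
}
```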
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  8. 27 February, 2013: 3 commits
  9. 26 February, 2013: 1 commit
  10. 16 February, 2013: 20 commits