1. 10 May, 2013 1 commit
  2. 09 May, 2013 2 commits
  3. 07 May, 2013 22 commits
    • V
      ARC: [mm] Lazy D-cache flush (non aliasing VIPT) · eacd0e95
      Vineet Gupta committed
      flush_dcache_page( ) is MM hook to ensure that a page has consistent
      views between kernel and userspace. Thus it is called when
      
      * kernel writes to a page which at some later point could get mapped to
        userspace (so kernel mapping needs to be flushed-n-inv)
      * kernel is about to read from a page with possible userspace mappings
        (so userspace mappings needs to be made coherent with kernel ones)
      
      However for non-aliasing VIPT dcache, any userspace mapping will always
      be congruent to kernel mapping. Thus d-cache need not be flushed at
      all (or the flush can be delayed indefinitely).
      
      The only reason it does need to be flushed is when mapping code pages.
      Since icache doesn't snoop dcache, those dirty dcache lines need to be
      written back to memory and the icache lines invalidated so that icache
      line fetches get the right data.
      
      Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.
      
      (1) FPGA @ 80 MHz
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc6-a Linux 3.9.0-r   80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
      3.9-rc6-b Linux 3.9.0-r   80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
      3.9-rc7-c Linux 3.9.0-r   80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
                                                                      ^^^^ ^^^^ ^^^
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc6-a Linux 3.9.0-r  317.8  204.2 1122.3  375.1  3522.0 4.288    20.7 126.8
      3.9-rc6-b Linux 3.9.0-r  298.7  223.0 1141.6  367.8  3531.0 4.866    20.9 126.4
      3.9-rc7-c Linux 3.9.0-r  278.4  179.2  862.1  339.3  3705.0 3.223    20.3 126.6
                               ^^^^^  ^^^^^  ^^^^^  ^^^^
      
      (2) Customer Silicon @ 500 MHz (166 MHz mem)
      
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      abilis-ba Linux 3.9.0-r  497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
      abilis-ca Linux 3.9.0-r  497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
                                                                      ^^^^ ^^^^ ^^^
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      eacd0e95
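The lazy scheme described in this commit can be modelled in plain C. This is a minimal sketch, not the kernel code: `struct page_model`, `model_flush_dcache_page()` and `model_map_page()` are hypothetical names, and the single `dc_dirty` flag stands in for whatever per-page state the real implementation keeps.

```c
#include <stdbool.h>

/* Hypothetical model of a page: one "dirty" flag plus a counter of
 * the cache operations that were actually carried out. */
struct page_model {
    bool dc_dirty;      /* kernel wrote to the page; D-cache not written back yet */
    int  flushes_done;  /* number of real (expensive) flush operations */
};

/* Lazy stand-in for flush_dcache_page(): only record that the page is dirty. */
void model_flush_dcache_page(struct page_model *pg)
{
    pg->dc_dirty = true;
}

/* Stand-in for the point where a page gets mapped into userspace.
 * With a non-aliasing VIPT D-cache, data pages need no flush at all;
 * only executable pages do, because the I-cache does not snoop the D-cache. */
void model_map_page(struct page_model *pg, bool executable)
{
    if (executable && pg->dc_dirty) {
        pg->flushes_done++;     /* write back D-cache, invalidate I-cache lines */
        pg->dc_dirty = false;
    }
}
```

In this model a data page that is written and then mapped never pays for a flush, while a code page pays exactly once, which is the behaviour the log attributes to the fork/exec/sh improvements.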
    • V
      ARC: [mm] micro-optimize page size icache invalidate · 764531cc
      Vineet Gupta committed
      start address is already page aligned and size is const PAGE_SIZE,
      thus fixups for alignment not needed in generated code.
      
      bloat-o-meter vmlinux-mm5 vmlinux
      add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
      function                                     old     new   delta
      __inv_icache_page                             82      50     -32
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      764531cc
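Why the alignment fixups disappear for the page case can be seen in a small model. This is a sketch under assumptions: `LINE_SZ` and `PAGE_SZ` are example values, and `lines_for_range()` / `lines_for_page()` are hypothetical helpers that only count cache lines rather than issue cache operations.

```c
#include <stdint.h>

#define LINE_SZ  64u     /* assumed cache line size in bytes */
#define PAGE_SZ  8192u   /* assumed page size in bytes */

/* Generic range: the start may sit in the middle of a line, so the code
 * has to widen the range down to the line boundary and round the size up. */
unsigned int lines_for_range(uintptr_t start, unsigned int sz)
{
    unsigned int slack = (unsigned int)(start & (LINE_SZ - 1));

    sz += slack;                            /* cover the partial first line */
    return (sz + LINE_SZ - 1) / LINE_SZ;    /* round up to whole lines */
}

/* Page case: start is page aligned and the size is a constant, so the
 * line count is a compile-time constant and no fixup code is generated. */
unsigned int lines_for_page(void)
{
    return PAGE_SZ / LINE_SZ;
}
```

For a page-aligned start and a size of `PAGE_SZ`, both helpers return the same count; the second one simply lets the compiler drop the masking and rounding.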
    • V
      ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers · 7f250a0f
      Vineet Gupta committed
      No users of this code anymore - so RIP !
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      7f250a0f
    • V
      ARC: [mm] consolidate icache/dcache sync code · 94bad1af
      Vineet Gupta committed
      Now that we have same helper used for all icache invalidates (i.e.
      vaddr+paddr based exact line invalidate), consolidate the open coded
      calls into one place.
      
      Also rename flush_icache_range_vaddr => __sync_icache_dcache
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      94bad1af
    • V
      ARC: [mm] optimise icache flush for kernel mappings · 7586bf72
      Vineet Gupta committed
      This change continues the theme from prev commit - this time icache
      handling for kernel's own code modification (vmalloc: loadable modules,
      breakpoints for kprobes/kgdb...)
      
      flush_icache_range() calls the CDU icache helper with vaddr to enable
      exact line invalidate.
      
      For a true kernel-virtual mapping, the vaddr is actually virtual hence
      valid as index into cache. For kprobes breakpoint however, the vaddr arg
      is actually paddr - since that's how normal kernel is mapped in ARC
      memory map. This implies that CDU will use the same addr for
      indexing as for tag match - which is fine since kernel code would only
      have that "implicit" mapping and none other.
      
      This should speed up module loading significantly - especially on default
      ARC700 icache configurations (32k) which alias.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      7586bf72
    • V
      ARC: [mm] optimise icache flush for user mappings · 24603fdd
      Vineet Gupta committed
      ARC icache doesn't snoop dcache thus executable pages need to be made
      coherent before mapping into userspace in flush_icache_page().
      
      However ARC700 CDU (hardware cache flush module) requires both vaddr
      (index in cache) as well as paddr (tag match) to correctly identify a
      line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
      the paddr only based flush_icache_page() API couldn't be implemented
      efficiently. It had to loop through all possible alias indexes and perform
      the invalidate operation (of course the cache op would only succeed at
      the index(es) where tag matches - typically only 1, but the cost of
      visiting all the cache-bins needs to be paid nevertheless).
      
      Turns out however that the vaddr (along with paddr) is available in
      update_mmu_cache() hence better suits ARC icache flush semantics.
      With both vaddr+paddr, exactly one flush operation per line is done.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      24603fdd
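The cost difference between the paddr-only loop and the vaddr+paddr invalidate can be estimated with a little arithmetic. This is a rough sketch: the helper names are hypothetical, and the cache geometry used in the usage note (2-way, 8 KB pages, 64-byte lines) is an assumption for illustration; only the 32k icache size is mentioned in the log.

```c
#include <stddef.h>

/* Number of alias indexes ("colours") a physical page can occupy in a
 * VIPT cache: one way may span several pages worth of index bits. */
size_t alias_count(size_t cache_bytes, size_t ways, size_t page_bytes)
{
    size_t way_bytes = cache_bytes / ways;

    return way_bytes > page_bytes ? way_bytes / page_bytes : 1;
}

/* paddr-only invalidate of one page: every line must be tried at every
 * possible alias index, even though the tag matches at only one of them. */
size_t inv_ops_paddr_only(size_t cache_bytes, size_t ways,
                          size_t page_bytes, size_t line_bytes)
{
    return (page_bytes / line_bytes) *
           alias_count(cache_bytes, ways, page_bytes);
}

/* vaddr+paddr invalidate of one page: exactly one operation per line. */
size_t inv_ops_vaddr_paddr(size_t page_bytes, size_t line_bytes)
{
    return page_bytes / line_bytes;
}
```

With the assumed geometry, `alias_count(32768, 2, 8192)` is 2, so the paddr-only variant issues twice as many operations per page as the exact variant. A cache whose way size does not exceed the page size has an alias count of 1 and gains nothing from the extra address.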
    • V
      ARC: [mm] optimize needless full mm TLB flush on munmap · 8d56bec2
      Vineet Gupta committed
      munmap ends up calling tlb_flush() which for ARC was flushing the entire
      TLB unconditionally (by moving the MMU to a new ASID)
      
      do_munmap
        unmap_region
          unmap_vmas
            unmap_single_vma
               unmap_page_range
                  tlb_start_vma
                  zap_pud_range
                  tlb_end_vma()
        tlb_finish_mmu
          tlb_flush()  ---> unconditional flush_tlb_mm()
      
      So even a single page munmap, a frequent operation when the uClibc dynamic
      linker (ldso) is loading the dependent shared libraries, would move
      the ASID multiple times - needlessly invalidating the pre-faulted TLB
      entries (and increasing the rate of ASID wraparound + full TLB flush).
      
      The full flush is now only done if tlb->full_mm is set (i.e. for the
      exit/execve cases). And for those cases, flush_tlb_mm() is
      already optimised to be a no-op for mm->mm_users == 0.
      
      So essentially there are no more full mm flushes - except for fork which
      anyhow needs it for properly COW'ing parent address space.
      
      munmap now needs to do TLB range flush, which is implemented with
      tlb_end_vma()
      
      Results
      -------
      1. ASID now consistently moves by 4 during a simple ls (as opposed to 5 or
         7 before).
      
      2. LMBench microbenchmark also shows improvements
      
      Basic system parameters
      ------------------------------------------------------------------------------
      Host                 OS Description              Mhz  tlb  cache  mem scal
                                                           pages line   par load
                                                                 bytes
      --------- ------------- ----------------------- ---- ----- ----- ------ ----
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
      3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590    20.1 126.6
      3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910    18.8 110.4
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      8d56bec2
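The split between range flushes and full-mm flushes described in this commit can be modelled as follows. This is only a sketch under assumptions: `struct mm_model`, `struct gather_model` and the two `model_*` helpers are hypothetical stand-ins that count flushes instead of touching an MMU, and the `fullmm` field mirrors the `tlb->full_mm` condition named in the log.

```c
/* Hypothetical counters standing in for real TLB operations. */
struct mm_model {
    unsigned int full_flushes;   /* whole address space flushed (new ASID) */
    unsigned int range_flushes;  /* only one VMA's range flushed */
};

/* Hypothetical stand-in for the gather structure passed along the unmap path. */
struct gather_model {
    struct mm_model *mm;
    int fullmm;                  /* non-zero for exit/execve teardown */
};

/* Per-VMA hook: munmap of a region only needs that range flushed. */
void model_tlb_end_vma(struct gather_model *tlb)
{
    if (!tlb->fullmm)
        tlb->mm->range_flushes++;
}

/* Final hook: flush everything only when the whole mm is going away. */
void model_tlb_flush(struct gather_model *tlb)
{
    if (tlb->fullmm)
        tlb->mm->full_flushes++;
}
```

Running both hooks with `fullmm == 0` (the munmap case) leaves `full_flushes` at zero, so the ASID is not moved; with `fullmm == 1` only the full flush is counted.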
    • M
      ARC: Add support for nSIM OSCI System C model · a92a5d0d
      Mischa Jonker committed
      This adds support for an ARC Virtual Platform. This platform is based on the
      System C standard promoted by the OSCI (Open System C Initiative) and uses
      nSIM to simulate the ARC CPU core itself.
      
      Users can build a virtual SoC by combining System C models of peripherals
      and CPU cores.
      Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a92a5d0d
    • C
      ARC: [TB10x] Adapt device tree to new compatible string · 0dfad77d
      Christian Ruppert committed
      The original device tree was written using a slightly different
      implementation of the fixed-factor-clock device tree binding. The
      compatible string must be modified in order to be compatible with the
      new implementation.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      0dfad77d
    • C
      ARC: [TB10x] Add support for TB10x platform · 072eb693
      Christian Ruppert committed
      Infrastructure required to make the Linux kernel compile and boot on the
      Abilis Systems TB10x series of SOCs based on ARC700 CPUs:
        - Kmake related files (Kconfig, Makefile, tb10x_defconfig)
        - TB10x platform initialisation
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      072eb693
    • C
      ARC: [TB10x] Device tree of TB100 and TB101 Development Kits · 2eb9504b
      Christian Ruppert committed
      These are the device tree files for the Abilis Systems TB100 and TB101 ICs and
      their respective development kit PCBs. These files are committed in preparation
      of the following patch set which adds support for these chips to the ARC
      platform.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      2eb9504b
    • C
      ARC: Prepare interrupt code for external controllers · a37cdacc
      Christian Ruppert committed
      This patch adds some room for CPU-external interrupt controllers in the
      Linux interrupt space. Until now, only the 32 CPU internal interrupt lines
      were supported which does not allow for external interrupt controllers such
      as GPIO modules etc.
      Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
      Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a37cdacc
    • V
      ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy · c93d8b8c
      Vineet Gupta committed
      arc-intc is initialized in arc common code as it is applicable to all
      platforms. However platforms with their own external intc still need to
      refer to it for correct DT interrupt tree hierarchy setup,
      
      e.g.
      static struct of_device_id __initdata tb10x_irq_ids[] = {
      	{ .compatible = "snps,arc700-intc", .data = dummy_init_irq },
      	{ .compatible = "abilis,tb10x_ictl", .data = tb10x_init_irq },
      	{},
      };
      
      The fix is to use the generic irqchip framework to tie all irqchips in
      a special linker section and then call irqchip_init() which calls the
      DT of_irq_init() for all the intc in one go.
      
      That way the platform code need not be aware of arc-intc at all.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      c93d8b8c
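The idea of collecting every interrupt controller in one table and initialising all matching ones in a single pass can be illustrated in userspace. This is a toy sketch, not the irqchip framework: `struct intc_entry`, `intc_table` and `model_irqchip_init()` are hypothetical, an ordinary array stands in for the linker section, and the init functions only count calls. The two compatible strings are taken from the snippet in the log.

```c
#include <stddef.h>
#include <string.h>

typedef int (*intc_init_fn)(void);

/* One table entry per interrupt controller driver. */
struct intc_entry {
    const char  *compatible;
    intc_init_fn init;
};

static int arc_inits, tb10x_inits;

static int arc_intc_init(void)   { return ++arc_inits; }
static int tb10x_ictl_init(void) { return ++tb10x_inits; }

/* Stand-in for the special linker section: each driver adds its own entry,
 * so platform code never has to name the core controller itself. */
static const struct intc_entry intc_table[] = {
    { "snps,arc700-intc",  arc_intc_init },
    { "abilis,tb10x_ictl", tb10x_ictl_init },
};

/* Walk the device "nodes" and initialise every controller with a match.
 * Returns the number of controllers initialised. */
int model_irqchip_init(const char *const *nodes, size_t n_nodes)
{
    int matched = 0;

    for (size_t i = 0; i < n_nodes; i++) {
        for (size_t j = 0; j < sizeof(intc_table) / sizeof(intc_table[0]); j++) {
            if (strcmp(nodes[i], intc_table[j].compatible) == 0) {
                intc_table[j].init();
                matched++;
            }
        }
    }
    return matched;
}
```

The platform side then only supplies its own entry; the common code's entry for the core controller is picked up by the same walk.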
    • V
      ARC: [cmdline] Don't overwrite u-boot provided bootargs · 9593a933
      Vineet Gupta committed
      The existing code was wrong on several counts:
      
      * uboot provided bootargs were copied into @boot_command_line, only to
        be over-written by setup_machine_fdt(), effectively lost
      
      * @cmdline_p returned by setup_arch() to start_kernel() didn't include
        the DT /bootargs
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      9593a933
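One way to keep both sources of arguments, rather than letting one overwrite the other, is to build the final command line from the two strings. The log does not spell out how the fix combines them, so the ordering below (device tree bootargs first, bootloader arguments appended) is an assumption, and `model_build_cmdline()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Combine device tree bootargs with bootloader-provided arguments.
 * Returns 0 on success, -1 if the result did not fit into out. */
int model_build_cmdline(char *out, size_t out_sz,
                        const char *dt_bootargs, const char *uboot_args)
{
    int n;

    if (uboot_args && *uboot_args)
        n = snprintf(out, out_sz, "%s %s", dt_bootargs, uboot_args);
    else
        n = snprintf(out, out_sz, "%s", dt_bootargs);

    /* snprintf reports the length it wanted to write; treat truncation as failure */
    return (n >= 0 && (size_t)n < out_sz) ? 0 : -1;
}
```

Whichever order is chosen, the same combined string should be what is handed back through `cmdline_p`, so that later parsing sees both sets of arguments.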
    • V
      ARC: [cmdline] Remove CONFIG_CMDLINE · 6971881f
      Vineet Gupta committed
      Given that DeviceTree /bootargs can provide similar functionality,
      there is no point in providing duplicate infrastructure.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      6971881f
    • V
      ARC: [plat-arcfpga] defconfig update · 330db333
      Vineet Gupta committed
      * Allow initramfs path to be a symlink
      * Enable CONFIG_PREEMPT by default
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      330db333
    • V
      ARC: unaligned access emulation broken if callee-reg dest of LD/ST · ce147c74
      Vineet Gupta committed
      The fixup code correctly updates the callee-regs on stack, but
      fails to unwind them into the actual register file. Thus userspace won't see
      the update.
      Reported-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      ce147c74
    • V
      ARC: unaligned access emulation error handling consolidation · c723ea46
      Vineet Gupta committed
      If CONFIG_ARC_MISALIGN_ACCESS is not enabled, or if the fixup fails,
      call the same error handler: same signal/si_code to user (SIGBUS)
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      c723ea46
    • V
      ARC: Debug/crash-printing Improvements · bd3c8b11
      Vineet Gupta committed
      * Remove the line-break between scratch/callee-regs (sneaked in when we
        converted from printk to pr_*)
      
      * Use %pS to print the symbol names of faulting PC (ret pseudo register)
        and BLINK (call return register)
      
      * Don't print user-vma for a kernel crash (only do it for
        print-fatal-signals based regfile dump)
      
      * Verbose print the Interrupt/Exception Enable/Active state
      
      * For the main executable, the link address is 0x10000 based (vs. 0), thus
        the offset of the faulting PC needs to be adjusted
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      bd3c8b11
    • N
      ARC: fix typo with clock speed · 68e4790e
      Noam Camus committed
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      68e4790e
    • N
      e3edeb67
    • A
      ARC: Remove non existent refs to GENERIC_KERNEL_EXECVE & GENERIC_KERNEL_THREAD · 0e822845
      Alexander Shiyan committed
      This tracks mainline commit ae903caa "Bury the conditionals from
      kernel_thread/kernel_execve series" which we missed out as ARC port was
      not yet mainline.
      
      [vgupta: commit log modified]
      Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      0e822845
  4. 17 Apr, 2013 1 commit
    • V
      ARC: [kbuild] Avoid DTB rebuilds if DTS are untouched · a89516b3
      Vineet Gupta committed
      Currently, for every ARC kernel build I see the following:
      
      --------------->8-----------------
        DTB    arch/arc/boot/dts/angel4.dtb.S
        AS      arch/arc/boot/dts/angel4.dtb.o
        LD      arch/arc/boot/dts/built-in.o
      rm arch/arc/boot/dts/angel4.dtb.S        <-- forces rebuild next iter
        CHK     kernel/config_data.h
      --------------->8-----------------
      
      This is because *.dtb.S is an intermediate file in dtb generation and is by
      default deleted by make, which needs a ".SECONDARY" hint to NOT do so.
      
      This could have ideally been done in scripts/Makefile.lib - for benefit
      of all, however .SECONDARY doesn't seem to work with wildcards.
      
      Thanks to Stephen for suggesting .SECONDARY (vs .PRECIOUS) and making
      that work using a non wildcard version in arch makefile.
      
      Thanks to James Hogan for pointing out that *.dtb.S now needs to be
      added to clean-files
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      a89516b3
  5. 09 Apr, 2013 13 commits
  6. 20 Mar, 2013 1 commit
    • V
      ARC: Fix the typo in event identifier flags used by ptrace · 367f3fcd
      Vineet Gupta committed
      orig_r8_IS_EXCPN and orig_r8_IS_BRKPT had the same value due to a
      copy/paste error. Although it looks bad and is wrong, it really doesn't
      affect gdb working.
      
      orig_r8_IS_BRKPT is the one relevant to debugging (breakpoints), since
      it is used to provide EFA vs. ERET to a ptrace "stop_pc" request.
      
      So when gdb has inserted a breakpoint, orig_r8_IS_BRKPT is already set,
      and anything else (i.e. orig_r8_IS_EXCPN) having the same value really
      doesn't hurt gdb. The converse case could be nasty, but nobody uses the
      ptrace "stop_pc" request in that case.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      367f3fcd
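The role of the two identifiers can be shown with a small model. This is a sketch under assumptions: the enum values are hypothetical (the real values are not given in the log), and `model_stop_pc()` only illustrates the EFA vs. ERET selection the message describes, assuming the breakpoint case reports EFA.

```c
/* Hypothetical event identifiers; the point is only that they must differ. */
enum model_event {
    MODEL_IS_EXCPN = 1,
    MODEL_IS_BRKPT = 2
};

/* Choose the address reported for a "stop_pc" request: the faulting
 * address (EFA) for a breakpoint, the return address (ERET) otherwise. */
unsigned long model_stop_pc(enum model_event ev,
                            unsigned long efa, unsigned long eret)
{
    return ev == MODEL_IS_BRKPT ? efa : eret;
}
```

If both identifiers had the same value, the exception case would also take the breakpoint branch and report EFA, which is the case the message calls potentially nasty.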