1. 10 May, 2013 1 commit
    • ARC: [mm] Aliasing VIPT dcache support 2/4 · 4102b533
      Vineet Gupta authored
      This is the meat of the series: it prevents dcache aliases from being
      created by always keeping the U and K mappings of a page congruent.
      If a mapping already exists and another one tries to access the page,
      the previous one is flushed to the physical page (wback+inv).
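
To make "congruent" concrete: in a VIPT cache, two virtual addresses of the same physical page land on the same cache lines only when the index bits above the page offset (the "colour") agree. The following is a minimal sketch of that check, not code from this commit. The page size, way size, and the names `cache_color()` and `addr_congruent()` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, for illustration only (not taken from the commit). */
#define PAGE_SHIFT  13u                 /* 8 KiB pages */
#define WAY_SIZE    (16u * 1024u)       /* bytes indexed by one cache way */
#define NUM_COLORS  (WAY_SIZE >> PAGE_SHIFT)

/* The masking below only works when the colour count is a power of two. */
static_assert((NUM_COLORS & (NUM_COLORS - 1u)) == 0,
              "colour count must be a power of two");

/* Colour = the cache index bits that lie above the page offset. */
static unsigned int cache_color(uintptr_t vaddr)
{
    return (unsigned int)((vaddr >> PAGE_SHIFT) & (NUM_COLORS - 1u));
}

/* Hypothetical helper: a U and a K address of one physical page cannot
 * alias in the dcache if their colours match. */
static bool addr_congruent(uintptr_t uaddr, uintptr_t kaddr)
{
    return cache_color(uaddr) == cache_color(kaddr);
}
```

With this assumed geometry there are two colours, so a user address in an odd-numbered page is congruent only with kernel addresses that also fall in odd-numbered pages.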
      
      Essentially, flush_dcache_page()/copy_user_highpage() create a K-mapping
      of a page but try to defer flushing unless a U-mapping exists.
      When the page is actually mapped to userspace, update_mmu_cache() flushes
      the K-mapping (in certain cases this can be optimised out).
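
The deferral described above can be pictured with a toy model. This is a sketch under stated assumptions, not the kernel implementation: `struct toy_page`, its fields, and every `toy_*` function are hypothetical stand-ins, and the "flush" merely increments a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page descriptor; all names here are hypothetical. */
struct toy_page {
    bool user_mapped;   /* a U-mapping currently exists */
    bool k_dirty;       /* K-mapping may hold lines not yet flushed */
    int  flushes;       /* number of wback+inv operations performed */
};

/* Stand-in for a dcache writeback + invalidate of the K-mapping. */
static void toy_flush(struct toy_page *pg)
{
    pg->flushes++;
    pg->k_dirty = false;
}

/* Models the kernel touching the page through its K-mapping. */
static void toy_flush_dcache_page(struct toy_page *pg)
{
    assert(pg != NULL);
    if (pg->user_mapped)
        toy_flush(pg);          /* U-mapping exists: flush right away */
    else
        pg->k_dirty = true;     /* no U-mapping yet: defer the flush */
}

/* Models the page being mapped into userspace. */
static void toy_update_mmu_cache(struct toy_page *pg)
{
    assert(pg != NULL);
    if (pg->k_dirty)
        toy_flush(pg);          /* pay for the deferred flush now */
    pg->user_mapped = true;
}
```

In this model a page that is written by the kernel but never mapped to userspace is never flushed, which is the saving the deferral is after.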
      
      Additionally, flush_cache_mm(), flush_cache_range() and flush_cache_page()
      handle the purging of stale userspace mappings on exit/munmap...
      
      flush_anon_page() handles the existing U-mapping of an anon page before
      the kernel reads it via the GUP path.
      
      Note that, while not complete, this is enough to boot a simple,
      dynamically linked Busybox-based rootfs.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  2. 07 May, 2013 1 commit
    • ARC: [mm] optimize needless full mm TLB flush on munmap · 8d56bec2
      Vineet Gupta authored
      munmap ends up calling tlb_flush(), which on ARC was flushing the entire
      TLB unconditionally (by moving the MMU to a new ASID).
      
      do_munmap
        unmap_region
          unmap_vmas
            unmap_single_vma
               unmap_page_range
                  tlb_start_vma
                  zap_pud_range
                  tlb_end_vma()
        tlb_finish_mmu
          tlb_flush()  ---> unconditional flush_tlb_mm()
      
      So even a single-page munmap, a frequent operation when the uClibc dynamic
      linker (ldso) is loading the dependent shared libraries, would move the
      ASID multiple times, needlessly invalidating the pre-faulted TLB
      entries (and increasing the rate of ASID wraparound + full TLB flush).
      
      flush_tlb_mm() is now called only if tlb->full_mm is set (which means
      the exit/execve cases only). And for those cases, flush_tlb_mm() is
      already optimised to be a no-op for mm->mm_users == 0.
      
      So essentially there are no more full mm flushes, except for fork, which
      anyhow needs one to properly COW the parent address space.
      
      munmap now needs to do a TLB range flush, which is implemented in
      tlb_end_vma().
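
The resulting division of labour can be sketched with a toy model. This is an illustration under stated assumptions, not the actual ARC macros: the `toy_*` types and functions are hypothetical, the `full_mm` field simply mirrors the wording of this message, an ASID bump stands in for a full TLB flush, and skipping the range flush for a full-mm teardown is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy address space; all names here are hypothetical. */
struct toy_mm {
    int          mm_users;       /* 0 once the mm is being torn down */
    unsigned int asid;           /* bumping this models a full TLB flush */
    unsigned int range_flushes;  /* counts per-VMA range flushes */
};

/* Toy stand-in for the gather structure passed through unmap. */
struct toy_gather {
    struct toy_mm *mm;
    bool           full_mm;      /* true for exit/execve teardown */
};

/* Full flush: a no-op when nobody uses the mm any more. */
static void toy_flush_tlb_mm(struct toy_mm *mm)
{
    if (mm->mm_users == 0)
        return;
    mm->asid++;
}

/* Per-VMA hook: a partial unmap flushes just the range. */
static void toy_tlb_end_vma(struct toy_gather *tlb)
{
    assert(tlb != NULL && tlb->mm != NULL);
    if (!tlb->full_mm)
        tlb->mm->range_flushes++;
}

/* Final hook: only a full-mm teardown asks for a full flush. */
static void toy_tlb_flush(struct toy_gather *tlb)
{
    assert(tlb != NULL && tlb->mm != NULL);
    if (tlb->full_mm)
        toy_flush_tlb_mm(tlb->mm);
}
```

In this model a plain munmap leaves the ASID untouched and performs one range flush per VMA, while exit with mm_users == 0 does no TLB work at all.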
      
      Results
      -------
      1. ASID now consistently moves by 4 during a simple ls (as opposed to 5 or
         7 before).
      
      2. LMBench microbenchmark also shows improvements
      
      Basic system parameters
      ------------------------------------------------------------------------------
      Host                 OS Description              Mhz  tlb  cache  mem scal
                                                           pages line   par load
                                                                 bytes
      --------- ------------- ----------------------- ---- ----- ----- ------ ----
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
      3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
      3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590 20.1 126.6
      3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910 18.8 110.4
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  3. 16 Feb, 2013 2 commits