1. 29 8月, 2013 1 次提交
    • V
      ARC: MMUv4 preps/1 - Fold PTE K/U access flags · 64b703ef
      Vineet Gupta 提交于
      The current ARC VM code has 13 flags in Page Table entry: some software
      (accesed/dirty/non-linear-maps) and rest hardware specific. With 8k MMU
      page, we need 19 bits for addressing page frame so remaining 13 bits is
      just about enough to accomodate the current flags.
      
      In MMUv4 there are 2 additional flags, SZ (normal or super page) and WT
      (cache access mode write-thru) - and additionally PFN is 20 bits (vs. 19
      before for 8k). Thus these can't be held in current PTE w/o making each
      entry 64bit wide.
      
      It seems there is some scope of compressing the current PTE flags (and
      freeing up a few bits). Currently PTE contains fully orthogonal distinct
      access permissions for kernel and user mode (Kr, Kw, Kx; Ur, Uw, Ux)
      which can be folded into one set (R, W, X). The translation of 3 PTE
      bits into 6 TLB bits (when programming the MMU) can be done based on
      following pre-requites/assumptions:
      
      1. For kernel-mode-only translations (vmalloc: 0x7000_0000 to
         0x7FFF_FFFF), PTE additionally has PAGE_GLOBAL flag set (and user
         space entries can never be global). Thus such a PTE can translate
         to Kr, Kw, Kx (as appropriate) and zero for User mode counterparts.
      
      2. For non global entries, the PTE flags can be used to create mirrored
         K and U TLB bits. This is true after commit a950549c
         "ARC: copy_(to|from)_user() to honor usermode-access permissions"
         which ensured that user-space translations _MUST_ have same access
         permissions for both U/K mode accesses so that  copy_{to,from}_user()
         play fair with fault based CoW break and such...
      
      There is no such thing as free lunch - the cost is slightly infalted
      TLB-Miss Handlers.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      64b703ef
  2. 27 6月, 2013 1 次提交
    • P
      arc: delete __cpuinit usage from all arc files · ce759956
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/arc uses of the __cpuinit macros from
      all C files.  Currently arc does not have any __CPUINIT used in
      assembly files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      ce759956
  3. 22 6月, 2013 4 次提交
  4. 23 5月, 2013 1 次提交
    • V
      ARC: Brown paper bag bug in macro for checking cache color · 3e87974d
      Vineet Gupta 提交于
      The VM_EXEC check in update_mmu_cache() was getting optimized away
      because of a stupid error in definition of macro addr_not_cache_congruent()
      
      The intention was to have the equivalent of following:
      
      	if (a || (1 ? b : 0))
      
      but we ended up with following:
      
      	if (a || 1 ? b : 0)
      
      And because precedence of '||' is more that that of '?', gcc was optimizing
      away evaluation of <a>
      
      Nasty Repercussions:
      1. For non-aliasing configs it would mean some extraneous dcache flushes
         for non-code pages if U/K mappings were not congruent.
      2. For aliasing config, some needed dcache flush for code pages might
         be missed if U/K mappings were congruent.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      3e87974d
  5. 10 5月, 2013 2 次提交
    • V
      ARC: [mm] Aliasing VIPT dcache support 2/4 · 4102b533
      Vineet Gupta 提交于
      This is the meat of the series which prevents any dcache alias creation
      by always keeping the U and K mapping of a page congruent.
      If a mapping already exists, and other tries to access the page, prev
      one is flushed to physical page (wback+inv)
      
      Essentially flush_dcache_page()/copy_user_highpage() create K-mapping
      of a page, but try to defer flushing, unless U-mapping exist.
      When page is actually mapped to userspace, update_mmu_cache() flushes
      the K-mapping (in certain cases this can be optimised out)
      
      Additonally flush_cache_mm(), flush_cache_range(), flush_cache_page()
      handle the puring of stale userspace mappings on exit/munmap...
      
      flush_anon_page() handles the existing U-mapping for anon page before
      kernel reads it via the GUP path.
      
      Note that while not complete, this is enough to boot a simple
      dynamically linked Busybox based rootfs
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      4102b533
    • V
      ARC: [mm] Aliasing VIPT dcache support 1/4 · 6ec18a81
      Vineet Gupta 提交于
      This preps the low level dcache flush helpers to take vaddr argument in
      addition to the existing paddr to properly flush the VIPT dcache
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      6ec18a81
  6. 07 5月, 2013 3 次提交
    • V
      ARC: [mm] Lazy D-cache flush (non aliasing VIPT) · eacd0e95
      Vineet Gupta 提交于
      flush_dcache_page( ) is MM hook to ensure that a page has consistent
      views between kernel and userspace. Thus it is called when
      
      * kernel writes to a page which at some later point could get mapped to
        userspace (so kernel mapping needs to be flushed-n-inv)
      * kernel is about to read from a page with possible userspace mappings
        (so userspace mappings needs to be made coherent with kernel ones)
      
      However for Non aliasing VIPT dcache, any userspace mapping will always
      be congruent to kernel mapping. Thus d-cache need need not be flushed at
      all (or delayed indefinitely).
      
      The only reason it does need to be flushed is when mapping code pages.
      Since icache doesn't snoop dcache, those dirty dcache lines need to be
      written back to memory and icache line invalidated so that icache lines
      fetch will get the right data.
      
      Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.
      
      (1) FPGA @ 80 MHZ
      
      Processor, Processes - times in microseconds - smaller is better
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      3.9-rc6-a Linux 3.9.0-r   80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
      3.9-rc6-b Linux 3.9.0-r   80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
      3.9-rc7-c Linux 3.9.0-r   80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
                                                                      ^^^^ ^^^^ ^^^
      
      File & VM system latencies in microseconds - smaller is better
      -------------------------------------------------------------------------------
      Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                              Create Delete Create Delete Latency Fault  Fault selct
      --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
      3.9-rc6-a Linux 3.9.0-r  317.8  204.2 1122.3  375.1 3522.0 4.288     20.7 126.8
      3.9-rc6-b Linux 3.9.0-r  298.7  223.0 1141.6  367.8 3531.0 4.866     20.9 126.4
      3.9-rc7-c Linux 3.9.0-r  278.4  179.2  862.1  339.3 3705.0 3.223     20.3 126.6
                               ^^^^^  ^^^^^  ^^^^^  ^^^^
      
      (2) Customer Silicon @ 500 MHz (166 MHz mem)
      
      ------------------------------------------------------------------------------
      Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                                   call  I/O stat clos TCP  inst hndl proc proc proc
      --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      abilis-ba Linux 3.9.0-r  497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
      abilis-ca Linux 3.9.0-r  497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
                                                                      ^^^^ ^^^^ ^^^
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      eacd0e95
    • V
      ARC: [mm] optimise icache flush for user mappings · 24603fdd
      Vineet Gupta 提交于
      ARC icache doesn't snoop dcache thus executable pages need to be made
      coherent before mapping into userspace in flush_icache_page().
      
      However ARC700 CDU (hardware cache flush module) requires both vaddr
      (index in cache) as well as paddr (tag match) to correctly identify a
      line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
      the paddr only based flush_icache_page() API couldn't be implemented
      efficiently. It had to loop thru all possible alias indexes and perform
      the invalidate operation (ofcourse the cache op would only succeed at
      the index(es) where tag matches - typically only 1, but the cost of
      visiting all the cache-bins needs to paid nevertheless).
      
      Turns out however that the vaddr (along with paddr) is available in
      update_mmu_cache() hence better suits ARC icache flush semantics.
      With both vaddr+paddr, exactly one flush operation per line is done.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      24603fdd
    • N
      e3edeb67
  7. 09 4月, 2013 1 次提交
  8. 16 2月, 2013 5 次提交
    • V
      af617428
    • V
      ARC: SMP support · 41195d23
      Vineet Gupta 提交于
      ARC common code to enable a SMP system + ISS provided SMP extensions.
      
      ARC700 natively lacks SMP support, hence some of the core features are
      are only enabled if SoCs have the necessary h/w pixie-dust. This
      includes:
      -Inter Processor Interrupts (IPI)
      -Cache coherency
      -load-locked/store-conditional
      ...
      
      The low level exception handling would be completely broken in SMP
      because we don't have hardware assisted stack switching. Thus a fair bit
      of this code is repurposing the MMU_SCRATCH reg for event handler
      prologues to keep them re-entrant.
      
      Many thanks to Rajeshwar Ranga for his initial "major" contributions to
      SMP Port (back in 2008), and to Noam Camus and Gilad Ben-Yossef for help
      with resurrecting that in 3.2 kernel (2012).
      
      Note that this platform code is again singleton design pattern - so
      multiple SMP platforms won't build at the moment - this deficiency is
      addressed in subsequent patches within this series.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rajeshwar Ranga <rajeshwar.ranga@gmail.com>
      Cc: Noam Camus <noamc@ezchip.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      41195d23
    • V
      ARC: TLB flush Handling · d79e678d
      Vineet Gupta 提交于
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      d79e678d
    • V
      ARC: MMU Exception Handling · cc562d2e
      Vineet Gupta 提交于
      * MMU I-TLB / D-TLB Miss Exceptions
        - Fast Path TLB Refill Handler
        - slowpath TLB creation via do_page_fault() -> update_mmu_cache()
      * Duplicate PD Exception Handler
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      cc562d2e
    • V
      ARC: MMU Context Management · f1f3347d
      Vineet Gupta 提交于
      ARC700 MMU provides for tagging TLB entries with a 8-bit ASID to avoid
      having to flush the TLB every task switch.
      
      It also allows for a quick way to invalidate all the TLB entries for
      task useful for:
      * COW sementics during fork()
      * task exit()ing
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      f1f3347d