    tcg: remove tb_lock · 0ac20318
    Committed by Emilio G. Cota
    Use mmap_lock in user-mode to protect TCG state and the page descriptors.
    In !user-mode, each vCPU has its own TCG state, so no locks needed.
    Per-page locks are used to protect the page descriptors.
    
    Per-TB locks are used in both modes to protect TB jumps.
    
    Some notes:
    
    - tb_lock is removed from notdirty_mem_write by passing a
      locked page_collection to tb_invalidate_phys_page_fast.
    
    - tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
      so there is no need to further serialize access to them.
    
    - do_tb_flush is run in a safe async context, meaning no other
      vCPU threads are running. Therefore acquiring mmap_lock there
      is just to please tools such as thread sanitizer.
    
    - Not visible in the diff, but tb_invalidate_phys_page already
      has an assert_memory_lock.
    
    - cpu_io_recompile is !user-only, so no mmap_lock there.
    
    - Added mmap_unlock()'s before all siglongjmp's that could
      be called in user-mode while mmap_lock is held.
      + Added an assert for !have_mmap_lock() after returning from
        the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
    
    Performance numbers before/after:
    
    Host: AMD Opteron(tm) Processor 6376
    
                     ubuntu 17.04 ppc64 bootup+shutdown time
    
      700 +-+--+----+------+------------+-----------+------------*--+-+
          |    +    +      +            +           +           *B    |
          |         before ***B***                            ** *    |
          |tb lock removal ###D###                         ***        |
      600 +-+                                           ***         +-+
          |                                           **         #    |
          |                                        *B*          #D    |
          |                                     *** *         ##      |
      500 +-+                                ***           ###      +-+
          |                             * ***           ###           |
          |                            *B*          # ##              |
          |                          ** *          #D#                |
      400 +-+                      **            ##                 +-+
          |                      **           ###                     |
          |                    **           ##                        |
          |                  **         # ##                          |
      300 +-+  *           B*          #D#                          +-+
          |    B         ***        ###                               |
          |    *       **       ####                                  |
          |     *   ***      ###                                      |
      200 +-+   B  *B     #D#                                       +-+
          |     #B* *   ## #                                          |
          |     #*    ##                                              |
          |    + D##D#     +            +           +            +    |
      100 +-+--+----+------+------------+-----------+------------+--+-+
               1    8      16      Guest CPUs       48           64
      png: https://imgur.com/HwmBHXe
    
                  debian jessie aarch64 bootup+shutdown time
    
      90 +-+--+-----+-----+------------+------------+------------+--+-+
         |    +     +     +            +            +            +    |
         |         before ***B***                                B    |
      80 +tb lock removal ###D###                              **D  +-+
         |                                                   **###    |
         |                                                 **##       |
      70 +-+                                             ** #       +-+
         |                                             ** ##          |
         |                                           **  #            |
      60 +-+                                       *B  ##           +-+
         |                                       **  ##               |
         |                                    ***  #D                 |
      50 +-+                               ***   ##                 +-+
         |                             * **   ###                     |
         |                           **B*  ###                        |
      40 +-+                     ****  # ##                         +-+
         |                   ****     #D#                             |
         |             ***B**      ###                                |
      30 +-+    B***B**        ####                                 +-+
         |    B *   *     # ###                                       |
         |     B       ###D#                                          |
      20 +-+   D  ##D##                                             +-+
         |      D#                                                    |
         |    +     +     +            +            +            +    |
      10 +-+--+-----+-----+------------+------------+------------+--+-+
              1     8     16      Guest CPUs        48           64
      png: https://imgur.com/iGpGFtv
    
    The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
    lock contention significantly hurts scalability.
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
cpu-exec.c 22.8 KB