    mm/swap: add cluster lock · 235b6217
    Committed by Huang, Ying
    This patch reduces contention on swap_info_struct->lock by using a
    more fine-grained lock in swap_cluster_info for some swap operations.
    swap_info_struct->lock is heavily contended if multiple processes
    reclaim pages simultaneously: there is only one lock per swap device,
    a common configuration has only one or a few swap devices in the
    system, and that lock protects almost all swap-related operations.
    
    In fact, many swap operations access only one element of the
    swap_info_struct->swap_map array, and there is no dependency between
    different elements of the array.  So a fine-grained lock can be used
    to allow parallel access to different elements of
    swap_info_struct->swap_map.
    
    In this patch, a spinlock is added to swap_cluster_info to protect
    both the elements of swap_info_struct->swap_map within the swap
    cluster and the fields of swap_cluster_info itself.  This greatly
    reduces lock contention for swap_info_struct->swap_map accesses.
    
    Because of the added spinlock, the size of swap_cluster_info increases
    from 4 bytes to 8 bytes on both 64-bit and 32-bit systems.  This uses
    an additional 4KB of RAM for every 1GB of swap space.
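    
    The arithmetic, assuming the usual SWAPFILE_CLUSTER of 256 pages per
    cluster and 4KB pages:
    
        1GB of swap / 4KB per page     = 262144 swap slots
        262144 slots / 256 per cluster = 1024 clusters
        1024 clusters * 4 extra bytes  = 4KB of additional RAM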
    
    Because swap_cluster_info is much smaller than a cache line (8 vs. 64
    bytes on x86_64), there may be false cache line sharing between the
    spinlocks of adjacent swap_cluster_info structures.  To avoid this
    false sharing in the first round of swap cluster allocation, the order
    of the swap clusters in the free clusters list is changed so that
    swap_cluster_info structures sharing the same cache line are placed as
    far apart as possible in the list.  After the first round of
    allocation, the order of the clusters in the free clusters list is
    expected to be random, so the false sharing should not be serious.
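    
    One way to produce such an ordering (an illustrative sketch, not the
    exact patch code; free_cluster_add_tail is a hypothetical helper that
    appends a cluster to the free clusters list) is to walk the cluster
    array in column-major order, where a column holds one entry from each
    cache line:
    
        /*
         * With 8-byte swap_cluster_info and 64-byte cache lines, 8 entries
         * share a line.  Enqueuing column-major makes consecutive free-list
         * entries come from different cache lines.
         */
        #define CLUSTERS_PER_LINE \
                (L1_CACHE_BYTES / sizeof(struct swap_cluster_info))
    
        static void enqueue_free_clusters(struct swap_info_struct *si,
                                          unsigned long nr_clusters)
        {
                unsigned long rows = DIV_ROUND_UP(nr_clusters, CLUSTERS_PER_LINE);
                unsigned long row, col, idx;
    
                for (col = 0; col < CLUSTERS_PER_LINE; col++)
                        for (row = 0; row < rows; row++) {
                                idx = row * CLUSTERS_PER_LINE + col;
                                if (idx >= nr_clusters)
                                        continue;
                                /* consecutive appends land a cache line apart */
                                free_cluster_add_tail(si, idx);
                        }
        }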
    
    Compared with a previous implementation using bit_spin_lock, the
    sequential swap-out throughput improved by about 3.2%.  The test was
    done on a Xeon E5 v3 system, using a RAM-simulated PMEM (persistent
    memory) device as the swap device.  To test sequential swap-out, the
    test case created 32 processes, which sequentially allocate and write
    to anonymous pages until the RAM and part of the swap device are used
    up.
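    
    The workload is roughly the following (an illustrative userspace
    sketch; NR_PROCS matches the test, but CHUNK_BYTES is a placeholder
    for "RAM plus part of the swap device"):
    
        #include <stddef.h>
        #include <sys/mman.h>
        #include <sys/wait.h>
        #include <unistd.h>
    
        #define NR_PROCS    32
        #define CHUNK_BYTES (4UL << 30) /* placeholder per-process size */
    
        int main(void)
        {
                for (int i = 0; i < NR_PROCS; i++) {
                        if (fork() == 0) {
                                char *buf = mmap(NULL, CHUNK_BYTES,
                                                 PROT_READ | PROT_WRITE,
                                                 MAP_PRIVATE | MAP_ANONYMOUS,
                                                 -1, 0);
                                if (buf == MAP_FAILED)
                                        _exit(1);
                                /* sequential writes fault in anonymous pages
                                 * and eventually force parallel swap-out */
                                for (size_t off = 0; off < CHUNK_BYTES; off += 4096)
                                        buf[off] = 1;
                                _exit(0);
                        }
                }
                while (wait(NULL) > 0)  /* reap all children */
                        ;
                return 0;
        }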
    
    [ying.huang@intel.com: v5]
      Link: http://lkml.kernel.org/r/878tqeuuic.fsf_-_@yhuang-dev.intel.com
    [minchan@kernel.org: initialize spinlock for swap_cluster_info]
      Link: http://lkml.kernel.org/r/1486434945-29753-1-git-send-email-minchan@kernel.org
    [hughd@google.com: annotate nested locking for cluster lock]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1702161050540.21773@eggly.anvils
    Link: http://lkml.kernel.org/r/dbb860bbd825b1aaba18988015e8963f263c3f0d.1484082593.git.tim.c.chen@linux.intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Aaron Lu <aaron.lu@intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Shaohua Li <shli@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>