    mm: thp: handle page cache THP correctly in PageTransCompoundMap · c9f8166a
    Yang Shi committed
    commit 169226f7e0d275c1879551f37484ef6683579a5c upstream
    
    We have a use case that uses tmpfs as the QEMU memory backend, and we
    would like to take advantage of THP as well.  But our tests show that
    the EPT is not PMD mapped even though the underlying THPs are PMD mapped
    on the host.  The number shown by /sys/kernel/debug/kvm/largepages is
    much smaller than the number of PMD mapped shmem pages, as shown below:
    
      7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
      Size:            4194304 kB
      [snip]
      AnonHugePages:         0 kB
      ShmemPmdMapped:   579584 kB
      [snip]
      Locked:                0 kB
    
      cat /sys/kernel/debug/kvm/largepages
      12
    
    And some benchmarks do worse than with anonymous THPs.
    
    By digging into the code we figured out that commit 127393fb ("mm:
    thp: kvm: fix memory corruption in KVM with THP enabled") checks if
    there is a single PTE mapping on the page for anonymous THP when setting
    up EPT map.  But the _mapcount < 0 check doesn't work for page cache THP
    since every subpage of page cache THP would get _mapcount inc'ed once it
    is PMD mapped, so PageTransCompoundMap() always returns false for page
    cache THP.  This would prevent KVM from setting up PMD mapped EPT entry.
    
    So we need to handle page cache THP correctly.  However, when a page
    cache THP's PMD gets split, the kernel just removes the mapping instead
    of setting up PTE mappings as it does for anonymous THP.  Before KVM
    calls get_user_pages(), the subpages may get PTE mapped even though the
    page is still a THP, since the page cache THP may be mapped by other
    processes in the meantime.
    
    So check both the subpage's _mapcount and whether the THP is PTE mapped.
    Although this may report some false negatives (PTE mapped by other
    processes), it does not look trivial to make this accurate.
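
    The counting argument above can be illustrated with a small user-space
    model.  This is a sketch, not the kernel patch: the struct and function
    names (thp_model, thp_new, map_pmd, map_pte, old_check, new_check,
    check_cases) are hypothetical, both counters are assumed to start at -1,
    and comparing the subpage's _mapcount with the compound mapcount is one
    plausible way to express "PMD mapped and not PTE mapped" for page cache
    THP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one THP and one of its subpages. */
struct thp_model {
    bool is_anon;           /* anonymous THP vs. page cache (shmem) THP */
    int compound_mapcount;  /* PMD mappings of the whole THP, starts at -1 */
    int subpage_mapcount;   /* the subpage's _mapcount, starts at -1 */
};

static struct thp_model thp_new(bool is_anon)
{
    struct thp_model t = { is_anon, -1, -1 };
    return t;
}

/* Model a PMD mapping: only page cache THP also bumps the subpage count. */
static void map_pmd(struct thp_model *t)
{
    t->compound_mapcount++;
    if (!t->is_anon)
        t->subpage_mapcount++;
}

/* Model a PTE mapping of the subpage, for example by another process. */
static void map_pte(struct thp_model *t)
{
    t->subpage_mapcount++;
}

/* Old check: infer "no PTE mapping" from _mapcount < 0 alone. */
static bool old_check(const struct thp_model *t)
{
    return t->subpage_mapcount < 0;
}

/* Sketch of the fixed check: keep the old test for anonymous THP, and for
 * page cache THP require that every subpage mapping is a PMD mapping. */
static bool new_check(const struct thp_model *t)
{
    if (t->is_anon)
        return t->subpage_mapcount < 0;
    return t->subpage_mapcount == t->compound_mapcount;
}

/* Walk through the cases described in the text. */
static void check_cases(void)
{
    struct thp_model anon = thp_new(true);
    struct thp_model file = thp_new(false);

    map_pmd(&anon);
    map_pmd(&file);
    assert(old_check(&anon) && new_check(&anon)); /* anon: both agree */
    assert(!old_check(&file));  /* old check rejects PMD mapped shmem THP */
    assert(new_check(&file));   /* fixed check accepts it */

    map_pte(&file);             /* some process maps the subpage by PTE */
    assert(!new_check(&file));  /* conservative: reported as not PMD-only */
}
```

    The last case in check_cases() is the false negative mentioned above:
    the model cannot tell whose PTE mapping raised the subpage count, so it
    answers "no" even if the caller's own mapping is a PMD mapping.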
    
    With this fix, /sys/kernel/debug/kvm/largepages shows a reasonable
    number of pages PMD mapped by EPT, as shown below:
    
      7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
      Size:            4194304 kB
      [snip]
      AnonHugePages:         0 kB
      ShmemPmdMapped:   557056 kB
      [snip]
      Locked:                0 kB
    
      cat /sys/kernel/debug/kvm/largepages
      271
    
    And the benchmarks perform the same as with anonymous THPs.
    
    [yang.shi@linux.alibaba.com: v4]
      Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
    Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
    Fixes: dd78fedd ("rmap: support file thp")
    Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
    Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
    Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
    Suggested-by: Hugh Dickins <hughd@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: <stable@vger.kernel.org>    [4.8+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>