    mm/hugetlb: fix hugetlb not supporting softdirty tracking · a950b115
    Committed by David Hildenbrand
    stable inclusion
    from stable-v5.10.140
    commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef
    
    --------------------------------
    
    commit f96f7a40 upstream.
    
    Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
    
    I observed that hugetlb does not support/expect write-faults in shared
    mappings that would have to map the R/O-mapped page writable -- and I
    found two cases where we could currently get such faults and would
    erroneously map an anon page into a shared mapping.
    
    Reproducers are part of the patches.
    
    I propose to backport both fixes to stable trees.  The first fix needs a
    small adjustment.
    
    This patch (of 2):
    
    Staring at hugetlb_wp(), one might wonder where all the logic for shared
    mappings is when stumbling over a write-protected page in a shared
    mapping.  In fact, there is none, and so far we thought we could get away
    with that because e.g., mprotect() should always do the right thing and
    map all pages directly writable.
    
    Looks like we were wrong:
    
    --------------------------------------------------------------------------
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <fcntl.h>
     #include <unistd.h>
     #include <errno.h>
     #include <sys/mman.h>
    
     #define HUGETLB_SIZE (2 * 1024 * 1024u)
    
     static void clear_softdirty(void)
     {
             int fd = open("/proc/self/clear_refs", O_WRONLY);
             const char *ctrl = "4";
             int ret;
    
             if (fd < 0) {
                     fprintf(stderr, "open(clear_refs) failed\n");
                     exit(1);
             }
             ret = write(fd, ctrl, strlen(ctrl));
             if (ret != strlen(ctrl)) {
                     fprintf(stderr, "write(clear_refs) failed\n");
                     exit(1);
             }
             close(fd);
     }
    
     int main(int argc, char **argv)
     {
             char *map;
             int fd;
    
             /* O_CREAT requires a mode argument; open() returns -1 on failure. */
             fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
             if (fd < 0) {
                     fprintf(stderr, "open() failed\n");
                     return -errno;
             }
             if (ftruncate(fd, HUGETLB_SIZE)) {
                     fprintf(stderr, "ftruncate() failed\n");
                     return -errno;
             }
    
             map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
             if (map == MAP_FAILED) {
                     fprintf(stderr, "mmap() failed\n");
                     return -errno;
             }
    
             *map = 0;
    
             if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                     fprintf(stderr, "mprotect() failed\n");
                     return -errno;
             }
    
             clear_softdirty();
    
             if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                     fprintf(stderr, "mprotect() failed\n");
                     return -errno;
             }
    
             *map = 0;
    
             return 0;
     }
    --------------------------------------------------------------------------
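
    To observe the soft-dirty state that clear_softdirty() resets, one could
    decode entries read from /proc/self/pagemap.  The following is a minimal
    sketch and not part of the original patch.  It assumes the pagemap entry
    layout described in the kernel's soft-dirty documentation (bit 55 for
    soft-dirty, bit 63 for present); the macro and helper names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in a 64-bit pagemap entry (assumed from the kernel docs). */
#define PM_SOFT_DIRTY_BIT 55
#define PM_PRESENT_BIT    63

/* Hypothetical helper: returns 1 if the entry has the soft-dirty bit set. */
static int pagemap_soft_dirty(uint64_t entry)
{
        return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Hypothetical helper: returns 1 if the page is present in RAM. */
static int pagemap_present(uint64_t entry)
{
        return (int)((entry >> PM_PRESENT_BIT) & 1);
}
```

    A caller would pread() one 8-byte entry at offset
    (vaddr / page_size) * 8 from /proc/self/pagemap and pass it to these
    helpers.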
    
    The above test fails with SIGBUS when there is only a single free hugetlb page:
     # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
     # ./test
     Bus error (core dumped)
    
    And worse, with sufficient free hugetlb pages it will map an anonymous page
    into a shared mapping, for example, messing up accounting during unmap
    and breaking MAP_SHARED semantics:
     # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
     # ./test
     # cat /proc/meminfo | grep HugePages_
     HugePages_Total:       2
     HugePages_Free:        1
     HugePages_Rsvd:    18446744073709551615
     HugePages_Surp:        0
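
    The HugePages_Rsvd value shown above is consistent with an unsigned
    64-bit counter that ended up one below zero.  A small sketch of that
    arithmetic, assuming a 64-bit unsigned counter; the function name is
    hypothetical and not a kernel symbol:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a reservation counter that is decremented
 * without a matching increment.  Unsigned arithmetic wraps modulo 2^64.
 */
static uint64_t unbalanced_decrement(uint64_t counter)
{
        return counter - 1;
}
```

    Calling unbalanced_decrement(0) yields 18446744073709551615, which is
    -1 when reinterpreted as a signed 64-bit value.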
    
    The reason in this particular case is that vma_wants_writenotify() will
    return "true", removing VM_SHARED in vma_set_page_prot() to map pages
    write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
    support softdirty tracking.
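
    The intended behaviour change can be modelled in userspace as follows.
    This is a simplified sketch, not the kernel code: the struct, the flag
    values and the function name are hypothetical stand-ins, and the other
    conditions that vma_wants_writenotify() evaluates are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_VM_SOFTDIRTY 0x1u
#define MODEL_VM_HUGETLB   0x2u

struct model_vma {
        unsigned int flags;
};

/*
 * Models only the soft-dirty part of the write-notify decision.
 * A VMA whose soft-dirty flag was cleared wants write notifications so
 * that the next write can be tracked, unless it is a hugetlb VMA, which
 * does not support soft-dirty tracking.
 */
static bool model_wants_softdirty_writenotify(const struct model_vma *vma)
{
        if (vma->flags & MODEL_VM_HUGETLB)
                return false;
        return !(vma->flags & MODEL_VM_SOFTDIRTY);
}
```

    In this model, a hugetlb VMA never requests write notifications for
    soft-dirty purposes, so its pages are not write-protected in a shared
    mapping after the soft-dirty bits are cleared.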
    
    Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
    Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Peter Feiner <pfeiner@google.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Cyrill Gorcunov <gorcunov@openvz.org>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Cc: Jamie Liu <jamieliu@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: <stable@vger.kernel.org>	[3.18+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    Reviewed-by: Wei Li <liwei391@huawei.com>