• K
    mm: replace remap_file_pages() syscall with emulation · c8d78c18
    Kirill A. Shutemov 提交于
    remap_file_pages(2) was invented to be able efficiently map parts of
    huge file into limited 32-bit virtual address space such as in database
    workloads.
    
    Nonlinear mappings are pain to support and it seems there's no
    legitimate use-cases nowadays since 64-bit systems are widely available.
    
    Let's drop it and get rid of all these special-cased code.
    
    The patch replaces the syscall with emulation which creates new VMA on
    each remap_file_pages(), unless they it can be merged with an adjacent
    one.
    
    I didn't find *any* real code that uses remap_file_pages(2) to test
    emulation impact on.  I've checked Debian code search and source of all
    packages in ALT Linux.  No real users: libc wrappers, mentions in
    strace, gdb, valgrind and this kind of stuff.
    
    There are few basic tests in LTP for the syscall.  They work just fine
    with emulation.
    
    To test performance impact, I've written small test case which
    demonstrate pretty much worst case scenario: map 4G shmfs file, write to
    begin of every page pgoff of the page, remap pages in reverse order,
    read every page.
    
    The test creates 1 million of VMAs if emulation is in use, so I had to
    set vm.max_map_count to 1100000 to avoid -ENOMEM.
    
    Before:		23.3 ( +-  4.31% ) seconds
    After:		43.9 ( +-  0.85% ) seconds
    Slowdown:	1.88x
    
    I believe we can live with that.
    
    Test case:
    
            #define _GNU_SOURCE
            #include <assert.h>
            #include <stdlib.h>
            #include <stdio.h>
            #include <sys/mman.h>
    
            #define MB	(1024UL * 1024)
            #define SIZE	(4096 * MB)
    
            int main(int argc, char **argv)
            {
                    unsigned long *p;
                    long i, pass;
    
                    for (pass = 0; pass < 10; pass++) {
                            p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                            if (p == MAP_FAILED) {
                                    perror("mmap");
                                    return -1;
                            }
    
                            for (i = 0; i < SIZE / 4096; i++)
                                    p[i * 4096 / sizeof(*p)] = i;
    
                            for (i = 0; i < SIZE / 4096; i++) {
                                    if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                    0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                            perror("remap_file_pages");
                                            return -1;
                                    }
                            }
    
                            for (i = SIZE / 4096 - 1; i >= 0; i--)
                                    assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
    
                            munmap(p, SIZE);
                    }
    
                    return 0;
            }
    
    [akpm@linux-foundation.org: fix spello]
    [sasha.levin@oracle.com: initialize populate before usage]
    [sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
    Signed-off-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Dave Jones <davej@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Armin Rigo <arigo@tunes.org>
    Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    c8d78c18
Makefile 2.5 KB