    mm: fix mprotect() behaviour on VM_LOCKED VMAs · 36f88188
    Committed by Kirill A. Shutemov
    On mlock(2) we trigger COW on private writable VMAs to avoid faults in
    the future.
    
    mm/gup.c:
    long populate_vma_page_range(struct vm_area_struct *vma,
                    unsigned long start, unsigned long end, int *nonblocking)
    {
    ...
             * We want to touch writable mappings with a write fault in order
             * to break COW, except for shared mappings because these don't COW
             * and we would not want to dirty them for nothing.
             */
            if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
                    gup_flags |= FOLL_WRITE;
    
    But we miss this case when a VM_LOCKED VMA is made writable via
    mprotect(2). The test case:
    
    	#define _GNU_SOURCE
    	#include <fcntl.h>
    	#include <stdio.h>
    	#include <stdlib.h>
    	#include <unistd.h>
    	#include <sys/mman.h>
    	#include <sys/resource.h>
    	#include <sys/stat.h>
    	#include <sys/time.h>
    	#include <sys/types.h>
    
    	#define PAGE_SIZE 4096
    
    	int main(int argc, char **argv)
    	{
    		struct rusage usage;
    		long before;
    		char *p;
    		int fd;
    
    		/* Create a file and populate first page of page cache */
    		fd = open("/tmp", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
    		write(fd, "1", 1);
    
    		/* Create a *read-only* *private* mapping of the file */
    		p = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    
    		/*
    		 * Since the mapping is read-only, mlock() will populate the mapping
    		 * with PTEs pointing to page cache without triggering COW.
    		 */
    		mlock(p, PAGE_SIZE);
    
    		/*
    		 * Mapping became read-write, but it's still populated with PTEs
    		 * pointing to page cache.
    		 */
    		mprotect(p, PAGE_SIZE, PROT_READ | PROT_WRITE);
    
    		getrusage(RUSAGE_SELF, &usage);
    		before = usage.ru_minflt;
    
    		/* Trigger COW: fault in mlock()ed VMA. */
    		*p = 1;
    
    		getrusage(RUSAGE_SELF, &usage);
    		printf("faults: %ld\n", usage.ru_minflt - before);
    
    		return 0;
    	}
    
    	$ ./test
    	faults: 1
    
    Let's fix it by populating the VMA in mprotect_fixup() when this
    condition holds. We don't care about population errors, just as we
    don't in other similar cases, e.g. mremap.
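
    The condition described above can be sketched in user space roughly as
    follows. This is an illustration under stated assumptions, not the patch
    itself: mprotect_needs_populate() is a hypothetical helper name, and the
    flag values are repeated from include/linux/mm.h only so that the
    snippet compiles on its own. In the kernel, a true result would
    correspond to calling populate_vma_page_range() on the affected range
    and ignoring its return value.

```c
#include <stdbool.h>

/*
 * VMA flag bits, assumed to match include/linux/mm.h. They are
 * redefined here only to keep the sketch self-contained.
 */
#define VM_WRITE  0x00000002UL
#define VM_SHARED 0x00000008UL
#define VM_LOCKED 0x00002000UL

/*
 * Hypothetical helper (not a kernel function): returns true when a
 * protection change turns a private, mlock()ed, previously read-only
 * VMA into a writable one. That is the case where the PTEs may still
 * point at page cache and a later write would have to break COW.
 */
static bool mprotect_needs_populate(unsigned long oldflags,
				    unsigned long newflags)
{
	/* Old mapping: locked, but neither writable nor shared. */
	bool was_locked_private_ro =
		(oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED;

	/* New mapping: write permission has been granted. */
	return was_locked_private_ro && (newflags & VM_WRITE) != 0;
}
```

    For the test case above, the old flags carry VM_LOCKED without VM_WRITE
    or VM_SHARED, and the new flags add VM_WRITE, so the helper returns
    true. Shared mappings and mappings that were already writable are
    skipped, for the same reasons given in the populate_vma_page_range()
    comment.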
    
    [akpm@linux-foundation.org: tweak comment text]
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mprotect.c 10.8 KB