1. 20 10月, 2008 3 次提交
    • R
      mmap: handle mlocked pages during map, remap, unmap · ba470de4
      Rik van Riel 提交于
      Originally by Nick Piggin <npiggin@suse.de>
      
      Remove mlocked pages from the LRU using "unevictable infrastructure"
      during mmap(), munmap(), mremap() and truncate().  Try to move back to
      normal LRU lists on munmap() when last mlocked mapping removed.  Remove
      PageMlocked() status when page truncated from file.
      
      [akpm@linux-foundation.org: cleanup]
      [kamezawa.hiroyu@jp.fujitsu.com: fix double unlock_page()]
      [kosaki.motohiro@jp.fujitsu.com: split LRU: munlock rework]
      [lee.schermerhorn@hp.com: mlock: fix __mlock_vma_pages_range comment block]
      [akpm@linux-foundation.org: remove bogus kerneldoc token]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamewzawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba470de4
    • L
      mlock: downgrade mmap sem while populating mlocked regions · 8edb08ca
      Lee Schermerhorn 提交于
      We need to hold the mmap_sem for write to initiatate mlock()/munlock()
      because we may need to merge/split vmas.  However, this can lead to very
      long lock hold times attempting to fault in a large memory region to mlock
      it into memory.  This can hold off other faults against the mm
      [multithreaded tasks] and other scans of the mm, such as via /proc.  To
      alleviate this, downgrade the mmap_sem to read mode during the population
      of the region for locking.  This is especially the case if we need to
      reclaim memory to lock down the region.  We [probably?] don't need to do
      this for unlocking as all of the pages should be resident--they're already
      mlocked.
      
      Now, the caller's of the mlock functions [mlock_fixup() and
      mlock_vma_pages_range()] expect the mmap_sem to be returned in write mode.
       Changing all callers appears to be way too much effort at this point.
      So, restore write mode before returning.  Note that this opens a window
      where the mmap list could change in a multithreaded process.  So, at least
      for mlock_fixup(), where we could be called in a loop over multiple vmas,
      we check that a vma still exists at the start address and that vma still
      covers the page range [start,end).  If not, we return an error, -EAGAIN,
      and let the caller deal with it.
      
      Return -EAGAIN from mlock_vma_pages_range() function and mlock_fixup() if
      the vma at 'start' disappears or changes so that the page range
      [start,end) is no longer contained in the vma.  Again, let the caller deal
      with it.  Looks like only sys_remap_file_pages() [via mmap_region()]
      should actually care.
      
      With this patch, I no longer see processes like ps(1) blocked for seconds
      or minutes at a time waiting for a large [multiple gigabyte] region to be
      locked down.  However, I occassionally see delays while unlocking or
      unmapping a large mlocked region.  Should we also downgrade the mmap_sem
      for the unlock path?
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8edb08ca
    • N
      mlock: mlocked pages are unevictable · b291f000
      Nick Piggin 提交于
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.
      
      This is achieved through various strategies:
      
      1) add yet another page flag--PG_mlocked--to indicate that
         the page is locked for efficient testing in vmscan and,
         optionally, fault path.  This allows early culling of
         unevictable pages, preventing them from getting to
         page_referenced()/try_to_unmap().  Also allows separate
         accounting of mlock'd pages, as Nick's original patch
         did.
      
         Note:  Nick's original mlock patch used a PG_mlocked
         flag.  I had removed this in favor of the PG_unevictable
         flag + an mlock_count [new page struct member].  I
         restored the PG_mlocked flag to eliminate the new
         count field.
      
      2) add the mlock/unevictable infrastructure to mm/mlock.c,
         with internal APIs in mm/internal.h.  This is a rework
         of Nick's original patch to these files, taking into
         account that mlocked pages are now kept on unevictable
         LRU list.
      
      3) update vmscan.c:page_evictable() to check PageMlocked()
         and, if vma passed in, the vm_flags.  Note that the vma
         will only be passed in for new pages in the fault path;
         and then only if the "cull unevictable pages in fault
         path" patch is included.
      
      4) add try_to_unlock() to rmap.c to walk a page's rmap and
         ClearPageMlocked() if no other vmas have it mlocked.
         Reuses as much of try_to_unmap() as possible.  This
         effectively replaces the use of one of the lru list links
         as an mlock count.  If this mechanism let's pages in mlocked
         vmas leak through w/o PG_mlocked set [I don't know that it
         does], we should catch them later in try_to_unmap().  One
         hopes this will be rare, as it will be relatively expensive.
      
      Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      splitlru: introduce __get_user_pages():
      
        New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
        because current get_user_pages() can't grab PROT_NONE pages theresore it
        cause PROT_NONE pages can't munlock.
      
      [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
      [akpm@linux-foundation.org: untangle patch interdependencies]
      [akpm@linux-foundation.org: fix things after out-of-order merging]
      [hugh@veritas.com: fix page-flags mess]
      [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
      [kosaki.motohiro@jp.fujitsu.com: build fix]
      [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
      [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b291f000
  2. 05 8月, 2008 1 次提交
    • K
      mlock() fix return values · a477097d
      KOSAKI Motohiro 提交于
      Halesh says:
      
      Please find the below testcase provide to test mlock.
      
      Test Case :
      ===========================
      
      #include <sys/resource.h>
      #include <stdio.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <fcntl.h>
      #include <errno.h>
      #include <stdlib.h>
      
      int main(void)
      {
        int fd,ret, i = 0;
        char *addr, *addr1 = NULL;
        unsigned int page_size;
        struct rlimit rlim;
      
        if (0 != geteuid())
        {
         printf("Execute this pgm as root\n");
         exit(1);
        }
      
        /* create a file */
        if ((fd = open("mmap_test.c",O_RDWR|O_CREAT,0755)) == -1)
        {
         printf("cant create test file\n");
         exit(1);
        }
      
        page_size = sysconf(_SC_PAGE_SIZE);
      
        /* set the MEMLOCK limit */
        rlim.rlim_cur = 2000;
        rlim.rlim_max = 2000;
      
        if ((ret = setrlimit(RLIMIT_MEMLOCK,&rlim)) != 0)
        {
         printf("Cant change limit values\n");
         exit(1);
        }
      
        addr = 0;
        while (1)
        {
        /* map a page into memory each time*/
        if ((addr = (char *) mmap(addr,page_size, PROT_READ |
      PROT_WRITE,MAP_SHARED,fd,0)) == MAP_FAILED)
        {
         printf("cant do mmap on file\n");
         exit(1);
        }
      
        if (0 == i)
          addr1 = addr;
        i++;
        errno = 0;
        /* lock the mapped memory pagewise*/
        if ((ret = mlock((char *)addr, 1500)) == -1)
        {
         printf("errno value is %d\n", errno);
         printf("cant lock maped region\n");
         exit(1);
        }
        addr = addr + page_size;
       }
      }
      ======================================================
      
      This testcase results in an mlock() failure with errno 14 that is EFAULT,
      but it has nowhere been specified that mlock() will return EFAULT.  When I
      tested the same on older kernels like 2.6.18, I got the correct result i.e
      errno 12 (ENOMEM).
      
      I think in source code mlock(2), setting errno ENOMEM has been missed in
      do_mlock() , on mlock_fixup() failure.
      
      SUSv3 requires the following behavior frmo mlock(2).
      
      [ENOMEM]
          Some or all of the address range specified by the addr and
          len arguments does not correspond to valid mapped pages
          in the address space of the process.
      
      [EAGAIN]
          Some or all of the memory identified by the operation could not
          be locked when the call was made.
      
      This rule isn't so nice and slighly strange.  but many people think
      POSIX/SUS compliance is important.
      Reported-by: NHalesh Sadashiv <halesh.sadashiv@ap.sony.com>
      Tested-by: NHalesh Sadashiv <halesh.sadashiv@ap.sony.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a477097d
  3. 17 7月, 2007 1 次提交
  4. 22 5月, 2007 1 次提交
    • A
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan 提交于
      First thing mm.h does is including sched.h solely for can_do_mlock() inline
      function which has "current" dereference inside. By dealing with can_do_mlock()
      mm.h can be detached from sched.h which is good. See below, why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
  5. 08 12月, 2006 1 次提交
  6. 12 1月, 2006 1 次提交
  7. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4