1. 12 9月, 2013 4 次提交
    • V
      mm: munlock: bypass per-cpu pvec for putback_lru_page · 56afe477
      Vlastimil Babka 提交于
      After introducing batching by pagevecs into munlock_vma_range(), we can
      further improve performance by bypassing the copying into per-cpu pagevec
      and the get_page/put_page pair associated with that.  Instead we perform
      LRU putback directly from our pagevec.  However, this is possible only for
      single-mapped pages that are evictable after munlock.  Unevictable pages
      require rechecking after putting on the unevictable list, so for those we
      fallback to putback_lru_page(), hich handles that.
      
      After this patch, a 13% speedup was measured for munlocking a 56GB large
      memory area with THP disabled.
      
      [akpm@linux-foundation.org:clarify comment]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NJörn Engel <joern@logfs.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      56afe477
    • V
      mm: munlock: batch NR_MLOCK zone state updates · 1ebb7cc6
      Vlastimil Babka 提交于
      Depending on previous batch which introduced batched isolation in
      munlock_vma_range(), we can batch also the updates of NR_MLOCK page stats.
       After the whole pagevec is processed for page isolation, the stats are
      updated only once with the number of successful isolations.  There were
      however no measurable perfomance gains.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NJörn Engel <joern@logfs.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ebb7cc6
    • V
      mm: munlock: batch non-THP page isolation and munlock+putback using pagevec · 7225522b
      Vlastimil Babka 提交于
      Currently, munlock_vma_range() calls munlock_vma_page on each page in a
      loop, which results in repeated taking and releasing of the lru_lock
      spinlock for isolating pages one by one.  This patch batches the munlock
      operations using an on-stack pagevec, so that isolation is done under
      single lru_lock.  For THP pages, the old behavior is preserved as they
      might be split while putting them into the pagevec.  After this patch, a
      9% speedup was measured for munlocking a 56GB large memory area with THP
      disabled.
      
      A new function __munlock_pagevec() is introduced that takes a pagevec and:
      1) It clears PageMlocked and isolates all pages under lru_lock.  Zone page
      stats can be also updated using the variant which assumes disabled
      interrupts.  2) It finishes the munlock and lru putback on all pages under
      their lock_page.  Note that previously, lock_page covered also the
      PageMlocked clearing and page isolation, but it is not needed for those
      operations.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NJörn Engel <joern@logfs.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7225522b
    • V
      mm: munlock: remove unnecessary call to lru_add_drain() · 586a32ac
      Vlastimil Babka 提交于
      In munlock_vma_range(), lru_add_drain() is currently called in a loop
      before each munlock_vma_page() call.
      
      This is suboptimal for performance when munlocking many pages.  The
      benefits of per-cpu pagevec for batching the LRU putback are removed since
      the pagevec only holds at most one page from the previous loop's
      iteration.
      
      The lru_add_drain() call also does not serve any purposes for correctness
      - it does not even drain pagavecs of all cpu's.  The munlock code already
      expects and handles situations where a page cannot be isolated from the
      LRU (e.g.  because it is on some per-cpu pagevec).
      
      The history of the (not commented) call also suggest that it appears there
      as an oversight rather than intentionally.  Before commit ff6a6da6 ("mm:
      accelerate munlock() treatment of THP pages") the call happened only once
      upon entering the function.  The commit has moved the call into the while
      loope.  So while the other changes in the commit improved munlock
      performance for THP pages, it introduced the abovementioned suboptimal
      per-cpu pagevec usage.
      
      Further in history, before commit 408e82b7 ("mm: munlock use
      follow_page"), munlock_vma_pages_range() was just a wrapper around
      __mlock_vma_pages_range which performed both mlock and munlock depending
      on a flag.  However, before ba470de4 ("mmap: handle mlocked pages during
      map, remap, unmap") the function handled only mlock, not munlock.  The
      lru_add_drain call thus comes from the implementation in commit b291f000
      ("mlock: mlocked pages are unevictable" and was intended only for
      mlocking, not munlocking.  The original intention of draining the LRU
      pagevec at mlock time was to ensure the pages were on the LRU before the
      lock operation so that they could be placed on the unevictable list
      immediately.  There is very little motivation to do the same in the
      munlock path this, particularly for every single page.
      
      This patch therefore removes the call completely.  After removing the
      call, a 10% speedup was measured for munlock() of a 56GB large memory area
      with THP disabled.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NJörn Engel <joern@logfs.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      586a32ac
  2. 29 3月, 2013 1 次提交
  3. 28 2月, 2013 1 次提交
  4. 24 2月, 2013 5 次提交
  5. 13 2月, 2013 1 次提交
  6. 09 10月, 2012 3 次提交
    • D
      mm, thp: fix mlock statistics · 8449d21f
      David Rientjes 提交于
      NR_MLOCK is only accounted in single page units: there's no logic to
      handle transparent hugepages.  This patch checks the appropriate number of
      pages to adjust the statistics by so that the correct amount of memory is
      reflected.
      
      Currently:
      
      		$ grep Mlocked /proc/meminfo
      		Mlocked:           19636 kB
      
      	#define MAP_SIZE	(4 << 30)	/* 4GB */
      
      	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
      			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
      	mlock(ptr, MAP_SIZE);
      
      		$ grep Mlocked /proc/meminfo
      		Mlocked:           29844 kB
      
      	munlock(ptr, MAP_SIZE);
      
      		$ grep Mlocked /proc/meminfo
      		Mlocked:           19636 kB
      
      And with this patch:
      
      		$ grep Mlock /proc/meminfo
      		Mlocked:           19636 kB
      
      	mlock(ptr, MAP_SIZE);
      
      		$ grep Mlock /proc/meminfo
      		Mlocked:         4213664 kB
      
      	munlock(ptr, MAP_SIZE);
      
      		$ grep Mlock /proc/meminfo
      		Mlocked:           19636 kB
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NHugh Dickens <hughd@google.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NMichel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8449d21f
    • H
      mm: use clear_page_mlock() in page_remove_rmap() · e6c509f8
      Hugh Dickins 提交于
      We had thought that pages could no longer get freed while still marked as
      mlocked; but Johannes Weiner posted this program to demonstrate that
      truncating an mlocked private file mapping containing COWed pages is still
      mishandled:
      
      #include <sys/types.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <stdio.h>
      
      int main(void)
      {
      	char *map;
      	int fd;
      
      	system("grep mlockfreed /proc/vmstat");
      	fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
      	unlink("chigurh");
      	ftruncate(fd, 4096);
      	map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
      	map[0] = 11;
      	mlock(map, sizeof(fd));
      	ftruncate(fd, 0);
      	close(fd);
      	munlock(map, sizeof(fd));
      	munmap(map, 4096);
      	system("grep mlockfreed /proc/vmstat");
      	return 0;
      }
      
      The anon COWed pages are not caught by truncation's clear_page_mlock() of
      the pagecache pages; but unmap_mapping_range() unmaps them, so we ought to
      look out for them there in page_remove_rmap().  Indeed, why should
      truncation or invalidation be doing the clear_page_mlock() when removing
      from pagecache?  mlock is a property of mapping in userspace, not a
      property of pagecache: an mlocked unmapped page is nonsensical.
      Reported-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ying Han <yinghan@google.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6c509f8
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  7. 07 3月, 2012 1 次提交
    • L
      vm: avoid using find_vma_prev() unnecessarily · 097d5910
      Linus Torvalds 提交于
      Several users of "find_vma_prev()" were not in fact interested in the
      previous vma if there was no primary vma to be found either.  And in
      those cases, we're much better off just using the regular "find_vma()",
      and then "prev" can be looked up by just checking vma->vm_prev.
      
      The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
      commit 83cd904d: "mm: fix find_vma_prev"), and the whole "return
      prev by reference" means that it generates worse code too.
      
      Thus this "let's avoid using this inconvenient and clearly too subtle
      interface when we don't really have to" patch.
      
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      097d5910
  8. 01 11月, 2011 2 次提交
  9. 31 10月, 2011 1 次提交
  10. 27 5月, 2011 1 次提交
  11. 05 5月, 2011 1 次提交
    • L
      VM: skip the stack guard page lookup in get_user_pages only for mlock · a1fde08c
      Linus Torvalds 提交于
      The logic in __get_user_pages() used to skip the stack guard page lookup
      whenever the caller wasn't interested in seeing what the actual page
      was.  But Michel Lespinasse points out that there are cases where we
      don't care about the physical page itself (so 'pages' may be NULL), but
      do want to make sure a page is mapped into the virtual address space.
      
      So using the existence of the "pages" array as an indication of whether
      to look up the guard page or not isn't actually so great, and we really
      should just use the FOLL_MLOCK bit.  But because that bit was only set
      for the VM_LOCKED case (and not all vma's necessarily have it, even for
      mlock()), we couldn't do that originally.
      
      Fix that by moving the VM_LOCKED check deeper into the call-chain, which
      actually simplifies many things.  Now mlock() gets simpler, and we can
      also check for FOLL_MLOCK in __get_user_pages() and the code ends up
      much more straightforward.
      Reported-and-reviewed-by: NMichel Lespinasse <walken@google.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1fde08c
  12. 13 4月, 2011 1 次提交
    • L
      vm: fix mlock() on stack guard page · 95042f9e
      Linus Torvalds 提交于
      Commit 53a7706d ("mlock: do not hold mmap_sem for extended periods
      of time") changed mlock() to care about the exact number of pages that
      __get_user_pages() had brought it.  Before, it would only care about
      errors.
      
      And that doesn't work, because we also handled one page specially in
      __mlock_vma_pages_range(), namely the stack guard page.  So when that
      case was handled, the number of pages that the function returned was off
      by one.  In particular, it could be zero, and then the caller would end
      up not making any progress at all.
      
      Rather than try to fix up that off-by-one error for the mlock case
      specially, this just moves the logic to handle the stack guard page
      into__get_user_pages() itself, thus making all the counts come out
      right automatically.
      Reported-by: NRobert Święcki <robert@swiecki.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95042f9e
  13. 24 3月, 2011 1 次提交
  14. 02 2月, 2011 1 次提交
  15. 14 1月, 2011 5 次提交
    • M
      mlock: do not hold mmap_sem for extended periods of time · 53a7706d
      Michel Lespinasse 提交于
      __get_user_pages gets a new 'nonblocking' parameter to signal that the
      caller is prepared to re-acquire mmap_sem and retry the operation if
      needed.  This is used to split off long operations if they are going to
      block on a disk transfer, or when we detect contention on the mmap_sem.
      
      [akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53a7706d
    • M
      mm: move VM_LOCKED check to __mlock_vma_pages_range() · 5fdb2002
      Michel Lespinasse 提交于
      Use a single code path for faulting in pages during mlock.
      
      The reason to have it in this patch series is that I did not want to
      update both code paths in a later change that releases mmap_sem when
      blocking on disk.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fdb2002
    • M
      mm: add FOLL_MLOCK follow_page flag. · 110d74a9
      Michel Lespinasse 提交于
      Move the code to mlock pages from __mlock_vma_pages_range() to
      follow_page().
      
      This allows __mlock_vma_pages_range() to not have to break down work into
      16-page batches.
      
      An additional motivation for doing this within the present patch series is
      that it'll make it easier for a later chagne to drop mmap_sem when
      blocking on disk (we'd like to be able to resume at the page that was read
      from disk instead of at the start of a 16-page batch).
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      110d74a9
    • M
      mlock: only hold mmap_sem in shared mode when faulting in pages · fed067da
      Michel Lespinasse 提交于
      Currently mlock() holds mmap_sem in exclusive mode while the pages get
      faulted in.  In the case of a large mlock, this can potentially take a
      very long time, during which various commands such as 'ps auxw' will
      block.  This makes sysadmins unhappy:
      
      real    14m36.232s
      user    0m0.003s
      sys     0m0.015s
      (output from 'time ps auxw' while a 20GB file was being mlocked without
      being previously preloaded into page cache)
      
      I propose that mlock() could release mmap_sem after the VM_LOCKED bits
      have been set in all appropriate VMAs.  Then a second pass could be done
      to actually mlock the pages, in small batches, releasing mmap_sem when we
      block on disk access or when we detect some contention.
      
      This patch:
      
      Before this change, mlock() holds mmap_sem in exclusive mode while the
      pages get faulted in.  In the case of a large mlock, this can potentially
      take a very long time.  Various things will block while mmap_sem is held,
      including 'ps auxw'.  This can make sysadmins angry.
      
      I propose that mlock() could release mmap_sem after the VM_LOCKED bits
      have been set in all appropriate VMAs.  Then a second pass could be done
      to actually mlock the pages with mmap_sem held for reads only.  We need to
      recheck the vma flags after we re-acquire mmap_sem, but this is easy.
      
      In the case where a vma has been munlocked before mlock completes, pages
      that were already marked as PageMlocked() are handled by the munlock()
      call, and mlock() is careful to not mark new page batches as PageMlocked()
      after the munlock() call has cleared the VM_LOCKED vma flags.  So, the end
      result will be identical to what'd happen if munlock() had executed after
      the mlock() call.
      
      In a later change, I will allow the second pass to release mmap_sem when
      blocking on disk accesses or when it is otherwise contended, so that it
      won't be held for long periods of time even in shared mode.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fed067da
    • M
      mlock: avoid dirtying pages and triggering writeback · 5ecfda04
      Michel Lespinasse 提交于
      When faulting in pages for mlock(), we want to break COW for anonymous or
      file pages within VM_WRITABLE, non-VM_SHARED vmas.  However, there is no
      need to write-fault into VM_SHARED vmas since shared file pages can be
      mlocked first and dirtied later, when/if they actually get written to.
      Skipping the write fault is desirable, as we don't want to unnecessarily
      cause these pages to be dirtied and queued for writeback.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Theodore Tso <tytso@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ecfda04
  16. 10 9月, 2010 1 次提交
  17. 21 8月, 2010 1 次提交
  18. 16 8月, 2010 1 次提交
    • L
      mm: fix up some user-visible effects of the stack guard page · d7824370
      Linus Torvalds 提交于
      This commit makes the stack guard page somewhat less visible to user
      space. It does this by:
      
       - not showing the guard page in /proc/<pid>/maps
      
         It looks like lvm-tools will actually read /proc/self/maps to figure
         out where all its mappings are, and effectively do a specialized
         "mlockall()" in user space.  By not showing the guard page as part of
         the mapping (by just adding PAGE_SIZE to the start for grows-up
         pages), lvm-tools ends up not being aware of it.
      
       - by also teaching the _real_ mlock() functionality not to try to lock
         the guard page.
      
         That would just expand the mapping down to create a new guard page,
         so there really is no point in trying to lock it in place.
      
      It would perhaps be nice to show the guard page specially in
      /proc/<pid>/maps (or at least mark grow-down segments some way), but
      let's not open ourselves up to more breakage by user space from programs
      that depends on the exact deails of the 'maps' file.
      
      Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
      source code to see what was going on with the whole new warning.
      
      Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be
      Reported-by: NHenrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7824370
  19. 26 3月, 2010 1 次提交
    • P
      x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Peter Zijlstra 提交于
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      faa4602e
  20. 07 3月, 2010 1 次提交
  21. 16 12月, 2009 3 次提交
  22. 22 9月, 2009 3 次提交
    • H
      mm: m(un)lock avoid ZERO_PAGE · 6e919717
      Hugh Dickins 提交于
      I'm still reluctant to clutter __get_user_pages() with another flag, just
      to avoid touching ZERO_PAGE count in mlock(); though we can add that later
      if it shows up as an issue in practice.
      
      But when mlocking, we can test page->mapping slightly earlier, to avoid
      the potentially bouncy rescheduling of lock_page on ZERO_PAGE - mlock
      didn't lock_page in olden ZERO_PAGE days, so we might have regressed.
      
      And when munlocking, it turns out that FOLL_DUMP coincidentally does
      what's needed to avoid all updates to ZERO_PAGE, so use that here also.
      Plus add comment suggested by KAMEZAWA Hiroyuki.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e919717
    • H
      mm: FOLL flags for GUP flags · 58fa879e
      Hugh Dickins 提交于
      __get_user_pages() has been taking its own GUP flags, then processing
      them into FOLL flags for follow_page().  Though oddly named, the FOLL
      flags are more widely used, so pass them to __get_user_pages() now.
      Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
      
      (The patch to __get_user_pages() looks peculiar, with both gup_flags
      and foll_flags: the gup_flags remain constant; but as before there's
      an exceptional case, out of scope of the patch, in which foll_flags
      per page have FOLL_WRITE masked off.)
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58fa879e
    • H
      mm: munlock use follow_page · 408e82b7
      Hugh Dickins 提交于
      Hiroaki Wakabayashi points out that when mlock() has been interrupted
      by SIGKILL, the subsequent munlock() takes unnecessarily long because
      its use of __get_user_pages() insists on faulting in all the pages
      which mlock() never reached.
      
      It's worse than slowness if mlock() is terminated by Out Of Memory kill:
      the munlock_vma_pages_all() in exit_mmap() insists on faulting in all the
      pages which mlock() could not find memory for; so innocent bystanders are
      killed too, and perhaps the system hangs.
      
      __get_user_pages() does a lot that's silly for munlock(): so remove the
      munlock option from __mlock_vma_pages_range(), and use a simple loop of
      follow_page()s in munlock_vma_pages_range() instead; ignoring absent
      pages, and not marking present pages as accessed or dirty.
      
      (Change munlock() to only go so far as mlock() reached?  That does not
      work out, given the convention that mlock() claims complete success even
      when it has to give up early - in part so that an underlying file can be
      extended later, and those pages locked which earlier would give SIGBUS.)
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: <stable@kernel.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: NHiroaki Wakabayashi <primulaelatior@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      408e82b7