1. 22 3月, 2006 3 次提交
    • D
      [PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef
      David Gibson 提交于
      free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
      of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
      to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
      range at the start of the vma.  is_hugepage_only_range() will return true
      if the given range has any intersection with a hugepage address region, and
      in this case the given region need not be hugepage aligned.  So, for
      example, this test can return true if called on, say, a 4k VMA immediately
      preceding a (nicely aligned) hugepage VMA.
      
      At present we get away with this because the powerpc version of
      hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
      only other arch with a non-trivial is_hugepage_only_range()) we get away
      with it for a different reason; the hugepage area is not contiguous with
      the rest of the user address space, and VMAs are not permitted in between,
      so the test can't return a false positive there.
      
      Nonetheless this should be fixed.  We do that in the patch below by
      replacing the is_hugepage_only_range() test with an explicit test of the
      VMA using is_vm_hugetlb_page().
      
      This in turn changes behaviour for platforms where is_hugepage_only_range()
      returns false always (everything except powerpc and ia64).  We address this
      by ensuring that hugetlb_free_pgd_range() is defined to be identical to
      free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
      it will prevent some otherwise possible coalescing of calls down to
      free_pgd_range().  Since this only happens for hugepage VMAs, removing this
      small optimization seems unlikely to cause any trouble.
      
      This patch causes no regressions on the libhugetlbfs testsuite - ppc64
      POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9da61aef
    • N
      [PATCH] mm: more CONFIG_DEBUG_VM · b7ab795b
      Nick Piggin 提交于
      Put a few more checks under CONFIG_DEBUG_VM
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b7ab795b
    • N
      [PATCH] mm: split highorder pages · 8dfcc9ba
      Nick Piggin 提交于
      Have an explicit mm call to split higher order pages into individual pages.
       Should help to avoid bugs and be more explicit about the code's intention.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NYoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8dfcc9ba
  2. 17 3月, 2006 1 次提交
    • H
      [PATCH] fix free swap cache latency · 6f5e6b9e
      Hugh Dickins 提交于
      Lee Revell reported 28ms latency when process with lots of swapped memory
      exits.
      
      2.6.15 introduced a latency regression when unmapping: in accounting the
      zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
      swap entry counted nothing at all.  We think of pages present as the slow
      case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
      can make a lot of work - and we could have been doing it many thousands of
      times without a latency break.
      
      Move the zap_work update up to account swap entries like pages present.
      This does account non-linear pte_file entries, and unmap_mapping_range
      skipping over swap entries, by the same amount even though they're quick:
      but neither of those cases deserves complicating the code (and they're
      treated no worse than they were in 2.6.14).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6f5e6b9e
  3. 18 2月, 2006 1 次提交
  4. 02 2月, 2006 1 次提交
  5. 10 1月, 2006 1 次提交
  6. 09 1月, 2006 1 次提交
  7. 07 1月, 2006 3 次提交
    • N
      [PATCH] mm: pfault optimisation · 41e9b63b
      Nick Piggin 提交于
      This atomic operation is superfluous: the pte will be added with the
      referenced bit set, and the page will be referenced through this mapping after
      the page fault handler returns anyway.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      41e9b63b
    • N
      [PATCH] mm: rmap optimisation · 9617d95e
      Nick Piggin 提交于
      Optimise rmap functions by minimising atomic operations when we know there
      will be no concurrent modifications.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9617d95e
    • B
      [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23
      Badari Pulavarty 提交于
      Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
      given range of pages & its associated backing store.  Current
      implementation supports only shmfs/tmpfs and other filesystems return
      -ENOSYS.
      
      "Some app allocates large tmpfs files, then when some task quits and some
      client disconnect, some memory can be released.  However the only way to
      release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
      
      Databases want to use this feature to drop a section of their bufferpool
      (shared memory segments) - without writing back to disk/swap space.
      
      This feature is also useful for supporting hot-plug memory on UML.
      
      Concerns raised by Andrew Morton:
      
      - "We have no plan for holepunching!  If we _do_ have such a plan (or
        might in the future) then what would the API look like?  I think
        sys_holepunch(fd, start, len), so we should start out with that."
      
      - Using madvise is very weird, because people will ask "why do I need to
        mmap my file before I can stick a hole in it?"
      
      - None of the other madvise operations call into the filesystem in this
        manner.  A broad question is: is this capability an MM operation or a
        filesytem operation?  truncate, for example, is a filesystem operation
        which sometimes has MM side-effects.  madvise is an mm operation and with
        this patch, it gains FS side-effects, only they're really, really
        significant ones."
      
      Comments:
      
      - Andrea suggested the fs operation too but then it's more efficient to
        have it as a mm operation with fs side effects, because they don't
        immediatly know fd and physical offset of the range.  It's possible to
        fixup in userland and to use the fs operation but it's more expensive,
        the vmas are already in the kernel and we can use them.
      
      Short term plan &  Future Direction:
      
      - We seem to need this interface only for shmfs/tmpfs files in the short
        term.  We have to add hooks into the filesystem for correctness and
        completeness.  This is what this patch does.
      
      - In the future, plan is to support both fs and mmap apis also.  This
        also involves (other) filesystem specific functions to be implemented.
      
      - Current patch doesn't support VM_NONLINEAR - which can be addressed in
        the future.
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f6b3ec23
  8. 17 12月, 2005 1 次提交
  9. 13 12月, 2005 1 次提交
    • L
      get_user_pages: don't try to follow PFNMAP pages · 1ff80389
      Linus Torvalds 提交于
      Nick Piggin points out that a few drivers play games with VM_IO (why?
      who knows..) and thus a pfn-remapped area may not have that bit set even
      if remap_pfn_range() set it originally.
      
      So make it explicit in get_user_pages() that we don't follow VM_PFNMAP
      pages, since pretty much by definition they do not have a "struct page"
      associated with them.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1ff80389
  10. 12 12月, 2005 3 次提交
  11. 04 12月, 2005 1 次提交
  12. 01 12月, 2005 1 次提交
    • L
      VM: add "vm_insert_page()" function · a145dd41
      Linus Torvalds 提交于
      This is what a lot of drivers will actually want to use to insert
      individual pages into a user VMA.  It doesn't have the old PageReserved
      restrictions of remap_pfn_range(), and it doesn't complain about partial
      remappings.
      
      The page you insert needs to be a nice clean kernel allocation, so you
      can't insert arbitrary page mappings with this, but that's not what
      people want.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a145dd41
  13. 30 11月, 2005 7 次提交
  14. 29 11月, 2005 3 次提交
    • N
      [PATCH] Fix vma argument in get_usr_pages() for gate areas · fa2a455b
      Nick Piggin 提交于
      The system call gate area handling called vm_normal_page() with the
      wrong vma (which was always NULL, and caused an oops).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fa2a455b
    • A
      [PATCH] Workaround for gcc 2.96 (undefined references) · e0f39591
      Alan Stern 提交于
        LD      .tmp_vmlinux1
      mm/built-in.o(.text+0x100d6): In function `copy_page_range':
      : undefined reference to `__pud_alloc'
      mm/built-in.o(.text+0x1010b): In function `copy_page_range':
      : undefined reference to `__pmd_alloc'
      mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
      : undefined reference to `__pud_alloc'
      fs/built-in.o(.text+0xc930): In function `install_arg_page':
      : undefined reference to `__pud_alloc'
      make: *** [.tmp_vmlinux1] Error 1
      
      Those missing references in mm/memory.c arise from this code in
      include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
      __PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
      
      /*
       * The following ifdef needed to get the 4level-fixup.h header to work.
       * Remove it when 4level-fixup.h has been removed.
       */
      #if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
      static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
      {
              return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
                      NULL: pud_offset(pgd, address);
      }
      
      static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
      {
              return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
                      NULL: pmd_offset(pud, address);
      }
      #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
      
      With my configuration the pgd_none and pud_none routines are inlines
      returning a constant 0.  Apparently the old compiler avoids generating
      calls to __pud_alloc and __pmd_alloc but still lists them as undefined
      references in the module's symbol table.
      
      I don't know which change caused this problem.  I think it was added
      somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
      several 2.6.14-rc kernels without difficulty.  However I can't point to an
      individual culprit.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e0f39591
    • L
      mm: re-architect the VM_UNPAGED logic · 6aab341e
      Linus Torvalds 提交于
      This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6aab341e
  15. 23 11月, 2005 5 次提交
    • H
      [PATCH] unpaged: ZERO_PAGE in VM_UNPAGED · f57e88a8
      Hugh Dickins 提交于
      It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
      let's not insert the ZERO_PAGE there - though whether it would matter will
      depend on what we decide about ZERO_PAGE refcounting.
      
      But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
      area, do_no_page should never be: just BUG_ON.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f57e88a8
    • H
      [PATCH] unpaged: anon in VM_UNPAGED · ee498ed7
      Hugh Dickins 提交于
      copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area,
      zap_pte_range needs to free them, do_wp_page needs to COW them: just like
      ordinary pages, not like the unpaged.
      
      But recognizing them is a little subtle: because PageReserved is no longer a
      condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the
      distro permits, and whether it's advisable on this or that architecture, is
      another matter).  So if we can see a PageAnon, it may not be ours to mess with
      (or may be ours from elsewhere in the address space).  I suspect there's an
      entertaining insoluble self-referential problem here, but the page_is_anon
      function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will
      always be an odd choice.
      
      In updating the comment on page_address_in_vma, noticed a potential NULL
      dereference, in a path we don't actually take, but fixed it.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee498ed7
    • H
      [PATCH] unpaged: COW on VM_UNPAGED · 920fc356
      Hugh Dickins 提交于
      Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do
      Copy-On-Write without touching the VM_UNPAGED's page counts - but this is
      incomplete, because the anonymous page it inserts will itself need to be
      handled, here and in other functions - next patch.
      
      We still don't copy the page if the pfn is invalid, because the
      copy_user_highpage interface does not allow it.  But that's not been a problem
      in the past: can be added in later if the need arises.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      920fc356
    • H
      [PATCH] unpaged: VM_UNPAGED · 0b14c179
      Hugh Dickins 提交于
      Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
      drivers set VM_RESERVED on areas which are then populated by nopage.  The
      PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
      zap_pte_range, without changing those drivers not to set it: so their pages
      just leak away.
      
      Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
      to flag the special areas where the ptes may have no struct page, or if they
      have then it's not to be touched.  Replace most instances of VM_RESERVED in
      core mm by VM_UNPAGED.  Force it on in remap_pfn_range, and the sparc and
      sparc64 io_remap_pfn_range.
      
      Revert addition of VM_RESERVED to powerpc vdso, it's not needed there.  Is it
      needed anywhere?  It still governs the mm->reserved_vm statistic, and special
      vmas not to be merged, and areas not to be core dumped; but could probably be
      eliminated later (the drivers are probably specifying it because in 2.4 it
      kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
      don't get on).
      
      Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
      purpose whatsoever, and should be removed from drivers when we clean up.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NWilliam Irwin <wli@holomorphy.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0b14c179
    • H
      [PATCH] unpaged: get_user_pages VM_RESERVED · ed5297a9
      Hugh Dickins 提交于
      The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas
      flagged VM_RESERVED in place of PageReserved.  That is correct in theory - we
      ought not to interfere with struct pages in such a reserved area; but in
      practice it broke BTTV for one.
      
      So revert to prohibiting only on VM_IO: if someone gets into trouble with
      get_user_pages on VM_RESERVED, it'll just be a "don't do that".
      
      You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first
      place, but now's not the time for breaking drivers without notice.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ed5297a9
  16. 14 11月, 2005 1 次提交
    • R
      [PATCH] mm: ZAP_BLOCK causes redundant work · 51c6f666
      Robin Holt 提交于
      The address based work estimate for unmapping (for lockbreak) is and always
      was horribly inefficient for sparse mappings.  The problem is most simply
      explained with an example:
      
      If we find a pgd is clear, we still have to call into unmap_page_range
      PGDIR_SIZE / ZAP_BLOCK_SIZE times, each time checking the clear pgd, in
      order to progress the working address to the next pgd.
      
      The fundamental way to solve the problem is to keep track of the end
      address we've processed and pass it back to the higher layers.
      
      From: Nick Piggin <npiggin@suse.de>
      
        Modification to completely get away from address based work estimate
        and instead use an abstract count, with a very small cost for empty
        entries as opposed to present pages.
      
        On 2.6.14-git2, ppc64, and CONFIG_PREEMPT=y, mapping and unmapping 1TB
        of virtual address space takes 1.69s; with the following patch applied,
        this operation can be done 1000 times in less than 0.01s
      
      From: Andrew Morton <akpm@osdl.org>
      
      With CONFIG_HUTETLB_PAGE=n:
      
      mm/memory.c: In function `unmap_vmas':
      mm/memory.c:779: warning: division by zero
      
      Due to
      
      			zap_work -= (end - start) /
      					(HPAGE_SIZE / PAGE_SIZE);
      
      So make the dummy HPAGE_SIZE non-zero
      Signed-off-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      51c6f666
  17. 30 10月, 2005 6 次提交
    • A
      [PATCH] .text page fault SMP scalability optimization · 1a44e149
      Andrea Arcangeli 提交于
      We had a problem on ppc64 where with more than 4 threads a large system
      wouldn't scale well while faulting in the .text (most of the time was spent
      in the kernel despite it was an userland compute intensive app).  The
      reason is the useless overwrite of the same pte from all cpu.
      
      I fixed it this way (verified on an older kernel but the forward port is
      almost identical).  This will benefit all archs not just ppc64.
      Signed-off-by: NAndrea Arcangeli <andrea@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a44e149
    • H
      [PATCH] mm: fix rss and mmlist locking · f412ac08
      Hugh Dickins 提交于
      A couple of oddities were guarded by page_table_lock, no longer properly
      guarded when that is split.
      
      The mm_counters of file_rss and anon_rss: make those an atomic_t, or an
      atomic64_t if the architecture supports it, in such a case.  Definitions by
      courtesy of Christoph Lameter: who spent considerable effort on more scalable
      ways of counting, but found insufficient benefit in practice.
      
      And adding an mm with swap to the mmlist for swapoff: the list is well-
      guarded by its own lock, but the list_empty check now has to be repeated
      inside it.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f412ac08
    • H
      [PATCH] mm: split page table lock · 4c21e2f2
      Hugh Dickins 提交于
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c21e2f2
    • H
      [PATCH] mm: follow_page with inner ptlock · deceb6cd
      Hugh Dickins 提交于
      Final step in pushing down common core's page_table_lock.  follow_page no
      longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself;
      and so no page_table_lock is taken in get_user_pages itself.
      
      But get_user_pages (and get_futex_key) do then need follow_page to pin the
      page for them: take Daniel's suggestion of bitflags to follow_page.
      
      Need one for WRITE, another for TOUCH (it was the accessed flag before:
      vanished along with check_user_page_readable, but surely get_numa_maps is
      wrong to mark every page it finds as accessed), another for GET.
      
      And another, ANON to dispose of untouched_anonymous_page: it seems silly for
      that to descend a second time, let follow_page observe if there was no page
      table and return ZERO_PAGE if so.  Fix minor bug in that: check VM_LOCKED -
      make_pages_present ought to make readonly anonymous present.
      
      Give get_numa_maps a cond_resched while we're there.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      deceb6cd
    • H
      [PATCH] mm: kill check_user_page_readable · c34d1b4d
      Hugh Dickins 提交于
      check_user_page_readable is a problematic variant of follow_page.  It's used
      only by oprofile's i386 and arm backtrace code, at interrupt time, to
      establish whether a userspace stackframe is currently readable.
      
      This is problematic, because we want to push the page_table_lock down inside
      follow_page, and later split it; whereas oprofile is doing a spin_trylock on
      it (in the i386 case, forgotten in the arm case), and needs that to pin
      perhaps two pages spanned by the stackframe (which might be covered by
      different locks when we split).
      
      I think oprofile is going about this in the wrong way: it doesn't need to know
      the area is readable (neither i386 nor arm uses read protection of user
      pages), it doesn't need to pin the memory, it should simply
      __copy_from_user_inatomic, and see if that succeeds or not.  Sorry, but I've
      not got around to devising the sparse __user annotations for this.
      
      Then we can eliminate check_user_page_readable, and return to a single
      follow_page without the __follow_page variants.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c34d1b4d
    • H
      [PATCH] mm: unmap_vmas with inner ptlock · 508034a3
      Hugh Dickins 提交于
      Remove the page_table_lock from around the calls to unmap_vmas, and replace
      the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
      now safe to descend without page_table_lock.
      
      Don't attempt fancy locking for hugepages, just take page_table_lock in
      unmap_hugepage_range.  Which makes zap_hugepage_range, and the hugetlb test in
      zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway.  Nor
      does unmap_vmas have much use for its mm arg now.
      
      The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
      page_table_lock: if they're implemented at all, they typically come down to
      flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
      (which we already audited for the mprotect case).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      508034a3