1. 23 Aug, 2007 1 commit
    • fix NULL pointer dereference in __vm_enough_memory() · 34b4e4aa
      Alan Cox committed
      The new exec code inserts an accounted vma into an mm struct which is not
      current->mm.  The existing memory check code has a hard-coded assumption
      that this does not happen, as does the security code.
      
      As the correct mm is known we pass the mm to the security method and the
      helper function.  A new security test is added for the case where we need
      to pass the mm and the existing one is modified to pass current->mm to
      avoid the need to change large amounts of code.
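
The pattern described above (a helper that takes the mm explicitly, plus a legacy wrapper that forwards current->mm) can be sketched roughly as follows. This is a toy user-space model, not the kernel's overcommit logic: `struct mm_model`, `current_mm`, the `limit` parameter and both function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm struct. */
struct mm_model {
    long total_vm;              /* pages already accounted to this mm */
};

/* Hypothetical stand-in for current->mm; may be NULL. */
struct mm_model *current_mm;

/* New-style helper: the caller names the mm, so current_mm is never read. */
int enough_memory_mm(const struct mm_model *mm, long pages, long limit)
{
    assert(mm != NULL);
    return (mm->total_vm + pages <= limit) ? 0 : -ENOMEM;
}

/* Legacy entry point: keeps its signature and simply forwards current_mm. */
int enough_memory(long pages, long limit)
{
    return enough_memory_mm(current_mm, pages, limit);
}
```

A caller that is building a new mm during exec would call `enough_memory_mm(&new_mm, ...)` directly, while existing callers keep using `enough_memory()` unchanged.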
      
      (Thanks to Tobias for fixing rejects and testing)
      Signed-off-by: Alan Cox <alan@redhat.com>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Cc: James Morris <jmorris@redhat.com>
      Cc: Tobias Diedrich <ranma+kernel@tdiedrich.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 30 Jul, 2007 1 commit
    • Remove fs.h from mm.h · 4e950f6f
      Alexey Dobriyan committed
      Remove fs.h from mm.h. For this,
       1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
       2) Add back fs.h or less bloated headers (err.h) to files that need it.
      
      As a result, on x86_64 allyesconfig, the number of files rebuilt because of
      fs.h dependencies drops from 3929 to 3444 (-12.3%).
      
      Cross-compile tested without regressions on my two usual configs and (sigh):
      
      alpha              arm-mx1ads        mips-bigsur          powerpc-ebony
      alpha-allnoconfig  arm-neponset      mips-capcella        powerpc-g5
      alpha-defconfig    arm-netwinder     mips-cobalt          powerpc-holly
      alpha-up           arm-netx          mips-db1000          powerpc-iseries
      arm                arm-ns9xxx        mips-db1100          powerpc-linkstation
      arm-assabet        arm-omap_h2_1610  mips-db1200          powerpc-lite5200
      arm-at91rm9200dk   arm-onearm        mips-db1500          powerpc-maple
      arm-at91rm9200ek   arm-picotux200    mips-db1550          powerpc-mpc7448_hpc2
      arm-at91sam9260ek  arm-pleb          mips-ddb5477         powerpc-mpc8272_ads
      arm-at91sam9261ek  arm-pnx4008       mips-decstation      powerpc-mpc8313_rdb
      arm-at91sam9263ek  arm-pxa255-idp    mips-e55             powerpc-mpc832x_mds
      arm-at91sam9rlek   arm-realview      mips-emma2rh         powerpc-mpc832x_rdb
      arm-ateb9200       arm-realview-smp  mips-excite          powerpc-mpc834x_itx
      arm-badge4         arm-rpc           mips-fulong          powerpc-mpc834x_itxgp
      arm-carmeva        arm-s3c2410       mips-ip22            powerpc-mpc834x_mds
      arm-cerfcube       arm-shannon       mips-ip27            powerpc-mpc836x_mds
      arm-clps7500       arm-shark         mips-ip32            powerpc-mpc8540_ads
      arm-collie         arm-simpad        mips-jazz            powerpc-mpc8544_ds
      arm-corgi          arm-spitz         mips-jmr3927         powerpc-mpc8560_ads
      arm-csb337         arm-trizeps4      mips-malta           powerpc-mpc8568mds
      arm-csb637         arm-versatile     mips-mipssim         powerpc-mpc85xx_cds
      arm-ebsa110        i386              mips-mpc30x          powerpc-mpc8641_hpcn
      arm-edb7211        i386-allnoconfig  mips-msp71xx         powerpc-mpc866_ads
      arm-em_x270        i386-defconfig    mips-ocelot          powerpc-mpc885_ads
      arm-ep93xx         i386-up           mips-pb1100          powerpc-pasemi
      arm-footbridge     ia64              mips-pb1500          powerpc-pmac32
      arm-fortunet       ia64-allnoconfig  mips-pb1550          powerpc-ppc64
      arm-h3600          ia64-bigsur       mips-pnx8550-jbs     powerpc-prpmc2800
      arm-h7201          ia64-defconfig    mips-pnx8550-stb810  powerpc-ps3
      arm-h7202          ia64-gensparse    mips-qemu            powerpc-pseries
      arm-hackkit        ia64-sim          mips-rbhma4200       powerpc-up
      arm-integrator     ia64-sn2          mips-rbhma4500       s390
      arm-iop13xx        ia64-tiger        mips-rm200           s390-allnoconfig
      arm-iop32x         ia64-up           mips-sb1250-swarm    s390-defconfig
      arm-iop33x         ia64-zx1          mips-sead            s390-up
      arm-ixp2000        m68k              mips-tb0219          sparc
      arm-ixp23xx        m68k-amiga        mips-tb0226          sparc-allnoconfig
      arm-ixp4xx         m68k-apollo       mips-tb0287          sparc-defconfig
      arm-jornada720     m68k-atari        mips-workpad         sparc-up
      arm-kafa           m68k-bvme6000     mips-wrppmc          sparc64
      arm-kb9202         m68k-hp300        mips-yosemite        sparc64-allnoconfig
      arm-ks8695         m68k-mac          parisc               sparc64-defconfig
      arm-lart           m68k-mvme147      parisc-allnoconfig   sparc64-up
      arm-lpd270         m68k-mvme16x      parisc-defconfig     um-x86_64
      arm-lpd7a400       m68k-q40          parisc-up            x86_64
      arm-lpd7a404       m68k-sun3         powerpc              x86_64-allnoconfig
      arm-lubbock        m68k-sun3x        powerpc-cell         x86_64-defconfig
      arm-lusl7200       mips              powerpc-celleb       x86_64-up
      arm-mainstone      mips-atlas        powerpc-chrp32
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 20 Jul, 2007 2 commits
    • mm: variable length argument support · b6a2fea3
      Ollie Wild committed
      Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
      the old mm into the new mm.
      
      We create the new mm before the binfmt code runs, and place the new stack at
      the very top of the address space.  Once the binfmt code runs and figures out
      where the stack should be, we move it downwards.
      
      It is a bit peculiar in that we have one task with two mm's, one of which is
      inactive.
      
      [a.p.zijlstra@chello.nl: limit stack size]
      Signed-off-by: Ollie Wild <aaw@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      [bunk@stusta.de: unexport bprm_mm_init]
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin committed
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The hitch
      here is that the ->nopage handler didn't pass down enough information (i.e.
      pgoff).  But it is more logical to pass pgoff rather than have the ->nopage
      function calculate it itself anyway (because that's a similar layering
      violation).
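
As an illustration of passing pgoff down rather than having the handler derive it, here is a minimal user-space sketch. It assumes a linear mapping and 4 KiB pages; `struct vma_range`, `dispatch_fault`, `record_pgoff` and `last_pgoff` are hypothetical names, not the interface this commit introduces.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12     /* assumption: 4 KiB pages */

/* Hypothetical, cut-down view of a linear mapping. */
struct vma_range {
    unsigned long vm_start;     /* first virtual address of the mapping */
    unsigned long vm_pgoff;     /* file offset of vm_start, in pages */
};

/* The VM layer translates a faulting address into a file page offset. */
unsigned long linear_pgoff(const struct vma_range *vma, unsigned long address)
{
    assert(vma != NULL);
    return ((address - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* The handler sees only pgoff, so it needs no knowledge of the mapping. */
typedef int (*fault_fn)(unsigned long pgoff);

int dispatch_fault(const struct vma_range *vma, unsigned long address,
                   fault_fn handler)
{
    return handler(linear_pgoff(vma, address));
}

/* Example handler: records which file page was requested. */
unsigned long last_pgoff;

int record_pgoff(unsigned long pgoff)
{
    last_pgoff = pgoff;
    return 0;
}
```

A nonlinear mapping would only need a different address-to-pgoff translation in the VM layer; the handler itself would stay the same.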
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place,
      so there is a lot of duplication that can be removed, and nice cleanups that
      become possible, if everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 17 Jul, 2007 1 commit
  5. 12 Jul, 2007 1 commit
    • security: Protection for exploiting null dereference using mmap · ed032189
      Eric Paris committed
      Add a new security check on mmap operations to see if the user is attempting
      to mmap to a low area of the address space.  The amount of space protected is
      indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
      0, preserving existing behavior.
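
The low-address check can be modelled in a few lines. This is a sketch under stated assumptions: `mmap_min_addr_model` stands in for the /proc/sys/vm/mmap_min_addr tunable, the `privileged` flag stands in for whatever decision the security module makes, and the -EACCES return value is an assumption rather than something the commit message states.

```c
#include <errno.h>

/* Models the tunable; 0 (the default) disables the check entirely. */
unsigned long mmap_min_addr_model = 0;

/*
 * Returns 0 if a mapping at addr is allowed, or a negative errno.
 * "privileged" is a hypothetical flag for the security module's verdict.
 */
int check_low_mmap(unsigned long addr, int privileged)
{
    if (addr < mmap_min_addr_model && !privileged)
        return -EACCES;         /* assumed error code */
    return 0;
}
```

With the tunable at 0 no address compares below it, which is how the default preserves the existing behaviour.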
      
      This patch uses a new SELinux security class, "memprotect".  Policy already
      contains a number of allow rules like a_t self:process * (unconfined_t being
      one of them), which means that putting this check in the process class (its
      best current fit) would make it useless: all user processes, which we also
      want to protect against, would be allowed.  Taking the memprotect name for
      the new class also makes it possible for us to move some of the other
      memory protection permissions out of 'process' and into the new class the
      next time we bump the policy version number (which I also think is a good
      future idea).
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  6. 22 Jun, 2007 1 commit
  7. 09 May, 2007 2 commits
  8. 08 May, 2007 2 commits
  9. 03 May, 2007 1 commit
    • [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction · d6dd61c8
      Jeremy Fitzhardinge committed
      Add hooks to allow a paravirt implementation to track the lifetime of
      an mm.  Paravirtualization requires three hooks, but only two are
      needed in common code.  They are:
      
      arch_dup_mmap, which is called when a new mmap is created at fork
      
      arch_exit_mmap, which is called when the last process reference to an
        mm is dropped, which typically happens on exit and exec.
      
      The third hook is activate_mm, which is called from the arch-specific
      activate_mm() macro/function, and so doesn't need stub versions for
      other architectures.  It's called when an mm is first used.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: linux-arch@vger.kernel.org
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
  10. 02 Mar, 2007 1 commit
  11. 10 Feb, 2007 1 commit
  12. 31 Jan, 2007 1 commit
  13. 09 Dec, 2006 1 commit
  14. 08 Dec, 2006 1 commit
  15. 15 Nov, 2006 3 commits
    • [PATCH] hugetlb: fix error return for brk() entering a hugepage region · cd2579d7
      Hugh Dickins committed
      Commit cb07c9a1 causes the wrong return
      value.  is_hugepage_only_range() is a boolean, so we should return
      -EINVAL rather than 1.
      
      Also - we can use "mm" instead of looking up "current->mm" again.
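
The class of bug fixed here, a boolean predicate leaking through as a return value, can be shown with a small user-space model. The function names and the `region_start` parameter are hypothetical; only the "return -EINVAL rather than 1" behaviour comes from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: everything at or above region_start is hugepage-only. */
bool in_hugepage_only_range(unsigned long addr, unsigned long len,
                            unsigned long region_start)
{
    return addr + len > region_start;
}

/* Returns 0 on success or a negative errno, never the raw boolean. */
int check_brk_range(unsigned long addr, unsigned long len,
                    unsigned long region_start)
{
    /* Returning the predicate's value directly would hand 1 to the caller. */
    if (in_hugepage_only_range(addr, len, region_start))
        return -EINVAL;
    return 0;
}
```

Callers that test for failure with `ret < 0` would treat a stray 1 as success, which is why the translation to a negative errno matters.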
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugetlb: check for brk() entering a hugepage region · cb07c9a1
      David Gibson committed
      Unlike mmap(), the codepath for brk() creates a vma without first checking
      that it doesn't touch a region exclusively reserved for hugepages.  On
      powerpc, this can allow it to create a normal page vma in a hugepage
      region, causing oopses and other badness.
      
      Add a test to prevent this.  With this patch, brk() will simply fail if it
      attempts to move the break into a hugepage reserved region.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugetlb: prepare_hugepage_range check offset too · 68589bc3
      Hugh Dickins committed
      (David:)
      
      If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
      because the given file offset is not hugepage aligned - then do_mmap_pgoff
      will go to the unmap_and_free_vma backout path.
      
      But at this stage the vma hasn't been marked as hugepage, and the backout path
      will call unmap_region() on it.  That will eventually call down to the
      non-hugepage version of unmap_page_range().  On ppc64, at least, that will
      cause serious problems if there are any existing hugepage pagetable entries in
      the vicinity - for example if there are any other hugepage mappings under the
      same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
      entries.  I suspect this will also cause bad problems on ia64, though I don't
      have a machine to test it on.
      
      (Hugh:)
      
      prepare_hugepage_range() should check file offset alignment when it checks
      virtual address and length, to stop MAP_FIXED with a bad huge offset from
      unmapping before it fails further down.  PowerPC should apply the same
      prepare_hugepage_range alignment checks as ia64 and all the others do.
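
A rough model of checking the file offset alongside the virtual address and length is shown below. It assumes a 2 MiB huge page and byte-granular offsets; the function name, the macro and the parameter layout are illustrative and do not reproduce the kernel's prepare_hugepage_range() signature.

```c
#include <errno.h>

/* Assumption: 2 MiB huge pages; real sizes depend on the architecture. */
#define MODEL_HPAGE_SIZE (2UL * 1024 * 1024)

/* Returns 0 only when address, length and file offset are all aligned. */
int prepare_hugepage_range_model(unsigned long addr, unsigned long len,
                                 unsigned long long file_offset)
{
    if (file_offset & (MODEL_HPAGE_SIZE - 1))
        return -EINVAL;         /* the offset check this commit adds */
    if (len & (MODEL_HPAGE_SIZE - 1))
        return -EINVAL;
    if (addr & (MODEL_HPAGE_SIZE - 1))
        return -EINVAL;
    return 0;
}
```

Rejecting a misaligned offset this early means a MAP_FIXED request fails before anything has been unmapped.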
      
      Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
      is the check for too small a mapping); but even so, move up setting of
      VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
      hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
      when unwinding from error will go the non-huge way, which may cause bad
      behaviour on architectures (powerpc and ia64) which segregate their huge
      mappings into a separate region of the address space.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 16 Oct, 2006 1 commit
  17. 26 Sep, 2006 2 commits
    • [PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE · 972d1a7b
      Christoph Lameter committed
      Remove the atomic counter for slab_reclaim_pages and replace the counter
      and NR_SLAB with two ZVC counters that account for unreclaimable and
      reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.
      
      Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE.  The
      intent seems to be to check for slab pages that could be freed.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra committed
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - it is shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
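
The four conditions above combine into a single predicate. The sketch below is a user-space model: the flag values are illustrative rather than the kernel's, and `struct vma_dirty_model` with its two boolean fields is a hypothetical stand-in for the backing_dev_info capability test and the page-protection comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's real bit assignments may differ. */
#define MODEL_VM_WRITE      0x01UL
#define MODEL_VM_SHARED     0x02UL
#define MODEL_VM_PFNMAP     0x04UL
#define MODEL_VM_INSERTPAGE 0x08UL

/* Hypothetical, cut-down view of a VMA for this heuristic. */
struct vma_dirty_model {
    unsigned long vm_flags;
    bool bdi_accounts_dirty;    /* stands in for mapping_cap_account_dirty() */
    bool prot_is_default;       /* f_op->mmap() left the protection alone */
};

/* True when the VMA is eligible for shared dirty page tracking. */
bool vma_tracks_dirty(const struct vma_dirty_model *vma)
{
    const unsigned long shared_write = MODEL_VM_WRITE | MODEL_VM_SHARED;

    assert(vma != NULL);
    if ((vma->vm_flags & shared_write) != shared_write)
        return false;           /* not a shared writeable mapping */
    if (vma->vm_flags & (MODEL_VM_PFNMAP | MODEL_VM_INSERTPAGE))
        return false;           /* 'special' mapping */
    if (!vma->bdi_accounts_dirty)
        return false;           /* backing device does not account dirty pages */
    return vma->prot_is_default;
}
```

All four tests must pass; a mapping that fails any one of them keeps its normal writeable PTEs and is not write-protected for tracking.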
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages; if the page is found to belong to a VMA that accounts dirty pages, it
      will also wrprotect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 08 Sep, 2006 1 commit
  19. 27 Jul, 2006 1 commit
  20. 01 Jul, 2006 1 commit
  21. 23 Jun, 2006 1 commit
    • [PATCH] add page_mkwrite() vm_operations method · 9637a5ef
      David Howells committed
      Add a new VMA operation to notify a filesystem or other driver about the
      MMU generating a fault because userspace attempted to write to a page
      mapped through a read-only PTE.
      
      This facility permits the filesystem or driver to:
      
       (*) Implement storage allocation/reservation on attempted write, and so to
           deal with problems such as ENOSPC more gracefully (perhaps by generating
           SIGBUS).
      
       (*) Delay making the page writable until the contents have been written to a
           backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
           It permits the filesystem to have some guarantee about the state of the
           cache.
      
       (*) Account and limit number of dirty pages. This is one piece of the puzzle
           needed to make shared writable mapping work safely in FUSE.
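
The first use case, reserving storage when a write is first attempted, can be sketched as a toy model. `struct fs_model`, `struct page_model` and both functions are hypothetical; the real method's signature and return conventions are not given in this log and are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filesystem state: blocks still available for reservation. */
struct fs_model {
    long free_blocks;
};

/* Hypothetical page state as seen through one mapping. */
struct page_model {
    bool writable;
    bool dirty;
};

/* Filesystem callback: reserve backing storage before the page may be dirtied. */
int fs_page_mkwrite(struct fs_model *fs)
{
    assert(fs != NULL);
    if (fs->free_blocks <= 0)
        return -ENOSPC;
    fs->free_blocks--;
    return 0;
}

/* Write-fault path: ask the filesystem first, then make the page writable. */
int handle_write_fault(struct fs_model *fs, struct page_model *page)
{
    int err = fs_page_mkwrite(fs);

    if (err)
        return err;             /* page stays read-only; caller may raise SIGBUS */
    page->writable = true;
    page->dirty = true;
    return 0;
}
```

Because the reservation happens at fault time, a full filesystem is reported at the moment of the write rather than later during write-back.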
      
      Needed by cachefs (Or is it cachefiles?  Or fscache? <head spins>).
      
      At least four other groups have stated an interest in it or a desire to use
      the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
      EXT3 really ought to use it to deal with the case of shared-writable mmap
      encountering ENOSPC before we permit the page to be dirtied.
      
      From: Peter Zijlstra <a.p.zijlstra@chello.nl>
      
        get_user_pages(.write=1, .force=1) can generate COW hits on read-only
        shared mappings, this patch traps those as page_mkwrite candidates and fails
        to handle them the old way.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Joel Becker <Joel.Becker@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 11 Apr, 2006 2 commits
  23. 01 Apr, 2006 1 commit
  24. 26 Mar, 2006 1 commit
  25. 22 Mar, 2006 1 commit
    • [PATCH] remove VM_DONTCOPY bogosities · a6f563db
      Hugh Dickins committed
      Now that it's madvisable, remove two pieces of VM_DONTCOPY bogosity:
      
      1. There was and is no logical reason why VM_DONTCOPY should be in the
         list of flags which forbid vma merging (and those drivers which set
         it are also setting VM_IO, which itself forbids the merge).
      
      2. It's hard to understand the purpose of the VM_HUGETLB, VM_DONTCOPY
         block in vm_stat_account: but never mind, it's under CONFIG_HUGETLB,
         which (unlike CONFIG_HUGETLB_PAGE or CONFIG_HUGETLBFS) has never been
         defined.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  26. 12 Jan, 2006 1 commit
  27. 17 Dec, 2005 1 commit
  28. 23 Nov, 2005 1 commit
    • [PATCH] unpaged: private write VM_RESERVED · 83e9b7e9
      Hugh Dickins committed
      The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
      tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed
      with -EACCES: because do_wp_page lacks the refinement to COW pages in those
      areas, nor do we expect to find anonymous pages in them; and it seemed just
      bloat to add code for handling such a peculiar case.  But immediately it
      caused vbetool and ddcprobe (using lrmi) to fail.
      
      So revert the "deprecated" messages, letting mmap and mprotect succeed.  But
      leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
      added the code to do it right: so this particular patch is only good if the
      app doesn't really need to write to that private area.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  29. 19 Nov, 2005 1 commit
  30. 07 Nov, 2005 1 commit
  31. 31 Oct, 2005 1 commit
  32. 30 Oct, 2005 2 commits
    • [PATCH] mm: unmap_vmas with inner ptlock · 508034a3
      Hugh Dickins committed
      Remove the page_table_lock from around the calls to unmap_vmas, and replace
      the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
      now safe to descend without page_table_lock.
      
      Don't attempt fancy locking for hugepages, just take page_table_lock in
      unmap_hugepage_range.  Which makes zap_hugepage_range, and the hugetlb test in
      zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway.  Nor
      does unmap_vmas have much use for its mm arg now.
      
      The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
      page_table_lock: if they're implemented at all, they typically come down to
      flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
      (which we already audited for the mprotect case).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: unlink vma before pagetables · 8f4f8c16
      Hugh Dickins committed
      In most places the descent from pgd to pud to pmd to pte holds mmap_sem
      (exclusively or not), which ensures that free_pgtables cannot be freeing page
      tables from any level at the same time.  But truncation and reverse mapping
      descend without mmap_sem.
      
      No problem: just make sure that a vma is unlinked from its prio_tree (or
      nonlinear list) and from its anon_vma list, after zapping the vma, but before
      freeing its page tables.  Then neither vmtruncate nor rmap can reach that vma
      whose page tables are now volatile (nor do they need to reach it, since all
      its page entries have been zapped by this stage).
      
      The i_mmap_lock and anon_vma->lock already serialize this correctly; but the
      locking hierarchy is such that we cannot take them while holding
      page_table_lock.  Well, we're trying to push that down anyway.  So in this
      patch, move anon_vma_unlink and unlink_file_vma into free_pgtables, at the
      same time as moving page_table_lock around calls to unmap_vmas.
      
      tlb_gather_mmu and tlb_finish_mmu then fall outside the page_table_lock, but
      we made them preempt_disable and preempt_enable earlier; and a long source
      audit of all the architectures has shown no problem with removing
      page_table_lock from them.  free_pgtables doesn't need page_table_lock for
      itself, nor for what it calls; tlb->mm->nr_ptes is usually protected by
      page_table_lock, but partly by non-exclusive mmap_sem - here it's decremented
      with exclusive mmap_sem, or mm_users 0.  update_hiwater_rss and
      vm_unacct_memory don't need page_table_lock either.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>