1. 07 3月, 2010 14 次提交
  2. 04 3月, 2010 2 次提交
    • S
      SLUB: Fix per-cpu merge conflict · 1154fab7
      Stephen Rothwell 提交于
      The slab tree adds a percpu variable usage case (commit
      9dfc6e68 "SLUB: Use this_cpu operations in
      slub"), but the percpu tree removes the prefixing of percpu variables (commit
      dd17c8f7 "percpu: remove per_cpu__ prefix"),
      thus causing the following compilation error:
      
          CC      mm/slub.o
        mm/slub.c: In function ‘alloc_kmem_cache_cpus’:
        mm/slub.c:2078: error: implicit declaration of function ‘per_cpu_var’
        mm/slub.c:2078: warning: assignment makes pointer from integer without a cast
        make[1]: *** [mm/slub.o] Error 1
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      1154fab7
    • C
      kill unused invalidate_inode_pages helper · 2ecdc82e
      Christoph Hellwig 提交于
      No one is calling this anymore as everyone has switched to
      invalidate_mapping_pages long time ago.  Also update a few
      references to it in comments.  nfs has two more, but I can't
      easily figure what they are actually referring to, so I left
      them as-is.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2ecdc82e
  3. 02 3月, 2010 1 次提交
  4. 27 2月, 2010 1 次提交
  5. 26 2月, 2010 1 次提交
  6. 23 2月, 2010 1 次提交
  7. 22 2月, 2010 2 次提交
    • Y
      x86: Fix non-bootmem compilation on PowerPC · 2ee78f7b
      Yinghai Lu 提交于
      These build errors on some non-x86 platforms (PowerPC for example):
      
       mm/page_alloc.c: In function '__alloc_memory_core_early':
         mm/page_alloc.c:3468: error: implicit declaration of function 'find_early_area'
         mm/page_alloc.c:3483: error: implicit declaration of function 'reserve_early_without_check'
      
      The function is only needed on CONFIG_NO_BOOTMEM.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      LKML-Reference: <4B747239.4070907@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ee78f7b
    • H
      mm: Make copy_from_user() in migrate.c statically predictable · 87b8d1ad
      H. Peter Anvin 提交于
      x86-32 has had a static test for copy_on_user() overflow for a while.
      This test currently fails in mm/migrate.c resulting in an
      allyesconfig/allmodconfig build failure on x86-32:
      
      In function ‘copy_from_user’,
          inlined from ‘do_pages_stat’ at
          /home/hpa/kernel/git/mm/migrate.c:1012:
      /home/hpa/kernel/git/arch/x86/include/asm/uaccess_32.h:212: error:
          call to ‘copy_from_user_overflow’ declared
      
      Make the logic more explicit and therefore easier for gcc to
      understand.
      
      v2: rewrite the loop entirely using a more normal structure for a
          chunked-data loop (Linus Torvalds)
      Reported-by: NLen Brown <lenb@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Reviewed-and-Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Arjan van de Ven <arjan@linux.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87b8d1ad
  8. 21 2月, 2010 1 次提交
    • R
      MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself · 4b3073e1
      Russell King 提交于
      On VIVT ARM, when we have multiple shared mappings of the same file
      in the same MM, we need to ensure that we have coherency across all
      copies.  We do this via make_coherent() by making the pages
      uncacheable.
      
      This used to work fine, until we allowed highmem with highpte - we
      now have a page table which is mapped as required, and is not available
      for modification via update_mmu_cache().
      
      Ralf Beache suggested getting rid of the PTE value passed to
      update_mmu_cache():
      
        On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
        to construct a pointer to the pte again.  Passing a pte_t * is much
        more elegant.  Maybe we might even replace the pte argument with the
        pte_t?
      
      Ben Herrenschmidt would also like the pte pointer for PowerPC:
      
        Passing the ptep in there is exactly what I want.  I want that
        -instead- of the PTE value, because I have issue on some ppc cases,
        for I$/D$ coherency, where set_pte_at() may decide to mask out the
        _PAGE_EXEC.
      
      So, pass in the mapped page table pointer into update_mmu_cache(), and
      remove the PTE value, updating all implementations and call sites to
      suit.
      
      Includes a fix from Stephen Rothwell:
      
        sparc: fix fallout from update_mmu_cache API change
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      4b3073e1
  9. 17 2月, 2010 1 次提交
  10. 13 2月, 2010 3 次提交
    • Y
      sparsemem: Put mem map for one node together. · 9bdac914
      Yinghai Lu 提交于
      Add vmemmap_alloc_block_buf for mem map only.
      
      It will fallback to the old way if it cannot get a block that big.
      
      Before this patch, when a node have 128g ram installed, memmap are
      split into two parts or more.
      [    0.000000]  [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
      [    0.000000]  [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
      [    0.000000]  [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
      [    0.000000]  [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
      [    0.000000]  [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
      [    0.000000]  [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
      [    0.000000]  [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
      [    0.000000]  [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
      [    0.000000]  [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
      [    0.000000]  [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
      [    0.000000]  [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
      [    0.000000]  [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
      [    0.000000]  [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
      [    0.000000]  [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
      [    0.000000]  [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7
      
      after patch will get
      [    0.000000]  [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
      [    0.000000]  [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
      [    0.000000]  [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
      [    0.000000]  [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
      [    0.000000]  [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
      [    0.000000]  [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
      [    0.000000]  [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
      [    0.000000]  [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7
      
      -v2: change buf to vmemmap_buf instead according to Ingo
           also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
      -v3: according to Andrew, use sizeof(name) instead of hard coded 15
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      9bdac914
    • Y
      sparsemem: Put usemap for one node together · a4322e1b
      Yinghai Lu 提交于
      Could save some buffer space instead of applying one by one.
      
      Could help that system that is going to use early_res instead of bootmem
      less entries in early_res make search more faster on system with more memory.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-18-git-send-email-yinghai@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      a4322e1b
    • Y
      x86: Make 64 bit use early_res instead of bootmem before slab · 08677214
      Yinghai Lu 提交于
      Finally we can use early_res to replace bootmem for x86_64 now.
      
      Still can use CONFIG_NO_BOOTMEM to enable it or not.
      
      -v2: fix 32bit compiling about MAX_DMA32_PFN
      -v3: folded bug fix from LKML message below
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B747239.4070907@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      08677214
  11. 07 2月, 2010 1 次提交
  12. 03 2月, 2010 4 次提交
    • J
      hugetlb: fix section mismatches · 094e9539
      Jeff Mahoney 提交于
      hugetlb_sysfs_add_hstate is called by hugetlb_register_node directly
      during init and also indirectly via sysfs after init.
      
      This patch removes the __init tag from hugetlb_sysfs_add_hstate.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      094e9539
    • A
      mm: flush dcache before writing into page to avoid alias · 931e80e4
      anfei zhou 提交于
      The cache alias problem will happen if the changes of user shared mapping
      is not flushed before copying, then user and kernel mapping may be mapped
      into two different cache line, it is impossible to guarantee the coherence
      after iov_iter_copy_from_user_atomic.  So the right steps should be:
      
      	flush_dcache_page(page);
      	kmap_atomic(page);
      	write to page;
      	kunmap_atomic(page);
      	flush_dcache_page(page);
      
      More precisely, we might create two new APIs flush_dcache_user_page and
      flush_dcache_kern_page to replace the two flush_dcache_page accordingly.
      
      Here is a snippet tested on omap2430 with VIPT cache, and I think it is
      not ARM-specific:
      
      	int val = 0x11111111;
      	fd = open("abc", O_RDWR);
      	addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	*(addr+0) = 0x44444444;
      	tmp = *(addr+0);
      	*(addr+1) = 0x77777777;
      	write(fd, &val, sizeof(int));
      	close(fd);
      
      The results are not always 0x11111111 0x77777777 at the beginning as expected.  Sometimes we see 0x44444444 0x77777777.
      Signed-off-by: NAnfei <anfei.zhou@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <linux-arch@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      931e80e4
    • N
      mm: purge fragmented percpu vmap blocks · 02b709df
      Nick Piggin 提交于
      Improve handling of fragmented per-CPU vmaps.  We previously don't free
      up per-CPU maps until all its addresses have been used and freed.  So
      fragmented blocks could fill up vmalloc space even if they actually had
      no active vmap regions within them.
      
      Add some logic to allow all CPUs to have these blocks purged in the case
      of failure to allocate a new vm area, and also put some logic to trim
      such blocks of a current CPU if we hit them in the allocation path (so
      as to avoid a large build up of them).
      
      Christoph reported some vmap allocation failures when using the per CPU
      vmap APIs in XFS, which cannot be reproduced after this patch and the
      previous bug fix.
      
      Cc: linux-mm@kvack.org
      Cc: stable@kernel.org
      Tested-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      --
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02b709df
    • N
      mm: percpu-vmap fix RCU list walking · de560423
      Nick Piggin 提交于
      RCU list walking of the per-cpu vmap cache was broken.  It did not use
      RCU primitives, and also the union of free_list and rcu_head is
      obviously wrong (because free_list is indeed the list we are RCU
      walking).
      
      While we are there, remove a couple of unused fields from an earlier
      iteration.
      
      These APIs aren't actually used anywhere, because of problems with the
      XFS conversion.  Christoph has now verified that the problems are solved
      with these patches.  Also it is an exported interface, so I think it
      will be good to be merged now (and Christoph wants to get the XFS
      changes into their local tree).
      
      Cc: stable@kernel.org
      Cc: linux-mm@kvack.org
      Tested-by: NChristoph Hellwig <hch@infradead.org>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      --
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de560423
  13. 30 1月, 2010 2 次提交
    • N
      slab: fix regression in touched logic · 44b57f1c
      Nick Piggin 提交于
      When factoring common code into transfer_objects in commit 3ded175a ("slab: add
      transfer_objects() function"), the 'touched' logic got a bit broken. When
      refilling from the shared array (taking objects from the shared array), we are
      making use of the shared array so it should be marked as touched.
      
      Subsequently pulling an element from the cpu array and allocating it should
      also touch the cpu array, but that is taken care of after the alloc_done label.
      (So yes, the cpu array was getting touched = 1 twice).
      
      So revert this logic to how it worked in earlier kernels.
      
      This also affects the behaviour in __drain_alien_cache, which would previously
      'touch' the shared array and now does not. I think it is more logical not to
      touch there, because we are pushing objects into the shared array rather than
      pulling them off. So there is no good reason to postpone reaping them -- if the
      shared array is getting utilized, then it will get 'touched' in the alloc path
      (where this patch now restores the touch).
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      44b57f1c
    • H
      mm: fix migratetype bug which slowed swapping · a7016235
      Hugh Dickins 提交于
      After memory pressure has forced it to dip into the reserves, 2.6.32's
      5f8dcc21 "page-allocator: split per-cpu
      list into one-list-per-migrate-type" has been returning MIGRATE_RESERVE
      pages to the MIGRATE_MOVABLE free_list: in some sense depleting reserves.
      
      Fix that in the most straightforward way (which, considering the overheads
      of alternative approaches, is Mel's preference): the right migratetype is
      already in page_private(page), but free_pcppages_bulk() wasn't using it.
      
      How did this bug show up?  As a 20% slowdown in my tmpfs loop kbuild
      swapping tests, on PowerMac G5 with SLUB allocator.  Bisecting to that
      commit was easy, but explaining the magnitude of the slowdown not easy.
      
      The same effect appears, but much less markedly, with SLAB, and even
      less markedly on other machines (the PowerMac divides into fewer zones
      than x86, I think that may be a factor).  We guess that lumpy reclaim
      of short-lived high-order pages is implicated in some way, and probably
      this bug has been tickling a poor decision somewhere in page reclaim.
      
      But instrumentation hasn't told me much, I've run out of time and
      imagination to determine exactly what's going on, and shouldn't hold up
      the fix any longer: it's valid, and might even fix other misbehaviours.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7016235
  14. 28 1月, 2010 1 次提交
    • L
      mm: add new 'read_cache_page_gfp()' helper function · 0531b2aa
      Linus Torvalds 提交于
      It's a simplified 'read_cache_page()' which takes a page allocation
      flag, so that different paths can control how aggressive the memory
      allocations are that populate a address space.
      
      In particular, the intel GPU object mapping code wants to be able to do
      a certain amount of own internal memory management by automatically
      shrinking the address space when memory starts getting tight.  This
      allows it to dynamically use different memory allocation policies on a
      per-allocation basis, rather than depend on the (static) address space
      gfp policy.
      
      The actual new function is a one-liner, but re-organizing the helper
      functions to the point where you can do this with a single line of code
      is what most of the patch is all about.
      Tested-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0531b2aa
  15. 23 1月, 2010 2 次提交
  16. 21 1月, 2010 1 次提交
    • Y
      vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE · 88f50044
      Yongseok Koh 提交于
      In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
      then vmap_lazy_nr is increased atomically.
      
      But, in __purge_vmap_area_lazy(), while traversing of vmap_are_list, nr
      is counted by checking VM_LAZY_FREE is set to va->flags.  After counting
      the variable nr, kernel reads vmap_lazy_nr atomically and checks a
      BUG_ON condition whether nr is greater than vmap_lazy_nr to prevent
      vmap_lazy_nr from being negative.
      
      The problem is that, if interrupted right after marking VM_LAZY_FREE,
      increment of vmap_lazy_nr can be delayed.  Consequently, BUG_ON
      condition can be met because nr is counted more than vmap_lazy_nr.
      
      It is highly probable when vmalloc/vfree are called frequently.  This
      scenario have been verified by adding delay between marking VM_LAZY_FREE
      and increasing vmap_lazy_nr in free_unmap_area_noflush().
      
      Even the vmap_lazy_nr is for checking high watermark, it never be the
      strict watermark.  Although the BUG_ON condition is to prevent
      vmap_lazy_nr from being negative, vmap_lazy_nr is signed variable.  So,
      it could go down to negative value temporarily.
      
      Consequently, removing the BUG_ON condition is proper.
      
      A possible BUG_ON message is like the below.
      
         kernel BUG at mm/vmalloc.c:517!
         invalid opcode: 0000 [#1] SMP
         EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
         EIP is at __purge_vmap_area_lazy+0x144/0x150
         EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
         ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
         DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
         Call Trace:
         [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
         [<c0482b02>] remove_vm_area+0x22/0x70
         [<c0482c15>] __vunmap+0x45/0xe0
         [<c04831ec>] vmalloc+0x2c/0x30
         Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
         EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c
      
      [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]
      Signed-off-by: NYongseok Koh <yongseok.koh@samsung.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88f50044
  17. 17 1月, 2010 2 次提交
    • K
      page allocator: update NR_FREE_PAGES only when necessary · 6ccf80eb
      KOSAKI Motohiro 提交于
      commit f2260e6b (page allocator: update NR_FREE_PAGES only as necessary)
      made one minor regression.  if __rmqueue() was failed, NR_FREE_PAGES stat
      go wrong.  this patch fixes it.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Reported-by: NHuang Shijie <shijie8@gmail.com>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ccf80eb
    • D
      nommu: fix shared mmap after truncate shrinkage problems · 7e660872
      David Howells 提交于
      Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
      over the end of a truncation.  The problem is that
      ramfs_nommu_check_mappings() checks that the reduced file size against the
      VMA tree, but not the vm_region tree.
      
      The following sequence of events can cause the problem:
      
      	fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
      	ftruncate(fd, 32 * 1024);
      	a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	munmap(a, 32 * 1024);
      	ftruncate(fd, 16 * 1024);
      	c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      
      Mapping 'a' creates a vm_region covering 32KB of the file.  Mapping 'b'
      sees that the vm_region from 'a' is covering the region it wants and so
      shares it, pinning it in memory.
      
      Mapping 'a' then goes away and the file is truncated to the end of VMA
      'b'.  However, the region allocated by 'a' is still in effect, and has
      _not_ been reduced.
      
      Mapping 'c' is then created, and because there's a vm_region covering the
      desired region, get_unmapped_area() is _not_ called to repeat the check,
      and the mapping is granted, even though the pages from the latter half of
      the mapping have been discarded.
      
      However:
      
      	d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      
      Mapping 'd' should work, and should end up sharing the region allocated by
      'a'.
      
      To deal with this, we shrink the vm_region struct during the truncation,
      lest do_mmap_pgoff() take it as licence to share the full region
      automatically without calling the get_unmapped_area() file op again.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e660872