1. 22 Jan 2014 (22 commits)
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Authored by Tang Chen
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
    • memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions · 66b16edf
      Authored by Tang Chen
      In find_hotpluggable_memory, once we find a memory region which is
      hotpluggable, we want to mark it in memblock.memory, so that we can
      later prevent the memblock allocator from allocating hotpluggable
      memory for the kernel.
      
      To achieve this goal, we introduce the MEMBLOCK_HOTPLUG flag to
      indicate hotpluggable memory regions in memblock, and a function
      memblock_mark_hotplug() to mark hotpluggable memory when we find it.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66b16edf
    • memblock, numa: introduce flags field into memblock · 66a20757
      Authored by Tang Chen
      There is no flag in memblock to describe what type the memory is.
      Sometimes, we may use memblock to reserve some memory for special
      usage, and we want to know what kind of memory it is.  So we need a
      way to distinguish these kinds of memory.
      
      In a hotplug environment, we want to reserve hotpluggable memory so
      the kernel won't be able to use it.  And when the system is up, we
      have to free this hotpluggable memory to the buddy allocator.  So we
      need to mark this memory first.
      
      In order to do so, we need to mark out these special memory in memblock.
      In this patch, we introduce a new "flags" member into memblock_region:
      
         struct memblock_region {
                 phys_addr_t base;
                 phys_addr_t size;
                 unsigned long flags;		/* This is new. */
         #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
                 int nid;
         #endif
         };
      
      This patch does the following things:
      1) Add "flags" member to memblock_region.
      2) Modify the following APIs' prototype:
      	memblock_add_region()
      	memblock_insert_region()
      	3) Add memblock_reserve_region() to support reserving memory with flags, and keep
      	   memblock_reserve()'s prototype unmodified.
      	4) Modify other APIs to support flags, but keep their prototypes unmodified.
      
      The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.
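      
      As a rough, self-contained illustration of the idea (a compilable
      userspace toy, not the kernel code; the region contents and the
      mark_hotplug() helper name are made up for this sketch):
      
         #include <stdio.h>
         
         #define MEMBLOCK_HOTPLUG 0x1            /* toy flag bit */
         
         struct region {
                 unsigned long base;
                 unsigned long size;
                 unsigned long flags;            /* the new member */
         };
         
         static struct region mem[2] = {
                 { 0x00000000UL, 0x40000000UL, 0 },
                 { 0x40000000UL, 0x40000000UL, 0 },
         };
         
         /* mark every region overlapping [base, base + size) */
         static void mark_hotplug(unsigned long base, unsigned long size)
         {
                 for (int i = 0; i < 2; i++)
                         if (base < mem[i].base + mem[i].size &&
                             mem[i].base < base + size)
                                 mem[i].flags |= MEMBLOCK_HOTPLUG;
         }
         
         int main(void)
         {
                 mark_hotplug(0x40000000UL, 0x10000000UL);
                 for (int i = 0; i < 2; i++)
                         printf("region %d: flags=%lx\n", i, mem[i].flags);
                 return 0;
         }
      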
      Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
      Suggested-by: Liu Jiang <jiang.liu@huawei.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66a20757
    • mm/memblock: debug: correct displaying of upper memory boundary · 931d13f5
      Authored by Grygorii Strashko
      Current memblock APIs don't work on 32-bit PAE or LPAE arches where
      the physical memory start address is beyond 4GB.  The problem was
      discussed here [3], where Tejun and Yinghai (thanks) proposed a way
      forward with memblock interfaces.  Based on the proposal, this series
      adds the necessary memblock interfaces and converts the core kernel
      code to use them.  Architectures already converted to NO_BOOTMEM use
      these new interfaces directly; for the others which still use bootmem,
      these new interfaces just fall back to the existing bootmem APIs.
      
      So there is no functional change in behavior.  In the long run, once
      all the architectures move to NO_BOOTMEM, we can get rid of the
      bootmem layer completely.  This is one step toward removing the core
      code's dependency on bootmem, and it also gives architectures a path
      to move away from bootmem.
      
      Testing was done on the ARM architecture, on 32-bit ARM LPAE machines
      with normal as well as sparse (faked) memory models.
      
      This patch (of 23):
      
      When debugging is enabled (cmdline has "memblock=debug"), memblock
      will wrongly display the upper memory boundary of each allocated/freed
      memory range.  For example:
      
       memblock_reserve: [0x0000009e7e8000-0x0000009e7ed000] _memblock_early_alloc_try_nid_nopanic+0xfc/0x12c
      
      0x0000009e7ed000 is displayed instead of 0x0000009e7ecfff.
      
      Hence, correct this by changing the formula used to calculate the
      upper memory boundary to (u64)base + size - 1 instead of
      (u64)base + size everywhere in the debug messages.
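      
      For illustration, a tiny self-contained C program showing the
      inclusive end address that the fixed formula prints (values taken
      from the example above):
      
         #include <stdio.h>
         
         int main(void)
         {
                 unsigned long long base = 0x9e7e8000ULL, size = 0x5000ULL;
         
                 /* old (wrong) upper bound: base + size     -> ...9e7ed000 */
                 /* new (right) upper bound: base + size - 1 -> ...9e7ecfff */
                 printf("[0x%016llx-0x%016llx]\n", base, base + size - 1);
                 return 0;
         }
      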
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      931d13f5
    • mm/mlock: prepare params outside critical region · 1f1cd705
      Authored by Davidlohr Bueso
      All mlock-related syscalls prepare lock limits, lengths and start
      parameters with the mmap_sem held.  Move this logic outside of the
      critical region.  For the case of mlock, continue incrementing the
      amount already locked by mm->locked_vm with the rwsem taken.
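      
      The resulting shape, sketched (simplified from the mlock syscall path
      of that era; rlimit(), capable() and do_mlock() are existing kernel
      helpers, error handling elided):
      
         lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; /* no lock needed */
         len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
         start &= PAGE_MASK;
         
         down_write(&mm->mmap_sem);
         locked = (len >> PAGE_SHIFT) + mm->locked_vm;      /* needs the rwsem */
         if (locked <= lock_limit || capable(CAP_IPC_LOCK))
                 error = do_mlock(start, len, 1);
         up_write(&mm->mmap_sem);
      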
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f1cd705
    • mm/mmap.c: add mlock_future_check() helper · 363ee17f
      Authored by Davidlohr Bueso
      Both do_brk and do_mmap_pgoff verify that we are actually capable of
      locking future pages if the corresponding VM_LOCKED flags are used.
      Encapsulate this logic into a single mlock_future_check() helper
      function.
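      
      A sketch of what such a helper factors out (simplified; the exact
      body lives in mm/mmap.c):
      
         static inline int mlock_future_check(struct mm_struct *mm,
                                              unsigned long flags,
                                              unsigned long len)
         {
                 unsigned long locked, lock_limit;
         
                 if (flags & VM_LOCKED) {        /* mlock limit test */
                         locked = len >> PAGE_SHIFT;
                         locked += mm->locked_vm;
                         lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
                         if (locked > lock_limit && !capable(CAP_IPC_LOCK))
                                 return -EAGAIN;
                 }
                 return 0;
         }
      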
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      363ee17f
    • mm: add overcommit_kbytes sysctl variable · 49f0ce5f
      Authored by Jerome Marchand
      Some applications that run on HPC clusters are designed around the
      availability of RAM, and the overcommit ratio is fine-tuned to get the
      maximum usage of memory without swapping.  With growing memory, the
      1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
      for these workloads (on a 2TB machine it represents no less than 20GB).
      
      This patch adds the new overcommit_kbytes sysctl variable that allows
      a much finer grain.
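      
      A small self-contained calculation of the granularity gap (the 2TB
      figure comes from the text; this just does the arithmetic):
      
         #include <stdio.h>
         
         int main(void)
         {
                 unsigned long long ram_kb = 2ULL << 30; /* 2TB of RAM, in KB */
         
                 /* overcommit_ratio moves in 1%-of-RAM steps... */
                 printf("ratio step : %llu KB (~20GB)\n", ram_kb / 100);
                 /* ...while overcommit_kbytes moves in 1KB steps */
                 printf("kbytes step: 1 KB\n");
                 return 0;
         }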
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49f0ce5f
    • mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT · aec6a888
      Authored by Mel Gorman
      Commit 4b59e6c4 ("mm, show_mem: suppress page counts in
      non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
      suppress PFN walks on large memory machines.  Commit c78e9363 ("mm:
      do not walk all of system memory during show_mem") avoided a PFN walk in
      the generic show_mem helper which removes the requirement for
      SHOW_MEM_FILTER_PAGE_COUNT in that case.
      
      This patch removes PFN walkers from the arch-specific implementations
      that report on a per-node or per-zone granularity.  ARM and unicore32
      still do a PFN walk as they report memory usage on each bank which is a
      much finer granularity where the debugging information may still be of
      use.  As the remaining arches doing PFN walks have relatively small
      amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.
      
      [akpm@linux-foundation.org: fix parisc]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aec6a888
    • mm/vmalloc: interchange the implementation of vmalloc_to_{pfn,page} · ece86e22
      Authored by Jianyu Zhan
      Currently we are implementing vmalloc_to_pfn() as a wrapper around
      vmalloc_to_page(), which is implemented as follows:
      
       1. walk the page tables to generate the corresponding pfn,
       2. then convert the pfn to a struct page,
       3. return it.
      
      And vmalloc_to_pfn() re-wraps vmalloc_to_page() to get the pfn.
      
      This seems too circuitous, so this patch reverses the way: implement
      vmalloc_to_page() as a wrapper around vmalloc_to_pfn().  This makes
      vmalloc_to_pfn() and vmalloc_to_page() slightly more efficient.
      
      No functional change.
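      
      The shape of the change, sketched (the real page-table walk spans the
      pgd/pud/pmd levels and is elided here):
      
         /* Before: vmalloc_to_page() did the walk, and
          *         vmalloc_to_pfn() wrapped it:
          *
          *             return page_to_pfn(vmalloc_to_page(addr));
          *
          * After: the walk lives in vmalloc_to_pfn(), and the struct page
          * variant becomes the thin wrapper:
          */
         struct page *vmalloc_to_page(const void *vmalloc_addr)
         {
                 return pfn_to_page(vmalloc_to_pfn(vmalloc_addr));
         }
      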
      Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
      Cc: Vladimir Murzin <murzin.v@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ece86e22
    • mm/hugetlb.c: call MMU notifiers when copying a hugetlb page range · e8569dd2
      Authored by Andreas Sandberg
      When copy_hugetlb_page_range() is called to copy a range of hugetlb
      mappings, the secondary MMUs are not notified if there is a protection
      downgrade, which breaks COW semantics in KVM.
      
      This patch adds the necessary MMU notifier calls.
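      
      A sketch of where the calls land (simplified; copy_the_ptes() is a
      placeholder name for the existing pte-copy loop):
      
         int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
                                     struct vm_area_struct *vma)
         {
                 unsigned long start = vma->vm_start, end = vma->vm_end;
                 int ret;
         
                 /* tell secondary MMUs (e.g. KVM) about the coming change */
                 mmu_notifier_invalidate_range_start(src, start, end);
                 ret = copy_the_ptes(dst, src, vma);
                 mmu_notifier_invalidate_range_end(src, start, end);
                 return ret;
         }
      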
      Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se>
      Acked-by: Steve Capper <steve.capper@linaro.org>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8569dd2
    • mm, memory-failure: fix typo in me_pagecache_dirty() · 549543df
      Authored by Zhi Yong Wu
      [akpm@linux-foundation.org: s/cache/pagecache/]
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      549543df
    • mm: create a separate slab for page->ptl allocation · b35f1819
      Authored by Kirill A. Shutemov
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on
      x86_64 is 72 bytes.  For page->ptl they will be allocated from the
      kmalloc-96 slab, so we lose 24 bytes on each.  An average system can
      easily allocate a few tens of thousands of page->ptl, so the overhead
      is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
      
      Note that the patch is useful not only for the debug case, but also
      for PREEMPT_RT, where spinlock_t is always bloated.
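      
      The core of the change is a dedicated, exactly-sized cache, roughly
      (a sketch using the kmem_cache API; details simplified):
      
         static struct kmem_cache *page_ptl_cachep;
         
         void __init ptlock_cache_init(void)
         {
                 /* exact-size cache: no kmalloc rounding, no wasted bytes */
                 page_ptl_cachep = kmem_cache_create("page->ptl",
                                                     sizeof(spinlock_t), 0,
                                                     SLAB_PANIC, NULL);
         }
         
         bool ptlock_alloc(struct page *page)
         {
                 spinlock_t *ptl;
         
                 ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
                 if (!ptl)
                         return false;
                 page->ptl = ptl;                /* simplified assignment */
                 return true;
         }
      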
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b35f1819
    • mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve · 943dca1a
      Authored by Yasuaki Ishimatsu
      Yasuaki Ishimatsu reported that memory hot-add spent more than 5
      _hours_ on a 9TB memory machine, since onlining memory sections is too
      slow.  And we found out that setup_zone_migrate_reserve spent >90% of
      the time.
      
      The problem is that setup_zone_migrate_reserve scans all pageblocks
      unconditionally, but this is only necessary if the number of reserved
      blocks was reduced (i.e. memory hot remove).
      
      Moreover, maximum MIGRATE_RESERVE per zone is currently 2.  It means
      that the number of reserved pageblocks is almost always unchanged.
      
      This patch adds zone->nr_migrate_reserve_block to maintain the number of
      MIGRATE_RESERVE pageblocks and it reduces the overhead of
      setup_zone_migrate_reserve dramatically.  The following table shows time
      of onlining a memory section.
      
        Amount of memory     | 128GB | 192GB | 256GB|
        ---------------------------------------------
        linux-3.12           |  23.9 |  31.4 | 44.5 |
        This patch           |   8.3 |   8.3 |  8.6 |
        Mel's proposal patch |  10.9 |  19.2 | 31.3 |
        ---------------------------------------------
                                         (millisecond)
      
        128GB : 4 nodes and each node has 32GB of memory
        192GB : 6 nodes and each node has 32GB of memory
        256GB : 8 nodes and each node has 32GB of memory
      
        (*1) Mel proposed his idea by the following threads.
             https://lkml.org/lkml/2013/10/30/272
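      
      The saving comes from an early exit, roughly (a sketch of the idea in
      setup_zone_migrate_reserve; the surrounding code is elided):
      
         old_reserve = zone->nr_migrate_reserve_block;
         
         /* On hot-add the reserve count almost never changes, so the
          * expensive per-pageblock scan below can be skipped entirely. */
         if (reserve == old_reserve)
                 return;
         zone->nr_migrate_reserve_block = reserve;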
      
      [akpm@linux-foundation.org: tweak comment]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      943dca1a
    • mm: thp: __get_page_tail_foll() can use get_huge_page_tail() · c728852f
      Authored by Oleg Nesterov
      Cleanup. Change __get_page_tail_foll() to use get_huge_page_tail()
      to avoid the code duplication.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c728852f
    • mm/hugetlb.c: defer PageHeadHuge() symbol export · 9b7ac260
      Authored by Andrea Arcangeli
      There is no actual need for it, so keep it internal.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b7ac260
    • mm/swap.c: reorganize put_compound_page() · 26296ad2
      Authored by Andrew Morton
      Tweak it to save a tab stop and make the code layout slightly less
      nutty.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26296ad2
    • mm/hugetlb.c: simplify PageHeadHuge() and PageHuge() · 758f66a2
      Authored by Andrew Morton
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      758f66a2
    • mm: hugetlbfs: use __compound_tail_refcounted in __get_page_tail too · 3bfcd13e
      Authored by Andrea Arcangeli
      Also remove hugetlb.h which isn't needed anymore as PageHeadHuge is
      handled in mm.h.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3bfcd13e
    • mm: tail page refcounting optimization for slab and hugetlbfs · 44518d2b
      Authored by Andrea Arcangeli
      This skips the _mapcount mangling for slab and hugetlbfs pages.
      
      The main trouble in doing this is to guarantee that PageSlab and
      PageHeadHuge remain constant for all get_page/put_page runs on the tail
      of slab or hugetlbfs compound pages.  Otherwise if they're set during
      get_page but not set during put_page, the _mapcount of the tail page
      would underflow.
      
      PageHeadHuge will remain true until the compound page is released and
      enters the buddy allocator, so it won't risk changing even if the tail
      page is the last reference left on the page.
      
      PG_slab instead is cleared before the slab frees the head page with
      put_page, so if the tail pin is released after the slab freed the page,
      we would have a problem.  But in the slab case the tail pin cannot be
      the last reference left on the page.  This is because the slab code is
      free to reuse the compound page after a kfree/kmem_cache_free without
      having to check if there's any tail pin left.  In turn, all tail pins
      must always be released while the head is still pinned by the slab
      code, and so we know PG_slab will still be set too.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44518d2b
    • mm: hugetlbfs: move the put/get_page slab and hugetlbfs optimization in a faster path · ebf360f9
      Authored by Andrea Arcangeli
      We don't actually need a reference on the head page in the slab and
      hugetlbfs paths, as long as we add a smp_rmb() which should be faster
      than get_page_unless_zero.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebf360f9
    • mm: hugetlb: use get_page_foll() in follow_hugetlb_page() · a0368d4e
      Authored by Andrea Arcangeli
      get_page_foll() is more optimal and is always safe to use under the PT
      lock.  More so for hugetlbfs as there's no risk of race conditions with
      split_huge_page regardless of the PT lock.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0368d4e
    • dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Authored by Dan Williams
      Record actively mapped pages and provide an api for asserting a given
      page is dma inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      dma-api in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation includes the capability to count, in a limited way,
      repeat mappings of the same page that occur without an intervening
      unmap.  This 'overlap' counter is limited to the few bits of tag space
      in a radix tree.  This mechanism is added to mitigate false negative
      cases where, for example, a page is dma mapped twice and
      debug_dma_assert_idle() is called after the page is un-mapped once.
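      
      Typical use, per the description above (copy_user_highpage() stands
      in for the CPU-side copy):
      
         /* e.g. in a CPU copy path such as cow_user_page() */
         debug_dma_assert_idle(src);     /* warn if src is still DMA-mapped */
         copy_user_highpage(dst, src, vaddr, vma);
      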
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0abdd7a8
  2. 15 Jan 2014 (1 commit)
    • mm: fix crash when using XFS on loopback · 03e5ac2f
      Authored by Mikulas Patocka
      Commit 8456a648 ("slab: use struct page for slab management") causes
      a crash in the LVM2 testsuite on PA-RISC (the crashing test is
      fsadm.sh).  The testsuite doesn't crash on 3.12, but crashes on
      3.13-rc1 and later.
      
       Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d)
       CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5
       task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000
      
            YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
       PSW: 00001000000001101111100100001110 Not tainted
       r00-03  000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0
       r04-07  00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40
       r08-11  0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000
       r12-15  0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000
       r16-19  000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001
       r20-23  0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080
       r24-27  00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0
       r28-31  202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001
       sr00-03  000000000532c000 0000000000000000 0000000000000000 000000000532c000
       sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430
        IIR: 539c0030    ISR: 00000000202d6000  IOR: 000006202224647d
        CPU:        3   CR30: 000000413edd8000 CR31: 0000000000000000
        ORIG_R28: 00000000405a95e0
        IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48
        IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48
        RP(r2): flush_dcache_page+0x128/0x388
       Backtrace:
         flush_dcache_page+0x128/0x388
         lo_splice_actor+0x90/0x148 [loop]
         splice_from_pipe_feed+0xc0/0x1d0
         __splice_from_pipe+0xac/0xc0
         lo_direct_splice_actor+0x1c/0x70 [loop]
         splice_direct_to_actor+0xec/0x228
         lo_receive+0xe4/0x298 [loop]
         loop_thread+0x478/0x640 [loop]
         kthread+0x134/0x168
         end_fault_vector+0x20/0x28
         xfs_setsize_buftarg+0x0/0x90 [xfs]
      
       Kernel panic - not syncing: Bad Address (null pointer deref?)
      
      Commit 8456a648 changes the page structure so that the slab
      subsystem reuses the page->mapping field.
      
      The crash happens in the following way:
       * XFS allocates some memory from slab and issues a bio to read data
         into it.
       * the bio is sent to the loopback device.
       * lo_receive creates an actor and calls splice_direct_to_actor.
       * lo_splice_actor copies data to the target page.
       * lo_splice_actor calls flush_dcache_page because the page may be
         mapped by userspace.  In that case we need to flush the kernel cache.
       * flush_dcache_page asks for the list of userspace mappings; however,
         the page->mapping field is reused by the slab subsystem for a
         different purpose.  This causes the crash.
      
      Note that other architectures without coherent caches (sparc, arm, mips)
      also call page_mapping from flush_dcache_page, so they may crash in the
      same way.
      
      This patch fixes this bug by testing if the page is a slab page in
      page_mapping and returning NULL if it is.
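      
      The shape of the fix, sketched (the rest of page_mapping() is
      unchanged and elided):
      
         struct address_space *page_mapping(struct page *page)
         {
                 /* slab reuses page->mapping for its own bookkeeping,
                  * so there is no address_space to hand back */
                 if (unlikely(PageSlab(page)))
                         return NULL;
         
                 /* ...swapcache and anon handling as before... */
                 return page->mapping;
         }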
      
      The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
      earlier kernels in the same scenario on architectures without cache
      coherence when CONFIG_DEBUG_VM is enabled - so it should be backported
      to stable kernels.
      
      In the old kernels, the function page_mapping is placed in
      include/linux/mm.h, so you should modify the patch accordingly when
      backporting it.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: John David Anglin <dave.anglin@bell.net>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03e5ac2f
  3. 12 Jan 2014 (1 commit)
    • thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only · eecc1e42
      Authored by Hugh Dickins
      We see General Protection Fault on RSI in copy_page_rep: that RSI is
      what you get from a NULL struct page pointer.
      
        RIP: 0010:[<ffffffff81154955>]  [<ffffffff81154955>] copy_page_rep+0x5/0x10
        RSP: 0000:ffff880136e15c00  EFLAGS: 00010286
        RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200
        RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000
        RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c
        R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001
        R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000
        FS:  00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0
        Call Trace:
          copy_user_huge_page+0x93/0xab
          do_huge_pmd_wp_page+0x710/0x815
          handle_mm_fault+0x15d8/0x1d70
          __do_page_fault+0x14d/0x840
          do_page_fault+0x2f/0x90
          page_fault+0x22/0x30
      
      do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
      since shrink_huge_zero_page() can free the huge_zero_page, and we have
      no hold of our own on it here (except where the fourth test holds
      page_table_lock and has checked pmd_same), it's possible for it to
      answer yes the first time, but no to the second or third test.  Change
      all those last three to tests for NULL page.
      
      (Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
      in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
      in https://lkml.org/lkml/2013/3/29/103.  I believe that one is due
      to the source page being split, and a tail page freed, while copy
      is in progress; and not a problem without DEBUG_PAGEALLOC, since
      the pmd_same check will prevent a miscopy from being made visible.)
      
      Fixes: 97ae1749 ("thp: implement refcounting for huge zero page")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eecc1e42
  4. 03 Jan 2014 (6 commits)
    • mm/memory-failure.c: transfer page count from head page to tail page after split thp · a3e0f9e4
      Authored by Naoya Horiguchi
      Memory failures on thp tail pages cause a kernel panic like the one
      below:
      
         mce: [Hardware Error]: Machine check events logged
         MCE exception done on CPU 7
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
         IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0
         PGD bae42067 PUD ba47d067 PMD 0
         Oops: 0000 [#1] SMP
        ...
         CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G   M       O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
        ...
         Call Trace:
           me_huge_page+0x3e/0x50
           memory_failure+0x4bb/0xc20
           mce_process_work+0x3e/0x70
           process_one_work+0x171/0x420
           worker_thread+0x11b/0x3a0
           ? manage_workers.isra.25+0x2b0/0x2b0
           kthread+0xe4/0x100
           ? kthread_create_on_node+0x190/0x190
           ret_from_fork+0x7c/0xb0
           ? kthread_create_on_node+0x190/0x190
        ...
         RIP   dequeue_hwpoisoned_huge_page+0x131/0x1e0
         CR2: 0000000000000058
      
      How this problem happens is shown below:
       - when we have a memory error on a thp tail page, the memory error
         handler grabs a refcount of the head page to keep the thp under us.
       - Before unmapping the error page from processes, we split the thp,
         where page refcounts of both of head/tail pages don't change.
       - Then we call try_to_unmap() over the error page (which was a tail
         page before).  Since we didn't pin the error page to handle the
         memory error, it is freed and removed from the LRU list.
       - We never have the error page on LRU list, so the first page state
         check returns "unknown page," then we move to the second check
         with the saved page flag.
       - The saved page flags have PG_tail set, so the second page state check
         returns "hugepage."
       - We call me_huge_page() for freed error page, then we hit the above panic.
      
      The root cause is that we didn't move the refcount from the head page
      to the tail page after splitting the thp.  This patch does exactly that.
      
      This panic was introduced by commit 524fca1e ("HWPOISON: fix
      misjudgement of page_action() for errors on mlocked pages").  Note that
      we did have the same refcount problem before this commit, but it was
      just ignored because we had only the first page state check, which
      returned "unknown page."  The commit changed the refcount problem from
      "doesn't work" to "kernel panic."
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3e0f9e4
    • mm: remove bogus warning in copy_huge_pmd() · d0319bd5
      Authored by Mel Gorman
      Sasha Levin reported the following warning being triggered
      
        WARNING: CPU: 28 PID: 35287 at mm/huge_memory.c:887 copy_huge_pmd+0x145/ 0x3a0()
        Call Trace:
          copy_huge_pmd+0x145/0x3a0
          copy_page_range+0x3f2/0x560
          dup_mmap+0x2c9/0x3d0
          dup_mm+0xad/0x150
          copy_process+0xa68/0x12e0
          do_fork+0x96/0x270
          SyS_clone+0x16/0x20
          stub_clone+0x69/0x90
      
      This warning was introduced by "mm: numa: Avoid unnecessary disruption
      of NUMA hinting during migration" for paranoia reasons, but the warning
      is bogus.  I was thinking of parallel races between NUMA hinting faults
      and forks, but this warning would also be triggered by a parallel
      reclaim splitting a THP during a fork.  Remove the bogus warning.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0319bd5
    • memcg: fix memcg_size() calculation · 695c6083
      Authored by Vladimir Davydov
      The mem_cgroup structure contains nr_node_ids pointers to
      mem_cgroup_per_node objects, not the objects themselves.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@openvz.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      695c6083
    • mm: fix use-after-free in sys_remap_file_pages · 4eb91982
      Authored by Rik van Riel
      remap_file_pages calls mmap_region, which may merge the VMA with other
      existing VMAs, and free "vma".  This can lead to a use-after-free bug.
      Avoid the bug by remembering vm_flags before calling mmap_region, and
      not trying to dereference vma later.
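      
      The pattern of the fix, sketched (copy what you need before the call
      that can free the object):
      
         /* mmap_region() may merge "vma" with a neighbour and free it */
         vm_flags_t saved_flags = vma->vm_flags;
         
         addr = mmap_region(file, start, size, saved_flags, pgoff);
         /* from here on use saved_flags; never dereference vma again */
      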
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4eb91982
    • mm: munlock: fix deadlock in __munlock_pagevec() · 3b25df93
      Authored by Vlastimil Babka
      Commit 7225522b ("mm: munlock: batch non-THP page isolation and
      munlock+putback using pagevec") introduced __munlock_pagevec() to speed
      up munlock by holding lru_lock over multiple isolated pages.  Pages that
      fail to be isolated are put_page()d immediately, also within the lock.
      
      This can lead to deadlock when __munlock_pagevec() becomes the holder of
      the last page pin and put_page() leads to __page_cache_release() which
      also locks lru_lock.  The deadlock has been observed by Sasha Levin
      using trinity.
      
      This patch avoids the deadlock by deferring put_page() operations until
      lru_lock is released.  Another pagevec (which is also used by later
      phases of the function) is reused to gather the pages for the
      put_page() operation, as sketched below.
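      
      The shape of the fix, sketched (the isolation loop itself is elided):
      
         struct pagevec pvec_putback;
         
         pagevec_init(&pvec_putback, 0);
         spin_lock_irq(&zone->lru_lock);
         /* ...pages failing isolation are only collected into
          *    pvec_putback here, not put_page()d under the lock... */
         spin_unlock_irq(&zone->lru_lock);
         
         /* now safe: put_page() may take lru_lock itself via
          * __page_cache_release() when it drops the last pin */
         pagevec_release(&pvec_putback);
      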
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b25df93
    • mm: munlock: fix a bug where THP tail page is encountered · c424be1c
      Authored by Vlastimil Babka
      Since commit ff6a6da6 ("mm: accelerate munlock() treatment of THP
      pages") munlock skips tail pages of a munlocked THP page.  However, when
      the head page already has PageMlocked unset, it will not skip the tail
      pages.
      
      Commit 7225522b ("mm: munlock: batch non-THP page isolation and
      munlock+putback using pagevec") has added a PageTransHuge() check which
      contains VM_BUG_ON(PageTail(page)).  Sasha Levin found this triggered
      using trinity, on the first tail page of a THP page without PageMlocked
      flag.
      
      This patch fixes the issue by skipping tail pages also in the case when
      PageMlocked flag is unset.  There is still a possibility of race with
      THP page split between clearing PageMlocked and determining how many
      pages to skip.  The race might result in former tail pages not being
      skipped, which is however no longer a bug, as during the skip the
      PageTail flags are cleared.
      
      However this race also affects correctness of NR_MLOCK accounting, which
      is to be fixed in a separate patch.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c424be1c
  5. 22 Dec 2013 (1 commit)
    • aio/migratepages: make aio migrate pages sane · 8e321fef
      Authored by Benjamin LaHaise
      The arbitrary restriction on page counts offered by the core
      migrate_page_move_mapping() code results in rather suspicious-looking
      fiddling with page reference counts in the aio_migratepage() operation.
      To fix this, make migrate_page_move_mapping() take an extra_count parameter
      that allows aio to tell the code about its own reference count on the page
      being migrated.
      
      While cleaning up aio_migratepage(), make it validate that the old page
      being passed in is actually what aio_migratepage() expects to prevent
      misbehaviour in the case of races.
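      
      A sketch of the resulting shape (the signature follows the description
      above; the body is elided except for the count that changes):
      
         int migrate_page_move_mapping(struct address_space *mapping,
                                       struct page *newpage, struct page *page,
                                       struct buffer_head *head,
                                       enum migrate_mode mode, int extra_count)
         {
                 int expected_count = 1 + extra_count;  /* aio's own pin */
         
                 /* ...fail the migration unless the page's refcount
                  *    matches expected_count exactly... */
         }
      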
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      8e321fef
  6. 21 Dec 2013 (4 commits)
  7. 19 Dec 2013 (5 commits)