1. 09 9月, 2015 29 次提交
  2. 08 9月, 2015 1 次提交
  3. 05 9月, 2015 10 次提交
    • M
      mm: remove struct node_active_region · 4e6dab42
      minkyung88.kim 提交于
      struct node_active_region is not used anymore.  Remove it.
      Signed-off-by: Nminkyung88.kim <minkyung88.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e6dab42
    • O
      mm: move ->mremap() from file_operations to vm_operations_struct · 5477e70a
      Oleg Nesterov 提交于
      vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
      way ->mremap() can have more users.  Say, vdso.
      
      While at it, s/aio_ring_remap/aio_ring_mremap/.
      
      Note: this is the minimal change before ->mremap() finds another user in
      file_operations; this method should have more arguments, and it can be
      used to kill arch_remap().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5477e70a
    • V
      genalloc: add support of multiple gen_pools per device · c98c3635
      Vladimir Zapolskiy 提交于
      This change fills devm_gen_pool_create()/gen_pool_get() "name" argument
      stub with contents and extends of_gen_pool_get() functionality on this
      basis.
      
      If there is no associated platform device with a device node passed to
      of_gen_pool_get(), the function attempts to get a label property or device
      node name (= repeats MTD OF partition standard) and seeks for a named
      gen_pool registered by device of the parent device node.
      
      The main idea of the change is to allow registration of independent
      gen_pools under the same umbrella device, say "partitions" on "storage
      device", the original functionality of one "partition" per "storage
      device" is untouched.
      
      [akpm@linux-foundation.org: fix constness in devres_find()]
      [dan.carpenter@oracle.com: freeing const data pointers]
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Cc: Philipp Zabel <p.zabel@pengutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <kernel@pengutronix.de>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c98c3635
    • V
      genalloc: add name arg to gen_pool_get() and devm_gen_pool_create() · 73858173
      Vladimir Zapolskiy 提交于
      This change modifies gen_pool_get() and devm_gen_pool_create() client
      interfaces adding one more argument "name" of a gen_pool object.
      
      Due to implementation gen_pool_get() is capable to retrieve only one
      gen_pool associated with a device even if multiple gen_pools are created,
      fortunately right at the moment it is sufficient for the clients, hence
      provide NULL as a valid argument on both producer devm_gen_pool_create()
      and consumer gen_pool_get() sides.
      
      Because only one created gen_pool per device is addressable, explicitly
      add a restriction to devm_gen_pool_create() to create only one gen_pool
      per device, this implies two possible error codes returned by the
      function, account it on client side (only misc/sram).  This completes
      client side changes related to genalloc updates.
      
      [akpm@linux-foundation.org: gen_pool_get() cleanup]
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Cc: Philipp Zabel <p.zabel@pengutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <kernel@pengutronix.de>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73858173
    • M
      mm: defer flush of writable TLB entries · d950c947
      Mel Gorman 提交于
      If a PTE is unmapped and it's dirty then it was writable recently.  Due to
      deferred TLB flushing, it's best to assume a writable TLB cache entry
      exists.  With that assumption, the TLB must be flushed before any IO can
      start or the page is freed to avoid lost writes or data corruption.  This
      patch defers flushing of potentially writable TLBs as long as possible.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d950c947
    • M
      mm: send one IPI per CPU to TLB flush all entries after unmapping pages · 72b252ae
      Mel Gorman 提交于
      An IPI is sent to flush remote TLBs when a page is unmapped that was
      potentially accesssed by other CPUs.  There are many circumstances where
      this happens but the obvious one is kswapd reclaiming pages belonging to a
      running process as kswapd and the task are likely running on separate
      CPUs.
      
      On small machines, this is not a significant problem but as machine gets
      larger with more cores and more memory, the cost of these IPIs can be
      high.  This patch uses a simple structure that tracks CPUs that
      potentially have TLB entries for pages being unmapped.  When the unmapping
      is complete, the full TLB is flushed on the assumption that a refill cost
      is lower than flushing individual entries.
      
      Architectures wishing to do this must give the following guarantee.
      
              If a clean page is unmapped and not immediately flushed, the
              architecture must guarantee that a write to that linear address
              from a CPU with a cached TLB entry will trap a page fault.
      
      This is essentially what the kernel already depends on but the window is
      much larger with this patch applied and is worth highlighting.  The
      architecture should consider whether the cost of the full TLB flush is
      higher than sending an IPI to flush each individual entry.  An additional
      architecture helper called flush_tlb_local is required.  It's a trivial
      wrapper with some accounting in the x86 case.
      
      The impact of this patch depends on the workload as measuring any benefit
      requires both mapped pages co-located on the LRU and memory pressure.  The
      case with the biggest impact is multiple processes reading mapped pages
      taken from the vm-scalability test suite.  The test case uses NR_CPU
      readers of mapped files that consume 10*RAM.
      
      Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs
      
                                                 4.2.0-rc1          4.2.0-rc1
                                                   vanilla       flushfull-v7
      Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
      Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
      Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)
      
                 4.2.0-rc1    4.2.0-rc1
                   vanilla flushfull-v7
      User          581.00       611.43
      System       5804.93      4111.76
      Elapsed       161.03       122.12
      
      This is showing that the readers completed 24.40% faster with 29% less
      system CPU time.  From vmstats, it is known that the vanilla kernel was
      interrupted roughly 900K times per second during the steady phase of the
      test and the patched kernel was interrupts 180K times per second.
      
      The impact is lower on a single socket machine.
      
                                                 4.2.0-rc1          4.2.0-rc1
                                                   vanilla       flushfull-v7
      Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
      Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
      Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)
      
                 4.2.0-rc1    4.2.0-rc1
                   vanilla flushfull-v7
      User           58.09        57.64
      System        111.82        76.56
      Elapsed        27.29        22.55
      
      It's still a noticeable improvement with vmstat showing interrupts went
      from roughly 500K per second to 45K per second.
      
      The patch will have no impact on workloads with no memory pressure or have
      relatively few mapped pages.  It will have an unpredictable impact on the
      workload running on the CPU being flushed as it'll depend on how many TLB
      entries need to be refilled and how long that takes.  Worst case, the TLB
      will be completely cleared of active entries when the target PFNs were not
      resident at all.
      
      [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72b252ae
    • M
      x86, mm: trace when an IPI is about to be sent · 5b74283a
      Mel Gorman 提交于
      When unmapping pages it is necessary to flush the TLB.  If that page was
      accessed by another CPU then an IPI is used to flush the remote CPU.  That
      is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
      second.
      
      There already is a window between when a page is unmapped and when it is
      TLB flushed.  This series increases the window so multiple pages can be
      flushed using a single IPI.  This should be safe or the kernel is hosed
      already.
      
      Patch 1 simply made the rest of the series easier to write as ftrace
              could identify all the senders of TLB flush IPIS.
      
      Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
              to flush the entire TLB.
      
      Patch 3 tracks when there potentially are writable TLB entries that
              need to be batched differently
      
      Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes
      
      The performance impact is documented in the changelogs but in the optimistic
      case on a 4-socket machine the full series reduces interrupts from 900K
      interrupts/second to 60K interrupts/second.
      
      This patch (of 4):
      
      It is easy to trace when an IPI is received to flush a TLB but harder to
      detect what event sent it.  This patch makes it easy to identify the
      source of IPIs being transmitted for TLB flushes on x86.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b74283a
    • A
      userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation · c1a4de99
      Andrea Arcangeli 提交于
      This implements mcopy_atomic and mfill_zeropage that are the lowlevel
      VM methods that are invoked respectively by the UFFDIO_COPY and
      UFFDIO_ZEROPAGE userfaultfd commands.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c1a4de99
    • A
      userfaultfd: activate syscall · 1380fca0
      Andrea Arcangeli 提交于
      This activates the userfaultfd syscall.
      
      [sfr@canb.auug.org.au: activate syscall fix]
      [akpm@linux-foundation.org: don't enable userfaultfd on powerpc]
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1380fca0
    • A
      userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx · 19a809af
      Andrea Arcangeli 提交于
      vma->vm_userfaultfd_ctx is yet another vma parameter that vma_merge
      must be aware about so that we can merge vmas back like they were
      originally before arming the userfaultfd on some memory range.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19a809af