1. 14 9月, 2015 6 次提交
  2. 11 9月, 2015 4 次提交
    • V
      proc: export idle flag via kpageflags · f074a8f4
      Vladimir Davydov 提交于
      As noted by Minchan, a benefit of reading idle flag from /proc/kpageflags
      is that one can easily filter dirty and/or unevictable pages while
      estimating the size of unused memory.
      
      Note that idle flag read from /proc/kpageflags may be stale in case the
      page was accessed via a PTE, because it would be too costly to iterate
      over all page mappings on each /proc/kpageflags read to provide an
      up-to-date value.  To make sure the flag is up-to-date one has to read
      /sys/kernel/mm/page_idle/bitmap first.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: NAndres Lagar-Cavilla <andreslc@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f074a8f4
    • V
      mm: introduce idle page tracking · 33c3fc71
      Vladimir Davydov 提交于
      Knowing the portion of memory that is not used by a certain application or
      memory cgroup (idle memory) can be useful for partitioning the system
      efficiently, e.g.  by setting memory cgroup limits appropriately.
      Currently, the only means to estimate the amount of idle memory provided
      by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
      access bit for all pages mapped to a particular process by writing 1 to
      clear_refs, wait for some time, and then count smaps:Referenced.  However,
      this method has two serious shortcomings:
      
       - it does not count unmapped file pages
       - it affects the reclaimer logic
      
      To overcome these drawbacks, this patch introduces two new page flags,
      Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
      A page's Idle flag can only be set from userspace by setting bit in
      /sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
      and it is cleared whenever the page is accessed either through page tables
      (it is cleared in page_referenced() in this case) or using the read(2)
      system call (mark_page_accessed()). Thus by setting the Idle flag for
      pages of a particular workload, which can be found e.g.  by reading
      /proc/PID/pagemap, waiting for some time to let the workload access its
      working set, and then reading the bitmap file, one can estimate the amount
      of pages that are not used by the workload.
      
      The Young page flag is used to avoid interference with the memory
      reclaimer.  A page's Young flag is set whenever the Access bit of a page
      table entry pointing to the page is cleared by writing to the bitmap file.
      If page_referenced() is called on a Young page, it will add 1 to its
      return value, therefore concealing the fact that the Access bit was
      cleared.
      
      Note, since there is no room for extra page flags on 32 bit, this feature
      uses extended page flags when compiled on 32 bit.
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: kpageidle requires an MMU]
      [akpm@linux-foundation.org: decouple from page-flags rework]
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: NAndres Lagar-Cavilla <andreslc@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33c3fc71
    • V
      proc: add kpagecgroup file · 80ae2fdc
      Vladimir Davydov 提交于
      /proc/kpagecgroup contains a 64-bit inode number of the memory cgroup each
      page is charged to, indexed by PFN.  Having this information is useful for
      estimating a cgroup working set size.
      
      The file is present if CONFIG_PROC_PAGE_MONITOR && CONFIG_MEMCG.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: NAndres Lagar-Cavilla <andreslc@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80ae2fdc
    • D
      zswap: update docs for runtime-changeable attributes · 9c4c5ef3
      Dan Streetman 提交于
      Change the Documentation/vm/zswap.txt doc to indicate that the "zpool" and
      "compressor" params are now changeable at runtime.
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c4c5ef3
  3. 10 9月, 2015 5 次提交
  4. 09 9月, 2015 10 次提交
  5. 07 9月, 2015 1 次提交
  6. 06 9月, 2015 4 次提交
  7. 05 9月, 2015 5 次提交
    • J
      doc: dt: add documentation for nxp,lpc1788-rtc · dcb9372b
      Joachim Eastwood 提交于
      Document NXP LPC178x/18xx/408x/43xx bindings
      Signed-off-by: NJoachim Eastwood <manabian@gmail.com>
      Signed-off-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
      dcb9372b
    • M
      Documentation/features/vm: add feature description and arch support status for... · c7e1e3cc
      Mel Gorman 提交于
      Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7e1e3cc
    • A
      userfaultfd: change the read API to return a uffd_msg · a9b85f94
      Andrea Arcangeli 提交于
      I had requests to return the full address (not the page aligned one) to
      userland.
      
      It's not entirely clear how the page offset could be relevant because
      userfaults aren't like SIGBUS that can sigjump to a different place and it
      actually skip resolving the fault depending on a page offset.  There's
      currently no real way to skip the fault especially because after a
      UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the
      kernel without having to return to userland first (not even self modifying
      code replacing the .text that touched the faulting address would prevent
      the fault to be repeated).  Userland cannot skip repeating the fault even
      more so if the fault was triggered by a KVM secondary page fault or any
      get_user_pages or any copy-user inside some syscall which will return to
      kernel code.  The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading
      to a SIGBUS being raised because the userfault can't wait if it cannot
      release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for
      that).
      
      Still returning userland a proper structure during the read() on the uffd,
      can allow to use the current UFFD_API for the future non-cooperative
      extensions too and it looks cleaner as well.  Once we get additional
      fields there's no point to return the fault address page aligned anymore
      to reuse the bits below PAGE_SHIFT.
      
      The only downside is that the read() syscall will read 32bytes instead of
      8bytes but that's not going to be measurable overhead.
      
      The total number of new events that can be extended or of new future bits
      for already shipped events, is limited to 64 by the features field of the
      uffdio_api structure.  If more will be needed a bump of UFFD_API will be
      required.
      
      [akpm@linux-foundation.org: use __packed]
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9b85f94
    • A
      userfaultfd: uAPI · 1038628d
      Andrea Arcangeli 提交于
      Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1038628d
    • A
      userfaultfd: linux/Documentation/vm/userfaultfd.txt · 25edd8bf
      Andrea Arcangeli 提交于
      This is the latest userfaultfd patchset.  The postcopy live migration
      feature on the qemu side is mostly ready to be merged and it entirely
      depends on the userfaultfd syscall to be merged as well.  So it'd be great
      if this patchset could be reviewed for merging in -mm.
      
      Userfaults allow to implement on demand paging from userland and more
      generally they allow userland to more efficiently take control of the
      behavior of page faults than what was available before (PROT_NONE +
      SIGSEGV trap).
      
      The use cases are:
      
      1) KVM postcopy live migration (one form of cloud memory
         externalization).
      
         KVM postcopy live migration is the primary driver of this work:
      
          http://blog.zhaw.ch/icclab/setting-up-post-copy-live-migration-in-openstack/
          http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04873.html
      
      2) postcopy live migration of binaries inside linux containers:
      
          http://thread.gmane.org/gmane.linux.kernel.mm/132662
      
      3) KVM postcopy live snapshotting (allowing to limit/throttle the
         memory usage, unlike fork would, plus the avoidance of fork
         overhead in the first place).
      
         While the wrprotect tracking is not implemented yet, the syscall API is
         already contemplating the wrprotect fault tracking and it's generic enough
         to allow its later implementation in a backwards compatible fashion.
      
      4) KVM userfaults on shared memory. The UFFDIO_COPY lowlevel method
         should be extended to work also on tmpfs and then the
         uffdio_register.ioctls will notify userland that UFFDIO_COPY is
         available even when the registered virtual memory range is tmpfs
         backed.
      
      5) alternate mechanism to notify web browsers or apps on embedded
         devices that volatile pages have been reclaimed. This basically
         avoids the need to run a syscall before the app can access with the
         CPU the virtual regions marked volatile. This depends on point 4)
         to be fulfilled first, as volatile pages happily apply to tmpfs.
      
      Even though there wasn't a real use case requesting it yet, it also
      allows to implement distributed shared memory in a way that readonly
      shared mappings can exist simultaneously in different hosts and they
      can be become exclusive at the first wrprotect fault.
      
      This patch (of 22):
      
      Add documentation.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25edd8bf
  8. 04 9月, 2015 2 次提交
  9. 03 9月, 2015 3 次提交