1. 12 Feb 2015, 7 commits
  2. 11 Feb 2015, 1 commit
  3. 11 Dec 2014, 1 commit
  4. 18 Nov 2014, 1 commit
    • x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific · 4aae7e43
      Committed by Qiaowei Ren
      MPX-enabled applications that use large amounts of memory can
      potentially have large numbers of bounds tables in their process
      address space to store bounds information.  These tables can take
      up huge swaths of memory (as much as 80% of the memory on the
      system) even if we clean them up aggressively.  In the worst-case
      scenario, the tables can be 4x the size of the data structure
      being tracked.  In other words, a 1-page structure can require 4
      bounds-table pages.
      
      Given how large these tables can get, we expect that folks using
      MPX will want to know how much memory is being dedicated to them,
      so we need a way to track memory use for MPX.
      
      If we want to specifically track MPX VMAs we need to be able to
      distinguish them from normal VMAs, and keep them from getting
      merged with normal VMAs. A new VM_ flag set only on MPX VMAs does
      both of those things. With this flag, MPX bounds-table VMAs can
      be distinguished from other VMAs, and userspace can also walk
      /proc/$pid/smaps to get memory usage for MPX.
      
      In addition to this flag, we also introduce a special ->vm_ops
      specific to MPX VMAs (see the patch "add MPX specific mmap
      interface"), but currently different ->vm_ops do not by
      themselves prevent VMA merging, so we still need this flag.
      
      We understand that VM_ flags are scarce and are open to other
      options.
      Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151825.565625B3@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4aae7e43
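      Since the new flag is visible through /proc/$pid/smaps, a userspace tool
      can estimate MPX bounds-table memory by summing the Rss of every VMA
      whose VmFlags line carries the MPX mnemonic.  A minimal sketch, assuming
      the flag is exported as "mp" in VmFlags (the exact mnemonic depends on
      the kernel's smaps flag table):

        #include <stdio.h>
        #include <string.h>

        /* Sum the Rss of MPX bounds-table VMAs by scanning /proc/self/smaps.
         * Assumes the kernel reports the MPX VM_ flag as "mp" in VmFlags. */
        int main(void)
        {
            FILE *f = fopen("/proc/self/smaps", "r");
            char line[512];
            long rss_kb = 0, mpx_kb = 0, kb;

            if (!f) {
                perror("fopen");
                return 1;
            }
            while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "Rss: %ld kB", &kb) == 1)
                    rss_kb = kb;                  /* Rss of the current mapping */
                else if (!strncmp(line, "VmFlags:", 8) && strstr(line, " mp "))
                    mpx_kb += rss_kb;             /* mapping is MPX-specific */
            }
            fclose(f);
            printf("MPX bounds-table Rss: %ld kB\n", mpx_kb);
            return 0;
        }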
  5. 14 Oct 2014, 1 commit
    • mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared · 64e45507
      Committed by Peter Feiner
      For VMAs that don't want write notifications, PTEs created for read faults
      have their write bit set.  If the read fault happens after VM_SOFTDIRTY is
      cleared, then the PTE's softdirty bit will remain clear after subsequent
      writes.
      
      Here's a simple code snippet to demonstrate the bug (soft_dirty() is a
      helper that reads the page's soft-dirty bit from /proc/self/pagemap):

        char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
        assert(*m == '\0');     /* new PTE allows write access */
        assert(!soft_dirty(m)); /* page not yet written */
        *m = 'x';               /* should dirty the page */
        assert(soft_dirty(m));  /* fails */
      
      With this patch, write notifications are enabled when VM_SOFTDIRTY is
      cleared.  Furthermore, to avoid unnecessary faults, write notifications
      are disabled when VM_SOFTDIRTY is set.
      
      As a side effect of enabling and disabling write notifications with
      care, this patch fixes a bug in mprotect where vm_page_prot bits set by
      drivers were zapped on mprotect.  An analogous bug was fixed in mmap by
      commit c9d0bf24 ("mm: uncached vma support with writenotify").
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      Reported-by: Peter Feiner <pfeiner@google.com>
      Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64e45507
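      For reference, the soft_dirty() helper used in the snippet above is not
      part of the commit.  A minimal sketch of one possible implementation,
      relying on the documented /proc/pid/pagemap format in which bit 55 of
      each 64-bit entry is the soft-dirty bit:

        #include <fcntl.h>
        #include <stdint.h>
        #include <sys/types.h>
        #include <unistd.h>

        /* Return 1 if the page containing 'addr' is soft-dirty, 0 if it is
         * clean, -1 on error.  Bit 55 of a pagemap entry is soft-dirty. */
        static int soft_dirty(const void *addr)
        {
            uint64_t entry;
            long pagesize = sysconf(_SC_PAGESIZE);
            off_t offset = ((uintptr_t)addr / pagesize) * sizeof(entry);
            int fd = open("/proc/self/pagemap", O_RDONLY);

            if (fd < 0)
                return -1;
            if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
                close(fd);
                return -1;
            }
            close(fd);
            return (entry >> 55) & 1;
        }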
  6. 10 Oct 2014, 15 commits
  7. 26 Sep 2014, 1 commit
    • mm: softdirty: addresses before VMAs in PTE holes aren't softdirty · 87e6d49a
      Committed by Peter Feiner
      In PTE holes that contain VM_SOFTDIRTY VMAs, unmapped addresses that lie
      before those VMAs are reported as softdirty by /proc/pid/pagemap.  This
      bug was introduced in commit 68b5a652 ("mm: softdirty: respect
      VM_SOFTDIRTY in PTE holes").  That commit made /proc/pid/pagemap look at
      VM_SOFTDIRTY in PTE holes, but neglected to respect the start of the VMA
      returned by find_vma, so hole addresses before that VMA were marked
      soft-dirty as well.
      
      Tested:
        Wrote a selftest that creates a PMD-sized VMA then unmaps the first
        page and asserts that the page is not softdirty. I'm going to send the
        pagemap selftest in a later commit.
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Jamie Liu <jamieliu@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87e6d49a
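      The corrected behavior boils down to a small predicate: an address in a
      PTE hole is reported soft-dirty only if it lies at or beyond the start
      of the VMA returned by find_vma and that VMA has VM_SOFTDIRTY set.  A
      small userspace model of that logic (hypothetical helper name, not the
      kernel code):

        #include <assert.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* 'vma_start' is the start of the first VMA at or after the hole
         * address (what find_vma() would return); only addresses inside that
         * VMA may be reported soft-dirty, and only if it has VM_SOFTDIRTY. */
        static bool hole_addr_softdirty(uint64_t addr, uint64_t vma_start,
                                        bool vma_softdirty)
        {
            return vma_softdirty && addr >= vma_start;
        }

        int main(void)
        {
            /* Hole spans 0x1000-0x3000; a soft-dirty VMA starts at 0x2000. */
            assert(!hole_addr_softdirty(0x1000, 0x2000, true));  /* before VMA */
            assert(hole_addr_softdirty(0x2000, 0x2000, true));   /* inside VMA */
            assert(!hole_addr_softdirty(0x2000, 0x2000, false)); /* flag clear */
            return 0;
        }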
  8. 07 Aug 2014, 1 commit
    • mm: softdirty: respect VM_SOFTDIRTY in PTE holes · 68b5a652
      Committed by Peter Feiner
      After a VMA is created with the VM_SOFTDIRTY flag set, /proc/pid/pagemap
      should report that the VMA's virtual pages are soft-dirty until
      VM_SOFTDIRTY is cleared (i.e., by the next write of "4" to
      /proc/pid/clear_refs).  However, pagemap ignores the VM_SOFTDIRTY flag
      for virtual addresses that fall in PTE holes (i.e., virtual addresses
      that don't have a PMD, PUD, or PGD allocated yet).
      
      To observe this bug, use mmap to create a VMA large enough such that
      there's a good chance that the VMA will occupy an unused PMD, then test
      the soft-dirty bit on its pages.  In practice, I found that a VMA that
      covered a PMD's worth of address space was big enough.
      
      This patch adds the necessary VMA lookup to the PTE hole callback in
      /proc/pid/pagemap's page walk and sets soft-dirty according to the VMAs'
      VM_SOFTDIRTY flag.
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68b5a652
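      A reproducer along the lines described above (a hedged sketch): map a
      PMD-sized anonymous region, leave it untouched so no page tables are
      allocated for it, and read the soft-dirty bit (bit 55 of the pagemap
      entry) for one of its pages.  With the fix, the bit reads back set
      because the freshly created VMA carries VM_SOFTDIRTY; before the fix,
      the PTE hole made the bit read back clear.

        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define PMD_SPAN (2UL * 1024 * 1024)    /* one PMD's worth on x86-64 */

        int main(void)
        {
            /* Do not touch the pages, so the region stays a PTE hole. */
            char *m = mmap(NULL, PMD_SPAN, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            uint64_t entry = 0;
            long pagesize = sysconf(_SC_PAGESIZE);
            int fd = open("/proc/self/pagemap", O_RDONLY);

            if (m == MAP_FAILED || fd < 0)
                return 1;
            pread(fd, &entry, sizeof(entry),
                  ((uintptr_t)m / pagesize) * sizeof(entry));
            /* Expect 1: VM_SOFTDIRTY should be honored even in the hole. */
            printf("soft-dirty: %d\n", (int)((entry >> 55) & 1));
            close(fd);
            return 0;
        }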
  9. 07 Jun 2014, 2 commits
  10. 05 Jun 2014, 1 commit
  11. 21 May 2014, 1 commit
  12. 08 Apr 2014, 1 commit
    • mm: per-thread vma caching · 615d6e87
      Committed by Davidlohr Bueso
      This patch is a continuation of efforts to optimize find_vma(), avoiding
      potentially expensive rbtree walks to locate a vma upon faults.  The
      original approach (https://lkml.org/lkml/2013/11/1/410), where the
      largest vma was also cached, ended up being too specific and random, so
      further comparison with other approaches was needed.  There are two
      things to consider here: the cache hit rate and the latency of
      find_vma().  Improving the hit rate does not necessarily translate into
      finding the vma any faster, as the overhead of any fancy caching scheme
      can be too high to be worthwhile.
      
      We currently cache the last used vma for the whole address space, which
      provides a nice optimization, cutting the total cycles spent in
      find_vma() by as much as 2.5x for workloads with good locality.  On the
      other hand, this simple scheme is pretty much useless for workloads with
      poor locality.
      Analyzing ebizzy runs shows that, no matter how many threads are
      running, the mmap_cache hit rate is less than 2%, and in many situations
      below 1%.
      
      The proposed approach is to replace this scheme with a small per-thread
      cache, maximizing hit rates at a very low maintenance cost.
      Invalidations are performed by simply bumping up a 32-bit sequence
      number.  The only expensive operation is in the rare case of a seq
      number overflow, where all caches that share the same address space are
      flushed.  Upon a miss, the proposed replacement policy is based on the
      page number that contains the virtual address in question.  Concretely,
      the following results are seen on an 80-core, 8-socket x86-64 box:
      
      1) System bootup: Most programs are single threaded, so the per-thread
         scheme improves on the ~50% baseline hit rate just by adding a few
         more slots to the cache.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 50.61%   | 19.90            |
      | patched        | 73.45%   | 13.58            |
      +----------------+----------+------------------+
      
      2) Kernel build: This one is already pretty good with the current
         approach as we're dealing with good locality.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 75.28%   | 11.03            |
      | patched        | 88.09%   | 9.31             |
      +----------------+----------+------------------+
      
      3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 70.66%   | 17.14            |
      | patched        | 91.15%   | 12.57            |
      +----------------+----------+------------------+
      
      4) Ebizzy: There's a fair amount of variation from run to run, but this
         approach always shows nearly perfect hit rates, while the baseline's
         are just about non-existent.  The baseline's cycle counts fluctuate
         anywhere from ~60 to ~116 billion, but this approach reduces them
         considerably.  For instance, with 80 threads:
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 1.06%    | 91.54            |
      | patched        | 99.97%   | 14.18            |
      +----------------+----------+------------------+
      
      [akpm@linux-foundation.org: fix nommu build, per Davidlohr]
      [akpm@linux-foundation.org: document vmacache_valid() logic]
      [akpm@linux-foundation.org: attempt to untangle header files]
      [akpm@linux-foundation.org: add vmacache_find() BUG_ON]
      [hughd@google.com: add vmacache_valid_mm() (from Oleg)]
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: adjust and enhance comments]
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      615d6e87
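      A rough userspace model of the scheme described above (hypothetical
      names and sizes; the in-kernel structures differ): a handful of
      per-thread slots indexed by the page number of the lookup address,
      validated against a per-address-space sequence number that is bumped to
      invalidate every thread's cache at once.

        #include <stddef.h>
        #include <stdint.h>

        #define VMACACHE_SIZE 4                 /* a few slots per thread */
        #define VMACACHE_MASK (VMACACHE_SIZE - 1)
        #define PAGE_SHIFT 12

        struct vma { uint64_t start, end; };    /* stand-in for vm_area_struct */
        struct mm { uint32_t seqnum; };         /* per-address-space sequence  */

        struct thread_vmacache {
            uint32_t seqnum;                    /* mm->seqnum when last filled */
            struct vma *slot[VMACACHE_SIZE];
        };

        /* Invalidate every thread's cache by bumping the shared sequence
         * number (the kernel additionally flushes all caches on the rare
         * 32-bit overflow; that case is omitted here). */
        static void vmacache_invalidate(struct mm *mm)
        {
            mm->seqnum++;
        }

        /* Slot choice: the page number containing the address. */
        static size_t vmacache_idx(uint64_t addr)
        {
            return (addr >> PAGE_SHIFT) & VMACACHE_MASK;
        }

        static struct vma *vmacache_find(struct thread_vmacache *tc,
                                         struct mm *mm, uint64_t addr)
        {
            if (tc->seqnum != mm->seqnum) {     /* stale: drop slots, resync */
                for (size_t i = 0; i < VMACACHE_SIZE; i++)
                    tc->slot[i] = NULL;
                tc->seqnum = mm->seqnum;
                return NULL;
            }
            struct vma *v = tc->slot[vmacache_idx(addr)];
            return (v && addr >= v->start && addr < v->end) ? v : NULL;
        }

        static void vmacache_update(struct thread_vmacache *tc,
                                    struct mm *mm, uint64_t addr, struct vma *v)
        {
            tc->seqnum = mm->seqnum;
            tc->slot[vmacache_idx(addr)] = v;
        }

        int main(void)
        {
            struct mm mm = { 0 };
            struct thread_vmacache tc = { 0 };
            struct vma v = { 0x400000, 0x600000 };

            vmacache_update(&tc, &mm, 0x400000, &v);
            struct vma *hit = vmacache_find(&tc, &mm, 0x400800); /* same page */
            vmacache_invalidate(&mm);           /* e.g. after an munmap() */
            struct vma *miss = vmacache_find(&tc, &mm, 0x400800);
            return !(hit == &v && miss == NULL);
        }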
  13. 15 Nov 2013, 3 commits
  14. 13 Nov 2013, 2 commits
  15. 17 Oct 2013, 1 commit
  16. 12 Sep 2013, 1 commit