1. 06 11月, 2015 12 次提交
    • A
      mm, slub, kasan: enable user tracking by default with KASAN=y · 89d3c87e
      Andrey Ryabinin 提交于
      It's recommended to have slub's user tracking enabled with CONFIG_KASAN,
      because:
      
      a) User tracking disables slab merging which improves
          detecting out-of-bounds accesses.
      b) User tracking metadata acts as redzone which also improves
          detecting out-of-bounds accesses.
      c) User tracking provides additional information about object.
          This information helps to understand bugs.
      
      Currently it is not enabled by default.  Besides recompiling the kernel
      with KASAN and reinstalling it, user also have to change the boot cmdline,
      which is not very handy.
      
      Enable slub user tracking by default with KASAN=y, since there is no good
      reason to not do this.
      
      [akpm@linux-foundation.org: little fixes, per David]
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89d3c87e
    • A
      kasan: various fixes in documentation · 0295fd5d
      Andrey Konovalov 提交于
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0295fd5d
    • H
      Documentation/filesystems/proc.txt: a little tidying · a5be3563
      Hugh Dickins 提交于
      There's an odd line about "Locked" at the head of the description of
      /proc/meminfo: it seems to have strayed from /proc/PID/smaps, so lead it
      back there.  Move "Swap" and "SwapPss" descriptions down above it, to
      match the order in the file (though "PageSize"s still undescribed).
      
      The example of "Locked: 374 kB" (the same as Pss, neither Rss nor Size) is
      so unlikely as to be misleading: just make it 0, this is /bin/bash text;
      which would be "dw" (disabled write) not "de" (do not expand).
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5be3563
    • H
      mm: page migration avoid touching newpage until no going back · cf4b769a
      Hugh Dickins 提交于
      We have had trouble in the past from the way in which page migration's
      newpage is initialized in dribs and drabs - see commit 8bdd6380 ("mm:
      fix direct reclaim writeback regression") which proposed a cleanup.
      
      We have no actual problem now, but I think the procedure would be clearer
      (and alternative get_new_page pools safer to implement) if we assert that
      newpage is not touched until we are sure that it's going to be used -
      except for taking the trylock on it in __unmap_and_move().
      
      So shift the early initializations from move_to_new_page() into
      migrate_page_move_mapping(), mapping and NULL-mapping paths.  Similarly
      migrate_huge_page_move_mapping(), but its NULL-mapping path can just be
      deleted: you cannot reach hugetlbfs_migrate_page() with a NULL mapping.
      
      Adjust stages 3 to 8 in the Documentation file accordingly.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf4b769a
    • H
      mm: rmap use pte lock not mmap_sem to set PageMlocked · b87537d9
      Hugh Dickins 提交于
      KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
      mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
      found mapped in a VM_LOCKED vma) is ineffective against races with
      exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
      tearing down an mm.
      
      But that's okay, those races are benign; and although we've believed for
      years in that ugly down_read_trylock(), it's unsuitable for the job, and
      frustrates the good intention of setting PageMlocked when it fails.
      
      It just doesn't matter if here we read vm_flags an instant before or after
      a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
      syscalls (or exit) work their way up the address space (taking pt locks
      after updating vm_flags) to establish the final state.
      
      We do still need to be careful never to mark a page Mlocked (hence
      unevictable) by any race that will not be corrected shortly after.  The
      page lock protects from many of the races, but not all (a page is not
      necessarily locked when it's unmapped).  But the pte lock we just dropped
      is good to cover the rest (and serializes even with
      munlock_vma_pages_all(), so no special barriers required): now hold on to
      the pte lock while calling mlock_vma_page().  Is that lock ordering safe?
      Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
      calls the complementary clear_page_mlock().
      
      This fixes the following case (though not a case which anyone has
      complained of), which mmap_sem did not: truncation's preliminary
      unmap_mapping_range() is supposed to remove even the anonymous COWs of
      filecache pages, and that might race with try_to_unmap_one() on a
      VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
      zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
      freed.  The pte lock protects against this.
      
      You could say that it also protects against the more ordinary case, racing
      with the preliminary unmapping of a filecache page itself: but in our
      current tree, that's independently protected by i_mmap_rwsem; and that
      race would be why "Bad page state (mlocked)" was seen before commit
      48ec833b ("Revert mm/memory.c: share the i_mmap_rwsem").
      
      Vlastimil Babka points out another race which this patch protects against.
       try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
      moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
      leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
      is ineffective because exit_mmap() does not hold it; page lock ineffective
      because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
      is effective because __munlock_pagevec_fill() takes it to get the page,
      after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
      
      Kirill Shutemov points out that if the compiler chooses to implement a
      "vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
      with an intermediate store of unrelated bits set, since I'm here foregoing
      its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
      a spurious VM_LOCKED in vm_flags, and make the wrong decision.  This does
      not appear to be an immediate problem, but we may want to define vm_flags
      accessors in future, to guard against such a possibility.
      
      While we're here, make a related optimization in try_to_munmap_one(): if
      it's doing TTU_MUNLOCK, then there's no point at all in descending the
      page tables and getting the pt lock, unless the vma is VM_LOCKED.  Yes,
      that can change racily, but it can change racily even without the
      optimization: it's not critical.  Far better not to waste time here.
      
      Stopped short of separating try_to_munlock_one() from try_to_munmap_one()
      on this occasion, but that's probably the sensible next step - with a
      rename, given that try_to_munlock()'s business is to try to set Mlocked.
      
      Updated the unevictable-lru Documentation, to remove its reference to mmap
      semaphore, but found a few more updates needed in just that area.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b87537d9
    • H
      mm Documentation: undoc non-linear vmas · 7a14239a
      Hugh Dickins 提交于
      While updating some mm Documentation, I came across a few straggling
      references to the non-linear vmas which were happily removed in v4.0.
      Delete them.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a14239a
    • E
      Documentation/vm/transhuge.txt: add information about max_ptes_swap · 80f73b4b
      Ebru Akagunduz 提交于
      max_ptes_swap specifies how many pages can be brought in from swap when
      collapsing a group of pages into a transparent huge page.
      
      /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap
      
      A higher value can cause excessive swap IO and waste memory.  A lower
      value can prevent THPs from being collapsed, resulting fewer pages being
      collapsed into THPs, and lower memory access performance.
      Signed-off-by: NEbru Akagunduz <ebru.akagunduz@gmail.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80f73b4b
    • N
      mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status · 5d317b2b
      Naoya Horiguchi 提交于
      Currently there's no easy way to get per-process usage of hugetlb pages,
      which is inconvenient because userspace applications which use hugetlb
      typically want to control their processes on the basis of how much memory
      (including hugetlb) they use.  So this patch simply provides easy access
      to the info via /proc/PID/status.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NJoern Engel <joern@logfs.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d317b2b
    • N
      mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps · 25ee01a2
      Naoya Horiguchi 提交于
      Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB),
      which is inconvenient when we want to know per-task or per-vma base
      hugetlb usage.  To solve this, this patch adds new fields for hugetlb
      usage like below:
      
        Size:              20480 kB
        Rss:                   0 kB
        Pss:                   0 kB
        Shared_Clean:          0 kB
        Shared_Dirty:          0 kB
        Private_Clean:         0 kB
        Private_Dirty:         0 kB
        Referenced:            0 kB
        Anonymous:             0 kB
        AnonHugePages:         0 kB
        Shared_Hugetlb:    18432 kB
        Private_Hugetlb:    2048 kB
        Swap:                  0 kB
        KernelPageSize:     2048 kB
        MMUPageSize:        2048 kB
        Locked:                0 kB
        VmFlags: rd wr mr mw me de ht
      
      [hughd@google.com: fix Private_Hugetlb alignment ]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NJoern Engel <joern@logfs.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25ee01a2
    • S
      Doc/slub: document slabinfo-gnuplot.sh script · 76f8ec71
      Sergey Senozhatsky 提交于
      Add documentation on how to use slabinfo-gnuplot.sh script.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76f8ec71
    • D
      kernel/watchdog.c: add sysctl knob hardlockup_panic · ac1f5912
      Don Zickus 提交于
      The only way to enable a hardlockup to panic the machine is to set
      'nmi_watchdog=panic' on the kernel command line.
      
      This makes it awkward for end users and folks who want to run automate
      tests (like myself).
      
      Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
      knob.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Acked-by: NJiri Kosina <jkosina@suse.cz>
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac1f5912
    • J
      kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup · 55537871
      Jiri Kosina 提交于
      In many cases of hardlockup reports, it's actually not possible to know
      why it triggered, because the CPU that got stuck is usually waiting on a
      resource (with IRQs disabled) in posession of some other CPU is holding.
      
      IOW, we are often looking at the stacktrace of the victim and not the
      actual offender.
      
      Introduce sysctl / cmdline parameter that makes it possible to have
      hardlockup detector perform all-CPU backtrace.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55537871
  2. 04 11月, 2015 1 次提交
    • L
      atomic: remove all traces of READ_ONCE_CTRL() and atomic*_read_ctrl() · 105ff3cb
      Linus Torvalds 提交于
      This seems to be a mis-reading of how alpha memory ordering works, and
      is not backed up by the alpha architecture manual.  The helper functions
      don't do anything special on any other architectures, and the arguments
      that support them being safe on other architectures also argue that they
      are safe on alpha.
      
      Basically, the "control dependency" is between a previous read and a
      subsequent write that is dependent on the value read.  Even if the
      subsequent write is actually done speculatively, there is no way that
      such a speculative write could be made visible to other cpu's until it
      has been committed, which requires validating the speculation.
      
      Note that most weakely ordered architectures (very much including alpha)
      do not guarantee any ordering relationship between two loads that depend
      on each other on a control dependency:
      
          read A
          if (val == 1)
              read B
      
      because the conditional may be predicted, and the "read B" may be
      speculatively moved up to before reading the value A.  So we require the
      user to insert a smp_rmb() between the two accesses to be correct:
      
          read A;
          if (A == 1)
              smp_rmb()
              read B
      
      Alpha is further special in that it can break that ordering even if the
      *address* of B depends on the read of A, because the cacheline that is
      read later may be stale unless you have a memory barrier in between the
      pointer read and the read of the value behind a pointer:
      
          read ptr
          read offset(ptr)
      
      whereas all other weakly ordered architectures guarantee that the data
      dependency (as opposed to just a control dependency) will order the two
      accesses.  As a result, alpha needs a "smp_read_barrier_depends()" in
      between those two reads for them to be ordered.
      
      The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
      had was a control dependency to a subsequent *write*, however, and
      nobody can finalize such a subsequent write without having actually done
      the read.  And were you to write such a value to a "stale" cacheline
      (the way the unordered reads came to be), that would seem to lose the
      write entirely.
      
      So the things that make alpha able to re-order reads even more
      aggressively than other weak architectures do not seem to be relevant
      for a subsequent write.  Alpha memory ordering may be strange, but
      there's no real indication that it is *that* strange.
      
      Also, the alpha architecture reference manual very explicitly talks
      about the definition of "Dependence Constraints" in section 5.6.1.7,
      where a preceding read dominates a subsequent write.
      
      Such a dependence constraint admittedly does not impose a BEFORE (alpha
      architecture term for globally visible ordering), but it does guarantee
      that there can be no "causal loop".  I don't see how you could avoid
      such a loop if another cpu could see the stored value and then impact
      the value of the first read.  Put another way: the read and the write
      could not be seen as being out of order wrt other cpus.
      
      So I do not see how these "x_ctrl()" functions can currently be necessary.
      
      I may have to eat my words at some point, but in the absense of clear
      proof that alpha actually needs this, or indeed even an explanation of
      how alpha could _possibly_ need it, I do not believe these functions are
      called for.
      
      And if it turns out that alpha really _does_ need a barrier for this
      case, that barrier still should not be "smp_read_barrier_depends()".
      We'd have to make up some new speciality barrier just for alpha, along
      with the documentation for why it really is necessary.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul E McKenney <paulmck@us.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      105ff3cb
  3. 03 11月, 2015 12 次提交
  4. 01 11月, 2015 2 次提交
  5. 31 10月, 2015 13 次提交