1. 24 Feb 2017, 1 commit
  2. 16 Feb 2017, 1 commit
  3. 14 Feb 2017, 1 commit
    • rhashtable: Add nested tables · 40137906
      Authored by Herbert Xu
      This patch adds code that handles GFP_ATOMIC kmalloc failure on
      insertion.  As we cannot use vmalloc, we solve it by making our
      hash table nested.  That is, we allocate single pages at each level
      and reach our desired table size by nesting them.
      
      When a nested table is created, only a single page is allocated
      at the top-level.  Lower levels are allocated on demand during
      insertion.  Therefore for each insertion to succeed, only two
      (non-consecutive) pages are needed.
      
      After a nested table is created, a rehash will be scheduled in
      order to switch to a vmalloced table as soon as possible.  Also,
      the rehash code will never rehash into a nested table.  If we
      detect a nested table during a rehash, the rehash will be aborted
      and a new rehash will be scheduled.
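      The layout described above can be sketched in userspace C. This is a minimal illustration, not the kernel's rhashtable code: names such as nested_table and nested_insert() are made up here, and malloc/calloc stand in for single-page GFP_ATOMIC allocations.

```c
/* Two-level "nested" bucket table: one top-level page of pointers,
 * second-level pages allocated on demand during insertion, so an
 * insertion needs at most two (non-consecutive) pages. */
#include <assert.h>
#include <stdlib.h>

#define PTRS_PER_PAGE 512   /* 4096 / sizeof(void *) on 64-bit */

struct bucket { unsigned long key; struct bucket *next; };

struct nested_table {
    struct bucket **level2[PTRS_PER_PAGE]; /* the single top-level page */
    unsigned long size;                    /* desired bucket count */
};

static struct nested_table *nested_alloc(unsigned long size)
{
    struct nested_table *t = calloc(1, sizeof(*t));
    if (t)
        t->size = size;    /* only the top page exists at creation */
    return t;
}

static int nested_insert(struct nested_table *t, struct bucket *b)
{
    unsigned long hash = b->key % t->size;
    unsigned long hi = hash / PTRS_PER_PAGE, lo = hash % PTRS_PER_PAGE;

    if (!t->level2[hi]) {
        /* lower level allocated lazily, one page at a time */
        t->level2[hi] = calloc(PTRS_PER_PAGE, sizeof(struct bucket *));
        if (!t->level2[hi])
            return -1;     /* -ENOMEM in the kernel */
    }
    b->next = t->level2[hi][lo];
    t->level2[hi][lo] = b;
    return 0;
}
```

      In the real code this shape is a stopgap: as the commit says, a rehash into a flat vmalloc'ed table is scheduled as soon as a nested table comes into existence.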
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Feb 2017, 3 commits
  5. 06 Feb 2017, 1 commit
    • debugobjects: Reduce contention on the global pool_lock · 858274b6
      Authored by Waiman Long
      On a large SMP system with many CPUs, the global pool_lock may become
      a performance bottleneck as all the CPUs that need to allocate or
      free debug objects have to take the lock. That can sometimes cause
      soft lockups like:
      
       NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
       ...
       RIP: 0010:[<ffffffff817c216b>]  [<ffffffff817c216b>]
      	_raw_spin_unlock_irqrestore+0x3b/0x60
       ...
       Call Trace:
        [<ffffffff813f40d1>] free_object+0x81/0xb0
        [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
        [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
        [<ffffffff81284996>] ? file_free_rcu+0x36/0x60
        [<ffffffff81251712>] kmem_cache_free+0xd2/0x380
        [<ffffffff81284960>] ? fput+0x90/0x90
        [<ffffffff81284996>] file_free_rcu+0x36/0x60
        [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
        [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
        [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
        [<ffffffff810c59d1>] kthread+0x101/0x120
        [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
        [<ffffffff817c2d32>] ret_from_fork+0x22/0x50
      
      To reduce contention on the pool_lock, the actual kmem_cache_free()
      of the debug objects will be delayed if the pool_lock is busy. This
      temporarily increases the number of free objects available in the
      free pool when the system is busy. As a result, the number of
      kmem_cache allocations and frees is reduced.
      
      To further reduce lock operations, free debug objects in batches of
      four.
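      The two ideas combine roughly as follows. This is a single-threaded userspace sketch with made-up names (free_obj_batch(), pool_lock_busy), not the actual debugobjects code; in the kernel the "busy" test is a failed trylock on a spinlock.

```c
/* Defer frees onto a list when the pool lock is contended, and drain
 * the list in batches of four so the lock is taken once per batch
 * rather than once per object. */
#include <assert.h>
#include <stdlib.h>

#define ODEBUG_FREE_BATCH 4

struct debug_obj { struct debug_obj *next; };

static int pool_lock_busy;            /* stand-in for a contended spinlock */
static struct debug_obj *obj_to_free; /* deferred-free list */
static int obj_nr_tofree;
static int lock_acquisitions;         /* counted for the demo */

static void free_obj_batch(void)
{
    /* One lock round trip releases a whole batch of objects. */
    lock_acquisitions++;
    while (obj_to_free) {
        struct debug_obj *obj = obj_to_free;
        obj_to_free = obj->next;
        obj_nr_tofree--;
        free(obj);
    }
}

static void free_object(struct debug_obj *obj)
{
    /* Queue instead of freeing inline; drain only when the lock is
     * free and a full batch has accumulated. */
    obj->next = obj_to_free;
    obj_to_free = obj;
    obj_nr_tofree++;
    if (!pool_lock_busy && obj_nr_tofree >= ODEBUG_FREE_BATCH)
        free_obj_batch();
}
```

      Freeing eight objects this way takes the lock twice instead of eight times, which is the contention reduction the commit is after.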
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Du Changbin" <changbin.du@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Stancek <jstancek@redhat.com>
      Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 04 Feb 2017, 2 commits
  7. 03 Feb 2017, 1 commit
    • ext4: move halfmd4 into hash.c directly · 1c83a9aa
      Authored by Jason A. Donenfeld
      The "half md4" transform should not be used by any new code. And
      fortunately, it's now used only by ext4. Since ext4 supports several
      hashing methods, at some point it might be desirable to move to
      something like SipHash. As an intermediate step, remove half md4 from
      cryptohash.h and lib, and make it just a local function in ext4's
      hash.c. There's precedent for doing this; the other function ext4 can
      use for its hashes -- TEA -- is also implemented in the same place.
      Also, as a local function it might allow gcc to perform some
      additional optimizations.
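      The optimization point is about linkage: a `static` function visible to only one translation unit may be inlined or specialized for its single caller. A tiny illustration, where half_md4_round() is a made-up stand-in and not ext4's actual transform:

```c
/* A file-local helper like the moved half-md4 transform: no external
 * linkage, so the compiler is free to inline it into its one caller. */
#include <assert.h>
#include <stdint.h>

static uint32_t half_md4_round(uint32_t a, uint32_t b)
{
    return (a + b) * 2654435761u;    /* placeholder mixing step */
}

uint32_t hash_buf(const uint32_t *buf, int n)
{
    uint32_t h = 0x67452301u;        /* MD4-style initial constant */
    for (int i = 0; i < n; i++)
        h = half_md4_round(h, buf[i]);
    return h;
}
```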
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  8. 25 Jan 2017, 2 commits
  9. 24 Jan 2017, 1 commit
    • rcu: Enable RCU tracepoints by default to aid in debugging · 96151825
      Authored by Matt Fleming
      While debugging a performance issue I needed to understand why
      RCU softirqs were firing so frequently.
      
      Unfortunately, the RCU callback tracepoints are hidden behind
      CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is
      likely to also be disabled in enterprise distribution configs.
      
      Enable it by default for CONFIG_TREE_RCU. However, we must keep it
      disabled for tiny RCU, because it would otherwise pull in a large
      amount of code that would make tiny RCU less than tiny.
      
      I ran some file system metadata intensive workloads (git checkout,
      FS-Mark) on a variety of machines with this patch and saw no
      detectable change in performance.
      
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  10. 20 Jan 2017, 1 commit
  11. 16 Jan 2017, 1 commit
  12. 15 Jan 2017, 1 commit
  13. 14 Jan 2017, 1 commit
  14. 11 Jan 2017, 1 commit
  15. 08 Jan 2017, 1 commit
    • mm: workingset: fix use-after-free in shadow node shrinker · ea07b862
      Authored by Johannes Weiner
      Several people reported seeing warnings about inconsistent radix tree
      nodes followed by crashes in the workingset code, all of which looked
      like use-after-free access from the shadow node shrinker.
      
      Dave Jones managed to reproduce the issue with a debug patch applied,
      which confirmed that the radix tree shrinking indeed frees shadow nodes
      while they are still linked to the shadow LRU:
      
        WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
        CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
        Call Trace:
           delete_node+0x1e4/0x200
           __radix_tree_delete_node+0xd/0x10
           shadow_lru_isolate+0xe6/0x220
           __list_lru_walk_one.isra.4+0x9b/0x190
           list_lru_walk_one+0x23/0x30
           scan_shadow_nodes+0x2e/0x40
           shrink_slab.part.44+0x23d/0x5d0
           shrink_node+0x22c/0x330
           kswapd+0x392/0x8f0
      
      This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
      inlined radix_tree_shrink().
      
      The problem is with 14b46879 ("mm: workingset: move shadow entry
      tracking to radix tree exceptional tracking"), which passes an update
      callback into the radix tree to link and unlink shadow leaf nodes when
      tree entries change, but forgot to pass the callback when reclaiming a
      shadow node.
      
      While the reclaimed shadow node itself is unlinked by the shrinker, its
      deletion from the tree can cause the left-most leaf node in the tree to
      be shrunk.  If that happens to be a shadow node as well, we don't unlink
      it from the LRU as we should.
      
      Consider this tree, where the s are shadow entries:
      
             root->rnode
                  |
             [0       n]
              |       |
           [s    ] [sssss]
      
      Now the shadow node shrinker reclaims the rightmost leaf node through
      the shadow node LRU:
      
             root->rnode
                  |
             [0        ]
              |
          [s     ]
      
      Because the parent of the deleted node is the first level below the
      root and has only one child in the left-most slot, the intermediate
      level is shrunk and the node containing the single shadow is put in
      its place:
      
             root->rnode
                  |
             [s        ]
      
      The shrinker again sees a single left-most slot in a first level node
      and thus decides to store the shadow in root->rnode directly and free
      the node - which is a leaf node on the shadow node LRU.
      
             root->rnode
                  |
                  s
      
      Without the update callback, the freed node remains on the shadow LRU,
      where it causes later shrinker runs to crash.
      
      Pass the node updater callback into __radix_tree_delete_node() in case
      the deletion causes the left-most branch in the tree to collapse too.
      
      Also add warnings when linked nodes are freed right away, rather than
      waiting for the use-after-free when the list is scanned much later.
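      A toy model of the bug and the fix, with simplified list handling and illustrative names (the real callback is workingset_update_node(), operating on node->private_list):

```c
/* Deleting a radix-tree leaf can collapse its parent and free a
 * *different* leaf that still sits on the shadow LRU.  The fix is to
 * run the update callback on every node freed during the collapse,
 * so it is unlinked from the LRU before the memory goes away. */
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *on_lru_next;   /* stand-in for node->private_list */
    int on_lru;
};

static struct node *shadow_lru;

static void lru_add(struct node *n)
{
    n->on_lru = 1;
    n->on_lru_next = shadow_lru;
    shadow_lru = n;
}

/* The callback the fix passes into __radix_tree_delete_node(). */
static void workingset_update_node(struct node *n)
{
    struct node **p = &shadow_lru;
    while (*p && *p != n)
        p = &(*p)->on_lru_next;
    if (*p)
        *p = n->on_lru_next;    /* unlink before the node is freed */
    n->on_lru = 0;
}

static void free_collapsed_node(struct node *n,
                                void (*update)(struct node *))
{
    if (update)
        update(n);              /* without this, n dangles on the LRU */
    assert(!n->on_lru);         /* models the warning the patch adds */
    free(n);
}
```

      Calling free_collapsed_node() with a NULL callback reproduces the bug in miniature: the freed node stays reachable from shadow_lru, which is exactly the dangling pointer the shrinker later dereferences.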
      
      Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 07 Jan 2017, 1 commit
  17. 26 Dec 2016, 1 commit
    • ktime: Get rid of the union · 2456e855
      Authored by Thomas Gleixner
      ktime is a union because the initial implementation stored the time as
      scalar nanoseconds on 64-bit machines and in an endianness-optimized
      timespec variant on 32-bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.
      
      Get rid of the union and just keep ktime_t as a simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
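      The change in miniature, using int64_t as a userspace stand-in for the kernel's s64 (the real ktime_add_ns() lives in include/linux/ktime.h; this is a simplified sketch of its shape):

```c
/* Before: a union with a single remaining member, so every access had
 * to go through .tv64.  After: the bare scalar, and time arithmetic
 * becomes plain integer arithmetic. */
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;

/* Old definition, after the 32-bit timespec variant was removed: */
union ktime_old { s64 tv64; };

/* New definition introduced by this patch: */
typedef s64 ktime_t;

static ktime_t ktime_add_ns(ktime_t kt, s64 nsec)
{
    return kt + nsec;   /* was: (union){ .tv64 = kt.tv64 + nsec } */
}
```

      The layout is identical, which is why a mostly mechanical coccinelle conversion sufficed: only the `.tv64` member accesses had to go.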
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  18. 25 Dec 2016, 2 commits
  19. 23 Dec 2016, 1 commit
    • [iov_iter] fix iterate_all_kinds() on empty iterators · 33844e66
      Authored by Al Viro
      The problem is similar to the ones dealt with in "fold checks into
      iterate_and_advance()" and followups, except that in this case we
      really want to do nothing when asked for a zero-length operation -
      unlike zero-length iterate_and_advance(), zero-length
      iterate_all_kinds() has no side effects, and callers are simpler
      that way.
      
      That got exposed when copy_from_iter_full() had been used by tipc, which
      builds an msghdr with zero payload and (now) feeds it to a primitive
      based on iterate_all_kinds() instead of iterate_and_advance().
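      The fixed contract can be modeled in userspace. iterate_all_kinds() is really a kernel macro that dispatches over iovec/kvec/bvec flavors; this stand-in keeps only the property at issue, namely that a zero-length pass is a strict no-op:

```c
/* A zero-length pass over an iterator must touch no segment and run
 * no callback -- the early return below is the behaviour the fix
 * guarantees for empty requests like tipc's zero-payload msghdr. */
#include <assert.h>
#include <stddef.h>

struct iov  { const char *base; size_t len; };
struct iter { const struct iov *iov; size_t nr_segs; size_t count; };

static size_t touched;   /* bytes seen by the step callback, for the demo */

static void count_step(const char *p, size_t n, void *priv)
{
    (void)p; (void)priv;
    touched += n;
}

static size_t iterate_all_kinds(struct iter *i, size_t bytes,
                                void (*step)(const char *, size_t, void *),
                                void *priv)
{
    size_t done = 0;

    if (!bytes)          /* empty request: no side effects at all */
        return 0;

    for (size_t s = 0; s < i->nr_segs && done < bytes; s++) {
        size_t n = i->iov[s].len;
        if (n > bytes - done)
            n = bytes - done;
        if (n) {
            step(i->iov[s].base, n, priv);
            done += n;
        }
    }
    return done;
}
```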
      Reported-by: Jon Maloy <jon.maloy@ericsson.com>
      Tested-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  20. 21 Dec 2016, 1 commit
  21. 19 Dec 2016, 2 commits
  22. 16 Dec 2016, 1 commit
  23. 15 Dec 2016, 12 commits