1. 21 May, 2016 29 commits
  2. 18 Mar, 2016 5 commits
    • radix_tree: add radix_tree_dump · 7cf19af4
      Matthew Wilcox committed
      This is debug code which is #if 0'd out.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix_tree: add support for multi-order entries · e6145236
      Matthew Wilcox committed
      With huge pages, it is convenient to have the radix tree be able to
      return an entry that covers multiple indices.  Previous attempts to deal
      with the problem have involved inserting N duplicate entries, which is a
      waste of memory and leads to problems trying to handle aliased tags, or
      probing the tree multiple times to find alternative entries which might
      cover the requested index.
      
      This approach inserts one canonical entry into the tree for a given
      range of indices, and may also insert other entries in order to ensure
      that lookups find the canonical entry.
      
      This solution only tolerates inserting powers of two that are greater
      than the fanout of the tree.  If we wish to expand the radix tree's
      abilities to support large-ish pages that are smaller than the fanout at the
      penultimate level of the tree, then we would need to add one more step
      in lookup to ensure that any sibling nodes in the final level of the
      tree are dereferenced and we return the canonical entry that they
      reference.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
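The lookup side of this idea can be sketched in user space. This is a toy model, not the kernel implementation: it assumes a two-level tree with a fanout of 64, and the names `toy_node`, `toy_tag_node`, `toy_untag_node` and `toy_lookup` are made up for illustration. A slot in the upper node holds either a child pointer tagged in its low bit or a single entry that covers all 64 indices below that slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SHIFT    6                      /* assumed fanout: 64 slots per node */
#define TOY_SIZE     (1UL << TOY_SHIFT)
#define TOY_MASK     (TOY_SIZE - 1)
#define TOY_INTERNAL ((uintptr_t)1)         /* low pointer bit marks a child node */

/* Hypothetical node type: an array of slots and nothing else. */
struct toy_node {
	void *slots[TOY_SIZE];
};

static inline int toy_is_internal(const void *p)
{
	return ((uintptr_t)p & TOY_INTERNAL) != 0;
}

static inline void *toy_tag_node(struct toy_node *n)
{
	return (void *)((uintptr_t)n | TOY_INTERNAL);
}

static inline struct toy_node *toy_untag_node(void *p)
{
	return (struct toy_node *)((uintptr_t)p & ~TOY_INTERNAL);
}

/* Look up index in a two-level toy tree (valid indices: 0..4095). */
static inline void *toy_lookup(struct toy_node *top, unsigned long index)
{
	void *entry;

	assert(index < TOY_SIZE * TOY_SIZE);
	entry = top->slots[(index >> TOY_SHIFT) & TOY_MASK];
	if (!toy_is_internal(entry))
		return entry;	/* NULL, or one entry covering 64 indices */
	/* Tagged pointer: descend into the leaf node. */
	return toy_untag_node(entry)->slots[index & TOY_MASK];
}
```

With an entry stored directly in upper slot 1, `toy_lookup` returns that same entry for every index from 64 to 127 without touching a leaf node, which is the property the commit message describes for entries at least as large as the fanout.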
    • radix_tree: loop based on shift count, not height · 0070e28d
      Matthew Wilcox committed
      When we introduce entries that can cover multiple indices, we will need
      to stop in __radix_tree_create based on the shift, not the height.
      Split out for ease of bisect.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
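Why a shift-based loop helps can be illustrated with a small user-space sketch. It is an assumption-laden toy, not the body of `__radix_tree_create`: the function name `toy_walk`, the `TOY_MAP_SHIFT` constant and the `order` parameter are hypothetical. The walk records the slot offset chosen at each level and stops as soon as the remaining shift no longer exceeds the order of the entry being placed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAP_SHIFT 6                               /* assumed: 64 slots per node */
#define TOY_MAP_MASK  ((1UL << TOY_MAP_SHIFT) - 1)

/*
 * Descend from the top of a tree of the given height, storing the slot
 * offset used at each level in offsets[]. The loop condition compares the
 * remaining shift with the order of the entry, so an entry of order 0
 * reaches the bottom level while a larger entry stops higher up.
 * Returns the number of levels descended.
 */
static inline unsigned int toy_walk(unsigned int height, unsigned int order,
				    unsigned long index,
				    unsigned int *offsets, size_t max)
{
	unsigned int shift = height * TOY_MAP_SHIFT;
	unsigned int levels = 0;

	assert(height <= max);
	assert(shift <= sizeof(index) * 8);
	while (shift > order) {
		shift -= TOY_MAP_SHIFT;
		offsets[levels++] = (unsigned int)((index >> shift) & TOY_MAP_MASK);
	}
	return levels;
}
```

For index 69 in a tree of height 2, an order-0 entry descends two levels (offsets 1 and 5), while an order-6 entry stops after one level at offset 1. A height-based loop has no natural way to express that second case.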
    • radix_tree: tag all internal tree nodes as indirect pointers · 339e6353
      Matthew Wilcox committed
      Set the 'indirect_ptr' bit on all the pointers to internal nodes, not
      just on the root node.  This enables the following patches to support
      multi-order entries in the radix tree.  This patch is split out for ease
      of bisection.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix-tree: account radix_tree_node to memory cgroup · 58e698af
      Vladimir Davydov committed
      Allocation of radix_tree_node objects can be easily triggered from
      userspace, so we should account them to the memory cgroup.  Besides, we
      need them accounted in order to make the shadow node shrinker per-memcg
      (see mm/workingset.c).
      
      A tricky thing about accounting radix_tree_node objects is that they are
      mostly allocated through radix_tree_preload(), so we can't just set
      SLAB_ACCOUNT for radix_tree_node_cachep - that would likely result in a
      lot of unrelated cgroups using objects from each other's caches.
      
      One way to overcome this would be making radix tree preloads per memcg,
      but that would probably look cumbersome and overcomplicated.
      
      Instead, we make radix_tree_node_alloc() first try to allocate from the
      cache with __GFP_ACCOUNT, whether or not the caller has preloaded, and
      fall back on the per-CPU preloads only if that fails.  This should make
      most allocations accounted.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
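The allocation order described in this commit message can be modelled in a few lines of user-space C. Everything here is a stand-in: `toy_alloc_accounted` plays the role of a cache allocation with `__GFP_ACCOUNT`, its `budget` parameter is an invented way to make that allocation fail on demand, and `toy_preload` is a simplified per-CPU preload stack. It is a sketch of the ordering only, not of `radix_tree_node_alloc()` itself.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PRELOAD_MAX 4

/* Hypothetical node; the flag only records where the node came from. */
struct toy_node {
	int from_preload;
};

/* Simplified stand-in for the per-CPU stack of preloaded nodes. */
struct toy_preload {
	struct toy_node *nodes[TOY_PRELOAD_MAX];
	int nr;
};

/* Stand-in for an accounted cache allocation; fails once budget is spent. */
static inline struct toy_node *toy_alloc_accounted(int *budget)
{
	struct toy_node *n;

	if (*budget <= 0)
		return NULL;
	n = calloc(1, sizeof(*n));
	if (n)
		(*budget)--;
	return n;
}

/*
 * Try the accounted allocation first, whether or not anything has been
 * preloaded, and take a preloaded node only if that attempt fails.
 */
static inline struct toy_node *toy_node_alloc(struct toy_preload *pl,
					      int *budget)
{
	struct toy_node *n = toy_alloc_accounted(budget);

	if (n)
		return n;
	if (pl->nr > 0) {
		n = pl->nodes[--pl->nr];
		pl->nodes[pl->nr] = NULL;
		n->from_preload = 1;
	}
	return n;	/* NULL if both sources are exhausted */
}
```

In this model the preload stack is left untouched as long as the accounted path succeeds, which mirrors the stated goal that most allocations end up accounted.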
  3. 04 Feb, 2016 1 commit
    • radix-tree: fix race in gang lookup · 46437f9a
      Matthew Wilcox committed
      If the indirect_ptr bit is set on a slot, that indicates we need to redo
      the lookup.  Introduce a new function radix_tree_iter_retry() which
      forces the loop to retry the lookup by setting 'slot' to NULL and
      turning the iterator back to point at the problematic entry.
      
      This is a pretty rare problem to hit at the moment; the lookup has to
      race with a grow of the radix tree from a height of 0.  The consequences
      of hitting this race are that gang lookup could return a pointer to a
      radix_tree_node instead of a pointer to whatever the user had inserted
      in the tree.
      
      Fixes: cebbd29e ("radix-tree: rewrite gang lookup using iterator")
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
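The failure mode and the retry can be illustrated with a single-threaded toy in which the "race" is staged by hand. This is not `radix_tree_iter_retry()`: the types and helpers below (`toy_root`, `toy_find_slot`, `toy_deref_retry`) are hypothetical, and the tree is limited to height 0 or 1. A reader holds a pointer to the root slot from when the tree had height 0; after the tree grows, that slot holds a tagged node pointer, so the reader must redo the walk instead of returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS    64
#define TOY_INDIRECT ((uintptr_t)1)   /* low bit: "this is a node, not an item" */

struct toy_node { void *slots[TOY_SLOTS]; };
struct toy_root { void *rnode; };

static inline int toy_is_indirect(const void *p)
{
	return ((uintptr_t)p & TOY_INDIRECT) != 0;
}

static inline void *toy_to_indirect(struct toy_node *n)
{
	return (void *)((uintptr_t)n | TOY_INDIRECT);
}

static inline struct toy_node *toy_to_node(void *p)
{
	return (struct toy_node *)((uintptr_t)p & ~TOY_INDIRECT);
}

/* Walk from the root to the slot for index (tree of height 0 or 1 only). */
static inline void **toy_find_slot(struct toy_root *root, unsigned long index)
{
	assert(index < TOY_SLOTS);
	if (!toy_is_indirect(root->rnode))
		return index == 0 ? &root->rnode : NULL;
	return &toy_to_node(root->rnode)->slots[index];
}

/* Dereference a possibly stale slot; redo the walk if it now holds a node. */
static inline void *toy_deref_retry(struct toy_root *root, void **slot,
				    unsigned long index)
{
	while (slot) {
		void *entry = *slot;

		if (!toy_is_indirect(entry))
			return entry;		/* a user item, or NULL */
		slot = toy_find_slot(root, index);	/* retry from the root */
	}
	return NULL;
}
```

Without the indirect-bit check, the stale reader in this model would hand back the tagged `toy_node` pointer, which corresponds to the consequence the commit message warns about.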
  4. 07 Nov, 2015 1 commit
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman committed
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
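The flag split described above can be mirrored in a small user-space model. The `TOY_` names and bit values below are invented for this sketch and do not match the kernel's definitions; the compositions of `TOY_GFP_ATOMIC` and `TOY_GFP_KERNEL` are assumptions modelled on the description in the commit message, and `toy_allow_blocking` is a stand-in for the `gfpflags_allow_blocking()` helper it mentions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int toy_gfp_t;

/* Bit values are arbitrary for this sketch; the kernel's values differ. */
#define TOY_HIGH            0x01u  /* may use the high priority reserve */
#define TOY_ATOMIC          0x02u  /* truly atomic, no alternative */
#define TOY_DIRECT_RECLAIM  0x04u  /* caller may sleep in direct reclaim */
#define TOY_KSWAPD_RECLAIM  0x08u  /* caller wants kswapd woken */
#define TOY_IO              0x10u
#define TOY_FS              0x20u

/* Assumed compositions, following the commit message's description. */
#define TOY_RECLAIM     (TOY_DIRECT_RECLAIM | TOY_KSWAPD_RECLAIM)
#define TOY_GFP_ATOMIC  (TOY_HIGH | TOY_ATOMIC | TOY_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL  (TOY_RECLAIM | TOY_IO | TOY_FS)

/* A caller may block only if it is willing to enter direct reclaim. */
static inline bool toy_allow_blocking(toy_gfp_t gfp)
{
	return (gfp & TOY_DIRECT_RECLAIM) != 0;
}

static inline bool toy_wakes_kswapd(toy_gfp_t gfp)
{
	return (gfp & TOY_KSWAPD_RECLAIM) != 0;
}

/* Mempool-style caller: never block, but still wake kswapd. */
static inline toy_gfp_t toy_mempool_mask(toy_gfp_t gfp)
{
	return gfp & ~TOY_DIRECT_RECLAIM;
}
```

In this model a mempool-style mask derived from `TOY_GFP_KERNEL` neither blocks nor carries `TOY_ATOMIC`, yet still wakes kswapd, which is the combination the second bullet describes for bio allocations. A flag set assembled from scratch, such as `TOY_IO | TOY_FS`, wakes nothing, which is the second hazard.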
  5. 26 Jun, 2015 1 commit
  6. 19 May, 2015 1 commit
    • sched/preempt: Merge preempt_mask.h into preempt.h · 92cf2118
      Frederic Weisbecker committed
      preempt_mask.h defines all the preempt_count semantics and related
      symbols: preempt, softirq, hardirq, nmi, preempt active, need resched,
      etc...
      
      preempt.h defines the accessors and mutators of preempt_count.
      
      But there is a messy dependency game around those two header files:
      
      	* preempt_mask.h includes preempt.h in order to access preempt_count()
      
      	* preempt_mask.h defines all preempt_count semantic and symbols
      	  except PREEMPT_NEED_RESCHED that is needed by asm/preempt.h
      	  Thus we need to define it from preempt.h, right before including
      	  asm/preempt.h, instead of defining it to preempt_mask.h with the
      	  other preempt_count symbols. Therefore the preempt_count semantics
      	  happen to be spread out.
      
      	* We plan to introduce preempt_active_[enter,exit]() to consolidate
      	  preempt_schedule*() code. But we'll need to access both preempt_count
      	  mutators (preempt_count_add()) and preempt_count symbols
      	  (PREEMPT_ACTIVE, PREEMPT_OFFSET). The usual place to define preempt
      	  operations is in preempt.h but then we'll need symbols in
      	  preempt_mask.h which already includes preempt.h. So we end up with
      	  a circular dependency.
      
      Let's merge preempt_mask.h into preempt.h to solve these dependency issues.
      This way we gather the semantic symbols and operation definitions of
      preempt_count in a single file.
      
      This is a dumb copy-paste merge. Further merge rearrangements are
      performed in a subsequent patch to ease review.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431441711-29753-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 13 Feb, 2015 1 commit
  8. 07 Jun, 2014 1 commit