1. 06 May, 2021 1 commit
  2. 01 May, 2021 14 commits
  3. 14 Mar, 2021 1 commit
  4. 25 Feb, 2021 16 commits
  5. 10 Feb, 2021 1 commit
    • Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" · e82553c1
      Committed by Johannes Weiner
      This reverts commit 536d3bf2, as it can
      cause writers to memory.high to get stuck in the kernel forever,
      performing page reclaim and consuming excessive amounts of CPU cycles.
      
      Before the patch, a write to memory.high would first put the new limit
      in place for the workload, and then reclaim the requested delta.  After
      the patch, the kernel tries to reclaim the delta before putting the new
      limit into place, in order to not overwhelm the workload with a sudden,
      large excess over the limit.  However, if reclaim is actively racing
      with new allocations from the uncurbed workload, it can keep the write()
      working inside the kernel indefinitely.
      
      This is causing problems in Facebook production.  A privileged
      system-level daemon that adjusts memory.high for various workloads
      running on a host can get unexpectedly stuck in the kernel and
      essentially turn into a sort of involuntary kswapd for one of the
      workloads.  We've observed that daemon busy-spin in a write() for
      minutes at a time, neglecting its other duties on the system, and
      expending privileged system resources on behalf of a workload.
      
      To remedy this, we have first considered changing the reclaim logic to
      break out after a couple of loops - whether the workload has converged
      to the new limit or not - and bound the write() call this way.  However,
      the root cause that inspired the sequence change in the first place has
      been fixed through other means, and so a revert back to the proven
      limit-setting sequence, also used by memory.max, is preferable.
      
      The sequence was changed to avoid extreme latencies in the workload when
      the limit was lowered: the sudden, large excess created by the limit
      lowering would erroneously trigger the penalty sleeping code that is
      meant to throttle excessive growth from below.  Allocating threads could
      end up sleeping long after the write() had already reclaimed the delta
      for which they were being punished.
      
      However, erroneous throttling also caused problems in other scenarios at
      around the same time.  This resulted in commit b3ff9291 ("mm, memcg:
      reclaim more aggressively before high allocator throttling"), included
      in the same release as the offending commit.  When allocating threads
      now encounter large excess caused by a racing write() to memory.high,
      instead of entering punitive sleeps, they will simply be tasked with
      helping reclaim down the excess, and will be held no longer than it
      takes to accomplish that.  This is in line with regular limit
      enforcement - i.e.  if the workload allocates up against or over an
      otherwise unchanged limit from below.
      
      With the patch breaking userspace, and the root cause addressed by other
      means already, revert it again.
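
The difference between the two orderings described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: `struct toy_cgroup`, the step sizes, and both function names are hypothetical, and the workload is assumed to grow only while it is below whatever limit is currently in effect.

```c
#include <assert.h>

/* Hypothetical toy model of a cgroup; not the kernel's data structures. */
struct toy_cgroup {
    long usage; /* pages currently charged */
    long high;  /* limit currently in effect, in pages */
};

/* Assumed rates: the workload allocates as fast as the writer reclaims. */
enum { ALLOC_PER_STEP = 8, RECLAIM_PER_STEP = 8, MAX_STEPS = 1000 };

/* The workload grows only while it is below the limit in effect. */
static void workload_step(struct toy_cgroup *cg)
{
    if (cg->usage < cg->high) {
        cg->usage += ALLOC_PER_STEP;
        if (cg->usage > cg->high)
            cg->usage = cg->high;
    }
}

/* The writer frees up to RECLAIM_PER_STEP pages of the remaining excess. */
static void writer_reclaim_step(struct toy_cgroup *cg, long target)
{
    long excess = cg->usage - target;

    if (excess > 0)
        cg->usage -= excess < RECLAIM_PER_STEP ? excess : RECLAIM_PER_STEP;
}

/* Restored ordering: install the new limit first, then reclaim the delta. */
static int set_then_reclaim(struct toy_cgroup *cg, long new_high)
{
    int steps = 0;

    assert(new_high >= 0);
    cg->high = new_high;
    while (cg->usage > new_high && steps < MAX_STEPS) {
        writer_reclaim_step(cg, new_high);
        workload_step(cg); /* curbed: the new limit is already in effect */
        steps++;
    }
    return steps;
}

/* Reverted ordering: reclaim the delta first, install the limit afterwards. */
static int reclaim_then_set(struct toy_cgroup *cg, long new_high)
{
    int steps = 0;

    assert(new_high >= 0);
    while (cg->usage > new_high && steps < MAX_STEPS) {
        writer_reclaim_step(cg, new_high);
        workload_step(cg); /* uncurbed: the old limit still applies */
        steps++;
    }
    cg->high = new_high;
    return steps;
}
```

Starting from a usage of 1000 pages under an old limit of 1000 and lowering the limit to 500, `set_then_reclaim()` converges in a bounded number of steps. `reclaim_then_set()` only stops because the model caps it at `MAX_STEPS`, which mirrors the write() that keeps working inside the kernel while the workload refills what was just reclaimed.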
      
      Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
      Fixes: 536d3bf2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 25 Jan, 2021 1 commit
    • mm: memcg/slab: optimize objcg stock draining · 3de7d4f2
      Committed by Roman Gushchin
      Imran Khan reported a 16% regression in hackbench results caused by the
      commit f2fe7b09 ("mm: memcg/slab: charge individual slab objects
      instead of pages").  The regression is noticeable in the case of
      consecutive allocations of several relatively large slab objects, e.g.
      skb's.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
      drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
      to a number of atomic operations in page_counter_uncharge().
      
      The corresponding call graph is below (provided by Imran Khan):
      
        |__alloc_skb
        |    |
        |    |__kmalloc_reserve.isra.61
        |    |    |
        |    |    |__kmalloc_node_track_caller
        |    |    |    |
        |    |    |    |slab_pre_alloc_hook.constprop.88
        |    |    |     obj_cgroup_charge
        |    |    |    |    |
        |    |    |    |    |__memcg_kmem_charge
        |    |    |    |    |    |
        |    |    |    |    |    |page_counter_try_charge
        |    |    |    |    |
        |    |    |    |    |refill_obj_stock
        |    |    |    |    |    |
        |    |    |    |    |    |drain_obj_stock.isra.68
        |    |    |    |    |    |    |
        |    |    |    |    |    |    |__memcg_kmem_uncharge
        |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |page_counter_uncharge
        |    |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |    |page_counter_cancel
        |    |    |    |
        |    |    |    |
        |    |    |    |__slab_alloc
        |    |    |    |    |
        |    |    |    |    |___slab_alloc
        |    |    |    |    |
        |    |    |    |slab_post_alloc_hook
      
      Instead of directly uncharging the accounted kernel memory, it's
      possible to refill the generic page-sized per-cpu stock instead.  It's a
      much faster operation, especially on a default hierarchy.  As a bonus,
      __memcg_kmem_uncharge_page() will also get faster, so the freeing of
      page-sized kernel allocations (e.g.  large kmallocs) will become faster.
      
      A similar change has been done earlier for the socket memory by the
      commit 475d0487 ("mm: memcontrol: use per-cpu stocks for socket
      memory uncharging").
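
To illustrate why refilling a cached stock is cheaper than uncharging directly, here is a toy sketch. It is not the kernel implementation: `struct toy_counter`, `struct toy_stock`, the `TOY_BATCH` threshold, and all function names are hypothetical, and a single local cache stands in for the per-cpu stock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical cap on cached pages before the stock is flushed. */
enum { TOY_BATCH = 32 };

/* Shared counter; every update is an atomic read-modify-write. */
struct toy_counter {
    atomic_long usage; /* pages charged to the group */
    long atomic_ops;   /* bookkeeping only: atomic updates issued */
};

/* Stand-in for a per-cpu stock: a plain local count, no atomics needed. */
struct toy_stock {
    long nr_pages;
};

static void toy_counter_init(struct toy_counter *c, long pages)
{
    atomic_init(&c->usage, pages);
    c->atomic_ops = 0;
}

/* Direct uncharge: one atomic operation on the shared counter per call. */
static void toy_uncharge(struct toy_counter *c, long nr_pages)
{
    assert(nr_pages > 0);
    atomic_fetch_sub(&c->usage, nr_pages);
    c->atomic_ops++;
}

/*
 * Refill the local stock instead of uncharging. The shared counter is
 * touched only when the cached amount exceeds TOY_BATCH, so many small
 * uncharges collapse into a single atomic update.
 */
static void toy_refill_stock(struct toy_counter *c, struct toy_stock *s,
                             long nr_pages)
{
    assert(nr_pages > 0);
    s->nr_pages += nr_pages;
    if (s->nr_pages > TOY_BATCH) {
        toy_uncharge(c, s->nr_pages);
        s->nr_pages = 0;
    }
}
```

With these assumed numbers, returning 40 single pages through `toy_uncharge()` issues 40 atomic updates, while routing them through `toy_refill_stock()` issues one and leaves the remainder cached for later charges to reuse.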
      
      Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
      Fixes: f2fe7b09 ("mm: memcg/slab: charge individual slab objects instead of pages")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Imran Khan <imran.f.khan@oracle.com>
      Tested-by: Imran Khan <imran.f.khan@oracle.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 24 Jan, 2021 1 commit
  8. 20 Dec, 2020 3 commits
  9. 16 Dec, 2020 2 commits