1. 14 Aug 2020, 1 commit
    • mm: memcontrol: fix warning when allocating the root cgroup · 9f457179
      Committed by Johannes Weiner
      Commit 3e38e0aa ("mm: memcg: charge memcg percpu memory to the
      parent cgroup") adds memory tracking to the memcg kernel structures
      themselves to make cgroups liable for the memory they are consuming
      through the allocation of child groups (which can be significant).
      
      This code is a bit awkward as it's spread out through several functions:
      The outermost function does memalloc_use_memcg(parent) to set up
      current->active_memcg, which designates which cgroup to charge, and the
      inner functions pass GFP_ACCOUNT to request charging for specific
      allocations.  To make sure this dependency is satisfied at all times -
      to make sure we don't randomly charge whoever is calling the functions -
      the inner functions warn on !current->active_memcg.
      
      However, this triggers a false warning when the root memcg itself is
      allocated.  No parent exists in this case, and so current->active_memcg
      is rightfully NULL.  It's a false positive, not indicative of a bug.
      
      Delete the warnings for now, we can revisit this later.
      
      Fixes: 3e38e0aa ("mm: memcg: charge memcg percpu memory to the parent cgroup")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Aug 2020, 4 commits
  3. 08 Aug 2020, 23 commits
  4. 25 Jul 2020, 2 commits
    • mm/memcg: fix refcount error while moving and swapping · 8d22a935
      Committed by Hugh Dickins
      It was hard to keep a test running, moving tasks between memcgs with
      move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
      refcount is discovered to be 0 (supposedly impossible), so it is then
      forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
      succession, the test is at last put out of misery by being OOM killed.
      
      This is because of the way moved_swap accounting was saved up until the
      task move gets completed in __mem_cgroup_clear_mc(), deferred from when
      mem_cgroup_move_swap_account() actually exchanged old and new ids.
      Concurrent activity can free up swap quicker than the task is scanned,
      bringing the id refcount down to 0 (which should only be possible when
      offlining).
      
      Just skip that optimization: do that part of the accounting immediately.
      
      Fixes: 615d66c3 ("mm: memcontrol: fix memcg id ref counter on swap charge move")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() · 82ff165c
      Committed by Bhupesh Sharma
      Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages()
      function in a corner case seen on some arm64 boards when kdump kernel
      runs with "cgroup_disable=memory" passed to the kdump kernel via
      bootargs.
      
      The root cause is that mem_cgroup_swap_init() is currently
      implemented as a subsys_initcall() rather than a core_initcall(),
      so 'cgroup_memory_noswap' remains at its default value (false)
      even when memcg is disabled via the "cgroup_disable=memory" boot
      parameter.
      
      This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages()
      function in corner cases:
      
        Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
        Mem abort info:
          ESR = 0x96000006
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
        Data abort info:
          ISV = 0, ISS = 0x00000006
          CM = 0, WnR = 0
        [0000000000000188] user address but active_mm is swapper
        Internal error: Oops: 96000006 [#1] SMP
        Modules linked in:
        <..snip..>
        Call trace:
          mem_cgroup_get_nr_swap_pages+0x9c/0xf4
          shrink_lruvec+0x404/0x4f8
          shrink_node+0x1a8/0x688
          do_try_to_free_pages+0xe8/0x448
          try_to_free_pages+0x110/0x230
          __alloc_pages_slowpath.constprop.106+0x2b8/0xb48
          __alloc_pages_nodemask+0x2ac/0x2f8
          alloc_page_interleave+0x20/0x90
          alloc_pages_current+0xdc/0xf8
          atomic_pool_expand+0x60/0x210
          __dma_atomic_pool_init+0x50/0xa4
          dma_atomic_pool_init+0xac/0x158
          do_one_initcall+0x50/0x218
          kernel_init_freeable+0x22c/0x2d0
          kernel_init+0x18/0x110
          ret_from_fork+0x10/0x18
        Code: aa1403e3 91106000 97f82a27 14000011 (f940c663)
        ---[ end trace 9795948475817de4 ]---
        Kernel panic - not syncing: Fatal exception
        Rebooting in 10 seconds..
      
      Fixes: eccb52e7 ("mm: memcontrol: prepare swap controller setup for integration")
      Reported-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 17 Jul 2020, 1 commit
  6. 26 Jun 2020, 3 commits
  7. 10 Jun 2020, 2 commits
  8. 05 Jun 2020, 1 commit
  9. 04 Jun 2020, 3 commits
    • mm: base LRU balancing on an explicit cost model · 1431d4d1
      Committed by Johannes Weiner
      Currently, scan pressure between the anon and file LRU lists is balanced
      based on a mixture of reclaim efficiency and a somewhat vague notion of
      "value" of having certain pages in memory over others.  That concept of
      value is problematic, because it has caused us to count any event that
      remotely makes one LRU list more or less preferable for reclaim, even
      when these events are not directly comparable and impose very different
      costs on the system.  One example is referenced file pages that we still
      deactivate and referenced anonymous pages that we actually rotate back to
      the head of the list.
      
      There is also conceptual overlap with the LRU algorithm itself.  By
      rotating recently used pages instead of reclaiming them, the algorithm
      already biases the applied scan pressure based on page value.  Thus, when
      rebalancing scan pressure due to rotations, we should think of reclaim
      cost, and leave assessing the page value to the LRU algorithm.
      
      Lastly, considering both value-increasing as well as value-decreasing
      events can sometimes cause the same type of event to be counted twice,
      i.e.  how rotating a page increases the LRU value, while reclaiming it
      successfully decreases the value.  In itself this will balance out fine,
      but it quietly skews the impact of events that are only recorded once.
      
      The abstract metric of "value", the murky relationship with the LRU
      algorithm, and accounting both negative and positive events make the
      current pressure balancing model hard to reason about and modify.
      
      This patch switches to a balancing model of accounting the concrete,
      actually observed cost of reclaiming one LRU over another.  For now, that
      cost includes pages that are scanned but rotated back to the list head.
      Subsequent patches will add consideration for IO caused by refaulting of
      recently evicted pages.
      
      Replace struct zone_reclaim_stat with two cost counters in the lruvec, and
      make everything that affects cost go through a new lru_note_cost()
      function.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20200520232525.798933-9-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: update page->mem_cgroup stability rules · a0b5b414
      Committed by Johannes Weiner
      The previous patches have simplified the access rules around
      page->mem_cgroup somewhat:
      
      1. We never change page->mem_cgroup while the page is isolated by
         somebody else.  This was by far the biggest exception to our rules and
         it didn't stop at lock_page() or lock_page_memcg().
      
      2. We charge pages before they get put into page tables now, so the
         somewhat fishy rule about "can be in page table as long as it's still
         locked" is now gone and boiled down to having an exclusive reference to
         the page.
      
      Document the new rules.  Any of the following will stabilize the
      page->mem_cgroup association:
      
      - the page lock
      - LRU isolation
      - lock_page_memcg()
      - exclusive access to the page
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-20-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: delete unused lrucare handling · d9eb1ea2
      Committed by Johannes Weiner
      Swapin faults were the last event to charge pages after they had already
      been put on the LRU list.  Now that we charge directly on swapin, the
      lrucare portion of the charge code is unused.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>