1. 13 Aug 2020 (3 commits)
  2. 08 Aug 2020 (23 commits)
  3. 25 Jul 2020 (2 commits)
    • mm/memcg: fix refcount error while moving and swapping · 8d22a935
      Authored by Hugh Dickins
      It was hard to keep a test running, moving tasks between memcgs with
      move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
      refcount is discovered to be 0 (supposedly impossible), so it is then
      forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
      succession, the test is at last put out of misery by being OOM killed.
      
      This is because of the way moved_swap accounting was saved up until the
      task move gets completed in __mem_cgroup_clear_mc(), deferred from when
      mem_cgroup_move_swap_account() actually exchanged old and new ids.
      Concurrent activity can free up swap faster than the task is scanned,
      bringing the id refcount down to 0 (which should only be possible when
      offlining).
      
      Just skip that optimization: do that part of the accounting immediately.
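
      A minimal sketch of the immediate accounting, assuming the MC_TARGET_SWAP
      branch of the charge-moving loop is where the ids get exchanged (the names
      below exist in mm/memcontrol.c, but the exact upstream diff may differ):

        /* Sketch only: fix up the id refcounts the moment the swap entry
         * moves, instead of batching the mc.from puts for later. */
        if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
                mc.precharge--;
                /* take the new ref and drop the old one right away */
                mem_cgroup_id_get_many(mc.to, 1);
                mem_cgroup_id_put(mc.from);
                /* only the page_counter/css fixups remain deferred */
                mc.moved_swap++;
        }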
      
      Fixes: 615d66c3 ("mm: memcontrol: fix memcg id ref counter on swap charge move")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d22a935
    • mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() · 82ff165c
      Authored by Bhupesh Sharma
      Prabhakar reported an OOPS inside the mem_cgroup_get_nr_swap_pages()
      function in a corner case seen on some arm64 boards when the kdump kernel
      runs with "cgroup_disable=memory" passed to it via bootargs.
      
      The root cause is that mem_cgroup_swap_init() is currently registered as
      a subsys_initcall() instead of a core_initcall(). As a result,
      'cgroup_memory_noswap' still remains at its default value (false) even
      when memcg is disabled via the "cgroup_disable=memory" boot parameter.

      This can result in a premature OOPS inside mem_cgroup_get_nr_swap_pages()
      in corner cases:
      
        Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
        Mem abort info:
          ESR = 0x96000006
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
        Data abort info:
          ISV = 0, ISS = 0x00000006
          CM = 0, WnR = 0
        [0000000000000188] user address but active_mm is swapper
        Internal error: Oops: 96000006 [#1] SMP
        Modules linked in:
        <..snip..>
        Call trace:
          mem_cgroup_get_nr_swap_pages+0x9c/0xf4
          shrink_lruvec+0x404/0x4f8
          shrink_node+0x1a8/0x688
          do_try_to_free_pages+0xe8/0x448
          try_to_free_pages+0x110/0x230
          __alloc_pages_slowpath.constprop.106+0x2b8/0xb48
          __alloc_pages_nodemask+0x2ac/0x2f8
          alloc_page_interleave+0x20/0x90
          alloc_pages_current+0xdc/0xf8
          atomic_pool_expand+0x60/0x210
          __dma_atomic_pool_init+0x50/0xa4
          dma_atomic_pool_init+0xac/0x158
          do_one_initcall+0x50/0x218
          kernel_init_freeable+0x22c/0x2d0
          kernel_init+0x18/0x110
          ret_from_fork+0x10/0x18
        Code: aa1403e3 91106000 97f82a27 14000011 (f940c663)
        ---[ end trace 9795948475817de4 ]---
        Kernel panic - not syncing: Fatal exception
        Rebooting in 10 seconds..
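
      A minimal sketch of the fix direction described above: run the swap setup
      early enough and make it honour a disabled controller (assumed shape; the
      actual patch may differ in detail):

        static int __init mem_cgroup_swap_init(void)
        {
                /* assumed guard: no memory control means no swap control */
                if (mem_cgroup_disabled())
                        cgroup_memory_noswap = true;

                if (cgroup_memory_noswap)
                        return 0;

                /* ... register the swap control files as before ... */
                return 0;
        }
        /* was subsys_initcall(); running at core_initcall time settles
         * cgroup_memory_noswap before later initcalls can enter reclaim */
        core_initcall(mem_cgroup_swap_init);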
      
      Fixes: eccb52e7 ("mm: memcontrol: prepare swap controller setup for integration")
      Reported-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82ff165c
  4. 17 Jul 2020 (1 commit)
  5. 26 Jun 2020 (3 commits)
  6. 10 Jun 2020 (2 commits)
  7. 05 Jun 2020 (1 commit)
  8. 04 Jun 2020 (5 commits)
    • mm: base LRU balancing on an explicit cost model · 1431d4d1
      Authored by Johannes Weiner
      Currently, scan pressure between the anon and file LRU lists is balanced
      based on a mixture of reclaim efficiency and a somewhat vague notion of
      "value" of having certain pages in memory over others.  That concept of
      value is problematic, because it has caused us to count any event that
      remotely makes one LRU list more or less preferable for reclaim, even
      when these events are not directly comparable and impose very different
      costs on the system.  One example is referenced file pages that we still
      deactivate and referenced anonymous pages that we actually rotate back to
      the head of the list.
      
      There is also conceptual overlap with the LRU algorithm itself.  By
      rotating recently used pages instead of reclaiming them, the algorithm
      already biases the applied scan pressure based on page value.  Thus, when
      rebalancing scan pressure due to rotations, we should think of reclaim
      cost, and leave assessing the page value to the LRU algorithm.
      
      Lastly, considering both value-increasing as well as value-decreasing
      events can sometimes cause the same type of event to be counted twice,
      i.e.  how rotating a page increases the LRU value, while reclaiming it
      successfully decreases the value.  In itself this will balance out fine,
      but it quietly skews the impact of events that are only recorded once.
      
      The abstract metric of "value", the murky relationship with the LRU
      algorithm, and accounting both negative and positive events make the
      current pressure balancing model hard to reason about and modify.
      
      This patch switches to a balancing model of accounting the concrete,
      actually observed cost of reclaiming one LRU over another.  For now, that
      cost includes pages that are scanned but rotated back to the list head.
      Subsequent patches will add consideration for IO caused by refaulting of
      recently evicted pages.
      
      Replace struct zone_reclaim_stat with two cost counters in the lruvec, and
      make everything that affects cost go through a new lru_note_cost()
      function.
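
      A rough sketch of what such a cost hook could look like (the signature and
      the anon_cost/file_cost field names are assumptions for illustration; the
      real patch applies this through the lruvec hierarchy):

        /*
         * Sketch: struct zone_reclaim_stat is replaced by two counters in
         * the lruvec, and every event that makes one LRU type costlier to
         * reclaim funnels through this one helper.
         */
        static inline void lru_note_cost(struct lruvec *lruvec, bool file,
                                         unsigned int nr_pages)
        {
                if (file)
                        lruvec->file_cost += nr_pages;
                else
                        lruvec->anon_cost += nr_pages;
        }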
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20200520232525.798933-9-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1431d4d1
    • mm: memcontrol: update page->mem_cgroup stability rules · a0b5b414
      Authored by Johannes Weiner
      The previous patches have simplified the access rules around
      page->mem_cgroup somewhat:
      
      1. We never change page->mem_cgroup while the page is isolated by
         somebody else.  This was by far the biggest exception to our rules and
         it didn't stop at lock_page() or lock_page_memcg().
      
      2. We charge pages before they get put into page tables now, so the
         somewhat fishy rule about "can be in page table as long as it's still
         locked" is now gone and boiled down to having an exclusive reference to
         the page.
      
      Document the new rules.  Any of the following will stabilize the
      page->mem_cgroup association (a short usage sketch follows the list):
      
      - the page lock
      - LRU isolation
      - lock_page_memcg()
      - exclusive access to the page
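
      A hedged illustration of relying on one of these rules; the helper below
      is hypothetical, only the lock_page_memcg()/unlock_page_memcg() pairing
      is taken from the existing API:

        static void touch_page_memcg(struct page *page)
        {
                struct mem_cgroup *memcg;

                lock_page_memcg(page);          /* rule 3: binding is stable */
                memcg = page->mem_cgroup;       /* may be NULL if uncharged */
                if (memcg) {
                        /* ... update per-memcg counters for this page ... */
                }
                unlock_page_memcg(page);
        }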
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-20-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0b5b414
    • mm: memcontrol: delete unused lrucare handling · d9eb1ea2
      Authored by Johannes Weiner
      Swapin faults were the last event to charge pages after they had already
      been put on the LRU list.  Now that we charge directly on swapin, the
      lrucare portion of the charge code is unused.
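
      For context, a hedged sketch of the ordering that makes lrucare
      unnecessary: the page is charged while still exclusively owned and only
      then added to the LRU (the helper is made up; mem_cgroup_charge() and
      lru_cache_add() are the real calls of this series):

        static struct page *alloc_and_charge(struct mm_struct *mm, gfp_t gfp)
        {
                struct page *page = alloc_page(gfp);

                if (!page)
                        return NULL;
                /* charged while we hold the only reference */
                if (mem_cgroup_charge(page, mm, gfp)) {
                        put_page(page);
                        return NULL;
                }
                lru_cache_add(page);    /* reaches the LRU already charged */
                return page;
        }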
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9eb1ea2
    • mm: memcontrol: make swap tracking an integral part of memory control · 2d1c4980
      Authored by Johannes Weiner
      Without swap page tracking, users that are otherwise memory controlled can
      easily escape their containment and allocate significant amounts of memory
      that they're not being charged for.  That's because swap does readahead,
      but without the cgroup records of who owned the page at swapout, readahead
      pages don't get charged until somebody actually faults them into their
      page table and we can identify an owner task.  This can be maliciously
      exploited with MADV_WILLNEED, which triggers arbitrary readahead
      allocations without charging the pages.
      
      Make swap page tracking an integral part of memcg and remove the
      Kconfig options.  In the first place, it was only made configurable to
      allow users to save some memory.  But the overhead of tracking cgroup
      ownership per swap page is minimal - 2 bytes per page, or 512k per 1G of
      swap, or 0.04%.  Saving that at the expense of broken containment
      semantics is not something we should present as a coequal option.
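
      The quoted overhead follows from storing one two-byte memcg id per swap
      slot; a sketch of the record and the arithmetic, assuming 4 KiB pages
      (field layout as in mm/swap_cgroup.c, shown here for illustration):

        /* one record per swap slot */
        struct swap_cgroup {
                unsigned short id;      /* mem_cgroup id of the owner at swapout */
        };

        /* 1 GiB of swap / 4 KiB per slot = 262144 slots
         * 262144 slots * 2 bytes       = 512 KiB of tracking data per GiB */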
      
      The swapaccount=0 boot option will continue to exist, and it will
      eliminate the page_counter overhead and hide the swap control files, but
      it won't disable swap slot ownership tracking.
      
      This patch makes sure we always have the cgroup records at swapin time;
      the next patch will fix the actual bug by charging readahead swap pages at
      swapin time rather than at fault time.
      
      v2: fix double swap charge bug in cgroup1/cgroup2 code gating
      
      [hannes@cmpxchg.org: fix crash with cgroup_disable=memory]
        Link: http://lkml.kernel.org/r/20200521215855.GB815153@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
      Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
      Debugged-by: Hugh Dickins <hughd@google.com>
      Debugged-by: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d1c4980
    • mm: memcontrol: prepare swap controller setup for integration · eccb52e7
      Authored by Johannes Weiner
      A few cleanups to streamline the swap controller setup:
      
      - Replace the do_swap_account flag with cgroup_memory_noswap. This
        brings it in line with other functionality that is usually available
        unless explicitly opted out of - nosocket, nokmem (see the sketch
        after this list).
      
      - Remove the really_do_swap_account flag that stores the boot option
        and is later used to switch the do_swap_account. It's not clear why
        this indirection is/was necessary. Use do_swap_account directly.
      
      - Minor coding style polishing
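
      A minimal sketch of the resulting setup, assuming the usual __setup()
      boot-parameter plumbing (illustrative only; details may differ from the
      actual patch):

        /* one flag, in line with cgroup_memory_nosocket / nokmem */
        static bool cgroup_memory_noswap __read_mostly;

        static int __init setup_swap_account(char *s)
        {
                /* "swapaccount=0" keeps working: it just sets the noswap flag */
                if (!strcmp(s, "1"))
                        cgroup_memory_noswap = false;
                else if (!strcmp(s, "0"))
                        cgroup_memory_noswap = true;
                return 1;
        }
        __setup("swapaccount=", setup_swap_account);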
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-15-hannes@cmpxchg.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eccb52e7