1. 06 Feb 2021, 4 commits
  2. 29 Jan 2021, 1 commit
    • Revert "mm/slub: fix a memory leak in sysfs_slab_add()" · 757fed1d
      Committed by Wang Hai
      This reverts commit dde3c6b7.
      
      syzbot reported a double-free bug.  The following sequence can trigger it.
      
       - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
         it does:
      
      	out_free_cache:
      		kmem_cache_free(kmem_cache, s);
      
       - but __kmem_cache_create() - at least for SLUB - will have done
      
      	sysfs_slab_add(s)
      		-> sysfs_create_group() .. fails ..
      		-> kobject_del(&s->kobj); .. which frees s ...
      
      We can't remove the kmem_cache_free() in create_cache(), because other
      error cases of __kmem_cache_create() do not free s.
      
      So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in
      sysfs_slab_add()") to fix this.
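
      As a sketch (illustrative only, not the actual kernel source), the
      two paths that end up freeing the same kmem_cache look like this:

	/* mm/slab_common.c: error path of create_cache() (sketch) */
	out_free_cache:
		kmem_cache_free(kmem_cache, s);	/* second free of s */

	/* mm/slub.c with the now-reverted patch: sysfs_slab_add() (sketch) */
	err = sysfs_create_group(&s->kobj, &slab_attr_group);
	if (err) {
		kobject_del(&s->kobj);	/* tears down the kobject, which
					 * already ends up freeing s */
		return err;
	}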
      
      Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
      Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()")
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      757fed1d
  3. 27 Jan 2021, 1 commit
  4. 25 Jan 2021, 11 commits
  5. 21 Jan 2021, 1 commit
  6. 18 Jan 2021, 1 commit
    • mm: don't put pinned pages into the swap cache · feb889fb
      Committed by Linus Torvalds
      So technically there is nothing wrong with adding a pinned page to the
      swap cache, but the pinning obviously means that the page can't actually
      be freed right now anyway, so it's a bit pointless.
      
      However, the real problem is not with it being a bit pointless: the real
      issue is that after we've added it to the swap cache, we'll try to unmap
      the page.  That will succeed, because the code in mm/rmap.c doesn't know
      or care about pinned pages.
      
      Even the unmapping isn't fatal per se, since the page will stay around
      in memory due to the pinning, and we do hold the connection to it using
      the swap cache.  But when we then touch it next and take a page fault,
      the logic in do_swap_page() will map it back into the process as a
      possibly read-only page, and we'll then break the page association on
      the next COW fault.
      
      Honestly, this issue could have been fixed in any of those other places:

       (a) we could refuse to unmap a pinned page (which makes conceptual
           sense), or
       (b) we could make sure to re-map a pinned page writably in
           do_swap_page(), or
       (c) we could just make do_wp_page() not COW the pinned page (which
           was what we historically did before that "mm: do_wp_page()
           simplification" commit).
      
      But while all of them are equally valid models for breaking this chain,
      not putting pinned pages into the swap cache in the first place is the
      simplest one by far.
      
      It's also the safest one: the reason why do_wp_page() was changed in the
      first place was that getting the "can I re-use this page" wrong is so
      fraught with errors.  If you do it wrong, you end up with an incorrectly
      shared page.
      
      As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
      do_swap_page() would be a serious bug since it is only a (very good)
      heuristic.  Re-using the page requires a hard black-and-white rule with
      no room for ambiguity.
      
      In contrast, saying "this page is very likely dma pinned, so let's not
      add it to the swap cache and try to unmap it" is an obviously safe thing
      to do, and if the heuristic might very rarely be a false positive, no
      harm is done.
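
      As a sketch of that approach (assuming a check early in the reclaim
      path, mm/vmscan.c:shrink_page_list(); the exact placement and
      surrounding conditions are illustrative):

	if (PageAnon(page) && PageSwapBacked(page) &&
	    !PageSwapCache(page)) {
		/*
		 * page_maybe_dma_pinned() is only a heuristic; a false
		 * positive here merely skips reclaim of one page, which
		 * is always safe.
		 */
		if (unlikely(page_maybe_dma_pinned(page)))
			goto keep_locked;
		if (!add_to_swap(page))
			goto activate_locked;
	}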
      
      Fixes: 09854ba9 ("mm: do_wp_page() simplification")
      Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      feb889fb
  7. 13 Jan 2021, 8 commits
  8. 06 Jan 2021, 1 commit
    • mm: make wait_on_page_writeback() wait for multiple pending writebacks · c2407cf7
      Committed by Linus Torvalds
      Ever since commit 2a9127fc ("mm: rewrite wait_on_page_bit_common()
      logic") we've had some very occasional reports of BUG_ON(PageWriteback)
      in write_cache_pages(), which we thought we already fixed in commit
      073861ed ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
      
      But syzbot just reported another one, even with that commit in place.
      
      And it turns out that there's a simpler way to trigger the BUG_ON() than
      the one Hugh found with page re-use.  It all boils down to the fact that
      page writeback is ostensibly serialized by the page lock, but that isn't
      actually true.
      
      Yes, the people _setting_ writeback all do so under the page lock, but
      the actual clearing of the bit - and waking up any waiters - happens
      without any page lock.
      
      This gives us this fairly simple race condition:
      
        CPU1 = end previous writeback
        CPU2 = start new writeback under page lock
        CPU3 = write_cache_pages()
      
        CPU1          CPU2            CPU3
        ----          ----            ----
      
        end_page_writeback()
          test_clear_page_writeback(page)
          ... delayed...
      
                      lock_page();
                      set_page_writeback()
                      unlock_page()
      
                                      lock_page()
                                      wait_on_page_writeback();
      
          wake_up_page(page, PG_writeback);
          .. wakes up CPU3 ..
      
                                      BUG_ON(PageWriteback(page));
      
      where the BUG_ON() happens because we were woken up on the PG_writeback
      bit because of the _previous_ writeback, but a new one had already been
      started because the clearing of the bit wasn't actually atomic wrt the
      actual wakeup or serialized by the page lock.
      
      The reason this didn't use to happen was that the old logic in waiting
      on a page bit would just loop if it ever saw the bit set again.
      
      The nice proper fix would probably be to get rid of the whole "wait for
      writeback to clear, and then set it" logic in the writeback path, and
      replace it with an atomic "wait-to-set" (ie the same as we have for page
      locking: we set the page lock bit with a single "lock_page()", not with
      "wait for lock bit to clear and then set it").
      
      However, our current model for writeback is that the waiting for the
      writeback bit is done by the generic VFS code (ie write_cache_pages()),
      but the actual setting of the writeback bit is done much later by the
      filesystem ".writepages()" function.
      
      IOW, to make the writeback bit have that same kind of "wait-to-set"
      behavior as we have for page locking, we'd have to change roughly 50
      different writeback functions.  Painful.
      
      Instead, just make "wait_on_page_writeback()" loop on the very unlikely
      situation that the PG_writeback bit is still set, basically re-instating
      the old behavior.  This is very non-optimal in case of contention, but
      since we only ever set the bit under the page lock, that situation is
      controlled.
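
      As a sketch, the loop amounts to re-checking the bit after every
      wakeup in wait_on_page_writeback() (illustrative, not the exact
      diff):

	/* mm/page-writeback.c (sketch) */
	void wait_on_page_writeback(struct page *page)
	{
		while (PageWriteback(page)) {
			trace_wait_on_page_writeback(page, page_mapping(page));
			wait_on_page_bit(page, PG_writeback);
		}
	}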
      
      Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
      Fixes: 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2407cf7
  9. 30 Dec 2020, 6 commits
  10. 23 Dec 2020, 6 commits