1. 11 11月, 2021 1 次提交
  2. 07 11月, 2021 2 次提交
  3. 18 10月, 2021 14 次提交
  4. 27 9月, 2021 3 次提交
  5. 04 9月, 2021 5 次提交
  6. 10 8月, 2021 1 次提交
  7. 30 6月, 2021 8 次提交
  8. 24 5月, 2021 1 次提交
  9. 07 5月, 2021 1 次提交
  10. 01 5月, 2021 1 次提交
  11. 24 3月, 2021 1 次提交
  12. 06 1月, 2021 1 次提交
    • L
      mm: make wait_on_page_writeback() wait for multiple pending writebacks · c2407cf7
      Linus Torvalds 提交于
      Ever since commit 2a9127fc ("mm: rewrite wait_on_page_bit_common()
      logic") we've had some very occasional reports of BUG_ON(PageWriteback)
      in write_cache_pages(), which we thought we already fixed in commit
      073861ed ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
      
      But syzbot just reported another one, even with that commit in place.
      
      And it turns out that there's a simpler way to trigger the BUG_ON() than
      the one Hugh found with page re-use.  It all boils down to the fact that
      the page writeback is ostensibly serialized by the page lock, but that
      isn't actually really true.
      
      Yes, the people _setting_ writeback all do so under the page lock, but
      the actual clearing of the bit - and waking up any waiters - happens
      without any page lock.
      
      This gives us this fairly simple race condition:
      
        CPU1 = end previous writeback
        CPU2 = start new writeback under page lock
        CPU3 = write_cache_pages()
      
        CPU1          CPU2            CPU3
        ----          ----            ----
      
        end_page_writeback()
          test_clear_page_writeback(page)
          ... delayed...
      
                      lock_page();
                      set_page_writeback()
                      unlock_page()
      
                                      lock_page()
                                      wait_on_page_writeback();
      
          wake_up_page(page, PG_writeback);
          .. wakes up CPU3 ..
      
                                      BUG_ON(PageWriteback(page));
      
      where the BUG_ON() happens because we woke up the PG_writeback bit
      becasue of the _previous_ writeback, but a new one had already been
      started because the clearing of the bit wasn't actually atomic wrt the
      actual wakeup or serialized by the page lock.
      
      The reason this didn't use to happen was that the old logic in waiting
      on a page bit would just loop if it ever saw the bit set again.
      
      The nice proper fix would probably be to get rid of the whole "wait for
      writeback to clear, and then set it" logic in the writeback path, and
      replace it with an atomic "wait-to-set" (ie the same as we have for page
      locking: we set the page lock bit with a single "lock_page()", not with
      "wait for lock bit to clear and then set it").
      
      However, out current model for writeback is that the waiting for the
      writeback bit is done by the generic VFS code (ie write_cache_pages()),
      but the actual setting of the writeback bit is done much later by the
      filesystem ".writepages()" function.
      
      IOW, to make the writeback bit have that same kind of "wait-to-set"
      behavior as we have for page locking, we'd have to change our roughly
      ~50 different writeback functions.  Painful.
      
      Instead, just make "wait_on_page_writeback()" loop on the very unlikely
      situation that the PG_writeback bit is still set, basically re-instating
      the old behavior.  This is very non-optimal in case of contention, but
      since we only ever set the bit under the page lock, that situation is
      controlled.
      
      Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
      Fixes: 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2407cf7
  13. 25 11月, 2020 1 次提交
    • H
      mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) · 073861ed
      Hugh Dickins 提交于
      Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
      on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
      end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
      no longer an ext4 page at all.
      
      The problem is that PageWriteback is not accompanied by a page reference
      (as the NOTE at the end of test_clear_page_writeback() acknowledges): as
      soon as TestClearPageWriteback has been done, that page could be removed
      from page cache, freed, and reused for something else by the time that
      wake_up_page() is reached.
      
      https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
      Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
      check; but I'm paranoid about even looking at an unreferenced struct page,
      lest its memory might itself have already been reused or hotremoved (and
      wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
      
      Then on crashing a second time, realized there's a stronger reason against
      that approach.  If my testing just occasionally crashes on that check,
      when the page is reused for part of a compound page, wouldn't it be much
      more common for the page to get reused as an order-0 page before reaching
      wake_up_page()?  And on rare occasions, might that reused page already be
      marked PageWriteback by its new user, and already be waited upon?  What
      would that look like?
      
      It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
      in write_cache_pages() (though I have never seen that crash myself).
      
      Matthew Wilcox explaining this to himself:
       "page is allocated, added to page cache, dirtied, writeback starts,
      
        --- thread A ---
        filesystem calls end_page_writeback()
              test_clear_page_writeback()
        --- context switch to thread B ---
        truncate_inode_pages_range() finds the page, it doesn't have writeback set,
        we delete it from the page cache.  Page gets reallocated, dirtied, writeback
        starts again.  Then we call write_cache_pages(), see
        PageWriteback() set, call wait_on_page_writeback()
        --- context switch back to thread A ---
        wake_up_page(page, PG_writeback);
        ... thread B is woken, but because the wakeup was for the old use of
        the page, PageWriteback is still set.
      
        Devious"
      
      And prior to 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      this would have been much less likely: before that, wake_page_function()'s
      non-exclusive case would stop walking and not wake if it found Writeback
      already set again; whereas now the non-exclusive case proceeds to wake.
      
      I have not thought of a fix that does not add a little overhead: the
      simplest fix is for end_page_writeback() to get_page() before calling
      test_clear_page_writeback(), then put_page() after wake_up_page().
      
      Was there a chance of missed wakeups before, since a page freed before
      reaching wake_up_page() would have PageWaiters cleared?  I think not,
      because each waiter does hold a reference on the page.  This bug comes
      when the old use of the page, the one we do TestClearPageWriteback on,
      had *no* waiters, so no additional page reference beyond the page cache
      (and whoever racily freed it).  The reuse of the page has a waiter
      holding a reference, and its own PageWriteback set; but the belated
      wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
      
      Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
      Reported-by: NQian Cai <cai@lca.pw>
      Fixes: 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      073861ed