1. 29 6月, 2022 1 次提交
  2. 20 5月, 2022 1 次提交
    • M
      mm: don't be stuck to rmap lock on reclaim path · 6d4675e6
      Minchan Kim 提交于
      The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended
      under memory pressure if processes keep working on their vmas(e.g., fork,
      mmap, munmap).  It makes reclaim path stuck.  In our real workload traces,
      we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it
      makes other processes entering direct reclaim, which were also stuck on
      the lock.
      
      This patch makes lru aging path try_lock mode like shink_page_list so the
      reclaim context will keep working with next lru pages without being stuck.
      if it found the rmap lock contended, it rotates the page back to head of
      lru in both active/inactive lrus to make them consistent behavior, which
      is basic starting point rather than adding more heristic.
      
      Since this patch introduces a new "contended" field as out-param along
      with try_lock in-param in rmap_walk_control, it's not immutable any longer
      if the try_lock is set so remove const keywords on rmap related functions.
      Since rmap walking is already expensive operation, I doubt the const
      would help sizable benefit( And we didn't have it until 5.17).
      
      In a heavy app workload in Android, trace shows following statistics.  It
      almost removes rmap lock contention from reclaim path.
      
      Martin Liu reported:
      
      Before:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
               1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
                601            0             601   145.544681        28817    198  rmap_walk_file
      
      After:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
                NaN          NaN              NaN          NaN          NaN    0.0             NaN
                  0            0                0     0.127645            1     12  rmap_walk_file
      
      [minchan@kernel.org: add comment, per Matthew]
        Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
      Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Martin Liu <liumartin@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      6d4675e6
  3. 13 5月, 2022 4 次提交
  4. 10 5月, 2022 1 次提交
  5. 29 4月, 2022 8 次提交
  6. 22 4月, 2022 2 次提交
    • X
      mm/memory-failure.c: skip huge_zero_page in memory_failure() · d173d541
      Xu Yu 提交于
      Kernel panic when injecting memory_failure for the global
      huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.
      
        Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
        page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
        head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
        flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
        raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
        raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
        page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:2499!
        invalid opcode: 0000 [#1] PREEMPT SMP PTI
        CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
        Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
        RIP: 0010:split_huge_page_to_list+0x66a/0x880
        Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
        RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
        RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
        RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
        R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
        R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
        FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         try_to_split_thp_page+0x3a/0x130
         memory_failure+0x128/0x800
         madvise_inject_error.cold+0x8b/0xa1
         __x64_sys_madvise+0x54/0x60
         do_syscall_64+0x35/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fc3754f8bf9
        Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
        RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
        RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
        RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
        R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
        R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
      
      This makes huge_zero_page bail out explicitly before split in
      memory_failure(), thus the panic above won't happen again.
      
      Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
      Fixes: 6a46079c ("HWPOISON: The high level memory error handler in the VM v7")
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Suggested-by: NNaoya Horiguchi <naoya.horiguchi@nec.com>
      Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d173d541
    • N
      mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() · 405ce051
      Naoya Horiguchi 提交于
      There is a race condition between memory_failure_hugetlb() and hugetlb
      free/demotion, which causes setting PageHWPoison flag on the wrong page.
      The one simple result is that wrong processes can be killed, but another
      (more serious) one is that the actual error is left unhandled, so no one
      prevents later access to it, and that might lead to more serious results
      like consuming corrupted data.
      
      Think about the below race window:
      
        CPU 1                                   CPU 2
        memory_failure_hugetlb
        struct page *head = compound_head(p);
                                                hugetlb page might be freed to
                                                buddy, or even changed to another
                                                compound page.
      
        get_hwpoison_page -- page is not what we want now...
      
      The current code first does prechecks roughly and then reconfirms after
      taking refcount, but it's found that it makes code overly complicated,
      so move the prechecks in a single hugetlb_lock range.
      
      A newly introduced function, try_memory_failure_hugetlb(), always takes
      hugetlb_lock (even for non-hugetlb pages).  That can be improved, but
      memory_failure() is rare in principle, so should not be a big problem.
      
      Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev
      Fixes: 761ad8d7 ("mm: hwpoison: introduce memory_failure_hugetlb()")
      Signed-off-by: NNaoya Horiguchi <naoya.horiguchi@nec.com>
      Reported-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      405ce051
  7. 23 3月, 2022 15 次提交
  8. 22 3月, 2022 3 次提交
  9. 30 1月, 2022 1 次提交
  10. 15 1月, 2022 4 次提交