1. 12 Sep, 2022 2 commits
  2. 09 Aug, 2022 6 commits
  3. 30 Jul, 2022 1 commit
  4. 18 Jul, 2022 4 commits
  5. 04 Jul, 2022 2 commits
  6. 29 Jun, 2022 1 commit
  7. 17 Jun, 2022 1 commit
    • mm/memory-failure: disable unpoison once hw error happens · 67f22ba7
      Committed by zhenwei pi
      Currently unpoison_memory(unsigned long pfn) is designed for soft
      poison (hwpoison-inject) only.  Since 17fae129, the KPTE gets cleared
      on an x86 platform once hardware memory is corrupted.

      Unpoisoning a hardware-corrupted page only puts the page back into the
      buddy allocator, so the kernel may later access the page through a
      *NOT PRESENT* KPTE.  This leads to a BUG when the page is accessed.

      Suggested by David and Naoya: disable the unpoison mechanism once a real
      HW error happens, to avoid a BUG like this:
      
       Unpoison: Software-unpoisoned page 0x61234
       BUG: unable to handle page fault for address: ffff888061234000
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0002) - not-present page
       PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
       Oops: 0002 [#1] PREEMPT SMP NOPTI
       CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
       RIP: 0010:clear_page_erms+0x7/0x10
       Code: ...
       RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
       RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
       RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
       R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
       R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
       FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        <TASK>
        prep_new_page+0x151/0x170
        get_page_from_freelist+0xca0/0xe20
        ? sysvec_apic_timer_interrupt+0xab/0xc0
        ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
        __alloc_pages+0x17e/0x340
        __folio_alloc+0x17/0x40
        vma_alloc_folio+0x84/0x280
        __handle_mm_fault+0x8d4/0xeb0
        handle_mm_fault+0xd5/0x2a0
        do_user_addr_fault+0x1d0/0x680
        ? kvm_read_and_reset_apf_flags+0x3b/0x50
        exc_page_fault+0x78/0x170
        asm_exc_page_fault+0x27/0x30
      
      Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
      Fixes: 847ce401 ("HWPOISON: Add unpoisoning support")
      Fixes: 17fae129 ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
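
The behavior described in the commit above can be sketched in user space as a sticky flag that a real hardware error sets and that the unpoison path checks. This is only a minimal illustration: the function names, the flag name, and the returned error code are assumptions made for this sketch and are not taken from the commit text shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit: the failure was injected by software, not hardware. */
#define SKETCH_MF_SW_SIMULATED 0x1

/* Hypothetical sticky flag: set on the first real hardware error. */
static bool sketch_hw_memory_failure;

/* Toy stand-in for memory_failure(): only records where the error came from. */
int sketch_memory_failure(unsigned long pfn, int flags)
{
    (void)pfn;
    if (!(flags & SKETCH_MF_SW_SIMULATED))
        sketch_hw_memory_failure = true;   /* never cleared afterwards */
    return 0;
}

/* Toy stand-in for unpoison_memory(): refuses once a HW error has happened. */
int sketch_unpoison_memory(unsigned long pfn)
{
    (void)pfn;
    if (sketch_hw_memory_failure)
        return -EOPNOTSUPP;   /* assumed error code for this sketch */
    return 0;                 /* soft-poisoned page may be released again */
}
```

In this sketch, soft-injected failures leave unpoisoning available, while a single hardware-reported failure disables it for the rest of the run, which mirrors the "disable unpoison once hw error happens" idea.
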
  8. 20 May, 2022 1 commit
    • mm: don't be stuck to rmap lock on reclaim path · 6d4675e6
      Committed by Minchan Kim
      The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) can be contended
      under memory pressure if processes keep working on their vmas (e.g., fork,
      mmap, munmap).  That leaves the reclaim path stuck.  In our real workload
      traces, we see kswapd waiting on the lock for 300ms+ (worst case, a
      second), which pushes other processes into direct reclaim, where they
      also get stuck on the lock.

      This patch makes the lru aging path use try_lock mode, like
      shrink_page_list, so the reclaim context keeps working on the next lru
      pages without being stuck.  If it finds the rmap lock contended, it
      rotates the page back to the head of the lru in both the active and
      inactive lrus to keep their behavior consistent, which is a basic
      starting point rather than adding more heuristics.

      Since this patch introduces a new "contended" field as an out-param along
      with the try_lock in-param in rmap_walk_control, the structure is no
      longer immutable when try_lock is set, so remove the const keywords on
      rmap-related functions.  Since rmap walking is already an expensive
      operation, I doubt the const would bring a sizable benefit (and we didn't
      have it until 5.17).

      In a heavy app workload on Android, the trace shows the following
      statistics.  The patch almost removes rmap lock contention from the
      reclaim path.
      
      Martin Liu reported:
      
      Before:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
               1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
                601            0             601   145.544681        28817    198  rmap_walk_file
      
      After:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
                NaN          NaN              NaN          NaN          NaN    0.0             NaN
                  0            0                0     0.127645            1     12  rmap_walk_file
      
      [minchan@kernel.org: add comment, per Matthew]
        Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
      Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Martin Liu <liumartin@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
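
The try_lock and "contended" mechanism described in the commit above can be sketched in user space roughly as follows. Everything here is a toy stand-in: the lock type, the structures, and the function names are hypothetical and only mirror the two fields the commit message mentions, not the kernel's actual rmap_walk_control.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock: only tracks a writer and a reader count. */
struct toy_rwsem { bool write_locked; int readers; };

static bool toy_read_trylock(struct toy_rwsem *s)
{
    if (s->write_locked)
        return false;
    s->readers++;
    return true;
}

static void toy_read_unlock(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/* Hypothetical walk control with the in-param and out-param from the text. */
struct toy_walk_control {
    bool try_lock;    /* in: do not wait for the lock */
    bool contended;   /* out: set when the trylock failed */
};

struct toy_page { struct toy_rwsem *rmap_lock; int referenced; };

/* Returns the reference count seen, or 0 with rwc->contended set. */
int toy_rmap_walk(struct toy_page *page, struct toy_walk_control *rwc)
{
    int refs;

    if (!toy_read_trylock(page->rmap_lock)) {
        if (rwc->try_lock) {
            rwc->contended = true;
            return 0;
        }
        return -1;   /* a blocking walk would sleep here; the toy cannot */
    }
    refs = page->referenced;
    toy_read_unlock(page->rmap_lock);
    return refs;
}

enum toy_action { TOY_RECLAIM, TOY_KEEP_ACTIVE, TOY_ROTATE };

/* Aging decision: rotate the page on contention instead of waiting. */
enum toy_action toy_age_page(struct toy_page *page)
{
    struct toy_walk_control rwc = { .try_lock = true, .contended = false };
    int refs = toy_rmap_walk(page, &rwc);

    if (rwc.contended)
        return TOY_ROTATE;
    return refs > 0 ? TOY_KEEP_ACTIVE : TOY_RECLAIM;
}
```

The point of the sketch is that the control structure carries an output field, which is why it can no longer be passed as const once try_lock is used.
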
  9. 13 May, 2022 4 commits
  10. 10 May, 2022 1 commit
  11. 29 Apr, 2022 8 commits
  12. 22 Apr, 2022 2 commits
    • mm/memory-failure.c: skip huge_zero_page in memory_failure() · d173d541
      Committed by Xu Yu
      The kernel panics when a memory failure is injected for the global
      huge_zero_page with CONFIG_DEBUG_VM enabled, as follows.
      
        Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
        page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
        head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
        flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
        raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
        raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
        page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:2499!
        invalid opcode: 0000 [#1] PREEMPT SMP PTI
        CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
        Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
        RIP: 0010:split_huge_page_to_list+0x66a/0x880
        Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
        RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
        RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
        RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
        R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
        R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
        FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         try_to_split_thp_page+0x3a/0x130
         memory_failure+0x128/0x800
         madvise_inject_error.cold+0x8b/0xa1
         __x64_sys_madvise+0x54/0x60
         do_syscall_64+0x35/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fc3754f8bf9
        Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
        RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
        RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
        RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
        R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
        R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
      
      This patch makes memory_failure() bail out explicitly for huge_zero_page
      before the split, so the panic above no longer happens.
      
      Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
      Fixes: 6a46079c ("HWPOISON: The high level memory error handler in the VM v7")
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
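
The early bail-out described in the commit above might look like the following user-space sketch. The page structure, the helper names, and the -EBUSY return value are assumptions for illustration only; the assert in the toy split function plays the role of the VM_BUG_ON_PAGE check seen in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct zp_page { bool trans_huge; };

/* Stand-in for the single, global huge zero page. */
static struct zp_page zp_huge_zero_page = { .trans_huge = true };

static bool zp_is_huge_zero_page(const struct zp_page *p)
{
    return p == &zp_huge_zero_page;
}

/* Toy split: must never be called on the huge zero page. */
static int zp_split_huge_page(struct zp_page *p)
{
    assert(!zp_is_huge_zero_page(p));
    p->trans_huge = false;
    return 0;
}

/* Toy failure handler: checks for the zero page before trying to split. */
int zp_memory_failure(struct zp_page *p)
{
    if (p->trans_huge) {
        if (zp_is_huge_zero_page(p))
            return -EBUSY;   /* assumed error code: give up, do not split */
        return zp_split_huge_page(p);
    }
    return 0;
}
```

With the check in place, the split helper's precondition can never be violated through this path, while ordinary huge pages are still split as before.
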
    • mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() · 405ce051
      Committed by Naoya Horiguchi
      There is a race condition between memory_failure_hugetlb() and hugetlb
      free/demotion, which causes the PageHWPoison flag to be set on the wrong
      page.  One simple result is that the wrong processes can be killed, but
      another (more serious) one is that the actual error is left unhandled, so
      nothing prevents later access to it, and that might lead to more serious
      results like consuming corrupted data.
      
      Think about the below race window:
      
        CPU 1                                   CPU 2
        memory_failure_hugetlb
        struct page *head = compound_head(p);
                                                hugetlb page might be freed to
                                                buddy, or even changed to another
                                                compound page.
      
        get_hwpoison_page -- page is not what we want now...
      
      The current code first does rough prechecks and then reconfirms after
      taking a refcount, but this turned out to make the code overly
      complicated, so move the prechecks into a single hugetlb_lock range.
      
      A newly introduced function, try_memory_failure_hugetlb(), always takes
      hugetlb_lock (even for non-hugetlb pages).  That can be improved, but
      memory_failure() is rare in principle, so should not be a big problem.
      
      Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev
      Fixes: 761ad8d7 ("mm: hwpoison: introduce memory_failure_hugetlb()")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
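
The idea of doing the precheck and the refcount grab inside one lock range, as the commit above describes, can be illustrated with a small user-space sketch. The spinlock built on atomic_flag stands in for hugetlb_lock, and the page structure and all function names are hypothetical; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for hugetlb_lock. */
static atomic_flag hl_lock = ATOMIC_FLAG_INIT;

static void hl_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&hl_lock))
        ;   /* spin until the flag was previously clear */
}

static void hl_lock_release(void)
{
    atomic_flag_clear(&hl_lock);
}

struct hl_page { bool is_hugetlb; int refcount; bool hwpoison; };

/* Free path: takes the same lock, so it cannot interleave with the check. */
void hl_free_huge_page(struct hl_page *head)
{
    hl_lock_acquire();
    head->is_hugetlb = false;
    head->refcount = 0;
    hl_lock_release();
}

/* Returns 1 if handled as hugetlb, 0 if the page is not (or no longer) one. */
int hl_try_memory_failure_hugetlb(struct hl_page *head)
{
    int ret = 0;

    hl_lock_acquire();
    if (head->is_hugetlb) {      /* precheck ... */
        head->refcount++;        /* ... and pin, with no window in between */
        head->hwpoison = true;
        ret = 1;
    }
    hl_lock_release();
    return ret;
}
```

Because the type check and the pin happen under the same lock that the free path takes, the page cannot change identity between the two steps, at the cost of taking the lock even for pages that turn out not to be hugetlb pages.
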
  13. 23 Mar, 2022 7 commits