1. 18 1月, 2023 1 次提交
  2. 02 12月, 2022 1 次提交
    • D
      mm/hugetlb: fix hugetlb not supporting softdirty tracking · 9742b1b6
      David Hildenbrand 提交于
      stable inclusion
      from stable-v5.10.140
      commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef
      
      --------------------------------
      
      commit f96f7a40 upstream.
      
      Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
      
      I observed that hugetlb does not support/expect write-faults in shared
      mappings that would have to map the R/O-mapped page writable -- and I
      found two case where we could currently get such faults and would
      erroneously map an anon page into a shared mapping.
      
      Reproducers part of the patches.
      
      I propose to backport both fixes to stable trees.  The first fix needs a
      small adjustment.
      
      This patch (of 2):
      
      Staring at hugetlb_wp(), one might wonder where all the logic for shared
      mappings is when stumbling over a write-protected page in a shared
      mapping.  In fact, there is none, and so far we thought we could get away
      with that because e.g., mprotect() should always do the right thing and
      map all pages directly writable.
      
      Looks like we were wrong:
      
      --------------------------------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <errno.h>
       #include <sys/mman.h>
      
       #define HUGETLB_SIZE (2 * 1024 * 1024u)
      
       static void clear_softdirty(void)
       {
               int fd = open("/proc/self/clear_refs", O_WRONLY);
               const char *ctrl = "4";
               int ret;
      
               if (fd < 0) {
                       fprintf(stderr, "open(clear_refs) failed\n");
                       exit(1);
               }
               ret = write(fd, ctrl, strlen(ctrl));
               if (ret != strlen(ctrl)) {
                       fprintf(stderr, "write(clear_refs) failed\n");
                       exit(1);
               }
               close(fd);
       }
      
       int main(int argc, char **argv)
       {
               char *map;
               int fd;
      
               fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
               if (!fd) {
                       fprintf(stderr, "open() failed\n");
                       return -errno;
               }
               if (ftruncate(fd, HUGETLB_SIZE)) {
                       fprintf(stderr, "ftruncate() failed\n");
                       return -errno;
               }
      
               map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
               if (map == MAP_FAILED) {
                       fprintf(stderr, "mmap() failed\n");
                       return -errno;
               }
      
               *map = 0;
      
               if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                       fprintf(stderr, "mmprotect() failed\n");
                       return -errno;
               }
      
               clear_softdirty();
      
               if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                       fprintf(stderr, "mmprotect() failed\n");
                       return -errno;
               }
      
               *map = 0;
      
               return 0;
       }
      --------------------------------------------------------------------------
      
      Above test fails with SIGBUS when there is only a single free hugetlb page.
       # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       Bus error (core dumped)
      
      And worse, with sufficient free hugetlb pages it will map an anonymous page
      into a shared mapping, for example, messing up accounting during unmap
      and breaking MAP_SHARED semantics:
       # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       # cat /proc/meminfo | grep HugePages_
       HugePages_Total:       2
       HugePages_Free:        1
       HugePages_Rsvd:    18446744073709551615
       HugePages_Surp:        0
      
      Reason in this particular case is that vma_wants_writenotify() will
      return "true", removing VM_SHARED in vma_set_page_prot() to map pages
      write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
      support softdirty tracking.
      
      Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
      Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Reviewed-by: NWei Li <liwei391@huawei.com>
      9742b1b6
  3. 18 11月, 2022 1 次提交
  4. 11 11月, 2022 1 次提交
  5. 29 9月, 2022 1 次提交
  6. 26 7月, 2022 1 次提交
  7. 06 7月, 2022 1 次提交
  8. 17 1月, 2022 1 次提交
  9. 30 12月, 2021 4 次提交
  10. 23 12月, 2021 1 次提交
  11. 10 12月, 2021 1 次提交
  12. 03 12月, 2021 1 次提交
  13. 30 10月, 2021 2 次提交
  14. 02 9月, 2021 1 次提交
  15. 19 7月, 2021 1 次提交
  16. 07 12月, 2020 1 次提交
  17. 19 10月, 2020 2 次提交
  18. 17 10月, 2020 2 次提交
  19. 14 10月, 2020 8 次提交
  20. 12 10月, 2020 1 次提交
    • M
      mm: mmap: Fix general protection fault in unlink_file_vma() · bc4fe4cd
      Miaohe Lin 提交于
      The syzbot reported the below general protection fault:
      
        general protection fault, probably for non-canonical address
        0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
        KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
        CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
        RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
        Call Trace:
           free_pgtables+0x1b3/0x2f0 mm/memory.c:415
           exit_mmap+0x2c0/0x530 mm/mmap.c:3184
           __mmput+0x122/0x470 kernel/fork.c:1076
           mmput+0x53/0x60 kernel/fork.c:1097
           exit_mm kernel/exit.c:483 [inline]
           do_exit+0xa8b/0x29f0 kernel/exit.c:793
           do_group_exit+0x125/0x310 kernel/exit.c:903
           get_signal+0x428/0x1f00 kernel/signal.c:2757
           arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
           exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
           exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
           syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      It's because the ->mmap() callback can change vma->vm_file and fput the
      original file.  But the commit d70cec89 ("mm: mmap: merge vma after
      call_mmap() if possible") failed to catch this case and always fput()
      the original file, hence add an extra fput().
      
      [ Thanks Hillf for pointing this extra fput() out. ]
      
      Fixes: d70cec89 ("mm: mmap: merge vma after call_mmap() if possible")
      Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com
      Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
      Cc: Hongxiang Lou <louhongxiang@huawei.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc4fe4cd
  21. 25 9月, 2020 1 次提交
  22. 04 9月, 2020 1 次提交
  23. 08 8月, 2020 3 次提交
  24. 25 7月, 2020 1 次提交
  25. 30 6月, 2020 1 次提交
    • P
      mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls · 0a3b3c25
      Paul E. McKenney 提交于
      A large process running on a heavily loaded system can encounter the
      following RCU CPU stall warning:
      
        rcu: INFO: rcu_sched self-detected stall on CPU
        rcu: 	3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190
        	(t=21013 jiffies g=1005461 q=132576)
        NMI backtrace for cpu 3
        CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1
        Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
        Call Trace:
         <IRQ>
         dump_stack+0x46/0x60
         nmi_cpu_backtrace.cold.3+0x13/0x50
         ? lapic_can_unplug_cpu.cold.27+0x34/0x34
         nmi_trigger_cpumask_backtrace+0xba/0xca
         rcu_dump_cpu_stacks+0x99/0xc7
         rcu_sched_clock_irq.cold.87+0x1aa/0x397
         ? tick_sched_do_timer+0x60/0x60
         update_process_times+0x28/0x60
         tick_sched_timer+0x37/0x70
         __hrtimer_run_queues+0xfe/0x270
         hrtimer_interrupt+0xf4/0x210
         smp_apic_timer_interrupt+0x5e/0x120
         apic_timer_interrupt+0xf/0x20
         </IRQ>
        RIP: 0010:kmem_cache_free+0x223/0x300
        Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65
        RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
        RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030
        RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18
        RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff
        R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f
        R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00
         ? remove_vma+0x4f/0x60
         remove_vma+0x4f/0x60
         exit_mmap+0xd6/0x160
         mmput+0x4a/0x110
         do_exit+0x278/0xae0
         ? syscall_trace_enter+0x1d3/0x2b0
         ? handle_mm_fault+0xaa/0x1c0
         do_group_exit+0x3a/0xa0
         __x64_sys_exit_group+0x14/0x20
         do_syscall_64+0x42/0x100
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run
      for a very long time given a large process.  This commit therefore adds
      a cond_resched() to this loop, providing RCU any needed quiescent states.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
      0a3b3c25