1. 15 8月, 2020 40 次提交
    • K
      iomap: constify ioreadX() iomem argument (as in generic implementation) · 8f28ca6b
      Krzysztof Kozlowski 提交于
      Patch series "iomap: Constify ioreadX() iomem argument", v3.
      
      The ioread8/16/32() and others have inconsistent interface among the
      architectures: some taking address as const, some not.
      
      It seems there is nothing really stopping all of them to take pointer to
      const.
      
      This patch (of 4):
      
      The ioreadX() and ioreadX_rep() helpers have inconsistent interface.  On
      some architectures void *__iomem address argument is a pointer to const,
      on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      
      [krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
        Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
      [akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
        Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.comSuggested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
      Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f28ca6b
    • K
      sh: use generic strncpy() · f9e7ff9c
      Kuninori Morimoto 提交于
      Current SH will get below warning at strncpy()
      
      In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                       from ${LINUX}/include/linux/string.h:20,
                       from ${LINUX}/include/linux/bitmap.h:9,
                       from ${LINUX}/include/linux/nodemask.h:95,
                       from ${LINUX}/include/linux/mmzone.h:17,
                       from ${LINUX}/include/linux/gfp.h:6,
                       from ${LINUX}/innclude/linux/slab.h:15,
                       from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
      ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
      ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
        80 is above array bounds of 'char[26]' [-Warray-bounds]
         : "0" (__dest), "1" (__src), "r" (__src+__n)
                                           ~~~~~^~~~
      
      In general, strncpy() should behave like below.
      
      	char dest[10];
      	char *src = "12345";
      
      	strncpy(dest, src, 10);
      	// dest = {'1', '2', '3', '4', '5',
      	           '\0','\0','\0','\0','\0'}
      
      But, current SH strnpy() has 2 issues.
      1st is it will access to out-of-memory (= src + 10).
      2nd is it needs big fixup for it, and maintenance __asm__
      code is difficult.
      
      To solve these issues, this patch simply uses generic strncpy()
      instead of architecture specific one.
      Signed-off-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f9e7ff9c
    • K
      sh: clkfwk: remove r8/r16/r32 · a8e3943b
      Kuninori Morimoto 提交于
      SH will get below warning
      
      ${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
      ${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
       discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        return ioread8(addr);
                       ^~~~
      In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                       from ${LINUX}/include/linux/io.h:13,
                       from ${LINUX}/drivers/sh/clk/cpg.c:14:
      ${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
      argument is of type 'const void *'
       extern unsigned int ioread8(void __iomem *);
                                   ^~~~~~~~~~~~~~
      
      We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
      themselvs.  This patch cleanup these.
      Signed-off-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8e3943b
    • R
      include/asm-generic/vmlinux.lds.h: align ro_after_init · 7f897acb
      Romain Naour 提交于
      Since the patch [1], building the kernel using a toolchain built with
      binutils 2.33.1 prevents booting a sh4 system under Qemu.  Apply the patch
      provided by Alan Modra [2] that fix alignment of rodata.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
      [2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.htmlSigned-off-by: NRomain Naour <romain.naour@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Link: https://marc.info/?l=linux-sh&m=158429470221261Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f897acb
    • Q
      mm: annotate a data race in page_zonenum() · c403f6a3
      Qian Cai 提交于
       BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
      
       write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
        page_cpupid_xchg_last+0x51/0x80
        page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
        wp_page_reuse+0x3e/0xc0
        wp_page_reuse at mm/memory.c:2453
        do_wp_page+0x472/0x7b0
        do_wp_page at mm/memory.c:2798
        __handle_mm_fault+0xcb0/0xd00
        handle_pte_fault at mm/memory.c:4049
        (inlined by) __handle_mm_fault at mm/memory.c:4163
        handle_mm_fault+0xfc/0x2f0
        handle_mm_fault at mm/memory.c:4200
        do_page_fault+0x263/0x6f9
        do_user_addr_fault at arch/x86/mm/fault.c:1465
        (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
        page_fault+0x34/0x40
      
       read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
        put_page+0x15a/0x1f0
        page_zonenum at include/linux/mm.h:923
        (inlined by) is_zone_device_page at include/linux/mm.h:929
        (inlined by) page_is_devmap_managed at include/linux/mm.h:948
        (inlined by) put_page at include/linux/mm.h:1023
        wp_page_copy+0x571/0x930
        wp_page_copy at mm/memory.c:2615
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xcb0/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A page never changes its zone number. The zone number happens to be
      stored in the same word as other bits which are modified, but the zone
      number bits will never be modified by any other write, so it can accept
      a reload of the zone bits after an intervening write and it don't need
      to use READ_ONCE(). Thus, annotate this data race using
      ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
      writes to it.
      Suggested-by: NMarco Elver <elver@google.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c403f6a3
    • Q
      mm/swap.c: annotate data races for lru_rotate_pvecs · 7e0cc01e
      Qian Cai 提交于
      Read to lru_add_pvec->nr could be interrupted and then write to the same
      variable.  The write has local interrupt disabled, but the plain reads
      result in data races.  However, it is unlikely the compilers could do much
      damage here given that lru_add_pvec->nr is a "unsigned char" and there is
      an existing compiler barrier.  Thus, annotate the reads using the
      data_race() macro.  The data races were reported by KCSAN,
      
       BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page
      
       write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
        rotate_reclaimable_page+0x2df/0x490
        pagevec_add at include/linux/pagevec.h:81
        (inlined by) rotate_reclaimable_page at mm/swap.c:259
        end_page_writeback+0x1b5/0x2b0
        end_swap_bio_write+0x1d0/0x280
        bio_endio+0x297/0x560
        dec_pending+0x218/0x430 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x297/0x560
        blk_update_request+0x201/0x920
        scsi_end_request+0x6b/0x4a0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x1ed/0x2a0
        scsi_softirq_done+0x1c9/0x1d0
        blk_done_softirq+0x181/0x1d0
        __do_softirq+0xd9/0x57c
        irq_exit+0xa2/0xc0
        do_IRQ+0x8b/0x190
        ret_from_intr+0x0/0x42
        delay_tsc+0x46/0x80
        __const_udelay+0x3c/0x40
        __udelay+0x10/0x20
        kcsan_setup_watchpoint+0x202/0x3a0
        __tsan_read1+0xc2/0x100
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain_cpu at mm/swap.c:602
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       2 locks held by oom02/37761:
        #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
        #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
       irq event stamp: 1949217
       trace_hardirqs_on_thunk+0x1a/0x1c
       __do_softirq+0x2e7/0x57c
       __do_softirq+0x34c/0x57c
       irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
       Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMarco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e0cc01e
    • Q
      mm/rmap: annotate a data race at tlb_flush_batched · 9c1177b6
      Qian Cai 提交于
      mm->tlb_flush_batched could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one
      
       write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
        try_to_unmap_one+0x59a/0x1ab0
        set_tlb_ubc_flush_pending at mm/rmap.c:635
        (inlined by) try_to_unmap_one at mm/rmap.c:1538
        rmap_walk_anon+0x296/0x650
        rmap_walk+0xdf/0x100
        try_to_unmap+0x18a/0x2f0
        shrink_page_list+0xef6/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
        flush_tlb_batched_pending+0x29/0x90
        flush_tlb_batched_pending at mm/rmap.c:682
        change_p4d_range+0x5dd/0x1030
        change_pte_range at mm/mprotect.c:44
        (inlined by) change_pmd_range at mm/mprotect.c:212
        (inlined by) change_pud_range at mm/mprotect.c:240
        (inlined by) change_p4d_range at mm/mprotect.c:260
        change_protection+0x222/0x310
        change_prot_numa+0x3e/0x60
        task_numa_work+0x219/0x350
        task_work_run+0xed/0x140
        prepare_exit_to_usermode+0x2cc/0x2e0
        ret_from_intr+0x32/0x42
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      flush_tlb_batched_pending() is under PTL but the write is not, but
      mm->tlb_flush_batched is only a bool type, so the value is unlikely to be
      shattered.  Thus, mark it as an intentional data race by using the data
      race macro.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c1177b6
    • Q
      mm/mempool: fix a data race in mempool_free() · abe1de42
      Qian Cai 提交于
      mempool_t pool.curr_nr could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in mempool_free / remove_element
      
       write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
        remove_element+0x4a/0x1c0
        remove_element at mm/mempool.c:132
        mempool_alloc+0x102/0x210
        (inlined by) mempool_alloc at mm/mempool.c:399
        bio_alloc_bioset+0x106/0x2c0
        get_swap_bio+0x49/0x230
        __swap_writepage+0x680/0xc30
        swap_writepage+0x9c/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        <snip>
      
       read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
        mempool_free+0x3e/0x150
        mempool_free at mm/mempool.c:492
        bio_free+0x192/0x280
        bio_put+0x91/0xd0
        end_swap_bio_write+0x1d8/0x280
        bio_endio+0x2c2/0x5b0
        dec_pending+0x22b/0x440 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x2c2/0x5b0
        blk_update_request+0x217/0x940
        scsi_end_request+0x6b/0x4d0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x223/0x310
        scsi_softirq_done+0x1d5/0x210
        blk_mq_complete_request+0x224/0x250
        scsi_mq_done+0xc2/0x250
        pqi_raid_io_complete+0x5a/0x70 [smartpqi]
        pqi_irq_handler+0x150/0x1410 [smartpqi]
        __handle_irq_event_percpu+0x90/0x540
        handle_irq_event_percpu+0x49/0xd0
        handle_irq_event+0x85/0xca
        handle_edge_irq+0x13f/0x3e0
        do_IRQ+0x86/0x190
        <snip>
      
      Since the write is under pool->lock but the read is done as lockless.
      Even though the commit 5b990546 ("mempool: fix and document
      synchronization and memory barrier usage") introduced the smp_wmb() and
      smp_rmb() pair to improve the situation, it is adequate to protect it
      from data races which could lead to a logic bug, so fix it by adding
      READ_ONCE() for the read.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      abe1de42
    • Q
      mm/list_lru: fix a data race in list_lru_count_one · a1f45935
      Qian Cai 提交于
      struct list_lru_one l.nr_items could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move
      
       write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
        list_lru_isolate_move+0xf9/0x130
        list_lru_isolate_move at mm/list_lru.c:180
        inode_lru_isolate+0x12b/0x2a0
        __list_lru_walk_one+0x122/0x3d0
        list_lru_walk_one+0x75/0xa0
        prune_icache_sb+0x8b/0xc0
        super_cache_scan+0x1b8/0x250
        do_shrink_slab+0x256/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
        list_lru_count_one+0x116/0x2f0
        list_lru_count_one at mm/list_lru.c:193
        super_cache_count+0xe8/0x170
        do_shrink_slab+0x95/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A shattered l.nr_items could affect the shrinker behaviour due to a data
      race. Fix it by adding READ_ONCE() for the read. Since the writes are
      aligned and up to word-size, assume those are safe from data races to
      avoid readability issues of writing WRITE_ONCE(var, var + val).
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1f45935
    • Q
      mm/memcontrol: fix a data race in scan count · e0e3f42f
      Qian Cai 提交于
      struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
      accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size
      
       write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
        mem_cgroup_update_lru_size+0x11c/0x1d0
        mem_cgroup_update_lru_size at mm/memcontrol.c:1266
        isolate_lru_pages+0x6a9/0xf30
        shrink_active_list+0x123/0xcc0
        shrink_lruvec+0x8fd/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
        lruvec_lru_size+0xbb/0x270
        mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
        (inlined by) lruvec_lru_size at mm/vmscan.c:326
        shrink_lruvec+0x1d0/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_current+0xa6/0x120
        alloc_slab_page+0x3b1/0x540
        allocate_slab+0x70/0x660
        new_slab+0x46/0x70
        ___slab_alloc+0x4ad/0x7d0
        __slab_alloc+0x43/0x70
        kmem_cache_alloc+0x2c3/0x420
        getname_flags+0x4c/0x230
        getname+0x22/0x30
        do_sys_openat2+0x205/0x3b0
        do_sys_open+0x9a/0xf0
        __x64_sys_openat+0x62/0x80
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The write is under lru_lock, but the read is done as lockless.  The scan
      count is used to determine how aggressively the anon and file LRU lists
      should be scanned.  Load tearing could generate an inefficient heuristic,
      so fix it by adding READ_ONCE() for the read.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0e3f42f
    • Q
      mm/page_counter: fix various data races at memsw · 6e4bd50f
      Qian Cai 提交于
      Commit 3e32cb2e ("mm: memcontrol: lockless page counters") could had
      memcg->memsw->watermark and memcg->memsw->failcnt been accessed
      concurrently as reported by KCSAN,
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
        page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x58/0x140
        __memcg_kmem_charge+0xcc/0x280
        __alloc_pages_nodemask+0x1e1/0x450
        alloc_pages_current+0xa6/0x120
        pte_alloc_one+0x17/0xd0
        __pte_alloc+0x3a/0x1f0
        copy_p4d_range+0xc36/0x1990
        copy_page_range+0x21d/0x360
        dup_mmap+0x5f5/0x7a0
        dup_mm+0xa2/0x240
        copy_process+0x1b3f/0x3460
        _do_fork+0xaa/0xa20
        __x64_sys_clone+0x13b/0x170
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
        page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        mem_cgroup_try_charge+0x159/0x460
        mem_cgroup_try_charge_delay+0x3d/0xa0
        wp_page_copy+0x14d/0x930
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xce6/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
        page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
       read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
        page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
      Since watermark could be compared or set to garbage due to a data race
      which would change the code logic, fix it by adding a pair of READ_ONCE()
      and WRITE_ONCE() in those places.
      
      The "failcnt" counter is tolerant of some degree of inaccuracy and is only
      used to report stats, a data race will not be harmful, thus mark it as an
      intentional data race using the data_race() macro.
      
      Fixes: 3e32cb2e ("mm: memcontrol: lockless page counters")
      Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e4bd50f
    • Q
      mm/swapfile: fix and annotate various data races · a449bf58
      Qian Cai 提交于
      swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
      be accessed concurrently separately as noticed by KCSAN,
      
      === si.highest_bit ===
      
       write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
        swap_range_alloc+0x81/0x130
        swap_range_alloc at mm/swapfile.c:681
        scan_swap_map_slots+0x371/0xb90
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
        scan_swap_map_slots+0x4a6/0xb90
        scan_swap_map_slots at mm/swapfile.c:892
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      === si.swap_map[offset] ===
      
       write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
        __swap_entry_free_locked+0x8c/0x100
        __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
        __swap_entry_free.constprop.20+0x69/0xb0
        free_swap_and_cache+0x53/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
       read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
        _swap_info_get+0x81/0xa0
        _swap_info_get at mm/swapfile.c:1140
        free_swap_and_cache+0x40/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
      === si.flags ===
      
       write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
        _swap_info_get+0x41/0xa0
        __swap_info_get at mm/swapfile.c:1114
        put_swap_page+0x84/0x490
        __remove_mapping+0x384/0x5f0
        shrink_page_list+0xff1/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
      The writes are under si->lock but the reads are not. For si.highest_bit
      and si.swap_map[offset], data race could trigger logic bugs, so fix them
      by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
      except those isolated reads where they compare against zero which a data
      race would cause no harm. Thus, annotate them as intentional data races
      using the data_race() macro.
      
      For si.flags, the readers are only interested in a single bit where a
      data race there would cause no issue there.
      
      [cai@lca.pw: add a missing annotation for si->flags in memory.c]
        Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a449bf58
    • K
      mm/filemap.c: fix a data race in filemap_fault() · e630bfac
      Kirill A. Shutemov 提交于
      struct file_ra_state ra.mmap_miss could be accessed concurrently during
      page faults as noticed by KCSAN,
      
       BUG: KCSAN: data-race in filemap_fault / filemap_map_pages
      
       write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
        filemap_fault+0x920/0xfc0
        do_sync_mmap_readahead at mm/filemap.c:2384
        (inlined by) filemap_fault at mm/filemap.c:2486
        __xfs_filemap_fault+0x112/0x3e0 [xfs]
        xfs_filemap_fault+0x74/0x90 [xfs]
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
        filemap_map_pages+0xc2e/0xd80
        filemap_map_pages at mm/filemap.c:2625
        do_fault+0x3da/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      ra.mmap_miss is used to contribute the readahead decisions, a data race
      could be undesirable.  Both the read and write is only under non-exclusive
      mmap_sem, two concurrent writers could even underflow the counter.  Fix
      the underflow by writing to a local variable before committing a final
      store to ra.mmap_miss given a small inaccuracy of the counter should be
      acceptable.
      Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e630bfac
    • Q
      mm/swap_state: mark various intentional data races · b96a3db2
      Qian Cai 提交于
      swap_cache_info.* could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache
      
       write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
        lookup_swap_cache+0x12e/0x460
        lookup_swap_cache at mm/swap_state.c:322
        do_swap_page+0x112/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
        lookup_swap_cache+0x117/0x460
        lookup_swap_cache at mm/swap_state.c:322
        shmem_swapin_page+0xc7/0x9e0
        shmem_getpage_gfp+0x2ca/0x16c0
        shmem_fault+0xef/0x3c0
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
       write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
         __delete_from_swap_cache+0x681/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
       read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
         __delete_from_swap_cache+0x66e/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
      Both the read and write are done as lockless. Since swap_cache_info.*
      are only used to print out counter information, even if any of them
      missed a few incremental due to data races, it will be harmless, so just
      mark it as an intentional data race using the data_race() macro.
      
      While at it, fix a checkpatch.pl warning,
      
      WARNING: Single statement macros should not use a do {} while (0) loop
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b96a3db2
    • Q
      mm/page_io: mark various intentional data races · 7b37e226
      Qian Cai 提交于
      struct swap_info_struct si.flags could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage
      
       write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1740/0x2820
        shrink_inactive_list+0x316/0x8b0
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
        swap_readpage+0x204/0x6a0
        swap_readpage at mm/page_io.c:380
        read_swap_cache_async+0xa2/0xb0
        swapin_readahead+0x6a0/0x890
        do_swap_page+0x465/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      Other reads,
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
        __swap_writepage+0x140/0xc20
        __swap_writepage at mm/page_io.c:289
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
        swap_set_page_dirty+0x44/0x1f4
        swap_set_page_dirty at mm/page_io.c:442
      
      The write is under &si->lock, but the reads are done as lockless.  Since
      the reads only check for a specific bit in the flag, it is harmless even
      if load tearing happens.  Thus, just mark them as intentional data races
      using the data_race() macro.
      
      [cai@lca.pw: add a missing annotation]
        Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b37e226
    • Q
      mm/frontswap: mark various intentional data races · 96bdd2bc
      Qian Cai 提交于
      There are a few information counters that are intentionally not protected
      against increment races, so just annotate them using the data_race()
      macro.
      
       BUG: KCSAN: data-race in __frontswap_store / __frontswap_store
      
       write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
        __frontswap_store+0x2d0/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
        __frontswap_store+0x2b9/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96bdd2bc
    • Q
      mm/kmemleak: silence KCSAN splats in checksum · 69d0b54d
      Qian Cai 提交于
      Even if KCSAN is disabled for kmemleak, update_checksum() could still call
      crc32() (which is outside of kmemleak.c) to dereference object->pointer.
      Thus, the value of object->pointer could be accessed concurrently as
      noticed by KCSAN,
      
       BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock
      
       write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
        do_raw_spin_lock+0x114/0x200
        debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
        (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
        _raw_spin_lock+0x40/0x50
        __handle_mm_fault+0xa9e/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
        crc32_le_base+0x67/0x350
        crc32_le_base+0x67/0x350:
        crc32_body at lib/crc32.c:106
        (inlined by) crc32_le_generic at lib/crc32.c:179
        (inlined by) crc32_le at lib/crc32.c:197
        kmemleak_scan+0x528/0xd90
        update_checksum at mm/kmemleak.c:1172
        (inlined by) kmemleak_scan at mm/kmemleak.c:1497
        kmemleak_scan_thread+0xcc/0xfa
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
      If a shattered value was returned due to a data race, it will be corrected
      in the next scan.  Thus, let KCSAN ignore all reads in the region to
      silence KCSAN in case the write side is non-atomic.
      Suggested-by: NMarco Elver <elver@google.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMarco Elver <elver@google.com>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69d0b54d
    • X
      all arch: remove system call sys_sysctl · 88db0aa2
      Xiaoming Ni 提交于
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning about people using the sysctl system call for years
      and believe there are no more users.  Even if there are users of this
      interface if they have not complained or fixed their code by now they
      probably are not going to, so there is no point in warning them any
      longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.comSigned-off-by: NXiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88db0aa2
    • R
    • M
      mm: introduce offset_in_thp · ee6c400f
      Matthew Wilcox (Oracle) 提交于
      Mirroring offset_in_page(), this gives you the offset within this
      particular page, no matter what size page it is.  It optimises down to
      offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-8-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee6c400f
    • M
      mm: add thp_head · 2be1d718
      Matthew Wilcox (Oracle) 提交于
      This is like compound_head() but compiles away when
      CONFIG_TRANSPARENT_HUGEPAGE is not enabled.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-7-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2be1d718
    • M
      mm: replace hpage_nr_pages with thp_nr_pages · 6c357848
      Matthew Wilcox (Oracle) 提交于
      The thp prefix is more frequently used than hpage and we should be
      consistent between the various functions.
      
      [akpm@linux-foundation.org: fix mm/migrate.c]
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c357848
    • M
      mm: add thp_size · af3bbc12
      Matthew Wilcox (Oracle) 提交于
      This function returns the number of bytes in a THP.  It is like
      page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
      is disabled.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af3bbc12
    • M
      mm: add thp_order · 6ffbb458
      Matthew Wilcox (Oracle) 提交于
      This function returns the order of a transparent huge page.  It compiles
      to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-4-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ffbb458
    • M
      mm: move page-flags include to top of file · 41901567
      Matthew Wilcox (Oracle) 提交于
      Give up on the notion that we can remove page-flags.h from mm.h.  There
      are currently 14 inline functions which use a PageFoo function.  Also, two
      of the files directly included by mm.h include page-flags.h themselves,
      and there are probably more indirect inclusions.  So just include it at
      the top like any other header file.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-3-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41901567
    • M
      mm: store compound_nr as well as compound_order · 1378a5ee
      Matthew Wilcox (Oracle) 提交于
      Patch series "THP prep patches".
      
      These are some generic cleanups and improvements, which I would like
      merged into mmotm soon.  The first one should be a performance improvement
      for all users of compound pages, and the others are aimed at getting code
      to compile away when CONFIG_TRANSPARENT_HUGEPAGE is disabled (ie small
      systems).  Also better documented / less confusing than the current prefix
      mixture of compound, hpage and thp.
      
      This patch (of 7):
      
      This removes a few instructions from functions which need to know how many
      pages are in a compound page.  The storage used is either page->mapping on
      64-bit or page->index on 32-bit.  Both of these are fine to overlay on
      tail pages.
      Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-1-willy@infradead.org
      Link: http://lkml.kernel.org/r/20200629151959.15779-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1378a5ee
    • G
      mailmap: add entry for Greg Kurz · 14a36a43
      Greg Kurz 提交于
      I had stopped using gkurz@linux.vnet.ibm.com a while back already but this
      email address was shutdown last June when I quit IBM.  It's about time to
      map it to groug@kaod.org.
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/159724692879.76040.4938578139173154028.stgit@bahia.lanSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14a36a43
    • K
      selftests/exec: add file type errno tests · 0f71241a
      Kees Cook 提交于
      Make sure execve() returns the expected errno values for non-regular
      files.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Link: http://lkml.kernel.org/r/20200813231723.2725102-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f71241a
    • K
      exec: restore EACCES of S_ISDIR execve() · fc4177be
      Kees Cook 提交于
      Patch series "Fix S_ISDIR execve() errno".
      
      Fix an errno change for execve() of directories, noticed by Marc Zyngier.
      Along with the fix, include a regression test to avoid seeing this return
      in the future.
      
      This patch (of 2):
      
      The return code for attempting to execute a directory has always been
      EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
      the general EISDIR for other kinds of "open" attempts on directories.
      
      Fixes: 633fb6ac ("exec: move S_ISREG() check earlier")
      Reported-by: NMarc Zyngier <maz@kernel.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NGreg Kroah-Hartman <gregkh@android.com>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@google.com>
      Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
      Link: https://lore.kernel.org/lkml/20200813151305.6191993b@whySigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc4177be
    • N
      lz4: fix kernel decompression speed · b1a3e75e
      Nick Terrell 提交于
      This patch replaces all memcpy() calls with LZ4_memcpy() which calls
      __builtin_memcpy() so the compiler can inline it.
      
      LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
      and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
      doesn't get defined as __builtin_memcpy().
      
      An equivalent patch has been applied upstream so that the next import
      won't lose this change [1].
      
      I've measured the kernel decompression speed using QEMU before and after
      this patch for the x86_64 and i386 architectures.  The speed-up is about
      10x as shown below.
      
      Code	Arch	Kernel Size	Time	Speed
      v5.8	x86_64	11504832 B	148 ms	 79 MB/s
      patch	x86_64	11503872 B	 13 ms	885 MB/s
      v5.8	i386	 9621216 B	 91 ms	106 MB/s
      patch	i386	 9620224 B	 10 ms	962 MB/s
      
      I also measured the time to decompress the initramfs on x86_64, i386, and
      arm.  All three show the same decompression speed before and after, as
      expected.
      
      [1] https://github.com/lz4/lz4/pull/890Signed-off-by: NNick Terrell <terrelln@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Yann Collet <yann.collet.73@gmail.com>
      Cc: Gao Xiang <gaoxiang25@huawei.com>
      Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1a3e75e
    • B
      Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone" · a8a4b7ae
      Baoquan He 提交于
      This reverts commit 26e7dead.
      
      Sonny reported that one of their tests started failing on the latest
      kernel on their Chrome OS platform.  The root cause is that the above
      commit removed the protection line of empty zone, while the parser used in
      the test relies on the protection line to mark the end of each zone.
      
      Let's revert it to avoid breaking userspace testing or applications.
      
      Fixes: 26e7dead ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone)"
      Reported-by: NSonny Rao <sonnyrao@chromium.org>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[5.8.x]
      Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8a4b7ae
    • M
      asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one() · 9922c1de
      Mike Rapoport 提交于
      The #ifdef statement that guards the generic version of pud_alloc_one() by
      mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.
      
      Fix it.
      
      Fixes: d9e8b929 ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200812191415.GE163101@linux.ibm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9922c1de
    • L
      Merge tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b923f124
      Linus Torvalds 提交于
      Pull timekeeping updates from Thomas Gleixner:
       "A set of timekeeping/VDSO updates:
      
         - Preparatory work to allow S390 to switch over to the generic VDSO
           implementation.
      
           S390 requires that the VDSO data pointer is handed in to the
           counter read function when time namespace support is enabled.
           Adding the pointer is a NOOP for all other architectures because
           the compiler is supposed to optimize that out when it is unused in
           the architecture specific inline. The change also solved a similar
           problem for MIPS which fortunately has time namespaces not yet
           enabled.
      
           S390 needs to update clock related VDSO data independent of the
           timekeeping updates. This was solved so far with yet another
           sequence counter in the S390 implementation. A better solution is
           to utilize the already existing VDSO sequence count for this. The
           core code now exposes helper functions which allow to serialize
           against the timekeeper code and against concurrent readers.
      
           S390 needs extra data for their clock readout function. The initial
           common VDSO data structure did not provide a way to add that. It
           now has an embedded architecture specific struct embedded which
           defaults to an empty struct.
      
           Doing this now avoids tree dependencies and conflicts post rc1 and
           allows all other architectures which work on generic VDSO support
           to work from a common upstream base.
      
         - A trivial comment fix"
      
      * tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        time: Delete repeated words in comments
        lib/vdso: Allow to add architecture-specific vdso data
        timekeeping/vsyscall: Provide vdso_update_begin/end()
        vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()
      b923f124
    • L
      Merge tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b6b178e3
      Linus Torvalds 提交于
      Pull more timer updates from Thomas Gleixner:
       "A set of posix CPU timer changes which allows to defer the heavy work
        of posix CPU timers into task work context. The tick interrupt is
        reduced to a quick check which queues the work which is doing the
        heavy lifting before returning to user space or going back to guest
        mode. Moving this out is deferring the signal delivery slightly but
        posix CPU timers are inaccurate by nature as they depend on the tick
        so there is no real damage. The relevant test cases all passed.
      
        This lifts the last offender for RT out of the hard interrupt context
        tick handler, but it also has the general benefit that the actual
        heavy work is accounted to the task/process and not to the tick
        interrupt itself.
      
        Further optimizations are possible to break long sighand lock hold and
        interrupt disabled (on !RT kernels) times when a massive amount of
        posix CPU timers (which are unpriviledged) is armed for a
        task/process.
      
        This is currently only enabled for x86 because the architecture has to
        ensure that task work is handled in KVM before entering a guest, which
        was just established for x86 with the new common entry/exit code which
        got merged post 5.8 and is not the case for other KVM architectures"
      
      * tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Select POSIX_CPU_TIMERS_TASK_WORK
        posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
        posix-cpu-timers: Split run_posix_cpu_timers()
      b6b178e3
    • L
      Merge tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d229a65
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
       "Two fixes in the core interrupt code which ensure that all error exits
        unlock the descriptor lock"
      
      * tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Unlock irq descriptor after errors
        genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
      1d229a65
    • L
      Merge tag 'for-linus' of git://github.com/openrisc/linux · e1d74fbe
      Linus Torvalds 提交于
      Pull OpenRISC updates from Stafford Horne:
       "A few patches all over the place during this cycle, mostly bug and
        sparse warning fixes for OpenRISC, but a few enhancements too. Note,
        there are 2 non OpenRISC specific fixups.
      
        Non OpenRISC fixes:
      
         - In init we need to align the init_task correctly to fix an issue
           with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
           kept it on my tree.
      
         - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
           Arnd asked to merge it via my tree.
      
        OpenRISC fixes:
      
         - Many fixes for OpenRISC sprase warnings.
      
         - Add support OpenRISC SMP tlb flushing rather than always flushing
           the entire TLB on every CPU.
      
         - Fix bug when dumping stack via /proc/xxx/stack of user threads"
      
      * tag 'for-linus' of git://github.com/openrisc/linux:
        openrisc: uaccess: Add user address space check to access_ok
        openrisc: signal: Fix sparse address space warnings
        openrisc: uaccess: Remove unused macro __addr_ok
        openrisc: uaccess: Use static inline function in access_ok
        openrisc: uaccess: Fix sparse address space warnings
        openrisc: io: Fixup defines and move include to the end
        asm-generic/io.h: Fix sparse warnings on big-endian architectures
        openrisc: Implement proper SMP tlb flushing
        openrisc: Fix oops caused when dumping stack
        openrisc: Add support for external initrd images
        init: Align init_task to avoid conflict with MUTEX_FLAGS
        openrisc: fix __user in raw_copy_to_user()'s prototype
      e1d74fbe
    • L
      Merge tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7fca4dee
      Linus Torvalds 提交于
      Pull powerpc fix from Michael Ellerman:
       "One fix for a boot crash on some platforms introduced by the recent
        pkey refactoring.
      
        Thanks to Christian Zigotzky and Aneesh Kumar K.V"
      
      * tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne X1000)
      7fca4dee
    • L
      Merge tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 0520058d
      Linus Torvalds 提交于
      Pull more xen updates from Juergen Gross:
      
       - Remove support for running as 32-bit Xen PV-guest.
      
         32-bit PV guests are rarely used, are lacking security fixes for
         Meltdown, and can be easily replaced by PVH mode. Another series for
         doing more cleanup will follow soon (removal of 32-bit-only pvops
         functionality).
      
       - Fixes and additional features for the Xen display frontend driver.
      
      * tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        drm/xen-front: Pass dumb buffer data offset to the backend
        xen: Sync up with the canonical protocol definition in Xen
        drm/xen-front: Add YUYV to supported formats
        drm/xen-front: Fix misused IS_ERR_OR_NULL checks
        xen/gntdev: Fix dmabuf import with non-zero sgt offset
        x86/xen: drop tests for highmem in pv code
        x86/xen: eliminate xen-asm_64.S
        x86/xen: remove 32-bit Xen PV guest support
      0520058d
    • L
      Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · cd94257d
      Linus Torvalds 提交于
      Pull hyper-v fixes from Wei Liu:
      
       - fix oops reporting on Hyper-V
      
       - make objtool happy
      
      * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/hyperv: Make hv_setup_sched_clock inline
        Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops
      cd94257d
    • E
      x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task · 8ab49526
      Eric Dumazet 提交于
      syzbot found its way in 86_fsgsbase_read_task() and triggered this oops:
      
         KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
         CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
         RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
         Call Trace:
           putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
           genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
           genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
           copy_regset_from_user include/linux/regset.h:326 [inline]
           ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
           compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
           __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
           __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
           __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
           do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
           __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
           do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
           entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
      
      This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
      or GS for a task with no LDT and something tries to read the base before
      a return to usermode notices the bad selector and fixes it.
      
      The fix is to make sure ldt pointer is not NULL.
      
      Fixes: 07e1d88a ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
      Co-developed-by: NJann Horn <jannh@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ab49526