1. 17 Oct, 2020 40 commits
    • J
      io-wq: assign NUMA node locality if appropriate · a8b595b2
      Jens Axboe committed
      There was an assumption that kthread_create_on_node() would properly set
      NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we
      do this when creating an io-wq context on NUMA.
      
      Cc: stable@vger.kernel.org
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a8b595b2
    • J
      io_uring: fix error path cleanup in io_sqe_files_register() · 55cbc256
      Jens Axboe committed
      syzbot reports the following crash:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 1 PID: 8927 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
      RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
      RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
      RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
      Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
      RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
      RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
      RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
      R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
      FS:  00007f4266f3c700(0000) GS:ffff8880ae500000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000118c000 CR3: 000000008e57d000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45de59
      Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f4266f3bc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
      RAX: ffffffffffffffda RBX: 00000000000083c0 RCX: 000000000045de59
      RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000005
      RBP: 000000000118bf68 R08: 0000000000000000 R09: 0000000000000000
      R10: 40000000000000a1 R11: 0000000000000246 R12: 000000000118bf2c
      R13: 00007fff2fa4f12f R14: 00007f4266f3c9c0 R15: 000000000118bf2c
      Modules linked in:
      ---[ end trace 2a40a195e2d5e6e6 ]---
      RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
      RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
      RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
      RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
      Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
      RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
      RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
      RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
      R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
      FS:  00007f4266f3c700(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000074a918 CR3: 000000008e57d000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is a copy of fget failure condition jumping to cleanup, but the
      cleanup requires ctx->file_data to be assigned. Assign it at setup time,
      and ensure that we clear it again for the error path exit.
      
      Fixes: 5398ae69 ("io_uring: clean file_data access in files_register")
      Reported-by: syzbot+f4ebcc98223dafd8991e@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      55cbc256
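The fix described above (publish the pointer at setup, clear it again on the error exit) can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the kernel code: `struct ctx`, `struct file_data`, `files_register()`, `cleanup_files()` and the `fail_at` parameter are hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real io_uring types. */
struct file_data {
    int *table;
    unsigned nr;
};

struct ctx {
    struct file_data *file_data;
};

/* Shared cleanup helper: it reaches the table through ctx->file_data,
 * so that pointer must already be assigned when the error path runs. */
static void cleanup_files(struct ctx *ctx)
{
    assert(ctx->file_data != NULL);
    free(ctx->file_data->table);
    free(ctx->file_data);
}

/* Returns 0 on success, -1 on failure. fail_at simulates an fget()-style
 * failure at the given index (pass a value >= nr for no failure). */
static int files_register(struct ctx *ctx, unsigned nr, unsigned fail_at)
{
    struct file_data *data = calloc(1, sizeof(*data));
    if (!data)
        return -1;

    data->table = calloc(nr, sizeof(*data->table));
    if (!data->table) {
        free(data);
        return -1;
    }
    data->nr = nr;

    /* Assign at setup so the error path below can use the shared cleanup. */
    ctx->file_data = data;

    for (unsigned i = 0; i < nr; i++) {
        if (i == fail_at)
            goto out_err;
        data->table[i] = (int)i;
    }
    return 0;

out_err:
    cleanup_files(ctx);
    /* Clear again so no dangling pointer survives the failed setup. */
    ctx->file_data = NULL;
    return -1;
}
```

If the assignment to `ctx->file_data` were left until after the loop, the `out_err` path would hand a NULL pointer to the cleanup helper, which is the shape of the crash in the report above.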
    • J
      Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly" · 0918682b
      Jens Axboe committed
      This reverts commit 738277ad.
      
      This change didn't make a lot of sense, and as Linus reports, it actually
      fails on clang:
      
         /tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
         attributes for .data..read_mostly
      
      The arrays are already marked const so, by definition, they are not
      just read-mostly, they are read-only.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0918682b
    • P
      io_uring: fix REQ_F_COMP_LOCKED by killing it · 216578e5
      Pavel Begunkov committed
      REQ_F_COMP_LOCKED is used and implemented in a buggy way. The problem is
      that the flag is set before io_put_req() but not cleared after, and if
      that wasn't the final reference, the request will be freed with the flag
      set from some other context, which may not hold a spinlock. That means
      possible races with removing linked timeouts and unsynchronised
      completion (e.g. access to CQ).
      
      Instead of fixing REQ_F_COMP_LOCKED, kill the flag and use
      task_work_add() to move such requests to a fresh context to free from
      it, as was done with __io_free_req_finish().
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      216578e5
    • P
      io_uring: dig out COMP_LOCK from deep call chain · 4edf20f9
      Pavel Begunkov committed
      io_req_clean_work() checks REQ_F_COMP_LOCKED to pass this two layers up.
      Move the check up into __io_free_req(), so at least it doesn't look so
      ugly and facilitates further changes.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4edf20f9
    • P
      io_uring: don't put a poll req under spinlock · 6a0af224
      Pavel Begunkov committed
      Move io_put_req() in io_poll_task_handler() from under spinlock. This
      eliminates the need to use REQ_F_COMP_LOCKED, at the expense of
      potentially having to grab the lock again. That's still a better trade
      off than relying on the locked flag.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6a0af224
    • P
      io_uring: don't unnecessarily clear F_LINK_TIMEOUT · b1b74cfc
      Pavel Begunkov committed
      If a request had REQ_F_LINK_TIMEOUT it would've been cleared in
      __io_kill_linked_timeout() by the time of __io_fail_links(), so no need
      to care about it.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b1b74cfc
    • P
      io_uring: don't set COMP_LOCKED if won't put · 368c5481
      Pavel Begunkov committed
      __io_kill_linked_timeout() sets REQ_F_COMP_LOCKED for a linked timeout
      even if it can't cancel it, e.g. it's already running. It not only races
      with io_link_timeout_fn() for ->flags field, but also leaves the flag
      set and so io_link_timeout_fn() may find it and decide that it holds the
      lock. Hopefully, the second problem is only a potential one.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      368c5481
    • C
      io_uring: Fix sizeof() mismatch · 035fbafc
      Colin Ian King committed
      An incorrect sizeof() is being used: sizeof(file_data->table) is not
      correct, it should be sizeof(*file_data->table).
      
      Fixes: 5398ae69 ("io_uring: clean file_data access in files_register")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      035fbafc
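The difference between the two sizeof() forms named in this commit can be shown with a small userspace sketch. It assumes a hypothetical element type, `struct table_entry`, deliberately made larger than a pointer so that the mismatch is visible; the helper names are invented for the example and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type, made larger than a pointer to expose the bug. */
struct table_entry {
    void *files;
    unsigned long flags;
};

struct file_data {
    struct table_entry *table;
};

/* Bytes per element when the pointer itself is measured (the mismatch). */
static size_t elem_size_wrong(const struct file_data *fd)
{
    return sizeof(fd->table);
}

/* Bytes per element when the pointed-to object is measured (the fix). */
static size_t elem_size_right(const struct file_data *fd)
{
    return sizeof(*fd->table);
}

/* Allocate nr zeroed entries using the correct idiom. */
static int table_alloc(struct file_data *fd, size_t nr)
{
    assert(fd != NULL);
    fd->table = calloc(nr, sizeof(*fd->table));
    return fd->table ? 0 : -1;
}
```

When the element happens to be pointer-sized, both forms yield the same number and the code works by accident, which is consistent with the checker reporting the pattern as "not portable" rather than as an outright overflow.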
    • L
      Merge tag 'ovl-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 071a0578
      Linus Torvalds committed
      Pull overlayfs updates from Miklos Szeredi:
      
       - Improve performance for certain container setups by introducing a
         "volatile" mode
      
       - ioctl improvements
      
       - continue preparation for unprivileged overlay mounts
      
      * tag 'ovl-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: use generic vfs_ioc_setflags_prepare() helper
        ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories
        ovl: rearrange ovl_can_list()
        ovl: enumerate private xattrs
        ovl: pass ovl_fs down to functions accessing private xattrs
        ovl: drop flags argument from ovl_do_setxattr()
        ovl: adhere to the vfs_ vs. ovl_do_ conventions for xattrs
        ovl: use ovl_do_getxattr() for private xattr
        ovl: fold ovl_getxattr() into ovl_get_redirect_xattr()
        ovl: clean up ovl_getxattr() in copy_up.c
        duplicate ovl_getxattr()
        ovl: provide a mount option "volatile"
        ovl: check for incompatible features in work dir
      071a0578
    • L
      Merge tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · fad70111
      Linus Torvalds committed
      Pull afs updates from David Howells:
       "A collection of fixes to fix afs_cell struct refcounting, thereby
        fixing a slew of related syzbot bugs:
      
         - Fix the cell tree in the netns to use an rwsem rather than RCU.
      
           There seem to be some problems deriving from the use of RCU and a
           seqlock to walk the rbtree, but it's not entirely clear what since
           there are several different failures being seen.
      
           Changing things to use an rwsem instead makes it more robust. The
           extra performance derived from using RCU isn't necessary in this
           case since the only time we're looking up a cell is during mount or
           when cells are being manually added.
      
         - Fix the refcounting by splitting the usage counter into a memory
           refcount and an active users counter. The usage counter was doing
           double duty, keeping track of whether a cell is still in use and
           keeping track of when it needs to be destroyed - but this makes the
           clean up tricky. Separating these out simplifies the logic.
      
         - Fix purging a cell that has an alias. A cell alias pins the cell
           it's an alias of, but the alias is always later in the list. Trying
           to purge in a single pass causes rmmod to hang in such a case.
      
         - Fix cell removal. If a cell's manager is requeued whilst it's
           removing itself, the manager will run again and re-remove itself,
           causing problems in various places. Follow Hillf Danton's
           suggestion to insert a more terminal state that causes the manager
           to do nothing post-removal.
      
        In addition to the above, two other changes:
      
         - Add a tracepoint for the cell refcount and active users count. This
           helped with debugging the above and may be useful again in future.
      
         - Downgrade an assertion to a print when a still-active server is
           seen during purging. This was happening as a consequence of
           incomplete cell removal before the servers were cleaned up"
      
      * tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Don't assert on unpurgeable server records
        afs: Add tracing for cell refcount and active user count
        afs: Fix cell removal
        afs: Fix cell purging with aliases
        afs: Fix cell refcounting by splitting the usage counter
        afs: Fix rapid cell addition/removal by not using RCU on cells tree
      fad70111
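The refcounting change in this pull (one counter that keeps the memory alive, a second that tracks active users) can be sketched as follows. This is a minimal userspace illustration using C11 atomics, not the afs implementation: `struct cell`, its fields and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cell record; the field names are illustrative only. */
struct cell {
    atomic_int ref;     /* keeps the memory alive */
    atomic_int active;  /* counts current users of the cell */
    bool removal_due;   /* set once the last active user has gone */
};

static struct cell *cell_alloc(void)
{
    struct cell *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    atomic_init(&c->ref, 1);    /* the creator holds one memory reference */
    atomic_init(&c->active, 0);
    return c;
}

static void cell_get(struct cell *c)
{
    atomic_fetch_add(&c->ref, 1);
}

/* Drop a memory reference; returns true if the cell was freed. */
static bool cell_put(struct cell *c)
{
    if (atomic_fetch_sub(&c->ref, 1) == 1) {
        assert(atomic_load(&c->active) == 0);
        free(c);
        return true;
    }
    return false;
}

/* Start using the cell: pin the memory, then count as an active user. */
static void cell_use(struct cell *c)
{
    cell_get(c);
    atomic_fetch_add(&c->active, 1);
}

/* Stop using the cell. The last active user only flags the cell for
 * removal; the memory stays valid until the final cell_put().
 * Returns true if this call freed the cell. */
static bool cell_unuse(struct cell *c)
{
    if (atomic_fetch_sub(&c->active, 1) == 1)
        c->removal_due = true;
    return cell_put(c);
}
```

With the two roles separated, "nobody is using this cell" and "nobody can still touch this memory" become distinct events, so teardown logic no longer has to infer both from one number.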
    • L
      Merge tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 7a3daded
      Linus Torvalds committed
      Pull f2fs updates from Jaegeuk Kim:
       "In this round, we've added new features such as zone capacity for ZNS
        and a new GC policy, ATGC, along with in-memory segment management. In
        addition, we could improve the decompression speed significantly by
        changing virtual mapping method. Even though we've fixed lots of small
        bugs in compression support, I feel that it becomes more stable so
        that I could give it a try in production.
      
        Enhancements:
         - support zone capacity in NVMe Zoned Namespace devices
         - introduce in-memory current segment management
         - add standard casefolding support
         - support age threshold based garbage collection
         - improve decompression speed by changing virtual mapping method
      
        Bug fixes:
         - fix condition checks in some ioctl() such as compression, move_range, etc
         - fix 32/64bits support in data structures
         - fix memory allocation in zstd decompress
         - add some boundary checks to avoid kernel panic on corrupted image
         - fix disallowing compression for non-empty file
         - fix slab leakage of compressed block writes
      
        In addition, it includes code refactoring for better readability and
        minor bug fixes for compression and zoned device support"
      
      * tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
        f2fs: code cleanup by removing unnecessary check
        f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
        f2fs: fix writecount false positive in releasing compress blocks
        f2fs: introduce check_swap_activate_fast()
        f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
        f2fs: handle errors of f2fs_get_meta_page_nofail
        f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
        f2fs: reject CASEFOLD inode flag without casefold feature
        f2fs: fix memory alignment to support 32bit
        f2fs: fix slab leak of rpages pointer
        f2fs: compress: fix to disallow enabling compress on non-empty file
        f2fs: compress: introduce cic/dic slab cache
        f2fs: compress: introduce page array slab cache
        f2fs: fix to do sanity check on segment/section count
        f2fs: fix to check segment boundary during SIT page readahead
        f2fs: fix uninit-value in f2fs_lookup
        f2fs: remove unneeded parameter in find_in_block()
        f2fs: fix wrong total_sections check and fsmeta check
        f2fs: remove duplicated code in sanity_check_area_boundary
        f2fs: remove unused check on version_bitmap
        ...
      7a3daded
    • L
      Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 54a4c789
      Linus Torvalds committed
      Pull documentation updates from Mauro Carvalho Chehab:
       "A series of patches addressing warnings produced by make htmldocs.
        This includes:
      
         - kernel-doc markup fixes
      
         - ReST fixes
      
         - Updates to the build system in order to support newer versions of
           the docs build toolchain (Sphinx)
      
        After this series, the number of html build warnings should reduce
        significantly, and building with Sphinx 3.1 or later should now be
        supported (although it is still recommended to use Sphinx 2.4.4).
      
        As agreed with Jon, I should be sending you a late pull request by the
        end of the merge window addressing remaining issues with docs build,
        as there are a number of warning fixes that depend on pull requests
        that should be happening along the merge window.
      
        The end goal is to have a clean htmldocs build on Kernel 5.10.
      
        PS. It should be noted that Sphinx 3.0 is not currently supported,
        as it lacks support for C domain namespaces. That feature, needed in
        order to document uAPI system calls with Sphinx 3.x, was added only in
        Sphinx 3.1"
      
      * tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (75 commits)
        PM / devfreq: remove a duplicated kernel-doc markup
        mm/doc: fix a literal block markup
        workqueue: fix a kernel-doc warning
        docs: virt: user_mode_linux_howto_v2.rst: fix a literal block markup
        Input: sparse-keymap: add a description for @sw
        rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu
        nl80211: docs: add a description for s1g_cap parameter
        usb: docs: document altmode register/unregister functions
        kunit: test.h: fix a bad kernel-doc markup
        drivers: core: fix kernel-doc markup for dev_err_probe()
        docs: bio: fix a kerneldoc markup
        kunit: test.h: solve kernel-doc warnings
        block: bio: fix a warning at the kernel-doc markups
        docs: powerpc: syscall64-abi.rst: fix a malformed table
        drivers: net: hamradio: fix document location
        net: appletalk: Kconfig: Fix docs location
        dt-bindings: fix references to files converted to yaml
        memblock: get rid of a :c:type leftover
        math64.h: kernel-docs: Convert some markups into normal comments
        media: uAPI: buffer.rst: remove a left-over documentation
        ...
      54a4c789
    • L
      Merge tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 93f3d8f5
      Linus Torvalds committed
      Pull tracing fix from Steven Rostedt:
       "Fix mismatch section of adding early trace events.
      
        Fixes the issue of a mismatch section that was missed due to gcc
        inlining the offending function, while clang did not (and reported the
        issue)"
      
      * tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Remove __init from __trace_early_add_new_event()
      93f3d8f5
    • L
      Merge tag 'printk-for-5.10-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 8119c433
      Linus Torvalds committed
      Pull printk fix from Petr Mladek:
       "Prevent overflow in the new lockless ringbuffer"
      
      * tag 'printk-for-5.10-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: ringbuffer: Wrong data pointer when appending small string
      8119c433
    • L
      Merge tag 'kgdb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · 49dc6fbc
      Linus Torvalds committed
      Pull kgdb updates from Daniel Thompson:
       "A fairly modest set of changes for this cycle.
      
        Of particular note are an earlycon fix from Doug Anderson and my own
        changes to get kgdb/kdb to honour the kprobe blocklist. The latter
        creates a safety rail that strongly encourages developers not to place
        breakpoints in, for example, arch specific trap handling code.
      
        Also included are a couple of small fixes and tweaks: an API update,
        eliminate a coverity dead code warning, improved handling of search
        during multi-line printk and a couple of typo corrections"
      
      * tag 'kgdb-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Fix pager search for multi-line strings
        kernel: debug: Centralize dbg_[de]activate_sw_breakpoints
        kgdb: Add NOKPROBE labels on the trap handler functions
        kgdb: Honour the kprobe blocklist when setting breakpoints
        kernel/debug: Fix spelling mistake in debug_core.c
        kdb: Use newer api for tasklist scanning
        kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
        kdb: remove unnecessary null check of dbg_io_ops
      49dc6fbc
    • L
      Merge tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 09a31a7e
      Linus Torvalds committed
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - removed support for PNX833x alias NXT_STB22x
      
       - included Ingenic SoC support into generic MIPS kernels
      
       - added support for new Ingenic SoCs
      
       - converted workaround selection to use Kconfig
      
       - replaced old boot mem functions by memblock_*
      
       - enabled COP2 usage in kernel for Loongson64 to make use
         of 16byte load/stores possible
      
       - cleanups and fixes
      
      * tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (92 commits)
        MIPS: DEC: Restore bootmem reservation for firmware working memory area
        MIPS: dec: fix section mismatch
        bcm963xx_tag.h: fix duplicated word
        mips: ralink: enable zboot support
        MIPS: ingenic: Remove CPU_SUPPORTS_HUGEPAGES
        MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit
        MIPS: cpu-probe: introduce exclusive R3k CPU probe
        MIPS: cpu-probe: move fpu probing/handling into its own file
        MIPS: replace add_memory_region with memblock
        MIPS: Loongson64: Clean up numa.c
        MIPS: Loongson64: Select SMP in Kconfig to avoid build error
        mips: octeon: Add Ubiquiti E200 and E220 boards
        MIPS: SGI-IP28: disable use of ll/sc in kernel
        MIPS: tx49xx: move tx4939_add_memory_regions into only user
        MIPS: pgtable: Remove used PAGE_USERIO define
        MIPS: alchemy: Share prom_init implementation
        MIPS: alchemy: Fix build breakage, if TOUCHSCREEN_WM97XX is disabled
        MIPS: process: include exec.h header in process.c
        MIPS: process: Add prototype for function arch_dup_task_struct
        MIPS: idle: Add prototype for function check_wait
        ...
      09a31a7e
    • L
      Merge tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 847d4287
      Linus Torvalds committed
      Pull s390 updates from Vasily Gorbik:
      
       - Remove address space overrides using set_fs()
      
       - Convert to generic vDSO
      
       - Convert to generic page table dumper
      
       - Add ARCH_HAS_DEBUG_WX support
      
       - Add leap seconds handling support
      
       - Add NVMe firmware-assisted kernel dump support
      
       - Extend NVMe boot support with memory clearing control and addition of
         kernel parameters
      
       - AP bus and zcrypt api code rework. Add adapter configure/deconfigure
         interface. Extend debug features. Add failure injection support
      
       - Add ECC secure private keys support
      
       - Add KASan support for running protected virtualization host with
         4-level paging
      
       - Utilize destroy page ultravisor call to speed up secure guests
         shutdown
      
       - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code
      
       - Various checksum improvements
      
       - Other small various fixes and improvements all over the code
      
      * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
        s390/uaccess: fix indentation
        s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
        s390/zcrypt: fix wrong format specifications
        s390/kprobes: move insn_page to text segment
        s390/sie: fix typo in SIGP code description
        s390/lib: fix kernel doc for memcmp()
        s390/zcrypt: Introduce Failure Injection feature
        s390/zcrypt: move ap_msg param one level up the call chain
        s390/ap/zcrypt: revisit ap and zcrypt error handling
        s390/ap: Support AP card SCLP config and deconfig operations
        s390/sclp: Add support for SCLP AP adapter config/deconfig
        s390/ap: add card/queue deconfig state
        s390/ap: add error response code field for ap queue devices
        s390/ap: split ap queue state machine state from device state
        s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
        s390/zcrypt: introduce msg tracking in zcrypt functions
        s390/startup: correct early pgm check info formatting
        s390: remove orphaned extern variables declarations
        s390/kasan: make sure int handler always run with DAT on
        s390/ipl: add support to control memory clearing for nvme re-IPL
        ...
      847d4287
    • L
      Merge tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 96685f86
      Linus Torvalds committed
      Pull powerpc updates from Michael Ellerman:
      
       - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
         it for powerpc, as well as a related fix for sparc.
      
       - Remove support for PowerPC 601.
      
       - Some fixes for watchpoints & addition of a new ptrace flag for
         detecting ISA v3.1 (Power10) watchpoint features.
      
       - A fix for kernels using 4K pages and the hash MMU on bare metal
         Power9 systems with > 16TB of RAM, or RAM on the 2nd node.
      
       - A basic idle driver for shallow stop states on Power10.
      
       - Tweaks to our sched domains code to better inform the scheduler about
         the hardware topology on Power9/10, where two SMT4 cores can be
         presented by firmware as an SMT8 core.
      
       - A series doing further reworks & cleanups of our EEH code.
      
       - Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
         to prevent root from overwriting kernel memory.
      
       - Other smaller features, fixes & cleanups.
      
      Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
      Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
      Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
      Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
      Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
      Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
      Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
      Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
      Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
      Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
      Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
      Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
      Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
      Yingliang, zhengbin.
      
      * tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
        Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
        selftests/powerpc: Fix eeh-basic.sh exit codes
        cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
        powerpc/time: Make get_tb() common to PPC32 and PPC64
        powerpc/time: Make get_tbl() common to PPC32 and PPC64
        powerpc/time: Remove get_tbu()
        powerpc/time: Avoid using get_tbl() and get_tbu() internally
        powerpc/time: Make mftb() common to PPC32 and PPC64
        powerpc/time: Rename mftbl() to mftb()
        powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
        powerpc/32s: Rename head_32.S to head_book3s_32.S
        powerpc/32s: Setup the early hash table at all time.
        powerpc/time: Remove ifdef in get_dec() and set_dec()
        powerpc: Remove get_tb_or_rtc()
        powerpc: Remove __USE_RTC()
        powerpc: Tidy up a bit after removal of PowerPC 601.
        powerpc: Remove support for PowerPC 601
        powerpc: Remove PowerPC 601
        powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
        powerpc: Remove CONFIG_PPC601_SYNC_FIX
        ...
      96685f86
    • L
      Merge branch 'akpm' (patches from Andrew) · c4cf498d
      Linus Torvalds committed
      Merge more updates from Andrew Morton:
       "155 patches.
      
        Subsystems affected by this patch series: mm (dax, debug, thp,
        readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
        core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
        binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
        romfs, and fault-injection"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
        lib, uaccess: add failure injection to usercopy functions
        lib, include/linux: add usercopy failure capability
        ROMFS: support inode blocks calculation
        ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
        sched.h: drop in_ubsan field when UBSAN is in trap mode
        scripts/gdb/tasks: add headers and improve spacing format
        scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
        kernel/relay.c: drop unneeded initialization
        panic: dump registers on panic_on_warn
        rapidio: fix the missed put_device() for rio_mport_add_riodev
        rapidio: fix error handling path
        nilfs2: fix some kernel-doc warnings for nilfs2
        autofs: harden ioctl table
        ramfs: fix nommu mmap with gaps in the page cache
        mm: remove the now-unnecessary mmget_still_valid() hack
        mm/gup: take mmap_lock in get_dump_page()
        binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
        coredump: rework elf/elf_fdpic vma_dump_size() into common helper
        coredump: refactor page range dumping into common helper
        coredump: let dump_emit() bail out on short writes
        ...
      c4cf498d
    • A
      lib, uaccess: add failure injection to usercopy functions · 4d0e9df5
      Albert van der Linde committed
      To test fault-tolerance of user memory access functions, introduce fault
      injection to usercopy functions.
      
      If a failure is expected, return either -EFAULT or the total number of
      bytes that were not copied.
      Signed-off-by: Albert van der Linde <alinde@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Marco Elver <elver@google.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20200831171733.955393-3-alinde@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d0e9df5
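The behaviour described in this commit (a hook decides whether a copy should fail, and a failed copy reports the bytes that were not copied) can be sketched in userspace. This is an illustration under stated assumptions, not the kernel implementation: `fail_budget`, `should_fail_copy()` and `copy_checked()` are hypothetical names, and memcpy() stands in for the real user-memory access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test knob: fail the next N copy attempts. */
static unsigned int fail_budget;

/* Userspace stand-in for a should_fail_usercopy()-style hook. */
static bool should_fail_copy(void)
{
    if (fail_budget == 0)
        return false;
    fail_budget--;
    return true;
}

/* Copy n bytes and return the number of bytes NOT copied (0 on success),
 * following the return convention the commit message describes. */
static size_t copy_checked(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (should_fail_copy())
        return n;       /* injected failure: nothing was copied */
    memcpy(dst, src, n);
    return 0;
}
```

A caller that ignores the return value would carry on with an unmodified destination buffer after an injected failure, which is the kind of missing error handling this sort of fault injection is meant to surface.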
    • A
      lib, include/linux: add usercopy failure capability · 2c739ced
      Albert van der Linde committed
      Patch series "add fault injection to user memory access", v3.
      
      The goal of this series is to improve testing of fault-tolerance in usages
      of user memory access functions, by adding support for fault injection.
      
      syzkaller/syzbot are using the existing fault injection modes and will use
      this particular feature also.
      
      The first patch adds failure injection capability for usercopy functions.
      The second changes usercopy functions to use this new failure capability
      (copy_from_user, ...).  The third patch adds get/put/clear_user failures
      to x86.
      
      This patch (of 3):
      
      Add a failure injection capability to improve testing of fault-tolerance
      in usages of user memory access functions.
      
      Add CONFIG_FAULT_INJECTION_USERCOPY to enable faults in usercopy
      functions.  The should_fail_usercopy function is to be called by these
      functions (copy_from_user, get_user, ...) in order to fail or not.
      Signed-off-by: Albert van der Linde <alinde@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20200831171733.955393-1-alinde@google.com
      Link: http://lkml.kernel.org/r/20200831171733.955393-2-alinde@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2c739ced
    • L
      ROMFS: support inode blocks calculation · d9bc85de
      Libing Zhou committed
      When the 'stat' tool is used to display file status, the 'Blocks' field
      is always '0'.  This is a problem for tools such as 'du' (e.g. busybox
      'du'), which always report a size of '0' for files under ROMFS because
      they calculate sizes from the number of 512B blocks.
      
      This patch calculates the approximate number of 512B blocks based on the
      inode size.
      Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: http://lkml.kernel.org/r/20200811052606.4243-1-libing.zhou@nokia-sbell.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9bc85de
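The block count mentioned in this commit amounts to rounding a byte size up to whole 512-byte units. A minimal sketch of that arithmetic follows; the helper name `size_to_512b_blocks()` is an assumption made for this example and the exact expression used by the patch is not shown in the commit message.

```c
#include <stdint.h>

/*
 * Round a size in bytes up to whole 512-byte blocks.
 * Written as divide-plus-remainder so it cannot overflow near UINT64_MAX.
 */
uint64_t size_to_512b_blocks(uint64_t size_bytes)
{
    return size_bytes / 512 + (size_bytes % 512 != 0);
}
```

With this rounding, a 1-byte file reports one block rather than zero, which is what tools that sum block counts rely on.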
    • G
      ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang · 6a6155f6
      George Popescu committed
      When the kernel is compiled with Clang, -fsanitize=bounds expands to
      -fsanitize=array-bounds and -fsanitize=local-bounds.
      
      Enabling -fsanitize=local-bounds with Clang has the unfortunate
      side-effect of inserting traps; this goes back to its original intent,
      which was as a hardening and not a debugging feature [1].  The same
      feature made its way into -fsanitize=bounds, but the traps remained.  For
      that reason, -fsanitize=bounds was split into 'array-bounds' and
      'local-bounds' [2].
      
      Since 'local-bounds' doesn't behave like a normal sanitizer, enable it
      with Clang only if trapping behaviour was requested by
      CONFIG_UBSAN_TRAP=y.
      
      Add the UBSAN_LOCAL_BOUNDS config to Kconfig.ubsan to enable the
      'local-bounds' option by default when UBSAN_TRAP is enabled.
      
      [1] http://lists.llvm.org/pipermail/llvm-dev/2012-May/049972.html
      [2] http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20131021/091536.html
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: George Popescu <georgepope@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Brazdil <dbrazdil@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: https://lkml.kernel.org/r/20200922074330.2549523-1-georgepope@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a6155f6
    • E
      sched.h: drop in_ubsan field when UBSAN is in trap mode · 5cf53f3c
      Elena Petrova committed
      The in_ubsan field of task_struct is only used in lib/ubsan.c, which in
      turn is only used under `ifneq ($(CONFIG_UBSAN_TRAP),y)`.
      
      Removing the unnecessary field from task_struct will help preserve the ABI
      between vanilla and CONFIG_UBSAN_TRAP'ed kernels.  In particular, this
      will help enable the bounds sanitizer transparently for Android's GKI.
      Signed-off-by: Elena Petrova <lenaptr@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Link: https://lkml.kernel.org/r/20200910134802.3160311-1-lenaptr@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5cf53f3c
    • R
      scripts/gdb/tasks: add headers and improve spacing format · 4fbe310e
      Ritesh Harjani committed
      With the patch (example output):
            TASK          PID    COMM
      0xffffffff82c2b8c0   0   swapper/0
      0xffff888a0ba20040   1   systemd
      0xffff888a0ba24040   2   kthreadd
      0xffff888a0ba28040   3   rcu_gp
      
      Without the patch:
      0xffffffff82c2b8c0 <init_task> 0 swapper/0
      0xffff888a0ba20040 1 systemd
      0xffff888a0ba24040 2 kthreadd
      0xffff888a0ba28040 3 rcu_gp
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Link: http://lkml.kernel.org/r/54c868c79b5fc364a8be7799891934a6fe6d1464.1597742951.git.riteshh@linux.ibm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4fbe310e
    • R
      scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command · 998ec76b
      Ritesh Harjani committed
      This is often useful when debugging filesystem-related issues.
      
      <e.g. output>
            mount          super_block     devname pathname fstype options
      0xffff888a0bfa4b40 0xffff888a0bfc1000 none / rootfs rw 0 0
      0xffff888a033f75c0 0xffff8889fcf65000 /dev/root / ext4 rw,relatime 0 0
      0xffff8889fc8ce040 0xffff888a0bb51000 devtmpfs /dev devtmpfs rw,relatime 0 0
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Link: http://lkml.kernel.org/r/a3c4177e1597b3e06d66d55e07d72c0c46a03571.1597742951.git.riteshh@linux.ibm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      998ec76b
    • S
      kernel/relay.c: drop unneeded initialization · ac05b7a1
      Sudip Mukherjee committed
      The variable 'consumed' is initialized with the consumed count but
      immediately after that the consumed count is updated and assigned to
      'consumed' again thus overwriting the previous value.  So, drop the
      unneeded initialization.
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20201005205727.1147-1-sudipm.mukherjee@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac05b7a1
    • A
      panic: dump registers on panic_on_warn · 3f388f28
      Alexey Kardashevskiy committed
      Currently we print the stack and registers for ordinary warnings, but not
      for panic_on_warn, which looks like an oversight: panic() will reboot the
      machine but won't print registers.
      
      This moves printing of registers and modules earlier.
      
      This does not move the stack dumping as panic() dumps it.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Link: https://lkml.kernel.org/r/20200804095054.68724-1-aik@ozlabs.ru
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f388f28
    • J
      rapidio: fix the missed put_device() for rio_mport_add_riodev · 85094c05
      Jing Xiangfeng committed
      rio_mport_add_riodev() fails to call put_device() when the device already
      exists.  Add the missing function call to fix it.
      
      Fixes: e8de3701 ("rapidio: add mport char device driver")
      Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Alexandre Bounine <alex.bou9@gmail.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Link: https://lkml.kernel.org/r/20200922072525.42330-1-jingxiangfeng@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      85094c05
    • S
      rapidio: fix error handling path · fa63f083
      Souptick Joarder committed
      rio_dma_transfer() attempts to clamp the return value of
      pin_user_pages_fast() to be >= 0.  However, the attempt fails because
      nr_pages is overridden a few lines later, and restored to the undesirable
      -ERRNO value.
      
      The return value is ultimately stored in nr_pages, which in turn is passed
      to unpin_user_pages(), which expects nr_pages >= 0, else, disaster.
      
      Fix this by fixing the nesting of the assignment to nr_pages: nr_pages
      should be clamped to zero if pin_user_pages_fast() returns -ERRNO, or set
      to the return value of pin_user_pages_fast(), otherwise.
      
      [jhubbard@nvidia.com: new changelog]
      
      Fixes: e8de3701 ("rapidio: add mport char device driver")
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Alexandre Bounine <alex.bou9@gmail.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Link: https://lkml.kernel.org/r/1600227737-20785-1-git-send-email-jrdr.linux@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa63f083
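The clamping rule in this changelog (zero on -ERRNO, otherwise the returned count) can be sketched in isolation. The helper `record_pin_result()` below is hypothetical and only models the assignment to nr_pages; it does not call pin_user_pages_fast() or reproduce the driver's control flow.

```c
#include <errno.h>

/*
 * Record the outcome of a pin call.
 * `pinned` is either the number of pages pinned (>= 0) or a negative errno.
 * `*nr_pages` always ends up >= 0, so it is safe to hand to an unpin routine.
 * Returns 0 on success or the negative errno on failure.
 */
int record_pin_result(long pinned, long *nr_pages)
{
    if (pinned < 0) {
        *nr_pages = 0;      /* nothing was pinned, so nothing to unpin */
        return (int)pinned;
    }
    *nr_pages = pinned;     /* unpin exactly what was pinned */
    return 0;
}
```

Keeping the error code and the page count in separate variables is one way to avoid the overwrite described above; whether the actual fix does this is not stated in the changelog.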
    • W
      nilfs2: fix some kernel-doc warnings for nilfs2 · 64ead520
      Wang Hai committed
      Fixes the following W=1 kernel build warning(s):
      
      fs/nilfs2/bmap.c:378: warning: Excess function parameter 'bhp' description in 'nilfs_bmap_assign'
      fs/nilfs2/cpfile.c:907: warning: Excess function parameter 'status' description in 'nilfs_cpfile_change_cpmode'
      fs/nilfs2/cpfile.c:946: warning: Excess function parameter 'stat' description in 'nilfs_cpfile_get_stat'
      fs/nilfs2/page.c:76: warning: Excess function parameter 'inode' description in 'nilfs_forget_buffer'
      fs/nilfs2/sufile.c:563: warning: Excess function parameter 'stat' description in 'nilfs_sufile_get_stat'
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1601386269-2423-1-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64ead520
    • M
      autofs: harden ioctl table · 589f6b52
      Matthew Wilcox committed
      The table of ioctl functions should be marked const in order to put them
      in read-only memory, and we should use array_index_nospec() to avoid
      speculation disclosing the contents of kernel memory to userspace.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Ian Kent <raven@themaw.net>
      Link: https://lkml.kernel.org/r/20200818122203.GO17456@casper.infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      589f6b52
    • M
      ramfs: fix nommu mmap with gaps in the page cache · 50b7d856
      Matthew Wilcox (Oracle) committed
      ramfs needs to check that pages are both physically contiguous and
      contiguous in the file.  If the page cache happens to have, e.g., page A for
      index 0 of the file, no page for index 1, and page A+1 for index 2, then
      an mmap of the first two pages of the file will succeed when it should
      fail.
      
      Fixes: 642fb4d1 ("[PATCH] NOMMU: Provide shared-writable mmap support on ramfs")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Link: https://lkml.kernel.org/r/20200914122239.GO6583@casper.infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50b7d856
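The two-part check this commit describes (contiguous in memory and contiguous in the file) can be modelled with a small sketch. `struct cached_page` and `range_is_contiguous()` are hypothetical names for this example; the real code works on page cache lookups, not on a plain array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one page cache hit: file index plus page frame number. */
struct cached_page {
    unsigned long index;
    unsigned long pfn;
};

/*
 * Returns true only if the lookup found exactly `wanted` pages, their file
 * indices run start, start+1, ... without gaps, and their frame numbers are
 * consecutive as well.
 */
bool range_is_contiguous(const struct cached_page *pages, size_t found,
                         unsigned long start, size_t wanted)
{
    if (found != wanted)
        return false;
    for (size_t i = 0; i < found; i++) {
        if (pages[i].index != start + i)
            return false;   /* gap in the file */
        if (pages[i].pfn != pages[0].pfn + i)
            return false;   /* not physically contiguous */
    }
    return true;
}
```

In the scenario from the changelog, a lookup returning page A at index 0 and page A+1 at index 2 passes the physical check but fails the index check, so the mapping is refused.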
    • J
      mm: remove the now-unnecessary mmget_still_valid() hack · 4d45e75a
      Jann Horn committed
      The preceding patches have ensured that core dumping properly takes the
      mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
      its users.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d45e75a
    • J
      mm/gup: take mmap_lock in get_dump_page() · 7f3bfab5
      Jann Horn committed
      Properly take the mmap_lock before calling into the GUP code from
      get_dump_page(); and play nice, allowing the GUP code to drop the
      mmap_lock if it has to sleep.
      
      As Linus pointed out, we don't actually need the VMA because
      __get_user_pages() will flush the dcache for us if necessary.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-7-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f3bfab5
    • J
      binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot · a07279c9
      Jann Horn committed
      In both binfmt_elf and binfmt_elf_fdpic, use a new helper
      dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
      VMA, if we have one) while protected by the mmap_lock, and then use that
      snapshot instead of walking the VMA list without locking.
      
      An alternative approach would be to keep the mmap_lock held across the
      entire core dumping operation; however, keeping the mmap_lock locked while
      we may be blocked for an unbounded amount of time (e.g.  because we're
      dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
      blocks things like the ->release handler of userfaultfd, and we don't
      really want critical system daemons to grind to a halt just because
      someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
      something like that.
      
      Since both the normal ELF code and the FDPIC ELF code need this
      functionality (and if any other binfmt wants to add coredump support in
      the future, they'd probably need it, too), implement this with a common
      helper in fs/coredump.c.
      
      A downside of this approach is that we now need a bigger amount of kernel
      memory per userspace VMA in the normal ELF case, and that we need O(n)
      kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
      be terribly bad.
      
      There currently is a data race between stack expansion and anything that
      reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
      mitigate that for core dumping, take the mmap_lock in write mode when
      taking a snapshot of the VMA hierarchy.  (If we only took the mmap_lock in
      read mode, we could end up with a corrupted core dump if someone does
      get_user_pages_remote() concurrently.  Not really a major problem, but
      taking the mmap_lock either way works here, so we might as well avoid the
      issue.) (This doesn't do anything about the existing data races with stack
      expansion in other mm code.)
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a07279c9
    • J
      coredump: rework elf/elf_fdpic vma_dump_size() into common helper · 429a22e7
      Jann Horn committed
      At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly
      different code to figure out which VMAs should be dumped, and if so,
      whether the dump should contain the entire VMA or just its first page.
      
      Eliminate duplicate code by reworking the binfmt_elf version into a
      generic core dumping helper in coredump.c.
      
      As part of that, change the heuristic for detecting executable/library
      header pages to check whether the inode is executable instead of looking
      at the file mode.
      
      This is less problematic in terms of locking because it lets us avoid
      get_user() under the mmap_sem.  (And arguably it looks nicer and makes
      more sense in generic code.)
      
      Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is
      only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA
      has been written to.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      429a22e7
    • J
      coredump: refactor page range dumping into common helper · afc63a97
      Jann Horn committed
      Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of
      pages into the coredump file.  Extract that logic into a common helper.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afc63a97
    • J
      coredump: let dump_emit() bail out on short writes · df0c09c0
      Jann Horn committed
      dump_emit() has a retry loop, but there seems to be no way for that retry
      logic to actually be used; and it was also buggy, writing the same data
      repeatedly after a short write.
      
      Let's just bail out on a short write.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-3-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df0c09c0
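The "bail out on a short write" policy from this commit can be shown with a small userspace sketch. Everything here is hypothetical: `write_fn`, `emit_or_bail()` and the two toy writers exist only for illustration and do not mirror the signature of dump_emit().

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical writer callback: returns bytes written, or a negative value on error. */
typedef ssize_t (*write_fn)(const void *buf, size_t len);

/*
 * Issue a single write and succeed only if it consumed the whole buffer.
 * There is deliberately no retry loop: a short write is treated as failure.
 */
bool emit_or_bail(write_fn do_write, const void *buf, size_t len)
{
    ssize_t n = do_write(buf, len);
    return n >= 0 && (size_t)n == len;
}

/* Toy writer that always accepts everything. */
ssize_t full_writer(const void *buf, size_t len)
{
    (void)buf;
    return (ssize_t)len;
}

/* Toy writer that accepts only half of the request. */
ssize_t short_writer(const void *buf, size_t len)
{
    (void)buf;
    return (ssize_t)(len / 2);
}
```

Dropping the retry loop also removes the risk the changelog mentions, where a retry would resend data that had already been written.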