1. 14 Dec, 2020 14 commits
  2. 12 Dec, 2020 1 commit
    • M
      proc: use untagged_addr() for pagemap_read addresses · 40d6366e
      Miles Chen committed
      When we try to visit the pagemap of a tagged userspace pointer, we find
      that the start_vaddr is not correct because of the tag.
      To fix it, we should untag the userspace pointers in pagemap_read().
      
      I tested with 5.10-rc4 and the issue remains.
      
      Explanation from Catalin in [1]:
      
       "Arguably, that's a user-space bug since tagged file offsets were never
        supported. In this case it's not even a tag at bit 56 as per the arm64
        tagged address ABI but rather down to bit 47. You could say that the
        problem is caused by the C library (malloc()) or whoever created the
        tagged vaddr and passed it to this function. It's not a kernel
        regression as we've never supported it.
      
        Now, pagemap is a special case where the offset is usually not
        generated as a classic file offset but rather derived by shifting a
        user virtual address. I guess we can make a concession for pagemap
        (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"
      
      My test code is based on [2]:
      
      A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8
      
      userspace program:
      
        uint64 OsLayer::VirtualToPhysical(void *vaddr) {
      	uint64 frame, paddr, pfnmask, pagemask;
      	int pagesize = sysconf(_SC_PAGESIZE);
      	off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
      	int fd = open(kPagemapPath, O_RDONLY);
      	...
      
      	if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
      		int err = errno;
      		string errtxt = ErrorString(err);
      		if (fd >= 0)
      			close(fd);
      		return 0;
      	}
        ...
        }
      
      kernel fs/proc/task_mmu.c:
      
        static ssize_t pagemap_read(struct file *file, char __user *buf,
      		size_t count, loff_t *ppos)
        {
      	...
      	src = *ppos;
      	svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
      	start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
      	end_vaddr = mm->task_size;
      
      	/* watch out for wraparound */
      	// svpfn == 0xb400007662f54
      	// (mm->task_size >> PAGE_SHIFT) == 0x8000000
      	if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
      		start_vaddr = end_vaddr;
      
      	ret = 0;
      	while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
      		int len;
      		unsigned long end;
      		...
      	}
      	...
        }
      
      [1] https://lore.kernel.org/patchwork/patch/1343258/
      [2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158
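
The arithmetic in the two listings above can be reproduced in userspace. This is a minimal sketch, assuming an 8-byte pagemap entry and a tag stored in the top byte (bits 63:56) of the pointer. The helper names strip_top_byte_tag() and pagemap_offset() are hypothetical and appear neither in the kernel nor in stressapptest, and the plain mask is only a stand-in for the kernel's arch-specific untagged_addr().

```c
#include <stdint.h>

/* One 64-bit pagemap entry describes one page. */
#define PM_ENTRY_BYTES 8ULL

/* Hypothetical helper: clear a top-byte tag (bits 63:56) from a user address. */
static inline uint64_t strip_top_byte_tag(uint64_t vaddr)
{
	return vaddr & ((1ULL << 56) - 1);
}

/* Hypothetical helper: file offset into /proc/<pid>/pagemap for vaddr. */
static inline uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return vaddr / page_size * PM_ENTRY_BYTES;
}
```

For the pointer 0xb400007662f541c8 and a 4096-byte page, pagemap_offset() yields 0x5a00003b317aa0 on the tagged value and 0x3b317aa0 on the untagged one. Only the second maps to a page frame number below the 0x8000000 limit quoted in the kernel listing.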
      
      Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
      Cc: <stable@vger.kernel.org>	[5.4-]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 11 Dec, 2020 3 commits
    • A
      NFS: Disable READ_PLUS by default · 21e31401
      Anna Schumaker committed
      We've been seeing failures with xfstests generic/091 and generic/263
      when using READ_PLUS. I've made some progress on these issues, and the
      tests fail later on but still don't pass. Let's disable READ_PLUS by
      default until we can work out what is going on.
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • D
      NFSv4.2: Fix 5 seconds delay when doing inter server copy · fe8eb820
      Dai Ngo committed
      Since commit b4868b44 ("NFSv4: Wait for stateid updates after
      CLOSE/OPEN_DOWNGRADE"), every inter-server copy operation suffers a
      5-second delay regardless of the size of the copy. The delay comes from
      nfs_set_open_stateid_locked() when the nfs_stateid_is_sequential() check
      fails because the seqid in both nfs4_state and nfs4_stateid is 0.
      
      Fix __nfs42_ssc_open() to delay setting NFS_OPEN_STATE in nfs4_state
      until after the call to update_open_stateid(), to indicate that this is
      the first open. This fix is one of two patches; the other is the fix in
      the source server to return the stateid for a COPY_NOTIFY request with
      seqid 1 instead of 0.
      
      Fixes: ce0887ac ("NFSD add nfs4 inter ssc to nfsd4_copy")
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • C
      NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation · 1c87b851
      Chuck Lever committed
      By switching to an XFS-backed export, I am able to reproduce the
      ibcomp worker crash on my client with xfstests generic/013.
      
      For the failing LISTXATTRS operation, xdr_inline_pages() is called
      with page_len=12 and buflen=128.
      
      - When ->send_request() is called, rpcrdma_marshal_req() does not
        set up a Reply chunk because buflen is smaller than the inline
        threshold. Thus rpcrdma_convert_iovs() does not get invoked at
        all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked
        on the receive buffer.
      
      - During reply processing, rpcrdma_inline_fixup() tries to copy
        received data into rq_rcv_buf->pages because page_len is positive.
        But there are no receive pages because rpcrdma_marshal_req() never
        allocated them.
      
      The result is that the ibcomp worker faults and dies. Sometimes that
      causes a visible crash, and sometimes it results in a transport hang
      without other symptoms.
      
      RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
      should eventually be fixed or replaced. However, my preference is
      that upper-layer operations should explicitly allocate their receive
      buffers (using GFP_KERNEL) when possible, rather than relying on
      XDRBUF_SPARSE_PAGES.
      Reported-by: Olga kornievskaia <kolga@netapp.com>
      Suggested-by: Olga kornievskaia <kolga@netapp.com>
      Fixes: c10a7514 ("NFSv4.2: add the extended attribute proc functions.")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Olga kornievskaia <kolga@netapp.com>
      Reviewed-by: Frank van der Linden <fllinden@amazon.com>
      Tested-by: Olga kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  4. 10 Dec, 2020 1 commit
    • D
      zonefs: fix page reference and BIO leak · 6bea0225
      Damien Le Moal committed
      In zonefs_file_dio_append(), the pages obtained using
      bio_iov_iter_get_pages() are not released on completion of the
      REQ_OP_APPEND BIO, nor when bio_iov_iter_get_pages() fails.
      Furthermore, a call to bio_put() is missing when
      bio_iov_iter_get_pages() fails.
      
      Fix these resource leaks by adding BIO resource release code (bio_put()
      and bio_release_pages()) at the end of the function after the BIO
      execution, and add a jump to this cleanup code in case of
      bio_iov_iter_get_pages() failure.
      
      While at it, also fix the call to task_io_account_write() to be passed
      the correct BIO size instead of bio_iov_iter_get_pages() return value.
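
The cleanup structure described above (a single release point at the end of the function, reached both after normal completion and from the failure path) can be sketched in userspace. This is only an illustration under stated assumptions: struct fake_bio and every fake_* function are hypothetical stand-ins, not the block layer API, and the live_* counters exist solely to make leaks observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a BIO and the pages pinned for it. */
struct fake_bio {
	int pages_pinned;
};

/* Bookkeeping so a caller can check that nothing leaked. */
static int live_bios;
static int live_pages;

static struct fake_bio *fake_bio_alloc(void)
{
	struct fake_bio *bio = calloc(1, sizeof(*bio));

	if (bio)
		live_bios++;
	return bio;
}

static void fake_bio_put(struct fake_bio *bio)
{
	free(bio);
	live_bios--;
}

/* Pins pages on success; returns a negative errno when asked to fail. */
static int fake_get_pages(struct fake_bio *bio, int fail)
{
	if (fail)
		return -EFAULT;
	bio->pages_pinned = 1;
	live_pages++;
	return 0;
}

/* Safe to call whether or not pages were pinned. */
static void fake_release_pages(struct fake_bio *bio)
{
	if (bio->pages_pinned) {
		bio->pages_pinned = 0;
		live_pages--;
	}
}

static int append_once(int fail_get_pages)
{
	struct fake_bio *bio = fake_bio_alloc();
	int ret;

	if (!bio)
		return -ENOMEM;

	ret = fake_get_pages(bio, fail_get_pages);
	if (ret)
		goto out_release;	/* failure path shares the cleanup below */

	/* Submitting the BIO and waiting for completion would happen here. */

out_release:
	fake_release_pages(bio);
	fake_bio_put(bio);
	return ret;
}
```

Calling append_once(0) and append_once(1) should both leave live_bios and live_pages at zero, because the success and failure paths run the same release code.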
      Reported-by: Christoph Hellwig <hch@lst.de>
      Fixes: 02ef12a6 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 09 Dec, 2020 1 commit
    • D
      afs: Fix memory leak when mounting with multiple source parameters · 4cb68296
      David Howells committed
      There's a memory leak in afs_parse_source() whereby multiple source=
      parameters overwrite fc->source in the fs_context struct without freeing
      the previously recorded source.
      
      Fix this by only permitting a single source parameter and rejecting with
      an error all subsequent ones.
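
The "accept the first source, reject the rest" rule can be illustrated with a small userspace sketch. The struct mount_ctx type and ctx_set_source() function are hypothetical names chosen for this example. They are not the afs or fs_context code, and the choice of -EINVAL is an assumption.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of a mount context that owns the source. */
struct mount_ctx {
	char *source;
};

/*
 * Record the source string once. A second call is rejected instead of
 * overwriting (and thereby leaking) the previously stored copy.
 */
static int ctx_set_source(struct mount_ctx *ctx, const char *value)
{
	size_t len;
	char *copy;

	if (ctx->source)
		return -EINVAL;

	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, value, len);

	ctx->source = copy;
	return 0;
}
```

With a zero-initialized context, the first call stores its own copy of the string and returns 0. A second call returns the error and leaves the stored pointer untouched, so there is still exactly one allocation to free.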
      
      This was caught by syzbot with the kernel memory leak detector, showing
      something like the following trace:
      
        unreferenced object 0xffff888114375440 (size 32):
          comm "repro", pid 5168, jiffies 4294923723 (age 569.948s)
          backtrace:
            slab_post_alloc_hook+0x42/0x79
            __kmalloc_track_caller+0x125/0x16a
            kmemdup_nul+0x24/0x3c
            vfs_parse_fs_string+0x5a/0xa1
            generic_parse_monolithic+0x9d/0xc5
            do_new_mount+0x10d/0x15a
            do_mount+0x5f/0x8e
            __do_sys_mount+0xff/0x127
            do_syscall_64+0x2d/0x3a
            entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 13fcc683 ("afs: Add fs_context support")
      Reported-by: syzbot+86dc6632faaca40133ab@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 08 Dec, 2020 1 commit
    • H
      io_uring: fix file leak on error path of io ctx creation · f26c08b4
      Hillf Danton committed
      Put file as part of error handling when setting up io ctx to fix
      memory leaks like the following one.
      
         BUG: memory leak
         unreferenced object 0xffff888101ea2200 (size 256):
           comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
           hex dump (first 32 bytes):
             00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
             20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff   Y..............
           backtrace:
             [<000000002e0a7c5f>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
             [<000000002e0a7c5f>] __alloc_file+0x1f/0x130 fs/file_table.c:101
             [<000000001a55b73a>] alloc_empty_file+0x69/0x120 fs/file_table.c:151
             [<00000000fb22349e>] alloc_file+0x33/0x1b0 fs/file_table.c:193
             [<000000006e1465bb>] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
             [<000000007118092a>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
             [<000000007118092a>] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
             [<000000002ae99012>] io_uring_get_fd fs/io_uring.c:9198 [inline]
             [<000000002ae99012>] io_uring_create fs/io_uring.c:9377 [inline]
             [<000000002ae99012>] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
             [<000000008280baad>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
             [<00000000685d8cf0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+71c4697e27c99fddcf17@syzkaller.appspotmail.com
      Fixes: 0f212204 ("io_uring: don't rely on weak ->files references")
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 07 Dec, 2020 2 commits
  8. 04 Dec, 2020 3 commits
  9. 02 Dec, 2020 2 commits
  10. 01 Dec, 2020 4 commits
  11. 30 Nov, 2020 1 commit
  12. 27 Nov, 2020 1 commit
    • A
      gfs2: Upgrade shared glocks for atime updates · 82e938bd
      Andreas Gruenbacher committed
      Commit 20f82999 ("gfs2: Rework read and page fault locking") lifted
      the glock lock taking from the low-level ->readpage and ->readahead
      address space operations to the higher-level ->read_iter file and
      ->fault vm operations.  The glocks are still taken in LM_ST_SHARED mode
      only.  On filesystems mounted without the noatime option, ->read_iter
      sometimes needs to update the atime as well, though.  Right now, this
      leads to a failed locking mode assertion in gfs2_dirty_inode.
      
      Fix that by introducing a new update_time inode operation.  There, if
      the glock is held non-exclusively, upgrade it to an exclusive lock.
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Fixes: 20f82999 ("gfs2: Rework read and page fault locking")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  13. 26 Nov, 2020 3 commits
    • P
      io_uring: fix files grab/cancel race · af604703
      Pavel Begunkov committed
      When one task is in io_uring_cancel_files() and another is in
      io_prep_async_work(), a race may happen. After accounting a request as
      inflight in the first call to io_grab_identity(), the work may still fail
      and go to io_identity_cow(), which might briefly leave work.identity
      dangling, among other problems.
      
      Grab files last, so io_prep_async_work() won't fail if it did get into
      ->inflight_list.
      
      Note: the bug should no longer exist once io_uring_cancel_files() stops
      poking into other tasks' requests.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • B
      gfs2: Don't freeze the file system during unmount · f39e7d3a
      Bob Peterson committed
      GFS2's freeze/thaw mechanism uses a special freeze glock to control its
      operation. It does this with a sync glock operation (glops.c) called
      freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
      glops function causes the file system to be frozen. This is intended. However,
      GFS2's mount and unmount processes also hold the freeze glock to prevent other
      processes, perhaps on different cluster nodes, from mounting the frozen file
      system in read-write mode.
      
      Before this patch, there was no check in freeze_go_sync for whether a freeze
      is intended or whether the glock demote was caused by a normal unmount.
      So it was trying to freeze the file system it was trying to unmount, which
      ended up in a deadlock.
      
      This patch adds an additional check to freeze_go_sync so that demotes of the
      freeze glock are ignored if they come from the unmount process.
      
      Fixes: 20b32912 ("gfs2: Fix regression in freeze_go_sync")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • B
      gfs2: check for empty rgrp tree in gfs2_ri_update · 77872151
      Bob Peterson committed
      If gfs2 tries to mount a (corrupt) file system that has no resource
      groups it still tries to set preferences on the first one, which causes
      a kernel null pointer dereference. This patch adds a check to function
      gfs2_ri_update so this condition is detected and reported back as an
      error.
      
      Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  14. 25 Nov, 2020 3 commits
    • A
      efivarfs: revert "fix memory leak in efivarfs_create()" · ff04f3b6
      Ard Biesheuvel committed
      The memory leak addressed by commit fe5186cf is a false positive:
      all allocations are recorded in a linked list, and freed when the
      filesystem is unmounted. The kfree() added by that commit therefore
      leads to double frees and, as reported by David, to crashes if SLUB is
      configured to self-destruct when double frees occur.
      
      So drop the redundant kfree() again, and instead, mark the offending
      pointer variable so the allocation is ignored by kmemleak.
      
      Cc: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
      Fixes: fe5186cf ("efivarfs: fix memory leak in efivarfs_create()")
      Reported-by: David Laight <David.Laight@aculab.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    • A
      gfs2: set lockdep subclass for iopen glocks · 515b269d
      Alexander Aring committed
      This patch introduces a new glops attribute to define the subclass of the
      glock lockref spinlock. This avoids the following lockdep warning, which
      occurs when we lock an inode lock while an iopen lock is held:
      
      ============================================
      WARNING: possible recursive locking detected
      5.10.0-rc3+ #4990 Not tainted
      --------------------------------------------
      kworker/0:1/12 is trying to acquire lock:
      ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20
      
      but task is already holding lock:
      ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&gl->gl_lockref.lock);
        lock(&gl->gl_lockref.lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by kworker/0:1/12:
       #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      stack backtrace:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: delete_workqueue delete_work_func
      Call Trace:
       dump_stack+0x8b/0xb0
       __lock_acquire.cold+0x19e/0x2e3
       lock_acquire+0x150/0x410
       ? lockref_get+0x9/0x20
       _raw_spin_lock+0x27/0x40
       ? lockref_get+0x9/0x20
       lockref_get+0x9/0x20
       delete_work_func+0x188/0x260
       process_one_work+0x237/0x540
       worker_thread+0x4d/0x3b0
       ? process_one_work+0x540/0x540
       kthread+0x127/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • A
      gfs2: Fix deadlock dumping resource group glocks · 16e6281b
      Alexander Aring committed
      Commit 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      introduced additional locking in gfs2_rgrp_go_dump, which is also used for
      dumping resource group glocks via debugfs.  However, on that code path, the
      glock spin lock is already taken in dump_glock, and taking it again in
      gfs2_glock2rgrp leads to deadlock.  This can be reproduced with:
      
        $ mkfs.gfs2 -O -p lock_nolock /dev/FOO
        $ mount /dev/FOO /mnt/foo
        $ touch /mnt/foo/bar
        $ cat /sys/kernel/debug/gfs2/FOO/glocks
      
      Fix that by not taking the glock spin lock inside the go_dump callback.
      
      Fixes: 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>