- 30 8月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Use -ECANCELED to signal "stop iterating" instead of these magical *_ITER_ABORT values, since it's duplicative. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 28 8月, 2019 9 次提交
-
-
由 Eric Sandeen 提交于
xfs_trans_log_buf() takes a final argument of the last byte to log in the buffer; b_length is in basic blocks, so this isn't the correct last byte. Fix it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
In xfs_rmap_irec_offset_unpack, we should always clear the contents of rm_flags before we begin unpacking the encoded (ondisk) offset into the incore rm_offset and incore rm_flags fields. Remove the open-coded field zeroing as this encourages api misuse. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Remove the return value from the functions that schedule deferred bmap operations since they never fail and do not return status. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Remove the return value from the functions that schedule deferred refcount operations since they never fail and do not return status. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Remove the return value from the functions that schedule deferred rmap operations since they never fail and do not return status. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
This function doesn't use the @state parameter, so get rid of it. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with two unsigned 64-bit quantities. If the second quantity is actually the "maximum" key (all ones) as used in _query_all, the subtraction effectively becomes addition of two positive numbers and the function returns incorrect results. Fix this with explicit comparisons of the unsigned values. Nobody needs this now, but the online repair patches will need this to work properly. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
The xfs_rmap_has_other_keys helper aborts the iteration as soon as it has an answer. Don't let this abort leak out to callers. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs image to set an unreasonably large value for sb_inopblog which will cause ialloc_blks to be zero. If sb_imax_pct is also set, this results in a division by zero error in the second do_div call. Therefore, force maxicount to zero if ialloc_blks is zero. Note that the kernel metadata verifiers will catch the garbage inopblog value and abort the fs mount long before it tries to set up the inode geometry; this is needed to avoid a crash in xfs_db while setting up the xfs_mount structure. Found by fuzzing sb_inopblog to 122 in xfs/350. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
- 27 8月, 2019 6 次提交
-
-
由 Darrick J. Wong 提交于
The inode block mapping scrub function does more work for btree format extent maps than is absolutely necessary -- first it will walk the bmbt and check all the entries, and then it will load the incore tree and check every entry in that tree, possibly for a second time. Simplify the code and decrease check runtime by separating the two responsibilities. The bmbt walk will make sure the incore extent mappings are loaded, check the shape of the bmap btree (via xchk_btree) and check that every bmbt record has a corresponding incore extent map; and the incore extent map walk takes all the responsibility for checking the mapping records and cross referencing them with other AG metadata. This enables us to clean up some messy parameter handling and reduce redundant code. Rename a few functions to make the split of responsibilities clearer. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 zhengbin 提交于
Fixes gcc warning: fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify' fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify' Fixes: c5ab131b ("libxfs: refactor short btree block verification") Signed-off-by: Nzhengbin <zhengbin13@huawei.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
Memory we use to submit for IO needs strict alignment to the underlying driver contraints. Worst case, this is 512 bytes. Given that all allocations for IO are always a power of 2 multiple of 512 bytes, the kernel heap provides natural alignment for objects of these sizes and that suffices. Until, of course, memory debugging of some kind is turned on (e.g. red zones, poisoning, KASAN) and then the alignment of the heap objects is thrown out the window. Then we get weird IO errors and data corruption problems because drivers don't validate alignment and do the wrong thing when passed unaligned memory buffers in bios. TO fix this, introduce kmem_alloc_io(), which will guaranteeat least 512 byte alignment of buffers for IO, even if memory debugging options are turned on. It is assumed that the minimum allocation size will be 512 bytes, and that sizes will be power of 2 mulitples of 512 bytes. Use this everywhere we allocate buffers for IO. This no longer fails with log recovery errors when KASAN is enabled due to the brd driver not handling unaligned memory buffers: # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
Needed to feed into the allocation routine to guarantee the memory buffers we add to bios are correctly aligned to the underlying device. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
When trying to correlate XFS kernel allocations to memory reclaim behaviour, it is useful to know what allocations XFS is actually attempting. This information is not directly available from tracepoints in the generic memory allocation and reclaim tracepoints, so these new trace points provide a high level indication of what the XFS memory demand actually is. There is no per-filesystem context in this code, so we just trace the type of allocation, the size and the allocation constraints. The kmem code also doesn't include much of the common XFS headers, so there are a few definitions that need to be added to the trace headers and a couple of types that need to be made common to avoid needing to include the whole world in the kmem code. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Tetsuo Handa 提交于
Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 25 8月, 2019 1 次提交
-
-
由 Oleg Nesterov 提交于
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL. Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx. Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Tested-by: NKefeng Wang <wangkefeng.wang@huawei.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 8月, 2019 2 次提交
-
-
由 Darrick J. Wong 提交于
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script: # adduser dummy # adduser dummy plugdev # dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt (and then as user dummy) $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo and saw: ================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs] ...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock. Reported-by: benjamin.moody@gmail.com Fixes: 253f4911 ("xfs: better xfs_trans_alloc interface") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Tested-by: NSalvatore Bonaccorso <carnil@debian.org>
-
由 Jens Axboe 提交于
The outer poll loop checks for whether we need to reschedule, and returns to userspace if we do. However, it's possible to get stuck in the inner loop as well, if the CPU we are running on needs to reschedule to finish the IO work. Add the need_resched() check in the inner loop as well. This fixes a potential hang if the kernel is configured with CONFIG_PREEMPT_VOLUNTARY=y. Reported-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Tested-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 8月, 2019 11 次提交
-
-
由 Liu Song 提交于
If the number of dirty pages to be written back is large, then writeback_inodes_sb will block waiting for a long time, causing hung task detection alarm. Therefore, we should limit the maximum number of pages written back this time, which let the budget be completed faster. The remaining dirty pages tend to rely on the writeback mechanism to complete the synchronization. Fixes: b6e51316 ("writeback: separate starting of sync vs opportunistic writeback") Signed-off-by: NLiu Song <liu.song11@zte.com.cn> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Richard Weinberger 提交于
Currently on a freshly mounted UBIFS, c->min_log_bytes is 0. This can lead to a log overrun and make commits fail. Recent kernels will report the following assert: UBIFS assert failed: c->lhead_lnum != c->ltail_lnum, in fs/ubifs/log.c:412 c->min_log_bytes can have two states, 0 and c->leb_size. It controls how much bytes of the log area are reserved for non-bud nodes such as commit nodes. After a commit it has to be set to c->leb_size such that we have always enough space for a commit. While a commit runs it can be 0 to make the remaining bytes of the log available to writers. Having it set to 0 right after mount is wrong since no space for commits is reserved. Fixes: 1e51764a ("UBIFS: add new flash file system") Reported-and-tested-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Richard Weinberger 提交于
We unlock after orphan_delete(), so no need to unlock in the function too. Reported-by: NHan Xu <han.xu@nxp.com> Fixes: 8009ce95 ("ubifs: Don't leak orphans on memory during commit") Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 YueHaibing 提交于
It seems that 'yfs_RXYFSStoreOpaqueACL2' should be use in yfs_fs_store_opaque_acl2(). Fixes: f5e45463 ("afs: Implement YFS ACL setting") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Marc Dionne 提交于
The afs_lookup trace event can cause the following: [ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b [ 216.576803] #PF: supervisor read access in kernel mode [ 216.576813] #PF: error_code(0x0000) - not-present page ... [ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs] If the inode from afs_do_lookup() is an error other than ENOENT, or if it is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will try to dereference the error pointer as a valid pointer. Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL. Ideally the trace would include the error value, but for now just avoid the oops. Fixes: 80548b03 ("afs: Add more tracepoints") Signed-off-by: NMarc Dionne <marc.dionne@auristor.com> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to non-clearance of the default error in the case a NULL cell name is passed and the workstation default cell is used. Also put a bit at the end to make sure we don't leak a cell ref if we're going to be returning an error. This leak results in an assertion like the following when the kafs module is unloaded: AFS: Assertion failed 2 == 1 is false 0x2 == 0x1 is false ------------[ cut here ]------------ kernel BUG at fs/afs/cell.c:770! ... RIP: 0010:afs_manage_cells+0x220/0x42f [kafs] ... process_one_work+0x4c2/0x82c ? pool_mayday_timeout+0x1e1/0x1e1 ? do_raw_spin_lock+0x134/0x175 worker_thread+0x336/0x4a6 ? rescuer_thread+0x4af/0x4af kthread+0x1de/0x1ee ? kthread_park+0xd4/0xd4 ret_from_fork+0x24/0x30 Fixes: 989782dc ("afs: Overhaul cell database management") Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Jeff Layton 提交于
When ceph_mdsc_do_request returns an error, we can't assume that the filelock_reply pointer will be set. Only try to fetch fields out of the r_reply_info when it returns success. Cc: stable@vger.kernel.org Reported-by: NHector Martin <hector@marcansoft.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Erqi Chen 提交于
clear_page_dirty_for_io(page) before mapping->a_ops->invalidatepage(). invalidatepage() clears page's private flag, if dirty flag is not cleared, the page may cause BUG_ON failure in ceph_set_page_dirty(). Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/40862Signed-off-by: NErqi Chen <chenerqi@gmail.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Luis Henriques 提交于
Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released. The following backtrace was triggered by fstests generic/070. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4 6 locks held by kworker/0:4/3852: #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0 #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0 #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476 #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476 #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476 #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70 CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Workqueue: ceph-msgr ceph_con_workfn Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 fill_inode.isra.0+0xa9b/0xf70 ceph_fill_trace+0x13b/0xc70 ? dispatch+0x2eb/0x1476 dispatch+0x320/0x1476 ? __mutex_unlock_slowpath+0x4d/0x2a0 ceph_con_workfn+0xc97/0x2ec0 ? process_one_work+0x1b8/0x5f0 process_one_work+0x244/0x5f0 worker_thread+0x4d/0x3e0 kthread+0x105/0x140 ? process_one_work+0x5f0/0x5f0 ? kthread_park+0x90/0x90 ret_from_fork+0x3a/0x50 Signed-off-by: NLuis Henriques <lhenriques@suse.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Luis Henriques 提交于
Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by having this function returning the old blob buffer and have the callers of this function freeing it when the lock is released. The following backtrace was triggered by fstests generic/117. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress 4 locks held by fsstress/649: #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0 #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60 #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60 #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60 CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_build_xattrs_blob+0x12b/0x170 __send_cap+0x302/0x540 ? __lock_acquire+0x23c/0x1e40 ? __mark_caps_flushing+0x15c/0x280 ? _raw_spin_unlock+0x24/0x30 ceph_check_caps+0x5f0/0xc60 ceph_flush_dirty_caps+0x7c/0x150 ? __ia32_sys_fdatasync+0x20/0x20 ceph_sync_fs+0x5a/0x130 iterate_supers+0x8f/0xf0 ksys_sync+0x4f/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fc6409ab617 Signed-off-by: NLuis Henriques <lhenriques@suse.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Luis Henriques 提交于
Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the i_xattrs.prealloc_blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released. The following backtrace was triggered by fstests generic/117. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress 3 locks held by fsstress/650: #0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50 #1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0 #2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810 CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_setxattr+0x2b4/0x810 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x59/0xf0 vfs_setxattr+0x81/0xa0 setxattr+0x115/0x230 ? filename_lookup+0xc9/0x140 ? rcu_read_lock_sched_held+0x74/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x142/0x1a0 ? mnt_want_write+0x20/0x50 path_setxattr+0xba/0xd0 __x64_sys_lsetxattr+0x24/0x30 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff23514359a Signed-off-by: NLuis Henriques <lhenriques@suse.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 21 8月, 2019 2 次提交
-
-
由 Jens Axboe 提交于
We need to check if we have CQEs pending before starting a poll loop, as those could be the events we will be spinning for (and hence we'll find none). This can happen if a CQE triggers an error, or if it is found by eg an IRQ before we get a chance to find it through polling. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If a request issue ends up being punted to async context to avoid blocking, we can get into a situation where the original application enters the poll loop for that very request before it has been issued. This should not be an issue, except that the polling will hold the io_uring uring_ctx mutex for the duration of the poll. When the async worker has actually issued the request, it needs to acquire this mutex to add the request to the poll issued list. Since the application polling is already holding this mutex, the workqueue sleeps on the mutex forever, and the application thus never gets a chance to poll for the very request it was interested in. Fix this by ensuring that the polling drops the uring_ctx occasionally if it's not making any progress. Reported-by: NJeffrey M. Birnbaum <jmbnyc@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 8月, 2019 1 次提交
-
-
由 Ira Weiny 提交于
The parens used in the while loop would result in error being assigned the value 1 rather than the intended errno value. This is required to return -ETXTBSY from follow on break_layout() changes. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 19 8月, 2019 2 次提交
-
-
由 Eric W. Biederman 提交于
My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: Nronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc947 ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bc ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee10990 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076 ("signal: Remove task parameter from force_sig") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Darrick J. Wong 提交于
While trawling through the dedupe file comparison code trying to fix page deadlocking problems, Dave Chinner noticed that the reflink code only takes shared IOLOCK/MMAPLOCKs on the source file. Because page_mkwrite and directio writes do not take the EXCL versions of those locks, this means that reflink can race with writer processes. For pure remapping this can lead to undefined behavior and file corruption; for dedupe this means that we cannot be sure that the contents are identical when we decide to go ahead with the remapping. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 17 8月, 2019 4 次提交
-
-
由 Darrick J. Wong 提交于
When dedupe wants to use the page cache to compare parts of two files for dedupe, we must be very careful to handle locking correctly. The current code doesn't do this. It must lock and unlock the page only once if the two pages are the same, since the overlapping range check doesn't catch this when blocksize < pagesize. If the pages are distinct but from the same file, we must observe page locking order and lock them in order of increasing offset to avoid clashing with writeback locking. Fixes: 876bec6f ("vfs: refactor clone/dedupe_file_range common functions") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Christoph Hellwig 提交于
For 31-bit s390 user space, we have to pass pointer arguments through compat_ptr() in the compat_ioctl handler. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Always try the native ioctl if we don't have a compat handler. This removes a lot of boilerplate code as 'modern' ioctls should generally be compat clean, and fixes the missing entries for the recently added FS_IOC_GETFSLABEL/FS_IOC_SETFSLABEL ioctls. Fixes: f7664b31 ("xfs: implement online get/set fs label") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 He Zhe 提交于
reply_cache_stats uses wrong parameter as seq file private structure and thus causes the following kernel crash when users read /proc/fs/nfsd/reply_cache_stats BUG: kernel NULL pointer dereference, address: 00000000000001f9 PGD 0 P4D 0 Oops: 0000 [#3] SMP PTI CPU: 6 PID: 1502 Comm: cat Tainted: G D 5.3.0-rc3+ #1 Hardware name: Intel Corporation Broadwell Client platform/Basking Ridge, BIOS BDW-E2R1.86C.0118.R01.1503110618 03/11/2015 RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0 Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00 RSP: 0018:ffffaa520106fe08 EFLAGS: 00010246 RAX: 000000cfe1a77123 RBX: 0000000000000000 RCX: 0000000000291b46 RDX: 000000cf00000000 RSI: 0000000000000006 RDI: 0000000000291b28 RBP: ffffaa520106fe20 R08: 0000000000000006 R09: 000000cfe17e55dd R10: ffffa424e47c0000 R11: 000000000000030b R12: 0000000000000001 R13: ffffa424e5697000 R14: 0000000000000001 R15: ffffa424e5697000 FS: 00007f805735f580(0000) GS:ffffa424f8f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000001f9 CR3: 00000000655ce005 CR4: 00000000003606e0 Call Trace: seq_read+0x194/0x3e0 __vfs_read+0x1b/0x40 vfs_read+0x95/0x140 ksys_read+0x61/0xe0 __x64_sys_read+0x1a/0x20 do_syscall_64+0x4d/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f805728b861 Code: fe ff ff 50 48 8d 3d 86 b4 09 00 e8 79 e0 01 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 d9 19 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54 RSP: 002b:00007ffea1ce3c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f805728b861 RDX: 0000000000020000 RSI: 00007f8057183000 RDI: 0000000000000003 RBP: 00007f8057183000 R08: 00007f8057182010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 0000559a60e8ff10 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 Modules linked in: CR2: 00000000000001f9 ---[ end trace 01613595153f0cba ]--- RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0 Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00 RSP: 0018:ffffaa52004b3e08 EFLAGS: 00010246 RAX: 0000002bab45a7c6 RBX: 0000000000000000 RCX: 0000000000291b4c RDX: 0000002b00000000 RSI: 0000000000000004 RDI: 0000000000291b28 RBP: ffffaa52004b3e20 R08: 0000000000000004 R09: 0000002bab1c8c7a R10: ffffa424e5500000 R11: 00000000000002a9 R12: 0000000000000001 R13: ffffa424e4475000 R14: 0000000000000001 R15: ffffa424e4475000 FS: 00007f805735f580(0000) GS:ffffa424f8f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000001f9 CR3: 00000000655ce005 CR4: 00000000003606e0 Killed Fixes: 3ba75830 ("nfsd4: drc containerization") Signed-off-by: NHe Zhe <zhe.he@windriver.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 16 8月, 2019 1 次提交
-
-
由 J. Bruce Fields 提交于
A process could race in an open and attempt to read one of these files before i_private is initialized, and get a spurious error. Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-