- 23 1月, 2021 29 次提交
-
-
由 Brian Foster 提交于
xfs_log_sbcount() calls xfs_sync_sb() to sync superblock counters to disk when lazy superblock accounting is enabled. This occurs on unmount, freeze, and read-only (re)mount and ensures the final values are calculated and persisted to disk before each form of quiesce completes. Now that log covering occurs in all of these contexts and uses the same xfs_sync_sb() mechanism to update log state, there is no need to log the superblock separately for any reason. Update the log quiesce path to sync the superblock at least once for any mount where lazy superblock accounting is enabled. If the log is already covered, it will remain in the covered state. Otherwise, the next sync as part of the normal covering sequence will carry the associated superblock update with it. Remove xfs_log_sbcount() now that it is no longer needed. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
由 Brian Foster 提交于
Now that log covering occurs on quiesce, we'd like to reuse the underlying superblock sync for final superblock updates. This includes things like lazy superblock counter updates, log feature incompat bits in the future, etc. One quirk to this approach is that once the log is in the IDLE (i.e. already covered) state, any subsequent log write resets the state back to NEED. This means that a final superblock sync to an already covered log requires two more sb syncs to return the log back to IDLE again. For example, if a lazy superblock enabled filesystem is mount cycled without any modifications, the unmount path syncs the superblock once and writes an unmount record. With the desired log quiesce covering behavior, we sync the superblock three times at unmount time: once for the lazy superblock counter update and twice more to cover the log. By contrast, if the log is active or only partially covered at unmount time, a final superblock sync would doubly serve as the one or two remaining syncs required to cover the log. This duplicate covering sequence is unnecessary because the filesystem remains consistent if a crash occurs at any point. The superblock will either be recovered in the event of a crash or written back before the log is quiesced and potentially cleaned with an unmount record. Update the log covering state machine to remain in the IDLE state if additional covering checkpoints pass through the log. This facilitates final superblock updates (such as lazy superblock counters) via a single sb sync without losing covered status. This provides some consistency with the active and partially covered cases and also avoids harmless, but spurious checkpoints when quiescing the log. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
由 Brian Foster 提交于
The log quiesce mechanism historically terminates by marking the log clean with an unmount record. The primary objective is to indicate that log recovery is no longer required after the quiesce has flushed all in-core changes and written back filesystem metadata. While this is perfectly fine, it is somewhat hacky as currently used in certain contexts. For example, filesystem freeze quiesces (i.e. cleans) the log and immediately redirties it with a dummy superblock transaction to ensure that log recovery runs in the event of a crash. While this functions correctly, cleaning the log from freeze context is clearly superfluous given the current redirtying behavior. Instead, the desired behavior can be achieved by simply covering the log. This effectively retires all on-disk log items from the active range of the log by issuing two synchronous and sequential dummy superblock update transactions that serve to update the on-disk log head and tail. The subtle difference is that the log technically remains dirty due to the lack of an unmount record, though recovery is effectively a no-op due to the content of the checkpoints being clean (i.e. the unmodified on-disk superblock). Log covering currently runs in the background and only triggers once the filesystem and log has idled. The purpose of the background mechanism is to prevent log recovery from replaying the most recently logged items long after those items may have been written back. In the quiesce path, the log has been deliberately idled by forcing the log and pushing the AIL until empty in a context where no further mutable filesystem operations are allowed. Therefore, we can cover the log as the final step in the log quiesce codepath to reflect that all previously active items have been successfully written back. This facilitates selective log covering from certain contexts (i.e. freeze) that only seek to quiesce, but not necessarily clean the log. Note that as a side effect of this change, log covering now occurs when cleaning the log as well. This is harmless, facilitates subsequent cleanups, and is mostly temporary as various operations switch to use explicit log covering. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
由 Brian Foster 提交于
Log quiesce is currently associated with cleaning the log, which is accomplished by writing an unmount record as the last step of the quiesce sequence. The quiesce codepath is a bit convoluted in this regard due to how it is reused from various contexts. In preparation to create separate log cleaning and log covering interfaces, lift the write of the unmount record into a new cleaning helper and call that wherever xfs_log_quiesce() is currently invoked. No functional changes. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Brian Foster 提交于
The log covering helper checks whether the filesystem is writable to determine whether to cover the log. The helper is currently only called from the background log worker. In preparation to reuse the helper from freezing contexts, lift the check into xfs_log_worker(). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Brian Foster 提交于
xfs_log_sbcount() syncs the superblock specifically to accumulate the in-core percpu superblock counters and commit them to disk. This is required to maintain filesystem consistency across quiesce (freeze, read-only mount/remount) or unmount when lazy superblock accounting is enabled because individual transactions do not update the superblock directly. This mechanism works as expected for writable mounts, but xfs_log_sbcount() skips the update for read-only mounts. Read-only mounts otherwise still allow log recovery and write out an unmount record during log quiesce. If a read-only mount performs log recovery, it can modify the in-core superblock counters and write an unmount record when the filesystem unmounts without ever syncing the in-core counters. This leaves the filesystem with a clean log but in an inconsistent state with regard to lazy sb counters. Update xfs_log_sbcount() to use the same logic xfs_log_unmount_write() uses to determine when to write an unmount record. This ensures that lazy accounting is always synced before the log is cleaned. Refactor this logic into a new helper to distinguish between a writable filesystem and a writable log. Specifically, the log is writable unless the filesystem is mounted with the norecovery mount option, the underlying log device is read-only, or the filesystem is shutdown. Drop the freeze state check because the update is already allowed during the freezing process and no context calls this function on an already frozen fs. Also, retain the shutdown check in xfs_log_unmount_write() to catch the case where the preceding log force might have triggered a shutdown. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NGao Xiang <hsiangkao@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Jeffrey Mitchell 提交于
When XFS creates a new symlink, it writes its size to disk but not to the VFS inode. This causes i_size_read() to return 0 for that symlink until it is re-read from disk, for example when the system is rebooted. I found this inconsistency while protecting directories with eCryptFS. The command "stat path/to/symlink/in/ecryptfs" will report "Size: 0" if the symlink was created after the last reboot on an XFS root. Call i_size_write() in xfs_symlink() Signed-off-by: NJeffrey Mitchell <jeffrey.mitchell@starlab.io> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Brian Foster 提交于
xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the buffer cache is reclaimed during unmount. xfs_log_quiesce() is also called from xfs_quiesce_attr(), however, which means that cache state is completely drained for filesystem freeze and read-only remount. While technically harmless, this is unnecessarily heavyweight. Both freeze and read-only mounts allow reads and thus allow population of the buffer cache. Therefore, the transitional sequence in either case really only needs to quiesce outstanding writes to return the filesystem in a generally read-only state. Additionally, some users have reported that attempts to freeze a filesystem concurrent with a read-heavy workload causes the freeze process to stall for a significant amount of time. This occurs because, as mentioned above, the read workload repopulates the buffer LRU while the freeze task attempts to drain it. To improve this situation, replace the drain in xfs_log_quiesce() with a buffer I/O quiesce and lift the drain into the unmount path. This removes buffer LRU reclaim from freeze and read-only [re]mount, but ensures the LRU is still drained before the filesystem unmounts. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Brian Foster 提交于
xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its primary purpose is to reclaim all buffers from the provided buffer target LRU. In preparation to refactor xfs_wait_buftarg() into serialization and LRU draining components, rename the function and associated helpers to something more descriptive. This patch has no functional changes with the minor exception of renaming a tracepoint. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Yumei Huang 提交于
An assert failure is triggered by syzkaller test due to ATTR_KILL_PRIV is not cleared before xfs_setattr_size. As ATTR_KILL_PRIV is not checked/used by xfs_setattr_size, just remove it from the assert. Signed-off-by: NYumei Huang <yuhuang@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Christoph Hellwig 提交于
XFS always inherits the SGID bit if it is set on the parent inode, while the generic inode_init_owner does not do this in a few cases where it can create a possible security problem, see commit 0fa3ecd8 ("Fix up non-directory creation in SGID directories") for details. Switch XFS to use the generic helper for the normal path to fix this, just keeping the simple field inheritance open coded for the case of the non-sgid case with the bsdgrpid mount option. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: NChristian Brauner <christian.brauner@ubuntu.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Eric Biggers 提交于
The comment in xfs_file_aio_write_checks() about calling file_modified() after dropping the ilock doesn't make sense, because the code that unconditionally acquires and drops the ilock was removed by commit 467f7899 ("xfs: reduce ilock hold times in xfs_file_aio_write_checks"). Remove this outdated comment. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which helps userspace test programs to get xfs_bmap_btalloc() to always allocate minlen sized extents. This is required for test programs which need a guarantee that minlen extents allocated for a file do not get merged with their existing neighbours in the inode's BMBT. "Inode fork extent overflow check" for Directories, Xattrs and extension of realtime inodes need this since the file offset at which the extents are being allocated cannot be explicitly controlled from userspace. One way to use this error tag is to, 1. Consume all of the free space by sequentially writing to a file. 2. Punch alternate blocks of the file. This causes CNTBT to contain sufficient number of one block sized extent records. 3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag. After step 3, xfs_bmap_btalloc() will issue space allocation requests for minlen sized extents only. ENOSPC error code is returned to userspace when there aren't any "one block sized" extents left in any of the AGs. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
This commit moves over the code in xfs_bmap_btalloc() which is responsible for processing an allocated extent to a new function. Apart from xfs_bmap_btalloc(), the new function will be invoked by another function introduced in a future commit. Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
This commit moves over the code which computes stripe alignment and extent size hint alignment into a separate function. Apart from xfs_bmap_btalloc(), the new function will be used by another function introduced in a future commit. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
The check for verifying if the allocated extent is from an AG whose index is greater than or equal to that of tp->t_firstblock is already done a couple of statements earlier in the same function. Hence this commit removes the redundant assert statement. Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables userspace programs to test "Inode fork extent count overflow detection" by reducing maximum possible inode fork extent count to 10. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Removing an initial range of source/donor file's extent and adding a new extent (from donor/source file) in its place will cause extent count to increase by 1. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Remapping an extent involves unmapping the existing extent and mapping in the new extent. When unmapping, an extent containing the entire unmap range can be split into two extents, i.e. | Old extent | hole | Old extent | Hence extent count increases by 1. Mapping in the new extent into the destination file can increase the extent count by 1. Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Moving an extent to data fork can cause a sub-interval of an existing extent to be unmapped. This will increase extent count by 1. Mapping in the new extent can increase the extent count by 1 again i.e. | Old extent | New extent | Old extent | Hence number of extents increases by 2. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
A write to a sub-interval of an existing unwritten extent causes the original extent to be split into 3 extents i.e. | Unwritten | Real | Unwritten | Hence extent count can increase by 2. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be added. One extra extent for dabtree in case a local attr is large enough to cause a double split. It can also cause extent count to increase proportional to the size of a remote xattr's value. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
A rename operation is essentially a directory entry remove operation from the perspective of parent directory (i.e. src_dp) of rename's source. Hence the only place where we check for extent count overflow for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns -ENOSPC when it detects a possible extent count overflow and in response, the higher layers of directory handling code do the following: 1. Data/Free blocks: XFS lets these blocks linger until a future remove operation removes them. 2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf space and unmaps the last block. For target_dp, there are two cases depending on whether the destination directory entry exists or not. When destination directory entry does not exist (i.e. target_ip == NULL), extent count overflow check is performed only when transaction has a non-zero sized space reservation associated with it. With a zero-sized space reservation, XFS allows a rename operation to continue only when the directory has sufficient free space in its data/leaf/free space blocks to hold the new entry. When destination directory entry exists (i.e. target_ip != NULL), all we need to do is change the inode number associated with the already existing entry. Hence there is no need to perform an extent count overflow check. Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Directory entry removal must always succeed; Hence XFS does the following during low disk space scenario: 1. Data/Free blocks linger until a future remove operation. 2. Dabtree blocks would be swapped with the last block in the leaf space and then the new last block will be unmapped. This facility is reused during low inode extent count scenario i.e. this commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so that the above mentioned behaviour is exercised causing no change to the directory's extent count. Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
Directory entry addition can cause the following, 1. Data block can be added/removed. A new extent can cause extent count to increase by 1. 2. Free disk block can be added/removed. Same behaviour as described above for Data block. 3. Dabtree blocks. XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH. Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
The extent mapping the file offset at which a hole has to be inserted will be split into two extents causing extent count to increase by 1. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
When adding a new data extent (without modifying an inode's existing extents) the extent count increases only by 1. This commit checks for extent count overflow in such cases. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chandan Babu R 提交于
XFS does not check for possible overflow of per-inode extent counter fields when adding extents to either data or attr fork. For e.g. 1. Insert 5 million xattrs (each having a value size of 255 bytes) and then delete 50% of them in an alternating manner. 2. On a 4k block sized XFS filesystem instance, the above causes 98511 extents to be created in the attr fork of the inode. xfsaild/loop0 2008 [003] 1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131 3. The incore inode fork extent counter is a signed 32-bit quantity. However the on-disk extent counter is an unsigned 16-bit quantity and hence cannot hold 98511 extents. 4. The following incorrect value is stored in the attr extent counter, # xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0 core.naextents = -32561 This commit adds a new helper function (i.e. xfs_iext_count_may_overflow()) to check for overflow of the per-inode data and xattr extent counters. Future patches will use this function to make sure that an FS operation won't cause the extent counter to overflow. Suggested-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
When overlayfs is running on top of xfs and the user unlinks a file in the overlay, overlayfs will create a whiteout inode and ask xfs to "rename" the whiteout file atop the one being unlinked. If the file being unlinked loses its one nlink, we then have to put the inode on the unlinked list. This requires us to grab the AGI buffer of the whiteout inode to take it off the unlinked list (which is where whiteouts are created) and to grab the AGI buffer of the file being deleted. If the whiteout was created in a higher numbered AG than the file being deleted, we'll lock the AGIs in the wrong order and deadlock. Therefore, grab all the AGI locks we think we'll need ahead of time, and in order of increasing AG number per the locking rules. Reported-by: Nwenli xie <wlxie7296@gmail.com> Fixes: 93597ae8 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 18 1月, 2021 5 次提交
-
-
由 Linus Torvalds 提交于
-
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux由 Linus Torvalds 提交于
Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix 'CPU too large' error in Intel PT - Correct event attribute sizes in 'perf inject' - Sync build_bug.h and kvm.h kernel copies - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example - libbpf tests fixes - Fix shadow stat 'perf test' for non-bash shells - Take cgroups into account for shadow stats in 'perf stat' * tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf inject: Correct event attribute sizes perf intel-pt: Fix 'CPU too large' error perf stat: Take cgroups into account for shadow stats perf stat: Introduce struct runtime_stat_data libperf tests: Fail when failing to get a tracepoint id libperf tests: If a test fails return non-zero libperf tests: Avoid uninitialized variable warning perf test: Fix shadow stat test for non-bash shells tools headers: Syncronize linux/build_bug.h with the kernel sources tools headers UAPI: Sync kvm.h headers with the kernel sources perf bpf examples: Fix bpf.h header include directive in 5sec.c example
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux由 Linus Torvalds 提交于
Pull powerpc fixes from Michael Ellerman: "One fix for a lack of alignment in our linker script, that can lead to crashes depending on configuration etc. One fix for the 32-bit VDSO after the C VDSO conversion. Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy" * tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Fix clock_gettime_fallback for vdso32 powerpc: Fix alignment bug within the init sections
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs由 Linus Torvalds 提交于
Pull misc vfs fixes from Al Viro: "Several assorted fixes. I still think that audit ->d_name race is better fixed this way for the benefit of backports, with any possibly fancier variants done on top of it" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dump_common_audit_data(): fix racy accesses to ->d_name iov_iter: fix the uaccess area in copy_compat_iovec_from_user umount(2): move the flag validity checks first
-
由 Linus Torvalds 提交于
So technically there is nothing wrong with adding a pinned page to the swap cache, but the pinning obviously means that the page can't actually be free'd right now anyway, so it's a bit pointless. However, the real problem is not with it being a bit pointless: the real issue is that after we've added it to the swap cache, we'll try to unmap the page. That will succeed, because the code in mm/rmap.c doesn't know or care about pinned pages. Even the unmapping isn't fatal per se, since the page will stay around in memory due to the pinning, and we do hold the connection to it using the swap cache. But when we then touch it next and take a page fault, the logic in do_swap_page() will map it back into the process as a possibly read-only page, and we'll then break the page association on the next COW fault. Honestly, this issue could have been fixed in any of those other places: (a) we could refuse to unmap a pinned page (which makes conceptual sense), or (b) we could make sure to re-map a pinned page writably in do_swap_page(), or (c) we could just make do_wp_page() not COW the pinned page (which was what we historically did before that "mm: do_wp_page() simplification" commit). But while all of them are equally valid models for breaking this chain, not putting pinned pages into the swap cache in the first place is the simplest one by far. It's also the safest one: the reason why do_wp_page() was changed in the first place was that getting the "can I re-use this page" wrong is so fraught with errors. If you do it wrong, you end up with an incorrectly shared page. As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or do_swap_page() would be a serious bug since it is only a (very good) heuristic. Re-using the page requires a hard black-and-white rule with no room for ambiguity. In contrast, saying "this page is very likely dma pinned, so let's not add it to the swap cache and try to unmap it" is an obviously safe thing to do, and if the heuristic might very rarely be a false positive, no harm is done. Fixes: 09854ba9 ("mm: do_wp_page() simplification") Reported-and-tested-by: NMartin Raiber <martin@urbackup.org> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 1月, 2021 6 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi由 Linus Torvalds 提交于
Pull SCSI fixes from James Bottomley: "Nine minor fixes, seven in drivers and two in the core SCSI disk driver (sd) which should be harmless involving removing an unused variable and quietening a spurious warning" Signed-off-by: NJames E.J. Bottomley <jejb@linux.ibm.com> * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Remove obsolete variable in sd_remove() scsi: sd: Suppress spurious errors when WRITE SAME is being disabled scsi: scsi_debug: Fix memleak in scsi_debug_init() scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility" scsi: qedi: Correct max length of CHAP secret scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback scsi: ufs: Relocate flush of exceptional event scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL scsi: ufs: Fix possible power drain during system suspend
-
由 Al Viro 提交于
We are not guaranteed the locking environment that would prevent dentry getting renamed right under us. And it's possible for old long name to be freed after rename, leading to UAF here. Cc: stable@kernel.org # v2.6.2+ Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
git://git.kernel.dk/linux-block由 Linus Torvalds 提交于
Pull block fixes from Jens Axboe: "Just an nvme pull request via Christoph: - don't initialize hwmon for discover controllers (Sagi Grimberg) - fix iov_iter handling in nvme-tcp (Sagi Grimberg) - fix a preempt warning in nvme-tcp (Sagi Grimberg) - fix a possible NULL pointer dereference in nvme (Israel Rukshin)" * tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block: nvme: don't intialize hwmon for discovery controllers nvme-tcp: fix possible data corruption with bio merges nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
-
git://git.kernel.dk/linux-block由 Linus Torvalds 提交于
Pull io_uring fixes from Jens Axboe: "We still have a pending fix for a cancelation issue, but it's still being investigated. In the meantime: - Dead mm handling fix (Pavel) - SQPOLL setup error handling (Pavel) - Flush timeout sequence fix (Marcelo) - Missing finish_wait() for one exit case" * tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block: io_uring: ensure finish_wait() is always called in __io_uring_task_cancel() io_uring: flush timeouts that should already have expired io_uring: do sqo disable on install_fd error io_uring: fix null-deref in io_disable_sqo_submit io_uring: don't take files/mm for a dead task io_uring: drop mm and files after task_work_run
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux由 Linus Torvalds 提交于
Pull RISC-V fixes from Palmer Dabbelt: "There are a few more fixes than a normal rc4, largely due to the bubble introduced by the holiday break: - return -ENOSYS for syscall number -1, which previously returned an uninitialized value. - ensure of_clk_init() has been called in time_init(), without which clock drivers may not be initialized. - fix sifive,uart0 driver to properly display the baud rate. A fix to initialize MPIE that allows interrupts to be processed during system calls. - avoid erronously begin tracing IRQs when interrupts are disabled, which at least triggers suprious lockdep failures. - workaround for a warning related to calling smp_processor_id() while preemptible. The warning itself is suprious on currently availiable systems. - properly include the generic time VDSO calls. A fix to our kasan address mapping. A fix to the HiFive Unleashed device tree, which allows the Ethernet PHY to be properly initialized by Linux (as opposed to relying on the bootloader). - defconfig update to include SiFive's GPIO driver, which is present on the HiFive Unleashed and necessary to initialize the PHY. - avoid allocating memory while initializing reserved memory. - avoid allocating the last 4K of memory, as pointers there alias with syscall errors. There are also two cleanups that should have no functional effect but do fix build warnings: - drop a duplicated definition of PAGE_KERNEL_EXEC. - properly declare the asm register SP shim. - cleanup the rv32 memory size Kconfig entry, to reflect the actual size of memory availiable" * tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Fix maximum allowed phsyical memory for RV32 RISC-V: Set current memblock limit RISC-V: Do not allocate memblock while iterating reserved memblocks riscv: stacktrace: Move register keyword to beginning of declaration riscv: defconfig: enable gpio support for HiFive Unleashed dts: phy: add GPIO number and active state used for phy reset dts: phy: fix missing mdio device and probe failure of vsc8541-01 device riscv: Fix KASAN memory mapping. riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL riscv: cacheinfo: Fix using smp_processor_id() in preemptible riscv: Trace irq on only interrupt is enabled riscv: Drop a duplicated PAGE_KERNEL_EXEC riscv: Enable interrupts during syscalls with M-Mode riscv: Fix sifive serial driver riscv: Fix kernel time_init() riscv: return -ENOSYS for syscall -1
-
由 Linus Torvalds 提交于
Turning a pinned page read-only breaks the pinning after COW. Don't do it. The whole "track page soft dirty" state doesn't work with pinned pages anyway, since the page might be dirtied by the pinning entity without ever being noticed in the page tables. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-