- 01 Feb, 2020 1 commit
-
-
Committed by Ira Weiny
At some point filemap_write_and_wait() and filemap_write_and_wait_range() got the exact same implementation with the exception of the range being specified in *_range(). Similar to other functions in fs.h which call *_range(..., 0, LLONG_MAX), change filemap_write_and_wait() to be a static inline which calls filemap_write_and_wait_range(). Link: http://lkml.kernel.org/r/20191129160713.30892-1-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
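The wrapper pattern described above can be sketched in userspace as follows. This is only an illustration: `struct mapping_model` and both `*_model` functions are hypothetical stand-ins, not the kernel's `struct address_space` or the real filemap helpers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for struct address_space: records the last call. */
struct mapping_model {
	long long last_start;
	long long last_end;
	int calls;
};

/* Stand-in for the range variant; it only records its arguments. */
static int write_and_wait_range_model(struct mapping_model *m,
				      long long start, long long end)
{
	assert(start <= end);
	m->last_start = start;
	m->last_end = end;
	m->calls++;
	return 0;
}

/* Whole-file variant: a thin inline wrapper passing the full range. */
static inline int write_and_wait_model(struct mapping_model *m)
{
	return write_and_wait_range_model(m, 0, LLONG_MAX);
}
```

Keeping a single implementation behind the wrapper means the two entry points cannot drift apart again.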
-
- 24 Jan, 2020 1 commit
-
-
Committed by Jiufei Xue
This doesn't cause any behavior changes and will be used by overlay async IO implementation. Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 03 Jan, 2020 1 commit
-
-
Committed by Arnd Bergmann
Now that both native and compat ioctl syscalls are in the same file, a couple of simplifications can be made, bringing the implementation closer together:
  - do_vfs_ioctl(), ioctl_preallocate(), and compat_ioctl_preallocate() can become static, allowing the compiler to optimize better
  - slightly update the coding style for consistency between the functions.
  - rather than listing each command in two switch statements for the compat case, just call a single function that has all the common commands.
As a side-effect, FS_IOC_RESVSP/FS_IOC_RESVSP64 are now available to x86 compat tasks, along with FS_IOC_RESVSP_32/FS_IOC_RESVSP64_32. This is harmless for i386 emulation, and can be considered a bugfix for x32 emulation, which never supported these in the past. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
- 08 Dec, 2019 1 commit
-
-
Committed by Thomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the i_size() and part_nr_sects_…() code over to use CONFIG_PREEMPTION. Update the comment for fsstack_copy_inode_size() also to refer to CONFIG_PREEMPTION. [bigeasy: +PREEMPT comments] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20191015191821.11479-24-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 01 Dec, 2019 1 commit
-
-
Committed by Konstantin Khlebnikov
This helper prints a warning if a direct I/O write failed to invalidate the cache, and sets EIO on the inode to warn userspace about possible data corruption. See also commit 5a9d929d ("iomap: report collisions between directio and buffered writes to userspace"). Direct I/O is supported by non-disk filesystems, for example NFS. Thus generic code needs this even in a kernel without CONFIG_BLOCK. Link: http://lkml.kernel.org/r/157270038074.4812.7980855544557488880.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 Nov, 2019 1 commit
-
-
Committed by Linus Torvalds
This reverts commit 0be0ee71. I was hoping it would be benign to switch over entirely to FMODE_STREAM, and we'd have just a couple of small fixups we'd need, but it looks like we're not quite there yet. While it worked fine on both my desktop and laptop, they are fairly similar in other respects, and run mostly the same loads. Kenneth Crudup reports that it seems to break both his vmware installation and the KDE upower service. In both cases apparently leading to timeouts due to waiting for the f_pos lock. There are a number of character devices in particular that definitely want stream-like behavior, but that currently don't get marked as streams, and as a result get the exclusion between concurrent read()/write() on the same file descriptor. Which doesn't work well for them. The most obvious example of this is /dev/console and /dev/tty, which use console_fops and tty_fops respectively (and ptmx_fops for the pty master side). It may be that it's just this that causes problems, but we clearly weren't ready yet. Because there's a number of other likely common cases that don't have llseek implementations and would seem to act as stream devices:
  /dev/fuse (fuse_dev_operations)
  /dev/mcelog (mce_chrdev_ops)
  /dev/mei0 (mei_fops)
  /dev/net/tun (tun_fops)
  /dev/nvme0 (nvme_dev_fops)
  /dev/tpm0 (tpm_fops)
  /proc/self/ns/mnt (ns_file_operations)
  /dev/snd/pcm* (snd_pcm_f_ops[])
and while some of these could be trivially automatically detected by the vfs layer when the character device is opened by just noticing that they have no read or write operations either, it often isn't that obvious. Some character devices most definitely do use the file position, even if they don't allow seeking: the firmware update code, for example, uses simple_read_from_buffer() that does use f_pos, but doesn't allow seeking back and forth. We'll revisit this when there's a better way to detect the problem and fix it (possibly with a coccinelle script to do more of the FMODE_STREAM annotations).
Reported-by: Kenneth R. Crudup <kenny@panix.com> Cc: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 Nov, 2019 1 commit
-
-
Committed by Linus Torvalds
fdget_pos() is used by file operations that will read and update f_pos: things like "read()", "write()" and "lseek()" (but not, for example, "pread()/pwrite" that get their file positions elsewhere). However, it had two separate escape clauses for this, because not everybody wants or needs serialization of the file position. The first and most obvious case is the "file descriptor doesn't have a position at all", ie a stream-like file. Except we didn't actually use FMODE_STREAM, but instead used FMODE_ATOMIC_POS. The reason for that was that FMODE_STREAM didn't exist back in the days, but also that we didn't want to mark all the special cases, so we only marked the ones that _required_ position atomicity according to POSIX - regular files and directories. The case one was intentionally lazy, but now that we _do_ have FMODE_STREAM we could and should just use it. With the change to use FMODE_STREAM, there are no remaining uses for FMODE_ATOMIC_POS, and all the code to set it is deleted. Any cases where we don't want the serialization because the driver (or subsystem) doesn't use the file position should just be updated to do "stream_open()". We've done that for all the obvious and common situations, we may need a few more. Quoting Kirill Smelkov in the original FMODE_STREAM thread (see link below for full email): "And I appreciate if people could help at least somehow with "getting rid of mixed case entirely" (i.e. always lock f_pos_lock on !FMODE_STREAM), because this transition starts to diverge from my particular use-case too far. 
To me it makes sense to do that transition as follows: - convert nonseekable_open -> stream_open via stream_open.cocci; - audit other nonseekable_open calls and convert left users that truly don't depend on position to stream_open; - extend stream_open.cocci to analyze alloc_file_pseudo as well (this will cover pipes and sockets), or maybe convert pipes and sockets to FMODE_STREAM manually; - extend stream_open.cocci to analyze file_operations that use no_llseek or noop_llseek, but do not use nonseekable_open or alloc_file_pseudo. This might find files that have stream semantic but are opened differently; - extend stream_open.cocci to analyze file_operations whose .read/.write do not use ppos at all (independently of how file was opened); - ... - after that remove FMODE_ATOMIC_POS and always take f_pos_lock if !FMODE_STREAM; - gather bug reports for deadlocked read/write and convert missed cases to FMODE_STREAM, probably extending stream_open.cocci along the road to catch similar cases i.e. always take f_pos_lock unless a file is explicitly marked as being stream, and try to find and cover all files that are streams" We have not done the "extend stream_open.cocci to analyze alloc_file_pseudo" as well, but the previous commit did manually handle the case of pipes and sockets. The other case where we can avoid locking f_pos is the "this file descriptor only has a single user and it is us, and thus there is no need to lock it". The second test was correct, although a bit subtle and worth just re-iterating here. There are two kinds of other sources of references to the same file descriptor: file descriptors that have been explicitly shared across fork() or with dup(), and file tables having elevated reference counts due to threading (or explicit file sharing with clone()). The first case would have incremented the file count explicitly, and in the second case the previous __fdget() would have incremented it for us and set the FDPUT_FPUT flag. 
But in both cases the file count would be greater than one, so the "file_count(file) > 1" test catches both situations. Also note that if file_count is 1, that also means that no other thread can have access to the file table, so there also cannot be races with concurrent calls to dup()/fork()/clone() that would increment the file count any other way. Link: https://lore.kernel.org/linux-fsdevel/20190413184404.GA13490@deco.navytux.spb.ru Cc: Kirill Smelkov <kirr@nexedi.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Marco Elver <elver@google.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Paul McKenney <paulmck@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
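The two escape clauses described in this message (stream-like file, or single user) reduce to a small predicate. The following is a userspace sketch under stated assumptions: `struct file_model` is a hypothetical reduced view of `struct file`, and the flag value is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the kernel defines FMODE_STREAM in its own headers. */
#define MODEL_FMODE_STREAM 0x200000u

/* Hypothetical reduced view of struct file. */
struct file_model {
	unsigned int f_mode;
	long f_count;	/* what file_count() would report */
};

/* Returns true when a position-updating operation should take f_pos_lock. */
static bool needs_f_pos_lock(const struct file_model *f)
{
	assert(f->f_count >= 1);

	/* Stream-like files have no position to serialize. */
	if (f->f_mode & MODEL_FMODE_STREAM)
		return false;

	/* A single reference means nobody else can race on f_pos. */
	return f->f_count > 1;
}
```

Both sharing cases from the text (dup()/fork() and a shared file table) show up as a count above one, which is why a single comparison is enough in this model.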
-
- 14 Nov, 2019 2 commits
-
-
Committed by Christoph Hellwig
In general drivers should never mess with partition tables directly. Unfortunately s390 and loop do for somewhat historic reasons, but they can use bdev_disk_changed directly instead when we export it as they satisfy the sanity checks we have in __blkdev_reread_part. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd] Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Large parts of rescan_partitions aren't about partitions, and moving it to block_dev.c will allow for some further cleanups by merging it into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 Oct, 2019 1 commit
-
-
Committed by Christoph Hellwig
These use the same scheme as the pre-existing mapping of the XFS RESVSP ioctls to ->falloc, so just extend it and remove the XFS implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: fix compile error on s390] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 23 Oct, 2019 1 commit
-
-
Committed by Arnd Bergmann
Many drivers have ioctl() handlers that are completely compatible between 32-bit and 64-bit architectures, except for the argument that is passed down from user space and may have to be passed through compat_ptr() in order to become a valid 64-bit pointer. Using ".compat_ioctl = compat_ptr_ioctl" in file operations should let us simplify a lot of those drivers to avoid #ifdef checks, and convert additional drivers that don't have proper compat handling yet. On most architectures, the compat_ptr_ioctl() just passes all arguments to the corresponding ->ioctl handler. The exception is arch/s390, where compat_ptr() clears the top bit of a 32-bit pointer value, so user space pointers to the second 2GB alias the first 2GB, as is the case for native 32-bit s390 user space. The compat_ptr_ioctl() function must therefore be used only with ioctl functions that either ignore the argument or pass a pointer to a compatible data type. If any ioctl command handled by fops->unlocked_ioctl passes a plain integer instead of a pointer, or any of the passed data types is incompatible between 32-bit and 64-bit architectures, a proper handler is required instead of compat_ptr_ioctl. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v3: add a better description
v2: use compat_ptr_ioctl instead of generic_compat_ioctl_ptrarg, as suggested by Al Viro
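The s390 caveat above can be made concrete with a small model. Both functions below are hypothetical userspace stand-ins for the behaviour the message describes (zero-extension on most architectures, top bit cleared on s390); they are not the kernel's compat_ptr() implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Model of most architectures: the 32-bit value is simply zero-extended. */
static uint64_t compat_ptr_generic_model(uint32_t uptr)
{
	return (uint64_t)uptr;
}

/* Model of s390: the top bit of the 32-bit value is cleared. */
static uint64_t compat_ptr_s390_model(uint32_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffu);
}
```

In this model a plain integer argument such as 0x80001000 would reach the handler as 0x1000 on s390, which illustrates why handlers that take integers need a proper compat handler.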
-
- 25 Sep, 2019 1 commit
-
-
Committed by Song Liu
In previous patch, an application could put part of its text section in THP via madvise(). These THPs will be protected from writes when the application is still running (TXTBSY). However, after the application exits, the file is available for writes. This patch avoids writes to file THP by dropping page cache for the file when the file is open for write. A new counter nr_thps is added to struct address_space. In do_dentry_open(), if the file is open for write and nr_thps is non-zero, we drop page cache for the whole file. Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 31 Aug, 2019 1 commit
-
-
Committed by Jan Kara
Filesystems will need to call this function from their fadvise handlers. CC: stable@vger.kernel.org Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 30 Aug, 2019 2 commits
-
-
Committed by Deepa Dinamani
timespec_trunc() function is used to truncate a filesystem timestamp to the right granularity. But, the function does not clamp tv_sec part of the timestamps according to the filesystem timestamp limits. The replacement api: timestamp_truncate() also alters the signature of the function to accommodate filesystem timestamp clamping according to filesystem limits. Note that the tv_nsec part is set to 0 if tv_sec is not within the range supported for the filesystem. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
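The clamping and truncation rules described above can be sketched in userspace as follows. This is a simplified model, not the kernel's timestamp_truncate(): `struct ts_model` and `struct sb_time_model` are hypothetical stand-ins for the timestamp type and the per-superblock limits.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Simplified stand-in for a kernel timestamp. */
struct ts_model {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical subset of the per-superblock timestamp limits. */
struct sb_time_model {
	int64_t time_min;	/* smallest supported tv_sec */
	int64_t time_max;	/* largest supported tv_sec */
	long time_gran;		/* granularity in nanoseconds, 1..1e9 */
};

static struct ts_model timestamp_truncate_model(struct ts_model t,
						const struct sb_time_model *sb)
{
	assert(sb->time_min <= sb->time_max);
	assert(sb->time_gran >= 1 && sb->time_gran <= MODEL_NSEC_PER_SEC);

	/* Clamp the seconds; an out-of-range timestamp loses its nanoseconds. */
	if (t.tv_sec < sb->time_min) {
		t.tv_sec = sb->time_min;
		t.tv_nsec = 0;
	} else if (t.tv_sec > sb->time_max) {
		t.tv_sec = sb->time_max;
		t.tv_nsec = 0;
	}

	/* Round the nanoseconds down to a multiple of the granularity. */
	t.tv_nsec -= t.tv_nsec % sb->time_gran;
	return t;
}
```

Taking the limits as a parameter mirrors the signature change mentioned in the message: granularity alone is no longer enough to produce a valid timestamp.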
-
Committed by Deepa Dinamani
Add fields to the superblock to track the min and max timestamps supported by filesystems. Initially, when a superblock is allocated, initialize it to the max and min values the fields can hold. Individual filesystems override these to match their actual limits. Pseudo filesystems are assumed to always support the min and max allowable values for the fields. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
-
- 20 Aug, 2019 1 commit
-
-
Committed by Darrick J. Wong
Don't let userspace write to an active swap file because the kernel effectively has a long term lease on the storage and things could get seriously corrupted if we let this happen. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 19 Aug, 2019 1 commit
-
-
Committed by Jeff Layton
With the new file caching infrastructure in nfsd, we can end up holding files open for an indefinite period of time, even when they are still idle. This may prevent the kernel from handing out leases on the file, which is something we don't want to block. Fix this by running an SRCU notifier call chain on any lease attempt. nfsd can then purge the cache for that inode before returning. Since SRCU is only conditionally compiled in, we must only define the new chain if it's enabled, and users of the chain must ensure that SRCU is enabled. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 13 Aug, 2019 1 commit
-
-
Committed by Eric Biggers
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an encryption key to the filesystem's fscrypt keyring ->s_master_keys, making any files encrypted with that key appear "unlocked". Why we need this ~~~~~~~~~~~~~~~~ The main problem is that the "locked/unlocked" (ciphertext/plaintext) status of encrypted files is global, but the fscrypt keys are not. fscrypt only looks for keys in the keyring(s) the process accessing the filesystem is subscribed to: the thread keyring, process keyring, and session keyring, where the session keyring may contain the user keyring. Therefore, userspace has to put fscrypt keys in the keyrings for individual users or sessions. But this means that when a process with a different keyring tries to access encrypted files, whether they appear "unlocked" or not is nondeterministic. This is because it depends on whether the files are currently present in the inode cache. Fixing this by consistently providing each process its own view of the filesystem depending on whether it has the key or not isn't feasible due to how the VFS caches work. Furthermore, while sometimes users expect this behavior, it is misguided for two reasons. First, it would be an OS-level access control mechanism largely redundant with existing access control mechanisms such as UNIX file permissions, ACLs, LSMs, etc. Encryption is actually for protecting the data at rest. Second, almost all users of fscrypt actually do need the keys to be global. The largest users of fscrypt, Android and Chromium OS, achieve this by having PID 1 create a "session keyring" that is inherited by every process. This works, but it isn't scalable because it prevents session keyrings from being used for any other purpose. On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't similarly abuse the session keyring, so to make 'sudo' work on all systems it has to link all the user keyrings into root's user keyring [2]. This is ugly and raises security concerns. 
Moreover it can't make the keys available to system services, such as sshd trying to access the user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to read certificates from the user's home directory (see [5]); or to Docker containers (see [6], [7]). By having an API to add a key to the *filesystem* we'll be able to fix the above bugs, remove userspace workarounds, and clearly express the intended semantics: the locked/unlocked status of an encrypted directory is global, and encryption is orthogonal to OS-level access control. Why not use the add_key() syscall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use an ioctl for this API rather than the existing add_key() system call because the ioctl gives us the flexibility needed to implement fscrypt-specific semantics that will be introduced in later patches: - Supporting key removal with the semantics such that the secret is removed immediately and any unused inodes using the key are evicted; also, the eviction of any in-use inodes can be retried. - Calculating a key-dependent cryptographic identifier and returning it to userspace. - Allowing keys to be added and removed by non-root users, but only keys for v2 encryption policies; and to prevent denial-of-service attacks, users can only remove keys they themselves have added, and a key is only really removed after all users who added it have removed it. Trying to shoehorn these semantics into the keyrings syscalls would be very difficult, whereas the ioctls make things much easier. However, to reuse code the implementation still uses the keyrings service internally. Thus we get lockless RCU-mode key lookups without having to re-implement it, and the keys automatically show up in /proc/keys for debugging purposes. 
References: [1] https://github.com/google/fscrypt [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939 [4] https://github.com/google/fscrypt/issues/116 [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715 [6] https://github.com/google/fscrypt/issues/128 [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
-
- 31 Jul, 2019 1 commit
-
-
Committed by Jan Kara
Commit 33ec3e53 ("loop: Don't change loop device under exclusive opener") made LOOP_SET_FD ioctl acquire exclusive block device reference while it updates loop device binding. However this can make perfectly valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding temporarily the exclusive bdev reference in cases like this:
  for i in {a..z}{a..z}; do
          dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
          mkfs.ext2 $i.image
          mkdir mnt$i
  done

  echo "Run"
  for i in {a..z}{a..z}; do
          mount -o loop -t ext2 $i.image mnt$i &
  done
Fix the problem by not getting full exclusive bdev reference in LOOP_SET_FD but instead just mark the bdev as being claimed while we update the binding information. This just blocks new exclusive openers instead of failing them with EBUSY thus fixing the problem. Fixes: 33ec3e53 ("loop: Don't change loop device under exclusive opener") Cc: stable@vger.kernel.org Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Jul, 2019 1 commit
-
-
Committed by Eric Biggers
Analogous to fs/crypto/, add fields to the VFS inode and superblock for use by the fs/verity/ support layer:
  - ->s_vop: points to the fsverity_operations if the filesystem supports fs-verity, otherwise is NULL.
  - ->i_verity_info: points to cached fs-verity information for the inode after someone opens it, otherwise is NULL.
  - S_VERITY: bit in ->i_flags that identifies verity inodes, even when they haven't been opened yet and thus still have NULL ->i_verity_info.
Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
-
- 04 Jul, 2019 1 commit
-
-
Committed by Benjamin Coddington
After the update to use nlm_lockowners for the NLM server, there are no more users of lm_compare_owner and lm_owner_key. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 01 Jul, 2019 2 commits
-
-
Committed by Darrick J. Wong
Create a generic checking function for the incoming FS_IOC_FSSETXATTR fsxattr values so that we can standardize some of the implementation behaviors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz>
-
Committed by Darrick J. Wong
Create a generic function to check incoming FS_IOC_SETFLAGS flag values and later prepare the inode for updates so that we can standardize the implementations that follow ext4's flag values. Note that the efivarfs implementation no longer fails a no-op SETFLAGS without CAP_LINUX_IMMUTABLE since that's the behavior in ext*. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>
-
- 21 Jun, 2019 1 commit
-
-
Committed by Ross Zwisler
In the spirit of filemap_fdatawait_range() and filemap_fdatawait_keep_errors(), introduce filemap_fdatawait_range_keep_errors() which both takes a range upon which to wait and does not clear errors from the address space. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
- 19 Jun, 2019 1 commit
-
-
Committed by Amir Goldstein
check_conflicting_open() is checking for existing fd's open for read or for write before allowing to take a write lease. The check that was implemented using i_count and d_count is an approximation that has several false positives. For example, overlayfs since v4.19, takes an extra reference on the dentry; an open with O_PATH takes a reference on the dentry although the file cannot be read nor written. Change the implementation to use i_readcount and i_writecount to eliminate the false positive conflicts and allow a write lease to be taken on an overlayfs file. The change of behavior with existing fd's open with O_PATH is symmetric w.r.t. current behavior of lease breakers - an open with O_PATH currently does not break a write lease. This increases the size of struct inode by 4 bytes on 32bit archs when CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already defined. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
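A counter-based conflict check of the kind described above might look like the following userspace sketch. The types and the function are hypothetical models, and two details are assumptions not stated in the message: that a read lease conflicts with any writer, and that the requestor's own open is counted in exactly one of the two counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum lease_model { LEASE_MODEL_READ, LEASE_MODEL_WRITE };

/* Hypothetical stand-in for the two per-inode open counters. */
struct inode_counts_model {
	int readcount;	/* opens for read only */
	int writecount;	/* opens with write access */
};

/*
 * Returns 0 if the lease may be taken, -EAGAIN on a conflicting open.
 * An O_PATH open touches neither counter, so it never conflicts here.
 */
static int check_conflicting_open_model(const struct inode_counts_model *inode,
					enum lease_model lease,
					bool requestor_writes)
{
	/* The requestor's own open accounts for exactly one count. */
	int self_rcount = requestor_writes ? 0 : 1;
	int self_wcount = requestor_writes ? 1 : 0;

	if (lease == LEASE_MODEL_READ)
		return inode->writecount > 0 ? -EAGAIN : 0;

	if (inode->writecount != self_wcount ||
	    inode->readcount != self_rcount)
		return -EAGAIN;
	return 0;
}
```

Because the counters track only real read and write opens, extra dentry references such as the overlayfs one mentioned above no longer look like conflicts in this model.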
-
- 10 Jun, 2019 4 commits
-
-
Committed by Amir Goldstein
The combination of file_remove_privs() and file_update_mtime() is quite common in filesystem ->write_iter() methods. Modelled after the helper file_accessed(), introduce file_modified() and use it from generic_remap_file_range_prep(). Note that the order of calling file_remove_privs() before file_update_mtime() in the helper was matched to the more common order by filesystems and not the current order in generic_remap_file_range_prep(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
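The ordering chosen for the helper can be illustrated with a small userspace model. Everything here is a hypothetical stand-in: the struct only records call order, and returning early when privilege removal fails is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>

/* Hypothetical model of a file that records the order of helper calls. */
struct file_mod_model {
	int fail_remove_privs;	/* non-zero simulates an error from the first step */
	int next_step;
	int privs_step;		/* 0 means "never called" */
	int mtime_step;
};

static int file_remove_privs_model(struct file_mod_model *f)
{
	f->privs_step = ++f->next_step;
	return f->fail_remove_privs;
}

static int file_update_mtime_model(struct file_mod_model *f)
{
	f->mtime_step = ++f->next_step;
	return 0;
}

/* Combined helper: drop privileges first, then update the timestamp. */
static int file_modified_model(struct file_mod_model *f)
{
	int err = file_remove_privs_model(f);

	if (err)
		return err;
	return file_update_mtime_model(f);
}
```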
-
Committed by Amir Goldstein
Like the clone and dedupe interfaces we've recently fixed, the copy_file_range() implementation is missing basic sanity, limits and boundary condition tests on the parameters that are passed to it from userspace. Create a new "generic_copy_file_checks()" function modelled on the generic_remap_checks() function to provide this missing functionality. [Amir] Shorten copy length instead of checking pos_in limits because input file size already abides by the limits. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
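The "shorten copy length" note can be sketched as a single clamping function. This is a userspace model with a hypothetical name; it shows only the length shortening and none of the other checks the message alludes to.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shorten a requested copy so that it never reads past the end of the
 * input file. A start position at or beyond EOF yields a zero-length copy.
 */
static uint64_t shorten_copy_len_model(uint64_t pos_in, uint64_t size_in,
				       uint64_t count)
{
	uint64_t avail;

	if (pos_in >= size_in)
		return 0;

	avail = size_in - pos_in;
	return count < avail ? count : avail;
}
```

Clamping against the input size instead of rejecting large positions relies on the point made in the message: the file size itself already respects the filesystem limits.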
-
Committed by Amir Goldstein
Factor out helper with some checks on in/out file that are common to clone_file_range and copy_file_range. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Committed by Dave Chinner
Right now if vfs_copy_file_range() does not use any offload mechanism, it falls back to calling do_splice_direct(). This fails to do basic sanity checks on the files being copied. Before we start adding this necessary functionality to the fallback path, separate it out into generic_copy_file_range(). generic_copy_file_range() has the same prototype as ->copy_file_range() so that filesystems can use it in their custom ->copy_file_range() method if they so choose. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 Jun, 2019 1 commit
-
-
Committed by Mauro Carvalho Chehab
A recent documentation conversion renamed this file but forgot to update the links. Fixes: af96c1e3 ("docs: filesystems: vfs: Convert vfs.txt to RST") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 29 May, 2019 1 commit
-
-
Committed by Jan Kara
Proc filesystem has special locking rules for various files. Thus fanotify which opens files on event delivery can easily deadlock against another process that waits for fanotify permission event to be handled. Since permission events on /proc have doubtful value anyway, just disallow them. Link: https://lore.kernel.org/linux-fsdevel/20190320131642.GE9485@quack2.suse.cz/ Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
-
- 26 May, 2019 4 commits
-
-
Committed by David Howells
Kill sget_userns(), folding it into sget() as that's the only remaining user. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-fsdevel@vger.kernel.org
-
Committed by Al Viro
... now that all other callers are gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Kill mount_ns() as it has been replaced by vfs_get_super() in the new mount API. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Once upon a time we used to set ->d_name of e.g. pipefs root so that d_path() on pipes would work. These days it's completely pointless - dentries of pipes are not even connected to pipefs root. However, mount_pseudo() had set the root dentry name (passed as the second argument) and callers kept inventing names to pass to it. Including those that didn't *have* any non-root dentries to start with... All of that had been pointless for about 8 years now; it's time to get rid of that cargo-culting... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 03 May, 2019 1 commit
-
-
Committed by Jens Axboe
This just pulls out the ksys_sync_file_range() code to work on a struct file instead of an fd, so we can use it elsewhere. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 May, 2019 1 commit
-
-
Committed by Al Viro
A lot of ->destroy_inode() instances end with call_rcu() of a callback that does RCU-delayed part of freeing. Introduce a new method for doing just that, with saner signature. Rules:
  ->destroy_inode   ->free_inode
        f                g          immediate call of f(), RCU-delayed call of g()
        f                NULL       immediate call of f(), no RCU-delayed calls
        NULL             g          RCU-delayed call of g()
        NULL             NULL       RCU-delayed default freeing
IOW, NULL ->free_inode gives the same behaviour as now. Note that NULL, NULL is equivalent to NULL, free_inode_nonrcu; we could mandate the latter form, but that would have very little benefit beyond making rules a bit more symmetric. It would break backwards compatibility, require extra boilerplate and expected semantics for (NULL, NULL) pair would have no use whatsoever... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
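The four-row rule table above can be modelled in userspace as a small dispatcher. This is a sketch under loud assumptions: RCU is replaced by a single pending-callback slot that is run by hand, and every type and function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct inode_model {
	int destroyed_now;	/* set by the immediate callback f */
	int freed_by_g;		/* set by the delayed callback g */
	int freed_by_default;	/* set by the default delayed free */
};

typedef void (*inode_cb)(struct inode_model *);

/* One pending "RCU callback"; a real kernel queues many per grace period. */
struct rcu_slot_model {
	inode_cb cb;
	struct inode_model *inode;
};

/* Sample callbacks playing the roles of f and g from the rule table. */
static void sample_destroy(struct inode_model *i) { i->destroyed_now = 1; }
static void sample_free(struct inode_model *i) { i->freed_by_g = 1; }
static void default_free_model(struct inode_model *i) { i->freed_by_default = 1; }

/* Apply the rule table: f runs immediately, g (or the default) is delayed. */
static void destroy_inode_model(struct inode_model *inode, inode_cb f,
				inode_cb g, struct rcu_slot_model *slot)
{
	slot->inode = inode;

	if (f) {
		f(inode);
		slot->cb = g;	/* NULL here means no delayed call at all */
		return;
	}
	slot->cb = g ? g : default_free_model;
}

/* Stand-in for the end of a grace period: run the pending callback once. */
static void grace_period_elapsed_model(struct rcu_slot_model *slot)
{
	if (slot->cb) {
		slot->cb(slot->inode);
		slot->cb = NULL;
	}
}
```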
-
- 26 Apr, 2019 1 commit
-
-
Committed by Gabriel Krisman Bertazi
This patch implements the actual support for case-insensitive file name lookups in ext4, based on the feature bit and the encoding stored in the superblock.

A filesystem that has the casefold feature set is able to configure directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups to succeed in that directory in a case-insensitive fashion, i.e. match a directory entry even if the name used by userspace is not a byte-per-byte match with the disk name, but is an equivalent case-insensitive version of the Unicode string. This operation is called a case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories and inherited by its children. This attribute can only be enabled on empty directories for filesystems that support the encoding feature, thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, ext4 only stores the first equivalent name dentry used in the dcache. This is done to prevent unintentional duplication of dentries in the dcache, while also allowing the VFS code to quickly find the right entry in the cache regardless of which equivalent string was used in a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the casefolded string, such that we always have a well-known bucket for all the equivalencies of the same string. d_compare() uses the utf8_strncasecmp() infrastructure, which handles the comparison of equivalent, same case, names as well.

For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the vfs layer to fix. We can live without that for now, and so does everyone else.

* on-disk data:

Despite using a specific version of the name as the internal representation within the dcache, the name stored and fetched from the disk is a byte-per-byte match with what the user requested, making this implementation 'name-preserving', i.e. no actual information is lost when writing to storage.

DX is supported by modifying the hashes used in +F directories to make them case/encoding-aware. The new disk hashes are calculated as the hash of the full casefolded string, instead of the string directly. This allows us to efficiently search for file names in the htree without requiring the user to provide an exact name.

* Dealing with invalid sequences:

By default, when an invalid UTF-8 sequence is identified, ext4 will treat it as an opaque byte sequence, ignoring the encoding and reverting to the old behavior for that unique file. This means that case-insensitive file name lookup will not work only for that file. An optional bit can be set in the superblock telling the filesystem code and userspace tools to enforce the encoding. When that optional bit is set, any attempt to create a file name using an invalid UTF-8 sequence will fail and return an error to userspace.

* Normalization algorithm:

The UTF-8 algorithms used to compare strings in ext4 are implemented in fs/unicode, and are based on a previous version developed by SGI. They implement the Canonical decomposition (NFD) algorithm described by the Unicode specification 12.1, or higher, combined with the elimination of ignorable code points (NFDi) and full case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for EXT4 because:

- It has a lower cost than NFC/NFKC (which requires decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like compatibility decompositions.

Although:

- This implementation is not completely linguistically accurate, because different languages have conflicting rules, which would require the specialization of the filesystem to a given locale, which brings all sorts of problems for removable media and for users who use more than one language.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
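The lookup scheme described in this commit (key on the casefolded name, store the name as given, fall back to opaque bytes for invalid UTF-8) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `fold_key()` and `CasefoldDir` are hypothetical names, Python's `unicodedata` and `str.casefold()` stand in for the fs/unicode code, the Unicode version is whatever the Python build ships, and ignorable code points are not stripped, so results can differ from ext4 for some strings.

```python
import unicodedata
from typing import Dict, Optional


def fold_key(name: bytes) -> bytes:
    """Return the comparison key for a file name (hypothetical helper).

    Valid UTF-8 is decomposed (NFD) and case-folded. Anything else is
    treated as an opaque byte sequence that only matches itself.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return name
    folded = unicodedata.normalize("NFD", text).casefold()
    # Case folding can produce composed characters, so decompose again.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


class CasefoldDir:
    """Toy model of a +F directory: folded key -> name exactly as created."""

    def __init__(self, strict: bool = False) -> None:
        self.strict = strict      # models the optional "enforce encoding" bit
        self._entries: Dict[bytes, bytes] = {}

    def create(self, name: bytes) -> None:
        if self.strict:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                raise ValueError("invalid UTF-8 file name")
        key = fold_key(name)
        if key in self._entries:
            raise FileExistsError(self._entries[key])
        self._entries[key] = name  # name-preserving: store the bytes as given

    def lookup(self, name: bytes) -> Optional[bytes]:
        """Return the stored name for any equivalent spelling, else None."""
        return self._entries.get(fold_key(name))
```

Keying the dictionary on the folded form plays the role that the casefolded hash plays for d_hash() and the htree: every equivalent spelling lands in the same bucket, while the stored value keeps the bytes the user originally supplied.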
-
- 25 Apr, 2019 1 commit
-
-
Committed by David Howells
Add two tracepoints for monitoring AFS file locking. Firstly, add one that follows the operational part:

    echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_op/enable

And add a second that follows the event-driven part more closely:

    echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_ev/enable

Individual file_lock structs seen by afs are tagged with debugging IDs that are displayed in the trace log to make it easier to see what's going on, especially as setting the first lock always seems to involve copying the file_lock twice.

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 13 Apr, 2019 1 commit
-
-
Committed by Lukas Bulwahn
commit d7065da0 ("get rid of the magic around f_count in aio") added fput_atomic to include/linux/fs.h, motivated by its use in __aio_put_req() in fs/aio.c. Later, commit 3ffa3c0e ("aio: now fput() is OK from interrupt context; get rid of manual delayed __fput()") removed the only use of fput_atomic in __aio_put_req(), but did not remove the fput_atomic definition in include/linux/fs.h, which has been unused ever since. Clean this up now and finally remove the unused definition.

This issue was identified during a code review due to a coccinelle warning from the atomic_as_refcounter.cocci rule pointing to the use of atomic_t in fput_atomic.

Suggested-by: Krystian Radlak <kradlak@exida.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-