- 02 9月, 2020 1 次提交
-
-
由 David Howells 提交于
task #29263287 commit 3f19b2ab97a97b413c24b66c67ae16daa4f56c35 upstream Make the inode hash table RCU searchable so that searches that want to access or modify an inode without taking a ref on that inode can do so without taking the inode hash table lock. The main thing this requires is some RCU annotation on the list manipulation operations. Inodes are already freed by RCU in most cases. Users of this interface must take care as the inode may be still under construction or may be being torn down around them. There are at least three instances where this can be of use: (1) Testing whether the inode number iunique() is going to return is currently unique (the iunique_lock is still held). (2) Ext4 date stamp updating. (3) AFS callback breaking. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> cc: linux-ext4@vger.kernel.org cc: linux-afs@lists.infradead.org [jeffle: resolve collision in afs_break_one_callback since code base change] Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
- 22 6月, 2019 1 次提交
-
-
由 Alexander Lochmann 提交于
commit f69e749a49353d96af1a293f56b5b56de59c668a upstream. file_remove_privs() might be called for non-regular files, e.g. blkdev inode. There is no reason to do its job on things like blkdev inodes, pipes, or cdevs. Hence, abort if file does not refer to a regular inode. AV: more to the point, for devices there might be any number of inodes refering to given device. Which one to strip the permissions from, even if that made any sense in the first place? All of them will be observed with contents modified, after all. Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf Spinczyk) Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NAlexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: NHorst Schirmeier <horst.schirmeier@tu-dortmund.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: Zubin Mithra <zsm@chromium.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 20 2月, 2019 1 次提交
-
-
由 Dave Chinner 提交于
commit 69056ee6a8a3d576ed31e38b3b14c70d6c74edcc upstream. This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages"). This change causes serious changes to page cache and inode cache behaviour and balance, resulting in major performance regressions when combining worklaods such as large file copies and kernel compiles. https://bugzilla.kernel.org/show_bug.cgi?id=202441 This change is a hack to work around the problems introduced by changing how agressive shrinkers are on small caches in commit 172b06c3 ("mm: slowly shrink slabs with a relatively small number of objects"). It creates more problems than it solves, wasn't adequately reviewed or tested, so it needs to be reverted. Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages") Signed-off-by: NDave Chinner <dchinner@redhat.com> Cc: Wolfgang Walter <linux@stwm.de> Cc: Roman Gushchin <guro@fb.com> Cc: Spock <dairinin@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 11月, 2018 1 次提交
-
-
由 Roman Gushchin 提交于
commit a76cf1a474d7dbcd9336b5f5afb0162baa142cf0 upstream. Spock reported that commit 172b06c3 ("mm: slowly shrink slabs with a relatively small number of objects") leads to a regression on his setup: periodically the majority of the pagecache is evicted without an obvious reason, while before the change the amount of free memory was balancing around the watermark. The reason behind is that the mentioned above change created some minimal background pressure on the inode cache. The problem is that if an inode is considered to be reclaimed, all belonging pagecache page are stripped, no matter how many of them are there. So, if a huge multi-gigabyte file is cached in the memory, and the goal is to reclaim only few slab objects (unused inodes), we still can eventually evict all gigabytes of the pagecache at once. The workload described by Spock has few large non-mapped files in the pagecache, so it's especially noticeable. To solve the problem let's postpone the reclaim of inodes, which have more than 1 attached page. Let's wait until the pagecache pages will be evicted naturally by scanning the corresponding LRU lists, and only then reclaim the inode structure. Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com> Reported-by: NSpock <dairinin@gmail.com> Tested-by: NSpock <dairinin@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [4.19.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 04 8月, 2018 2 次提交
-
-
由 Miklos Szeredi 提交于
iput() ends up calling ->evict() on new inode, which is not yet initialized by owning fs. So use destroy_inode() instead. Add to sb->s_inodes list only if inode is not in I_CREATING state (meaning that it wasn't allocated with new_inode(), which already does the insertion). Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 80ea09a0 ("vfs: factor out inode_insert5()") Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately *after* that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 7月, 2018 2 次提交
-
-
由 Miklos Szeredi 提交于
This reverts commit 598e3c8f. Overlayfs no longer relies on the vfs correct atime handling. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
This reverts commit cd91304e. Overlayfs no longer relies on the vfs correct atime handling. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 06 7月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: NJann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 6月, 2018 1 次提交
-
-
由 Deepa Dinamani 提交于
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
-
- 31 5月, 2018 2 次提交
-
-
由 Miklos Szeredi 提交于
Split out common helper for race free insertion of an already allocated inode into the cache. Use this from iget5_locked() and insert_inode_locked4(). Make iget5_locked() use new_inode()/iput() instead of alloc_inode()/destroy_inode() directly. Also export to modules for use by filesystems which want to preallocate an inode before file/directory creation. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Darrick J. Wong 提交于
In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 30 5月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 26 5月, 2018 1 次提交
-
-
由 Deepa Dinamani 提交于
As vfs moves to using struct timespec64 to represent times, update the argument to timespec_truncate() to use struct timespec64. Also change the name of the function. The rest of the implementation logic is the same. Move this to fs/inode.c instead of kernel/time/time.c as all the users of this api are filesystems. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Cc: <viro@zeniv.linux.org.uk>
-
- 12 4月, 2018 1 次提交
-
-
由 Matthew Wilcox 提交于
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Acked-by: NJeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 3月, 2018 2 次提交
-
-
由 Dave Chinner 提交于
Noticed when looking at why cycling 600k inodes/s through the inode cache was taking a total of 8% cpu in memset() during inode initialisation. There is no need to zero the inode.i_data structure twice. This increases single threaded bulkstat throughput from ~200,000 inodes/s to ~220,000 inodes/s, so we save a substantial amount of CPU time per inode init by doing this. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
__mark_inode_dirty already takes care of that, and for the XFS lazytime implementation we need to know that ->dirty_inode was called because I_DIRTY_TIME was set. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 07 2月, 2018 1 次提交
-
-
由 Shakeel Butt 提交于
Commit 7994e6f7 ("vfs: Move waiting for inode writeback from end_writeback() to evict_inode()") removed inode_sync_wait() from end_writeback() and commit dbd5768f ("vfs: Rename end_writeback() to clear_inode()") renamed end_writeback() to clear_inode(). After these patches there is no sleeping operation in clear_inode(). So, remove might_sleep() from it. Link: http://lkml.kernel.org/r/20171108004354.40308-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Greg Thelen <gthelen@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 1月, 2018 2 次提交
-
-
由 Jeff Layton 提交于
We only really need to update i_version if someone has queried for it since we last incremented it. By doing that, we can avoid having to update the inode if the times haven't changed. If the times have changed, then we go ahead and forcibly increment the counter, under the assumption that we'll be going to the storage anyway, and the increment itself is relatively cheap. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Jeff Layton 提交于
Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 28 11月, 2017 1 次提交
-
-
由 Linus Torvalds 提交于
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 10月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 9月, 2017 1 次提交
-
-
由 Davidlohr Bueso 提交于
Allow interval trees to quickly check for overlaps to avoid unnecesary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDoug Ledford <dledford@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 9月, 2017 1 次提交
-
-
由 Miklos Szeredi 提交于
Need to treat non-regular overlayfs files the same as regular files when checking for an atime update. Add a d_real() flag to make it return the upper dentry for all file types. Reported-by: N"zhangyi (F)" <yi.zhang@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 02 9月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
When we introduced the bmap redo log items, we set MS_ACTIVE on the mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes from being truncated prematurely during log recovery. This also had the effect of putting linked inodes on the lru instead of evicting them. Unfortunately, we neglected to find all those unreferenced lru inodes and evict them after finishing log recovery, which means that we leak them if anything goes wrong in the rest of xfs_mountfs, because the lru is only cleaned out on unmount. Therefore, evict unreferenced inodes in the lru list immediately after clearing MS_ACTIVE. Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Cc: viro@ZenIV.linux.org.uk Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 07 7月, 2017 1 次提交
-
-
由 Pavel Tatashin 提交于
Update dcache, inode, pid, mountpoint, and mount hash tables to use HASH_ZERO, and remove initialization after allocations. In case of places where HASH_EARLY was used such as in __pv_init_lock_hash the zeroed hash table was already assumed, because memblock zeroes the memory. CPU: SPARC M6, Memory: 7T Before fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 11.798s After fix: Dentry cache hash table entries: 1073741824 Inode-cache hash table entries: 536870912 Mount-cache hash table entries: 16777216 Mountpoint-cache hash table entries: 16777216 ftrace: allocating 20414 entries in 40 pages Total time: 3.198s CPU: Intel Xeon E5-2630, Memory: 2.2T: Before fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.245s After fix: Dentry cache hash table entries: 536870912 Inode-cache hash table entries: 268435456 Mount-cache hash table entries: 8388608 Mountpoint-cache hash table entries: 8388608 CPU: Physical Processor ID: 0 Total time: 3.244s Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBabu Moger <babu.moger@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 6月, 2017 1 次提交
-
-
由 Kees Cook 提交于
Checking for capabilities should be the last operation when performing access control tests so that PF_SUPERPRIV is set only when it was required for success (implying that the capability was needed for the operation). Reported-by: NSolar Designer <solar@openwall.com> Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NSerge Hallyn <serge@hallyn.com> Reviewed-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 6月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
Define a set of write life time hints: RWH_WRITE_LIFE_NOT_SET No hint information set RWH_WRITE_LIFE_NONE No hints about write life time RWH_WRITE_LIFE_SHORT Data written has a short life time RWH_WRITE_LIFE_MEDIUM Data written has a medium life time RWH_WRITE_LIFE_LONG Data written has a long life time RWH_WRITE_LIFE_EXTREME Data written has an extremely long life time The intent is for these values to be relative to each other, no absolute meaning should be attached to these flag names. Add an fcntl interface for querying these flags, and also for setting them as well: F_GET_RW_HINT Returns the read/write hint set on the underlying inode. F_SET_RW_HINT Set one of the above write hints on the underlying inode. F_GET_FILE_RW_HINT Returns the read/write hint set on the file descriptor. F_SET_FILE_RW_HINT Set one of the above write hints on the file descriptor. The user passes in a 64-bit pointer to get/set these values, and the interface returns 0/-1 on success/error. Sample program testing/implementing basic setting/getting of write hints is below. Add support for storing the write life time hint in the inode flags and in struct file as well, and pass them to the kiocb flags. If both a file and its corresponding inode has a write hint, then we use the one in the file, if available. The file hint can be used for sync/direct IO, for buffered writeback only the inode hint is available. This is in preparation for utilizing these hints in the block layer, to guide on-media data placement. /* * writehint.c: get or set an inode write hint */ #include <stdio.h> #include <fcntl.h> #include <stdlib.h> #include <unistd.h> #include <stdbool.h> #include <inttypes.h> #ifndef F_GET_RW_HINT #define F_LINUX_SPECIFIC_BASE 1024 #define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11) #define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12) #endif static char *str[] = { "RWF_WRITE_LIFE_NOT_SET", "RWH_WRITE_LIFE_NONE", "RWH_WRITE_LIFE_SHORT", "RWH_WRITE_LIFE_MEDIUM", "RWH_WRITE_LIFE_LONG", "RWH_WRITE_LIFE_EXTREME" }; int main(int argc, char *argv[]) { uint64_t hint; int fd, ret; if (argc < 2) { fprintf(stderr, "%s: file <hint>\n", argv[0]); return 1; } fd = open(argv[1], O_RDONLY); if (fd < 0) { perror("open"); return 2; } if (argc > 2) { hint = atoi(argv[2]); ret = fcntl(fd, F_SET_RW_HINT, &hint); if (ret < 0) { perror("fcntl: F_SET_RW_HINT"); return 4; } } ret = fcntl(fd, F_GET_RW_HINT, &hint); if (ret < 0) { perror("fcntl: F_GET_RW_HINT"); return 3; } printf("%s: hint %s\n", argv[1], str[hint]); close(fd); return 0; } Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 6月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly name it as a wait-queue entry. Propagate it to a couple of usage sites where the wait-bit-queue internals are exposed. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 5月, 2017 1 次提交
-
-
由 Masahiro Yamada 提交于
Fix typos and add the following to the scripts/spelling.txt: intialisation||initialisation intialised||initialised intialise||initialise This commit does not intend to change the British spelling itself. Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 5月, 2017 1 次提交
-
-
由 Josef Bacik 提交于
By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or inode we create. This is problematic as this means that it takes two trips through the LRU for any of these objects to be reclaimed, regardless of their actual lifetime. With enough pressure from these caches we can easily evict our working set from page cache with single use objects. So instead only set *REFERENCED if we've already been added to the LRU list. This means that we've been touched since the first time we were accessed, and so more likely to need to hang out in cache. To illustrate this issue I wrote the following scripts https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure on my test box. It is a single socket 4 core CPU with 16gib of RAM and I tested on an Intel 2tib NVME drive. The cache-pressure.sh script creates a new file system and creates 2 6.5gib files in order to take up 13gib of the 16gib of ram with pagecache. Then it runs a test program that reads these 2 files in a loop, and keeps track of how often it has to read bytes for each loop. On an ideal system with no pressure we should have to read 0 bytes indefinitely. The second thing this script does is start a fs_mark job that creates a ton of 0 length files, putting pressure on the system with slab only allocations. On exit the script prints out how many bytes were read by the read-file program. The results are as follows Without patch: /mnt/btrfs-test/reads/file1: total read during loops 27262988288 /mnt/btrfs-test/reads/file2: total read during loops 27262976000 With patch: /mnt/btrfs-test/reads/file2: total read during loops 18640457728 /mnt/btrfs-test/reads/file1: total read during loops 9565376512 This patch results in a 50% reduction of the amount of pages evicted from our working set. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 4月, 2017 2 次提交
-
-
由 Jan Kara 提交于
Currently we free fsnotify_mark_connector structure only when inode / vfsmount is getting freed. This can however impose noticeable memory overhead when marks get attached to inodes only temporarily. So free the connector structure once the last mark is detached from the object. Since notification infrastructure can be working with the connector under the protection of fsnotify_mark_srcu, we have to be careful and free the fsnotify_mark_connector only after SRCU period passes. Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Currently notification marks are attached to object (inode or vfsmnt) by a hlist_head in the object. The list is also protected by a spinlock in the object. So while there is any mark attached to the list of marks, the object must be pinned in memory (and thus e.g. last iput() deleting inode cannot happen). Also for list iteration in fsnotify() to work, we must hold fsnotify_mark_srcu lock so that mark itself and mark->obj_list.next cannot get freed. Thus we are required to wait for response to fanotify events from userspace process with fsnotify_mark_srcu lock held. That causes issues when userspace process is buggy and does not reply to some event - basically the whole notification subsystem gets eventually stuck. So to be able to drop fsnotify_mark_srcu lock while waiting for response, we have to pin the mark in memory and make sure it stays in the object list (as removing the mark waiting for response could lead to lost notification events for groups later in the list). However we don't want inode reclaim to block on such mark as that would lead to system just locking up elsewhere. This commit is the first in the series that paves way towards solving these conflicting lifetime needs. Instead of anchoring the list of marks directly in the object, we anchor it in a dedicated structure (fsnotify_mark_connector) and just point to that structure from the object. The following commits will also add spinlock protecting the list and object pointer to the structure. Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 08 10月, 2016 1 次提交
-
-
由 Andreas Gruenbacher 提交于
The IOP_XATTR inode operations flag in inode->i_opflags indicates that the inode has xattr support. The flag is automatically set by new_inode() on filesystems with xattr support (where sb->s_xattr is defined), and cleared otherwise. Filesystems can explicitly clear it for inodes that should not have xattr support. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 9月, 2016 2 次提交
-
-
由 Deepa Dinamani 提交于
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Deepa Dinamani 提交于
current_fs_time() is used for inode timestamps. Change the signature of the function to take inode pointer instead of superblock as per Linus's suggestion. Also, move the api under vfs as per the discussion on the thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's suggestion on the thread, changing the function name. current_fs_time() will be deleted after all the references to it are replaced by current_time(). There was a bug reported by kbuild test bot with the change as some of the calls to current_time() were made before the super_block was initialized. Catch these accidental assignments as timespec_trunc() does for wrong granularities. This allows for the function to work right even in these circumstances. But, adds a warning to make the user aware of the bug. A coccinelle script was used to identify all the current .alloc_inode super_block callbacks that updated inode timestamps. proc filesystem was the only one that was modifying inode times as part of this callback. The series includes a patch to fix that. Note that timespec_trunc() will also be moved to fs/inode.c in a separate patch when this will need to be revamped for bounds checking purposes. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 9月, 2016 1 次提交
-
-
由 Miklos Szeredi 提交于
On overlayfs relatime_need_update() needs inode times to be correct on overlay inode. But i_mtime and i_ctime are updated by filesystem code on underlying inode only, so they will be out-of-date on the overlay inode. This patch copies the times from the underlying inode if needed. This can't be done if called from RCU lookup (link following) but link m/ctime are not updated by fs, so this is all right. This patch doesn't change functionality for anything but overlayfs. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 03 8月, 2016 3 次提交
-
-
由 Miklos Szeredi 提交于
Only used by the vfs. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
file_remove_privs() is called with inode lock on file_inode(), which proceeds to calling notify_change() on file->f_path.dentry. Which triggers the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later when ovl_setattr tries to lock the underlying inode again. Fix this mess by not mixing the layers, but doing everything on underlying dentry/inode. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Fixes: 07a2daab ("ovl: Copy up underlying inode's ->i_mode to overlay inode") Cc: <stable@vger.kernel.org>
-
由 Vladimir Davydov 提交于
Radix trees may be used not only for storing page cache pages, so unconditionally accounting radix tree nodes to the current memory cgroup is bad: if a radix tree node is used for storing data shared among different cgroups we risk pinning dead memory cgroups forever. So let's only account radix tree nodes if it was explicitly requested by passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account page cache entries, so mark mapping->page_tree so. Fixes: 58e698af ("radix-tree: account radix_tree_node to memory cgroup") Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.comSigned-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-