- 20 7月, 2011 3 次提交
-
-
由 Dave Chinner 提交于
The inode unused list is currently a global LRU. This does not match the other global filesystem cache - the dentry cache - which uses per-superblock LRU lists. Hence we have related filesystem object types using different LRU reclaimation schemes. To enable a per-superblock filesystem cache shrinker, both of these caches need to have per-sb unused object LRU lists. Hence this patch converts the global inode LRU to per-sb LRUs. The patch only does rudimentary per-sb propotioning in the shrinker infrastructure, as this gets removed when the per-sb shrinker callouts are introduced later on. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Call the given function for all superblocks of given type. Function gets a superblock (with s_umount locked shared) and (void *) argument supplied by caller of iterator. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 6月, 2011 1 次提交
-
-
由 Al Viro 提交于
Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is * new superblock flag; unless set, S_NOSEC is not going to be set. * local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than local block ones clear it) * if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked from other clients. It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 5月, 2011 1 次提交
-
-
由 Dan Magenheimer 提交于
This fourth patch of eight in this cleancache series provides the core hooks in VFS for: initializing cleancache per filesystem; capturing clean pages reclaimed by page cache; attempting to get pages from cleancache before filesystem read; and ensuring coherency between pagecache, disk, and cleancache. Note that the placement of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic change was required due to a patchset in 2.6.39. All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become a check of a boolean global if CONFIG_CLEANCACHE is set but no cleancache "backend" has claimed cleancache_ops. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function] Signed-off-by: NChris Mason <chris.mason@oracle.com> Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
-
- 19 5月, 2011 1 次提交
-
-
由 Jeff Layton 提交于
I originally intended to remove this warning in 2.6.34, but it's not in a high performance codepath and might help us to catch bugs later. Let's keep it, but fix the comment to allay confusion about its removal. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 18 3月, 2011 1 次提交
-
-
由 Al Viro 提交于
new function: mount_fs(). Does all work done by vfs_kern_mount() except the allocation and filling of vfsmount; returns root dentry or ERR_PTR(). vfs_kern_mount() switched to using it and taken to fs/namespace.c, along with its wrappers. alloc_vfsmnt()/free_vfsmnt() made static. functions in namespace.c slightly reordered. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 3月, 2011 2 次提交
-
-
由 Jens Axboe 提交于
We don't have proper reference counting for this yet, so we run into cases where the device is pulled and we OOPS on flushing the fs data. This happens even though the dirty inodes have already been migrated to the default_backing_dev_info. Reported-by: NTorsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: NTorsten Hilbrich <torsten.hilbrich@secunet.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Al Viro 提交于
This is an ex-parrot. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 2月, 2011 1 次提交
-
-
由 Boaz Harrosh 提交于
In commit fa0d7e3d ("fs: icache RCU free inodes"), we use rcu free inode instead of freeing the inode directly. It causes a crash when we rmmod immediately after we umount the volume[1]. So we need to call rcu_barrier after we kill_sb so that the inode is freed before we do rmmod. The idea is inspired by Aneesh Kumar. rcu_barrier will wait for all callbacks to end before preceding. The original patch was done by Tao Ma, but synchronize_rcu() is not enough here. 1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2Tested-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 1月, 2011 1 次提交
-
-
由 Al Viro 提交于
Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 1月, 2011 2 次提交
-
-
由 Nick Piggin 提交于
The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
We can turn the dcache hash locking from a global dcache_hash_lock into per-bucket locking. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 13 11月, 2010 2 次提交
-
-
由 Tejun Heo 提交于
After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>
-
由 Tejun Heo 提交于
Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Brown <neilb@suse.de> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
-
- 29 10月, 2010 5 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
eventual replacement for ->get_sb() - does *not* get vfsmount, return ERR_PTR(error) or root of subtree to be mounted. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 10月, 2010 1 次提交
-
-
由 Al Viro 提交于
Pull removal of fsnotify marks into generic_shutdown_super(). Split umount-time work into a new function - evict_inodes(). Make sure that invalidate_inodes() will be able to cope with I_FREEING once we change locking in iput(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 8月, 2010 1 次提交
-
-
由 Nick Piggin 提交于
fs: scale files_lock Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with an lglock. The lglock provides fast access to the per-cpu lists to add and remove files. It also provides a snapshot of all the per-cpu lists (although this is very slow). One difficulty with this approach is that a file can be removed from the list by another CPU. We must track which per-cpu list the file is on with a new variale in the file struct (packed into a hole on 64-bit archs). Scalability could suffer if files are frequently removed from different cpu's list. However loads with frequent removal of files imply short interval between adding and removing the files, and the scheduler attempts to avoid moving processes too far away. Also, even in the case of cross-CPU removal, the hardware has much more opportunity to parallelise cacheline transfers with N cachelines than with 1. A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs degenerates to contending on a single lock, which is no worse than before. When more than one CPU are allocating files, even if they are always freed by different CPUs, there will be more parallelism than the single-lock case. Testing results: On a 2 socket, 8 core opteron, I measure the number of times the lock is taken to remove the file, the number of times it is removed by the same CPU that added it, and the number of times it is removed by the same node that added it. Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%) kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%) dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%) So a file is removed from the same CPU it was added by over 90% of the time. It remains within the same node 95% of the time. Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile. throughput 2.6.34-rc2 24.5 +patch 24.9 us sys idle IO wait (in %) 2.6.34-rc2 51.25 28.25 17.25 3.25 +patch 53.75 18.5 19 8.75 So significantly less CPU time spent in kernel code, higher idle time and slightly higher throughput. Single threaded performance difference was within the noise of microbenchmarks. That is not to say penalty does not exist, the code is larger and more memory accesses required so it will be slightly slower. Cc: linux-kernel@vger.kernel.org Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 8月, 2010 3 次提交
-
-
由 Al Viro 提交于
just delay __put_super() a bit Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
If sget() finds a matching superblock being set up, it'll grab an active reference to it and grab s_umount. That's fine - we'll wait for completion of foofs_get_sb() that way. However, if said foofs_get_sb() fails we'll end up holding the halfway-created superblock. deactivate_locked_super() called by foofs_get_sb() will just unlock the sucker since we are holding another active reference to it. What we need is a way to tell if superblock has been successfully set up. Unfortunately, neither ->s_root nor the check for MS_ACTIVE quite fit. Cheap and easy way, suitable for backport: new flag set by the (only) caller of ->get_sb(). If that flag isn't present by the time sget() grabbed s_umount on preexisting superblock it has found, it's seeing a stillborn and should just bury it with deactivate_locked_super() (and repeat the search). Longer term we want to set that flag in ->get_sb() instances (and check for it to distinguish between "sget() found us a live sb" and "sget() has allocated an sb, we need to set it up" in there, instead of checking ->s_root as we do now). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org
-
由 Tejun Heo 提交于
Fix an obscure AB-BA deadlock in get_sb_bdev(). When a superblock is mounted more than once get_sb_bdev() calls close_bdev_exclusive() to drop the extra bdev reference while holding s_umount. However, sb->s_umount nests inside bd_mutex during __invalidate_device() and close_bdev_exclusive() acquires bd_mutex during blkdev_put(); thus creating an AB-BA deadlock. This condition doesn't trigger frequently. For this condition to be visible to lockdep, the filesystem must occupy the whole device (as __invalidate_device() only grabs bd_mutex for the whole device), the FS must be mounted more than once and partition rescan should be issued while the FS is still mounted. Fix it by dropping s_umount over close_bdev_exclusive(). Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NCiprian Docan <docan@eden.rutgers.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 6月, 2010 1 次提交
-
-
由 npiggin@suse.de 提交于
list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. 6754af64 introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 5月, 2010 1 次提交
-
-
由 Randy Dunlap 提交于
Fix fs/super.c kernel-doc warning and function notation: Warning(fs/super.c:957): No description found for parameter 'sb' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 5月, 2010 3 次提交
-
-
由 Christoph Hellwig 提交于
Only set the quota operation vectors if the filesystem actually supports quota instead of doing it for all filesystems in alloc_super(). [Jan Kara: Export dquot_operations and vfs_quotactl_ops] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Christoph Hellwig 提交于
Currently the VFS calls into the quotactl interface for unmounting filesystems. This means filesystems with their own quota handling can't easily distinguish between user-space originating quotaoff and an unount. Instead move the responsibily of the unmount handling into the filesystem to be consistent with all other dquot handling. Note that we do call dquot_disable a lot later now, e.g. after a sync_filesystem. But this is fine as the quota code does all its writes via blockdev's mapping and that is synced even later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Christoph Hellwig 提交于
Currently do_remount_sb calls into the dquot code to tell it about going from rw to ro and ro to rw. Move this code into the filesystem to not depend on the dquot code in the VFS - note ocfs2 already ignores these calls and handles remount by itself. This gets rid of overloading the quotactl calls and allows to unify the VFS and XFS codepaths in that area later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 22 5月, 2010 10 次提交
-
-
由 Roland Dreier 提交于
> ============================================= > [ INFO: possible recursive locking detected ] > 2.6.31-2-generic #14~rbd3 > --------------------------------------------- > firefox-3.5/4162 is trying to acquire lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > but task is already holding lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > other info that might help us debug this: > 3 locks held by firefox-3.5/4162: > #0: (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > #1: (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0 > #2: (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0 > > stack backtrace: > Pid: 4162, comm: firefox-3.5 Tainted: G C 2.6.31-2-generic #14~rbd3 > Call Trace: > [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100 > [<ffffffff8108ce26>] validate_chain+0x4c6/0x750 > [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 > [<ffffffff8108d585>] lock_acquire+0xa5/0x150 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170 > [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60 > [<ffffffff81139d31>] lock_rename+0x41/0xf0 > [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170 > [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160 > [<ffffffff8113ac7e>] vfs_rename+0xee/0x290 > [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160 > [<ffffffff8113d512>] sys_renameat+0x252/0x280 > [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100 > [<ffffffff8101316a>] ? sysret_check+0x2e/0x69 > [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190 > [<ffffffff8113d55b>] sys_rename+0x1b/0x20 > [<ffffffff81013132>] system_call_fastpath+0x16/0x1b The trace above is totally reproducible by doing a cross-directory rename on an ecryptfs directory. The issue seems to be that sys_renameat() does lock_rename() then calls into the filesystem; if the filesystem is ecryptfs, then ecryptfs_rename() again does lock_rename() on the lower filesystem, and lockdep can't tell that the two s_vfs_rename_mutexes are different. It seems an annotation like the following is sufficient to fix this (it does get rid of the lockdep trace in my simple tests); however I would like to make sure I'm not misunderstanding the locking, hence the CC list... Signed-off-by: NRoland Dreier <rdreier@cisco.com> Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Josef Bacik 提交于
Currently the way we do freezing is by passing sb>s_bdev to freeze_bdev and then letting it do all the work. But freezing is more of an fs thing, and doesn't really have much to do with the bdev at all, all the work gets done with the super. In btrfs we do not populate s_bdev, since we can have multiple bdev's for one fs and setting s_bdev makes removing devices from a pool kind of tricky. This means that freezing a btrfs filesystem fails, which causes us to corrupt with things like tux-on-ice which use the fsfreeze mechanism. So instead of populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs freezing stuff into freeze_super and thaw_super. These just take the super_block that we're freezing and does the appropriate work. It's basically just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to use the new super helpers. I've tested this with ext4 and btrfs and verified everything continues to work the same as before. The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if the fs is already frozen. I thought this was a better solution than adding a freeze counter to the super_block, but if everybody hates this idea I'm open to suggestions. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and switch the simple "loop over superblocks and do something" loops to it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
If superblock had been still alive, we would've returned it... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
This one needs restarts... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
need list_for_each_entry_safe() here. Original didn't even have restart logics, so if you race with umount() it blew up. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-