- 04 2月, 2009 8 次提交
-
-
由 Chris Mason 提交于
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop, most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first. The basic idea is: btrfs_tree_lock() returns with the spin lock held btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates. btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again. btrfs_tree_unlock() can be called on either blocking or spinning locks, it does the right thing based on the blocking bit. ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before actual updates are done. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Jim Owens 提交于
Add call to LSM security initialization and save resulting security xattr for new inodes. Add xattr support to symlink inode ops. Set inode->i_op for existing special files. Signed-off-by: Njim owens <jowens@hp.com>
-
由 Christian Hesse 提交于
This patch adds a menu entry to kconfig to enable acls for btrfs. This allows you to enable FS_POSIX_ACL at kernel compile time. (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead) Signed-off-by: NChristian Hesse <mail@earthworm.de> Signed-off-by: NJeff Mahoney <jeffm@suse.com>
-
由 Chris Mason 提交于
The async bio submission thread was missing some bios that were added after it had decided there was no work left to do. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 29 1月, 2009 1 次提交
-
-
由 Chris Mason 提交于
After btrfs_readdir has gone through all the directory items, it sets the directory f_pos to the largest possible int. This way applications that mix readdir with creating new files don't end up in an endless loop finding the new directory items as they go. It was a workaround for a bug in git, but the assumption was that if git could make this looping mistake than it would be a common problem. The largest possible int chosen was INT_LIMIT(typeof(file->f_pos), and it is possible for that to be a larger number than 32 bit glibc expects to come out of readdir. This patches switches that to INT_LIMIT(off_t), which should keep applications happy on 32 and 64 bit machines. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 28 1月, 2009 3 次提交
-
-
由 Jeff Layton 提交于
nfsd4_lockt does a search for a lockstateowner when building the lock struct to test. If one is found, it'll set fl_owner to it. Regardless of whether that happens, it'll also set fl_lmops. Given that this lock is basically a "lightweight" lock that's just used for checking conflicts, setting fl_lmops is probably not appropriate for it. This behavior exposed a bug in DLM's GETLK implementation where it wasn't clearing out the fields in the file_lock before filling in conflicting lock info. While we were able to fix this in DLM, it still seems pointless and dangerous to set the fl_lmops this way when we may have a NULL lockstateowner. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@pig.fieldses.org>
-
由 J. Bruce Fields 提交于
Since override_creds() took its own reference on new, we need to release our own reference. (Note the put_cred on the return value puts the *old* value of current->creds, not the new passed-in value). Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 J. Bruce Fields 提交于
We're forgetting to check the return value from groups_alloc(). Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 27 1月, 2009 1 次提交
-
-
由 Vegard Nossum 提交于
If userspace supplies an invalid pointer to a read() of an inotify instance, the inotify device's event list mutex is unlocked twice. This causes an unbalance which effectively leaves the data structure unprotected, and we can trigger oopses by accessing the inotify instance from different tasks concurrently. The best fix (contributed largely by Linus) is a total rewrite of the function in question: On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote: > The thing to notice is that: > > - locking is done in just one place, and there is no question about it > not having an unlock. > > - that whole double-while(1)-loop thing is gone. > > - use multiple functions to make nesting and error handling sane > > - do error testing after doing the things you always need to do, ie do > this: > > mutex_lock(..) > ret = function_call(); > mutex_unlock(..) > > .. test ret here .. > > instead of doing conditional exits with unlocking or freeing. > > So if the code is written in this way, it may still be buggy, but at least > it's not buggy because of subtle "forgot to unlock" or "forgot to free" > issues. > > This _always_ unlocks if it locked, and it always frees if it got a > non-error kevent. Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Robert Love <rlove@google.com> Cc: <stable@kernel.org> Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 1月, 2009 5 次提交
-
-
由 Miklos Szeredi 提交于
Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup(). This is not a big issue because fuse_notify_poll_wakeup() should be atomic, but it's cleaner this way, and later uses of notification will need to be able to finish the copying before performing some actions. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
If a fuse filesystem is unmounted but the device file descriptor remains open and a new mount reuses the old device number, then the mount fails with EEXIST and the following warning is printed in the kernel log: WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d() sysfs: duplicate filename '0:15' can not be created The cause is that the bdi belonging to the fuse filesystem was destoryed only after the device file was released. Fix this by calling bdi_destroy() from fuse_put_super() instead. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
由 Miklos Szeredi 提交于
Clean up error handling for the whole of fuse_fill_super() function. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Fix the leaking file reference if allocation or initialization of fuse_conn failed. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
由 Dan Carpenter 提交于
ff is set to NULL and then dereferenced on line 65. Compile tested only. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
- 22 1月, 2009 22 次提交
-
-
由 Chris Mason 提交于
Just before reading a leaf, btrfs scans the node for blocks that are close by and reads them too. It tries to build up a large window of IO looking for blocks that are within a max distance from the top and bottom of the IO window. This patch changes things to just look for blocks within 64k of the target block. It will trigger less IO and make for lower latencies on the read size. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-