- 30 10月, 2010 10 次提交
-
-
由 Sage Weil 提交于
Add support for an async transaction commit that is ordered such that any subsequent operations will join the following transaction, but does not wait until the current commit is fully on disk. This avoids much of the latency associated with the btrfs_commit_transaction for callers concerned with serialization and not safety. The wait_for_unblock flag controls whether we wait for the 'middle' portion of commit_transaction to complete, which is necessary if the caller expects some of the modifications contained in the commit to be available (this is the case for subvol/snapshot creation). Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
We calculate timeout (either 1 or MAX_SCHEDULE_TIMEOUT) based on whether num_writers > 1 or should_grow at the top of the loop. Then, much much later, we wait for that timeout if either num_writers or should_grow is true. However, it's possible for a racing process (calling btrfs_end_transaction()) to decrement num_writers such that we wait forever instead of for 1. Fix this by deciding how long to wait when we wait. Include a smp_mb() before checking if the waitqueue is active to ensure the num_writers is visible. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
I'm no lockdep expert, but this appears to make the lockdep warning go away for the i_mutex locking in the clone ioctl. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
We had an edge case issue where the requested range was just following an existing extent. Instead of skipping to the next extent, we used the previous one which lead to having zero sized extents. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
The lookup_first_ordered_extent() was done on the wrong inode, and the ->delalloc_bytes test was wrong, as the following btrfs_wait_ordered_range() would only invoke a range write and wouldn't write the entire file data range. Also, a bad parameter was passed to btrfs_wait_ordered_range(). Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The alloc_target variable is not really used. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Andi Kleen 提交于
These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Andi Kleen 提交于
These are all the cases where a variable is set, but not read which are really bugs. - Couple of incorrect error handling fixed. - One incorrect use of a allocation policy - Some other things Still needs more review. Found by gcc 4.6's new warnings. [akpm@linux-foundation.org: fix build. Might have been bitrot] Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Julia Lawall 提交于
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Julia Lawall 提交于
Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 29 10月, 2010 14 次提交
-
-
由 Chris Mason 提交于
When btrfs is mounted in degraded mode, it has some internal structures to track the missing devices. This missing device is setup as readonly, but the mapping code can get upset when we try to write to it. This changes the mapping code to return -EIO instead of oops when we try to write to the readonly device. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
This patch reduces the CPU time spent in the extent buffer search by using the radix tree instead of the rbtree and using the rcu lock instead of the spin lock. I did a quick test by the benchmark tool[1] and found the patch improve the file creation/deletion performance problem that I have reported[2]. Before applying this patch: Create files: Total files: 50000 Total time: 0.971531 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.366761 Average time: 0.000027 After applying this patch: Create files: Total files: 50000 Total time: 0.927455 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.292280 Average time: 0.000026 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 [2] http://marc.info/?l=linux-btrfs&m=128212635122920&w=2Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
restructure try_release_extent_buffer() and write a function to release the extent buffer. It will be used later. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
We have a fairly complex set of loops around walking our list of delalloc inodes when we find metadata delalloc space running low. It doesn't work very well, can use large amounts of CPU and doesn't do very efficient writeback. This switches us to kick the bdi flusher threads instead. All dirty data in btrfs is accounted as delalloc data, so this is very similar in terms of what it writes, but we're able to just kick off the IO and wait for progress. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
An earlier commit tried to keep us from allocating too many empty metadata chunks. It was somewhat too restrictive and could lead to ENOSPC errors on empty filesystems. This increases the limits to about 5% of the FS size, allowing more metadata chunks to be preallocated. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
When btrfs is running low on metadata space, it needs to force delayed allocation pages to disk. It currently does this with a suboptimal walk of a private list of inodes with delayed allocation, and it would be much better if we used the generic flusher threads. writeback_inodes_sb_if_idle would be ideal, but it waits for the flusher thread to start IO on all the dirty pages in the FS before it returns. This adds variants of writeback_inodes_sb* that allow the caller to control how many pages get sent down. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
When btrfs discovers the generation number in a btree block is incorrect, it can loop forever without forcing the RAID code to try a valid mirror, and without returning EIO. This changes things to properly kick out the EIO. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
If you mount -o space_cache, the option will be persistent across mounts, but to make sure the user knows that they did this, emit a message telling them if they didn't mount with -o space_cache but the feature is still used. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
If something goes wrong with the free space cache we need a way to make sure it's not loaded on mount and that it's cleared for everybody. When you pass the clear_cache option it will make it so all block groups are setup to be cleared, which keeps them from being loaded and then they will be truncated when the transaction is committed. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
There are just a few things that need to be fixed in the kernel to support mixed data+metadata block groups. Mostly we just need to make sure that if we are using mixed block groups that we continue to allocate mixed block groups as we need them. Also we need to make sure __find_space_info will find our space info if we search for DATA or METADATA only. Tested this with xfstests and it works nicely. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
With the free space disk caching we can mark the block group as started with the caching, but we don't have a caching ctl. This can race with anybody else who tries to get the caching ctl before we cache (this is very hard to do btw). So instead check to see if cache->caching_ctl is set, and if not return NULL. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
This patch actually loads the free space cache if it exists. The only thing that really changes here is that we need to cache the block group if we're going to remove an extent from it. Previously we did not do this since the caching kthread would pick it up. With the on disk cache we don't have this luxury so we need to make sure we read the on disk cache in first, and then remove the extent, that way when the extent is unpinned the free space is added to the block group. This has been tested with all sorts of things. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
This is a simple bit, just dump the free space cache out to our preallocated inode when we're writing out dirty block groups. There are a bunch of changes in inode.c in order to account for special cases. Mostly when we're doing the writeout we're holding trans_mutex, so we need to use the nolock transacation functions. Also we can't do asynchronous completions since the async thread could be blocked on already completed IO waiting for the transaction lock. This has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC tests. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
In order to save free space cache, we need an inode to hold the data, and we need a special item to point at the right inode for the right block group. So first, create a special item that will point to the right inode, and the number of extent entries we will have and the number of bitmaps we will have. We truncate and pre-allocate space everytime to make sure it's uptodate. This feature will be turned on as soon as you mount with -o space_cache, however it is safe to boot into old kernels, they will just generate the cache the old fashion way. When you boot back into a newer kernel we will notice that we modified and not the cache and automatically discard the cache. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 27 10月, 2010 2 次提交
-
-
由 Josef Bacik 提交于
Because btrfs_dirty_inode does a btrfs_join_transaction, it doesn't actually reserve space. It does this so we can try and dirty the inode quickly without having to deal with the ENOSPC problems. But if it does get back ENOSPC it handles it properly. The problem is use_block_rsv does a WARN_ON whenever this case happens, even tho btrfs_dirty_inode takes it into account and actually expects to get -ENOSPC if things are particularly tight. So instead just remove the warning. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
btrfs_commit_transaction will free our trans, but because we pass trans to shrink_delalloc we could possibly have a use after free situation. So instead if we commit the transaction, set trans to null and set committed to true so we don't keep trying to commit a transaction. This fixes a panic I could reproduce at will. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 23 10月, 2010 8 次提交
-
-
由 Josef Bacik 提交于
If we failed to find the root subvol id, or the subvol=<name>, we would deactivate the locked super and close the devices. The problem is at this point we have gotten the SB all setup, which includes setting super_operations, so when we'd deactiveate the super, we'd do a close_ctree() which closes the devices, so we'd end up closing the devices twice. So if you do something like this mount /dev/sda1 /mnt/test1 mount /dev/sda1 /mnt/test2 -o subvol=xxx umount /mnt/test1 it would blow up (if subvol xxx doesn't exist). This patch fixes that problem. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
With multi-threaded writes we were getting ENOSPC early because somebody would come in, start flushing delalloc because they couldn't make their reservation, and in the meantime other threads would come in and use the space that was getting freed up, so when the original thread went to check to see if they had space they didn't and they'd return ENOSPC. So instead if we have some free space but not enough for our reservation, take the reservation and then start doing the flushing. The only time we don't take reservations is when we've already overcommitted our space, that way we don't have people who come late to the party way overcommitting ourselves. This also moves all of the retrying and flushing code into reserve_metdata_bytes so it's all uniform. This keeps my fs_mark test from returning -ENOSPC as soon as it starts and actually lets me fill up the disk. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Because the ENOSPC code over reserves super aggressively we end up allocating chunks way more often than we should. For example with my fs_mark tests on a 2gb fs I can end up reserved 1gb just for metadata, when only 34mb of that is being used. So instead check to see if the amount of space actually used is less than 30% of the total space, and if so don't allocate a chunk, but only if we have at least 256mb of free space to make sure we don't put too much pressure on free space. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Currently we try and flush delalloc, but we only do that in a sort of weak way, which works fine in most cases but if we're under heavy pressure we need to be able to wait for flushing to happen. Also instead of checking the bytes reserved in the block_rsv, check the space info since it is more accurate. The sync option will be used in a future patch. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
The global reservation stuff tries to add together DATA and METADATA used in order to figure out how much to reserve for everything, but this doesn't work right for mixed block groups. Instead if we have mixed block groups just set data used to 0. Also with mixed block groups we will use bytes_may_use for keeping track of delalloc bytes, so we need to take that into account in our reservation calculations. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
The new ENOSPC stuff breaks out the raid types which breaks the way we were reporting df to the system. This fixes it back so that Available is the total space available to data and used is the actual bytes used by the filesystem. This means that Available is Total - data used - all of the metadata space. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
The new ENOSPC stuff broke the df ioctl since we no longer create seperate space info's for each RAID type. So instead, loop through each space info's raid lists so we can get the right RAID information which will allow the df ioctl to tell us RAID types again. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
In very severe ENOSPC cases we can run out of inodes to do delalloc on, which means we'll just keep looping trying to shrink delalloc. Instead, if we fail to shrink delalloc 3 times in a row break out since we're not likely to make any progress. Tested this with a 100mb fs an xfstests test 13. Before the patch it would hang the box, with the patch we get -ENOSPC like we should. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 15 10月, 2010 3 次提交
-
-
由 Linus Torvalds 提交于
If you build aout support as a module, you'll want these exported. Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Tony Luck reports that the addition of the access_ok() check in commit 0eead9ab ("Don't dump task struct in a.out core-dumps") broke the ia64 compile due to missing the necessary header file includes. Rather than add yet another include (<asm/unistd.h>) to make everything happy, just uninline the silly core dump helper functions and move the bodies to fs/exec.c where they make a lot more sense. dump_seek() in particular was too big to be an inline function anyway, and none of them are in any way performance-critical. And we really don't need to mess up our include file headers more than they already are. Reported-and-tested-by: NTony Luck <tony.luck@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
akiphie points out that a.out core-dumps have that odd task struct dumping that was never used and was never really a good idea (it goes back into the mists of history, probably the original core-dumping code). Just remove it. Also do the access_ok() check on dump_write(). It probably doesn't matter (since normal filesystems all seem to do it anyway), but he points out that it's normally done by the VFS layer, so ... [ I suspect that we should possibly do "vfs_write()" instead of calling ->write directly. That also does the whole fsnotify and write statistics thing, which may or may not be a good idea. ] And just to be anal, do this all for the x86-64 32-bit a.out emulation code too, even though it's not enabled (and won't currently even compile) Reported-by: Nakiphie <akiphie@lavabit.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 10月, 2010 1 次提交
-
-
由 J. Bruce Fields 提交于
As of commit 43a9aa64 "NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call fh_unlock on a filehandle that isn't fully initialized. We should fix up the callers, but as a quick fix it is also sufficient just to remove this assertion. Reported-by: NMarius Tolzmann <tolzmann@molgen.mpg.de> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 12 10月, 2010 1 次提交
-
-
由 Eric Paris 提交于
This patch disables the fanotify syscalls by just not building them and letting the cond_syscall() statements in kernel/sys_ni.c redirect them to sys_ni_syscall(). It was pointed out by Tvrtko Ursulin that the fanotify interface did not include an explicit prioritization between groups. This is necessary for fanotify to be usable for hierarchical storage management software, as they must get first access to the file, before inotify-like notifiers see the file. This feature can be added in an ABI compatible way in the next release (by using a number of bits in the flags field to carry the info) but it was suggested by Alan that maybe we should just hold off and do it in the next cycle, likely with an (new) explicit argument to the syscall. I don't like this approach best as I know people are already starting to use the current interface, but Alan is all wise and noone on list backed me up with just using what we have. I feel this is needlessly ripping the rug out from under people at the last minute, but if others think it needs to be a new argument it might be the best way forward. Three choices: Go with what we got (and implement the new feature next cycle). Add a new field right now (and implement the new feature next cycle). Wait till next cycle to release the ABI (and implement the new feature next cycle). This is number 3. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 10月, 2010 1 次提交
-
-
由 Boaz Harrosh 提交于
This BUG is there since the first submit of the code, but only triggered in last Kernel. It's timing related do to the asynchronous object-creation behaviour of exofs. (Which should be investigated farther) The bug is obvious hence the fixed. Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>
-