- 11 3月, 2014 1 次提交
-
-
This is an extension to my previous commit titled: "Btrfs: faster file extent item replace operations" (hash 1acae57b) Instead of inserting the new file extent item if we deleted existing file extent items covering our target file range, also allow to insert the new file extent item if we didn't find any existing items to delete and replace_extent != 0, since in this case our caller would do another tree search to insert the new file extent item anyway, therefore just combine the two tree searches into a single one, saving cpu time, reducing lock contention and reducing btree node/leaf COW operations. This covers the case where applications keep doing tail append writes to files, which for example is the case of Apache CouchDB (its database and view index files are always open with O_APPEND). Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 29 1月, 2014 9 次提交
-
-
由 Chris Mason 提交于
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect the new size. The fixe uses the size directly from the item header when reading uncompressed inlines, and also fixes truncate to update the size as it goes. Reported-by: NJens Axboe <axboe@fb.com> Signed-off-by: NChris Mason <clm@fb.com> CC: stable@vger.kernel.org
-
由 Miao Xie 提交于
When we ran the 274th case of xfstests with nodatacow mount option, We met the following warning message: WARNING: CPU: 1 PID: 14185 at fs/btrfs/extent-tree.c:3734 btrfs_free_reserved_data_space+0xa6/0xd0 It is caused by the race between the write back and nocow buffered write: Task1 Task2 __btrfs_buffered_write() skip data reservation reserve the metadata space copy the data dirty the pages unlock the pages write back the pages release the data space becasue there is no noreserve flag set the noreserve flag This patch fixes this problem by unlocking the pages after the noreserve flag is set. Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
Looking into some performance related issues with large amounts of metadata revealed that we can have some pretty huge swings in fsync() performance. If we have a lot of delayed refs backed up (as you will tend to do with lots of metadata) fsync() will wander off and try to run some of those delayed refs which can result in reading from disk and such. Since the actual act of fsync() doesn't create any delayed refs there is no need to make it throttle on delayed ref stuff, that will be handled by other people. With this patch we get much smoother fsync performance with large amounts of metadata. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
When writing to a file we drop existing file extent items that cover the write range and then add a new file extent item that represents that write range. Before this change we were doing a tree lookup to remove the file extent items, and then after we did another tree lookup to insert the new file extent item. Most of the time all the file extent items we need to drop are located within a single leaf - this is the leaf where our new file extent item ends up at. Therefore, in this common case just combine these 2 operations into a single one. By avoiding the second btree navigation for insertion of the new file extent item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf COW operations, CPU time on btree node/leaf key binary searches, etc. Besides for file writes, this is an operation that happens for file fsync's as well. However log btrees are much less likely to big as big as regular fs btrees, therefore the impact of this change is smaller. The following benchmark was performed against an SSD drive and a HDD drive, both for random and sequential writes: sysbench --test=fileio --file-num=4096 --file-total-size=8G \ --file-test-mode=[rndwr|seqwr] --num-threads=512 \ --file-block-size=8192 \ --max-requests=1000000 \ --file-fsync-freq=0 --file-io-mode=sync [prepare|run] All results below are averages of 10 runs of the respective test. ** SSD sequential writes Before this change: 225.88 Mb/sec After this change: 277.26 Mb/sec ** SSD random writes Before this change: 49.91 Mb/sec After this change: 56.39 Mb/sec ** HDD sequential writes Before this change: 68.53 Mb/sec After this change: 69.87 Mb/sec ** HDD random writes Before this change: 13.04 Mb/sec After this change: 14.39 Mb/sec Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
fs/btrfs/file.c: In function ‘prepare_pages.isra.18’: fs/btrfs/file.c:1265:6: warning: ‘err’ may be used uninitialized in this function [-Wuninitialized] Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
If the ordered extent's last byte was 1 less than our region's start byte, we would unnecessarily wait for the completion of that ordered extent, because it doesn't intersect our target range. Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
When we ran sysbench on the fs with compression, the following WARN_ONs were triggered: fs/btrfs/inode.c:7829 WARN_ON(BTRFS_I(inode)->outstanding_extents); fs/btrfs/inode.c:7830 WARN_ON(BTRFS_I(inode)->reserved_extents); fs/btrfs/inode.c:7832 WARN_ON(BTRFS_I(inode)->csum_bytes); Steps to reproduce: # mkfs.btrfs -f <dev> # mount -o compress <dev> <mnt> # cd <mnt> # sysbench --test=fileio --num-threads=8 --file-total-size=8G \ > --file-block-size=32K --file-io-mode=rndwr --file-fsync-freq=0 \ > --file-fsync-end=no --max-requests=300000 --file-extra-flags=direct \ > --file-test-mode=sync prepare # cd - # umount <mnt> # mount -o compress <dev> <mnt> # cd <mnt> # sysbench --test=fileio --num-threads=8 --file-total-size=8G \ > --file-block-size=32K --file-io-mode=rndwr --file-fsync-freq=0 \ > --file-fsync-end=no --max-requests=300000 --file-extra-flags=direct \ > --file-test-mode=sync run # cd - # umount <mnt> The reason of this problem is: Task0 Task1 btrfs_direct_IO unlock(&inode->i_mutex) lock(&inode->i_mutex) reserve_space() prepare_pages() lock_extent() clear_extent() unlock_extent() lock_extent() test_extent(uptodate) return false copy_data() set_delalloc_extent() extent need compress go back to buffered write clear_extent(DELALLOC | DIRTY) unlock_extent() Task 0 and 1 wrote the same place, and task0 cleared the delalloc flag which was set by task1, it made the dirty pages in that extents couldn't be flushed into the disk, so the reserved space for that extent was not released at the end. This patch fixes the above bug by unlocking the extent after the delalloc. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
- the caller has gotten the inode object, needn't pass the file object. And if so, we needn't define a inode pointer variant. - the position should be aligned by the page size not sector size, so we also needn't pass the root object into prepare_pages(). Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
Btrfs has always had these filler extent data items for holes in inodes. This has made somethings very easy, like logging hole punches and sending hole punches. However for large holey files these extent data items are pure overhead. So add an incompatible feature to no longer add hole extents to reduce the amount of metadata used by these sort of files. This has a few changes for logging and send obviously since they will need to detect holes and log/send the holes if there are any. I've tested this thoroughly with xfstests and it doesn't cause any issues with and without the incompat format set. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 12 11月, 2013 4 次提交
-
-
由 Dulshani Gunawardhana 提交于
Fix spacing issues detected via checkpatch.pl in accordance with the kernel style guidelines. Signed-off-by: NDulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
I noticed that if the free space cache has an error writing out it's data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making free space cache look valid when it really isnt. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Zach Brown 提交于
fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline. Signed-off-by: NZach Brown <zab@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Whoever wrote this was braindead. Also it doesn't work right if you have VACANCY's since we assumed you would only have that at the end of the file, which won't be the case in the near future. I tested this with generic/285 and generic/286 as well as the btrfs tests that use fssum since it uses seek_hole/seek_data to verify things are ok. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 21 9月, 2013 1 次提交
-
-
In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would return without ending/freeing the transaction. Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 04 9月, 2013 1 次提交
-
-
由 Christoph Hellwig 提交于
Call generic_write_sync() from the deferred I/O completion handler if O_DSYNC is set for a write request. Also make sure various callers don't call generic_write_sync if the direct I/O code returns -EIOCBQUEUED. Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 9月, 2013 3 次提交
-
-
由 Stefan Behrens 提交于
btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in three more places. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
I noticed while looking at a deadlock that we are always starting a transaction in cow_file_range(). This isn't really needed since we only need a transaction if we are doing an inline extent, or if the allocator needs to allocate a chunk. So push down all the transaction start stuff to be closer to where we actually need a transaction in all of these cases. This will hopefully reduce our write latency when we are committing often. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
We can end up with inodes on the auto defrag list that exist on roots that are going to be deleted. This is extra work we don't need to do, so just bail if our root has 0 root refs. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 10 8月, 2013 1 次提交
-
-
由 Josef Bacik 提交于
I noticed while running multi-threaded fsync tests that sometimes fsck would complain about an improper gap. This happens because we fail to add a hole extent to the file, which was happening when we'd split a hole EM because btrfs_drop_extent_cache was just discarding the whole em instead of splitting it. So this patch fixes this by allowing us to split a hole em properly, which means that added holes actually get logged properly and we no longer see this fsck error. Thankfully we're tolerant of these sort of problems so a user would not see any adverse effects of this bug, other than fsck complaining. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 03 7月, 2013 1 次提交
-
-
由 Jie Liu 提交于
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support SEEK_DATA/SEEK_HOLE functions, we end up handling the similar matter in lseek_execute() to update the current file offset to the desired offset if it is valid, ceph also does the simliar things at ceph_llseek(). To reduce the duplications, this patch make lseek_execute() public accessible so that we can call it directly from the underlying file systems. Thanks Dave Chinner for this suggestion. [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back] v2->v1: - Add kernel-doc comments for lseek_execute() - Call lseek_execute() in ceph->llseek() Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Ben Myers <bpm@sgi.com> Cc: Ted Tso <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Sage Weil <sage@inktank.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 7月, 2013 1 次提交
-
-
由 Josef Bacik 提交于
We always just try and reserve data space when we write, but if we are out of space but have prealloc'ed extents we should still successfully write. This patch will try and see if we can write to prealloc'ed space and if we can go ahead and allow the write to continue. With this patch we now pass xfstests generic/274. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 01 7月, 2013 1 次提交
-
-
由 Josef Bacik 提交于
This has plagued us forever and I'm so over working around it. When we truncate down to a non-page aligned offset we will call btrfs_truncate_page to zero out the end of the page and write it back to disk, this will keep us from exposing stale data if we truncate back up from that point. The problem with this is it requires data space to do this, and people don't really expect to get ENOSPC from truncate() for these sort of things. This also tends to bite the orphan cleanup stuff too which keeps people from mounting. To get around this we can just move this into btrfs_cont_expand() to make sure if we are truncating up from a non-page size aligned i_size we will zero out the rest of this page so that we don't expose stale data. This will give ENOSPC if you try to truncate() up or if you try to write past the end of isize, which is much more reasonable. This fixes xfstests generic/083 failing to mount because of the orphan cleanup failing. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 14 6月, 2013 1 次提交
-
-
由 Stefan Behrens 提交于
btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in six places. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 08 5月, 2013 1 次提交
-
-
由 Kent Overstreet 提交于
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: NKent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 5月, 2013 4 次提交
-
-
由 Eric Sandeen 提交于
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Tsutomu Itoh 提交于
If argument 'trans' is unnecessary in the function where fixup_low_keys() is called, 'trans' is deleted. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
When logging changed extents I was logging ram_bytes as the current length, which isn't correct, it's supposed to be the ram bytes of the original extent. This is for compression where even if we split the extent we need to know the ram bytes so when we uncompress the extent we know how big it will be. This was still working out right with compression for some reason but I think we were getting lucky. It was definitely off for prealloc which is why I noticed it, btrfsck was complaining about it. With this patch btrfsck no longer complains after a log replay. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 10 4月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 3月, 2013 1 次提交
-
-
由 Wang Shilong 提交于
Steps to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs sub create <mnt>/subv btrfs qgroup limit 10M <mnt>/subv fallocate --length 20M <mnt>/subv/data For the above example, fallocating will return successfully which is not expected, we try to fix it by doing qgroup reservation before fallocating. Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 16 3月, 2013 1 次提交
-
-
由 Liu Bo 提交于
Users report that an extent map's list is still linked when it's actually going to be freed from cache. The story is that a) when we're going to drop an extent map and may split this large one into smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means that it's on the list to be logged, then the smaller ems split from it will also be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected. b) we'll keep ems from unlinking the list and freeing when they are flagged with EXTENT_FLAG_LOGGING, because the log code holds one reference. The end result is the warning, but the truth is that we set the flag EXTENT_FLAG_LOGGING only during fsync. So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one. Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reported-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 27 2月, 2013 1 次提交
-
-
由 Qu Wenruo 提交于
Though most of the btrfs codes are using ALIGN macro for page alignment, there are still some codes using open-coded alignment like the following: ------ u64 mask = ((u64)root->stripesize - 1); u64 ret = (val + mask) & ~mask; ------ Or even hidden one: ------ num_bytes = (end - start + blocksize) & ~(blocksize - 1); ------ Sometimes these open-coded alignment is not so easy to understand for newbie like me. This commit changes the open-coded alignment to the ALIGN macro for a better readability. Also there is a previous patch from David Sterba with similar changes, but the patch is for 3.2 kernel and seems not merged. http://www.spinics.net/lists/linux-btrfs/msg12747.html Cc: David Sterba <dave@jikos.cz> Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 23 2月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 2月, 2013 3 次提交
-
-
由 Miao Xie 提交于
If we remount the fs to close the auto defragment or make the fs R/O, we should stop the auto defragment. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Miao made the ordered operations stuff run async, which introduced a deadlock where we could get somebody (sync) racing in and committing the transaction while a commit was already happening. The new committer would try and flush ordered operations which would hang waiting for the commit to finish because it is done asynchronously and no longer inherits the callers trans handle. To fix this we need to make the ordered operations list a per transaction list. We can get new inodes added to the ordered operation list by truncating them and then having another process writing to them, so this makes it so that anybody trying to add an ordered operation _must_ start a transaction in order to add itself to the list, which will keep new inodes from getting added to the ordered operations list after we start committing. This should fix the deadlock and also keeps us from doing a lot more work than we need to during commit. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
There is no lock to protect fs_info->fs_state, it will introduce some problems, such as the value may be covered by the other task when several tasks modify it. For example: Task0 - CPU0 Task1 - CPU1 mov %fs_state rax or $0x1 rax mov %fs_state rax or $0x2 rax mov rax %fs_state mov rax %fs_state The expected value is 3, but in fact, it is 2. Though this problem doesn't happen now (because there is only one flag currently), the code is error prone, if we add other flags, the above problem will happen to a certainty. Now we use bit operation for it to fix the above problem. In this way, we can make the code more robust and be easy to add new flags. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 20 2月, 2013 2 次提交
-
-
由 Filipe Brandenburger 提交于
The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel. Signed-off-by: NFilipe Brandenburger <filbranden@google.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
Since we don't actually copy the extent information from the source tree in the fast case we don't need to wait for ordered io to be completed in order to fsync, we just need to wait for the io to be completed. So when we're logging our file just attach all of the ordered extents to the log, and then when the log syncs just wait for IO_DONE on the ordered extents and then write the super. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 06 2月, 2013 1 次提交
-
-
由 Liu Bo 提交于
While running snapshot testscript created by Mitch and David, the race between autodefrag and snapshot deletion can lead to corruption of dead_root list so that we can get crash on btrfs_clean_old_snapshots(). And besides autodefrag, scrub also does the same thing, ie. read root first and get inode. Here is the story(take autodefrag as an example): (1) when we delete a snapshot or subvolume, it will set its root's refs to zero and do a iput() on its own inode, and if this inode happens to be the only active in-meory one in root's inode rbtree, it will add itself to the global dead_roots list for later cleanup. (2) after (1), the autodefrag thread may read another inode for defrag and the inode is just in the deleted snapshot/subvolume, but all of these are without checking if the root is still valid(refs > 0). So the end up result is adding the deleted snapshot/subvolume's root to the global dead_roots list AGAIN. Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu. So all we need to do is to take the lock to protect 'read root and get inode', since we synchronize to wait for the rcu grace period before adding something to the global dead_roots list. Reported-by: NMitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-