- 08 4月, 2014 3 次提交
-
-
由 Wang Shilong 提交于
This patch fix a regression caused by the following patch: Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock break while loop will make us call @spin_unlock() without calling @spin_lock() before, fix it. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
To compress a small file range(<=blocksize) that is not an inline extent can not save disk space at all. skip it can save us some cpu time. This patch can also fix wrong setting nocompression flag for inode, say a case when @total_in is 4096, and then we get @total_compressed 52,because we do aligment to page cache size firstly, and then we get into conclusion @total_in=@total_compressed thus we will clear this inode's compression flag. An exception comes from inserting inline extent failure but we still have @total_compressed < @total_in,so we will still reset inode's flag, this is ok, because we don't have good compression effect. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
While running fsstress and snapshots concurrently, we will hit something like followings: Thread 1 Thread 2 |->fallocate |->write pages |->join transaction |->add ordered extent |->end transaction |->flushing data |->creating pending snapshots |->write data into src root's fallocated space After above work flows finished, we will get a state that source and snapshot root share same space, but source root have written data into fallocated space, this will make fsck fail to verify checksums for snapshot root's preallocating file extent data.Nocow writting also has this same problem. Fix this problem by syncing snapshots with nocow writting: 1.for nocow writting,if there are pending snapshots, we will fall into COW way. 2.if there are pending nocow writes, snapshots for this root will be blocked until nocow writting finish. Reported-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 04 4月, 2014 1 次提交
-
-
由 Johannes Weiner 提交于
Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NRik van Riel <riel@redhat.com> Reviewed-by: NMinchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 3月, 2014 13 次提交
-
-
由 Miao Xie 提交于
We didn't have a lock to protect the access to the delalloc inodes list, that is we might access a empty delalloc inodes list if someone start flushing delalloc inodes because the delalloc inodes were moved into a other list temporarily. Fix it by wrapping the access with a lock. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Miao Xie 提交于
We needn't flush all delalloc inodes when we doesn't get s_umount lock, or we would make the tasks wait for a long time. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Miao Xie 提交于
As the comment in the btrfs_direct_IO says, only the compressed pages need be flush again to make sure they are on the disk, but the common pages needn't, so we add a if statement to check if the inode has compressed pages or not, if no, skip the flush. And in order to prevent the write ranges from intersecting, we need wait for the running ordered extents. But the current code waits for them twice, one is done before the direct IO starts (in btrfs_wait_ordered_range()), the other is before we get the blocks, it is unnecessary. because we can do the direct IO without holding i_mutex, it means that the intersected ordered extents may happen during the direct IO, the first wait can not avoid this problem. So we use filemap_fdatawrite_range() instead of btrfs_wait_ordered_range() to remove the first wait. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Qu Wenruo 提交于
Since the "_struct" suffix is mainly used for distinguish the differnt btrfs_work between the original and the newly created one, there is no need using the suffix since all btrfs_workers are changed into btrfs_workqueue. Also this patch fixed some codes whose code style is changed due to the too long "_struct" suffix. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Qu Wenruo 提交于
Replace the fs_info->fixup_workers with the newly created btrfs_workqueue. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Qu Wenruo 提交于
Replace the fs_info->endio_* workqueues with the newly created btrfs_workqueue. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Qu Wenruo 提交于
Replace the fs_info->submit_workers with the newly created btrfs_workqueue. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Qu Wenruo 提交于
Much like the fs_info->workers, replace the fs_info->delalloc_workers use the same btrfs_workqueue. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Miao Xie 提交于
We can not release the reserved metadata space for the first write if we find the write position is pre-allocated. Because the kernel might write the data on the disk before we do the second write but after the can-nocow check, if we release the space for the first write, we might fail to update the metadata because of no space. Fix this problem by end nocow write if there is dirty data in the range whose space is pre-allocated. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Liu Bo 提交于
So after transaction is aborted, we need to cleanup inode resources by calling btrfs_invalidate_inodes(), and btrfs_invalidate_inodes() hopes roots' refs to be zero in old times and sets a WARN_ON(), however, this is not always true within cleaning up transaction, so we get to detect transaction abortion and not warn at all. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Wang Shilong 提交于
Btrfs send is assuming readonly root won't change, let's skip readonly root. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Josef Bacik 提交于
When I converted the BUG_ON() for the free_space_cache_inode in cow_file_range I made it so we just return an error instead of unlocking all of our various stuff. This is a mistake and causes us to hang when we run into this. This patch fixes this problem. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
由 Josef Bacik 提交于
While trying to reproduce a delayed ref problem I noticed the box kept falling over using all 80gb of my ram with btrfs_inode's and btrfs_delayed_node's. Turns out this is because we only throttle delayed inode updates in btrfs_dirty_inode, which doesn't actually get called that often, especially when all you are doing is creating a bunch of files. So balance delayed inode updates everytime we create a new inode. With this patch we no longer use up all of our ram with delayed inode updates. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 15 2月, 2014 1 次提交
-
-
由 Josef Bacik 提交于
A user was running into errors from an NFS export of a subvolume that had a default subvol set. When we mount a default subvol we will use d_obtain_alias() to find an existing dentry for the subvolume in the case that the root subvol has already been mounted, or a dummy one is allocated in the case that the root subvol has not already been mounted. This allows us to connect the dentry later on if we wander into the path. However if we don't ever wander into the path we will keep DCACHE_DISCONNECTED set for a long time, which angers NFS. It doesn't appear to cause any problems but it is annoying nonetheless, so simply unset DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to use d_materialise_unique() instead which will make everything play nicely together and reconnect stuff if we wander into the defaul subvol path from a different way. With this patch I'm no longer getting the NFS errors when exporting a volume that has been mounted with a default subvol set. Thanks, cc: bfields@fieldses.org cc: ebiederm@xmission.com Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 04 2月, 2014 1 次提交
-
-
由 Josef Bacik 提交于
It's just broken and it's taking a lot of effort to fix it, so for now just disable it so people can defrag in peace. Thanks, Cc: stable@vger.kernel.org Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 29 1月, 2014 17 次提交
-
-
由 Chris Mason 提交于
We have a race during inode init because the BTRFS_I(inode)->location is setup after the inode hash table lock is dropped. btrfs_find_actor uses the location field, so our search might not find an existing inode in the hash table if we race with the inode init code. This commit changes things to setup the location field sooner. Also the find actor now uses only the location objectid to match inodes. For inode hashing, we just need a unique and stable test, it doesn't have to reflect the inode numbers we show to userland. Signed-off-by: NChris Mason <clm@fb.com> CC: stable@vger.kernel.org
-
由 Chris Mason 提交于
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect the new size. The fixe uses the size directly from the item header when reading uncompressed inlines, and also fixes truncate to update the size as it goes. Reported-by: NJens Axboe <axboe@fb.com> Signed-off-by: NChris Mason <clm@fb.com> CC: stable@vger.kernel.org
-
由 Gui Hecheng 提交于
When we have two adjacent extents in relink_extent_backref, we try to merge them. When we use btrfs_search_slot to locate the slot for the current extent, we shouldn't set "ins_len = 1", because we will merge it into the previous extent rather than insert a new item. Otherwise, we may happen to create a new leaf in btrfs_search_slot and path->slot[0] will be 0. Then we try to fetch the previous item using "path->slots[0]--", and it will cause a warning as follows: [ 145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0 [ 145.713387] btrfs bad mapping eb start 53370886 len 4096, wanted 167772306 8 ... [ 145.713462] [<ffffffffa034b1f4>] map_private_extent_buffer+0xd4/0xe0 [ 145.713476] [<ffffffffa030097a>] ? btrfs_free_path+0x2a/0x40 [ 145.713485] [<ffffffffa0340864>] btrfs_get_token_64+0x64/0xf0 [ 145.713498] [<ffffffffa033472c>] relink_extent_backref+0x41c/0x820 [ 145.713508] [<ffffffffa0334d69>] btrfs_finish_ordered_io+0x239/0xa80 I encounter this warning when running defrag having mkfs.btrfs with option -M. At the same time there are read/writes & snapshots running at background. Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
Steps to reproduce: # mkfs.btrfs -f /dev/sda8 # mount /dev/sda8 /mnt -o flushoncommit # dd if=/dev/zero of=/mnt/data bs=4k count=102400 & # mount /dev/sda8 /mnt -o remount, ro When remounting RW to RO, the logic is to firstly set flag to RO and then commit transaction, however with option flushoncommit enabled,we will do RO check within committing transaction, so we get a transaction abortion here. Actually,here check is wrong, we should check if FS_STATE_ERROR is set, fix it. Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Suggested-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
This change adds infrastructure to allow for generic properties for inodes. Properties are name/value pairs that can be associated with inodes for different purposes. They are stored as xattrs with the prefix "btrfs." Properties can be inherited - this means when a directory inode has inheritable properties set, these are added to new inodes created under that directory. Further, subvolumes can also have properties associated with them, and they can be inherited from their parent subvolume. Naturally, directory properties have priority over subvolume properties (in practice a subvolume property is just a regular property associated with the root inode, objectid 256, of the subvolume's fs tree). This change also adds one specific property implementation, named "compression", whose values can be "lzo" or "zlib" and it's an inheritable property. The corresponding changes to btrfs-progs were also implemented. A patch with xfstests for this feature will follow once there's agreement on this change/feature. Further, the script at the bottom of this commit message was used to do some benchmarks to measure any performance penalties of this feature. Basically the tests correspond to: Test 1 - create a filesystem and mount it with compress-force=lzo, then sequentially create N files of 64Kb each, measure how long it took to create the files, unmount the filesystem, mount the filesystem and perform an 'ls -lha' against the test directory holding the N files, and report the time the command took. Test 2 - create a filesystem and don't use any compression option when mounting it - instead set the compression property of the subvolume's root to 'lzo'. Then create N files of 64Kb, and report the time it took. The unmount the filesystem, mount it again and perform an 'ls -lha' like in the former test. This means every single file ends up with a property (xattr) associated to it. Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the compression property, have no real effect other than adding more work when inheriting properties and taking more btree leaf space. Test 4 - same as test 3 but with 10 properties per file. Results (in seconds, and averages of 5 runs each), for different N numbers of files follow. * Without properties (test 1) file creation time ls -lha time 10 000 files 3.49 0.76 100 000 files 47.19 8.37 1 000 000 files 518.51 107.06 * With 1 property (compression property set to lzo - test 2) file creation time ls -lha time 10 000 files 3.63 0.93 100 000 files 48.56 9.74 1 000 000 files 537.72 125.11 * With 4 properties (test 3) file creation time ls -lha time 10 000 files 3.94 1.20 100 000 files 52.14 11.48 1 000 000 files 572.70 142.13 * With 10 properties (test 4) file creation time ls -lha time 10 000 files 4.61 1.35 100 000 files 58.86 13.83 1 000 000 files 656.01 177.61 The increased latencies with properties are essencialy because of: *) When creating an inode, we now synchronously write 1 more item (an xattr item) for each property inherited from the parent dir (or subvolume). This could be done in an asynchronous way such as we do for dir intex items (delayed-inode.c), which could help reduce the file creation latency; *) With properties, we now have larger fs trees. For this particular test each xattr item uses 75 bytes of leaf space in the fs tree. This could be less by using a new item for xattr items, instead of the current btrfs_dir_item, since we could cut the 'location' and 'type' fields (saving 18 bytes) and maybe 'transid' too (saving a total of 26 bytes per xattr item) from the btrfs_dir_item type. Also tried batching the xattr insertions (ignoring proper hash collision handling, since it didn't exist) when creating files that inherit properties from their parent inode/subvolume, but the end results were (surprisingly) essentially the same. Test script: $ cat test.pl #!/usr/bin/perl -w use strict; use Time::HiRes qw(time); use constant NUM_FILES => 10_000; use constant FILE_SIZES => (64 * 1024); use constant DEV => '/dev/sdb4'; use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev'; use constant TEST_DIR => (MNT_POINT . '/testdir'); system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!"; # following line for testing without properties #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!"; # following 2 lines for testing with properties system("mount", DEV, MNT_POINT) == 0 or die "mount failed!"; system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!"; system("mkdir", TEST_DIR) == 0 or die "mkdir failed!"; my ($t1, $t2); $t1 = time(); for (my $i = 1; $i <= NUM_FILES; $i++) { my $p = TEST_DIR . '/file_' . $i; open(my $f, '>', $p) or die "Error opening file!"; $f->autoflush(1); for (my $j = 0; $j < FILE_SIZES; $j += 4096) { print $f ('A' x 4096) or die "Error writing to file!"; } close($f); } $t2 = time(); print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n"; system("umount", DEV) == 0 or die "umount failed!"; system("mount", DEV, MNT_POINT) == 0 or die "mount failed!"; $t1 = time(); system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!"; $t2 = time(); print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n"; system("umount", DEV) == 0 or die "umount failed!"; Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
When writing to a file we drop existing file extent items that cover the write range and then add a new file extent item that represents that write range. Before this change we were doing a tree lookup to remove the file extent items, and then after we did another tree lookup to insert the new file extent item. Most of the time all the file extent items we need to drop are located within a single leaf - this is the leaf where our new file extent item ends up at. Therefore, in this common case just combine these 2 operations into a single one. By avoiding the second btree navigation for insertion of the new file extent item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf COW operations, CPU time on btree node/leaf key binary searches, etc. Besides for file writes, this is an operation that happens for file fsync's as well. However log btrees are much less likely to big as big as regular fs btrees, therefore the impact of this change is smaller. The following benchmark was performed against an SSD drive and a HDD drive, both for random and sequential writes: sysbench --test=fileio --file-num=4096 --file-total-size=8G \ --file-test-mode=[rndwr|seqwr] --num-threads=512 \ --file-block-size=8192 \ --max-requests=1000000 \ --file-fsync-freq=0 --file-io-mode=sync [prepare|run] All results below are averages of 10 runs of the respective test. ** SSD sequential writes Before this change: 225.88 Mb/sec After this change: 277.26 Mb/sec ** SSD random writes Before this change: 49.91 Mb/sec After this change: 56.39 Mb/sec ** HDD sequential writes Before this change: 68.53 Mb/sec After this change: 69.87 Mb/sec ** HDD random writes Before this change: 13.04 Mb/sec After this change: 14.39 Mb/sec Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The following warning message was outputed when running the 274th case of xfstests with nodatacow option: BUG: Bad page state in process kswapd0 pfn:1c66f page:ffffea0000636848 count:0 mapcount:0 mapping:(null) index:0x78000 page flags: 0x1000000000100a(error|uptodate|private_2) It is because the check of nocow range was wrong, we should compare the start and end position of the extent with the write position to verify if the write position was in the extent, but the current code just used the start postion to do the check, so we got the wrong extent and told the caller that it was a nocow write. And then when we write back the dirty pages, we found we should cow the extent, but at that time, there was no space in the fs, we had to the error flag for the page. When someone reclaimed that page, the above warning outputed. Fix it. Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
If we do a btree search with the goal of updating an existing item without changing its size (ins_len == 0 and cow == 1), then we never need to hold locks on upper level nodes (even when slot == 0) after we COW their child nodes/leaves, as we won't have node splits or merges in this scenario (that is, no key additions, removals or shifts on any nodes or leaves). Therefore release the locks immediately after COWing the child nodes/leaves while navigating the btree, even if their parent slot is 0, instead of returning a path to the caller with those nodes locked, which would get released only when the caller releases or frees the path (or if it calls btrfs_unlock_up_safe). This is a common scenario, for example when updating inode items in fs trees and block group items in the extent tree. The following benchmarks were performed on a quad core machine with 32Gb of ram, using a leaf/node size of 4Kb (to generate deeper fs trees more quickly). sysbench --test=fileio --file-num=131072 --file-total-size=8G \ --file-test-mode=seqwr --num-threads=512 --file-block-size=8192 \ --max-requests=100000 --file-io-mode=sync [prepare|run] Before this change: 49.85Mb/s (average of 5 runs) After this change: 50.38Mb/s (average of 5 runs) Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The inode reference item is close to inode item, so we insert it simultaneously with the inode item insertion when we create a file/directory.. In fact, we also can handle the inode reference deletion by the same way. So we made this patch to introduce the delayed inode reference deletion for the single link inode(At most case, the file doesn't has hard link, so we don't take the hard link into account). This function is based on the delayed inode mechanism. After applying this patch, we can reduce the time of the file/directory deletion by ~10%. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Frank Holton 提交于
Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by: NFrank Holton <fholton@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
See the warning below: [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] clear extent bit here to avoid triggering WARN_ON() in remove_extent_mapping() Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
Chris introduced hleper function read_csums() and this function has been removed, but we forgot to remove its corresponding comments. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Tsutomu Itoh 提交于
Clean up btrfs_lookup_dentry() to never return NULL, but PTR_ERR(-ENOENT) instead. This keeps the return value convention consistent. Callers who use btrfs_lookup_dentry() require a trivial update. create_snapshot() in particular looks like it can also lose a BUG_ON(!inode) which is not really needed - there seems less harm in returning ENOENT to userspace at that point in the stack than there is to crash the machine. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
The inode eviction can be very slow, because during eviction we tell the VFS to truncate all of the inode's pages. This results in calls to btrfs_invalidatepage() which in turn does calls to lock_extent_bits() and clear_extent_bit(). These calls result in too many merges and splits of extent_state structures, which consume a lot of time and cpu when the inode has many pages. In some scenarios I have experienced umount times higher than 15 minutes, even when there's no pending IO (after a btrfs fs sync). A quick way to reproduce this issue: $ mkfs.btrfs -f /dev/sdb3 $ mount /dev/sdb3 /mnt/btrfs $ cd /mnt/btrfs $ sysbench --test=fileio --file-num=128 --file-total-size=16G \ --file-test-mode=seqwr --num-threads=128 \ --file-block-size=16384 --max-time=60 --max-requests=0 run $ time btrfs fi sync . FSSync '.' real 0m25.457s user 0m0.000s sys 0m0.092s $ cd .. $ time umount /mnt/btrfs real 1m38.234s user 0m0.000s sys 1m25.760s The same test on ext4 runs much faster: $ mkfs.ext4 /dev/sdb3 $ mount /dev/sdb3 /mnt/ext4 $ cd /mnt/ext4 $ sysbench --test=fileio --file-num=128 --file-total-size=16G \ --file-test-mode=seqwr --num-threads=128 \ --file-block-size=16384 --max-time=60 --max-requests=0 run $ sync $ cd .. $ time umount /mnt/ext4 real 0m3.626s user 0m0.004s sys 0m3.012s After this patch, the unmount (inode evictions) is much faster: $ mkfs.btrfs -f /dev/sdb3 $ mount /dev/sdb3 /mnt/btrfs $ cd /mnt/btrfs $ sysbench --test=fileio --file-num=128 --file-total-size=16G \ --file-test-mode=seqwr --num-threads=128 \ --file-block-size=16384 --max-time=60 --max-requests=0 run $ time btrfs fi sync . FSSync '.' real 0m26.774s user 0m0.000s sys 0m0.084s $ cd .. $ time umount /mnt/btrfs real 0m1.811s user 0m0.000s sys 0m1.564s Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Kelley Nielsen 提交于
This patch is the second step in bootstrapping the btrfs_find_item interface. The btrfs_find_root_ref() is similar to the former __inode_info(); it accepts four of its parameters, and duplicates the first half of its functionality. Replace the one former call to btrfs_find_root_ref() with a call to btrfs_find_item(), along with the defined key type that was used internally by btrfs_find_root ref, and a null found key. In btrfs_find_item(), add a test for the null key at the place where the functionality of btrfs_find_root_ref() ends; btrfs_find_item() then returns if the test passes. Finally, remove btrfs_find_root_ref(). Signed-off-by: NKelley Nielsen <kelleynnn@gmail.com> Suggested-by: NZach Brown <zab@redhat.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Valentina Giusti 提交于
Variable owner in btrfs_new_inode is unused since commit d82a6f1d (Btrfs: kill BTRFS_I(inode)->block_group) Signed-off-by: NValentina Giusti <valentina.giusti@microon.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
Btrfs has always had these filler extent data items for holes in inodes. This has made somethings very easy, like logging hole punches and sending hole punches. However for large holey files these extent data items are pure overhead. So add an incompatible feature to no longer add hole extents to reduce the amount of metadata used by these sort of files. This has a few changes for logging and send obviously since they will need to detect holes and log/send the holes if there are any. I've tested this thoroughly with xfstests and it doesn't cause any issues with and without the incompat format set. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 26 1月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
Also don't bother to set up a .get_acl method for symlinks as we do not support access control (ACLs or even mode bits) for symlinks in Linux. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 06 12月, 2013 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently notify_change directly updates i_version for size updates, which not only is counter to how all other fields are updated through struct iattr, but also breaks XFS, which need inode updates to happen under its own lock, and synchronized to the structure that gets written to the log. Remove the update in the common code, and it to btrfs and ext4, XFS already does a proper updaste internally and currently gets a double update with the existing code. IMHO this is 3.13 and -stable material and should go in through the XFS tree. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Acked-by: NJan Kara <jack@suse.cz> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NChris Mason <clm@fb.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 24 11月, 2013 2 次提交
-
-
由 Kent Overstreet 提交于
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
-
由 Kent Overstreet 提交于
With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
-