- 24 5月, 2011 12 次提交
-
-
由 Josef Bacik 提交于
If we have a very large filesystem, we can spend a lot of time in find_free_extent just trying to allocate from empty block groups. So instead check to see if the block group even has enough space for the allocation, and if not go on to the next block group. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Our readahead is sort of sloppy, and really isn't always needed. For example if ls is doing a stating ls (which is the default) it's going to stat in non-disk order, so if say you have a directory with a stupid amount of files, readahead is going to do nothing but waste time in the case of doing the stat. Taking the unconditional readahead out made my test go from 57 minutes to 36 minutes. This means that everywhere we do loop through the tree we want to make sure we do set path->reada properly, so I went through and found all of the places where we loop through the path and set reada to 1. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
When the fs is super full and we unmount the fs, we could get stuck in this thing where unmount is waiting for the caching kthread to make progress and the caching kthread keeps scheduling because we're in the middle of a commit. So instead just let the caching kthread keep going and only yeild if need_resched(). This makes my horrible umount case go from taking up to 10 minutes to taking less than 20 seconds. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Originally this was going to be used as a way to give hints to the allocator, but frankly we can get much better hints elsewhere and it's not even used at all for anything usefull. In addition to be completely useless, when we initialize an inode we try and find a freeish block group to set as the inodes block group, and with a completely full 40gb fs this takes _forever_, so I imagine with say 1tb fs this is just unbearable. So just axe the thing altoghether, we don't need it and it saves us 8 bytes in the inode and saves us 500 microseconds per inode lookup in my testcase. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
We have a bit of debugging in btrfs_search_slot to make sure the level of the cow block is the same as the original block we were cow'ing. I don't think I've ever seen this tripped, so kill it. This saves us 2 kmap's per level in our search. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
If we have particularly full nodes, we could call btrfs_node_blockptr up to 32 times, which is 32 pairs of kmap/kunmap, which _sucks_. So go ahead and map the extent buffer while we look for readahead targets. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
In count_range_bits we are adjusting total_bytes based on the range we are searching for, but we don't adjust the range start according to the range we are searching for, which makes for weird results. For example, if the range [0-8192] is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number of bytes found, but the range_start will be 0, which makes it look like the range is [0-4096]. So instead set range_start = max(cur_start, state->start). This makes everything come out right. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
The ceph guys keep running into problems where we have space reserved in our orphan block rsv when freeing it up. This is because they tend to do snapshots alot, so their truncates tend to use a bunch of space, so when we go to do things like update the inode we have to steal reservation space in order to make the reservation happen. This happens because truncate can use as much space as it freaking feels like, but we still have to hold space for removing the orphan item and updating the inode, which will definitely always happen. So in order to fix this we need to split all of the reservation stuf up. So with this patch we have 1) The orphan block reserve which only holds the space for deleting our orphan item when everything is over. 2) The truncate block reserve which gets allocated and used specifically for the space that the truncate will use on a per truncate basis. 3) The transaction will always have 1 item's worth of data reserved so we can update the inode normally. Hopefully this will make the ceph problem go away. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
We use trans_mutex for lots of things, here's a basic list 1) To serialize trans_handles joining the currently running transaction 2) To make sure that no new trans handles are started while we are committing 3) To protect the dead_roots list and the transaction lists Really the serializing trans_handles joining is not too hard, and can really get bogged down in acquiring a reference to the transaction. So replace the trans_mutex with a trans_lock spinlock and use it to do the following 1) Protect fs_info->running_transaction. All trans handles have to do is check this, and then take a reference of the transaction and keep on going. 2) Protect the fs_info->trans_list. This doesn't get used too much, basically it just holds the current transactions, which will usually just be the currently committing transaction and the currently running transaction at most. 3) Protect the dead roots list. This is only ever processed by splicing the list so this is relatively simple. 4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using the trans_mutex before, so this is a pretty straightforward change. 5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over the entirety of the commit we need to have a way to block new people from creating a new transaction while we're doing our work. So we set no_trans_join and in join_transaction we test to see if that is set, and if it is we do a wait_on_commit. 6) Make the transaction use count atomic so we don't need to take locks to modify it when we're dropping references. 7) Add a commit_lock to the transaction to make sure multiple people trying to commit the same transaction don't race and commit at the same time. 8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl trans. I have tested this with xfstests, but obviously it is a pretty hairy change so lots of testing is greatly appreciated. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
We currently track trans handles in current->journal_info, but we don't actually use it. This patch fixes it. This will cover the case where we have multiple people starting transactions down the call chain. This keeps us from having to allocate a new handle and all of that, we just increase the use count of the current handle, save the old block_rsv, and return. I tested this with xfstests and it worked out fine. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
I keep forgetting that btrfs_join_transaction() just ignores the num_items argument, which leads me to sending pointless patches and looking stupid :). So just kill the num_items argument from btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither of them use it. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
In the prealloc filling code and compressed code we don't set trans->block_rsv to the delalloc block reserve properly, which is going to make us use metadata from the wrong pool, this patch fixes that. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 15 5月, 2011 5 次提交
-
-
由 Li Zefan 提交于
Steps to reproduce the bug: - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL - Call FS_IOC_SETFLAGS ioctl with flags=0 - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set! Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
As we've added per file compression/cow support. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
When a btrfs disk is created by mixed data & metadata option, it will have no pure data or pure metadata space info. In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920 (Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at the very beginning. The problem is this initialization does not take the mixed case into account, which will cause btrfs will easily get into ENOSPC in mixed case. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Daniel J Blueman 提交于
If posix_acl_from_xattr() returns an error code, a negative address is dereferenced causing an oops; fix by checking for error code first. Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 27 4月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
These changes were incorrectly fixed by codespell. They were now manually corrected. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 26 4月, 2011 8 次提交
-
-
由 Tsutomu Itoh 提交于
The error processing of several places is changed like setting the error number only at the error. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put the orig_bio, not bio. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
If our space cache is wrong, we do the right thing and free up everything that we loaded, however we don't reset the total_bitmaps counter or the thresholds or anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap() if it's a bitmap, this will keep us from panicing when we check to make sure we don't have too many bitmaps. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
Since commit dc89e982, we've changed to use a specific slab for alocation of free_space items. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Tsutomu Itoh 提交于
The check on the return value of kmalloc() is added to some places. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Itaru Kitayama 提交于
the space cache use extent_readpages() to read free space information, so we can not use GFP_KERNEL flag to allocate memory, or it may lead to deadlock. Signed-off-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Tsutomu Itoh 提交于
It is necessary to unlock mutex_lock before it return an error when btrfs_alloc_path() fails. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 20 4月, 2011 1 次提交
-
-
由 Chris Mason 提交于
The Btrfs submit bio threads have a small number of threads responsible for pushing down bios we've collected for a large number of devices. Since we do all the bios for a single device at once, we want to make sure we unplug and send down the bios for each device as we're done processing them. The new plugging API removed the btrfs code to unplug while processing bios, this adds it back with the new API. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 4月, 2011 1 次提交
-
-
由 Chris Mason 提交于
The free space caching code was recently reworked to cache all the pages it needed instead of using find_get_page everywhere. One loop was missed though, so it ended up leaking pages. This fixes it to use our page array instead of find_get_page. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 16 4月, 2011 3 次提交
-
-
由 Josef Bacik 提交于
Everytime we try to allocate disk space we try and see if we can pre-emptively allocate a chunk, but in the common case we don't allocate anything, so there is no sense in taking the chunk_mutex at all. So instead if we are allocating a chunk, mark it in the space_info so we don't get two people trying to allocate at the same time. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Reviewed-by: NLiu Bo <liubo2009@cn.fujitsu.com>
-
由 Chris Mason 提交于
A recent commit caches the extent state in end_bio_extent_readpage, but the search it does should look for locked extents. This fixes things to make it more effective. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
find_free_extent likes to allocate in contiguous clusters, which makes writeback faster, especially on SSD storage. As the FS fragments, these clusters become harder to find and we have to decide between allocating a new chunk to make more clusters or giving up on the cluster to allocate from the free space we have. Right now it creates too many chunks, and you can end up with a whole FS that is mostly empty metadata chunks. This commit changes the allocation code to be more strict and only allocate new chunks when we've made good use of the chunks we already have. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 13 4月, 2011 5 次提交
-
-
由 Miao Xie 提交于
Call posix_acl_valid() to check if an acl is valid or not. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Miao Xie 提交于
Link count of the inode is not decreased if btrfs_set_inode_index() fails. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Singed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
btrfs_next_leaf() can return -errno, and we should propagate it to userspace. This also simplifies how we walk the btree path. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
btrfs_next_leaf() can return -errno, and we should propagate it to userspace. This also simplifies how we walk the btree path. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Chris Mason 提交于
The extent_io code can take cached pointers into the extent state trees, and these can make lookups much faster in common operations. The caching only happens when specific bits are set that prevent merging and splitting of the extent state. A help function was added to uncache the state, and it was testing the same set of conditionals. This can leak in very strange corner cases where the lock bit goes away unexpectedly. The uncaching should be unconditional. Once we have a ref on the extent we should always give it up. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 12 4月, 2011 4 次提交
-
-
由 Arne Jansen 提交于
In several places the sequence (set_extent_uptodate, unlock_extent) is used. This leads to a duplicate lookup of the extent state. This patch lets set_extent_uptodate return a cached extent_state which can be passed to unlock_extent_cached. The occurences of the above sequences are updated to use the cache. Only end_bio_extent_readpage is updated that it first gets a cached state to pass it to the readpage_end_io_hook as the prototype requested and is later on being used for set/unlock. Signed-off-by: NArne Jansen <sensille@gmx.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
I've been working on making our O_DIRECT latency not suck and I noticed we were taking the trans_mutex in btrfs_end_transaction. So to do this we convert num_writers and use_count to atomic_t's and just decrement them in btrfs_end_transaction. Instead of deleting the transaction from the trans list in put_transaction we do that in btrfs_commit_transaction() since that's the only time it actually needs to be removed from the list. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Xin Zhong 提交于
We create two subvolumes (meego_root and meego_home) in btrfs root directory. And set meego_root as default mount subvolume. After we remount btrfs, meego_root is mounted to top directory by default. Then when we try to mount meego_home (subvol=meego_home) to a subdirectory, it failed. The problem is when default mount subvolume is set to meego_root, we search meego_home in meego_root but can not find it. So the solution is to add a new mount option (subvolrootid) to specify subvol id of root and search subvol name in it. For our case, now we can use "-o subvolrootid=0,subvol=meego_home) to mount meego_home. Detail information can be found in meego bugzilla: https://bugs.meego.com/show_bug.cgi?id=15055Signed-off-by: NZhong, Xin <xin.zhong@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Daniel J Blueman 提交于
Fix address space annotation correct in ioctl.c. Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com> BTRFS_BLOCK_GROUP_SYSTEM, @@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg) up_read(&info->groups_sem); } - user_dest = (struct btrfs_ioctl_space_info *) + user_dest = (struct btrfs_ioctl_space_info __user *) (arg + sizeof(struct btrfs_ioctl_space_args)); if (copy_to_user(user_dest, dest_orig, alloc_size)) Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-