- 30 May 2012, 20 commits
-
-
Committed by Miao Xie
Two files in different subvolumes may have the same inode id, so the rb-tree used to manage the defragment objects must take this into account. This patch fixes the problem. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
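As an illustration of the idea, here is a minimal sketch of an rb-tree comparator keyed on both the subvolume (root) id and the inode id; the struct and function names are hypothetical stand-ins, not the actual btrfs code, which keeps a similar pair of keys in its defrag records.

#include <linux/rbtree.h>
#include <linux/types.h>

/* Hypothetical defrag record; the real btrfs structure is similar in spirit. */
struct defrag_node {
	struct rb_node rb;
	u64 root_id;	/* subvolume the inode belongs to */
	u64 ino;	/* inode number, unique only within one subvolume */
};

/*
 * Order entries by (root_id, ino) instead of ino alone, so two inodes that
 * share a number but live in different subvolumes get distinct tree slots.
 */
static int defrag_cmp(const struct defrag_node *a, const struct defrag_node *b)
{
	if (a->root_id != b->root_id)
		return a->root_id < b->root_id ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	return 0;
}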
-
Committed by Josef Bacik
If cow_file_range_inline fails with ENOSPC we abort the transaction, which isn't very nice. This really shouldn't be happening anyway, but there's no sense in making it a horrible error when we can easily just go allocate normal data space for this stuff. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Committed by Josef Bacik
Ceph was hitting a race where we would remove an inode from the per-root orphan list before we would release the space we had reserved for the inode. We actually don't need a list or anything; we just need to make sure the root doesn't try to free up the orphan reserve until after the inodes have released their reservations. So use an atomic counter instead of a list on the root and only decrement the counter after we've released our reservation. I've tested this as well as several others and we no longer see the warnings that you would see while running ceph. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
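A rough sketch of the counter pattern described above, with illustrative names (demo_root, orphan_inodes, demo_free_orphan_rsv); it shows the shape of the change, not the actual btrfs code.

#include <linux/atomic.h>

struct demo_root {
	atomic_t orphan_inodes;	/* inodes still holding an orphan reservation */
};

/* Hypothetical helper: return the root's orphan block reserve to the pool. */
static void demo_free_orphan_rsv(struct demo_root *root)
{
	/* ... release the reserved metadata space ... */
}

/* Called when an inode is added to the root's orphan tracking. */
static void demo_orphan_add(struct demo_root *root)
{
	atomic_inc(&root->orphan_inodes);
}

/*
 * Called only after the inode has released its own reservation, so the
 * root's reserve cannot be freed out from under an inode that still needs it.
 */
static void demo_orphan_release(struct demo_root *root)
{
	if (atomic_dec_and_test(&root->orphan_inodes))
		demo_free_orphan_rsv(root);
}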
-
Committed by Josef Bacik
Miao pointed out, while I was working on an orphan problem, that messing with a bitfield where different ranges are protected by different locks doesn't work out right. It turns out we've been doing this forever: different parts of the bitfield are protected by either no lock at all or by different locks, which can cause all sorts of weird problems, including the issue I was hitting. So instead add a runtime_flags field that we use the normal atomic bit operations on, so we can keep our no/different locking for the different flags, and make force_compress its own field so it can be treated normally. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
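The pattern is the usual kernel one of replacing a C bitfield with an unsigned long manipulated through the atomic bit helpers; a minimal sketch with made-up flag names (the real bits are btrfs's own BTRFS_INODE_* values).

#include <linux/bitops.h>
#include <linux/types.h>

/* Illustrative flag indexes, not the btrfs definitions. */
enum {
	DEMO_INODE_ORPHAN_META_RESERVED,
	DEMO_INODE_HAS_ORPHAN_ITEM,
};

struct demo_inode {
	unsigned long runtime_flags;	/* every bit is updated atomically */
};

static void demo_set_orphan_reserved(struct demo_inode *inode)
{
	/* atomic read-modify-write: no shared lock needed, unlike a bitfield */
	set_bit(DEMO_INODE_ORPHAN_META_RESERVED, &inode->runtime_flags);
}

static bool demo_orphan_reserved(struct demo_inode *inode)
{
	return test_bit(DEMO_INODE_ORPHAN_META_RESERVED, &inode->runtime_flags);
}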
-
Committed by Josef Bacik
When we write out the free space cache we will write out everything that is in our in-memory tree, and then we will just walk the pinned extents tree and write anything we see there. The problem with this is that during normal operations the pinned extents will be merged back into the free space tree normally, and then we can allocate space from the merged areas and commit them to the tree log. If we crash and replay the tree log we will crash again, because the tree log will try to free up space from what looks like 2 separate but contiguous entries, since one entry is from the original free space cache and the other was a pinned extent that was merged back. To fix this we just need to walk the free space tree after we load it and merge contiguous entries back together. This will keep the tree log stuff from breaking and it will make the allocator behave more nicely. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
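A simplified sketch of the post-load merge: assume the cache entries sit on a list sorted by offset (the real code walks the free-space rb-tree with btrfs helpers), and fold any entry that ends exactly where the next one begins into its neighbour.

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Simplified free-space entry; illustrative, not the btrfs structure. */
struct fs_entry {
	struct list_head list;	/* kept sorted by offset */
	u64 offset;
	u64 bytes;
};

static void merge_contiguous(struct list_head *entries)
{
	struct fs_entry *cur, *next;

	if (list_empty(entries))
		return;

	cur = list_entry(entries->next, struct fs_entry, list);
	while (cur->list.next != entries) {
		next = list_entry(cur->list.next, struct fs_entry, list);
		if (cur->offset + cur->bytes == next->offset) {
			/* contiguous: fold next into cur and look again */
			cur->bytes += next->bytes;
			list_del(&next->list);
			kfree(next);
		} else {
			cur = next;
		}
	}
}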
-
Committed by Liu Bo
In normal cases we are not allowed to balance in RO mode. However, when using a seeding device and adding another device to sprout, things change:
$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs -----------------------> fails
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs -----------------------> works!
This should not be an exception by design, so add another check of the mount flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>
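The check itself boils down to looking at the superblock's read-only flag before a balance is allowed to start; a hedged sketch, with placement and error handling differing from the real ioctl path.

#include <linux/fs.h>
#include <linux/errno.h>

/* Sketch: refuse to balance on a read-only mount, even when a writable
 * device was added to a seeding filesystem after the RO mount.
 * MS_RDONLY was the flag in the 3.x era; newer kernels use sb_rdonly(). */
static int demo_balance_allowed(struct super_block *sb)
{
	if (sb->s_flags & MS_RDONLY)
		return -EROFS;
	return 0;
}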
-
Committed by Liu Bo
Fully utilize our extent state's new helper functions to take the fastpath as much as possible. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>
-
Committed by Liu Bo
Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument
Since we mounted with the readonly option and /dev/sdb7 is not a seeding device, a readonly notification is preferred. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>
-
Committed by Josef Bacik
We noticed that ordered extent completion doesn't really rely on having a page and that it can be done independently of ending writeback on a page. This patch makes us not do the threaded endio stuff for normal buffered writes and direct writes, so we can end page writeback as soon as possible (in irq context) and only start threads to do the ordered work when it is actually done. Compression needs to be reworked some to take advantage of this as well, but at the moment it has to do a find_get_page in its endio handler, so it must be done in its own thread. This makes direct writes quite a bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Committed by Josef Bacik
We are checking delalloc to see if it is ok to update the i_size. There are 2 cases that stop us from updating: 1) there is delalloc between our current disk_i_size and this ordered extent; 2) there is delalloc between our current ordered extent and the next ordered extent. These tests are racy, however, since we can set delalloc for these ranges at any time. Also, for the first case, if we notice there is delalloc between disk_i_size and our ordered extent, we will not update disk_i_size and assume that when that delalloc bit gets written out it will update everything properly. However, if we crash before that, we will have file extents outside of our i_size, which is not good, so this test is dangerous as well as racy. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Committed by Jim Meyering
There is an off-by-one error: allocating room for a maximal result string but without room for a trailing NUL. That can lead to returning a transformed string that is not NUL-terminated, and then to a caller reading beyond the end of the malloc'd buffer. Rewrite to s/kzalloc/kmalloc/, remove the unwarranted use of strncpy (the result is guaranteed to fit), remove the dead strlen at the end, and change a few variable names and comments. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Jim Meyering <meyering@redhat.com>
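The general shape of that kind of fix, as a self-contained sketch (the function name and surrounding context are illustrative, not the actual btrfs code): allocate length + 1, copy with memcpy since the size is already known, and terminate explicitly.

#include <linux/slab.h>
#include <linux/string.h>

static char *demo_copy_name(const char *src, gfp_t gfp)
{
	size_t len = strlen(src);
	char *dst;

	dst = kmalloc(len + 1, gfp);	/* +1: room for the trailing NUL */
	if (!dst)
		return NULL;

	memcpy(dst, src, len);		/* known to fit, so no strncpy needed */
	dst[len] = '\0';
	return dst;
}

For a plain copy like this, kstrdup() already does the same thing; the sketch only spells out where the missing byte and the terminator go.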
-
Committed by Jim Meyering
A device with a name of length BTRFS_DEVICE_PATH_NAME_MAX or longer would not be NUL-terminated in the DEV_INFO ioctl result buffer. Signed-off-by: Jim Meyering <meyering@redhat.com>
-
Committed by Jim Meyering
The buffer read-overrun would be triggered by a printk format starting with <N>, where N is a single digit. NUL-terminate after strncpy. Use memcpy, not strncpy, since we know the string we're copying fits in the destination buffer and contains no NUL byte. Signed-off-by: Jim Meyering <meyering@redhat.com>
-
Committed by Daniel J Blueman
Address some minor type issues identified by the sparse checker. Signed-off-by: Daniel J Blueman <daniel@quora.org>
-
Committed by Sergei Trofimovich
Changing 'mount -oremount,thread_pool=2 /' had no effect: the maximum number of worker threads is specified in 2 places, in 'struct btrfs_fs_info::thread_pool_size' and in each worker struct, 'struct btrfs_workers::max_workers'. 'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'. Fix it by pushing the new maximum value to all created worker structures as well. Cc: Josef Bacik <josef@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
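Conceptually the remount path has to write the new limit into every worker group, not only into fs_info; a hedged sketch with simplified stand-in structures (the real btrfs has more worker groups and a helper that resizes them).

/* Simplified stand-ins for struct btrfs_fs_info and struct btrfs_workers. */
struct demo_workers {
	int max_workers;		/* the limit each group actually consults */
};

struct demo_fs_info {
	int thread_pool_size;		/* value parsed from thread_pool= */
	struct demo_workers workers;
	struct demo_workers endio_workers;
	struct demo_workers submit_workers;
};

/* On remount, push the new maximum down to every worker group. */
static void demo_resize_thread_pool(struct demo_fs_info *fs_info, int new_size)
{
	fs_info->thread_pool_size = new_size;
	fs_info->workers.max_workers = new_size;
	fs_info->endio_workers.max_workers = new_size;
	fs_info->submit_workers.max_workers = new_size;
}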
-
Committed by Josef Bacik
We already do the btrfs_wait_ordered_range, which will do this for us, so just remove this call so we don't call it twice. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Committed by Josef Bacik
In btrfs_wait_ordered_range we have been calling filemap_fdatawrite() twice, because compression does strange things, and then waiting. Then we look up ordered extents, and if we find any we will always schedule_timeout() once and then loop back around and do it all again. We will even check to see if there are delalloc pages on this range and loop again. So this patch gets rid of the multiple fdatawrite() calls and just does filemap_write_and_wait(). In the case of compression we will still find the ordered extents and start those individually if we need to, so that is ok, but in the normal buffered case we avoid all this weird overhead. Then in the case of the schedule_timeout(1), we don't need it. All callers either 1) don't care, they just want to make sure what they just wrote makes it to disk, or 2) are doing the lock() -> lookup ordered -> unlock -> flush thing, in which case they will lock and check for ordered extents _anyway_, so get back to them as quickly as possible. The delalloc check is simply not needed: it only catches the case where we write to the file again after the filemap_write_and_wait(), and if the caller truly cares about that it will take care of everything itself. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
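In plain VFS terms, the core of the change is collapsing the two flush calls plus the polling into a single write-and-wait on the range; a hedged sketch (the real btrfs_wait_ordered_range still walks and waits on ordered extents afterwards).

#include <linux/fs.h>
#include <linux/pagemap.h>

/* Write back dirty pages in [start, end] and wait for writeback to finish,
 * instead of two filemap_fdatawrite() calls plus schedule_timeout() polling. */
static int demo_flush_range(struct inode *inode, loff_t start, loff_t end)
{
	return filemap_write_and_wait_range(inode->i_mapping, start, end);
}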
-
Committed by Josef Bacik
These warnings are bogus since we will always have at least one page in an eb, but to make the compiler happy just set ret = 0 in these two cases. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
Committed by Josef Bacik
When running compilebench I noticed we were spending some time looking up acls on new inodes, which shouldn't be happening since there were no acls. This is because when we init acls on the inode after creating it, we don't cache the fact that there are no acls if there aren't any. Doing this adds a little bit of a bump to my compilebench runs. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
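The VFS already has a helper for recording "this inode has no ACLs" so later lookups don't touch the disk; a minimal sketch of calling it on a freshly created inode. The placement inside btrfs's ACL init code is paraphrased, not quoted, and the function name is illustrative.

#include <linux/fs.h>
#include <linux/posix_acl.h>	/* cache_no_acl() lives here on current kernels */

/* When a new inode inherits no ACLs from its parent directory, remember
 * that fact so future permission checks skip the on-disk xattr lookup. */
static void demo_init_acl_empty(struct inode *inode)
{
	cache_no_acl(inode);
}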
-
Committed by Josef Bacik
We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version, which serves the same purpose. So use i_version where we used the in-core inode's sequence number; that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
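The switch amounts to bumping the VFS change counter wherever the inode is modified, using the stock helper instead of a private sequence field; a minimal, illustrative sketch.

#include <linux/fs.h>
#include <linux/iversion.h>	/* inode_inc_iversion(); it sat in fs.h in the 3.x era */

static void demo_note_inode_change(struct inode *inode)
{
	inode_inc_iversion(inode);	/* replaces the private sequence counter */
}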
-
- 11 May 2012, 7 commits
-
-
Committed by Dan Carpenter
It confuses Smatch that we use two names for the same lock. Plus the shorter name is nicer. This doesn't change how the code works, it's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
-
Committed by Stefan Behrens
The integrity checker used to be coded for nodesize == leafsize == sectorsize == PAGE_CACHE_SIZE. This is now changed to support sizes for nodesize and leafsize which are N * PAGE_CACHE_SIZE. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
-
Committed by Wang Sheng-Hui
In tree_insert, the variable *entry is used only inside the loop and is useless after it. Remove the useless assignment after the loop. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
-
Committed by Wang Sheng-Hui
The return value of find_first_extent_bit is 1 or 0, never < 0. If something is found it returns 0; if nothing is found it returns 1. Fix the comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
-
Committed by Wang Sheng-Hui
num_extent_pages returns the number of pages in the specific range, not the index of the last page in the eb range. btrfs_release_extent_buffer_page is called with start_idx set to 0 in the current code, so it's not a problem yet, but the logic is indeed wrong. Fix it here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
-
Committed by Wang Sheng-Hui
No 'delete' argument is used by clear_state_bit. Clean up the comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
-
Committed by Wang Sheng-Hui
The code path should not reach there. Remove it. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
-
- 06 May 2012, 1 commit
-
-
Committed by Chris Mason
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But a few callers are using it with spinlocks held. Most of the time the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid and changes it to return EAGAIN if it needs to block to properly verify things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 05 May 2012, 4 commits
-
-
Committed by Stefan Behrens
Fix that when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing, it crashes in bio_add_page because the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
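A hedged sketch of the guard: a missing device has no backing block device, so check for a NULL bdev before handing the page to bio_add_page. Structure and function names here are illustrative, not the scrub code itself.

#include <linux/bio.h>
#include <linux/mm.h>
#include <linux/errno.h>

struct demo_device {
	struct block_device *bdev;	/* NULL when the device is missing */
};

static int demo_repair_page(struct demo_device *dev, struct bio *bio,
			    struct page *page)
{
	if (!dev->bdev)			/* missing mirror: nothing to write to */
		return -EIO;

	/* safe now: the bio's target device actually exists */
	if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE)
		return -EIO;
	return 0;
}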
-
Committed by Alexander Block
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user-space btrfs-progs utilities used __u64 while the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
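For reference, the corrected layout has the size/reserved members as __u64 so kernel and user space agree on the struct size; the sketch below only loosely follows the btrfs UAPI definition, so check the real header for the authoritative one.

#include <linux/types.h>

struct demo_ioctl_ino_path_args {
	__u64 inum;		/* in: inode number to resolve */
	__u64 size;		/* in: result buffer size; was __u32 in old kernel headers */
	__u64 reserved[4];
	__u64 fspath;		/* out: user-space pointer to the result buffer */
};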
-
Committed by Josef Bacik
If we happen to alloc an extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer, which can cause us to leak pages and which I hope is the cause of the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racy. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 28 April 2012, 7 commits
-
-
Committed by Chris Mason
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive read-only search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Daniel J Blueman
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Stefan Behrens
may_commit_transaction() calls spin_lock(&space_info->lock) and then spin_lock(&delayed_rsv->lock), while update_global_block_rsv() calls spin_lock(&block_rsv->lock) and then spin_lock(&sinfo->lock). Lockdep complains about this at run time. Everywhere except in update_global_block_rsv() the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
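The fix is simply to follow the ordering used everywhere else, space_info lock outside and block reserve lock inside; a hedged sketch with simplified stand-in structures.

#include <linux/spinlock.h>

struct demo_space_info { spinlock_t lock; };
struct demo_block_rsv  { spinlock_t lock; };

static void demo_update_rsv(struct demo_space_info *sinfo,
			    struct demo_block_rsv *rsv)
{
	spin_lock(&sinfo->lock);	/* outer lock, as at every other call site */
	spin_lock(&rsv->lock);		/* inner lock */

	/* ... adjust the reservation against the space_info counters ... */

	spin_unlock(&rsv->lock);
	spin_unlock(&sinfo->lock);
}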
-
Committed by Daniel J Blueman
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Jan Schmidt
btrfs_map_block sets mirror_num so that the repair code eventually knows which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix, mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything, so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it; it's not needed anyway, since sync will start writeback on all inodes anyway. All we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test, sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 19 April 2012, 1 commit
-
-
Committed by Stefan Behrens
The bitfield member mount_opt was too small by one bit to hold the mount option that enabled including data extents in the integrity checker. Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR option was added (git rebase merges silently, so the increase of the bitfield member's size is lost), the bit limit was removed entirely. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
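The end state is the familiar one: keep the mount options in a full unsigned long and address them with explicit bit masks, so adding a flag cannot silently overflow a width-limited bitfield. A hedged sketch with an illustrative bit value; the real flags are btrfs's BTRFS_MOUNT_* masks on fs_info->mount_opt.

struct demo_fs_info {
	unsigned long mount_opt;	/* no explicit bit-width to outgrow */
};

#define DEMO_MOUNT_CHECK_INTEGRITY_DATA	(1UL << 21)	/* illustrative bit */

static inline void demo_set_opt(struct demo_fs_info *fs, unsigned long flag)
{
	fs->mount_opt |= flag;
}

static inline int demo_test_opt(struct demo_fs_info *fs, unsigned long flag)
{
	return (fs->mount_opt & flag) != 0;
}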
-