- 29 Aug 2012, 3 commits
-
-
By Josef Bacik
We need a barrier before calling waitqueue_active, otherwise we will miss wakeups. So in places that do atomic_dec() followed by atomic_read(), use atomic_dec_return(), which implies a memory barrier (see memory-barriers.txt), and then add an explicit memory barrier everywhere else that needs one. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
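A minimal sketch of the barrier rule this describes, in generic kernel style; the structure and function names below are invented for illustration and are not the actual btrfs code:

```c
#include <linux/atomic.h>
#include <linux/wait.h>

struct pending_work {                   /* hypothetical example structure */
        atomic_t count;
        wait_queue_head_t wait;
};

static void put_pending_work(struct pending_work *w)
{
        /* atomic_dec_return() implies a full memory barrier, so the
         * waitqueue_active() check cannot be reordered before the decrement
         * and a concurrent sleeper cannot miss the wakeup. */
        if (atomic_dec_return(&w->count) == 0 && waitqueue_active(&w->wait))
                wake_up(&w->wait);
}

static void mark_pending_done(struct pending_work *w)
{
        /* where the decrement and the check stay separate, an explicit
         * barrier is needed before peeking at the waitqueue */
        atomic_dec(&w->count);
        smp_mb();
        if (!atomic_read(&w->count) && waitqueue_active(&w->wait))
                wake_up(&w->wait);
}
```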
-
By Josef Bacik
We've been allocating a big array for csums instead of storing them in the io_tree like we do for buffered reads, because previously we were locking the entire range and so didn't have an extent state for each sector of the range. But now that we do the range locking as we map the buffers, we can limit the mapping length to sectorsize and use the private part of the io_tree for our csums. This allows us to avoid an extra memory allocation for direct reads, which could incur latency. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
By Josef Bacik
A deadlock in xfstests 113 was uncovered by commit d187663e. This is because we would not return EIOCBQUEUED for short AIO reads; instead we'd wait for the DIO to complete and then return the amount of data we transferred, which would allow our stuff to unlock the remaining amount. But with this change that no longer happens, so if we have a short AIO read (for example if we try to read past EOF), we could leave the section from EOF to the end of where we tried to read locked. Fixing this is tricky since there is no clear way to know exactly how much data DIO truly submitted for IO, so to make this less hard on ourselves and less cumbersome we need to lock the extents as we try to map them, and then unlock any areas we didn't actually map. This makes us completely safe from deadlocks and from reliance on a particular behavior of the DIO code. This also lays the groundwork for allowing us to use the normal csum storage method for reads, which means we can remove an allocation. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
- 26 Jul 2012, 1 commit
-
-
By Alexander Block
This patch introduces uuids for subvolumes. Each subvolume has its own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. In addition to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extended fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
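A rough sketch of the new fields this describes; the member names and layout below are an approximation for illustration, not a verified copy of the extended btrfs_root_item definition:

```c
#include <linux/types.h>

struct btrfs_timespec {                 /* btrfs on-disk time format */
        __le64 sec;
        __le32 nsec;
} __attribute__((packed));

/* illustrative only: the shape of the fields added for subvolumes */
struct btrfs_root_item_extension {
        u8 uuid[16];                    /* this subvolume's own uuid */
        u8 parent_uuid[16];             /* set when it was snapshotted */
        u8 received_uuid[16];           /* set when it was received */
        __le64 ctransid;                /* transid paired with ctime */
        __le64 otransid;                /* transid paired with otime */
        __le64 stransid;                /* transid paired with stime */
        __le64 rtransid;                /* transid paired with rtime */
        struct btrfs_timespec ctime;    /* change time */
        struct btrfs_timespec otime;    /* origin/creation time */
        struct btrfs_timespec stime;    /* send time (received subvols only) */
        struct btrfs_timespec rtime;    /* receive time (received subvols only) */
} __attribute__((packed));
```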
-
- 24 Jul 2012, 6 commits
-
-
By Li Zefan
The otime field is not zeroed, so users will see random otime in an old filesystem with a new kernel which has otime support in the future. The reserved bytes are also not zeroed, and we'll have a compatibility issue if we make use of those bytes. Signed-off-by: Li Zefan <lizefan@huawei.com>
-
By Li Zefan
Inodes always allocate free space with the BTRFS_BLOCK_GROUP_DATA type, which means every inode has the same BTRFS_I(inode)->free_space pointer. This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64-bit). Signed-off-by: Li Zefan <lizefan@huawei.com>
-
By Liu Bo
Since the root can be fetched via the BTRFS_I macro directly, we can drop an argument from btrfs_is_free_space_inode(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
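A minimal sketch of the simplified helper, assuming btrfs's internal headers (ctree.h, btrfs_inode.h); the body is an approximation for illustration, not the verified btrfs implementation:

```c
/* before, callers had to pass the root alongside the inode:
 *     bool btrfs_is_free_space_inode(struct btrfs_root *root,
 *                                    struct inode *inode);
 * after, derive the root from the inode itself (sketch): */
static inline bool btrfs_is_free_space_inode(struct inode *inode)
{
        struct btrfs_root *root = BTRFS_I(inode)->root;

        /* illustrative check: free-space-cache inodes either live in the
         * tree root or use the free-ino objectid */
        return root == root->fs_info->tree_root ||
               BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID;
}
```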
-
By Liu Bo
It should be 10 * 1024 * 1024. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
By Tsutomu Itoh
We didn't check the error from btrfs_update_inode(), but that error is easy to bubble back up. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
By Alexander Block
Before the update_time inode operation was introduced, it was not possible to prevent updates of atime on RO subvolumes. VFS was only able to check for RO on the mount, but did not know anything about btrfs subvolumes. btrfs_update_time now checks whether the root is RO and skips updating the times. Signed-off-by: Alexander Block <ablock84@googlemail.com>
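A minimal sketch of the check, assuming the era's update_time prototype and btrfs's btrfs_root_readonly() helper; treat it as an illustration rather than the exact btrfs code:

```c
static int btrfs_update_time(struct inode *inode, struct timespec *now,
                             int flags)
{
        struct btrfs_root *root = BTRFS_I(inode)->root;

        /* read-only subvolume: refuse the time update instead of dirtying it */
        if (btrfs_root_readonly(root))
                return -EROFS;

        /* ... otherwise fall through to the normal atime/mtime/ctime update ... */
        return 0;
}
```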
-
- 03 Jul 2012, 2 commits
-
-
By Liu Bo
When we're evicting an inode during log recovery, we need to ensure that the inode is no longer in the orphan state, which means the inode's runtime flags do _not_ have BTRFS_INODE_HAS_ORPHAN_ITEM set. Thus, the BUG_ON was triggered because of a wrong check for the flags. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
By Josef Bacik
Miao pointed out there's a problem with mixing dio writes and buffered reads. If the read happens between us invalidating the page range and actually locking the extent, we can bring pages into the page cache. Then once the write finishes, if somebody tries to read again it will just find uptodate pages and we'll read stale data. So we need to lock the extent and check for uptodate bits in the range. If there are uptodate bits we need to unlock and invalidate again. This will keep this race from happening, since we will hold the extent locked until we create the ordered extent, and then the read side always waits for ordered extents. There was also a race in how we updated i_size: previously we were relying on the generic DIO code to adjust the i_size after the DIO had completed, but this happens outside of the extent lock, which means reads could come in and not see the updated i_size. So instead move this work into where we create the extents; that way the ordered i_size update works properly in the endio handlers. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
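A rough sketch of the lock-and-recheck loop described here, written against the era's extent_io helpers; the function name and the exact bit check are assumptions, not a verified copy of the btrfs direct-IO code:

```c
/* sketch: before starting a dio write over [lockstart, lockend], make sure
 * no buffered read repopulated the page cache after our invalidation */
static void lock_dio_write_range(struct inode *inode, u64 lockstart,
                                 u64 lockend)
{
        struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;

        while (1) {
                lock_extent(io_tree, lockstart, lockend);

                /* nothing became uptodate behind our back: proceed locked */
                if (!test_range_bit(io_tree, lockstart, lockend,
                                    EXTENT_UPTODATE, 0, NULL))
                        break;

                /* a buffered read raced in: unlock, invalidate the page
                 * range again and retry */
                unlock_extent(io_tree, lockstart, lockend);
                invalidate_inode_pages2_range(inode->i_mapping,
                                              lockstart >> PAGE_CACHE_SHIFT,
                                              lockend >> PAGE_CACHE_SHIFT);
        }
}
```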
-
- 21 Jun 2012, 1 commit
-
-
By Josef Bacik
There is some concern that these iput()s could be the final iputs and could induce lockups on people waiting on writeback. This would happen in the rare case that we don't create ordered extents because of an error, but it is theoretically possible, and we already have a mechanism to deal with this, so just make them delayed iputs to negate any worry. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
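In sketch form, the change swaps a direct iput() for btrfs's existing btrfs_add_delayed_iput() helper, which queues the final reference drop to be processed later instead of in the current context (the wrapper function below is only for illustration):

```c
static void drop_inode_ref(struct inode *inode)
{
        /* was: iput(inode); -- this could be the final iput and evict the
         * inode right here, which can lock up anyone waiting on writeback */
        btrfs_add_delayed_iput(inode);   /* defer the final drop instead */
}
```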
-
- 15 Jun 2012, 5 commits
-
-
By Liu Bo
When we move a file into a directory with the compression flag, we need to inherit BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well. But if we move a file into a directory without the compression flag, we need to clear both of them. This is the way our setflags code deals with the compression flag, so keep the same behaviour here. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
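A minimal sketch of the inheritance rule, assuming btrfs's internal headers; the helper name is made up for illustration (the real code applies this during rename):

```c
static void fixup_compress_flags(struct inode *dir, struct inode *inode)
{
        if (BTRFS_I(dir)->flags & BTRFS_INODE_COMPRESS) {
                /* target dir compresses: inherit COMPRESS, drop NOCOMPRESS */
                BTRFS_I(inode)->flags |= BTRFS_INODE_COMPRESS;
                BTRFS_I(inode)->flags &= ~BTRFS_INODE_NOCOMPRESS;
        } else {
                /* target dir does not compress: clear both flags */
                BTRFS_I(inode)->flags &= ~(BTRFS_INODE_COMPRESS |
                                           BTRFS_INODE_NOCOMPRESS);
        }
}
```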
-
By Josef Bacik
I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set any of its pages as writeback, it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet. So not only does this make fsync() completely useless, but it will also screw up if you truncate on a non-page-aligned offset, since we zero out the end, then wait on ordered extents, and then call drop caches. We can drop the cache before the io completes, and then when we try to unpin the extent we just wrote we won't find it and everything goes sideways. So fix this by putting it back, and put a giant comment there to keep me from trying to remove it in the future. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
A user reported lots of problems using compression on the new code, and it turns out part of the problem was that igrab() was failing when we added a new ordered extent. This is because when writing out an inode under compression we immediately return without actually doing anything to the pages, and then in another thread at some point down the line we actually do the ordered dance. The problem is that between the point where we start writeback and where we actually add the ordered extent, we could be trying to reclaim the inode, which makes igrab() return NULL. So we need to do an igrab() when we create the async extent and then drop it when we are done with it. This makes sure we stay pinned in memory until the ordered extent can get a reference on it and we are good to go. With this patch we no longer panic in btrfs_finish_ordered_io(). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
I was getting hung on umount when a transaction was aborted because a range of one of the free space inodes was still locked. This is because the nocow code doesn't unlock anything on error. This fixed the problem and I verified that is what was happening. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
While doing my enospc work I got a transaction abortion that resulted in a panic when we tried to unlock_page() an already unlocked page. This is because we weren't calling extent_clear_unlock_delalloc with the locked page, so it was unlocking all the pages in the range. This is wrong, since __extent_writepage expects to still have the page locked unless we return *page_started as 1. This should keep us from panicking. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 02 Jun 2012, 1 commit
-
-
By Josef Bacik
Btrfs had been doing its own file_update_time so we could catch ENOSPC properly, so just update our btrfs_update_time to work with the new stuff and then we'll be fancy later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 30 May 2012, 5 commits
-
-
By Josef Bacik
If cow_file_range_inline fails with ENOSPC we abort the transaction, which isn't very nice. This really shouldn't be happening anyway, but there's no sense in making it a horrible error when we can easily just go allocate normal data space for this stuff. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
Ceph was hitting this race where we would remove an inode from the per-root orphan list before we would release the space we had reserved for the inode. We actually don't need a list or anything, we just need to make sure the root doesn't try to free up the orphan reserve until after the inodes have released their reservations. So use an atomic counter instead of a list on the root, and only decrement the counter after we've released our reservation. I've tested this as well as several others and we no longer see the warnings that you would see while running ceph. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
Miao pointed this out while I was working on an orphan problem: messing with a bitfield where different ranges are protected by different locks doesn't work out right. Turns out we've been doing this forever, where we have different parts of the bitfield protected by either no lock at all or by different locks, which could cause all sorts of weird problems including the issue I was hitting. So instead make a runtime_flags field that we use the normal atomic bit operations on, so we can keep having our no/different locking for the different flags, and then make force_compress its own thing so it can be treated normally. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
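A minimal sketch of the pattern: a single unsigned long of runtime flags manipulated with the kernel's atomic bit helpers, so each flag can be flipped safely without a shared lock. The structure and flag names below are invented for illustration:

```c
#include <linux/bitops.h>

enum {
        EX_INODE_HAS_ORPHAN_ITEM,       /* illustrative flag bits */
        EX_INODE_IN_DEFRAG,
};

struct example_inode_state {
        unsigned long runtime_flags;    /* one word, atomic bit ops only */
};

static void mark_orphan(struct example_inode_state *s)
{
        set_bit(EX_INODE_HAS_ORPHAN_ITEM, &s->runtime_flags);
}

static int clear_orphan_if_set(struct example_inode_state *s)
{
        /* atomic test-and-clear replaces "check the bitfield, then clear it
         * under whichever lock happened to cover that part of it" */
        return test_and_clear_bit(EX_INODE_HAS_ORPHAN_ITEM, &s->runtime_flags);
}
```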
-
By Josef Bacik
We noticed that the ordered extent completion doesn't really rely on having a page, and that it could be done independently of ending the writeback on a page. This patch makes us not do the threaded endio stuff for normal buffered writes and direct writes, so we can end page writeback as soon as possible (in irq context) and only start threads to do the ordered work when it is actually done. Compression needs to be reworked some to take advantage of this as well, but at the moment it has to do a find_get_page in its endio handler, so it must be done in its own thread. This makes direct writes quite a bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Josef Bacik
We've been keeping around the in-core inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version, which serves the same purpose. So use i_version where we used the in-core inode's sequence number; that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
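In sketch form the change is a one-line substitution of the VFS helper for the private counter (the old field name below is an approximation for illustration):

```c
static void bump_change_sequence(struct inode *inode)
{
        /* was (approximately): BTRFS_I(inode)->sequence++;
         * now: let the VFS-maintained i_version carry the sequence */
        inode_inc_iversion(inode);
}
```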
-
- 06 May 2012, 1 commit
-
-
By Jan Kara
After we moved inode_sync_wait() out of end_writeback(), it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode(), which says clearly what the function really does: set the I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
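For a filesystem's evict_inode implementation the rename is a mechanical substitution; a generic sketch (the function below is illustrative, not any specific filesystem's code):

```c
static void example_evict_inode(struct inode *inode)
{
        /* ... filesystem-specific teardown ... */

        /* was: end_writeback(inode); -- same operation, clearer name */
        clear_inode(inode);     /* sets I_CLEAR on the inode */
}
```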
-
- 28 Apr 2012, 1 commit
-
-
By Chris Mason
Btrfs has an optimization where it preallocates dentries during readdir to fill in enough information to open the inode without an extra lookup. But we're calling d_alloc, which does GFP_KERNEL allocations, and that leads to deadlocks because our readdir code holds tree locks. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 19 Apr 2012, 3 commits
-
-
By Josef Bacik
A user reported a panic where we were trying to fix a bad mirror but the mirror number we were given was 0, which is invalid. This is because we don't do the transid verification until after the read, so as far as the read code is concerned the read was a success. So instead store the mirror we read from, so that if there is some failure post-read we know which mirror to try next and which mirror needs to be fixed if we find a good copy of the block. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
By Arne Jansen
When inserting into the radix tree returns EEXIST, get the existing entry without giving up the spinlock in between. There was a race for both the zones trees and the extent tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
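A minimal sketch of the insert-or-lookup pattern under one spinlock, using the kernel radix tree API; the tree, lock, and entry names are placeholders, and the caller is assumed to have preloaded the radix tree so the insert cannot sleep here:

```c
#include <linux/radix-tree.h>
#include <linux/spinlock.h>

struct zone_entry;      /* opaque payload type, for illustration only */

static struct zone_entry *insert_or_get(struct radix_tree_root *root,
                                        spinlock_t *lock,
                                        unsigned long index,
                                        struct zone_entry *new)
{
        struct zone_entry *entry = new;
        int ret;

        spin_lock(lock);
        ret = radix_tree_insert(root, index, new);
        if (ret == -EEXIST) {
                /* someone beat us to it: fetch the existing entry while
                 * still holding the lock so it cannot vanish in between */
                entry = radix_tree_lookup(root, index);
        }
        spin_unlock(lock);
        return entry;
}
```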
-
By Li Zefan
Follow these steps and you'll trigger a warning at the beginning of d_set_d_op():

# mkfs.btrfs /dev/loop3
# mount /dev/loop3 /mnt
# btrfs sub create /mnt/sub
# btrfs sub snap /mnt /mnt/snap
# touch /mnt/snap/sub
touch: cannot touch `tmp': Permission denied

__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and then simple_lookup() reset it to simple_dentry_operations, which triggered the warning. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
-
- 29 Mar 2012, 1 commit
-
-
By Liu Bo

$ mkfs.btrfs disk
$ mount disk /mnt -o autodefrag
$ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
$ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
      seek=$i conv=notrunc 2> /dev/null; done && sync

With this, we'll get to defrag "foobar" again and again. The same happens with "-o autodefrag,compress". Reason: when the cleaner kthread gets to fetch inodes from the defrag tree and defrag them, it will dirty pages and submit them; this then leads to another DATA COW, where the inode being processed is inserted into the defrag tree again. This patch sets a rule for the COW code, i.e. only insert an inode when we're really going to make some defragments. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 27 Mar 2012, 2 commits
-
-
By Josef Bacik
This patch simplifies how we track our extent buffers. Previously we could exit writepages having written only half of an extent buffer, which meant we had to track the state of the pages and the state of the extent buffers differently. Now we only read in entire extent buffers and write out entire extent buffers; this allows us to simply set bits in our bflags to indicate the state of the eb, and we no longer have to do things like track uptodate with our io_tree. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
By Josef Bacik
We have been passing nothing but (u64)-1 to find_free_extent for search_end in all of the callers, so it's completely useless, and we've always been passing 0 in as search_start, so just remove them as function arguments and move search_start into find_free_extent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
-
- 22 Mar 2012, 8 commits
-
-
By Jeff Mahoney
btrfs currently handles most errors with BUG_ON. This patch is a work-in-progress, but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-
By Mark Fasheh
This is called from only one place - create_subvol() - which passes errors safely back out to its caller, btrfs_mksubvol, where they are handled. Additionally, btrfs_create_subvol_root() itself BUGs needlessly on an error return from btrfs_update_inode(). Since create_subvol() was fixed to catch errors, we can bubble this one up too. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
By Jeff Mahoney
set_extent_bit can do exclusive locking, but only when called by lock_extent*, so drop the exclusive bits argument except when called by lock_extent. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-
By Jeff Mahoney
lock_extent and unlock_extent are always called with GFP_NOFS, so drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
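In sketch form, the interface change removes a constant argument from every call site (prototypes approximate, for illustration):

```c
/* before: every caller passed the same mask
 *     int lock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 *                     gfp_t mask);
 *     int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end,
 *                       gfp_t mask);
 */

/* after: GFP_NOFS is applied inside the helpers, so the parameter is gone */
int lock_extent(struct extent_io_tree *tree, u64 start, u64 end);
int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end);
```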
-
By Jeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-
By Jeff Mahoney
This pushes failures from the submit_bio_hook callbacks, btrfs_submit_bio_hook and btree_submit_bio_hook, into the callers, including callers of submit_one_bio, where it catches the failures with BUG_ON. It also pushes up through the ->readpage_io_failed_hook to end_bio_extent_writepage, where the error is already caught with BUG_ON. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-
By Jeff Mahoney
In submit_extent_page, there's a visually noisy if statement that, in the midst of other conditions, checks the dependency on tree->ops and tree->ops->merge_bio_hook before calling it, and then another condition afterwards. If an error is returned from merge_bio_hook, there's no way to catch it; it's considered a routine "1" return value instead of a failure. This patch factors out the dependency check into a new local merge_bio routine and BUGs on an error. The if statement is less noisy as a side-effect. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
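A hedged sketch of what the factored-out helper could look like; the parameter list approximates the era's extent_io hook signature and is not a verified copy:

```c
static int merge_bio(struct extent_io_tree *tree, struct page *page,
                     unsigned long offset, size_t size, struct bio *bio,
                     unsigned long bio_flags)
{
        int ret = 0;

        /* only call the hook when the tree actually provides one */
        if (tree->ops && tree->ops->merge_bio_hook)
                ret = tree->ops->merge_bio_hook(page, offset, size, bio,
                                                bio_flags);
        BUG_ON(ret < 0);
        return ret;
}
```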
-
By Jeff Mahoney
btrfs_submit_bio_hook currently calls btrfs_bio_wq_end_io in either branch of an if statement that determines one of the arguments. This patch moves the function call outside of the if statement and uses the if only to pick the differing argument. This allows us to catch an error in one place in a more visually obvious way. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-