- 29 8月, 2012 8 次提交
-
-
由 Josef Bacik 提交于
If you do the following mkfs.btrfs /dev/sdb /dev/sdc rmmod btrfs dd if=/dev/zero of=/dev/sdb bs=1M count=1 mount -o degraded /dev/sdc /mnt/btrfs-test the box will panic trying to deref the name for the missing dev since it is the lower numbered devid. So fix show_devname to not use missing devices. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Stefan Behrens 提交于
In iterate_inodes_from_logical() the error result from extent_from_logical() is patched by mistake. Typically ENOENT is patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK) evaluates to true. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Josef Bacik 提交于
A deadlock in xfstests 113 was uncovered by commit d187663e This is because we would not return EIOCBQUEUED for short AIO reads, instead we'd wait for the DIO to complete and then return the amount of data we transferred, which would allow our stuff to unlock the remaning amount. But with this change this no longer happens, so if we have a short AIO read (for example if we try to read past EOF), we could leave the section from EOF to the end of where we tried to read locked. Fixing this is tricky since there is no clear way to know exactly how much data DIO truly submitted for IO, so to make this less hard on ourselves and less combersome we need to lock the extents as we try to map them, and then we unlock any areas we didn't actually map. This makes us completely safe from deadlocks and reliance on a particular behavior of the DIO code. This also lays the groundwork for allowing us to use the normal csum storage method for reads which means we can remove an allocation. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Dan Carpenter 提交于
"trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
-
由 Dan Carpenter 提交于
We should release this mutex before returning the error code. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
-
由 Dan Carpenter 提交于
add_qgroup_rb() never returns NULL, only error pointers. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
-
由 Dan Carpenter 提交于
These are returning zero when it should be returning a negative error code. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
-
由 Stefan Behrens 提交于
This should obviously not be "if (&flag)" but "if (flag)". Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
- 26 7月, 2012 10 次提交
-
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
git://github.com/ablock84/linux-btrfs由 Chris Mason 提交于
This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Alexander Block 提交于
This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: NAlexander Block <ablock84@googlemail.com> Reviewed-by: NDavid Sterba <dave@jikos.cz> Reviewed-by: NArne Jansen <sensille@gmx.net> Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>
-
由 Alexander Block 提交于
This function is used to find the differences between two trees. The tree compare skips whole subtrees if it detects shared tree blocks and thus is pretty fast. Signed-off-by: NAlexander Block <ablock84@googlemail.com> Reviewed-by: NDavid Sterba <dave@jikos.cz> Reviewed-by: NArne Jansen <sensille@gmx.net> Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>
-
由 Alexander Block 提交于
This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: NAlexander Block <ablock84@googlemail.com> Reviewed-by: NDavid Sterba <dave@jikos.cz> Reviewed-by: NArne Jansen <sensille@gmx.net> Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>
-
由 Alexander Block 提交于
Make iref_to_path non static (needed in send) and rename it to btrfs_iref_to_path Signed-off-by: NAlexander Block <ablock84@googlemail.com>
-
由 Chris Mason 提交于
We were missing wakeups on the delayed ref waitqueue due to races on waitqueue_active. Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Chris Mason 提交于
Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: NChris Mason <chris.mason@fusionio.com> cc: stable@kernel.org
-
由 Mitch Harder 提交于
In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: NMitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
git://git.jan-o-sch.net/btrfs-unstable由 Chris Mason 提交于
Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 25 7月, 2012 2 次提交
-
-
由 Arne Jansen 提交于
Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 David Sterba 提交于
Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 24 7月, 2012 20 次提交
-
-
由 Liu Bo 提交于
While testing with my buffer read fio jobs[1], I find that btrfs does not perform well enough. Here is a scenario in fio jobs: We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file, and all of them will race on add_to_page_cache_lru(), and if one thread successfully puts its page into the page cache, it takes the responsibility to read the page's data. And what's more, reading a page needs a period of time to finish, in which other threads can slide in and process rest pages: t1 t2 t3 t4 add Page1 read Page1 add Page2 | read Page2 add Page3 | | read Page3 add Page4 | | | read Page4 -----|------------|-----------|-----------|-------- v v v v bio bio bio bio Now we have four bios, each of which holds only one page since we need to maintain consecutive pages in bio. Thus, we can end up with far more bios than we need. Here we're going to a) delay the real read-page section and b) try to put more pages into page cache. With that said, we can make each bio hold more pages and reduce the number of bios we need. Here is some numbers taken from fio results: w/o patch w patch ------------- -------- --------------- READ: 745MB/s +25% 934MB/s [1]: [global] group_reporting thread numjobs=4 bs=32k rw=read ioengine=sync directory=/mnt/btrfs/ [READ] filename=foobar size=2000M invalidate=1 Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
For backref walking, we've introduce delayed ref's sequence. However, it changes our preallocation behavior. The story is that when we preallocate an extent and then mark it written piece by piece, the ideal case should be that we don't need to COW the extent, which is why we use 'preallocate'. But we may not make use of preallocation, since when we check for cross refs on the extent, we may have two ref entries which have the same content except the sequence value, and we recognize them as cross refs and do COW to allocate another extent. So we end up with several pieces of space instead of an whole extent. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
There is a small window where an eb can have no IO bits set on it, which could potentially result in extent_buffer_under_io() returning false when we want it to return true, which could result in not fun things happening. So in order to protect this case we need to hold the refs_lock when we make this transition to make sure we get reliable results out of extent_buffer_udner_io(). Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
This sounds sort of impossible but it is the only thing I can think of and at the very least it is theoretically possible so here it goes. If we are in try_release_extent_buffer we will check that the ref count on the extent buffer is 1 and not under IO, and then go down and clear the tree ref. If between this check and clearing the tree ref somebody else comes in and grabs a ref on the eb and the marks it dirty before try_release_extent_buffer() does it's tree ref clear we can end up with a dirty eb that will be freed while it is still dirty which will result in a panic. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
I noticed while looking at an extent_buffer race that we will unconditionally return 1 if we get down to release_extent_buffer after clearing the tree ref. However we can easily race in here and get a ref on the eb and not actually free the eb. So make release_extent_buffer return 1 if it free'd the eb and 0 if not so we can be a little kinder to the vm. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Stefan Behrens 提交于
Code is added to suppress the I/O stats printing at mount time if all statistic values are zero. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Stefan Behrens 提交于
People complained about the annoying kernel log message "btrfs: no dev_stats entry found ... (OK on first mount after mkfs)" everytime a filesystem is mounted for the first time after running mkfs. Since the distribution of the btrfs-progs is not synchronized to the kernel version, mkfs like it is now will be used also in the future. Then this message is not useful to find errors, it is just annoying. This commit removes the printk(). Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Li Zefan 提交于
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and btrfs_foo() functions, which read and write specific fields in the extent buffer. The total number of set/get functions is ~200, but in fact we only need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64. It results in redunction of ~37K bytes. text data bss dec hex filename 629661 12489 216 642366 9cd3e fs/btrfs/btrfs.o.orig 592637 12489 216 605342 93c9e fs/btrfs/btrfs.o Signed-off-by: NLi Zefan <lizefan@huawei.com>
-
由 Li Zefan 提交于
The otime field is not zeroed, so users will see random otime in an old filesystem with a new kernel which has otime support in the future. The reserved bytes are also not zeroed, and we'll have compatibility issue if we make use of those bytes. Signed-off-by: NLi Zefan <lizefan@huawei.com>
-
由 Li Zefan 提交于
Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type, which means every inode has the same BTRFS_I(inode)->free_space pointer. This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits). Signed-off-by: NLi Zefan <lizefan@huawei.com>
-
由 Anand Jain 提交于
Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Jan Schmidt 提交于
When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
If a block group is ro, do not count its entries in when we dump space info. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
Block group has ro attributes, make dump_space_info show it. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
Here is the whole story: 1) A free space cache consists of two parts: o free space cache inode, which is special becase it's stored in root tree. o free space info, which is stored as the above inode's file data. But we only build up another new inode and does not flush its free space info onto disk when we _clear and setup_ free space cache, and this ends up with that the block group cache's cache_state remains DC_SETUP instead of DC_WRITTEN. And holding DC_SETUP means that we will not truncate this free space cache inode, which means the disk offset of its file extent will remain _unchanged_ at least until next transaction finishes committing itself. 2) We can set a block group readonly when we relocate the block group. However, if the readonly block group covers the disk offset where our free space cache inode is going to write, it will force the free space cache inode into cow_file_range() and it'll end up hitting a BUG_ON. 3) Due to the above analysis, we fix this bug by adding the missing dirty flag. 4) However, it's not over, there is still another case, nospace_cache. With nospace_cache, we do not want to set dirty flag, instead we just truncate free space cache inode and bail out with setting cache state DC_WRITTEN. We can benifit from it since it saves us another 'pre-allocation' part which usually costs a lot. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
During disk balance, we prealloc new file extent for file data relocation, but we may fail in 'no available space' case, and it leads to flipping btrfs into readonly. It is not necessary to bail out and abort transaction since we do have several ways to rescue ourselves from ENOSPC case. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
Since root can be fetched via BTRFS_I macro directly, we can save an args for btrfs_is_free_space_inode(). Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
For btree inode, its root is also 'tree root', so btree inode can be misunderstood as a free space inode. We should add one more check for btree inode. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Stefan Behrens 提交于
From btree_read_extent_buffer_pages(), currently repair_io_failure() can be called with mirror_num being zero when submit_one_bio() returned an error before. This used to cause a BUG_ON(!mirror_num) in repair_io_failure() and indeed this is not a case that needs the I/O repair code to rewrite disk blocks. This commit prevents calling repair_io_failure() in this case and thus avoids the BUG_ON() and malfunction. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
So shrink_delalloc has grown all sorts of cruft over the years thanks to many reworkings of how we track enospc. What happens now as we fill up the disk is we will loop for freaking ever hoping to reclaim a arbitrary amount of space of metadata, this was from when everybody flushed at the same time. Now we only have people flushing one at a time. So instead of trying to reclaim a huge amount of space, just try to flush a decent chunk of space, and stop looping as soon as we have enough free space to satisfy our reservation. This makes xfstests 224 go much faster. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-