- 11 10月, 2011 2 次提交
-
-
由 Chris Mason 提交于
The btrfs file defrag code will loop through the extents and force COW on them. But there is a concurrent truncate in the middle of the defrag, it might end up defragging the same range over and over again. The problem is that writepage won't go through and do anything on pages past i_size, so the cow won't happen, so the file will appear to still be fragmented. defrag will end up hitting the same extents again and again. In the worst case, the truncate can actually live lock with the defrag because the defrag keeps creating new ordered extents which the truncate code keeps waiting on. The fix here is to make defrag check for i_size inside the main loop, instead of just once before the looping starts. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
Follow those steps: # mount -o autodefrag /dev/sda7 /mnt # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1 # sync # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc and then it'll go into a loop: writeback -> defrag -> writeback ... It's because writeback writes [8K, 200K] and then writes [0, 8K]. I tried to make writeback know if the pages are dirtied by defrag, but the patch was a bit intrusive. Here I simply set writeback_index when we defrag a file. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 01 10月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
A user reported a problem where ceph was getting into 100% cpu usage while doing some writing. It turns out it's because we were doing a short write on a not uptodate page, which means we'd fall back at one page at a time and fault the page in. The problem is our position is on the page boundary, so our fault in logic wasn't actually reading the page, so we'd just spin forever or until the page got read in by somebody else. This will force a readpage if we end up doing a short copy. Alexandre could reproduce this easily with ceph and reports it fixes his problem. I also wrote a reproducer that no longer hangs my box with this patch. Thanks, Reported-and-tested-by: NAlexandre Oliva <aoliva@redhat.com> Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 21 9月, 2011 1 次提交
-
-
由 Sage Weil 提交于
Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We need to reserve space for: - adjusting the old extent (possibly splitting it) - adding the new extent - updating the inode Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 9月, 2011 4 次提交
-
-
由 Li Zefan 提交于
The dst file will have the same inode flags with dst file after file clone, and I think it's unexpected. For example, the dst file will suddenly become immutable after getting some share of data with src file, if the src is immutable. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
To reproduce the bug: # mount /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/src bs=4K count=1 # umount /mnt # mount -o nodatasum /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/dst bs=4K count=1 # clone_range -s 4K -l 4K /mnt/src /mnt/dst # echo 3 > /proc/sys/vm/drop_caches # cat /mnt/dst # dmesg ... btrfs no csum found for inode 258 start 0 btrfs csum failed ino 258 off 0 csum 2566472073 private 0 It's because part of the file is checksummed and the other part is not, and then btrfs will complain checksum is not found when we read the file. Disallow file clone if src and dst file have different checksum flag, so we ensure a file is completely checksummed or unchecksummed. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
It's a bug in commit f81c9cdc (Btrfs: truncate pages from clone ioctl target range) We should pass the dest range to the truncate function, but not the src range. Also move the function before locking extent state. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Hidetoshi Seto 提交于
Since the d_off in the first dirent for "." (that originates from the 4th argument "offset" of filldir() for the 2nd dirent for "..") is wrongly assigned in btrfs_real_readdir(), telldir returns same offset for different locations. | # mkfs.btrfs /dev/sdb1 | # mount /dev/sdb1 fs0 | # cd fs0 | # touch file0 file1 | # ../test | telldir: 0 | readdir: d_off = 2, d_name = "." | telldir: 2 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" | telldir: 3 | readdir: d_off = 2147483647, d_name = "file1" | telldir: 2147483647 To fix this problem, pass filp->f_pos (which is loff_t) instead. | # ../test | telldir: 0 | readdir: d_off = 1, d_name = "." | telldir: 1 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" : At the moment the "offset" for "." is unused because there is no preceding dirent, however it is better to pass filp->f_pos to follow grammatical usage. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 11 9月, 2011 11 次提交
-
-
由 Li Zefan 提交于
You can see there's no file extent with range [0, 4096]. Check this by btrfsck: # btrfsck /dev/sda7 root 5 inode 258 errors 100 ... Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
num_bytes should be 4096 not 12288. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Sterba 提交于
An attribute is not removed by 'setfattr -x attr file' and remains visible in attr list. This makes xfstests/062 pass again. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
If we write some data into the data hole of the file(no preallocation for this hole), Btrfs will allocate some disk space, and update nbytes of the inode, but the other element--disk_i_size needn't be updated. At this condition, we must update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size() return 1). # mkfs.btrfs /dev/sdb1 # mount /dev/sdb1 /mnt # touch /mnt/a # truncate -s 856002 /mnt/a # dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc # umount /mnt # btrfsck /dev/sdb1 root 5 inode 257 errors 400 found 32768 bytes used err is 1 Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
When we write some data to the place that is beyond the end of the file in direct I/O mode, a data hole will be created. And Btrfs should insert a file extent item that point to this hole into the fs tree. But unfortunately Btrfs forgets doing it. The following is a simple way to reproduce it: # mkfs.btrfs /dev/sdc2 # mount /dev/sdc2 /test4 # touch /test4/a # dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc # umount /test4 # btrfsck /dev/sdc2 root 5 inode 257 errors 100 Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
The function - btrfs_cont_expand() forgot to close the transaction handle before it jump out the while loop. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Liu Bo 提交于
At the beginning of create_pending_snapshot, trans->block_rsv is set to pending->block_rsv and is used for snapshot things, however, when it is done, we do not recover it as will. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Liu Bo 提交于
While truncating free space cache, we forget to change trans->block_rsv back to the original one, but leave it with the orphan_block_rsv, and then with option inode_cache enable, it leads to countless warnings of btrfs_alloc_free_block and btrfs_orphan_commit_root: WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]() ... WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]() Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
It's not enough to just search the commit root, since we could be cow'ing the very block we need to search through, which would mean that its locked and we'll still deadlock. So use path->skip_locking as well. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sergei Trofimovich 提交于
iput() shouldn't be called for inodes in I_NEW state. We need to mark inode as constructed first. WARNING: at fs/inode.c:1309 iput+0x20b/0x210() Call Trace: [<ffffffff8103e7ba>] warn_slowpath_common+0x7a/0xb0 [<ffffffff8103e805>] warn_slowpath_null+0x15/0x20 [<ffffffff810eaf0b>] iput+0x20b/0x210 [<ffffffff811b96fb>] btrfs_iget+0x1eb/0x4a0 [<ffffffff811c3ad6>] btrfs_run_defrag_inodes+0x136/0x210 [<ffffffff811ad55f>] cleaner_kthread+0x17f/0x1a0 [<ffffffff81035b7d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff811ad3e0>] ? transaction_kthread+0x280/0x280 [<ffffffff8105af86>] kthread+0x96/0xa0 [<ffffffff814336d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105aef0>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814336d0>] ? gs_change+0xb/0xb Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org> CC: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: NDavid Sterba <dsterba@suse.cz> CC: Josef Bacik <josef@redhat.com> CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Liu Bo 提交于
We can reproduce this oops via the following steps: $ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs $ for ((i=0; i<3; i++)); do btrfs sub snap /mnt/btrfs /mnt/btrfs/s_$i; done $ rm -fr /mnt/btrfs/* $ rm -fr /mnt/btrfs/* then we'll get ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:2264! [...] Call Trace: [<ffffffffa05578c7>] btrfs_rmdir+0xf7/0x1b0 [btrfs] [<ffffffff81150b95>] vfs_rmdir+0xa5/0xf0 [<ffffffff81153cc3>] do_rmdir+0x123/0x140 [<ffffffff81145ac7>] ? fput+0x197/0x260 [<ffffffff810aecff>] ? audit_syscall_entry+0x1bf/0x1f0 [<ffffffff81153d0d>] sys_unlinkat+0x2d/0x40 [<ffffffff8147896b>] system_call_fastpath+0x16/0x1b RIP [<ffffffffa054f7b9>] btrfs_orphan_add+0x179/0x1a0 [btrfs] When it comes to btrfs_lookup_dentry, we may set a snapshot's inode->i_ino to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID, while the snapshot's location.objectid remains unchanged. However, btrfs_ino() does not take this into account, and returns a wrong ino, and causes the oops. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 8月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
xfstests exposed a problem with preallocate when it fallocates a range that already has an extent. We don't set the new i_size properly because we see that we already have an extent. This isn't right and we should update i_size if the space already exists. With this patch we now pass xfstests 075. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 17 8月, 2011 10 次提交
-
-
由 Sage Weil 提交于
We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
sync_pending is uninitialized before it be used, fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Dan Carpenter 提交于
We don't use the defrag struct on this path. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
We've stopped using highmem for extent buffers. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Tsutomu Itoh 提交于
The filesystem turns readonly instead of returning the error to the caller when detected error in btrfs_drop_snapshot(). and, because the caller doesn't check the error, the function type is changed to 'void'. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
When checking if there is enough space for balancing a block group, since we do not take raid types into consideration, we do not account corrent amounts of space that we needed. This makes us do some extra work before we get ENOSPC. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Tested-by: NAndy Lutomirski <luto@mit.edu> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 06 8月, 2011 1 次提交
-
-
由 Chris Mason 提交于
Btrfs does bio submissions from a worker thread, and each device has a list of high priority bios and regular priority bios. Synchronous writes go to the high priority thread while async writes go to regular list. This commit brings back an explicit unplug any time we switch from high to regular priority, which makes it easier for the block layer to give us low latencies. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 02 8月, 2011 9 次提交
-
-
由 Josef Bacik 提交于
When doing a writepage we call writepages to try and write out any other dirty pages in the area. This could cause problems where we commit a transaction and then have somebody else dirtying metadata in the area as we could end up writing out a lot more than we care about, which could cause latency on anybody who is waiting for the transaction to completely finish committing. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Mitch Harder 提交于
The variable 'last_index' is calculated in the __btrfs_buffered_write function and passed as a parameter to the prepare_pages function, but is not used anywhere in the prepare_pages function. Remove instances of 'last_index' in these functions. Signed-off-by: NMitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Xiao Guangrong 提交于
find_first_extent_bit() and find_first_extent_bit_state() share most of the code, and we can just make the former call the latter. Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Xiao Guangrong 提交于
We can just use cond_resched_lock(). Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Xiao Guangrong 提交于
Don't duplicate set_state_bits(). Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Xiao Guangrong 提交于
These members are not used at all. Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
unpin_extent_cache() and add_extent_mapping() shares the same code that merges extent maps. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
lookup_extent_map() and search_extent_map() can share most of code. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
rb_node returned by __tree_search() can be a valid pointer or NULL, but won't be some errno. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-