- 17 1月, 2012 5 次提交
-
-
由 Ilya Dryomov 提交于
Recognize BTRFS_BALANCE_RESUME flag passed from userspace. We use the same heuristics used when recovering balance after a crash to try to start where we left off last time. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement an ioctl for canceling restriper. Currently we wait until relocation of the current block group is finished, in future this can be done by triggering a commit. Balance item is deleted and no memory about the interrupted balance is kept. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement an ioctl for pausing restriper. This pauses the relocation, but balance is still considered to be "in progress": balance item is not deleted, other volume operations cannot be started, etc. If paused in the middle of profile changing operation we will continue making allocations with the target profile. Add a hook to close_ctree() to pause restriper and free its data structures on unmount. (It's safe to unmount when restriper is in "paused" state, we will resume with the same parameters on the next mount) Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This allows to have a separate set of filters for each chunk type (data,meta,sys). The code however is generic and switch on chunk type is only done once. This commit also adds a type filter: it allows to balance for example meta and system chunks w/o touching data ones. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Add basic restriper infrastructure: extended balancing ioctl and all related ioctl data structures, add data structure for tracking restriper's state to fs_info, etc. The semantics of the old balancing ioctl are fully preserved. Explicitly disallow any volume operations when balance is in progress. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 16 12月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
Running xfstests 269 with some tracing my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole and it turns out the way we deal with reserved_extents is wrong, we need to only be setting it if the reservation succeeds, otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues in addition to making my scripts yell and scream and generally make it impossible for me to track down the original issue I was looking for. The other problem is with our error handling in the reservation code. There are two cases that we need to deal with 1) We raced with free. In this case free won't free anything because csum_bytes is modified before we dro the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However if we fail, we need the reservation side to do the free at that point since that space is no longer in use. So as it stands the code was doing this fine and it worked out, except in case #2 2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock and if it hasn't changed we know we can just decrement csum_bytes and carry on. Because of the case where we can race with free()'s since we have to drop our spin_lock to do the reservation, I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy use paths, truncate and file write all hold the i_mutex, just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 15 12月, 2011 1 次提交
-
-
由 Li Zefan 提交于
To reproduce the bug: # touch /mnt/tmp # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:23.412105981 +0800 # chattr +i /mnt/tmp # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:43.198105295 +0800 # umount /mnt # mount /dev/loop1 /mnt # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:23.412105981 +0800 We should update ctime of in-memory inode before calling btrfs_update_inode(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 01 12月, 2011 1 次提交
-
-
由 Mike Fleetwood 提交于
It seems overly harsh to fail a resize of a btrfs file system to the same size when a shrink or grow would succeed. User app GParted trips over this error. Allow it by bypassing the shrink or grow operation. Signed-off-by: NMike Fleetwood <mike.fleetwood@googlemail.com>
-
- 20 11月, 2011 2 次提交
-
-
由 Arnd Hannemann 提交于
For the user it is confusing to find something like: [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472 in kernel log, because it doesn't point directly to btrfs. This patch prefixes those messages with "btrfs:" like other btrfs related printks. Signed-off-by: NArnd Hannemann <arnd@arndnet.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Jeff Mahoney 提交于
This patch casts to unsigned long before casting to a pointer and fixes the following warnings: fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 06 11月, 2011 2 次提交
-
-
由 Chris Mason 提交于
The new ioctls to follow backrefs are not clean for 32/64 bit compat. This reworks them for u64s everywhere. They are brand new, so there are no problems with changing the interface now. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Sterba 提交于
fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of it's dynamically allocated members. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 21 10月, 2011 6 次提交
-
-
由 Lukas Czerner 提交于
We should retirn EINVAL if the start is beyond the end of the file system in the btrfs_ioctl_fitrim(). Fix that by adding the appropriate check for it. Also in the btrfs_trim_fs() it is possible that len+start might overflow if big values are passed. Fix it by decrementing the len so that start+len is equal to the file system size in the worst case. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
由 Li Zefan 提交于
We won't defrag an extent, if it's bigger than the threshold we specified and there's no small extent before it, but actually the code doesn't work this way. There are three bugs: - When should_defrag_range() decides we should keep on defragmenting an extent, last_len is not incremented. (old bug) - The length that passes to should_defrag_range() is not the length we're going to defrag. (new bug) - We always defrag 256K bytes data, and a big extent can be part of this range. (new bug) For a file with 4 extents: | 4K | 4K | 256K | 256K | The result of defrag with (the default) 256K extent thresh should be: | 264K | 256K | but with those bugs, we'll get: | 520K | Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
It's off-by-one, and thus we may skip the last page while defragmenting. An example case: # create /mnt/file with 2 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag /mnt/file /mnt/file: 2 extents found So it's not defragmented. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
Don't use inode->i_size directly, since we're not holding i_mutex. This also fixes another bug, that i_size can change after it's checked against 0 and then (i_size - 1) can be negative. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
There's an off-by-one bug: # create a file with lots of 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 64 1 64 3136 3435 1 2 65 3436 3136 64 3 129 3201 3499 1 4 130 3500 3201 64 5 194 3266 3563 1 6 195 3564 3266 64 7 259 3331 3627 1 8 260 3628 3331 40 eof After this patch: ... # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 300 eof /mnt/file: 1 extent found Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Diego Calleja 提交于
kmemleak found this: unreferenced object 0xffff8801b64af968 (size 512): comm "btrfs-cleaner", pid 3317, jiffies 4306810886 (age 903.272s) hex dump (first 32 bytes): 00 82 01 07 00 ea ff ff c0 83 01 07 00 ea ff ff ................ 80 82 01 07 00 ea ff ff c0 87 01 07 00 ea ff ff ................ backtrace: [<ffffffff816875cc>] kmemleak_alloc+0x5c/0xc0 [<ffffffff8114aec3>] kmem_cache_alloc_trace+0x163/0x240 [<ffffffff8127a290>] btrfs_defrag_file+0xf0/0xb20 [<ffffffff8125d9a5>] btrfs_run_defrag_inodes+0x165/0x210 [<ffffffff812479d7>] cleaner_kthread+0x177/0x190 [<ffffffff81075c7d>] kthread+0x8d/0xa0 [<ffffffff816af5f4>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff "pages" is not always freed. Fix it removing the unnecesary additional return. Signed-off-by: NDiego Calleja <diegocg@gmail.com>
-
- 20 10月, 2011 2 次提交
-
-
由 Josef Bacik 提交于
Xfstests 79 was failing because we were inheriting the S_APPEND flag when we weren't supposed to. There isn't any specific documentation on this so I'm taking the test as the standard of how things work, and having S_APPEND set on a directory doesn't mean that S_APPEND gets inherited by its children according to this test. So only inherit btrfs specific things. This will let us set compress/nocompress on specific directories and everything in the directories will inherit this flag, same with nodatacow. With this patch test 79 passes. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: NJohannes Weiner <jweiner@redhat.com> Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 11 10月, 2011 2 次提交
-
-
由 Chris Mason 提交于
The btrfs file defrag code will loop through the extents and force COW on them. But there is a concurrent truncate in the middle of the defrag, it might end up defragging the same range over and over again. The problem is that writepage won't go through and do anything on pages past i_size, so the cow won't happen, so the file will appear to still be fragmented. defrag will end up hitting the same extents again and again. In the worst case, the truncate can actually live lock with the defrag because the defrag keeps creating new ordered extents which the truncate code keeps waiting on. The fix here is to make defrag check for i_size inside the main loop, instead of just once before the looping starts. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
Follow those steps: # mount -o autodefrag /dev/sda7 /mnt # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1 # sync # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc and then it'll go into a loop: writeback -> defrag -> writeback ... It's because writeback writes [8K, 200K] and then writes [0, 8K]. I tried to make writeback know if the pages are dirtied by defrag, but the patch was a bit intrusive. Here I simply set writeback_index when we defrag a file. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 29 9月, 2011 1 次提交
-
-
由 Jan Schmidt 提交于
these ioctls make use of the new functions initially added for scrub. they return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and all paths belonging to an inode (BTRFS_IOC_INO_PATHS). Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
- 21 9月, 2011 1 次提交
-
-
由 Sage Weil 提交于
Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We need to reserve space for: - adjusting the old extent (possibly splitting it) - adding the new extent - updating the inode Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 9月, 2011 3 次提交
-
-
由 Li Zefan 提交于
The dst file will have the same inode flags with dst file after file clone, and I think it's unexpected. For example, the dst file will suddenly become immutable after getting some share of data with src file, if the src is immutable. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
To reproduce the bug: # mount /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/src bs=4K count=1 # umount /mnt # mount -o nodatasum /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/dst bs=4K count=1 # clone_range -s 4K -l 4K /mnt/src /mnt/dst # echo 3 > /proc/sys/vm/drop_caches # cat /mnt/dst # dmesg ... btrfs no csum found for inode 258 start 0 btrfs csum failed ino 258 off 0 csum 2566472073 private 0 It's because part of the file is checksummed and the other part is not, and then btrfs will complain checksum is not found when we read the file. Disallow file clone if src and dst file have different checksum flag, so we ensure a file is completely checksummed or unchecksummed. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
It's a bug in commit f81c9cdc (Btrfs: truncate pages from clone ioctl target range) We should pass the dest range to the truncate function, but not the src range. Also move the function before locking extent state. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 11 9月, 2011 2 次提交
-
-
由 Li Zefan 提交于
You can see there's no file extent with range [0, 4096]. Check this by btrfsck: # btrfsck /dev/sda7 root 5 inode 258 errors 100 ... Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
num_bytes should be 4096 not 12288. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 17 8月, 2011 1 次提交
-
-
由 Sage Weil 提交于
We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 02 8月, 2011 1 次提交
-
-
由 Li Zefan 提交于
Memory areas [ptr, ptr+total_len] and [name, name+total_len] may overlap, so it's wrong to use memcpy(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 28 7月, 2011 2 次提交
-
-
由 Josef Bacik 提交于
So I had this brilliant idea to use atomic counters for outstanding and reserved extents, but this turned out to be a bad idea. Consider this where we have 1 outstanding extent and 1 reserved extent Reserver Releaser atomic_dec(outstanding) now 0 atomic_read(outstanding)+1 get 1 atomic_read(reserved) get 1 don't actually reserve anything because they are the same atomic_cmpxchg(reserved, 1, 0) atomic_inc(outstanding) atomic_add(0, reserved) free reserved space for 1 extent Then the reserver now has no actual space reserved for it, and when it goes to finish the ordered IO it won't have enough space to do it's allocation and you get those lovely warnings. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 21 7月, 2011 1 次提交
-
-
由 Al Viro 提交于
both callers there have dentry->d_parent stabilized by the fact that their caller had obtained dentry from lookup_one_len() and had not dropped ->i_mutex on parent since then. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 6月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
Currently there is nothing protecting the pending_snapshots list on the transaction. We only hold the directory mutex that we are snapshotting and a read lock on the subvol_sem, so we could race with somebody else creating a snapshot in a different directory and end up with list corruption. So protect this list with the trans_lock. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 11 6月, 2011 1 次提交
-
-
由 Li Zefan 提交于
The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so don't declare the variable on stack. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 04 6月, 2011 1 次提交
-
-
由 David Sterba 提交于
commit 4cb5300b ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 27 5月, 2011 1 次提交
-
-
由 Chris Mason 提交于
This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 24 5月, 2011 2 次提交
-
-
由 Xiao Guangrong 提交于
fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Hugo Mills 提交于
Btrfs's tree search ioctl has a field to indicate that no more than a given number of records should be returned. The ioctl doesn't honour this, as the tested value is not incremented until the end of the copy_to_sk function. This patch removes an unnecessary local variable, and updates the num_found counter as each key is found in the tree. Signed-off-by: NHugo Mills <hugo@carfax.org.uk> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-