- 03 12月, 2014 3 次提交
-
-
由 Miao Xie 提交于
This patch implement the RAID5/6 common data repair function, the implementation is similar to the scrub on the other RAID such as RAID1, the differentia is that we don't read the data from the mirror, we use the data repair function of RAID5/6. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Zhao Lei 提交于
stripe_index's value was set again in latter line: stripe_index = 0; Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz>
-
由 Zhao Lei 提交于
bbio_ret in this condition is always !NULL because previous code already have a check-and-skip: 4908 if (!bbio_ret) 4909 goto out; Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz>
-
- 04 10月, 2014 1 次提交
-
-
由 Fabian Frederick 提交于
bi_sector and bi_size moved to bi_iter since commit 4f024f37 ("block: Abstract out bvec iterator") Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NChris Mason <clm@fb.com>
-
- 23 9月, 2014 1 次提交
-
-
由 Josef Bacik 提交于
One problem that has plagued us is that a user will use up all of his space with data, remove a bunch of that data, and then try to create a bunch of small files and run out of space. This happens because all the chunks were allocated for data since the metadata requirements were so low. But now there's a bunch of empty data block groups and not enough metadata space to do anything. This patch solves this problem by automatically deleting empty block groups. If we notice the used count go down to 0 when deleting or on mount notice that a block group has a used count of 0 then we will queue it to be deleted. When the cleaner thread runs we will double check to make sure the block group is still empty and then we will delete it. This patch has the side effect of no longer having a bunch of BUG_ON()'s in the chunk delete code, which will be helpful for both this and relocate. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 18 9月, 2014 30 次提交
-
-
由 Chris Mason 提交于
This reverts commit b96de000. This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs. For now, just revert the commit and we'll put a better fix into a later merge window. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We need real mirror number for RAID0/5/6 when reading data, or if read error happens, we would pass 0 as the number of the mirror on which the io error happens. It is wrong and would cause the filesystem read the data from the corrupted mirror again. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
rw_devices counter is often used to tune the profile when doing chunk allocation, so we should modify it under the chunk_mutex context to avoid getting wrong chunk profile. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
For a missing device, we don't know it belong to which fs before we read its fsid from the chunk tree. So we add them into the current fs device list at first. When we get its fsid, we should move them to their own fs device list. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
When we open a seed filesystem, if the degraded mount option is set, we continue to mount the fs if we don't find some devices in the seed filesystem. But we should stop mounting if other errors happen. Fix it Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The problem is: Task0(device scan task) Task1(device replace task) scan_one_device() mutex_lock(&uuid_mutex) device = find_device() mutex_lock(&device_list_mutex) lock_chunk() rm_and_free_source_device unlock_chunk() mutex_unlock(&device_list_mutex) check device Destroying the target device if device replace fails also has the same problem. We fix this problem by locking uuid_mutex during destroying source device or target device, just like the device remove operation. It is a temporary solution, we can fix this problem and make the code more clear by atomic counter in the future. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We can build a new filesystem based a seed filesystem, and we need clone the fs devices when we open the new filesystem. But someone might clear the seed flag of the seed filesystem, then mount that filesystem and remove some device. If we mount the new filesystem, we might access a device list which was being changed when we clone the fs devices. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
There were several problems about chunk mutex usage: - Lock chunk mutex when updating metadata. It would cause the nested deadlock because updating metadata might need allocate new chunks that need acquire chunk mutex. We remove chunk mutex at this case, because b-tree lock and other lock mechanism can help us. - ABBA deadlock occured between device_list_mutex and chunk_mutex. When we update device status, we must acquire device_list_mutex at the beginning, and then we might get chunk_mutex during the device status update because we need allocate new chunks for metadata COW. But at most place, we acquire chunk_mutex at first and then acquire device list mutex. We need change the lock order. - Some place we needn't acquire chunk_mutex. For example we needn't get chunk_mutex when we free a empty seed fs_devices structure. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We didn't protect the system chunk array when we added a new system chunk into it, it would cause the array be corrupted if someone remove/add some system chunk into array at the same time. Fix it by chunk lock. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk lock when we change them, but sometimes we read them without any lock, and we might get unexpected value. We fix this problem like inode's i_size. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We should update free_chunk_space in time when we allocate a new chunk, not when we deal with the pending device update and block group insertion, because we need the real free_chunk_space data to calculate the reserved space, if we don't update it in time, we would consider the disk space which has be allocated as free space, and would use it to do overcommit reservation. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We should update device->bytes_used in the lock context of chunk_mutex, or we would get wrong data. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
During removing a device, we have modified free_chunk_space when we shrink the device, so we needn't assign a new value to it after the device shrink. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
device->bytes_used will be changed when allocating a new chunk, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Though it is not big problem because we don't use it now, but anyway it is better that we make it be consistent with the common metadata, maybe we will use it in the future. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
total_size will be changed when resizing a device, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We didn't protect the assignment of the target device, it might cause the problem that the super block update was skipped because we might find wrong size of the target device during the assignment. Fix it by moving the assignment sentences into the initialization function of the target device. And there is another merit that we can check if the target device is suitable more early. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The member variants - num_can_discard - of fs_devices structure are set, but no one use them to do anything. so remove them. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
we are assigning number_devices to the total_bytes, that's very confusing for a moment Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
seed fs devices don't participate as rw_device, so don't increment rw_devices when the device being handled belongs to a seed fs. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
When we replace all the seed device in the system there is no point in just keeping the btrfs_fs_devices with out any device Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
We are not updating sprout fs seed pointer when all seed device is replaced. This patch will check if all seed device has been replaced and then update the sprout pointer accordingly. Same reproducer as in the previous patch would apply here. And notice that btrfs_close_device will check if seed fs is present and spits out the error with out this patch. int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) { :: seed_devices = fs_devices->seed; :: while (seed_devices) { fs_devices = seed_devices; seed_devices = fs_devices->seed; __btrfs_close_devices(fs_devices); free_fs_devices(fs_devices); } Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
reproducer: mount /dev/sdb /btrfs btrfs dev add /dev/sdc /btrfs btrfs rep start -B /dev/sdb /dev/sdd /btrfs umount /btrfs WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]() :: __btrfs_close_devices() :: WARN_ON(fs_devices->open_devices); After the seed device has been replaced the new target device is no more a seed device. So we need to update the device numbers in the fs_devices as pointed by the fs_info. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Anand Jain 提交于
There is no logical change in this patch, just a preparatory patch, so that changes can be easily reasoned. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
None of the uses of btrfs_search_forward() need to have the path nodes (level >= 1) read locked, only the leaf needs to be locked while the caller processes it. Therefore make it return a path with all nodes unlocked, except for the leaf. This change is motivated by the observation that during a file fsync we repeatdly call btrfs_search_forward() and process the returned leaf while upper nodes of the returned path (level >= 1) are read locked, which unnecessarily blocks other tasks that want to write to the same fs/subvol btree. Therefore instead of modifying the fsync code to unlock all nodes with level >= 1 immediately after calling btrfs_search_forward(), change btrfs_search_forward() to do it, so that it benefits all callers. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The member variants - latest_devid and latest_trans - of fs_devices structure are set, but no one use them to do anything. so remove them. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The io error might happen during writing out the device stats, and the device stats information and dirty flag would be update at that time, but the current code didn't consider this case, just clear the dirty flag, it would cause that we forgot to write out the new device stats information. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 HIMANGI SARAOGI 提交于
Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); // </smpl> Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
If we mounted a seed filesystem with degraded option, and then added a new device into the seed filesystem, then we found adding device failed because of the IO failure. Steps to reproduce: # mkfs.btrfs -d raid1 -m raid1 <dev0> <dev1> # btrfstune -S 1 <dev0> # mount <dev0> -o degraded <mnt> # btrfs device add -f <dev2> <mnt> It is because the original didn't set the chunk on the seed device to be read-only if the degraded flag was set. It was introduced by patch f48b9075, which fixed the problem the raid1 filesystem became read-only after one device of it was missing. But this fix method was not right, we should set the read-only flag according to the number of the missing devices, not the degraded mount option, if the number of the missing devices is less than the max error number that the profile of the chunk tolerates, we don't set it to be read-only. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 David Sterba 提交于
btrfs_set_key_type and btrfs_key_type are used inconsistently along with open coded variants. Other members of btrfs_key are accessed directly without any helpers anyway. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
- 24 8月, 2014 1 次提交
-
-
由 Liu Bo 提交于
This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16. Btrfs now migrates to use kernel workqueue, but it introduces this hang problem. Btrfs has a kind of work queued as an ordered way, which means that its ordered_func() must be processed in the way of FIFO, so it usually looks like -- normal_work_helper(arg) work = container_of(arg, struct btrfs_work, normal_work); work->func() <---- (we name it work X) for ordered_work in wq->ordered_list ordered_work->ordered_func() ordered_work->ordered_free() The hang is a rare case, first when we find free space, we get an uncached block group, then we go to read its free space cache inode for free space information, so it will file a readahead request btrfs_readpages() for page that is not in page cache __do_readpage() submit_extent_page() btrfs_submit_bio_hook() btrfs_bio_wq_end_io() submit_bio() end_workqueue_bio() <--(ret by the 1st endio) queue a work(named work Y) for the 2nd also the real endio() So the hang occurs when work Y's work_struct and work X's work_struct happens to share the same address. A bit more explanation, A,B,C -- struct btrfs_work arg -- struct work_struct kthread: worker_thread() pick up a work_struct from @worklist process_one_work(arg) worker->current_work = arg; <-- arg is A->normal_work worker->current_func(arg) normal_work_helper(arg) A = container_of(arg, struct btrfs_work, normal_work); A->func() A->ordered_func() A->ordered_free() <-- A gets freed B->ordered_func() submit_compressed_extents() find_free_extent() load_free_space_inode() ... <-- (the above readhead stack) end_workqueue_bio() btrfs_queue_work(work C) B->ordered_free() As if work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that kernel workqueue code worker_thread() still has worker->current_work pointer to be work A->normal_work's, ie. arg's address. Meanwhile, work C is allocated after work A is freed, work C->normal_work and work A->normal_work are likely to share the same address(I confirmed this with ftrace output, so I'm not just guessing, it's rare though). When another kthread picks up work C->normal_work to process, and finds our kthread is processing it(see find_worker_executing_work()), it'll think work C as a collision and skip then, which ends up nobody processing work C. So the situation is that our kthread is waiting forever on work C. Besides, there're other cases that can lead to deadlock, but the real problem is that all btrfs workqueue shares one work->func, -- normal_work_helper, so this makes each workqueue to have its own helper function, but only a wraper pf normal_work_helper. With this patch, I no long hit the above hang. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 19 8月, 2014 4 次提交
-
-
由 Miao Xie 提交于
total_bytes of device is just a in-memory variant which is used to record the size of the device, and it might be changed before we resize a device, if the resize operation fails, it will be fallbacked. But some code used it to update on-disk metadata of the device, it would cause the problem that on-disk metadata of the devices was not consistent. We should use the other variant named disk_total_bytes to update the on-disk metadata of device, because that variant is updated only when the resize operation is successful. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The seed filesystem was destroyed by the device replace, the reproduce method is: # mkfs.btrfs -f <dev0> # btrfstune -S 1 <dev0> # mount <dev0> <mnt> # btrfs device add <dev1> <mnt> # umount <mnt> # mount <dev1> <mnt> # btrfs replace start -f <dev0> <dev2> <mnt> # umount <mnt> # mount <dev0> <mnt> It is because we erase the super block on the seed device. It is wrong, we should not change anything on the seed device. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The missing devices are accounted by its own fs device, for example the missing devices in seed filesystem will be accounted by the fs device of the seed filesystem, not by the new filesystem which is based on the seed filesystem, so when we remove the missing device in the seed filesystem, we should decrease the counter of its own fs device. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
We forgot to zero some members in fs_devices when we create new fs_devices from the one of the seed fs. It would cause the problem that we got wrong chunk profile when allocating chunks. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-