- 11 3月, 2014 1 次提交
-
-
由 Miao Xie 提交于
During device replace test, we hit a null pointer deference (It was very easy to reproduce it by running xfstests' btrfs/011 on the devices with the virtio scsi driver). There were two bugs that caused this problem: - We might allocate new chunks on the replaced device after we updated the mapping tree. And we forgot to replace the source device in those mapping of the new chunks. - We might get the mapping information which including the source device before the mapping information update. And then submit the bio which was based on that mapping information after we freed the source device. For the first bug, we can fix it by doing mapping tree update and source device remove in the same context of the chunk mutex. The chunk mutex is used to protect the allocable device list, the above method can avoid the new chunk allocation, and after we remove the source device, all the new chunks will be allocated on the new device. So it can fix the first bug. For the second bug, we need make sure all flighting bios are finished and no new bios are produced during we are removing the source device. To fix this problem, we introduced a global @bio_counter, we not only inc/dec @bio_counter outsize of map_blocks, but also inc it before submitting bio and dec @bio_counter when ending bios. Since Raid56 is a little different and device replace dosen't support raid56 yet, it is not addressed in the patch and I add comments to make sure we will fix it in the future. Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 12 11月, 2013 2 次提交
-
-
由 Dulshani Gunawardhana 提交于
Pack the structure btrfs_device in volumes.h to eliminate holes detected by pahole, thus reducing binary memory footprint. Signed-off-by: NDulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: NZach Brown <zab@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Wang Shilong 提交于
Originally, we introduced scrub_super_lock to synchronize tree log code with scrubbing super. However we can replace scrub_super_lock with device_list_mutex, because writing super will hold this mutex, this will reduce an extra lock holding when writing supers in sync log code. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 01 9月, 2013 4 次提交
-
-
由 Ilya Dryomov 提交于
Currently btrfs_device is allocated ad-hoc in a few different places, and as a result not all fields are initialized properly. In particular, readahead state is only initialized in device_list_add (at scan time), and not in btrfs_init_new_device (when the new device is added with 'btrfs dev add'). Fix this by adding an allocation helper and switch everybody but __btrfs_close_devices to it. (__btrfs_close_devices is dealt with in a later commit.) Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
If the filesystem was mounted with an old kernel that was not aware of the UUID tree, this is detected by looking at the uuid_tree_generation field of the superblock (similar to how the free space cache is doing it). If a mismatch is detected at mount time, a thread is started that does two things: 1. Iterate through the UUID tree, check each entry, delete those entries that are not valid anymore (i.e., the subvol does not exist anymore or the value changed). 2. Iterate through the root tree, for each found subvolume, add the UUID tree entries for the subvolume (if they are not already there). This mechanism is also used to handle and repair errors that happened during the initial creation and filling of the tree. The update of the uuid_tree_generation field (which indicates that the state of the UUID tree is up to date) is blocked until all create and repair operations are successfully completed. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This tree is not created by mkfs.btrfs. Therefore when a filesystem is mounted writable and the UUID tree does not exist, this tree is created if required. The tree is also added to the fs_info structure and initialized, but this commit does not yet read or write UUID tree elements. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
Before applying this patch, we cached the csum value into the extent state tree when reading some data from the disk, this operation increased the lock contention of the state tree. Now, we just store the csum value into the bio structure or other unshared structure, so we can reduce the lock contention. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 02 7月, 2013 1 次提交
-
-
由 Josef Bacik 提交于
When adjusting the enospc rules for relocation I ran into a deadlock because we were relocating the only system chunk and that forced us to try and allocate a new system chunk while holding locks in the chunk tree, which caused us to deadlock. To fix this I've moved all of the dev extent addition and chunk addition out to the delayed chunk completion stuff. We still keep the in-memory stuff which makes sure everything is consistent. One change I had to make was to search the commit root of the device tree to find a free dev extent, and hold onto any chunk em's that we allocated in that transaction so we do not allocate the same dev extent twice. This has the side effect of fixing a bug with balance that has been there ever since balance existed. Basically you can free a block group and it's dev extent and then immediately allocate that dev extent for a new block group and write stuff to that dev extent, all within the same transaction. So if you happen to crash during a balance you could come back to a completely broken file system. This patch should keep these sort of things from happening in the future since we won't be able to allocate free'd dev extents until after the transaction commits. This has passed all of the xfstests and my super annoying stress test followed by a balance. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 14 6月, 2013 1 次提交
-
-
由 Miao Xie 提交于
There are several functions whose code is similar, such as btrfs_find_last_root() btrfs_read_fs_root_no_radix() Besides that, some functions are invoked twice, it is unnecessary, for example, we are sure that all roots which is found in btrfs_find_orphan_roots() have their orphan items, so it is unnecessary to check the orphan item again. So cleanup it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 18 5月, 2013 1 次提交
-
-
由 Chris Mason 提交于
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added lead to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: NChris Mason <chris.mason@fusionio.com> Reported-by: NTejun Heo <tj@kernel.org>
-
- 07 5月, 2013 1 次提交
-
-
由 Eric Sandeen 提交于
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 20 2月, 2013 1 次提交
-
-
由 Filipe Brandenburger 提交于
The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel. Signed-off-by: NFilipe Brandenburger <filbranden@google.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 02 2月, 2013 1 次提交
-
-
由 David Woodhouse 提交于
This builds on David Woodhouse's original Btrfs raid5/6 implementation. The code has changed quite a bit, blame Chris Mason for any bugs. Read/modify/write is done after the higher levels of the filesystem have prepared a given bio. This means the higher layers are not responsible for building full stripes, and they don't need to query for the topology of the extents that may get allocated during delayed allocation runs. It also means different files can easily share the same stripe. But, it does expose us to incorrect parity if we crash or lose power while doing a read/modify/write cycle. This will be addressed in a later commit. Scrub is unable to repair crc errors on raid5/6 chunks. Discard does not work on raid5/6 (yet) The stripe size is fixed at 64KiB per disk. This will be tunable in a later commit. Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 17 12月, 2012 1 次提交
-
-
由 Liu Bo 提交于
Raid properties can be shared among raid calculation code, we can put them into a global table to keep it simple. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 13 12月, 2012 9 次提交
-
-
由 Stefan Behrens 提交于
This commit contains all the essential changes to the core code of Btrfs for support of the device replace procedure. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This adds a new file to the sources together with the header file and the changes to ioctl.h and ctree.h that are required by the new C source file. Additionally, 4 new functions are added to volume.c that deal with device creation and destruction. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This patch adds some code to disallow operations on the device that is used as the target for the device replace operation. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
A small number of functions that are used in a device replace procedure when the operation is resumed at mount time are unable to pass the same root pointer that would be used in the regular (ioctl) context. And since the root pointer is not required, only the fs_info is, the root pointer argument is replaced with the fs_info pointer argument. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This new function is used by the device replace procedure in a later patch. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This is required for the device replace procedure in a later step. Two calling functions also had to be changed to have the fs_info pointer: repair_io_failure() and scrub_setup_recheck_block(). Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
This is required for the device replace procedure in a later step. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
The new function btrfs_find_device_missing_or_by_path() will be used for the device replace procedure. This function itself calls the second new function btrfs_find_device_by_path(). Unfortunately, it is not possible to currently make the rest of the code use these functions as well, since all functions that look similar at first view are all a little bit different in what they are doing. But in the future, new code could benefit from these two new functions, and currently, device replace uses them. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
The device replace procedure makes use of the scrub code. The scrub code is the most efficient code to read the allocated data of a disk, i.e. it reads sequentially in order to avoid disk head movements, it skips unallocated blocks, it uses read ahead mechanisms, and it contains all the code to detect and repair defects. This commit is a first preparation step to adapt the scrub code to be shareable for the device replace procedure. The block device will be removed from the scrub context state structure in a later step. It used to be the source block device. The scrub code as it is used for the device replace procedure reads the source data from whereever it is optimal. The source device might even be gone (disconnected, for instance due to a hardware failure). Or the drive can be so faulty so that the device replace procedure tries to avoid access to the faulty source drive as much as possible, and only if all other mirrors are damaged, as a last resort, the source disk is accessed. The modified scrub code operates as if it would handle the source drive and thereby generates an exact copy of the source disk on the target disk, even if the source disk is not present at all. Therefore the block device pointer to the source disk is removed in a later patch, and therefore the context structure is renamed (this is the goal of the current patch) to reflect that no source block device scope is there anymore. Summary: This first preparation step consists of a textual substitution of the term "dev" to the term "ctx" whereever the scrub context is used. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 29 8月, 2012 1 次提交
-
-
由 Stefan Behrens 提交于
Commit 442a4f63 added btrfs device statistic counters for detected IO and checksum errors to Linux 3.5. The statistic part that counts checksum errors in end_bio_extent_readpage() can cause a BUG() in a subfunction: "kernel BUG at fs/btrfs/volumes.c:3762!" That part is reverted with the current patch. However, the counting of checksum errors in the scrub context remains active, and the counting of detected IO errors (read, write or flush errors) in all contexts remains active. Cc: stable <stable@vger.kernel.org> # 3.5 Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 24 7月, 2012 2 次提交
-
-
由 Josef Bacik 提交于
This will be used in conjunction with btrfs device ready <dev>. This is needed for initrd's to have a nice and lightweight way to tell if all of the devices needed for a file system are in the cache currently. This keeps them from having to do mount+sleep loops waiting for devices to show up. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
Commit c11d2c23 (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing distinguished by just the ioctl number which encodes "do reset after read". I have suggested http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html to implement it via the ioctl args. This hasn't happen, and I think we should use a more clean way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 03 7月, 2012 2 次提交
-
-
由 Ilya Dryomov 提交于
This introduces btrfs_resume_balance_async(), which, given that restriper state was recovered earlier by btrfs_recover_balance(), resumes balance in btrfs-balance kthread. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Fix a bug that triggered asserts in btrfs_balance() in both normal and resume modes -- restriper state was not properly restored on read-only mounts. This factors out resuming code from btrfs_restore_balance(), which is now also called earlier in the mount sequence to avoid the problem of some early writes getting the old profile. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 15 6月, 2012 1 次提交
-
-
由 Josef Bacik 提交于
Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 30 5月, 2012 3 次提交
-
-
由 Stefan Behrens 提交于
The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Stefan Behrens 提交于
An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Stefan Behrens 提交于
The goal is to detect when drives start to get an increased error rate, when drives should be replaced soon. Therefore statistic counters are added that count IO errors (read, write and flush). Additionally, the software detected errors like checksum errors and corrupted blocks are counted. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
- 22 3月, 2012 1 次提交
-
-
由 Jeff Mahoney 提交于
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
-
- 17 1月, 2012 7 次提交
-
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement an ioctl for canceling restriper. Currently we wait until relocation of the current block group is finished, in future this can be done by triggering a commit. Balance item is deleted and no memory about the interrupted balance is kept. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement an ioctl for pausing restriper. This pauses the relocation, but balance is still considered to be "in progress": balance item is not deleted, other volume operations cannot be started, etc. If paused in the middle of profile changing operation we will continue making allocations with the target profile. Add a hook to close_ctree() to pause restriper and free its data structures on unmount. (It's safe to unmount when restriper is in "paused" state, we will resume with the same parameters on the next mount) Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
On mount, if balance item is found, resume balance in a separate kernel thread. Try to be smart to continue roughly where previous balance (or convert) was interrupted. For chunk types that were being converted to some profile we turn on soft convert, in case of a simple balance we turn on usage filter and relocate only less-than-90%-full chunks of that type. These are just heuristics but they help quite a bit, and can be improved in future. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
When doing convert from one profile to another if soft mode is on restriper won't touch chunks that already have the profile we are converting to. This is useful if e.g. half of the FS was converted earlier. The soft mode switch is (like every other filter) per-type. This means that we can convert for example meta chunks the "hard" way while converting data chunks selectively with soft switch. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Profile changing is done by launching a balance with BTRFS_BALANCE_CONVERT bits set and target fields of respective btrfs_balance_args structs initialized. Profile reducing code in this case will pick restriper's target profile if it's available instead of doing a blind reduce. If target profile is not yet available it goes back to a plain reduce. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Select chunks which have at least one byte located inside a given [vstart, vend) virtual address space range. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-