- 30 January 2018, 6 commits
-
-
Submitted by Khazhismel Kumykov

Move the last used path to the end of the list (least preferred) so that ties are more evenly distributed.

For example, with three paths where one is slower than the others, the remaining two would be used unevenly when they tie, because the rotation is not a truly fair distribution. Illustrated: paths a, b, c; 'c' has 1 outstanding IO, a and b are 'tied'.

Three possible rotations:

(a, b, c) -> best path 'a'
(b, c, a) -> best path 'b'
(c, a, b) -> best path 'a'
(a, b, c) -> best path 'a'
(b, c, a) -> best path 'b'
(c, a, b) -> best path 'a'
...

So 'a' is used 2x more than 'b', although they should be used evenly.

With this change, the most recently used path is always the least preferred, removing this bias and resulting in even distribution:

(a, b, c) -> best path 'a'
(b, c, a) -> best path 'b'
(c, a, b) -> best path 'a'
(c, b, a) -> best path 'b'
...

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
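A minimal sketch of how a selector can implement this, assuming a queue-length-style path selector with a valid_paths list (the structure and field names are illustrative, not the exact dm-mpath code):

#include <linux/list.h>
#include <linux/atomic.h>
#include <linux/device-mapper.h>

/* Illustrative per-path bookkeeping; field names are assumptions. */
struct path_info {
	struct list_head list;
	atomic_t qlen;		/* outstanding IOs on this path */
	struct dm_path *path;
};

static struct dm_path *select_least_loaded(struct list_head *valid_paths)
{
	struct path_info *pi, *best = NULL;

	list_for_each_entry(pi, valid_paths, list)
		if (!best || atomic_read(&pi->qlen) < atomic_read(&best->qlen))
			best = pi;

	if (!best)
		return NULL;

	/*
	 * The fix: demote the path just picked to the tail of the list,
	 * so that on the next tie the other tied path wins the scan.
	 */
	list_move_tail(&best->list, valid_paths);
	return best->path;
}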
-
Submitted by Scott Bauer

Since the unstripe target takes a target length which is the size of *one* striped member we are trying to expose, not the total size of *all* the striped members, the check does not make sense and fails for some striped setups.

For example, say we have a 4TB striped device, i.e. 3907018496 sectors per underlying device:

if (sector_div(width, uc->stripes)) :
    3907018496 / 2 (num stripes) == 1953509248
tmp_len = width;
if (sector_div(tmp_len, uc->chunk_size)) :
    1953509248 / 256 (chunk size) == 7630895.5 (fails)

Fix this by removing the first check, which isn't valid for unstriping.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
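A small userspace check of the arithmetic above; the kernel's sector_div() divides in place and returns the remainder, which this stand-in mimics:

#include <stdio.h>
#include <stdint.h>

/* Stand-in for the kernel's sector_div(): divide in place, return remainder. */
static uint32_t sector_div_standin(uint64_t *n, uint32_t base)
{
	uint32_t rem = *n % base;
	*n /= base;
	return rem;
}

int main(void)
{
	uint64_t width = 3907018496ULL;	/* sectors per underlying device */

	sector_div_standin(&width, 2);	/* width becomes 1953509248 */

	uint64_t tmp_len = width;
	if (sector_div_standin(&tmp_len, 256))
		printf("check fails: 1953509248 %% 256 == 128\n");
	return 0;
}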
-
Submitted by Luis de Bethencourt

The trailing semicolon is an empty statement that performs no operation. Remove it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer

The 'verify_rq_based:' code in dm_table_determine_type() was checking all devices in the DM table rather than only checking the data devices. Fix this by using the immutable target's iterate_devices method.

Also, tweak the block of dm_table_determine_type() code that decides whether to upgrade from DM_TYPE_BIO_BASED to DM_TYPE_NVME_BIO_BASED so that it makes sure the immutable target doesn't require splitting IOs.

These changes have been verified to allow a "thin-pool" target whose data device is an NVMe device to be upgraded to DM_TYPE_NVME_BIO_BASED. Using the thin-pool in NVMe bio-based mode was verified to pass all of the device-mapper-test-suite's "thin-provisioning" tests. Also verified that request-based DM multipath (with queue_mode "rq" and "mq") works as expected using the 'mptest' harness.

Fixes: 22c11858 ("dm: introduce DM_TYPE_NVME_BIO_BASED")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer

Also, add dm_sysfs_init() error handling to dm_create().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer

Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue. This is beneficial to do if BLK_STS_RESOURCE is returned from the target (because the target is busy).

Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(), indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay. For the old .request_fn: use blk_delay_queue().

bio-based multipath doesn't have feature parity with request-based for retryable error requeues; that is something that'll need fixing in the future.

Suggested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
[as interpreted from Bart's "... patch looks fine to me."]
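A hedged fragment sketch of how dm_done() might dispatch on the new return code; the boolean delay parameter is an assumption about the plumbing, not the exact diff:

/* Hedged sketch of the completion path in dm-rq.c:dm_done(). */
static void dm_done_sketch(struct request *clone, blk_status_t error,
			   struct dm_rq_target_io *tio, int r)
{
	switch (r) {
	case DM_ENDIO_REQUEUE:
		dm_requeue_original_request(tio, false);	/* at once */
		break;
	case DM_ENDIO_DELAY_REQUEUE:
		/*
		 * After a delay: blk-mq kicks the requeue list via
		 * blk_mq_requeue_work(); .request_fn uses blk_delay_queue().
		 */
		dm_requeue_original_request(tio, true);
		break;
	default:
		dm_end_request(clone, error);
		break;
	}
}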
-
- 18 January 2018, 1 commit
-
-
Submitted by Ming Lei

blk_insert_cloned_request() is called in the fast path of a dm-rq driver (e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses blk_mq_request_bypass_insert() to directly append the request to the blk-mq hctx->dispatch_list of the underlying queue.

1) This way isn't efficient enough because the hctx spinlock is always used.

2) With blk_insert_cloned_request(), we completely bypass the underlying queue's elevator and depend on the upper-level dm-rq driver's elevator to schedule IO. But dm-rq currently can't get the underlying queue's dispatch feedback at all. Without knowing whether a request was issued or not (e.g. due to the underlying queue being busy) the dm-rq elevator will not be able to provide effective IO merging (as a side-effect of dm-rq currently blindly destaging a request from its elevator only to requeue it after a delay, which kills any opportunity for merging). This obviously causes very bad sequential IO performance.

Fix this by updating blk_insert_cloned_request() to use blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a request to be issued directly to the underlying queue and returns the dispatch feedback (blk_status_t). If blk_mq_request_direct_issue() returns BLK_STS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE to _not_ destage the request, thereby preserving the opportunity to merge IO.

With this, request-based DM's blk-mq sequential IO performance is vastly improved (as much as 3X in mpath/virtio-scsi testing).

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[blk-mq.c changes heavily influenced by Ming Lei's initial solution, but they were refactored to make them less fragile and easier to read/review]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
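A minimal sketch of the call-site change described above, with error handling and the legacy-queue branch elided; treat this as illustrative rather than the exact diff:

/* Illustrative sketch of blk_insert_cloned_request() gaining feedback. */
blk_status_t blk_insert_cloned_request(struct request_queue *q,
				       struct request *rq)
{
	/* ... validity checks elided ... */

	if (q->mq_ops)
		/*
		 * Issue directly to the underlying queue and propagate the
		 * dispatch feedback (e.g. BLK_STS_RESOURCE when the queue
		 * is busy) back to the dm-rq caller.
		 */
		return blk_mq_request_direct_issue(rq);

	/* legacy .request_fn path elided */
	return BLK_STS_OK;
}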
-
- 17 January 2018, 19 commits
-
-
Submitted by Ming Lei

Avoid using DM_MAPIO_REQUEUE unless absolutely necessary because it results in dm-rq.c:dm_mq_queue_rq() returning BLK_STS_RESOURCE to blk-mq -- doing so should only ever be done if the underlying queue is out of resources. So switch to returning DM_MAPIO_DELAY_REQUEUE from multipath_clone_and_map() if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED is set.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
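In multipath_clone_and_map() this plausibly reduces to changing one return value, sketched here; the surrounding pg_init call is an assumption about context:

/* Hedged sketch of the changed branch in multipath_clone_and_map(). */
if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
    test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
	pg_init_all_paths(m);
	return DM_MAPIO_DELAY_REQUEUE;	/* was: DM_MAPIO_REQUEUE */
}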
-
Submitted by Ming Lei

blk-mq will rerun the queue via RESTART or a dispatch wake after one request is completed, so there is no need to wait an arbitrary amount of time before requeueing; we should trust blk-mq to do it. More importantly, we need to return BLK_STS_RESOURCE to blk-mq so that dequeuing from the I/O scheduler can be stopped, which results in improved I/O merging.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Ma Shimiao

If the source string is longer than max, kstrndup will allocate max+1 bytes. So make sure the result will not exceed max.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
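A small userspace demonstration of the off-by-one property using POSIX strndup(), which (like kstrndup) copies up to n characters and then appends a NUL, so capping the copy at max-1 keeps the total allocation within max. The limit value is illustrative, not the actual call site:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
	const char *src = "0123456789";	/* longer than our limit */
	size_t max = 8;			/* total bytes we may allocate */

	/* Passing max here would allocate max + 1 bytes for a long src;
	 * passing max - 1 keeps copy + NUL within the max-byte budget. */
	char *s = strndup(src, max - 1);

	printf("copied \"%s\" (%zu chars + NUL = %zu bytes <= %zu)\n",
	       s, strlen(s), strlen(s) + 1, max);
	free(s);
	return 0;
}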
-
Submitted by Mike Snitzer

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mikulas Patocka

The rw_semaphore is acquired for read only in two places, neither of which is performance-critical. So replace it with a mutex, which is more efficient.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Goldwyn Rodrigues

One can crash dm-flakey by specifying more feature arguments than the number of features supplied. Checking for a NULL arg_name avoids this.

dmsetup create flakey-test --table "0 66076080 flakey /dev/sdb9 0 0 180 2 drop_writes"

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
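A hedged sketch of the guard in the feature-parsing loop; dm_shift_arg() returns NULL once the argument set is exhausted, and the error string is illustrative:

/* Hedged sketch of flakey's feature-argument parsing loop. */
while (argc) {
	arg_name = dm_shift_arg(as);
	argc--;

	/* The fix: more features were claimed than arguments supplied. */
	if (!arg_name) {
		ti->error = "Insufficient feature arguments";
		return -EINVAL;
	}

	/* ... match arg_name against known features ... */
}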
-
Submitted by Brian Norris

If anyone is going to use dm_table_create(), they probably should be able to use dm_table_destroy() too. Move the dm_table_destroy() declaration out of the private header, near dm_table_create().

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Wei Yongjun

Fixes the following sparse warning:

drivers/md/dm-raid.c:33:1: warning: symbol 'raid_sets' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mike Snitzer

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Aliaksei Karaliou

dm_bufio_client_create() does not check the result of register_shrinker(), which was recently tagged __must_check; reported by sparse.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
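A minimal sketch of the checked call, assuming the client-create error path unwinds via a goto label as is common in such constructors (the label name is illustrative):

/* Hedged sketch: propagate register_shrinker() failure. */
c->shrinker.count_objects = dm_bufio_shrink_count;
c->shrinker.scan_objects = dm_bufio_shrink_scan;
c->shrinker.seeks = 1;

r = register_shrinker(&c->shrinker);
if (r)
	goto bad;	/* unwind buffers, mutex, etc. */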
-
Submitted by Aliaksei Karaliou

The client's mutex needs to be destroyed in dm_bufio_client_destroy() as well as in the dm_bufio_client_create() error path.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mikulas Patocka

Use the REQ_OP_READ and REQ_OP_WRITE macros instead of READ and WRITE. They have the same value, but the block layer uses REQ_OP so bufio should too.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Scott Bauer

This device mapper "unstriped" target remaps and unstripes I/O so it is issued solely on a single drive in a HW RAID0 or dm-striped target. In a 4 drive HW RAID0 the striped target exposes 1/4th of the LBA range as a virtual drive. Each I/O to that virtual drive will only be issued to the one drive that was selected out of the 4 drives in the HW RAID0.

This unstriped target is most useful for Intel NVMe drives that have multiple cores but that do not have firmware control to pin separate LBA ranges to each discrete cpu core.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Wei Yongjun

Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool() error handling case instead of 0, as done elsewhere in this function.

Fixes: ef43aa38 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
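The shape of the fix, sketched with hedged names; the pool field, sizing arguments and error label are assumptions consistent with the constructor pattern described, not dm-crypt's exact code:

/* Hedged sketch of the corrected error path in the dm-crypt constructor. */
cc->tag_pool = mempool_create_kmalloc_pool(pool_min_nr, tag_size);
if (!cc->tag_pool) {
	ti->error = "Cannot allocate integrity tags mempool";
	ret = -ENOMEM;	/* the fix: ret previously remained 0 here */
	goto bad;
}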
-
Submitted by Ondrej Kozina

Loading a key via the kernel keyring service erases the internal key copy immediately after we pass it to the crypto layer. This is wrong because the IV is initialized later and we then use the wrong key for the initialization (instead of the real key there's just a zeroed block). The bug may cause data corruption if a key is loaded via the kernel keyring service first and the same crypt device is later reactivated using exactly the same key in hexbyte representation, or vice versa. The bug (and fix) affects only ciphers using the following IVs: essiv, lmk and tcw.

Fixes: c538f6ec ("dm crypt: add ability to use keys from the kernel key retention service")
Cc: stable@vger.kernel.org # 4.10+
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Mikulas Patocka

Some asynchronous cipher implementations may use DMA. The stack may be mapped in the vmalloc area, which doesn't support DMA. Therefore, the cipher request and initialization vector shouldn't be on the stack. Fix this by allocating the request and IV with kmalloc.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
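A hedged sketch of heap allocation for a skcipher request and IV; skcipher_request_alloc() sizes the request for the given tfm, and GFP_NOIO is the usual choice on the I/O path. The helper and variable names are illustrative:

#include <crypto/skcipher.h>
#include <linux/slab.h>

/* Hedged sketch: keep DMA-able buffers off the (possibly vmalloc'd) stack. */
static int alloc_cipher_buffers(struct crypto_skcipher *tfm,
				struct skcipher_request **req, u8 **iv)
{
	*req = skcipher_request_alloc(tfm, GFP_NOIO);
	if (!*req)
		return -ENOMEM;

	*iv = kmalloc(crypto_skcipher_ivsize(tfm), GFP_NOIO);
	if (!*iv) {
		skcipher_request_free(*req);
		return -ENOMEM;
	}
	return 0;
}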
-
Submitted by Milan Broz

If dm-crypt uses authenticated mode with a separate MAC, there are two concatenated parts of the key structure: the key(s) for encryption and the authentication key. Add a missing check for the authentication key length. If this key length is smaller than the actually provided key, dm-crypt now properly fails instead of crashing.

Fixes: ef43aa38 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Salah Coronya <salahx@yahoo.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
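A hedged sketch of the kind of bounds check involved; the field and function names follow dm-crypt conventions but are assumptions here, not the exact patch:

/* Hedged sketch: authenticated mode appends a MAC key to the key blob. */
static int check_key_covers_mac(struct crypt_config *cc, size_t key_size)
{
	/* Fail cleanly if the supplied key can't contain the MAC key. */
	if (cc->key_mac_size > key_size)
		return -EINVAL;	/* would otherwise read past the key buffer */
	return 0;
}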
-
Submitted by Joe Thornber

When inserting a new key/value pair into a btree we walk down the spine of btree nodes performing the following 2 operations:

i) ensuring there is space for a new entry
ii) adjusting the first key entry if the new key is lower than any in the node

If the _root_ node is full, the function btree_split_beneath() allocates 2 new nodes and redistributes the root node's entries between them. The root node is left with 2 entries corresponding to the 2 new nodes. btree_split_beneath() then adjusts the spine to point to one of the two new children. This means the first key is never adjusted if the new key was lower, i.e. operation (ii) gets missed out. This can result in the new key being 'lost' for a period, until another low-valued key is inserted that uncovers it.

This is a serious bug, and quite hard to trigger in normal use. A reproducing test case ("thin create devices-in-reverse-order") is available as part of the thin-provisioning-tools project:
https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

Fix the issue by changing btree_split_beneath() so it no longer adjusts the spine. Instead it unlocks both of the new nodes, and lets the main loop in btree_insert_raw() relock the appropriate one and make any necessary adjustments.

Cc: stable@vger.kernel.org
Reported-by: Monty Pavel <monty_pavel@sina.com>
Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Submitted by Dennis Yang

For btree removal, there is a corner case where a single thread could take 6 locks, which is more than THIN_MAX_CONCURRENT_LOCKS (5) and leads to deadlock.

A btree removal might eventually call rebalance_children()->rebalance3() to rebalance entries of three neighboring child nodes when the shadow_spine has already acquired two write locks. In rebalance3(), it tries to shadow and acquire the write locks of all three child nodes. However, shadowing a child node requires acquiring a read lock on the original child node and a write lock on the new block. Although the read lock is released after block shadowing, shadowing the third child node in rebalance3() could still take a sixth lock (2 write locks for the shadow_spine + 2 write locks for the first two child nodes' shadows + 1 write lock for the last child node's shadow + 1 read lock on the last child node).

Cc: stable@vger.kernel.org
Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Acked-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 16 January 2018, 1 commit
-
-
Submitted by Tomasz Majchrzak

In order to provide data consistency with PPL for disks with write-back cache enabled, all data has to be flushed to disks before the next PPL entry. The disks to be flushed are marked in a bitmap. It is modified under a mutex and only read after the PPL io unit is submitted.

A limit of 64 disks in the array has been introduced to keep the data structures and implementation simple. RAID5 arrays with so many disks are not likely, due to the high risk of multiple simultaneous disk failures; such a restriction should not be a real-life limitation.

With write-back cache disabled, the next PPL entry is submitted when the data write for the current one completes. A data flush defers the next log submission, so trigger it when no stripes are found for handling.

As PPL ensures all data is flushed to disk at request completion, simply acknowledge a flush request when PPL is enabled.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
-
- 15 January 2018, 1 commit
-
-
Submitted by Mike Snitzer

DM is no longer prone to having its request_queue be improperly initialized.

Summary of changes:

- defer DM's blk_register_queue() from add_disk()-time until dm_setup_md_queue(), by using add_disk_no_queue_reg() in alloc_dev().

- dm_setup_md_queue() is updated to fully initialize DM's request_queue (_after_ all table loads have occurred and the request_queue's type, features and limits are known).

A very welcome side-effect of these changes is that DM no longer needs to:
1) backfill the "mq" sysfs entry (because historically DM didn't initialize the request_queue to use blk-mq until _after_ blk_register_queue() was called via add_disk()).
2) call elv_register_queue() to get a .request_fn request-based DM device's "iosched" exposed in sysfs.

In addition, blk-mq debugfs support is now available because request-based DM's blk-mq request_queue is properly initialized before dm_setup_md_queue() calls blk_register_queue().

These changes also stave off the need to introduce new DM-specific workarounds in block core, e.g. this proposal: https://patchwork.kernel.org/patch/10067961/

In the end DM devices should be less unicorn in nature (relative to initialization and availability of block core infrastructure provided by the request_queue).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
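A hedged sketch of the resulting ordering; add_disk_no_queue_reg() and blk_register_queue() are real block-layer APIs, while the surrounding DM call sites are paraphrased:

/* Hedged sketch of the deferred queue registration described above. */

static void alloc_dev_sketch(struct mapped_device *md)
{
	/* Create the gendisk but do NOT register the queue in sysfs yet. */
	add_disk_no_queue_reg(md->disk);
}

static void dm_setup_md_queue_sketch(struct mapped_device *md)
{
	/*
	 * By now all table loads have happened, so the queue's type
	 * (bio-based vs request-based), features and limits are known.
	 */

	/* ... finish blk-mq or bio-based initialization here ... */

	/* Only now expose the fully initialized queue via sysfs/debugfs. */
	blk_register_queue(md->disk);
}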
-
- 11 January 2018, 1 commit
-
-
Submitted by Keith Busch

Uses common code for determining if an error should be retried on an alternate path.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 January 2018, 1 commit
-
-
Submitted by Michael Lyle

Otherwise, architectures that do negated adds of atomics (e.g. s390) to implement atomic_sub fail in closure_set_stopped.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 January 2018, 10 commits
-
-
Submitted by Michael Lyle

Bcache needs to scale the dirty data in the cache over the multiple backing disks in order to calculate writeback rates for each. The previous code did this by multiplying the target number of dirty sectors by the backing device size, and expected it to fit into a uint64_t; this blows up on relatively small backing devices.

The new approach figures out the bdev's share in 16384ths of the overall cached data. This is chosen to cope well when bdevs drastically vary in size and to ensure that bcache can cross the petabyte boundary for each backing device. This has been improved based on Tang Junhui's feedback to ensure that every device gets a share of dirty data, no matter how small it is compared to the total backing pool.

The existing mechanism is very limited; this is purely a bug fix to remove limits on volume size. However, there still needs to be a change to make this "fair" over many volumes where some are idle.

Reported-by: Jack Douglas <jack@douglastechnology.co.uk>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
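A userspace illustration of the 16384ths share calculation and the clamp that guarantees every device a share; the device sizes and the exact rounding are illustrative assumptions:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t bdev_sectors = 2097152;	/* this backing device: 1 GiB */
	uint64_t total_sectors = 68719476736;	/* all backing devices: 32 TiB */

	/* Share in 16384ths (1 << 14) of the overall cached data. */
	uint64_t share = (bdev_sectors << 14) / total_sectors;

	/* Tiny devices would round to 0; guarantee a minimum share. */
	if (!share)
		share = 1;

	printf("share = %llu/16384\n", (unsigned long long)share);
	return 0;
}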
-
Submitted by Coly Li

Bcache only does recoverable I/O for read operations, by calling cached_dev_read_error(). For write operations there is no I/O recovery for failed requests.

But in bch_count_io_errors(), no matter whether the I/O is a read or a write, before the error counter reaches the io error limit, pr_err() always prints "IO error on %, recoverying". For write requests this information is misleading, because there is no I/O recovery at all.

This patch adds a parameter 'is_read' to bch_count_io_errors(), and only prints "recovering" via pr_err() when the bio direction is READ.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
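A hedged sketch of the changed signature and message selection; the format string is abbreviated (the real code also prints device names and error counters):

/* Hedged sketch of the direction-aware error message. */
void bch_count_io_errors(struct cache *ca, blk_status_t error,
			 int is_read, const char *m)
{
	/* ... error-counter bookkeeping elided ... */

	if (error)
		pr_err("IO error on %s%s", m,
		       is_read ? ", recovering" : "");	/* no recovery for writes */
}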
-
Submitted by Coly Li

The member 'devices' of struct cache_set is used to reference all bcache devices attached to this cache set. It is an array of pointers whose size is indicated by the member nr_uuids of struct cache_set. nr_uuids is calculated in drivers/md/bcache/super.c:bch_cache_set_alloc() as:

bucket_bytes(c) / sizeof(struct uuid_entry)

The bucket size is determined by the user space tool "make-bcache"; by default it is 1024 sectors (defined in bcache-tools/make-bcache.c:main()). So the default nr_uuids value is 4096 from the above calculation.

Every time the bcache code iterates the bcache devices of a cache set, all 4096 pointers are checked even if only 1 bcache device is attached to the cache set; that's a waste of time and unnecessary.

This patch adds a member devices_max_used to struct cache_set. Its value is 1 + the maximum used index of devices[] in a cache set. When iterating all valid bcache devices of a cache set, using c->devices_max_used in the for-loop can avoid a lot of useless checking.

Personally, my motivation for this patch is not performance; I use it in bcache debugging, where it helps me narrow down the scope when checking the valid bcache devices of a cache set.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
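A minimal sketch of the tightened iteration (the helper name is illustrative; the devices[] holes left by detached devices still need the NULL check):

/* Hedged sketch: iterate only the used portion of the devices[] array. */
static void for_each_attached_device_sketch(struct cache_set *c)
{
	unsigned int i;

	/* was: for (i = 0; i < c->nr_uuids; i++) -- up to 4096 slots */
	for (i = 0; i < c->devices_max_used; i++) {
		struct bcache_device *d = c->devices[i];

		if (!d)
			continue;	/* hole left by a detached device */

		/* ... operate on d ... */
	}
}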
-
Submitted by Zhai Zhaoxuan

The functions cached_dev_make_request() and flash_dev_make_request() call generic_start_io_acct() with (struct bcache_device)->disk when they start a closure. Then the function bio_complete() calls generic_end_io_acct() with (struct search)->orig_bio->bi_disk when the closure is done. Since bi_disk is not the bcache device, generic_end_io_acct() is called with the wrong device queue. This causes the "inflight" counter (in struct hd_struct) to keep increasing without ever decreasing.

This patch fixes the problem by calling generic_end_io_acct() with (struct bcache_device)->disk.

Signed-off-by: Zhai Zhaoxuan <kxuanobj@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
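A hedged sketch of the corrected completion accounting; the struct search fields are assumptions consistent with the description above, not the exact diff:

/* Hedged sketch: account completion against the bcache device's disk. */
static void bio_complete_sketch(struct search *s)
{
	if (s->orig_bio) {
		/* Must match the generic_start_io_acct() call made at
		 * submission against the bcache device's own queue. */
		generic_end_io_acct(s->d->disk->queue,
				    bio_data_dir(s->orig_bio),
				    &s->d->disk->part0, s->start_time);
		/* ... endio propagation elided ... */
	}
}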
-
Submitted by Kent Overstreet

[edit by mlyle: include sched/debug.h to get __sched]

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Kent Overstreet

Eliminates cases where sync can race and fail to complete / get stuck. Removes many status flags and simplifies the entering-and-exiting closure sleeping behaviors.

[mlyle: fixed conflicts due to changed return behavior in mainline, extended the commit comment, and squashed down two commits that were mostly contradictory to get to this state. Changed __set_current_state to set_current_state per Jens' review comment]

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Michael Lyle

If the control system would wait for at least half a second, and there have been no requests hitting the backing disk for a while: use an alternate mode where we have at most one contiguous set of writebacks in flight at a time (but don't otherwise delay). If front-end I/O appears, it will still be quick, as it will only have to contend with one real operation in flight. Otherwise, we'll be sending data to the backing disk as quickly as it can accept it (with one op at a time).

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Michael Lyle

Writeback keys are presently iterated and dispatched for writeback in order of the logical block address on the backing device. Multiple keys may be read from the cache device in parallel and then written back (especially when there is contiguous I/O). However, there was no guarantee with the existing code that the writes would be issued in LBA order, as the reads from the cache device are often re-ordered. In turn, when writing back quickly, the backing disk often has to seek backwards; this slows writeback and increases utilization.

This patch introduces an ordering mechanism that guarantees that the original order of issue is maintained for the write portion of the I/O. Performance for writeback is significantly improved when there are multiple contiguous keys or high writeback rates.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Tested-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Tang Junhui

In bch_debug_init(), ret is always 0, so the return value is useless. Change it to return 0 on success after calling debugfs_create_dir(), and a non-zero value otherwise.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Tang Junhui

In a scenario with some flash-only volumes and some cached devices, when many tasks request these devices in writeback mode, the write I/Os may fall into the same bucket as below:

| cached data | flash data | cached data | cached data | flash data |

Then, after writeback of these cached devices, the bucket would look like:

| free | flash data | free | free | flash data |

So there is much free space in this bucket, but since the data of the flash-only volumes still exists, the bucket cannot be reclaimed, which wastes bucket space.

In this patch, we segregate flash-only volume write streams from cached devices, so data from flash-only volumes and cached devices is stored in different buckets.

Compared to the v1 patch, this patch does not add an additional open bucket list; it makes a best effort to segregate flash-only volume write streams from cached devices. Sectors of flash-only volumes may still be mixed with dirty sectors of cached devices, but the number is very small.

[mlyle: fixed commit log formatting, permissions, line endings]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-