- 16 Nov 2020: 1 commit

Committed by Christoph Hellwig:
Just open code it in the few callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 03 Oct 2020: 2 commits

Committed by Coly Li:
Since bcache code was merged into the mainline kernel, each cache set has only ever had one single cache in it. The multiple-caches framework is there, but the code is far from complete. Considering that multiple copies of cached data can also be stored on e.g. md raid1 devices, it is indeed unnecessary to support multiple caches in one cache set.

The previous preparation patches fix the dependencies of explicitly making a cache set have only a single cache. Now we don't have to maintain an embedded partial super block in struct cache_set; the in-memory super block can be referenced directly from struct cache.

This patch removes the embedded struct cache_sb from struct cache_set, and fixes all locations where the super block was referenced from this removed super block by referencing the in-memory super block of struct cache instead.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
Because struct cache_set and struct cache both have a struct cache_sb, the macro block_bytes() can be used on both of them. When the embedded struct cache_sb is removed from struct cache_set, this macro won't be usable on struct cache_set anymore.

This patch unifies all block_bytes() usage on struct cache only; this is one of the preparations for removing the embedded struct cache_sb from struct cache_set.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 12 Sep 2020: 1 commit

Committed by Song Liu:
This enables proper statistics in /proc/diskstats for bcache partitions.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 28 Jul 2020: 1 commit

Committed by Coly Li:
This patch is a fix to the patch "bcache: fix bio_{start,end}_io_acct with proper device". The previous patch uses a hack to temporarily set bi_disk to the bcache device, which is also a mistake. As Christoph suggests, this patch uses disk_{start,end}_io_acct() to count I/O for the bcache device in the correct way.

Fixes: 85750aeb ("bcache: use bio_{start,end}_io_acct")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 25 Jul 2020: 1 commit

Committed by Coly Li:
Commit 85750aeb ("bcache: use bio_{start,end}_io_acct") moves the I/O accounting code to after bio_set_dev(bio, dc->bdev) in cached_dev_make_request(). As a result the accounting is performed incorrectly on the backing device, whereas the I/O should be counted against the bcache device, e.g. /dev/bcache0.

With the mistaken I/O accounting, iostat does not display I/O counts for the bcache device, and all the numbers go to the backing device. In writeback mode, a hard drive may appear to have 340K+ IOPS, which is impossible for a spinning disk.

This patch introduces bch_bio_start_io_acct() and bch_bio_end_io_acct(), which switch bio->bi_disk to the bcache device before calling bio_start_io_acct() or bio_end_io_acct(). Now the I/Os are counted against the bcache device, and the bcache device, cache device and backing device all have their correct I/O count information back.

Fixes: 85750aeb ("bcache: use bio_{start,end}_io_acct")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Jul 2020: 1 commit

Committed by Christoph Hellwig:
Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in a stacked setup either. So remove the dead congested_fn stacking infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
[axboe: fixup unused variables in bcache/request.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 01 Jul 2020: 2 commits

Committed by Christoph Hellwig:
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Christoph Hellwig:
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 27 May 2020: 2 commits

Committed by Christoph Hellwig:
Switch bcache to use the nicer bio accounting helpers, and call the routines where we also sample the start time to give coherent accounting results.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Joe Perches:
Remove the trailing newline from the define of pr_fmt and add newlines to the uses.

Miscellanea:

  o Convert bch_bkey_dump from multiple uses of pr_err to pr_cont, as the earlier conversion was inappropriately done, causing multiple lines to be emitted where only a single output line was desired
  o Use vsprintf extension %pV in bch_cache_set_error to avoid multiple line output where only a single line output was desired
  o Coalesce formats

Fixes: 6ae63e35 ("bcache: replace printk() by pr_*() routines")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 25 Apr 2020: 1 commit

Committed by Christoph Hellwig:
The make_request_fn pointer should only be assigned by blk_alloc_queue. Fix a leftover manual initialization.

Fixes: ff27668c ("bcache: pass the make_request methods to blk_queue_make_request")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 28 Mar 2020: 1 commit

Committed by Christoph Hellwig:
bcache is the only driver not actually passing its make_request methods to blk_queue_make_request, but instead just sets them up manually a little later. Make bcache follow the common way of setting up make_request based queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 01 Feb 2020: 1 commit

Committed by Coly Li:
In 2007 high-performance SSDs were still expensive, so in order to save more space for the real workload or metadata, readahead I/Os for non-metadata were bypassed and not cached on the SSD. Nowadays SSD prices have dropped a lot and people can find larger SSDs at more comfortable prices. It is no longer necessary to always bypass normal readahead I/Os to save SSD space.

This patch adds options for readahead data cache policies via the sysfs file /sys/block/bcache<N>/readahead_cache_policy. The options are:
  - "all": cache all readahead data I/Os.
  - "meta-only": only cache metadata, and bypass other regular I/Os.

If users want bcache to continue to cache only readahead requests for metadata and bypass regular data readahead, please write "meta-only" to this sysfs file. By default, bcache now goes back to caching all readahead requests.

Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
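The policy choice above boils down to a small predicate on the bio flags. The sketch below is a minimal userspace model, assuming hypothetical `TOY_REQ_*` bit values and a hypothetical `toy_bypass_readahead()` helper; it is not bcache's own bypass logic, only an illustration of how the two documented settings treat a readahead bio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the kernel's REQ_* values differ. */
#define TOY_REQ_RAHEAD (1u << 0)
#define TOY_REQ_META   (1u << 1)
#define TOY_REQ_PRIO   (1u << 2)

/* Models the two values accepted by readahead_cache_policy. */
enum toy_readahead_policy {
	TOY_READA_ALL,       /* "all": cache every readahead I/O */
	TOY_READA_META_ONLY, /* "meta-only": cache only metadata readahead */
};

/* Returns true when a read should bypass the cache because of the policy. */
static bool toy_bypass_readahead(unsigned int opf,
				 enum toy_readahead_policy policy)
{
	if (!(opf & TOY_REQ_RAHEAD))
		return false;   /* not readahead: this policy does not apply */
	if (policy == TOY_READA_ALL)
		return false;   /* new default: cache all readahead data */
	/* meta-only: keep metadata readahead, bypass regular data readahead */
	return !(opf & (TOY_REQ_META | TOY_REQ_PRIO));
}
```

With "meta-only", a plain readahead bio bypasses the cache while a readahead bio tagged as metadata is still cached; with "all", neither bypasses.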

- 14 Nov 2019: 1 commit

Committed by Coly Li:
In request.c:bch_data_insert_keys(), there is a code comment for a piece of dead code. This patch deletes the dead code and its comment since they are useless in practice.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 25 Apr 2019: 2 commits

Committed by George Spelvin:
There are a few nits in this function. They could in theory all be separate patches, but that's probably taking small commits too far.

1) I added a brief comment saying what it does.

2) I like to declare pointer parameters "const" where possible for documentation reasons.

3) It uses bitmap_weight(&rand, BITS_PER_LONG) to compute the Hamming weight of a 32-bit random number (giving a random integer with mean 16 and variance 8). Passing by reference in a 64-bit variable is silly; just use hweight32().

4) Its helper function fract_exp_two is unnecessarily tangled. Gcc can optimize the multiply by (1 << x) to a shift, but it can be written in a much more straightforward way at the cost of one more bit of internal precision. Some analysis reveals that this bit is always available.

This shrinks the object code for fract_exp_two(x, 6) from 23 bytes:

    0000000000000000 <foo1>:
       0:  89 f9           mov    %edi,%ecx
       2:  c1 e9 06        shr    $0x6,%ecx
       5:  b8 01 00 00 00  mov    $0x1,%eax
       a:  d3 e0           shl    %cl,%eax
       c:  83 e7 3f        and    $0x3f,%edi
       f:  d3 e7           shl    %cl,%edi
      11:  c1 ef 06        shr    $0x6,%edi
      14:  01 f8           add    %edi,%eax
      16:  c3              retq

To 19:

    0000000000000017 <foo2>:
      17:  89 f8           mov    %edi,%eax
      19:  83 e0 3f        and    $0x3f,%eax
      1c:  83 c0 40        add    $0x40,%eax
      1f:  89 f9           mov    %edi,%ecx
      21:  c1 e9 06        shr    $0x6,%ecx
      24:  d3 e0           shl    %cl,%eax
      26:  c1 e8 06        shr    $0x6,%eax
      29:  c3              retq

(Verified with 0 <= frac_bits <= 8, 0 <= x < 16<<frac_bits; both versions produce the same output.)

5) And finally, the call to bch_get_congested() in check_should_bypass() is separated from the use of the value by multiple tests which could moot the need to compute it. Move the computation down to where it's needed. This also saves a local register to hold the computed value.

Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
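Point 4 can be checked in userspace. The sketch below reconstructs both forms from the description and the disassembly (exponent in the high bits, linearly interpolated fraction in the low `frac_bits` bits); the `_old`/`_new` names are hypothetical and the code is written to match that algebra, not copied from the kernel tree. The new form computes `((2^frac_bits + fract) << e) >> frac_bits`, which equals `2^e + ((fract << e) >> frac_bits)` because the `2^(frac_bits+e)` term is an exact multiple of `2^frac_bits`.

```c
#include <assert.h>

/* Older shape: 2^exponent plus an interpolated fraction of it. */
static unsigned int fract_exp_two_old(unsigned int x, unsigned int fract_bits)
{
	unsigned int fract = x & ((1u << fract_bits) - 1); /* low bits: fraction */

	x >>= fract_bits;               /* high bits: exponent */
	x = 1u << x;                    /* 2^exponent */
	x += (x * fract) >> fract_bits; /* add interpolated part */
	return x;
}

/* Simpler shape: low bits are a mantissa with an implicit leading 1 bit. */
static unsigned int fract_exp_two_new(unsigned int x, unsigned int fract_bits)
{
	unsigned int mantissa = 1u << fract_bits; /* implicit leading 1 */

	mantissa += x & (mantissa - 1);           /* 1.fraction in fixed point */
	x >>= fract_bits;                         /* exponent */
	return (mantissa << x) >> fract_bits;     /* uses one extra bit internally */
}
```

For example, with `frac_bits = 6`, input 96 has exponent 1 and fraction 32/64, so both forms return 2 * 1.5 = 3.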

Committed by Guoju Fang:
The bio from the upper layer is considered completed when bio_complete() returns. In most scenarios bio_complete() is called in search_free(), but when a read miss happens, bio_complete() is called when the backing device read completes, while the struct search is still in use until cache insertion has finished.

If someone stops the bcache device just then, the device may be closed and released, and after cache insertion finishes, the struct search will access a freed struct cached_dev.

This patch takes a reference on the bcache device before bio_complete() when a read miss happens, and puts it once the search is no longer used.

Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Feb 2019: 1 commit

Committed by Coly Li:
In commit 752f66a7 ("bcache: use REQ_PRIO to indicate bio for metadata"), REQ_META is replaced by REQ_PRIO to indicate a metadata bio. This assumption is not always correct, e.g. XFS uses REQ_META rather than REQ_PRIO to mark metadata bios. This is why Nix noticed that bcache does not cache metadata for XFS after the above commit.

Thanks to Dave Chinner, who explains the difference between REQ_META and REQ_PRIO from the view of a file system developer. Here I quote part of his explanation from the mailing list:

    REQ_META is used for metadata. REQ_PRIO is used to communicate to the lower layers that the submitter considers this IO to be more important that non REQ_PRIO IO and so dispatch should be expedited.

    IOWs, if the filesystem considers metadata IO to be more important that user data IO, then it will use REQ_PRIO | REQ_META rather than just REQ_META.

So it seems bios with REQ_META or REQ_PRIO should both be cached for performance optimization, because both probably indicate a demand for low I/O latency from the upper layer (e.g. the file system).

So in this patch, when we decide whether to bypass the cache, REQ_META and REQ_PRIO are both checked. Then both metadata and high priority I/O requests will be handled properly.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Andre Noll <maan@tuebingen.mpg.de>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
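The regression is easiest to see with two predicates side by side. This is a hedged sketch with hypothetical flag values and function names: the first predicate models the REQ_PRIO-only check introduced by commit 752f66a7, the second the combined check this patch describes. An XFS-style bio that carries only REQ_META fails the first check and passes the second.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only their distinctness matters here. */
#define TOY_REQ_META (1u << 0)
#define TOY_REQ_PRIO (1u << 1)

/* Check after commit 752f66a7: only REQ_PRIO counts as metadata. */
static bool toy_is_meta_prio_only(unsigned int opf)
{
	return opf & TOY_REQ_PRIO;
}

/* Check after this fix: either flag qualifies the bio for caching. */
static bool toy_is_meta_or_prio(unsigned int opf)
{
	return opf & (TOY_REQ_META | TOY_REQ_PRIO);
}
```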

- 13 Dec 2018: 1 commit

Committed by Shenghui Wang:
Commit 220bb38c ("bcache: Break up struct search") introduced changes to struct search and s->iop. bypass/bio are fields of struct data_insert_op now. Update the comment.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 08 Oct 2018: 3 commits

Committed by Tang Junhui:
When doing an ioctl on a flash device, ioctl_dev() in super.c is called, and there we should not get the cached device, since a flash-only device has no backing device. This patch moves the dc->io_disable check into cached_dev_ioctl() so that ioctl works correctly on flash devices.

Fixes: 0f0709e6 ("bcache: stop bcache device when backing device is offline")
Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
In cached_dev_cache_miss() and check_should_bypass(), REQ_META is used to check whether a bio is a metadata request. REQ_META is used for blktrace; the correct REQ_ flag should be REQ_PRIO. This flag means the bio should take priority over other bios, and it is frequently used to indicate metadata I/O in file system code.

This patch replaces REQ_META with the correct flag REQ_PRIO.

CC Adam Manzanares because he explained to me what REQ_PRIO is for.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Tang Junhui:
Missed reading I/Os are identified by s->cache_missed, not s->cache_miss, so trace_bcache_read() should use s->cache_missed to identify whether the I/O missed the cache or not.

Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 12 Aug 2018: 6 commits

Committed by Coly Li:
kmem_cache_destroy() is safe with a NULL pointer as input, so the NULL pointer check is unnecessary. This patch just removes the NULL pointer check to make the code simpler.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
This patch fixes the typo 'succesfully' to the correct 'successfully', as suggested by checkpatch.pl.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
This patch splits lines over 80 characters into multiple lines, to minimize checkpatch.pl warnings. Some lines still exceed 80 characters, but they are better kept as single lines, so I don't change them.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
There are many function definitions that do not have identifier argument names; scripts/checkpatch.pl complains with warnings like this:

    WARNING: function definition argument 'struct bcache_device *' should also have an identifier name
    #16735: FILE: writeback.h:120:
    +void bch_sectors_dirty_init(struct bcache_device *);

This patch adds identifier argument names to all bcache function definitions to fix such warnings.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
This patch fixes a warning reported by checkpatch.pl by replacing 'unsigned' with 'unsigned int'.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Aug 2018: 2 commits

Committed by Coly Li:
Commit b1092c9a ("bcache: allow quick writeback when backing idle") allows the writeback rate to be faster if there is no I/O request on a bcache device. It works well if there is only one bcache device attached to the cache set. If many bcache devices are attached to a cache set, it may introduce a performance regression, because multiple faster writeback threads of the idle bcache devices will compete for the btree level locks with the bcache devices that have I/O requests coming in.

This patch fixes the above issue by only permitting fast writeback when all bcache devices attached to the cache set are idle. If one of the bcache devices has a new I/O request coming in, all writeback throughput is minimized immediately, and the PI controller __update_writeback_rate() decides the upcoming writeback rate for each bcache device.

Also, when all bcache devices are idle, limiting the writeback rate to a small number is a waste of throughput, especially when the backing devices are slower non-rotational devices (e.g. SATA SSD). This patch sets a max writeback rate for each backing device if the whole cache set is idle. A faster writeback rate in idle time means new I/Os may have more available space for dirty data, and people may then observe better write performance.

Please note bcache may change its cache mode at run time, and this patch still works if the cache mode is switched from writeback mode while there is still dirty data on the cache.

Fixes: Commit b1092c9a ("bcache: allow quick writeback when backing idle")
Cc: stable@vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli@suse.de>
Tested-by: Kai Krakow <kai@kaishome.de>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
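A minimal model of the rate-selection rule described above, assuming a hypothetical `struct toy_dev` with an `idle` flag and a rate proposed by the PI controller, plus an assumed `TOY_MAX_WRITEBACK_RATE` constant. How the real patch detects idleness and what value it uses as the maximum are not shown here; the sketch only encodes "all devices idle: max rate; otherwise defer to the PI controller".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state, not bcache's struct cached_dev. */
struct toy_dev {
	bool idle;             /* no front-end I/O seen recently */
	unsigned long pi_rate; /* rate proposed by the PI controller */
};

#define TOY_MAX_WRITEBACK_RATE 1000000UL /* assumed value, illustration only */

/* Fast writeback is allowed only if every attached device is idle. */
static bool toy_all_idle(const struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].idle)
			return false;
	return true;
}

/* Writeback rate for device i of a cache set with n attached devices. */
static unsigned long toy_writeback_rate(const struct toy_dev *devs, size_t n,
					size_t i)
{
	if (toy_all_idle(devs, n))
		return TOY_MAX_WRITEBACK_RATE; /* whole cache set idle: go fast */
	return devs[i].pi_rate;                /* otherwise the PI controller decides */
}
```

Checking the whole set, rather than each device on its own, is what keeps an idle device from speeding up while a sibling device is serving front-end I/O.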

Committed by Coly Li:
This patch updates the code comment in bch_keylist_realloc() by fixing incorrect function names, to make the code more comprehensible.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 27 Jul 2018: 1 commit

Committed by Tang Junhui:
In the GC thread, we record the latest GC key in gc_done, which is expected to be used for incremental GC, but the current code doesn't implement it.

When GC runs, front-side I/O is blocked until GC is over, which can take a long time if there are a lot of btree nodes.

This patch implements incremental GC. The main idea is that, when there are front-side I/Os, after GC of some nodes (100) we stop GC, release the lock on the btree node, and go to process the front-side I/Os for some time (100 ms), then go back to GC again.

With this patch, I/Os are not blocked all the time during GC, and there is no obvious problem of I/O dropping to zero any more.

Patch v2: Rename some variables and macros as Coly suggested.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
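The batching idea can be modeled as a resumable loop. In this sketch all names are hypothetical: `gc_done` is a node index standing in for the recorded GC key, 100 is the batch size from the commit text, and the 100 ms sleep is left to the caller; the model only counts how often GC yielded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GC_NODES_PER_ROUND 100 /* batch size taken from the commit text */

/* Hypothetical GC state: gc_done remembers where the last batch stopped. */
struct toy_gc {
	size_t gc_done;      /* index of the next btree node to collect */
	size_t nr_nodes;     /* total nodes in this toy btree */
	unsigned int pauses; /* how many times GC yielded to front-end I/O */
};

/*
 * Collect nodes starting at gc_done. Returns true when GC has finished.
 * If front-end I/O is pending after a full batch, stop early; the caller
 * would then release node locks and sleep before calling again.
 */
static bool toy_gc_step(struct toy_gc *gc, bool front_io_pending)
{
	size_t done = 0;

	while (gc->gc_done < gc->nr_nodes) {
		gc->gc_done++; /* "collect" one node */
		if (++done == TOY_GC_NODES_PER_ROUND && front_io_pending &&
		    gc->gc_done < gc->nr_nodes) {
			gc->pauses++;
			return false; /* yield; the next call resumes from gc_done */
		}
	}
	return true;
}
```

With 250 nodes and I/O pending throughout, GC finishes in three calls (100, 100, then 50 nodes) and yields twice; with no pending I/O it finishes in one call.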

- 18 Jul 2018: 1 commit

Committed by Michael Callahan:
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should be updated.

In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit, and it uses op_stat_group() on the parameter to determine the stat group.

Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. They are now indexed by op_is_write().

tj: Refreshed on top of v4.17. Updated to pass around REQ_OP.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 31 May 2018: 1 commit

Committed by Kent Overstreet:
Convert bcache to embedded bio sets.

Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 03 May 2018: 1 commit

Committed by Coly Li:
Current code uses bdevname() or bio_devname() to reference the gendisk disk name when bcache needs to display disk names in kernel messages. It was safe before the bcache device failure handling patch set was merged, because when devices failed there was a deadlock that prevented bcache from printing error messages with the gendisk disk name. But after the failure handling patch set was merged, the deadlock is fixed, so it is possible that the gendisk structure bdev->bd_disk is released when bdevname() is called to reference bdev->bd_disk->disk_name[]. This is why I received a bug report of a NULL pointer dereference panic.

This patch stores the gendisk disk name in a buffer inside struct cache and struct cached_dev, so printing the offline device name won't reference bdev->bd_disk anymore. This patch also avoids extra function calls of bdevname() and bio_devname().

Changelog:
v3, add Reviewed-by from Hannes.
v2, call bdevname() earlier in register_bdev()
v1, first version with suggestion from Junhui Tang.

Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
Fixes: 5138ac67 ("bcache: fix misleading error message in bch_count_io_errors()")
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
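The fix follows a general pattern: copy an identifier out of an object whose lifetime you do not control, while it is known to be valid, and use only the copy afterwards. A hedged userspace sketch with hypothetical `toy_*` types; the 32-byte buffer matches the kernel's `BDEVNAME_SIZE`, and everything else is illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BDEVNAME_SIZE 32 /* same size as the kernel's BDEVNAME_SIZE */

struct toy_gendisk {
	char disk_name[TOY_BDEVNAME_SIZE];
};

/* Hypothetical stand-in for struct cached_dev / struct cache. */
struct toy_cached_dev {
	struct toy_gendisk *disk;                 /* may be freed on failure */
	char backing_dev_name[TOY_BDEVNAME_SIZE]; /* private copy, taken early */
};

/* Copy the name at registration time, while the gendisk is known valid. */
static void toy_register(struct toy_cached_dev *dc, struct toy_gendisk *disk)
{
	dc->disk = disk;
	snprintf(dc->backing_dev_name, sizeof(dc->backing_dev_name), "%s",
		 disk->disk_name);
}

/* Error reporting reads only the private copy, never dc->disk. */
static void toy_report_offline(const struct toy_cached_dev *dc, char *out,
			       size_t len)
{
	snprintf(out, len, "%s: device offline", dc->backing_dev_name);
}
```

Even after the gendisk is freed and the pointer cleared, `toy_report_offline()` can still print the device name.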

- 19 Mar 2018: 5 commits

Committed by Bart Van Assche:
Avoid warnings about the kernel-doc headers when building with W=1.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Coly Li:
If a bcache device is configured in writeback mode, the current code does not handle write I/O errors on backing devices properly.

In writeback mode, a write request is written to the cache device, and later flushed to the backing device. If I/O fails when writing from the cache device to the backing device, bcache code just ignores the error, and upper layer code is NOT notified that the backing device is broken.

This patch tries to handle backing device failure the same way cache device failure is handled:
  - Add an error counter 'io_errors' and an error limit 'error_limit' in struct cached_dev. Add another io_disable to struct cached_dev to disable I/Os on the problematic backing device.
  - When an I/O error happens on the backing device, increase the io_errors counter. If io_errors reaches error_limit, set cache_dev->io_disable to true, and stop the bcache device.

The result is, if the backing device is broken or disconnected, and I/O errors reach its error limit, the backing device will be disabled and the associated bcache device will be removed from the system.

Changelog:
v2: remove "bcache: " prefix in pr_error(), and use correct name string to print out bcache device gendisk name.
v1: this is newly added in the v2 patch set.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
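The counting rule in the bullets above can be sketched with C11 atomics. The struct and function names are hypothetical, `stopped` stands in for stopping the bcache device, and `atomic_exchange()` is used so that only the first error that reaches the limit triggers the stop, even if several completions race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the fields the commit adds to struct cached_dev. */
struct toy_cached_dev {
	atomic_uint io_errors;    /* backing-device I/O errors seen so far */
	unsigned int error_limit; /* threshold that retires the device */
	atomic_bool io_disable;   /* once set, new I/O should be rejected */
	bool stopped;             /* stands in for stopping the bcache device */
};

static void toy_init(struct toy_cached_dev *dc, unsigned int limit)
{
	atomic_init(&dc->io_errors, 0);
	dc->error_limit = limit;
	atomic_init(&dc->io_disable, false);
	dc->stopped = false;
}

/* Called from a (hypothetical) backing-device completion path on error. */
static void toy_count_backing_io_error(struct toy_cached_dev *dc)
{
	unsigned int errors = atomic_fetch_add(&dc->io_errors, 1) + 1;

	/* Only the caller that flips io_disable from false to true stops it. */
	if (errors >= dc->error_limit && !atomic_exchange(&dc->io_disable, true))
		dc->stopped = true;
}
```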

Committed by Coly Li:
In order to catch I/O errors of the backing device, a separate bi_end_io callback is required. Then a per-backing-device counter can record the number of I/O errors and retire the backing device if the counter reaches a per-backing-device I/O error limit.

This patch adds backing_request_endio() to the bcache backing device I/O code path; this is a preparation for further, more complicated backing device failure handling. So far there is no real code logic change; I make this change a separate patch to make sure it is stable and reliable for further work.

Changelog:
v2: Fix code comment typos, remove a redundant bch_writeback_add() line added in v4 patch set.
v1: this is newly added in this patch set.

[mlyle: truncated commit subject]

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Cc: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Tang Junhui:
When we run I/O on a detached device and run iostat to show I/O status, it normally shows something like below (some fields omitted):

    Device:  ... avgrq-sz avgqu-sz await r_await w_await svctm  %util
    sdd      ...    15.89     0.53  1.82    0.20    2.23  1.81  52.30
    bcache0  ...    15.89   115.42  0.00    0.00    0.00  2.40  69.60

but after I/O has stopped, there are still very big avgqu-sz and %util values, as below:

    Device:  ... avgrq-sz avgqu-sz await r_await w_await svctm  %util
    bcache0  ...        0  5326.32  0.00    0.00    0.00  0.00 100.10

The reason for this issue is that, for a detached device, only generic_start_io_acct() is called in cached_dev_make_request(), and generic_end_io_acct() is never called. See the code:

    //start generic_start_io_acct()
    generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
    if (cached_dev_get(dc)) {
        //will callback generic_end_io_acct()
    }
    else {
        //will not call generic_end_io_acct()
    }

This patch calls generic_end_io_acct() at the end of I/O for detached devices, so we can show I/O state correctly.

(Modified to use GFP_NOIO in kzalloc() by Coly Li)

Changelog:
v2: fix typo.
v1: the initial version.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
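Why a missing end call inflates avgqu-sz and %util: iostat derives both, roughly, from time spent with I/O in flight, so every start without a matching end leaves the in-flight count permanently raised. A toy model with hypothetical names, where `fixed` selects the behavior after this patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-disk accounting state. */
struct toy_disk_stats {
	int in_flight;
};

static void toy_start_io_acct(struct toy_disk_stats *s) { s->in_flight++; }
static void toy_end_io_acct(struct toy_disk_stats *s)   { s->in_flight--; }

/* Models cached_dev_make_request(): accounting starts for every bio. */
static void toy_make_request(struct toy_disk_stats *s, bool attached,
			     bool fixed)
{
	toy_start_io_acct(s);
	if (attached)
		toy_end_io_acct(s); /* normal path: completion ends accounting */
	else if (fixed)
		toy_end_io_acct(s); /* the fix: detached path ends it too */
	/* before the fix, a detached bio never reached the end hook */
}
```

Five detached bios leave `in_flight` at 5 without the fix and at 0 with it.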

Committed by Coly Li:
When too many I/Os fail on the cache device, bch_cache_set_error() is called in the error handling code path to retire the whole problematic cache set. If new I/O requests continue to come in and take refcount dc->count, the cache set won't be retired immediately; this is a problem.

Furthermore, several kernel threads and self-armed kernel works may still be running after bch_cache_set_error() is called. It takes quite a while for them to stop, or they may not stop at all. They also prevent the cache set from being retired.

The solution in this patch is to add a per-cache-set flag to disable I/O requests on this cache and all attached backing devices. Then newly arriving I/O requests can be rejected in *_make_request() before taking a refcount, and kernel threads and self-armed kernel workers can stop very quickly when the flag bit CACHE_SET_IO_DISABLE is set.

Because bcache also does internal I/O for writeback, garbage collection, bucket allocation and journaling, this kind of I/O should be disabled after bch_cache_set_error() is called. So closure_bio_submit() is modified to check whether CACHE_SET_IO_DISABLE is set in cache_set->flags. If set, closure_bio_submit() sets bio->bi_status to BLK_STS_IOERR and returns, and generic_make_request() won't be called.

A sysfs interface is also added to set or clear the CACHE_SET_IO_DISABLE bit in cache_set->flags, to disable or enable cache set I/O for debugging. It is helpful to trigger more corner-case issues for a failed cache device.

Changelog
v4, add wait_for_kthread_stop(), and call it before exiting the writeback and gc kernel threads.
v3, change CACHE_SET_IO_DISABLE from 4 to 3, since it is a bit index. Remove "bcache: " prefix when printing out kernel messages.
v2, more changes from previous review:
  - Use CACHE_SET_IO_DISABLE of cache_set->flags, suggested by Junhui.
  - Check CACHE_SET_IO_DISABLE in bch_btree_gc() to stop a while-loop; this was reported by and inspired from the original patch of Pavel Vazharov.
v1, initial version.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Pavel Vazharov <freakpv@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
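The check added to closure_bio_submit() can be modeled in userspace. This sketch uses hypothetical `toy_*` types and a C11 atomic flags word; as the changelog notes, CACHE_SET_IO_DISABLE is a bit index, so it is turned into a mask with a shift before testing. Following the commit text, a rejected bio only gets its status set; how it is then completed is outside this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* A bit index, not a mask (the changelog's reason for changing 4 to 3). */
#define TOY_CACHE_SET_IO_DISABLE 3

enum toy_blk_status { TOY_BLK_STS_OK, TOY_BLK_STS_IOERR };

/* Hypothetical stand-ins for struct bio and struct cache_set. */
struct toy_bio {
	enum toy_blk_status bi_status;
	int submitted; /* set when passed down the stack */
};

struct toy_cache_set {
	atomic_ulong flags;
};

/* Models closure_bio_submit(): refuse internal I/O once the set is disabled. */
static void toy_closure_bio_submit(struct toy_cache_set *c, struct toy_bio *bio)
{
	if (atomic_load(&c->flags) & (1UL << TOY_CACHE_SET_IO_DISABLE)) {
		bio->bi_status = TOY_BLK_STS_IOERR;
		return; /* never reaches the lower layer */
	}
	bio->submitted = 1; /* stands in for generic_make_request() */
}
```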

- 28 Feb 2018: 1 commit

Committed by Tang Junhui:
The kernel crashed when running fio on a bcache device with a RAID5 backing device; the call trace is below:

    [ 440.012034] kernel BUG at block/blk-ioc.c:146!
    [ 440.012696] invalid opcode: 0000 [#1] SMP NOPTI
    [ 440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
    [ 440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
    [ 440.028615] RIP: 0010:put_io_context+0x8b/0x90
    [ 440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
    [ 440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
    [ 440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
    [ 440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
    [ 440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
    [ 440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
    [ 440.035239] FS: 0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
    [ 440.060190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
    [ 440.110498] Call Trace:
    [ 440.135443]  bio_disassociate_task+0x1b/0x60
    [ 440.160355]  bio_free+0x1b/0x60
    [ 440.184666]  bio_put+0x23/0x30
    [ 440.208272]  search_free+0x23/0x40 [bcache]
    [ 440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
    [ 440.254468]  closure_put+0xb6/0xd0 [bcache]
    [ 440.277087]  request_endio+0x30/0x40 [bcache]
    [ 440.298703]  bio_endio+0xa1/0x120
    [ 440.319644]  handle_stripe+0x418/0x2270 [raid456]
    [ 440.340614]  ? load_balance+0x17b/0x9c0
    [ 440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
    [ 440.380675]  ? __release_stripe+0x15/0x20 [raid456]
    [ 440.400132]  raid5d+0x3ed/0x5d0 [raid456]
    [ 440.419193]  ? schedule+0x36/0x80
    [ 440.437932]  ? schedule_timeout+0x1d2/0x2f0
    [ 440.456136]  md_thread+0x122/0x150
    [ 440.473687]  ? wait_woken+0x80/0x80
    [ 440.491411]  kthread+0x102/0x140
    [ 440.508636]  ? find_pers+0x70/0x70
    [ 440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
    [ 440.541791]  ret_from_fork+0x35/0x40
    [ 440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
    [ 440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
    [ 440.628575] ---[ end trace a1fd79d85643a73e ]--

All the crashes happened when a bypass I/O came in; in that scenario s->iop.bio points to s->orig_bio. In search_free(), s->orig_bio is finished by calling bio_complete(), after which s->iop.bio becomes invalid, so the kernel crashes when calling bio_put(). Maybe it is the upper layer's fault, since the bio should not be freed before we call bio_put(), but we had better call bio_put() first, before calling bio_complete() to notify the upper layer that this bio has ended.

This patch moves bio_complete() after bio_put() to avoid the kernel crash.

[mlyle: fixed commit subject for character limits]

Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Tested-by: Matthias Ferdinand <bcache@mfedv.net>
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
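The ordering problem reduces to a toy model, assuming, as the message describes for bypass I/O, that `iop_bio` aliases `orig_bio` and that the upper layer may free the bio as soon as it is completed. All names are hypothetical; the `assert()` in `toy_bio_put()` marks where the original order would touch a freed bio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: completion lets the upper layer free the bio. */
struct toy_bio {
	bool freed;
	int refs;
};

struct toy_search {
	struct toy_bio *orig_bio; /* bio from the upper layer */
	struct toy_bio *iop_bio;  /* aliases orig_bio for bypass I/O */
};

static void toy_bio_put(struct toy_bio *bio)
{
	assert(!bio->freed); /* putting a freed bio is the reported crash */
	bio->refs--;
}

static void toy_bio_complete(struct toy_bio *bio)
{
	bio->freed = true; /* the upper layer is now free to release it */
}

/* Order after the fix: drop our reference first, then notify the upper layer. */
static void toy_search_free(struct toy_search *s)
{
	if (s->iop_bio)
		toy_bio_put(s->iop_bio);
	toy_bio_complete(s->orig_bio);
}
```

Swapping the two calls in `toy_search_free()` would trip the assertion, which mirrors the `bio_put()` frame in the trace above.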