- 18 12月, 2018 5 次提交
-
-
由 Mike Snitzer 提交于
No need to be so fancy. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 wuzhouhui 提交于
The workqueues are shared by many multipath devices, only flush whole workqueue when necessary. Otherwise, we just flush works as needed. Signed-off-by: Nwuzhouhui <wuzhouhui14@mails.ucas.ac.cn> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mikulas Patocka 提交于
Indirect calls are inefficient because of retpolines that are used for spectre workaround. This patch replaces an indirect call with a condition (that can be predicted by the branch predictor). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Jens Axboe 提交于
There's a single user of this function, dm, and dm just wants to check if IO is inflight, not that it's just allocated. This fixes a hang with srp/002 in blktests with dm, where it tries to suspend but waits for inflight IO to finish first. As it checks for just allocated requests, this fails. Tested-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 12月, 2018 12 次提交
-
-
由 Guoju Fang 提交于
Sometimes flush journal may be very frequent, so it's useful to dump number of keys every time write journal. Signed-off-by: NGuoju Fang <fangguoju@gmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
Because CUTOFF_WRITEBACK is defined as 40, so before the changes of dynamic cutoff writeback values, writeback_percent is limited to [0, CUTOFF_WRITEBACK]. Any value larger than CUTOFF_WRITEBACK will be fixed up to 40. Now cutof writeback limit is a dynamic value bch_cutoff_writeback, so the range of writeback_percent can be a more flexible range as [0, bch_cutoff_writeback]. The flexibility is, it can be expended to a larger or smaller range than [0, 40], depends on how value bch_cutoff_writeback is specified. The default value is still strongly recommended to most of users for most of workloads. But for people who want to do research on bcache writeback perforamnce tuning, they may have chance to specify more flexible writeback_percent in range [0, 70]. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
Currently the cutoff writeback and cutoff writeback sync thresholds are defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as static values. Most of time these they work fine, but when people want to do research on bcache writeback mode performance tuning, there is no chance to modify the soft and hard cutoff writeback values. This patch introduces two module parameters bch_cutoff_writeback_sync and bch_cutoff_writeback which permit people to tune the values when loading bcache.ko. If they are not specified by module loading, current values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK will be used as default and nothing changes. When people want to tune this two values, - cutoff_writeback can be set in range [1, 70] - cutoff_writeback_sync can be set in range [1, 90] - cutoff_writeback always <= cutoff_writeback_sync The default values are strongly recommended to most of users for most of workloads. Anyway, if people wants to take their own risk to do research on new writeback cutoff tuning for their own workload, now they can make it. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
This patch moves MODULE_AUTHOR and MODULE_LICENSE to end of super.c, and add MODULE_DESCRIPTION("Bcache: a Linux block layer cache"). This is preparation for adding module parameters. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
The option gc_after_writeback is disabled by default, because garbage collection will discard SSD data which drops cached data. Echo 1 into /sys/fs/bcache/<UUID>/internal/gc_after_writeback will enable this option, which wakes up gc thread when writeback accomplished and all cached data is clean. This option is helpful for people who cares writing performance more. In heavy writing workload, all cached data can be clean only happens when writeback thread cleans all cached data in I/O idle time. In such situation a following gc running may help to shrink bcache B+ tree and discard more clean data, which may be helpful for future writing requests. If you are not sure whether this is helpful for your own workload, please leave it as disabled by default. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
Garbage collection thread starts to work when c->sectors_to_gc is negative value, otherwise nothing will happen even the gc thread is woken up by wake_up_gc(). force_wake_up_gc() sets c->sectors_to_gc to -1 before calling wake_up_gc(), then gc thread may have chance to run if no one else sets c->sectors_to_gc to a positive value before gc_should_run(). This routine can be called where the gc thread is woken up and required to run in force. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
"echo 1 > writeback_running" marks writeback_running even if no writeback kthread created as "d_strtoul(writeback_running)" will simply set dc-> writeback_running without checking the existence of dc->writeback_thread. Add check for setting writeback_running via sysfs: if no writeback kthread available, reject setting to 1. v2 -> v3: * Make message on wrong assignment more clear. * Print name of bcache device instead of name of backing device. Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
A fresh backing device is not attached to any cache_set, and has no writeback kthread created until first attached to some cache_set. But bch_cached_dev_writeback_init run " dc->writeback_running = true; WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)); " for any newly formatted backing devices. For a fresh standalone backing device, we can get something like following even if no writeback kthread created: ------------------------ /sys/block/bcache0/bcache# cat writeback_running 1 /sys/block/bcache0/bcache# cat writeback_rate_debug rate: 512.0k/sec dirty: 0.0k target: 0.0k proportional: 0.0k integral: 0.0k change: 0.0k/sec next io: -15427384ms The none ZERO fields are misleading as no alive writeback kthread yet. Set dc->writeback_running false as no writeback thread created in bch_cached_dev_writeback_init(). We have writeback thread created and woken up in bch_cached_dev_writeback _start(). Set dc->writeback_running true before bch_writeback_queue() called, as a writeback thread will check if dc->writeback_running is true before writing back dirty data, and hung if false detected. After the change, we can get the following output for a fresh standalone backing device: ----------------------- /sys/block/bcache0/bcache$ cat writeback_running 0 /sys/block/bcache0/bcache# cat writeback_rate_debug rate: 0.0k/sec dirty: 0.0k target: 0.0k proportional: 0.0k integral: 0.0k change: 0.0k/sec next io: 0ms v1 -> v2: Set dc->writeback_running before bch_writeback_queue() called, Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
We have struct cached_dev allocated by kzalloc in register_bcache(), which initializes all the fields of cached_dev with 0s. And commit ce4c3e19 ("bcache: Replace bch_read_string_list() by __sysfs_match_string()") has remove the string "default". Update the comment. Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
commit 220bb38c ("bcache: Break up struct search") introduced changes to struct search and s->iop. bypass/bio are fields of struct data_insert_op now. Update the comment. Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
debugfs_remove and debugfs_remove_recursive will check if the dentry pointer is NULL or ERR, and will do nothing in that case. Remove the check in cache_set_free and bch_debug_init. Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shenghui Wang 提交于
We have the following define for btree iterator: struct btree_iter { size_t size, used; #ifdef CONFIG_BCACHE_DEBUG struct btree_keys *b; #endif struct btree_iter_set { struct bkey *k, *end; } data[MAX_BSETS]; }; We can see that the length of data[] field is static MAX_BSETS, which is defined as 4 currently. But a btree node on disk could have too many bsets for an iterator to fit on the stack - maybe far more that MAX_BSETS. Have to dynamically allocate space to host more btree_iter_sets. bch_cache_set_alloc() will make sure the pool cache_set->fill_iter can allocate an iterator equipped with enough room that can host (sb.bucket_size / sb.block_size) btree_iter_sets, which is more than static MAX_BSETS. bch_btree_node_read_done() will use that pool to allocate one iterator, to host many bsets in one btree node. Add more comment around cache_set->fill_iter to make code less confusing. Signed-off-by: NShenghui Wang <shhuiw@foxmail.com> Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 12月, 2018 2 次提交
-
-
由 Mike Snitzer 提交于
The md->wait waitqueue is used by both bio-based and request-based DM. Commit dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") lost sight of the requirement that dm_wait_for_completion() must work with all types of DM devices. Fix md_in_flight() to call the blk-mq or bio-based method accordingly. Fixes: dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
After switching to percpu inflight counters, the inflight check is totally buggy. It's perfectly valid for some counters to be non-zero while having a total inflight IO count of 0, that's how these kinds of counters work (inc on one CPU, dec on another). Fix the md_in_flight() check to sum all counters before returning a false positive, potentially. While at it, remove the inflight read for IO completion. We don't need it, just wake anyone that's waiting for the IO count to drop to zero. The caller needs to re-check that value anyway when woken, which it does. Fixes: 6f757231 ("dm: remove the pending IO accounting") Acked-by: NMike Snitzer <snitzer@redhat.com> Reported-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 12月, 2018 4 次提交
-
-
由 Mikulas Patocka 提交于
Remove the "pending" atomic counters, that duplicate block-core's in_flight counters, and update md_in_flight() to look at percpu in_flight counters. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
All of part_stat_* and related methods are used with preempt disabled, so there is no need to pass cpu around to allow of them. Just call smp_processor_id() as needed. Suggested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
Now that request-based dm-multipath only supports blk-mq, make use of the newly introduced blk_mq_queue_busy() to check for outstanding IO -- rather than (ab)using the block core's in_flight counters. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mikulas Patocka 提交于
generic_start_io_acct and generic_end_io_acct already update the variable in_flight using atomic operations, so we don't have to overwrite them again. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 12月, 2018 2 次提交
-
-
由 Dennis Zhou 提交于
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: NDennis Zhou <dennis@kernel.org> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dennis Zhou 提交于
The next patch changes the macro bio_set_dev() to associate a bio with a blkg based on the device set. However, dm creates a static bio to be used as the basis for cloning empty flush bios on creation. The bio_set_dev() call in alloc_dev() will cause problems with the next patch adding association to bio_set_dev() because the call is before the bdev is associated with a gendisk (bd_disk is %NULL). To get around this, set the device on the static bio every time and use that to clone to the other bios. Signed-off-by: NDennis Zhou <dennis@kernel.org> Acked-by: NMike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 11月, 2018 2 次提交
-
-
由 Jens Axboe 提交于
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
With the legacy request path gone there is no real need to override the queue_lock. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 11月, 2018 1 次提交
-
-
由 Dennis Zhou 提交于
This reverts a series committed earlier due to null pointer exception bug report in [1]. It seems there are edge case interactions that I did not consider and will need some time to understand what causes the adverse interactions. The original series can be found in [2] with a follow up series in [3]. [1] https://www.spinics.net/lists/cgroups/msg20719.html [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/ This reverts the following commits: d459d853, b2c3fa54, 101246ec, b3b9f24f, e2b09899, f0fcb3ec, c839e7a0, bdc24917, 74b7c02a, 5bf9a1f3, a7b39b4e, 07b05bcc, 49f4c2dc, 27e6fa99Signed-off-by: NDennis Zhou <dennis@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 10月, 2018 3 次提交
-
-
由 Damien Le Moal 提交于
Drivers exposing zoned block devices have to initialize and maintain correctness (i.e. revalidate) of the device zone bitmaps attached to the device request queue (seq_zones_bitmap and seq_zones_wlock). To simplify coding this, introduce a generic helper function blk_revalidate_disk_zones() suitable for most (and likely all) cases. This new function always update the seq_zones_bitmap and seq_zones_wlock bitmaps as well as the queue nr_zones field when called for a disk using a request based queue. For a disk using a BIO based queue, only the number of zones is updated since these queues do not have schedulers and so do not need the zone bitmaps. With this change, the zone bitmap initialization code in sd_zbc.c can be replaced with a call to this function in sd_zbc_read_zones(), which is called from the disk revalidate block operation method. A call to blk_revalidate_disk_zones() is also added to the null_blk driver for devices created with the zoned mode enabled. Finally, to ensure that zoned devices created with dm-linear or dm-flakey expose the correct number of zones through sysfs, a call to blk_revalidate_disk_zones() is added to dm_table_set_restrictions(). The zone bitmaps allocated and initialized with blk_revalidate_disk_zones() are freed automatically from __blk_release_queue() using the block internal function blk_queue_free_zone_bitmaps(). Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Dispatching a report zones command through the request queue is a major pain due to the command reply payload rewriting necessary. Given that blkdev_report_zones() is executing everything synchronously, implement report zones as a block device file operation instead, allowing major simplification of the code in many places. sd, null-blk, dm-linear and dm-flakey being the only block device drivers supporting exposing zoned block devices, these drivers are modified to provide the device side implementation of the report_zones() block device file operation. For device mappers, a new report_zones() target type operation is defined so that the upper block layer calls blkdev_report_zones() can be propagated down to the underlying devices of the dm targets. Implementation for this new operation is added to the dm-linear and dm-flakey targets. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> [Damien] * Changed method block_device argument to gendisk * Various bug fixes and improvements * Added support for null_blk, dm-linear and dm-flakey. Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
Introduce the blkdev_nr_zones() helper function to get the total number of zones of a zoned block device. This number is always 0 for a regular block device (q->limits.zoned == BLK_ZONED_NONE case). Replace hard-coded number of zones calculation in dmz_get_zoned_device() with a call to this helper. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 10月, 2018 2 次提交
-
-
由 Xiao Ni 提交于
flush_pool is leaked when flush bio size is zero Fixes: 5a409b4f ("MD: fix lock contention for flush bios") Signed-off-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NXiao Ni <xni@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Jack Wang 提交于
I noticed kmemleak report memory leak when run create/stop md in a loop, backtrace: [<000000001ca975e7>] mempool_create_node+0x86/0xd0 [<0000000095576bcd>] md_run+0x1057/0x1410 [md_mod] [<000000007b45c5fc>] do_md_run+0x15/0x130 [md_mod] [<000000001ede9ec0>] md_ioctl+0x1f49/0x25d0 [md_mod] [<000000004142cacf>] blkdev_ioctl+0x680/0xd00 The root cause is we alloc mddev->flush_pool and mddev->flush_bio_pool in md_run, but from do_md_stop will not call into md_stop but __md_stop, move the mempool_destroy to __md_stop fixes the problem for me. The bug was introduced in 5a409b4f, the fixes should go to 4.18+ Fixes: 5a409b4f ("MD: fix lock contention for flush bios") Signed-off-by: NJack Wang <jinpu.wang@profitbricks.com> Reviewed-by: NXiao Ni <xni@redhat.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 10月, 2018 1 次提交
-
-
由 Mike Snitzer 提交于
This dead branch was missed during review. It only makes memory_entry() more inefficient due to needless call to is_power_of_2(), etc. Reported-by: Nshenghui <shhuiw@foxmail.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 19 10月, 2018 6 次提交
-
-
由 Damien Le Moal 提交于
dmz_fetch_mblock() called from dmz_get_mblock() has a race since the allocation of the new metadata block descriptor and its insertion in the cache rbtree with the READING state is not atomic. Two different contexts requesting the same block may end up each adding two different descriptors of the same block to the cache. Another problem for this function is that the BIO for processing the block read is allocated after the metadata block descriptor is inserted in the cache rbtree. If the BIO allocation fails, the metadata block descriptor is freed without first being removed from the rbtree. Fix the first problem by checking again if the requested block is not in the cache right before inserting the newly allocated descriptor, atomically under the mblk_lock spinlock. The second problem is fixed by simply allocating the BIO before inserting the new block in the cache. Finally, since dmz_fetch_mblock() also increments a block reference counter, rename the function to dmz_get_mblock_slow(). To be symmetric and clear, also rename dmz_lookup_mblock() to dmz_get_mblock_fast() and increment the block reference counter directly in that function rather than in dmz_get_mblock(). Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Damien Le Moal 提交于
Since the ref field of struct dmz_mblock is always used with the spinlock of struct dmz_metadata locked, there is no need to use an atomic_t type. Change the type of the ref field to an unsigne integer. Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Heinz Mauelshagen 提交于
With raid4/5/6, journal device and write intent bitmap are mutually exclusive. Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Guoqing Jiang 提交于
Previously, we allow multiple nodes can resync device, but we had changed it to only support one node can do resync at one time, but suspend_info is still used. Now, let's remove the structure and use suspend_lo/hi to record the range. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
We need to continue the reshaping if it was interrupted in original node. So original node should call resync_bitmap in case reshaping is aborted. Then BITMAP_NEEDS_SYNC message is broadcasted to other nodes, node which continues the reshaping should restart reshape from mddev->reshape_position instead of from the first beginning. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Guoqing Jiang 提交于
When reshape is happening in one node, other nodes could receive lots of RESYNCING messages, so md_bitmap_sync_with_cluster is called. Since the resyncing window is typically small in these RESYNCING messages, so WARN is always triggered, so we should not call the func when reshape is happening. Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NShaohua Li <shli@fb.com>
-