1. 24 January 2020, 15 commits
    • bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() · d5c9c470
      Committed by Coly Li
      In order to skip the most recently freed btree node caches, bch_mca_scan()
      currently skips the first 3 caches on the c->btree_cache_freeable list
      when shrinking bcache node caches. The related code in bch_mca_scan() is,
      
       list_for_each_entry_safe(b, t, &c->btree_cache_freeable, list) {
               if (nr <= 0)
                       goto out;

               if (++i > 3 &&
                   !mca_reap(b, 0, false)) {
                       /* ... lines that free the cache memory ... */
               }
               nr--;
       }
      
      The problem is, if virtual memory code calls bch_mca_scan() and the
      calculated 'nr' is 1 or 2, then in the above loop nothing will be
      shrunk. In such a case, if the slub/slab manager calls bch_mca_scan()
      many times with a small scan number, it does not help to shrink cache
      memory and just wastes CPU cycles.
      
      This patch instead selects btree node caches from the tail of the
      c->btree_cache_freeable list, so that the newly freed caches near the
      head of the list can still be allocated by mca_alloc(), and at least
      1 node can be shrunk.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d5c9c470
    • bcache: remove member accessed from struct btree · 125d98ed
      Committed by Coly Li
      The member 'accessed' of struct btree is used in bch_mca_scan() when
      shrinking btree node caches. The original idea is: if b->accessed is
      set, clear it and move on to the next btree node cache on the
      c->btree_cache list, and only shrink the caches whose b->accessed is
      already cleared. Then only cold btree node caches will be shrunk.
      
      But when I/O pressure is high, it is very probable that b->accessed of
      a btree node cache will be set again in bch_btree_node_get() before
      bch_mca_scan() selects it again. Then bch_mca_scan() has no chance to
      shrink enough memory back to the slub or slab system.
      
      This patch removes the member accessed from struct btree, so once a
      btree node cache is selected, it is shrunk immediately. With this
      change, bch_mca_scan() may release btree node caches more efficiently.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      125d98ed
    • bcache: print written and keys in trace_bcache_btree_write · d44330b7
      Committed by Guoju Fang
      It's useful to dump the written block and keys on btree write; this
      patch adds them to trace_bcache_btree_write.
      Signed-off-by: Guoju Fang <fangguoju@gmail.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d44330b7
    • bcache: avoid unnecessary btree nodes flushing in btree_flush_write() · 2aa8c529
      Committed by Coly Li
      The commit 91be66e1 ("bcache: performance improvement for
      btree_flush_write()") was an effort to flush the btree node with the
      oldest journal entry faster, by the following methods,
      - Only iterate dirty btree nodes in c->btree_cache, avoid scanning a lot
        of clean btree nodes.
      - Take c->btree_cache as a LRU-like list, aggressively flushing all
        dirty nodes from the tail of c->btree_cache until the btree node
        with the oldest journal entry is flushed. This reduces the time of
        holding c->bucket_lock.
      
      Guoju Fang and Shuang Li reported that they observed unexpected extra
      write I/Os on the cache device after applying the above patch. Guoju
      Fang provided more detailed diagnosis information: the aggressive
      btree node flushing may cause 10x more btree nodes to flush in his
      workload. He points out that when system memory is large enough to
      hold all btree nodes in memory, c->btree_cache is not a LRU-like list
      any more. Then the btree node with the oldest journal entry is very
      probably not close to the tail of the c->btree_cache list. In such a
      situation many more dirty btree nodes will be aggressively flushed
      before the target node is flushed. When a slow SATA SSD is used as
      the cache device, such over-aggressive flushing behavior causes a
      performance regression.
      
      After spending a lot of time on debugging and diagnosis, I find the
      real situation is more complicated; aggressively flushing dirty btree
      nodes from the tail of the c->btree_cache list is not a good solution.
      - When all btree nodes are cached in memory, c->btree_cache is not a
        LRU-like list, and the btree nodes with the oldest journal entry
        won't be close to the tail of the list.
      - There can be hundreds of dirty btree nodes referencing the oldest
        journal entry; before all those nodes are flushed, the oldest
        journal entry cannot be reclaimed.
      When the above two conditions are mixed together, simply flushing
      from the tail of the c->btree_cache list is really NOT a good idea.
      
      Fortunately there is still a chance to make btree_flush_write() work
      better. Here is how this patch avoids unnecessary btree node flushing,
      - Only acquire c->journal.lock when getting the oldest journal entry
        of the fifo c->journal.pin. In all other locations check the journal
        entries locklessly, so their values can be changed on other cores
        in parallel.
      - In the loop list_for_each_entry_safe_reverse(), check the latest
        front point of the fifo c->journal.pin. If it is different from the
        original front point which we got while holding c->journal.lock, it
        means the oldest journal entry has been reclaimed on another core.
        At this moment, all selected dirty nodes recorded in the array
        btree_nodes[] have already been flushed and are clean on other CPU
        cores, so it is unnecessary to iterate c->btree_cache any longer.
        Just quit the list_for_each_entry_safe_reverse() loop, and the
        following for-loop will skip all the selected clean nodes.
      - Find a proper time to quit the list_for_each_entry_safe_reverse()
        loop. Check the refcount value of the original fifo front point; if
        the value is larger than the number of nodes selected into
        btree_nodes[], more matching btree nodes should be scanned.
        Otherwise there are no more matching btree nodes in the rest of the
        c->btree_cache list, and the loop can be quit. If the original
        oldest journal entry is reclaimed and the fifo front point is
        updated, the refcount of the original fifo front point will be 0,
        and the loop will be quit too.
      - Do not hold c->bucket_lock for too long. c->bucket_lock is also
        required for space allocation for cached data; holding it for too
        long blocks regular I/O requests. When iterating the list
        c->btree_cache, even if there are a lot of matching btree nodes, in
        order not to hold c->bucket_lock too long, only BTREE_FLUSH_NR
        nodes are selected for flushing in the following for-loop.
      With this patch, only btree nodes referencing the oldest journal entry
      are flushed to the cache device; there is no more aggressive flushing
      of unnecessary btree nodes. And in order to avoid blocking regular I/O
      requests, each time btree_flush_write() is called, at most
      BTREE_FLUSH_NR btree nodes are selected to flush, even if there are
      more matching btree nodes on the list c->btree_cache.
      
      At last, one more thing to explain: why is it safe to read the front
      point of c->journal.pin without holding c->journal.lock inside the
      list_for_each_entry_safe_reverse() loop?
      
      Here is my answer: when reading the front point of the fifo
      c->journal.pin, we don't need to know its exact value, we just want to
      check whether it is different from the original front point (which is
      an accurate value, because we got it while c->journal.lock was held).
      For this purpose, it works as expected without holding c->journal.lock.
      Even if the front point is changed on another CPU core and not yet
      observed on the local core, and the currently iterated btree node has
      a journal entry identical to the originally fetched fifo front point,
      it is still safe. Because after taking the mutex b->write_lock (with
      its memory barrier) this btree node will be found clean and skipped,
      and the loop will quit later when iterating the next node of the list
      c->btree_cache.
      
      Fixes: 91be66e1 ("bcache: performance improvement for btree_flush_write()")
      Reported-by: Guoju Fang <fangguoju@gmail.com>
      Reported-by: Shuang Li <psymon@bonuscloud.io>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2aa8c529
    • bcache: add code comments for state->pool in __btree_sort() · 7a0bc2a8
      Committed by Coly Li
      Explain that the pages allocated from the mempool state->pool can be
      swapped in __btree_sort(), because state->pool is a page pool, which
      indeed allocates pages by alloc_pages().
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7a0bc2a8
    • lib: crc64: include <linux/crc64.h> for 'crc64_be' · 0e0c1231
      Committed by Ben Dooks (Codethink)
      crc64_be() is declared in <linux/crc64.h>, so include that header
      where the symbol is defined to avoid the following warning:
      
      lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?
      Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0e0c1231
    • bcache: use read_cache_page_gfp to read the superblock · 6321bef0
      Committed by Christoph Hellwig
      Avoid a pointless dependency on buffer heads in bcache by simply open
      coding reading a single page.  Also add a SB_OFFSET define for the
      byte offset of the superblock instead of using magic numbers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6321bef0
    • bcache: store a pointer to the on-disk sb in the cache and cached_dev structures · 475389ae
      Committed by Christoph Hellwig
      This allows properly building the superblock bio, including the offset
      in the page, using the normal bio helpers.  This fixes writing the
      superblock for page sizes larger than 4k, where the sb write bio would
      need an offset in the bio_vec.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      475389ae
    • bcache: return a pointer to the on-disk sb from read_super · cfa0c56d
      Committed by Christoph Hellwig
      Returning the properly typed actual data structure instead of the
      containing struct page will save the callers some work going forward.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cfa0c56d
    • bcache: transfer the sb_page reference to register_{bdev,cache} · fc8f19cc
      Committed by Christoph Hellwig
      Avoid an extra reference count roundtrip by transferring the sb_page
      ownership to the lower level register helpers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fc8f19cc
    • bcache: fix use-after-free in register_bcache() · ae3cd299
      Committed by Coly Li
      The patch "bcache: rework error unwinding in register_bcache" introduces
      a use-after-free regression in register_bcache(). Here is the current
      code,
      	2510 out_free_path:
      	2511         kfree(path);
      	2512 out_module_put:
      	2513         module_put(THIS_MODULE);
      	2514 out:
      	2515         pr_info("error %s: %s", path, err);
      	2516         return ret;
      If an error happens and the above code path is executed, path is
      released at line 2511 but still referenced at line 2515. Then KASAN
      reports a use-after-free error.
      
      This patch changes line 2515 in the following way to fix the problem,
      	2515         pr_info("error %s: %s", path?path:"", err);
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ae3cd299
    • bcache: properly initialize 'path' and 'err' in register_bcache() · 29cda393
      Committed by Coly Li
      The patch "bcache: rework error unwinding in register_bcache" from
      Christoph Hellwig leaves the local variables 'path' and 'err' in an
      undefined initial state. If the code in register_bcache() jumps to the
      label 'out:' or 'out_module_put:' by goto, these two variables might be
      referenced with undefined values by the following lines,
      
      	out_module_put:
      	        module_put(THIS_MODULE);
      	out:
      	        pr_info("error %s: %s", path, err);
      	        return ret;
      
      Therefore this patch initializes these two local variables properly in
      register_bcache() to avoid such an issue.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      29cda393
    • bcache: rework error unwinding in register_bcache · 50246693
      Committed by Christoph Hellwig
      Split the successful and error return path, and use one goto label for each
      resource to unwind.  This also fixes some small errors like leaking the
      module reference count in the reboot case (which seems entirely harmless)
      or printing the wrong warning messages for early failures.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      50246693
    • bcache: use a separate data structure for the on-disk super block · a702a692
      Committed by Christoph Hellwig
      Split out an on-disk version struct cache_sb with the proper endianness
      annotations.  This fixes a fair chunk of sparse warnings, but there are
      some left due to the way the checksum is defined.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a702a692
    • bcache: cached_dev_free needs to put the sb page · e8547d42
      Committed by Liang Chen
      Same as for the cache device, the buffer page needs to be put while
      freeing cached_dev. Otherwise a page is leaked every time a cached_dev
      is stopped.
      Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e8547d42
  2. 14 January 2020, 16 commits
  3. 09 January 2020, 1 commit
  4. 07 January 2020, 2 commits
  5. 19 December 2019, 4 commits
    • blk-mq: optimise blk_mq_flush_plug_list() · 95ed0c5b
      Committed by Pavel Begunkov
      Instead of using list_del_init() in a loop, which generates a lot of
      unnecessary memory reads and writes, iterate from the first request of
      a batch and cut out a sublist with list_cut_before().

      Apart from removing the list node initialisation part, this is more
      register-friendly, and the assembly uses the stack less intensively.

      The list_empty() check at the beginning is done in the hope that the
      compiler can optimise out the same check in the following
      list_splice_init().
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      95ed0c5b
    • list: introduce list_for_each_continue() · 28ca0d6d
      Committed by Pavel Begunkov
      Like the other *_continue() helpers, this continues iteration from a
      given position.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      28ca0d6d
    • blk-mq: optimise rq sort function · 7d30a621
      Committed by Pavel Begunkov
      Check "!=" in multi-layer comparisons. The memory usage is the same,
      there are fewer instructions, and 2 of the 4 jumps are replaced with
      SETcc.

      Note that list_sort() doesn't distinguish 0 from <0.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7d30a621
    • Merge tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 80a0c2e5
      Committed by Linus Torvalds
      Pull sound fixes from Takashi Iwai:
       "A slightly high amount at this time, but all good and small fixes:
      
         - A PCM core fix that initializes the buffer properly for avoiding
           information leaks; it is a long-standing minor problem, but good to
           fix better now
      
         - A few ASoC core fixes for the init / cleanup ordering issues that
           surfaced after the recent refactoring
      
         - Lots of SOF and topology-related fixes went in, as usual as such
           hot topics
      
         - Several ASoC codec and platform-specific small fixes: wm89xx,
           realtek, and max98090, AMD, Intel-SST
      
         - A fix for the previous incomplete regression of HD-audio, now
           hitting Nvidia HDMI
      
         - A few HD-audio CA0132 codec fixes"
      
      * tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
        ALSA: hda - Downgrade error message for single-cmd fallback
        ASoC: wm8962: fix lambda value
        ALSA: hda: Fix regression by strip mask fix
        ALSA: hda/ca0132 - Fix work handling in delayed HP detection
        ALSA: hda/ca0132 - Avoid endless loop
        ALSA: hda/ca0132 - Keep power on during processing DSP response
        ALSA: pcm: Avoid possible info leaks from PCM stream buffers
        ASoC: Intel: common: work-around incorrect ACPI HID for CML boards
        ASoC: SOF: Intel: split cht and byt debug window sizes
        ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data
        ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
        ASoC: simple-card: Don't create separate link when platform is present
        ASoC: topology: Check return value for soc_tplg_pcm_create()
        ASoC: topology: Check return value for snd_soc_add_dai_link()
        ASoC: core: only flush inited work during free
        ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
        ASoC: core: Init pcm runtime work early to avoid warnings
        ASoC: Intel: sst: Add missing include <linux/io.h>
        ASoC: max98090: fix possible race conditions
        ASoC: max98090: exit workaround earlier if PLL is locked
        ...
      80a0c2e5
  6. 18 December 2019, 2 commits
    • Merge tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 2187f215
      Committed by Linus Torvalds
      Pull btrfs fixes from David Sterba:
       "A mix of regression fixes and regular fixes for stable trees:
      
         - fix swapped error messages for qgroup enable/rescan
      
         - fixes for NO_HOLES feature with clone range
      
         - fix deadlock between iget/srcu lock/synchronize srcu while freeing
           an inode
      
         - fix double lock on subvolume cross-rename
      
         - tree log fixes
            * fix missing data checksums after replaying a log tree
            * also teach tree-checker about this problem
            * skip log replay on orphaned roots
      
         - fix maximum devices constraints for RAID1C -3 and -4
      
         - send: don't print warning on read-only mount regarding orphan
           cleanup
      
         - error handling fixes"
      
      * tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: send: remove WARN_ON for readonly mount
        btrfs: do not leak reloc root if we fail to read the fs root
        btrfs: skip log replay on orphaned roots
        btrfs: handle ENOENT in btrfs_uuid_tree_iterate
        btrfs: abort transaction after failed inode updates in create_subvol
        Btrfs: fix hole extent items with a zero size after range cloning
        Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
        Btrfs: make tree checker detect checksum items with overlapping ranges
        Btrfs: fix missing data checksums after replaying a log tree
        btrfs: return error pointer from alloc_test_extent_buffer
        btrfs: fix devs_max constraints for raid1c3 and raid1c4
        btrfs: tree-checker: Fix error format string for size_t
        btrfs: don't double lock the subvol_sem for rename exchange
        btrfs: handle error in btrfs_cache_block_group
        btrfs: do not call synchronize_srcu() in inode_tree_del
        Btrfs: fix cloning range with a hole when using the NO_HOLES feature
        btrfs: Fix error messages in qgroup_rescan_init
      2187f215
    • early init: fix error handling when opening /dev/console · 2d3145f8
      Committed by Linus Torvalds
      The comment says "this should never fail", but it definitely can fail
      when you have odd initial boot filesystems, or kernel configurations.
      
      So get the error handling right: filp_open() returns an error pointer.
      Reported-by: Jesse Barnes <jsbarnes@google.com>
      Reported-by: youling 257 <youling257@gmail.com>
      Fixes: 8243186f ("fs: remove ksys_dup()")
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d3145f8