提交 · ebaa1ac12b0c97337a5bad1d57334056791aaa88 · openeuler / Kernel

03 10月, 2020 13 次提交

bcache: remove can_attach_cache() · ebaa1ac1

由 Coly Li 提交于 10月 01, 2020

After removing the embedded struct cache_sb from struct cache_set, cache
set will directly reference the in-memory super block of struct cache.
It is unnecessary to compare block_size, bucket_size and nr_in_set from
the identical in-memory super block in can_attach_cache().

This is a preparation patch for latter removing cache_set->sb from
struct cache_set.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ebaa1ac1

bcache: don't check seq numbers in register_cache_set() · 08a17828

由 Coly Li 提交于 10月 01, 2020

In order to update the partial super block of cache set, the seq numbers
of cache and cache set are checked in register_cache_set(). If cache's
seq number is larger than cache set's seq number, cache set must update
its partial super block from cache's super block. It is unncessary when
the embedded struct cache_sb is removed from struct cache set.

This patch removed the seq numbers checking from register_cache_set(),
because later there will be no such partial super block in struct cache
set, the cache set will directly reference in-memory super block from
struct cache. This is a preparation patch for removing embedded struct
cache_sb from struct cache_set.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

08a17828

bcache: only use bucket_bytes() on struct cache · 63a96c05

由 Coly Li 提交于 10月 01, 2020

Because struct cache_set and struct cache both have struct cache_sb,
macro bucket_bytes() currently are used on both of them. When removing
the embedded struct cache_sb from struct cache_set, this macro won't be
used on struct cache_set anymore.

This patch unifies all bucket_bytes() usage only on struct cache, this is
one of the preparation to remove the embedded struct cache_sb from
struct cache_set.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

63a96c05

bcache: remove useless bucket_pages() · 3c4fae29

由 Coly Li 提交于 10月 01, 2020

It seems alloc_bucket_pages() is the only user of bucket_pages().
Considering alloc_bucket_pages() is removed from bcache code, it is safe
to remove the useless macro bucket_pages() now.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3c4fae29

bcache: remove useless alloc_bucket_pages() · 421cf1c5

由 Coly Li 提交于 10月 01, 2020

Now no one uses alloc_bucket_pages() anymore, remove it from bcache.h.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

421cf1c5

bcache: only use block_bytes() on struct cache · 4e1ebae3

由 Coly Li 提交于 10月 01, 2020

Because struct cache_set and struct cache both have struct cache_sb,
therefore macro block_bytes() can be used on both of them. When removing
the embedded struct cache_sb from struct cache_set, this macro won't be
used on struct cache_set anymore.

This patch unifies all block_bytes() usage only on struct cache, this is
one of the preparation to remove the embedded struct cache_sb from
struct cache_set.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4e1ebae3

bcache: add set_uuid in struct cache_set · 1132e56e

由 Coly Li 提交于 10月 01, 2020

This patch adds a separated set_uuid[16] in struct cache_set, to store
the uuid of the cache set. This is the preparation to remove the
embedded struct cache_sb from struct cache_set.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1132e56e

bcache: remove for_each_cache() · 08fdb2cd

由 Coly Li 提交于 10月 01, 2020

Since now each cache_set explicitly has single cache, for_each_cache()
is unnecessary. This patch removes this macro, and update all locations
where it is used, and makes sure all code logic still being consistent.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

08fdb2cd

bcache: explicitly make cache_set only have single cache · 697e2349

由 Coly Li 提交于 10月 01, 2020

Currently although the bcache code has a framework for multiple caches
in a cache set, but indeed the multiple caches never completed and users
use md raid1 for multiple copies of the cached data.

This patch does the following change in struct cache_set, to explicitly
make a cache_set only have single cache,
- Change pointer array "*cache[MAX_CACHES_PER_SET]" to a single pointer
  "*cache".
- Remove pointer array "*cache_by_alloc[MAX_CACHES_PER_SET]".
- Remove "caches_loaded".

Now the code looks as exactly what it does in practic: only one cache is
used in the cache set.
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

697e2349

bcache: remove 'int n' from parameter list of bch_bucket_alloc_set() · 17e4aed8

由 Coly Li 提交于 10月 01, 2020

The parameter 'int n' from bch_bucket_alloc_set() is not cleared
defined. From the code comments n is the number of buckets to alloc, but
from the code itself 'n' is the maximum cache to iterate. Indeed all the
locations where bch_bucket_alloc_set() is called, 'n' is alwasy 1.

This patch removes the confused and unnecessary 'int n' from parameter
list of  bch_bucket_alloc_set(), and explicitly allocates only 1 bucket
for its caller.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

17e4aed8

bcache: Convert to DEFINE_SHOW_ATTRIBUTE · 84e5d136

由 Qinglang Miao 提交于 10月 01, 2020

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

As inode->iprivate equals to third parameter of
debugfs_create_file() which is NULL. So it's equivalent
to original code logic.
Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

84e5d136

bcache: check c->root with IS_ERR_OR_NULL() in mca_reserve() · 7e59c506

由 Dongsheng Yang 提交于 10月 01, 2020

In mca_reserve(c) macro, we are checking root whether is NULL or not.
But that's not enough, when we read the root node in run_cache_set(),
if we got an error in bch_btree_node_read_done(), we will return
ERR_PTR(-EIO) to c->root.

And then we will go continue to unregister, but before calling
unregister_shrinker(&c->shrink), there is a possibility to call
bch_mca_count(), and we would get a crash with call trace like that:

[ 2149.876008] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b5
... ...
[ 2150.598931] Call trace:
[ 2150.606439]  bch_mca_count+0x58/0x98 [escache]
[ 2150.615866]  do_shrink_slab+0x54/0x310
[ 2150.624429]  shrink_slab+0x248/0x2d0
[ 2150.632633]  drop_slab_node+0x54/0x88
[ 2150.640746]  drop_slab+0x50/0x88
[ 2150.648228]  drop_caches_sysctl_handler+0xf0/0x118
[ 2150.657219]  proc_sys_call_handler.isra.18+0xb8/0x110
[ 2150.666342]  proc_sys_write+0x40/0x50
[ 2150.673889]  __vfs_write+0x48/0x90
[ 2150.681095]  vfs_write+0xac/0x1b8
[ 2150.688145]  ksys_write+0x6c/0xd0
[ 2150.695127]  __arm64_sys_write+0x24/0x30
[ 2150.702749]  el0_svc_handler+0xa0/0x128
[ 2150.710296]  el0_svc+0x8/0xc
Signed-off-by: NDongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7e59c506

bcache: share register sysfs with async register · a58e88bf

由 Coly Li 提交于 10月 01, 2020

Previously the experimental async registration uses a separate sysfs
file register_async. Now the async registration code seems working well
for a while, we can do furtuher testing with it now.

This patch changes the async bcache registration shares the same sysfs
file /sys/fs/bcache/register (and register_quiet). Async registration
will be default behavior if BCACHE_ASYNC_REGISTRATION is set in kernel
configure. By default, BCACHE_ASYNC_REGISTRATION is not configured yet.
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a58e88bf

29 9月, 2020 1 次提交

null_blk: add support for max open/active zone limit for zoned devices · dc4d137e

由 Niklas Cassel 提交于 8月 28, 2020

Add support for user space to set a max open zone and a max active zone
limit via configfs. By default, the default values are 0 == no limit.

Call the block layer API functions used for exposing the configured
limits to sysfs.

Add accounting in null_blk_zoned so that these new limits are respected.
Performing an operation that would exceed these limits results in a
standard I/O error.

A max open zone limit exists in the ZBC standard.
While null_blk_zoned is used to test the Zoned Block Device model in
Linux, when it comes to differences between ZBC and ZNS, null_blk_zoned
mostly follows ZBC.

Therefore, implement the manage open zone resources function from ZBC,
but additionally add support for max active zones.
This enables user space not only to test against a device with an open
zone limit, but also to test against a device with an active zone limit.
Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dc4d137e

28 9月, 2020 1 次提交

Merge tag 'nvme-5.10-2020-09-27' of git://git.infradead.org/nvme into for-5.10/drivers · 1ed4211d

由 Jens Axboe 提交于 9月 28, 2020

Pull NVMe updates from Christoph:

"nvme updates for 5.10

 - fix keep alive timer modification (Amit Engel)
 - order the PCI ID list more sensibly (Andy Shevchenko)
 - cleanup the open by controller helper (Chaitanya Kulkarni)
 - use an xarray for th CSE log lookup (Chaitanya Kulkarni)
 - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
 - fix nvme_ns_report_zones (me)
 - add a sanity check to nvmet-fc (James Smart)
 - fix interrupt allocation when too many polled queues are specified
   (Jeffle Xu)
 - small nvmet-tcp optimization (Mark Wunderlich)"

* tag 'nvme-5.10-2020-09-27' of git://git.infradead.org/nvme:
  nvme-pci: allocate separate interrupt for the reserved non-polled I/O queue
  nvme: fix error handling in nvme_ns_report_zones
  nvmet-fc: fix missing check for no hostport struct
  nvmet: add passthru ZNS support
  nvmet: handle keep-alive timer when kato is modified by a set features cmd
  nvmet-tcp: have queue io_work context run on sock incoming cpu
  nvme-pci: Move enumeration by class to be last in the table
  nvme: use an xarray to lookup the Commands Supported and Effects log
  nvme: lift the file open code from nvme_ctrl_get_by_path

1ed4211d

27 9月, 2020 9 次提交

nvme-pci: allocate separate interrupt for the reserved non-polled I/O queue · 21cc2f3f

由 Jeffle Xu 提交于 9月 24, 2020

One queue will be reserved for non-polled IO when nvme.poll_queues is
greater or equal than the number of IO queues that the nvme controller
can provide. Currently the reserved queue for non-polled IO will reuse
the interrupt used by admin queue in this case, e.g, vector 0.

This can work and the performance may not be an issue since the admin
queue is used unfrequently. However this behaviour may be inconsistent
with that when nvme.poll_queues is smaller than the number of IO
queues available.

Thus allocate separate interrupt for this reserved queue, and thus make
the behaviour consistent.
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
[hch: minor cleanups, mostly to the pre-existing surrounding code]
Signed-off-by: NChristoph Hellwig <hch@lst.de>

21cc2f3f

nvme: fix error handling in nvme_ns_report_zones · 936fab50

由 Christoph Hellwig 提交于 8月 30, 2020

nvme_submit_sync_cmd can return positive NVMe error codes in addition to
the negative Linux error code, which are currently ignored. Fix this
by removing __nvme_ns_report_zones and handling the errors from
nvme_submit_sync_cmd in the caller instead of multiplexing the return
value and the number of zones reported into a single return value.

Fixes: 240e6ee2 ("nvme: support for zoned namespaces")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>

936fab50

nvmet-fc: fix missing check for no hostport struct · ddd3d105

由 James Smart 提交于 9月 22, 2020

A hostport port pointer is allowed to be NULL as it is not allocated if
the lldd does not support the new interfaces for NVME LS request support.
The hostport free routine validates the handle but forgot to validate the
hostport pointer.

Validate the hostport pointer before using it to validate the handle.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ddd3d105

nvmet: add passthru ZNS support · 5b3356d9

由 Chaitanya Kulkarni 提交于 9月 22, 2020

In the default passthru implementation NVMeOF target passthru ctrl is
not capable of handling Zoned Namespaces (ZNS).

Update the nvmet_parse_pasthru_admin_cmd() to allow
NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS.

With this addition NVMeOF Passthru allows Zoned Namespaces.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5b3356d9

nvmet: handle keep-alive timer when kato is modified by a set features cmd · 4e683c48

由 Amit Engel 提交于 9月 16, 2020

A user may modify the kato by a set features cmd.  To properly deal
with races or a kato value of 0 (no keep alive enabled) change
nvmet_set_feat_kato to first disable the timer, then set the value
and then re-enable the timer.
Signed-off-by: NAmit Engel <amit.engel@dell.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

4e683c48

nvmet-tcp: have queue io_work context run on sock incoming cpu · f7790e5d

由 Mark Wunderlich 提交于 8月 28, 2020

No real good need to spread queues artificially. Usually the
target will serve multiple hosts, and it's better to run on the socket
incoming cpu for better affinitization rather than spread queues on all
online cpus.

We rely on RSS to spread the work around sufficiently.
Signed-off-by: NMark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

f7790e5d

nvme-pci: Move enumeration by class to be last in the table · 0b85f59d

由 Andy Shevchenko 提交于 8月 18, 2020

It's unusual that we have enumeration by class in the middle of the table.
It might potentially be problematic in the future if we add another entry
after it.

So, move class matching entry to be the last in the ID table.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

0b85f59d

nvme: use an xarray to lookup the Commands Supported and Effects log · 1cf7a12e

由 Chaitanya Kulkarni 提交于 9月 22, 2020

When using linked list we have to open code the locking, search, and
destroy operations with the loops even if data structure doesn't fall
into the fast path.

One of the main advantage of having XArray to store, search, and remove
items is that it handles all the locking by itself, avoids the loops
when using linked lists, provides clear API to replace the linked list's
search and destroy loops.

This patch replaces the ctrl->cel list with XArray and removes :-

a. Extra code needed for the linked list for ctrl->cel item management
   such as nvme_find_cel().
b. Destroy loop in the nvme_free_ctrl().
c. Explicit insertion locking in the nvme_get_effects_log().
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1cf7a12e

nvme: lift the file open code from nvme_ctrl_get_by_path · b2702aaa

由 Chaitanya Kulkarni 提交于 9月 16, 2020

Lift opening the file open/close code from nvme_ctrl_get_by_path into
the caller, just keeping a simple nvme_ctrl_from_file() helper.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: refactored a bit, split the bug fixes into a separate prep patch]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>

b2702aaa

25 9月, 2020 16 次提交

Merge branch 'md-next' of... · 163090c1

由 Jens Axboe 提交于 9月 25, 2020

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.10/drivers

Pull MD updates from Song.

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid10: improve discard request for far layout
  md/raid10: improve raid10 discard request
  md/raid10: pull codes that wait for blocked dev into one function
  md/raid10: extend r10bio devs to raid disks
  md: add md_submit_discard_bio() for submitting discard bio
  md: Simplify code with existing definition RESYNC_SECTORS in raid10.c
  md/raid5: reallocate page array after setting new stripe_size
  md/raid5: resize stripe_head when reshape array
  md/raid5: let multiple devices of stripe_head share page
  md/raid6: let async recovery function support different page offset
  md/raid6: let syndrome computor support different page offset
  md/raid5: convert to new xor compution interface
  md/raid5: add new xor function to support different page offset
  md/raid5: make async_copy_data() to support different page offset
  md/raid5: add a new member of offset into r5dev
  md: only calculate blocksize once and use i_blocksize()

163090c1

md/raid10: improve discard request for far layout · d3ee2d84

由 Xiao Ni 提交于 9月 02, 2020

For far layout, the discard region is not continuous on disks. So it needs
far copies r10bio to cover all regions. It needs a way to know all r10bios
have finish or not. Similar with raid10_sync_request, only the first r10bio
master_bio records the discard bio. Other r10bios master_bio record the
first r10bio. The first r10bio can finish after other r10bios finish and
then return the discard bio.
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

d3ee2d84

md/raid10: improve raid10 discard request · bcc90d28

由 Xiao Ni 提交于 9月 02, 2020

Now the discard request is split by chunk size. So it takes a long time
to finish mkfs on disks which support discard function. This patch improve
handling raid10 discard request. It uses the similar way with patch
29efc390 (md/md0: optimize raid0 discard handling).

But it's a little complex than raid0. Because raid10 has different layout.
If raid10 is offset layout and the discard request is smaller than stripe
size. There are some holes when we submit discard bio to underlayer disks.

For example: five disks (disk1 - disk5)
D01 D02 D03 D04 D05
D05 D01 D02 D03 D04
D06 D07 D08 D09 D10
D10 D06 D07 D08 D09
The discard bio just wants to discard from D03 to D10. For disk3, there is
a hole between D03 and D08. For disk4, there is a hole between D04 and D09.
D03 is a chunk, raid10_write_request can handle one chunk perfectly. So
the part that is not aligned with stripe size is still handled by
raid10_write_request.

If reshape is running when discard bio comes and the discard bio spans the
reshape position, raid10_write_request is responsible to handle this
discard bio.

I did a test with this patch set.
Without patch:
time mkfs.xfs /dev/md0
real4m39.775s
user0m0.000s
sys0m0.298s

With patch:
time mkfs.xfs /dev/md0
real0m0.105s
user0m0.000s
sys0m0.007s

nvme3n1           259:1    0   477G  0 disk
└─nvme3n1p1       259:10   0    50G  0 part
nvme4n1           259:2    0   477G  0 disk
└─nvme4n1p1       259:11   0    50G  0 part
nvme5n1           259:6    0   477G  0 disk
└─nvme5n1p1       259:12   0    50G  0 part
nvme2n1           259:9    0   477G  0 disk
└─nvme2n1p1       259:15   0    50G  0 part
nvme0n1           259:13   0   477G  0 disk
└─nvme0n1p1       259:14   0    50G  0 part
Reviewed-by: NColy Li <colyli@suse.de>
Reviewed-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

bcc90d28

md/raid10: pull codes that wait for blocked dev into one function · f046f5d0

由 Xiao Ni 提交于 8月 25, 2020

The following patch will reuse these logics, so pull the same codes into
one function.
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

f046f5d0

md/raid10: extend r10bio devs to raid disks · 8650a889

由 Xiao Ni 提交于 8月 25, 2020

Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit
to all member disks and it needs to use r10bio. So extend to
r10bio->devs[geo.raid_disks].
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

8650a889

md: add md_submit_discard_bio() for submitting discard bio · 2628089b

由 Xiao Ni 提交于 8月 25, 2020

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.
Reviewed-by: NColy Li <colyli@suse.de>
Reviewed-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

2628089b

md: Simplify code with existing definition RESYNC_SECTORS in raid10.c · e287308b

由 Zhen Lei 提交于 8月 21, 2020

#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)

"RESYNC_BLOCK_SIZE/512" is equal to "RESYNC_BLOCK_SIZE >> 9", replace it
with RESYNC_SECTORS.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

e287308b

md/raid5: reallocate page array after setting new stripe_size · 38912584

由 Yufen Yu 提交于 8月 20, 2020

When try to resize stripe_size, we also need to free old
shared page array and allocate new.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

38912584

md/raid5: resize stripe_head when reshape array · f16acaf3

由 Yufen Yu 提交于 8月 20, 2020

When reshape array, we try to reuse shared pages of old stripe_head,
and allocate more for the new one if needed.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

f16acaf3

md/raid5: let multiple devices of stripe_head share page · 046169f0

由 Yufen Yu 提交于 8月 20, 2020

In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.

After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.

To avoid wasting memory, we try to let multiple sh->dev share
one real page. That means, multiple sh->dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:

                    64K PAGE_SIZE
          +---+---+---+---+------------------------------+
          |   |   |   |   |
          |   |   |   |   |
          +-+-+-+-+-+-+-+-+------------------------------+
            ^   ^   ^   ^
            |   |   |   +----------------------------+
            |   |   |                                |
            |   |   +-------------------+            |
            |   |                       |            |
            |   +----------+            |            |
            |              |            |            |
            +-+            |            |            |
              |            |            |            |
        +-----+-----+------+-----+------+-----+------+------+
sh      | offset(0) | offset(4K) | offset(8K) | offset(12K) |
 +      +-----------+------------+------------+-------------+
 +----> dev[0].page  dev[1].page  dev[2].page  dev[3].page

A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().

After trying to share page, the users of sh->dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

046169f0

md/raid6: let async recovery function support different page offset · 4f86ff55

由 Yufen Yu 提交于 8月 20, 2020

For now, asynchronous raid6 recovery calculate functions are require
common offset for pages. But, we expect them to support different page
offset after introducing stripe shared page. Do that by simplily adding
page offset where each page address are referred. Then, replace the
old interface with the new ones in raid6 and raid6test.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

4f86ff55

md/raid6: let syndrome computor support different page offset · d69454bc

由 Yufen Yu 提交于 8月 20, 2020

For now, syndrome compute functions require common offset in the pages
array. However, we expect them to support different offset when try to
use shared page in the following. Simplily covert them by adding page
offset where each page address are referred.

Since the only caller of async_gen_syndrome() and async_syndrome_val()
are in raid6, we don't want to reserve the old interface but modify the
interface directly. After that, replacing old interfaces with new ones
for raid6 and raid6test.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

d69454bc

md/raid5: convert to new xor compution interface · a7c224a8

由 Yufen Yu 提交于 8月 20, 2020

We try to replace async_xor() and async_xor_val() with the new
introduced interface async_xor_offs() and async_xor_val_offs()
for raid456.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

a7c224a8

md/raid5: add new xor function to support different page offset · 29bcff78

由 Yufen Yu 提交于 8月 20, 2020

raid5 will call async_xor() and async_xor_val() to compute xor.
For now, both of them require the common src/dst page offset. But,
we want them to support different src/dst page offset for following
shared page.

Here, adding two new function async_xor_offs() and async_xor_val_offs()
respectively for async_xor() and async_xor_val().
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

29bcff78

md/raid5: make async_copy_data() to support different page offset · 248728dd

由 Yufen Yu 提交于 8月 20, 2020

ops_run_biofill() and ops_run_biodrain() will call async_copy_data()
to copy sh->dev[i].page from or to bio page. For now, it implies the
offset of dev[i].page is 0. But we want to support different page offset
in the following.

Thus, pass page offset to these functions and replace 'page_offset'
with 'page_offset + poff'.

No functional change.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

248728dd

md/raid5: add a new member of offset into r5dev · 7aba13b7

由 Yufen Yu 提交于 8月 20, 2020

Add a new member of offset into struct r5dev. It indicates the
offset of related dev[i].page. For now, since each device have a
privated page, the value is always 0. Thus, we set offset as 0
when allcate page in grow_buffers() and resize_stripes().

To support following different page offset, we try to use the page
offset rather than '0' directly for async_memcpy() and ops_run_io().

We try to support different page offset for xor compution functions
in the following. To avoid repeatly allocate a new array each time,
we add a memory region into scribble buffer to record offset.

No functional change.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

7aba13b7

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功