1. 20 December 2017 (2 commits)
    • dm: optimize bio-based NVMe IO submission · 978e51ba
      Authored by Mike Snitzer
      Upper level bio-based drivers that stack immediately on top of NVMe can
      leverage direct_make_request().  In addition, DM's NVMe bio-based support
      will initially only ever have one NVMe device that it submits IO to at a
      time.  There is no splitting needed.  Enhance DM core so that
      DM_TYPE_NVME_BIO_BASED's IO submission takes advantage of both of these
      characteristics.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
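      A minimal sketch of the submission path this enables (illustrative only; my_dev,
      my_nvme_remap-style remapping and the field names are assumptions, not DM code --
      only direct_make_request() and bio_set_dev() are real block-layer interfaces of
      that era):
      ----
      /* Sketch: bio-based make_request_fn for a driver stacked directly on NVMe. */
      static blk_qc_t my_make_request(struct request_queue *q, struct bio *bio)
      {
              struct my_dev *md = q->queuedata;

              bio_set_dev(bio, md->nvme_bdev);        /* retarget to the one NVMe device */
              bio->bi_iter.bi_sector += md->start;    /* simple linear remap, no splitting */

              return direct_make_request(bio);        /* submit without extra queuing */
      }
      ----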
    • dm: introduce DM_TYPE_NVME_BIO_BASED · 22c11858
      Authored by Mike Snitzer
      If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
      all devices in the DM table do not support partial completions.  Also,
      the table has a single immutable target that doesn't require DM core to
      split bios.
      
      This will enable adding NVMe optimizations to bio-based DM.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 18 December 2017 (1 commit)
    • dm: simplify start of block stats accounting for bio-based · f3986374
      Authored by Mike Snitzer
      There is no apparent need to call generic_start_io_acct() until just before
      the IO is ready for submission.  start_io_acct() is the proper place to do this
      accounting -- it is also where DM accounts for pending IO and, if
      enabled, starts dm-stats accounting.
      
      Replace start_io_acct()'s part_round_stats() with generic_start_io_acct().
      This eliminates needing to take part_stat_lock() multiple times when
      starting an IO on bio-based devices.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
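      The accounting call in question, roughly as made from start_io_acct() (a sketch;
      my_start_io_acct() is a hypothetical wrapper and the signature is the approximate
      4.15-era one):
      ----
      static void my_start_io_acct(struct mapped_device *md, struct bio *bio)
      {
              /* one part_stat_lock round trip, taken where DM also accounts pending IO */
              generic_start_io_acct(md->queue, bio_data_dir(bio), bio_sectors(bio),
                                    &dm_disk(md)->part0);
      }
      ----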
  3. 17 December 2017 (5 commits)
  4. 14 December 2017 (8 commits)
    • dm: set QUEUE_FLAG_DAX accordingly in dm_table_set_restrictions() · ad3793fc
      Authored by Mike Snitzer
      Set QUEUE_FLAG_DAX in dm_table_set_restrictions(), rather than having DAX
      support be a special case that is set based on table type in
      dm_setup_md_queue().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix __send_changing_extent_only() to send first bio and chain remainder · 3d7f4562
      Authored by Mike Snitzer
      __send_changing_extent_only() must follow the same pattern that was
      established with commit "dm: ensure bio submission follows a depth-first
      tree walk".  That is: submit the first bio up to the split boundary and
      then chain the remainder for further submission.
      Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs · 0776aa0e
      Authored by Mike Snitzer
      alloc_multiple_bios() assumes it can allocate the requested number of
      bios, but until now there was no guarantee that the mempools would be
      accommodating.
      Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure · 4a3f54d9
      Authored by Mike Snitzer
      Now that all of DM has been revised and/or verified to no longer require
      the use of BIOSET_NEED_RESCUER, the dm_offload code may be removed.
      Suggested-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: safely allocate multiple bioset bios · 318716dd
      Authored by Mike Snitzer
      DM targets can request multiple bios be sent to them by DM core (see:
      num_{flush,discard,write_same,write_zeroes}_bios).  But until now these
      bios were allocated in an unsafe manner that could potentially exhaust
      the DM device's bioset -- in the face of multiple threads each trying to
      do multiple allocations from the same DM device's bioset.
      
      Fix __send_duplicate_bios() by using the new alloc_multiple_bios().  The
      allocation strategy used by alloc_multiple_bios() models that used by
      dm-crypt.c:crypt_alloc_buffer().
      
      Neil Brown initially proposed this fix, but the implementation has been
      revised enough that it is inappropriate to attribute the entirety of it
      to him.
      Suggested-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
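      A sketch of the crypt_alloc_buffer()-style strategy referenced above (illustrative,
      not the exact DM code): first try to grab every clone without blocking; on failure,
      release what was taken and retry under a mutex with a blocking allocation, so at
      most one thread at a time can sleep on the bioset's mempool.
      alloc_one_clone() and free_clone() are stand-ins for the driver's own primitives.
      ----
      static DEFINE_MUTEX(alloc_lock);

      static int alloc_n_clones(struct bio_list *blist, unsigned int num_bios)
      {
              int try;

              for (try = 0; try < 2; try++) {
                      unsigned int n;
                      struct bio *bio;

                      if (try)
                              mutex_lock(&alloc_lock);   /* serialize the blocking pass */
                      for (n = 0; n < num_bios; n++) {
                              bio = alloc_one_clone(try ? GFP_NOIO : GFP_NOWAIT);
                              if (!bio)
                                      break;
                              bio_list_add(blist, bio);
                      }
                      if (try)
                              mutex_unlock(&alloc_lock);
                      if (n == num_bios)
                              return 0;

                      /* partial success: give everything back before the blocking retry */
                      while ((bio = bio_list_pop(blist)))
                              free_clone(bio);
              }
              return -ENOMEM;
      }
      ----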
    • dm: remove unused 'num_write_bios' target interface · f31c21e4
      Authored by NeilBrown
      No DM target provides num_write_bios and none has since dm-cache's
      brief use in 2013.
      
      Having the possibility of num_write_bios > 1 complicates bio
      allocation.  So remove the interface and assume there is only one bio
      needed.
      
      If a target ever needs more, it must provide a suitable bioset and
      allocate the bios itself based on its particular needs.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: ensure bio submission follows a depth-first tree walk · 18a25da8
      Authored by NeilBrown
      A dm device can, in general, represent a tree of targets, each of which
      handles a sub-range of the range of blocks handled by the parent.
      
      The bio sequencing managed by generic_make_request() requires that bios
      are generated and handled in a depth-first manner.  Each call to a
      make_request_fn() may submit bios to a single member device, and may
      submit bios for a reduced region of the same device as the
      make_request_fn.
      
      In particular, any bios submitted to member devices must be expected to
      be processed in order, so a later one must never wait for an earlier
      one.
      
      This ordering is usually achieved by using bio_split() to reduce a bio
      to a size that can be completely handled by one target, and resubmitting
      the remainder to the originating device. bio_queue_split() shows the
      canonical approach.
      
      dm doesn't follow this approach, largely because it has needed to split
      bios since long before bio_split() was available.  It currently can
      submit bios to separate targets within the one dm_make_request() call.
      Dependencies between these targets, as can happen with dm-snap, can
      cause deadlocks if either bio gets stuck behind the other in the queues
      managed by generic_make_request().  This requires the 'rescue'
      functionality provided by dm_offload_{start,end}.
      
      Some of this requirement can be removed by changing the order of bio
      submission to follow the canonical approach.  That is, if dm finds that
      it needs to split a bio, the remainder should be sent to
      generic_make_request() rather than being handled immediately.  This
      delays the handling until the first part is completely processed, so the
      deadlock problems do not occur.
      
      __split_and_process_bio() can be called both from dm_make_request() and
      from dm_wq_work().  When called from dm_wq_work() the current approach
      is perfectly satisfactory as each bio will be processed immediately.
      When called from dm_make_request(), current->bio_list will be non-NULL,
      and in this case it is best to create a separate "clone" bio for the
      remainder.
      
      When we use bio_clone_bioset() to split off the front part of a bio
      and chain the two together and submit the remainder to
      generic_make_request(), it is important that the newly allocated
      bio is used as the head to be processed immediately, and the original
      bio gets "bio_advance()"d and sent to generic_make_request() as the
      remainder.  Otherwise, if the newly allocated bio is used as the
      remainder, and if it then needs to be split again, then the next
      bio_clone_bioset() call will be made while holding a reference to a bio
      (result of the first clone) from the same bioset.  This can potentially
      exhaust the bioset mempool and result in a memory allocation deadlock.
      
      Note that there is no race caused by reassigning cio.io->bio after already
      calling __map_bio().  This bio will only be dereferenced again after
      dec_pending() has found io->io_count to be zero, and this cannot happen
      before the dec_pending() call at the end of __split_and_process_bio().
      
      To provide the clone bio when splitting, we use q->bio_split.  This
      was previously being freed by bio-based dm to avoid having excess
      rescuer threads.  As bio_split bio sets no longer create rescuer
      threads, there is little cost and much gain from restoring the
      q->bio_split bio set.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
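      A condensed sketch of the split-and-chain step described above (illustrative, with
      DM's clone_info bookkeeping trimmed away; the bioset would be q->bio_split or
      similar): the clone becomes the head that is processed now, the original bio is
      advanced past the head and resubmitted, and bio_chain() ensures the original's
      completion waits for the head.
      ----
      /* Sketch: depth-first handling of a bio that is too big for one target.
       * 'done_sectors' is the front portion the caller will map immediately. */
      static struct bio *split_head_and_requeue_tail(struct bio *bio,
                                                     unsigned int done_sectors,
                                                     struct bio_set *bs)
      {
              struct bio *head = bio_clone_bioset(bio, GFP_NOIO, bs);

              head->bi_iter.bi_size = done_sectors << 9;   /* clone keeps only the front */

              bio_advance(bio, done_sectors << 9);         /* original keeps the tail    */
              bio_chain(head, bio);                        /* tail completes after head  */

              generic_make_request(bio);                   /* remainder handled later,
                                                              depth-first                */
              return head;                                 /* caller maps the head now   */
      }
      ----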
    • dm: fix comment above dm_accept_partial_bio · c06b3e58
      Authored by NeilBrown
      Clarify that dm_accept_partial_bio isn't allowed for REQ_OP_ZONE_RESET
      bios.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  5. 11 November 2017 (3 commits)
    • dm: small cleanup in dm_get_md() · 49de5769
      Authored by Mike Snitzer
      Makes dm_get_md() and dm_get_from_kobject() have similar code.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix race between dm_get_from_kobject() and __dm_destroy() · b9a41d21
      Authored by Hou Tao
      The following BUG_ON was hit when testing repeat creation and removal of
      DM devices:
      
          kernel BUG at drivers/md/dm.c:2919!
          CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
          Call Trace:
           [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
           [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
           [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
           [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
           [<ffffffff811de257>] kernfs_seq_show+0x23/0x25
           [<ffffffff81199118>] seq_read+0x16f/0x325
           [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
           [<ffffffff8117b625>] __vfs_read+0x26/0x9d
           [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
           [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
           [<ffffffff8117be9d>] vfs_read+0x8f/0xcf
           [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
           [<ffffffff8117c686>] SyS_read+0x4b/0x76
           [<ffffffff817b606e>] system_call_fastpath+0x12/0x71
      
      The bug can be easily triggered if an extra delay (e.g. 10ms) is added
      between the test of DMF_FREEING & DMF_DELETING and dm_get() in
      dm_get_from_kobject().
      
      To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
      dm_get() are done in an atomic way, so _minor_lock is used.
      
      The other callers of dm_get() have also been checked to be OK: some
      callers invoke dm_get() under _minor_lock, some callers invoke it under
      _hash_lock, and dm_start_request() invokes it after increasing
      md->open_count.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
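      The shape of the fix (a sketch following drivers/md/dm.c naming, not the verbatim
      patch): the flag test and the reference grab happen under _minor_lock, the same
      lock __dm_destroy() takes, so the device can no longer disappear between the two.
      ----
      struct mapped_device *dm_get_from_kobject(struct kobject *kobj)
      {
              struct mapped_device *md;

              md = container_of(kobj, struct mapped_device, kobj_holder.kobj);

              spin_lock(&_minor_lock);
              /* test and reference now form one atomic step w.r.t. __dm_destroy() */
              if (test_bit(DMF_FREEING, &md->flags) || dm_deleting_md(md)) {
                      md = NULL;
                      goto out;
              }
              dm_get(md);
      out:
              spin_unlock(&_minor_lock);

              return md;
      }
      ----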
    • dm: allocate struct mapped_device with kvzalloc · 856eb091
      Authored by Mikulas Patocka
      The structure srcu_struct can be very big; its size is proportional to the
      value of CONFIG_NR_CPUS.  The Fedora kernel has CONFIG_NR_CPUS set to 8192, so
      the field io_barrier in struct mapped_device occupies 84kB in the debugging
      kernel and 50kB in the non-debugging kernel.  The large size may result in
      failure of the function kzalloc_node().
      
      In order to avoid the allocation failure, we use the function
      kvzalloc_node(), which falls back to vmalloc if a large contiguous
      chunk of memory is not available.  This patch also moves the field
      io_barrier to the last position of struct mapped_device - the reason is
      that on many processor architectures, short memory offsets result in
      smaller code than long memory offsets - on x86-64 it reduces code size by
      320 bytes.
      
      Note to stable kernel maintainers - the kernels 4.11 and older don't have
      the function kvzalloc_node, you can use the function vzalloc_node instead.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
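      The allocation change in a nutshell (a sketch with error handling and NUMA node
      selection simplified): kvzalloc_node() behaves like kzalloc_node() but falls back
      to vmalloc for large sizes, and the result must be released with kvfree().
      ----
      static struct mapped_device *alloc_md_sketch(void)
      {
              /* may fall back to vmalloc when contiguous pages are unavailable */
              struct mapped_device *md = kvzalloc_node(sizeof(*md), GFP_KERNEL,
                                                       NUMA_NO_NODE);

              return md;      /* NULL only if even the vmalloc fallback failed */
      }

      static void free_md_sketch(struct mapped_device *md)
      {
              kvfree(md);     /* correct for both kmalloc- and vmalloc-backed memory */
      }
      ----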
  6. 25 October 2017 (2 commits)
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Authored by Mark Rutland
      
      Please do not apply this to mainline directly; instead, please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • dm: convert table_device.count from atomic_t to refcount_t · b0b4d7c6
      Authored by Elena Reshetova
      atomic_t variables are currently used to implement reference
      counters with the following properties:
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situations and be exploitable.
      
      The variable table_device.count is used as a pure reference counter.
      Convert it to refcount_t and fix up the operations.
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: David Windsor <dwindsor@gmail.com>
      Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
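      The conversion follows the usual atomic_t-to-refcount_t mapping.  A sketch with a
      stand-in struct (table_device's real fields live in drivers/md/dm.c):
      ----
      #include <linux/refcount.h>
      #include <linux/slab.h>

      struct td_example {                                /* stand-in for table_device */
              refcount_t count;
      };

      static struct td_example *td_new(void)
      {
              struct td_example *td = kzalloc(sizeof(*td), GFP_KERNEL);

              if (td)
                      refcount_set(&td->count, 1);       /* was atomic_set(&td->count, 1) */
              return td;
      }

      static void td_get(struct td_example *td)
      {
              refcount_inc(&td->count);                  /* was atomic_inc(); warns on overflow */
      }

      static void td_put(struct td_example *td)
      {
              if (refcount_dec_and_test(&td->count))     /* was atomic_dec_and_test() */
                      kfree(td);
      }
      ----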
  7. 06 October 2017 (1 commit)
  8. 25 September 2017 (1 commit)
    • dm ioctl: fix alignment of event number in the device list · 62e08243
      Authored by Mikulas Patocka
      The size of struct dm_name_list is different on 32-bit and 64-bit
      kernels (so "(nl + 1)" differs between 32-bit and 64-bit kernels).
      
      This mismatch caused some harmless difference in padding when using a 32-bit
      or 64-bit kernel.  Commit 23d70c5e ("dm ioctl: report event number in
      DM_LIST_DEVICES") added reporting of the event number in the output of
      DM_LIST_DEVICES_CMD.  This difference in padding makes it impossible for
      userspace to determine the location of the event number (the location
      would be different when running on 32-bit and 64-bit kernels).
      
      Fix the padding by using offsetof(struct dm_name_list, name) instead of
      sizeof(struct dm_name_list) to determine the location of entries.
      
      Also, the ioctl version number is incremented to 37 so that userspace
      can use the version number to determine that the event number is present
      and correctly located.
      
      In addition, a global event is now raised when a DM device is created,
      removed, renamed, or when a table is swapped, so that the user can monitor
      for device changes.
      Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
      Fixes: 23d70c5e ("dm ioctl: report event number in DM_LIST_DEVICES")
      Cc: stable@vger.kernel.org # 4.13
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
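      The layout issue comes down to how the location after an entry's name is computed;
      a sketch of the offsetof()-based calculation (the 8-byte alignment shown here is
      illustrative, and entry_event_nr() is a hypothetical helper -- dm-ioctl.c's own
      alignment helpers are authoritative):
      ----
      /* Using offsetof(..., name) instead of sizeof(struct dm_name_list) keeps the
       * computed offset identical on 32-bit and 64-bit kernels, so userspace can
       * find the event number that follows the name reliably. */
      static uint32_t *entry_event_nr(struct dm_name_list *nl)
      {
              size_t off = offsetof(struct dm_name_list, name) + strlen(nl->name) + 1;

              return (uint32_t *)((char *)nl + ALIGN(off, 8));
      }
      ----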
  9. 11 September 2017 (1 commit)
    • dax: remove the pmem_dax_ops->flush abstraction · c3ca015f
      Authored by Mikulas Patocka
      Commit abebfbe2 ("dm: add ->flush() dax operation support") is
      buggy. A DM device may be composed of multiple underlying devices and
      all of them need to be flushed. That commit just routes the flush
      request to the first device and ignores the other devices.
      
      It could be fixed by adding more complex logic to the device mapper. But
      there is only one implementation of the method pmem_dax_ops->flush - that
      is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
      don't need the pmem_dax_ops->flush abstraction at all; we can call
      arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
      can't ever reach anything different from arch_wb_cache_pmem().
      
      It should also be pointed out that some uses of persistent memory need to
      flush only a very small amount of data (such as one cacheline), and it would
      be overkill to go through the device mapper machinery for a single flushed
      cache line.
      
      Fix this by removing the pmem_dax_ops->flush abstraction and call
      arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
      mapper code that forwards the flushes.
      
      Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
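      After the change the flush path is essentially direct.  A sketch of the resulting
      shape (drivers/dax/super.c carries the real guards and the !CONFIG_ARCH_HAS_PMEM_API
      stub):
      ----
      /* dax_flush() writes back CPU caches for the range itself, with no
       * per-driver ->flush method in between. */
      void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
      {
              if (unlikely(!dax_write_cache_enabled(dax_dev)))
                      return;

              arch_wb_cache_pmem(addr, size);
      }
      ----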
  10. 28 August 2017 (2 commits)
  11. 24 August 2017 (1 commit)
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
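      For drivers, the visible change is how a bio is pointed at a device.  A sketch
      (field names as introduced by this change; point_bio_at_disk() is a hypothetical
      helper for illustration):
      ----
      static void point_bio_at_disk(struct bio *bio, struct gendisk *disk,
                                    struct block_device *bdev)
      {
              /* before this change: bio->bi_bdev = bdev; */

              bio->bi_disk = disk;      /* struct gendisk *, one per block device          */
              bio->bi_partno = 0;       /* partition index used by generic_make_request() */

              /* or, when a block_device is at hand, the helper does the same: */
              bio_set_dev(bio, bdev);
      }
      ----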
  12. 10 August 2017 (1 commit)
  13. 04 July 2017 (1 commit)
  14. 28 June 2017 (1 commit)
  15. 19 June 2017 (6 commits)
  16. 16 June 2017 (1 commit)
  17. 10 June 2017 (1 commit)
  18. 09 June 2017 (2 commits)