Commits · 0618764cb25f6fa9fb31152995de42a8a0496475 · openanolis / cloud-kernel

16 Apr, 2015 23 commits

dm crypt: fix deadlock when async crypto algorithm returns -EBUSY · 0618764c

Authored by Ben Collins on Apr 03, 2015

I suspect this doesn't show up for most anyone because software
algorithms typically don't have a sense of being too busy.  However,
when working with the Freescale CAAM driver it will return -EBUSY on
occasion under heavy load -- which resulted in dm-crypt deadlock.

After checking the logic in some other drivers, the scheme for
crypt_convert() and its callback, kcryptd_async_done(), was not
correctly laid out to properly handle -EBUSY or -EINPROGRESS.

Fix this by using the completion for both -EBUSY and -EINPROGRESS.  Now
crypt_convert()'s use of completion is comparable to
af_alg_wait_for_completion().  Similarly, kcryptd_async_done() follows
the pattern used in af_alg_complete().

Before this fix dm-crypt would lock up within 1-2 minutes running with
the CAAM driver.  Fix was regression tested against software algorithms
on PPC32 and x86_64, and things seem perfectly happy there as well.
Signed-off-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
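
A minimal userspace sketch of the return-code handling this message describes, not the dm-crypt code itself. The names `classify_submit`, `callback_is_final`, and the `submit_action` enum are hypothetical; the sketch only assumes that both -EINPROGRESS and -EBUSY mean "wait on the completion", and that a callback invoked with -EINPROGRESS is a notification rather than the final result.

```c
#include <errno.h>
#include <stdbool.h>

/* What the submitter should do after handing a request to the driver. */
enum submit_action {
    SUBMIT_PROCEED, /* finished synchronously, move to the next request */
    SUBMIT_WAIT,    /* accepted asynchronously, wait on the completion  */
    SUBMIT_FAIL     /* real error, abort the conversion                 */
};

/* Submitter side: treat -EBUSY exactly like -EINPROGRESS.  Both mean the
 * driver has taken the request and the callback will fire later. */
enum submit_action classify_submit(int rc)
{
    switch (rc) {
    case 0:
        return SUBMIT_PROCEED;
    case -EINPROGRESS:
    case -EBUSY:
        return SUBMIT_WAIT;
    default:
        return SUBMIT_FAIL;
    }
}

/* Callback side: a call with -EINPROGRESS only signals that a previously
 * busy request is now in flight; any other value is the final status and
 * is the point at which the waiter should be woken. */
bool callback_is_final(int err)
{
    return err != -EINPROGRESS;
}
```

A caller would wait on its completion object whenever `classify_submit()` returns `SUBMIT_WAIT`, and the callback would signal that object only when `callback_is_final()` is true.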


dm crypt: leverage immutable biovecs when decrypting on read · 59779079

Authored by Mike Snitzer on Apr 09, 2015

Commit 003b5c57 ("block: Convert drivers to immutable biovecs")
stopped short of changing dm-crypt to leverage the fact that the biovec
array of a bio will no longer be modified.

Switch to using bio_clone_fast() when cloning bios for decryption after
read.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm crypt: update URLs to new cryptsetup project page · e44f23b3

Authored by Milan Broz on Apr 05, 2015

Cryptsetup home page moved to GitLab.
Also remove link to abandoned Truecrypt page.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: add log writes target · 0e9cebe7

Authored by Josef Bacik on Mar 20, 2015

Introduce a new target that is meant for file system developers to test file
system integrity at particular points in the life of a file system.  We capture
all write requests and associated data and log them to a separate device
for later replay.  There is a userspace utility to do this replay.  The
idea behind this is to give file system developers a tool to verify that
the file system is always consistent.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm table: use bool function return values of true/false not 1/0 · 7f61f5a0

Authored by Joe Perches on Mar 30, 2015

Use the normal return values for bool functions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm verity: add error handling modes for corrupted blocks · 65ff5b7d

Authored by Sami Tolvanen on Mar 18, 2015

Add device specific modes to dm-verity to specify how corrupted
blocks should be handled.  The following modes are defined:

  - DM_VERITY_MODE_EIO is the default behavior, where reading a
    corrupted block results in -EIO.

  - DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does
    not block the read.

  - DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted
    block is discovered.

In addition, each mode sends a uevent to notify userspace of
corruption and to allow further recovery actions.

The driver defaults to previous behavior (DM_VERITY_MODE_EIO)
and other modes can be enabled with an additional parameter to
the verity table.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
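
The three modes can be pictured as a small dispatch on the configured mode. This is a hedged sketch, not the dm-verity implementation: `struct verity_outcome` and `handle_corruption()` are hypothetical names, only the three `DM_VERITY_MODE_*` identifiers come from the message, and the read result reported in restart mode is an assumption (after a restart it no longer matters).

```c
#include <errno.h>
#include <stdbool.h>

enum verity_mode {
    DM_VERITY_MODE_EIO,
    DM_VERITY_MODE_LOGGING,
    DM_VERITY_MODE_RESTART
};

/* Hypothetical summary of what happens when a corrupted block is found. */
struct verity_outcome {
    int  read_result;      /* 0 lets the read through, -EIO fails it   */
    bool notify_userspace; /* every mode reports the corruption        */
    bool restart;          /* true when the system should be restarted */
};

struct verity_outcome handle_corruption(enum verity_mode mode)
{
    /* Default behaviour: fail the read and notify userspace. */
    struct verity_outcome out = { -EIO, true, false };

    switch (mode) {
    case DM_VERITY_MODE_LOGGING:
        out.read_result = 0;   /* log only, do not block the read */
        break;
    case DM_VERITY_MODE_RESTART:
        out.restart = true;    /* read result is moot (assumption) */
        break;
    case DM_VERITY_MODE_EIO:
        break;                 /* keep the default */
    }
    return out;
}
```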


dm thin: remove stale 'trim' message documentation · 0e0e32c1

Authored by Mike Snitzer on Mar 18, 2015

The 'trim' message wasn't ever implemented.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm delay: use msecs_to_jiffies for time conversion · aca607ba

Authored by Nicholas Mc Guire on Mar 17, 2015

Converting milliseconds to jiffies by "val * HZ / 1000" is technically
OK but msecs_to_jiffies(val) is the cleaner solution and handles all
corner cases correctly.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
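
One corner case is easy to show in userspace. The sketch below is not the kernel's msecs_to_jiffies(); it assumes a tick rate of 100 (`SKETCH_HZ`, a made-up constant), assumes the tick rate divides 1000 evenly, and assumes the helper rounds up, so that a short nonzero delay does not collapse to zero ticks. Both function names are hypothetical.

```c
#define SKETCH_HZ    100UL   /* assumed tick rate for illustration only */
#define MSEC_PER_SEC 1000UL

/* Open-coded conversion of the "val * HZ / 1000" kind: truncates toward
 * zero, and the multiplication can wrap for very large inputs. */
unsigned long naive_ms_to_ticks(unsigned long ms)
{
    return ms * SKETCH_HZ / MSEC_PER_SEC;
}

/* Rounding-up conversion.  Dividing first avoids the wrap, and the
 * remainder test bumps any partial tick up to a whole one. */
unsigned long roundup_ms_to_ticks(unsigned long ms)
{
    unsigned long ms_per_tick = MSEC_PER_SEC / SKETCH_HZ;

    return ms / ms_per_tick + (ms % ms_per_tick != 0);
}
```

With these assumptions a 5 ms delay becomes 0 ticks in the open-coded form and 1 tick in the rounding form, while exact multiples such as 1000 ms give the same answer in both.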


dm log userspace base: fix compile warning · 18cc980a

Authored by Nicholas Mc Guire on Mar 18, 2015

This fixes up a compile warning [-Wunused-but-set-variable] - given the
comment in userspace_set_region_sync() the non-reporting of errors is
intentional so the return value can be dropped to make gcc happy.

Also, fix typo in comment.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm log userspace transfer: match wait_for_completion_timeout return type · c32a512f

Authored by Nicholas Mc Guire on Mar 15, 2015

Return type of wait_for_completion_timeout() is unsigned long not int.
An appropriately named unsigned long is added and the assignment fixed.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm table: fall back to getting device using name_to_dev_t() · 644bda6f

Authored by Dan Ehrenberg on Feb 10, 2015

If a device is used as the root filesystem, it can't be built
off of devices which are within the root filesystem (just like
command line arguments to root=).  For this reason, Linux has a
pseudo-filesystem for root= and MD initialization (based on the
function name_to_dev_t) which handles different ways of specifying
devices including PARTUUID and major:minor.

Switch to using name_to_dev_t() in dm_get_device().  Rather than
having DM assume that all things which are not major:minor are paths in
an already-mounted filesystem, change dm_get_device() to first attempt
to look up the device in the filesystem, and if not found it will fall
back to using name_to_dev_t().

In terms of backwards compatibility, there are some cases where
behavior will be different:
- If you have a file in the current working directory named 1:2 and
  you initialize DM there, then it will try to use that file rather
  than the disk with that major:minor pair as a backing device.
- Similarly for other bdev types which name_to_dev_t() knows how to
  interpret, the previous behavior was to repeatedly check for the
  existence of the file (e.g., while waiting for rootfs to come up)
  but the new behavior is to use the name_to_dev_t() interpretation.
  For example, if you have a file named /dev/ubiblock0_0 which is
  a symlink to /dev/sda3, but it is not yet present when DM starts
  to initialize, then the name_to_dev_t() interpretation will take
  precedence.

These incompatibilities would only show up in really strange setups
with bad practices so we shouldn't have to worry about them.
Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
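
The lookup order described in this message (filesystem first, name_to_dev_t() as the fallback) can be sketched in userspace with stand-ins. Everything below is hypothetical: `fake_path_lookup()` pretends a single path exists, and `fake_name_to_dev()` understands only plain major:minor strings, whereas the real name_to_dev_t() accepts more forms such as PARTUUID.

```c
#include <stdio.h>
#include <string.h>

struct devnum {
    unsigned maj;
    unsigned min;
};

/* Stand-in for a filesystem lookup: knows exactly one device node. */
int fake_path_lookup(const char *name, struct devnum *out)
{
    if (strcmp(name, "/dev/sda3") == 0) {
        out->maj = 8;
        out->min = 3;
        return 0;
    }
    return -1;
}

/* Stand-in for name_to_dev_t(): accepts only "major:minor". */
int fake_name_to_dev(const char *name, struct devnum *out)
{
    char trailing;

    if (sscanf(name, "%u:%u%c", &out->maj, &out->min, &trailing) == 2)
        return 0;
    return -1;
}

/* Try the filesystem first; fall back to the name parser if not found. */
int resolve_dev(const char *name, struct devnum *out)
{
    if (fake_path_lookup(name, out) == 0)
        return 0;
    return fake_name_to_dev(name, out);
}
```

Because the path lookup runs first, a file that happens to be named like a major:minor pair would win over the numeric interpretation, which is the first compatibility difference the message lists.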


init: stricter checking of major:minor root= values · 283e7ad0

Authored by Dan Ehrenberg on Feb 10, 2015

In the kernel command-line, previously, root=1:2jakshflaksjdhfa would
be accepted and interpreted just like root=1:2. This patch adds
stricter checking so that additional characters after major:minor are
rejected by root=.

The goal of this change is to help in unifying DM's interpretation of
its block device argument by using existing kernel code (name_to_dev_t).
Since DM rejects malformed major:minor pairs, it seems reasonable for
root= to reject them as well.
Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
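
One way to get this kind of strictness in plain C is to ask sscanf() for one extra character after the minor number and accept the input only if that character is absent. The sketch below is a userspace illustration with a hypothetical function name, `parse_major_minor()`; it is not presented as the code in the patch.

```c
#include <stdio.h>

/* Returns 0 and fills *maj and *min only when name is exactly
 * "major:minor"; returns -1 for anything else. */
int parse_major_minor(const char *name, unsigned *maj, unsigned *min)
{
    char trailing;

    /* The trailing %c converts only if something follows the minor
     * number, so a result of exactly 2 means nothing trails it. */
    if (sscanf(name, "%u:%u%c", maj, min, &trailing) == 2)
        return 0;
    return -1;
}
```

With this check "1:2" is accepted, while "1:2jakshflaksjdhfa" is rejected because the extra conversion succeeds and sscanf() returns 3.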


init: export name_to_dev_t and mark name argument as const · e6e20a7a

Authored by Dan Ehrenberg on Feb 10, 2015

DM will switch its device lookup code to using name_to_dev_t() so it
must be exported.  Also, the @name argument should be marked const.
Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8

Authored by Mike Snitzer on Mar 11, 2015

Request-based DM's blk-mq support defaults to off; but a user can easily
change the default using the dm_mod.use_blk_mq module/boot option.

Also, you can check what mode a given request-based DM device is using
with: cat /sys/block/dm-X/dm/use_blk_mq

This change enabled further cleanup and reduced work (e.g. the
md->io_pool and md->rq_pool aren't created if using blk-mq).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342

Authored by Mike Snitzer on Mar 10, 2015

dm_mq_queue_rq() is in atomic context so care must be taken to not
sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
and dm-mpath's call to blk_get_request().  In the future the bioset
allocations will hopefully go away (by removing support for partial
completions of bios in a cloned request).

Also prepare for supporting DM blk-mq on top of old-style request_fn
device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
will still be used to queue work if blk-mq is used on top of old-style
request_fn device(s).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: add full blk-mq support to request-based DM · bfebd1cd

Authored by Mike Snitzer on Mar 08, 2015

Commit e5863d9a ("dm: allocate requests in target when stacking on
blk-mq devices") served as the first step toward fully utilizing blk-mq
in request-based DM -- it enabled stacking an old-style (request_fn)
request_queue on top of the underlying blk-mq device(s).  That first step
didn't improve performance of DM multipath on top of fast blk-mq devices
(e.g. NVMe) because the top-level old-style request_queue was severely
limited by the queue_lock.

The second step offered here enables stacking a blk-mq request_queue
on top of the underlying blk-mq device(s).  This unlocks significant
performance gains on fast blk-mq devices.  Keith Busch tested on his NVMe
testbed and offered this really positive news:

 "Just providing a performance update. All my fio tests are getting
  roughly equal performance whether accessed through the raw block
  device or the multipath device mapper (~470k IOPS). I could only push
  ~20% of the raw iops through dm before this conversion, so this latest
  tree is looking really solid from a performance standpoint."
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Keith Busch <keith.busch@intel.com>

dm: impose configurable deadline for dm_request_fn's merge heuristic · 0ce65797

Authored by Mike Snitzer on Feb 26, 2015

Otherwise, for sequential workloads, the dm_request_fn can allow
excessive request merging at the expense of increased service time.

Add a per-device sysfs attribute to allow the user to control how long a
request, that is a reasonable merge candidate, can be queued on the
request queue. The resolution of this request dispatch deadline is in
microseconds (ranging from 1 to 100000 usecs). To set a 20us deadline:
echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline

The dm_request_fn's merge heuristic and associated extra accounting is
disabled by default (rq_based_seq_io_merge_deadline is 0).

This sysfs attribute is not applicable to bio-based DM devices so it
will only ever report 0 for them.

By allowing a request to remain on the queue it will block other
requests on the queue. But introducing a short dequeue delay has proven
very effective at enabling certain sequential IO workloads on really
fast, yet IOPS constrained, devices to build up slightly larger IOs --
yielding 90+% throughput improvements. Having precise control over the
time taken to wait for larger requests to build affords control beyond
that of waiting for certain IO sizes to accumulate (which would require
a deadline anyway). This knob will only ever make sense with sequential
IO workloads and the particular value used is storage configuration
specific.

Given the expected niche use-case for when this knob is useful it has
been deemed acceptable to expose this relatively crude method for
crafting optimal IO on specific storage -- especially given the solution
is simple yet effective. In the context of DM multipath, it is
advisable to tune this sysfs attribute to a value that offers the best
performance for the common case (e.g. if 4 paths are expected active,
tune for that; if paths fail then performance may be slightly reduced).

Alternatives were explored to have request-based DM autotune this value
(e.g. if/when paths fail) but they were quickly deemed too fragile and
complex to warrant further design and development time. If this problem
proves more common as faster storage emerges we'll have to look at
elevating a generic solution into the block core.
Tested-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
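
A store handler for a knob like this has to turn the text written by `echo` into a bounded number. The sketch below is a userspace illustration only: `parse_merge_deadline()` and `MAX_DEADLINE_USECS` are hypothetical names, the 100000 limit is taken from the message, and rejecting out-of-range input is an assumption (the kernel could equally clamp it).

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_DEADLINE_USECS 100000UL /* upper bound stated in the message */

/* Parses a decimal microsecond value, optionally followed by the newline
 * that echo appends.  Returns 0 on success, -EINVAL on bad input. */
int parse_merge_deadline(const char *buf, unsigned long *usecs)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;             /* overflow or no digits at all */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;             /* trailing junk such as "20us" */
    if (v > MAX_DEADLINE_USECS)
        return -EINVAL;             /* assumption: reject, do not clamp */

    *usecs = v;                     /* 0 means the heuristic is disabled */
    return 0;
}
```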


dm sysfs: introduce ability to add writable attributes · b898320d

Authored by Mike Snitzer on Feb 27, 2015

Add DM_ATTR_RW() macro and establish .store method in dm_sysfs_ops.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: don't start current request if it would've merged with the previous · de3ec86d

Authored by Mike Snitzer on Feb 24, 2015

Request-based DM's dm_request_fn() is so fast to pull requests off the
queue that steps need to be taken to promote merging by avoiding request
processing if it makes sense.

If the current request would've merged with previous request let the
current request stay on the queue longer.
Suggested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
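
A hedged sketch of what "would've merged with the previous request" can mean: the new request starts where the last dispatched one ended and goes in the same direction. The structures and function names below are hypothetical and simplified; they are not the fields or helpers used by request-based DM.

```c
#include <stdbool.h>

/* Simplified request: start sector, length in sectors, direction. */
struct rq_sketch {
    unsigned long long pos;
    unsigned int sectors;
    bool is_write;
};

/* What is remembered about the most recently dispatched request. */
struct last_rq {
    unsigned long long end_pos;
    bool is_write;
    bool valid;
};

/* True when rq is contiguous with, and in the same direction as, the
 * previous request.  Such a request is left on the queue a little
 * longer so the elevator gets a chance to merge it. */
bool would_have_merged(const struct last_rq *last,
                       const struct rq_sketch *rq)
{
    return last->valid &&
           last->is_write == rq->is_write &&
           last->end_pos == rq->pos;
}

/* Remember where a dispatched request ended. */
void record_dispatched(struct last_rq *last, const struct rq_sketch *rq)
{
    last->end_pos = rq->pos + rq->sectors;
    last->is_write = rq->is_write;
    last->valid = true;
}
```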


dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms · d548b34b

Authored by Mike Snitzer on Mar 05, 2015

Commit 7eaceacc ("block: remove per-queue plugging") didn't justify
DM's use of a 100ms delay; such an extended delay is a liability when
there is reason to re-kick the queue.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: don't schedule delayed run of the queue if nothing to do · 9d1deb83

Authored by Mike Snitzer on Feb 24, 2015

In request-based DM's dm_request_fn(), if blk_peek_request() returns
NULL just return.  Avoids unnecessary blk_delay_queue().
Reported-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: only run the queue on completion if congested or no requests pending · 9a0e609e

Authored by Mike Snitzer on Feb 24, 2015

On really fast storage it can be beneficial to delay running the
request_queue to allow the elevator more opportunity to merge requests.

Otherwise, it has been observed that requests are being sent to
q->request_fn much quicker than is ideal on IOPS-bound backends.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: remove request-based logic from make_request_fn wrapper · ff36ab34

Authored by Mike Snitzer on Feb 23, 2015

The old dm_request() method used for q->make_request_fn had a branch for
request-based DM support but it isn't needed given that
dm_init_request_based_queue() sets it to the standard blk_queue_bio()
anyway.

Clean up dm_init_md_queue() to be DM device-type agnostic and have
dm_setup_md_queue() properly finish queue setup based on DM device-type
(bio-based vs request-based).

A followup block patch can be made to remove the export for
blk_queue_bio() now that DM no longer calls it directly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

01 Apr, 2015 11 commits

dm: remove request-based DM queue's lld_busy_fn hook · d56b9b28

Authored by Mike Snitzer on Feb 23, 2015

DM multipath is the only caller of blk_lld_busy() -- which calls a
queue's lld_busy_fn hook. Request-based DM doesn't support stacking
multipath devices so there is no reason to register the lld_busy_fn hook
on a multipath device's queue using blk_queue_lld_busy().

As such, remove functions dm_lld_busy and dm_table_any_busy_target.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: remove unnecessary wrapper around blk_lld_busy · 52b09914

Authored by Mike Snitzer on Feb 23, 2015

There is no need for DM to export a wrapper around the already exported
blk_lld_busy().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: rename __dm_get_reserved_ios() helper to __dm_get_module_param() · 09c2d531

Authored by Mike Snitzer on Feb 27, 2015

__dm_get_module_param() could be useful for future DM module parameters
besides those related to "reserved_ios".
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm switch: fix Documentation to use plain text · e73f6e8a

Authored by Mike Snitzer on Feb 27, 2015

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy mq: try not to writeback data that changed in the last second · e65ff870