1. 21 July 2016, 12 commits
    • dm snap: add fake origin_direct_access · f6e629bd
      Authored by Toshi Kani
      A DAX-capable mapped device is marked DM_TYPE_DAX_BIO_BASED,
      which supports both DAX and bio-based operations.  dm-snap
      needs to work with a DAX-capable device when bio-based
      operation is used.
      
      Add a fake origin_direct_access() to the origin device so that
      the origin device is also marked DM_TYPE_DAX_BIO_BASED for a
      DAX-capable device.  This allows the target's DM table to be
      extended.  dm-snap works normally when bio-based operation is
      used.
      
      dm-snap does not support DAX operation, so mounting a target
      device or snapshot device with the dax option fails.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm stripe: add DAX support · beec25b4
      Authored by Toshi Kani
      Change dm-stripe to implement the direct_access function,
      stripe_direct_access(), which maps the bdev and sector and
      calls the direct_access function of its physical target device.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm error: add DAX support · f8df1fdf
      Authored by Mike Snitzer
      Allow the error target to replace an existing DAX-enabled target.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm linear: add DAX support · 84b22f83
      Authored by Toshi Kani
      Change dm-linear to implement the direct_access function,
      linear_direct_access(), which maps the sector and calls the
      direct_access function of its physical target device.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add infrastructure for DAX support · 545ed20e
      Authored by Toshi Kani
      Change the mapped device to implement the direct_access function,
      dm_blk_direct_access(), which calls a target's direct_access
      function.  'struct target_type' is extended with a target
      direct_access interface.  This function limits the directly
      accessible size to the dm_target's limit with max_io_len().
      
      Add dm_table_supports_dax() to iterate over all targets and
      associated block devices to check for DAX support.  To add DAX
      support to a DM target, the target need only implement the
      direct_access function.
      
      Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that the
      mapped device supports DAX and is bio-based.  This new type is used
      to ensure that all target devices have DAX support and remain that
      way after QUEUE_FLAG_DAX is set on the mapped device.
      
      At initial table load, QUEUE_FLAG_DAX is set on the mapped device
      when its type is set to DM_TYPE_DAX_BIO_BASED.  Any subsequent
      table load to the mapped device must have the same type, or else it
      fails per the check in table_load().
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • Merge remote-tracking branch 'jens/for-4.8/core' into dm-4.8 · e9ccb945
      Authored by Mike Snitzer
      DM's DAX support depends on block core's newly added QUEUE_FLAG_DAX.
    • block: do not merge requests without consulting with io scheduler · 72ef799b
      Authored by Tahsin Erdogan
      Before merging a bio into an existing request, the io scheduler is
      called to get its approval first.  However, requests that come from
      a plug flush may get merged by the block layer without consulting
      the io scheduler.
      
      In the case of CFQ, this can cause fairness problems.  For
      instance, if a request gets merged into a low-weight cgroup's
      request, the high-weight cgroup now depends on the low-weight
      cgroup to get scheduled.  If the high-weight cgroup needs that io
      request to complete before submitting more requests, then it will
      also lose its timeslice.
      
      The following script demonstrates the problem.  Group g1 has a low
      weight; g2 and g3 have equal high weights, but g2's requests are
      adjacent to g1's requests, so they are subject to merging.  Due to
      these merges, g2 gets a poor disk time allocation.
      
      cat > cfq-merge-repro.sh << "EOF"
      #!/bin/bash
      set -e
      
      IO_ROOT=/mnt-cgroup/io
      
      mkdir -p $IO_ROOT
      
      if ! mount | grep -qw $IO_ROOT; then
        mount -t cgroup none -oblkio $IO_ROOT
      fi
      
      cd $IO_ROOT
      
      for i in g1 g2 g3; do
        if [ -d $i ]; then
          rmdir $i
        fi
      done
      
      mkdir g1 && echo 10 > g1/blkio.weight
      mkdir g2 && echo 495 > g2/blkio.weight
      mkdir g3 && echo 495 > g3/blkio.weight
      
      RUNTIME=10
      
      (echo $BASHPID > g1/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=0k &> /dev/null)&
      
      (echo $BASHPID > g2/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=64k &> /dev/null)&
      
      (echo $BASHPID > g3/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=256k &> /dev/null)&
      
      sleep $((RUNTIME+1))
      
      for i in g1 g2 g3; do
        echo ---- $i ----
        cat $i/blkio.time
      done
      
      EOF
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 162
      ---- g2 ----
      8:16 165
      ---- g3 ----
      8:16 686
      
      After applying the patch:
      
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 90
      ---- g2 ----
      8:16 445
      ---- g3 ----
      8:16 471
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Fix spelling in a source code comment · 68bdf1ac
      Authored by Bart Van Assche
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: expose QUEUE_FLAG_DAX in sysfs · ea6ca600
      Authored by Yigal Korman
      Provides the ability to identify DAX-enabled devices from
      userspace.
      Signed-off-by: Yigal Korman <yigal@plexistor.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: add QUEUE_FLAG_DAX for devices to advertise their DAX support · 163d4baa
      Authored by Toshi Kani
      Currently, the presence of direct_access() in
      block_device_operations indicates DAX support on a block device.
      Because block_device_operations is instantiated as 'const', this
      DAX capability cannot be enabled conditionally.
      
      In preparation for supporting DAX on device-mapper devices, add
      QUEUE_FLAG_DAX to the request_queue flags so devices can advertise
      their DAX support.  This allows the DAX capability to be set based
      on how the mapped device is composed.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <linux-s390@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • dm thin: fix a race condition between discarding and provisioning a block · 2a0fbffb
      Authored by Joe Thornber
      The discard passdown was being issued after the block was unmapped,
      which meant the block could be reprovisioned whilst the passdown
      discard was still in flight.
      
      We can only identify unshared blocks (safe to pass a discard down
      to) once they're unmapped and their ref count hits zero.  Block ref
      counts are now used to guard against concurrent allocation of
      blocks that are being discarded.  So now we unmap the block, issue
      passdown discards, and then immediately increment the ref counts
      for regions that have been discarded via passdown (this is safe
      because allocation occurs within the same thread).  We then
      decrement the ref counts once the passdown discard IO is complete
      -- signaling that these blocks may now be allocated.
      
      This fixes the potential for corruption that was reported here:
      https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html
      Reported-by: Dennis Yang <dennisyang@qnap.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm btree: fix a bug in dm_btree_find_next_single() · e7e0f730
      Authored by Joe Thornber
      dm_btree_find_next_single() can short-circuit the search for a block
      with a return of -ENODATA if all entries are higher than the search key
      passed to lower_bound().
      
      This hasn't been a problem because of the way the btree has been used by
      DM thinp.  But it must be fixed now in preparation for fixing the race
      in DM thinp's handling of simultaneous block discard vs allocation.
      Otherwise, once that fix is in place, some of the blocks in a discard
      would not be unmapped as expected.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 19 July 2016, 28 commits