提交 · ffcc39364160663cda1a3c358f4537302a92459b · openanolis / cloud-kernel

20 11月, 2014 3 次提交

dm: enhance internal suspend and resume interface · ffcc3936

由 Mike Snitzer 提交于 10月 28, 2014

Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast
-- dm-stats will continue using these methods to avoid all the extra
suspend/resume logic that is not needed in order to quickly flush IO.

Introduce dm_internal_suspend_noflush() variant that actually calls the
mapped_device's target callbacks -- otherwise target-specific hooks are
avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend). Common
code between dm_internal_{suspend_noflush,resume} and
dm_{suspend,resume} was factored out as __dm_{suspend,resume}.

Update dm_internal_{suspend_noflush,resume} to always take and release
the mapped_device's suspend_lock. Also update dm_{suspend,resume} to be
aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond
accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to
be cleared. Add lockdep annotation to dm_suspend() and dm_resume().

The existing DM_SUSPEND_FLAG remains unchanged.
DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and
cleared by dm_internal_resume().

Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
was already suspended when dm_internal_suspend_noflush() was called --
this can be thought of as a "nested suspend". A "nested suspend" can
occur with legacy userspace dm-thin code that might suspend all active
thin volumes before suspending the pool for resize.

But otherwise, in the normal dm-thin-pool suspend case moving forward:
the thin-pool will have DM_SUSPEND_FLAG set and all active thins from
that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.

Also add DM_INTERNAL_SUSPEND_FLAG to status report. This new
DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with
debugging (e.g. 'dmsetup info' will report an internally suspended
device accordingly).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NJoe Thornber <ejt@redhat.com>

ffcc3936

dm: add presuspend_undo hook to target_type · d67ee213

由 Mike Snitzer 提交于 10月 28, 2014

The DM thin-pool target now must undo the changes performed during
pool_presuspend() so introduce presuspend_undo hook in target_type.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NJoe Thornber <ejt@redhat.com>

d67ee213

dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctl · 4d341d82

由 Mike Snitzer 提交于 11月 16, 2014

No point checking if the device is suspended if the current target
doesn't even implement .ioctl
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

4d341d82

11 11月, 2014 4 次提交

dm: do not call dm_sync_table() when creating new devices · 41abc4e1

由 Hannes Reinecke 提交于 11月 05, 2014

When creating new devices dm_sync_table() calls
synchronize_rcu_expedited(), causing _all_ pending RCU pointers to be
flushed. This causes a latency overhead that is especially noticeable
when creating lots of devices.

And all of this is pointless as there are no old maps to be
disconnected, and hence no stale pointers which would need to be
cleared up.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

41abc4e1

dm: sparse: Annotate field with __rcu for checking · 6fa99520

由 Pranith Kumar 提交于 10月 28, 2014

Annotate the map field with __rcu since this is a rcu pointer which is checked
by sparse.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

6fa99520

dm: Use rcu_dereference() for accessing rcu pointer · 33423974

由 Pranith Kumar 提交于 10月 28, 2014

The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
while accessing it.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

33423974

dm: improve documentation and code clarity in dm_merge_bvec · 148e51ba

由 Mike Snitzer 提交于 10月 09, 2014

These code changes do not introduce a functional change.

But bio_add_page() will never attempt to build up a bio larger than
queue_max_sectors(). Similarly, bio_get_nr_vecs() is also bound by
queue_max_sectors(). Therefore, there is no point in allowing
dm_merge_bvec() to answer "how many sectors can a bio have at this
offset?" with anything larger than queue_max_sectors(). Using
queue_max_sectors() rather than BIO_MAX_SECTORS serves to more
accurately convey the limits that are being imposed.

Also, use unlikely() to clarify the fact that the defensive code in
dm_merge_bvec() relative to max_size going negative shouldn't ever
happen -- if it does happen there is a bug in the block layer for
requesting larger than dm_merge_bvec()'s initial response for a given
offset. Also, update a comment in dm_merge_bvec() relative to
max_hw_sectors_kb. And fix empty newline whitespace.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

148e51ba

06 10月, 2014 3 次提交

dm: allow active and inactive tables to share dm_devs · 86f1152b

由 Benjamin Marzinski 提交于 8月 13, 2014

Until this change, when loading a new DM table, DM core would re-open
all of the devices in the DM table. Now, DM core will avoid redundant
device opens (and closes when destroying the old table) if the old
table already has a device open using the same mode. This is achieved
by managing reference counts on the table_devices that DM core now
stores in the mapped_device structure (rather than in the dm_table
structure). So a mapped_device's active and inactive dm_tables' dm_dev
lists now just point to the dm_devs stored in the mapped_device's
table_devices list.

This improvement in DM core's device reference counting has the
side-effect of fixing a long-standing limitation of the multipath
target: a DM multipath table couldn't include any paths that were unusable
(failed). For example: if all paths have failed and you add a new,
working, path to the table; you can't use it since the table load would
fail due to it still containing failed paths. Now a re-load of a
multipath table can include failed devices and when those devices become
active again they can be used instantly.

The device list code in dm.c isn't a straight copy/paste from the code in
dm-table.c, but it's very close (aside from some variable renames). One
subtle difference is that find_table_device for the tables_devices list
will only match devices with the same name and mode. This is because we
don't want to upgrade a device's mode in the active table when an
inactive table is loaded.

Access to the mapped_device structure's tables_devices list requires a
mutex (tables_devices_lock), so that tables cannot be created and
destroyed concurrently.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

86f1152b

dm: use bioset_create_nobvec() · 3d8aab2d

由 Junichi Nomura 提交于 10月 03, 2014

Since DM core uses bio_clone_fast() for both bio-based and request-based
DM devices there is no need for DM's bioset to have a bvec mempool.

With this patch, on arch with 4KB page for example, memory usage will be
reduced by 64KB for each bio-based DM device and 1MB for each
request-based DM device.

For example, when you create 10,000 bio-based DM devices and 1,000
request-based DM devices, memory usage of biovec under no load is:
  # grep biovec /proc/slabinfo

  biovec-256        418068 418068   4096  ...
  biovec-128             0      0   2048  ...
  biovec-64              0      0   1024  ...
  biovec-16              0      0    256  ...

With this patch series applied, the usage becomes:
  # grep biovec /proc/slabinfo

  biovec-256           116    116   4096  ...
  biovec-128             0      0   2048  ...
  biovec-64              0      0   1024  ...
  biovec-16              0      0    256  ...

So 4096 * (418068 - 116) = 1.6GB of memory is saved in this example.
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3d8aab2d

dm: remove nr_iovecs parameter from alloc_tio() · 99778273

由 Junichi Nomura 提交于 10月 03, 2014

alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio.
alloc_tio() takes the number of bvecs to allocate for the clone-bio.
However, with v3.14's immutable biovec changes DM now uses
__bio_clone_fast() and no longer needs to allocate bvecs.

In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively
0.  __clone_and_map_simple_bio() looked like it was passing non-zero
nr_iovecs, but its value was always within the range of inline bvecs and
no allocation actually happened.  If allocation happened, the BUG_ON() in
__bio_clone_fast() would've triggered.

Remove the nr_iovecs parameter from alloc_tio() to prevent possible
future bio_alloc_bioset() mis-use of a new bioset interface that will no
longer allow bvecs to be allocated.

Also fix extra whitespace before the __bio_clone_fast() call in
__clone_and_map_simple_bio().
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

99778273

11 7月, 2014 1 次提交

dm: allocate a special workqueue for deferred device removal · acfe0ad7

由 Mikulas Patocka 提交于 6月 14, 2014

The commit 2c140a24 ("dm: allow remove to be deferred") introduced a
deferred removal feature for the device mapper.  When this feature is
used (by passing a flag DM_DEFERRED_REMOVE to DM_DEV_REMOVE_CMD ioctl)
and the user tries to remove a device that is currently in use, the
device will be removed automatically in the future when the last user
closes it.

Device mapper used the system workqueue to perform deferred removals.
However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items
scheduled for the system workqueue from their destructor.  If the
destructor itself is called from the system workqueue during deferred
removal, it introduces a possible deadlock - the workqueue tries to flush
itself.

Fix this possible deadlock by introducing a new workqueue for deferred
removals.  We allocate just one workqueue for all dm targets.  The
ability of dm targets to process IOs isn't dependent on deferred removal
of unused targets, so a deadlock due to shared workqueue isn't possible.

Also, cleanup local_init() to eliminate potential for returning success
on failure.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.13+

acfe0ad7

04 6月, 2014 4 次提交

dm: remove symbol export for dm_set_device_limits · 11f0431b

由 Mike Snitzer 提交于 6月 03, 2014

There is no need for code other than DM core to use dm_set_device_limits
so remove its EXPORT_SYMBOL_GPL. Also, cleanup a couple whitespace nits.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

11f0431b

dm: disable WRITE SAME if it fails · 7eee4ae2

由 Mike Snitzer 提交于 6月 02, 2014

Add DM core support for disabling WRITE SAME on first failure to both
request-based and bio-based targets. The need to disable WRITE SAME
stems from SCSI enabling it by default but then disabling it when it
fails. When SCSI does this it returns "permanent target failure, do
not retry" using -EREMOTEIO. Update DM core to only disable WRITE SAME
on failure if the returned error is -EREMOTEIO.

Commit f84cb8a4 ("dm mpath: disable WRITE SAME if it fails")
implemented multipath specific disabling of WRITE SAME if it fails.
However, as that commit detailed, the multipath-only solution doesn't go
far enough if bio-based DM targets are stacked ontop of the
request-based dm-multipath target (as is commonly done using dm-linear
to support partitions on multipath devices, via kpartx).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Tested-by: NAlex Chen <alex.chen@huawei.com>

7eee4ae2

dm: introduce dm_accept_partial_bio · 1dd40c3e

由 Mikulas Patocka 提交于 3月 14, 2014

The function dm_accept_partial_bio allows the target to specify how many
sectors of the current bio it will process.  If the target only wants to
accept part of the bio, it calls dm_accept_partial_bio and the DM core
sends the rest of the data in next bio.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1dd40c3e

dm: change sector_count member in clone_info from sector_t to unsigned · e0d6609a

由 Mikulas Patocka 提交于 3月 14, 2014

It is impossible to create bios with 2^23 or more sectors (the size is
stored as a 32-bit byte count in the bio). So we convert some sector_t
values to unsigned integers.

This is needed for the next commit ("dm: introduce
dm_accept_partial_bio") that replaces integer value arguments with
pointers, so the size of the integer must match.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e0d6609a

18 4月, 2014 1 次提交

arch: Mass conversion of smp_mb__*() · 4e857c58

由 Peter Zijlstra 提交于 3月 17, 2014

Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

4e857c58

16 4月, 2014 1 次提交

block: remove struct request buffer member · b4f42e28

由 Jens Axboe 提交于 4月 10, 2014

This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.
Signed-off-by: NJens Axboe <axboe@fb.com>

b4f42e28

28 3月, 2014 4 次提交

dm table: add dm_table_run_md_queue_async · 9974fa2c

由 Mike Snitzer 提交于 2月 28, 2014

Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.

Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>

9974fa2c

dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind · 9cdb8520

由 Monam Agarwal 提交于 3月 23, 2014

Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL).

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.  And in the
case of the NULL pointer, there is no structure to initialize.  So,
rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: NMonam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9cdb8520

dm: stop using bi_private · bfc6d41c

由 Mikulas Patocka 提交于 3月 04, 2014

Device mapper uses the bio structure's bi_private field as a pointer
to dm_target_io or dm_rq_clone_bio_info.  But a bio structure is
embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the
pointer to the structure that contains the bio can be found with the
container_of() macro.

Remove the use of bi_private and use container_of() instead.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

bfc6d41c

dm: remove dm_get_mapinfo · d70ab4fb

由 Mikulas Patocka 提交于 3月 03, 2014

Remove dm_get_mapinfo() because no target uses it.  Targets can allocate
per-bio data using ti->per_bio_data_size, this is much more flexible
than union map_info.

Leave union map_info only for the request-based multipath target's use.
Also delete the unused "unsigned long long ll" field of union map_info.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d70ab4fb

15 1月, 2014 1 次提交

dm sysfs: fix a module unload race · 2995fa78

由 Mikulas Patocka 提交于 1月 13, 2014

This reverts commit be35f486 ("dm: wait until embedded kobject is
released before destroying a device") and provides an improved fix.

The kobject release code that calls the completion must be placed in a
non-module file, otherwise there is a module unload race (if the process
calling dm_kobject_release is preempted and the DM module unloaded after
the completion is triggered, but before dm_kobject_release returns).

To fix this race, this patch moves the completion code to dm-builtin.c
which is always compiled directly into the kernel if BLK_DEV_DM is
selected.

The patch introduces a new dm_kobject_holder structure, its purpose is
to keep the completion and kobject in one place, so that it can be
accessed from non-module code without the need to export the layout of
struct mapped_device to that code.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

2995fa78

08 1月, 2014 2 次提交

dm: wait until embedded kobject is released before destroying a device · be35f486

由 Mikulas Patocka 提交于 1月 06, 2014

There may be other parts of the kernel holding a reference on the dm
kobject.  We must wait until all references are dropped before
deallocating the mapped_device structure.

The dm_kobject_release method signals that all references are dropped
via completion.  But dm_kobject_release doesn't free the kobject (which
is embedded in the mapped_device structure).

This is the sequence of operations:
* when destroying a DM device, call kobject_put from dm_sysfs_exit
* wait until all users stop using the kobject, when it happens the
  release method is called
* the release method signals the completion and should return without
  delay
* the dm device removal code that waits on the completion continues
* the dm device removal code drops the dm_mod reference the device had
* the dm device removal code frees the mapped_device structure that
  contains the kobject

Using kobject this way should avoid the module unload race that was
mentioned at the beginning of this thread:
https://lkml.org/lkml/2014/1/4/83Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

be35f486

dm: remove pointless kobject comparison in dm_get_from_kobject · 1ddd641d

由 Mikulas Patocka 提交于 1月 06, 2014

The comparison is always true and the compiler optimizes it out anyway.

Milan offered additional context relative to the original commit
784aae73 ("dm: add name and uuid to sysfs") which introduced the code:
"I think it is just relict of some experiments before I committed this
simple embedded sysfs kobj handling".
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Acked-by: NMilan Broz <gmazyland@gmail.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1ddd641d

24 11月, 2013 2 次提交

dm: Refactor for new bio cloning/splitting · 1c3b13e6

由 Kent Overstreet 提交于 10月 29, 2013

We need to convert the dm code to the new bvec_iter primitives which
respect bi_bvec_done; they also allow us to drastically simplify dm's
bio splitting code.

Also, it's no longer necessary to save/restore the bvec array anymore -
driver conversions for immutable bvecs are done, so drivers should never
be modifying it.

Also kill bio_sector_offset(), dm was the only user and it doesn't make
much sense anymore.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Reviewed-by: NMike Snitzer <snitzer@redhat.com>

1c3b13e6

block: Abstract out bvec iterator · 4f024f37

由 Kent Overstreet 提交于 10月 11, 2013

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6

4f024f37

10 11月, 2013 1 次提交

dm: allow remove to be deferred · 2c140a24

由 Mikulas Patocka 提交于 11月 01, 2013

This patch allows the removal of an open device to be deferred until
it is closed.  (Previously such a removal attempt would fail.)

The deferred remove functionality is enabled by setting the flag
DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
DM_REMOVE_ALL ioctl.

On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
the device was removed immediately or flagged to be removed on close -
if the flag is clear, the device was removed.

On return from DM_DEV_STATUS and other ioctls, the flag
DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
closure.

A device that is scheduled to be deleted can be revived using the
message "@cancel_deferred_remove". This message clears the
DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2c140a24

23 9月, 2013 3 次提交

dm: add reserved_bio_based_ios module parameter · e8603136

由 Mike Snitzer 提交于 9月 12, 2013

Allow user to change the number of IOs that are reserved by
bio-based DM's mempools by writing to this file:
/sys/module/dm_mod/parameters/reserved_bio_based_ios

The default value is RESERVED_BIO_BASED_IOS (16).  The maximum allowed
value is RESERVED_MAX_IOS (1024).

Export dm_get_reserved_bio_based_ios() for use by DM targets and core
code.  Switch to sizing dm-io's mempool and bioset using DM core's
configurable 'reserved_bio_based_ios'.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NFrank Mayhar <fmayhar@google.com>

e8603136

dm: add reserved_rq_based_ios module parameter · f4790826

由 Mike Snitzer 提交于 9月 12, 2013

Allow user to change the number of IOs that are reserved by
request-based DM's mempools by writing to this file:
/sys/module/dm_mod/parameters/reserved_rq_based_ios

The default value is RESERVED_REQUEST_BASED_IOS (256).  The maximum
allowed value is RESERVED_MAX_IOS (1024).

Export dm_get_reserved_rq_based_ios() for use by DM targets and core
code.  Switch to sizing dm-mpath's mempool using DM core's configurable
'reserved_rq_based_ios'.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NFrank Mayhar <fmayhar@google.com>
Acked-by: NMikulas Patocka <mpatocka@redhat.com>

f4790826

dm: lower bio-based mempool reservation · 6cfa5857

由 Mike Snitzer 提交于 9月 12, 2013

Bio-based device mapper processing doesn't need larger mempools (like
request-based DM does), so lower the number of reserved entries for
bio-based operation.  16 was already used for bio-based DM's bioset
but mistakenly wasn't used for it's _io_cache.

Formalize difference between bio-based and request-based defaults by
introducing RESERVED_BIO_BASED_IOS and RESERVED_REQUEST_BASED_IOS.

(based on older code from Mikulas Patocka)
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NFrank Mayhar <fmayhar@google.com>
Acked-by: NMikulas Patocka <mpatocka@redhat.com>

6cfa5857

20 9月, 2013 1 次提交

dm mpath: disable WRITE SAME if it fails · f84cb8a4

由 Mike Snitzer 提交于 9月 19, 2013

Workaround the SCSI layer's problematic WRITE SAME heuristics by
disabling WRITE SAME in the DM multipath device's queue_limits if an
underlying device disabled it.

The WRITE SAME heuristics, with both the original commit 5db44863
("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
WRITE SAME(10) even without successfully determining it is supported.
After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
for the device (by setting sdkp->device->no_write_same which results in
'max_write_same_sectors' in device's queue_limits to be set to 0).

When a device is stacked ontop of such a SCSI device any changes to that
SCSI device's queue_limits do not automatically propagate up the stack.
As such, a DM multipath device will not have its WRITE SAME support
disabled.  This causes the block layer to continue to issue WRITE SAME
requests to the mpath device which causes paths to fail and (if mpath IO
isn't configured to queue when no paths are available) it will result in
actual IO errors to the upper layers.

This fix doesn't help configurations that have additional devices
stacked ontop of the mpath device (e.g. LVM created linear DM devices
ontop).  A proper fix that restacks all the queue_limits from the bottom
of the device stack up will need to be explored if SCSI will continue to
use this model of optimistically allowing op codes and then disabling
them after they fail for the first time.

Before this patch:

EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
end_request: critical target error, dev dm-6, sector 528
dm-6: WRITE SAME failed. Manually zeroing.
device-mapper: multipath: Failing path 8:112.
end_request: I/O error, dev dm-6, sector 4616
dm-6: WRITE SAME failed. Manually zeroing.
end_request: I/O error, dev dm-6, sector 4616
end_request: I/O error, dev dm-6, sector 5640
end_request: I/O error, dev dm-6, sector 6664
end_request: I/O error, dev dm-6, sector 7688
end_request: I/O error, dev dm-6, sector 524288
Buffer I/O error on device dm-6, logical block 65536
lost page write due to I/O error on dm-6
JBD2: Error -5 detected when updating journal superblock for dm-6-8.
end_request: I/O error, dev dm-6, sector 524296
Aborting journal on device dm-6-8.
end_request: I/O error, dev dm-6, sector 524288
Buffer I/O error on device dm-6, logical block 65536
lost page write due to I/O error on dm-6
JBD2: Error -5 detected when updating journal superblock for dm-6-8.

# cat /sys/block/sdh/queue/write_same_max_bytes
0
# cat /sys/block/dm-6/queue/write_same_max_bytes
33553920

After this patch:

EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
end_request: critical target error, dev dm-6, sector 528
dm-6: WRITE SAME failed. Manually zeroing.

# cat /sys/block/sdh/queue/write_same_max_bytes
0
# cat /sys/block/dm-6/queue/write_same_max_bytes
0

It should be noted that WRITE SAME support wasn't enabled in DM
multipath until v3.10.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org # 3.10+

f84cb8a4

06 9月, 2013 2 次提交

dm: add statistics support · fd2ed4d2

由 Mikulas Patocka 提交于 8月 16, 2013

Support the collection of I/O statistics on user-defined regions of
a DM device.  If no regions are defined no statistics are collected so
there isn't any performance impact.  Only bio-based DM devices are
currently supported.

Each user-defined region specifies a starting sector, length and step.
Individual statistics will be collected for each step-sized area within
the range specified.

The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats but extra
counters (12 and 13) are provided: total time spent reading and
writing in milliseconds.  All these counters may be accessed by sending
the @stats_print message to the appropriate DM device via dmsetup.

The creation of DM statistics will allocate memory via kmalloc or
fallback to using vmalloc space.  At most, 1/4 of the overall system
memory may be allocated by DM statistics.  The admin can see how much
memory is used by reading
/sys/module/dm_mod/parameters/stats_current_allocated_bytes

See Documentation/device-mapper/statistics.txt for more details.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fd2ed4d2

dm ioctl: increase granularity of type_lock when loading table · 00c4fc3b

由 Mike Snitzer 提交于 8月 27, 2013

Hold the mapped device's type_lock before calling populate_table() since
it is where the table's type is determined based on the specified
targets.  There is no need to allow concurrent table loads to race to
establish the table's targets or type.

This eliminates the need to grab the lock in dm_table_set_type().

Also verify that the type_lock is held in both dm_set_md_type() and
dm_get_md_type().
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

00c4fc3b

23 8月, 2013 1 次提交

dm: stop using WQ_NON_REENTRANT · 670368a8

由 Tejun Heo 提交于 7月 30, 2013

dbf2576e ("workqueue: make all workqueues non-reentrant") made
WQ_NON_REENTRANT no-op and the flag is going away.  Remove its usages.

This patch doesn't introduce any behavior changes.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
Acked-by: NJoe Thornber <ejt@redhat.com>

670368a8

11 7月, 2013 3 次提交

dm: optimize reorder structure · 2a7faeb1

由 Mikulas Patocka 提交于 7月 10, 2013

This reorder actually improves performance by 20% (from 39.1s to 32.8s)
on x86-64 quad core Opteron.

I have no explanation for this, possibly it makes some other entries are
better cache-aligned.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

2a7faeb1

dm: optimize use SRCU and RCU · 83d5e5b0

由 Mikulas Patocka 提交于 7月 10, 2013

This patch removes "io_lock" and "map_lock" in struct mapped_device and
"holders" in struct dm_table and replaces these mechanisms with
sleepable-rcu.

Previously, the code would call "dm_get_live_table" and "dm_table_put" to
get and release table. Now, the code is changed to call "dm_get_live_table"
and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
dm_put_live_table unlocks it.

dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
dm_get_live_table/dm_put_live_table. These *_fast functions use
non-sleepable RCU, so the caller must not block between them.

If the code changes active or inactive dm table, it must call
dm_sync_table before destroying the old table.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

83d5e5b0

dm mpath: fix ioctl deadlock when no paths · 6c182cd8

由 Hannes Reinecke 提交于 7月 10, 2013

When multipath needs to retry an ioctl the reference to the
current live table needs to be dropped. Otherwise a deadlock
occurs when all paths are down:
- dm_blk_ioctl takes a reference to the current table
  and spins in multipath_ioctl().
- A new table is being loaded, but upon resume the process
  hangs in dm_table_destroy() waiting for references to
  drop to zero.

With this patch the reference to the old table is dropped
prior to retry, thereby avoiding the deadlock.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6c182cd8

07 5月, 2013 1 次提交

block_device_operations->release() should return void · db2a144b

由 Al Viro 提交于 5月 05, 2013

The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

db2a144b

19 4月, 2013 1 次提交

Revert "block: add missing block_bio_complete() tracepoint" · 0a82a8d1

由 Linus Torvalds 提交于 4月 18, 2013

This reverts commit 3a366e61.

Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.

Jens says:
 "It's not quite clear why that is yet, so I think we should just revert
  the commit for 3.9 final (which I'm assuming is pretty close).

  The wifi is crap at the LSF hotel, so sending this email instead of
  queueing up a revert and pull request."
Reported-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: NJens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0a82a8d1

02 3月, 2013 1 次提交

dm: add target num_write_bios fn · b0d8ed4d

由 Alasdair G Kergon 提交于 3月 01, 2013

Add a num_write_bios function to struct target.

If an instance of a target sets this, it will be queried before the
target's mapping function is called on a write bio, and the response
controls the number of copies of the write bio that the target will
receive.

This provides a convenient way for a target to send the same data to
more than one device.  The new cache target uses this in writethrough
mode, to send the data both to the cache and the backing device.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

b0d8ed4d

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功