Commits · 11a68244e16b0c35e122dd55b4e7c595e0fb67a1 · openeuler / raspberrypi-kernel

11 Dec, 2009 · 40 commits

dm: refactor request based completion functions · 11a68244

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch factors out the clone completion code, dm_done(),
from dm_softirq_done() in preparation for a subsequent patch.
No functional change.

dm_done() will be used in barrier completion, which can't use and
doesn't need softirq.  The softirq_done callback needs to get a clone
from an original request but it can't in the case of barrier, where
an original request is shared by multiple clones.  On the other hand,
the completion of barrier clones doesn't involve re-submitting requests,
which was the primary reason softirq was needed.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm: use md pending for in flight IO counting · b4324fee

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch changes the counter for the number of in_flight I/Os
to md->pending from q->in_flight in preparation for a later patch.
No functional change.

Request-based dm used q->in_flight to count the number of in-flight
clones, assuming that the counter is always incremented for an in-flight
original request and that original:clone is a 1:1 relationship.
However, this is no longer true for barrier requests.
So use md->pending to count the number of in-flight clones.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
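
The counting change described above can be illustrated with a minimal userspace sketch. It assumes a single atomic counter per mapped device that is bumped once per clone rather than once per original request; the struct and function names (md_sketch, clone_start, clone_end, md_in_flight_sketch) are hypothetical and the layout of the real md->pending field may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mapped device; only the counter is modelled. */
struct md_sketch {
	atomic_int pending; /* number of in-flight clones */
};

/* Called once per clone that is dispatched. */
static inline void clone_start(struct md_sketch *md)
{
	atomic_fetch_add(&md->pending, 1);
}

/* Called once per clone that completes; returns true for the last one. */
static inline bool clone_end(struct md_sketch *md)
{
	int before = atomic_fetch_sub(&md->pending, 1);

	assert(before > 0); /* every end must pair with a start */
	return before == 1;
}

static inline int md_in_flight_sketch(struct md_sketch *md)
{
	return atomic_load(&md->pending);
}
```

With this shape, one barrier request that fans out into three clones is counted as three in-flight I/Os, which a per-original counter could not express.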


dm: simplify request based suspend · 9f518b27

Authored by Kiyoshi Ueda on Dec 10, 2009

The semantics of bio-based dm were changed recently in the case of
suspend with "--nolockfs" but without "--noflush".
Before 2.6.30, I/Os submitted before the suspend invocation were always
flushed.  From 2.6.30 onwards, I/Os submitted before the suspend
invocation might not be flushed.  (For details, see
http://marc.info/?t=123994433400003&r=1&w=2)

This patch brings the behaviour of request-based dm into line with
bio-based dm, simplifying the code and preparing for a subsequent patch
that will wait for all in_flight I/Os to complete without stopping
request_queue and use dm_wait_for_completion() for it.

This change in semantics simplifies the suspend code as follows:
  o Suspend is implemented as stopping request_queue
    in request-based dm, and all I/Os are queued in the request_queue
    even after suspend is invoked.
  o In the old semantics, we had to track whether I/Os were
    queued before or after the suspend invocation, so a special
    barrier-like request called 'suspend marker' was introduced.
  o With the new semantics, we don't need to flush any I/O
    so we can remove the marker and the code related to the marker
    handling and I/O flushing.

After removing this code, the suspend sequence is now:
  1. Flush all I/Os by lock_fs() if needed.
  2. Stop dispatching any I/O by stopping the request_queue.
  3. Wait for all in-flight I/Os to be completed or requeued.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm: abstract clone_rq · 6facdaff

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch factors out the request cloning code in dm_prep_fn()
as clone_rq().  No functional change.

This patch is a preparation for a later patch in this series which needs to
make clones from an original barrier request.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm: pass gfp_mask to alloc_rq_tio · 08885643

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch adds the gfp_mask argument to alloc_rq_tio().
No functional change.

This patch is a preparation for a later patch in this series which needs to
allocate tio (for barrier I/O) with a different allocation flag (GFP_NOIO) from
the one used in the normal I/O code path.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm: use clone in map_request function · 598de409

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch changes the argument of map_request() from the original
request to the clone request.  No functional change.

This patch is a preparation for PATCH 9, which needs to use
map_request() for clones sharing an original barrier request.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm: abstract dm_in_flight function · 90abb8c4

Authored by Kiyoshi Ueda on Dec 10, 2009

This patch adds md_in_flight() to get the number of in_flight I/Os.
No functional change.

This patch is a preparation for a later patch in this series, which
changes I/O counter to md->pending from q->in_flight in request-based dm.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm kcopyd: accept zero size jobs · 9ca170a3

Authored by Mikulas Patocka on Dec 10, 2009

dm-kcopyd: accept zero-size jobs

This patch changes dm-kcopyd so that it accepts zero-size jobs and completes
them immediately via its completion thread.

It is needed for multisnapshots snapshot resizing. When we are writing to
a chunk beyond origin end, no copying is done. To simplify the code, we submit
an empty request to kcopyd and let kcopyd complete it. If we didn't submit
a request to kcopyd and called the completion routine immediately, it would
violate the principle that completion is called only from one thread and
it would need additional locking.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
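
The single-completion-context principle can be shown with a small single-threaded model. This is only a sketch under stated assumptions: the queue, the job struct and every function name here are hypothetical, and run_completions() merely stands in for the completion thread. The point is that a zero-size job skips the copy but still has its callback run from the same place as every other job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_JOBS 16

typedef void (*job_done_fn)(void *context);

struct copy_job_sketch {
	const char *src;
	char *dst;
	size_t size;      /* may be zero: nothing to copy */
	job_done_fn done; /* completion routine */
	void *context;
};

struct copy_queue_sketch {
	struct copy_job_sketch jobs[SKETCH_MAX_JOBS];
	size_t count;
};

/* Example completion routine: counts how many jobs have completed. */
static inline void count_completion(void *context)
{
	++*(int *)context;
}

/* Submitters only enqueue; they never call the completion routine. */
static inline int submit_copy_job(struct copy_queue_sketch *q,
				  struct copy_job_sketch job)
{
	assert(job.done != NULL);
	if (q->count == SKETCH_MAX_JOBS)
		return -1;
	q->jobs[q->count++] = job;
	return 0;
}

/* Stand-in for the completion thread: the one place callbacks run. */
static inline void run_completions(struct copy_queue_sketch *q)
{
	for (size_t i = 0; i < q->count; i++) {
		struct copy_job_sketch *job = &q->jobs[i];

		if (job->size > 0)
			memcpy(job->dst, job->src, job->size);
		job->done(job->context);
	}
	q->count = 0;
}
```

Because the submitter never invokes the callback itself, state touched by the callback needs no extra locking against the submitting context.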


dm snapshot: track suspended state in target · c26655ca

Authored by Mike Snitzer on Dec 10, 2009

Keep track of whether or not the device is suspended within the snapshot
target module, the same as we do in dm-raid1.

We will use this later to enforce the correct sequence of ioctls to
transfer the in-core exceptions from a snapshot target instance in
one table to a replacement one capable of merging them back
into the origin.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: move cow ref from exception store to snap core · fc56f6fb

Authored by Mike Snitzer on Dec 10, 2009

Store the reference to the snapshot cow device in the core snapshot
code instead of each exception store.  It can be accessed through the
new function dm_snap_cow().  Exception stores should each now maintain a
reference to their parent snapshot struct.

This is cleaner and makes part of the forthcoming snapshot merge code simpler.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>


dm snapshot: add allocated metadata to snapshot status · 985903bb

Authored by Mike Snitzer on Dec 10, 2009

Add number of sectors used by metadata to the end of the snapshot's status
line.

Renamed dm_exception_store_type's 'fraction_full' to 'usage'.  Renamed
arguments to be clearer about what is being returned.  Also added
'metadata_sectors'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: rename exception functions · 3510cb94

Authored by Jon Brassow on Dec 10, 2009

Rename exception functions.  Preparing to pull them out of
dm-snap.c for broader use.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: rename exception_table to dm_exception_table · 191437a5

Authored by Jon Brassow on Dec 10, 2009

Rename exception_table for broader use outside dm-snap.c
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: rename dm_snap_exception to dm_exception · 1d4989c8

Authored by Jon Brassow on Dec 10, 2009

The exception structure is not necessarily just a snapshot
element (especially after we pull it out of dm-snap.c).

Renaming appropriately.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: consolidate insert exception functions · d32a6ea6

Authored by Jon Brassow on Dec 10, 2009

Consolidate the insert_*exception functions.  'insert_completed_exception'
already contains all the logic to handle 'insert_exception' (via
check for a hash_shift of 0), so remove redundant function.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm snapshot: abstract minimum_chunk_size fn · 7e201b35

Authored by Mikulas Patocka on Dec 10, 2009

The origin needs to find the minimum chunk size of all snapshots.  This logic
is moved to a separate function because it will be used in another place in
the snapshot merge patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
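
As a rough illustration of the factored-out logic, the sketch below computes the smallest chunk size over a plain array. The array representation, the function name minimum_chunk_size_sketch and the choice to return 0 when there are no snapshots are all assumptions made for this example; the kernel code walks the origin's snapshot list instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the smallest chunk size among `count` snapshots, or 0 when
 * there are none (a convention chosen for this sketch only).
 */
static inline unsigned minimum_chunk_size_sketch(const unsigned *chunk_sizes,
						 size_t count)
{
	unsigned min = 0;

	for (size_t i = 0; i < count; i++) {
		assert(chunk_sizes[i] > 0); /* sketch assumes valid sizes */
		if (min == 0 || chunk_sizes[i] < min)
			min = chunk_sizes[i];
	}
	return min;
}
```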


dm snapshot: simplify sector_to_chunk expression · 102c6ddb

Authored by Mikulas Patocka on Dec 10, 2009

Removed unnecessary 'and' masking: The right shift discards the lower
bits so there is no need to clear them.

(A later patch needs this change to support a 32-bit chunk_mask.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
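
The equivalence this commit relies on can be checked with a short sketch. It assumes chunk_mask is exactly (1 << chunk_shift) - 1; the typedef and both function names are stand-ins written for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type */

/* Older form: clear the offset-within-chunk bits, then shift. */
static inline sector_t sector_to_chunk_masked(sector_t sector,
					      sector_t chunk_mask,
					      unsigned chunk_shift)
{
	/* Assumption of this sketch: the mask covers exactly the low bits. */
	assert(chunk_shift < 64);
	assert(chunk_mask == ((sector_t)1 << chunk_shift) - 1);
	return (sector & ~chunk_mask) >> chunk_shift;
}

/* Simplified form: the shift alone discards the same low bits. */
static inline sector_t sector_to_chunk_shifted(sector_t sector,
					       unsigned chunk_shift)
{
	return sector >> chunk_shift;
}
```

Dropping the mask also means the expression no longer depends on the width of chunk_mask, which is consistent with the note about a later patch narrowing it to 32 bits.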


dm snapshot: avoid else clause in persistent_read_metadata · f5acc834

Authored by Jon Brassow on Dec 10, 2009

Minor code touch-up.  We don't need the 'else'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm ioctl: prefer strlcpy over strncpy · a518b86d

Authored by Roel Kluin on Dec 10, 2009

strlcpy() will always null terminate the string.

    The code should already guarantee this as the last bytes are already
    NULs and the string lengths were restricted before being stored in
    hc.  Removing the '-1' becomes necessary so strlcpy() doesn't
    lose the last character of a maximum-length string.
	- agk
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
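
The note about removing the '-1' can be illustrated in userspace. Since strlcpy() is not available in every C library, the sketch below uses a hypothetical stand-in, sketch_strlcpy(), written to follow the usual strlcpy() contract (copy at most size - 1 bytes, always NUL-terminate when size is non-zero, return the length of the source).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local stand-in with strlcpy() semantics. */
static inline size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size > 0) {
		/* Leave room for the terminator inside `size`. */
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With an 8-byte buffer and the 7-character string "1234567", passing sizeof(buf) keeps the whole string, whereas passing sizeof(buf) - 1 (the size that was appropriate for strncpy() into a zeroed buffer) would truncate it to "123456".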


dm raid1: explicitly initialise bio_lists · 5339fc2d

Authored by Mikulas Patocka on Dec 10, 2009

Explicitly initialize bio lists instead of relying on kzalloc.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: hold all write bios when leg fails · 929be8fc

Authored by Mikulas Patocka on Dec 10, 2009

Hold all write bios when leg fails and errors are handled

When using a userspace daemon such as dmeventd to handle errors, we must
delay completing bios until it has done its job.
This patch prevents the following race:
  - primary leg fails
  - write "1" fails, the write is held, the secondary leg is set as default
  - write "2" goes straight to the secondary leg
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: hold write bios when errors are handled · 60f355ea

Authored by Mikulas Patocka on Dec 10, 2009

Hold all write bios when errors are handled.

Previously the failures list was used only when handling errors with
a userspace daemon such as dmeventd.  Now, it is always used for all bios.
The regions where some writes failed must be marked as nosync. This can only
be done in process context (i.e. in raid1 workqueue), not in the
write_callback function.

Previously the write would succeed if writing to at least one leg
succeeded.  This is wrong because data from the failed leg may be
replicated to the correct leg.  Now, if using a userspace daemon, the
write with some failures will be held until the daemon has done its job
and reconfigured the array.  If not using a daemon, the write still
succeeds if at least one leg succeeds. This is bad, but it is consistent
with current behavior.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
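
The decision described in the second paragraph can be summarised as a small pure function. This is a sketch, not the driver's code: the enum, the function name and its parameters are hypothetical, and returning an error when every leg has failed is an assumption of this example, since the message above does not spell out that case.

```c
#include <assert.h>
#include <stdbool.h>

enum write_outcome_sketch {
	WRITE_SKETCH_OK,   /* complete the bio successfully */
	WRITE_SKETCH_HELD, /* hold until the daemon has reconfigured the array */
	WRITE_SKETCH_EIO   /* complete the bio with an error */
};

static inline enum write_outcome_sketch
decide_write_sketch(unsigned legs_total, unsigned legs_failed, bool have_daemon)
{
	assert(legs_failed <= legs_total);

	if (legs_failed == 0)
		return WRITE_SKETCH_OK;
	if (legs_failed == legs_total)
		return WRITE_SKETCH_EIO; /* assumption: no usable leg is left */
	if (have_daemon)
		return WRITE_SKETCH_HELD;
	/* No daemon: still succeeds if at least one leg succeeded. */
	return WRITE_SKETCH_OK;
}
```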


dm raid1: remove bio_endio from dm_rh_mark_nosync · c58098be

Authored by Mikulas Patocka on Dec 10, 2009

Move bio completion out of dm_rh_mark_nosync in preparation for the
next patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: abstract get_valid_mirror function · 87968ddd

Authored by Mikulas Patocka on Dec 10, 2009

Move the logic to get a valid mirror leg into a function for re-use
in a later patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: use hold framework in do_failures · 0f398a84

Authored by Mikulas Patocka on Dec 10, 2009

Use the hold framework in do_failures.

This patch doesn't change the bio processing logic, it just simplifies
failure handling and avoids periodically polling the failures list.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: add framework to hold bios during suspend · 04788507

Authored by Mikulas Patocka on Dec 10, 2009

Add framework to delay bios until a suspend and then resubmit them with
either DM_ENDIO_REQUEUE (if the suspend was noflush) or complete them
with -EIO.  I/O barrier support will use this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: report flush errors separately in status · 64b30c46

Authored by Mikulas Patocka on Dec 10, 2009

Report flush errors as 'F' instead of 'D' for log and mirror devices.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: implement mirror_flush · c0da3748

Authored by Mikulas Patocka on Dec 10, 2009

Implement the flush callee. It uses dm_io to send a zero-size barrier
synchronously and concurrently to all the mirror legs.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm log: use flush callback fn · 076010e2

Authored by Mikulas Patocka on Dec 10, 2009

Call the flush callback from the log.

If flush failed, we have no alternative but to mark the whole log as dirty.
Also we set the variable flush_failed to prevent any bits ever being marked as
clean again.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm log: add flush callback fn · 87a8f240

Authored by Mikulas Patocka on Dec 10, 2009

Introduce a callback pointer from the log to dm-raid1 layer.

Before some region is set as "in-sync", we need to flush hardware cache on
all the disks. But the log module doesn't have access to the mirror_set
structure. So it will use this callback.

The callback is unused so far; it will be used in later patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm log: introduce flush_failed variable · 5adc78d0

Authored by Mikulas Patocka on Dec 10, 2009

Introduce the "flush failed" variable.  When a flush before clearing a bit
in the log fails, we don't know anything about which regions are
in-sync and which are not.

So we need to set all regions as not-in-sync and set the variable
"flush_failed" to prevent setting the in-sync bit in the future.

A target reload is the only way to get out of this situation.

The variable will be set in following patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
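
The latch behaviour described above can be sketched with a tiny in-memory model. The struct, the 64-region bitmap and both function names are hypothetical; only the rule itself (mark everything not-in-sync, then refuse to set in-sync bits until the target is reloaded) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dirty log covering at most 64 regions. */
struct region_log_sketch {
	uint64_t sync_bits; /* bit set = region is in-sync */
	bool flush_failed;  /* latched; cleared only by building a new log */
};

/* A flush failed: nothing can be trusted, so mark every region not-in-sync. */
static inline void log_flush_failed(struct region_log_sketch *log)
{
	log->sync_bits = 0;
	log->flush_failed = true;
}

/* Returns false (and changes nothing) once flush_failed is latched. */
static inline bool log_set_in_sync(struct region_log_sketch *log,
				   unsigned region)
{
	assert(region < 64);
	if (log->flush_failed)
		return false;
	log->sync_bits |= (uint64_t)1 << region;
	return true;
}
```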


dm log: add flush_header function · 20a34a8e

Authored by Mikulas Patocka on Dec 10, 2009

Introduce flush_header and use it to flush the log device.

Note that we don't have to flush if all the regions transition
from "dirty" to "clean" state.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm raid1: split touched state into two · b09acf1a

Authored by Mikulas Patocka on Dec 10, 2009

Split the variable "touched" into two, "touched_dirtied" and
"touched_cleaned", set when some region was dirtied or cleaned.

This will be used to optimize flushes.

After a transition from "dirty" to "clean" state we don't have to flush the
hardware cache on the log device. After a transition from "clean" to "dirty"
the cache must be flushed.

Before a transition from "clean" to "dirty" state we don't have to flush all
the raid legs. Before a transition from "dirty" to "clean" we must flush all
the legs to make sure that they are really in sync.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
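
The two flush rules above map directly onto the two new flags. The sketch below expresses that mapping; the struct and function names are hypothetical and the real code spreads these decisions across the log and raid1 modules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What happened to the log since it was last written out. */
struct log_touch_sketch {
	bool touched_dirtied; /* some region went clean -> dirty */
	bool touched_cleaned; /* some region went dirty -> clean */
};

struct flush_plan_sketch {
	bool flush_legs_first; /* flush all raid legs before writing the log */
	bool flush_log_after;  /* flush the log device cache after writing */
};

static inline struct flush_plan_sketch
plan_log_flush(const struct log_touch_sketch *touch)
{
	struct flush_plan_sketch plan;

	assert(touch != NULL);
	/* Marking a region clean is only safe once the legs really match. */
	plan.flush_legs_first = touch->touched_cleaned;
	/* A newly dirtied region must reach stable storage on the log. */
	plan.flush_log_after = touch->touched_dirtied;
	return plan;
}
```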


dm raid1: support flush · 4184153f

Authored by Mikulas Patocka on Dec 10, 2009

Flush support for dm-raid1.

When it receives an empty barrier, submit it to all the devices via dm-io.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm io: remove extra bi_io_vec region hack · f1e53987

Authored by Mikulas Patocka on Dec 10, 2009

Remove the hack where we allocate an extra bi_io_vec to store additional
private data. This hack prevents us from supporting barriers in
dm-raid1 without first making another little block layer change.
Instead of doing that, this patch eliminates the bi_io_vec abuse by
storing the region number directly in the low bits of bi_private.

We need to store two things for each bio, the pointer to the main io
structure and, if parallel writes were requested, an index indicating
which of these writes this bio belongs to. There can be at most
BITS_PER_LONG regions - 32 or 64.

The index (region number) was stored in the last (hidden) bio vector and
the pointer to struct io was stored in bi_private.

This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the
region number (0 to BITS_PER_LONG - 1) in the low bits of bi_private,
which the alignment guarantees are zero in the pointer itself.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
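
The pointer-packing idea can be demonstrated in userspace C11. This is a sketch under stated assumptions: struct io_sketch, SKETCH_BITS_PER_LONG and the pack/unpack helpers are hypothetical names, and the alignment is obtained with _Alignas here rather than from a slab cache as in the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Stand-in for "struct io", aligned so its low address bits are zero. */
struct io_sketch {
	_Alignas(sizeof(long) * CHAR_BIT) unsigned long error_bits;
	int count;
};

/* Combine the io pointer and a region index into one pointer-sized value. */
static inline void *pack_io_region(struct io_sketch *io, unsigned region)
{
	uintptr_t p = (uintptr_t)io;

	assert((p & (SKETCH_BITS_PER_LONG - 1)) == 0); /* alignment holds */
	assert(region < SKETCH_BITS_PER_LONG);
	return (void *)(p | region);
}

/* Split the packed value back into the pointer and the region index. */
static inline void unpack_io_region(void *packed, struct io_sketch **io,
				    unsigned *region)
{
	uintptr_t v = (uintptr_t)packed;

	*io = (struct io_sketch *)(v & ~(uintptr_t)(SKETCH_BITS_PER_LONG - 1));
	*region = (unsigned)(v & (SKETCH_BITS_PER_LONG - 1));
}
```

Aligning on BITS_PER_LONG bytes frees log2(BITS_PER_LONG) low bits, which is exactly enough to encode any region index below BITS_PER_LONG.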


dm io: use slab for struct io · 952b3557

Authored by Mikulas Patocka on Dec 10, 2009

Allocate "struct io" from a slab.

This patch changes dm-io, so that "struct io" is allocated from a slab cache.
It used to be allocated with kmalloc. Allocating from a slab will be needed
for the next patch, because it requires a special alignment of "struct io"
and kmalloc cannot meet this alignment.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm crypt: make wipe message also wipe essiv key · 542da317

Authored by Milan Broz on Dec 10, 2009

The "wipe key" message is used to wipe the volume key from memory
temporarily, for example when suspending to RAM.

But the initialisation vector in ESSIV mode is calculated from the
hashed volume key, so the wipe message should wipe this IV key too and
reinitialise it when the volume key is reinstated.

This patch adds an IV wipe method called from a wipe message callback.
ESSIV is then reinitialised using the init function added by the
last patch.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm crypt: separate essiv allocation from initialisation · b95bf2d3

Authored by Milan Broz on Dec 10, 2009

This patch separates the construction of IV from its initialisation.
(For ESSIV it is a hash calculation based on volume key.)

The constructor code now preallocates the hash tfm and salt array
and saves them in a private IV structure.

The next patch requires this to reinitialise the wiped IV
without reallocating memory when resuming a suspended device.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm crypt: restructure essiv error path · 5861f1be

Authored by Milan Broz on Dec 10, 2009

Use kzfree for salt deallocation because it is derived from the volume
key.  Use a common error path in ESSIV constructor.

Required by a later patch which fixes the way key material is wiped
from memory.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm crypt: move private iv fields to structs · 60473592

Authored by Milan Broz on Dec 10, 2009

Define private structures for IV so it's easy to add further attributes
in a following patch which fixes the way key material is wiped from
memory.  Also move ESSIV destructor and remove unnecessary 'status'
operation.

There are no functional changes in this patch.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
