Commits · 32a926da5a16c01a8213331e5764472ce2f14a8d · openeuler / raspberrypi-kernel

22 Jun, 2009 4 commits

dm: always hold bdev reference · 32a926da

Authored by Mikulas Patocka on Jun 22, 2009

Fix a potential deadlock when creating multiple snapshots by holding a
reference to struct block_device for the whole lifecycle of every dm
device instead of obtaining it independently at each point it is needed.

bdget_disk() was called while the device was being suspended, in
dm_suspend(). However there could be other devices already suspended,
for example when creating additional snapshots of a device. bdget_disk()
can wait for IO and allocate memory resulting in waiting for the
already-suspended device - deadlock.

This patch changes the code so that it gets the reference to struct
block_device when struct mapped_device is allocated and initialized in
alloc_dev() where it is always OK to allocate memory or wait for I/O.
It drops the reference when it is destroyed in free_dev(). Thus there
is no call to bdget_disk() while any device is suspended.

Previously unlock_fs() was called only if bdev was held. Now it is
called unconditionally, but the superfluous calls are harmless because
it returns immediately if the filesystem was not previously frozen.

This patch also now allows the device size to be changed in a
noflush suspend because the bdev is held. This has no adverse effect.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: rename suspended_bdev to bdev · db8fef4f

Authored by Mikulas Patocka on Jun 22, 2009

Rename suspended_bdev to bdev.

This patch doesn't change any functionality, just renames the variable.
In the next patch, the variable will be used even for non-suspended devices.

(Pre-requisite for the per-target barrier support patches.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: avoid unsupported spanning of md stripe boundaries · 8cbeb67a

Authored by Mikulas Patocka on Jun 22, 2009

A bio that has two or more vector entries, size less than or equal to
page size, that crosses a stripe boundary of an underlying md device is
accepted by device mapper (it conforms to all its limits) but not by the
underlying device.

The fix is: If device mapper selects the one-page maximum request size,
it also needs to set its own q->merge_bvec_fn to reject any bios with
multiple vector entries that span more pages.

The problem was discovered in the following scenario:
  * MD - RAID-0
  * LV on the top of it (raid1, snapshot or striped with chunk
    size/stripe larger than RAID-0 stripe)
  * one of the logical volumes is exported to xen domU
  * inside xen domU it is partitioned; the key point is that the partition
    must be unaligned on page boundary (fdisk normally aligns the partition
    to 63 sectors, which will trigger it)
  * install the system on the partitioned disk in domU

This causes I/O failures in dom0.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=223947

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
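
The scenario above comes down to sector arithmetic. The following is a minimal userspace sketch, not the kernel code from the patch, of why a page-sized I/O inside a partition that starts at sector 63 can straddle an md stripe boundary, and of the one-page merge rule the fix describes. The 128-sector (64 KiB) chunk size, the 8-sector (4 KiB) page, and both function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if sectors [start, start + nr) touch more than one chunk. */
bool crosses_stripe_boundary(uint64_t start, uint32_t nr, uint32_t chunk_sectors)
{
    assert(nr > 0 && chunk_sectors > 0);
    return start / chunk_sectors != (start + nr - 1) / chunk_sectors;
}

/*
 * Hypothetical one-page merge rule: an empty bio may take its first vector
 * entry whole; once the bio holds data, nothing more may be merged.
 * Returns the number of bytes the caller may add.
 */
uint32_t one_page_merge_limit(uint32_t bio_bytes, uint32_t bvec_len)
{
    return bio_bytes == 0 ? bvec_len : 0;
}
```

With a partition offset of 63 sectors, a 4 KiB write at partition sector 64 covers absolute sectors 127 to 134 and crosses the assumed boundary at sector 128, whereas the same write at sector 64 of an unpartitioned device (sectors 64 to 71) does not.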

dm: sysfs skip output when device is being destroyed · 4d89b7b4

Authored by Milan Broz on Jun 22, 2009

Do not process sysfs attributes when the device is being destroyed.

Otherwise the code can trigger
  BUG_ON(test_bit(DMF_FREEING, &md->flags));
in dm_put() call.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 Jun, 2009 1 commit

block: remove some includings of blktrace_api.h · e212d6f2

Authored by Li Zefan on Jun 16, 2009

When porting blktrace to tracepoints, we changed to trace/block.h
for trace prober declarations.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

10 Jun, 2009 1 commit

tracing/events: convert block trace points to TRACE_EVENT() · 55782138

Authored by Li Zefan on Jun 09, 2009

TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:

  - zero-copy and per-cpu splice() tracing
  - binary tracing without printf overhead
  - structured logging records exposed under /debug/tracing/events
  - trace events embedded in function tracer output and other plugins
  - user-defined, per tracepoint filter expressions
  ...

Cons:

  - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

  - A packet command is converted to a string in TP_assign, not TP_print,
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

  - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

      dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s

So the overhead of tracing is very small, and there is no regression when
using those trace events vs blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:

 # ls -l -h
 -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:

plug:
  kjournald-480   [000]   303.084981: block_plug: [kjournald]
  kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]

unplug_io:
  kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
  kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1

remap:
  kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
  kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384

bio_backmerge:
  kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
  kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]

getrq:
  kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]

  bash-2066  [001]  1072.953770:   8,0    G   N [bash]
  bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]

rq_complete:
  konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
  konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]

  ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
  ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]

rq_insert:
  kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]

Changelog from v2 -> v3:

- use the newly introduced __dynamic_array().

Changelog from v1 -> v2:

- use __string() instead of __array() to minimize the memory required
  to store hex dump of rq->cmd[].

- support large pc requests.

- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

- some cleanups.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

06 May, 2009 1 commit

blktrace: from-sector redundant in trace_block_remap · 22a7c31a

Authored by Alan D. Brunelle on May 04, 2009

Remove redundant from-sector parameter: it's /always/ the bio's sector
passed in.

[ Impact: cleanup ]

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49FF517C.7000503@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

15 Apr, 2009 1 commit

block: move bio list helpers into bio.h · 8f3d8ba2

Authored by Christoph Hellwig on Apr 07, 2009

It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

09 Apr, 2009 8 commits

dm: implement basic barrier support · af7e466a

Authored by Mikulas Patocka on Apr 09, 2009

Barriers are submitted to a worker thread that issues them in-order.

The thread is modified so that when it sees a barrier request it waits
for all pending IO before the request then submits the barrier and
waits for it.  (We must wait, otherwise it could be intermixed with
following requests.)

Errors from the barrier request are recorded in a per-device barrier_error
variable. There may be only one barrier request in progress at once.

For now, the barrier request is converted to a non-barrier request when
sending it to the underlying device.

This patch guarantees correct barrier behavior if the underlying device
doesn't perform write-back caching. The same requirement existed before
barriers were supported in dm.

Bottom layer barrier support (sending barriers by target drivers) and
handling devices with write-back caches will be done in further patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
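
The ordering rule described above (wait for everything before the barrier, issue the barrier alone, wait for it, then continue) can be modeled in a few lines of userspace C. This is only a single-threaded simulation under stated assumptions, not the dm worker thread: the `struct req` type, `MAX_INFLIGHT`, and both functions are hypothetical, and "in flight" requests are completed newest-first purely to model a device that reorders I/O.

```c
#include <assert.h>
#include <stddef.h>

enum req_type { REQ_NORMAL, REQ_BARRIER };

struct req {
    int id;
    enum req_type type;
};

#define MAX_INFLIGHT 16

/* Complete everything in flight, newest first, to model device reordering. */
static size_t drain(int *inflight, size_t *n_inflight, int *done, size_t n_done)
{
    while (*n_inflight > 0)
        done[n_done++] = inflight[--(*n_inflight)];
    return n_done;
}

/*
 * Process requests in submission order and record the completion order in
 * done[], which must have room for n entries. Returns the completion count.
 */
size_t process_in_order(const struct req *reqs, size_t n, int *done)
{
    int inflight[MAX_INFLIGHT];
    size_t n_inflight = 0, n_done = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].type == REQ_BARRIER) {
            /* Wait for all I/O submitted before the barrier. */
            n_done = drain(inflight, &n_inflight, done, n_done);
            /* Submit the barrier alone and wait for it as well. */
            done[n_done++] = reqs[i].id;
        } else {
            assert(n_inflight < MAX_INFLIGHT);
            inflight[n_inflight++] = reqs[i].id;
        }
    }
    return drain(inflight, &n_inflight, done, n_done);
}
```

In this model, requests on either side of a barrier may complete in any order among themselves, but none of them can complete on the wrong side of the barrier.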

dm: remove dm_request loop · 92c63902

Authored by Mikulas Patocka on Apr 09, 2009

Remove queue_io return value and a loop in dm_request.

IO may be submitted to a worker thread with queue_io().  queue_io() sets
DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
point on, requests are submitted from dm_request again. This will be used
for processing barriers.

Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread
even if DMF_QUEUE_IO_TO_THREAD was not set.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: rework queueing and suspension · 3b00b203

Authored by Mikulas Patocka on Apr 09, 2009

Rework shutting down on suspend and document the associated rules.

Drop write lock in __split_and_process_bio to allow more processing
concurrency.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: simplify dm_request loop · 54d9a1b4

Authored by Alasdair G Kergon on Apr 09, 2009

Refactor the code in dm_request().

Require the new DMF_BLOCK_IO_FOR_SUSPEND flag on readahead bios we will
discard so we don't drop such bios while processing a barrier.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: split DMF_BLOCK_IO flag into two · 1eb787ec

Authored by Alasdair G Kergon on Apr 09, 2009

Split the DMF_BLOCK_IO flag into two.

DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
device.  DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
worker thread for later processing.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: rearrange dm_wq_work · df12ee99

Authored by Alasdair G Kergon on Apr 09, 2009

Refactor dm_wq_work() to make a later patch more readable.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove limited barrier support · 692d0eb9

Authored by Mikulas Patocka on Apr 09, 2009

Prepare for full barrier implementation: first remove the restricted support.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: add integrity support · 9c47008d

Authored by Martin K. Petersen on Apr 09, 2009

This patch provides support for data integrity passthrough in the device
mapper.

 - If one or more component devices support integrity an integrity
   profile is preallocated for the DM device.

 - If all component devices have compatible profiles the DM device is
   flagged as capable.

 - Handle integrity metadata when splitting and cloning bios.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

03 Apr, 2009 10 commits

dm: set queue ordered mode · 99360b4c

Authored by Mikulas Patocka on Apr 02, 2009

Set queue ordered mode.  It doesn't really matter what we set here
because we don't ever put any requests on the queue.  But we need to set
something other than QUEUE_ORDERED_NONE so that __generic_make_request
passes barrier requests to us.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: move wait queue declaration · b44ebeb0

Authored by Mikulas Patocka on Apr 02, 2009

Move wait queue declaration and unplug to dm_wait_for_completion.

The purpose is to minimize duplicate code in the further patches.

The patch reorders functions a little bit. It doesn't change any
functionality. For proper non-deadlock operation, add_wait_queue must
happen before set_current_state(interruptible) and before the test for
!atomic_read(&md->pending).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: merge pushback and deferred bio lists · 022c2611

Authored by Mikulas Patocka on Apr 02, 2009

Merge pushback and deferred lists into one list - use deferred list
for both deferred and pushed-back bios.

This will be needed for proper support of barrier bios: it is impossible to
support ordering correctly with two lists because the requests on both lists
will be mixed up.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: allow uninterruptible wait for pending io · 401600df

Authored by Mikulas Patocka on Apr 02, 2009

Allow uninterruptible wait for pending IOs.

Add argument "interruptible" to dm_wait_for_completion that specifies
either interruptible or uninterruptible waiting.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: merge __flush_deferred_io into caller · ef208587

Authored by Mikulas Patocka on Apr 02, 2009

Merge __flush_deferred_io() into the only caller, dm_wq_work().

There's no need to have a function that has only one caller.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: move bio_io_error into __split_and_process_bio · f0b9a450

Authored by Mikulas Patocka on Apr 02, 2009

Move the bio_io_error() calls directly into __split_and_process_bio().

This avoids some code duplication in later patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: rename __split_bio · 8a53c28d

Authored by Mikulas Patocka on Apr 02, 2009

Rename __split_bio() to __split_and_process_bio() because it not only splits
the bio into several parts, but also submits them to target drivers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove unnecessary struct dm_wq_req · 53d5914f

Authored by Mikulas Patocka on Apr 02, 2009

Remove struct dm_wq_req and move "work" directly into struct mapped_device.

In the revised implementation, the thread will do just one type of work
(processing the queue).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove unnecessary work queue context field · 9a1fb464

Authored by Mikulas Patocka on Apr 02, 2009

Remove the context field from struct dm_wq_req because we will no longer
need it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove unnecessary work queue type field · 14377396

Authored by Mikulas Patocka on Apr 02, 2009

Remove "type" field from struct dm_wq_req because we no longer need it
to have more than one value.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

17 Mar, 2009 1 commit

dm crypt: wait for endio to complete before destruction · b35f8caa

Authored by Milan Broz on Mar 16, 2009

The following oops has been reported when dm-crypt runs over a loop device.

...
[   70.381058] Process loop0 (pid: 4268, ti=cf3b2000 task=cf1cc1f0 task.ti=cf3b2000)
...
[   70.381058] Call Trace:
[   70.381058]  [<d0d76601>] ? crypt_dec_pending+0x5e/0x62 [dm_crypt]
[   70.381058]  [<d0d767b8>] ? crypt_endio+0xa2/0xaa [dm_crypt]
[   70.381058]  [<d0d76716>] ? crypt_endio+0x0/0xaa [dm_crypt]
[   70.381058]  [<c01a2f24>] ? bio_endio+0x2b/0x2e
[   70.381058]  [<d0806530>] ? dec_pending+0x224/0x23b [dm_mod]
[   70.381058]  [<d08066e4>] ? clone_endio+0x79/0xa4 [dm_mod]
[   70.381058]  [<d080666b>] ? clone_endio+0x0/0xa4 [dm_mod]
[   70.381058]  [<c01a2f24>] ? bio_endio+0x2b/0x2e
[   70.381058]  [<c02bad86>] ? loop_thread+0x380/0x3b7
[   70.381058]  [<c02ba8a1>] ? do_lo_send_aops+0x0/0x165
[   70.381058]  [<c013754f>] ? autoremove_wake_function+0x0/0x33
[   70.381058]  [<c02baa06>] ? loop_thread+0x0/0x3b7

When a table is being replaced, device-mapper waits for I/O to complete
before destroying the mempool, but the endio function doesn't
call mempool_free() until after completing the bio.

Fix it by swapping the order of those two operations.

The same problem occurs in dm.c with md referenced after dec_pending.
Again, we swap the order.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

06 Jan, 2009 5 commits

dm: add name and uuid to sysfs · 784aae73

Authored by Milan Broz on Jan 06, 2009

Implement simple read-only sysfs entry for device-mapper block device.

This patch adds a simple sysfs directory named "dm" under block device
properties and implements
	- name attribute (string containing mapped device name)
	- uuid attribute (string containing UUID, or empty string if not set)

The kobject is embedded in mapped_device struct, so no additional
memory allocation is needed for initializing sysfs entry.

During the processing of sysfs attribute we need to lock mapped device
which is done by a new function dm_get_from_kobj, which returns the md
associated with kobject and increases the usage count.

Each 'show attribute' function is responsible for its own locking.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm table: rework reference counting · d5816876

Authored by Mikulas Patocka on Jan 06, 2009

Rework table reference counting.

The existing code uses a reference counter. When the last reference is
dropped and the counter reaches zero, the table destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.

This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
  destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
  same process.
* stale reference from some other kernel code keeps the table
  constructed, which keeps some devices open, even after successful
  return from "dmsetup remove". This can confuse lvm and prevent closing
  of underlying devices or reusing device minor numbers.

The patch changes reference counting so that the table destructor can be
called only at predetermined places.

The table always has exactly one reference from either mapped_device->map
or hash_cell->new_map. After this patch, this reference is not counted
in table->holders.  A pair of dm_table_create/dm_table_destroy functions
is used for table creation/destruction.

Temporary references from the other code increase table->holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.

When the table is about to be destroyed, we wait for table->holders to
reach 0. Then, we call the table destructor.  We use active waiting with
msleep(1), because the situation happens rarely (to one user in 5 years)
and removing the device isn't a performance-critical task: the user doesn't
care if it takes one tick more or not.

This way, the destructor is called only at specific points
(dm_table_destroy function) and the above problems associated with lazy
destruction can't happen.

Finally remove the temporary protection added to dm_any_congested().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
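
The scheme described above (one uncounted owning reference, a `holders` count for temporary users, and a destroy path that polls until the count drops to zero) can be sketched in userspace C. This is an illustration under stated assumptions, not the dm-table code: `struct table` and the four functions are hypothetical stand-ins, C11 atomics replace the kernel's atomic_t, and nanosleep() stands in for msleep(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct dm_table; only holders is modeled. */
struct table {
    atomic_int holders;   /* temporary references only, owner not counted */
};

struct table *table_create(void)
{
    struct table *t = malloc(sizeof(*t));
    if (t)
        atomic_init(&t->holders, 0);
    return t;
}

/* Take and drop a temporary reference; neither ever runs the destructor. */
void table_get(struct table *t)
{
    atomic_fetch_add(&t->holders, 1);
}

void table_put(struct table *t)
{
    int old = atomic_fetch_sub(&t->holders, 1);
    assert(old > 0);
}

/*
 * Called only by the owner, at a well-defined point. Polls in 1 ms steps
 * until no temporary holder remains, then frees the table.
 * Returns how many times it had to sleep.
 */
int table_destroy(struct table *t)
{
    struct timespec one_ms = { 0, 1000000L };
    int polls = 0;

    while (atomic_load(&t->holders) != 0) {
        nanosleep(&one_ms, NULL);
        polls++;
    }
    free(t);
    return polls;
}
```

The design choice mirrored here is that dropping a temporary reference is always cheap and non-sleeping, so it is safe in any context; the cost of waiting is paid only on the rare destroy path.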

dm: support barriers on simple devices · ab4c1424

Authored by Andi Kleen on Jan 06, 2009

Implement barrier support for single device DM devices.

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

 NB. Any DM device might cease to support barriers if it gets
     reconfigured so code must continue to allow for a possible
     -EOPNOTSUPP on every barrier bio submitted.  - agk

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm request: add caches · 8fbf26ad

Authored by Kiyoshi Ueda on Jan 06, 2009

This patch prepares some kmem_caches for request-based dm.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm table: drop reference at unbind · a1b51e98

Authored by Mikulas Patocka on Jan 06, 2009

Move one dm_table_put() so that the last reference in the thread
gets dropped in __unbind().

This is required for a following patch,
dm-table-rework-reference-counting.patch, which will change the logic in
such a way that table destructor is called only at specific points in
the code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

29 Dec, 2008 1 commit

bio: allow individual slabs in the bio_set · bb799ca0

Authored by Jens Axboe on Dec 10, 2008

Instead of having a global bio slab cache, add a reference to one
in each bio_set that is created. This allows for personalized slabs
in each bio_set, so that they can have bios of different sizes.

This means we can personalize the bios we return. File systems may
want to embed the bio inside another structure, to avoid allocating
more items (and stuffing them in ->bi_private) after they get a bio.
Or we may want to embed a number of bio_vecs directly at the end
of a bio, to avoid doing two allocations to return a bio. This is now
possible.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
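
The embedding idea mentioned above can be illustrated with the usual container-of idiom. The sketch below is plain userspace C and makes no claim about the bio_set interface itself: `struct mini_bio`, `struct fs_io`, and both functions are hypothetical names, and calloc() stands in for a slab allocation sized for the wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of struct bio. */
struct mini_bio {
    unsigned int size;
};

/* A caller-defined wrapper that embeds the bio as a member. */
struct fs_io {
    int private_state;      /* would otherwise hang off ->bi_private */
    struct mini_bio bio;
};

/* One allocation yields both the wrapper and the bio inside it. */
struct mini_bio *alloc_embedded_bio(void)
{
    struct fs_io *io = calloc(1, sizeof(*io));
    return io ? &io->bio : NULL;
}

/* Recover the wrapper from a pointer to its embedded bio. */
struct fs_io *fs_io_from_bio(struct mini_bio *bio)
{
    assert(bio != NULL);
    return (struct fs_io *)((char *)bio - offsetof(struct fs_io, bio));
}
```

Because the per-request state lives in the same allocation as the bio, a completion handler that receives only the bio pointer can reach that state without a second allocation or an extra pointer field.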

26 Nov, 2008 2 commits

blktrace: port to tracepoints, update · 0bfc2455

Authored by Ingo Molnar on Nov 26, 2008

Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
sites. Spread them out to the usage sites, as suggested by
Mathieu Desnoyers.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

blktrace: port to tracepoints · 5f3ea37c

Authored by Arnaldo Carvalho de Melo on Oct 30, 2008

This was a forward port of work done by Mathieu Desnoyers. I changed it to
encode the 'what' parameter in the tracepoint name, so that one can register
interest in specific events rather than in classes of events and then check
the 'what' parameter.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

14 Nov, 2008 2 commits

dm: avoid destroying table in dm_any_congested · 8a57dfc6

Authored by Chandra Seetharaman on Nov 13, 2008

dm_any_congested() just checks for the DMF_BLOCK_IO flag and has no
code to make sure that suspend waits for dm_any_congested() to
complete.  This patch adds such a check.

Without it, a race can occur with dm_table_put() attempting to
destroy the table in the wrong thread, the one running
dm_any_congested() which is meant to be quick and return
immediately.

Two examples of problems:
1. Sleeping functions called from congested code, the caller
   of which holds a spin lock.
2. An ABBA deadlock between pdflush and multipathd. The two locks
   in contention are inode lock and kernel lock.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: move pending queue wake_up end_io_acct · d221d2e7

Authored by Mikulas Patocka on Nov 13, 2008

This doesn't fix any bug, just moves wake_up immediately after decrementing
md->pending, for better code readability.

It must be clear to anyone manipulating md->pending to wake up
the queue if md->pending reaches zero, so move the wakeup as close to
the decrementing as possible.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

22 Oct, 2008 3 commits

dm: tidy local_init · 51157b4a

Authored by Kiyoshi Ueda on Oct 21, 2008

This patch tidies local_init() in preparation for request-based dm.
No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove unused flush_all · f431d966

Authored by Kiyoshi Ueda on Oct 21, 2008

This patch removes the DM_WQ_FLUSH_ALL state that is unnecessary.

The dm_queue_flush(md, DM_WQ_FLUSH_ALL, NULL) in dm_suspend()
is never invoked because:
  - 'goto flush_and_out' is the same as 'goto out' because
    the 'goto flush_and_out' is called only when '!noflush'
  - If r is non-zero, then the code above will invoke 'goto out'
    and skip this code.

No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: mark split bio as cloned · f3e1d26e

Authored by Martin K. Petersen on Oct 21, 2008

When a bio gets split, mark its fragments with the BIO_CLONED flag.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>