Commits · 7caacb69ac468ea713e8e8ba77be8040d8fe7bbe · openeuler / Kernel

09 May, 2012 · 37 commits

drbd: Consider the disk-timeout also for meta-data IO operations · 7caacb69

Committed by Philipp Reisner on Dec 14, 2011

If the backing device is already frozen during attach, we failed
to recognize that. The current disk-timeout code works on top
of the drbd_request objects. During attach we do not allow IO
and therefore never generate a drbd_request object but block
before that in drbd_make_request().

This patch adds the timeout to all drbd_md_sync_page_io().

Before this patch we used to go from D_ATTACHING directly
to D_DISKLESS if IO failed during attach. We can no longer
do this since we have to stay in D_FAILED until all IO
ops issued to the backing device returned.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

7caacb69

drbd: Do not send state packets while lower than C_CONNECTED cstate · 4afc433c

Committed by Philipp Reisner on Dec 13, 2011

I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION.
Sending may already work in these cstates, but the peer still expects
the HandShake / ConnectionFeatures packet.

Actually triggered by the test suite on kugel.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

4afc433c

drbd: fix race between disconnect and receive_state · 545752d5

Committed by Lars Ellenberg on Dec 05, 2011

If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.

Log excerpt:
  Remote failed to finish a request within ko-count * timeout
  peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
  asender terminated
  ...
  peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
  ...
  Connection closed
  peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
  receiver terminated

Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.

The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.

Fixed in several code paths:
  Make sure the connection state is NetworkFailure or worse
  before starting the cleanup in drbd_disconnect().
  This should make sure the cleanup won't miss any requests.

  Disallow receive_state() to "upgrade" the connection state
  from an error state. This will make sure the "illegal" state
  transition won't happen.

  For all connection failure states,
  relax the safe-guard in sanitize_state() again
  to silently mask out those state changes
  (e.g. Timeout -> Connected becomes Timeout -> Timeout).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

545752d5
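
The masking rule described in the commit above (Timeout -> Connected becomes Timeout -> Timeout) can be sketched in userspace C. The enum is a simplified, hypothetical subset of connection states, and `mask_conn_upgrade()` is an illustrative name rather than the kernel's sanitize_state(); the sketch only shows the idea of turning a requested "upgrade" out of a failure state into a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical subset of connection states, ordered so that
 * everything from CS_TIMEOUT to CS_TEAR_DOWN is a failure state. */
enum conn_state {
	CS_STANDALONE,
	CS_DISCONNECTING,
	CS_UNCONNECTED,
	CS_TIMEOUT,
	CS_BROKEN_PIPE,
	CS_NETWORK_FAILURE,
	CS_PROTOCOL_ERROR,
	CS_TEAR_DOWN,
	CS_WF_CONNECTION,
	CS_WF_REPORT_PARAMS,
	CS_CONNECTED,
};

static bool is_failure_state(enum conn_state cs)
{
	return cs >= CS_TIMEOUT && cs <= CS_TEAR_DOWN;
}

/* Return the state that should actually take effect. A request to jump
 * from a failure state straight to an established connection is masked
 * out by keeping the old state. */
static enum conn_state mask_conn_upgrade(enum conn_state old_cs,
					 enum conn_state new_cs)
{
	if (is_failure_state(old_cs) && new_cs >= CS_CONNECTED)
		return old_cs;
	return new_cs;
}
```

In this sketch a failed connection can still move on to CS_UNCONNECTED and be re-established through a fresh handshake; only the shortcut back to CS_CONNECTED is suppressed.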

drbd: fix potential spinlock deadlock · 763eb636

Committed by Lars Ellenberg on Nov 02, 2011

drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks
left to be resynced (rs_left) in the current resync extent.
If it detects a mismatch, it complains, and forces a disconnect using
drbd_force_state(mdev, NS(conn, C_DISCONNECTING));

Unfortunately, this may be called while holding the req_lock,
and drbd_force_state() wants to acquire that lock itself. Deadlock.

Don't force a disconnect, but fix up rs_left by recounting and
reassigning the number of dirty blocks in that extent.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

763eb636
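
The "recount instead of forcing a state change" approach from the commit above can be illustrated with a small userspace sketch. The struct, the one-byte-per-bit bitmap, and the function names are all hypothetical simplifications; the point is only that a cached counter found to be inconsistent is repaired locally, without calling anything that would need a lock the caller may already hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached view of one resync extent: rs_left is meant to
 * mirror the number of dirty (out-of-sync) bits in that extent. */
struct resync_extent {
	unsigned int rs_left;
};

/* Count dirty bits directly from the bitmap. For simplicity each array
 * element stands for one bit. */
static unsigned int count_dirty(const unsigned char *bits, size_t nbits)
{
	unsigned int n = 0;

	for (size_t i = 0; i < nbits; i++)
		n += bits[i] ? 1 : 0;
	return n;
}

/* Account for `cleared` bits that were just cleared in the bitmap. On a
 * mismatch, fix up the cached counter by recounting. Returns 1 if a
 * fix-up was needed, 0 otherwise. */
static int account_cleared_bits(struct resync_extent *ext, unsigned int cleared,
				const unsigned char *bits, size_t nbits)
{
	if (ext->rs_left < cleared) {
		ext->rs_left = count_dirty(bits, nbits);
		return 1;
	}
	ext->rs_left -= cleared;
	return 0;
}
```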

drbd: Fixed an obvious copy-n-paste mistake · e89868a0

Committed by Philipp Reisner on Nov 09, 2011

This bug might have caused trouble if disk-barriers and the ahead-behind
mode are enabled at the same time.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

e89868a0

drbd: send intermediate state change results to the peer · f479ea06

Committed by Lars Ellenberg on Oct 27, 2011

DRBD state changes schedule after_state_ch() actions to a worker thread,
which decides on the old and new states of that change, whether to send
an informational state update packet (P_STATE) to the peer.
If it decides to drbd_send_state(), it would however always send the
_current_ state, which, if a second state change happens before the
after_state_ch() of the first ran, may "fast-forward" the peer's view
about this node.  In most cases that is harmless, but sometimes this can
confuse DRBD, for example into not actually starting a necessary resync
if you do a very tight detach/attach loop on a Connected Secondary.

Fix this by always sending the "new" state of the respective state
transition which scheduled this after_state_ch() work.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

f479ea06
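
A minimal userspace sketch of the idea in the commit above: each queued work item carries its own copy of the "new" state of the transition that scheduled it, so a later state change cannot alter what that work item reports. All types and names here are hypothetical simplifications, not DRBD's actual structures.

```c
#include <assert.h>

/* Hypothetical, much simplified device state. */
struct dev_state {
	int disk;
};

struct device {
	struct dev_state cur;	/* state as of the latest change */
};

/* Work item scheduled by a state change. It snapshots the "new" state
 * of exactly that transition. */
struct after_state_work {
	struct dev_state ns;
};

static struct after_state_work change_state(struct device *dev,
					    struct dev_state ns)
{
	struct after_state_work w = { .ns = ns };

	dev->cur = ns;
	return w;
}

/* State to report to the peer for this work item: the snapshot, not
 * dev->cur, which may already have moved on. */
static struct dev_state state_to_send(const struct after_state_work *w)
{
	return w->ns;
}
```

With two quick changes in a row, the first work item still reports the intermediate state rather than "fast-forwarding" to the second one.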

drbd: fix spurious meta data IO "error" · a2e91381

Committed by Lars Ellenberg on Oct 06, 2011

When detaching, even cleanly detaching due to administrator request,
we always go through D_FAILED before we become D_DISKLESS.

Don't let that state change race with an in-flight meta data IO,
or that one might think it actually experienced an IO error.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

a2e91381

drbd: Fixed a race condition between detach and start of resync · aaae506d

Committed by Philipp Reisner on Oct 06, 2011

drbd_state_lock() is only there to serialize cluster wide state
changes. Testing the local disk state needs to happen while
holding the global_state_lock.

Otherwise you might see something like this (Oct 6 on kugel)
14:20:24 drbd0: conn( WFSyncUUID -> Connected ) disk( Inconsistent -> Failed )
14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:20:24 drbd0: conn( Connected -> SyncTarget ) disk( Failed -> Inconsistent )
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

aaae506d

drbd: fix harmless race to not trigger an ASSERT · 6a9a92f4

Committed by Lars Ellenberg on Oct 06, 2011

We have one pre-allocated page to do certain synchronous meta data IO with,
using it is serialized like so:
	drbd_md_get_buffer();
	drbd_md_sync_page_io();
	drbd_md_sync_page_io();
	...
	drbd_md_put_buffer();

In drbd_md_sync_page_io() there is an
	ASSERT(atomic_read(&mdev->md_io_in_use) == 1);

We want to be able to timeout on unresponsive lower level devices, so we
can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra
reference, to not have a dangling pointer in case a delayed IO eventually
does still complete, even after we "detached" already.

We need to put the extra reference before we signal completion from the
completion handler, or the second drbd_md_sync_page_io() above may
trigger the assert (reference count still 2).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

6a9a92f4
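
The ordering requirement from the commit above (drop the extra reference before signalling completion) can be modelled with C11 atomics. This is a single-threaded sketch with hypothetical names; it only demonstrates that once `done` is visible, the counter is already back to 1, which is what the next synchronous IO on the same buffer would assert.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the shared meta data IO buffer bookkeeping. */
struct md_io {
	atomic_int in_use;	/* 1 = buffer owned, 2 = owned + IO in flight */
	atomic_bool done;	/* set by the completion handler */
};

/* The caller takes ownership of the buffer. */
static void md_io_init_owned(struct md_io *io)
{
	atomic_init(&io->in_use, 1);
	atomic_init(&io->done, false);
}

/* Submitting an IO takes an extra reference for the in-flight request. */
static void md_io_submit(struct md_io *io)
{
	atomic_store(&io->done, false);
	atomic_fetch_add(&io->in_use, 1);
}

/* Completion handler: put the extra reference first, then signal. In the
 * reverse order a waiter could wake up and still observe a count of 2. */
static void md_io_complete(struct md_io *io)
{
	atomic_fetch_sub(&io->in_use, 1);
	atomic_store(&io->done, true);
}
```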

drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero · 5ba3dac5

Committed by Philipp Reisner on Oct 05, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

5ba3dac5

drbd: drbd_nl_resize(): Fix missing put_ldev() on error path · 7b4e4d31

Committed by Andreas Gruenbacher on Sep 28, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

7b4e4d31

drbd: fix "stalled" empty resync · 40424e4a

Committed by Lars Ellenberg on Sep 26, 2011

With sync-after dependencies, given "lucky" timing of pause/unpause
events, the end of an empty (0 bits set) resync was sometimes not
detected on the SyncTarget, leading to a "stalled" SyncSource state.

Fixed this by expecting not only "Inconsistent -> UpToDate" but also
"Consistent -> UpToDate" transitions for the peer disk state
to end a resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

40424e4a

drbd: Bugfix for the connection behavior · 1e86ac48

Committed by Philipp Reisner on Aug 04, 2011

If we get into the C_BROKEN_PIPE cstate once, the state engine sets the
thi->t_state of the receiver thread to restarting.  But with the while loop
in drbdd_init() a new connection gets established. After that, the call into
drbdd() returns immediately since the thi->t_state is not RUNNING.  The
restart of drbdd_init() then resets thi->t_state to RUNNING.

I.e. after entering C_BROKEN_PIPE once, the next successfully established
connection gets wasted.

The two parts of the fix:
  * Do not cause the thread to restart if we detect the issue
    with the sockets while we are in C_WF_CONNECTION.

  * Make sure that all actions that would have set us to C_BROKEN_PIPE
    happen before the state change to C_WF_REPORT_PARAMS.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

1e86ac48

drbd: Cleanup all epoch objects upon connection loss · 80f9fd55

Committed by Philipp Reisner on Jul 18, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

80f9fd55

drbd: detach must not try to abort non-local requests from drbd-8.4 · fd2491f4

Committed by Philipp Reisner on Jul 18, 2011

Cherry picked from 8.4
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

fd2491f4

drbd: Consider that the no-data-condition could be in connected state · 79f16f5d

Committed by Philipp Reisner on Jul 15, 2011

...when the peer has inconsistent data. In that case we failed to
clear the susp_nod flag. When the local disk was attached again
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

79f16f5d

drbd: Fixed current UUID generation · bca482e9

Committed by Philipp Reisner on Jul 15, 2011

Now, the new edition of the clause only fires if a diskless
peer gets promoted.

This is a fixup for "drbd: Delayed creation of current-UUID".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

bca482e9

drbd: change some GFP_KERNEL to GFP_NOIO · 22f46ce2

Committed by Lars Ellenberg on Jul 11, 2011

Bitmap IO may happen in the context of an application write,
in the generic block IO path.  We need to use GFP_NOIO.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

22f46ce2

drbd: Implemented the disk-timeout option · dfa8bedb

Committed by Philipp Reisner on Jun 29, 2011

When the disk-timeout is active, and it expires for a single request,
we consider the local disk as D_FAILED. Note: With this change,
I made both timeout based state transitions HARD state transitions.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

dfa8bedb

drbd: Force flag for the detach operation · 02ee8f95

Committed by Philipp Reisner on Mar 14, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

02ee8f95

drbd: Allow new IOs while the local disk is in FAILED state · 5ca1de03

Committed by Philipp Reisner on Jun 28, 2011

The last bunch of commits prepared the 'detach from tar pit' feature.
With that, we can stay in disk state FAILED for a long time. We need
to accept new IO requests during that time.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

5ca1de03

drbd: Bitmap IO functions can now return prematurely if the disk breaks · 9e58c4da

Committed by Philipp Reisner on Jun 27, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

9e58c4da

drbd: Added a kref to bm_aio_ctx · d1f3779b

Committed by Philipp Reisner on Jun 27, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

d1f3779b

drbd: Hold a reference to ldev while doing meta-data IO · b2057629

Committed by Philipp Reisner on Jun 27, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

b2057629

drbd: Keep a reference to the bio until the completion handler finished · 4a2fe568

Committed by Philipp Reisner on Jul 04, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

4a2fe568

drbd: Implemented wait_until_done_or_disk_failure() · 0c464425

Committed by Philipp Reisner on Jun 26, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

0c464425

drbd: Replaced md_io_mutex by an atomic: md_io_in_use · e1711731

Committed by Philipp Reisner on Jun 27, 2011

The new function drbd_md_get_buffer() aborts waiting for the buffer
in case the disk fails in the meantime.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

e1711731

drbd: moved md_io into mdev · cc94c650

Committed by Philipp Reisner on Jun 26, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

cc94c650

drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk · 2b4dd36f

Committed by Philipp Reisner on Mar 14, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

2b4dd36f

drbd: Keep a reference to barrier acked requests · 6d7e32f5

Committed by Philipp Reisner on Mar 15, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

6d7e32f5

drbd: Improve compatibility with drbd's older than 8.3.7 · 6809384c

Committed by Philipp Reisner on Jun 30, 2011

Regression introduced with 8.3.11 commit:
drbd: Take a more conservative approach when deciding max_bio_size

Never ever tell an older drbd that we support more than 32KiB
in a single data request (packet).
Never believe an older drbd that it supports more than 32KiB
in a single data request (packet).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

6809384c
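
The two rules in the commit above amount to clamping the request size in both directions whenever the peer is an older release. A hedged sketch follows: the constant name, the function names, and the `peer_is_older` flag are assumptions for illustration, and how "older than 8.3.7" is detected (for example from the negotiated protocol version) is deliberately left outside the sketch.

```c
#include <assert.h>

/* 32 KiB: the largest single data request an older peer is trusted with. */
#define LEGACY_MAX_BIO_SIZE (32U * 1024U)

/* Size we announce to the peer: never more than 32 KiB for an older peer. */
static unsigned int max_size_to_advertise(unsigned int local_max,
					  int peer_is_older)
{
	if (peer_is_older && local_max > LEGACY_MAX_BIO_SIZE)
		return LEGACY_MAX_BIO_SIZE;
	return local_max;
}

/* Size we accept from the peer's announcement: an older peer claiming
 * more than 32 KiB is not believed. */
static unsigned int max_size_to_believe(unsigned int peer_claimed,
					int peer_is_older)
{
	if (peer_is_older && peer_claimed > LEGACY_MAX_BIO_SIZE)
		return LEGACY_MAX_BIO_SIZE;
	return peer_claimed;
}
```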

drbd: Only print sanitize state's warnings, if the state change happens · 77e8fdfc

Committed by Philipp Reisner on Jun 29, 2011

The reason for this change is that doing
'drbdadm invalidate' on a disconnected resource caused
an "implicitly set pdsk from UpToDate to DUnknown" message,
which was misleading.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

77e8fdfc

drbd: downgraded error printk to info · 07667347

Committed by Lars Ellenberg on Jun 21, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

07667347

DRBD: Fix comparison always false warning due to long/long long compare · 5f138ce0

Committed by David Howells on Jun 15, 2011

Fix warnings of the following nature in the drbd header:

In file included from drivers/block/drbd/drbd_bitmap.c:32:
drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress':
drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data

where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which
is always false on a 32-bit machine.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

5f138ce0
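
For illustration, one way to express a "does this value exceed 32 bits" check on an `unsigned long` without a 64-bit constant is to compare against `UINT_MAX`. This is a sketch, not necessarily the exact change made by the commit above; `progress_shift()` and the shift values 16 and 10 are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Choose a larger scaling shift only when `total` does not fit in 32 bits.
 * Where unsigned long is 32 bits wide the condition can never be true, so
 * the function then always returns 10. */
static unsigned int progress_shift(unsigned long total)
{
	return total > UINT_MAX ? 16 : 10;
}
```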

drbd: spelling fix: too small · 7948bcdc

Committed by Lars Ellenberg on Jun 06, 2011

It is not "to small", but "too small".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

7948bcdc

drbd: cosmetic: fix accidental division instead of modulo when pretty printing · 1381e9a4

Committed by Lars Ellenberg on Jun 03, 2011

For large resync rates, seq_printf_with_thousands_grouping()
accidentally only produced Y,000,00Y, instead of the real numbers.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

1381e9a4
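
The division-versus-modulo mix-up described in the commit above is easy to reproduce in userspace. The sketch below is not the kernel function; `format_thousands()` is a hypothetical stand-in that writes into a buffer with snprintf() and assumes 0 <= v < 1,000,000,000.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format v with thousands separators, e.g. 1234567 as "1,234,567". */
static void format_thousands(char *buf, size_t len, long v)
{
	if (v >= 1000000) {
		long millions = v / 1000000;

		/* Keep the remainder below one million. Writing
		 * "v /= 1000000" here instead would leave only the
		 * millions digit and print e.g. "1,000,001". */
		v %= 1000000;
		snprintf(buf, len, "%ld,%03ld,%03ld", millions,
			 v / 1000, v % 1000);
	} else if (v >= 1000) {
		snprintf(buf, len, "%ld,%03ld", v / 1000, v % 1000);
	} else {
		snprintf(buf, len, "%ld", v);
	}
}
```

With the modulo in place, 1234567 is rendered as "1,234,567"; with the accidental division it would have been "1,000,001", which matches the Y,000,00Y pattern mentioned in the commit message.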

drbd: Lower log priority for an event that is definitely not an error · ebd2b0cd

Committed by Philipp Reisner on May 25, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

ebd2b0cd

19 Apr, 2012 · 1 commit

xen/blkback: Fix warning error. · a71e23d9

Committed by Konrad Rzeszutek Wilk on Apr 16, 2012

drivers/block/xen-blkback/xenbus.c: In function 'xen_blkbk_discard':
drivers/block/xen-blkback/xenbus.c:419:4: warning: passing argument 1 of 'dev_warn' makes pointer from integer without a cast [enabled by default]
include/linux/device.h:894:5: note: expected 'const struct device *' but argument is of type 'long int'

It is unclear how that mistake made it in. It surely is wrong.
Acked-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a71e23d9

12 Apr, 2012 · 2 commits

virtio_blk: helper function to format disk names · c0aa3e09

Committed by Ren Mingxin on Apr 10, 2012

The current virtio block's naming algorithm just supports 18278
(26^3 + 26^2 + 26) disks. If there are more virtio blocks,
there will be disks with the same name.

Based on commit 3e1a7ff8, add
a function "virtblk_name_format()" for virtio block to support naming
a large number of disks.

Notes:
- Our naming scheme is ugly. We are stuck with it
  for virtio but don't use it for any new driver:
  new drivers should name their devices PREFIX%d
  where the sequence number can be allocated by ida
- sd_format_disk_name has exactly the same logic.
  Moving it to a central place was deferred over worries
  that this will make people keep using the legacy naming
  in new drivers.
  We kept the code identical in case someone wants to deduplicate later.
Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
Acked-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c0aa3e09
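
The limit quoted in the commit above follows from three-letter suffixes: 26^3 + 26^2 + 26 = 17576 + 676 + 26 = 18278. A userspace sketch of a naming helper without that limit is shown below. It follows the general bijective base-26 scheme the message alludes to, but `format_disk_name()` is a hypothetical name and the code is not copied from the patch.

```c
#include <assert.h>
#include <string.h>

/* Write prefix + letter suffix for `index` into buf: 0 -> "a", 25 -> "z",
 * 26 -> "aa", and so on. Returns 0 on success, -1 if the arguments are
 * invalid or buf is too small (buf contents are then unspecified). */
static int format_disk_name(const char *prefix, int index,
			    char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;
	size_t plen = strlen(prefix);
	char *begin, *end, *p;

	if (index < 0 || buflen <= 0 || (size_t)buflen <= plen)
		return -1;

	begin = buf + plen;
	end = buf + buflen;
	p = end - 1;
	*p = '\0';

	/* Build the suffix right to left at the end of the buffer. */
	do {
		if (p == begin)
			return -1;
		*--p = 'a' + (index % base);
		index = (index / base) - 1;
	} while (index >= 0);

	/* Move the suffix (including its NUL) behind the prefix. */
	memmove(begin, p, end - p);
	memcpy(buf, prefix, plen);
	return 0;
}
```

With the prefix "vd", index 18277 is the last three-letter name ("vdzzz") and index 18278 simply continues with "vdaaaa" instead of colliding with an earlier name.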

block: mtip32xx: remove HOTPLUG_PCI_PCIE dependency · 63634806

Committed by Greg Kroah-Hartman on Apr 12, 2012

This removes the HOTPLUG_PCI_PCIE dependency on the driver and makes it
depend on PCI.

Cc: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

63634806
