提交 · 32db80f6f6326617ed40b2d157709226af4f062b · openanolis / cloud-kernel

08 11月, 2012 40 次提交

drbd: Consider the disk-timeout also for meta-data IO operations · 32db80f6

由 Philipp Reisner 提交于 2月 22, 2012

If the backing device is already frozen during attach, we failed
to recognize that. The current disk-timeout code works on top
of the drbd_request objects. During attach we do not allow IO
and therefore never generate a drbd_request object but block
before that in drbd_make_request().

This patch adds the timeout to all drbd_md_sync_page_io().

Before this patch we used to go from D_ATTACHING directly
to D_DISKLESS if IO failed during attach. We can no longer
do this since we have to stay in D_FAILED until all IO
ops issued to the backing device returned.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

32db80f6

drbd: Reinstate disabling AL updates with invalidate-remote · 25b0d6c8

由 Philipp Reisner 提交于 2月 14, 2012

Commit d0ef827e (drbd: switch configuration interface from connector to
genetlink) introduced a regression by removing the ability to set all
bits in the out of sync bitmap and to suspend updates to the activity log
of a disconnected device via the invalidate-remote management call.

Credits for reporting the issue are going to Arne Redlich.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

25b0d6c8

drbd: explicitly clear unused dp_flags in drbd_send_block · b17f33cb

由 Lars Ellenberg 提交于 2月 08, 2012

We send left-over garbage from the previous packet in P_DATA_REPLY and
P_RS_DATA_REPLY packets. That's bad behaviour.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

b17f33cb

drbd: Fixed compat issue with disconnecting 8.4 from a primary 8.3 · 4d0fc3fd

由 Philipp Reisner 提交于 1月 20, 2012

For compatibility reasons 8.4 has to send P_STATE_CHG_REQ (instead
of P_CONN_ST_CHG_REQ) when disconnecting.

In the receiving code path we missed to convert the old
answer (P_STATE_CHG_REPLY) back to 8.4 logic. Therefore
the CL_ST_CHG_SUCCESS or CL_ST_CHG_FAIL bit in the flags word
of mdev got set, while the state code was waiting for
the CONN_WD_ST_CHG_OKAY or CONN_WD_ST_CHG_FAIL bits in tconn.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

4d0fc3fd

drbd: drbd_bm_ALe_set_all(): Remove unused function · 1a3cde44

由 Andreas Gruenbacher 提交于 12月 30, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

1a3cde44

drbd: restart loop in drbd_make_request() [prepare for Linux-3.2] · 69b6a3b1

由 Philipp Reisner 提交于 12月 20, 2011

With Linux-3.2 generic_make_request() will no longer loop over
the request function until it finally returns 0. Move this
loop into our drbd_make_request() function.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

69b6a3b1

drbd: Restore late assigning of tconn->data.sock and meta.sock · 7da35862

由 Philipp Reisner 提交于 12月 19, 2011

With commit from Mon Mar 28 16:33:12 2011 +0200
"drbd: drbd_connect(): Initialize struct drbd_socket before sending anything"

tconn->data.sock and tconn->meta.sock get assigned early, in
conn_connect.

The early assigning can trigger an OOPS, because it may released the socket
without acquiring the mutex protecting the socket. An other thread (worker)
might use setsockopt() on the socket while it gets free()ed.

Restored the (proven) 8.3 behavior of assigning these sockets after the two
connections are established.

Credits for reporting the issue are going to Arne Redlich.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

7da35862

drbd: Log failures of connection state changes · a01842eb

由 Philipp Reisner 提交于 12月 13, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

a01842eb

drbd: Consider that read requests could be NEG_ACKEDed · e8cdc343

由 Philipp Reisner 提交于 12月 13, 2011

ap_in_flight only counts writes. NEG_ACKED is an action
on a request that might be called for reads and writes.

This bug was there forever, but it becomes much more
relevant with the read balincing code.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

e8cdc343

drbd: Do not send state packets while lower than C_CONNECTED cstate · 6ab9b1b6

由 Philipp Reisner 提交于 12月 13, 2011

I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION.
Sending may already work in these cstates, but the peer still expects
the HandShake / ConnectionFeatures packet.

Actually triggered by the Testuite on kugel.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

6ab9b1b6

drbd: fix race between disconnect and receive_state · b8853dbd

由 Philipp Reisner 提交于 12月 13, 2011

If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.

Log excerpt:
  Remote failed to finish a request within ko-count * timeout
  peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
  asender terminated
  ...
  peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
  ...
  Connection closed
  peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
  receiver terminated

Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.

The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.

Fixed in several code paths:
  Make sure the connection state is NetworkFailure or worse
  before starting the cleanup in drbd_disconnect().
  This should make sure the cleanup won't miss any requests.

  Disallow receive_state() to "upgrade" the connection state
  from an error state. This will make sure the "illegal" state
  transition won't happen.

  For all connection failure states,
  relax the safe-guard in sanitize_state() again
  to silently mask out those state changes
  (e.g. Timeout -> Connected becomes Timeout -> Timeout).

 Note by Philipp Reisner:
  The 3rd chunk described as "relax the safe-guard..."
  is not there in 8.4 as it is relaxed to the maximum in
  8.4 already
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

b8853dbd

drbd: Do not call generic_make_request() while holding req_lock · 57bcb6cf

由 Philipp Reisner 提交于 12月 03, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

57bcb6cf

drbd: Load balancing method: striping · d60de03a

由 Philipp Reisner 提交于 11月 17, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

d60de03a

drbd: Load balancing of read requests · 380207d0

由 Philipp Reisner 提交于 11月 11, 2011

New config option for the disk secition "read-balancing", with
the values: prefer-local, prefer-remote, round-robin, when-congested-remote.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

380207d0

P
drbd: Get rid of "ASSERTION FAILED: tconn->current_epoch->list not empty" · d10b4ea3
由 Philipp Reisner 提交于 11月 30, 2011
```
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
```
d10b4ea3

drbd: add missing rcu locks around recently introduced idr_for_each · 615e087f

由 Lars Ellenberg 提交于 11月 17, 2011

Recent commit
 drbd: Move write_ordering from mdev to tconn
introduced a new idr_for_each loop over all volumes,
but did not take necessary rcu locks or krefs.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

615e087f

drbd: Remove leftover prototype · 03d63e1d

由 Andreas Gruenbacher 提交于 11月 17, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

03d63e1d

drbd: fix potential spinlock deadlock · 975b2979

由 Philipp Reisner 提交于 11月 17, 2011

drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks
left to be resynced (rs_left) in the current resync extent.
If it detects a mismatch, it complains, and forces a disconnect using
drbd_force_state(mdev, NS(conn, C_DISCONNECTING));

Unfortunately, this may be called while holding the req_lock,
and drbd_force_state() want's to aquire that lock itself. Deadlock.

Don't force a disconnect, but fix up rs_left by recounting and
reassigning the number of dirty blocks in that extent.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

975b2979

drbd: Fix the WO=drain implementation for multiple volumes · 77fede51

由 Philipp Reisner 提交于 11月 10, 2011

Wait until IO is drained in all volumes.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

77fede51

drbd: Switch drbd_may_finish_epoch() from mdev to tconn · 1e9dd291

由 Philipp Reisner 提交于 11月 10, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

1e9dd291

drbd: Move list of epochs from mdev to tconn · 12038a3a

由 Philipp Reisner 提交于 11月 09, 2011

This is necessary since the transfer_log on the sending is also
per tconn.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

12038a3a

drbd: Prepare epochs per connection · 1d2783d5

由 Philipp Reisner 提交于 11月 10, 2011

An epoch object needs a pointer to the mdev it was received for.
This is necessary to be able to send the barrier ack packet for
the same volume as the original barrier packet was assigned to.

This prepares the next step, in which the (receiver side)
epoch list is moved from the device (mdev) to the connection (tconn)
object.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

1d2783d5

drbd: Move write_ordering from mdev to tconn · 4b0007c0

由 Philipp Reisner 提交于 11月 09, 2011

This is necessary in order to prepare the move of the (receiver side)
epoch list from the device (mdev) to the connection (tconn) objects.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

4b0007c0

drbd: Move the CREATE_BARRIER flag from connection to device · 6936fcb4

由 Philipp Reisner 提交于 11月 10, 2011

That is necessary since the whole transfer log is per connection(tconn)
and not per device(mdev).

This bug caused list corruption on the worker list. When a barrier is queued
for sending in the context of one device, another device did not see the
CREATE_BARRIER bit, and queued the same object again -> list corruption.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

6936fcb4

drbd: Fixed an obvious copy-n-paste mistake · 36baf611

由 Philipp Reisner 提交于 11月 10, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

36baf611

drbd: Fixes from the drbd-8.3 branch · 43de7c85

由 Philipp Reisner 提交于 11月 10, 2011

* drbd-8.3:
  drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6).
  drbd: send intermediate state change results to the peer
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

43de7c85

drbd: Fixes from the drbd-8.3 branch · 0cfac5dd

由 Philipp Reisner 提交于 11月 10, 2011

* drbd-8.3:
  drbd: fix spurious meta data IO "error"
  drbd: Fixed a race condition between detach and start of resync
  drbd: fix harmless race to not trigger an ASSERT
  drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero
  drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11)
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

0cfac5dd

drbd: Silenced compiler warnings · 376694a0

由 Philipp Reisner 提交于 11月 07, 2011

Since version 4.6.1 gcc warns about variables that get
a value assigned, but which are never read later on.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

376694a0

drbd: fix "stalled" empty resync · 9bcd2521

由 Philipp Reisner 提交于 9月 29, 2011

With sync-after dependencies, given "lucky" timing of pause/unpause
events, and the end of an empty (0 bits set) resync was sometimes not
detected on the SyncTarget, leading to a "stalled" SyncSource state.

Fixed this by expecting not only "Inconsistent -> UpToDate" but also
"Consistent -> UpToDate" transitions for the peer disk state
to end a resync.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

9bcd2521

drbd: fix bitmap writeout after aborted resync · 22d81140

由 Lars Ellenberg 提交于 9月 26, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

22d81140

drbd: Consider the discard-my-data flag for all volumes [bugz 359] · 08b165ba

由 Philipp Reisner 提交于 9月 05, 2011

...not only for the first volume
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

08b165ba

drbd: Improve error reporting in drbd_md_sync_page_io() · 935be260

由 Andreas Gruenbacher 提交于 8月 19, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

935be260

drbd: fix connect failure with all default net-options · 25e40932

由 Lars Ellenberg 提交于 8月 19, 2011

If no net-options are configured (all on their default),
no DRBD_NLA_NET_CONF will be passed to the kernel.
The kernel must not require its presence,
there is no required option in there.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

25e40932

drbd: Update some outdated comments to match the code · a209b4ae

由 Andreas Gruenbacher 提交于 8月 17, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

a209b4ae

drbd: Remove unused code · c4e7afdc

由 Philipp Reisner 提交于 8月 16, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

c4e7afdc

drbd: Remove dead code · 4276dea7

由 Philipp Reisner 提交于 6月 16, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

4276dea7

drbd: Cleanup all epoch objects upon connection loss · 85d73513

由 Philipp Reisner 提交于 7月 18, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

85d73513

drbd: The minor_count module parameter is only a hint nowadays · b80c0433

由 Philipp Reisner 提交于 7月 18, 2011

* The max of minor_count is 255
* In drbdadm count the number of minors, instead of finding
  the highest minor number
* No longer us the magic in the init script
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

b80c0433

drbd: Do not display bogus log lines for pdsk in case pdsk < D_UNKNOWN · f132f554

由 Philipp Reisner 提交于 7月 18, 2011

This was a regression recently introduced with commit
7848ddb752c09b6dfd1ddfabb06b69b08aa8f6b9
"drbd: Correctly handle resources without volumes"
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

f132f554

drbd: detach must not try to abort non-local requests · 97ddb687

由 Lars Ellenberg 提交于 7月 15, 2011

Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>

97ddb687

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功