Commits · 69b6a3b159927d45092f64e07f40d5ecf93e11d8 · openanolis / cloud-kernel

Nov 08, 2012 · 40 commits

drbd: restart loop in drbd_make_request() [prepare for Linux-3.2] · 69b6a3b1

Committed by Philipp Reisner on Dec 20, 2011

With Linux-3.2 generic_make_request() will no longer loop over
the request function until it finally returns 0. Move this
loop into our drbd_make_request() function.
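
The idea of moving the restart loop into the driver can be sketched in user space as below. This is only an illustration under stated assumptions: `struct fake_bio`, `inner_make_request()` and `outer_make_request()` are hypothetical names, not DRBD or block-layer code, and the "nonzero means call again" convention is taken from the description above.

```c
/* Hypothetical stand-in for a bio: "remaining" is how many more passes
 * the inner function needs, "passes" counts how often it was called. */
struct fake_bio {
    int remaining;
    int passes;
};

/* Hypothetical inner request function. Returns nonzero while it wants
 * to be called again, and 0 once the request has been handled. */
static int inner_make_request(struct fake_bio *bio)
{
    bio->passes++;
    if (bio->remaining > 0) {
        bio->remaining--;
        return 1;
    }
    return 0;
}

/* The loop the caller used to provide now lives in the driver itself. */
static void outer_make_request(struct fake_bio *bio)
{
    while (inner_make_request(bio) != 0)
        ; /* restart until the inner function reports completion */
}
```

With `remaining = 3`, the inner function runs four times: three passes that ask for a restart and one final pass that returns 0.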
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Restore late assigning of tconn->data.sock and meta.sock · 7da35862

Committed by Philipp Reisner on Dec 19, 2011

With commit from Mon Mar 28 16:33:12 2011 +0200
"drbd: drbd_connect(): Initialize struct drbd_socket before sending anything"

tconn->data.sock and tconn->meta.sock get assigned early, in
conn_connect.

The early assigning can trigger an OOPS, because it may release the socket
without acquiring the mutex protecting the socket. Another thread (worker)
might use setsockopt() on the socket while it gets free()ed.

Restored the (proven) 8.3 behavior of assigning these sockets after the two
connections are established.

Credit for reporting the issue goes to Arne Redlich.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Log failures of connection state changes · a01842eb

Committed by Philipp Reisner on Dec 13, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Consider that read requests could be NEG_ACKEDed · e8cdc343

Committed by Philipp Reisner on Dec 13, 2011

ap_in_flight only counts writes. NEG_ACKED is an action
on a request that might be called for reads and writes.

This bug has always been there, but it becomes much more
relevant with the read balancing code.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Do not send state packets while lower than C_CONNECTED cstate · 6ab9b1b6

Committed by Philipp Reisner on Dec 13, 2011

I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION.
Sending may already work in these cstates, but the peer still expects
the HandShake / ConnectionFeatures packet.

Actually triggered by the test suite on kugel.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix race between disconnect and receive_state · b8853dbd

Committed by Philipp Reisner on Dec 13, 2011

If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.

Log excerpt:
  Remote failed to finish a request within ko-count * timeout
  peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
  asender terminated
  ...
  peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
  ...
  Connection closed
  peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
  receiver terminated

Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.

The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.

Fixed in several code paths:
  Make sure the connection state is NetworkFailure or worse
  before starting the cleanup in drbd_disconnect().
  This should make sure the cleanup won't miss any requests.

  Disallow receive_state() to "upgrade" the connection state
  from an error state. This will make sure the "illegal" state
  transition won't happen.

  For all connection failure states,
  relax the safe-guard in sanitize_state() again
  to silently mask out those state changes
  (e.g. Timeout -> Connected becomes Timeout -> Timeout).

 Note by Philipp Reisner:
  The 3rd chunk described as "relax the safe-guard..."
  is not there in 8.4 as it is relaxed to the maximum in
  8.4 already
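
The "do not upgrade from an error state" rule can be sketched as follows. This is a reduced illustration, not DRBD's real state machinery: the enum, its ordering, and the helper names `is_failure_state()` and `mask_conn_change()` are assumptions made for the example.

```c
/* Hypothetical, reduced set of connection states. Not DRBD's real enum;
 * the ordering is chosen so the failure states form one range. */
enum conn_state {
    CS_STANDALONE,
    CS_DISCONNECTING,
    CS_UNCONNECTED,
    CS_TIMEOUT,
    CS_BROKEN_PIPE,
    CS_NETWORK_FAILURE,
    CS_WF_CONNECTION,
    CS_WF_REPORT_PARAMS,
    CS_CONNECTED
};

static int is_failure_state(enum conn_state cs)
{
    return cs >= CS_TIMEOUT && cs <= CS_NETWORK_FAILURE;
}

/* Return the state that should actually be applied. A request to go from
 * a failure state straight to a "connecting or better" state is masked
 * out, so Timeout -> Connected becomes Timeout -> Timeout. */
static enum conn_state mask_conn_change(enum conn_state old_cs,
                                        enum conn_state new_cs)
{
    if (is_failure_state(old_cs) && new_cs >= CS_WF_CONNECTION)
        return old_cs;
    return new_cs;
}
```

In this sketch, a failed connection can still move down to `CS_UNCONNECTED` for cleanup, and the normal `CS_WF_REPORT_PARAMS` to `CS_CONNECTED` step is unaffected.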
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Do not call generic_make_request() while holding req_lock · 57bcb6cf

Committed by Philipp Reisner on Dec 03, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Load balancing method: striping · d60de03a

Committed by Philipp Reisner on Nov 17, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Load balancing of read requests · 380207d0

Committed by Philipp Reisner on Nov 11, 2011

New config option for the disk section, "read-balancing", with
the values: prefer-local, prefer-remote, round-robin, when-congested-remote.
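
How such a policy might pick a target for each read can be sketched as below. The behavior of each value is inferred from the option names only, and every identifier here (`enum read_balancing`, `struct rb_ctx`, `read_goes_remote()`) is hypothetical rather than taken from the DRBD source.

```c
#include <stdbool.h>

/* Hypothetical mirror of the four "read-balancing" values. */
enum read_balancing {
    RB_PREFER_LOCAL,
    RB_PREFER_REMOTE,
    RB_ROUND_ROBIN,
    RB_WHEN_CONGESTED_REMOTE
};

/* Hypothetical per-device context for the decision. */
struct rb_ctx {
    enum read_balancing policy;
    bool local_congested;   /* assumed: set when the local disk is busy */
    unsigned int counter;   /* used only for round-robin */
};

/* Returns true if the read should be sent to the peer,
 * false if it should be served from the local disk. */
static bool read_goes_remote(struct rb_ctx *ctx)
{
    switch (ctx->policy) {
    case RB_PREFER_LOCAL:
        return false;
    case RB_PREFER_REMOTE:
        return true;
    case RB_ROUND_ROBIN:
        /* alternate: local, remote, local, ... */
        return (ctx->counter++ & 1u) != 0;
    case RB_WHEN_CONGESTED_REMOTE:
        return ctx->local_congested;
    }
    return false;
}
```

The sketch ignores cases a real implementation has to handle, such as the peer or the local disk not being usable at all.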
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Get rid of "ASSERTION FAILED: tconn->current_epoch->list not empty" · d10b4ea3

Committed by Philipp Reisner on Nov 30, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: add missing rcu locks around recently introduced idr_for_each · 615e087f

Committed by Lars Ellenberg on Nov 17, 2011

Recent commit
 drbd: Move write_ordering from mdev to tconn
introduced a new idr_for_each loop over all volumes,
but did not take necessary rcu locks or krefs.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Remove leftover prototype · 03d63e1d

Committed by Andreas Gruenbacher on Nov 17, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential spinlock deadlock · 975b2979

Committed by Philipp Reisner on Nov 17, 2011

drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks
left to be resynced (rs_left) in the current resync extent.
If it detects a mismatch, it complains, and forces a disconnect using
drbd_force_state(mdev, NS(conn, C_DISCONNECTING));

Unfortunately, this may be called while holding the req_lock,
and drbd_force_state() wants to acquire that lock itself. Deadlock.

Don't force a disconnect, but fix up rs_left by recounting and
reassigning the number of dirty blocks in that extent.
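
The "recount instead of disconnect" approach can be illustrated with a small user-space sketch. It assumes the bits have already been cleared in the bitmap before the accounting runs, and all names (`struct resync_extent`, `count_dirty()`, `account_cleared()`) are hypothetical, not the DRBD functions.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical resync extent: a window of the dirty bitmap plus a cached
 * count (rs_left) of bits still set inside that window. */
struct resync_extent {
    size_t first_bit;
    size_t nbits;
    unsigned int rs_left;
};

/* Count the bits set in bitmap[first_bit .. first_bit + nbits). */
static unsigned int count_dirty(const unsigned long *bitmap,
                                size_t first_bit, size_t nbits)
{
    unsigned int n = 0;
    size_t bit;

    for (bit = first_bit; bit < first_bit + nbits; bit++)
        if (bitmap[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD)))
            n++;
    return n;
}

/* Account for "cleared" bits. If the cached counter is inconsistent,
 * repair it from the bitmap rather than forcing a state change, which
 * would need a lock the caller may already hold. */
static void account_cleared(struct resync_extent *ext,
                            const unsigned long *bitmap,
                            unsigned int cleared)
{
    if (cleared > ext->rs_left)
        ext->rs_left = count_dirty(bitmap, ext->first_bit, ext->nbits);
    else
        ext->rs_left -= cleared;
}
```

The repair path takes no locks and triggers no state change, which is what avoids the deadlock described above.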
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fix the WO=drain implementation for multiple volumes · 77fede51

Committed by Philipp Reisner on Nov 10, 2011

Wait until IO is drained in all volumes.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Switch drbd_may_finish_epoch() from mdev to tconn · 1e9dd291

Committed by Philipp Reisner on Nov 10, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Move list of epochs from mdev to tconn · 12038a3a

Committed by Philipp Reisner on Nov 09, 2011

This is necessary since the transfer_log on the sending side is also
per tconn.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Prepare epochs per connection · 1d2783d5

Committed by Philipp Reisner on Nov 10, 2011

An epoch object needs a pointer to the mdev it was received for.
This is necessary to be able to send the barrier ack packet for
the same volume as the original barrier packet was assigned to.

This prepares the next step, in which the (receiver side)
epoch list is moved from the device (mdev) to the connection (tconn)
object.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Move write_ordering from mdev to tconn · 4b0007c0

Committed by Philipp Reisner on Nov 09, 2011

This is necessary in order to prepare the move of the (receiver side)
epoch list from the device (mdev) to the connection (tconn) objects.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Move the CREATE_BARRIER flag from connection to device · 6936fcb4

Committed by Philipp Reisner on Nov 10, 2011

That is necessary since the whole transfer log is per connection (tconn)
and not per device (mdev).

This bug caused list corruption on the worker list. When a barrier is queued
for sending in the context of one device, another device did not see the
CREATE_BARRIER bit, and queued the same object again -> list corruption.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fixed an obvious copy-n-paste mistake · 36baf611

Committed by Philipp Reisner on Nov 10, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fixes from the drbd-8.3 branch · 43de7c85

Committed by Philipp Reisner on Nov 10, 2011

* drbd-8.3:
  drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6).
  drbd: send intermediate state change results to the peer
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fixes from the drbd-8.3 branch · 0cfac5dd

Committed by Philipp Reisner on Nov 10, 2011

* drbd-8.3:
  drbd: fix spurious meta data IO "error"
  drbd: Fixed a race condition between detach and start of resync
  drbd: fix harmless race to not trigger an ASSERT
  drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero
  drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Silenced compiler warnings · 376694a0

Committed by Philipp Reisner on Nov 07, 2011

Since version 4.6.1 gcc warns about variables that get
a value assigned, but which are never read later on.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix "stalled" empty resync · 9bcd2521

Committed by Philipp Reisner on Sep 29, 2011

With sync-after dependencies, given "lucky" timing of pause/unpause
events, the end of an empty (0 bits set) resync was sometimes not
detected on the SyncTarget, leading to a "stalled" SyncSource state.

Fixed this by expecting not only "Inconsistent -> UpToDate" but also
"Consistent -> UpToDate" transitions for the peer disk state
to end a resync.
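
The widened check can be sketched as a simple predicate on the peer disk state transition. The enum below is a reduced, hypothetical subset of disk states, and `peer_transition_ends_resync()` is an illustrative name, not the function changed by this commit.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of disk states for illustration only. */
enum disk_state {
    DS_INCONSISTENT,
    DS_OUTDATED,
    DS_CONSISTENT,
    DS_UP_TO_DATE
};

/* True if a change of the peer's disk state should be treated as the
 * end of a resync. Before the fix only Inconsistent -> UpToDate was
 * expected; the fix also accepts Consistent -> UpToDate. */
static bool peer_transition_ends_resync(enum disk_state old_pdsk,
                                        enum disk_state new_pdsk)
{
    if (new_pdsk != DS_UP_TO_DATE)
        return false;
    return old_pdsk == DS_INCONSISTENT || old_pdsk == DS_CONSISTENT;
}
```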
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix bitmap writeout after aborted resync · 22d81140

Committed by Lars Ellenberg on Sep 26, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Consider the discard-my-data flag for all volumes [bugz 359] · 08b165ba

Committed by Philipp Reisner on Sep 05, 2011

...not only for the first volume
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Improve error reporting in drbd_md_sync_page_io() · 935be260

Committed by Andreas Gruenbacher on Aug 19, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix connect failure with all default net-options · 25e40932

Committed by Lars Ellenberg on Aug 19, 2011

If no net-options are configured (all on their default),
no DRBD_NLA_NET_CONF will be passed to the kernel.
The kernel must not require its presence,
as there is no required option in there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Update some outdated comments to match the code · a209b4ae

Committed by Andreas Gruenbacher on Aug 17, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Remove unused code · c4e7afdc

Committed by Philipp Reisner on Aug 16, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Remove dead code · 4276dea7

Committed by Philipp Reisner on Jun 16, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Cleanup all epoch objects upon connection loss · 85d73513

Committed by Philipp Reisner on Jul 18, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Do not display bogus log lines for pdsk in case pdsk < D_UNKNOWN · f132f554

Committed by Philipp Reisner on Jul 18, 2011

This was a regression recently introduced with commit
7848ddb752c09b6dfd1ddfabb06b69b08aa8f6b9
"drbd: Correctly handle resources without volumes"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: detach must not try to abort non-local requests · 97ddb687

Committed by Lars Ellenberg on Jul 15, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Get rid of MR_{READ,WRITE}_SHIFT · f497609e

Committed by Andreas Gruenbacher on Jul 17, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Bugfix for the connection behavior · 823bd832

Committed by Philipp Reisner on Nov 08, 2012

If we get into the C_BROKEN_PIPE cstate once, the state engine sets the
thi->t_state of the receiver thread to restarting. But with the while loop
in drbdd_init() a new connection gets established. After that, the call into
drbdd() returns immediately, since thi->t_state is not RUNNING. The
restart of drbd_init() then resets thi->t_state to RUNNING.

I.e. after entering C_BROKEN_PIPE once, the next successfully established
connection gets wasted.

The two parts of the fix:
  * Do not cause the thread to restart if we detect the issue
    with the sockets while we are in C_WF_CONNECTION.

  * Make sure that all actions that would have set us to C_BROKEN_PIPE
    happen before the state change to C_WF_REPORT_PARAMS.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fix the data-integrity-alg setting · 7d4c782c

Committed by Andreas Gruenbacher on Jul 17, 2011

The last data-integrity-alg fix made data integrity checking work when the
algorithm was changed for an established connection, but the common case of
configuring the algorithm before connecting was still broken. Fix that.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Turn tl_apply() into tl_abort_disk_io() · 71fc7eed

Committed by Andreas Gruenbacher on Jul 17, 2011

There is no need to overly generalize this function; it only makes the code
harder to understand.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fixed w_restart_disk_io() to handle non active AL-extents · 1b7ab15b

Committed by Philipp Reisner on Jul 15, 2011

Since we now apply the AL in user space onto the bitmap, the AL
is not active for the requests we want to replay.

For that, an al_write_transaction() that might be called from
worker context became necessary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Missing assignment of mdev before drbd_queue_work() · 9b743da9

Committed by Philipp Reisner on Jul 15, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
