1. 08 Dec, 2021 (4 commits)
  2. 17 Nov, 2021 (1 commit)
  3. 16 Nov, 2021 (2 commits)
    • fs: dlm: replace use of socket sk_callback_lock with sock_lock · 92c44605
      Alexander Aring authored
      This patch replaces the use of the socket sk_callback_lock with the
      socket lock. Other users such as sunrpc have already made the same
      move, see commit ea9afca8 ("SUNRPC: Replace use of socket
      sk_callback_lock with sock_lock"); the socket lock appears to be held
      already when the socket callbacks are called. A short sketch of the
      callback setup follows this entry.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
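      A minimal sketch, assuming the callbacks themselves run with the
      socket lock held, of registering socket callbacks under lock_sock()
      instead of write_lock_bh(&sk->sk_callback_lock). The sketch_* names
      are illustrative, not the actual fs/dlm/lowcomms.c code:

      #include <net/sock.h>

      /* assumed data-ready callback; the real one kicks the receive worker */
      static void sketch_data_ready(struct sock *sk)
      {
      }

      static void sketch_register_callbacks(struct socket *sock, void *con)
      {
              struct sock *sk = sock->sk;

              lock_sock(sk);          /* replaces write_lock_bh(&sk->sk_callback_lock) */
              sk->sk_user_data = con;
              sk->sk_data_ready = sketch_data_ready;
              release_sock(sk);       /* replaces write_unlock_bh(&sk->sk_callback_lock) */
      }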
    • fs: dlm: don't call kernel_getpeername() in error_report() · 4c3d9057
      Alexander Aring authored
      In some cases kernel_getpeername() takes the socket lock, which is
      already held when the socket layer calls the error_report() callback.
      Since commit 9dfc685e ("inet: remove races in inet{6}_getname()") the
      problem has become more likely because the socket lock is now always
      taken there. The result looks like this:
      
      bob9-u5 login: [  562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0
      [  562.318562]  lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7
      [  562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135
      [  562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
      [  562.321277] Call Trace:
      [  562.321529]  <IRQ>
      [  562.321734]  dump_stack_lvl+0x33/0x42
      [  562.322282]  do_raw_spin_lock+0x8b/0xc0
      [  562.322674]  lock_sock_nested+0x1e/0x50
      [  562.323057]  inet_getname+0x39/0x110
      [  562.323425]  ? sock_def_readable+0x80/0x80
      [  562.323838]  lowcomms_error_report+0x63/0x260 [dlm]
      [  562.324338]  ? wait_for_completion_interruptible_timeout+0xd2/0x120
      [  562.324949]  ? lock_timer_base+0x67/0x80
      [  562.325330]  ? do_raw_spin_unlock+0x49/0xc0
      [  562.325735]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
      [  562.326218]  ? del_timer+0x54/0x80
      [  562.326549]  sk_error_report+0x12/0x70
      [  562.326919]  tcp_validate_incoming+0x3c8/0x530
      [  562.327347]  ? kvm_clock_read+0x14/0x30
      [  562.327718]  ? ktime_get+0x3b/0xa0
      [  562.328055]  tcp_rcv_established+0x121/0x660
      [  562.328466]  tcp_v4_do_rcv+0x132/0x260
      [  562.328835]  tcp_v4_rcv+0xcea/0xe20
      [  562.329173]  ip_protocol_deliver_rcu+0x35/0x1f0
      [  562.329615]  ip_local_deliver_finish+0x54/0x60
      [  562.330050]  ip_local_deliver+0xf7/0x110
      [  562.330431]  ? inet_rtm_getroute+0x211/0x840
      [  562.330848]  ? ip_protocol_deliver_rcu+0x1f0/0x1f0
      [  562.331310]  ip_rcv+0xe1/0xf0
      [  562.331603]  ? ip_local_deliver+0x110/0x110
      [  562.332011]  __netif_receive_skb_core+0x46a/0x1040
      [  562.332476]  ? inet_gro_receive+0x263/0x2e0
      [  562.332885]  __netif_receive_skb_list_core+0x13b/0x2c0
      [  562.333383]  netif_receive_skb_list_internal+0x1c8/0x2f0
      [  562.333896]  ? update_load_avg+0x7e/0x5e0
      [  562.334285]  gro_normal_list.part.149+0x19/0x40
      [  562.334722]  napi_complete_done+0x67/0x160
      [  562.335134]  virtnet_poll+0x2ad/0x408 [virtio_net]
      [  562.335644]  __napi_poll+0x28/0x140
      [  562.336012]  net_rx_action+0x23d/0x300
      [  562.336414]  __do_softirq+0xf2/0x2ea
      [  562.336803]  irq_exit_rcu+0xc1/0xf0
      [  562.337173]  common_interrupt+0xb9/0xd0
      
      It is, and always was, forbidden to call kernel_getpeername() in the
      context of error_report(). To get rid of the problem we read the
      peer's destination address directly from the socket structure. While
      at it, we also fix the message to print the destination port of the
      inet socket (see the sketch below).
      
      Fixes: 1a31833d ("DLM: Replace nodeid_to_addr with kernel_getpeername")
      Reported-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
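      A minimal sketch of reading the peer address from the socket itself
      instead of calling kernel_getpeername(), which would try to take the
      already held socket lock. IPv4 only for brevity; the real callback
      also handles AF_INET6, and sketch_error_report() is an illustrative
      name:

      #include <linux/printk.h>
      #include <net/sock.h>
      #include <net/inet_sock.h>

      static void sketch_error_report(struct sock *sk)
      {
              struct inet_sock *inet = inet_sk(sk);

              /* destination address and port live in the inet socket */
              pr_err("dlm sketch: error %d on peer %pI4, port %u\n",
                     sk->sk_err, &inet->inet_daddr, ntohs(inet->inet_dport));
      }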
  4. 04 Nov, 2021 (1 commit)
  5. 03 Nov, 2021 (3 commits)
  6. 20 Aug, 2021 (1 commit)
    • fs: dlm: implement delayed ack handling · b97f8525
      Alexander Aring authored
      This patch changes the ack behaviour so that we no longer ack each
      message. Lowcomms takes care of sending an ack back after a bulk of
      messages has been processed. Currently that point is when the whole
      receive buffer has been processed; there may be better positions to
      send an ack back, but only the lowcomms implementation knows whether
      more data is about to arrive. A downside of this patch is that we may
      retransmit more on errors, but this is a very rare case. A sketch of
      the idea follows this entry.
      
      Tested with make_panic on gfs2 with three nodes by running:
      
      trace-cmd record -p function -l 'dlm_send_ack' sleep 100
      
      and
      
      trace-cmd report | wc -l
      
      Before patch:
      - 20548
      - 21376
      - 21398
      
      After patch:
      - 18338
      - 20679
      - 19949
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
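      A minimal sketch of the delayed-ack idea: process the whole receive
      buffer first and send a single ack for the bulk instead of one ack
      per message. All sketch_* names are stand-ins, not the real
      fs/dlm/midcomms.c helpers (the traced function above is
      dlm_send_ack()):

      struct sketch_node;
      int sketch_process_buffer(struct sketch_node *node, const void *buf, int len);
      void sketch_send_ack(struct sketch_node *node);   /* wraps the real ack path */

      static void sketch_receive_done(struct sketch_node *node,
                                      const void *buf, int len)
      {
              /* may consume many queued dlm messages in one pass */
              int processed = sketch_process_buffer(node, buf, len);

              if (processed > 0)
                      sketch_send_ack(node);  /* one ack after the bulk */
      }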
  7. 20 Jul, 2021 (9 commits)
  8. 03 Jun, 2021 (5 commits)
  9. 25 May, 2021 (11 commits)
    • fs: dlm: don't allow half transmitted messages · 706474fb
      Alexander Aring authored
      This patch cleans a dirty page buffer when a reconnect occurs. If a
      page buffer was only half transmitted, we cannot resume in the middle
      of a dlm message when a node connects again. I observed invalid
      message length errors and suspected this behaviour as the cause;
      after this patch I never saw an invalid message length again. The
      patch may drop more messages for dlm version 3.1, but 3.1 cannot deal
      with half messages anyway; for 3.2 it may trigger more
      retransmissions but will not leave dlm in a broken state.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: add reliable connection if reconnect · 489d8e55
      Alexander Aring authored
      This patch makes a TCP lowcomms connection reliable even if
      reconnects occur. This is done with application layer retransmission
      handling and sequence numbers in the dlm protocol. There are three
      new dlm commands:
      
      DLM_OPTS:
      
      This encapsulates an existing dlm message (and rcom messages, if they
      don't have their own application side retransmission handling).
      Additional optional tlvs (type-length-value fields) can be appended,
      for example a sequence number field. However, because the lockspace
      field in DLM_OPTS is unused and a sequence number is mandatory, the
      sequence number is not carried as a tlv; instead we put it inside the
      lockspace id. The possibility to add optional fields remains for
      future purposes.
      
      DLM_ACK:
      
      Just a dlm header that acknowledges the receipt of a DLM_OPTS message
      to its sender.
      
      DLM_FIN:
      
      This provides a 4-way handshake for connection termination, including
      support for half-closed connections. It is implemented at the
      application layer because SCTP doesn't support half-closed sockets,
      the shutdown() call can be interrupted by e.g. TCP resets, and the
      othercon paradigm in lowcomms makes it hard to implement there. The
      4-way termination handshake also solves the problem of synchronizing
      the peer's EOF arrival with the cluster manager removing the peer
      from DLM's node membership handling. In some cases messages can still
      be in flight during this time and we need to wait for the node
      membership event.
      
      To provide a reliable connection, the node retransmits all
      unacknowledged messages to its peer on reconnect. The receiver then
      filters for the next expected message and drops all messages that are
      duplicates; the sketch after this entry illustrates the filter.
      
      As RCOM_STATUS and RCOM_NAMES messages are the first messages
      exchanged and they have their own retransmission handling, logic
      exists that requires these messages to come first. When they arrive
      we store the dlm version field. This handling is the same for DLM 3.1
      and, after this patch, for 3.2. Backwards compatibility handling has
      been added and seems to work in tests without tcpkill, but it is not
      recommended to run DLM 3.1 and 3.2 at the same time, because DLM 3.2
      tries to fix long-standing bugs in the DLM protocol.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
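      A minimal sketch of the receive-side sequence filter: accept only the
      next expected sequence number and drop duplicates from a retransmit
      burst after reconnect. The structure and field names are assumptions,
      not the actual fs/dlm/midcomms.c definitions:

      #include <linux/types.h>

      struct sketch_node {
              u32 seq_next;   /* next sequence number we expect */
      };

      static bool sketch_accept_seq(struct sketch_node *node, u32 seq)
      {
              if (seq != node->seq_next)
                      return false;   /* duplicate or out of order, drop it */

              node->seq_next++;       /* deliver and advance */
              return true;
      }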
    • fs: dlm: move out some hash functionality · 37a247da
      Alexander Aring authored
      This patch moves some lowcomms hash functionality into the lowcomms
      header so that other layers, such as midcomms, can use it as well.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: add functionality to re-transmit a message · 2874d1a6
      Alexander Aring authored
      This patch introduces retransmit functionality for a lowcomms message
      handle. It simply allocates a new buffer and transmits the message
      again; no special prioritization is needed because the byte stream is
      kept in order.

      To avoid another connection lookup, some refactoring was done so that
      a new buffer can be allocated from a preexisting connection pointer.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: make buffer handling per msg · 8f2dc78d
      Alexander Aring authored
      This patch makes the void pointer handle of the lowcomms
      functionality per message rather than per page allocation entry.
      Refcount handling was added so the handle keeps the message alive
      until the user no longer needs it (a small refcount sketch follows
      this entry).

      There is now a per-message callback which is called when a new buffer
      is allocated. This callback is guaranteed to be called in the order
      of the send buffer, which lets the caller use it to increment a
      sequence number for the dlm message handle.

      For the transition period we cast dlm_mhandle to dlm_msg and vice
      versa until the midcomms layer implements a specific dlm_mhandle
      structure.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
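      A minimal sketch of a refcounted per-message handle using a kref
      embedded in the message. sketch_msg and its helpers are illustrative,
      not the actual fs/dlm/lowcomms.c definitions:

      #include <linux/kernel.h>
      #include <linux/kref.h>
      #include <linux/slab.h>

      struct sketch_msg {
              struct kref ref;
              /* ... message buffer, length, writequeue linkage ... */
      };

      static void sketch_msg_release(struct kref *ref)
      {
              struct sketch_msg *msg = container_of(ref, struct sketch_msg, ref);

              kfree(msg);
      }

      /* the user keeps the message alive until it drops its reference */
      static void sketch_msg_put(struct sketch_msg *msg)
      {
              kref_put(&msg->ref, sketch_msg_release);
      }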
    • fs: dlm: fix connection tcp EOF handling · 8aa31cbf
      Alexander Aring authored
      This patch fixes the EOF handling for TCP: if an EOF is received, we
      close the socket the next time the writequeue runs empty. This is
      half-closed socket behaviour, which doesn't exist in SCTP. The
      midcomms layer will implement half-closed socket behaviour on the DLM
      side to solve this problem for the SCTP case. There is still the last
      ack flying around, but other reset functionality will take care of it
      if it gets lost. A sketch of the drain-then-close idea follows below.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
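      A minimal sketch of the drain-then-close idea: remember that the peer
      sent EOF and close the socket only once the writequeue has drained.
      The structure, flag and helpers are assumptions, not the real
      lowcomms code:

      #include <linux/bitops.h>

      #define SKETCH_EOF_RECEIVED     0

      struct sketch_con {
              unsigned long flags;
      };

      void sketch_close(struct sketch_con *con);      /* shutdown + close socket */

      /* called by the receive path on a 0-byte read (EOF from the peer) */
      static void sketch_eof(struct sketch_con *con)
      {
              set_bit(SKETCH_EOF_RECEIVED, &con->flags);
      }

      /* called whenever the send worker finds the writequeue empty */
      static void sketch_writequeue_empty(struct sketch_con *con)
      {
              if (test_bit(SKETCH_EOF_RECEIVED, &con->flags))
                      sketch_close(con);
      }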
    • fs: dlm: cancel work sync othercon · c6aa00e3
      Alexander Aring authored
      The rx and tx flag arguments signal to close_connection() from which
      worker it is called. Obviously the receive worker cannot cancel
      itself, and the same holds for the send worker (swork). For the
      othercon only the receive worker should be in use, but to avoid
      deadlocks we should pass the same flags as in the original
      close_connection() call.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: reconnect if socket error report occurs · ba868d9d
      Alexander Aring authored
      This patch changes the reconnect handling so that a reconnect is
      triggered when the socket error callback reports an error. This also
      covers reconnects in the non-blocking connect case, which is
      currently missing. If ECONNREFUSED is reported, we delay the
      reconnect by one second (a sketch follows this entry).
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
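      A minimal sketch of an error_report-driven reconnect: on
      ECONNREFUSED, requeue the connect work with a one second delay,
      otherwise reconnect immediately. The workqueue and work item are
      assumptions, not the actual lowcomms identifiers:

      #include <linux/errno.h>
      #include <linux/jiffies.h>
      #include <linux/workqueue.h>
      #include <net/sock.h>

      static struct workqueue_struct *sketch_wq;
      static struct delayed_work sketch_connect_work;

      static void sketch_reconnect_on_error(struct sock *sk)
      {
              unsigned long delay = 0;

              if (sk->sk_err == ECONNREFUSED)
                      delay = msecs_to_jiffies(1000); /* back off for one second */

              queue_delayed_work(sketch_wq, &sketch_connect_work, delay);
      }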
    • fs: dlm: set is othercon flag · 7443bc96
      Alexander Aring authored
      There is an "is othercon" flag which was never used. This patch sets
      it and prints a warning if the othercon ever tries to send a dlm
      message, which should never be the case.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: fix srcu read lock usage · b38bc9c2
      Alexander Aring authored
      This patch holds the srcu connection read lock wherever we look up a
      connection and access it. We don't hold the srcu lock in worker
      functions where the scheduled worker is part of the connection
      itself; the connection must not be freed while any of its workers is
      scheduled or pending. A sketch of the lookup pattern follows this
      entry.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
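      A minimal sketch of the srcu-protected lookup: hold the read lock
      across both the lookup and the use of the connection. The srcu
      domain, structure and helpers are assumptions modelled on the
      lowcomms code, not exact copies of it:

      #include <linux/srcu.h>

      struct sketch_con;
      struct sketch_con *sketch_lookup_connection(int nodeid);
      void sketch_send_to(struct sketch_con *con);

      DEFINE_STATIC_SRCU(sketch_connections_srcu);

      static void sketch_use_connection(int nodeid)
      {
              struct sketch_con *con;
              int idx;

              idx = srcu_read_lock(&sketch_connections_srcu);
              con = sketch_lookup_connection(nodeid);
              if (con)
                      sketch_send_to(con);    /* con cannot be freed while we hold the lock */
              srcu_read_unlock(&sketch_connections_srcu, idx);
      }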
    • fs: dlm: add dlm macros for ratelimit log · 2df6b762
      Alexander Aring authored
      This patch adds a ratelimited log macro to the dlm subsystem and
      switches the connecting log message over to it. In the non-blocking
      connect case that message would otherwise be printed very often. A
      sketch of such a macro follows below.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
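      A minimal sketch of a ratelimited dlm log macro built on the kernel's
      printk_ratelimited(); the macro name is an assumption, not
      necessarily the one the patch introduces:

      #include <linux/printk.h>

      #define sketch_log_print_ratelimited(fmt, args...) \
              printk_ratelimited(KERN_INFO "dlm: " fmt "\n", ##args)

      /* usage: sketch_log_print_ratelimited("connecting to %d", nodeid); */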
  10. 30 Mar, 2021 (1 commit)
  11. 09 Mar, 2021 (2 commits)
    • fs: dlm: add shutdown hook · 9d232469
      Alexander Aring authored
      This patch fixes issues that occur when dlm lowcomms synchronizes its
      workqueues while the dlm application layer has already released the
      lockspace. In such cases messages like:
      
      dlm: gfs2: release_lockspace final free
      dlm: invalid lockspace 3841231384 from 1 cmd 1 type 11
      
      are printed to the kernel log. This patch solves the issue by
      introducing a new "shutdown" hook, called before the "stop" hook when
      the lockspace is finally released. This should prevent any dlm
      messages from sitting in the workqueues during or after lockspace
      removal (a sketch of the hook ordering follows this entry).
      
      It's necessary to call dlm_scand_stop() as well; I instrumented the
      dlm_lowcomms_get_buffer() code to report a warning when it is called
      after the dlm_midcomms_shutdown() functionality, see below:
      
      WARNING: CPU: 1 PID: 3794 at fs/dlm/midcomms.c:1003 dlm_midcomms_get_buffer+0x167/0x180
      Modules linked in: joydev iTCO_wdt intel_pmc_bxt iTCO_vendor_support drm_ttm_helper ttm pcspkr serio_raw i2c_i801 i2c_smbus drm_kms_helper virtio_scsi lpc_ich virtio_balloon virtio_console xhci_pci xhci_pci_renesas cec qemu_fw_cfg drm [last unloaded: qxl]
      CPU: 1 PID: 3794 Comm: dlm_scand Tainted: G        W         5.11.0+ #26
      Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
      RIP: 0010:dlm_midcomms_get_buffer+0x167/0x180
      Code: 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e 41 5f c3 4c 89 e7 45 31 e4 e8 3b f1 ec ff eb 86 <0f> 0b 4c 89 e7 45 31 e4 e8 2c f1 ec ff e9 74 ff ff ff 0f 1f 80 00
      RSP: 0018:ffffa81503f8fe60 EFLAGS: 00010202
      RAX: 0000000000000008 RBX: ffff8f969827f200 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffffad1e89a0 RDI: ffff8f96a5294160
      RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8f96a250bc60
      R10: 00000000000045d3 R11: 0000000000000000 R12: ffff8f96a250bc60
      R13: ffffa81503f8fec8 R14: 0000000000000070 R15: 0000000000000c40
      FS:  0000000000000000(0000) GS:ffff8f96fbc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055aa3351c000 CR3: 000000010bf22000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       dlm_scan_rsbs+0x420/0x670
       ? dlm_uevent+0x20/0x20
       dlm_scand+0xbf/0xe0
       kthread+0x13a/0x150
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      To synchronize all dlm scand messages we stop dlm_scand right before
      the shutdown hook.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
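      A minimal sketch of the hook ordering on final lockspace release:
      stop the scanner, drain in-flight traffic via "shutdown", then "stop"
      the communication layer. The ops structure and names are assumptions,
      not the exact fs/dlm interfaces:

      struct sketch_comms_ops {
              void (*shutdown)(void); /* new: drain pending dlm messages */
              void (*stop)(void);     /* existing: tear communication down */
      };

      void sketch_scand_stop(void);   /* stop the dlm_scand kthread */

      static void sketch_release_lockspace_final(struct sketch_comms_ops *ops)
      {
              sketch_scand_stop();    /* no new dlm_scand messages after this */
              ops->shutdown();        /* flush while comms are still usable */
              ops->stop();            /* now it is safe to stop lowcomms */
      }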
    • fs: dlm: flush swork on shutdown · eec054b5
      Alexander Aring authored
      This patch fixes the flushing of the send work before shutdown.
      cancel_work_sync() is not the right workqueue function to use here,
      as it would cancel the work even when the work requeues itself. When
      send() returns EAGAIN for a dlm message we need to be sure that
      everything has been sent out first. flush_work() ensures that every
      queued send work completes, including the EAGAIN case (a sketch
      follows this entry).
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
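      A minimal sketch of the shutdown change described above: wait for the
      queued send work to finish instead of cancelling it. The work item
      name is an assumption standing in for the connection's swork:

      #include <linux/workqueue.h>

      static struct work_struct sketch_swork;

      static void sketch_shutdown_send(void)
      {
              /* before: cancel_work_sync(&sketch_swork); could cancel a
               * requeued send (EAGAIN path) before the data went out */
              flush_work(&sketch_swork);      /* wait until the send work has run */
      }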