1. 08 Dec, 2021 (4 commits)
  2. 17 Nov, 2021 (1 commit)
  3. 16 Nov, 2021 (2 commits)
    • fs: dlm: replace use of socket sk_callback_lock with sock_lock · 92c44605
      Alexander Aring authored
      This patch replaces the use of the socket sk_callback_lock with the
      socket lock. Other users such as sunrpc have already made the same
      move, see commit ea9afca8 ("SUNRPC: Replace use of socket
      sk_callback_lock with sock_lock"); the socket lock appears to be held
      already when the socket callbacks are called. A short sketch of the
      callback setup follows this entry.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
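      A minimal sketch, assuming the callbacks themselves run with the
      socket lock held, of registering socket callbacks under lock_sock()
      instead of write_lock_bh(&sk->sk_callback_lock). The sketch_* names
      are illustrative, not the actual fs/dlm/lowcomms.c code:

      #include <net/sock.h>

      /* assumed data-ready callback; the real one kicks the receive worker */
      static void sketch_data_ready(struct sock *sk)
      {
      }

      static void sketch_register_callbacks(struct socket *sock, void *con)
      {
              struct sock *sk = sock->sk;

              lock_sock(sk);          /* replaces write_lock_bh(&sk->sk_callback_lock) */
              sk->sk_user_data = con;
              sk->sk_data_ready = sketch_data_ready;
              release_sock(sk);       /* replaces write_unlock_bh(&sk->sk_callback_lock) */
      }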
    • fs: dlm: don't call kernel_getpeername() in error_report() · 4c3d9057
      Alexander Aring authored
      In some cases kernel_getpeername() takes the socket lock, which is
      already held when the socket layer calls the error_report() callback.
      Since commit 9dfc685e ("inet: remove races in inet{6}_getname()") the
      problem has become more likely because the socket lock is now always
      taken there. The result looks like this:
      
      bob9-u5 login: [  562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0
      [  562.318562]  lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7
      [  562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135
      [  562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
      [  562.321277] Call Trace:
      [  562.321529]  <IRQ>
      [  562.321734]  dump_stack_lvl+0x33/0x42
      [  562.322282]  do_raw_spin_lock+0x8b/0xc0
      [  562.322674]  lock_sock_nested+0x1e/0x50
      [  562.323057]  inet_getname+0x39/0x110
      [  562.323425]  ? sock_def_readable+0x80/0x80
      [  562.323838]  lowcomms_error_report+0x63/0x260 [dlm]
      [  562.324338]  ? wait_for_completion_interruptible_timeout+0xd2/0x120
      [  562.324949]  ? lock_timer_base+0x67/0x80
      [  562.325330]  ? do_raw_spin_unlock+0x49/0xc0
      [  562.325735]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
      [  562.326218]  ? del_timer+0x54/0x80
      [  562.326549]  sk_error_report+0x12/0x70
      [  562.326919]  tcp_validate_incoming+0x3c8/0x530
      [  562.327347]  ? kvm_clock_read+0x14/0x30
      [  562.327718]  ? ktime_get+0x3b/0xa0
      [  562.328055]  tcp_rcv_established+0x121/0x660
      [  562.328466]  tcp_v4_do_rcv+0x132/0x260
      [  562.328835]  tcp_v4_rcv+0xcea/0xe20
      [  562.329173]  ip_protocol_deliver_rcu+0x35/0x1f0
      [  562.329615]  ip_local_deliver_finish+0x54/0x60
      [  562.330050]  ip_local_deliver+0xf7/0x110
      [  562.330431]  ? inet_rtm_getroute+0x211/0x840
      [  562.330848]  ? ip_protocol_deliver_rcu+0x1f0/0x1f0
      [  562.331310]  ip_rcv+0xe1/0xf0
      [  562.331603]  ? ip_local_deliver+0x110/0x110
      [  562.332011]  __netif_receive_skb_core+0x46a/0x1040
      [  562.332476]  ? inet_gro_receive+0x263/0x2e0
      [  562.332885]  __netif_receive_skb_list_core+0x13b/0x2c0
      [  562.333383]  netif_receive_skb_list_internal+0x1c8/0x2f0
      [  562.333896]  ? update_load_avg+0x7e/0x5e0
      [  562.334285]  gro_normal_list.part.149+0x19/0x40
      [  562.334722]  napi_complete_done+0x67/0x160
      [  562.335134]  virtnet_poll+0x2ad/0x408 [virtio_net]
      [  562.335644]  __napi_poll+0x28/0x140
      [  562.336012]  net_rx_action+0x23d/0x300
      [  562.336414]  __do_softirq+0xf2/0x2ea
      [  562.336803]  irq_exit_rcu+0xc1/0xf0
      [  562.337173]  common_interrupt+0xb9/0xd0
      
      It is, and always was, forbidden to call kernel_getpeername() in the
      context of error_report(). To get rid of the problem we read the
      peer's destination address directly from the socket structure. While
      at it, we also fix the message to print the destination port of the
      inet socket (see the sketch below).
      
      Fixes: 1a31833d ("DLM: Replace nodeid_to_addr with kernel_getpeername")
      Reported-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
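      A minimal sketch of reading the peer address from the socket itself
      instead of calling kernel_getpeername(), which would try to take the
      already held socket lock. IPv4 only for brevity; the real callback
      also handles AF_INET6, and sketch_error_report() is an illustrative
      name:

      #include <linux/printk.h>
      #include <net/sock.h>
      #include <net/inet_sock.h>

      static void sketch_error_report(struct sock *sk)
      {
              struct inet_sock *inet = inet_sk(sk);

              /* destination address and port live in the inet socket */
              pr_err("dlm sketch: error %d on peer %pI4, port %u\n",
                     sk->sk_err, &inet->inet_daddr, ntohs(inet->inet_dport));
      }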
  4. 04 Nov, 2021 (1 commit)
  5. 03 Nov, 2021 (3 commits)
  6. 20 Aug, 2021 (1 commit)
    • fs: dlm: implement delayed ack handling · b97f8525
      Alexander Aring authored
      This patch changes the ack behaviour so that we no longer ack each
      message. Lowcomms takes care of sending an ack back after a bulk of
      messages has been processed. Currently that point is when the whole
      receive buffer has been processed; there may be better positions to
      send an ack back, but only the lowcomms implementation knows whether
      more data is about to arrive. A downside of this patch is that we may
      retransmit more on errors, but this is a very rare case. A sketch of
      the idea follows this entry.
      
      Tested with make_panic on gfs2 with three nodes by running:
      
      trace-cmd record -p function -l 'dlm_send_ack' sleep 100
      
      and
      
      trace-cmd report | wc -l
      
      Before patch:
      - 20548
      - 21376
      - 21398
      
      After patch:
      - 18338
      - 20679
      - 19949
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
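      A minimal sketch of the delayed-ack idea: process the whole receive
      buffer first and send a single ack for the bulk instead of one ack
      per message. All sketch_* names are stand-ins, not the real
      fs/dlm/midcomms.c helpers (the traced function above is
      dlm_send_ack()):

      struct sketch_node;
      int sketch_process_buffer(struct sketch_node *node, const void *buf, int len);
      void sketch_send_ack(struct sketch_node *node);   /* wraps the real ack path */

      static void sketch_receive_done(struct sketch_node *node,
                                      const void *buf, int len)
      {
              /* may consume many queued dlm messages in one pass */
              int processed = sketch_process_buffer(node, buf, len);

              if (processed > 0)
                      sketch_send_ack(node);  /* one ack after the bulk */
      }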
  7. 20 Jul, 2021 (9 commits)
  8. 03 Jun, 2021 (5 commits)
  9. 25 May, 2021 (11 commits)
    • fs: dlm: don't allow half transmitted messages · 706474fb
      Alexander Aring authored
      This patch cleans a dirty page buffer when a reconnect occurs. If a
      page buffer was only half transmitted, we cannot resume in the middle
      of a dlm message when a node connects again. I observed invalid
      message length errors and suspected this behaviour as the cause;
      after this patch I never saw an invalid message length again. The
      patch may drop more messages for dlm version 3.1, but 3.1 cannot deal
      with half messages anyway; for 3.2 it may trigger more
      retransmissions but will not leave dlm in a broken state.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: add reliable connection if reconnect · 489d8e55
      Alexander Aring authored
      This patch makes a TCP lowcomms connection reliable even if
      reconnects occur. This is done with application layer retransmission
      handling and sequence numbers in the dlm protocol. There are three
      new dlm commands:
      
      DLM_OPTS:
      
      This encapsulates an existing dlm message (and rcom messages, if they
      don't have their own application side retransmission handling).
      Additional optional tlvs (type-length-value fields) can be appended,
      for example a sequence number field. However, because the lockspace
      field in DLM_OPTS is unused and a sequence number is mandatory, the
      sequence number is not carried as a tlv; instead we put it inside the
      lockspace id. The possibility to add optional fields remains for
      future purposes.
      
      DLM_ACK:
      
      Just a dlm header that acknowledges the receipt of a DLM_OPTS message
      to its sender.
      
      DLM_FIN:
      
      This provides a 4-way handshake for connection termination, including
      support for half-closed connections. It is implemented at the
      application layer because SCTP doesn't support half-closed sockets,
      the shutdown() call can be interrupted by e.g. TCP resets, and the
      othercon paradigm in lowcomms makes it hard to implement there. The
      4-way termination handshake also solves the problem of synchronizing
      the peer's EOF arrival with the cluster manager removing the peer
      from DLM's node membership handling. In some cases messages can still
      be in flight during this time and we need to wait for the node
      membership event.
      
      To provide a reliable connection, the node retransmits all
      unacknowledged messages to its peer on reconnect. The receiver then
      filters for the next expected message and drops all messages that are
      duplicates; the sketch after this entry illustrates the filter.
      
      As RCOM_STATUS and RCOM_NAMES messages are the first messages
      exchanged and they have their own retransmission handling, logic
      exists that requires these messages to come first. When they arrive
      we store the dlm version field. This handling is the same for DLM 3.1
      and, after this patch, for 3.2. Backwards compatibility handling has
      been added and seems to work in tests without tcpkill, but it is not
      recommended to run DLM 3.1 and 3.2 at the same time, because DLM 3.2
      tries to fix long-standing bugs in the DLM protocol.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
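      A minimal sketch of the receive-side sequence filter: accept only the
      next expected sequence number and drop duplicates from a retransmit
      burst after reconnect. The structure and field names are assumptions,
      not the actual fs/dlm/midcomms.c definitions:

      #include <linux/types.h>

      struct sketch_node {
              u32 seq_next;   /* next sequence number we expect */
      };

      static bool sketch_accept_seq(struct sketch_node *node, u32 seq)
      {
              if (seq != node->seq_next)
                      return false;   /* duplicate or out of order, drop it */

              node->seq_next++;       /* deliver and advance */
              return true;
      }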
    • fs: dlm: move out some hash functionality · 37a247da
      Alexander Aring authored
      This patch moves some lowcomms hash functionality into the lowcomms
      header so that other layers, such as midcomms, can use it as well.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: add functionality to re-transmit a message · 2874d1a6
      Alexander Aring authored
      This patch introduces retransmit functionality for a lowcomms message
      handle. It simply allocates a new buffer and transmits the message
      again; no special prioritization is needed because the byte stream is
      kept in order.

      To avoid another connection lookup, some refactoring was done so that
      a new buffer can be allocated from a preexisting connection pointer.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: make buffer handling per msg · 8f2dc78d
      Alexander Aring authored
      This patch makes the void pointer handle of the lowcomms
      functionality per message rather than per page allocation entry.
      Refcount handling was added so the handle keeps the message alive
      until the user no longer needs it (a small refcount sketch follows
      this entry).

      There is now a per-message callback which is called when a new buffer
      is allocated. This callback is guaranteed to be called in the order
      of the send buffer, which lets the caller use it to increment a
      sequence number for the dlm message handle.

      For the transition period we cast dlm_mhandle to dlm_msg and vice
      versa until the midcomms layer implements a specific dlm_mhandle
      structure.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
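      A minimal sketch of a refcounted per-message handle using a kref
      embedded in the message. sketch_msg and its helpers are illustrative,
      not the actual fs/dlm/lowcomms.c definitions:

      #include <linux/kernel.h>
      #include <linux/kref.h>
      #include <linux/slab.h>

      struct sketch_msg {
              struct kref ref;
              /* ... message buffer, length, writequeue linkage ... */
      };

      static void sketch_msg_release(struct kref *ref)
      {
              struct sketch_msg *msg = container_of(ref, struct sketch_msg, ref);

              kfree(msg);
      }

      /* the user keeps the message alive until it drops its reference */
      static void sketch_msg_put(struct sketch_msg *msg)
      {
              kref_put(&msg->ref, sketch_msg_release);
      }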
    • fs: dlm: fix connection tcp EOF handling · 8aa31cbf
      Alexander Aring authored
      This patch fixes the EOF handling for TCP: if an EOF is received, we
      close the socket the next time the writequeue runs empty. This is
      half-closed socket behaviour, which doesn't exist in SCTP. The
      midcomms layer will implement half-closed socket behaviour on the DLM
      side to solve this problem for the SCTP case. There is still the last
      ack flying around, but other reset functionality will take care of it
      if it gets lost. A sketch of the drain-then-close idea follows below.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
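      A minimal sketch of the drain-then-close idea: remember that the peer
      sent EOF and close the socket only once the writequeue has drained.
      The structure, flag and helpers are assumptions, not the real
      lowcomms code:

      #include <linux/bitops.h>

      #define SKETCH_EOF_RECEIVED     0

      struct sketch_con {
              unsigned long flags;
      };

      void sketch_close(struct sketch_con *con);      /* shutdown + close socket */

      /* called by the receive path on a 0-byte read (EOF from the peer) */
      static void sketch_eof(struct sketch_con *con)
      {
              set_bit(SKETCH_EOF_RECEIVED, &con->flags);
      }

      /* called whenever the send worker finds the writequeue empty */
      static void sketch_writequeue_empty(struct sketch_con *con)
      {
              if (test_bit(SKETCH_EOF_RECEIVED, &con->flags))
                      sketch_close(con);
      }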
    • fs: dlm: cancel work sync othercon · c6aa00e3
      Alexander Aring authored
      The rx and tx flag arguments signal to close_connection() from which
      worker it is called. Obviously the receive worker cannot cancel
      itself, and the same holds for the send worker (swork). For the
      othercon only the receive worker should be in use, but to avoid
      deadlocks we should pass the same flags as in the original
      close_connection() call.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: reconnect if socket error report occurs · ba868d9d
      Alexander Aring authored
      This patch changes the reconnect handling so that a reconnect is
      triggered when the socket error callback reports an error. This also
      covers reconnects in the non-blocking connect case, which is
      currently missing. If ECONNREFUSED is reported, we delay the
      reconnect by one second (a sketch follows this entry).
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
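      A minimal sketch of an error_report-driven reconnect: on
      ECONNREFUSED, requeue the connect work with a one second delay,
      otherwise reconnect immediately. The workqueue and work item are
      assumptions, not the actual lowcomms identifiers:

      #include <linux/errno.h>
      #include <linux/jiffies.h>
      #include <linux/workqueue.h>
      #include <net/sock.h>

      static struct workqueue_struct *sketch_wq;
      static struct delayed_work sketch_connect_work;

      static void sketch_reconnect_on_error(struct sock *sk)
      {
              unsigned long delay = 0;

              if (sk->sk_err == ECONNREFUSED)
                      delay = msecs_to_jiffies(1000); /* back off for one second */

              queue_delayed_work(sketch_wq, &sketch_connect_work, delay);
      }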
    • fs: dlm: set is othercon flag · 7443bc96
      Alexander Aring authored
      There is an "is othercon" flag which was never used. This patch sets
      it and prints a warning if the othercon ever tries to send a dlm
      message, which should never be the case.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • fs: dlm: fix srcu read lock usage · b38bc9c2
      Alexander Aring authored
      This patch holds the srcu connection read lock wherever we look up a
      connection and access it. We don't hold the srcu lock in worker
      functions where the scheduled worker is part of the connection
      itself; the connection must not be freed while any of its workers is
      scheduled or pending. A sketch of the lookup pattern follows this
      entry.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
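      A minimal sketch of the srcu-protected lookup: hold the read lock
      across both the lookup and the use of the connection. The srcu
      domain, structure and helpers are assumptions modelled on the
      lowcomms code, not exact copies of it:

      #include <linux/srcu.h>

      struct sketch_con;
      struct sketch_con *sketch_lookup_connection(int nodeid);
      void sketch_send_to(struct sketch_con *con);

      DEFINE_STATIC_SRCU(sketch_connections_srcu);

      static void sketch_use_connection(int nodeid)
      {
              struct sketch_con *con;
              int idx;

              idx = srcu_read_lock(&sketch_connections_srcu);
              con = sketch_lookup_connection(nodeid);
              if (con)
                      sketch_send_to(con);    /* con cannot be freed while we hold the lock */
              srcu_read_unlock(&sketch_connections_srcu, idx);
      }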
    • fs: dlm: add dlm macros for ratelimit log · 2df6b762
      Alexander Aring authored
      This patch adds a ratelimited log macro to the dlm subsystem and
      switches the connecting log message over to it. In the non-blocking
      connect case that message would otherwise be printed very often. A
      sketch of such a macro follows below.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
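      A minimal sketch of a ratelimited dlm log macro built on the kernel's
      printk_ratelimited(); the macro name is an assumption, not
      necessarily the one the patch introduces:

      #include <linux/printk.h>

      #define sketch_log_print_ratelimited(fmt, args...) \
              printk_ratelimited(KERN_INFO "dlm: " fmt "\n", ##args)

      /* usage: sketch_log_print_ratelimited("connecting to %d", nodeid); */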
  10. 30 Mar, 2021 (1 commit)
  11. 09 Mar, 2021 (2 commits)
    • fs: dlm: add shutdown hook · 9d232469
      Alexander Aring authored
      This patch fixes issues that occur when dlm lowcomms synchronizes its
      workqueues while the dlm application layer has already released the
      lockspace. In such cases messages like:
      
      dlm: gfs2: release_lockspace final free
      dlm: invalid lockspace 3841231384 from 1 cmd 1 type 11
      
      are printed to the kernel log. This patch solves the issue by
      introducing a new "shutdown" hook, called before the "stop" hook when
      the lockspace is finally released. This should prevent any dlm
      messages from sitting in the workqueues during or after lockspace
      removal (a sketch of the hook ordering follows this entry).
      
      It's necessary to call dlm_scand_stop() as well; I instrumented the
      dlm_lowcomms_get_buffer() code to report a warning when it is called
      after the dlm_midcomms_shutdown() functionality, see below:
      
      WARNING: CPU: 1 PID: 3794 at fs/dlm/midcomms.c:1003 dlm_midcomms_get_buffer+0x167/0x180
      Modules linked in: joydev iTCO_wdt intel_pmc_bxt iTCO_vendor_support drm_ttm_helper ttm pcspkr serio_raw i2c_i801 i2c_smbus drm_kms_helper virtio_scsi lpc_ich virtio_balloon virtio_console xhci_pci xhci_pci_renesas cec qemu_fw_cfg drm [last unloaded: qxl]
      CPU: 1 PID: 3794 Comm: dlm_scand Tainted: G        W         5.11.0+ #26
      Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
      RIP: 0010:dlm_midcomms_get_buffer+0x167/0x180
      Code: 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e 41 5f c3 4c 89 e7 45 31 e4 e8 3b f1 ec ff eb 86 <0f> 0b 4c 89 e7 45 31 e4 e8 2c f1 ec ff e9 74 ff ff ff 0f 1f 80 00
      RSP: 0018:ffffa81503f8fe60 EFLAGS: 00010202
      RAX: 0000000000000008 RBX: ffff8f969827f200 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffffad1e89a0 RDI: ffff8f96a5294160
      RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8f96a250bc60
      R10: 00000000000045d3 R11: 0000000000000000 R12: ffff8f96a250bc60
      R13: ffffa81503f8fec8 R14: 0000000000000070 R15: 0000000000000c40
      FS:  0000000000000000(0000) GS:ffff8f96fbc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055aa3351c000 CR3: 000000010bf22000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       dlm_scan_rsbs+0x420/0x670
       ? dlm_uevent+0x20/0x20
       dlm_scand+0xbf/0xe0
       kthread+0x13a/0x150
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      To synchronize all dlm scand messages we stop dlm_scand right before
      the shutdown hook.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
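      A minimal sketch of the hook ordering on final lockspace release:
      stop the scanner, drain in-flight traffic via "shutdown", then "stop"
      the communication layer. The ops structure and names are assumptions,
      not the exact fs/dlm interfaces:

      struct sketch_comms_ops {
              void (*shutdown)(void); /* new: drain pending dlm messages */
              void (*stop)(void);     /* existing: tear communication down */
      };

      void sketch_scand_stop(void);   /* stop the dlm_scand kthread */

      static void sketch_release_lockspace_final(struct sketch_comms_ops *ops)
      {
              sketch_scand_stop();    /* no new dlm_scand messages after this */
              ops->shutdown();        /* flush while comms are still usable */
              ops->stop();            /* now it is safe to stop lowcomms */
      }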
    • fs: dlm: flush swork on shutdown · eec054b5
      Alexander Aring authored
      This patch fixes the flushing of the send work before shutdown.
      cancel_work_sync() is not the right workqueue function to use here,
      as it would cancel the work even when the work requeues itself. When
      send() returns EAGAIN for a dlm message we need to be sure that
      everything has been sent out first. flush_work() ensures that every
      queued send work completes, including the EAGAIN case (a sketch
      follows this entry).
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
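      A minimal sketch of the shutdown change described above: wait for the
      queued send work to finish instead of cancelling it. The work item
      name is an assumption standing in for the connection's swork:

      #include <linux/workqueue.h>

      static struct work_struct sketch_swork;

      static void sketch_shutdown_send(void)
      {
              /* before: cancel_work_sync(&sketch_swork); could cancel a
               * requeued send (EAGAIN path) before the data went out */
              flush_work(&sketch_swork);      /* wait until the send work has run */
      }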