1. 16 October 2020, 2 commits
  2. 04 September 2020, 2 commits
    • net/smc: reset sndbuf_desc if freed · 1d8df41d
      Committed by Ursula Braun
      When an SMC connection is created and the creation of an RMB or DMB
      fails, the previously created send buffer is thrown away as well,
      including the freeing of its buffer descriptor. Make sure the
      connection no longer references the freed buffer descriptor;
      otherwise bugs like the one below are possible (a simplified sketch
      of the pattern follows this entry):
      
      [71556.835148] =============================================================================
      [71556.835168] BUG kmalloc-128 (Tainted: G    B      OE    ): Poison overwritten
      [71556.835172] -----------------------------------------------------------------------------
      
      [71556.835179] INFO: 0x00000000d20894be-0x00000000aaef63e9 @offset=2724. First byte 0x0 instead of 0x6b
      [71556.835215] INFO: Allocated in __smc_buf_create+0x184/0x578 [smc] age=0 cpu=5 pid=46726
      [71556.835234]     ___slab_alloc+0x5a4/0x690
      [71556.835239]     __slab_alloc.constprop.0+0x70/0xb0
      [71556.835243]     kmem_cache_alloc_trace+0x38e/0x3f8
      [71556.835250]     __smc_buf_create+0x184/0x578 [smc]
      [71556.835257]     smc_buf_create+0x2e/0xe8 [smc]
      [71556.835264]     smc_listen_work+0x516/0x6a0 [smc]
      [71556.835275]     process_one_work+0x280/0x478
      [71556.835280]     worker_thread+0x66/0x368
      [71556.835287]     kthread+0x17a/0x1a0
      [71556.835294]     ret_from_fork+0x28/0x2c
      [71556.835301] INFO: Freed in smc_buf_create+0xd8/0xe8 [smc] age=0 cpu=5 pid=46726
      [71556.835307]     __slab_free+0x246/0x560
      [71556.835311]     kfree+0x398/0x3f8
      [71556.835318]     smc_buf_create+0xd8/0xe8 [smc]
      [71556.835324]     smc_listen_work+0x516/0x6a0 [smc]
      [71556.835328]     process_one_work+0x280/0x478
      [71556.835332]     worker_thread+0x66/0x368
      [71556.835337]     kthread+0x17a/0x1a0
      [71556.835344]     ret_from_fork+0x28/0x2c
      [71556.835348] INFO: Slab 0x00000000a0744551 objects=51 used=51 fp=0x0000000000000000 flags=0x1ffff00000010200
      [71556.835352] INFO: Object 0x00000000563480a1 @offset=2688 fp=0x00000000289567b2
      
      [71556.835359] Redzone 000000006783cde2: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835363] Redzone 00000000e35b876e: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835367] Redzone 0000000023074562: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835372] Redzone 00000000b9564b8c: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835376] Redzone 00000000810c6362: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835380] Redzone 0000000065ef52c3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835384] Redzone 00000000c5dd6984: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835388] Redzone 000000004c480f8f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835392] Object 00000000563480a1: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835397] Object 000000009c479d06: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835401] Object 000000006e1dce92: 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b  kkkk....kkkkkkkk
      [71556.835405] Object 00000000227f7cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835410] Object 000000009a701215: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835414] Object 000000003731ce76: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835418] Object 00000000f7085967: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835422] Object 0000000007f99927: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
      [71556.835427] Redzone 00000000579c4913: bb bb bb bb bb bb bb bb                          ........
      [71556.835431] Padding 00000000305aef82: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835435] Padding 00000000b1cdd722: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835438] Padding 00000000c7568199: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835442] Padding 00000000fad4c4d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835451] CPU: 0 PID: 47939 Comm: kworker/0:15 Tainted: G    B      OE     5.9.0-rc1uschi+ #54
      [71556.835456] Hardware name: IBM 3906 M03 703 (LPAR)
      [71556.835464] Workqueue: events smc_listen_work [smc]
      [71556.835470] Call Trace:
      [71556.835478]  [<00000000d5eaeb10>] show_stack+0x90/0xf8
      [71556.835493]  [<00000000d66fc0f8>] dump_stack+0xa8/0xe8
      [71556.835499]  [<00000000d61a511c>] check_bytes_and_report+0x104/0x130
      [71556.835504]  [<00000000d61a57b2>] check_object+0x26a/0x2e0
      [71556.835509]  [<00000000d61a59bc>] alloc_debug_processing+0x194/0x238
      [71556.835514]  [<00000000d61a8c14>] ___slab_alloc+0x5a4/0x690
      [71556.835519]  [<00000000d61a9170>] __slab_alloc.constprop.0+0x70/0xb0
      [71556.835524]  [<00000000d61aaf66>] kmem_cache_alloc_trace+0x38e/0x3f8
      [71556.835530]  [<000003ff80549bbc>] __smc_buf_create+0x184/0x578 [smc]
      [71556.835538]  [<000003ff8054a396>] smc_buf_create+0x2e/0xe8 [smc]
      [71556.835545]  [<000003ff80540c16>] smc_listen_work+0x516/0x6a0 [smc]
      [71556.835549]  [<00000000d5f0f448>] process_one_work+0x280/0x478
      [71556.835554]  [<00000000d5f0f6a6>] worker_thread+0x66/0x368
      [71556.835559]  [<00000000d5f18692>] kthread+0x17a/0x1a0
      [71556.835563]  [<00000000d6abf3b8>] ret_from_fork+0x28/0x2c
      [71556.835569] INFO: lockdep is turned off.
      [71556.835573] FIX kmalloc-128: Restoring 0x00000000d20894be-0x00000000aaef63e9=0x6b
      
      [71556.835577] FIX kmalloc-128: Marking all objects used
      
      Fixes: fd7f3a74 ("net/smc: remove freed buffer from list")
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d8df41d
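
      A minimal, self-contained C sketch of the dangling-pointer pattern
      this fix addresses is shown below. The types and helpers (buf_desc,
      buf_create, buf_free, buf_create_both) are simplified, hypothetical
      stand-ins chosen for illustration, not the actual net/smc code; the
      point is only that the error path which frees the send buffer
      descriptor must also clear the connection's reference to it.

      #include <errno.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct buf_desc {
          void *cpu_addr;               /* the data buffer itself */
      };

      struct connection {
          struct buf_desc *sndbuf_desc; /* send buffer descriptor */
          struct buf_desc *rmb_desc;    /* receive buffer descriptor */
      };

      static struct buf_desc *buf_create(size_t len)
      {
          struct buf_desc *desc = calloc(1, sizeof(*desc));

          if (!desc)
              return NULL;
          desc->cpu_addr = malloc(len);
          if (!desc->cpu_addr) {
              free(desc);
              return NULL;
          }
          return desc;
      }

      static void buf_free(struct buf_desc *desc)
      {
          free(desc->cpu_addr);
          free(desc);
      }

      /* Create the send buffer first, then the receive buffer (RMB/DMB). */
      static int buf_create_both(struct connection *conn, size_t len)
      {
          conn->sndbuf_desc = buf_create(len);
          if (!conn->sndbuf_desc)
              return -ENOMEM;
          conn->rmb_desc = buf_create(len);
          if (!conn->rmb_desc) {
              /* Error path: throw the send buffer away again ... */
              buf_free(conn->sndbuf_desc);
              /* ... and drop the now-stale reference. Without this reset
               * the connection keeps a dangling pointer into freed memory,
               * which is the kind of use-after-free the slab poison report
               * above caught. */
              conn->sndbuf_desc = NULL;
              return -ENOMEM;
          }
          return 0;
      }

      int main(void)
      {
          struct connection conn = { 0 };

          printf("buf_create_both: %d\n", buf_create_both(&conn, 16384));
          return 0;
      }
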
    • net/smc: set rx_off for SMCR explicitly · 2d2bfeb8
      Committed by Ursula Braun
      SMC tries to make use of SMCD first. If a problem shows up, it tries
      to switch to SMCR. If the SMCD problem shows up only after the SMCD
      connection has already been initialized, the field rx_off keeps the
      SMCD value, which is wrong for SMCR and results in corrupted data at
      the receiver.
      This patch adds an explicit (re-)setting of the field rx_off to zero
      if the connection uses SMCR (a simplified sketch of the effect
      follows this entry).
      
      Fixes: be244f28 ("net/smc: add SMC-D support in data transfer")
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d2bfeb8
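
      The sketch below (hypothetical, simplified C, not the actual net/smc
      data path) illustrates the symptom: SMC-D roughly places received
      data behind a small header, so rx_off is non-zero, while SMC-R
      starts at offset zero. If the connection falls back to SMC-R but
      keeps the stale SMC-D rx_off, the receiver reads from the wrong
      offset. SMCD_HDR_SIZE and the struct layout are assumptions for
      illustration only.

      #include <stdio.h>
      #include <string.h>

      #define SMCD_HDR_SIZE 16   /* assumed header size, illustration only */

      struct conn {
          char buf[64];
          size_t rx_off;         /* offset where received data starts */
      };

      int main(void)
      {
          struct conn c = { .rx_off = SMCD_HDR_SIZE }; /* set up for SMC-D */
          const char *msg = "hello";

          /* Fallback to SMC-R: the peer now writes at offset 0. */
          memcpy(c.buf, msg, strlen(msg) + 1);

          /* Stale SMC-D offset: the reader misses the message entirely
           * (here it prints an empty string; in the real data path the
           * receiver would see wrong bytes). */
          printf("stale rx_off=%zu -> \"%s\"\n", c.rx_off, c.buf + c.rx_off);

          /* The idea of the fix: reset rx_off when the connection uses SMC-R. */
          c.rx_off = 0;
          printf("reset rx_off=%zu -> \"%s\"\n", c.rx_off, c.buf + c.rx_off);
          return 0;
      }
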
  3. 28 July 2020, 1 commit
  4. 21 July 2020, 1 commit
    • net/smc: fix dmb buffer shortage · a9e44502
      Committed by Karsten Graul
      There is a current limit of 1920 registered dmb buffers per ISM
      device for smc-d. One link group can contain 255 connections, and
      each connection uses one dmb buffer. When a connection is closed,
      the registered buffer is held in a queue and reused by the next
      connection. When a link group is 'full', another link group is
      created and uses its own buffer pool. The link groups are added to a
      list using list_add(), which puts a new link group at the first
      position in the list.
      In the situation that many connections are opened (>1920) and a few
      of them stay open while others are closed quickly, we end up with at
      least 8 link groups. For a new connection a matching link group is
      looked up by iterating over the list of link groups. The trailing 7
      link groups all have registered dmb buffers which could be reused,
      while the first link group has only a few dmb buffers and then hits
      the 1920 limit. Because the first link group is not full (the 255
      connection limit is not reached) it is chosen, and the connection
      finally falls back to TCP because there is no dmb buffer available
      in this link group.
      There are multiple ways to fix that: using list_add_tail() allows
      older link groups to be scanned first for free buffers, which
      ensures that buffers are reused first (see the ordering sketch after
      this entry). This fixes the problem for smc-r link groups as well.
      For smc-d there is an even better way to address this problem,
      because smc-d does not have the 255 connections per link group
      limit. So fix the problem for smc-d by allowing large link groups.
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9e44502
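
      A runnable C sketch of the ordering issue follows. The link group
      structure and the reuse logic are heavily simplified and
      hypothetical; it only demonstrates why inserting new link groups at
      the head of the list makes the lookup prefer the newest,
      buffer-poor group, while tail insertion lets older groups with
      reusable buffers be found first.

      #include <stdio.h>

      #define MAX_CONNS 255

      struct lgr {
          const char *name;
          int conns;           /* active connections in this link group */
          int reusable_bufs;   /* registered buffers queued for reuse */
          struct lgr *next;
      };

      /* list_add()-like: insert at the head, newest group is scanned first. */
      static void add_head(struct lgr **list, struct lgr *l)
      {
          l->next = *list;
          *list = l;
      }

      /* list_add_tail()-like: insert at the tail, oldest group is scanned first. */
      static void add_tail(struct lgr **list, struct lgr *l)
      {
          struct lgr **p = list;

          while (*p)
              p = &(*p)->next;
          l->next = NULL;
          *p = l;
      }

      /* Pick the first link group that is not full, as the lookup does. */
      static struct lgr *find_lgr(struct lgr *list)
      {
          for (; list; list = list->next)
              if (list->conns < MAX_CONNS)
                  return list;
          return NULL;
      }

      int main(void)
      {
          /* Older groups hold buffers queued for reuse; the newest has none
           * because the device-wide registration limit is already reached. */
          struct lgr a = { "old-1", 10, 200, NULL };
          struct lgr b = { "old-2", 10, 200, NULL };
          struct lgr c = { "new",   10,   0, NULL };
          struct lgr *list = NULL;

          add_head(&list, &a); add_head(&list, &b); add_head(&list, &c);
          printf("list_add order picks:      %s (reusable_bufs=%d)\n",
                 find_lgr(list)->name, find_lgr(list)->reusable_bufs);

          list = NULL;
          add_tail(&list, &a); add_tail(&list, &b); add_tail(&list, &c);
          printf("list_add_tail order picks: %s (reusable_bufs=%d)\n",
                 find_lgr(list)->name, find_lgr(list)->reusable_bufs);
          return 0;
      }
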
  5. 20 July 2020, 4 commits
  6. 09 July 2020, 3 commits
    • net/smc: switch smcd_dev_list spinlock to mutex · 82087c03
      Committed by Ursula Braun
      The similar smc_ib_devices spinlock has already been converted to a
      mutex. Protecting the smcd_dev_list with a mutex is possible as
      well. This patch converts the smcd_dev_list spinlock to a mutex (a
      simplified sketch of the conversion pattern follows this entry).
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82087c03
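
      Below is a small, hypothetical C sketch of the conversion pattern,
      with userspace pthreads standing in for the kernel spinlock/mutex
      APIs and a made-up smcd_dev type. The list keeps the same add and
      walk operations; only the lock type changes, which additionally
      permits sleeping while the lock is held.

      #include <pthread.h>
      #include <stdio.h>

      struct smcd_dev {
          const char *name;
          struct smcd_dev *next;
      };

      /* The list head and its lock. Before the change this was a spinlock;
       * a mutex gives the same mutual exclusion for list updates. */
      static pthread_mutex_t smcd_dev_lock = PTHREAD_MUTEX_INITIALIZER;
      static struct smcd_dev *smcd_dev_list;

      static void smcd_register(struct smcd_dev *dev)
      {
          pthread_mutex_lock(&smcd_dev_lock);
          dev->next = smcd_dev_list;
          smcd_dev_list = dev;
          pthread_mutex_unlock(&smcd_dev_lock);
      }

      int main(void)
      {
          struct smcd_dev ism0 = { "ism0", NULL };

          smcd_register(&ism0);
          printf("registered: %s\n", smcd_dev_list->name);
          return 0;
      }
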
    • net/smc: fix sleep bug in smc_pnet_find_roce_resource() · 92f3cb0e
      Committed by Ursula Braun
      Tests showed this BUG:
      [572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
      [572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
      [572555.252879] INFO: lockdep is turned off.
      [572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
      [572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
      [572555.252887] Call Trace:
      [572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
      [572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
      [572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
      [572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
      [572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
      [572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
      [572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
      [572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
      [572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
      [572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
      [572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
      [572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
      [572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
      [572555.253030] INFO: lockdep is turned off.
      
      The function smc_pnet_find_rdma_dev() might be called from
      smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
      spinlock while calling the InfiniBand get_netdev() operation. At
      least for mlx5 the get_netdev operation wants mutex serialization,
      which conflicts with the smc_ib_devices spinlock: sleeping is not
      allowed while a spinlock is held.
      This patch converts the smc_ib_devices spinlock into a mutex to
      allow sleeping when calling get_netdev() (a simplified sketch
      follows this entry).
      
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92f3cb0e
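
      The sketch below (hypothetical userspace C, not the kernel code)
      shows why the lock type matters here: the lookup walks the device
      list and calls a per-device get_netdev()-style callback that may
      sleep, which is allowed under a mutex but is exactly what the
      "sleeping function called from invalid context" report above flags
      when done under a spinlock. The ib_dev type, pnet_id strings and
      find_roce_resource() helper are illustrative assumptions.

      #include <pthread.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      struct ib_dev {
          const char *pnet_id;
          struct ib_dev *next;
      };

      static pthread_mutex_t smc_ib_devices_lock = PTHREAD_MUTEX_INITIALIZER;
      static struct ib_dev *smc_ib_devices;

      /* Stand-in for the driver's get_netdev() op: it may block, e.g.
       * while taking the driver's own mutex as in the mlx5 trace above. */
      static const char *get_netdev(struct ib_dev *dev)
      {
          usleep(1000);            /* "sleeps" - illegal under a spinlock */
          return dev->pnet_id;
      }

      static const char *find_roce_resource(const char *pnet_id)
      {
          const char *match = NULL;

          pthread_mutex_lock(&smc_ib_devices_lock); /* mutex: sleeping is fine */
          for (struct ib_dev *d = smc_ib_devices; d && !match; d = d->next)
              if (!strcmp(get_netdev(d), pnet_id))
                  match = d->pnet_id;
          pthread_mutex_unlock(&smc_ib_devices_lock);
          return match;
      }

      int main(void)
      {
          struct ib_dev mlx = { "NET1", NULL };

          smc_ib_devices = &mlx;
          printf("found: %s\n", find_roce_resource("NET1"));
          return 0;
      }
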
    • net/smc: separate LLC wait queues for flow and messages · 6778a6be
      Committed by Karsten Graul
      There might be races in scenarios where both SMC link groups are on
      the same system. Prevent that by creating separate wait queues for
      LLC flows and LLC messages (a simplified sketch follows this entry).
      Switch to the non-interruptible versions of wait_event() and
      wake_up() for the LLC flow waiter to make sure the waiters get
      control sequentially. Fine-tune the llc_flow_lock to include the
      assignment of the message. Write to the system log when an
      unexpected message is dropped. Also remove an extra indirection and
      use the existing local variable lgr in smc_llc_enqueue().
      
      Fixes: 555da9af ("net/smc: add event-based llc_flow framework")
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6778a6be
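
      Below is a compact, hypothetical sketch of the data-structure idea,
      with userspace condition variables standing in for kernel wait
      queues: one waiter queue for the LLC flow and a separate one for
      incoming LLC messages, so that a wake-up on one cannot be consumed
      by waiters of the other. All names (llc_lgr, llc_flow_take, ...) are
      invented for illustration.

      #include <pthread.h>
      #include <stdio.h>

      struct llc_lgr {
          pthread_mutex_t lock;
          pthread_cond_t flow_waiter;   /* waiters for a free LLC flow */
          pthread_cond_t msg_waiter;    /* waiters for an LLC message */
          int flow_busy;
          int msg_ready;
      };

      static void llc_init(struct llc_lgr *l)
      {
          pthread_mutex_init(&l->lock, NULL);
          pthread_cond_init(&l->flow_waiter, NULL);
          pthread_cond_init(&l->msg_waiter, NULL);
          l->flow_busy = 0;
          l->msg_ready = 0;
      }

      /* Grab the LLC flow; waits only on the flow queue, so message
       * wake-ups no longer disturb it (and vice versa). */
      static void llc_flow_take(struct llc_lgr *l)
      {
          pthread_mutex_lock(&l->lock);
          while (l->flow_busy)
              pthread_cond_wait(&l->flow_waiter, &l->lock);
          l->flow_busy = 1;
          pthread_mutex_unlock(&l->lock);
      }

      static void llc_flow_release(struct llc_lgr *l)
      {
          pthread_mutex_lock(&l->lock);
          l->flow_busy = 0;
          pthread_cond_signal(&l->flow_waiter);  /* wake flow waiters only */
          pthread_mutex_unlock(&l->lock);
      }

      static void llc_msg_arrived(struct llc_lgr *l)
      {
          pthread_mutex_lock(&l->lock);
          l->msg_ready = 1;
          pthread_cond_signal(&l->msg_waiter);   /* wake message waiters only */
          pthread_mutex_unlock(&l->lock);
      }

      int main(void)
      {
          struct llc_lgr lgr;

          llc_init(&lgr);
          llc_flow_take(&lgr);
          llc_msg_arrived(&lgr);     /* does not wake the flow waiters */
          llc_flow_release(&lgr);
          printf("flow and message waiters use separate queues\n");
          return 0;
      }
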
  7. 31 May 2020, 1 commit
  8. 06 May 2020, 1 commit
  9. 05 May 2020, 9 commits
  10. 04 May 2020, 3 commits
  11. 02 May 2020, 10 commits
  12. 01 May 2020, 3 commits