1. 26 6月, 2021 17 次提交
    • T
      Bluetooth: mgmt: Fix the command returns garbage parameter value · 02ce2c2c
      Tedd Ho-Jeong An 提交于
      When the Get Device Flags command fails, it returns the error status
      with the parameters filled with the garbage values. Although the
      parameters are not used, it is better to fill with zero than the random
      values.
      Signed-off-by: NTedd Ho-Jeong An <tedd.an@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      02ce2c2c
    • Y
      Bluetooth: disable filter dup when scan for adv monitor · c32d6246
      Yun-Hao Chung 提交于
      Disable duplicates filter when scanning for advertisement monitor for
      the following reasons. The scanning includes active scan and passive
      scan.
      
      For HW pattern filtering (ex. MSFT), Realtek and Qualcomm controllers
      ignore RSSI_Sampling_Period when the duplicates filter is enabled.
      
      For SW pattern filtering, when we're not doing interleaved scanning, it
      is necessary to disable duplicates filter, otherwise hosts can only
      receive one advertisement and it's impossible to know if a peer is still
      in range.
      Signed-off-by: NYun-Hao Chung <howardchung@chromium.org>
      Reviewed-by: NArchie Pusaka <apusaka@chromium.org>
      Reviewed-by: NManish Mandlik <mmandlik@chromium.org>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c32d6246
    • S
      Bluetooth: Translate additional address type during le_conn_comp · 79699a70
      Sathish Narasimman 提交于
      When using controller based address resolution, then the destination
      address type during le_conn_complete uses 0x02 & 0x03 if controller
      resolves the destination address(RPA).
      These address types need to be converted back into either 0x00 0r 0x01
      Signed-off-by: NSathish Narasimman <sathish.narasimman@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      79699a70
    • Y
      Bluetooth: RFCOMM: Use DEVICE_ATTR_RO macro · c615943e
      YueHaibing 提交于
      Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
      which makes the code a bit shorter and easier to read.
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c615943e
    • L
      Bluetooth: L2CAP: Fix invalid access on ECRED Connection response · de895b43
      Luiz Augusto von Dentz 提交于
      The use of l2cap_chan_del is not safe under a loop using
      list_for_each_entry.
      Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      de895b43
    • L
      Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails · 1fa20d7d
      Luiz Augusto von Dentz 提交于
      The use of l2cap_chan_del is not safe under a loop using
      list_for_each_entry.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1fa20d7d
    • S
      Bluetooth: Remove spurious error message · 1c58e933
      Szymon Janc 提交于
      Even with rate limited reporting this is very spammy and since
      it is remote device that is providing bogus data there is no
      need to report this as error.
      
      Since real_len variable was used only to allow conditional error
      message it is now also removed.
      
      [72454.143336] bt_err_ratelimited: 10 callbacks suppressed
      [72454.143337] Bluetooth: hci0: advertising data len corrected
      [72454.296314] Bluetooth: hci0: advertising data len corrected
      [72454.892329] Bluetooth: hci0: advertising data len corrected
      [72455.051319] Bluetooth: hci0: advertising data len corrected
      [72455.357326] Bluetooth: hci0: advertising data len corrected
      [72455.663295] Bluetooth: hci0: advertising data len corrected
      [72455.787278] Bluetooth: hci0: advertising data len corrected
      [72455.942278] Bluetooth: hci0: advertising data len corrected
      [72456.094276] Bluetooth: hci0: advertising data len corrected
      [72456.249137] Bluetooth: hci0: advertising data len corrected
      [72459.416333] bt_err_ratelimited: 13 callbacks suppressed
      [72459.416334] Bluetooth: hci0: advertising data len corrected
      [72459.721334] Bluetooth: hci0: advertising data len corrected
      [72460.011317] Bluetooth: hci0: advertising data len corrected
      [72460.327171] Bluetooth: hci0: advertising data len corrected
      [72460.638294] Bluetooth: hci0: advertising data len corrected
      [72460.946350] Bluetooth: hci0: advertising data len corrected
      [72461.225320] Bluetooth: hci0: advertising data len corrected
      [72461.690322] Bluetooth: hci0: advertising data len corrected
      [72462.118318] Bluetooth: hci0: advertising data len corrected
      [72462.427319] Bluetooth: hci0: advertising data len corrected
      [72464.546319] bt_err_ratelimited: 7 callbacks suppressed
      [72464.546319] Bluetooth: hci0: advertising data len corrected
      [72464.857318] Bluetooth: hci0: advertising data len corrected
      [72465.163332] Bluetooth: hci0: advertising data len corrected
      [72465.278331] Bluetooth: hci0: advertising data len corrected
      [72465.432323] Bluetooth: hci0: advertising data len corrected
      [72465.891334] Bluetooth: hci0: advertising data len corrected
      [72466.045334] Bluetooth: hci0: advertising data len corrected
      [72466.197321] Bluetooth: hci0: advertising data len corrected
      [72466.340318] Bluetooth: hci0: advertising data len corrected
      [72466.498335] Bluetooth: hci0: advertising data len corrected
      [72469.803299] bt_err_ratelimited: 10 callbacks suppressed
      Signed-off-by: NSzymon Janc <szymon.janc@codecoup.pl>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203753
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1c58e933
    • K
      Bluetooth: Shutdown controller after workqueues are flushed or cancelled · 0ea9fd00
      Kai-Heng Feng 提交于
      Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
      stops working:
      [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
      [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
      [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
      [  545.660350] usb 3-10: device descriptor read/64, error -110
      [  561.283530] usb 3-10: device descriptor read/64, error -110
      [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
      [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
      [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
      [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
      [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
      [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
      
      Or kernel panics because other workqueues already freed skb:
      [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 2048.663775] #PF: supervisor read access in kernel mode
      [ 2048.663779] #PF: error_code(0x0000) - not-present page
      [ 2048.663782] PGD 0 P4D 0
      [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
      [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
      [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
      [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
      [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
      07 83 f8 01
      [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
      [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
      [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
      [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
      [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
      [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
      [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
      [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
      [ 2048.663856] PKRU: 55555554
      [ 2048.663859] Call Trace:
      [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
      [ 2048.663873]  kfree_skb+0x2f/0xb0
      [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
      [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
      [ 2048.663954]  ? __cond_resched+0x1a/0x50
      [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
      [ 2048.664007]  rfkill_set_block+0x98/0x170
      [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
      [ 2048.664022]  vfs_write+0xc7/0x260
      [ 2048.664030]  ksys_write+0xb1/0xe0
      [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
      [ 2048.664042]  __x64_sys_write+0x1a/0x20
      [ 2048.664048]  do_syscall_64+0x40/0xb0
      [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 2048.664060] RIP: 0033:0x7fe02ac23c27
      [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
      [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
      [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
      [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
      [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
      
      So move the shutdown callback to a place where workqueues are either
      flushed or cancelled to resolve the issue.
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      0ea9fd00
    • K
      Bluetooth: Fix alt settings for incoming SCO with transparent coding format · 06d213d8
      Kiran K 提交于
      For incoming SCO connection with transparent coding format, alt setting
      of CVSD is getting applied instead of Transparent.
      
      Before fix:
      < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21  #2196 [hci0] 321.342548
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Transmit bandwidth: 8000
              Receive bandwidth: 8000
              Max latency: 13
              Setting: 0x0003
                Input Coding: Linear
                Input Data Format: 1's complement
                Input Sample Size: 8-bit
                # of bits padding at MSB: 0
                Air Coding Format: Transparent Data
              Retransmission effort: Optimize for link quality (0x02)
              Packet type: 0x003f
                HV1 may be used
                HV2 may be used
                HV3 may be used
                EV3 may be used
                EV4 may be used
                EV5 may be used
      > HCI Event: Command Status (0x0f) plen 4               #2197 [hci0] 321.343585
            Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
              Status: Success (0x00)
      > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17  #2198 [hci0] 321.351666
              Status: Success (0x00)
              Handle: 257
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Link type: eSCO (0x02)
              Transmission interval: 0x0c
              Retransmission window: 0x04
              RX packet length: 60
              TX packet length: 60
              Air mode: Transparent (0x03)
      ........
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2336 [hci0] 321.383655
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2337 [hci0] 321.389558
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2338 [hci0] 321.393615
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2339 [hci0] 321.393618
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2340 [hci0] 321.393618
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2341 [hci0] 321.397070
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2342 [hci0] 321.403622
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2343 [hci0] 321.403625
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2344 [hci0] 321.403625
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2345 [hci0] 321.403625
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2346 [hci0] 321.404569
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2347 [hci0] 321.412091
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2348 [hci0] 321.413626
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2349 [hci0] 321.413630
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2350 [hci0] 321.413630
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2351 [hci0] 321.419674
      
      After fix:
      
      < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21  #309 [hci0] 49.439693
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Transmit bandwidth: 8000
              Receive bandwidth: 8000
              Max latency: 13
              Setting: 0x0003
                Input Coding: Linear
                Input Data Format: 1's complement
                Input Sample Size: 8-bit
                # of bits padding at MSB: 0
                Air Coding Format: Transparent Data
              Retransmission effort: Optimize for link quality (0x02)
              Packet type: 0x003f
                HV1 may be used
                HV2 may be used
                HV3 may be used
                EV3 may be used
                EV4 may be used
                EV5 may be used
      > HCI Event: Command Status (0x0f) plen 4                 #310 [hci0] 49.440308
            Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
              Status: Success (0x00)
      > HCI Event: Synchronous Connect Complete (0x2c) plen 17  #311 [hci0] 49.449308
              Status: Success (0x00)
              Handle: 257
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Link type: eSCO (0x02)
              Transmission interval: 0x0c
              Retransmission window: 0x04
              RX packet length: 60
              TX packet length: 60
              Air mode: Transparent (0x03)
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #312 [hci0] 49.450421
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #313 [hci0] 49.457927
      > HCI Event: Max Slots Change (0x1b) plen 3               #314 [hci0] 49.460345
              Handle: 256
              Max slots: 5
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #315 [hci0] 49.465453
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #316 [hci0] 49.470502
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #317 [hci0] 49.470519
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #318 [hci0] 49.472996
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #319 [hci0] 49.480412
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #320 [hci0] 49.480492
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #321 [hci0] 49.487989
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #322 [hci0] 49.490303
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #323 [hci0] 49.495496
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #324 [hci0] 49.500304
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #325 [hci0] 49.500311
      Signed-off-by: NKiran K <kiran.k@intel.com>
      Signed-off-by: NLokendra Singh <lokendra.singh@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      06d213d8
    • J
      Bluetooth: 6lowpan: remove unused function · b0e56db7
      Jiapeng Chong 提交于
      Fix the following clang warning:
      
      net/bluetooth/6lowpan.c:913:20: warning: unused function 'bdaddr_type'
      [-Wunused-function].
      
      net/bluetooth/6lowpan.c:106:35: warning: unused function
      'peer_lookup_ba' [-Wunused-function].
      Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      b0e56db7
    • M
      Bluetooth: Add ncmd=0 recovery handling · de75cd0d
      Manish Mandlik 提交于
      During command status or command complete event, the controller may set
      ncmd=0 indicating that it is not accepting any more commands. In such a
      case, host holds off sending any more commands to the controller. If the
      controller doesn't recover from such condition, host will wait forever,
      until the user decides that the Bluetooth is broken and may power cycles
      the Bluetooth.
      
      This patch triggers the hardware error to reset the controller and
      driver when it gets into such state as there is no other wat out.
      Reviewed-by: NAbhishek Pandit-Subedi <abhishekpandit@chromium.org>
      Signed-off-by: NManish Mandlik <mmandlik@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      de75cd0d
    • Y
      Bluetooth: Fix the HCI to MGMT status conversion table · 4ef36a52
      Yu Liu 提交于
      0x2B, 0x31 and 0x33 are reserved for future use but were not present in
      the HCI to MGMT conversion table, this caused the conversion to be
      incorrect for the HCI status code greater than 0x2A.
      Reviewed-by: NMiao-chen Chou <mcchou@chromium.org>
      Signed-off-by: NYu Liu <yudiliu@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      4ef36a52
    • T
      Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails · 3cfdf8fc
      Thadeu Lima de Souza Cascardo 提交于
      When cmtp_attach_device fails, cmtp_add_connection returns the error value
      which leads to the caller to doing fput through sockfd_put. But
      cmtp_session kthread, which is stopped in this path will also call fput,
      leading to a potential refcount underflow or a use-after-free.
      
      Add a refcount before we signal the kthread to stop. The kthread will try
      to grab the cmtp_session_sem mutex before doing the fput, which is held
      when get_file is called, so there should be no races there.
      
      Reported-by: Ryota Shiga
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      3cfdf8fc
    • Y
      Bluetooth: Return whether a connection is outbound · 1c6ed31b
      Yu Liu 提交于
      When an MGMT_EV_DEVICE_CONNECTED event is reported back to the user
      space we will set the flags to tell if the established connection is
      outbound or not. This is useful for the user space to log better metrics
      and error messages.
      Reviewed-by: NMiao-chen Chou <mcchou@chromium.org>
      Reviewed-by: NAlain Michaud <alainm@chromium.org>
      Signed-off-by: NYu Liu <yudiliu@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1c6ed31b
    • Q
      Bluetooth: use flexible-array member instead of zero-length array · 07d85dbe
      Qiheng Lin 提交于
      Fix the following coccicheck warning:
      
      net/bluetooth/msft.c:37:6-13: WARNING use flexible-array member instead
      net/bluetooth/msft.c:42:6-10: WARNING use flexible-array member instead
      net/bluetooth/msft.c:52:6-10: WARNING use flexible-array member instead
      Signed-off-by: NQiheng Lin <linqiheng@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      07d85dbe
    • K
      Bluetooth: 6lowpan: delete unneeded variable initialization · c469c9c9
      Kai Ye 提交于
      Delete unneeded variable initialization.
      Signed-off-by: NKai Ye <yekai13@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c469c9c9
    • G
      net/smc: Ensure correct state of the socket in send path · 17081633
      Guvenc Gulce 提交于
      When smc_sendmsg() is called before the SMC socket initialization has
      completed, smc_tx_sendmsg() will access un-initialized fields of the
      SMC socket which results in a null-pointer dereference.
      Fix this by checking the socket state first in smc_tx_sendmsg().
      
      Fixes: e0e4b8fa ("net/smc: Add SMC statistics support")
      Reported-by: syzbot+5dda108b672b54141857@syzkaller.appspotmail.com
      Reviewed-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NGuvenc Gulce <guvenc@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17081633
  2. 25 6月, 2021 5 次提交
    • Z
      ipv6: delete useless dst check in ip6_dst_lookup_tail · c305b9e6
      zhang kai 提交于
      parameter dst always points to null.
      Signed-off-by: Nzhang kai <zhangkaiheb@126.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c305b9e6
    • X
      sctp: send the next probe immediately once the last one is acked · fea1d5b1
      Xin Long 提交于
      These is no need to wait for 'interval' period for the next probe
      if the last probe is already acked in search state. The 'interval'
      period waiting should be only for probe failure timeout and the
      current pmtu check when it's in search complete state.
      
      This change will shorten the probe time a lot in search state, and
      also fix the document accordingly.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fea1d5b1
    • X
      sctp: do black hole detection in search complete state · 0dac127c
      Xin Long 提交于
      Currently the PLPMUTD probe will stop for a long period (interval * 30)
      after it enters search complete state. If there's a pmtu change on the
      route path, it takes a long time to be aware if the ICMP TooBig packet
      is lost or filtered.
      
      As it says in rfc8899#section-4.3:
      
        "A DPLPMTUD method MUST NOT rely solely on this method."
        (ICMP PTB message).
      
      This patch is to enable the other method for search complete state:
      
        "A PL can use the DPLPMTUD probing mechanism to periodically
         generate probe packets of the size of the current PLPMTU."
      
      With this patch, the probe will continue with the current pmtu every
      'interval' until the PMTU_RAISE_TIMER 'timeout', which we implement
      by adding raise_count to raise the probe size when it counts to 30
      and removing the SCTP_PL_COMPLETE check for PMTU_RAISE_TIMER.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dac127c
    • J
      net: ip: avoid OOM kills with large UDP sends over loopback · 6d123b81
      Jakub Kicinski 提交于
      Dave observed number of machines hitting OOM on the UDP send
      path. The workload seems to be sending large UDP packets over
      loopback. Since loopback has MTU of 64k kernel will try to
      allocate an skb with up to 64k of head space. This has a good
      chance of failing under memory pressure. What's worse if
      the message length is <32k the allocation may trigger an
      OOM killer.
      
      This is entirely avoidable, we can use an skb with page frags.
      
      af_unix solves a similar problem by limiting the head
      length to SKB_MAX_ALLOC. This seems like a good and simple
      approach. It means that UDP messages > 16kB will now
      use fragments if underlying device supports SG, if extra
      allocator pressure causes regressions in real workloads
      we can switch to trying the large allocation first and
      falling back.
      
      v4: pre-calculate all the additions to alloclen so
          we can be sure it won't go over order-2
      Reported-by: NDave Jones <dsj@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d123b81
    • M
      net: retrieve netns cookie via getsocketopt · e8b9eab9
      Martynas Pumputis 提交于
      It's getting more common to run nested container environments for
      testing cloud software. One of such examples is Kind [1] which runs a
      Kubernetes cluster in Docker containers on a single host. Each container
      acts as a Kubernetes node, and thus can run any Pod (aka container)
      inside the former. This approach simplifies testing a lot, as it
      eliminates complicated VM setups.
      
      Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
      programs are used for load-balancing. The load-balancer BPF program
      needs to detect whether a request originates from the host netns or a
      container netns in order to allow some access, e.g. to a service via a
      loopback IP address. Typically, the programs detect this by comparing
      netns cookies with the one of the init ns via a call to
      bpf_get_netns_cookie(NULL). However, in nested environments the latter
      cannot be used given the Kubernetes node's netns is outside the init ns.
      To fix this, we need to pass the Kubernetes node netns cookie to the
      program in a different way: by extending getsockopt() with a
      SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
      node netns can retrieve the cookie and pass it to the program instead.
      
      Thus, this is following up on Eric's commit 3d368ab8 ("net:
      initialize net->net_cookie at netns setup") to allow retrieval via
      SO_NETNS_COOKIE.  This is also in line in how we retrieve socket cookie
      via SO_COOKIE.
      
        [1] https://kind.sigs.k8s.io/Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NMartynas Pumputis <m@lambda.lt>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8b9eab9
  3. 24 6月, 2021 6 次提交
    • D
      devlink: Protect rate list with lock while switching modes · a3e5e579
      Dmytro Linkin 提交于
      Devlink eswitch set command doesn't hold devlink->lock, which makes
      possible race condition between rate list traversing and others devlink
      rate KAPI calls, like devlink_rate_nodes_destroy().
      Hold devlink lock while traversing the list.
      
      Fixes: a8ecb93e ("devlink: Introduce rate nodes")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NParav Pandit <parav@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3e5e579
    • D
      devlink: Remove eswitch mode check for mode set call · ff99324d
      Dmytro Linkin 提交于
      When eswitch is disabled, querying its current mode results in error.
      Due to this when trying to set the eswitch mode for mlx5 devices, it
      fails to set the eswitch switchdev mode.
      Hence remove such check.
      
      Fixes: a8ecb93e ("devlink: Introduce rate nodes")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NParav Pandit <parav@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff99324d
    • D
      devlink: Decrease refcnt of parent rate object on leaf destroy · 1321ed5e
      Dmytro Linkin 提交于
      Port functions, like SFs, can be deleted by the user when its leaf rate
      object has parent node. In such case node refcnt won't be decreased
      which blocks the node from deletion later.
      Do simple refcnt decrease, since driver in cleanup stage. This:
      1) assumes that driver took proper internal parent unset action;
      2) allows to avoid nested callbacks call and deadlock.
      
      Fixes: d7555984 ("devlink: Allow setting parent node of rate objects")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1321ed5e
    • K
      tcp: Add stats for socket migration. · 55d444b3
      Kuniyuki Iwashima 提交于
      This commit adds two stats for the socket migration feature to evaluate the
      effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE).
      
      If the migration fails because of the own_req race in receiving ACK and
      sending SYN+ACK paths, we do not increment the failure stat. Then another
      CPU is responsible for the req.
      
      Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/Suggested-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55d444b3
    • Y
      net: sched: remove qdisc->empty for lockless qdisc · d3e0f575
      Yunsheng Lin 提交于
      As MISSED and DRAINING state are used to indicate a non-empty
      qdisc, qdisc->empty is not longer needed, so remove it.
      Acked-by: NJakub Kicinski <kuba@kernel.org>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3e0f575
    • Y
      net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc · c4fef01b
      Yunsheng Lin 提交于
      Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
      flag set, but queue discipline by-pass does not work for lockless
      qdisc because skb is always enqueued to qdisc even when the qdisc
      is empty, see __dev_xmit_skb().
      
      This patch calls sch_direct_xmit() to transmit the skb directly
      to the driver for empty lockless qdisc, which aviod enqueuing
      and dequeuing operation.
      
      As qdisc->empty is not reliable to indicate a empty qdisc because
      there is a time window between enqueuing and setting qdisc->empty.
      So we use the MISSED state added in commit a90c57f2 ("net:
      sched: fix packet stuck problem for lockless qdisc"), which
      indicate there is lock contention, suggesting that it is better
      not to do the qdisc bypass in order to avoid packet out of order
      problem.
      
      In order to make MISSED state reliable to indicate a empty qdisc,
      we need to ensure that testing and clearing of MISSED state is
      within the protection of qdisc->seqlock, only setting MISSED state
      can be done without the protection of qdisc->seqlock. A MISSED
      state testing is added without the protection of qdisc->seqlock to
      aviod doing unnecessary spin_trylock() for contention case.
      
      As the enqueuing is not within the protection of qdisc->seqlock,
      there is still a potential data race as mentioned by Jakub [1]:
      
            thread1               thread2             thread3
      qdisc_run_begin() # true
                              qdisc_run_begin(q)
                                   set(MISSED)
      pfifo_fast_dequeue
        clear(MISSED)
        # recheck the queue
      qdisc_run_end()
                                  enqueue skb1
                                                   qdisc empty # true
                                                qdisc_run_begin() # true
                                                sch_direct_xmit() # skb2
                               qdisc_run_begin()
                                  set(MISSED)
      
      When above happens, skb1 enqueued by thread2 is transmited after
      skb2 is transmited by thread3 because MISSED state setting and
      enqueuing is not under the qdisc->seqlock. If qdisc bypass is
      disabled, skb1 has better chance to be transmited quicker than
      skb2.
      
      This patch does not take care of the above data race, because we
      view this as similar as below:
      Even at the same time CPU1 and CPU2 write the skb to two socket
      which both heading to the same qdisc, there is no guarantee that
      which skb will hit the qdisc first, because there is a lot of
      factor like interrupt/softirq/cache miss/scheduling afffecting
      that.
      
      There are below cases that need special handling:
      1. When MISSED state is cleared before another round of dequeuing
         in pfifo_fast_dequeue(), and __qdisc_run() might not be able to
         dequeue all skb in one round and call __netif_schedule(), which
         might result in a non-empty qdisc without MISSED set. In order
         to avoid this, the MISSED state is set for lockless qdisc and
         __netif_schedule() will be called at the end of qdisc_run_end.
      
      2. The MISSED state also need to be set for lockless qdisc instead
         of calling __netif_schedule() directly when requeuing a skb for
         a similar reason.
      
      3. For netdev queue stopped case, the MISSED case need clearing
         while the netdev queue is stopped, otherwise there may be
         unnecessary __netif_schedule() calling. So a new DRAINING state
         is added to indicate this case, which also indicate a non-empty
         qdisc.
      
      4. As there is already netif_xmit_frozen_or_stopped() checking in
         dequeue_skb() and sch_direct_xmit(), which are both within the
         protection of qdisc->seqlock, but the same checking in
         __dev_xmit_skb() is without the protection, which might cause
         empty indication of a lockless qdisc to be not reliable. So
         remove the checking in __dev_xmit_skb(), and the checking in
         the protection of qdisc->seqlock seems enough to avoid the cpu
         consumption problem for netdev queue stopped case.
      
      1. https://lkml.org/lkml/2021/5/29/215Acked-by: NJakub Kicinski <kuba@kernel.org>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4fef01b
  4. 23 6月, 2021 12 次提交