1. 26 6月, 2021 16 次提交
    • Y
      Bluetooth: disable filter dup when scan for adv monitor · c32d6246
      Yun-Hao Chung 提交于
      Disable duplicates filter when scanning for advertisement monitor for
      the following reasons. The scanning includes active scan and passive
      scan.
      
      For HW pattern filtering (ex. MSFT), Realtek and Qualcomm controllers
      ignore RSSI_Sampling_Period when the duplicates filter is enabled.
      
      For SW pattern filtering, when we're not doing interleaved scanning, it
      is necessary to disable duplicates filter, otherwise hosts can only
      receive one advertisement and it's impossible to know if a peer is still
      in range.
      Signed-off-by: NYun-Hao Chung <howardchung@chromium.org>
      Reviewed-by: NArchie Pusaka <apusaka@chromium.org>
      Reviewed-by: NManish Mandlik <mmandlik@chromium.org>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c32d6246
    • S
      Bluetooth: Translate additional address type during le_conn_comp · 79699a70
      Sathish Narasimman 提交于
      When using controller based address resolution, then the destination
      address type during le_conn_complete uses 0x02 & 0x03 if controller
      resolves the destination address(RPA).
      These address types need to be converted back into either 0x00 0r 0x01
      Signed-off-by: NSathish Narasimman <sathish.narasimman@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      79699a70
    • Y
      Bluetooth: RFCOMM: Use DEVICE_ATTR_RO macro · c615943e
      YueHaibing 提交于
      Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
      which makes the code a bit shorter and easier to read.
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c615943e
    • L
      Bluetooth: L2CAP: Fix invalid access on ECRED Connection response · de895b43
      Luiz Augusto von Dentz 提交于
      The use of l2cap_chan_del is not safe under a loop using
      list_for_each_entry.
      Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      de895b43
    • L
      Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails · 1fa20d7d
      Luiz Augusto von Dentz 提交于
      The use of l2cap_chan_del is not safe under a loop using
      list_for_each_entry.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1fa20d7d
    • S
      Bluetooth: Remove spurious error message · 1c58e933
      Szymon Janc 提交于
      Even with rate limited reporting this is very spammy and since
      it is remote device that is providing bogus data there is no
      need to report this as error.
      
      Since real_len variable was used only to allow conditional error
      message it is now also removed.
      
      [72454.143336] bt_err_ratelimited: 10 callbacks suppressed
      [72454.143337] Bluetooth: hci0: advertising data len corrected
      [72454.296314] Bluetooth: hci0: advertising data len corrected
      [72454.892329] Bluetooth: hci0: advertising data len corrected
      [72455.051319] Bluetooth: hci0: advertising data len corrected
      [72455.357326] Bluetooth: hci0: advertising data len corrected
      [72455.663295] Bluetooth: hci0: advertising data len corrected
      [72455.787278] Bluetooth: hci0: advertising data len corrected
      [72455.942278] Bluetooth: hci0: advertising data len corrected
      [72456.094276] Bluetooth: hci0: advertising data len corrected
      [72456.249137] Bluetooth: hci0: advertising data len corrected
      [72459.416333] bt_err_ratelimited: 13 callbacks suppressed
      [72459.416334] Bluetooth: hci0: advertising data len corrected
      [72459.721334] Bluetooth: hci0: advertising data len corrected
      [72460.011317] Bluetooth: hci0: advertising data len corrected
      [72460.327171] Bluetooth: hci0: advertising data len corrected
      [72460.638294] Bluetooth: hci0: advertising data len corrected
      [72460.946350] Bluetooth: hci0: advertising data len corrected
      [72461.225320] Bluetooth: hci0: advertising data len corrected
      [72461.690322] Bluetooth: hci0: advertising data len corrected
      [72462.118318] Bluetooth: hci0: advertising data len corrected
      [72462.427319] Bluetooth: hci0: advertising data len corrected
      [72464.546319] bt_err_ratelimited: 7 callbacks suppressed
      [72464.546319] Bluetooth: hci0: advertising data len corrected
      [72464.857318] Bluetooth: hci0: advertising data len corrected
      [72465.163332] Bluetooth: hci0: advertising data len corrected
      [72465.278331] Bluetooth: hci0: advertising data len corrected
      [72465.432323] Bluetooth: hci0: advertising data len corrected
      [72465.891334] Bluetooth: hci0: advertising data len corrected
      [72466.045334] Bluetooth: hci0: advertising data len corrected
      [72466.197321] Bluetooth: hci0: advertising data len corrected
      [72466.340318] Bluetooth: hci0: advertising data len corrected
      [72466.498335] Bluetooth: hci0: advertising data len corrected
      [72469.803299] bt_err_ratelimited: 10 callbacks suppressed
      Signed-off-by: NSzymon Janc <szymon.janc@codecoup.pl>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203753
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1c58e933
    • K
      Bluetooth: Shutdown controller after workqueues are flushed or cancelled · 0ea9fd00
      Kai-Heng Feng 提交于
      Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
      stops working:
      [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
      [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
      [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
      [  545.660350] usb 3-10: device descriptor read/64, error -110
      [  561.283530] usb 3-10: device descriptor read/64, error -110
      [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
      [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
      [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
      [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
      [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
      [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
      
      Or kernel panics because other workqueues already freed skb:
      [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 2048.663775] #PF: supervisor read access in kernel mode
      [ 2048.663779] #PF: error_code(0x0000) - not-present page
      [ 2048.663782] PGD 0 P4D 0
      [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
      [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
      [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
      [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
      [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
      07 83 f8 01
      [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
      [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
      [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
      [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
      [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
      [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
      [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
      [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
      [ 2048.663856] PKRU: 55555554
      [ 2048.663859] Call Trace:
      [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
      [ 2048.663873]  kfree_skb+0x2f/0xb0
      [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
      [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
      [ 2048.663954]  ? __cond_resched+0x1a/0x50
      [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
      [ 2048.664007]  rfkill_set_block+0x98/0x170
      [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
      [ 2048.664022]  vfs_write+0xc7/0x260
      [ 2048.664030]  ksys_write+0xb1/0xe0
      [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
      [ 2048.664042]  __x64_sys_write+0x1a/0x20
      [ 2048.664048]  do_syscall_64+0x40/0xb0
      [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 2048.664060] RIP: 0033:0x7fe02ac23c27
      [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
      [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
      [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
      [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
      [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
      
      So move the shutdown callback to a place where workqueues are either
      flushed or cancelled to resolve the issue.
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      0ea9fd00
    • K
      Bluetooth: Fix alt settings for incoming SCO with transparent coding format · 06d213d8
      Kiran K 提交于
      For incoming SCO connection with transparent coding format, alt setting
      of CVSD is getting applied instead of Transparent.
      
      Before fix:
      < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21  #2196 [hci0] 321.342548
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Transmit bandwidth: 8000
              Receive bandwidth: 8000
              Max latency: 13
              Setting: 0x0003
                Input Coding: Linear
                Input Data Format: 1's complement
                Input Sample Size: 8-bit
                # of bits padding at MSB: 0
                Air Coding Format: Transparent Data
              Retransmission effort: Optimize for link quality (0x02)
              Packet type: 0x003f
                HV1 may be used
                HV2 may be used
                HV3 may be used
                EV3 may be used
                EV4 may be used
                EV5 may be used
      > HCI Event: Command Status (0x0f) plen 4               #2197 [hci0] 321.343585
            Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
              Status: Success (0x00)
      > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17  #2198 [hci0] 321.351666
              Status: Success (0x00)
              Handle: 257
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Link type: eSCO (0x02)
              Transmission interval: 0x0c
              Retransmission window: 0x04
              RX packet length: 60
              TX packet length: 60
              Air mode: Transparent (0x03)
      ........
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2336 [hci0] 321.383655
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2337 [hci0] 321.389558
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2338 [hci0] 321.393615
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2339 [hci0] 321.393618
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2340 [hci0] 321.393618
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2341 [hci0] 321.397070
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2342 [hci0] 321.403622
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2343 [hci0] 321.403625
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2344 [hci0] 321.403625
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2345 [hci0] 321.403625
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2346 [hci0] 321.404569
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2347 [hci0] 321.412091
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2348 [hci0] 321.413626
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2349 [hci0] 321.413630
      > SCO Data RX: Handle 257 flags 0x00 dlen 48            #2350 [hci0] 321.413630
      < SCO Data TX: Handle 257 flags 0x00 dlen 60            #2351 [hci0] 321.419674
      
      After fix:
      
      < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21  #309 [hci0] 49.439693
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Transmit bandwidth: 8000
              Receive bandwidth: 8000
              Max latency: 13
              Setting: 0x0003
                Input Coding: Linear
                Input Data Format: 1's complement
                Input Sample Size: 8-bit
                # of bits padding at MSB: 0
                Air Coding Format: Transparent Data
              Retransmission effort: Optimize for link quality (0x02)
              Packet type: 0x003f
                HV1 may be used
                HV2 may be used
                HV3 may be used
                EV3 may be used
                EV4 may be used
                EV5 may be used
      > HCI Event: Command Status (0x0f) plen 4                 #310 [hci0] 49.440308
            Accept Synchronous Connection Request (0x01|0x0029) ncmd 1
              Status: Success (0x00)
      > HCI Event: Synchronous Connect Complete (0x2c) plen 17  #311 [hci0] 49.449308
              Status: Success (0x00)
              Handle: 257
              Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd)
              Link type: eSCO (0x02)
              Transmission interval: 0x0c
              Retransmission window: 0x04
              RX packet length: 60
              TX packet length: 60
              Air mode: Transparent (0x03)
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #312 [hci0] 49.450421
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #313 [hci0] 49.457927
      > HCI Event: Max Slots Change (0x1b) plen 3               #314 [hci0] 49.460345
              Handle: 256
              Max slots: 5
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #315 [hci0] 49.465453
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #316 [hci0] 49.470502
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #317 [hci0] 49.470519
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #318 [hci0] 49.472996
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #319 [hci0] 49.480412
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #320 [hci0] 49.480492
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #321 [hci0] 49.487989
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #322 [hci0] 49.490303
      < SCO Data TX: Handle 257 flags 0x00 dlen 60              #323 [hci0] 49.495496
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #324 [hci0] 49.500304
      > SCO Data RX: Handle 257 flags 0x00 dlen 60              #325 [hci0] 49.500311
      Signed-off-by: NKiran K <kiran.k@intel.com>
      Signed-off-by: NLokendra Singh <lokendra.singh@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      06d213d8
    • J
      Bluetooth: 6lowpan: remove unused function · b0e56db7
      Jiapeng Chong 提交于
      Fix the following clang warning:
      
      net/bluetooth/6lowpan.c:913:20: warning: unused function 'bdaddr_type'
      [-Wunused-function].
      
      net/bluetooth/6lowpan.c:106:35: warning: unused function
      'peer_lookup_ba' [-Wunused-function].
      Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      b0e56db7
    • M
      Bluetooth: Add ncmd=0 recovery handling · de75cd0d
      Manish Mandlik 提交于
      During command status or command complete event, the controller may set
      ncmd=0 indicating that it is not accepting any more commands. In such a
      case, host holds off sending any more commands to the controller. If the
      controller doesn't recover from such condition, host will wait forever,
      until the user decides that the Bluetooth is broken and may power cycles
      the Bluetooth.
      
      This patch triggers the hardware error to reset the controller and
      driver when it gets into such state as there is no other wat out.
      Reviewed-by: NAbhishek Pandit-Subedi <abhishekpandit@chromium.org>
      Signed-off-by: NManish Mandlik <mmandlik@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      de75cd0d
    • Y
      Bluetooth: Fix the HCI to MGMT status conversion table · 4ef36a52
      Yu Liu 提交于
      0x2B, 0x31 and 0x33 are reserved for future use but were not present in
      the HCI to MGMT conversion table, this caused the conversion to be
      incorrect for the HCI status code greater than 0x2A.
      Reviewed-by: NMiao-chen Chou <mcchou@chromium.org>
      Signed-off-by: NYu Liu <yudiliu@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      4ef36a52
    • T
      Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails · 3cfdf8fc
      Thadeu Lima de Souza Cascardo 提交于
      When cmtp_attach_device fails, cmtp_add_connection returns the error value
      which leads to the caller to doing fput through sockfd_put. But
      cmtp_session kthread, which is stopped in this path will also call fput,
      leading to a potential refcount underflow or a use-after-free.
      
      Add a refcount before we signal the kthread to stop. The kthread will try
      to grab the cmtp_session_sem mutex before doing the fput, which is held
      when get_file is called, so there should be no races there.
      
      Reported-by: Ryota Shiga
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      3cfdf8fc
    • Y
      Bluetooth: Return whether a connection is outbound · 1c6ed31b
      Yu Liu 提交于
      When an MGMT_EV_DEVICE_CONNECTED event is reported back to the user
      space we will set the flags to tell if the established connection is
      outbound or not. This is useful for the user space to log better metrics
      and error messages.
      Reviewed-by: NMiao-chen Chou <mcchou@chromium.org>
      Reviewed-by: NAlain Michaud <alainm@chromium.org>
      Signed-off-by: NYu Liu <yudiliu@google.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      1c6ed31b
    • Q
      Bluetooth: use flexible-array member instead of zero-length array · 07d85dbe
      Qiheng Lin 提交于
      Fix the following coccicheck warning:
      
      net/bluetooth/msft.c:37:6-13: WARNING use flexible-array member instead
      net/bluetooth/msft.c:42:6-10: WARNING use flexible-array member instead
      net/bluetooth/msft.c:52:6-10: WARNING use flexible-array member instead
      Signed-off-by: NQiheng Lin <linqiheng@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      07d85dbe
    • K
      Bluetooth: 6lowpan: delete unneeded variable initialization · c469c9c9
      Kai Ye 提交于
      Delete unneeded variable initialization.
      Signed-off-by: NKai Ye <yekai13@huawei.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      c469c9c9
    • G
      net/smc: Ensure correct state of the socket in send path · 17081633
      Guvenc Gulce 提交于
      When smc_sendmsg() is called before the SMC socket initialization has
      completed, smc_tx_sendmsg() will access un-initialized fields of the
      SMC socket which results in a null-pointer dereference.
      Fix this by checking the socket state first in smc_tx_sendmsg().
      
      Fixes: e0e4b8fa ("net/smc: Add SMC statistics support")
      Reported-by: syzbot+5dda108b672b54141857@syzkaller.appspotmail.com
      Reviewed-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NGuvenc Gulce <guvenc@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17081633
  2. 25 6月, 2021 5 次提交
    • Z
      ipv6: delete useless dst check in ip6_dst_lookup_tail · c305b9e6
      zhang kai 提交于
      parameter dst always points to null.
      Signed-off-by: Nzhang kai <zhangkaiheb@126.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c305b9e6
    • X
      sctp: send the next probe immediately once the last one is acked · fea1d5b1
      Xin Long 提交于
      These is no need to wait for 'interval' period for the next probe
      if the last probe is already acked in search state. The 'interval'
      period waiting should be only for probe failure timeout and the
      current pmtu check when it's in search complete state.
      
      This change will shorten the probe time a lot in search state, and
      also fix the document accordingly.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fea1d5b1
    • X
      sctp: do black hole detection in search complete state · 0dac127c
      Xin Long 提交于
      Currently the PLPMUTD probe will stop for a long period (interval * 30)
      after it enters search complete state. If there's a pmtu change on the
      route path, it takes a long time to be aware if the ICMP TooBig packet
      is lost or filtered.
      
      As it says in rfc8899#section-4.3:
      
        "A DPLPMTUD method MUST NOT rely solely on this method."
        (ICMP PTB message).
      
      This patch is to enable the other method for search complete state:
      
        "A PL can use the DPLPMTUD probing mechanism to periodically
         generate probe packets of the size of the current PLPMTU."
      
      With this patch, the probe will continue with the current pmtu every
      'interval' until the PMTU_RAISE_TIMER 'timeout', which we implement
      by adding raise_count to raise the probe size when it counts to 30
      and removing the SCTP_PL_COMPLETE check for PMTU_RAISE_TIMER.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dac127c
    • J
      net: ip: avoid OOM kills with large UDP sends over loopback · 6d123b81
      Jakub Kicinski 提交于
      Dave observed number of machines hitting OOM on the UDP send
      path. The workload seems to be sending large UDP packets over
      loopback. Since loopback has MTU of 64k kernel will try to
      allocate an skb with up to 64k of head space. This has a good
      chance of failing under memory pressure. What's worse if
      the message length is <32k the allocation may trigger an
      OOM killer.
      
      This is entirely avoidable, we can use an skb with page frags.
      
      af_unix solves a similar problem by limiting the head
      length to SKB_MAX_ALLOC. This seems like a good and simple
      approach. It means that UDP messages > 16kB will now
      use fragments if underlying device supports SG, if extra
      allocator pressure causes regressions in real workloads
      we can switch to trying the large allocation first and
      falling back.
      
      v4: pre-calculate all the additions to alloclen so
          we can be sure it won't go over order-2
      Reported-by: NDave Jones <dsj@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d123b81
    • M
      net: retrieve netns cookie via getsocketopt · e8b9eab9
      Martynas Pumputis 提交于
      It's getting more common to run nested container environments for
      testing cloud software. One of such examples is Kind [1] which runs a
      Kubernetes cluster in Docker containers on a single host. Each container
      acts as a Kubernetes node, and thus can run any Pod (aka container)
      inside the former. This approach simplifies testing a lot, as it
      eliminates complicated VM setups.
      
      Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
      programs are used for load-balancing. The load-balancer BPF program
      needs to detect whether a request originates from the host netns or a
      container netns in order to allow some access, e.g. to a service via a
      loopback IP address. Typically, the programs detect this by comparing
      netns cookies with the one of the init ns via a call to
      bpf_get_netns_cookie(NULL). However, in nested environments the latter
      cannot be used given the Kubernetes node's netns is outside the init ns.
      To fix this, we need to pass the Kubernetes node netns cookie to the
      program in a different way: by extending getsockopt() with a
      SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
      node netns can retrieve the cookie and pass it to the program instead.
      
      Thus, this is following up on Eric's commit 3d368ab8 ("net:
      initialize net->net_cookie at netns setup") to allow retrieval via
      SO_NETNS_COOKIE.  This is also in line in how we retrieve socket cookie
      via SO_COOKIE.
      
        [1] https://kind.sigs.k8s.io/Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NMartynas Pumputis <m@lambda.lt>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8b9eab9
  3. 24 6月, 2021 6 次提交
    • D
      devlink: Protect rate list with lock while switching modes · a3e5e579
      Dmytro Linkin 提交于
      Devlink eswitch set command doesn't hold devlink->lock, which makes
      possible race condition between rate list traversing and others devlink
      rate KAPI calls, like devlink_rate_nodes_destroy().
      Hold devlink lock while traversing the list.
      
      Fixes: a8ecb93e ("devlink: Introduce rate nodes")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NParav Pandit <parav@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3e5e579
    • D
      devlink: Remove eswitch mode check for mode set call · ff99324d
      Dmytro Linkin 提交于
      When eswitch is disabled, querying its current mode results in error.
      Due to this when trying to set the eswitch mode for mlx5 devices, it
      fails to set the eswitch switchdev mode.
      Hence remove such check.
      
      Fixes: a8ecb93e ("devlink: Introduce rate nodes")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NParav Pandit <parav@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff99324d
    • D
      devlink: Decrease refcnt of parent rate object on leaf destroy · 1321ed5e
      Dmytro Linkin 提交于
      Port functions, like SFs, can be deleted by the user when its leaf rate
      object has parent node. In such case node refcnt won't be decreased
      which blocks the node from deletion later.
      Do simple refcnt decrease, since driver in cleanup stage. This:
      1) assumes that driver took proper internal parent unset action;
      2) allows to avoid nested callbacks call and deadlock.
      
      Fixes: d7555984 ("devlink: Allow setting parent node of rate objects")
      Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1321ed5e
    • K
      tcp: Add stats for socket migration. · 55d444b3
      Kuniyuki Iwashima 提交于
      This commit adds two stats for the socket migration feature to evaluate the
      effectiveness: LINUX_MIB_TCPMIGRATEREQ(SUCCESS|FAILURE).
      
      If the migration fails because of the own_req race in receiving ACK and
      sending SYN+ACK paths, we do not increment the failure stat. Then another
      CPU is responsible for the req.
      
      Link: https://lore.kernel.org/bpf/CAK6E8=cgFKuGecTzSCSQ8z3YJ_163C0uwO9yRvfDSE7vOe9mJA@mail.gmail.com/Suggested-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55d444b3
    • Y
      net: sched: remove qdisc->empty for lockless qdisc · d3e0f575
      Yunsheng Lin 提交于
      As MISSED and DRAINING state are used to indicate a non-empty
      qdisc, qdisc->empty is not longer needed, so remove it.
      Acked-by: NJakub Kicinski <kuba@kernel.org>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3e0f575
    • Y
      net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc · c4fef01b
      Yunsheng Lin 提交于
      Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
      flag set, but queue discipline by-pass does not work for lockless
      qdisc because skb is always enqueued to qdisc even when the qdisc
      is empty, see __dev_xmit_skb().
      
      This patch calls sch_direct_xmit() to transmit the skb directly
      to the driver for empty lockless qdisc, which aviod enqueuing
      and dequeuing operation.
      
      As qdisc->empty is not reliable to indicate a empty qdisc because
      there is a time window between enqueuing and setting qdisc->empty.
      So we use the MISSED state added in commit a90c57f2 ("net:
      sched: fix packet stuck problem for lockless qdisc"), which
      indicate there is lock contention, suggesting that it is better
      not to do the qdisc bypass in order to avoid packet out of order
      problem.
      
      In order to make MISSED state reliable to indicate a empty qdisc,
      we need to ensure that testing and clearing of MISSED state is
      within the protection of qdisc->seqlock, only setting MISSED state
      can be done without the protection of qdisc->seqlock. A MISSED
      state testing is added without the protection of qdisc->seqlock to
      aviod doing unnecessary spin_trylock() for contention case.
      
      As the enqueuing is not within the protection of qdisc->seqlock,
      there is still a potential data race as mentioned by Jakub [1]:
      
            thread1               thread2             thread3
      qdisc_run_begin() # true
                              qdisc_run_begin(q)
                                   set(MISSED)
      pfifo_fast_dequeue
        clear(MISSED)
        # recheck the queue
      qdisc_run_end()
                                  enqueue skb1
                                                   qdisc empty # true
                                                qdisc_run_begin() # true
                                                sch_direct_xmit() # skb2
                               qdisc_run_begin()
                                  set(MISSED)
      
      When above happens, skb1 enqueued by thread2 is transmited after
      skb2 is transmited by thread3 because MISSED state setting and
      enqueuing is not under the qdisc->seqlock. If qdisc bypass is
      disabled, skb1 has better chance to be transmited quicker than
      skb2.
      
      This patch does not take care of the above data race, because we
      view this as similar as below:
      Even at the same time CPU1 and CPU2 write the skb to two socket
      which both heading to the same qdisc, there is no guarantee that
      which skb will hit the qdisc first, because there is a lot of
      factor like interrupt/softirq/cache miss/scheduling afffecting
      that.
      
      There are below cases that need special handling:
      1. When MISSED state is cleared before another round of dequeuing
         in pfifo_fast_dequeue(), and __qdisc_run() might not be able to
         dequeue all skb in one round and call __netif_schedule(), which
         might result in a non-empty qdisc without MISSED set. In order
         to avoid this, the MISSED state is set for lockless qdisc and
         __netif_schedule() will be called at the end of qdisc_run_end.
      
      2. The MISSED state also need to be set for lockless qdisc instead
         of calling __netif_schedule() directly when requeuing a skb for
         a similar reason.
      
      3. For netdev queue stopped case, the MISSED case need clearing
         while the netdev queue is stopped, otherwise there may be
         unnecessary __netif_schedule() calling. So a new DRAINING state
         is added to indicate this case, which also indicate a non-empty
         qdisc.
      
      4. As there is already netif_xmit_frozen_or_stopped() checking in
         dequeue_skb() and sch_direct_xmit(), which are both within the
         protection of qdisc->seqlock, but the same checking in
         __dev_xmit_skb() is without the protection, which might cause
         empty indication of a lockless qdisc to be not reliable. So
         remove the checking in __dev_xmit_skb(), and the checking in
         the protection of qdisc->seqlock seems enough to avoid the cpu
         consumption problem for netdev queue stopped case.
      
      1. https://lkml.org/lkml/2021/5/29/215Acked-by: NJakub Kicinski <kuba@kernel.org>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
      Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4fef01b
  4. 23 6月, 2021 13 次提交
    • H
      net/xfrm: Add inner_ipproto into sec_path · fa453523
      Huy Nguyen 提交于
      The inner_ipproto saves the inner IP protocol of the plain
      text packet. This allows vendor's IPsec feature making offload
      decision at skb's features_check and configuring hardware at
      ndo_start_xmit.
      
      For example, ConnectX6-DX IPsec device needs the plaintext's
      IP protocol to support partial checksum offload on
      VXLAN/GENEVE packet over IPsec transport mode tunnel.
      Signed-off-by: NRaed Salem <raeds@nvidia.com>
      Signed-off-by: NHuy Nguyen <huyn@nvidia.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      fa453523
    • P
      mptcp: refine mptcp_cleanup_rbuf · fde56eea
      Paolo Abeni 提交于
      The current cleanup rbuf tries a bit too hard to avoid acquiring
      the subflow socket lock. We may end-up delaying the needed ack,
      or skip acking a blocked subflow.
      
      Address the above extending the conditions used to trigger the cleanup
      to reflect more closely what TCP does and invoking tcp_cleanup_rbuf()
      on all the active subflows.
      
      Note that we can't replicate the exact tests implemented in
      tcp_cleanup_rbuf(), as MPTCP lacks some of the required info - e.g.
      ping-pong mode.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fde56eea
    • G
      mptcp: add deny_join_id0 in mptcp_options_received · df377be3
      Geliang Tang 提交于
      This patch added a new flag named deny_join_id0 in struct
      mptcp_options_received. Set it when MP_CAPABLE with the flag
      MPTCP_CAP_DENYJOIN_ID0 is received.
      
      Also add a new flag remote_deny_join_id0 in struct mptcp_pm_data. When the
      flag deny_join_id0 is set, set this remote_deny_join_id0 flag.
      
      In mptcp_pm_create_subflow_or_signal_addr, if the remote_deny_join_id0 flag
      is set, and the remote address id is zero, stop this connection.
      Suggested-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df377be3
    • G
      mptcp: add allow_join_id0 in mptcp_out_options · bab6b88e
      Geliang Tang 提交于
      This patch defined a new flag MPTCP_CAP_DENY_JOIN_ID0 for the third bit,
      labeled "C" of the MP_CAPABLE option.
      
      Add a new flag allow_join_id0 in struct mptcp_out_options. If this flag is
      set, send out the MP_CAPABLE option with the flag MPTCP_CAP_DENY_JOIN_ID0.
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bab6b88e
    • G
      mptcp: add sysctl allow_join_initial_addr_port · d2f77960
      Geliang Tang 提交于
      This patch added a new sysctl, named allow_join_initial_addr_port, to
      control whether allow peers to send join requests to the IP address and
      port number used by the initial subflow.
      Suggested-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2f77960
    • X
      sctp: process sctp over udp icmp err on sctp side · 9e47df00
      Xin Long 提交于
      Previously, sctp over udp was using udp tunnel's icmp err process, which
      only does sk lookup on sctp side. However for sctp's icmp error process,
      there are more things to do, like syncing assoc pmtu/retransmit packets
      for toobig type err, and starting proto_unreach_timer for unreach type
      err etc.
      
      Now after adding PLPMTUD, which also requires to process toobig type err
      on sctp side. This patch is to process icmp err on sctp side by parsing
      the type/code/info in .encap_err_lookup and call sctp's icmp processing
      functions. Note as the 'redirect' err process needs to know the outer
      ip(v6) header's, we have to leave it to udp(v6)_err to handle it.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e47df00
    • X
      sctp: extract sctp_v4_err_handle function from sctp_v4_err · d8306075
      Xin Long 提交于
      This patch is to extract sctp_v4_err_handle() from sctp_v4_err() to
      only handle the icmp err after the sock lookup, and it also makes
      the code clearer.
      
      sctp_v4_err_handle() will be used in sctp over udp's err handling
      in the following patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8306075
    • X
      sctp: extract sctp_v6_err_handle function from sctp_v6_err · f6549bd3
      Xin Long 提交于
      This patch is to extract sctp_v6_err_handle() from sctp_v6_err() to
      only handle the icmp err after the sock lookup, and it also makes
      the code clearer.
      
      sctp_v6_err_handle() will be used in sctp over udp's err handling
      in the following patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6549bd3
    • X
      sctp: remove the unessessary hold for idev in sctp_v6_err · 237a6a2e
      Xin Long 提交于
      Same as in tcp_v6_err() and __udp6_lib_err(), there's no need to
      hold idev in sctp_v6_err(), so just call __in6_dev_get() instead.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      237a6a2e
    • X
      sctp: enable PLPMTUD when the transport is ready · 7307e4fa
      Xin Long 提交于
      sctp_transport_pl_reset() is called whenever any of these 3 members in
      transport is changed:
      
        - probe_interval
        - param_flags & SPP_PMTUD_ENABLE
        - state == ACTIVE
      
      If all are true, start the PLPMTUD when it's not yet started. If any of
      these is false, stop the PLPMTUD when it's already running.
      
      sctp_transport_pl_update() is called when the transport dst has changed.
      It will restart the PLPMTUD probe. Again, the pathmtu won't change but
      use the dst's mtu until the Search phase is done.
      
      Note that after using PLPMTUD, the pathmtu is only initialized with the
      dst mtu when the transport dst changes. At other time it is updated by
      pl.pmtu. So sctp_transport_pmtu_check() will be called only when PLPMTUD
      is disabled in sctp_packet_config().
      
      After this patch, the PLPMTUD feature from RFC8899 will be activated
      and can be used by users.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7307e4fa
    • X
      sctp: do state transition when receiving an icmp TOOBIG packet · 83696408
      Xin Long 提交于
      PLPMTUD will short-circuit the old process for icmp TOOBIG packets.
      This part is described in rfc8899#section-4.6.2 (PL_PTB_SIZE =
      PTB_SIZE - other_headers_len). Note that from rfc8899#section-5.2
      State Machine, each case below is for some specific states only:
      
        a) PL_PTB_SIZE < MIN_PLPMTU || PL_PTB_SIZE >= PROBED_SIZE,
           discard it, for any state
      
        b) MIN_PLPMTU < PL_PTB_SIZE < BASE_PLPMTU,
           Base -> Error, for Base state
      
        c) BASE_PLPMTU <= PL_PTB_SIZE < PLPMTU,
           Search -> Base or Complete -> Base, for Search and Complete states.
      
        d) PLPMTU < PL_PTB_SIZE < PROBED_SIZE,
           set pl.probe_size to PL_PTB_SIZE then verify it, for Search state.
      
      The most important one is case d), which will help find the optimal
      fast during searching. Like when pathmtu = 1392 for SCTP over IPv4,
      the search will be (20 is iphdr_len):
      
        1. probe with 1200 - 20
        2. probe with 1232 - 20
        3. probe with 1264 - 20
        ...
        7. probe with 1388 - 20
        8. probe with 1420 - 20
      
      When sending the probe with 1420 - 20, TOOBIG may come with PL_PTB_SIZE =
      1392 - 20. Then it matches case d), and saves some rounds to try with the
      1392 - 20 probe. But of course, PLPMTUD doesn't trust TOOBIG packets, and
      it will go back to the common searching once the probe with the new size
      can't be verified.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83696408
    • X
      sctp: do state transition when a probe succeeds on HB ACK recv path · b87641af
      Xin Long 提交于
      As described in rfc8899#section-5.2, when a probe succeeds, there might
      be the following state transitions:
      
        - Base -> Search, occurs when probe succeeds with BASE_PLPMTU,
          pl.pmtu is not changing,
          pl.probe_size increases by SCTP_PL_BIG_STEP,
      
        - Error -> Search, occurs when probe succeeds with BASE_PLPMTU,
          pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU,
          pl.probe_size increases by SCTP_PL_BIG_STEP.
      
        - Search -> Search Complete, occurs when probe succeeds with the probe
          size SCTP_MAX_PLPMTU less than pl.probe_high,
          pl.pmtu is not changing, but update *pathmtu* with it,
          pl.probe_size is set back to pl.pmtu to double check it.
      
        - Search Complete -> Search, occurs when probe succeeds with the probe
          size equal to pl.pmtu,
          pl.pmtu is not changing,
          pl.probe_size increases by SCTP_PL_MIN_STEP.
      
      So search process can be described as:
      
       1. When it just enters 'Search' state, *pathmtu* is not updated with
          pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP)
          each round.
      
       2. Until pl.probe_high is set when a probe fails, and probe_size
          decreases back to pl.pmtu, as described in the last patch.
      
       3. When the probe with the new size succeeds, probe_size changes to
          increase by a small step (SCTP_PL_MIN_STEP) due to pl.probe_high
          is set.
      
       4. Until probe_size is next to pl.probe_high, the searching finishes and
          it goes to 'Complete' state and updates *pathmtu* with pl.pmtu, and
          then probe_size is set to pl.pmtu to confirm by once more probe.
      
       5. This probe occurs after "30 * probe_inteval", a much longer time than
          that in Search state. Once it is done it goes to 'Search' state again
          with probe_size increased by SCTP_PL_MIN_STEP.
      
      As we can see above, during the searching, pl.pmtu changes while *pathmtu*
      doesn't. *pathmtu* is only updated when the search finishes by which it
      gets an optimal value for it. A big step is used at the beginning until
      it gets close to the optimal value, then it changes to a small step until
      it has this optimal value.
      
      The small step is also used in 'Complete' until it goes to 'Search' state
      again and the probe with 'pmtu + the small step' succeeds, which means a
      higher size could be used. Then probe_size changes to increase by a big
      step again until it gets close to the next optimal value.
      
      Note that anytime when black hole is detected, it goes directly to 'Base'
      state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b87641af
    • X
      sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path · 1dc68c19
      Xin Long 提交于
      The state transition is described in rfc8899#section-5.2,
      PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the
      state transition includes:
      
        - Base -> Error, occurs when BASE_PLPMTU Confirmation Fails,
          pl.pmtu is set to SCTP_MIN_PLPMTU,
          probe_size is still SCTP_BASE_PLPMTU;
      
        - Search -> Base, occurs when Black Hole Detected,
          pl.pmtu is set to SCTP_BASE_PLPMTU,
          probe_size is set back to SCTP_BASE_PLPMTU;
      
        - Search Complete -> Base, occurs when Black Hole Detected
          pl.pmtu is set to SCTP_BASE_PLPMTU,
          probe_size is set back to SCTP_BASE_PLPMTU;
      
      Note a black hole is encountered when a sender is unaware that packets
      are not being delivered to the destination endpoint. So it includes the
      probe failures with equal probe_size to pl.pmtu, and definitely not
      include that with greater probe_size than pl.pmtu. The later one is the
      normal probe failure where probe_size should decrease back to pl.pmtu
      and pl.probe_high is set.  pl.probe_high would be used on HB ACK recv
      path in the next patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dc68c19