1. 15 1月, 2020 2 次提交
  2. 02 1月, 2020 1 次提交
    • M
      mac80211: mesh: restrict airtime metric to peered established plinks · 02a61449
      Markus Theil 提交于
      The following warning is triggered every time an unestablished mesh peer
      gets dumped. Checks if a peer link is established before retrieving the
      airtime link metric.
      
      [ 9563.022567] WARNING: CPU: 0 PID: 6287 at net/mac80211/mesh_hwmp.c:345
                     airtime_link_metric_get+0xa2/0xb0 [mac80211]
      [ 9563.022697] Hardware name: PC Engines apu2/apu2, BIOS v4.10.0.3
      [ 9563.022756] RIP: 0010:airtime_link_metric_get+0xa2/0xb0 [mac80211]
      [ 9563.022838] Call Trace:
      [ 9563.022897]  sta_set_sinfo+0x936/0xa10 [mac80211]
      [ 9563.022964]  ieee80211_dump_station+0x6d/0x90 [mac80211]
      [ 9563.023062]  nl80211_dump_station+0x154/0x2a0 [cfg80211]
      [ 9563.023120]  netlink_dump+0x17b/0x370
      [ 9563.023130]  netlink_recvmsg+0x2a4/0x480
      [ 9563.023140]  ____sys_recvmsg+0xa6/0x160
      [ 9563.023154]  ___sys_recvmsg+0x93/0xe0
      [ 9563.023169]  __sys_recvmsg+0x7e/0xd0
      [ 9563.023210]  do_syscall_64+0x4e/0x140
      [ 9563.023217]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de>
      Link: https://lore.kernel.org/r/20191203180644.70653-1-markus.theil@tu-ilmenau.de
      [rewrite commit message]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      02a61449
  3. 31 12月, 2019 3 次提交
    • T
      hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() · 04b69426
      Taehee Yoo 提交于
      hsr slave interfaces don't have debugfs directory.
      So, hsr_debugfs_rename() shouldn't be called when hsr slave interface name
      is changed.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link set dummy0 name ap
      
      Splat looks like:
      [21071.899367][T22666] ap: renamed from dummy0
      [21071.914005][T22666] ==================================================================
      [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
      [21071.926941][T22666]
      [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
      [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [21071.935094][T22666] Call Trace:
      [21071.935867][T22666]  dump_stack+0x96/0xdb
      [21071.936687][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.937774][T22666]  print_address_description.constprop.5+0x1be/0x360
      [21071.939019][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940081][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940949][T22666]  __kasan_report+0x12a/0x16f
      [21071.941758][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.942674][T22666]  kasan_report+0xe/0x20
      [21071.943325][T22666]  hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.944187][T22666]  hsr_netdev_notify+0x1fe/0x9b0 [hsr]
      [21071.945052][T22666]  ? __module_text_address+0x13/0x140
      [21071.945897][T22666]  notifier_call_chain+0x90/0x160
      [21071.946743][T22666]  dev_change_name+0x419/0x840
      [21071.947496][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.948600][T22666]  ? netdev_adjacent_rename_links+0x280/0x280
      [21071.949577][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.950672][T22666]  ? lock_downgrade+0x6e0/0x6e0
      [21071.951345][T22666]  ? do_setlink+0x811/0x2ef0
      [21071.951991][T22666]  do_setlink+0x811/0x2ef0
      [21071.952613][T22666]  ? is_bpf_text_address+0x81/0xe0
      [ ... ]
      
      Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com
      Fixes: 4c2d5e33 ("hsr: rename debugfs file when interface name is changed")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b69426
    • D
      net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti 提交于
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantic of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic in a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      if no IDRs have been allocated.
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: NVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NVlad Buslov <vladbu@mellanox.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5b72a08
    • C
      tcp: Fix highest_sack and highest_sack_seq · 85369750
      Cambda Zhu 提交于
      >From commit 50895b9d ("tcp: highest_sack fix"), the logic about
      setting tp->highest_sack to the head of the send queue was removed.
      Of course the logic is error prone, but it is logical. Before we
      remove the pointer to the highest sack skb and use the seq instead,
      we need to set tp->highest_sack to NULL when there is no skb after
      the last sack, and then replace NULL with the real skb when new skb
      inserted into the rtx queue, because the NULL means the highest sack
      seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
      the next ACK with sack option will increase tp->reordering unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if
      it's NULL and new data is sent. The patch keeps the rule that the
      highest_sack can only be maintained by sack processing, except for
      this only case.
      
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Signed-off-by: NCambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85369750
  4. 28 12月, 2019 1 次提交
    • S
      net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device · 70cf3dc7
      Shmulik Ladkani 提交于
      There's no skb_pull performed when a mirred action is set at egress of a
      mac device, with a target device/action that expects skb->data to point
      at the network header.
      
      As a result, either the target device is errornously given an skb with
      data pointing to the mac (egress case), or the net stack receives the
      skb with data pointing to the mac (ingress case).
      
      E.g:
       # tc qdisc add dev eth9 root handle 1: prio
       # tc filter add dev eth9 parent 1: prio 9 protocol ip handle 9 basic \
         action mirred egress redirect dev tun0
      
       (tun0 is a tun device. result: tun0 errornously gets the eth header
        instead of the iph)
      
      Revise the push/pull logic of tcf_mirred_act() to not rely on the
      skb_at_tc_ingress() vs tcf_mirred_act_wants_ingress() comparison, as it
      does not cover all "pull" cases.
      
      Instead, calculate whether the required action on the target device
      requires the data to point at the network header, and compare this to
      whether skb->data points to network header - and make the push/pull
      adjustments as necessary.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NShmulik Ladkani <sladkani@proofpoint.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70cf3dc7
  5. 27 12月, 2019 1 次提交
  6. 26 12月, 2019 6 次提交
    • T
      hsr: reset network header when supervision frame is created · 3ed0a1d5
      Taehee Yoo 提交于
      The supervision frame is L2 frame.
      When supervision frame is created, hsr module doesn't set network header.
      If tap routine is enabled, dev_queue_xmit_nit() is called and it checks
      network_header. If network_header pointer wasn't set(or invalid),
      it resets network_header and warns.
      In order to avoid unnecessary warning message, resetting network_header
      is needed.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          tcpdump -nei veth0
      
      Splat looks like:
      [  175.852292][    C3] protocol 88fb is buggy, dev veth0
      
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ed0a1d5
    • T
      hsr: fix a race condition in node list insertion and deletion · 92a35678
      Taehee Yoo 提交于
      hsr nodes are protected by RCU and there is no write side lock.
      But node insertions and deletions could be being operated concurrently.
      So write side locking is needed.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
      
          for i in {0..9}
          do
              for j in {0..9}
      	do
      	    for k in {0..9}
      	    do
      	        for l in {0..9}
      		do
      	        arping 192.168.100.2 -I hsr0 -s 00:01:3$i:4$j:5$k:6$l -c1 &
      		done
      	    done
      	done
          done
      
      Splat looks like:
      [  236.066091][ T3286] list_add corruption. next->prev should be prev (ffff8880a5940300), but was ffff8880a5940d0.
      [  236.069617][ T3286] ------------[ cut here ]------------
      [  236.070545][ T3286] kernel BUG at lib/list_debug.c:25!
      [  236.071391][ T3286] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  236.072343][ T3286] CPU: 0 PID: 3286 Comm: arping Tainted: G        W         5.5.0-rc1+ #209
      [  236.073463][ T3286] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  236.074695][ T3286] RIP: 0010:__list_add_valid+0x74/0xd0
      [  236.075499][ T3286] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 b
      [  236.078277][ T3286] RSP: 0018:ffff8880aaa97648 EFLAGS: 00010286
      [  236.086991][ T3286] RAX: 0000000000000075 RBX: ffff8880d4624c20 RCX: 0000000000000000
      [  236.088000][ T3286] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed1015552ebf
      [  236.098897][ T3286] RBP: ffff88809b53d200 R08: ffffed101b3c04f9 R09: ffffed101b3c04f9
      [  236.099960][ T3286] R10: 00000000308769a1 R11: ffffed101b3c04f8 R12: ffff8880d4624c28
      [  236.100974][ T3286] R13: ffff8880d4624c20 R14: 0000000040310100 R15: ffff8880ce17ee02
      [  236.138967][ T3286] FS:  00007f23479fa680(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
      [  236.144852][ T3286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  236.145720][ T3286] CR2: 00007f4a14bab210 CR3: 00000000a61c6001 CR4: 00000000000606f0
      [  236.146776][ T3286] Call Trace:
      [  236.147222][ T3286]  hsr_add_node+0x314/0x490 [hsr]
      [  236.153633][ T3286]  hsr_forward_skb+0x2b6/0x1bc0 [hsr]
      [  236.154362][ T3286]  ? rcu_read_lock_sched_held+0x90/0xc0
      [  236.155091][ T3286]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  236.156607][ T3286]  hsr_dev_xmit+0x70/0xd0 [hsr]
      [  236.157254][ T3286]  dev_hard_start_xmit+0x160/0x740
      [  236.157941][ T3286]  __dev_queue_xmit+0x1961/0x2e10
      [  236.158565][ T3286]  ? netdev_core_pick_tx+0x2e0/0x2e0
      [ ... ]
      
      Reported-by: syzbot+3924327f9ad5f4d2b343@syzkaller.appspotmail.com
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92a35678
    • T
      hsr: rename debugfs file when interface name is changed · 4c2d5e33
      Taehee Yoo 提交于
      hsr interface has own debugfs file, which name is same with interface name.
      So, interface name is changed, debugfs file name should be changed too.
      
      Fixes: fc4ecaee ("net: hsr: add debugfs support for display node list")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2d5e33
    • T
      hsr: add hsr root debugfs directory · c6c4ccd7
      Taehee Yoo 提交于
      In current hsr code, when hsr interface is created, it creates debugfs
      directory /sys/kernel/debug/<interface name>.
      If there is same directory or file name in there, it fails.
      In order to reduce possibility of failure of creation of debugfs,
      this patch adds root directory.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
      
      Before this patch:
          /sys/kernel/debug/hsr0/node_table
      
      After this patch:
          /sys/kernel/debug/hsr/hsr0/node_table
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6c4ccd7
    • T
      hsr: fix error handling routine in hsr_dev_finalize() · 1d19e2d5
      Taehee Yoo 提交于
      hsr_dev_finalize() is called to create new hsr interface.
      There are some wrong error handling codes.
      
      1. wrong checking return value of debugfs_create_{dir/file}.
      These function doesn't return NULL. If error occurs in there,
      it returns error pointer.
      So, it should check error pointer instead of NULL.
      
      2. It doesn't unregister interface if it fails to setup hsr interface.
      If it fails to initialize hsr interface after register_netdevice(),
      it should call unregister_netdevice().
      
      3. Ignore failure of creation of debugfs
      If creating of debugfs dir and file is failed, creating hsr interface
      will be failed. But debugfs doesn't affect actual logic of hsr module.
      So, ignoring this is more correct and this behavior is more general.
      
      Fixes: c5a75911 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d19e2d5
    • T
      hsr: avoid debugfs warning message when module is remove · 84bb59d7
      Taehee Yoo 提交于
      When hsr module is being removed, debugfs_remove() is called to remove
      both debugfs directory and file.
      
      When module is being removed, module state is changed to
      MODULE_STATE_GOING then exit() is called.
      At this moment, module couldn't be held so try_module_get()
      will be failed.
      
      debugfs's open() callback tries to hold the module if .owner is existing.
      If it fails, warning message is printed.
      
      CPU0				CPU1
      delete_module()
          try_stop_module()
          hsr_exit()			open() <-- WARNING
              debugfs_remove()
      
      In order to avoid the warning message, this patch makes hsr module does
      not set .owner. Unsetting .owner is safe because these are protected by
      inode_lock().
      
      Test commands:
          #SHELL1
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          while :
          do
              ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
      	modprobe -rv hsr
          done
      
          #SHELL2
          while :
          do
              cat /sys/kernel/debug/hsr0/node_table
          done
      
      Splat looks like:
      [  101.223783][ T1271] ------------[ cut here ]------------
      [  101.230309][ T1271] debugfs file owner did not clean up at exit: node_table
      [  101.230380][ T1271] WARNING: CPU: 3 PID: 1271 at fs/debugfs/file.c:309 full_proxy_open+0x10f/0x650
      [  101.233153][ T1271] Modules linked in: hsr(-) dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_d]
      [  101.237112][ T1271] CPU: 3 PID: 1271 Comm: cat Tainted: G        W         5.5.0-rc1+ #204
      [  101.238270][ T1271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  101.240379][ T1271] RIP: 0010:full_proxy_open+0x10f/0x650
      [  101.241166][ T1271] Code: 48 c1 ea 03 80 3c 02 00 0f 85 c1 04 00 00 49 8b 3c 24 e8 04 86 7e ff 84 c0 75 2d 4c 8
      [  101.251985][ T1271] RSP: 0018:ffff8880ca22fa38 EFLAGS: 00010286
      [  101.273355][ T1271] RAX: dffffc0000000008 RBX: ffff8880cc6e6200 RCX: 0000000000000000
      [  101.274466][ T1271] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8880c4dd5c14
      [  101.275581][ T1271] RBP: 0000000000000000 R08: fffffbfff2922f5d R09: 0000000000000000
      [  101.276733][ T1271] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffc0551bc0
      [  101.277853][ T1271] R13: ffff8880c4059a48 R14: ffff8880be50a5e0 R15: ffffffff941adaa0
      [  101.278956][ T1271] FS:  00007f8871cda540(0000) GS:ffff8880da800000(0000) knlGS:0000000000000000
      [  101.280216][ T1271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  101.282832][ T1271] CR2: 00007f88717cfd10 CR3: 00000000b9440005 CR4: 00000000000606e0
      [  101.283974][ T1271] Call Trace:
      [  101.285328][ T1271]  do_dentry_open+0x63c/0xf50
      [  101.286077][ T1271]  ? open_proxy_open+0x270/0x270
      [  101.288271][ T1271]  ? __x64_sys_fchdir+0x180/0x180
      [  101.288987][ T1271]  ? inode_permission+0x65/0x390
      [  101.289682][ T1271]  path_openat+0x701/0x2810
      [  101.290294][ T1271]  ? path_lookupat+0x880/0x880
      [  101.290957][ T1271]  ? check_chain_key+0x236/0x5d0
      [  101.291676][ T1271]  ? __lock_acquire+0xdfe/0x3de0
      [  101.292358][ T1271]  ? sched_clock+0x5/0x10
      [  101.292962][ T1271]  ? sched_clock_cpu+0x18/0x170
      [  101.293644][ T1271]  ? find_held_lock+0x39/0x1d0
      [  101.305616][ T1271]  do_filp_open+0x17a/0x270
      [  101.306061][ T1271]  ? may_open_dev+0xc0/0xc0
      [ ... ]
      
      Fixes: fc4ecaee ("net: hsr: add debugfs support for display node list")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84bb59d7
  7. 25 12月, 2019 7 次提交
    • H
      sit: do not confirm neighbor when do pmtu update · 4d42df46
      Hangbin Liu 提交于
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d42df46
    • H
      vti: do not confirm neighbor when do pmtu update · 8247a79e
      Hangbin Liu 提交于
      When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      Although vti and vti6 are immune to this problem because they are IFF_NOARP
      interfaces, as Guillaume pointed. There is still no sense to confirm neighbour
      here.
      
      v5: Update commit description.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8247a79e
    • H
      tunnel: do not confirm neighbor when do pmtu update · 7a1592bc
      Hangbin Liu 提交于
      When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No Change.
      v4: Update commit description
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Fixes: 0dec879f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Tested-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a1592bc
    • H
      ip6_gre: do not confirm neighbor when do pmtu update · 675d76ad
      Hangbin Liu 提交于
      When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
      This will cause the neigh cache be refreshed and set to REACHABLE before
      xmit.
      
      But if the remote mac address changed, e.g. device is deleted and recreated,
      we will not able to notice this and still use the old mac address as the neigh
      cache is REACHABLE.
      
      Fix this by disable neigh confirm when do pmtu update
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      675d76ad
    • H
      net: add bool confirm_neigh parameter for dst_ops.update_pmtu · bd085ef6
      Hangbin Liu 提交于
      The MTU update code is supposed to be invoked in response to real
      networking events that update the PMTU. In IPv6 PMTU update function
      __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
      confirmed time.
      
      But for tunnel code, it will call pmtu before xmit, like:
        - tnl_update_pmtu()
          - skb_dst_update_pmtu()
            - ip6_rt_update_pmtu()
              - __ip6_rt_update_pmtu()
                - dst_confirm_neigh()
      
      If the tunnel remote dst mac address changed and we still do the neigh
      confirm, we will not be able to update neigh cache and ping6 remote
      will failed.
      
      So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
      should not be invoking dst_confirm_neigh() as we have no evidence
      of successful two-way communication at this point.
      
      On the other hand it is also important to keep the neigh reachability fresh
      for TCP flows, so we cannot remove this dst_confirm_neigh() call.
      
      To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
      to choose whether we should do neigh update or not. I will add the parameter
      in this patch and set all the callers to true to comply with the previous
      way, and fix the tunnel code one by one on later patches.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Reviewed-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd085ef6
    • M
      sctp: fix err handling of stream initialization · 61d5d406
      Marcelo Ricardo Leitner 提交于
      The fix on 951c6db9 fixed the issued reported there but introduced
      another. When the allocation fails within sctp_stream_init() it is
      okay/necessary to free the genradix. But it is also called when adding
      new streams, from sctp_send_add_streams() and
      sctp_process_strreset_addstrm_in() and in those situations it cannot
      just free the genradix because by then it is a fully operational
      association.
      
      The fix here then is to only free the genradix in sctp_stream_init()
      and on those other call sites  move on with what it already had and let
      the subsequent error handling to handle it.
      
      Tested with the reproducers from this report and the previous one,
      with lksctp-tools and sctp-tests.
      
      Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com
      Fixes: 951c6db9 ("sctp: fix memleak on err handling of stream initialization")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61d5d406
    • A
      udp: fix integer overflow while computing available space in sk_rcvbuf · feed8a4f
      Antonio Messina 提交于
      When the size of the receive buffer for a socket is close to 2^31 when
      computing if we have enough space in the buffer to copy a packet from
      the queue to the buffer we might hit an integer overflow.
      
      When an user set net.core.rmem_default to a value close to 2^31 UDP
      packets are dropped because of this overflow. This can be visible, for
      instance, with failure to resolve hostnames.
      
      This can be fixed by casting sk_rcvbuf (which is an int) to unsigned
      int, similarly to how it is done in TCP.
      Signed-off-by: NAntonio Messina <amessina@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feed8a4f
  8. 21 12月, 2019 6 次提交
  9. 20 12月, 2019 5 次提交
    • D
      net/sched: cls_u32: fix refcount leak in the error path of u32_change() · 275c44aa
      Davide Caratti 提交于
      when users replace cls_u32 filters with new ones having wrong parameters,
      so that u32_change() fails to validate them, the kernel doesn't roll-back
      correctly, and leaves semi-configured rules.
      
      Fix this in u32_walk(), avoiding a call to the walker function on filters
      that don't have a match rule connected. The side effect is, these "empty"
      filters are not even dumped when present; but that shouldn't be a problem
      as long as we are restoring the original behaviour, where semi-configured
      filters were not even added in the error path of u32_change().
      
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      275c44aa
    • P
      netfilter: nft_tproxy: Fix port selector on Big Endian · 8cb4ec44
      Phil Sutter 提交于
      On Big Endian architectures, u16 port value was extracted from the wrong
      parts of u32 sreg_port, just like commit 10596608 ("netfilter:
      nf_tables: fix mismatch in big-endian system") describes.
      
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NMáté Eckl <ecklm94@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8cb4ec44
    • F
      netfilter: ebtables: compat: reject all padding in matches/watchers · e608f631
      Florian Westphal 提交于
      syzbot reported following splat:
      
      BUG: KASAN: vmalloc-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
      BUG: KASAN: vmalloc-out-of-bounds in compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155
      Read of size 4 at addr ffffc900004461f4 by task syz-executor267/7937
      
      CPU: 1 PID: 7937 Comm: syz-executor267 Not tainted 5.5.0-rc1-syzkaller #0
       size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
       compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155
       compat_do_replace+0x344/0x720 net/bridge/netfilter/ebtables.c:2249
       compat_do_ebt_set_ctl+0x22f/0x27e net/bridge/netfilter/ebtables.c:2333
       [..]
      
      Because padding isn't considered during computation of ->buf_user_offset,
      "total" is decremented by fewer bytes than it should.
      
      Therefore, the first part of
      
      if (*total < sizeof(*entry) || entry->next_offset < sizeof(*entry))
      
      will pass, -- it should not have.  This causes oob access:
      entry->next_offset is past the vmalloced size.
      
      Reject padding and check that computed user offset (sum of ebt_entry
      structure plus all individual matches/watchers/targets) is same
      value that userspace gave us as the offset of the next entry.
      
      Reported-by: syzbot+f68108fed972453a0ad4@syzkaller.appspotmail.com
      Fixes: 81e675c2 ("netfilter: ebtables: add CONFIG_COMPAT support")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e608f631
    • A
      netfilter: nf_flow_table: fix big-endian integer overflow · c9b3b820
      Arnd Bergmann 提交于
      In some configurations, gcc reports an integer overflow:
      
      net/netfilter/nf_flow_table_offload.c: In function 'nf_flow_rule_match':
      net/netfilter/nf_flow_table_offload.c:80:21: error: unsigned conversion from 'int' to '__be16' {aka 'short unsigned int'} changes value from '327680' to '0' [-Werror=overflow]
         mask->tcp.flags = TCP_FLAG_RST | TCP_FLAG_FIN;
                           ^~~~~~~~~~~~
      
      From what I can tell, we want the upper 16 bits of these constants,
      so they need to be shifted in cpu-endian mode.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c9b3b820
    • A
      net, sysctl: Fix compiler warning when only cBPF is present · 1148f9ad
      Alexander Lobakin 提交于
      proc_dointvec_minmax_bpf_restricted() has been firstly introduced
      in commit 2e4a3098 ("bpf: restrict access to core bpf sysctls")
      under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in
      ede95a63 ("bpf: add bpf_jit_limit knob to restrict unpriv
      allocations"), because a new sysctl, bpf_jit_limit, made use of it.
      Finally, this parameter has become long instead of integer with
      fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
      and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
      added.
      
      With this last change, we got back to that
      proc_dointvec_minmax_bpf_restricted() is used only under
      CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
      brought back.
      
      So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
      since v4.20 we have:
      
        CC      net/core/sysctl_net_core.o
      net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
        292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
            | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
      
      Fixes: fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
      Signed-off-by: NAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
      1148f9ad
  10. 19 12月, 2019 2 次提交
    • M
      xsk: Add rcu_read_lock around the XSK wakeup · 06870682
      Maxim Mikityanskiy 提交于
      The XSK wakeup callback in drivers makes some sanity checks before
      triggering NAPI. However, some configuration changes may occur during
      this function that affect the result of those checks. For example, the
      interface can go down, and all the resources will be destroyed after the
      checks in the wakeup function, but before it attempts to use these
      resources. Wrap this callback in rcu_read_lock to allow driver to
      synchronize_rcu before actually destroying the resources.
      
      xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
      wrapped into the RCU lock. After this commit, xsk_poll starts using
      xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
      ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
      need_wakeup feature: a non-zero-copy socket may be used with a driver
      supporting zero-copy, and in this case ndo_xsk_wakeup should not be
      called, so the xs->zc check is the correct one.
      
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
      06870682
    • J
      net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive() · b7ac8936
      Jia-Ju Bai 提交于
      The kernel may sleep while holding a spinlock.
      The function call path (from bottom to top) in Linux 4.19 is:
      
      net/nfc/nci/uart.c, 349:
      	nci_skb_alloc in nci_uart_default_recv_buf
      net/nfc/nci/uart.c, 255:
      	(FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
      net/nfc/nci/uart.c, 254:
      	spin_lock in nci_uart_tty_receive
      
      nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
      (FUNC_PTR) means a function pointer is called.
      
      To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
      nci_skb_alloc().
      
      This bug is found by a static analysis tool STCheck written by myself.
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7ac8936
  11. 18 12月, 2019 4 次提交
  12. 17 12月, 2019 2 次提交
    • S
      vsock/virtio: add WARN_ON check on virtio_transport_get_ops() · 4aaf5961
      Stefano Garzarella 提交于
      virtio_transport_get_ops() and virtio_transport_send_pkt_info()
      can only be used on connecting/connected sockets, since a socket
      assigned to a transport is required.
      
      This patch adds a WARN_ON() on virtio_transport_get_ops() to check
      this requirement, a comment and a returned error on
      virtio_transport_send_pkt_info(),
      Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4aaf5961
    • S
      vsock/virtio: fix null-pointer dereference in virtio_transport_recv_listen() · df18fa14
      Stefano Garzarella 提交于
      With multi-transport support, listener sockets are not bound to any
      transport. So, calling virtio_transport_reset(), when an error
      occurs, on a listener socket produces the following null-pointer
      dereference:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000e8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty #56
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
        Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
        RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
        Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff <48> 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
        RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
        RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
        RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
        R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
        R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
        FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
         virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
         ? detach_buf_split+0xf1/0x130
         virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
         process_one_work+0x1c0/0x300
         worker_thread+0x45/0x3c0
         kthread+0xfc/0x130
         ? current_work+0x40/0x40
         ? kthread_park+0x90/0x90
         ret_from_fork+0x35/0x40
        Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
        CR2: 00000000000000e8
        ---[ end trace e75400e2ea2fa824 ]---
      
      This happens because virtio_transport_reset() calls
      virtio_transport_send_pkt_info() that can be used only on
      connecting/connected sockets.
      
      This patch fixes the issue, using virtio_transport_reset_no_sock()
      instead of virtio_transport_reset() when we are handling a listener
      socket.
      
      Fixes: c0cfa2d8 ("vsock: add multi-transports support")
      Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df18fa14