1. 01 12月, 2019 1 次提交
    • D
      net: rtnetlink: prevent underflows in do_setvfinfo() · 9f6de5cf
      Dan Carpenter 提交于
      [ Upstream commit d658c8f56ec7b3de8051a24afb25da9ba3c388c5 ]
      
      The "ivm->vf" variable is a u32, but the problem is that a number of
      drivers cast it to an int and then forget to check for negatives.  An
      example of this is in the cxgb4 driver.
      
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
        2890  static int cxgb4_mgmt_get_vf_config(struct net_device *dev,
        2891                                      int vf, struct ifla_vf_info *ivi)
                                                  ^^^^^^
        2892  {
        2893          struct port_info *pi = netdev_priv(dev);
        2894          struct adapter *adap = pi->adapter;
        2895          struct vf_info *vfinfo;
        2896
        2897          if (vf >= adap->num_vfs)
                          ^^^^^^^^^^^^^^^^^^^
        2898                  return -EINVAL;
        2899          vfinfo = &adap->vfinfo[vf];
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      There are 48 functions affected.
      
      drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:8435 hclge_set_vf_vlan_filter() warn: can 'vfid' underflow 's32min-2147483646'
      drivers/net/ethernet/freescale/enetc/enetc_pf.c:377 enetc_pf_set_vf_mac() warn: can 'vf' underflow 's32min-2147483646'
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:2899 cxgb4_mgmt_get_vf_config() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:2960 cxgb4_mgmt_set_vf_rate() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3019 cxgb4_mgmt_set_vf_rate() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3038 cxgb4_mgmt_set_vf_vlan() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:3086 cxgb4_mgmt_set_vf_link_state() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/chelsio/cxgb/cxgb2.c:791 get_eeprom() warn: can 'i' underflow 's32min-(-4),0,4-s32max'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:82 bnxt_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:164 bnxt_set_vf_trust() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:186 bnxt_get_vf_config() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:228 bnxt_set_vf_mac() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:264 bnxt_set_vf_vlan() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:293 bnxt_set_vf_bw() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c:333 bnxt_set_vf_link_state() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:2595 bnx2x_vf_op_prep() warn: can 'vfidx' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:2595 bnx2x_vf_op_prep() warn: can 'vfidx' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2281 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2285 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2286 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2292 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63'
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c:2297 bnx2x_post_vf_bulletin() warn: can 'vf' underflow 's32min-63'
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1832 qlcnic_sriov_set_vf_mac() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1864 qlcnic_sriov_set_vf_tx_rate() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1937 qlcnic_sriov_set_vf_vlan() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:2005 qlcnic_sriov_get_vf_config() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:2036 qlcnic_sriov_set_vf_spoofchk() warn: can 'vf' underflow 's32min-254'
      drivers/net/ethernet/emulex/benet/be_main.c:1914 be_get_vf_config() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/emulex/benet/be_main.c:1915 be_get_vf_config() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/emulex/benet/be_main.c:1922 be_set_vf_tvt() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/emulex/benet/be_main.c:1951 be_clear_vf_tvt() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/emulex/benet/be_main.c:2063 be_set_vf_tx_rate() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/emulex/benet/be_main.c:2091 be_set_vf_link_state() warn: can 'vf' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:2609 ice_set_vf_port_vlan() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3050 ice_get_vf_cfg() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3103 ice_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3181 ice_set_vf_mac() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3237 ice_set_vf_trust() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c:3286 ice_set_vf_link_state() warn: can 'vf_id' underflow 's32min-65534'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:3919 i40e_validate_vf() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:3957 i40e_ndo_set_vf_mac() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4104 i40e_ndo_set_vf_port_vlan() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4263 i40e_ndo_set_vf_bw() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4309 i40e_ndo_get_vf_config() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4371 i40e_ndo_set_vf_link_state() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4441 i40e_ndo_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4441 i40e_ndo_set_vf_spoofchk() warn: can 'vf_id' underflow 's32min-2147483646'
      drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4504 i40e_ndo_set_vf_trust() warn: can 'vf_id' underflow 's32min-2147483646'
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f6de5cf
  2. 21 11月, 2019 1 次提交
  3. 10 11月, 2019 1 次提交
    • G
      netns: fix GFP flags in rtnl_net_notifyid() · 40400fdd
      Guillaume Nault 提交于
      [ Upstream commit d4e4fdf9e4a27c87edb79b1478955075be141f67 ]
      
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The later also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40400fdd
  4. 26 5月, 2019 1 次提交
    • S
      rtnetlink: always put IFLA_LINK for links with a link-netnsid · 2636da60
      Sabrina Dubroca 提交于
      [ Upstream commit feadc4b6cf42a53a8a93c918a569a0b7e62bd350 ]
      
      Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when
      iflink == ifindex.
      
      In some cases, a device can be created in a different netns with the
      same ifindex as its parent. That device will not dump its IFLA_LINK
      attribute, which can confuse some userspace software that expects it.
      For example, if the last ifindex created in init_net and foo are both
      8, these commands will trigger the issue:
      
          ip link add parent type dummy                   # ifindex 9
          ip link add link parent netns foo type macvlan  # ifindex 9 in ns foo
      
      So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump,
      always put the IFLA_LINK attribute as well.
      
      Thanks to Dan Winship for analyzing the original OpenShift bug down to
      the missing netlink attribute.
      
      v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said
          add Nicolas' ack
      v3: change Fixes tag
          fix subject typo, spotted by Edward Cree
      Analyzed-by: NDan Winship <danw@redhat.com>
      Fixes: d8a5ec67 ("[NET]: netlink support for moving devices between network namespaces.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2636da60
  5. 17 12月, 2018 1 次提交
    • E
      rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices · a482f800
      Eric Dumazet 提交于
      [ Upstream commit 688838934c231bb08f46db687e57f6d8bf82709c ]
      
      kmsan was able to trigger a kernel-infoleak using a gre device [1]
      
      nlmsg_populate_fdb_fill() has a hard coded assumption
      that dev->addr_len is ETH_ALEN, as normally guaranteed
      for ARPHRD_ETHER devices.
      
      A similar issue was fixed recently in commit da71577545a5
      ("rtnetlink: Disallow FDB configuration for non-Ethernet device")
      
      [1]
      BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline]
      BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
      CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x32d/0x480 lib/dump_stack.c:113
       kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
       kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
       kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
       copyout lib/iov_iter.c:143 [inline]
       _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
       copy_to_iter include/linux/uio.h:143 [inline]
       skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431
       skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline]
       netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975
       sock_recvmsg_nosec net/socket.c:794 [inline]
       sock_recvmsg+0x1d1/0x230 net/socket.c:801
       ___sys_recvmsg+0x444/0xae0 net/socket.c:2278
       __sys_recvmsg net/socket.c:2327 [inline]
       __do_sys_recvmsg net/socket.c:2337 [inline]
       __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
       __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
       do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x441119
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119
      RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003
      RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100
      R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080
      R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:261 [inline]
       kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469
       kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344
       kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362
       __msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162
       __nla_put lib/nlattr.c:744 [inline]
       nla_put+0x20a/0x2d0 lib/nlattr.c:802
       nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466
       nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline]
       ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807
       rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979
       netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244
       __netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352
       netlink_dump_start include/linux/netlink.h:216 [inline]
       rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910
       netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
       __sys_sendmsg net/socket.c:2154 [inline]
       __do_sys_sendmsg net/socket.c:2163 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
       do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
       kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170
       kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186
       __kmalloc+0x14c/0x4d0 mm/slub.c:3825
       kmalloc include/linux/slab.h:551 [inline]
       __hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline]
       __hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline]
       __dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670
       dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
       ip_mc_filter_add net/ipv4/igmp.c:1128 [inline]
       igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311
       __ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444
       ip_mc_inc_group net/ipv4/igmp.c:1453 [inline]
       ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775
       inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522
       notifier_call_chain kernel/notifier.c:93 [inline]
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401
       __dev_notify_flags+0x3da/0x860 net/core/dev.c:1733
       dev_change_flags+0x1ac/0x230 net/core/dev.c:7569
       do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492
       rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111
       rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947
       netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
       __sys_sendmsg net/socket.c:2154 [inline]
       __do_sys_sendmsg net/socket.c:2163 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
       do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Bytes 36-37 of 105 are uninitialized
      Memory access of size 105 starts at ffff88819686c000
      Data copied to user address 0000000020000380
      
      Fixes: d83b0603 ("net: add fdb generic dump routine")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Ido Schimmel <idosch@mellanox.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a482f800
  6. 04 11月, 2018 1 次提交
    • I
      rtnetlink: Disallow FDB configuration for non-Ethernet device · 2799b518
      Ido Schimmel 提交于
      [ Upstream commit da71577545a52be3e0e9225a946e5fd79cfab015 ]
      
      When an FDB entry is configured, the address is validated to have the
      length of an Ethernet address, but the device for which the address is
      configured can be of any type.
      
      The above can result in the use of uninitialized memory when the address
      is later compared against existing addresses since 'dev->addr_len' is
      used and it may be greater than ETH_ALEN, as with ip6tnl devices.
      
      Fix this by making sure that FDB entries are only configured for
      Ethernet devices.
      
      BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
      CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x14b/0x190 lib/dump_stack.c:113
        kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
        __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
        memcmp+0x11d/0x180 lib/string.c:863
        dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
        ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
        rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
        rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
        netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
        rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
        netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
        netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
        netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440ee9
      Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
      R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
        kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
        kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
        slab_post_alloc_hook mm/slab.h:446 [inline]
        slab_alloc_node mm/slub.c:2718 [inline]
        __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:996 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
        netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      v2:
      * Make error message more specific (David)
      
      Fixes: 090096bf ("net: generic fdb support for drivers without ndo_fdb_<op>")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2799b518
  7. 06 10月, 2018 1 次提交
    • M
      rtnetlink: fix rtnl_fdb_dump() for ndmsg header · bd961c9b
      Mauricio Faria de Oliveira 提交于
      Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
      which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
      
      The problem is, the function bails out early if nlmsg_parse() fails, which
      does occur for iproute2 usage of 'struct ndmsg' because the payload length
      is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
      
      This breaks backward compatibility with userspace -- nothing is sent back.
      
      Some examples with iproute2 and netlink library for go [1]:
      
       1) $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
            This one works, as it uses 'struct ifinfomsg'.
      
            fdb_show() @ iproute2/bridge/fdb.c
              """
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
              ...
              if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
              """
      
       2) $ ip --family bridge neigh
          RTNETLINK answers: Invalid argument
          Dump terminated
      
            This one fails, as it uses 'struct ndmsg'.
      
            do_show_or_flush() @ iproute2/ip/ipneigh.c
              """
              .n.nlmsg_type = RTM_GETNEIGH,
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
              """
      
       3) $ ./neighlist
          < no output >
      
            This one fails, as it uses 'struct ndmsg'-based.
      
            neighList() @ netlink/neigh_linux.go
              """
              req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
              msg := Ndmsg{
              """
      
      The actual breakage was introduced by commit 0ff50e83 ("net: rtnetlink:
      bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
      if the payload length (with the _actual_ family header) is less than the
      family header length alone (which is assumed, in parameter 'hdrlen').
      This is true in the examples above with struct ndmsg, with size and payload
      length shorter than struct ifinfomsg.
      
      However, that commit just intends to fix something under the assumption the
      family header is indeed an 'struct ifinfomsg' - by preventing access to the
      payload as such (via 'ifm' pointer) if the payload length is not sufficient
      to actually contain it.
      
      The assumption was introduced by commit 5e6d2435 ("bridge: netlink dump
      interface at par with brctl"), to support iproute2's 'bridge fdb' command
      (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
      
      So, in order to unbreak the 'struct ndmsg' family headers and still allow
      'struct ifinfomsg' to continue to work, check for the known message sizes
      used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
      not used in this function anyway) then do not parse the data as ifinfomsg.
      
      Same examples with this patch applied (or revert/before the original fix):
      
          $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
          $ ip --family bridge neigh
          dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
          dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
          dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
      
          $ ./neighlist
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
      
      Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).
      
      References:
      
      [1] netlink library for go (test-case)
          https://github.com/vishvananda/netlink
      
          $ cat ~/go/src/neighlist/main.go
          package main
          import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
          func main() {
              neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
              for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
          }
      
          $ export GOPATH=~/go
          $ go get github.com/vishvananda/netlink
          $ go build neighlist
          $ ~/go/src/neighlist/neighlist
      
      Thanks to David Ahern for suggestions to improve this patch.
      
      Fixes: 0ff50e83 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Reported-by: NAidan Obley <aobley@pivotal.io>
      Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd961c9b
  8. 03 10月, 2018 1 次提交
    • E
      rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 · 0e1d6eca
      Eric Dumazet 提交于
      We have an impressive number of syzkaller bugs that are linked
      to the fact that syzbot was able to create a networking device
      with millions of TX (or RX) queues.
      
      Let's limit the number of RX/TX queues to 4096, this really should
      cover all known cases.
      
      A separate patch will add various cond_resched() in the loops
      handling sysfs entries at device creation and dismantle.
      
      Tested:
      
      lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
      RTNETLINK answers: Invalid argument
      
      lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap
      
      real	0m0.180s
      user	0m0.000s
      sys	0m0.107s
      
      Fixes: 76ff5cc9 ("rtnl: allow to specify number of rx and tx queues on device creation")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e1d6eca
  9. 02 10月, 2018 1 次提交
  10. 14 9月, 2018 1 次提交
  11. 30 8月, 2018 1 次提交
  12. 30 7月, 2018 2 次提交
  13. 23 7月, 2018 1 次提交
    • R
      rtnetlink: add rtnl_link_state check in rtnl_configure_link · 5025f7f7
      Roopa Prabhu 提交于
      rtnl_configure_link sets dev->rtnl_link_state to
      RTNL_LINK_INITIALIZED and unconditionally calls
      __dev_notify_flags to notify user-space of dev flags.
      
      current call sequence for rtnl_configure_link
      rtnetlink_newlink
          rtnl_link_ops->newlink
          rtnl_configure_link (unconditionally notifies userspace of
                               default and new dev flags)
      
      If a newlink handler wants to call rtnl_configure_link
      early, we will end up with duplicate notifications to
      user-space.
      
      This patch fixes rtnl_configure_link to check rtnl_link_state
      and call __dev_notify_flags with gchanges = 0 if already
      RTNL_LINK_INITIALIZED.
      
      Later in the series, this patch will help the following sequence
      where a driver implementing newlink can call rtnl_configure_link
      to initialize the link early.
      
      makes the following call sequence work:
      rtnetlink_newlink
          rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                                      link and notifies
                                                      user-space of default
                                                      dev flags)
          rtnl_configure_link (updates dev flags if requested by user ifm
                               and notifies user-space of new dev flags)
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5025f7f7
  14. 19 7月, 2018 1 次提交
  15. 14 7月, 2018 3 次提交
  16. 07 7月, 2018 1 次提交
    • R
      rtnetlink: add rtnl_link_state check in rtnl_configure_link · 8d356b89
      Roopa Prabhu 提交于
      rtnl_configure_link sets dev->rtnl_link_state to
      RTNL_LINK_INITIALIZED and unconditionally calls
      __dev_notify_flags to notify user-space of dev flags.
      
      current call sequence for rtnl_configure_link
      rtnetlink_newlink
          rtnl_link_ops->newlink
          rtnl_configure_link (unconditionally notifies userspace of
                               default and new dev flags)
      
      If a newlink handler wants to call rtnl_configure_link
      early, we will end up with duplicate notifications to
      user-space.
      
      This patch fixes rtnl_configure_link to check rtnl_link_state
      and call __dev_notify_flags with gchanges = 0 if already
      RTNL_LINK_INITIALIZED.
      
      Later in the series, this patch will help the following sequence
      where a driver implementing newlink can call rtnl_configure_link
      to initialize the link early.
      
      makes the following call sequence work:
      rtnetlink_newlink
          rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                                      link and notifies
                                                      user-space of default
                                                      dev flags)
          rtnl_configure_link (updates dev flags if requested by user ifm
                               and notifies user-space of new dev flags)
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d356b89
  17. 06 6月, 2018 1 次提交
    • E
      rtnetlink: validate attributes in do_setlink() · 644c7eeb
      Eric Dumazet 提交于
      It seems that rtnl_group_changelink() can call do_setlink
      while a prior call to validate_linkmsg(dev = NULL, ...) could
      not validate IFLA_ADDRESS / IFLA_BROADCAST
      
      Make sure do_setlink() calls validate_linkmsg() instead
      of letting its callers having this responsibility.
      
      With help from Dmitry Vyukov, thanks a lot !
      
      BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
      BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
      BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
      CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
       __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
       is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
       eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
       eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
       dev_set_mac_address+0x261/0x530 net/core/dev.c:7157
       do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317
       rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
       rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x455a09
      RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09
      RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
       kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527
       __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478
       do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315
       rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
       rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
       kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2753 [inline]
       __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:988 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
       netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: e7ed828f ("netlink: support setting devgroup parameters")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      644c7eeb
  18. 01 6月, 2018 2 次提交
    • P
      rtnetlink: Fix null-ptr-deref in rtnl_newlink · af066ed3
      Prashant Bhole 提交于
      In rtnl_newlink(), NULL check is performed on m_ops however member of
      ops is accessed. Fixed by accessing member of m_ops instead of ops.
      
      [  345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
      [  345.432629] Read of size 4 at addr 0000000000000088 by task ip/986
      [  345.432629]
      [  345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
      [  345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  345.432629] Call Trace:
      [  345.432629]  dump_stack+0xc6/0x150
      [  345.432629]  ? dump_stack_print_info.cold.0+0x1b/0x1b
      [  345.432629]  ? kasan_report+0xb4/0x410
      [  345.432629]  kasan_report.cold.4+0x8f/0x91
      [  345.432629]  ? rtnl_newlink+0x400/0x1110
      [  345.432629]  rtnl_newlink+0x400/0x1110
      [...]
      
      Fixes: ccf8dbcd ("rtnetlink: Remove VLA usage")
      Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Tested-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af066ed3
    • K
      rtnetlink: Remove VLA usage · ccf8dbcd
      Kees Cook 提交于
      In the quest to remove all stack VLA usage from the kernel[1], this
      allocates the maximum size expected for all possible types and adds
      sanity-checks at both registration and usage to make sure nothing gets
      out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
      values chosen here were based on finding assignments for .maxtype and
      .slave_maxtype and manually counting the enums:
      
      slave_maxtype (max 33):
      	IFLA_BRPORT_MAX     33
      	IFLA_BOND_SLAVE_MAX  9
      
      maxtype (max 45):
      	IFLA_BOND_MAX       28
      	IFLA_BR_MAX         45
      	__IFLA_CAIF_HSI_MAX  8
      	IFLA_CAIF_MAX        4
      	IFLA_CAN_MAX        16
      	IFLA_GENEVE_MAX     12
      	IFLA_GRE_MAX        25
      	IFLA_GTP_MAX         5
      	IFLA_HSR_MAX         7
      	IFLA_IPOIB_MAX       4
      	IFLA_IPTUN_MAX      21
      	IFLA_IPVLAN_MAX      3
      	IFLA_MACSEC_MAX     15
      	IFLA_MACVLAN_MAX     7
      	IFLA_PPP_MAX         2
      	__IFLA_RMNET_MAX     4
      	IFLA_VLAN_MAX        6
      	IFLA_VRF_MAX         2
      	IFLA_VTI_MAX         7
      	IFLA_VXLAN_MAX      28
      	VETH_INFO_MAX        2
      	VXCAN_INFO_MAX       2
      
      This additionally changes maxtype and slave_maxtype fields to unsigned,
      since they're only ever using positive values.
      
      [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
      [2] https://patchwork.kernel.org/patch/10439647/Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccf8dbcd
  19. 18 4月, 2018 1 次提交
  20. 01 4月, 2018 1 次提交
    • K
      net: Do not take net_rwsem in __rtnl_link_unregister() · 554873e5
      Kirill Tkhai 提交于
      This function calls call_netdevice_notifier(), which also
      may take net_rwsem. So, we can't use net_rwsem here.
      
      This patch makes callers of this functions take pernet_ops_rwsem,
      like register_netdevice_notifier() does. This will protect
      the modifications of net_namespace_list, and allows notifiers
      to take it (they won't have to care about context).
      
      Since __rtnl_link_unregister() is used on module load
      and unload (which are not frequent operations), this looks
      for me better, than make all call_netdevice_notifier()
      always executing in "protected net_namespace_list" context.
      
      Also, this fixes the problem we had a deal in 328fbe74
      "Close race between {un, }register_netdevice_notifier and ...",
      and guarantees __rtnl_link_unregister() does not skip
      exitting net.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      554873e5
  21. 30 3月, 2018 1 次提交
    • K
      net: Introduce net_rwsem to protect net_namespace_list · f0b07bb1
      Kirill Tkhai 提交于
      rtnl_lock() is used everywhere, and contention is very high.
      When someone wants to iterate over alive net namespaces,
      he/she has no a possibility to do that without exclusive lock.
      But the exclusive rtnl_lock() in such places is overkill,
      and it just increases the contention. Yes, there is already
      for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
      and this can't be sleepable. Also, sometimes it may be need
      really prevent net_namespace_list growth, so for_each_net_rcu()
      is not fit there.
      
      This patch introduces new rw_semaphore, which will be used
      instead of rtnl_mutex to protect net_namespace_list. It is
      sleepable and allows not-exclusive iterations over net
      namespaces list. It allows to stop using rtnl_lock()
      in several places (what is made in next patches) and makes
      less the time, we keep rtnl_mutex. Here we just add new lock,
      while the explanation of we can remove rtnl_lock() there are
      in next patches.
      
      Fine grained locks generally are better, then one big lock,
      so let's do that with net_namespace_list, while the situation
      allows that.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0b07bb1
  22. 28 3月, 2018 3 次提交
  23. 17 3月, 2018 1 次提交
    • K
      net: Add rtnl_lock_killable() · 79ffdfc6
      Kirill Tkhai 提交于
      rtnl_lock() is widely used mutex in kernel. Some of kernel code
      does memory allocations under it. In case of memory deficit this
      may invoke OOM killer, but the problem is a killed task can't
      exit if it's waiting for the mutex. This may be a reason of deadlock
      and panic.
      
      This patch adds a new primitive, which responds on SIGKILL, and
      it allows to use it in the places, where we don't want to sleep
      forever.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79ffdfc6
  24. 13 2月, 2018 2 次提交
    • K
      net: Convert rtnetlink_net_ops · 46456675
      Kirill Tkhai 提交于
      rtnetlink_net_init() and rtnetlink_net_exit()
      create and destroy netlink socket net::rtnl.
      
      The socket is used to send rtnl notification via
      rtnl_net_notifyid(). There is no a problem
      to create and destroy it in parallel with other
      pernet operations, as we link net in setup_net()
      after the socket is created, and destroy
      in cleanup_net() after net is unhashed from all
      the lists and there is no RCU references on it.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46456675
    • K
      net: Introduce net_sem for protection of pernet_list · 1a57feb8
      Kirill Tkhai 提交于
      Currently, the mutex is mostly used to protect pernet operations
      list. It orders setup_net() and cleanup_net() with parallel
      {un,}register_pernet_operations() calls, so ->exit{,batch} methods
      of the same pernet operations are executed for a dying net, as
      were used to call ->init methods, even after the net namespace
      is unlinked from net_namespace_list in cleanup_net().
      
      But there are several problems with scalability. The first one
      is that more than one net can't be created or destroyed
      at the same moment on the node. For big machines with many cpus
      running many containers it's very sensitive.
      
      The second one is that it's need to synchronize_rcu() after net
      is removed from net_namespace_list():
      
      Destroy net_ns:
      cleanup_net()
        mutex_lock(&net_mutex)
        list_del_rcu(&net->list)
        synchronize_rcu()                                  <--- Sleep there for ages
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list)
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_free_list(ops, &net_exit_list)
        mutex_unlock(&net_mutex)
      
      This primitive is not fast, especially on the systems with many processors
      and/or when preemptible RCU is enabled in config. So, all the time, while
      cleanup_net() is waiting for RCU grace period, creation of new net namespaces
      is not possible, the tasks, who makes it, are sleeping on the same mutex:
      
      Create net_ns:
      copy_net_ns()
        mutex_lock_killable(&net_mutex)                    <--- Sleep there for ages
      
      I observed 20-30 seconds hangs of "unshare -n" on ordinary 8-cpu laptop
      with preemptible RCU enabled after CRIU tests round is finished.
      
      The solution is to convert net_mutex to the rw_semaphore and add fine grain
      locks to really small number of pernet_operations, what really need them.
      
      Then, pernet_operations::init/::exit methods, modifying the net-related data,
      will require down_read() locking only, while down_write() will be used
      for changing pernet_list (i.e., when modules are being loaded and unloaded).
      
      This gives signify performance increase, after all patch set is applied,
      like you may see here:
      
      %for i in {1..10000}; do unshare -n bash -c exit; done
      
      *before*
      real 1m40,377s
      user 0m9,672s
      sys 0m19,928s
      
      *after*
      real 0m17,007s
      user 0m5,311s
      sys 0m11,779
      
      (5.8 times faster)
      
      This patch starts replacing net_mutex to net_sem. It adds rw_semaphore,
      describes the variables it protects, and makes to use, where appropriate.
      net_mutex is still present, and next patches will kick it out step-by-step.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a57feb8
  25. 09 2月, 2018 1 次提交
    • C
      rtnetlink: require unique netns identifier · 4ff66cae
      Christian Brauner 提交于
      Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
      it is possible for userspace to send us requests with three different
      properties to identify a target network namespace. This affects at least
      RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
      network namespace which is confusing. For legacy reasons the kernel will
      pick the IFLA_NET_NS_PID property first and then look for the
      IFLA_NET_NS_FD property but there is no reason to extend this type of
      behavior to network namespace ids. The regression potential is quite
      minimal since the rtnetlink requests in question either won't allow
      IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
      support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Acked-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ff66cae
  26. 01 2月, 2018 1 次提交
  27. 31 1月, 2018 1 次提交
    • C
      rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK · 5bb8ed07
      Christian Brauner 提交于
      - Backwards Compatibility:
        If userspace wants to determine whether RTM_NEWLINK supports the
        IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
        with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
        does not include IFLA_IF_NETNSID userspace should assume that
        IFLA_IF_NETNSID is not supported on this kernel.
        If the reply does contain an IFLA_IF_NETNSID property userspace
        can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive
        EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
        with RTM_NEWLINK. Userpace should then fallback to other means.
      
      - Security:
        Callers must have CAP_NET_ADMIN in the owning user namespace of the
        target network namespace.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bb8ed07
  28. 30 1月, 2018 5 次提交
    • C
      net: introduce helper dev_change_tx_queue_len() · 6a643ddb
      Cong Wang 提交于
      This patch promotes the local change_tx_queue_len() to a core
      helper function, dev_change_tx_queue_len(), so that rtnetlink
      and net-sysfs could share the code. This also prepares for the
      following patch.
      
      Note, the -EFAULT in the original code doesn't make sense,
      we should propagate the errno from notifiers.
      
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a643ddb
    • N
      dev: advertise the new ifindex when the netns iface changes · 38e01b30
      Nicolas Dichtel 提交于
      The goal is to let the user follow an interface that moves to another
      netns.
      
      CC: Jiri Benc <jbenc@redhat.com>
      CC: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38e01b30
    • C
      rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK · b61ad68a
      Christian Brauner 提交于
      - Backwards Compatibility:
        If userspace wants to determine whether RTM_DELLINK supports the
        IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
        with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
        does not include IFLA_IF_NETNSID userspace should assume that
        IFLA_IF_NETNSID is not supported on this kernel.
        If the reply does contain an IFLA_IF_NETNSID property userspace
        can send an RTM_DELLINK with a IFLA_IF_NETNSID property. If they receive
        EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
        with RTM_DELLINK. Userpace should then fallback to other means.
      
      - Security:
        Callers must have CAP_NET_ADMIN in the owning user namespace of the
        target network namespace.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b61ad68a
    • C
      rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK · c310bfcb
      Christian Brauner 提交于
      - Backwards Compatibility:
        If userspace wants to determine whether RTM_SETLINK supports the
        IFLA_IF_NETNSID property they should first send an RTM_GETLINK request
        with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply
        does not include IFLA_IF_NETNSID userspace should assume that
        IFLA_IF_NETNSID is not supported on this kernel.
        If the reply does contain an IFLA_IF_NETNSID property userspace
        can send an RTM_SETLINK with a IFLA_IF_NETNSID property. If they receive
        EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property
        with RTM_SETLINK. Userpace should then fallback to other means.
      
        To retain backwards compatibility the kernel will first check whether a
        IFLA_NET_NS_PID or IFLA_NET_NS_FD property has been passed. If either
        one is found it will be used to identify the target network namespace.
        This implies that users who do not care whether their running kernel
        supports IFLA_IF_NETNSID with RTM_SETLINK can pass both
        IFLA_NET_NS_{FD,PID} and IFLA_IF_NETNSID referring to the same network
        namespace.
      
      - Security:
        Callers must have CAP_NET_ADMIN in the owning user namespace of the
        target network namespace.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c310bfcb
    • C
      rtnetlink: enable IFLA_IF_NETNSID in do_setlink() · 7c4f63ba
      Christian Brauner 提交于
      RTM_{NEW,SET}LINK already allow operations on other network namespaces
      by identifying the target network namespace through IFLA_NET_NS_{FD,PID}
      properties. This is done by looking for the corresponding properties in
      do_setlink(). Extend do_setlink() to also look for the IFLA_IF_NETNSID
      property. This introduces no functional changes since all callers of
      do_setlink() currently block IFLA_IF_NETNSID by reporting an error before
      they reach do_setlink().
      
      This introduces the helpers:
      
      static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net, struct
                                                     nlattr *tb[])
      
      static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
                                                   struct net *src_net,
      					     struct nlattr *tb[], int cap)
      
      to simplify permission checks and target network namespace retrieval for
      RTM_* requests that already support IFLA_NET_NS_{FD,PID} but get extended
      to IFLA_IF_NETNSID. To perserve backwards compatibility the helpers look
      for IFLA_NET_NS_{FD,PID} properties first before checking for
      IFLA_IF_NETNSID.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c4f63ba
  29. 23 1月, 2018 1 次提交