1. 08 4月, 2019 1 次提交
    • M
      net: vrf: Fix ping failed when vrf mtu is set to 0 · 5055376a
      Miaohe Lin 提交于
      When the mtu of a vrf device is set to 0, it would cause ping
      failed. So I think we should limit vrf mtu in a reasonable range
      to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it
      will works for both ipv4 and ipv6. And if dev->max_mtu still be 0
      can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
      
      Here is the reproduce step:
      
      1.Config vrf interface and set mtu to 0:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
      
      2.Ping peer:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.1/16 scope global enp4s0
             valid_lft forever preferred_lft forever
      connect: Network is unreachable
      
      3.Set mtu to default value, ping works:
      PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
      64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
      
      Fixes: ad49bc63 ("net: vrf: remove MTU limits for vrf device")
      Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5055376a
  2. 28 3月, 2019 1 次提交
  3. 22 2月, 2019 1 次提交
  4. 07 12月, 2018 2 次提交
  5. 08 11月, 2018 1 次提交
  6. 03 10月, 2018 1 次提交
  7. 29 5月, 2018 1 次提交
  8. 18 4月, 2018 1 次提交
  9. 31 3月, 2018 1 次提交
  10. 28 3月, 2018 1 次提交
  11. 05 3月, 2018 1 次提交
  12. 28 2月, 2018 1 次提交
  13. 24 2月, 2018 1 次提交
  14. 22 2月, 2018 1 次提交
  15. 16 2月, 2018 1 次提交
  16. 26 1月, 2018 1 次提交
  17. 02 11月, 2017 1 次提交
  18. 05 10月, 2017 3 次提交
  19. 22 9月, 2017 1 次提交
  20. 16 9月, 2017 1 次提交
    • A
      net: vrf: avoid gcc-4.6 warning · ecf09117
      Arnd Bergmann 提交于
      When building an allmodconfig kernel with gcc-4.6, we get a rather
      odd warning:
      
      drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
      drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
      drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]
      
      I have no idea what this warning is even trying to say, but it does
      seem like a false positive. Reordering the initialization in to match
      the structure definition gets rid of the warning, and might also avoid
      whatever gcc thinks is wrong here.
      
      Fixes: 9ff74384 ("net: vrf: Handle ipv6 multicast and link-local addresses")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecf09117
  21. 14 8月, 2017 1 次提交
  22. 08 8月, 2017 1 次提交
  23. 06 7月, 2017 1 次提交
    • N
      vrf: fix bug_on triggered by rx when destroying a vrf · f630c38e
      Nikolay Aleksandrov 提交于
      When destroying a VRF device we cleanup the slaves in its ndo_uninit()
      function, but that causes packets to be switched (skb->dev == vrf being
      destroyed) even though we're pass the point where the VRF should be
      receiving any packets while it is being dismantled. This causes a BUG_ON
      to trigger if we have raw sockets (trace below).
      The reason is that the inetdev of the VRF has been destroyed but we're
      still sending packets up the stack with it, so let's free the slaves in
      the dellink callback as David Ahern suggested.
      
      Note that this fix doesn't prevent packets from going up when the VRF
      device is admin down.
      
      [   35.631371] ------------[ cut here ]------------
      [   35.631603] kernel BUG at net/ipv4/fib_frontend.c:285!
      [   35.631854] invalid opcode: 0000 [#1] SMP
      [   35.631977] Modules linked in:
      [   35.632081] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.12.0-rc7+ #45
      [   35.632247] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   35.632477] task: ffff88005ad68000 task.stack: ffff88005ad64000
      [   35.632632] RIP: 0010:fib_compute_spec_dst+0xfc/0x1ee
      [   35.632769] RSP: 0018:ffff88005ad67978 EFLAGS: 00010202
      [   35.632910] RAX: 0000000000000001 RBX: ffff880059a7f200 RCX: 0000000000000000
      [   35.633084] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82274af0
      [   35.633256] RBP: ffff88005ad679f8 R08: 000000000001ef70 R09: 0000000000000046
      [   35.633430] R10: ffff88005ad679f8 R11: ffff880037731cb0 R12: 0000000000000001
      [   35.633603] R13: ffff8800599e3000 R14: 0000000000000000 R15: ffff8800599cb852
      [   35.634114] FS:  0000000000000000(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
      [   35.634306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   35.634456] CR2: 00007f3563227095 CR3: 000000000201d000 CR4: 00000000000406e0
      [   35.634632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   35.634865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   35.635055] Call Trace:
      [   35.635271]  ? __lock_acquire+0xf0d/0x1117
      [   35.635522]  ipv4_pktinfo_prepare+0x82/0x151
      [   35.635831]  raw_rcv_skb+0x17/0x3c
      [   35.636062]  raw_rcv+0xe5/0xf7
      [   35.636287]  raw_local_deliver+0x169/0x1d9
      [   35.636534]  ip_local_deliver_finish+0x87/0x1c4
      [   35.636820]  ip_local_deliver+0x63/0x7f
      [   35.637058]  ip_rcv_finish+0x340/0x3a1
      [   35.637295]  ip_rcv+0x314/0x34a
      [   35.637525]  __netif_receive_skb_core+0x49f/0x7c5
      [   35.637780]  ? lock_acquire+0x13f/0x1d7
      [   35.638018]  ? lock_acquire+0x15e/0x1d7
      [   35.638259]  __netif_receive_skb+0x1e/0x94
      [   35.638502]  ? __netif_receive_skb+0x1e/0x94
      [   35.638748]  netif_receive_skb_internal+0x74/0x300
      [   35.639002]  ? dev_gro_receive+0x2ed/0x411
      [   35.639246]  ? lock_is_held_type+0xc4/0xd2
      [   35.639491]  napi_gro_receive+0x105/0x1a0
      [   35.639736]  receive_buf+0xc32/0xc74
      [   35.639965]  ? detach_buf+0x67/0x153
      [   35.640201]  ? virtqueue_get_buf_ctx+0x120/0x176
      [   35.640453]  virtnet_poll+0x128/0x1c5
      [   35.640690]  net_rx_action+0x103/0x343
      [   35.640932]  __do_softirq+0x1c7/0x4b7
      [   35.641171]  run_ksoftirqd+0x23/0x5c
      [   35.641403]  smpboot_thread_fn+0x24f/0x26d
      [   35.641646]  ? sort_range+0x22/0x22
      [   35.641878]  kthread+0x129/0x131
      [   35.642104]  ? __list_add+0x31/0x31
      [   35.642335]  ? __list_add+0x31/0x31
      [   35.642568]  ret_from_fork+0x2a/0x40
      [   35.642804] Code: 05 bd 87 a3 00 01 e8 1f ef 98 ff 4d 85 f6 48 c7 c7 f0 4a 27 82 41 0f 94 c4 31 c9 31 d2 41 0f b6 f4 e8 04 71 a1 ff 45 84 e4 74 02 <0f> 0b 0f b7 93 c4 00 00 00 4d 8b a5 80 05 00 00 48 03 93 d0 00
      [   35.644342] RIP: fib_compute_spec_dst+0xfc/0x1ee RSP: ffff88005ad67978
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Reported-by: NChris Cormier <chriscormier@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f630c38e
  24. 27 6月, 2017 2 次提交
  25. 18 6月, 2017 2 次提交
    • W
      net: remove DST_NOCACHE flag · a4c2fd7f
      Wei Wang 提交于
      DST_NOCACHE flag check has been removed from dst_release() and
      dst_hold_safe() in a previous patch because all the dst are now ref
      counted properly and can be released based on refcnt only.
      Looking at the rest of the DST_NOCACHE use, all of them can now be
      removed or replaced with other checks.
      So this patch gets rid of all the DST_NOCACHE usage and remove this flag
      completely.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4c2fd7f
    • W
      ipv6: take dst->__refcnt for insertion into fib6 tree · 1cfb71ee
      Wei Wang 提交于
      In IPv6 routing code, struct rt6_info is created for each static route
      and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
      ref count is not taken.
      As explained in the previous patch, this leads to the need of the dst
      garbage collector.
      
      This patch holds ref count of dst before inserting the route into fib6
      tree and properly releases the dst when deleting it from the fib6 tree
      as a preparation in order to fully get rid of dst gc later.
      
      Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
      no user is referencing the dst.
      
      And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
      dst->__refcnt to 1.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cfb71ee
  26. 16 6月, 2017 1 次提交
    • J
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58ff351
  27. 09 6月, 2017 1 次提交
  28. 08 6月, 2017 1 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
  29. 12 5月, 2017 1 次提交
    • G
      driver: vrf: Fix one possible use-after-free issue · 1a4a5bf5
      Gao Feng 提交于
      The current codes only deal with the case that the skb is dropped, it
      may meet one use-after-free issue when NF_HOOK returns 0 that means
      the skb is stolen by one netfilter rule or hook.
      
      When one netfilter rule or hook stoles the skb and return NF_STOLEN,
      it means the skb is taken by the rule, and other modules should not
      touch this skb ever. Maybe the skb is queued or freed directly by the
      rule.
      
      Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
      and check the return value of nf_hook. Only when its value equals 1, it
      means the skb could go ahead. Or reset the skb as NULL.
      
      BTW, because vrf_rcv_finish is empty function, so needn't invoke it
      even though nf_hook returns 1. But we need to modify vrf_rcv_finish
      to deal with the NF_STOLEN case.
      
      There are two cases when skb is stolen.
      1. The skb is stolen and freed directly.
         There is nothing we need to do, and vrf_rcv_finish isn't invoked.
      2. The skb is queued and reinjected again.
         The vrf_rcv_finish would be invoked as okfn, so need to free the
         skb in it.
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a4a5bf5
  30. 28 4月, 2017 1 次提交
  31. 18 4月, 2017 2 次提交
  32. 23 3月, 2017 2 次提交
    • D
      net: vrf: performance improvements for IPv6 · a9ec54d1
      David Ahern 提交于
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv6 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is a decrease in the overhead of VRF
      for the default, baseline case (ie., no netfilter rules, no packet
      sockets, no qdisc) from a +3% improvement for UDP which has a lookup
      per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
      which connects a socket for each request-response.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9ec54d1
    • D
      net: vrf: performance improvements for IPv4 · dcdd43c4
      David Ahern 提交于
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv4 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is a decrease in the overhead of VRF
      for the default, baseline case (ie., no netfilter rules, no packet
      sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
      < 1% overhead for connected sockets that leverage early demux and
      avoid FIB lookups.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcdd43c4
  33. 22 3月, 2017 1 次提交
    • D
      net: vrf: Reset rt6i_idev in local dst after put · 3dc857f0
      David Ahern 提交于
      The VRF driver takes a reference to the inet6_dev on the VRF device for
      its rt6_local dst when handling local traffic through the VRF device as
      a loopback. When the device is deleted the driver does a put on the idev
      but does not reset rt6i_idev in the rt6_info struct. When the dst is
      destroyed, dst_destroy calls ip6_dst_destroy which does a second put for
      what is essentially the same reference causing it to be prematurely freed.
      Reset rt6i_idev after the put in the vrf driver.
      
      Fixes: b4869aa2 ("net: vrf: ipv6 support for local traffic to
                             local addresses")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dc857f0