1. 27 Dec, 2019 3 commits
    • kabi: reserve space for network subsystem related structure · 148a9c4f
      Committed by Tan Xiaojun
      hulk inclusion
      category: feature
      bugzilla: 13276
      CVE: NA
      
      -------------------------------
      
      Reserve space for structures in the network subsystem.
      Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • net: dev: Use unsigned integer as an argument to left-shift · b92b47fc
      Committed by Andy Shevchenko
      mainline inclusion
      from mainline-5.0
      commit f4d7b3e23d259c44f1f1c39645450680fcd935d6
      category: bugfix
      bugzilla: 11090
      CVE: NA
      
      -------------------------------------------------
      1 << 31 is undefined behaviour according to the C standard.
      Use the U suffix on the constant to avoid the theoretical overflow.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit f4d7b3e23d259c44f1f1c39645450680fcd935d6)
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
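A minimal sketch of the issue described in this commit, assuming a 32-bit int. The helper name bit_mask_u32() is hypothetical and is not taken from the kernel tree; it only illustrates why the unsigned constant matters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return a mask with only
 * bit n set, for n in 0..31.
 *
 * With a plain signed constant, 1 << 31 would need the value 2^31, which
 * does not fit in a 32-bit int, so the shift is undefined behaviour.
 * Shifting an unsigned operand is well defined for every n below the
 * width of the type.
 */
static inline uint32_t bit_mask_u32(unsigned int n)
{
	assert(n < 32);
	return (uint32_t)1U << n;
}
```

Calling bit_mask_u32(31) yields the top bit of a 32-bit word without relying on signed overflow.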
    • ipvlan, l3mdev: fix broken l3s mode wrt local routes · 15ef6c67
      Committed by Daniel Borkmann
      [ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]
      
      While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
      I ran into the issue that while l3 mode is working fine, l3s mode
      does not have any connectivity to kube-apiserver and hence all pods
      end up in Error state as well. The ipvlan master device sits on
      top of a bond device and hostns traffic to kube-apiserver (also running
      in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
      where the latter is the address of the bond0. While in l3 mode, a
      curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
      works fine from hostns, neither of them does in case of l3s. In the
      latter only a curl to https://127.0.0.1:37573 appeared to work where
      for local addresses of bond0 I saw kernel suddenly starting to emit
      ARP requests to query HW address of bond0 which remained unanswered
      and neighbor entries in INCOMPLETE state. These ARP requests only
      happen while in l3s.
      
      Debugging this further, I found the issue is that l3s mode is piggy-
      backing on l3 master device, and in this case local routes are using
      l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
      f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
      if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
      a loopback"). I found that reverting them back into using the
      net->loopback_dev fixed ipvlan l3s connectivity and got everything
      working for the CNI.
      
      Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
      l3mdev paper in [0], the sole reason why ipvlan l3s is relying
      on l3 master device is to get the l3mdev_ip_rcv() receive hook for
      setting the dst entry of the input route without adding its own
      ipvlan specific hacks into the receive path, however, any l3 domain
      semantics beyond just that are breaking l3s operation. Note that
      ipvlan also has the ability to dynamically switch its internal
      operation from l3 to l3s for all ports via ipvlan_set_port_mode()
      at runtime. In any case, l3 vs l3s solely distinguishes itself by
      'de-confusing' netfilter through switching skb->dev to ipvlan slave
      device late in NF_INET_LOCAL_IN before handing the skb to L4.
      
      The minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
      if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
      without any additional l3mdev semantics on top. This should also have
      minimal impact since dev->priv_flags is already hot in cache. With
      this set, l3s mode is working fine and I also get things like
      masquerading pod traffic on the ipvlan master properly working.
      
        [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      
      Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Martynas Pumputis <m@lambda.lt>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. 11 Oct, 2018 1 commit
    • net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Committed by Sabrina Dubroca
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
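The update rules described in this commit can be sketched in user-space C as follows. This is an illustrative model, not the kernel code: struct pmtu_exception, its fields, and exception_update_pmtu() are hypothetical names standing in for the next hop exception and its handling in the NETDEV_CHANGEMTU path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a next hop exception entry. */
struct pmtu_exception {
	uint32_t pmtu;       /* PMTU recorded for this exception */
	bool     mtu_locked; /* discovered PMTU was below the accepted minimum */
};

/*
 * Apply a local MTU change (old_mtu -> new_mtu) to one exception,
 * following the cases listed in the commit message.
 */
static inline void exception_update_pmtu(struct pmtu_exception *e,
					 uint32_t new_mtu, uint32_t old_mtu)
{
	if (e->mtu_locked) {
		/*
		 * Locked exception: if the new local MTU is smaller than
		 * the current PMTU, adopt it and unlock, so that PMTU
		 * discovery can decide whether locking is still needed.
		 */
		if (new_mtu < e->pmtu) {
			e->pmtu = new_mtu;
			e->mtu_locked = false;
		}
	} else if (new_mtu < e->pmtu || old_mtu == e->pmtu) {
		/*
		 * Either the stored PMTU is now too high, or the local
		 * MTU was the lowest value on the path and a higher PMTU
		 * may be discovered after the change.
		 */
		e->pmtu = new_mtu;
	}
}
```

An exception whose PMTU was limited by some other hop (lower than the old local MTU) is left untouched when the local MTU grows, which is why the old MTU has to be passed along with the notification.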
  3. 27 Sep, 2018 1 commit
  4. 11 Aug, 2018 1 commit
  5. 01 Aug, 2018 2 commits
  6. 30 Jul, 2018 1 commit
  7. 17 Jul, 2018 1 commit
    • net: convert gro_count to bitmask · d9f37d01
      Committed by Li RongQing
      gro_hash is 192 bytes and spans 3 cache lines. With only a few flows,
      gro_hash may not be fully used, so iterating over every gro_hash
      element in napi_gro_flush() touches cache lines unnecessarily.
      
      Convert gro_count to a bitmask and rename it gro_bitmask. Each bit
      represents an element of gro_hash, and a gro_hash element is flushed
      only if the related bit is set, which speeds up napi_gro_flush().
      
      Also update gro_bitmask only when its value will change, to reduce
      cache updates.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
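The bitmask idea can be sketched with a small user-space model. Everything here is an assumption made for illustration: struct napi_model, the helper names, and the bucket count of 8 are hypothetical and only mirror the description above, not the actual kernel structures.

```c
#include <assert.h>

#define MODEL_GRO_BUCKETS 8 /* assumed bucket count for this sketch */

/* Hypothetical model of the per-NAPI GRO state. */
struct napi_model {
	unsigned int  bucket_len[MODEL_GRO_BUCKETS]; /* packets held per bucket */
	unsigned long bitmask;                       /* bit i set: bucket i is non-empty */
};

/* Hold one packet in a bucket; write the bitmask only if it changes. */
static inline void model_hold(struct napi_model *n, unsigned int bucket)
{
	n->bucket_len[bucket]++;
	if (!(n->bitmask & (1UL << bucket)))
		n->bitmask |= 1UL << bucket;
}

/*
 * Flush only the buckets whose bit is set, so empty buckets (and the
 * cache lines they live in) are never touched. Returns the number of
 * buckets that were actually visited.
 */
static inline unsigned int model_flush(struct napi_model *n)
{
	unsigned int i, visited = 0;

	for (i = 0; i < MODEL_GRO_BUCKETS; i++) {
		if (!(n->bitmask & (1UL << i)))
			continue;
		n->bucket_len[i] = 0;
		n->bitmask &= ~(1UL << i);
		visited++;
	}
	return visited;
}
```

With packets held in only two buckets, model_flush() visits two buckets instead of all eight; the test for "any work pending" also becomes a single comparison of the bitmask against zero.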
  8. 16 Jul, 2018 1 commit
  9. 14 Jul, 2018 2 commits
  10. 10 Jul, 2018 5 commits
  11. 05 Jul, 2018 1 commit
  12. 04 Jul, 2018 3 commits
    • net/sched: Introduce the ETF Qdisc · 25db26a9
      Committed by Vinicius Costa Gomes
      The ETF (Earliest TxTime First) qdisc uses the information added
      earlier in this series (the socket option SO_TXTIME and the new
      role of sk_buff->tstamp) to schedule packet transmission based
      on absolute time.
      
      For some workloads, just bandwidth enforcement is not enough, and
      precise control of the transmission of packets is necessary.
      
      Example:
      
      $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
                 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
      
      $ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
                 clockid CLOCK_TAI
      
      In this example, the Qdisc provides software best-effort control of
      the transmission time handed to the network adapter, the time stamp
      in the socket is in reference to the clockid CLOCK_TAI, and packets
      leave the qdisc "delta" (100000) nanoseconds before their transmission
      time.
      
      The ETF qdisc will buffer packets sorted by their txtime. It will drop
      packets on enqueue() if their skbuff clockid does not match the clock
      reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped
      if it expires while being enqueued.
      
      The qdisc also supports the SO_TXTIME deadline mode. For this mode, it
      will dequeue a packet as soon as possible and change the skb timestamp
      to 'now' during etf_dequeue().
      
      Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid
      to be configured, but it's been decided that usage of CLOCK_TAI should
      be enforced until we decide to allow for other clockids to be used.
      The rationale here is that PTP times are usually in the TAI scale, thus
      no other clocks should be necessary. For now, the qdisc will return
      EINVAL if any clocks other than CLOCK_TAI are used.
      Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
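The timing rules in this description (release "delta" nanoseconds before txtime, drop a packet that has expired by the time it is dequeued) can be sketched as a small decision function. This is a simplified model under stated assumptions, not the qdisc implementation: the enum and etf_model_dequeue() are hypothetical names, and all times are plain nanosecond counts on one clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of looking at the head-of-queue packet. */
enum etf_model_action {
	ETF_MODEL_WAIT, /* too early, keep the packet queued */
	ETF_MODEL_SEND, /* inside the [txtime - delta, txtime] window */
	ETF_MODEL_DROP  /* txtime already passed, packet expired */
};

/*
 * Decide what to do with a packet whose requested transmission time is
 * txtime_ns, given the configured delta and the current time.
 */
static inline enum etf_model_action
etf_model_dequeue(int64_t txtime_ns, int64_t delta_ns, int64_t now_ns)
{
	if (txtime_ns < now_ns)
		return ETF_MODEL_DROP;
	if (now_ns >= txtime_ns - delta_ns)
		return ETF_MODEL_SEND;
	return ETF_MODEL_WAIT;
}
```

With the delta of 100000 ns from the tc example, a packet with txtime 1000000 would be held until time 900000 in this model and dropped if it is still queued after 1000000.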
    • net: ipv4: listified version of ip_rcv · 17266ee9
      Committed by Edward Cree
      Also involved adding a way to run a netfilter hook over a list of packets.
       Rather than attempting to make netfilter know about lists (which would be
       a major project in itself) we just let it call the regular okfn (in this
       case ip_rcv_finish()) for any packets it steals, and have it give us back
       a list of packets it's synchronously accepted (which normally NF_HOOK
       would automatically call okfn() on, but we want to be able to potentially
       pass the list to a listified version of okfn().)
      The netfilter hooks themselves are indirect calls that still happen per-
       packet (see nf_hook_entry_hookfn()), but again, changing that can be left
       for future work.
      
      There is potential for out-of-order receives if the netfilter hook ends up
       synchronously stealing packets, as they will be processed before any
       accepts earlier in the list.  However, it was already possible for an
       asynchronous accept to cause out-of-order receives, so presumably this is
       considered OK.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: trivial netif_receive_skb_list() entry point · f6ad8c1b
      Committed by Edward Cree
      Just calls netif_receive_skb() in a loop.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
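A list entry point that "just calls the per-packet function in a loop" can be sketched as below. The types and names (struct pkt, receive_one(), receive_list()) are hypothetical and use a plain singly linked list in place of the kernel's skb lists; the counter argument exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type linked into a simple list. */
struct pkt {
	struct pkt *next;
	int         id;
};

/* Stand-in for the existing per-packet receive path. */
static inline void receive_one(struct pkt *p, int *delivered)
{
	(void)p;
	(*delivered)++;
}

/*
 * Trivial list entry point: detach each packet from the list and hand
 * it to the per-packet function. The next pointer is saved first,
 * because the callee owns the packet once it has been handed over.
 */
static inline void receive_list(struct pkt *head, int *delivered)
{
	struct pkt *p, *next;

	for (p = head; p != NULL; p = next) {
		next = p->next;
		p->next = NULL;
		receive_one(p, delivered);
	}
}
```

The value of such an entry point is the interface: callers can start passing lists immediately, and later changes can process the list in batches without touching the callers again.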
  13. 02 Jul, 2018 2 commits
  14. 26 Jun, 2018 2 commits
  15. 05 Jun, 2018 3 commits
  16. 03 Jun, 2018 1 commit
  17. 29 May, 2018 2 commits
  18. 25 May, 2018 2 commits
    • net: include hash policy in LAG changeupper info · f44aa9ef
      Committed by John Hurley
      LAG upper event notifiers contain the tx type used by the LAG device.
      Extend this to also include the hash policy used for tx types that
      utilize hashing.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Committed by Jesper Dangaard Brouer
      This patch changes the API for ndo_xdp_xmit to support bulking
      xdp_frames.
      
      When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
      Most of the slowdown is caused by DMA API indirect function calls, but
      also the net_device->ndo_xdp_xmit() call.
      
      Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
      single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
      performance improved:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames available as a bulk inside the driver ndo_xdp_xmit call,
      further optimizations are possible, like bulk DMA-mapping for TX.
      
      Testing without CONFIG_RETPOLINE shows the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid doing per frame producer locking, but instead amortize the
      locking cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
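The amortization argument (one indirect call and one lock round trip per bulk rather than per frame) can be illustrated with a toy transmit ring. This is a hedged sketch: struct tx_ring_model, MODEL_RING_SIZE, and ring_xmit_bulk() are hypothetical, and the lock is represented only by a counter rather than a real locking primitive.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING_SIZE 16 /* assumed ring capacity for this sketch */

/* Hypothetical transmit ring; lock_acquisitions counts lock round trips. */
struct tx_ring_model {
	void        *slots[MODEL_RING_SIZE];
	unsigned int used;
	unsigned int lock_acquisitions;
};

/*
 * Queue up to n frames under a single (modelled) lock acquisition.
 * Returns how many frames were accepted; the caller is expected to
 * deal with any frames that did not fit.
 */
static inline int ring_xmit_bulk(struct tx_ring_model *r, void **frames, int n)
{
	int sent = 0;

	r->lock_acquisitions++; /* one lock/unlock pair for the whole bulk */
	while (sent < n && r->used < MODEL_RING_SIZE)
		r->slots[r->used++] = frames[sent++];
	return sent;
}
```

Sending four frames through this function costs one modelled lock acquisition, where a per-frame interface would cost four; the same reasoning applies to the indirect call into the driver.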
  19. 04 May, 2018 1 commit
  20. 02 May, 2018 1 commit
  21. 01 May, 2018 1 commit
  22. 30 Apr, 2018 2 commits
  23. 27 Apr, 2018 1 commit