1. 04 6月, 2019 1 次提交
  2. 02 5月, 2019 1 次提交
    • Z
      ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 250e51f8
      ZhangXiaoxu 提交于
      [ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]
      
      There is a UBSAN report as below:
      UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
      signed integer overflow:
      2147483647 * 1000 cannot be represented in type 'int'
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
      Call Trace:
       <IRQ>
       dump_stack+0x8c/0xba
       ubsan_epilogue+0x11/0x60
       handle_overflow+0x12d/0x170
       ? ttwu_do_wakeup+0x21/0x320
       __ubsan_handle_mul_overflow+0x12/0x20
       tcp_ack_update_rtt+0x76c/0x780
       tcp_clean_rtx_queue+0x499/0x14d0
       tcp_ack+0x69e/0x1240
       ? __wake_up_sync_key+0x2c/0x50
       ? update_group_capacity+0x50/0x680
       tcp_rcv_established+0x4e2/0xe10
       tcp_v4_do_rcv+0x22b/0x420
       tcp_v4_rcv+0xfe8/0x1190
       ip_protocol_deliver_rcu+0x36/0x180
       ip_local_deliver+0x15b/0x1a0
       ip_rcv+0xac/0xd0
       __netif_receive_skb_one_core+0x7f/0xb0
       __netif_receive_skb+0x33/0xc0
       netif_receive_skb_internal+0x84/0x1c0
       napi_gro_receive+0x2a0/0x300
       receive_buf+0x3d4/0x2350
       ? detach_buf_split+0x159/0x390
       virtnet_poll+0x198/0x840
       ? reweight_entity+0x243/0x4b0
       net_rx_action+0x25c/0x770
       __do_softirq+0x19b/0x66d
       irq_exit+0x1eb/0x230
       do_IRQ+0x7a/0x150
       common_interrupt+0xf/0xf
       </IRQ>
      
      It can be reproduced by:
        echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
      
      Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      250e51f8
  3. 27 9月, 2018 1 次提交
  4. 17 8月, 2018 2 次提交
  5. 13 8月, 2018 1 次提交
    • V
      ipv6: Add icmp_echo_ignore_all support for ICMPv6 · e6f86b0f
      Virgile Jarry 提交于
      Preventing the kernel from responding to ICMP Echo Requests messages
      can be useful in several ways. The sysctl parameter
      'icmp_echo_ignore_all' can be used to prevent the kernel from
      responding to IPv4 ICMP echo requests. For IPv6 pings, such
      a sysctl kernel parameter did not exist.
      
      Add the ability to prevent the kernel from responding to IPv6
      ICMP echo requests through the use of the following sysctl
      parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
      Update the documentation to reflect this change.
      Signed-off-by: NVirgile Jarry <virgile@acceis.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6f86b0f
  6. 02 8月, 2018 2 次提交
  7. 27 7月, 2018 2 次提交
  8. 25 7月, 2018 1 次提交
  9. 24 7月, 2018 1 次提交
  10. 19 7月, 2018 2 次提交
  11. 17 7月, 2018 3 次提交
  12. 12 7月, 2018 3 次提交
  13. 02 7月, 2018 1 次提交
  14. 28 6月, 2018 1 次提交
    • F
      skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252
      Flavio Leitner 提交于
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyways in the receiving path before any relevant action
      and on TX side the netfilter checks if the reference is local before
      use it.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c4c3252
  15. 24 6月, 2018 1 次提交
  16. 23 6月, 2018 4 次提交
  17. 15 6月, 2018 1 次提交
  18. 06 6月, 2018 1 次提交
  19. 05 6月, 2018 4 次提交
  20. 04 6月, 2018 1 次提交
    • B
      xsk: new descriptor addressing scheme · bbff2f32
      Björn Töpel 提交于
      Currently, AF_XDP only supports a fixed frame-size memory scheme where
      each frame is referenced via an index (idx). A user passes the frame
      index to the kernel, and the kernel acts upon the data.  Some NICs,
      however, do not have a fixed frame-size model, instead they have a
      model where a memory window is passed to the hardware and multiple
      frames are filled into that window (referred to as the "type-writer"
      model).
      
      By changing the descriptor format from the current frame index
      addressing scheme, AF_XDP can in the future be extended to support
      these kinds of NICs.
      
      In the index-based model, an idx refers to a frame of size
      frame_size. Addressing a frame in the UMEM is done by offseting the
      UMEM starting address by a global offset, idx * frame_size + offset.
      Communicating via the fill- and completion-rings are done by means of
      idx.
      
      In this commit, the idx is removed in favor of an address (addr),
      which is a relative address ranging over the UMEM. To convert an
      idx-based address to the new addr is simply: addr = idx * frame_size +
      offset.
      
      We also stop referring to the UMEM "frame" as a frame. Instead it is
      simply called a chunk.
      
      To transfer ownership of a chunk to the kernel, the addr of the chunk
      is passed in the fill-ring. Note, that the kernel will mask addr to
      make it chunk aligned, so there is no need for userspace to do
      that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
      3000 to the fill-ring will refer to the same chunk.
      
      On the completion-ring, the addr will match that of the Tx descriptor,
      passed to the kernel.
      
      Changing the descriptor format to use chunks/addr will allow for
      future changes to move to a type-writer based model, where multiple
      frames can reside in one chunk. In this model passing one single chunk
      into the fill-ring, would potentially result in multiple Rx
      descriptors.
      
      This commit changes the uapi of AF_XDP sockets, and updates the
      documentation.
      Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      bbff2f32
  21. 29 5月, 2018 3 次提交
    • S
      virtio_net: Extend virtio to use VF datapath when available · ba5e4426
      Sridhar Samudrala 提交于
      This patch enables virtio_net to switch over to a VF datapath when STANDBY
      feature is enabled and a VF netdev is present with the same MAC address.
      It allows live migration of a VM with a direct attached VF without the need
      to setup a bond/team between a VF and virtio net device in the guest.
      
      It uses the API that is exported by the net_failover driver to create and
      and destroy a master failover netdev. When STANDBY feature is enabled, an
      additional netdev(failover netdev) is created that acts as a master device
      and tracks the state of the 2 lower netdevs. The original virtio_net netdev
      is marked as 'standby' netdev and a passthru device with the same MAC is
      registered as 'primary' netdev.
      
      The hypervisor needs to unplug the VF device from the guest on the source
      host and reset the MAC filter of the VF to initiate failover of datapath
      to virtio before starting the migration. After the migration is completed,
      the destination hypervisor sets the MAC filter on the VF and plugs it back
      to the guest to switch over to VF datapath.
      
      This patch is based on the discussion initiated by Jesse on this thread.
      https://marc.info/?l=linux-virtualization&m=151189725224231&w=2Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba5e4426
    • S
      net: Introduce net_failover driver · cfc80d9a
      Sridhar Samudrala 提交于
      The net_failover driver provides an automated failover mechanism via APIs
      to create and destroy a failover master netdev and manages a primary and
      standby slave netdevs that get registered via the generic failover
      infrastructure.
      
      The failover netdev acts a master device and controls 2 slave devices. The
      original paravirtual interface gets registered as 'standby' slave netdev and
      a passthru/vf device with the same MAC gets registered as 'primary' slave
      netdev. Both 'standby' and 'failover' netdevs are associated with the same
      'pci' device. The user accesses the network interface via 'failover' netdev.
      The 'failover' netdev chooses 'primary' netdev as default for transmits when
      it is available with link up and running.
      
      This can be used by paravirtual drivers to enable an alternate low latency
      datapath. It also enables hypervisor controlled live migration of a VM with
      direct attached VF by failing over to the paravirtual datapath when the VF
      is unplugged.
      Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfc80d9a
    • S
      net: Introduce generic failover module · 30c8bd5a
      Sridhar Samudrala 提交于
      The failover module provides a generic interface for paravirtual drivers
      to register a netdev and a set of ops with a failover instance. The ops
      are used as event handlers that get called to handle netdev register/
      unregister/link change/name change events on slave pci ethernet devices
      with the same mac address as the failover netdev.
      
      This enables paravirtual drivers to use a VF as an accelerated low latency
      datapath. It also allows migration of VMs with direct attached VFs by
      failing over to the paravirtual datapath when the VF is unplugged.
      Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30c8bd5a
  22. 25 5月, 2018 1 次提交
    • E
      ppp: remove the PPPIOCDETACH ioctl · af8d3c7c
      Eric Biggers 提交于
      The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
      before f_count has reached 0, which is fundamentally a bad idea.  It
      does check 'f_count < 2', which excludes concurrent operations on the
      file since they would only be possible with a shared fd table, in which
      case each fdget() would take a file reference.  However, it fails to
      account for the fact that even with 'f_count == 1' the file can still be
      linked into epoll instances.  As reported by syzbot, this can trivially
      be used to cause a use-after-free.
      
      Yet, the only known user of PPPIOCDETACH is pppd versions older than
      ppp-2.4.2, which was released almost 15 years ago (November 2003).
      Also, PPPIOCDETACH apparently stopped working reliably at around the
      same time, when the f_count check was added to the kernel, e.g. see
      https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
      check makes PPPIOCDETACH only work in single-threaded applications; it
      always fails if called from a multithreaded application.
      
      All pppd versions released in the last 15 years just close() the file
      descriptor instead.
      
      Therefore, instead of hacking around this bug by exporting epoll
      internals to modules, and probably missing other related bugs, just
      remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
      a stub in place that prints a one-time warning and returns EINVAL.
      
      Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NGuillaume Nault <g.nault@alphalink.fr>
      Tested-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af8d3c7c
  23. 18 5月, 2018 2 次提交