1. 15 Jun, 2016 2 commits
    • udp reuseport: fix packet of same flow hashed to different socket · d1e37288
      Su, Xuemin committed
      There is a corner case in which udp packets belonging to the same
      flow are hashed to different sockets when hslot->count changes from 10
      to 11:
      
      1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
      and always passes 'daddr' to udp_ehashfn().
      
      2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
      but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
      INADDR_ANY instead of some specific addr.
      
      That means when hslot->count changes from 10 to 11, the hash calculated by
      udp_ehashfn() also changes, and udp packets belonging to the same
      flow will be hashed to a different socket.
      
      This is easily reproduced:
      1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
      2) From the same host send udp packets to 127.0.0.1:40000, record the
      socket index which receives the packets.
      3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
      is 40000 + UDP_HASH_SIZE (4096), which puts the new socket into the
      same hslot as the aforementioned 10 sockets and changes hslot->count
      from 10 to 11.
      4) From the same host, send udp packets to 127.0.0.1:40000 again; the
      socket index that receives the packets will differ from the one recorded
      in step 2.
      This should not happen as the socket bound to 0.0.0.0:44096 should not
      change the behavior of the sockets bound to 0.0.0.0:40000.
      
      It's the same case for IPv6, and this patch also fixes that.
      Signed-off-by: Su, Xuemin <suxm@chinanetcenter.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
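      The corner case can be illustrated with a small user-space sketch. It is not
      the kernel code. toy_ehashfn() is a hypothetical stand-in for the kernel's
      jhash-based udp_ehashfn(). The lookup is simplified so that the only
      difference between the two paths is the local address fed to the hash.
      scale_to() uses the same multiply-and-shift as the kernel's
      reciprocal_scale(). group_size is the 10 sockets bound to port 40000 in
      the reproduction above.

```c
#include <stdint.h>

/* Toy stand-in for udp_ehashfn(); NOT the kernel's jhash-based hash.
 * Every step is a bijection on 32-bit values, so for fixed ports and
 * faddr a different laddr always yields a different hash. */
static uint32_t toy_ehashfn(uint32_t laddr, uint16_t lport,
                            uint32_t faddr, uint16_t fport)
{
    uint32_t h = laddr * 0x9E3779B1u;          /* odd multiplier */
    h ^= ((uint32_t)lport << 16) | fport;
    h ^= faddr * 0x85EBCA6Bu;
    h ^= h >> 16;
    h *= 0xC2B2AE35u;                           /* odd multiplier */
    h ^= h >> 13;
    return h;
}

/* Maps a 32-bit hash onto [0, n) with a multiply and shift, the same
 * trick as the kernel's reciprocal_scale() helper. */
static uint32_t scale_to(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Simplified pre-fix selection: once the hslot holds more than 10
 * sockets, the hash2 path for wildcard-bound sockets hashes INADDR_ANY
 * (0) instead of the packet's destination address. */
static uint32_t pick_before_fix(uint32_t hslot_count, uint32_t daddr,
                                uint16_t dport, uint32_t saddr,
                                uint16_t sport, uint32_t group_size)
{
    uint32_t laddr = hslot_count > 10 ? 0u /* INADDR_ANY */ : daddr;
    return scale_to(toy_ehashfn(laddr, dport, saddr, sport), group_size);
}

/* Simplified post-fix selection: always hash the packet's daddr, so the
 * chosen socket no longer depends on hslot->count. */
static uint32_t pick_after_fix(uint32_t hslot_count, uint32_t daddr,
                               uint16_t dport, uint32_t saddr,
                               uint16_t sport, uint32_t group_size)
{
    (void)hslot_count;
    return scale_to(toy_ehashfn(daddr, dport, saddr, sport), group_size);
}
```

      With the toy hash, the pre-fix index for count 10 and count 11 can
      coincide by chance. The hashes themselves always differ, which is the
      root of the bug.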
    • ipv4: fix checksum annotation in udp4_csum_init · b46d9f62
      Hannes Frederic Sowa committed
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Fixes: 4068579e ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6")
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Jun, 2016 1 commit
  3. 11 Jun, 2016 1 commit
  4. 03 Jun, 2016 1 commit
  5. 24 May, 2016 1 commit
    • ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n · 049bbf58
      Ezequiel Garcia committed
      Commit fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      moves the default TTL assignment, and as a side effect IPv4 TTL now
      has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).
      
      The sysctl_ip_default_ttl is fundamental for IP to work properly,
      as it provides the TTL to be used as default. The default TTL may be
      used in ip_select_ttl, through the following flow:
      
        ip_select_ttl
          ip4_dst_hoplimit
            net->ipv4.sysctl_ip_default_ttl
      
      This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
      in net_init_net, called during ipv4's initialization.
      
      Without this commit, a kernel built without sysctl support will send
      all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
      with setsockopt).
      
      Given that a similar issue might appear on the other knobs that were
      namespaceified, this commit moves them too.
      
      Fixes: fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
      Signed-off-by: David S. Miller <davem@davemloft.net>
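      The failure mode and the fix can be modelled in a few lines. This is a
      minimal sketch and not the kernel's code. The struct and function names
      (net_sketch, inet_init_net_sketch, ip_select_ttl_sketch) are hypothetical.
      It assumes a zero-initialised per-namespace struct and the
      ip4_dst_hoplimit rule that a route hop-limit metric of 0 means "use the
      namespace default".

```c
#define SKETCH_IPDEFTTL 64   /* same value as the kernel's IPDEFTTL */

/* Hypothetical, heavily simplified per-namespace state. */
struct netns_ipv4_sketch { int sysctl_ip_default_ttl; };
struct net_sketch { struct netns_ipv4_sketch ipv4; };

/* Per-namespace IPv4 init: runs whether or not CONFIG_SYSCTL is set, so
 * the default is always assigned. Sysctl registration (built only with
 * CONFIG_SYSCTL=y) should merely expose the knob, not initialise it. */
static void inet_init_net_sketch(struct net_sketch *net)
{
    net->ipv4.sysctl_ip_default_ttl = SKETCH_IPDEFTTL;
}

/* Follows the ip_select_ttl -> ip4_dst_hoplimit flow: a route hop-limit
 * metric of 0 means "not set", so fall back to the namespace default. */
static int ip_select_ttl_sketch(const struct net_sketch *net,
                                int route_hoplimit)
{
    return route_hoplimit ? route_hoplimit
                          : net->ipv4.sysctl_ip_default_ttl;
}
```

      If the init function never runs, the zeroed default yields TTL 0. That is
      the symptom described above for CONFIG_SYSCTL=n.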
  6. 21 May, 2016 9 commits
  7. 17 May, 2016 2 commits
  8. 15 May, 2016 1 commit
    • net/route: enforce hoplimit max value · 626abd59
      Paolo Abeni committed
      Currently, when creating or updating a route, neither the ipv4 nor the
      ipv6 code checks the hoplimit value.
      
      The caller can, e.g., set hoplimit to 256, and when such a route is
      used, packets will be sent with a hoplimit/ttl equal to 0.
      
      This commit adds checks for the RTAX_HOPLIMIT value in both the ipv4
      and ipv6 route code, replacing any value greater than 255 with 255.
      
      This is consistent with what is currently done for ADVMSS and MTU
      in the ipv4 code.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
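      A hoplimit of 256 most likely becomes 0 because the IPv4 TTL and IPv6
      hop-limit header fields are 8 bits wide. Only the low byte of the metric
      survives. The following is a minimal sketch of that truncation and of the
      clamp this commit describes. clamp_hoplimit() and header_ttl() are
      hypothetical helpers, not the kernel's route code.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a user-supplied RTAX_HOPLIMIT metric to the
 * largest value an 8-bit TTL / hop-limit field can hold. */
static uint32_t clamp_hoplimit(uint32_t val)
{
    return val > 255 ? 255 : val;
}

/* Models writing the metric into the 8-bit header field: the conversion
 * keeps only the low 8 bits, so 256 becomes 0. */
static uint8_t header_ttl(uint32_t metric)
{
    return (uint8_t)metric;
}
```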
  9. 13 May, 2016 2 commits
    • udp: Resolve NULL pointer dereference over flow-based vxlan device · ed7cbbce
      Alexander Duyck committed
      While testing an OpenStack configuration using VXLANs I saw the following
      call trace:
      
       RIP: 0010:[<ffffffff815fad49>] udp4_lib_lookup_skb+0x49/0x80
       RSP: 0018:ffff88103867bc50  EFLAGS: 00010286
       RAX: ffff88103269bf00 RBX: ffff88103269bf00 RCX: 00000000ffffffff
       RDX: 0000000000004300 RSI: 0000000000000000 RDI: ffff880f2932e780
       RBP: ffff88103867bc60 R08: 0000000000000000 R09: 000000009001a8c0
       R10: 0000000000004400 R11: ffffffff81333a58 R12: ffff880f2932e794
       R13: 0000000000000014 R14: 0000000000000014 R15: ffffe8efbfd89ca0
       FS:  0000000000000000(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000488 CR3: 0000000001c06000 CR4: 00000000001426e0
       Stack:
        ffffffff81576515 ffffffff815733c0 ffff88103867bc98 ffffffff815fcc17
        ffff88103269bf00 ffffe8efbfd89ca0 0000000000000014 0000000000000080
        ffffe8efbfd89ca0 ffff88103867bcc8 ffffffff815fcf8b ffff880f2932e794
       Call Trace:
        [<ffffffff81576515>] ? skb_checksum+0x35/0x50
        [<ffffffff815733c0>] ? skb_push+0x40/0x40
        [<ffffffff815fcc17>] udp_gro_receive+0x57/0x130
        [<ffffffff815fcf8b>] udp4_gro_receive+0x10b/0x2c0
        [<ffffffff81605863>] inet_gro_receive+0x1d3/0x270
        [<ffffffff81589e59>] dev_gro_receive+0x269/0x3b0
        [<ffffffff8158a1b8>] napi_gro_receive+0x38/0x120
        [<ffffffffa0871297>] gro_cell_poll+0x57/0x80 [vxlan]
        [<ffffffff815899d0>] net_rx_action+0x160/0x380
        [<ffffffff816965c7>] __do_softirq+0xd7/0x2c5
        [<ffffffff8107d969>] run_ksoftirqd+0x29/0x50
        [<ffffffff8109a50f>] smpboot_thread_fn+0x10f/0x160
        [<ffffffff8109a400>] ? sort_range+0x30/0x30
        [<ffffffff81096da8>] kthread+0xd8/0xf0
        [<ffffffff81693c82>] ret_from_fork+0x22/0x40
        [<ffffffff81096cd0>] ? kthread_park+0x60/0x60
      
      This trace is seen when receiving a DHCP request over a flow-based
      VXLAN tunnel. I believe this is caused by the metadata dst having a NULL
      dev value; as a result, dev_net(dev) causes a NULL pointer dereference.
      
      To resolve this I am replacing the check for skb_dst(skb)->dev with just
      skb->dev.  This makes sense as the callers of this function are usually in
      the receive path and as such skb->dev should always be populated.  In
      addition other functions in the area where these are called are already
      using dev_net(skb->dev) to determine the namespace the UDP packet belongs
      in.
      
      Fixes: 63058308 ("udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb")
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
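      The change can be pictured with simplified stand-in structures. These are
      hypothetical, mirror only the fields involved, and are not the kernel's
      definitions. For a metadata dst, dst->dev is NULL, so deriving the
      namespace through the dst crashes. skb->dev is populated on the receive
      path and gives the same answer for an ordinary route.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct dst_entry { struct net_device *dev; };   /* NULL for metadata dst */
struct sk_buff { struct net_device *dev; struct dst_entry *dst; };

static struct net *dev_net_sketch(const struct net_device *dev)
{
    return dev->nd_net;
}

/* Before the fix: namespace taken from skb_dst(skb)->dev. Dereferences
 * NULL when the skb carries a metadata dst. */
static struct net *lookup_net_before(const struct sk_buff *skb)
{
    return dev_net_sketch(skb->dst->dev);
}

/* After the fix: namespace taken from skb->dev, which the receive path
 * always sets. */
static struct net *lookup_net_after(const struct sk_buff *skb)
{
    return dev_net_sketch(skb->dev);
}
```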
    • gre: Fix wrong tpi->proto in WCCP · da73b4e9
      Haishuang Yan committed
      When dealing with WCCP in a gre6 tunnel, it sets the wrong tpi->proto,
      that is, ETH_P_IP instead of ETH_P_IPV6 for the encapsulated traffic.
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
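      For WCCP the GRE protocol field holds ETH_P_WCCP (0x883E) rather than
      the inner packet's ethertype, so the decoder has to choose the inner
      protocol itself. The following is a minimal sketch that assumes the
      choice follows the tunnel's address family, as the commit describes for
      gre6. wccp_inner_proto() and the SKETCH_* constants are hypothetical.

```c
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

#define SKETCH_ETH_P_IP   0x0800   /* IPv4 ethertype */
#define SKETCH_ETH_P_IPV6 0x86DD   /* IPv6 ethertype */

/* Hypothetical helper: pick the ethertype of the WCCP-encapsulated packet
 * from the tunnel family instead of hard-coding IPv4. */
static int wccp_inner_proto(int tunnel_family)
{
    return tunnel_family == AF_INET6 ? SKETCH_ETH_P_IPV6 : SKETCH_ETH_P_IP;
}
```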
  10. 12 May, 2016 4 commits
    • net: original ingress device index in PKTINFO · 0b922b7a
      David Ahern committed
      Applications such as OSPF and BFD need the original ingress device, not
      the VRF device; the latter can be derived from the former. To that end,
      add skb_iif to inet_skb_parm and set it in the ipv4 code after clearing
      the skb control buffer, as IPv6 already does. From there, pktinfo can
      simply pull it from the cb via the PKTINFO_SKB_CB cast.
      
      The previous patch moving the skb->dev change to L3 means nothing else
      is needed for IPv6; it just works.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
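      On the application side, an OSPF or BFD daemon reads this ifindex from the
      IP_PKTINFO control message returned by recvmsg(). To our reading of the
      commit, that ifindex is the original ingress device rather than the VRF
      device. The following is a minimal sketch of the parsing step.
      pktinfo_ifindex() is a hypothetical helper. It assumes glibc, which
      exposes struct in_pktinfo under _GNU_SOURCE. It also assumes the socket
      enabled the option with setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &one,
      sizeof(one)).

```c
#define _GNU_SOURCE          /* glibc declares struct in_pktinfo under this */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Return the ingress ifindex carried in an IP_PKTINFO control message of
 * a message filled in by recvmsg(), or -1 if none is present. */
static int pktinfo_ifindex(struct msghdr *msg)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            /* Copy out rather than cast, to avoid unaligned access. */
            memcpy(&pi, CMSG_DATA(cm), sizeof(pi));
            return pi.ipi_ifindex;
        }
    }
    return -1;
}
```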
    • net: l3mdev: Add hook in ip and ipv6 · 74b20582
      David Ahern committed
      Currently the VRF driver uses the rx_handler to switch the skb device
      to the VRF device. Switching the dev prior to the ip / ipv6 layer
      means the VRF driver has to duplicate IP/IPv6 processing which adds
      overhead and makes features such as retaining the ingress device index
      more complicated than necessary.
      
      This patch moves the hook to the L3 layer just after the first NF_HOOK
      for PRE_ROUTING. This location makes exposing the original ingress device
      trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
      in the future.
      
      dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
      with the switched device through the packet taps to maintain current
      behavior (tcpdump can be used on either the vrf device or the enslaved
      devices).
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: do not keep the GRE header around in collect medata mode · e271c7b4
      Jiri Benc committed
      For ipgre interface in collect metadata mode, it doesn't make sense for the
      interface to be of ARPHRD_IPGRE type. The outer header of received packets
      is not needed, as all the information from it is present in metadata_dst. We
      already don't set ipgre_header_ops for collect metadata interfaces, which is
      the only consumer of mac_header pointing to the outer IP header.
      
      Just set the interface type to ARPHRD_NONE in collect metadata mode for
      ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset
      mac_header.
      
      Fixes: a64b04d8 ("gre: do not assign header_ops in collect metadata mode")
      Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
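      The device-type rule in this commit message fits in one small function.
      The following is a minimal sketch, and gre_dev_type() is a hypothetical
      helper. The ARPHRD_* constants come from glibc's <net/if_arp.h>. The
      mac_header reset described above is not modelled.

```c
#include <stdbool.h>
#include <net/if_arp.h>   /* ARPHRD_ETHER, ARPHRD_IPGRE, ARPHRD_NONE */

/* Hypothetical helper summarising the rule from the commit message:
 * gretap always carries Ethernet frames; plain ipgre in collect metadata
 * mode keeps no outer header, so it reports ARPHRD_NONE. */
static unsigned short gre_dev_type(bool is_gretap, bool collect_md)
{
    if (is_gretap)
        return ARPHRD_ETHER;
    return collect_md ? ARPHRD_NONE : ARPHRD_IPGRE;
}
```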
    • tcp: replace cnt & rtt with struct in pkts_acked() · 756ee172
      Lawrence Brakmo committed
      Replace 2 arguments (cnt and rtt) in the congestion control modules'
      pkts_acked() function with a struct. This will allow adding more
      information without having to modify existing congestion control
      modules (tcp_nv in particular needs bytes in flight when packet
      was sent).
      
      As proposed by Neal Cardwell in his comments on the tcp_nv patch.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
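      Passing a struct instead of separate arguments means a new field can be
      appended without changing every module's callback signature. The
      following is a minimal user-space sketch of that pattern. The type, field
      and function names are hypothetical and not the kernel's tcp_congestion_ops
      definitions. in_flight stands in for the "bytes in flight when packet was
      sent" that tcp_nv needs.

```c
#include <stdint.h>

/* Hypothetical per-ACK sample; fields can be appended over time. */
struct ack_sample_sketch {
    uint32_t pkts_acked;
    int32_t  rtt_us;      /* <= 0 means no valid RTT sample */
    uint32_t in_flight;   /* bytes in flight when the packet was sent */
};

/* Hypothetical congestion-control ops table with one callback. */
struct cc_ops_sketch {
    void (*pkts_acked)(void *state, const struct ack_sample_sketch *sample);
};

/* An RTT-only module: unaffected when in_flight is added to the struct. */
struct rtt_state { int32_t min_rtt_us; };

static void rtt_pkts_acked(void *state, const struct ack_sample_sketch *s)
{
    struct rtt_state *st = state;
    if (s->rtt_us > 0 && (st->min_rtt_us == 0 || s->rtt_us < st->min_rtt_us))
        st->min_rtt_us = s->rtt_us;
}

/* A tcp_nv-like module that also reads the newer field. */
struct nv_state { uint32_t last_in_flight; };

static void nv_pkts_acked(void *state, const struct ack_sample_sketch *s)
{
    struct nv_state *st = state;
    st->last_in_flight = s->in_flight;
}

static const struct cc_ops_sketch rtt_ops = { rtt_pkts_acked };
static const struct cc_ops_sketch nv_ops  = { nv_pkts_acked };
```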
  11. 11 May, 2016 1 commit
  12. 10 May, 2016 1 commit
  13. 07 May, 2016 4 commits
  14. 06 May, 2016 2 commits
  15. 05 May, 2016 8 commits