1. 16 11月, 2015 4 次提交
    • E
      tcp: ensure proper barriers in lockless contexts · 00fd38d9
      Eric Dumazet 提交于
      Some functions access TCP sockets without holding a lock and
      might output non consistent data, depending on compiler and or
      architecture.
      
      tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
      
      Introduce sk_state_load() and sk_state_store() to fix the issues,
      and more clearly document where this lack of locking is happening.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00fd38d9
    • M
      ipv6: Check rt->dst.from for the DST_NOCACHE route · 02bcf4e0
      Martin KaFai Lau 提交于
      All DST_NOCACHE rt6_info used to have rt->dst.from set to
      its parent.
      
      After commit 8e3d5be7 ("ipv6: Avoid double dst_free"),
      DST_NOCACHE is also set to rt6_info which does not have
      a parent (i.e. rt->dst.from is NULL).
      
      This patch catches the rt->dst.from == NULL case.
      
      Fixes: 8e3d5be7 ("ipv6: Avoid double dst_free")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02bcf4e0
    • M
      ipv6: Check expire on DST_NOCACHE route · 5973fb1e
      Martin KaFai Lau 提交于
      Since the expires of the DST_NOCACHE rt can be set during
      the ip6_rt_update_pmtu(), we also need to consider the expires
      value when doing ip6_dst_check().
      
      This patches creates __rt6_check_expired() to only
      check the expire value (if one exists) of the current rt.
      
      In rt6_dst_from_check(), it adds __rt6_check_expired() as
      one of the condition check.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5973fb1e
    • M
      ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree · 0d3f6d29
      Martin KaFai Lau 提交于
      The original bug report:
      https://bugzilla.redhat.com/show_bug.cgi?id=1272571
      
      The setup has a IPv4 GRE tunnel running in a IPSec.  The bug
      happens when ndisc starts sending router solicitation at the gre
      interface.  The simplified oops stack is like:
      
      __lock_acquire+0x1b2/0x1c30
      lock_acquire+0xb9/0x140
      _raw_write_lock_bh+0x3f/0x50
      __ip6_ins_rt+0x2e/0x60
      ip6_ins_rt+0x49/0x50
      ~~~~~~~~
      __ip6_rt_update_pmtu.part.54+0x145/0x250
      ip6_rt_update_pmtu+0x2e/0x40
      ~~~~~~~~
      ip_tunnel_xmit+0x1f1/0xf40
      __gre_xmit+0x7a/0x90
      ipgre_xmit+0x15a/0x220
      dev_hard_start_xmit+0x2bd/0x480
      __dev_queue_xmit+0x696/0x730
      dev_queue_xmit+0x10/0x20
      neigh_direct_output+0x11/0x20
      ip6_finish_output2+0x21f/0x770
      ip6_finish_output+0xa7/0x1d0
      ip6_output+0x56/0x190
      ~~~~~~~~
      ndisc_send_skb+0x1d9/0x400
      ndisc_send_rs+0x88/0xc0
      ~~~~~~~~
      
      The rt passed to ip6_rt_update_pmtu() is created by
      icmp6_dst_alloc() and it is not managed by the fib6 tree,
      so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
      a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
      and it causes the ip6_ins_rt() oops.
      
      During pmtu update, we only want to create a RTF_CACHE clone
      from a rt which is currently managed (or owned) by the
      fib6 tree.  It means either rt->rt6i_node != NULL or
      rt is a RTF_PCPU clone.
      
      It is worth to note that rt6i_table may not be NULL even it is
      not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
      Hence, rt6i_node is a better check instead of rt6i_table.
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reported-by: NChris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d3f6d29
  2. 06 11月, 2015 2 次提交
  3. 05 11月, 2015 1 次提交
  4. 03 11月, 2015 5 次提交
  5. 02 11月, 2015 2 次提交
  6. 30 10月, 2015 1 次提交
  7. 29 10月, 2015 2 次提交
  8. 28 10月, 2015 1 次提交
  9. 27 10月, 2015 1 次提交
  10. 23 10月, 2015 3 次提交
  11. 22 10月, 2015 4 次提交
    • D
      net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set · d46a9d67
      David Ahern 提交于
      741a11d9 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
      adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
      oif is given. Hajime reported that this change breaks the Mobile IPv6
      use case that wants to force the message through one interface yet use
      the source address from another interface. Handle this case by only
      adding the flag if oif is set and saddr is not set.
      
      Fixes: 741a11d9 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
      Cc: Hajime Tazaki <thehajime@gmail.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d46a9d67
    • E
      ipv6: gro: support sit protocol · feec0cb3
      Eric Dumazet 提交于
      Tom Herbert added SIT support to GRO with commit
      19424e05 ("sit: Add gro callbacks to sit_offload"),
      later reverted by Herbert Xu.
      
      The problem came because Tom patch was building GRO
      packets without proper meta data : If packets were locally
      delivered, we would not care.
      
      But if packets needed to be forwarded, GSO engine was not
      able to segment individual segments.
      
      With the following patch, we correctly set skb->encapsulation
      and inner network header. We also update gso_type.
      
      Tested:
      
      Server :
      netserver
      modprobe dummy
      ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
      arp -s 8.0.0.100 4e:32:51:04:47:e5
      iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
      ifconfig sixtofour0
      sixtofour0 Link encap:IPv6-in-IPv4
                inet6 addr: 2002:af6:798::1/128 Scope:Global
                inet6 addr: 2002:af6:798::/128 Scope:Global
                UP RUNNING NOARP  MTU:1480  Metric:1
                RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
                TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:0
                RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)
      
      Client :
      netperf -H 2002:af6:798::1 -l 1000 &
      
      Checked on server traffic copied on dummy0 and verify segments were
      properly rebuilt, with proper IP headers, TCP checksums...
      
      tcpdump on eth0 shows proper GRO aggregation takes place.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feec0cb3
    • A
      netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0
      Arad, Ronen 提交于
      if_nlmsg_size() overestimates the minimum allocation size of netlink
      dump request (when called from rtnl_calcit()) or the size of the
      message (when called from rtnl_getlink()). This is because
      ext_filter_mask is not supported by rtnl_link_get_af_size() and
      rtnl_link_get_size().
      
      The over-estimation is significant when at least one netdev has many
      VLANs configured (8 bytes for each configured VLAN).
      
      This patch-set "rightsizes" the protocol specific attribute size
      calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
      and adding this a argument to get_link_af_size op in rtnl_af_ops.
      
      Bridge module already used filtering aware sizing for notifications.
      br_get_link_af_size_filtered() is consistent with the modified
      get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
      br_get_link_af_size() becomes unused and thus removed.
      Signed-off-by: NRonen Arad <ronen.arad@intel.com>
      Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1974ed0
    • D
      net: Really fix vti6 with oif in dst lookups · f1900fb5
      David Ahern 提交于
      6e28b000 ("net: Fix vti use case with oif in dst lookups for IPv6")
      is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.
      
      Fixes: 42a7b32b ("xfrm: Add oif to dst lookups")
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1900fb5
  12. 19 10月, 2015 2 次提交
    • S
      xfrm: Fix pmtu discovery for local generated packets. · ca064bd8
      Steffen Klassert 提交于
      Commit 044a832a ("xfrm: Fix local error reporting crash
      with interfamily tunnels") moved the setting of skb->protocol
      behind the last access of the inner mode family to fix an
      interfamily crash. Unfortunately now skb->protocol might not
      be set at all, so we fail dispatch to the inner address family.
      As a reault, the local error handler is not called and the
      mtu value is not reported back to userspace.
      
      We fix this by setting skb->protocol on message size errors
      before we call xfrm_local_error.
      
      Fixes: 044a832a ("xfrm: Fix local error reporting crash with interfamily tunnels")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      ca064bd8
    • E
      tcp: do not set queue_mapping on SYNACK · dc6ef6be
      Eric Dumazet 提交于
      At the time of commit fff32699 ("tcp: reflect SYN queue_mapping into
      SYNACK packets") we had little ways to cope with SYN floods.
      
      We no longer need to reflect incoming skb queue mappings, and instead
      can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
      affinities.
      
      Note that all SYNACK retransmits were picking TX queue 0, this no longer
      is a win given that SYNACK rtx are now distributed on all cpus.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc6ef6be
  13. 17 10月, 2015 1 次提交
  14. 16 10月, 2015 3 次提交
  15. 14 10月, 2015 3 次提交
  16. 13 10月, 2015 5 次提交