1. 11 Jun, 2009 1 commit
    • net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Eric Dumazet committed
      One of the problems with sock memory accounting is that it uses
      a pair of sock_hold()/sock_put() calls for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on the socket and might run on a different
      CPU than the transmit path or the transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be among the top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement the initial offset and atomically check if any packets
      are in flight.
      
      skb_set_owner_w() doesn't call sock_hold() anymore.
      
      sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      The drawback is that an skb->truesize error could lead to unfreeable sockets, or
      even worse, to prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s
      on my 8-CPU dev machine, even though tbench was not really hitting the sk_refcnt
      contention point. There is a 5% speedup on a UDP transmit workload (depending
      on the number of flows), with lower TX completion CPU usage.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
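
The accounting scheme described in this commit message can be sketched in user space with C11 atomics. This is a minimal illustration under stated assumptions, not the kernel code: `struct toy_sock` and every `toy_*` function are hypothetical stand-ins for `struct sock`, `skb_set_owner_w()`, `sock_wfree()`, `sk_free()` and `__sk_free()`, and the counter is a plain `atomic_int`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct sock; only the fields this sketch needs. */
struct toy_sock {
    atomic_int wmem_alloc; /* bytes in flight, plus a bias of 1 while the owner holds the socket */
    bool freed;            /* set once the final free has run */
};

/* Final teardown; plays the role the message assigns to __sk_free(). */
static void toy_sk_final_free(struct toy_sock *sk)
{
    sk->freed = true;
}

/* Init: start the counter at 1 instead of 0 (the one-unit offset). */
static void toy_sk_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);
    sk->freed = false;
}

/* Transmit: charge the packet to the socket; no reference count is taken. */
static void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
    assert(truesize > 0);
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge; whoever brings the counter to 0 frees the socket. */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
    /* atomic_fetch_sub returns the old value, so old == truesize means new == 0. */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        toy_sk_final_free(sk);
}

/* Owner release: drop the initial bias; free now only if nothing is in flight. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        toy_sk_final_free(sk);
}
```

In this sketch, calling `toy_sk_free()` while packets are still charged leaves the socket alive, and the last `toy_wfree()` performs the final free. It also shows why a wrong `truesize` is dangerous: the counter would either never reach 0 or reach it too early.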
  2. 09 Jun, 2009 2 commits
  3. 08 Jun, 2009 1 commit
    • netfilter: nf_ct_icmp: keep the ICMP ct entries longer · f87fb666
      Jan Kasprzak committed
      Current conntrack code kills the ICMP conntrack entry as soon as
      the first reply is received. This is incorrect, as we then see only
      the first ICMP echo reply out of several possible duplicates as
      ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
      increases the conntrackd traffic on H-A firewalls.
      
      Make all the ICMP conntrack entries (including the replied ones)
      last for the default of nf_conntrack_icmp{,v6}_timeout seconds.
      Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
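
The behaviour this commit message asks for can be illustrated with a toy model: a reply refreshes the entry's timeout instead of destroying the entry, so duplicate echo replies arriving within the timeout still match. All names here (`struct toy_icmp_ct`, `toy_icmp_reply`, the state enum) are hypothetical, time is a plain counter in seconds, and none of this is the netfilter code.

```c
#include <stdbool.h>

/* Toy conntrack entry for one ICMP echo exchange. */
struct toy_icmp_ct {
    bool present;          /* entry exists in the (imaginary) table */
    unsigned long expires; /* absolute time at which the entry times out */
};

enum toy_state { TOY_INVALID, TOY_ESTABLISHED };

/*
 * Classify an echo reply seen at time `now`. A live entry is refreshed
 * for another `timeout` seconds rather than being killed on first reply.
 */
static enum toy_state toy_icmp_reply(struct toy_icmp_ct *ct,
                                     unsigned long now,
                                     unsigned long timeout)
{
    if (!ct->present || now >= ct->expires) {
        ct->present = false; /* expired or never existed: no match */
        return TOY_INVALID;
    }
    ct->expires = now + timeout; /* keep the entry for later duplicates */
    return TOY_ESTABLISHED;
}
```

With an entry that is still live, a first reply and a duplicate shortly after both classify as `TOY_ESTABLISHED`; only a reply arriving after the timeout has elapsed is `TOY_INVALID`.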
  4. 04 Jun, 2009 1 commit
  5. 03 Jun, 2009 2 commits
    • net: skb->dst accessors · adf30907
      Eric Dumazet committed
      Define three accessors to get/set the dst attached to an skb:
      
      struct dst_entry *skb_dst(const struct sk_buff *skb)
      
      void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
      
      void skb_dst_drop(struct sk_buff *skb)
      This one should replace occurrences of:
      dst_release(skb->dst);
      skb->dst = NULL;
      
      Delete skb->dst field
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
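
A minimal user-space sketch of the three accessors follows, to show how call sites stop touching the field directly. The types `struct toy_dst` and `struct toy_skb`, the field name `dst_priv`, and the simplified non-atomic `toy_dst_release()` are assumptions made for illustration; the kernel's `struct sk_buff` and `dst_release()` differ.

```c
#include <assert.h>
#include <stddef.h>

/* Toy route cache entry with a plain (non-atomic) reference count. */
struct toy_dst {
    int refcnt;
};

/* Toy packet buffer; callers are expected to use the accessors below. */
struct toy_skb {
    struct toy_dst *dst_priv; /* hypothetical name for the hidden field */
};

/* Drop one reference; NULL is accepted so callers need no check. */
static void toy_dst_release(struct toy_dst *dst)
{
    if (dst) {
        assert(dst->refcnt > 0);
        dst->refcnt--;
    }
}

/* Getter: read the dst attached to the packet. */
static struct toy_dst *toy_skb_dst(const struct toy_skb *skb)
{
    return skb->dst_priv;
}

/* Setter: attach a dst; the caller's reference is handed over to the skb. */
static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
    skb->dst_priv = dst;
}

/* Drop: release the reference and clear the pointer in one step. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
    toy_dst_release(skb->dst_priv);
    skb->dst_priv = NULL;
}
```

Funnelling every access through three functions is what makes it possible to remove or change the underlying field later without editing each call site again.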
    • netfilter: conntrack: simplify event caching system · 17e6e4ea
      Pablo Neira Ayuso committed
      This patch simplifies the conntrack event caching system by removing
      several events:
      
       * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
         since they have no clients.
       * IPCT_COUNTER_FILLING, which is a leftover from the 32-bit counter
         days.
       * IPCT_REFRESH, which is of no use since we always include the
         timeout in the messages.
      
      After this patch, the existing events are:
      
       * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which identify the
         addition and deletion of entries.
       * IPCT_STATUS, which notes that the status bits have changed,
         e.g. IPS_SEEN_REPLY and IPS_ASSURED.
       * IPCT_PROTOINFO, which reports that internal protocol information has
         changed, e.g. the TCP, DCCP and SCTP protocol state.
       * IPCT_HELPER, which reports that a helper has been assigned to or
         unassigned from this entry.
       * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed;
         this covers the case when a mark is set to zero.
       * IPCT_NATSEQADJ, which reports that there are updates in the NAT
         sequence adjustment.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
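
As a rough illustration of an event cache over the remaining events, the sketch below accumulates event bits and delivers them in one batch. The `TOY_` enum names mirror the events listed in the message, but the bit positions, `struct toy_ct_event_cache` and both functions are hypothetical and do not reproduce the kernel's definitions or delivery path.

```c
/* Events kept by the patch; the bit positions here are made up. */
enum toy_ct_event {
    TOY_IPCT_NEW       = 1u << 0,
    TOY_IPCT_RELATED   = 1u << 1,
    TOY_IPCT_DESTROY   = 1u << 2,
    TOY_IPCT_STATUS    = 1u << 3,
    TOY_IPCT_PROTOINFO = 1u << 4,
    TOY_IPCT_HELPER    = 1u << 5,
    TOY_IPCT_MARK      = 1u << 6,
    TOY_IPCT_SECMARK   = 1u << 7,
    TOY_IPCT_NATSEQADJ = 1u << 8
};

/* Pending events for one connection, stored as a bitmask. */
struct toy_ct_event_cache {
    unsigned int events;
};

/* Record an event; repeating the same event does not grow the cache. */
static void toy_cache_event(struct toy_ct_event_cache *cache, unsigned int ev)
{
    cache->events |= ev;
}

/* Hand back every pending event at once and reset the cache. */
static unsigned int toy_deliver_cached_events(struct toy_ct_event_cache *cache)
{
    unsigned int pending = cache->events;

    cache->events = 0;
    return pending;
}
```

Because events are bits in a mask, several changes to the same connection collapse into a single notification carrying the union of what changed.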
  6. 02 Jun, 2009 1 commit
  7. 01 Jun, 2009 1 commit
  8. 27 May, 2009 1 commit
  9. 22 May, 2009 1 commit
  10. 21 May, 2009 2 commits
    • IPv6: set RTPROT_KERNEL to initial route · 4f724279
      Jean-Mickael Guerin committed
      The use of an unspecified protocol in the IPv6 initial route prevents quagga
      from installing an IPv6 default route:
      # show ipv6 route
      S   ::/0 [1/0] via fe80::1, eth1_0
      K>* ::/0 is directly connected, lo, rej
      C>* ::1/128 is directly connected, lo
      C>* fe80::/64 is directly connected, eth1_0
      
      # ip -6 route
      fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
      hoplimit -1
      ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
      unreachable default dev lo  proto none  metric -1  error -101 hoplimit 255
      
      The attached patch sets RTPROT_KERNEL on the default initial route
      and fixes the problem for quagga.
      This is similar to "ipv6: protocol for address routes"
      (f410a1fb).
      
      # show ipv6 route
      S>* ::/0 [1/0] via fe80::1, eth1_0
      C>* ::1/128 is directly connected, lo
      C>* fe80::/64 is directly connected, eth1_0
      
      # ip -6 route
      fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
      hoplimit -1
      fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
      hoplimit -1
      ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
      default via fe80::1 dev eth1_0  proto zebra  metric 1024  mtu 1500
      advmss 1440 hoplimit -1
      unreachable default dev lo  proto kernel  metric -1  error -101 hoplimit 255
      Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove unused parameter from fill method in fib_rules_ops. · 04af8cf6
      Rami Rosen committed
      The netlink message header (struct nlmsghdr) is an unused parameter in
      the fill method of the fib_rules_ops struct.  This patch removes this
      parameter from the method and fixes the places where the method is
      called.
      
      (include/net/fib_rules.h)
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 20 May, 2009 5 commits
  12. 19 May, 2009 1 commit
  13. 18 May, 2009 2 commits
  14. 08 May, 2009 10 commits
  15. 05 May, 2009 1 commit
  16. 29 Apr, 2009 1 commit
  17. 27 Apr, 2009 2 commits
  18. 20 Apr, 2009 1 commit
  19. 14 Apr, 2009 1 commit
  20. 11 Apr, 2009 1 commit
    • ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7
      Vlad Yasevich committed
      Commit b2f5e7cd
      (ipv6: Fix conflict resolutions during ipv6 binding)
      introduced a regression where time-wait sockets were
      not treated correctly.  This resulted in the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
      IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
      ...
      Call Trace:
      [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
      [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
      [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
      [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
      [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
      [<ffffffff8056ed49>] sys_bind+0x89/0x100
      [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
      [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
      Tested-by: Brian Haley <brian.haley@hp.com>
      Tested-by: Ed Tomlinson <edt@aei.ca>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 07 Apr, 2009 1 commit
    • xfrm: fix fragmentation on inter family tunnels · d1d88e5d
      Steffen Klassert committed
      If an ipv4 packet (not locally generated, with the IP_DF flag not set) bigger
      than the mtu size is supposed to go via a xfrm ipv6 tunnel, the packet size
      check in xfrm4_tunnel_check_size() is omitted and ipv6 drops the packet
      without sending a notice to the original sender of the ipv4 packet.
      
      Another issue is that ipv4 connection tracking reassembles
      incoming fragmented packets. If such a reassembled packet is supposed to
      go via a xfrm ipv6 tunnel, it will be dropped, even if the original sender
      did proper fragmentation.
      
      According to RFC 2473 (section 7) tunnel ipv6 packets resulting from the
      encapsulation of an original packet are considered as locally generated
      packets. If such a packet passed the checks in xfrm{4,6}_tunnel_check_size()
      fragmentation is allowed according to RFC 2473 (section 7.1/7.2).
      
      This patch sets skb->local_df in xfrm6_prepare_output() to achieve
      fragmentation in this case.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
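
The effect of setting `local_df` can be pictured with a toy output decision: an oversized packet is fragmented only when the flag is set, and dropped otherwise. This is a simplified model under stated assumptions. `struct toy_pkt`, `toy_ip6_output()` and `toy_xfrm6_prepare_output()` are hypothetical, and the real output path performs more checks than the single size comparison shown.

```c
#include <stdbool.h>

enum toy_verdict { TOY_SEND, TOY_FRAGMENT, TOY_DROP };

/* Toy packet: just a length and the "may be fragmented locally" flag. */
struct toy_pkt {
    unsigned int len;
    bool local_df;
};

/*
 * Simplified ipv6 output decision: packets that fit are sent as-is;
 * oversized packets are fragmented only when local_df allows it.
 */
static enum toy_verdict toy_ip6_output(const struct toy_pkt *pkt,
                                       unsigned int mtu)
{
    if (pkt->len <= mtu)
        return TOY_SEND;
    return pkt->local_df ? TOY_FRAGMENT : TOY_DROP;
}

/*
 * What the message describes for the tunnel path: the encapsulated
 * packet counts as locally generated, so fragmentation is permitted.
 */
static void toy_xfrm6_prepare_output(struct toy_pkt *pkt)
{
    pkt->local_df = true;
}
```

In this model a 2000-byte packet against a 1500-byte MTU is dropped until `toy_xfrm6_prepare_output()` has marked it, after which the same call yields `TOY_FRAGMENT`.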
  22. 06 Apr, 2009 1 commit