1. 19 4月, 2010 1 次提交
  2. 25 3月, 2010 1 次提交
  3. 07 1月, 2010 1 次提交
    • O
      ip: fix mc_loop checks for tunnels with multicast outer addresses · 7ad6848c
      Octavian Purdila 提交于
      When we have L3 tunnels with different inner/outer families
      (i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
      destination address, multicast packets will be loopbacked back to the
      sending socket even if IP*_MULTICAST_LOOP is set to disabled.
      
      The mc_loop flag is present in the family specific part of the socket
      (e.g. the IPv4 or IPv4 specific part).  setsockopt sets the inner
      family mc_loop flag. When the packet is pushed through the L3 tunnel
      it will eventually be processed by the outer family which if different
      will check the flag in a different part of the socket then it was set.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ad6848c
  4. 02 12月, 2009 1 次提交
  5. 24 11月, 2009 1 次提交
  6. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  7. 02 10月, 2009 1 次提交
  8. 03 9月, 2009 1 次提交
    • E
      ip: Report qdisc packet drops · 6ce9e7b5
      Eric Dumazet 提交于
      Christoph Lameter pointed out that packet drops at qdisc level where not
      accounted in SNMP counters. Only if application sets IP_RECVERR, drops
      are reported to user (-ENOBUFS errors) and SNMP counters updated.
      
      IP_RECVERR is used to enable extended reliable error message passing,
      but these are not needed to update system wide SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after an UDP tx flood
      # netstat -s 
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls, do however still return an OK status, to not
      break applications.
      
      Note : send() manual page explicitly says for -ENOBUFS error :
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR enabled sockets : a send() syscall
      that hit a qdisc drop returns an ENOBUFS error.
      
      Many thanks to Christoph, David, and last but not least, Alexey !
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ce9e7b5
  9. 28 8月, 2009 1 次提交
  10. 12 7月, 2009 1 次提交
  11. 11 6月, 2009 1 次提交
    • E
      net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Eric Dumazet 提交于
      One of the problem with sock memory accounting is it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement initial offset and atomicaly check if any packets
      are in flight.
      
      skb_set_owner_w() doesnt call sock_hold() anymore
      
      sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b85a34e
  12. 09 6月, 2009 1 次提交
  13. 03 6月, 2009 2 次提交
  14. 27 4月, 2009 1 次提交
  15. 16 2月, 2009 1 次提交
  16. 25 11月, 2008 2 次提交
    • E
      net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() · a21bba94
      Eric Dumazet 提交于
      We can reduce pressure on dst entry refcount that slowdown UDP transmit
      path on SMP machines. This pressure is visible on RTP servers when
      delivering content to mediagateways, especially big ones, handling
      thousand of streams. Several cpus send UDP frames to the same
      destination, hence use the same dst entry.
      
      This patch makes ip_push_pending_frames() steal the refcount its
      callers had to take when filling inet->cork.dst.
      
      This doesnt avoid all refcounting, but still gives speedups on SMP,
      on UDP/RAW transmit path.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a21bba94
    • E
      net: avoid a pair of dst_hold()/dst_release() in ip_append_data() · 2e77d89b
      Eric Dumazet 提交于
      We can reduce pressure on dst entry refcount that slowdown UDP transmit
      path on SMP machines. This pressure is visible on RTP servers when
      delivering content to mediagateways, especially big ones, handling
      thousand of streams. Several cpus send UDP frames to the same
      destination, hence use the same dst entry.
      
      This patch makes ip_append_data() eventually steal the refcount its
      callers had to take on the dst entry.
      
      This doesnt avoid all refcounting, but still gives speedups on SMP,
      on UDP/RAW transmit path
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e77d89b
  17. 03 11月, 2008 1 次提交
  18. 01 10月, 2008 1 次提交
  19. 26 7月, 2008 1 次提交
  20. 17 7月, 2008 1 次提交
  21. 15 7月, 2008 1 次提交
  22. 12 6月, 2008 1 次提交
  23. 30 4月, 2008 1 次提交
    • K
      [IPv4] UFO: prevent generation of chained skb destined to UFO device · be9164e7
      Kostya B 提交于
      Problem: ip_append_data() could wrongly generate a chained skb for
      devices which support UFO.  When sk_write_queue is not empty
      (e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
      of the queued skb, a new chained skb is created.
      
      I would normally assume UFO device should get data in nr_frags and not
      in frag_list.  Later the udp4_hwcsum_outgoing() resets csum to NONE
      and skb_gso_segment() has oops.
      
      Proposal:
      1. Even length is less than mtu, employ ip_ufo_append_data()
      and append data to the __existed__ skb in the sk_write_queue.
      
      2. ip_ufo_append_data() is fixed due to a wrong manipulation of
      peek-ing and later enqueue-ing of the same skb.  Now, enqueuing is
      always performed, because on error the further
      ip_flush_pending_frames() would release the queued skb.
      Signed-off-by: NKostya B <bkostya@hotmail.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be9164e7
  24. 26 3月, 2008 1 次提交
  25. 25 3月, 2008 2 次提交
  26. 06 3月, 2008 1 次提交
  27. 01 2月, 2008 2 次提交
  28. 29 1月, 2008 6 次提交
  29. 23 1月, 2008 2 次提交
  30. 07 11月, 2007 1 次提交