1. 03 12月, 2013 1 次提交
  2. 16 11月, 2013 1 次提交
  3. 11 11月, 2013 1 次提交
    • J
      netfilter: push reasm skb through instead of original frag skbs · 6aafeef0
      Jiri Pirko 提交于
      Pushing original fragments through causes several problems. For example
      for matching, frags may not be matched correctly. Take following
      example:
      
      <example>
      On HOSTA do:
      ip6tables -I INPUT -p icmpv6 -j DROP
      ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
      
      and on HOSTB you do:
      ping6 HOSTA -s2000    (MTU is 1500)
      
      Incoming echo requests will be filtered out on HOSTA. This issue does
      not occur with smaller packets than MTU (where fragmentation does not happen)
      </example>
      
      As was discussed previously, the only correct solution seems to be to use
      reassembled skb instead of separete frags. Doing this has positive side
      effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
      dances in ipvs and conntrack can be removed.
      
      Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
      entirely and use code in net/ipv6/reassembly.c instead.
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6aafeef0
  4. 08 11月, 2013 2 次提交
  5. 05 11月, 2013 1 次提交
  6. 04 11月, 2013 1 次提交
  7. 22 10月, 2013 1 次提交
    • E
      ipv6: sit: add GSO/TSO support · 61c1db7f
      Eric Dumazet 提交于
      Now ipv6_gso_segment() is stackable, its relatively easy to
      implement GSO/TSO support for SIT tunnels
      
      Performance results, when segmentation is done after tunnel
      device (as no NIC is yet enabled for TSO SIT support) :
      
      Before patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3168.31   4.81     4.64     2.988   2.877
      
      After patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      5525.00   7.76     5.17     2.763   1.840
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61c1db7f
  8. 20 10月, 2013 2 次提交
    • E
      ipip: add GSO/TSO support · cb32f511
      Eric Dumazet 提交于
      Now inet_gso_segment() is stackable, its relatively easy to
      implement GSO/TSO support for IPIP
      
      Performance results, when segmentation is done after tunnel
      device (as no NIC is yet enabled for TSO IPIP support) :
      
      Before patch :
      
      lpq83:~# ./netperf -H 7.7.9.84 -Cc
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3357.88   5.09     3.70     2.983   2.167
      
      After patch :
      
      lpq83:~# ./netperf -H 7.7.9.84 -Cc
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      7710.19   4.52     6.62     1.152   1.687
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb32f511
    • E
      ipv4: gso: make inet_gso_segment() stackable · 3347c960
      Eric Dumazet 提交于
      In order to support GSO on IPIP, we need to make
      inet_gso_segment() stackable.
      
      It should not assume network header starts right after mac
      header.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3347c960
  9. 18 10月, 2013 1 次提交
    • E
      net: refactor sk_page_frag_refill() · 400dfd3a
      Eric Dumazet 提交于
      While working on virtio_net new allocation strategy to increase
      payload/truesize ratio, we found that refactoring sk_page_frag_refill()
      was needed.
      
      This patch splits sk_page_frag_refill() into two parts, adding
      skb_page_frag_refill() which can be used without a socket.
      
      While we are at it, add a minimum frag size of 32 for
      sk_page_frag_refill()
      
      Michael will either use netdev_alloc_frag() from softirq context,
      or skb_page_frag_refill() from process context in refill_work()
       (GFP_KERNEL allocations)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Michael Dalton <mwdalton@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      400dfd3a
  10. 03 10月, 2013 1 次提交
  11. 01 10月, 2013 2 次提交
  12. 27 9月, 2013 1 次提交
    • J
      net.h/skbuff.h: Remove extern from function prototypes · 7965bd4d
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      7965bd4d
  13. 04 9月, 2013 2 次提交
  14. 08 8月, 2013 1 次提交
  15. 02 8月, 2013 2 次提交
  16. 01 8月, 2013 1 次提交
  17. 28 6月, 2013 1 次提交
  18. 26 6月, 2013 1 次提交
  19. 11 6月, 2013 2 次提交
  20. 01 6月, 2013 2 次提交
  21. 29 5月, 2013 1 次提交
  22. 28 5月, 2013 2 次提交
    • S
      MPLS: Add limited GSO support · 0d89d203
      Simon Horman 提交于
      In the case where a non-MPLS packet is received and an MPLS stack is
      added it may well be the case that the original skb is GSO but the
      NIC used for transmit does not support GSO of MPLS packets.
      
      The aim of this code is to provide GSO in software for MPLS packets
      whose skbs are GSO.
      
      SKB Usage:
      
      When an implementation adds an MPLS stack to a non-MPLS packet it should do
      the following to skb metadata:
      
      * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
        skb->inner_protocol is added by this patch.
      
      * Set skb->protocol to the new MPLS ethertype of the packet.
      
      * Set skb->network_header to correspond to the
        end of the L3 header, including the MPLS label stack.
      
      I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
      kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
      That patch sets the above requirements in datapath/actions.c:push_mpls()
      and was used to exercise this code.  The datapath patch is against the Open
      vSwtich tree but it is intended that it be added to the Open vSwtich code
      present in the mainline Linux kernel at some point.
      
      Features:
      
      I believe that the approach that I have taken is at least partially
      consistent with the handling of other protocols.  Jesse, I understand that
      you have some ideas here.  I am more than happy to change my implementation.
      
      This patch adds dev->mpls_features which may be used by devices
      to advertise features supported for MPLS packets.
      
      A new NETIF_F_MPLS_GSO feature is added for devices which support
      hardware MPLS GSO offload.  Currently no devices support this
      and MPLS GSO always falls back to software.
      
      Alternate Implementation:
      
      One possible alternate implementation is to teach netif_skb_features()
      and skb_network_protocol() about MPLS, in a similar way to their
      understanding of VLANs. I believe this would avoid the need
      for net/mpls/mpls_gso.c and in particular the calls to
      __skb_push() and __skb_push() in mpls_gso_segment().
      
      I have decided on the implementation in this patch as it should
      not introduce any overhead in the case where mpls_gso is not compiled
      into the kernel or inserted as a module.
      
      MPLS GSO suggested by Jesse Gross.
      Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
      by Pravin B Shelar.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d89d203
    • S
      net: Use 16bits for *_headers fields of struct skbuff · 1a37e412
      Simon Horman 提交于
      In order to mitigate ongoing incresase in the size of struct skbuff
      use 16 bit integer offsets rather than pointers for inner_*_headers.
      
      This appears to reduce the size of struct skbuff from 0xd0 to 0xc0
      bytes on x86_64 with the following all unset.
      
      	CONFIG_XFRM
      	CONFIG_NF_CONNTRACK
      	CONFIG_NF_CONNTRACK_MODULE
      	NET_SKBUFF_NF_DEFRAG_NEEDED
      	CONFIG_BRIDGE_NETFILTER
      	CONFIG_NET_SCHED
      	CONFIG_IPV6_NDISC_NODETYPE
      	CONFIG_NET_DMA
      	CONFIG_NETWORK_SECMARK
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a37e412
  23. 20 4月, 2013 2 次提交
  24. 06 4月, 2013 1 次提交
    • P
      netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Patrick McHardy 提交于
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up the the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices on
        that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
        used anymore. Its not entirely clear whether those packets should
        be traced after that, however we've always done that.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only cases
        where we clearly want to reset nf_trace and is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      124dff01
  25. 02 4月, 2013 1 次提交
  26. 28 3月, 2013 2 次提交
  27. 25 3月, 2013 1 次提交
  28. 21 3月, 2013 1 次提交
  29. 14 3月, 2013 2 次提交
    • P
      skb: Propagate pfmemalloc on skb from head page only · cca7af38
      Pavel Emelyanov 提交于
      Hi.
      
      I'm trying to send big chunks of memory from application address space via
      TCP socket using vmsplice + splice like this
      
         mem = mmap(128Mb);
         vmsplice(pipe[1], mem); /* splice memory into pipe */
         splice(pipe[0], tcp_socket); /* send it into network */
      
      When I'm lucky and a huge page splices into the pipe and then into the socket
      _and_ client and server ends of the TCP connection are on the same host,
      communicating via lo, the whole connection gets stuck! The sending queue
      becomes full and app stops writing/splicing more into it, but the receiving
      queue remains empty, and that's why.
      
      The __skb_fill_page_desc observes a tail page of a huge page and erroneously
      propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
      contain garbage). Then this skb->pfmemalloc leaks through lo and due to the
      
          tcp_v4_rcv
          sk_filter
              if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
                  return -ENOMEM
              goto release_and_discard;
      
      no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
      cloning clones the pfmemalloc flag as well.
      
      That said, here's the proper page->pfmemalloc propagation onto socket: we
      must check the huge-page's head page only, other pages' pfmemalloc and mapping
      values do not contain what is expected in this place. However, I'm not sure
      whether this fix is _complete_, since pfmemalloc propagation via lo also
      oesn't look great.
      
      Both, bit propagation from page to skb and this check in sk_filter, were
      introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
      Mel and stable@ are in Cc.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cca7af38
    • E
      tcp: fix skb_availroom() · 16fad69c
      Eric Dumazet 提交于
      Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :
      
      https://code.google.com/p/chromium/issues/detail?id=182056
      
      commit a21d4572 (tcp: avoid order-1 allocations on wifi and tx
      path) did a poor choice adding an 'avail_size' field to skb, while
      what we really needed was a 'reserved_tailroom' one.
      
      It would have avoided commit 22b4a4f2 (tcp: fix retransmit of
      partially acked frames) and this commit.
      
      Crash occurs because skb_split() is not aware of the 'avail_size'
      management (and should not be aware)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMukesh Agrawal <quiche@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16fad69c