1. 02 8月, 2018 1 次提交
  2. 19 7月, 2018 1 次提交
  3. 18 7月, 2018 1 次提交
    • F
      ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module · 70b095c8
      Florian Westphal 提交于
      IPV6=m
      DEFRAG_IPV6=m
      CONNTRACK=y yields:
      
      net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
      net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
      net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'
      
      Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params
      ip6_frag_init and ip6_expire_frag_queue so it would be needed to force
      IPV6=y too.
      
      This patch gets rid of the 'followup linker error' by removing
      the dependency of ipv6.ko symbols from netfilter ipv6 defrag.
      
      Shared code is placed into a header, then used from both.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      70b095c8
  4. 17 7月, 2018 1 次提交
    • H
      ipv6/mcast: init as INCLUDE when join SSM INCLUDE group · c7ea20c9
      Hangbin Liu 提交于
      This an IPv6 version patch of "ipv4/igmp: init group mode as INCLUDE when
      join source group". From RFC3810, part 6.1:
      
         If no per-interface state existed for that
         multicast address before the change (i.e., the change consisted of
         creating a new per-interface record), or if no state exists after the
         change (i.e., the change consisted of deleting a per-interface
         record), then the "non-existent" state is considered to have an
         INCLUDE filter mode and an empty source list.
      
      Which means a new multicast group should start with state IN(). Currently,
      for MLDv2 SSM JOIN_SOURCE_GROUP mode, we first call ipv6_sock_mc_join(),
      then ip6_mc_source(), which will trigger a TO_IN() message instead of
      ALLOW().
      
      The issue was exposed by commit a052517a ("net/multicast: should not
      send source list records when have filter mode change"). Before this change,
      we sent both ALLOW(A) and TO_IN(A). Now, we only send TO_IN(A).
      
      Fix it by adding a new parameter to init group mode. Also add some wrapper
      functions to avoid changing too much code.
      
      v1 -> v2:
      In the first version I only cleared the group change record. But this is not
      enough. Because when a new group join, it will init as EXCLUDE and trigger
      a filter mode change in ip/ip6_mc_add_src(), which will clear all source
      addresses sf_crcount. This will prevent early joined address sending state
      change records if multi source addressed joined at the same time.
      
      In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
      JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
      for IPv4 and IPv6.
      
      There is also a difference between v4 and v6 version. For IPv6, when the
      interface goes down and up, we will send correct state change record with
      unspecified IPv6 address (::) with function ipv6_mc_up(). But after DAD is
      completed, we resend the change record TO_IN() in mld_send_initial_cr().
      Fix it by sending ALLOW() for INCLUDE mode in mld_send_initial_cr().
      
      Fixes: a052517a ("net/multicast: should not send source list records when have filter mode change")
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7ea20c9
  5. 07 7月, 2018 2 次提交
  6. 06 7月, 2018 1 次提交
  7. 05 7月, 2018 1 次提交
    • P
      ipv6: make ipv6_renew_options() interrupt/kernel safe · a9ba23d4
      Paul Moore 提交于
      At present the ipv6_renew_options_kern() function ends up calling into
      access_ok() which is problematic if done from inside an interrupt as
      access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
      (x86-64 is affected).  Example warning/backtrace is shown below:
      
       WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
       ...
       Call Trace:
        <IRQ>
        ipv6_renew_option+0xb2/0xf0
        ipv6_renew_options+0x26a/0x340
        ipv6_renew_options_kern+0x2c/0x40
        calipso_req_setattr+0x72/0xe0
        netlbl_req_setattr+0x126/0x1b0
        selinux_netlbl_inet_conn_request+0x80/0x100
        selinux_inet_conn_request+0x6d/0xb0
        security_inet_conn_request+0x32/0x50
        tcp_conn_request+0x35f/0xe00
        ? __lock_acquire+0x250/0x16c0
        ? selinux_socket_sock_rcv_skb+0x1ae/0x210
        ? tcp_rcv_state_process+0x289/0x106b
        tcp_rcv_state_process+0x289/0x106b
        ? tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_rcv+0xc82/0xcf0
        ip6_input_finish+0x10d/0x690
        ip6_input+0x45/0x1e0
        ? ip6_rcv_finish+0x1d0/0x1d0
        ipv6_rcv+0x32b/0x880
        ? ip6_make_skb+0x1e0/0x1e0
        __netif_receive_skb_core+0x6f2/0xdf0
        ? process_backlog+0x85/0x250
        ? process_backlog+0x85/0x250
        ? process_backlog+0xec/0x250
        process_backlog+0xec/0x250
        net_rx_action+0x153/0x480
        __do_softirq+0xd9/0x4f7
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        ...
      
      While not present in the backtrace, ipv6_renew_option() ends up calling
      access_ok() via the following chain:
      
        access_ok()
        _copy_from_user()
        copy_from_user()
        ipv6_renew_option()
      
      The fix presented in this patch is to perform the userspace copy
      earlier in the call chain such that it is only called when the option
      data is actually coming from userspace; that place is
      do_ipv6_setsockopt().  Not only does this solve the problem seen in
      the backtrace above, it also allows us to simplify the code quite a
      bit by removing ipv6_renew_options_kern() completely.  We also take
      this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option()
      a small amount as well.
      
      This patch is heavily based on a rough patch by Al Viro.  I've taken
      his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
      to a memdup_user() call, made better use of the e_inval jump target in
      the same function, and cleaned up the use ipv6_renew_option() by
      ipv6_renew_options().
      
      CC: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9ba23d4
  8. 05 6月, 2018 3 次提交
  9. 27 4月, 2018 2 次提交
    • W
      udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn 提交于
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall drops
      increases from 1470 to 61818.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec1f6f6
    • W
      udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn 提交于
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cd7884d
  10. 22 4月, 2018 1 次提交
  11. 18 4月, 2018 1 次提交
  12. 04 4月, 2018 1 次提交
  13. 01 4月, 2018 4 次提交
    • E
      inet: frags: remove some helpers · 6befe4a7
      Eric Dumazet 提交于
      Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()
      
      Also since we use rhashtable we can bring back the number of fragments
      in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
      removed in commit 434d3054 ("inet: frag: don't account number
      of fragment queues")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6befe4a7
    • E
      inet: frags: use rhashtables for reassembly units · 648700f7
      Eric Dumazet 提交于
      Some applications still rely on IP fragmentation, and to be fair linux
      reassembly unit is not working under any serious load.
      
      It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
      
      A work queue is supposed to garbage collect items when host is under memory
      pressure, and doing a hash rebuild, changing seed used in hash computations.
      
      This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
      occurring every 5 seconds if host is under fire.
      
      Then there is the problem of sharing this hash table for all netns.
      
      It is time to switch to rhashtables, and allocate one of them per netns
      to speedup netns dismantle, since this is a critical metric these days.
      
      Lookup is now using RCU. A followup patch will even remove
      the refcount hold/release left from prior implementation and save
      a couple of atomic operations.
      
      Before this patch, 16 cpus (16 RX queue NIC) could not handle more
      than 1 Mpps frags DDOS.
      
      After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
      of storage for the fragments (exact number depends on frags being evicted
      after timeout)
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 1966916 memory 2140004608
      
      A followup patch will change the limits for 64bit arches.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      648700f7
    • E
      inet: frags: add a pointer to struct netns_frags · 093ba729
      Eric Dumazet 提交于
      In order to simplify the API, add a pointer to struct inet_frags.
      This will allow us to make things less complex.
      
      These functions no longer have a struct inet_frags parameter :
      
      inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
      inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
      ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      093ba729
    • E
      ipv6: frag: remove unused field · c22af22c
      Eric Dumazet 提交于
      csum field in struct frag_queue is not used, remove it.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c22af22c
  14. 31 3月, 2018 1 次提交
    • A
      net: Introduce __inet_bind() and __inet6_bind · 3679d585
      Andrey Ignatov 提交于
      Refactor `bind()` code to make it ready to be called from BPF helper
      function `bpf_bind()` (will be added soon). Implementation of
      `inet_bind()` and `inet6_bind()` is separated into `__inet_bind()` and
      `__inet6_bind()` correspondingly. These function can be used from both
      `sk_prot->bind` and `bpf_bind()` contexts.
      
      New functions have two additional arguments.
      
      `force_bind_address_no_port` forces binding to IP only w/o checking
      `inet_sock.bind_address_no_port` field. It'll allow to bind local end of
      a connection to desired IP in `bpf_bind()` w/o changing
      `bind_address_no_port` field of a socket. It's useful since `bpf_bind()`
      can return an error and we'd need to restore original value of
      `bind_address_no_port` in that case if we changed this before calling to
      the helper.
      
      `with_lock` specifies whether to lock socket when working with `struct
      sk` or not. The argument is set to `true` for `sk_prot->bind`, i.e. old
      behavior is preserved. But it will be set to `false` for `bpf_bind()`
      use-case. The reason is all call-sites, where `bpf_bind()` will be
      called, already hold that socket lock.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      3679d585
  15. 12 3月, 2018 1 次提交
  16. 01 3月, 2018 1 次提交
  17. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  18. 24 1月, 2018 1 次提交
  19. 09 1月, 2018 1 次提交
  20. 03 12月, 2017 1 次提交
  21. 24 11月, 2017 1 次提交
    • W
      net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn 提交于
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c19f846
  22. 11 11月, 2017 1 次提交
  23. 03 11月, 2017 1 次提交
    • T
      ipv6: Implement limits on Hop-by-Hop and Destination options · 47d3d7ac
      Tom Herbert 提交于
      RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
      extension headers. Both of these carry a list of TLVs which is
      only limited by the maximum length of the extension header (2048
      bytes). By the spec a host must process all the TLVs in these
      options, however these could be used as a fairly obvious
      denial of service attack. I think this could in fact be
      a significant DOS vector on the Internet, one mitigating
      factor might be that many FWs drop all packets with EH (and
      obviously this is only IPv6) so an Internet wide attack might not
      be so effective (yet!).
      
      By my calculation, the worse case packet with TLVs in a standard
      1500 byte MTU packet that would be processed by the stack contains
      1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
      wrote a quick test program that floods a whole bunch of these
      packets to a host and sure enough there is substantial time spent
      in ip6_parse_tlv. These packets contain nothing but unknown TLVS
      (that are ignored), TLV padding, and bogus UDP header with zero
      payload length.
      
        25.38%  [kernel]                    [k] __fib6_clean_all
        21.63%  [kernel]                    [k] ip6_parse_tlv
         4.21%  [kernel]                    [k] __local_bh_enable_ip
         2.18%  [kernel]                    [k] ip6_pol_route.isra.39
         1.98%  [kernel]                    [k] fib6_walk_continue
         1.88%  [kernel]                    [k] _raw_write_lock_bh
         1.65%  [kernel]                    [k] dst_release
      
      This patch adds configurable limits to Destination and Hop-by-Hop
      options. There are three limits that may be set:
        - Limit the number of options in a Hop-by-Hop or Destination options
          extension header.
        - Limit the byte length of a Hop-by-Hop or Destination options
          extension header.
        - Disallow unrecognized options in a Hop-by-Hop or Destination
          options extension header.
      
      The limits are set in corresponding sysctls:
      
        ipv6.sysctl.max_dst_opts_cnt
        ipv6.sysctl.max_hbh_opts_cnt
        ipv6.sysctl.max_dst_opts_len
        ipv6.sysctl.max_hbh_opts_len
      
      If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
      The number of known TLVs that are allowed is the absolute value of
      this number.
      
      If a limit is exceeded when processing an extension header the packet is
      dropped.
      
      Default values are set to 8 for options counts, and set to INT_MAX
      for maximum length. Note the choice to limit options to 8 is an
      arbitrary guess (roughly based on the fact that the stack supports
      three HBH options and just one destination option).
      
      These limits have being proposed in draft-ietf-6man-rfc6434-bis.
      
      Tested (by Martin Lau)
      
      I tested out 1 thread (i.e. one raw_udp process).
      
      I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
      With sysctls setting to 2048, the softirq% is packed to 100%.
      With 8, the softirq% is almost unnoticable from mpstat.
      
      v2;
        - Code and documention cleanup.
        - Change references of RFC2460 to be RFC8200.
        - Add reference to RFC6434-bis where the limits will be in standard.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47d3d7ac
  24. 07 10月, 2017 1 次提交
  25. 04 7月, 2017 1 次提交
  26. 05 6月, 2017 1 次提交
  27. 01 2月, 2017 1 次提交
    • D
      ipv6: fix flow labels when the traffic class is non-0 · 90427ef5
      Dimitris Michailidis 提交于
      ip6_make_flowlabel() determines the flow label for IPv6 packets. It's
      supposed to be passed a flow label, which it returns as is if non-0 and
      in some other cases, otherwise it calculates a new value.
      
      The problem is callers often pass a flowi6.flowlabel, which may also
      contain traffic class bits. If the traffic class is non-0
      ip6_make_flowlabel() mistakes the non-0 it gets as a flow label and
      returns the whole thing. Thus it can return a 'flow label' longer than
      20b and the low 20b of that is typically 0 resulting in packets with 0
      label. Moreover, different packets of a flow may be labeled differently.
      For a TCP flow with ECN non-payload and payload packets get different
      labels as exemplified by this pair of consecutive packets:
      
      (pure ACK)
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... .... .... 0001 1100 1110 0100 1001 = Flow Label: 0x1ce49
          Payload Length: 32
          Next Header: TCP (6)
      
      (payload)
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
          .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
          Payload Length: 688
          Next Header: TCP (6)
      
      This patch allows ip6_make_flowlabel() to be passed more than just a
      flow label and has it extract the part it really wants. This was simpler
      than modifying the callers. With this patch packets like the above become
      
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
          Payload Length: 32
          Next Header: TCP (6)
      
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
          .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
          Payload Length: 688
          Next Header: TCP (6)
      Signed-off-by: NDimitris Michailidis <dmichail@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90427ef5
  28. 27 1月, 2017 1 次提交
  29. 01 12月, 2016 1 次提交
  30. 10 11月, 2016 1 次提交
  31. 28 6月, 2016 3 次提交