1. 01 Nov, 2016 1 commit
  2. 30 Oct, 2016 1 commit
    • W
      packet: on direct_xmit, limit tso and csum to supported devices · 104ba78c
      Authored by Willem de Bruijn
      When transmitting on a packet socket with PACKET_VNET_HDR and
      PACKET_QDISC_BYPASS, validate device support for features requested
      in vnet_hdr.
      
      Drop TSO packets sent to devices that do not support TSO or have the
      feature disabled. Note that the latter currently do process those
      packets correctly, despite not advertising the feature.
      
      Because of SKB_GSO_DODGY, it is not sufficient to test device features
      with netif_needs_gso. Full validate_xmit_skb is needed.
      
      Switch to software checksum for non-TSO packets that request checksum
      offload if that device feature is unsupported or disabled. Note that
      similar to the TSO case, device drivers may perform checksum offload
      correctly even when not advertising it.
      
      When switching to software checksum, packets hit skb_checksum_help,
      which has two BUG_ON checks requiring the checksum fields to lie in the
      linear segment. Packet sockets always allocate at least up to
      csum_start + csum_off + 2 as linear, so these checks are satisfied.
      
      Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
      
        ethtool -K eth0 tso off tx on
        psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
        psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N
      
        ethtool -K eth0 tx off
        psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
        psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N
      
      v2:
        - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)
      
      Fixes: d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket option")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      104ba78c
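
      For illustration, a minimal sketch of how the validation described above can
      sit in packet_direct_xmit() (abridged; queue selection and the actual
      ndo_start_xmit() call are elided, only the drop path is shown in full):

        static int packet_direct_xmit(struct sk_buff *skb)
        {
                struct net_device *dev = skb->dev;
                struct sk_buff *orig_skb = skb;

                if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
                        goto drop;

                /* Full validation: segments TSO skbs the device cannot handle
                 * and resolves checksum offload in software when the device
                 * does not advertise the needed features.
                 */
                skb = validate_xmit_skb_list(skb, dev);
                if (skb != orig_skb)
                        goto drop;

                /* ... pick a tx queue and hand the skb to ndo_start_xmit() ... */
                return NETDEV_TX_OK;
        drop:
                atomic_long_inc(&dev->tx_dropped);
                kfree_skb_list(skb);
                return NET_XMIT_DROP;
        }
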
  3. 21 Oct, 2016 1 commit
    • S
      net: add recursion limit to GRO · fcd91dd4
      Authored by Sabrina Dubroca
      Currently, GRO can do unlimited recursion through the gro_receive
      handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
      to one level with encap_mark, but both VLAN and TEB still have this
      problem.  Thus, the kernel is vulnerable to a stack overflow if we
      receive a packet composed entirely of VLAN headers.
      
      This patch adds a recursion counter to the GRO layer to prevent stack
      overflow.  When a gro_receive function hits the recursion limit, GRO is
      aborted for this skb and it is processed normally.  This recursion
      counter is put in the GRO CB, but could be turned into a percpu counter
      if we run out of space in the CB.
      
      Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
      
      Fixes: CVE-2016-7039
      Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.")
      Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcd91dd4
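
      A sketch of the guard this adds around gro_receive handlers, assuming a
      recursion_counter field in the GRO CB and a wrapper used at each call site:

        #define GRO_RECURSION_LIMIT 15

        static inline int gro_recursion_inc_test(struct sk_buff *skb)
        {
                return ++NAPI_GRO_CB(skb)->recursion_counter == GRO_RECURSION_LIMIT;
        }

        typedef struct sk_buff **(*gro_receive_t)(struct sk_buff **, struct sk_buff *);

        static inline struct sk_buff **call_gro_receive(gro_receive_t cb,
                                                        struct sk_buff **head,
                                                        struct sk_buff *skb)
        {
                /* Too many nested gro_receive handlers: abort GRO for this
                 * skb and let it be processed normally.
                 */
                if (unlikely(gro_recursion_inc_test(skb))) {
                        NAPI_GRO_CB(skb)->flush |= 1;
                        return NULL;
                }

                return cb(head, skb);
        }
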
  4. 19 Oct, 2016 1 commit
    • I
      net: core: Correctly iterate over lower adjacency list · e4961b07
      Authored by Ido Schimmel
      Tamir reported the following trace when processing ARP requests received
      via a vlan device on top of a VLAN-aware bridge:
      
       NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
      [...]
       CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
       task: ffff88017edfea40 task.stack: ffff88017ee10000
       RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
      [...]
       Call Trace:
        <IRQ>
        [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
        [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
        [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
        [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
        [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
        [<ffffffff815f2eb6>] neigh_update+0x306/0x740
        [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
        [<ffffffff8165ea3f>] arp_process+0x66f/0x700
        [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
        [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
        [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
        [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
        [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
        [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
        [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
        [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
        [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
        [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
        [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
        [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
        [<ffffffff81374815>] ? find_next_bit+0x15/0x20
        [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
        [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
        [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
        [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
        [<ffffffff81091986>] tasklet_action+0xf6/0x110
        [<ffffffff81704556>] __do_softirq+0xf6/0x280
        [<ffffffff8109213f>] irq_exit+0xdf/0xf0
        [<ffffffff817042b4>] do_IRQ+0x54/0xd0
        [<ffffffff8170214c>] common_interrupt+0x8c/0x8c
      
      The problem is that netdev_all_lower_get_next_rcu() never advances the
      iterator, thereby causing the loop over the lower adjacency list to run
      forever.
      
      Fix this by advancing the iterator, thereby avoiding the infinite loop.
      
      Fixes: 7ce856aa ("mlxsw: spectrum: Add couple of lower device helper functions")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Tamir Winetroub <tamirw@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4961b07
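
      For reference, a sketch of an iterator that does advance the cursor, along
      the lines described above (struct and field names assumed from the context):

        struct net_device *netdev_all_lower_get_next_rcu(struct net_device *dev,
                                                         struct list_head **iter)
        {
                struct netdev_adjacent *lower;

                /* Advance to the next entry; previously the first entry was
                 * returned on every call, so callers looped forever.
                 */
                lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
                if (&lower->list == &dev->all_adj_list.lower)
                        return NULL;

                *iter = &lower->list;
                return lower->dev;
        }
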
  5. 11 Oct, 2016 1 commit
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Authored by Emese Revfy
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or have variable loops, each of which provides some level of
      latent entropy.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      0766f788
  6. 04 Oct, 2016 1 commit
    • A
      net: Add netdev all_adj_list refcnt propagation to fix panic · 93409033
      Authored by Andrew Collins
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93409033
  7. 25 Sep, 2016 1 commit
  8. 11 Sep, 2016 1 commit
  9. 05 Sep, 2016 1 commit
    • M
      bonding: Fix bonding crash · 24b27fc4
      Authored by Mahesh Bandewar
      The following few steps will crash the kernel -
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does a lot of work before checking whether the device being
      enslaved is already busy.
      
      In this case, when the notifier call chain sends notifications,
      bond_netdev_event() assumes that the rx_handler / rx_handler_data are
      registered, while bond_enslave() hasn't progressed far enough to
      register the rx_handler for the new slave.
      
      This patch adds a rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24b27fc4
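
      A sketch of the early check described above, assuming a small helper that
      reports whether an rx_handler is already registered on the device:

        /* In net/core/dev.c (sketch): */
        bool netdev_is_rx_handler_busy(struct net_device *dev)
        {
                ASSERT_RTNL();
                return dev && rtnl_dereference(dev->rx_handler);
        }

        /* At the very top of bond_enslave() (sketch): */
        if (netdev_is_rx_handler_busy(slave_dev)) {
                netdev_err(bond_dev,
                           "Error: Device is in use and cannot be enslaved\n");
                return -EBUSY;
        }
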
  10. 31 Aug, 2016 1 commit
  11. 27 Aug, 2016 2 commits
    • I
      bridge: switchdev: Add forward mark support for stacked devices · 6bc506b4
      Authored by Ido Schimmel
      switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
      port netdevs so that packets being flooded by the device won't be
      flooded twice.
      
      It works by assigning a unique identifier (the ifindex of the first
      bridge port) to bridge ports sharing the same parent ID. This prevents
      packets from being flooded twice by the same switch, but will flood
      packets through bridge ports belonging to a different switch.
      
      This method is problematic when stacked devices are taken into account,
      such as VLANs. In such cases, a physical port netdev can have upper
      devices being members in two different bridges, thus requiring two
      different 'offload_fwd_mark's to be configured on the port netdev, which
      is impossible.
      
      The main problem is that packet and netdev marking is performed at the
      physical netdev level, whereas flooding occurs between bridge ports,
      which are not necessarily port netdevs.
      
      Instead, packet and netdev marking should really be done in the bridge
      driver with the switch driver only telling it which packets it already
      forwarded. The bridge driver will mark such packets using the mark
      assigned to the ingress bridge port and will prevent the packet from
      being forwarded through any bridge port sharing the same mark (i.e.
      having the same parent ID).
      
      Remove the current switchdev 'offload_fwd_mark' implementation and
      instead implement the proposed method. In addition, make rocker - the
      sole user of the mark - use the proposed method.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bc506b4
    • P
      net: flush the softnet backlog in process context · 145dd5f9
      Authored by Paolo Abeni
      Currently in process_backlog(), the process_queue dequeuing is
      performed with local IRQ disabled, to protect against
      flush_backlog(), which runs in hard IRQ context.
      
      This patch moves the flush operation to a work queue and runs the
      callback with bottom half disabled to protect the process_queue
      against dequeuing.
      Since process_queue is now always manipulated in bottom half context,
      the irq disable/enable pair around the dequeue operation is removed.
      
      To keep the flush time as low as possible, the flush works are
      scheduled on all online CPUs simultaneously, using the high-priority
      workqueue and statically allocated, per-CPU work structs.
      
      Overall, this change slightly increases the time required to destroy a
      device in exchange for a small improvement in packet reinjection
      performance.
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      145dd5f9
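
      A sketch of the scheduling side, assuming statically allocated per-CPU work
      structs queued on the high-priority workqueue (the flush callback itself,
      which drains the backlog with BH disabled, is omitted):

        /* Each work struct is INIT_WORK'd with the flush callback at init time. */
        static DEFINE_PER_CPU(struct work_struct, flush_works);

        static void flush_all_backlogs(void)
        {
                unsigned int cpu;

                get_online_cpus();

                /* Kick the flush callback on every online CPU at once ... */
                for_each_online_cpu(cpu)
                        queue_work_on(cpu, system_highpri_wq,
                                      per_cpu_ptr(&flush_works, cpu));

                /* ... then wait for all of them to finish. */
                for_each_online_cpu(cpu)
                        flush_work(per_cpu_ptr(&flush_works, cpu));

                put_online_cpus();
        }
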
  12. 14 Aug, 2016 1 commit
    • S
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Authored by Sabrina Dubroca
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      Removing the linktype check from dev_get_nest_level() and always
      incrementing the nesting depth makes lockdep consider this
      configuration valid.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      952fcfd0
  13. 11 Aug, 2016 1 commit
  14. 20 Jul, 2016 1 commit
  15. 10 Jul, 2016 1 commit
  16. 06 Jul, 2016 1 commit
  17. 05 Jul, 2016 1 commit
  18. 26 Jun, 2016 1 commit
    • E
      net_sched: drop packets after root qdisc lock is released · 520ac30f
      Authored by Eric Dumazet
      Qdisc performance suffers when packets are dropped at enqueue()
      time because drops (kfree_skb()) are done while qdisc lock is held,
      delaying a dequeue() draining the queue.
      
      Nominal throughput can be reduced by 50 % when this happens,
      at a time we would like the dequeue() to proceed as fast as possible.
      
      Even FQ is vulnerable to this problem, while one of FQ goals was
      to provide some flow isolation.
      
      This patch adds a 'struct sk_buff **to_free' parameter to all
      qdisc->enqueue() methods and to the qdisc_drop() helper.
      
      I measured a performance increase of up to 12 %, but this patch
      is a prereq so that future batches in enqueue() can fly.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      520ac30f
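
      A sketch of the to_free plumbing, assuming the drop helpers only chain skbs
      and the caller frees the chain after releasing the root lock:

        static inline void __qdisc_drop(struct sk_buff *skb, struct sk_buff **to_free)
        {
                skb->next = *to_free;
                *to_free = skb;
        }

        static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
                                     struct sk_buff **to_free)
        {
                __qdisc_drop(skb, to_free);
                qdisc_qstats_drop(sch);
                return NET_XMIT_DROP;
        }

        /* In __dev_xmit_skb(), with the root lock held (sketch): */
        rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
        spin_unlock(root_lock);
        if (unlikely(to_free))
                kfree_skb_list(to_free);   /* drops happen outside the root lock */
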
  19. 17 Jun, 2016 2 commits
  20. 11 Jun, 2016 2 commits
    • L
      vfs: make the string hashes salt the hash · 8387ff25
      Authored by Linus Torvalds
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as a no-salt.
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8387ff25
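
      As a small call-site sketch under the interface described above
      (full_name_hash() taking a salt as its first argument; the parent dentry
      pointer is just an example salt):

        /* Hash users with no particular initial salt simply pass NULL. */
        unsigned int plain  = full_name_hash(NULL, name, len);

        /* The dentry code instead salts the hash with the parent pointer up
         * front, rather than mixing it in late at lookup time.
         */
        unsigned int salted = full_name_hash(parent, name, len);
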
    • D
      bpf: enforce recursion limit on redirects · a70b506e
      Authored by Daniel Borkmann
      Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
      Currently, they are not handled by the limiter when attached to clsact's
      egress parent, for example, and a buggy program redirecting to the
      same device again could eventually run into a stack overflow. It would
      be good if we could notify an admin to give them a chance to react. We reuse
      xmit_recursion instead of having one private to eBPF, so that the stack's
      current recursion depth will be taken into account as well. Follow-up to
      commit 3896d655 ("bpf: introduce bpf_clone_redirect() helper") and
      27b29f63 ("bpf: add bpf_redirect() helper").
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a70b506e
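
      A sketch of the guarded egress path, assuming the stack's per-CPU
      xmit_recursion counter and its limit are made visible to the filter code:

        static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
        {
                int ret;

                if (unlikely(__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT)) {
                        net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n");
                        kfree_skb(skb);
                        return -ENETDOWN;
                }

                skb->dev = dev;

                __this_cpu_inc(xmit_recursion);
                ret = dev_queue_xmit(skb);
                __this_cpu_dec(xmit_recursion);

                return ret;
        }
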
  21. 09 Jun, 2016 1 commit
  22. 08 Jun, 2016 2 commits
    • E
      net_sched: transform qdisc running bit into a seqcount · f9eb8aea
      Authored by Eric Dumazet
      Instead of using a single bit (__QDISC___STATE_RUNNING)
      in sch->__state, use a seqcount.
      
      This adds lockdep support, but more importantly it will allow us
      to sample qdisc/class statistics without having to grab qdisc root lock.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9eb8aea
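
      A sketch of running-state helpers built on a seqcount, assuming struct Qdisc
      gains a 'seqcount_t running' member (an odd sequence value means the qdisc
      is currently running):

        static inline bool qdisc_is_running(const struct Qdisc *qdisc)
        {
                return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
        }

        static inline bool qdisc_run_begin(struct Qdisc *qdisc)
        {
                if (qdisc_is_running(qdisc))
                        return false;
                /* An odd sequence count marks the qdisc busy and, unlike the
                 * old state bit, gives lockdep something to check.
                 */
                write_seqcount_begin(&qdisc->running);
                return true;
        }

        static inline void qdisc_run_end(struct Qdisc *qdisc)
        {
                write_seqcount_end(&qdisc->running);
        }
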
    • E
      net: get rid of spin_trylock() in net_tx_action() · 3bcb846c
      Authored by Eric Dumazet
      Note: Tom Herbert posted almost the same patch 3 months back, but for
      different reasons.
      
      The reasons we want to get rid of this spin_trylock() are :
      
      1) Under high qdisc pressure, the spin_trylock() has almost no
      chance to succeed.
      
      2) We loop multiple times in softirq handler, eventually reaching
      the max retry count (10), and we schedule ksoftirqd.
      
      Since we want to adhere more strictly to ksoftirqd being woken up in
      the future (https://lwn.net/Articles/687617/), it is better to avoid
      spurious wakeups.
      
      3) calls to __netif_reschedule() dirty the cache line containing
      q->next_sched, slowing down the owner of qdisc.
      
      4) RT kernels can not use the spin_trylock() here.
      
      With the help of busylock, we get the qdisc spinlock fast enough, and
      the trylock trick brings only a performance penalty.
      
      Depending on qdisc setup, I observed a gain of up to 19 % in qdisc
      performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel)
      
      ("mpstat -I SCPU 1" is much happier now)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3bcb846c
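
      A sketch of the qdisc half of net_tx_action() after the change, taking the
      root lock unconditionally instead of spin_trylock() plus reschedule:

        /* Inside net_tx_action(), walking the list of scheduled qdiscs (sketch): */
        while (head) {
                struct Qdisc *q = head;
                spinlock_t *root_lock = qdisc_lock(q);

                head = head->next_sched;

                spin_lock(root_lock);
                /* Ensure head->next_sched was read before clearing
                 * __QDISC_STATE_SCHED, pairing with __netif_reschedule().
                 */
                smp_mb__before_atomic();
                clear_bit(__QDISC_STATE_SCHED, &q->state);
                qdisc_run(q);
                spin_unlock(root_lock);
        }
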
  23. 17 May, 2016 1 commit
  24. 12 May, 2016 1 commit
    • D
      net: l3mdev: Add hook in ip and ipv6 · 74b20582
      Authored by David Ahern
      Currently the VRF driver uses the rx_handler to switch the skb device
      to the VRF device. Switching the dev prior to the ip / ipv6 layer
      means the VRF driver has to duplicate IP/IPv6 processing which adds
      overhead and makes features such as retaining the ingress device index
      more complicated than necessary.
      
      This patch moves the hook to the L3 layer just after the first NF_HOOK
      for PRE_ROUTING. This location makes exposing the original ingress device
      trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
      in the future.
      
      dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
      with the switched device through the packet taps to maintain current
      behavior (tcpdump can be used on either the vrf device or the enslaved
      devices).
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74b20582
  25. 09 May, 2016 1 commit
  26. 05 May, 2016 1 commit
  27. 04 May, 2016 1 commit
  28. 30 Apr, 2016 1 commit
  29. 29 Apr, 2016 1 commit
    • J
      tuntap: calculate rps hash only when needed · 3df97ba8
      Authored by Jason Wang
      There's no need to calculate the rps hash if rps is not enabled. So
      this patch exports rps_needed and checks it before trying to get the
      rps hash. Tests (using pktgen to inject packets into the guest) show
      this can improve pps by about 13% (when rps is disabled).
      
      Before:
      ~1150000 pps
      After:
      ~1300000 pps
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      ----
      Changes from V1:
      - Fix build when CONFIG_RPS is not set
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3df97ba8
  30. 28 Apr, 2016 1 commit
  31. 22 Apr, 2016 1 commit
  32. 15 Apr, 2016 4 commits
    • E
      net: validate_xmit_skb() changes · d21fd63e
      Authored by Eric Dumazet
      skbs given to validate_xmit_skb() should not have a next
      pointer anymore.
      
      Also, if a packet is dropped, increment dev->tx_dropped;
      __dev_queue_xmit() no longer has to change tx_dropped in this case.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d21fd63e
    • A
      GSO: Support partial segmentation offload · 802ab55a
      Authored by Alexander Duyck
      This patch adds support for something I am referring to as GSO partial.
      The basic idea is that we can support a broader range of devices for
      segmentation if we use fixed outer headers and have the hardware only
      really deal with segmenting the inner header.  The naming reflects the
      fact that everything before csum_start will be fixed headers, and
      everything after will be the region that is handled by hardware.
      
      With the current implementation it allows us to add support for the
      following GSO types with an inner TSO_MANGLEID or TSO6 offload:
      NETIF_F_GSO_GRE
      NETIF_F_GSO_GRE_CSUM
      NETIF_F_GSO_IPIP
      NETIF_F_GSO_SIT
      NETIF_F_UDP_TUNNEL
      NETIF_F_UDP_TUNNEL_CSUM
      
      In the case of hardware that already supports tunneling we may be able to
      extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
      hardware can support updating inner IPv4 headers.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      802ab55a
    • A
      GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values · 1530545e
      Authored by Alexander Duyck
      This patch does two things.
      
      First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field.  As
      a result we should now be able to aggregate flows that were converted from
      IPv6 to IPv4.  In addition this allows us more flexibility for future
      implementations of segmentation as we may be able to use a fixed IP ID when
      segmenting the flow.
      
      The second thing this does is that it places limitations on the outer IPv4
      ID header in the case of tunneled frames.  Specifically it forces the IP ID
      to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
      This way we can avoid creating overlapping series of IP IDs that could
      possibly be fragmented if the frame goes through GRO and is then
      resegmented via GSO.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1530545e
    • A
      GSO: Add GSO type for fixed IPv4 ID · cbc53e08
      Authored by Alexander Duyck
      This patch adds support for TSO using IPv4 headers with a fixed IP ID
      field.  This is meant to allow us to do a lossless GRO in the case of TCP
      flows that use a fixed IP ID such as those that convert IPv6 header to IPv4
      headers.
      
      In addition I am adding a feature that for now I am referring to as
      TSO with IP ID mangling.  Basically, when this flag is enabled the
      device has the
      option to either output the flow with incrementing IP IDs or with a fixed
      IP ID regardless of what the original IP ID ordering was.  This is useful
      in cases where the DF bit is set and we do not care if the original IP ID
      value is maintained.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbc53e08
  33. 14 Apr, 2016 1 commit