1. 15 1月, 2014 2 次提交
    • B
      net: Add net_dev_start_xmit trace event, exposing more skb fields · d87d04a7
      Ben Hutchings 提交于
      The existing net/net_dev_xmit trace event provides little information
      about the skb that has been passed to the driver, and it is not
      simple to add more since the skb may already have been freed at
      the point the event is emitted.
      
      Add a separate trace event before the skb is passed to the driver,
      including most fields that are likely to be interesting for debugging
      driver datapath behaviour.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d87d04a7
    • P
      net: add skb_checksum_setup · ed1f50c3
      Paul Durrant 提交于
      This patch adds a function to set up the partial checksum offset for IP
      packets (and optionally re-calculate the pseudo-header checksum) into the
      core network code.
      The implementation was previously private and duplicated between xen-netback
      and xen-netfront, however it is not xen-specific and is potentially useful
      to any network driver.
      Signed-off-by: NPaul Durrant <paul.durrant@citrix.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed1f50c3
  2. 14 1月, 2014 11 次提交
  3. 11 1月, 2014 2 次提交
    • C
      tcp: metrics: New netlink attribute for src IP and dumped in netlink reply · 8a59359c
      Christoph Paasch 提交于
      This patch adds a new netlink attribute for the source-IP and appends it
      to the netlink reply. Now, iproute2 can have access to the source-IP.
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a59359c
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang 提交于
      Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
      will cause several issues:
      
      - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
        instead of lower device which misses the necessary txq synchronization for
        lower device such as txq stopping or frozen required by dev watchdog or
        control path.
      - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
        when tso is disabled for lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() for just
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
      extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
      forwarding transmission.
      
      With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, it was also required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f663dd9a
  4. 10 1月, 2014 6 次提交
  5. 08 1月, 2014 8 次提交
    • D
      net: skbuff: const-ify casts in skb_queue_* functions · fd44b93c
      Daniel Borkmann 提交于
      We should const-ify comparisons on skb_queue_* inline helper
      functions as their parameters are const as well, so lets not
      drop that.
      Suggested-by: NBrad Spengler <spender@grsecurity.net>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd44b93c
    • P
      netfilter: nft_meta: add l4proto support · 4566bf27
      Patrick McHardy 提交于
      For L3-proto independant rules we need to get at the L4 protocol value
      directly. Add it to the nft_pktinfo struct and use the meta expression
      to retrieve it.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4566bf27
    • P
      netfilter: nf_tables: add nfproto support to meta expression · 124edfa9
      Patrick McHardy 提交于
      Needed by multi-family tables to distinguish IPv4 and IPv6 packets.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      124edfa9
    • P
      netfilter: nf_tables: add "inet" table for IPv4/IPv6 · 1d49144c
      Patrick McHardy 提交于
      This patch adds a new table family and a new filter chain that you can
      use to attach IPv4 and IPv6 rules. This should help to simplify
      rule-set maintainance in dual-stack setups.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1d49144c
    • P
      netfilter: nf_tables: add support for multi family tables · 115a60b1
      Patrick McHardy 提交于
      Add support to register chains to multiple hooks for different address
      families for mixed IPv4/IPv6 tables.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      115a60b1
    • P
      netfilter: nf_tables: add hook ops to struct nft_pktinfo · c9484874
      Patrick McHardy 提交于
      Multi-family tables need the AF from the hook ops. Add a pointer to the
      hook ops and replace usage of the hooknum member in struct nft_pktinfo.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c9484874
    • J
      net-gre-gro: Add GRE support to the GRO stack · bf5a755f
      Jerry Chu 提交于
      This patch built on top of Commit 299603e8
      ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
      the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
      stack. It also serves as an example for supporting other encapsulation
      protocols in the GRO stack in the future.
      
      The patch supports version 0 and all the flags (key, csum, seq#) but
      will flush any pkt with the S (seq#) flag. This is because the S flag
      is not support by GSO, and a GRO pkt may end up in the forwarding path,
      thus requiring GSO support to break it up correctly.
      
      Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
      ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
      IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
      be easily added, so is the support for GRE variations like NVGRE.
      
      The patch also support csum offload. Specifically if the csum flag is on
      and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
      the code will take advantage of the csum computed by the h/w when
      validating the GRE csum.
      
      Note that commit 60769a5d "ipv4: gre:
      add GRO capability" already introduces GRO capability to IPv4 GRE
      tunnels, using the gro_cells infrastructure. But GRO is done after
      GRE hdr has been removed (i.e., decapped). The following patch applies
      GRO when pkts first come in (before hitting the GRE tunnel code). There
      is some performance advantage for applying GRO as early as possible.
      Also this approach is transparent to other subsystem like Open vSwitch
      where GRE decap is handled outside of the IP stack hence making it
      harder for the gro_cells stuff to apply. On the other hand, some NICs
      are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
      In that case the GRO processing of pkts from the same remote host will
      all happen on the same CPU and the performance may be suboptimal.
      
      I'm including some rough preliminary performance numbers below. Note
      that the performance will be highly dependent on traffic load, mix as
      usual. Moreover it also depends on NIC offload features hence the
      following is by no means a comprehesive study. Local testing and tuning
      will be needed to decide the best setting.
      
      All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
      (super_netperf 50 -H 192.168.1.18 -l 30)
      
      An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
      mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
      is configured.
      
      The GRO support for pkts AFTER decap are controlled through the device
      feature of the GRE device (e.g., ethtool -K gre1 gro on/off).
      
      1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
      thruput: 9.16Gbps
      CPU utilization: 19%
      
      1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
      thruput: 5.9Gbps
      CPU utilization: 15%
      
      1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
      thruput: 9.26Gbps
      CPU utilization: 12-13%
      
      1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
      thruput: 9.26Gbps
      CPU utilization: 10%
      
      The following tests were performed on a different NIC that is capable of
      csum offload. I.e., the h/w is capable of computing IP payload csum
      (CHECKSUM_COMPLETE).
      
      2.1 ethtool -K gre1 gro on (hence will use gro_cells)
      
      2.1.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 8.53Gbps
      CPU utilization: 9%
      
      2.1.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 8.97Gbps
      CPU utilization: 7-8%
      
      2.1.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 8.83Gbps
      CPU utilization: 5-6%
      
      2.1.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.98Gbps
      CPU utilization: 5%
      
      2.2 ethtool -K gre1 gro off
      
      2.2.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 5.93Gbps
      CPU utilization: 9%
      
      2.2.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 5.62Gbps
      CPU utilization: 8%
      
      2.2.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 7.69Gbps
      CPU utilization: 8%
      
      2.2.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.96Gbps
      CPU utilization: 5-6%
      Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf5a755f
    • F
      IPv6: add the option to use anycast addresses as source addresses in echo reply · 509aba3b
      FX Le Bail 提交于
      This change allows to follow a recommandation of RFC4942.
      
      - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
        as source addresses for ICMPv6 echo reply. This sysctl is false by default
        to preserve existing behavior.
      - Add inline check ipv6_anycast_destination().
      - Use them in icmpv6_echo_reply().
      
      Reference:
      RFC4942 - IPv6 Transition/Coexistence Security Considerations
         (http://tools.ietf.org/html/rfc4942#section-2.1.6)
      
      2.1.6. Anycast Traffic Identification and Security
      
         [...]
         To avoid exposing knowledge about the internal structure of the
         network, it is recommended that anycast servers now take advantage of
         the ability to return responses with the anycast address as the
         source address if possible.
      Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      509aba3b
  6. 07 1月, 2014 8 次提交
  7. 06 1月, 2014 1 次提交
  8. 05 1月, 2014 2 次提交