1. 15 May, 2015 29 commits
  2. 14 May, 2015 11 commits
    • D
      Merge branch 'nf-ingress' · 5a99e7f2
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter ingress support (v4)
      
      This is the v4 round of patches to add the Netfilter ingress hook. It comes in
      two steps:
      
      1) Add the CONFIG_NET_INGRESS switch to wrap the ingress static key around it.
         The idea is to use the same global static key to avoid adding more code to
         the hot path.
      
      2) Add the Netfilter ingress hook after the tc ingress hook, under the global
         ingress_needed static key. As I said, the Netfilter ingress hook also has
         its own static key, which is nested under the global static key. Please see
         patch 5/5 for performance numbers and more information.
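
The nesting described in the two steps above can be sketched in userspace C. This is only an illustration under stated assumptions: plain boolean flags stand in for the kernel's runtime-patched static keys, and `receive_packet()`, `handle_tc_ingress()`, `handle_nf_ingress()` and the counters are hypothetical names, not kernel functions.

```c
#include <stdbool.h>

/* Userspace stand-ins for the two static keys (plain flags, not jump labels). */
static bool ingress_needed;     /* global key: some ingress user exists */
static bool nf_ingress_needed;  /* nested key: Netfilter ingress hooks exist */

static int tc_calls, nf_calls;  /* counters that make the control flow visible */

static void handle_tc_ingress(void) { tc_calls++; }
static void handle_nf_ingress(void) { nf_calls++; }

/* Hypothetical receive path: non-users only pay for the outer test. */
static void receive_packet(void)
{
    if (ingress_needed) {           /* cold branch for non-users */
        handle_tc_ingress();        /* tc ingress hook runs first */
        if (nf_ingress_needed)      /* only evaluated on the cold branch */
            handle_nf_ingress();
    }
    /* ... normal protocol delivery would continue here ... */
}
```

With both flags clear, `receive_packet()` touches neither hook. Setting only `nf_ingress_needed` changes nothing, because the nested test is never reached until the global flag is set.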
      
      I originally started this round, as suggested, by exploring an independent
      static key for netfilter ingress just after tc ingress, but the results that
      I gathered from that patch are not good for non-users:
      
      Result: OK: 6425927(c6425843+d83) usec, 100000000 (60byte,0frags)
        15561955pps 7469Mb/sec (7469738400bps) errors: 100000000
      
      This is roughly 500Kpps below the base numbers, which is why I discarded
      that approach and focused on this one.
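
The pktgen figures quoted above can be cross-checked with a few lines of C. This is a sketch, not pktgen's own code: the helper names are hypothetical, and the rate recomputed from the rounded microsecond total may differ from the reported pps by a few packets per second. The baseline of 16086246pps is the "Without this patch" figure reported in the ingress hook patch.

```c
#include <stdint.h>

/* Packets per second from a packet count and an elapsed time in microseconds. */
static uint64_t pps_from_run(uint64_t packets, uint64_t elapsed_usec)
{
    return packets * 1000000ULL / elapsed_usec;
}

/* Bits per second for a given packet rate and packet size in bytes. */
static uint64_t bps_from_pps(uint64_t pps, uint64_t pkt_bytes)
{
    return pps * pkt_bytes * 8ULL;
}

/* Rate lost relative to a baseline, in packets per second. */
static uint64_t pps_drop(uint64_t baseline_pps, uint64_t measured_pps)
{
    return baseline_pps - measured_pps;
}
```

For the run above, `bps_from_pps(15561955, 60)` reproduces the reported 7469738400bps, and `pps_drop(16086246, 15561955)` gives 524291, which is the "roughly 500Kpps" mentioned in the text.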
      
      The idea of this patchset is to open the window to nf_tables, which comes with
      features that will work out-of-the-box (once the boilerplate code to support
      the 'netdev' table family is in place). To avoid repeating myself [1], the
      most relevant features are:
      
      1) Multi-dimensional key dictionary lookups.
      2) Arbitrary stateful flow tables.
      3) Transactions and good support for dynamic updates.
      
      But there are also interesting aspects to consider from userspace, such as the
      ability to support new layer 2 protocols without kernel updates, a well-defined
      netlink interface, userspace libraries and utilities for third party
      applications, among others.
      
      I hope we can be happy with this approach.
      
      Please, apply. Thanks.
      
      [1] http://marc.info/?l=netfilter-devel&m=143033337020328&w=2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      netfilter: add netfilter ingress hook after handle_ing() under unique static key · e687ad60
      Pablo Neira committed
      This patch adds the Netfilter ingress hook just after the existing tc ingress
      hook, which seems to be the consensus solution.
      
      Note that the Netfilter hook resides under the global static key that enables
      ingress filtering. Nonetheless, Netfilter still also has its own static key for
      minimal impact on the existing handle_ing().
      
      * Without this patch:
      
      Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags)
        16086246pps 7721Mb/sec (7721398080bps) errors: 100000000
      
          42.46%  kpktgend_0   [kernel.kallsyms]   [k] __netif_receive_skb_core
          25.92%  kpktgend_0   [kernel.kallsyms]   [k] kfree_skb
           7.81%  kpktgend_0   [pktgen]            [k] pktgen_thread_worker
           5.62%  kpktgend_0   [kernel.kallsyms]   [k] ip_rcv
           2.70%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_internal
           2.34%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_sk
           1.44%  kpktgend_0   [kernel.kallsyms]   [k] __build_skb
      
      * With this patch:
      
      Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags)
        16090536pps 7723Mb/sec (7723457280bps) errors: 100000000
      
          41.23%  kpktgend_0      [kernel.kallsyms]  [k] __netif_receive_skb_core
          26.57%  kpktgend_0      [kernel.kallsyms]  [k] kfree_skb
           7.72%  kpktgend_0      [pktgen]           [k] pktgen_thread_worker
           5.55%  kpktgend_0      [kernel.kallsyms]  [k] ip_rcv
           2.78%  kpktgend_0      [kernel.kallsyms]  [k] netif_receive_skb_internal
           2.06%  kpktgend_0      [kernel.kallsyms]  [k] netif_receive_skb_sk
           1.43%  kpktgend_0      [kernel.kallsyms]  [k] __build_skb
      
      * Without this patch + tc ingress:
      
              tc filter add dev eth4 parent ffff: protocol ip prio 1 \
                      u32 match ip dst 4.3.2.1/32
      
      Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags)
        10788648pps 5178Mb/sec (5178551040bps) errors: 100000000
      
          40.99%  kpktgend_0   [kernel.kallsyms]  [k] __netif_receive_skb_core
          17.50%  kpktgend_0   [kernel.kallsyms]  [k] kfree_skb
          11.77%  kpktgend_0   [cls_u32]          [k] u32_classify
           5.62%  kpktgend_0   [kernel.kallsyms]  [k] tc_classify_compat
           5.18%  kpktgend_0   [pktgen]           [k] pktgen_thread_worker
           3.23%  kpktgend_0   [kernel.kallsyms]  [k] tc_classify
           2.97%  kpktgend_0   [kernel.kallsyms]  [k] ip_rcv
           1.83%  kpktgend_0   [kernel.kallsyms]  [k] netif_receive_skb_internal
           1.50%  kpktgend_0   [kernel.kallsyms]  [k] netif_receive_skb_sk
           0.99%  kpktgend_0   [kernel.kallsyms]  [k] __build_skb
      
      * With this patch + tc ingress:
      
              tc filter add dev eth4 parent ffff: protocol ip prio 1 \
                      u32 match ip dst 4.3.2.1/32
      
      Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags)
        10743194pps 5156Mb/sec (5156733120bps) errors: 100000000
      
          42.01%  kpktgend_0   [kernel.kallsyms]   [k] __netif_receive_skb_core
          17.78%  kpktgend_0   [kernel.kallsyms]   [k] kfree_skb
          11.70%  kpktgend_0   [cls_u32]           [k] u32_classify
           5.46%  kpktgend_0   [kernel.kallsyms]   [k] tc_classify_compat
           5.16%  kpktgend_0   [pktgen]            [k] pktgen_thread_worker
           2.98%  kpktgend_0   [kernel.kallsyms]   [k] ip_rcv
           2.84%  kpktgend_0   [kernel.kallsyms]   [k] tc_classify
           1.96%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_internal
           1.57%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_sk
      
      Note that the results are very similar before and after.
      
      I can see gcc gets the code under the ingress static key out of the hot path.
      Then, on that cold branch, it generates the code to accommodate the netfilter
      ingress static key. My explanation for this is that this reduces the pressure
      on the instruction cache for non-users as the new code is out of the hot path,
      and it comes with minimal impact for tc ingress users.
      
      Using gcc version 4.8.4 on:
      
      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                8
      [...]
      L1d cache:             16K
      L1i cache:             64K
      L2 cache:              2048K
      L3 cache:              8192K
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      net: add CONFIG_NET_INGRESS to enable ingress filtering · 1cf51900
      Pablo Neira committed
      This new config switch enables the ingress filtering infrastructure that is
      controlled through the ingress_needed static key. This prepares the
      introduction of the Netfilter ingress hook that resides under this unique
      static key.
      
      Note that CONFIG_SCH_INGRESS automatically selects this, which should be no
      problem since this also depends on CONFIG_NET_CLS_ACT.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      netfilter: add nf_hook_list_active() · b8d0aad0
      Pablo Neira committed
      In preparation for a per-device hook list for netfilter ingress.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      f7191483
    • P
      87d5c18c
    • D
      net: macb: OR vs AND typos · a104a6b3
      Dan Carpenter committed
      The bitwise tests are always true here because it uses '|' where '&' is
      intended.
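
A minimal sketch of why such a test is always true. The flag name `CAP_JUMBO` and both helpers are hypothetical and are not taken from the macb driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_JUMBO (1u << 3)  /* hypothetical capability bit, not a macb define */

/* Buggy form: OR-ing in a nonzero constant always yields a nonzero value. */
static bool has_jumbo_buggy(uint32_t caps)
{
    return (caps | CAP_JUMBO) != 0;
}

/* Intended form: AND keeps only the bit being tested. */
static bool has_jumbo_fixed(uint32_t caps)
{
    return (caps & CAP_JUMBO) != 0;
}
```

Here `has_jumbo_buggy(0)` returns true even though no capability bit is set, while `has_jumbo_fixed(0)` returns false.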
      
      Fixes: 98b5a0f4 ('net: macb: Add support for jumbo frames')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: Reserve skb headroom and set skb->dev even if using __alloc_skb · a080e7bd
      Alexander Duyck committed
      When I had inlined __alloc_rx_skb into __netdev_alloc_skb and
      __napi_alloc_skb I had overlooked the fact that there was a return in the
      __alloc_rx_skb.  As a result we weren't reserving headroom or setting the
      skb->dev in certain cases.  This change corrects that by adding a couple of
      jump labels to jump to depending on __alloc_skb either succeeding or failing.
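
The control-flow pattern described above can be sketched in userspace C. Everything here is an assumption made for illustration: `struct buf`, `HEADROOM`, the two allocators and the label names are hypothetical stand-ins, not the kernel's skb code.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEADROOM 64  /* hypothetical headroom size */

struct buf { size_t reserved; const void *dev; };

static struct buf pool[2];

/* Stand-in for the generic allocator path; may fail on request. */
static struct buf *alloc_fallback(bool fail)
{
    if (fail)
        return NULL;
    pool[0].reserved = 0;
    pool[0].dev = NULL;
    return &pool[0];
}

/* Stand-in for the fast path, which always succeeds in this sketch. */
static struct buf *alloc_fast(void)
{
    pool[1].reserved = 0;
    pool[1].dev = NULL;
    return &pool[1];
}

/* Every successful path reaches buf_ok, so headroom and dev are always set. */
static struct buf *alloc_rx_buf(const void *dev, bool use_fallback, bool fail)
{
    struct buf *b;

    if (use_fallback) {
        b = alloc_fallback(fail);
        if (!b)
            goto buf_fail;  /* failure: nothing to initialize */
        goto buf_ok;        /* a plain "return b;" here would skip the setup */
    }
    b = alloc_fast();
buf_ok:
    b->reserved = HEADROOM;
    b->dev = dev;
buf_fail:
    return b;
}
```

The two labels keep a single exit point while letting the fallback path share the initialization that an early return would have bypassed.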
      
      Fixes: 9451980a ("net: Use cached copy of pfmemalloc to avoid accessing page")
      Reported-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Tested-by: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'geneve_tunnel_driver' · 59af132b
      David S. Miller committed
      John W. Linville says:
      
      ====================
      add GENEVE netdev tunnel driver
      
      This 5-patch kernel series adds a netdev implementation of a GENEVE
      tunnel driver, and the single iproute2 patch enables creation and
      such for those netdevs.  This makes use of the existing GENEVE
      infrastructure already used by the OVS code.  The net/ipv4/geneve.c
      file is renamed as net/ipv4/geneve_core.c as part of these changes.
      
       drivers/net/Kconfig            |   14 +
       drivers/net/Makefile           |    1
       drivers/net/geneve.c           |  503 +++++++++++++++++++++++++++++++++++++++++
       include/net/geneve.h           |    5
       include/uapi/linux/if_link.h   |    9
       net/ipv4/Kconfig               |    4
       net/ipv4/Makefile              |    2
       net/ipv4/geneve.c              |    6
       net/ipv4/geneve_core.c         |    4
       net/openvswitch/Kconfig        |    2
       net/openvswitch/vport-geneve.c |    5
       11 files changed, 538 insertions(+), 17 deletions(-)
      
      The overall structure of the GENEVE netdev driver is strongly
      influenced by the VXLAN netdev driver.  This is not surprising, as the
      two drivers are intended to serve similar purposes.  As development of
      the GENEVE driver continues, it is likely that those similarities will
      grow stronger.  This will include both simple configuration options
      (e.g. TOS and TTL settings) and new control plane support.
      
      The current implementation is very simple, restricting itself to point
      to point links over IPv4.  This is due only to the simplicity of the
      implementation, and no such limit is inherent to GENEVE in any way.
      Support for IPv6 links and more sophisticated control plane options
      are predictable enhancements.
      
      Using the included iproute2 patch, a GENEVE tunnel is created thusly:
      
              ip link add dev gnv0 type geneve remote 192.168.22.1 vni 1234
              ip link set gnv0 up
              ip addr add 10.1.1.1/24 dev gnv0
      
      After a corresponding tunnel interface is created at the link partner,
      traffic should proceed as expected.
      
      Please let me know if anyone has problems...thanks!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      geneve: add initial netdev driver for GENEVE tunnels · 2d07dc79
      John W. Linville committed
      This is an initial implementation of a netdev driver for GENEVE
      tunnels.  This implementation uses a fixed UDP port, and only supports
      point-to-point links with specific partner endpoints.  Only IPv4
      links are supported at this time.
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J