1. 20 Sep 2017, 5 commits
    • ipv4: speedup ipv6 tunnels dismantle · 64bc1781
      Eric Dumazet committed
      Implement the exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
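To make the batching idea concrete, here is a minimal userspace sketch in C. It is an assumption-laden model, not kernel code. fake_rtnl_lock(), struct netns_devs, exit_per_netns() and exit_batch() are hypothetical names. The sketch only counts how many locked rounds each strategy needs to tear down the same set of devices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: lock_rounds counts rtnl_lock()/rtnl_unlock()
 * pairs, and unregistered counts devices torn down. */
static int lock_rounds;
static int unregistered;

static void fake_rtnl_lock(void)   { lock_rounds++; }
static void fake_rtnl_unlock(void) { }

/* Tunnel devices owned by one dying namespace (hypothetical layout). */
struct netns_devs {
	int ndevs;
};

/* Per-namespace exit(): one locked round for every namespace. */
static void exit_per_netns(struct netns_devs *ns, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		fake_rtnl_lock();
		unregistered += ns[i].ndevs;   /* unregister this netns' devices */
		fake_rtnl_unlock();
	}
}

/* exit_batch() style: gather the devices of every dying namespace and
 * unregister them in a single locked section, in the spirit of
 * unregister_netdevice_many(). */
static void exit_batch(struct netns_devs *ns, size_t n)
{
	fake_rtnl_lock();
	for (size_t i = 0; i < n; i++)
		unregistered += ns[i].ndevs;
	fake_rtnl_unlock();
}
```

In the kernel, each round also carries fixed costs, such as the synchronization done while unregistering. That is why fewer rounds should help. The counter here is only a proxy for that cost.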
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch:
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real    1m38.965s
      user    0m0.688s
      sys     0m37.017s
      
      After patch:
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: addrlabel: per netns list · a90c9347
      Eric Dumazet committed
      Having a global list of labels does not scale to thousands of
      netns in the cloud era. This causes quadratic behavior on
      netns creation and deletion.

      It is time to have a per-netns list of ~10 labels.
      
      Tested:
      
      $ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]
      
      real    0m20.837s # instead of 0m24.227s
      user    0m0.328s
      sys     0m20.338s # instead of 0m23.753s
      
          16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
          12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
           6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
           5.78%       ip  [kernel.kallsyms]  [k] memset_erms
           5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
           5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
           4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock
           3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
           3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range
           1.77%       ip  [kernel.kallsyms]  [k] __wake_up
           1.69%       ip  [kernel.kallsyms]  [k] strlen
           1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common
           1.09%       ip  [kernel.kallsyms]  [k] insert_header
           1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap
           1.01%       ip  [kernel.kallsyms]  [k] consume_skb
           0.98%       ip  [kernel.kallsyms]  [k] netlink_trim
           0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling
           0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages
           0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: no need to free qdisc in RCU callback · 752fbcc3
      Cong Wang committed
      gen estimator has been rewritten in commit 1c0d32fd
      ("net_sched: gen_estimator: complete rewrite of rate estimators"),
      so the caller no longer needs to wait for a grace period. This
      patch gets rid of it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: remove copy of master ethtool_ops · f5619866
      Vivien Didelot committed
      There is no need to store a copy of the master ethtool ops. Storing the
      original pointer in DSA and the new one in the master netdev itself is
      enough.
      
      In the meantime, set orig_ethtool_ops to NULL when restoring the master
      ethtool ops. Before calling them, also check that the master's original
      ethtool ops and the functions needed from them are present.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sk_buff rbnode reorg · bffa72cf
      Eric Dumazet committed
      skb->rbnode shares space with skb->next, skb->prev and skb->tstamp.
      
      Current uses (TCP receive ofo queue and netem) need to save/restore
      tstamp, while skb->dev is either NULL (TCP) or a constant for a given
      queue (netem).
      
      Since we plan to use an RB tree for the TCP retransmit queue to speed up
      SACK processing with large BDP, this patch exchanges skb->dev and
      skb->tstamp.
      
      This saves some overhead in both TCP and netem.
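The layout change can be pictured with a simplified C model. struct skb_demo, struct rb_node_demo and fake_rb_link() are hypothetical stand-ins, not the real struct sk_buff. Once tstamp lives outside the union, writing the RB-tree node clobbers only next, prev and dev. TCP (dev is NULL) and netem (dev is constant per queue) can restore those cheaply, whereas tstamp varies per packet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rb_node: three word-sized fields. */
struct rb_node_demo {
	unsigned long parent_color;
	struct rb_node_demo *right;
	struct rb_node_demo *left;
};

/* Hypothetical model of the reorganized layout (requires C11 anonymous
 * structs/unions); not the kernel definition. */
struct skb_demo {
	union {
		struct {
			struct skb_demo *next;
			struct skb_demo *prev;
			void *dev;              /* now overlaid by rbnode */
		};
		struct rb_node_demo rbnode;     /* used while in an RB tree */
	};
	int64_t tstamp;                         /* now outside the union */
};

/* Simulates linking into an RB tree, which writes all three rbnode
 * words; tstamp is not touched, so it needs no save/restore. */
static void fake_rb_link(struct skb_demo *skb)
{
	memset(&skb->rbnode, 0xff, sizeof(skb->rbnode));
}
```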
      
      v2: removes the swtstamp field from struct tcp_skb_cb
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Sep 2017, 1 commit
  3. 16 Sep 2017, 1 commit
    • sctp: fix an use-after-free issue in sctp_sock_dump · d25adbeb
      Xin Long committed
      Commit 86fdb344 ("sctp: ensure ep is not destroyed before doing the
      dump") tried to fix a use-after-free issue by checking !sctp_sk(sk)->ep
      while holding a reference to the sock and the sock lock.
      
      But Paolo noticed that the endpoint could be destroyed in sctp_rcv without
      sock lock protection. This means the use-after-free issue could still be
      triggered if sctp_rcv puts and destroys ep after sctp_sock_dump checks
      !ep, although it's pretty hard to reproduce.
      
      I could reproduce it by adding an mdelay in sctp_rcv and a long msleep
      in sctp_close and sctp_sock_dump.
      
      To avoid this issue, this patch adds another parameter, cb_done, to
      sctp_for_each_transport. After jumping out of the transport traversal,
      sctp_for_each_transport dumps ep->assocs while holding tsp.
      
      It also makes the sctp diag dump run faster, as there is no longer any
      need to save sk into cb->args[5] and keep calling
      sctp_for_each_transport.
      
      This patch also uses int * instead of int for the pos argument
      in sctp_for_each_transport. The position is then incremented only
      in sctp_for_each_transport, and there is no need to keep changing
      cb->args[2] in sctp_sock_filter and sctp_sock_dump.
      
      Fixes: 86fdb344 ("sctp: ensure ep is not destroyed before doing the dump")
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 14 Sep 2017, 1 commit
    • sctp: potential read out of bounds in sctp_ulpevent_type_enabled() · fa5f7b51
      Dan Carpenter committed
      This code causes a static checker warning because Smatch doesn't trust
      anything that comes from skb->data.  I've reviewed this code and I do
      think skb->data can be controlled by the user here.
      
      The sctp_event_subscribe struct has 13 __u8 fields and we want to see
      if ours is non-zero.  sn_type can be any value in the 0-USHRT_MAX range.
      We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
      either before the start of the struct or after the end.
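A bounds check of the kind this analysis calls for can be sketched in C as follows. The constants and names (DEMO_SN_TYPE_BASE, struct demo_event_subscribe, demo_event_enabled()) are hypothetical. The sketch assumes, as the description implies, that the subscription flags are indexed by sn_type minus the base value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants mirroring the description: 13 one-byte flags,
 * with notification types numbered from a base of 1 << 15. */
#define DEMO_SN_TYPE_BASE  (1u << 15)
#define DEMO_NUM_EVENTS    13u

struct demo_event_subscribe {
	uint8_t flags[DEMO_NUM_EVENTS];   /* stands in for the 13 __u8 fields */
};

/* Returns true if the event type is subscribed. Both range checks run
 * before indexing, so an arbitrary 16-bit sn_type taken from packet data
 * can never read outside the structure. */
static bool demo_event_enabled(const struct demo_event_subscribe *sub,
			       uint16_t sn_type)
{
	if (sn_type < DEMO_SN_TYPE_BASE)
		return false;                      /* would read before the struct */
	uint32_t idx = (uint32_t)sn_type - DEMO_SN_TYPE_BASE;
	if (idx >= DEMO_NUM_EVENTS)
		return false;                      /* would read past the end */
	return sub->flags[idx] != 0;
}
```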
      
      This is a very old bug, and it's surprising that it went undetected
      for so long. My theory is that it just doesn't have a big impact, so
      it would be hard to notice.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Sep 2017, 1 commit
    • net_sched: get rid of tcfa_rcu · d7fb60b9
      Cong Wang committed
      gen estimator has been rewritten in commit 1c0d32fd
      ("net_sched: gen_estimator: complete rewrite of rate estimators"),
      so the caller no longer needs to wait for a grace period.
      This patch gets rid of it.
      
      This also completely closes a race condition between the action free
      path and the filter chain add/remove path for the following patch.
      Otherwise, the nested RCU callback can't be caught by
      rcu_barrier().
      
      Please see also the comments in code.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 09 Sep 2017, 1 commit
    • netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" · e1bf1687
      Florian Westphal committed
      This reverts commit 870190a9.
      
      It was not a good idea. The custom hash table was a much better
      fit for this purpose.
      
      A fast lookup is not essential. In fact, in most cases there is no lookup
      at all, because the original tuple is not taken and can be used as-is.
      What needs to be fast is insertion and deletion.
      
      rhlist removal, however, requires an rhlist walk.
      Such a list can hold thousands of entries if source ports/addresses
      are reused for multiple flows. When this happens, removal requests are
      so expensive that deleting a few thousand flows can take several
      seconds(!).
      
      The advantages that we got from rhashtable are:
      1) table auto-sizing
      2) multiple locks
      
      1) would be nice to have, but it is not essential as we have at
      most one lookup per new flow, so even a million flows in the bysource
      table are not a problem compared to current deletion cost.
      2) is easy to add to custom hash table.
      
      I tried to add hlist_node to rhlist to speed up rhltable_remove but this
      isn't doable without changing semantics.  rhltable_remove_fast will
      check that the to-be-deleted object is part of the table and that
      requires a list walk that we want to avoid.
      
      Furthermore, using hlist_node increases size of struct rhlist_head, which
      in turn increases nf_conn size.
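The cost difference comes down to unlinking. The kernel's hlist_node keeps a next pointer plus a pprev pointer to whatever points at the node, so removal needs no walk. Below is a minimal userspace re-creation of that idea. struct hnode, struct hhead and the helper names are hypothetical, and this is not the netfilter hash table itself.

```c
#include <assert.h>
#include <stddef.h>

/* Each node records the address of the pointer that references it
 * (pprev), so it can unlink itself without scanning the bucket. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

static void hadd_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* O(1) deletion: no list walk, regardless of bucket length. */
static void hdel(struct hnode *n)
{
	assert(n->pprev != NULL);   /* node must currently be linked */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Counts entries in a bucket (used only to inspect the list). */
static size_t hcount(const struct hhead *h)
{
	size_t c = 0;
	for (const struct hnode *p = h->first; p; p = p->next)
		c++;
	return c;
}
```

Removing a node in the middle of a long chain costs the same as removing the head. This is the property the text says rhltable_remove_fast cannot offer without a walk.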
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
      Reported-by: Ivan Babrou <ibobrik@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 06 Sep 2017, 3 commits
    • net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6
      Florian Fainelli committed
      Let switch drivers indicate how many TX queues they support. Some
      switches, such as the Broadcom Starfighter 2, are designed with 8 egress
      queues. Future changes will allow us to leverage the queue mapping and
      direct the transmission towards a particular queue.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Cleanup control flow · 3a1214e8
      Tom Herbert committed
      __skb_flow_dissect is riddled with gotos that make discerning the flow,
      debugging, and extending the capability difficult. This patch
      reorganizes things so that we only perform gotos after the two main
      switch statements (no gotos within the cases now). It also eliminates
      several goto labels, so that only two labels remain as goto
      targets.
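The resulting shape can be sketched with a toy parser: each case only records an outcome, and jumps happen after the switch. Everything here (enum demo_ret, demo_dissect() and the protocol ids 1 and 2) is hypothetical and far simpler than __skb_flow_dissect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes set by the switch cases. */
enum demo_ret { RET_OUT_GOOD, RET_OUT_BAD, RET_PROTO_AGAIN };

/* Toy dissector over a list of protocol ids: 1 = encapsulation header
 * (parse the next one), 2 = terminal header, anything else = error.
 * On return, *depth is the number of headers consumed, unless the input
 * ran out first. */
static enum demo_ret demo_dissect(const uint8_t *protos, size_t n, size_t *depth)
{
	size_t i = 0;
	enum demo_ret ret;

proto_again:
	if (i >= n)
		return RET_OUT_BAD;
	switch (protos[i]) {
	case 1:
		ret = RET_PROTO_AGAIN;   /* cases only record the outcome */
		break;
	case 2:
		ret = RET_OUT_GOOD;
		break;
	default:
		ret = RET_OUT_BAD;
		break;
	}
	i++;

	/* The only jump happens here, after the switch, never inside a case. */
	if (ret == RET_PROTO_AGAIN)
		goto proto_again;
	*depth = i;
	return ret;
}
```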
      Reported-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Arnd Bergmann committed
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
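The pattern behind the fix is a header that either declares the real functions or provides static inline fallbacks. In this sketch, CONFIG_DEMO_NCSI and the demo_* names are hypothetical stand-ins for CONFIG_NET_NCSI and the ncsi helpers. The -EINVAL return value of the stubs is an assumption, not something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef CONFIG_DEMO_NCSI
/* Feature built in: real declarations. In the implementation file each
 * function would also be exported (e.g. with EXPORT_SYMBOL_GPL(); the
 * exact macro is an assumption) so modules like ftgmac100 can link. */
int demo_vlan_rx_add_vid(void *dev, unsigned short proto, unsigned short vid);
int demo_vlan_rx_kill_vid(void *dev, unsigned short proto, unsigned short vid);
#else
/* Feature compiled out: callers still build, and the stubs just fail. */
static inline int demo_vlan_rx_add_vid(void *dev, unsigned short proto,
				       unsigned short vid)
{
	(void)dev; (void)proto; (void)vid;
	return -EINVAL;
}

static inline int demo_vlan_rx_kill_vid(void *dev, unsigned short proto,
					unsigned short vid)
{
	(void)dev; (void)proto; (void)vid;
	return -EINVAL;
}
#endif
```

Because the stubs are static inline in the header, a driver that references them builds in both configurations without any #ifdef of its own.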
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 Sep 2017, 1 commit
    • mac80211: fix VLAN handling with TXQs · 53168215
      Johannes Berg committed
      With TXQs, the AP_VLAN interfaces are resolved to their owner AP
      interface when enqueuing the frame, which makes sense since the
      frame really goes out on that as far as the driver is concerned.
      
      However, this introduces a problem: frames to be encrypted with
      a VLAN-specific GTK will now be encrypted with the AP GTK, since
      the information about which virtual interface to use to select
      the key is taken from the TXQ.
      
      Fix this by preserving info->control.vif and using that in the
      dequeue function. This now requires doing the driver-mapping
      in the dequeue as well.
      
      Since there's no way to filter the frames that are sitting on a
      TXQ, drop all frames, which may affect other interfaces, when an
      AP_VLAN is removed.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  9. 04 Sep 2017, 7 commits
  10. 02 Sep 2017, 2 commits
  11. 01 Sep 2017, 2 commits
    • devlink: Add IPv6 header for dpipe · 1797f5b3
      Arkadi Sharshevsky committed
      This will be used by the IPv6 host table which will be introduced in the
      following patches. The fields in the header are added per-use. This header
      is global and can be reused by many drivers.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang committed
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before their filters, everything is fine.
         Logically, the classes exist before we specify their IDs in
         filters, so it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind to, we have to
         do a dynamic lookup in the fast path. What's worse, this happens all
         the time, not just once. On the fast path tcf_result is passed on the
         stack, so the result cannot be propagated back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class. At that point we can search all the filters in
      its parent, match them by classid, and fix them up. Because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new op for each filter that tells it how to bind the class.
      
      Note that we still can NOT totally get rid of the class lookups in
      ->enqueue(). The cgroup and flow filters have no way to determine
      the classid at setup time, so they still have to go through dynamic lookup.
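Below is a heavily simplified model of the reverse binding step. When a class appears, the sketch walks the parent's filters and caches a pointer in those whose classid matches. The types and demo_bind_class() are hypothetical. In the patch itself, each filter type supplies its own bind operation because tcf_result differs per filter; this sketch collapses that into one field.

```c
#include <assert.h>
#include <stddef.h>

struct demo_class {
	unsigned int classid;
};

/* Hypothetical filter: the configured target classid plus a cached
 * pointer. A NULL pointer means the fast path must look the class up. */
struct demo_filter {
	unsigned int classid;
	struct demo_class *bound;
};

/* Called when a new class is created: fix up every filter under the
 * parent whose classid matches, and return how many were bound. */
static size_t demo_bind_class(struct demo_filter *filters, size_t n,
			      struct demo_class *cl)
{
	size_t bound = 0;

	for (size_t i = 0; i < n; i++) {
		if (filters[i].classid == cl->classid) {
			filters[i].bound = cl;
			bound++;
		}
	}
	return bound;
}
```

After binding, the fast path would use the cached pointer directly. Filters whose class does not exist yet keep a NULL pointer and fall back to the dynamic lookup.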
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 31 Aug 2017, 4 commits
  13. 30 Aug 2017, 4 commits
    • neigh: increase queue_len_bytes to match wmem_default · eaa72dc4
      Eric Dumazet committed
      Florian reported UDP xmit drops that could be root-caused to the
      too-small neigh limit.
      
      Current limit is 64 KB, meaning that even a single UDP socket would hit
      it, since its default sk_sndbuf comes from net.core.wmem_default
      (~212992 bytes on 64bit arches).
      
      Once ARP/ND resolution is in progress, we should allow a few more
      packets to be queued, at least for one producer.
      
      Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
      limit and either block in sendmsg() or return -EAGAIN.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Use rt6i_idev index for echo replies to a local address · 1b70d792
      David Ahern committed
      Tariq reported that local pings to a link-local address are failing:
      $ ifconfig ens8
      ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
              inet 11.141.16.6  netmask 255.255.0.0  broadcast 11.141.255.255
              inet6 fe80::7efe:90ff:fecb:7502  prefixlen 64  scopeid 0x20<link>
              ether 7c:fe:90:cb:75:02  txqueuelen 1000  (Ethernet)
              RX packets 12  bytes 1164 (1.1 KiB)
              RX errors 0  dropped 0  overruns 0  frame 0
              TX packets 30  bytes 2484 (2.4 KiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
      
      $  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
      PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add NSH header structures and helpers · 1f0b7744
      Yi Yang committed
      NSH (Network Service Header)[1] is a new protocol for service
      function chaining. It can be handled as an L3 protocol like
      IPv4 and IPv6. Eth + NSH + inner packet and VxLAN-gpe + NSH +
      inner packet are two typical use cases.
      
      This patch adds NSH header structures and helpers for NSH GSO
      support and Open vSwitch NSH support.
      
      [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
      
      [Jiri: added nsh_hdr() helper and renamed the header struct to "struct
      nshhdr" to match the usual pattern. Removed packet type defines, these are
      now shared with VXLAN-GPE.]
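To illustrate what such helpers work with, here is a sketch that decodes the fixed part of an NSH header: the base header plus the service path header. It assumes the bit layout later published in RFC 8300, which may differ in flag details from the draft cited above. struct demo_nsh and demo_nsh_parse() are hypothetical and are not the kernel's struct nshhdr or nsh_hdr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields, assuming the RFC 8300 layout:
 * Ver(2) O(1) U(1) TTL(6) Length(6) | U(4) MD Type(4) | Next Protocol(8)
 * followed by SPI(24) | SI(8). */
struct demo_nsh {
	uint8_t  ver;
	uint8_t  oam;
	uint8_t  ttl;
	uint8_t  len_words;  /* total NSH length in 4-byte words */
	uint8_t  md_type;
	uint8_t  next_proto;
	uint32_t spi;        /* service path identifier */
	uint8_t  si;         /* service index */
};

/* Parses the first 8 bytes (network byte order). Returns 0 on success,
 * -1 if the buffer is too short or the declared length is inconsistent. */
static int demo_nsh_parse(const uint8_t *p, size_t len, struct demo_nsh *h)
{
	if (len < 8)
		return -1;
	uint16_t w = (uint16_t)((p[0] << 8) | p[1]);
	h->ver        = (uint8_t)(w >> 14);
	h->oam        = (uint8_t)((w >> 13) & 0x1);
	h->ttl        = (uint8_t)((w >> 6) & 0x3f);
	h->len_words  = (uint8_t)(w & 0x3f);
	h->md_type    = p[2] & 0x0f;
	h->next_proto = p[3];
	h->spi = ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 8) | p[6];
	h->si  = p[7];
	/* The base and service path headers alone already take 2 words. */
	if (h->len_words < 2 || (size_t)h->len_words * 4 > len)
		return -1;
	return 0;
}
```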
      Signed-off-by: Yi Yang <yi.y.yang@intel.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: factor out VXLAN-GPE next protocol · fa20e0e3
      Jiri Benc committed
      The values are shared between VXLAN-GPE and NSH. This was probably a
      coincidence originally, but I notified both working groups about it last
      year, and they seem to have kept the values in sync since then.
      
      Hopefully they'll get a single IANA registry for the values, too. (I asked
      them for that.)
      
      Factor out the code to be shared by the NSH implementation.
      
      NSH and MPLS values are added in this patch, too. For MPLS, the drafts
      incorrectly assign only a single value, while we have two MPLS ethertypes.
      I raised the problem with both groups. For now, I assume the value is for
      unicast.
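Here is a sketch of how a shared next-protocol value could be translated to an ethertype. The numeric codes (1 to 5) are assumed from the VXLAN-GPE draft, and the enum and function names are hypothetical. The ethertype constants are the standard ones. MPLS unicast (0x8847) and multicast (0x8848) have distinct ethertypes, which is why a single draft value is ambiguous.

```c
#include <assert.h>
#include <stdint.h>

/* Next Protocol values shared by VXLAN-GPE and NSH (assumed codes). */
enum demo_tun_proto {
	DEMO_TUN_P_IPV4     = 0x01,
	DEMO_TUN_P_IPV6     = 0x02,
	DEMO_TUN_P_ETHERNET = 0x03,
	DEMO_TUN_P_NSH      = 0x04,
	DEMO_TUN_P_MPLS_UC  = 0x05,  /* assumed to mean unicast, as in the text */
};

/* Maps a shared next-protocol value to an ethertype; returns 0 if the
 * value is unknown. */
static uint16_t demo_tun_p_to_eth_p(uint8_t proto)
{
	switch (proto) {
	case DEMO_TUN_P_IPV4:     return 0x0800; /* ETH_P_IP */
	case DEMO_TUN_P_IPV6:     return 0x86DD; /* ETH_P_IPV6 */
	case DEMO_TUN_P_ETHERNET: return 0x6558; /* ETH_P_TEB */
	case DEMO_TUN_P_NSH:      return 0x894F; /* ETH_P_NSH */
	case DEMO_TUN_P_MPLS_UC:  return 0x8847; /* ETH_P_MPLS_UC */
	default:                  return 0;
	}
}
```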
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 29 Aug 2017, 6 commits
    • rxrpc: Allow failed client calls to be retried · c038a58c
      David Howells committed
      Allow a client call that failed on network error to be retried, provided
      that the Tx queue still holds DATA packet 1.  This allows an operation to
      be submitted to another server or another address for the same server
      without having to repackage and re-encrypt the data so far processed.
      
      Two new functions are provided:
      
       (1) rxrpc_kernel_check_call() - This is used to find out the completion
           state of a call to guess whether it can be retried and whether it
           should be retried.
      
       (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
           connection, reset the state and submit it as a new client call to a
           new address.  The new address need not match the previous address.
      
      A call may be retried even if all the data hasn't been loaded into it yet;
      a partially constructed call will be retained at the same point it was at
      when an error condition was detected.  msg_data_left() can be used to find
      out how much data was packaged before the error occurred.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add notification of end-of-Tx phase · e833251a
      David Howells committed
      Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
      a notification that the AF_RXRPC call has transitioned out of the Tx phase
      and is now waiting for a reply or a final ACK.
      
      This is called from AF_RXRPC with the call state lock held so the
      notification is guaranteed to come before any reply is passed back.
      
      Further, modify the AFS filesystem to make use of this so that we don't have
      to change the afs_call state before sending the last bit of data.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • net/ncsi: Configure VLAN tag filter · 21acf630
      Samuel Mendoza-Jonas committed
      Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
      stack process new VLAN tags and configure the channel VLAN filter
      appropriately.
      Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
      for each one, meaning the ncsi_dev_state_config_svf state must be
      repeated. An internal list of VLAN tags is maintained, and compared
      against the current channel's ncsi_channel_filter in order to keep track
      within the state. VLAN filters are removed in a similar manner, with the
      introduction of the ncsi_dev_state_config_clear_vids state. The maximum
      number of VLAN tag filters is determined by the "Get Capabilities"
      response from the channel.
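One step of the VLAN filter state can be modelled in three parts. First, find a wanted VLAN tag that is not yet in the channel's filter table. Next, put it in a free slot. Finally, send one "Set VLAN Filter" packet for it. The arrays, the use of 0 as an empty slot and demo_svf_step() are hypothetical simplifications of the NCSI state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if vid is present in v[0..n). */
static bool demo_has(const uint16_t *v, size_t n, uint16_t vid)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == vid)
			return true;
	return false;
}

/* wanted: the internal VLAN list; slots: the channel's filter table,
 * sized from the capabilities response, with 0 marking an empty slot.
 * Programs the first wanted VID that is missing from the table and
 * returns it, or returns 0 if nothing is left to do or the table is
 * full. Each non-zero return models one "Set VLAN Filter" packet. */
static uint16_t demo_svf_step(const uint16_t *wanted, size_t nwanted,
			      uint16_t *slots, size_t nslots)
{
	for (size_t i = 0; i < nwanted; i++) {
		if (demo_has(slots, nslots, wanted[i]))
			continue;
		for (size_t s = 0; s < nslots; s++) {
			if (slots[s] == 0) {
				slots[s] = wanted[i];  /* a real driver would send SVF here */
				return wanted[i];
			}
		}
		return 0;  /* no free slot: capability limit reached */
	}
	return 0;
}
```

Calling the step repeatedly until it returns 0 mirrors repeating the ncsi_dev_state_config_svf state once per tag.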
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • irda: move include/net/irda into staging subdirectory · 5bf916ee
      Greg Kroah-Hartman committed
      And finally, move the irda include files into
      drivers/staging/irda/include/net/irda.  Yes, it's a long path, but it
      makes it easy for us to just add a Makefile directory path addition and
      all of the net and drivers code "just works".
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix sparse warning on rt6i_node · 4e587ea7
      Wei Wang committed
      Commit c5cff856 adds an RCU grace period before freeing fib6_node. This
      generates a new sparse warning on rt->rt6i_node-related code:
        net/ipv6/route.c:1394:30: error: incompatible types in comparison
        expression (different address spaces)
        ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
        expression (different address spaces)
      
      This commit adds the "__rcu" tag for rt6i_node and makes sure the
      corresponding RCU API is used for it.
      After this fix, sparse no longer generates the above warning.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: add collect_md mode to ERSPAN tunnel · 1a66a836
      William Tu committed
      Similar to the gre, vxlan, geneve and ipip tunnels, allow ERSPAN tunnels to
      operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
      can make use of it right away.  OVS can use it as well in the future.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 28 Aug 2017, 1 commit
    • netfilter: convert hook list to an array · 960632ec
      Aaron Conole committed
      This converts the storage and layout of netfilter hook entries from a
      linked list to an array.  After this commit, hook entries will be
      stored adjacent in memory.  The next pointer is no longer required.
      
      The ops pointers are stored at the end of the array as they are only
      used in the register/unregister path and in the legacy br_netfilter code.
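The array layout can be sketched in userspace C. One allocation holds a count, the contiguous hook entries, and then a trailer of ops pointers used only by the control path. All names here are hypothetical, and the 1/0 verdicts merely mirror the NF_ACCEPT/NF_DROP values. This is not the real struct nf_hook_entries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hook callback: returns 1 to accept, 0 to drop. */
typedef unsigned int (*demo_hookfn)(void *priv, int pkt);

struct demo_hook_entry {
	demo_hookfn hook;
	void *priv;
};

struct demo_hook_ops {
	int priority;   /* only needed by the register/unregister path */
};

/* Count, then num adjacent entries, then num ops pointers (trailer). */
struct demo_hook_entries {
	unsigned short num;
	struct demo_hook_entry hooks[];
};

static struct demo_hook_entries *demo_alloc(unsigned short num)
{
	size_t sz = sizeof(struct demo_hook_entries) +
		    num * sizeof(struct demo_hook_entry) +
		    num * sizeof(struct demo_hook_ops *);
	struct demo_hook_entries *e = calloc(1, sz);
	if (e)
		e->num = num;
	return e;
}

/* The ops trailer starts right after hooks[num - 1]. */
static const struct demo_hook_ops **demo_ops(struct demo_hook_entries *e)
{
	return (const struct demo_hook_ops **)&e->hooks[e->num];
}

/* Fast path: linear scan over adjacent entries, no next pointers.
 * Returns the index of the first hook that drops, or num if all accept. */
static unsigned short demo_run(const struct demo_hook_entries *e, int pkt)
{
	for (unsigned short i = 0; i < e->num; i++)
		if (e->hooks[i].hook(e->hooks[i].priv, pkt) == 0)
			return i;
	return e->num;
}

/* Sample hooks for experimenting with the layout. */
static unsigned int demo_accept(void *priv, int pkt)
{
	(void)priv; (void)pkt;
	return 1;
}

static unsigned int demo_drop_odd(void *priv, int pkt)
{
	(void)priv;
	return (pkt % 2) ? 0 : 1;
}
```

Keeping the ops pointers in a trailer means the packet path touches only the compact hooks[] array.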
      
      nf_unregister_net_hooks() is slower than needed, as it just calls
      nf_unregister_net_hook() in a loop (i.e. at least n synchronize_net()
      calls). This will be addressed in a followup patch.
      
      Test setup:
       - ixgbe 10gbit
       - netperf UDP_STREAM, 64 byte packets
       - 5 hooks: raw + mangle prerouting, mangle + filter input, inet filter

      Results:
       - empty mangle and raw prerouting, mangle and filter input hooks: 353.9
       - this patch: 364.2
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>