1. 09 9月, 2017 1 次提交
    • F
      netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" · e1bf1687
      Florian Westphal 提交于
      This reverts commit 870190a9.
      
      It was not a good idea. The custom hash table was a much better
      fit for this purpose.
      
      A fast lookup is not essential, in fact for most cases there is no lookup
      at all because original tuple is not taken and can be used as-is.
      What needs to be fast is insertion and deletion.
      
      rhlist removal however requires a rhlist walk.
      We can have thousands of entries in such a list if source port/addresses
      are reused for multiple flows, if this happens removal requests are so
      expensive that deletions of a few thousand flows can take several
      seconds(!).
      
      The advantages that we got from rhashtable are:
      1) table auto-sizing
      2) multiple locks
      
      1) would be nice to have, but it is not essential as we have at
      most one lookup per new flow, so even a million flows in the bysource
      table are not a problem compared to current deletion cost.
      2) is easy to add to custom hash table.
      
      I tried to add hlist_node to rhlist to speed up rhltable_remove but this
      isn't doable without changing semantics.  rhltable_remove_fast will
      check that the to-be-deleted object is part of the table and that
      requires a list walk that we want to avoid.
      
      Furthermore, using hlist_node increases size of struct rhlist_head, which
      in turn increases nf_conn size.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821Reported-by: NIvan Babrou <ibobrik@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      e1bf1687
  2. 08 9月, 2017 1 次提交
  3. 06 9月, 2017 5 次提交
    • C
      net: mdio-mux: add mdio_mux parameter to mdio_mux_init() · 5482a978
      Corentin Labbe 提交于
      mdio_mux_init() use the parameter dev for two distinct thing:
      1) Have a device for all devm_ functions
      2) Get device_node from it
      
      Since it is two distinct purpose, this patch add a parameter mdio_mux
      that is linked to task 2.
      
      This will also permit to register an of_node mdio-mux that lacks a direct
      owning device.
      For example a mdio-mux which is a subnode of a real device.
      Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5482a978
    • F
      net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6
      Florian Fainelli 提交于
      Let switch drivers indicate how many TX queues they support. Some
      switches, such as Broadcom Starfighter 2 are designed with 8 egress
      queues. Future changes will allow us to leverage the queue mapping and
      direct the transmission towards a particular queue.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55199df6
    • T
      flow_dissector: Cleanup control flow · 3a1214e8
      Tom Herbert 提交于
      __skb_flow_dissect is riddled with gotos that make discerning the flow,
      debugging, and extending the capability difficult. This patch
      reorganizes things so that we only perform goto's after the two main
      switch statements (no gotos within the cases now). It also eliminates
      several goto labels so that there are only two labels that can be target
      for goto.
      Reported-by: NAlexander Popov <alex.popov@linux.com>
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a1214e8
    • A
      soc: ti/knav_dma: include dmaengine header · 2c08ab3f
      Arnd Bergmann 提交于
      A header file cleanup apparently caused a build regression
      with one driver using the knav infrastructure:
      
      In file included from drivers/net/ethernet/ti/netcp_core.c:30:0:
      include/linux/soc/ti/knav_dma.h:129:30: error: field 'direction' has incomplete type
        enum dma_transfer_direction direction;
                                    ^~~~~~~~~
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_txpipe_open':
      drivers/net/ethernet/ti/netcp_core.c:1349:21: error: 'DMA_MEM_TO_DEV' undeclared (first use in this function); did you mean 'DMA_MEMORY_MAP'?
        config.direction = DMA_MEM_TO_DEV;
                           ^~~~~~~~~~~~~~
                           DMA_MEMORY_MAP
      drivers/net/ethernet/ti/netcp_core.c:1349:21: note: each undeclared identifier is reported only once for each function it appears in
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_setup_navigator_resources':
      drivers/net/ethernet/ti/netcp_core.c:1659:22: error: 'DMA_DEV_TO_MEM' undeclared (first use in this function); did you mean 'DMA_DESC_HOST'?
        config.direction  = DMA_DEV_TO_MEM;
      
      As the header is no longer included implicitly through netdevice.h,
      we should include it in the header that references the enum.
      
      Fixes: 0dd5759d ("net: remove dmaengine.h inclusion from netdevice.h")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c08ab3f
    • A
      net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Arnd Bergmann 提交于
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd0c88b7
  4. 05 9月, 2017 1 次提交
    • J
      mac80211: fix VLAN handling with TXQs · 53168215
      Johannes Berg 提交于
      With TXQs, the AP_VLAN interfaces are resolved to their owner AP
      interface when enqueuing the frame, which makes sense since the
      frame really goes out on that as far as the driver is concerned.
      
      However, this introduces a problem: frames to be encrypted with
      a VLAN-specific GTK will now be encrypted with the AP GTK, since
      the information about which virtual interface to use to select
      the key is taken from the TXQ.
      
      Fix this by preserving info->control.vif and using that in the
      dequeue function. This now requires doing the driver-mapping
      in the dequeue as well.
      
      Since there's no way to filter the frames that are sitting on a
      TXQ, drop all frames, which may affect other interfaces, when an
      AP_VLAN is removed.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      53168215
  5. 04 9月, 2017 10 次提交
    • P
      netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70
      Pablo Neira Ayuso 提交于
      In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
      the semantics to request non recursive deletions, ie. do not delete an
      object iff it has child objects that hang from this parent object that
      the user requests to be deleted.
      
      We need this new flag to solve a problem for the iptables-compat
      backward compatibility utility, that runs iptables commands using the
      existing nf_tables netlink interface. Specifically, custom chains in
      iptables cannot be deleted if there are rules in it, however, nf_tables
      allows to remove any chain that is populated with content. To sort out
      this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
      flag to obtain the same semantics that iptables provides.
      
      This new flag should only be used for deletion requests. Note this new
      flag value overlaps with the existing:
      
      * NLM_F_ROOT for get requests.
      * NLM_F_REPLACE for new requests.
      
      However, those flags should not ever be used in deletion requests.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2335ba70
    • V
      net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f
      Varsha Rao 提交于
      This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
      are no longer required. Replace _ASSERT() macros with WARN_ON().
      Signed-off-by: NVarsha Rao <rvarsha016@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9efdb14f
    • V
      net: Replace NF_CT_ASSERT() with WARN_ON(). · 44d6e2f2
      Varsha Rao 提交于
      This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
      Signed-off-by: NVarsha Rao <rvarsha016@gmail.com>
      44d6e2f2
    • F
      netfilter: remove unused hooknum arg from packet functions · d1c1e39d
      Florian Westphal 提交于
      tested with allmodconfig build.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      d1c1e39d
    • P
      netfilter: nft_limit: add stateful object type · a6912055
      Pablo M. Bermudo Garay 提交于
      Register a new limit stateful object type into the stateful object
      infrastructure.
      Signed-off-by: NPablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a6912055
    • P
      netfilter: nf_tables: add select_ops for stateful objects · dfc46034
      Pablo M. Bermudo Garay 提交于
      This patch adds support for overloading stateful objects operations
      through the select_ops() callback, just as it is implemented for
      expressions.
      
      This change is needed for upcoming additions to the stateful objects
      infrastructure.
      Signed-off-by: NPablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      dfc46034
    • V
      netfilter: xt_hashlimit: add rate match mode · bea74641
      Vishwanath Pai 提交于
      This patch adds a new feature to hashlimit that allows matching on the
      current packet/byte rate without rate limiting. This can be enabled
      with a new flag --hashlimit-rate-match. The match returns true if the
      current rate of packets is above/below the user specified value.
      
      The main difference between the existing algorithm and the new one is
      that the existing algorithm rate-limits the flow whereas the new
      algorithm does not. Instead it *classifies* the flow based on whether
      it is above or below a certain rate. I will demonstrate this with an
      example below. Let us assume this rule:
      
      iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
      
      If the packet rate is 15/s, the existing algorithm would ACCEPT 10
      packets every second and send 5 packets to "new_chain".
      
      But with the new algorithm, as long as the rate of 15/s is sustained,
      all packets will continue to match and every packet is sent to new_chain.
      
      This new functionality will let us classify different flows based on
      their current rate, so that further decisions can be made on them based on
      what the current rate is.
      
      This is how the new algorithm works:
      We divide time into intervals of 1 (sec/min/hour) as specified by
      the user. We keep track of the number of packets/bytes processed in the
      current interval. After each interval we reset the counter to 0.
      
      When we receive a packet for match, we look at the packet rate
      during the current interval and the previous interval to make a
      decision:
      
      if [ prev_rate < user and cur_rate < user ]
              return Below
      else
              return Above
      
      Where cur_rate is the number of packets/bytes seen in the current
      interval, prev is the number of packets/bytes seen in the previous
      interval and 'user' is the rate specified by the user.
      
      We also provide flexibility to the user for choosing the time
      interval using the option --hashilmit-interval. For example the user can
      keep a low rate like x/hour but still keep the interval as small as 1
      second.
      
      To preserve backwards compatibility we have to add this feature in a new
      revision, so I've created revision 3 for hashlimit. The two new options
      we add are:
      
      --hashlimit-rate-match
      --hashlimit-rate-interval
      
      I have updated the help text to add these new options. Also added a few
      tests for the new options.
      Suggested-by: NIgor Lubashev <ilubashe@akamai.com>
      Reviewed-by: NJosh Hunt <johunt@akamai.com>
      Signed-off-by: NVishwanath Pai <vpai@akamai.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      bea74641
    • J
      Revert "net: fix percpu memory leaks" · 5a63643e
      Jesper Dangaard Brouer 提交于
      This reverts commit 1d6119ba.
      
      After reverting commit 6d7b857d ("net: use lib/percpu_counter API
      for fragmentation mem accounting") then here is no need for this
      fix-up patch.  As percpu_counter is no longer used, it cannot
      memory leak it any-longer.
      
      Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
      Fixes: 1d6119ba ("net: fix percpu memory leaks")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a63643e
    • J
      Revert "net: use lib/percpu_counter API for fragmentation mem accounting" · fb452a1a
      Jesper Dangaard Brouer 提交于
      This reverts commit 6d7b857d.
      
      There is a bug in fragmentation codes use of the percpu_counter API,
      that can cause issues on systems with many CPUs.
      
      The frag_mem_limit() just reads the global counter (fbc->count),
      without considering other CPUs can have upto batch size (130K) that
      haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
      this become dangerous at >=24 CPUs (3*1024*1024/130000=24).
      
      The correct API usage would be to use __percpu_counter_compare() which
      does the right thing, and takes into account the number of (online)
      CPUs and batch size, to account for this and call __percpu_counter_sum()
      when needed.
      
      We choose to revert the use of the lib/percpu_counter API for frag
      memory accounting for several reasons:
      
      1) On systems with CPUs > 24, the heavier fully locked
         __percpu_counter_sum() is always invoked, which will be more
         expensive than the atomic_t that is reverted to.
      
      Given systems with more than 24 CPUs are becoming common this doesn't
      seem like a good option.  To mitigate this, the batch size could be
      decreased and thresh be increased.
      
      2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
         CPU, before SKBs are pushed into sockets on remote CPUs.  Given
         NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
         likely be limited.  Thus, a fair chance that atomic add+dec happen
         on the same CPU.
      
      Revert note that commit 1d6119ba ("net: fix percpu memory leaks")
      removed init_frag_mem_limit() and instead use inet_frags_init_net().
      After this revert, inet_frags_uninit_net() becomes empty.
      
      Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
      Fixes: 1d6119ba ("net: fix percpu memory leaks")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb452a1a
    • S
      ipv4: Don't override return code from ip_route_input_noref() · 64327fc8
      Stefano Brivio 提交于
      After ip_route_input() calls ip_route_input_noref(), another
      check on skb_dst() is done, but if this fails, we shouldn't
      override the return code from ip_route_input_noref(), as it
      could have been more specific (i.e. -EHOSTUNREACH).
      
      This also saves one call to skb_dst_force_safe() and one to
      skb_dst() in case the ip_route_input_noref() check fails.
      Reported-by: NSabrina Dubroca <sdubroca@redhat.com>
      Fixes: 9df16efa ("ipv4: call dst_hold_safe() properly")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64327fc8
  6. 03 9月, 2017 1 次提交
  7. 02 9月, 2017 8 次提交
  8. 01 9月, 2017 13 次提交