1. 26 Sep, 2016 9 commits
    • W
      be2net: fix non static symbol warnings · e6053dd5
      Wei Yongjun committed
      Fixes the following sparse warnings:
      
      drivers/net/ethernet/emulex/benet/be_main.c:47:25: warning:
       symbol 'be_err_recovery_workq' was not declared. Should it be static?
      drivers/net/ethernet/emulex/benet/be_main.c:63:25: warning:
       symbol 'be_wq' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6053dd5
    • R
      net: smc91x: take into account register shift · 876a55b8
      Robert Jarzmik committed
      This aligns smc91x with its cousin, namely smc911x.c.
      It also allows the driver to run in a device-tree based lubbock
      board build, on which it was tested.
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      876a55b8
    • C
      cxgb4: fix -ve error check on a signed iq · 1cb1860d
      Colin Ian King committed
      iq is unsigned, so the error check for iq < 0 has no effect and errors
      can slip past this check.  Fix this by making iq signed and also making
      get_filter_steerq return a signed int so a -ve error can be returned.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cb1860d
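      The pitfall described in the cxgb4 commit above can be sketched in
      isolation. This is a minimal illustration, not the driver's code:
      lookup_queue() and both error_seen_*() helpers are hypothetical names,
      and the limit of 7 queues is an arbitrary assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lookup that returns a queue id,
 * or a negative errno value on failure. */
static int lookup_queue(int requested)
{
    return requested > 7 ? -EINVAL : requested;
}

/* Buggy pattern: the result lands in an unsigned variable, so a
 * negative error wraps to a large positive value and "< 0" is
 * never true. */
static bool error_seen_unsigned(int requested)
{
    unsigned int iq = lookup_queue(requested);
    return iq < 0;
}

/* Fixed pattern: a signed variable keeps the negative error visible. */
static bool error_seen_signed(int requested)
{
    int iq = lookup_queue(requested);
    return iq < 0;
}
```

      Some compilers flag the unsigned comparison as always false when extra
      warnings are enabled, which is one way such bugs get found.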
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · bce3414e
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree, they are:
      
      1) Consolidate GRE protocol tracker using new GRE protocol definitions,
         patches from Gao Feng.
      
      2) Properly parse continuation lines in SIP helper, update allowed
         characters in Call-ID header and allow tabs in SIP headers as
         specified by RFC3261, from Marco Angaroni.
      
      3) Remove useless code in FTP conntrack helper, also from Gao Feng.
      
      4) Add number generation expression for nf_tables, with random and
         incremental generators. This also includes specific offset to add
         to the result, patches from Laura Garcia Liebana. Liping Zhang
         follows with a fix to avoid a race in this new expression.
      
      5) Fix new quota expression inversion logic, added in the previous
         pull request.
      
      6) Missing validation of queue configuration in nft_queue, patch
         from Liping Zhang.
      
      7) Remove unused ctl_table_path, as part of the deprecation of the
         ip_conntrack sysctl interface coming in the previous batch.
         Again from Liping Zhang.
      
      8) Add offset attribute to nft_hash expression, so we can generate
         any output from a specific base offset. Moreover, check for
         possible overflow, patches from Laura Garcia.
      
      9) Allow to invert dynamic set insertion from packet path, to check
         for overflows in case the set is full.
      
      10) Revisit nft_set_pktinfo*() logic from nf_tables to ensure
          proper initialization of layer 4 protocol. Consolidate pktinfo
          structure initialization for bridge and netdev families.
      
      11) Do not unconditionally drop IPv6 packets that we cannot parse
          transport protocol for ip6 and inet families, let the user decide
          on this via ruleset policy.
      
      12) Get rid of gotos in __nf_ct_try_assign_helper().
      
      13) Check for return value in register_netdevice_notifier() and
          nft_register_chain_type(), patches from Gao Feng.
      
      14) Get rid of CONFIG_IP6_NF_IPTABLES dependency in nf_queue
          infrastructure that is common to nf_tables, from Liping Zhang.
      
      15) Disable 'found' and 'searched' stats that are updated from the
          packet hotpath, not very useful these days.
      
      16) Validate maximum value of u32 netlink attributes in nf_tables,
          this introduces nft_parse_u32_check(). From Laura Garcia.
      
      17) Add missing code to integrate nft_queue with maps, patch from
          Liping Zhang. This also includes missing support ranges in
          nft_queue bridge family.
      
      18) Fix check in nft_payload_fast_eval() that ensures that we don't
          go over the skbuff data boundary, from Liping Zhang.
      
      19) Check if transport protocol is set from nf_tables tracing and
          payload expression. Again from Liping Zhang.
      
      20) Use net_get_random_once() whenever possible, from Gao Feng.
      
      21) Replace hardcoded value by sizeof() in xt_helper, from Gao Feng.
      
      22) Remove superfluous check for found element in nft_lookup.
      
      23) Simplify TCPMSS logic to check for minimum MTU, from Gao Feng.
      
      24) Replace double linked list by single linked list in Netfilter
          core hook infrastructure, patchset from Aaron Conole. This
          includes several patches to prepare this update.
      
      25) Fix wrong sequence adjustment of TCP RST with no ACK, from
          Gao Feng.
      
      26) Relax check for direction attribute in nft_ct for layer 3 and 4
          protocol fields, from Liping Zhang.
      
      27) Add new revision for hashlimit to support higher pps of up to 1
          million, from Vishwanath Pai.
      
      28) Evict stale entries in nf_conntrack when reading entries from
          /proc/net/nf_conntrack, from Florian Westphal.
      
      29) Fix transparent match for IPv6 request sockets, from Krisztian
          Kovacs.
      
      30) Add new range expression for nf_tables.
      
      31) Add missing code to support flags in nft_log. Expose NF_LOG_*
          flags via uapi and use it from the generic logging infrastructure,
          instead of using xt specific definitions, from Liping Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bce3414e
    • P
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · f20fbc07
      Pablo Neira Ayuso committed
      Conflicts:
      	net/netfilter/core.c
      	net/netfilter/nf_tables_netdev.c
      
      Resolve two conflicts before pull request for David's net-next tree:
      
      1) Between c73c2484 ("netfilter: nf_tables_netdev: remove redundant
         ip_hdr assignment") from the net tree and commit ddc8b602
         ("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()").
      
      2) Between e8bffe0c ("net: Add _nf_(un)register_hooks symbols") and
         Aaron Conole's patches to replace list_head with single linked list.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f20fbc07
    • L
      netfilter: nf_log: get rid of XT_LOG_* macros · 8cb2a7d5
      Liping Zhang committed
      nf_log is used by both nftables and iptables, so using XT_LOG_XXX macros
      here is not appropriate. Replace them with NF_LOG_XXX.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8cb2a7d5
    • L
      netfilter: nft_log: complete NFTA_LOG_FLAGS attr support · ff107d27
      Liping Zhang committed
      NFTA_LOG_FLAGS attribute is already supported, but the related
      NF_LOG_XXX flags are not exposed to the userspace. So we cannot
      explicitly enable log flags to log uid, tcp sequence, ip options
      and so on, e.g. a rule such as "nft add rule filter output log uid"
      is not supported yet.
      
      So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
      order to keep consistent with other modules, change NF_LOG_MASK to
      refer to all supported log flags. On the other hand, add a new
      NF_LOG_DEFAULT_MASK to refer to the original default log flags.
      
      Finally, if the user specifies unsupported log flags, or NFTA_LOG_GROUP
      and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
      userspace.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ff107d27
    • P
      netfilter: nf_tables: add range expression · 0f3cd9b3
      Pablo Neira Ayuso committed
      Inverse ranges != [a,b] are not currently possible because rules are
      composites of && operations, and we need to express this:
      
      	data < a || data > b
      
      This patch adds a new range expression. Positive ranges can already be
      expressed through two cmp expressions:
      
      	cmp(sreg, data, >=)
      	cmp(sreg, data, <=)
      
      This new range expression provides an alternative way to express this.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0f3cd9b3
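      The logic the range expression has to cover can be sketched as a single
      evaluation function. This is only an illustration under assumptions: the
      enum values and range_eval() are hypothetical names, and plain 32-bit
      integers stand in for register data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operator names for this sketch. */
enum range_op { RANGE_EQ, RANGE_NEQ };

/* RANGE_EQ matches when from <= data <= to.
 * RANGE_NEQ matches when data < from || data > to, the case that
 * cannot be built from two cmp expressions joined by &&. */
static bool range_eval(uint32_t data, uint32_t from, uint32_t to,
                       enum range_op op)
{
    bool inside = data >= from && data <= to;

    return op == RANGE_EQ ? inside : !inside;
}
```

      Keeping both bounds in one expression is what makes the inverted form
      possible, since the negation applies to the whole interval test.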
    • K
      netfilter: xt_socket: fix transparent match for IPv6 request sockets · 7a682575
      KOVACS Krisztian committed
      The introduction of TCP_NEW_SYN_RECV state, and the addition of request
      sockets to the ehash table seem to have broken the --transparent option
      of the socket match for IPv6 (around commit a9407000).
      
      Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the
      listener, the --transparent option tries to match on the no_srccheck flag
      of the request socket.
      
      Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req()
      by copying the transparent flag of the listener socket. This effectively
      causes '-m socket --transparent' to not match on the ACK packet sent by the
      client in a TCP handshake.
      
      Based on the suggestion from Eric Dumazet, this change moves the code
      initializing no_srccheck to tcp_conn_request(), rendering the above
      scenario working again.
      
      Fixes: a9407000 ("netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support")
      Signed-off-by: Alex Badics <alex.badics@balabit.com>
      Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7a682575
  2. 25 Sep, 2016 31 commits
    • F
      netfilter: evict stale entries when user reads /proc/net/nf_conntrack · 58e207e4
      Florian Westphal committed
      Fabian reports a possible conntrack memory leak (could not reproduce so
      far), however, one minor issue can be easily resolved:
      
      > cat /proc/net/nf_conntrack | wc -l = 5
      > 4 minutes required to clean up the table.
      
      We should not report those timed-out entries to the user in the first place.
      And instead of just skipping those timed-out entries while iterating over
      the table we can also zap them (we already do this during ctnetlink
      walks, but I forgot about the /proc interface).
      
      Fixes: f330a7fd ("netfilter: conntrack: get rid of conntrack timer")
      Reported-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      58e207e4
    • V
      netfilter: xt_hashlimit: Create revision 2 to support higher pps rates · 11d5f157
      Vishwanath Pai committed
      Create a new revision for the hashlimit iptables extension module. Rev 2
      will support higher pps of up to 1 million; revision 1 supports only 10k.
      
      To support this we have to increase the size of the variables avg and
      burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2
      and xt_hashlimit_mtinfo2 and also create newer versions of all the
      functions for match, checkentry and destroy.
      
      Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very
      similar in both rev1 and rev2 with only minor changes, so I have split
      those functions and moved all the common code to a *_common function.
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Joshua Hunt <johunt@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      11d5f157
    • V
      netfilter: xt_hashlimit: Prepare for revision 2 · 0dc60a45
      Vishwanath Pai committed
      I am planning to add a revision 2 for the hashlimit xtables module to
      support higher packets per second rates. This patch renames all the
      functions and variables related to revision 1 by adding _v1 at the
      end of the names.
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Joshua Hunt <johunt@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0dc60a45
    • L
      netfilter: nft_ct: report error if mark and dir specified simultaneously · 7bfdde70
      Liping Zhang committed
      NFT_CT_MARK is unrelated to direction, so if NFTA_CT_DIRECTION attr is
      specified, report EINVAL to the userspace. This validation check was
      already done at nft_ct_get_init, but we missed it in nft_ct_set_init.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7bfdde70
    • L
      netfilter: nft_ct: unnecessary to require dir when use ct l3proto/protocol · d767ff2c
      Liping Zhang committed
      Currently, if the user wants to match ct l3proto, we must specify the
      direction, for example:
        # nft add rule filter input ct original l3proto ipv4
                                       ^^^^^^^^
      Otherwise, an error message will be reported:
        # nft add rule filter input ct l3proto ipv4
        nft add rule filter input ct l3proto ipv4
        <cmdline>:1:1-38: Error: Could not process rule: Invalid argument
        add rule filter input ct l3proto ipv4
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      Actually, there's no need to require NFTA_CT_DIRECTION attr, because
      ct l3proto and protocol are unrelated to direction.
      
      For compatibility, even if the user specifies the NFTA_CT_DIRECTION
      attr, do not report an error, just skip it.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d767ff2c
    • G
      netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack · 8d11350f
      Gao Feng committed
      A TCP RST packet may validly leave the ack flag unset, with the bytes
      of the ack number set to zero. But the current seqadj code would adjust
      the "0" ack to an invalid ack number. seqadj needs to check the ack flag
      before adjusting it for these RST packets.
      
      The following is my test case
      
      client is 10.26.98.245, and add one iptables rule:
      iptables  -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2:
      --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with
      tcp-reset
      This iptables rule can generate a TCP RST without the ack flag.
      
      server:10.172.135.55
      Enable the synproxy with seqadjust by the following iptables rules
      iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345
      -m tcp --syn -j CT --notrack
      
      iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack
      --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7
      --mss 1460
      iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack
      --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT
      
      The following is my test result.
      
      1. packet trace on client
      root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
      IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
      win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
      length 0
      IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
      ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
      nop,wscale 7], length 0
      IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
      options [nop,nop,TS val 452367885 ecr 15643479], length 0
      IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
      options [nop,nop,TS val 15643479 ecr 452367885], length 0
      IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
      win 0, length 0
      
      2. seqadj log on server
      [62873.867319] Adjusting sequence number from 602341895->546723267,
      ack from 3695959830->3695959830
      [62873.867644] Adjusting sequence number from 602341895->546723267,
      ack from 3695959830->3695959830
      [62873.869040] Adjusting sequence number from 3695959830->3695959830,
      ack from 0->55618628
      
      To summarize, it is clear that the seqadj code adjusts the 0 ack when it
      receives a TCP RST packet without ack.
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8d11350f
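      The check this commit calls for can be sketched as follows. This is a
      simplified illustration, not the kernel's seqadj code: struct
      tcp_hdr_sketch and adjust_ack() are hypothetical, and the offset is
      applied as a plain unsigned addition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a TCP header for this sketch. */
struct tcp_hdr_sketch {
    uint32_t ack_seq;  /* acknowledgment number (host order here) */
    bool     ack;      /* ACK flag */
    bool     rst;      /* RST flag */
};

/* Apply the sequence offset to the ack number only when the ACK flag
 * is set. A RST without ACK carries a zero ack number that must be
 * left untouched. */
static void adjust_ack(struct tcp_hdr_sketch *th, uint32_t offset)
{
    if (!th->ack)
        return;
    th->ack_seq += offset;  /* wraps modulo 2^32, like TCP sequence space */
}
```

      With the offset from the log above (55618628), a RST without ack keeps
      its ack number at 0 instead of being rewritten to 55618628.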
    • A
      netfilter: replace list_head with single linked list · e3b37f11
      Aaron Conole committed
      The netfilter hook list never uses the prev pointer, and so can be trimmed to
      be a simple singly-linked list.
      
      In addition to having a more lightweight structure for hook traversal,
      struct net becomes 5568 bytes (down from 6400) and struct net_device becomes
      2176 bytes (down from 2240).
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      e3b37f11
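      Why a list that never walks backwards can drop its prev pointer is
      easiest to see in a small sketch. The names below (struct hook_entry,
      hook_insert(), hook_count()) are hypothetical and are not the kernel's
      hook structures; ordering by priority is an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical hook entry: one forward pointer instead of the two
 * pointers a doubly-linked list head would need. */
struct hook_entry {
    struct hook_entry *next;
    int priority;
};

/* Insert in ascending priority order. Walking with a pointer to the
 * link being updated removes any need for a prev pointer. */
static void hook_insert(struct hook_entry **head, struct hook_entry *entry)
{
    struct hook_entry **pp = head;

    while (*pp && (*pp)->priority <= entry->priority)
        pp = &(*pp)->next;
    entry->next = *pp;
    *pp = entry;
}

/* Forward-only traversal, which is all a hook walk requires. */
static size_t hook_count(const struct hook_entry *head)
{
    size_t n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

      Each list head shrinks by one pointer, which is where size reductions of
      the kind quoted above would come from when many heads are embedded in a
      structure.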
    • D
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · fe0acb5f
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-09-24
      
      This series contains updates to i40e and i40evf only.
      
      Harshitha removes the ability to set or advertise X722 to 100 Mbps,
      since it is not supported, so we should not be able to advertise or
      set the NIC to 100 Mbps.
      
      Alan fixes an issue where deleting a MAC filter did not really delete the
      filter in question.  The reason is that the wrong cmd_flag is passed to
      the firmware.
      
      Preethi adds the encapsulation checksum offload negotiation flag, so that
      we can control it.
      
      Jake cleans up the ATR auto_disable_flags use, since some locations
      disable ATR accidentally using the "full" disable by disabling the flag
      in the standard flags field.  This permanently forces ATR off instead of
      temporarily disabling it.  Then updated checks to include when there are
      TCP/IP4 sideband rules in effect, where ATR should be disabled.  Lastly,
      adds support to the i40evf driver for setting interrupt moderation values
      per queue, like in i40e.
      
      Henry cleans up unreachable code, since i40e_shutdown_adminq() always
      returns success.
      
      Mitch enables support for adaptive interrupt throttling, since all the
      code for it is already in the interrupt handler.  Then fixes a rare
      case where we might get a VSI with no queues and we try to configure
      RSS, which would result in a divide by zero.
      
      Alex fixes an issue where transmit cleanup flow was incorrectly assuming
      it could check for the flow director bits after it had unmapped the
      buffer.  Then adds a txring_txq() to allow us to convert a i40e_ring/
      i40evf_ring to a netdev_tx_queue structure, like ixgbe and fm10k.  This
      avoids having to make a multi-line function call for all the areas that
      need access to it.  Re-factors the Flow Director filter configuration
      out into a separate function, like we did for the standard xmit path.
      Cleans up the debugfs hook for Flow Director since it was meant for
      debug only.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe0acb5f
    • D
      Merge tag 'rxrpc-rewrite-20160924' of... · 21445c91
      David S. Miller committed
      Merge tag 'rxrpc-rewrite-20160924' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Implement slow-start and other bits
      
      This set of patches implements the RxRPC slow-start feature for AF_RXRPC to
      improve performance and handling of occasional packet loss.  This is more or
      less the same as TCP slow start [RFC 5681].  Firstly, there are some ACK
      generation improvements:
      
       (1) Send ACKs regularly to apprise the peer of our state so that they can do
           congestion management of their own.
      
       (2) Send an ACK when we fill in a hole in the buffer so that the peer can
           find out that we did this thus forestalling retransmission.
      
       (3) Note the final DATA packet's serial number in the final ACK for
           correlation purposes.
      
      and a couple of bug fixes:
      
       (4) Reinitialise the ACK state and clear the ACK and resend timers upon
           entering the client reply reception phase to kill off any pending probe
           ACKs.
      
       (5) Delay the resend timer to allow for nsec->jiffies conversion errors.
      
      and then there's the slow-start pieces:
      
       (6) Summarise an ACK.
      
       (7) Schedule a PING or IDLE ACK if the reply to a client call is overdue to
           try and find out what happened to it.
      
       (8) Implement the slow start feature.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21445c91
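      Since the series describes itself as more or less the same as TCP slow
      start, the window rules from RFC 5681 give a rough picture of what is
      being implemented. The sketch below follows the RFC's byte-based rules
      and is not the AF_RXRPC code; struct cong_state, on_ack() and
      on_timeout() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical congestion state, in bytes as in RFC 5681. */
struct cong_state {
    uint32_t cwnd;      /* congestion window */
    uint32_t ssthresh;  /* slow start threshold */
};

/* Window growth on an ACK of new data. Below ssthresh the window
 * grows by up to one segment per ACK (slow start); at or above it,
 * by roughly one segment per round trip (congestion avoidance). */
static void on_ack(struct cong_state *c, uint32_t acked_bytes, uint32_t smss)
{
    if (c->cwnd < c->ssthresh) {
        c->cwnd += acked_bytes < smss ? acked_bytes : smss;
    } else {
        uint32_t inc = (uint32_t)(((uint64_t)smss * smss) / c->cwnd);

        c->cwnd += inc ? inc : 1;  /* never round the increase down to 0 */
    }
}

/* Reaction to a retransmission timeout: halve the threshold based on
 * the data in flight (floor of two segments) and restart from one
 * segment. */
static void on_timeout(struct cong_state *c, uint32_t flight_size,
                       uint32_t smss)
{
    uint32_t half = flight_size / 2;

    c->ssthresh = half > 2 * smss ? half : 2 * smss;
    c->cwnd = smss;
}
```

      A protocol that counts packets rather than bytes could apply the same
      rules with the segment size set to 1, though whether AF_RXRPC does so is
      not stated here.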
    • J
      i40evf: support queue-specific settings for interrupt moderation · 65e87c03
      Jacob Keller committed
      In commit a75e8005 ("i40e: queue-specific settings for interrupt
      moderation") the i40e driver gained support for setting interrupt
      moderation values per queue. This patch adds support for this feature
      to the i40evf driver as well. In addition, a few changes are made to
      the i40e implementation to add function header documentation comments,
      as well.
      
      This behaves in a similar fashion to the implementation in i40e. Thus,
      requesting the moderation value when no queue is provided will report
      queue 0 value, while setting the value without a queue will set all
      queues at once.
      
      Change-ID: I1f310a57c8e6c84a8524c178d44d1b7a6d3a848e
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      65e87c03
    • M
      i40e: don't configure zero-size RSS table · a4fa59cc
      Mitch Williams committed
      In some rare cases, we might get a VSI with no queues. In this case, we
      cannot configure RSS on this VSI as it will try to divide by zero when
      configuring the lookup table.
      
      Change-ID: I6ae173a7dd3481a081e079eb10eb80275de2adb0
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      a4fa59cc
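      The divide-by-zero hazard is easy to reproduce in miniature. The sketch
      below assumes a lookup table filled round-robin over the queues, which is
      a common scheme but an assumption here; fill_rss_lut() is a hypothetical
      name and not the i40e function.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill an RSS lookup table by spreading entries across the queues.
 * Returns 0 on success, -1 when there are no queues, in which case
 * the modulo below would divide by zero. */
static int fill_rss_lut(uint8_t *lut, size_t lut_size, uint16_t num_queues)
{
    if (num_queues == 0)
        return -1;

    for (size_t i = 0; i < lut_size; i++)
        lut[i] = (uint8_t)(i % num_queues);

    return 0;
}
```

      Rejecting the zero-queue case up front keeps the table untouched rather
      than relying on callers never to pass an empty VSI.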
    • A
      i40e: Strip out debugfs hook for Flow Director filter programming · 1eb846ac
      Alexander Duyck committed
      This interface was only ever meant for debug. Since it is not
      supposed to be here we are removing it.
      
      Change-ID: Id771a1e5e7d3e2b4b7f56591b61fb48c921e1d04
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      1eb846ac
    • A
      i40e: Split Flow Director descriptor config into separate function · 5e02f283
      Alexander Duyck committed
      In an effort to improve code readability I am splitting the Flow Director
      filter configuration out into a separate function like we have done for the
      standard xmit path.  The general idea is to provide a single block of code
      that translates the flow specification into a proper Flow Director
      descriptor.
      
      Change-ID: Id355ad8030c4e6c72c57504fa09de60c976a8ffe
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      5e02f283
    • A
      i40e/i40evf: Add txring_txq function to match fm10k and ixgbe · e486bdfd
      Alexander Duyck committed
      This patch adds a txring_txq function which allows us to convert a
      i40e_ring/i40evf_ring to a netdev_tx_queue structure.  This way we
      can avoid having to make a multi-line function call for all the spots
      that need access to this.
      
      Change-ID: Ic063b71d8b92ea406d2c32e798c8e2b02809d65b
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e486bdfd
    • A
      i40e: Fix Flow Director raw_buf cleanup · 64bfd68e
      Alexander Duyck committed
      The Tx cleanup flow was incorrectly assuming it could check for the flow
      director bits after it had unmapped the buffer.  However in this case it
      results in us trying to free a raw_buf as though it is an sk_buff.
      
      To fix this I am moving up the flag test for the FD_SB bit so that when we
      find a non-NULL skb or raw_buf value we then check the flag and use the
      appropriate call to free the buffer.
      
      Change-ID: I6284034ba1ea87c9922e56f6eb3181f7f09bddde
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      64bfd68e
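      The ordering issue in this fix can be reduced to a small decision
      function. This is a sketch under assumptions, not the driver's cleanup
      path: struct tx_buffer_sketch, the is_fd_sb field (standing in for the
      FD_SB flag) and choose_free() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum free_action { FREE_NONE, FREE_SKB, FREE_RAW };

/* Hypothetical Tx buffer record: one pointer that refers either to an
 * skb or to a raw buffer, plus the flag that tells them apart. */
struct tx_buffer_sketch {
    void *buf;
    bool  is_fd_sb;
};

/* Decide how to release the buffer while the flag is still valid,
 * before any unmapping step that might clear or invalidate it. */
static enum free_action choose_free(const struct tx_buffer_sketch *tx)
{
    if (!tx->buf)
        return FREE_NONE;
    return tx->is_fd_sb ? FREE_RAW : FREE_SKB;
}
```

      Testing the flag at the point where the pointer is found non-NULL means
      a raw buffer is never handed to the skb release path.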
    • M
      i40evf: enable adaptive interrupt throttling · f19a973f
      Mitch Williams committed
      All of the code to support adaptive interrupt throttling is already in
      the interrupt handler, it just needs to be enabled. Fill out the data
      structures properly to make it happen. Single-flow traffic tests may
      show slightly lower throughput, but interrupts per second will drop by
      about 75%.
      
      Change-ID: I9cd7d42c025b906bf1bb85c6aeb6112684aa6471
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      f19a973f
    • A
      i40e: Increase minimum number of allocated VSI · 7ac4b5c6
      Akeem Abodunrin committed
      This patch increases minimum number of allocated VSIs, so as to resolve
      failure adding VSI for VF when 64-VFs assigned to a PF. The driver
      supports up to 128 VFs per device, users can decide to enable up to
      64-VFs on a single PF, especially 2 X 40 devices. In that scenario, with
      VMDq co-existence, there would be starvation of VSIs - with this patch,
      supported features would have enough VSIs for configuration now.
      
      Change-ID: If084f4cd823667af8fe7fdc11489c705b32039d5
      Signed-off-by: Akeem Abodunrin <akeem.g.abodunrin@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      7ac4b5c6
    • B
    • H
      i40e: removing unreachable code · ac9c5c6d
      Henry Tieman committed
      The return value from i40e_shutdown_adminq() is always 0
      (I40E_SUCCESS). So, the test for non-0 will never be true. Cleanup
      by removing the test and debug print statement.
      
      Change-ID: Ie51e8e37515c3e3a6a9ff26fa951d0e5e24343c1
      Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      ac9c5c6d
    • J
      i40e: check conflicting ntuple/sideband rules when re-enabling ATR · a3417d28
      Jacob Keller committed
      In i40e_fdir_check_and_reenable(), the driver performs some checks to
      determine whether it is safe to re-enable FD Sideband and FD ATR
      support. The current check will only determine if there is available
      space in the flow director table. However, this ignores the fact that
      ATR should be disabled when there are TCP/IPv4 sideband rules in effect.
      Add the missing check, and update the info message printed when
      I40E_DEBUG_FD is enabled.
      
      Change-ID: Ibb9c63e5be95d63c53a498fdd5dbf69f54a00e08
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      a3417d28
    • J
      i40e: cleanup ATR auto_disable_flags use · 234dc4e6
      Jacob Keller committed
      Some locations that disable ATR accidentally used the "full" disable by
      disabling the flag in the standard flags field. This incorrectly forces
      ATR off permanently instead of temporarily disabling it. In addition,
      some code locations accidentally set the ATR flag enabled when they only
      meant to clear the auto_disable_flags. This results in ignoring the
      user's ethtool private flag settings.
      
      Additionally, when disabling ATR via ethtool, we did not perform a flush
      of the FD table. This results in the previously assigned ATR rules still
      functioning which was not expected.
      
      Cleanup all these areas so that automatic disable uses only the
      auto_disable_flag. Fix the flush code so that we can trigger a flush
      even when we've disabled ATR and SB support, as otherwise the flush
      doesn't work. Fix ethtool setting to actually request a flush. Fix
      NETIF_F_NTUPLE flag to only clear the auto_disable setting and not
      enable the full feature.
      
      Change-ID: Ib2486111f8031bd16943e9308757b276305c03b5
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      234dc4e6
    • P
      i40e: add encap csum VF offload flag · 2199254c
      Preethi Banala committed
      Add ENCAP_CSUM offload negotiation flag. Currently VF assumes checksum
      offload for encapsulated packets is supported by default. Going forward,
      this feature needs to be negotiated with PF before advertising to the
      stack. Hence, we need a flag to control it.
      This is in regards to prepping up for VF base mode functionality support.
      
      Change-ID: Iaab1f25cc0abda5f2fbe3309092640f0e77d163e
      Signed-off-by: Preethi Banala <preethi.banala@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      2199254c
    • A
      i40e: fix deleting mac filters · a6cb9146
      Alan Brady committed
      There exists a bug in which deleting a mac filter does not actually
      occur.  The driver reports that the filter has been deleted with no
      error.  The problem occurs because the wrong cmd_flag is passed to the
      firmware when deleting the filter.  The firmware reports an error back
      to the driver but it is expressly ignored.
      
      This fixes the bug by using the correct flag when deleting a filter.
      Without this patch, deleted filters remain in firmware and function as
      if they had not been deleted.
      
      Change-ID: I5f22b874f3b83f457702f18f0d5602ca21ac40c3
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Remove 100 Mbps SGMII support for X722 · f2c7c1d0
      Harshitha Ramamurthy committed
      This patch fixes a problem where the driver shows 100 Mbps as a supported
      speed on X722 devices and allows it to be configured for advertising. The
      fix is to not set the 100 Mbps SGMII flag for X722 devices.
      
      Without this patch, the user incorrectly thinks that 100 Mbps is supported
      and hence might try to advertise it on X722 devices when it is actually not
      a supported speed.
      
      Change-ID: I8c3d7c4251a9402d98994ed29749b7b895a0f205
      Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • gre: use nla_get_be32() to extract flowinfo · c2675de4
      Lance Richardson committed
      Eliminate a sparse endianness mismatch warning, use nla_get_be32() to
      extract a __be32 value instead of nla_get_u32().
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_queue: whitespace cleanup · 54f17bbc
      Aaron Conole committed
      A future patch will modify the hook drop and outfn functions.  This will
      cause the line lengths to take up too much space.  This is simply a
      readability change.
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • rxrpc: Implement slow-start · 57494343
      David Howells committed
      Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
      tracepoint is added to log the state of the congestion management algorithm
      and the decisions it makes.
      
      Notes:
      
       (1) Since we send fixed-size DATA packets (apart from the final packet in
           each phase), counters and calculations are in terms of packets rather
           than bytes.
      
       (2) The ACK packet carries the equivalent of TCP SACK.
      
       (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
           suited to SACK of a small number of packets.  It seems that, almost
           inevitably, by the time three 'duplicate' ACKs have been seen, we have
           narrowed the loss down to one or two missing packets, and the
           FLIGHT_SIZE calculation ends up as 2.
      
       (4) In rxrpc_resend(), if there was no data that apparently needed
           retransmission, we transmit a PING ACK to ask the peer to tell us what
           its Rx window state is.
      Signed-off-by: David Howells <dhowells@redhat.com>
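      As a rough illustration of note (1) above, the sketch below models an
      RFC 5681 style congestion window counted in packets instead of bytes.
      It is a generic textbook model, not the rxrpc code: the class name,
      field names, the initial ssthresh of 64 and the floor of 2 packets
      after a loss are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: int = 1             # congestion window, in packets
    ssthresh: int = 64        # slow-start threshold, in packets (assumed)
    acked_in_window: int = 0  # packets ACKed since cwnd last grew in avoidance

    def on_packets_acked(self, count: int) -> None:
        """Grow the window for each newly ACKed packet."""
        for _ in range(count):
            if self.cwnd < self.ssthresh:
                # Slow start: one extra packet per ACKed packet.
                self.cwnd += 1
            else:
                # Congestion avoidance: one extra packet per full window.
                self.acked_in_window += 1
                if self.acked_in_window >= self.cwnd:
                    self.acked_in_window = 0
                    self.cwnd += 1

    def on_loss(self, flight_size: int, timeout: bool) -> None:
        """Shrink the window after a detected loss or a retransmit timeout."""
        self.ssthresh = max(flight_size // 2, 2)
        self.cwnd = 1 if timeout else self.ssthresh
        self.acked_in_window = 0


state = CongestionState(cwnd=1, ssthresh=4)
state.on_packets_acked(3)   # slow start up to ssthresh
state.on_packets_acked(4)   # one full window ACKed in avoidance mode
print(state.cwnd)
state.on_loss(flight_size=5, timeout=False)
print(state.cwnd, state.ssthresh)
```

      Counting in packets keeps every quantity a small integer, which works
      because the DATA packets are of fixed size apart from the last one in
      each phase.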
    • rxrpc: Schedule an ACK if the reply to a client call appears overdue · 0d967960
      David Howells committed
      If we've sent all the request data in a client call but haven't seen any
      sign of the reply data yet, schedule an ACK to be sent to the server to
      find out if the reply data got lost.
      
      If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to
      demand a response to find out whether we need to retransmit.
      
      If the server says it has received all of the data, we send an IDLE ACK to
      tell the server that we haven't received anything in the receive phase as
      yet.
      
      To make this work, a non-immediate PING ACK must carry a delay.  I've chosen
      the same as the IDLE ACK for the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
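      The choice between the two ACK types in the commit above can be
      sketched as a small decision function. This is only a model of the
      rule as the commit message states it. The function name, its
      parameters and the use of sequence numbers to express "hard-ACK'd"
      are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class AckReason(Enum):
    PING = "PING"
    IDLE = "IDLE"


def overdue_reply_ack(request_fully_sent: bool,
                      reply_seen: bool,
                      hard_acked_seq: int,
                      last_request_seq: int) -> Optional[AckReason]:
    """Pick the ACK to schedule when a client call's reply looks overdue."""
    if not request_fully_sent or reply_seen:
        # Still transmitting, or the reply has started: nothing to probe.
        return None
    if hard_acked_seq < last_request_seq:
        # The server has not hard-ACKed everything: demand a response.
        return AckReason.PING
    # The server has everything: tell it we have received nothing yet.
    return AckReason.IDLE


print(overdue_reply_ack(True, False, hard_acked_seq=3, last_request_seq=5))
print(overdue_reply_ack(True, False, hard_acked_seq=5, last_request_seq=5))
```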
    • rxrpc: Generate a summary of the ACK state for later use · 31a1b989
      David Howells committed
      Generate a summary of the Tx buffer packet state when an ACK is received
      for use in a later patch that does congestion management.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Delay the resend timer to allow for nsec->jiffies conv error · df0562a7
      David Howells committed
      When determining the resend timer value, we have a value in nsec but the
      timer is in jiffies, which may be a million or more times coarser.
      nsecs_to_jiffies() rounds down - which means that the resend timeout
      expressed as jiffies is very likely earlier than the one expressed as
      nanoseconds from which it was derived.
      
      The problem is that rxrpc_resend() gets triggered by the timer, but can't
      then find anything to resend yet.  It sets the timer again - but gets
      kicked off immediately again and again until the nanosecond-based expiry
      time is reached and we actually retransmit.
      
      Fix this by adding 1 to the jiffies-based resend_at value to counteract the
      rounding and make sure that the timer happens after the nanosecond-based
      expiry is passed.
      
      Alternatives would be to adjust the timestamp on the packets to align
      with the jiffy scale or to switch back to using jiffy timestamps.
      Signed-off-by: David Howells <dhowells@redhat.com>
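      The rounding problem in the commit above is easy to reproduce with
      plain integer arithmetic. The sketch below is a Python model, not
      kernel code: HZ = 250 and the 10.5 ms timeout are assumed values, and
      the functions only imitate a conversion that truncates.

```python
NSEC_PER_SEC = 1_000_000_000
HZ = 250  # assumed tick rate for illustration; one jiffy is 4 ms here


def nsecs_to_jiffies(nsec: int) -> int:
    """Truncating conversion, modelling the round-down described above."""
    return nsec * HZ // NSEC_PER_SEC


def jiffies_to_nsecs(jiffies: int) -> int:
    """Convert back to nanoseconds so the two expiries can be compared."""
    return jiffies * NSEC_PER_SEC // HZ


def resend_at(expiry_nsec: int, add_one: bool) -> int:
    """Jiffies-based timer value, optionally padded by one jiffy."""
    jiffies = nsecs_to_jiffies(expiry_nsec)
    return jiffies + 1 if add_one else jiffies


expiry_nsec = 10_500_000  # a 10.5 ms resend timeout (hypothetical value)
early = resend_at(expiry_nsec, add_one=False)
padded = resend_at(expiry_nsec, add_one=True)
print(jiffies_to_nsecs(early) < expiry_nsec)    # unpadded timer vs. real expiry
print(jiffies_to_nsecs(padded) >= expiry_nsec)  # padded timer vs. real expiry
```

      Without the extra jiffy the timer lands at 8 ms, before the 10.5 ms
      expiry, so the handler would find nothing to resend and rearm itself.
      With it, the timer lands at 12 ms, after the expiry has passed.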
    • rxrpc: Reinitialise the call ACK and timer state for client reply phase · dd7c1ee5
      David Howells committed
      Clear the ACK reason, ACK timer and resend timer when entering the client
      reply phase when the first DATA packet is received.  New ACKs will be
      proposed once the data is queued.
      
      The resend timer is no longer relevant and we need to cancel ACKs scheduled
      to probe for a lost reply.
      Signed-off-by: David Howells <dhowells@redhat.com>