1. 08 Dec, 2016 21 commits
  2. 07 Dec, 2016 19 commits
    • Merge branch 'bnxt_en-RDMA' · d3243aef
      David S. Miller committed
      Michael Chan says:
      
      ====================
      bnxt_en: Add interface to support RDMA driver.
      
      This series adds an interface to support a brand new RDMA driver bnxt_re.
      The first step is to re-arrange some code so that pci_enable_msix() can
      be called during pci probe.  The purpose is to allow the RDMA driver to
      initialize and stay initialized whether the netdev is up or down.
      
      Then we make some changes to VF resource allocation so that there are
      enough resources to support RDMA.
      
      Finally the last patch adds a simple interface to allow the RDMA driver to
      probe and register itself with any bnxt_en devices that support RDMA.
      Once registered, the RDMA driver can request MSIX, send fw messages, and
      receive some notifications.
      
      v2: Fixed kbuild test robot warnings.
      
      David, please consider this series for net-next.  Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add interface to support RDMA driver. · a588e458
      Michael Chan committed
      Since the network driver and RDMA driver operate on the same PCI function,
      we need to create an interface to allow the RDMA driver to share resources
      with the network driver.
      
      1. Create a new bnxt_en_dev struct which will be returned by
      bnxt_ulp_probe() upon success.  After that, all calls from the RDMA driver
      to bnxt_en will pass a pointer to this struct.
      
      2. This struct contains additional function pointers to register, request
      MSIX, send fw messages, and register for async events.
      
      3. If the RDMA driver wants to enable RDMA on the function, it needs to
      call the function pointer bnxt_register_device().  A ulp_ops structure
      is passed for RCU protected upcalls from bnxt_en to the RDMA driver.
      
      4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg()
      function pointer.
      
      5. One stats context is reserved when the RDMA driver registers.  MSIX
      and completion rings are reserved when the RDMA driver calls the
      bnxt_request_msix() function pointer.
      
      6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources
      will be cleaned up.
      
      v2: Fixed 2 uninitialized variable warnings.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
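
The registration flow in points 1 to 6 of the bnxt_en interface commit can be modelled in plain C as a struct of function pointers handed to the upper-layer driver. This is a minimal user-space sketch, not the driver's code: every `demo_` name and the field layout are assumptions made for illustration. Only the roles (register, request MSIX, unregister, one stats context reserved at registration, full cleanup at unregistration) come from the commit message.

```c
#include <stddef.h>

/* Hypothetical upcall table supplied by the RDMA side. */
struct demo_ulp_ops {
    void (*async_notifier)(void *handle, int event);
};

struct demo_en_dev;

/* Hypothetical downcall table exposed by the network driver. */
struct demo_en_ops {
    int  (*register_device)(struct demo_en_dev *edev,
                            const struct demo_ulp_ops *ops, void *handle);
    int  (*request_msix)(struct demo_en_dev *edev, int wanted);
    void (*unregister_device)(struct demo_en_dev *edev);
};

/* Shared state returned by the probe step and passed on every call. */
struct demo_en_dev {
    const struct demo_en_ops *en_ops;
    const struct demo_ulp_ops *ulp_ops;
    void *ulp_handle;
    int stat_ctxs_reserved;
    int msix_reserved;
    int msix_available;
};

static int demo_register(struct demo_en_dev *edev,
                         const struct demo_ulp_ops *ops, void *handle)
{
    if (edev->ulp_ops != NULL)
        return -1;                 /* already registered */
    edev->ulp_ops = ops;
    edev->ulp_handle = handle;
    edev->stat_ctxs_reserved = 1;  /* one stats context at registration */
    return 0;
}

static int demo_request_msix(struct demo_en_dev *edev, int wanted)
{
    if (edev->ulp_ops == NULL)
        return -1;                 /* must register first */
    int granted = wanted < edev->msix_available ? wanted
                                                : edev->msix_available;
    edev->msix_available -= granted;
    edev->msix_reserved += granted;
    return granted;                /* number of vectors actually reserved */
}

static void demo_unregister(struct demo_en_dev *edev)
{
    /* Give every reserved resource back to the network driver. */
    edev->msix_available += edev->msix_reserved;
    edev->msix_reserved = 0;
    edev->stat_ctxs_reserved = 0;
    edev->ulp_ops = NULL;
    edev->ulp_handle = NULL;
}

static const struct demo_en_ops demo_ops = {
    .register_device = demo_register,
    .request_msix = demo_request_msix,
    .unregister_device = demo_unregister,
};

/* Stand-in for the probe call that hands back the shared struct. */
static void demo_probe(struct demo_en_dev *edev, int msix_total)
{
    edev->en_ops = &demo_ops;
    edev->ulp_ops = NULL;
    edev->ulp_handle = NULL;
    edev->stat_ctxs_reserved = 0;
    edev->msix_reserved = 0;
    edev->msix_available = msix_total;
}
```

In this model, a caller would run `demo_probe()`, then go through `edev->en_ops` for everything else, which mirrors the "all calls pass a pointer to this struct" rule from point 1.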
    • bnxt_en: Refactor the driver registration function with firmware. · a1653b13
      Michael Chan committed
      The driver register function with firmware consists of passing version
      information and registering for async events.  To support the RDMA driver,
      the async events that we need to register may change.  Separate the
      driver register function into 2 parts so that we can just update the
      async events for the RDMA driver.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Reserve RDMA resources by default. · e4060d30
      Michael Chan committed
      If the device supports RDMA, we'll set up network default rings so that
      there are enough minimum resources for RDMA, if possible.  However, the
      user can still increase network rings to the max if desired.  The actual
      RDMA resources won't be reserved until the RDMA driver registers.
      
      v2: Fix compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Improve completion ring allocation for VFs. · 7b08f661
      Michael Chan committed
      All available remaining completion rings not used by the PF should be
      made available for the VFs so that there are enough rings in the VF to
      support RDMA.  The earlier workaround code of capping the rings by the
      statistics context is removed.
      
      When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources()
      to restore FW resources.  Later on we need to add some logic to account
      for RDMA resources.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Move function reset to bnxt_init_one(). · aa8ed021
      Michael Chan committed
      Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by
      the RDMA driver before the network device is opened.  So we cannot do
      function reset in bnxt_open() which will clear all the resources.
      
      The proper place to do function reset now is in bnxt_init_one().
      If we get AER, we'll do function reset as well.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Enable MSIX early in bnxt_init_one(). · 7809592d
      Michael Chan committed
      To better support the new RDMA driver, we need to move pci_enable_msix()
      from bnxt_open() to bnxt_init_one().  This way, MSIX vectors are available
      to the RDMA driver whether the network device is up or down.
      
      Part of the existing bnxt_setup_int_mode() function is now refactored into
      a new bnxt_init_int_mode().  bnxt_init_int_mode() is called during
      bnxt_init_one() to enable MSIX.  The remaining logic in
      bnxt_setup_int_mode() to map the IRQs to the completion rings is called
      during bnxt_open().
      
      v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add bnxt_set_max_func_irqs(). · 33c2657e
      Michael Chan committed
      Add bnxt_set_max_func_irqs() by refactoring existing code into this new
      function.  The new function will be used in subsequent patches.
      
      v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock_rps_record_flow() is for connected sockets · 5b8e2f61
      Eric Dumazet committed
      Paolo noticed a cache line miss in UDP recvmsg() to access
      sk_rxhash, sharing a cache line with sk_drops.
      
      sk_drops might be heavily incremented by cpus handling a flood targeting
      this socket.
      
      We might place sk_drops on a separate cache line, but let's try
      to avoid wasting 64 bytes per socket just for this, since we have
      other bottlenecks to take care of.
      
      sock_rps_record_flow() should only access sk_rxhash for connected
      flows.
      
      Testing sk_state for TCP_ESTABLISHED covers most of the cases for
      connected sockets, for a zero cost, since system calls using
      sock_rps_record_flow() also access sk->sk_prot which is on the
      same cache line.
      
      A follow up patch will provide a static_key (Jump Label) since most
      hosts do not even use RFS.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
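
The idea of gating the sk_rxhash access on the socket state can be illustrated with a small user-space model. The sketch below is an assumption-laden illustration and not the kernel function: `struct demo_sock`, the `DEMO_` state values, and the recording helper are hypothetical stand-ins. It only shows that an unconnected socket never reads its hash field.

```c
#include <stdint.h>

/* Hypothetical socket states standing in for the kernel's sk_state values. */
enum demo_state { DEMO_CLOSE = 0, DEMO_ESTABLISHED = 1 };

/* Hypothetical, heavily reduced socket. */
struct demo_sock {
    int state;        /* models sk_state */
    uint32_t rxhash;  /* models sk_rxhash */
};

static unsigned int demo_flows_recorded;
static uint32_t demo_last_hash;

/* Stand-in for the code that stores the flow hash for steering. */
static void demo_record_flow_hash(uint32_t hash)
{
    demo_last_hash = hash;
    demo_flows_recorded++;
}

/* Only connected sockets touch rxhash; others return without reading it. */
static void demo_rps_record_flow(const struct demo_sock *sk)
{
    if (sk->state == DEMO_ESTABLISHED)
        demo_record_flow_hash(sk->rxhash);
}
```

With this shape, an unconnected UDP socket under flood pays only for the state check, which is the cost the commit message describes as effectively free.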
    • i40e: move all updates for VLAN mode into i40e_sync_vsi_filters · 489a3265
      Jacob Keller committed
      In a similar fashion to how we handled exiting VLAN mode, move the logic
      in i40e_vsi_add_vlan into i40e_sync_vsi_filters. Extract this logic into
      its own function for ease of understanding as it will become quite
      complex.
      
      The new function, i40e_correct_mac_vlan_filters() correctly updates all
      filters for when we need to enter VLAN mode, exit VLAN mode, and also
      enforces the PVID when assigned.
      
      Call i40e_correct_mac_vlan_filters from i40e_sync_vsi_filters passing it
      the number of active VLAN filters, and the two temporary lists.
      
      Remove the function for updating VLAN=0 filters from i40e_vsi_add_vlan.
      
      The end result is that the logic for entering and exiting VLAN mode is
      in one location which has the most knowledge about all filters. This
      ensures that we always correctly have the non-VLAN filters assigned to
      VID=0 or VID=-1 regardless of how we ended up getting to this result.
      
      Additionally this enforces the PVID at sync time so that we know for
      certain that an assigned PVID results in only filters with that PVID
      being added to the firmware.
      
      Change-ID: I895cee81e9c92d0a16baee38bd0ca51bbb14e372
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: use (add|rm)_vlan_all_mac helper functions when changing PVID · 9af52f60
      Jacob Keller committed
      The current flow for adding or updating the PVID for a VF uses
      i40e_vsi_add_vlan and i40e_vsi_kill_vlan which each take, then release
      the hash lock. In addition the two functions also must take special care
      that they do not perform VLAN mode changes as this will make the code in
      i40e_ndo_set_vf_port_vlan behave incorrectly.
      
      Fix these issues by using the new helper functions i40e_add_vlan_all_mac
      and i40e_rm_vlan_all_mac which expect the hash lock to already be taken.
      Additionally these functions do not perform any state updates in regards
      to VLAN mode, so they are safe to use in the PVID update flow.
      
      It should be noted that we don't need the VLAN mode update code here,
      because there are only a few flows here.
      
      (a) we're adding a new PVID
        In this case, if we already had VLAN filters the VSI is knocked
        offline so we don't need to worry about pre-existing VLAN filters
      
      (b) we're replacing an existing PVID
        In this case, we can't have any VLAN filters except those with the old
        PVID which we already take care of manually.
      
      (c) we're removing an existing PVID
        Similarly to above, we can't have any existing VLAN filters except
        those with the old PVID which we already take care of correctly.
      
      Because of this, we do not need (or even want) the special accounting
      done in i40e_vsi_add_vlan, so use of the helpers is a saner alternative.
      It also opens the door for a future patch which will refactor the flow
      of i40e_vsi_add_vlan now that it is not needed in this function.
      
      Change-ID: Ia841f63da94e12b106f41cf7d28ce8ce92f2ad99
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: factor out addition/deletion of VLAN per each MAC address · 490a4ad3
      Jacob Keller committed
      A future refactor of how the PF assigns a PVID to a VF will want to be
      able to add and remove a block of filters by VLAN without worrying about
      accidentally triggering the accounting for I40E_VLAN_ANY. Additionally
      the PVID assignment would like to be able to batch several changes under
      one use of the mac_filter_hash_lock.
      
      Factor out the addition and deletion of a VLAN on all MACs into their
      own function which i40e_vsi_(add|kill)_vlan can use. These new functions
      expect the caller to take the hash lock, as well as perform any
      necessary accounting for updating I40E_VLAN_ANY filters if we are now
      operating under VLAN mode.
      
      Change-ID: If79e5b60b770433275350a74b3f1880333a185d5
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: delete filter after adding its replacement when converting · 75697025
      Jacob Keller committed
      Fix a subtle issue with the code for converting VID=-1 filters into VID=0
      filters when adding a new VLAN. Previously the code deleted the VID=-1
      filter, and then added a new VID=0 filter. In the rare case that the
      addition fails due to -ENOMEM, we end up completely deleting the filter
      which prevents recovery if memory pressure subsides. While it is not
      strictly an issue because it is likely that memory issues would result
      in many other problems, we shouldn't delete the filter until after the
      addition succeeds.
      
      Change-ID: Icba07ddd04ecc6a3b27c2e29f2c1c8673d266826
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
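
The ordering fix described in this commit (add the replacement first, delete the old filter only on success) can be sketched in user-space C. Everything here is hypothetical: the `demo_` table, the fixed slot count, and the `fail_next_add` switch that simulates an -ENOMEM style failure are assumptions for illustration, not i40e code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_FILTERS 4

/* Hypothetical filter entry keyed only by VLAN ID. */
struct demo_filter {
    int vid;
    bool in_use;
};

struct demo_table {
    struct demo_filter slots[DEMO_MAX_FILTERS];
    bool fail_next_add;  /* test hook: simulate an allocation failure */
};

/* Returns the new filter, or NULL when the (simulated) allocation fails. */
static struct demo_filter *demo_add(struct demo_table *t, int vid)
{
    if (t->fail_next_add) {
        t->fail_next_add = false;
        return NULL;
    }
    for (size_t i = 0; i < DEMO_MAX_FILTERS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].vid = vid;
            t->slots[i].in_use = true;
            return &t->slots[i];
        }
    }
    return NULL;  /* table full */
}

static void demo_del(struct demo_filter *f)
{
    f->in_use = false;
}

/*
 * Convert a VID=-1 filter into a VID=0 filter.  The old filter is removed
 * only after its replacement exists, so a failed add leaves it in place.
 */
static int demo_convert_to_vid0(struct demo_table *t, struct demo_filter *old)
{
    struct demo_filter *repl = demo_add(t, 0);

    if (repl == NULL)
        return -1;  /* old filter untouched, caller may retry later */
    demo_del(old);
    return 0;
}
```

A retry after the failure succeeds in this model because the original filter is still present, which is the recovery property the commit message is after.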
    • i40e: refactor i40e_update_filter_state to avoid passing aq_err · ac9e2390
      Jacob Keller committed
      The current caller of i40e_update_filter_state incorrectly passes
      aq_ret, an i40e_status variable, instead of the expected aq_err. This
      happens to work because i40e_status is actually just a typedef integer,
      and 0 is still the successful return. However i40e_update_filter_state
      has special handling for ENOSPC which is currently being ignored.
      
      Also notice that firmware does not update the per-filter response for
      many types of errors, such as EINVAL. Thus, modify the filter setup so
      that the firmware response memory is pre-set with I40E_AQC_MM_ERR_NO_RES.
      
      This enables us to refactor i40e_update_filter_state, removing the need
      to pass aq_err and avoiding a need for having 3 different flows for
      checking the filter state.
      
      The resulting code for i40e_update_filter_state is much simpler, only
      a single loop and we always check each filter response value every time.
      Since we pre-set the response value to match our expected error this
      correctly works for all success and error flows.
      
      Change-ID: Ie292c9511f34ee18c6ef40f955ad13e28b7aea7d
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
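
The "pre-set the response memory with the expected error" technique can be shown with a short user-space sketch. The response codes and the fake firmware below are assumptions for illustration (the real constant is I40E_AQC_MM_ERR_NO_RES, whose value is not restated here). The point is that one loop suffices once every entry starts out marked as failed.

```c
#include <stddef.h>

#define DEMO_RESP_OK     0x00  /* hypothetical success code */
#define DEMO_RESP_NO_RES 0xFF  /* hypothetical "no resources" code */

/* Mark every entry as failed before handing the buffer to firmware. */
static void demo_preset_responses(unsigned char *resp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        resp[i] = DEMO_RESP_NO_RES;
}

/*
 * Model of firmware that writes back only the entries it accepted and
 * leaves the rest of the buffer untouched on error.
 */
static void demo_fake_firmware(unsigned char *resp, size_t n, size_t accepted)
{
    for (size_t i = 0; i < n && i < accepted; i++)
        resp[i] = DEMO_RESP_OK;
}

/* Single pass over the responses, identical for success and error flows. */
static size_t demo_count_programmed(const unsigned char *resp, size_t n)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++)
        if (resp[i] == DEMO_RESP_OK)
            ok++;
    return ok;
}
```

Because untouched entries already carry the error value, the caller does not need a separate error code to decide which filters failed.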
    • i40e: recalculate vsi->active_filters from hash contents · 38326218
      Jacob Keller committed
      Previous code refactors have accidentally caused issues with the
      counting of active_filters. Avoid similar issues in the future by simply
      re-counting the active filters every time after we handle add and delete
      of all the filters. Additionally this allows us to simplify the check
      for when we exit promiscuous mode since we can combine the check for
      failed filters at the same time.
      
      Additionally since we recount filters at the end we need to set
      vsi->promisc_threshold as well.
      
      The resulting code takes a bit longer since we do have to loop over
      filters again. However, the result is more readable and less likely to
      become incorrect due to failed accounting of filters in the future.
      Finally, this ensures that it is not possible for vsi->active_filters to
      ever underflow since we never decrement it.
      
      Change-ID: Ib4f3a377e60eb1fa6c91ea86cc02238c08edd102
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
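
Recounting from the stored filters, instead of incrementing and decrementing a running counter, can be sketched as follows. The state names and the flat array are hypothetical simplifications of the driver's hash table. The sketch only demonstrates that the count is derived fresh on each pass and therefore cannot underflow.

```c
#include <stddef.h>

/* Hypothetical filter states; the driver's real enum is not reproduced. */
enum demo_fstate { DEMO_F_NEW, DEMO_F_ACTIVE, DEMO_F_FAILED, DEMO_F_REMOVE };

struct demo_totals {
    unsigned int active;
    unsigned int failed;
};

/*
 * Walk every filter once and rebuild both totals from scratch.  Nothing is
 * ever decremented, so the totals cannot drift or wrap below zero.
 */
static struct demo_totals demo_recount(const enum demo_fstate *st, size_t n)
{
    struct demo_totals t = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (st[i] == DEMO_F_ACTIVE)
            t.active++;
        else if (st[i] == DEMO_F_FAILED)
            t.failed++;
    }
    return t;
}
```

Collecting the failed count in the same loop reflects the commit's note that the promiscuous-mode exit check can be combined with the recount.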
    • i40e: defeature support for PTP L4 frame detection on XL710 · 1e28e861
      Jacob Keller committed
      A product decision has been made to defeature detection of PTP frames
      over L4 (UDP) on the XL710 MAC. Do not advertise support for L4
      timestamping.
      
      Change-ID: I41fbb0f84ebb27c43e23098c08156f2625c6ee06
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: lock service task correctly · 91089033
      Mitch Williams committed
      The service task lock was being set in the scheduling function, not the
      actual service task. This would potentially leave the bit set for a long
      time before the task actually ran. Furthermore, if the service task
      takes too long, it calls the schedule function to reschedule itself -
      which would fail to take the lock and do nothing.
      
      Instead, set and clear the lock bit in the service task itself. In the
      process, get rid of the i40e_service_event_complete() function, which is
      really just two lines of code that can be put right in the service task
      itself.
      
      Change-ID: I83155e682b686121e2897f4429eb7d3f7c669168
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
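
The locking change can be modelled in user space with C11 atomics: the scheduling function no longer touches the lock bit, and the task itself takes and releases it. This is a sketch under stated assumptions, not the driver's code. The `demo_` names are hypothetical, and `atomic_flag` stands in for the kernel's bit operations on the driver state word.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag demo_service_running = ATOMIC_FLAG_INIT;
static atomic_bool demo_pending;
static int demo_runs;

/* Scheduling only records that work is wanted; it never takes the lock. */
static void demo_schedule(void)
{
    atomic_store(&demo_pending, true);
}

/*
 * The task owns the lock for exactly as long as it runs.  Returns false
 * when another instance already holds it.
 */
static bool demo_service_task(void)
{
    if (atomic_flag_test_and_set(&demo_service_running))
        return false;               /* already running elsewhere */

    atomic_store(&demo_pending, false);
    demo_runs++;                    /* placeholder for the real work */

    atomic_flag_clear(&demo_service_running);
    return true;
}
```

Since `demo_schedule()` cannot fail on a held lock, a long-running task that reschedules itself is not silently dropped, which is the failure mode the commit message describes.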
    • i40e: Add functions which apply correct PHY access method for read and write operation · f62ba914
      Michal Kosiarz committed
      Depending on external PHY type, register access method should be
      different. Clause22 or Clause45 can be chosen for different PHYs.
      Implemented functions apply correct access method for used device.
      
      Change-ID: If39d5f0da9c0b905a8cbdc1ab89885535e7d0426
      Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
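
Choosing the register access method from the PHY type amounts to a small dispatch function. The sketch below is a hypothetical user-space model: the `demo_` names and the fake register values are assumptions, and no real MDIO bus is touched. The address limits reflect the two clauses themselves: Clause 22 uses a 5-bit register address, while Clause 45 uses a 5-bit device (MMD) address plus a 16-bit register address.

```c
/* Hypothetical PHY classification; not the driver's enum. */
enum demo_phy_type { DEMO_PHY_C22, DEMO_PHY_C45 };

/* Fake Clause 22 read: 5-bit register address only. */
static int demo_read_c22(unsigned int reg, unsigned int *val)
{
    if (reg > 31)
        return -1;
    *val = 0x2200u | reg;   /* placeholder data */
    return 0;
}

/* Fake Clause 45 read: 5-bit MMD address plus 16-bit register address. */
static int demo_read_c45(unsigned int mmd, unsigned int reg, unsigned int *val)
{
    if (mmd > 31 || reg > 0xFFFF)
        return -1;
    *val = 0x4500u | mmd;   /* placeholder data */
    return 0;
}

/* Single entry point that applies the access method matching the PHY. */
static int demo_phy_read(enum demo_phy_type type, unsigned int mmd,
                         unsigned int reg, unsigned int *val)
{
    switch (type) {
    case DEMO_PHY_C22:
        return demo_read_c22(reg, val);  /* mmd is unused for Clause 22 */
    case DEMO_PHY_C45:
        return demo_read_c45(mmd, reg, val);
    default:
        return -1;
    }
}
```

Callers in this model never pick a clause themselves; they pass the PHY type and let the wrapper decide, which keeps the choice in one place.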
    • i40e: Add FEC for 25g · 60f000a4
      Carolyn Wyborny committed
      This patch adds adminq support for Forward Error
      Correction ("FEC") for 25g products.
      
      Change-ID: Iaff4910737c239d2c730e5c22a313ce9c37d3964
      Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Signed-off-by: Jacek Naczyk <jacek.naczyk@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>