1. 11 1月, 2014 1 次提交
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang 提交于
      Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
      will cause several issues:
      
      - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
        instead of lower device which misses the necessary txq synchronization for
        lower device such as txq stopping or frozen required by dev watchdog or
        control path.
      - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
        when tso is disabled for lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() for just
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
      extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
      forwarding transmission.
      
      With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, it was also required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f663dd9a
  2. 04 12月, 2013 1 次提交
  3. 02 12月, 2013 1 次提交
  4. 08 11月, 2013 7 次提交
  5. 05 11月, 2013 9 次提交
    • J
      net/mlx4_core: Implement resource quota enforcement · 146f3ef4
      Jack Morgenstein 提交于
      Implements resource quota grant decision when resources are requested,
      for the following resources:  QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
      and Counters.
      
      When granting a resource, the quota system increases the allocated-count
      for that slave.
      
      When the slave later frees the resource, its allocated-count is reduced.
      
      A spinlock is used to protect the integrity of each resource's free-pool counter.
      (One slave may be in the process of being granted a resource while another
      slave has crashed, initiating cleanup of that slave's resource quotas).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      146f3ef4
    • J
      net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper · eb456a68
      Jack Morgenstein 提交于
      In current kernels, the mlx4 driver running on a VM does not
      differentiate between max resource numbers for the HCA and
      max quotas -- it simply takes the quota values passed to it
      as max-resource values.
      
      However, the driver actually requires the VFs to be aware of
      the actual number of resources that the HCA was initialized with,
      for QPs, CQs, SRQs and MPTs.
      
      For QPs, CQs and SRQs, the reason is that in completion handling
      the driver must know which of the 24 bits are the actual resource
      number, and which are "padding" bits.
      
      For MPTs, also, the driver assumes knowledge of the number of MPTs
      in the system.
      
      The previous commit fixes the quota logic on the VM for the quota values
      passed to it by QUERY_FUNC_CAPS.
      
      For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
      from QUERY_HCA (and not QUERY_FUNC_CAPS).  The quotas passed
      in QUERY_FUNC_CAPS are used to report max resource number values
      in the response to ib_query_device.
      
      However, the Hypervisor driver must consider that VMs
      may be running previous kernels, and compatibility must be preserved.
      
      To resolve the incompatibility with previous kernels running on VMs,
      we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
      deprecated fields, we pass the max-resource values from INIT_HCA
      
      The quota fields are moved to a new location, and the current kernel
      driver takes the proper values from that location. There is
      also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
      if this flag is set, the (VM) driver takes the quota values from the
      new location.
      
      VMs running previous kernels will work properly, except that the max resource
      numbers reported in ib_query_device for these resources will be
      too high.  The Hypervisor driver will, however, enforce the quotas
      for these VMs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb456a68
    • J
      mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Jack Morgenstein 提交于
      This is step #1 for implementing SRIOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                                   VLAN, and Counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity for the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
      parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-serve basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and guarantee is 64
           (to allow the PF to register at least a VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
            We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we do initialization
      for the resource quota, and adjust the query_device response to use quotas
      rather than resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via
      QUERY_HCA) so that they may continue to use these in handling
      QPs, CQs, SRQs and MPTs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a0d0a61
    • J
      net/mlx4_core: Fix checking order in MR table init · a30f1bc5
      Jack Morgenstein 提交于
      In procedure mlx4_init_mr_table(), slaves should do no processing,
      but should return success. This initialization is hypervisor-only.
      
      However, the check for num_mpts being a power-of-2 was performed
      before the check to return immediately if the driver is for a slave.
      This resulted in spurious failures.
      
      The order of performing the checks is reversed, so that if the
      driver is for a slave, no processing is done and success is returned.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a30f1bc5
    • J
      net/mlx4_core: Don't fail reg/unreg vlan for older guests · 2c957ff2
      Jack Morgenstein 提交于
      In upstream kernels under SRIOV, the vlan register/unregister calls
      were NOPs (doing nothing and returning OK). We detect these old
      calls from guests (via the comm channel), since previously the
      port number in mlx4_register_vlan was passed (improperly) in the
      out_param. This has been corrected so that the port number is now
      passed in bits 8..15 of the in_modifier field.
      
      For old calls, these bits will be zero, so if the passed port
      number is zero, we can still look at the out_param field to see
      if it contains a valid port number. If yes, the VM is running
      an old driver.
      
      Since for old drivers, the register/unregister_vlan wrappers were
      NOPs, we continue this policy -- the reason being that upstream
      had an additional bug in eth driver running on guests (where
      procedure mlx4_en_vlan_rx_kill_vid() had the following code:
      
      if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
              mlx4_unregister_vlan(mdev->dev, priv->port, idx);
      else
              en_err(priv, "could not find vid %d in cache\n", vid);
      
      On a VM, mlx4_find_cached_vlan() will always fail, since the
      vlan cache is located on the Hypervisor; on guests it is empty.
      
      Therefore, if we allow upstream guests to register vlans, we will
      have vlan leakage since the unregister will never be performed.
      Leaving vlan reg/unreg for old guest drivers as a NOP is not a
      feature regression, since in upstream the register/unregister
      vlan wrapper is a NOP.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c957ff2
    • J
      net/mlx4_core: Resource tracker for reg/unreg vlans · 4874080d
      Jack Morgenstein 提交于
      Add resource tracker support for reg/unreg vlans calls done by VFs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4874080d
    • J
      net/mlx4_en: Use vlan id instead of vlan index for unregistration · 2009d005
      Jack Morgenstein 提交于
      Use of vlan_index created problems unregistering vlans on guests.
      
      In addition, tools delete vlan by tag, not by index, lets follow that.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2009d005
    • J
      net/mlx4_core: Fix reg/unreg vlan/mac to conform to the firmware spec · acddd5dd
      Jack Morgenstein 提交于
      The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
      mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
      to pass the port number. The firmware spec specifies that the port number
      should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
      FREE_RES (sections 20.15.1 and 20.15.2).
      
      For MAC register/unregister, this patch contains workarounds so that guests
      running previous kernels continue to work on a new Hypervisor, and guests
      running the new kernel will continue to work on old hypervisors.
      
      Vlan registeration capability is still not operational in multifunction mode,
      since the vlan wrapper functions are not implemented in this patch.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acddd5dd
    • J
      net/mlx4_core: Fix register/unreg vlan flow · 162226a1
      Jack Morgenstein 提交于
      The reg/unreg vlan code was broken:
      
      1. a wrapped function called another wrapped function, causing a deadlock.
      
      2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to
         incorrectly passed parameters.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      162226a1
  6. 04 11月, 2013 1 次提交
  7. 18 10月, 2013 4 次提交
  8. 14 10月, 2013 1 次提交
  9. 09 10月, 2013 2 次提交
  10. 13 9月, 2013 1 次提交
  11. 22 8月, 2013 5 次提交
  12. 14 8月, 2013 1 次提交
  13. 06 8月, 2013 1 次提交
    • J
      net: mlx4: Staticize local functions · f094668c
      Jingoo Han 提交于
      These local functions are used only in this file.
      Fix the following sparse warnings:
      
      drivers/net/ethernet/mellanox/mlx4/cmd.c:803:5: warning: symbol 'MLX4_CMD_UPDATE_QP_wrapper' was not declared. Should it be static?
      drivers/net/ethernet/mellanox/mlx4/cmd.c:812:5: warning: symbol 'MLX4_CMD_GET_OP_REQ_wrapper' was not declared. Should it be static?
      drivers/net/ethernet/mellanox/mlx4/cmd.c:1547:5: warning: symbol 'mlx4_master_immediate_activate_vlan_qos' was not declared. Should
      it be static?
      Signed-off-by: NJingoo Han <jg1.han@samsung.com>
      Acked-By: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f094668c
  14. 02 8月, 2013 3 次提交
  15. 29 7月, 2013 2 次提交
    • Y
      net/mlx4_core: Respond to operation request by firmware · fe6f700d
      Yevgeny Petrilin 提交于
      This commit adds new firmware command and new firmware event.  The firmware
      raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
      needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
      command. At the moment the supported operation is adding/removing multicast
      entries which are used by the firmware for handling NCSI traffic in B0
      steering mode.
      
      Also, had to swap the order of mlx4_init_mcg_table() and
      mlx4_init_eq_table() to make sure that driver will get events only after
      resources are initialized to handle it.
      Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.com>
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe6f700d
    • E
      net/mlx4_en: Fix BlueFlame race · 2d4b6466
      Eugenia Emantayev 提交于
      Fix a race between BlueFlame flow and stamping in post send flow.
      Example:
      	SW: Build WQE 0 on the TX buffer, except the ownership bit
      	SW: Set ownership for WQE 0 on the TX buffer
      	SW: Ring doorbell for WQE 0
      	SW: Build WQE 1 on the TX buffer, except the ownership bit
      	SW: Set ownership for WQE 1 on the TX buffer
      	HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
      	HW: Produce CQEs for WQE 0 and WQE 1
      	SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
      	SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
      	HW: CQE error with index 0xFFFF  - the BF WQE's control segment is STAMPED,
      		so the BF index is 0xFFFF. Error: Invalid Opcode.
      As a result QP enters the error state and no traffic can be sent.
      
      Solution:
      When stamping - do not stamp last completed wqe.
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d4b6466