1. 05 2月, 2015 1 次提交
    • M
      net/mlx4_core: Port aggregation upper layer interface · 53f33ae2
      Moni Shoua 提交于
      Supply interface functions to bond and unbond ports of a mlx4 internal
      interfaces. Example for such an interface is the one registered by the
      mlx4 IB driver under RoCE.
      
      There are
      
      1. Functions to go in/out to/from bonded mode
      2. Function to remap virtual ports to physical ports
      
      The bond_mutex prevents simultaneous access to data that keep status of
      the device in bonded mode.
      
      The upper mlx4 interface marks to the mlx4 core module that they
      want to be subject for such bonding by setting the MLX4_INTFF_BONDING
      flag. Interface which goes to/from bonded mode is re-created.
      
      The mlx4 Ethernet driver does not set this flag when registering the
      interface, the IB driver does.
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53f33ae2
  2. 12 12月, 2014 4 次提交
    • M
      net/mlx4: Add support for A0 steering · 7d077cd3
      Matan Barak 提交于
      Add the required firmware commands for A0 steering and a way to enable
      that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
      QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
      to configure and query the device.
      
      The different A0 DMFS (steering) modes are:
      
      Static - optimized performance, but flow steering rules are
      limited. This mode should be choosed explicitly by the user
      in order to be used.
      
      Dynamic - this mode should be explicitly choosed by the user.
      In this mode, the FW works in optimized steering mode as long as
      it can and afterwards automatically drops to classic (full) DMFS.
      
      Disable - this mode should be explicitly choosed by the user.
      The user instructs the system not to use optimized steering, even if
      the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
      steering in Default A0 DMFS mode).
      
      Default - this mode is implicitly choosed. In this mode, if the FW
      supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
      work at Disable A0 DMFS mode.
      
      Under SRIOV configuration, when the A0 steering mode is enabled,
      older guest VF drivers who aren't using the RX QP allocation flag
      (MLX4_RESERVE_A0_QP) will get a QP from the general range and
      fail when attempting to register a steering rule. To avoid that,
      the PF context behaviour is changed once on A0 static mode, to
      require support for the allocation flag in VF drivers too.
      
      In order to enable A0 steering, we use log_num_mgm_entry_size param.
      If the value of the parameter is not positive, we treat the absolute
      value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
      bit field enables static A0 steering.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d077cd3
    • M
      net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak 提交于
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      
      When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this  range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      
      Meaning, when we run out of raw-eth qps, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      
      Note that if a raw-eth qp is allocated from the general range, it attempts
      to allocate the range such that bits 6 and 7 (blueflame bits) in the
      QP number are not set.
      
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
      to the combination of these bits, the PF tries to allocate a suitable QP.
      
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via QUERY_FUNC_CAP command.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d57febe1
    • D
      net/mlx4: Add a check if there are too many reserved QPs · ab256e5a
      Dotan Barak 提交于
      The number of reserved QPs is affected both from the firmware and
      from the driver's requirements. This patch adds a check that
      validates that this number is indeed feasable.
      Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab256e5a
    • E
      net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev 提交于
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific for the Ethernet driver, any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31-24] - flags controlling QPs reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddae0349
  3. 11 9月, 2014 1 次提交
  4. 05 6月, 2014 1 次提交
  5. 03 6月, 2014 1 次提交
  6. 17 5月, 2014 1 次提交
  7. 09 5月, 2014 1 次提交
  8. 17 1月, 2014 1 次提交
  9. 10 12月, 2013 1 次提交
    • J
      mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs · 7c6d74d2
      Jack Morgenstein 提交于
      Commit f4ec9e95 "mlx4_core: Change bitmap allocator to work in round-robin fashion"
      introduced round-robin allocation (via bitmap) for all resources which allocate
      via a bitmap.
      
      Round robin allocation is desirable for mcgs, counters, pd's, UARs, and xrcds.
      These are simply numbers, with no involvement of ICM memory mapping.
      
      Round robin is required for QPs, since we had a problem with immediate
      reuse of a 24-bit QP number (commit f4ec9e95).
      
      However, for other resources which use the bitmap allocator and involve
      mapping ICM memory -- MPTs, CQs, SRQs -- round-robin is not desirable.
      
      What happens in these cases is the following:
      
      ICM memory is allocated and mapped in chunks of 256K.
      
      Since the resource allocation index goes up monotonically, the allocator
      will eventually require mapping a new chunk. Now, chunks are also unmapped
      when their reference count goes back to zero.  Thus, if a single app is
      running and starts/exits frequently we will have the following situation:
      
      When the app starts, a new chunk must be allocated and mapped.
      
      When the app exits, the chunk reference count goes back to zero, and the
      chunk is unmapped and freed. Therefore, the app must pay the cost of allocation
      and mapping of ICM memory each time it runs (although the price is paid only when
      allocating the initial entry in the new chunk).
      
      For apps which allocate MPTs/SRQs/CQs and which operate as described above,
      this presented a performance problem.
      
      We therefore roll back the round-robin allocator modification for MPTs, CQs, SRQs.
      Reported-by: NMatthew Finlay <matt@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6d74d2
  10. 05 11月, 2013 1 次提交
    • J
      mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Jack Morgenstein 提交于
      This is step #1 for implementing SRIOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                                   VLAN, and Counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity for the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
      parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-serve basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and guarantee is 64
           (to allow the PF to register at least a VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
            We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we do initialization
      for the resource quota, and adjust the query_device response to use quotas
      rather than resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via
      QUERY_HCA) so that they may continue to use these in handling
      QPs, CQs, SRQs and MPTs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a0d0a61
  11. 08 3月, 2013 1 次提交
  12. 01 10月, 2012 3 次提交
    • J
      mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Jack Morgenstein 提交于
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations
      regarding proxy or tunnel QPs to use.  This change frees the PPF from
      needing to adhere to a specific order in allocating proxy and tunnel
      QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-sriov mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
          base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      47605df9
    • J
      mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode · 980e9001
      Jack Morgenstein 提交于
      Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
      transitions to RTR, or transitions to ERR/RESET respectively.
      
      In SR-IOV mode, however, the master is also paravirtualized.  This in
      turn requires that we not do INIT_PORT until the entire QP0 path (real
      QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
      we should indicate that the port is not active.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      980e9001
    • J
      mlx4_core: Add proxy and tunnel QPs to the reserved QP area · e2c76824
      Jack Morgenstein 提交于
      In addition, pass the proxy and tunnel QP numbers to slaves so the
      driver can perform special QP paravirtualization.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      e2c76824
  13. 07 3月, 2012 1 次提交
  14. 23 1月, 2012 1 次提交
  15. 14 12月, 2011 3 次提交
    • E
      mlx4_core: resource tracking for HCA resources used by guests · c82e9aa0
      Eli Cohen 提交于
      The resource tracker is used to track usage of HCA resources by the different
      guests.
      
      Virtual functions (VFs) are attached to guest operating systems but
      resources are allocated from the same pool and are assigned to VFs. It is
      essential that hostile/buggy guests not be able to affect the operation of
      other VFs, possibly attached to other guest OSs since ConnectX firmware is not
      tolerant to misuse of resources.
      
      The resource tracker module associates each resource with a VF and maintains
      state information for the allocated object. It also defines allowed state
      transitions and enforces them.
      
      Relationships between resources are also referred to. For example, CQs are
      pointed to by QPs, so it is forbidden to destroy a CQ if a QP refers to it.
      
      ICM memory is always accessible through the primary function and hence it is
      allocated by the owner of the primary function.
      
      When a guest dies, an FLR is generated for all the VFs it owns and all the
      resources it used are freed.
      
      The tracked resource types are: QPs, CQs, SRQs, MPTs, MTTs, MACs, RES_EQs,
      and XRCDNs.
      Signed-off-by: NEli Cohen <eli@mellanox.co.il>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c82e9aa0
    • J
      mlx4_core: qp modifications for SRIOV · fe9a2603
      Jack Morgenstein 提交于
      QPs are resources which are allocated and tracked by the PF driver.
      In multifunction mode, the allocation and icm mapping is done in
      the resource tracker (later patch in this sequence).
      
      To accomplish this, we have "work" functions whose names start with
      "__", and "request" functions (same name, no __). If we are operating
      in multifunction mode, the request function actually results in
      comm-channel commands being sent (ALLOC_RES or FREE_RES).
      The PF-driver comm-channel handler will ultimately invoke the
      "work" (__) function and return the result.
      
      If we are not in multifunction mode, the "work" handler is invoked
      immediately.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe9a2603
    • J
      mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed) · f9baff50
      Jack Morgenstein 提交于
      For SRIOV, some Hypervisor commands can be executed directly (native = 1).
      Others should go through the command wrapper flow (for tracking resource
      usage, for example, or for changing some HCA configurations that slaves
      need to be notified of).
      
      This patch sets the groundwork for this capability -- adding the correct
      value of "native" in each case.
      
      Note that if SRIOV is not activated, this parameter has no effect.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9baff50
  16. 01 11月, 2011 1 次提交
  17. 11 8月, 2011 1 次提交
  18. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  19. 06 9月, 2009 1 次提交
  20. 23 10月, 2008 1 次提交
  21. 11 10月, 2008 1 次提交
  22. 26 7月, 2008 1 次提交
  23. 26 4月, 2008 1 次提交
  24. 21 11月, 2007 1 次提交
  25. 15 11月, 2007 1 次提交
  26. 11 10月, 2007 1 次提交
    • R
      mlx4_core: Fix section mismatches · 3d73c288
      Roland Dreier 提交于
          
      Commit ee49bd93 ("mlx4_core: Reset device when internal error is
      detected") introduced some section mismatch problems when
      CONFIG_HOTPLUG=n, because the error recovery code tears down and
      reinitializes the device after everything is loaded, which ends up
      calling into lots of code marked __devinit and __devexit from regular
      .text.  Fix this by getting rid of these now-incorrect section
      markers.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      3d73c288
  27. 10 10月, 2007 1 次提交
  28. 13 7月, 2007 1 次提交
  29. 09 7月, 2007 1 次提交
  30. 09 5月, 2007 1 次提交
    • R
      IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters · 225c7b1f
      Roland Dreier 提交于
      Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
      these adapters can also be used as ethernet NICs and Fibre Channel 
      HBAs, the driver is split into two modules: 
       
        mlx4_core: Handles low-level things like device initialization and 
          processing firmware commands.  Also controls resource allocation 
          so that the InfiniBand, ethernet and FC functions can share a 
          device without stepping on each other. 
       
        mlx4_ib: Handles InfiniBand-specific things; plugs into the 
          InfiniBand midlayer. 
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      225c7b1f