1. 07 Apr, 2016 27 commits
    • i40e: Leave debug_mask cleared at init · 89dd0551
      Shannon Nelson committed
      Don't set our internal debug_mask at startup unless the debug module
      parameter specifically asks for it.
      
      This should take care of the issue with all the device capabilities getting
      printed even when we hadn't asked for the debug info.
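The behaviour described above can be sketched as follows. This is a minimal illustration and not the driver's code: `example_hw`, `example_init_debug_mask` and the `EXAMPLE_DEBUG_UNSET` sentinel are hypothetical names chosen for this sketch.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "the debug module parameter was not given". */
#define EXAMPLE_DEBUG_UNSET (-1)

/* Hypothetical stand-in for the driver's hardware state. */
struct example_hw {
	uint32_t debug_mask; /* zero means no debug output was requested */
};

/* Leave debug_mask cleared unless the user explicitly supplied a value. */
static void example_init_debug_mask(struct example_hw *hw, int debug_param)
{
	hw->debug_mask = 0;
	if (debug_param != EXAMPLE_DEBUG_UNSET)
		hw->debug_mask = (uint32_t)debug_param;
}
```

With the mask left at zero by default, capability dumps guarded by a debug bit would stay silent until the parameter is set.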
      
      Change-ID: I7fbc6bd8b11ed9b0631ec018ff36015a04100b6c
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Inserting a HW capability display info · 453e16e8
      Deepthi Kavalur committed
      Display MSIx vector count for HW capabilities.
      
      Change-ID: I4b41e9b50360cf660e7fbcb85b9390fedcf313b1
      Signed-off-by: Deepthi Kavalur <deepthi.kavalur@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Merge branch 'mlxsw-dcb' · 58a01d4d
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      mlxsw: Introduce support for Data Center Bridging
      
      Ido says:
      
      This patchset introduces support for Quality of Service (QoS) as part of the
      IEEE Data Center Bridging (DCB) standards.
      
      Patches 1-9 do the required device initialization. Specifically, patches 1-6
      initialize the ports' headroom buffers, which are used at ingress to store
      incoming packets while they go through the switch's pipeline. Patches 7-9
      complete them by initializing the egress scheduling.
      
      The pipeline mentioned above determines the packet's egress port(s) and
      traffic class. Ideally, once out of the pipeline the packet moves to the
      switch's shared buffer (to be introduced in Jiri's patchset, currently
      default values are used) and is scheduled for transmission according to its
      traffic class. The egress scheduling is configured according to the 802.1Qaz
      standard, which is part of the DCB infrastructure supported by Linux. This
      is introduced in patches 10-12.
      
      Even after going through the pipeline packets are not always eligible to
      enter the shared buffer. This is determined by the amount of available space
      and the quotas associated with the packet. However, if flow control is
      enabled and the packet is associated with the lossless flow, then it will
      stay in the headroom and won't be discarded. This is introduced in patches
      13-17.
      
      Please check individual commit messages for more info, as I tried to keep
      them pretty detailed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add IEEE 802.1Qbb PFC support · d81a6bdb
      Ido Schimmel committed
      Implement the appropriate DCB ops and allow a user to configure certain
      traffic classes as lossless.
      
      The operation configures PFC for both the egress (respecting PFC frames)
      and ingress (sending PFC frames) parts of the port.
      
      At egress, when a PFC frame is received for a PFC enabled priority, then
      all the priorities mapped to the same TC are stopped.
      
      At ingress, the priority group (PG) buffers to which the enabled PFC
      priorities are mapped are configured to be lossless. PFC frames will be
      transmitted when the Xoff threshold is crossed.
      
      The user-supplied delay parameter is used to determine the PG's size
      according to the following formula:
      
      PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU
      
      In the worst case scenario the delay will be made up of packets that
      are all of size CELL_SIZE + 1, which means each packet will require
      almost twice its true size when buffered in the switch. We therefore
      multiply this value by the "cell factor", which is close to 2.
      
      Another MTU is added in case the transmitting host already started
      transmitting a maximum length frame when the PFC packet was received.
      
      As with PAUSE enabled ports, when the port's MTU is changed both the
      PGs' size and threshold are adjusted accordingly.
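The formula above can be written out as a small helper. This is only a sketch under stated assumptions: all quantities are taken to be in the same unit, the cell factor is passed as a hypothetical numerator/denominator pair because the text only says it is "close to 2", and the function name is invented for illustration.

```c
#include <stdint.h>

/*
 * PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU
 *
 * All arguments are assumed to share one unit. The cell factor is given
 * as factor_num / factor_den so that a non-integer value near 2 can be
 * expressed without floating point.
 */
static uint32_t example_pfc_pg_size(uint32_t pg_size_lossy, uint32_t delay,
				    uint32_t factor_num, uint32_t factor_den,
				    uint32_t mtu)
{
	/* Scale the delay by the cell factor, rounding up. */
	uint64_t scaled = ((uint64_t)delay * factor_num + factor_den - 1) /
			  factor_den;

	return pg_size_lossy + (uint32_t)scaled + mtu;
}
```

For instance, with a lossy size of 3000, a delay of 1000, a factor of exactly 2 and an MTU of 1500, the helper yields 6500.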
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Introduce per priority counters · 34dba0a5
      Ido Schimmel committed
      We are going to add support for PFC as part of DCB ops, which requires us
      to report the number of PFC frames sent and received per priority.
      
      Add per priority counters in order to report number of PFC frames sent
      and received per priority.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add support for PAUSE frames · 9f7ec052
      Ido Schimmel committed
      When a packet ingresses the switch it's placed in its assigned priority
      group (PG) buffer in the port's headroom buffer while it goes through
      the switch's pipeline. After going through the pipeline - which
      determines its egress port(s) and traffic class - it's moved to the
      switch's shared buffer awaiting transmission.
      
      However, some packets are not eligible to enter the shared buffer due to
      exceeded quotas or insufficient space. Marking their associated PGs as
      lossless will cause the packets to accumulate in the PG buffer. Another
      reason for packet accumulation is a complicated pipeline (e.g. one
      involving a lot of ACLs).
      
      To prevent packets from being dropped a user can enable PAUSE frames on
      the port. This will mark all the active PGs as lossless and set their
      size according to the maximum delay, as it's not configured by the user.
      
                               +----------------+   +
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   | Delay
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
          Xon/Xoff threshold   +----------------+   +
                               |                |   |
                               |                |   | 2 * MTU
                               |                |   |
                               +----------------+   +
      
      The delay (612 [Cells]) was calculated according to a worst-case scenario
      involving maximum MTU and 100m cables.
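Reading the figure as "threshold at 2 * MTU, with the delay stacked on top", the PG layout for a PAUSE-enabled port could be computed roughly like this. The struct and function names are hypothetical, the MTU is assumed to be already converted to cells, and zeroing the threshold for a lossy PG is a choice made for this sketch only.

```c
#include <stdint.h>

#define EXAMPLE_PAUSE_DELAY_CELLS 612u /* worst-case delay quoted in the text */

/* Hypothetical description of one priority group (PG) buffer. */
struct example_pg_cfg {
	uint32_t size_cells;      /* total PG buffer size, in cells */
	uint32_t threshold_cells; /* Xon/Xoff threshold, in cells */
	int lossless;             /* non-zero when the PG is lossless */
};

/* Size a PG for a given MTU (in cells), with or without PAUSE enabled. */
static struct example_pg_cfg example_pg_cfg_pause(uint32_t mtu_cells,
						  int pause_en)
{
	struct example_pg_cfg cfg;

	cfg.size_cells = 2 * mtu_cells;
	cfg.threshold_cells = 0;
	cfg.lossless = pause_en != 0;
	if (pause_en) {
		/* Room above the threshold absorbs traffic still in flight. */
		cfg.threshold_cells = 2 * mtu_cells;
		cfg.size_cells += EXAMPLE_PAUSE_DELAY_CELLS;
	}
	return cfg;
}
```

Because the size depends on the MTU, a helper like this would have to be re-run whenever the port's MTU changes.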
      
      After marking the PGs as lossless the device is configured to respect
      incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE)
      according to the user's settings.
      
      Whenever the port's headroom configuration changes we take into account
      the PAUSE configuration, so that we correctly set the PG's type (lossy /
      lossless), size and threshold. This can happen when:
      
      a) The port's MTU changes, as it directly affects the PG's size.
      
      b) A PG is created following user configuration, by binding a priority
      to it.
      
      Note that the relevant SUPPORTED flags were already mistakenly set by
      the driver before this commit.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add lossless settings for PBMC register · 155f9de2
      Ido Schimmel committed
      When configuring PAUSE frames and PFC we'll need to configure the
      Xon/Xoff threshold for the priority group (PG) buffers.
      
      Add the Xon/Xoff threshold fields to the PBMC register so that we can
      configure these when needed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Port Flow Control Configuration register · 6f253d83
      Ido Schimmel committed
      Add the Port Flow Control Configuration (PFCC) register, which
      configures both flow control and Priority-based Flow Control (PFC).
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Allow setting maximum rate for a TC · cc7cf517
      Ido Schimmel committed
      Allow a user to set maximum rate for a particular TC using DCB ops.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add IEEE 802.1Qaz ETS support · 8e8dfe9f
      Ido Schimmel committed
      Implement the appropriate DCB ops and allow a user to configure:
      	* Priority to traffic class (TC) mapping with a total of 8
      	  supported TCs
      	* Transmission selection algorithm (TSA) for each TC and the
      	  corresponding weights in case of weighted round robin (WRR)
      
      As previously explained, we treat the priority group (PG) buffer in the
      port's headroom as the ingress counterpart of the egress TC. Therefore,
      when a certain priority to TC mapping is configured, we also configure
      the port's headroom buffer.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Introduce support for Data Center Bridging (DCB) · f00817df
      Ido Schimmel committed
      Introduce basic infrastructure for DCB and add the missing ops in
      following patches.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Initialize egress scheduling · 90183b98
      Ido Schimmel committed
      Before introducing support for DCB ops we should first make sure we
      initialize the relevant parts in the device correctly. Specifically, the
      egress scheduling.
      
      The device supports a superset of the 802.1Qaz standard with 4 hierarchy
      levels that can be linked to each other in multiple ways and with
      different transmission selection algorithms (TSA) employed between them.
      
      However, since we only intend to support the 802.1Qaz standard we
      flatten the hierarchies and let the user configure via DCB ops the TSA
      and max rate shaper at the subgroup hierarchy (see figure below) and the
      mapping between switch priority to traffic class. By default, all switch
      priorities are mapped to traffic class 0, strict priority is employed
      and max shaper is disabled.
      
      Default configuration:
      
               switch priority 0      ...         switch priority 7
                       +                                  +
                       |                                  |
                       +----------------------------------+
                       |
                    +--v--+                          +-----+
      Traffic Class |     |                          |     |
        Hierarchy   | TC0 |           ...            | TC7 |
                    |     |                          |     |
                    +--+--+                          +--+--+
                       |                                |
                    +--v--+                          +--v--+
        Subgroup    | SG0 |                          | SG7 |
        Hierarchy   |     |                          |     |
                    +-----+                          +-----+
                    | TSA |                          | TSA |
                    +-----+           ...            +-----+
                    | MAX |                          | MAX |
                    +--+--+                          +--+--+
                       |                                |
                       +---------------+----------------+
                                       |
                                    +--v--+
                            Group   |     |
                          Hierarchy | GR0 |
                                    |     |
                                    +--+--+
                                       |
                                    +--v--+
                            Port    |     |
                          Hierarchy | PR0 |
                                    |     |
                                    +-----+
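The default configuration in the figure can be expressed as a plain data structure. The sketch below is an illustration only: the type and function names are hypothetical, and the counts of eight priorities and eight traffic classes follow the figure (priorities 0 to 7, TC0 to TC7).

```c
#include <stdint.h>

#define EXAMPLE_NUM_PRIO 8 /* switch priorities 0..7 */
#define EXAMPLE_NUM_TC   8 /* traffic classes TC0..TC7 */

/* Transmission selection algorithms mentioned in the series. */
enum example_tsa {
	EXAMPLE_TSA_STRICT, /* strict priority */
	EXAMPLE_TSA_WRR     /* weighted round robin */
};

/* Hypothetical per-port egress scheduling state. */
struct example_ets_cfg {
	uint8_t prio_tc[EXAMPLE_NUM_PRIO];       /* switch priority -> TC */
	enum example_tsa tc_tsa[EXAMPLE_NUM_TC]; /* TSA at each subgroup */
	int max_shaper_en[EXAMPLE_NUM_TC];       /* max rate shaper on/off */
};

/* Defaults: every priority maps to TC0, strict priority, shapers off. */
static void example_ets_cfg_default(struct example_ets_cfg *cfg)
{
	int i;

	for (i = 0; i < EXAMPLE_NUM_PRIO; i++)
		cfg->prio_tc[i] = 0;
	for (i = 0; i < EXAMPLE_NUM_TC; i++) {
		cfg->tc_tsa[i] = EXAMPLE_TSA_STRICT;
		cfg->max_shaper_en[i] = 0;
	}
}
```

Flattening the four hardware hierarchy levels means only these three per-port tables need to be exposed to the user.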
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS Switch Traffic Class Table register · 2c63a555
      Ido Schimmel committed
      As part of DCB ops we'll have to configure the priority to traffic class
      mapping of a port.
      
      Add the QoS Switch Traffic Class Table (QTCT) register, which configures
      the mapping between the packet switch priority and traffic class on the
      transmit port.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS ETS Element Configuration register · b9b7cee4
      Ido Schimmel committed
      We are going to introduce support for DCB, so we need to be able to
      configure the traffic selection algorithm (TSA) used by each traffic
      class (TC), as well as the bandwidth percentage allocated to each TC in
      case of ETS.
      
      Add the QoS ETS Element Configuration register, which controls the
      above parameters.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Set port's shared buffer size to 0 · d6b7c13b
      Ido Schimmel committed
      In addition to the priority group (PG) buffers in the headroom, the
      device enables the allocation of headroom shared buffer, which can
      be shared between different PGs.
      
      However, we are not going to use the headroom shared buffer and instead
      allow the user to use its size for PGs or the switch's shared buffer.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Use correct PBMC register length · 7ad7cd61
      Ido Schimmel committed
      The last field of the PBMC register is at offset 0x64 and its size is
      0x8, so the correct register's length is 0x6C bytes.
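The arithmetic can be checked at compile time. The macro names below are hypothetical; only the offset, field size and resulting length come from the message above.

```c
/* Values from the commit message; macro names are made up for this sketch. */
#define EXAMPLE_PBMC_LAST_FIELD_OFFSET 0x64
#define EXAMPLE_PBMC_LAST_FIELD_SIZE   0x08
#define EXAMPLE_PBMC_LEN \
	(EXAMPLE_PBMC_LAST_FIELD_OFFSET + EXAMPLE_PBMC_LAST_FIELD_SIZE)

/* C11 static assertion: the register must extend to the end of its last field. */
_Static_assert(EXAMPLE_PBMC_LEN == 0x6C,
	       "PBMC length must cover the last field");
```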
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Correctly configure headroom size · ff6551ec
      Ido Schimmel committed
      When packets ingress the switch they are assigned a switch priority and
      directed to the corresponding priority group (PG) buffer in the port's
      headroom buffer.
      
      Since we now map all switch priorities to priority group 0 (PG0) by
      default, there is no need to allocate the other priority groups during
      initialization. The only exception is PG9, which is used for control
      traffic.
      
      At minimum, the PG should be able to store the currently classified
      packet (pipeline latency isn't 0) and also the packets arriving during
      the classification time. However, an incoming packet will not be
      buffered if there is no available MTU-sized buffer space for storing it.
      
      The buffer needed to accommodate pipeline latency is variable and
      needs to take into account both the current link speed and current
      latency of the pipeline, which is time-dependent. Testing showed that
      setting the PG's size to twice the current MTU is optimal.
      
      Since PG9 is used strictly for control packets and is not subject to flow
      control, we are not going to resize it according to user configuration,
      so we simply set it according to the worst-case scenario, which is twice
      the maximum MTU.
      
      In any case, later patches in the series will allow a user to direct
      lossless flows to PGs other than PG0 and set their size to accommodate
      the round-trip propagation delay.
      
      The above change also requires us to resize the PG buffer whenever the
      port's MTU is changed.
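The sizing rules described above might be summarised as follows. This is a sketch and not the driver's implementation: the names are hypothetical, sizes are kept in bytes, and the total of ten PGs is an assumption inferred from PG9 being the highest one mentioned.

```c
#include <stdint.h>

#define EXAMPLE_PG_DATA    0  /* PG0: all switch priorities by default */
#define EXAMPLE_PG_CONTROL 9  /* PG9: control traffic */
#define EXAMPLE_PG_COUNT   10 /* assumed number of PGs (PG0..PG9) */

/*
 * Fill in the per-PG headroom sizes, in bytes:
 *  - PG0 gets twice the current MTU,
 *  - PG9 gets twice the maximum MTU and never changes afterwards,
 *  - all other PGs are left unallocated.
 */
static void example_headroom_sizes(uint32_t mtu, uint32_t max_mtu,
				   uint32_t sizes[EXAMPLE_PG_COUNT])
{
	int i;

	for (i = 0; i < EXAMPLE_PG_COUNT; i++)
		sizes[i] = 0;
	sizes[EXAMPLE_PG_DATA] = 2 * mtu;
	sizes[EXAMPLE_PG_CONTROL] = 2 * max_mtu;
}
```

Calling the helper again from the MTU change path would keep PG0 in step with the port's MTU while leaving PG9 untouched.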
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add bytes to cells helper · 1a198449
      Ido Schimmel committed
      Buffers in the switch store packets in units called buffer cells. Add a
      helper to convert from bytes to cells, so that the actual number of
      cells required (the result is rounded up) is returned.
      
      Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro,
      as this unit is also used in the ports' buffers and not only the
      switch's shared buffer.
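A rounding-up conversion of this kind typically looks like the sketch below. The cell size of 96 bytes is an assumed value that the message does not state, and the names are hypothetical.

```c
#include <stdint.h>

/* Assumed cell size in bytes; the commit message does not state it. */
#define EXAMPLE_BYTES_PER_CELL 96u

/* Convert a byte count to the number of cells needed, rounding up. */
static uint32_t example_bytes_to_cells(uint32_t bytes)
{
	/* Written as quotient plus remainder check to avoid overflow. */
	return bytes / EXAMPLE_BYTES_PER_CELL +
	       (bytes % EXAMPLE_BYTES_PER_CELL != 0);
}
```

Under this assumption a packet that is one byte larger than a cell occupies two whole cells.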
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Map all switch priorities to priority group 0 · dd6cb0f9
      Ido Schimmel committed
      During transmission, the skb's priority is used to map the skb to a
      traffic class, where the idea is to group priorities with similar
      characteristics (e.g. lossy, lossless) to the same traffic class. By
      default, all priorities are mapped to traffic class 0.
      
      In the device, we model the skb's priority as the switch priority, which
      is assigned to a packet according to its PCP value and ingress port
      (untagged packets are assigned the port's default switch priority - 0).
      
      At ingress, the packet is directed to a priority group (PG) buffer in
      the port's headroom buffer according to the packet's switch priority and
      switch priority to buffer mapping.
      
      While it's possible to configure the egress mapping between skb's
      priority (switch priority) and traffic class, there is no mechanism to
      configure the ingress mapping to a PG.
      
      In order to keep things simple and since grouping certain priorities into
      a traffic class at egress also implies they should be grouped the same
      at ingress, treat a PG as the ingress counterpart of an egress traffic
      class.
      
      Having established the above, during initialization map all the switch
      priorities to PG0 in accordance with the Linux defaults for traffic
      class mapping.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Port Prio To Buffer register · b98ff151
      Ido Schimmel committed
      When packets ingress the switch they are assigned a switch priority
      number that dictates the packet's priority group (PG) buffer in the
      port's headroom buffer.
      
      Add the Port Prio To Buffer (PPTB) register, which configures the switch
      priority to PG mapping.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 92b6d35f
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-04-05
      
      This series contains updates to i40e and i40evf only.
      
      Colin Ian King cleaned up a redundant NULL check which was found by static
      analysis.
      
      Anjali enables geneve receive offload for XL710/X710 devices.
      
      Mitch cleans up an unused variable in i40e_vc_get_vf_resources_msg().
      Fixed the driver to actually be able to adjust VLAN tagging features
      through ethtool, as expected.  Fixed a problem where VF resets would
      get lost by the PF, preventing the VF driver from initializing.  Also
      put users' minds at ease by lowering some message levels since many of
      these conditions can happen any time VFs are enabled or disabled and
      are not really indicative of fatal problems, unless they happen
      continuously.
      
      Shannon disables the link polling to lessen the admin queue traffic
      especially since the link event mask usage has been fixed recently.
      
      Alex Duyck fixes the i40e and i40evf drivers to correctly update
      checksums for frames up to 16776960 in length which should be more than
      large enough for all possible TSO frames in the near future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vxlan-gpe' · 6f555635
      David S. Miller committed
      Jiri Benc says:
      
      ====================
      vxlan: implement Generic Protocol Extension (GPE)
      
      v3: just rebased on top of the current net-next, no changes
      
      This patchset implements VXLAN-GPE. It follows the same model as the tun/tap
      driver: depending on the chosen mode, the vxlan interface is created either
      as ARPHRD_ETHER (non-GPE) or ARPHRD_NONE (GPE).
      
      Note that the internal fdb control plane cannot be used together with
      VXLAN-GPE and any attempt to configure it will be rejected by the driver. In
      fact, COLLECT_METADATA is required to be set for now. This can be relaxed in
      the future by adding support for static PtP configuration; it will be
      backward compatible and won't affect existing users.
      
      The previous version of the patchset supported two GPE modes, L2 and L3. The
      L2 mode (now called "ether mode" in the code) was removed from this version.
      It can be easily added later if there's demand. The L3 mode is now called
      "raw mode" and also supports encapsulated Ethernet headers (via ETH_P_TEB).
      
      The only limitation of not having "ether mode" for GPE is for ip route based
      encapsulation: with such a setup, only IP packets can be encapsulated, meaning
      no Ethernet encapsulation. It seems there's not much use for this, though.
      If it turns out to be useful, we'll add it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: implement GPE · e1e5314d
      Jiri Benc committed
      Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is
      possible to support static configuration, too, if there is demand for it).
      
      The GPE header parsing has to be moved before iptunnel_pull_header, as we
      need to know the protocol.
      
      v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode"
          (now called "raw mode") is added by this patch. This mode does not allow
          Ethernet header to be encapsulated in VXLAN-GPE when using ip route to
          specify the encapsulation, IP header is encapsulated instead. The patch
          does support Ethernet to be encapsulated, though, using ETH_P_TEB in
          skb->protocol. This will be utilized by other COLLECT_METADATA users
          (openvswitch in particular).
      
          If there is ever demand for Ethernet encapsulation with VXLAN-GPE using
          ip route, it's easy to add a new flag switching the interface to
          "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now,
          leave this out, it seems we don't need it.
      
          Disallowed more flag combinations, especially RCO with GPE.
          Added comment explaining that GBP and GPE cannot be set together.
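The restrictions spelled out in this message (GPE requires COLLECT_METADATA, and GPE may not be combined with RCO or GBP) could be checked along these lines. The flag names and bit values are hypothetical placeholders and not the kernel's VXLAN_F_* definitions, and the real driver may reject further combinations not listed here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits used only for this sketch. */
#define EX_F_COLLECT_METADATA (1u << 0)
#define EX_F_GBP              (1u << 1)
#define EX_F_RCO              (1u << 2)
#define EX_F_GPE              (1u << 3)

/* Return 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int example_vxlan_check_flags(uint32_t flags)
{
	if (!(flags & EX_F_GPE))
		return 0; /* the GPE-specific rules do not apply */

	/* Only metadata mode is supported with GPE for now. */
	if (!(flags & EX_F_COLLECT_METADATA))
		return -EINVAL;

	/* GBP and RCO cannot be set together with GPE. */
	if (flags & (EX_F_GBP | EX_F_RCO))
		return -EINVAL;

	return 0;
}
```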
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_tunnel: implement __iptunnel_pull_header · a6d5bbf3
      Jiri Benc committed
      Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner
      protocol.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: move fdb code to common location in vxlan_xmit · 47e5d1b0
      Jiri Benc committed
      Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not
      make sense with the former, as it needs populated fdb which does not happen
      in metadata mode.
      
      After this cleanup, the fdb code in vxlan_xmit is moved to a common location
      and can be later skipped for VXLAN-GPE which does not necessarily carry
      inner Ethernet header.
      
      v2: changed commit description to not reference L3 mode
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: move Ethernet initialization to a separate function · 0c867c9b
      Jiri Benc committed
      This will allow initializing vxlan in ARPHRD_NONE mode based on the passed
      rtnl attributes.
      
      v2: renamed "l2mode" to "ether".
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/cxgb4vf: Deprecate module parameter dflt_msg_enable · 8a21ec4e
      Hariprasad Shenai committed
      Message level can be set through ethtool, so deprecate module parameter
      which is used to set the same.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Apr, 2016 13 commits