1. 12 Jan, 2018 1 commit
    • {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed · 8978cc92
      Eran Ben Elisha committed
      Some systems have platform information management interfaces (such as
      HOST2BMC) for which we cannot disable local loopback multicast traffic.
      
      Separate the disable_local_lb_mc and disable_local_lb_uc capability bits
      so that the driver will not disable multicast loopback traffic if that
      is not supported.
      (It is expected that Firmware will not set disable_local_lb_mc if
      HOST2BMC is running for example.)
      
      The function mlx5_nic_vport_update_local_lb makes a best effort to
      disable/enable UC/MC loopback traffic and returns success only if it
      managed to change everything the Firmware allows.
      
      Adapt mlx5_ib and mlx5e to support the new cap bits.
      
      Fixes: 2c43c5a0 ("net/mlx5e: Enable local loopback in loopback selftest")
      Fixes: c85023e1 ("IB/mlx5: Add raw ethernet local loopback support")
      Fixes: bded747b ("net/mlx5: Add raw ethernet local loopback firmware command")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
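
The best-effort behaviour described in this commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct lb_caps`, `struct lb_state` and `update_local_lb` are hypothetical names invented for illustration, not the driver's types, and the real mlx5_nic_vport_update_local_lb issues a firmware command rather than flipping booleans.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two firmware capability bits named above. */
struct lb_caps {
    bool disable_local_lb_uc; /* firmware allows toggling unicast loopback */
    bool disable_local_lb_mc; /* firmware allows toggling multicast loopback */
};

/* Hypothetical model of the vport's current loopback state. */
struct lb_state {
    bool uc_enabled;
    bool mc_enabled;
};

/*
 * Best-effort update: touch only the traffic classes the firmware lets us
 * change and leave the others alone. Returns 0 on success. In this model
 * nothing can fail, so it always returns 0; a real driver would propagate
 * the firmware command's error code instead.
 */
int update_local_lb(const struct lb_caps *caps, struct lb_state *st, bool enable)
{
    /* Neither class is controllable: nothing to do, not an error. */
    if (!caps->disable_local_lb_uc && !caps->disable_local_lb_mc)
        return 0;

    if (caps->disable_local_lb_uc)
        st->uc_enabled = enable;
    if (caps->disable_local_lb_mc)
        st->mc_enabled = enable;

    return 0;
}
```

With a capability set that exposes only the unicast bit (the HOST2BMC case the message mentions), a request to disable loopback changes the unicast state and leaves multicast loopback enabled.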
  2. 11 Jan, 2018 2 commits
  3. 03 Jan, 2018 2 commits
  4. 22 Dec, 2017 1 commit
  5. 20 Dec, 2017 15 commits
  6. 15 Dec, 2017 1 commit
  7. 14 Dec, 2017 3 commits
  8. 28 Nov, 2017 4 commits
  9. 21 Nov, 2017 1 commit
  10. 16 Nov, 2017 1 commit
    • mm: remove __GFP_COLD · 453f85d4
      Mel Gorman committed
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Judging from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter which simplifies a number of paths in the page
      allocator.
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no significant difference, which is not surprising
      given that the __GFP_COLD branches are a minuscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 14 Nov, 2017 2 commits
  12. 13 Nov, 2017 1 commit
  13. 10 Nov, 2017 5 commits
  14. 09 Nov, 2017 1 commit
    • net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets · f938daee
      Gal Pressman committed
      When the VLAN tag is present in the packet buffer (i.e. VLAN stripping disabled, QinQ),
      the driver currently reports CHECKSUM_UNNECESSARY.
      Instead of using CHECKSUM_COMPLETE offload for packets with first
      ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to
      cover the former cases as well.
      
      The checksum field present in the CQE is calculated from the IP header
      until the end of the packet. When the first ethertype is something other
      than IPv4/6 (e.g. 802.1Q VLAN), a checksum of the VLAN header(s) should be
      added. Computing the checksum of these small header(s) allows us to use
      CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY.
      
      Testing bandwidth of 1 and 8 TCP streams to a single RQ,
      LRO and VLAN stripping offloads disabled:
      CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
      NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
      
      Before:
      +--------------+--------------------+---------------------+----------------------+
      | Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] |   Checksum offload   |
      +--------------+--------------------+---------------------+----------------------+
      | Untagged     |          28,247.35 |           24,716.88 | CHECKSUM_COMPLETE    |
      | VLAN         |          27,516.69 |           23,752.26 | CHECKSUM_UNNECESSARY |
      | QinQ         |           6,961.30 |           20,667.04 | CHECKSUM_UNNECESSARY |
      +--------------+--------------------+---------------------+----------------------+
      
      Now:
      +--------------+--------------------+---------------------+-------------------+
      | Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload  |
      +--------------+--------------------+---------------------+-------------------+
      | Untagged     |          28,521.28 |           24,926.32 | CHECKSUM_COMPLETE |
      | VLAN         |          27,389.37 |           23,715.34 | CHECKSUM_COMPLETE |
      | QinQ         |           6,901.77 |           20,845.73 | CHECKSUM_COMPLETE |
      +--------------+--------------------+---------------------+-------------------+
      
      No performance degradation observed.
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
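
The checksum adjustment described in this commit message relies on the fact that a ones-complement sum can be extended with additional 16-bit words after the fact. The following userspace sketch illustrates that arithmetic only. The function names are hypothetical, and the driver itself would presumably use the kernel's own checksum helpers rather than code like this. It assumes the VLAN header bytes have even length, which holds for 4-byte 802.1Q tags.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a running sum (not yet folded). */
uint32_t csum_accumulate(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (i < len) /* an odd trailing byte is padded with a zero byte */
        sum += (uint32_t)(buf[i] << 8);
    return sum;
}

/* Fold a 32-bit accumulator to 16 bits using end-around carry. */
uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * hw_csum models the value reported by the hardware: the folded sum from the
 * IP header to the end of the packet. Adding the VLAN header words yields the
 * sum over the VLAN header(s) plus everything after them.
 */
uint16_t csum_extend_over_vlan(uint16_t hw_csum, const uint8_t *vlan_hdrs,
                               size_t vlan_len)
{
    return csum_fold16(csum_accumulate(hw_csum, vlan_hdrs, vlan_len));
}
```

Because only the few VLAN header bytes are summed in software, the extra cost per packet is small, which is consistent with the unchanged bandwidth figures in the tables above.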