1. 02 December 2017, 7 commits
• bnxt_en: Add ETH_RESET_AP support · 6502ad59
Authored by Scott Branden
Add ETH_RESET_AP support handling to reset the internal
Application Processor(s) of the SmartNIC card. (A hedged driver-side
sketch follows this entry.)
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
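A minimal, hedged sketch of the driver-side handling this entry describes. The example_* names and the firmware helper are hypothetical; only the ethtool_ops .reset hook and the ETH_RESET_AP flag come from the kernel's ethtool API.

    /* illustrative only: how an ethtool .reset handler might honor ETH_RESET_AP */
    static int example_reset(struct net_device *dev, u32 *flags)
    {
            struct example_priv *priv = netdev_priv(dev);

            if (*flags & ETH_RESET_AP) {
                    /* ask firmware to reset the application processor(s);
                     * example_fw_reset_ap() is a hypothetical helper */
                    if (example_fw_reset_ap(priv))
                            return -EIO;
                    *flags &= ~ETH_RESET_AP;  /* clear the bits that were handled */
            }
            return 0;
    }

    static const struct ethtool_ops example_ethtool_ops = {
            .reset = example_reset,
    };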
• net: ethtool: add support for reset of AP inside NIC interface. · 40e44a1e
Authored by Scott Branden
      Add ETH_RESET_AP to reset the application processor(s) inside the NIC
      interface.
      
      Current ETH_RESET_MGMT supports a management processor inside this NIC.
      This is typically used for remote NIC management purposes.
      
Application processors exist inside some SmartNICs to run various
applications on the NIC processor, ranging from a simple algorithm without
an OS to something as complex as hosting multiple VMs. (A hedged userspace
sketch of triggering this reset follows this entry.)
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
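A hedged userspace sketch showing how the new flag could be exercised; the interface name "eth0" is an assumption and error handling is minimal. It uses only the long-standing SIOCETHTOOL/ETHTOOL_RESET interface plus the ETH_RESET_AP bit added here.

    /* ap_reset.c - illustrative sketch, not part of the patch */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
            struct ethtool_value eval = { .cmd = ETHTOOL_RESET, .data = ETH_RESET_AP };
            struct ifreq ifr;
            int fd = socket(AF_INET, SOCK_DGRAM, 0);

            memset(&ifr, 0, sizeof(ifr));
            strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */
            ifr.ifr_data = (void *)&eval;

            if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
                    perror("ETHTOOL_RESET");
            else
                    /* on return, .data holds the flags the driver did NOT handle */
                    printf("reset requested, unhandled flags: 0x%x\n", eval.data);

            close(fd);
            return 0;
    }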
• Merge branch 'rds-tcp-netns-delete-related-fixes' · 68bf33f4
Authored by David S. Miller
      Sowmini Varadhan says:
      
      ====================
      rds-tcp netns delete related fixes
      
The patchset contains cleanup and bug fixes. Patch 1 removes some
redundant code/functions. Patches 2 and 3 fix corner cases identified
by syzkaller. I have not been able to reproduce the actual use-after-free
race flagged in the syzkaller reports, so these fixes are based on code
inspection plus manual testing to make sure the modified code paths execute
without problems in the commonly encountered timing cases.
      ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
• rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete · f10b4cff
Authored by Sowmini Varadhan
      The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
      to find the rds_connection entries marked for deletion as part
      of the netns deletion under the protection of the rds_tcp_conn_lock.
      Since the rds_tcp_conn_list tracks rds_tcp_connections (which
      have a 1:1 mapping with rds_conn_path), multiple tc entries in
      the rds_tcp_conn_list will map to a single rds_connection, and will
      be deleted as part of the rds_conn_destroy() operation that is
      done outside the rds_tcp_conn_lock.
      
The rds_tcp_conn_list traversal done under the protection of
rds_tcp_conn_lock must not leave any doomed tc entries in the list after
rds_tcp_conn_lock is released, or another concurrently executing
netns-delete thread (for a different netns) may trip on these entries.
(A hedged sketch of this purge pattern follows this entry.)
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
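A hedged, generic sketch of the pattern this fix applies: unlink every doomed entry while the list lock is held, and destroy the entries only after dropping the lock. The structure and function names below are simplified stand-ins, not the exact rds-tcp identifiers.

    /* illustrative kernel-style sketch of the "purge under lock" pattern */
    static void kill_conns_for_net(struct net *net)
    {
            struct tcp_conn *tc, *tmp;
            LIST_HEAD(doomed);

            spin_lock_irq(&conn_list_lock);
            list_for_each_entry_safe(tc, tmp, &conn_list, node) {
                    if (tc_to_net(tc) != net)
                            continue;
                    /* unlink now, so a concurrent netns delete never sees a
                     * half-destroyed entry after we drop the lock */
                    list_move_tail(&tc->node, &doomed);
            }
            spin_unlock_irq(&conn_list_lock);

            /* the (possibly sleeping) teardown runs outside the lock */
            list_for_each_entry_safe(tc, tmp, &doomed, node)
                    conn_destroy(tc->conn);
    }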
• rds: tcp: correctly sequence cleanup on netns deletion. · 681648e6
Authored by Sowmini Varadhan
      Commit 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
introduces a regression in rds-tcp netns cleanup. cleanup_net()
(and thus the rds_tcp_dev_event notification) is only called from put_net()
when all netns refcounts go to 0, but this cannot happen if the
rds_connection itself is holding a c_net ref that it expects to
release in rds_tcp_kill_sock.
      
      Instead, the rds_tcp_kill_sock callback should make sure to
      tear down state carefully, ensuring that the socket teardown
      is only done after all data-structures and workqs that depend
      on it are quiesced.
      
      The original motivation for commit 8edc3aff ("rds: tcp: Take explicit
      refcounts on struct net") was to resolve a race condition reported by
      syzkaller where workqs for tx/rx/connect were triggered after the
      namespace was deleted. Those worker threads should have been
      cancelled/flushed before socket tear-down and indeed,
      rds_conn_path_destroy() does try to sequence this by doing
           /* cancel cp_send_w */
           /* cancel cp_recv_w */
           /* flush cp_down_w */
           /* free data structures */
      Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
      invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
      we ought to have satisfied the requirement that "socket-close is
      done after all other dependent state is quiesced". However,
rds_conn_shutdown has a bug: it *always* triggers the reconnect
workq, and if the reconnect succeeds the tx/rx workqs are restarted as
well, so with the right timing we risk the race conditions reported
by syzkaller.

Netns deletion is like module teardown: there is no need to restart a
reconnect in this case. We can use the c_destroy_in_prog bit to avoid
restarting the reconnect. (A hedged sketch of this gating follows this entry.)
      
      Fixes: 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
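A hedged sketch of the gating described above; the workqueue, work item, and flag names are illustrative, and only the idea of consulting a destroy-in-progress bit before re-arming the reconnect is taken from the entry.

    /* illustrative only: do not rearm a reconnect while the connection
     * (and possibly its netns) is being torn down */
    static void conn_shutdown_complete(struct conn_path *cp)
    {
            if (cp->conn->destroy_in_prog)
                    return;   /* netns delete / module unload: stay quiesced */

            /* normal path: schedule the next reconnect attempt */
            queue_delayed_work(reconnect_wq, &cp->reconnect_work,
                               reconnect_delay_jiffies(cp));
    }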
• rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() · 2d746c93
Authored by Sowmini Varadhan
A side-effect of commit c14b0366 ("rds: tcp: set linger to 1
when unloading a rds-tcp") is that we always send an RST on the tcp
connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
is no longer needed and is removed in this patch.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• tipc: fall back to smaller MTU if allocation of local send skb fails · 4c94cc2d
Authored by Jon Maloy
When sending node-local messages the code uses an 'mtu' of 66060
bytes to avoid unnecessary fragmentation. During situations of low
memory, tipc_msg_build() may sometimes fail to allocate such large
buffers, resulting in unnecessary send failures. This can easily be
remedied by falling back to a smaller MTU and then reassembling the
buffer chain as if the message were arriving from a remote node.
(A hedged sketch of the fallback follows this entry.)

At the same time, we change the initial MTU setting of the broadcast
link to a lower value, so that large messages are always fragmented
into smaller buffers even when we run in single-node mode. Apart from
obtaining the same advantage as the 'fallback' solution above, this
turns out to give a significant performance improvement. This can
probably be explained by the __pskb_copy() operation performed on the
buffer for each recipient during reception. We found the optimal value
for this, considering the most relevant skb pool, to be 3744 bytes.
Acked-by: Ying Xue <ying.xue@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
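A hedged sketch of the fallback described above; the helper names, the -ENOMEM check, and the exact fallback size are illustrative assumptions rather than the verbatim tipc code.

    /* illustrative only: retry a node-local send with a smaller "MTU" and
     * reassemble the resulting chain as if it had come from a remote node */
    #define LOCAL_SEND_MTU   66060   /* large size used for local destinations */
    #define FALLBACK_MTU      3744   /* smaller size assumed under memory pressure */

    rc = build_msg(hdr, m, 0, dlen, LOCAL_SEND_MTU, &pkts);
    if (unlikely(rc == -ENOMEM)) {
            rc = build_msg(hdr, m, 0, dlen, FALLBACK_MTU, &pkts);
            if (!rc)
                    skb = reassemble_chain(&pkts);   /* hypothetical helper */
    }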
2. 01 December 2017, 6 commits
• Merge branch 'macb-rx-packet-filtering' · 201c78e0
Authored by David S. Miller
      Rafal Ozieblo says:
      
      ====================
Receive packet filtering for the macb driver

This patch series adds support for receive packet
filtering to the Cadence GEM driver. Packets can be redirected
to different hardware queues based on source IP, destination IP,
source port, or destination port. To enable filtering,
support for RX queueing was added as well.
      ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: macb: Added support for RX filtering · ae8223de
Authored by Rafal Ozieblo
This patch allows filtering received packets into different
hardware queues (aka ntuple filtering). (A hedged userspace sketch of
configuring such a rule follows this entry.)
Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
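A hedged userspace sketch of inserting one steering rule through the generic ethtool ntuple interface that the driver now implements; the interface name, destination port, queue index, and rule location are all assumptions, not values from the patch.

    /* ntuple_rule.c - illustrative sketch: steer TCP packets with destination
     * port 4321 to RX queue 1 */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
            struct ethtool_rxnfc nfc;
            struct ifreq ifr;
            int fd = socket(AF_INET, SOCK_DGRAM, 0);

            memset(&nfc, 0, sizeof(nfc));
            nfc.cmd = ETHTOOL_SRXCLSRLINS;               /* insert a classification rule */
            nfc.fs.flow_type = TCP_V4_FLOW;
            nfc.fs.h_u.tcp_ip4_spec.pdst = htons(4321);  /* value to match */
            nfc.fs.m_u.tcp_ip4_spec.pdst = 0xffff;       /* full mask: compare all port bits */
            nfc.fs.ring_cookie = 1;                      /* deliver to RX queue 1 */
            nfc.fs.location = 0;                         /* rule slot; driver-dependent */

            memset(&ifr, 0, sizeof(ifr));
            strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1); /* assumed interface name */
            ifr.ifr_data = (void *)&nfc;

            if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
                    perror("ETHTOOL_SRXCLSRLINS");

            close(fd);
            return 0;
    }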
• net: macb: Added some queue statistics · 512286bb
Authored by Rafal Ozieblo
      Added statistics per queue:
      - qX_rx_packets
      - qX_rx_bytes
      - qX_rx_dropped
      - qX_tx_packets
      - qX_tx_bytes
      - qX_tx_dropped
Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: macb: Added support for many RX queues · ae1f2a56
Authored by Rafal Ozieblo
To enable packet reception on different RX queues, some
configuration has to be performed. This patch checks how many
hardware queues the GEM supports and initializes them.
Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• vmxnet3: increase default rx ring sizes · 7475908f
Authored by Shrikrishna Khare
      There are several reasons for increasing the receive ring sizes:
      
1. The original ring size of 256 was chosen about 10 years ago when
vmxnet3 was first created. At that time, 10Gbps Ethernet was not prevalent
and servers were dominated by 1Gbps Ethernet. Now 10Gbps is commonplace,
and higher bandwidth links -- 25Gbps, 40Gbps, 50Gbps -- are starting
to appear. 256 Rx ring entries are simply not enough to keep up with
higher link speeds when there is a burst of network frames coming from
these high-speed links. Even with full-MTU-size frames, they are gone
in a short time. It is also more common to have a mix of frame sizes,
and a more likely bi-modal distribution of frame sizes, so the average
frame size is not close to the full MTU. If we consider an average frame
size of 800 B, a burst of 1024 frames takes ~0.65 ms to arrive at 10Gbps,
while a burst of 256 frames takes only ~0.16 ms. At 25Gbps or 40Gbps,
these times shrink accordingly. (The arithmetic is sketched after this entry.)
      
2. On a hypervisor where there are many VMs and the CPU is overcommitted,
i.e. the number of VCPUs is greater than the number of PCPUs, each PCPU is
in effect time-shared between multiple VMs/VCPUs. The time granularity at
which this multiplexing occurs is typically coarser than between processes
on a guest OS. Trying to time-slice more finely is not efficient; for
example, the memory cache may be barely warm when the switch from one VM
to another occurs. This CPU overcommit adds delay to when the driver
in a VM can service incoming packets. Whether the CPU is overcommitted
really depends on customer workloads; for certain situations, such as
desktop-VM workloads and product-testing setups, it is very common.
Consolidation and sharing are what drive the efficiency of a customer setup
for such workloads. In these situations, the raw network bandwidth may
not be very high, but the gaps between when a VM is running and not
running can be relatively long.
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Acked-by: Jin Heo <heoj@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Boon Ang <bang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
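A quick check of the burst-arrival arithmetic quoted in reason 1, assuming the 800 B average frame size and a 10Gbps link stated there:

    1024 frames x 800 B x 8 bit/B = 6,553,600 bits / 10 Gbit/s ≈ 0.65 ms
     256 frames x 800 B x 8 bit/B = 1,638,400 bits / 10 Gbit/s ≈ 0.16 ms

So a 256-entry ring can be filled by such a burst in roughly a sixth of a millisecond, while 1024 entries give the guest about four times as long before the ring must be refilled.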
• net: dsa: bcm_sf2: Utilize b53_get_tag_protocol() · 9f66816a
Authored by Florian Fainelli
Utilize the much more capable b53_get_tag_protocol(), which takes care of
all Broadcom switch specifics, to resolve which ports can have Broadcom
tags enabled.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3. 30 November 2017, 27 commits