1. 25 May 2019, 10 commits
    • ipv6: Make fib6_nh optional at the end of fib6_info · 1cf844c7
      Committed by David Ahern
      Move fib6_nh to the end of fib6_info and make it a zero-length array.
      Pass a flag to fib6_info_alloc indicating whether the allocation needs
      to add space for a fib6_nh (see the sketch after this entry).
      
      The current code path always has a fib6_nh allocated with a
      fib6_info; with nexthop objects they will be separate.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
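
      A minimal standalone sketch of the allocation pattern described in this
      commit, using a flexible array member so the embedded nexthop only takes
      space when requested. The demo_* names are illustrative, not the kernel's
      actual definitions.

        #include <stdlib.h>

        struct demo_fib6_nh {
                int nh_ifindex;                 /* stand-in for real nexthop state */
        };

        struct demo_fib6_info {
                unsigned int flags;
                struct demo_fib6_nh fib6_nh[];  /* space added only when requested */
        };

        static struct demo_fib6_info *demo_fib6_info_alloc(int with_fib6_nh)
        {
                size_t sz = sizeof(struct demo_fib6_info);

                if (with_fib6_nh)
                        sz += sizeof(struct demo_fib6_nh);

                return calloc(1, sz);           /* zeroed, like the kernel allocator */
        }
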
    • ipv6: Move exception bucket to fib6_nh · cc5c073a
      Committed by David Ahern
      Similar to the pcpu routes, exceptions are really per nexthop, so move
      rt6i_exception_bucket from fib6_info to fib6_nh.
      
      To avoid additional increases to the size of fib6_nh for a 1-bit flag,
      use the lowest bit of the allocated memory pointer as the flushed flag.
      Add helpers that retrieve the bucket pointer by masking off the flag
      (see the sketch after this entry).
      
      The cleanup of the exception bucket is moved to fib6_nh_release.
      
      fib6_nh_flush_exceptions can now be called from 2 contexts:
      1. deleting a fib entry
      2. deleting a fib6_nh
      
      For 1, fib6_nh_flush_exceptions is called for a specific fib6_info that
      is being deleted, and all exceptions in the cache using that entry are
      deleted. For 2, the fib6_nh itself is being destroyed, so
      fib6_nh_flush_exceptions is called with a NULL fib6_info, which means
      flush all entries.
      
      The pmtu.sh selftest exercises the affected code paths - from creating
      exceptions to cleaning them up on device delete. All tests pass without
      any rcu locking or memleak warnings.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
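
      A userspace sketch of the low-bit tagging trick described above: since
      the allocated bucket pointer is at least word aligned, bit 0 is free to
      carry the flushed flag, and small helpers mask it off before the pointer
      is used. Names here are illustrative only.

        #include <stdint.h>

        #define DEMO_BUCKET_FLUSHED  0x1UL

        struct demo_bucket { int nentries; };

        /* strip the flag bit before dereferencing the pointer */
        static struct demo_bucket *demo_bucket_ptr(uintptr_t raw)
        {
                return (struct demo_bucket *)(raw & ~DEMO_BUCKET_FLUSHED);
        }

        static int demo_bucket_is_flushed(uintptr_t raw)
        {
                return (raw & DEMO_BUCKET_FLUSHED) != 0;
        }

        static uintptr_t demo_bucket_mark_flushed(uintptr_t raw)
        {
                return raw | DEMO_BUCKET_FLUSHED;
        }
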
    • ipv6: Refactor exception functions · c0b220cf
      Committed by David Ahern
      Before moving the exception bucket from fib6_info to fib6_nh, refactor
      rt6_flush_exceptions, rt6_remove_exception_rt, rt6_mtu_change_route,
      and rt6_update_exception_stamp_rt. In all four cases, move the primary
      logic into a new helper prefixed with fib6_nh_. The latter three
      functions still take a fib6_info; this will be changed to fib6_nh
      in the next patch.
      
      In the case of rt6_mtu_change_route, move the fib6_metric_locked test
      out as a standalone check: there is no need to call the new helper if
      the fib entry has the MTU locked. Also, add a fib6_info pointer to
      rt6_mtu_change_arg as a way of passing the fib entry to the new
      helper.
      
      No functional change intended. The goal is to make the next patch
      easier to review by moving the existing lookup logic for each function
      into new helpers (a rough sketch of the resulting shape follows this
      entry).
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
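
      A rough sketch of the resulting shape for the MTU-change path, with
      hypothetical types and helper names (the real kernel prototypes differ):
      the fib6_metric_locked test stays in the caller, and the per-nexthop
      work lives in a fib6_nh_-style helper that the next patch can hand a
      fib6_nh directly.

        struct demo_fib6_info { int mtu_locked; /* ... */ };

        struct demo_mtu_change_arg {
                void *dev;
                unsigned int mtu;
                struct demo_fib6_info *f6i;   /* new: lets the helper see the entry */
        };

        /* hypothetical helper; the real one walks this nexthop's exceptions */
        static int demo_fib6_nh_mtu_change(struct demo_mtu_change_arg *arg)
        {
                (void)arg;
                return 0;
        }

        static int demo_rt6_mtu_change_route(struct demo_fib6_info *f6i, void *p_arg)
        {
                struct demo_mtu_change_arg *arg = p_arg;

                if (f6i->mtu_locked)          /* stands in for fib6_metric_locked() */
                        return 0;             /* no need to call the helper at all */

                arg->f6i = f6i;
                return demo_fib6_nh_mtu_change(arg);
        }
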
    • ipv6: Refactor fib6_drop_pcpu_from · 7d88d8b5
      Committed by David Ahern
      Move the existing pcpu walk in fib6_drop_pcpu_from to a new
      helper, __fib6_drop_pcpu_from, that can be invoked per fib6_nh with a
      reference to the 'from' entries that need to be evicted. If the
      passed-in 'from' is non-NULL, only entries associated with that
      fib6_info are removed (e.g., the case where a fib entry is deleted); if
      'from' is NULL, all entries are flushed (e.g., the fib6_nh is being
      deleted). See the sketch after this entry.
      
      For fib6_info entries with a builtin fib6_nh (i.e., the current code)
      there is no change in behavior.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
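
      A standalone sketch of the 'from' filtering described above, with
      illustrative types: the per-CPU walk drops either only entries whose
      'from' matches a given fib6_info, or everything when 'from' is NULL.

        #include <stddef.h>

        struct demo_fib6_info { int dummy; };
        struct demo_rt6_info  { struct demo_fib6_info *from; };

        static void demo_drop(struct demo_rt6_info **slot)
        {
                *slot = NULL;                 /* real code releases the cached dst */
        }

        static void __demo_drop_pcpu_from(struct demo_rt6_info **pcpu, int ncpus,
                                          const struct demo_fib6_info *from)
        {
                for (int cpu = 0; cpu < ncpus; cpu++) {
                        struct demo_rt6_info **slot = &pcpu[cpu];

                        if (!*slot)
                                continue;
                        /* NULL 'from' means flush every entry for this nexthop */
                        if (!from || (*slot)->from == from)
                                demo_drop(slot);
                }
        }
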
    • ipv6: Move pcpu cached routes to fib6_nh · f40b6ae2
      Committed by David Ahern
      rt6_info entries are specific instances of a fib entry and are tied to a
      device and gateway, i.e., a nexthop. Before nexthop objects, IPv6 fib
      entries have a separate fib6_info for each nexthop in a multipath route,
      so keeping the pcpu cache in the fib6_info struct worked.
      However, with nexthop objects a fib6_info can point to a set of nexthops
      (yet another alignment of IPv6 with IPv4). Accordingly, the pcpu
      cache needs to move to the fib6_nh struct so the cached entries
      are local to the nexthop specification used to create the rt6_info.
      
      Initialization and freeing of the pcpu entries move to fib6_nh_init and
      fib6_nh_release (sketched after this entry).
      
      Change in location only, from fib6_info down to fib6_nh; no other
      functional change intended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
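
      A standalone sketch of the ownership change, with illustrative names:
      the per-CPU cache lives in the nexthop and is created and destroyed with
      it, so the init/release pair (approximated here) brackets its lifetime;
      calloc/free stand in for the kernel's per-cpu allocators.

        #include <stdlib.h>

        #define DEMO_NCPUS 8

        struct demo_rt6_info { int cached; };

        struct demo_fib6_nh {
                struct demo_rt6_info **rt6i_pcpu;  /* one cached-route slot per CPU */
        };

        static int demo_fib6_nh_init(struct demo_fib6_nh *nh)
        {
                nh->rt6i_pcpu = calloc(DEMO_NCPUS, sizeof(*nh->rt6i_pcpu));
                return nh->rt6i_pcpu ? 0 : -1;
        }

        static void demo_fib6_nh_release(struct demo_fib6_nh *nh)
        {
                /* the real code drops each cached dst before freeing the cache */
                free(nh->rt6i_pcpu);
                nh->rt6i_pcpu = NULL;
        }
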
    • Merge branch 'ENETC-support-hardware-timestamping' · daeceb2d
      Committed by David S. Miller
      Y.b. Lu says:
      
      ====================
      ENETC: support hardware timestamping
      
      This patch set adds hardware timestamping support for ENETC
      and an ENETC 1588 timer device tree node for ls1028a.
      
      Because dynamic allocation of the ENETC RX BD ring is not yet
      supported, and extended RX BDs are too expensive when timestamping
      is not in use, a Kconfig option is used to enable extended RX BDs
      in order to support hardware timestamping. This option will be
      removed once RX BD ring dynamic allocation is implemented.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: fsl: ls1028a: add ENETC 1588 timer node · 49401003
      Committed by Y.b. Lu
      Add the ENETC 1588 timer node, which is ENETC PF 4 (Physical Function 4).
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-binding: ptp_qoriq: support ENETC PTP compatible · ad8288b8
      Committed by Y.b. Lu
      Add a new compatible for ENETC PTP.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: add get_ts_info interface for ethtool · 41514737
      Committed by Y.b. Lu
      Add the get_ts_info interface for ethtool so that timestamping
      capabilities can be queried (a generic sketch follows this entry).
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
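
      A kernel-flavored sketch of what a get_ts_info hook typically reports.
      This is a generic pattern rather than the actual enetc implementation,
      and the PHC index value is a placeholder.

        #include <linux/netdevice.h>
        #include <linux/ethtool.h>
        #include <linux/net_tstamp.h>

        static int demo_get_ts_info(struct net_device *ndev,
                                    struct ethtool_ts_info *info)
        {
                /* a real driver reports the index of its registered PTP clock */
                info->phc_index = -1;

                info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
                                        SOF_TIMESTAMPING_RX_HARDWARE |
                                        SOF_TIMESTAMPING_RAW_HARDWARE;

                info->tx_types = (1 << HWTSTAMP_TX_OFF) |
                                 (1 << HWTSTAMP_TX_ON);

                info->rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
                                   (1 << HWTSTAMP_FILTER_ALL);

                return 0;
        }
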
    • enetc: add hardware timestamping support · d3982312
      Committed by Y.b. Lu
      Add hardware timestamping support for ENETC. On Rx, timestamping is
      enabled for all frames. On Tx, the hardware is only instructed to
      timestamp the frames marked accordingly by the stack (see the sketch
      after this entry).
      
      Because dynamic allocation of the RX BD ring is not yet supported,
      and extended RX BDs are too expensive when timestamping is not in
      use, a Kconfig option is used to enable extended RX BDs in order to
      support hardware timestamping. This option will be removed once
      RX BD ring dynamic allocation is implemented.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
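
      A kernel-flavored sketch of the Rx/Tx split described above, using only
      generic skb helpers (the enetc BD handling and the register read that
      produces the nanosecond value are outside this sketch): on Rx the
      hardware timestamp is attached to the frame, on Tx only frames the
      stack has flagged are timestamped.

        #include <linux/skbuff.h>
        #include <linux/ktime.h>

        /* Tx path: only honour a timestamp request coming from the stack */
        static bool demo_tx_wants_hwtstamp(const struct sk_buff *skb)
        {
                return skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP;
        }

        /* Rx path: attach the hardware-provided timestamp to the skb */
        static void demo_rx_set_hwtstamp(struct sk_buff *skb, u64 tstamp_ns)
        {
                skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(tstamp_ns);
        }
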
  2. 24 May 2019, 27 commits
  3. 23 May 2019, 3 commits
    • hv_sock: perf: loop in send() to maximize bandwidth · 14a1eaa8
      Committed by Sunil Muthuswamy
      Currently, the hv_sock send() iterates once over the buffer, puts data into
      the VMBus channel and returns. It doesn't take advantage of the case where
      a simultaneous reader is draining data from the channel. In that case,
      send() can maximize the bandwidth (and consequently minimize the CPU
      cycles) by iterating until the channel is found to be full (sketched
      after this entry).
      
      Perf data:
      Total Data Transfer: 10GB/iteration
      Single threaded reader/writer, Linux hvsocket writer with Windows hvsocket
      reader
      Packet size: 64KB
      CPU sys time was captured using the 'time' command for the writer to send
      10GB of data.
      'Send Buffer Loop' is with the patch applied.
      The values below are over 10 iterations.
      
      |--------------------------------------------------------|
      |        |        Current        |   Send Buffer Loop    |
      |--------------------------------------------------------|
      |        | Throughput | CPU sys  | Throughput | CPU sys  |
      |        | (MB/s)     | time (s) | (MB/s)     | time (s) |
      |--------------------------------------------------------|
      | Min    |     407    |   7.048  |    401     |  5.958   |
      |--------------------------------------------------------|
      | Max    |     455    |   7.563  |    542     |  6.993   |
      |--------------------------------------------------------|
      | Avg    |     440    |   7.411  |    451     |  6.639   |
      |--------------------------------------------------------|
      | Median |     446    |   7.417  |    447     |  6.761   |
      |--------------------------------------------------------|
      
      Observations:
      1. The average throughput doesn't change much with this patch for this
      scenario, most probably because the throughput bottleneck is
      somewhere else.
      2. The average system (kernel) CPU time goes down by more than 10% with
      this patch, for the same amount of data transferred.
      Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
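
      A standalone sketch of the send-loop idea, with stub ring-buffer
      accessors standing in for the real VMBus/hv_sock internals: rather than
      writing once and returning, keep writing while a concurrent reader frees
      up space.

        #include <stddef.h>
        #include <sys/types.h>

        struct demo_channel { size_t free_bytes; };

        /* stand-ins for the real ring-buffer accessors */
        static size_t demo_free_space(const struct demo_channel *c)
        {
                return c->free_bytes;
        }

        static ssize_t demo_write(struct demo_channel *c, const char *buf, size_t len)
        {
                (void)buf;                    /* pretend the data was copied in */
                c->free_bytes -= len;
                return (ssize_t)len;
        }

        static ssize_t demo_stream_send(struct demo_channel *chan,
                                        const char *buf, size_t len)
        {
                size_t sent = 0;

                /* keep going while a concurrent reader may still be draining */
                while (sent < len && demo_free_space(chan) > 0) {
                        size_t chunk = len - sent;
                        ssize_t ret;

                        if (chunk > demo_free_space(chan))
                                chunk = demo_free_space(chan);

                        ret = demo_write(chan, buf + sent, chunk);
                        if (ret <= 0)
                                break;
                        sent += (size_t)ret;
                }

                return (ssize_t)sent;         /* 0 here means the ring was full */
        }
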
    • hv_sock: perf: Allow the socket buffer size options to influence the actual socket buffers · ac383f58
      Committed by Sunil Muthuswamy
      Currently, the hv_sock buffer size is static and can't scale to the
      bandwidth requirements of the application. This change allows
      applications to influence the socket buffer sizes using the SO_SNDBUF
      and SO_RCVBUF socket options.
      
      A few interesting points to note:
      1. Since VMBus does not allow resizing the ring, the socket buffer size
      option must be set prior to establishing the connection for it to take
      effect (see the usage sketch after this entry).
      2. Setting the socket option comes at the cost of that much memory being
      reserved/allocated by the kernel for the lifetime of the connection.
      
      Perf data:
      Total Data Transfer: 1GB
      Single threaded reader/writer
      Results below are summarized over 10 iterations.
      
      Linux hvsocket writer + Windows hvsocket reader:
      |---------------------------------------------------------------------------------------------|
      |Packet size ->   |      128B       |       1KB       |       4KB       |        64KB         |
      |---------------------------------------------------------------------------------------------|
      |SO_SNDBUF size | |                 Throughput in MB/s (min/max/avg/median):                  |
      |               v |                                                                           |
      |---------------------------------------------------------------------------------------------|
      |      Default    | 109/118/114/116 | 636/774/701/700 | 435/507/480/476 |   410/491/462/470   |
      |      16KB       | 110/116/112/111 | 575/705/662/671 | 749/900/854/869 |   592/824/692/676   |
      |      32KB       | 108/120/115/115 | 703/823/767/772 | 718/878/850/866 | 1593/2124/2000/2085 |
      |      64KB       | 108/119/114/114 | 592/732/683/688 | 805/934/903/911 | 1784/1943/1862/1843 |
      |---------------------------------------------------------------------------------------------|
      
      Windows hvsocket writer + Linux hvsocket reader:
      |---------------------------------------------------------------------------------------------|
      |Packet size ->   |     128B    |      1KB        |          4KB        |        64KB         |
      |---------------------------------------------------------------------------------------------|
      |SO_RCVBUF size | |               Throughput in MB/s (min/max/avg/median):                    |
      |               v |                                                                           |
      |---------------------------------------------------------------------------------------------|
      |      Default    | 69/82/75/73 | 313/343/333/336 |   418/477/446/445   |   659/701/676/678   |
      |      16KB       | 69/83/76/77 | 350/401/375/382 |   506/548/517/516   |   602/624/615/615   |
      |      32KB       | 62/83/73/73 | 471/529/496/494 |   830/1046/935/939  | 944/1180/1070/1100  |
      |      64KB       | 64/70/68/69 | 467/533/501/497 | 1260/1590/1430/1431 | 1605/1819/1670/1660 |
      |---------------------------------------------------------------------------------------------|
      Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
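
      A userspace usage sketch for point 1 above: the buffer size option has
      to be applied before connect() so it can influence the ring size.
      hv_sock is exposed through AF_VSOCK; the CID and port below are
      placeholders.

        #include <stdio.h>
        #include <sys/socket.h>
        #include <linux/vm_sockets.h>

        int main(void)
        {
                int sndbuf = 64 * 1024;                 /* requested send buffer */
                struct sockaddr_vm addr = {
                        .svm_family = AF_VSOCK,
                        .svm_cid    = VMADDR_CID_HOST,  /* placeholder peer CID */
                        .svm_port   = 5000,             /* placeholder port */
                };
                int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

                if (fd < 0)
                        return 1;

                /* must happen before connect(): the ring cannot be resized later */
                if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
                        perror("setsockopt");

                if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                        perror("connect");

                return 0;
        }
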
    • ipv4/igmp: shrink struct ip_sf_list · 0db355d4
      Committed by Eric Dumazet
      Removing two 4-byte holes allows the kmalloc-32 kmem cache to be used
      instead of kmalloc-64 on 64-bit kernels (a generic illustration follows).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
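
      A generic illustration (not the actual ip_sf_list layout) of the kind of
      repacking this commit performs: on 64-bit, pairing the two 32-bit fields
      removes two 4-byte alignment holes and drops the struct from 40 to 32
      bytes, which lets it come from the kmalloc-32 cache.

        #include <stdio.h>

        struct demo_before {              /* 40 bytes on LP64 */
                void         *next;       /*  8 bytes               */
                unsigned int  count;      /*  4 bytes + 4-byte hole */
                void         *key;        /*  8 bytes               */
                unsigned int  flags;      /*  4 bytes + 4-byte hole */
                void         *data;       /*  8 bytes               */
        };

        struct demo_after {               /* 32 bytes on LP64 */
                void         *next;
                void         *key;
                void         *data;
                unsigned int  count;      /* the two 32-bit fields now share */
                unsigned int  flags;      /* a single 8-byte slot            */
        };

        int main(void)
        {
                printf("before: %zu bytes, after: %zu bytes\n",
                       sizeof(struct demo_before), sizeof(struct demo_after));
                return 0;
        }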