1. 05 May 2020, 1 commit
    • bpf: Avoid gcc-10 stringop-overflow warning in struct bpf_prog · d26c0cc5
      Committed by Arnd Bergmann
      gcc-10 warns about accesses to zero-length arrays:
      
      kernel/bpf/core.c: In function 'bpf_patch_insn_single':
      cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
      In file included from kernel/bpf/core.c:21:
      include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here
        550 |   struct bpf_insn  insnsi[0];
            |                    ^~~~~~
      
      In this case, we really want to have two flexible-array members,
      but that is not possible. Removing the union to make insnsi a
      flexible-array member while leaving insns as a zero-length array
      fixes the warning, as nothing writes to the other one in that way.
      
      This trick only works on linux-3.18 or higher, as older versions
      had additional members in the union.
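A user-space sketch of the resulting layout, assuming simplified stand-ins for struct sock_filter and struct bpf_insn (both 8-byte, 4-byte-aligned instruction formats) and a heavily trimmed, hypothetical struct bpf_prog_model. It checks that insns (still a zero-length array, a GNU extension) and insnsi (now a flexible-array member) start at the same offset, which is what lets the two views alias without the union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for the classic BPF and eBPF instruction formats. */
struct sock_filter_model { uint16_t code; uint8_t jt, jf; uint32_t k; };
struct bpf_insn_model {
	uint8_t code;
	uint8_t dst_reg:4, src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Hypothetical, trimmed-down program struct. Before the fix, both arrays
 * were zero-length members of a union; after it, insns stays zero-length
 * and insnsi becomes the flexible-array member at the end. */
struct bpf_prog_model {
	uint32_t len;	/* number of instructions */
	/* ... other fields elided ... */
	struct sock_filter_model insns[0];	/* GNU zero-length array */
	struct bpf_insn_model insnsi[];		/* C99 flexible-array member */
};

/* Both views must begin at the same address for the aliasing to work. */
static_assert(offsetof(struct bpf_prog_model, insns) ==
	      offsetof(struct bpf_prog_model, insnsi),
	      "insns and insnsi must start at the same offset");

/* Allocate a program with room for n eBPF instructions. */
struct bpf_prog_model *prog_alloc(uint32_t n)
{
	struct bpf_prog_model *p =
		calloc(1, sizeof(*p) + n * sizeof(p->insnsi[0]));

	if (p)
		p->len = n;
	return p;
}
```

Writes go through insnsi, the flexible-array member, which gcc-10 sizes from the allocation rather than treating as a zero-byte object.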
      
      Fixes: 60a3b225 ("net: bpf: make eBPF interpreter images read-only")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
  2. 03 May 2020, 1 commit
  3. 02 May 2020, 1 commit
    • bpf: Sharing bpf runtime stats with BPF_ENABLE_STATS · d46edd67
      Committed by Song Liu
      Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
      Typical userspace tools use kernel.bpf_stats_enabled as follows:
      
        1. Enable kernel.bpf_stats_enabled;
        2. Check program run_time_ns;
        3. Sleep for the monitoring period;
        4. Check program run_time_ns again, calculate the difference;
        5. Disable kernel.bpf_stats_enabled.
      
      The problem with this approach is that only one userspace tool can toggle
      this sysctl. If multiple tools toggle the sysctl at the same time, the
      measurement may be inaccurate.
      
      To fix this problem while keeping backward compatibility, introduce a new
      bpf command, BPF_ENABLE_STATS. On success, this command enables stats and
      returns a valid fd. BPF_ENABLE_STATS takes an argument, "type". Currently,
      only one type, BPF_STATS_RUN_TIME, is supported. We can extend the
      command to support other types of stats in the future.
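To illustrate why handing out an fd solves the multi-tool problem, here is a toy model. It assumes, as a simplification, that each BPF_ENABLE_STATS fd holds one reference on a shared enable count and that stats stay on while any reference remains. All names are hypothetical; this is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model: a shared count of tools that enabled stats. */
struct stats_model {
	int enablers;
};

/* Models BPF_ENABLE_STATS: take one reference and return a fake fd. */
int model_enable_stats(struct stats_model *s)
{
	static int next_fd = 3;

	s->enablers++;
	return next_fd++;
}

/* Models close() on such an fd: drop that tool's reference only. */
void model_close_stats_fd(struct stats_model *s)
{
	if (s->enablers > 0)
		s->enablers--;
}

/* Stats are collected while at least one tool still holds an fd. */
bool model_stats_enabled(const struct stats_model *s)
{
	return s->enablers > 0;
}
```

With a single on/off sysctl, tool B disabling stats would cut off tool A's measurement window. In this model, B closing its fd leaves A's reference in place, so stats stay on until A closes its fd too.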
      
      With BPF_ENABLE_STATS, a userspace tool would follow this flow:
      
        1. Get an fd with BPF_ENABLE_STATS, and make sure it is valid;
        2. Check program run_time_ns;
        3. Sleep for the monitoring period;
        4. Check program run_time_ns again, calculate the difference;
        5. Close the fd.
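The five steps above can be sketched against the raw bpf(2) syscall. This assumes UAPI headers from Linux 5.8 or later (where BPF_ENABLE_STATS and the enable_stats member of union bpf_attr are available) and that the caller already holds an fd for the program being monitored. The helper names (sys_bpf, enable_run_time_stats, read_prog_stats, avg_ns_per_run, monitor_prog) are hypothetical; the run_time_ns and run_cnt counters are read from struct bpf_prog_info via BPF_OBJ_GET_INFO_BY_FD.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw bpf(2) syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Step 1: enable run-time stats; they stay enabled while the fd is open. */
static int enable_run_time_stats(void)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.enable_stats.type = BPF_STATS_RUN_TIME;
	return (int)sys_bpf(BPF_ENABLE_STATS, &attr);
}

/* Steps 2 and 4: read the cumulative counters of one program. */
static int read_prog_stats(int prog_fd, uint64_t *run_time_ns, uint64_t *run_cnt)
{
	struct bpf_prog_info info;
	union bpf_attr attr;

	memset(&info, 0, sizeof(info));
	memset(&attr, 0, sizeof(attr));
	attr.info.bpf_fd = (uint32_t)prog_fd;
	attr.info.info_len = sizeof(info);
	attr.info.info = (uint64_t)(uintptr_t)&info;
	if (sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr) < 0)
		return -1;
	*run_time_ns = info.run_time_ns;
	*run_cnt = info.run_cnt;
	return 0;
}

/* Step 4: average nanoseconds per run over the monitoring window. */
double avg_ns_per_run(uint64_t t0, uint64_t c0, uint64_t t1, uint64_t c1)
{
	uint64_t runs = c1 - c0;

	return runs ? (double)(t1 - t0) / (double)runs : 0.0;
}

/* Steps 1-5 for one program fd; returns 0 and stores the average on success. */
int monitor_prog(int prog_fd, unsigned int seconds, double *avg)
{
	uint64_t t0, c0, t1, c1;
	int ret = -1;
	int stats_fd = enable_run_time_stats();	/* step 1 */

	if (stats_fd < 0)
		return -1;
	if (read_prog_stats(prog_fd, &t0, &c0) == 0) {		/* step 2 */
		sleep(seconds);					/* step 3 */
		if (read_prog_stats(prog_fd, &t1, &c1) == 0) {	/* step 4 */
			*avg = avg_ns_per_run(t0, c0, t1, c1);
			ret = 0;
		}
	}
	close(stats_fd);					/* step 5 */
	return ret;
}
```

Because stats stay enabled only while stats_fd is open, a crashed or killed tool cannot leave them switched on, which was possible with the sysctl toggle.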
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
  4. 01 May 2020, 5 commits
  5. 29 Apr 2020, 12 commits
  6. 28 Apr 2020, 1 commit
  7. 27 Apr 2020, 4 commits
  8. 26 Apr 2020, 2 commits
  9. 25 Apr 2020, 2 commits
  10. 24 Apr 2020, 4 commits
    • net: napi: add hard irqs deferral feature · 6f8b12d6
      Committed by Eric Dumazet
      Back in commit 3b47d303 ("net: gro: add a per device gro flush timer")
      we added the ability to arm one high resolution timer, which we used
      to keep incomplete packets in the GRO engine a bit longer, hoping that
      further frames might be added to them.
      
      Since then, we added the napi_complete_done() interface, and commit
      364b6055 ("net: busy-poll: return busypolling status to drivers")
      allowed drivers to avoid re-arming NIC interrupts if we made a promise
      that their NAPI poll() handler would be called in the near future.
      
      This infrastructure can be leveraged, thanks to a new device parameter,
      which allows arming the napi hrtimer instead of re-arming the device
      hard IRQ.
      
      We have noticed that on some servers with 32 RX queues or more, the chit-chat
      between the NIC and the host caused by IRQ delivery and re-arming could hurt
      throughput by ~20% on a 100Gbit NIC.
      
      In contrast, hrtimers use local (per-CPU) resources and might have a lower
      cost.
      
      The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
      as gro_flush_timeout (/sys/class/net/ethX/).
      
      By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
      
      This patch does not change the prior behavior of gro_flush_timeout
      if used alone: NIC hard IRQs should be re-armed as before.
      
      One concrete usage could be:
      
      echo 20000 >/sys/class/net/eth1/gro_flush_timeout
      echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
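The same two settings can be applied programmatically. This sketch simply writes decimal values to the sysfs files named above, as the echo commands do. netdev_param_path(), write_long() and set_napi_deferral() are hypothetical helper names; writing these files typically requires root, and gro_flush_timeout is expressed in nanoseconds (so 20000 is 20 usec).

```c
#include <stdio.h>

/* Build "/sys/class/net/<ifname>/<param>"; returns 0 on success,
 * -1 if the path would not fit in buf. */
int netdev_param_path(char *buf, size_t len, const char *ifname, const char *param)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/%s", ifname, param);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value to a file, like `echo value > path`. */
int write_long(const char *path, long value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", value) > 0;
	if (fclose(f) != 0)	/* sysfs may report errors on close */
		ok = 0;
	return ok ? 0 : -1;
}

/* Equivalent of the two echo commands for interface `ifname`. */
int set_napi_deferral(const char *ifname, long gro_flush_timeout_ns,
		      long defer_hard_irqs)
{
	char path[256];

	if (netdev_param_path(path, sizeof(path), ifname, "gro_flush_timeout") ||
	    write_long(path, gro_flush_timeout_ns))
		return -1;
	if (netdev_param_path(path, sizeof(path), ifname, "napi_defer_hard_irqs") ||
	    write_long(path, defer_hard_irqs))
		return -1;
	return 0;
}
```

For example, set_napi_deferral("eth1", 20000, 10) mirrors the two commands above.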
      
      If at least one packet is retired, we reset the napi counter
      to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
      of the queue.
      
      On busy queues, this should avoid NIC hard IRQs, whereas before this patch,
      IRQ avoidance was only possible if napi->poll() exhausted its budget
      and did not call napi_complete_done().
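The deferral logic described above can be modeled in plain C. This is a simplified, hypothetical sketch of the per-NAPI bookkeeping, not the kernel's napi_complete_done(). It assumes the counter is reloaded from napi_defer_hard_irqs whenever work was done, is decremented on each completion, and that the hard IRQ stays masked only while the counter was positive and gro_flush_timeout is non-zero.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-queue state. The first two fields mirror
 * the sysfs tunables; the third is the running deferral counter. */
struct napi_model {
	unsigned long gro_flush_timeout;	/* ns; 0 means no hrtimer */
	int napi_defer_hard_irqs;		/* reload value for the counter */
	int defer_hard_irqs_count;		/* deferrals remaining */
};

/* Called when a poll completes after processing work_done packets.
 * Returns true if the driver should re-arm the NIC hard IRQ, or false
 * if an hrtimer will trigger the next poll instead. */
bool napi_model_complete_done(struct napi_model *n, int work_done)
{
	/* At least one packet retired: reset the counter. */
	if (work_done > 0)
		n->defer_hard_irqs_count = n->napi_defer_hard_irqs;

	if (n->defer_hard_irqs_count > 0) {
		n->defer_hard_irqs_count--;
		if (n->gro_flush_timeout)
			return false;	/* arm hrtimer, keep IRQ masked */
	}
	return true;	/* prior behavior: re-arm the hard IRQ */
}
```

With the example settings (20000 and 10), one poll that retires packets followed by nine empty polls keeps the IRQ masked, and the tenth empty poll re-arms it. With napi_defer_hard_irqs left at 0, every completion re-arms the IRQ, matching the unchanged gro_flush_timeout-only behavior.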
      
      This feature can also be used to work around some non-optimal NIC IRQ
      coalescing strategies.
      
      Having the ability to insert XX usec delays between napi->poll() calls
      can increase cache efficiency, since it increases batch sizes.
      
      It also keeps serving CPUs from staying idle too long, reducing tail latencies.
      Co-developed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Update transobj.c new cmd interface · e0b4b472
      Committed by Leon Romanovsky
      Do a mass update of transobj.c to reuse the newly introduced
      mlx5_cmd_exec_in*() interfaces.
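The commit message does not show the new helpers. Assuming, as their names suggest, that the mlx5_cmd_exec_in*() helpers derive mailbox sizes from the command name (with the _in variant supplying a throwaway output buffer), the pattern can be modeled as below. CMD_EXEC_IN, CMD_EXEC_INOUT, fake_cmd_exec and the CREATE_CQ sizes are hypothetical stand-ins, not the mlx5 definitions, and the statement expression is a GNU C extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mailbox sizes for one command, in bytes. */
#define CREATE_CQ_IN_BYTES  64
#define CREATE_CQ_OUT_BYTES 16

static size_t last_inlen, last_outlen;

/* Stand-in for a low-level executor such as mlx5_cmd_exec(): here it
 * only records the lengths it received and zeroes the output buffer. */
static int fake_cmd_exec(const void *in, size_t inlen, void *out, size_t outlen)
{
	(void)in;
	last_inlen = inlen;
	last_outlen = outlen;
	memset(out, 0, outlen);
	return 0;
}

/* Sizes are derived from the command name, so a call site cannot pass
 * lengths that disagree with the command's layout. */
#define CMD_EXEC_INOUT(cmd, in, out) \
	fake_cmd_exec((in), cmd##_IN_BYTES, (out), cmd##_OUT_BYTES)

/* For callers that only need the status: supply a scratch output buffer. */
#define CMD_EXEC_IN(cmd, in)                           \
	({                                             \
		uint8_t _out[cmd##_OUT_BYTES] = { 0 }; \
		CMD_EXEC_INOUT(cmd, in, _out);         \
	})
```

A call site such as `err = CMD_EXEC_IN(CREATE_CQ, in);` then replaces an open-coded output array plus two explicit size arguments, which is the kind of boilerplate a mass update like this removes.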
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • net/mlx5: Update cq.c to new cmd interface · d1f62050
      Committed by Leon Romanovsky
      Do a mass update of cq.c to reuse the newly introduced
      mlx5_cmd_exec_in*() interfaces.
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • net/mlx5: Update vport.c to new cmd interface · 5d1c9a11
      Committed by Leon Romanovsky
      Do a mass update of vport.c to reuse the newly introduced
      mlx5_cmd_exec_in*() interfaces.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  11. 23 Apr 2020, 6 commits
  12. 22 Apr 2020, 1 commit
    • pnp: Use list_for_each_entry() instead of open coding · 01b2bafe
      Committed by Jason Gunthorpe
      Aside from good practice, this avoids a warning from gcc 10:
      
      ./include/linux/kernel.h:997:3: warning: array subscript -31 is outside array bounds of ‘struct list_head[1]’ [-Warray-bounds]
        997 |  ((type *)(__mptr - offsetof(type, member))); })
            |  ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/list.h:493:2: note: in expansion of macro ‘container_of’
        493 |  container_of(ptr, type, member)
            |  ^~~~~~~~~~~~
      ./include/linux/pnp.h:275:30: note: in expansion of macro ‘list_entry’
        275 | #define global_to_pnp_dev(n) list_entry(n, struct pnp_dev, global_list)
            |                              ^~~~~~~~~~
      ./include/linux/pnp.h:281:11: note: in expansion of macro ‘global_to_pnp_dev’
        281 |  (dev) != global_to_pnp_dev(&pnp_global); \
            |           ^~~~~~~~~~~~~~~~~
      arch/x86/kernel/rtc.c:189:2: note: in expansion of macro ‘pnp_for_each_dev’
        189 |  pnp_for_each_dev(dev) {
      
      This is because the common list_for_each_entry() code doesn't cast the
      starting list_head to the containing struct.
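A self-contained sketch of the two loop styles, using simplified user-space copies of list_head, container_of() and list_for_each_entry() (not the kernel's exact definitions). struct dev_model and dev_global are hypothetical stand-ins for struct pnp_dev and pnp_global. The difference is the termination test: the open-coded loop compares dev against container_of(&pnp_global, ...), a pointer computed directly from an object that is only a bare list_head, while list_for_each_entry() compares list_head pointers.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Stops when &pos->member reaches the head; the loop condition never
 * compares pos against container_of(head). */
#define list_for_each_entry(pos, head, member)                           \
	for (pos = list_entry((head)->next, __typeof__(*pos), member);   \
	     &pos->member != (head);                                      \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for struct pnp_dev and the global device list. */
struct dev_model {
	int id;
	struct list_head global_list;
};

struct list_head dev_global = { &dev_global, &dev_global };

/* Open-coded style, as pnp_for_each_dev() did (shown for contrast):
 *
 *   for (dev = list_entry(dev_global.next, struct dev_model, global_list);
 *        dev != list_entry(&dev_global, struct dev_model, global_list);
 *        dev = list_entry(dev->global_list.next, struct dev_model,
 *                         global_list))
 */
int sum_dev_ids(void)
{
	struct dev_model *dev;
	int sum = 0;

	list_for_each_entry(dev, &dev_global, global_list)
		sum += dev->id;
	return sum;
}
```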
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      [ rjw: Whitespace adjustments ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>