1. 20 Dec 2020 — 8 commits
  2. 05 Dec 2020 — 1 commit
  3. 01 Dec 2020 — 2 commits
    • net: Add SO_BUSY_POLL_BUDGET socket option · 7c951caf
      Committed by Björn Töpel
      This option lets a user set a per-socket NAPI budget for
      busy-polling. If the option is not set, the default of 8 is used.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com
    • net: Introduce preferred busy-polling · 7fd3253a
      Committed by Björn Töpel
      The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
      option or system-wide using the /proc/sys/net/core/busy_read knob,
      is opportunistic: if the NAPI context is not scheduled, busy-polling
      will poll it. If the budget is exceeded after busy-polling, the
      busy-polling logic schedules the NAPI context onto the regular
      softirq handling.
      
      One implication of the behavior above is that a busy/heavily loaded
      NAPI context will never allow busy-polling. Some applications prefer
      that most NAPI processing be done by busy-polling.
      
      This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
      in concert with the napi_defer_hard_irqs and gro_flush_timeout
      knobs. Those knobs were introduced in commit 6f8b12d6 ("net: napi:
      add hard irqs deferral feature") and allow a user to defer
      re-enabling interrupts and instead schedule the NAPI context from a
      watchdog timer. When a user enables SO_PREFER_BUSY_POLL, with the
      other knobs configured, and the NAPI context is being processed by a
      softirq, the softirq NAPI processing exits early to allow the
      busy-polling to be performed.
      
      If the application stops performing busy-polling via a system call,
      the watchdog timer defined by gro_flush_timeout will timeout, and
      regular softirq handling will resume.
      
      In summary: heavy-traffic applications that prefer busy-polling over
      softirq processing should use this option.
      
      Example usage:
      
        $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
        $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
      
      Note that the timeout should be larger than the userspace processing
      window, otherwise the watchdog will timeout and fall back to regular
      softirq processing.
      
      Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
  4. 26 Oct 2020 — 27 commits
  5. 25 Sep 2020 — 1 commit
  6. 11 Sep 2020 — 1 commit
    • epoll: EPOLL_CTL_ADD: close the race in decision to take fast path · fe0a916c
      Committed by Al Viro
      Checking for the lack of epitems referring to the epoll we want to
      insert into is not enough; we might have an insertion of that epoll
      into another one that has already collected the set of files to
      recheck for excessive reverse paths, but hasn't gotten to
      creating/inserting the epitem for it.
      
      However, any such insertion in progress can be detected - it will update the
      generation count in our epoll when it's done looking through it for files
      to check.  That gets done under ->mtx of our epoll and that allows us to
      detect that safely.
      
      We are *not* holding epmutex here, so the generation count is not stable.
      However, since both the update of ep->gen by loop check and (later)
      insertion into ->f_ep_link are done with ep->mtx held, we are fine -
      the sequence is
      	grab epmutex
      	bump loop_check_gen
      	...
      	grab tep->mtx		// 1
      	tep->gen = loop_check_gen
      	...
      	drop tep->mtx		// 2
      	...
      	grab tep->mtx		// 3
      	...
      	insert into ->f_ep_link
      	...
      	drop tep->mtx		// 4
      	bump loop_check_gen
      	drop epmutex
      and if the fastpath check in another thread happens for that
      eventpoll, it can come
      	* before (1) - in that case the fastpath is just fine
      	* after (4) - we'll see a non-empty ->f_ep_link and take
      the slow path
      	* between (2) and (3) - loop_check_gen is stable, with
      ->mtx providing barriers, and we end up taking the slow path.
      
      Note that ->f_ep_link emptiness check is slightly racy - we are protected
      against insertions into that list, but removals can happen right under us.
      Not a problem - in the worst case we'll end up taking a slow path for
      no good reason.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>