1. 21 Nov, 2015 8 commits
  2. 19 Nov, 2015 10 commits
    • net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Eric Dumazet committed
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
      
      napi_hash_add() and napi_hash_del() are now called automatically
      by the core networking stack, from netif_napi_add() and
      netif_napi_del() respectively.
      
      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.
      
      Drivers might still prefer to call napi_hash_del() on their
      own, since they know the lifetime of their NAPI structures and
      can combine all the RCU grace periods into a single one, while
      the core networking stack has no way of knowing that such
      combining is possible.
      
      Once this patch proves not to bring serious regressions,
      we will clean up drivers to either remove napi_hash_del()
      or provide appropriate combining of RCU grace periods.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: napi_hash_del() returns a boolean status · 34cbe27e
      Eric Dumazet committed
      napi_hash_del() will soon be called both from drivers (if they
      want) and from the core networking stack.
      
      Callers are responsible for ensuring that an RCU grace period
      elapses before the napi structure is freed: napi_hash_del()
      signals whether this RCU grace period is needed.
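
As an illustration of the calling convention described above, here is a minimal user-space sketch, not kernel code. Every name in it (`mock_napi`, `mock_napi_hash_del`, `mock_synchronize_rcu`, `mock_teardown`) is a hypothetical stand-in, and it assumes that a `true` return value means "a grace period is needed". It shows how a caller that owns several contexts could fold all the required grace periods into a single wait before freeing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names below are stand-ins for this sketch, not kernel definitions. */
struct mock_napi {
    bool hashed;                 /* true while the context sits in the mock hash */
};

static int grace_periods_waited; /* counts how often we "waited" for a grace period */

/*
 * Mock of napi_hash_del(): unhash the context and report whether the
 * caller must wait for a grace period before freeing it (assumption:
 * true means "needed").
 */
static bool mock_napi_hash_del(struct mock_napi *napi)
{
    assert(napi != NULL);
    bool was_hashed = napi->hashed;
    napi->hashed = false;
    return was_hashed;
}

/* Stand-in for an RCU grace-period wait; here it only bumps a counter. */
static void mock_synchronize_rcu(void)
{
    grace_periods_waited++;
}

/* Tear down n contexts, waiting at most once no matter how many were hashed. */
static void mock_teardown(struct mock_napi **napis, size_t n)
{
    bool need_sync = false;

    for (size_t i = 0; i < n; i++)
        need_sync |= mock_napi_hash_del(napis[i]);

    if (need_sync)
        mock_synchronize_rcu();

    for (size_t i = 0; i < n; i++)
        free(napis[i]);
}
```

A caller that never hashed any of its contexts skips the wait entirely, which is the case the boolean return value is meant to make cheap.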
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: move napi_hash[] into read mostly section · 6180d9de
      Eric Dumazet committed
      We do not often add/delete a napi context.
      Moving napi_hash[] into read mostly section avoids potential false sharing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add netif_tx_napi_add() · d64b5e85
      Eric Dumazet committed
      netif_tx_napi_add() is a variant of netif_napi_add().
      
      It should be used by drivers that use a napi structure
      exclusively to poll TX.
      
      We do not want to add this kind of napi to napi_hash[] in the following
      patches, which add generic busy polling to all NAPI drivers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: move skb_mark_napi_id() into core networking stack · 93f93a44
      Eric Dumazet committed
      We would like to automatically provide busy polling support
      to all NAPI drivers, without them having to implement anything.
      
      skb_mark_napi_id() can be called from napi_gro_receive() and
      napi_get_frags().
      
      A few drivers still call skb_mark_napi_id() because
      they use netif_receive_skb(). They should eventually call
      napi_gro_receive() instead. I will leave this to the driver
      maintainers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: network drivers no longer need to implement ndo_busy_poll() · ce6aea93
      Eric Dumazet committed
      Instead of having to implement a complex ndo_busy_poll() method,
      drivers can simply rely on the NAPI poll logic.
      
      Busy polling gains come mainly from polling itself,
      not from the exact details of how we poll the device.
      
      ndo_busy_poll(), if implemented, can avoid touching
      napi state, but it adds extra synchronization between
      the normal napi->poll() and the busy poll handler, slowing down
      the common (non busy polling) path with extra atomic operations.
      In practice, few drivers ever got busy poll support because of this complexity.
      
      We could go one step further, and make busy polling
      available for all NAPI drivers, but this would require
      that all netif_napi_del() calls are done in process context
      so that we can call synchronize_rcu().
      Full audit would be required.
      
      Until this is done, a driver still needs to call:
      
      - skb_mark_napi_id() for each skb provided to the stack.
      - napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
      - Make sure RCU grace period is respected after napi_hash_del() before
        memory containing napi structure is freed.
      
      Followup patch implements busy poll for mlx5 driver as an example.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow BH servicing in sk_busy_loop() · 2a028ecb
      Eric Dumazet committed
      Instead of blocking BH in whole sk_busy_loop(), block them
      only around ->ndo_busy_poll() calls.
      
      This has many benefits.
      
      1) allow tunneled traffic to use busy poll as well as native traffic.
         Tunnel handlers usually call netif_rx() and depend on net_rx_action()
         being run (from the softirq handler).
      
      2) allow RFS/RPS being used (sending IPI to other cpus if needed)
      
      3) use the 'let's burn cpu cycles' budget to do useful work
         (like TX completions, timers, RCU callbacks...)
      
      4) reduce BH latencies, making busy poll a better citizen.
      
      Tested with a SIT tunnel, first with busy poll disabled:
      
      lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    37373.93
      16384  87380
      
      Now enable busy poll on both hosts
      
      lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa6:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    58314.77
      16384  87380
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: un-inline sk_busy_loop() · 02d62e86
      Eric Dumazet committed
      There is really little gain from inlining this big function.
      We'll soon make it even bigger in following patches.
      
      This means we no longer need to export napi_by_id()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: better skb->sender_cpu and skb->napi_id cohabitation · 52bd2d62
      Eric Dumazet committed
      skb->sender_cpu and skb->napi_id share a common storage,
      and we had various bugs about this.
      
      We had to call skb_sender_cpu_clear() in some places so as
      not to leave a prior skb->napi_id behind and fool netdev_pick_tx().
      
      As suggested by Alexei, we can split the value space so that
      these errors cannot happen.
      
      With the value 0 reserved as the common (not initialized) value,
      let's reserve the [1 .. NR_CPUS] range for valid sender_cpu values
      and [NR_CPUS+1 .. ~0U] for valid napi_id values.
      
      This will allow proper busy polling support over tunnels.
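
The range split described above can be sketched in plain C as follows. This is an illustration only: the `NR_CPUS` value is a made-up constant, the helper names are hypothetical, and the "CPU number plus one" encoding is an assumption about how a zero-based CPU index would be mapped into [1 .. NR_CPUS].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU count for this sketch; the real value is a build-time setting. */
#define NR_CPUS 64u

/* 0 is the shared "not initialized" value, so it is valid for neither use. */
static bool is_valid_sender_cpu(uint32_t v)
{
    return v >= 1 && v <= NR_CPUS;      /* [1 .. NR_CPUS] */
}

static bool is_valid_napi_id(uint32_t v)
{
    return v > NR_CPUS;                 /* [NR_CPUS+1 .. ~0U] */
}

/* Assumed encoding: map a zero-based CPU index into the sender_cpu range. */
static uint32_t encode_sender_cpu(uint32_t cpu)
{
    assert(cpu < NR_CPUS);
    return cpu + 1;
}
```

Because the two ranges never overlap, a reader of the shared field can tell which kind of value it holds without needing the field to be cleared first.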
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net ipv4: use preferred log methods · 09605cc1
      Bastian Stender committed
      Replace printk calls with preferred unconditional log method calls to keep
      kernel messages clean.
      
      Added newline to "too small MTU" message.
      Signed-off-by: Bastian Stender <bst@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Nov, 2015 8 commits
  4. 17 Nov, 2015 7 commits
  5. 16 Nov, 2015 7 commits