1. 21 11月, 2015 6 次提交
    • J
      tipc: narrow down exposure of struct tipc_node · 5be9c086
      Jon Paul Maloy 提交于
      In our effort to have less code and include dependencies between
      entities such as node, link and bearer, we try to narrow down
      the exposed interface towards the node as much as possible.
      
      In this commit, we move the definition of struct tipc_node, along
      with many of its associated function declarations, from node.h to
      node.c. We also move some function definitions from link.c and
      name_distr.c to node.c, since they access fields in struct tipc_node
      that should not be externally visible. The moved functions are renamed
      according to new location, and made static whenever possible.
      
      There are no functional changes in this commit.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5be9c086
    • J
      tipc: convert node lock to rwlock · 5405ff6e
      Jon Paul Maloy 提交于
      According to the node FSM a node in state SELF_UP_PEER_UP cannot
      change state inside a lock context, except when a TUNNEL_PROTOCOL
      (SYNCH or FAILOVER) packet arrives. However, the node's individual
      links may still change state.
      
      Since each link now is protected by its own spinlock, we finally have
      the conditions in place to convert the node spinlock to an rwlock_t.
      If the node state and arriving packet type are rigth, we can let the
      link directly receive the packet under protection of its own spinlock
      and the node lock in read mode. In all other cases we use the node
      lock in write mode. This enables full concurrent execution between
      parallel links during steady-state traffic situations, i.e., 99+ %
      of the time.
      
      This commit implements this change.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5405ff6e
    • J
      tipc: introduce per-link spinlock · 2312bf61
      Jon Paul Maloy 提交于
      As a preparation to allow parallel links to work more independently
      from each other we introduce a per-link spinlock, to be stored in the
      struct nodes's link entry area. Since the node lock still is a regular
      spinlock there is no increase in parallellism at this stage.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2312bf61
    • J
      tipc: reduce code dependency between binding table and node layer · 1d7e1c25
      Jon Paul Maloy 提交于
      The file name_distr.c currently contains three functions,
      named_cluster_distribute(), tipc_publ_subcscribe() and
      tipc_publ_unsubscribe() that all directly access fields in
      struct tipc_node. We want to eliminate such dependencies, so
      we move those functions to the file node.c and rename them to
      tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
      respectively.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d7e1c25
    • J
      tipc: small cleanup of function tipc_node_check_state() · 5c10e979
      Jon Paul Maloy 提交于
      The function tipc_node_check_state() contains the core logics
      for handling link synchronization and failover. For this reason,
      it is important to keep it as comprehensible as possible.
      
      In this commit, we make three small cleanups.
      
      1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
         packet confirms that the peer has lost contact, there will be no
         further action in this function. To make this clearer, we return
         from the function directly after the state change.
      
      2) Since commit 0f8b8e28 ("tipc: eliminate risk of stalled
         link synchronization") only the logically first TUNNEL_PROTO/SYNCH
         packet can alter the link state and set the synch point,
         independently of arrival order. Hence, there is not any longer any
         need to adjust the synch value in case such packets arrive in
         disorder. We remove this adjustment.
      
      3) It is the intention that any message arriving on any of the links
         may trig a check for and possible termination of a node SYNCH state.
         A redundant and unnoticed check for tipc_link_is_synching() obviously
         beats this purpose, with the effect that only packets arriving on the
         synching link may currently end the synch state. We remove this check.
         This change will further shorten the synchronization period between
         parallel links.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c10e979
    • J
      tipc: move linearization of buffers to generic code · c7cad0d6
      Jon Paul Maloy 提交于
      In commit 5cbb28a4 ("tipc: linearize arriving NAME_DISTR
      and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
      LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
      tipc_udp_recv(). The location of the change was selected in order
      to make the commit easily appliable to 'net' and 'stable'.
      
      We now move this linearization to where it should be done, in the
      functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7cad0d6
  2. 19 11月, 2015 10 次提交
    • E
      net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Eric Dumazet 提交于
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
      
      napi_hash_add() and napi_hash_del() are automatically called
      from core networking stack, respectively from
      netif_napi_add() and netif_napi_del()
      
      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.
      
      Drivers might still prefer to call napi_hash_del() on their
      own, since they might combine all the rcu grace periods into
      a single one, knowing their NAPI structures lifetime, while
      core networking stack has no idea of a possible combining.
      
      Once this patch proves to not bring serious regressions,
      we will cleanup drivers to either remove napi_hash_del()
      or provide appropriate rcu grace periods combining.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93d05d4a
    • E
      net: napi_hash_del() returns a boolean status · 34cbe27e
      Eric Dumazet 提交于
      napi_hash_del() will soon be used from both drivers (if they want)
      or core networking stack.
      
      Callers are responsibles to ensure an RCU grace period is respected
      before freeing napi structure : napi_hash_del() can signal if
      this RCU grace period is needed or not.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34cbe27e
    • E
      net: move napi_hash[] into read mostly section · 6180d9de
      Eric Dumazet 提交于
      We do not often add/delete a napi context.
      Moving napi_hash[] into read mostly section avoids potential false sharing.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6180d9de
    • E
      net: add netif_tx_napi_add() · d64b5e85
      Eric Dumazet 提交于
      netif_tx_napi_add() is a variant of netif_napi_add()
      
      It should be used by drivers that use a napi structure
      to exclusively poll TX.
      
      We do not want to add this kind of napi in napi_hash[] in following
      patches, adding generic busy polling to all NAPI drivers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d64b5e85
    • E
      net: move skb_mark_napi_id() into core networking stack · 93f93a44
      Eric Dumazet 提交于
      We would like to automatically provide busy polling support
      to all NAPI drivers, without them having to implement anything.
      
      skb_mark_napi_id() can be called from napi_gro_receive() and
      napi_get_frags().
      
      Few drivers are still calling skb_mark_napi_id() because
      they use netif_receive_skb(). They should eventually call
      napi_gro_receive() instead. I will leave this to drivers
      maintainers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93f93a44
    • E
      net: network drivers no longer need to implement ndo_busy_poll() · ce6aea93
      Eric Dumazet 提交于
      Instead of having to implement complex ndo_busy_poll() method,
      drivers can simply rely on NAPI poll logic.
      
      Busy polling gains are mainly coming from polling itself,
      not on exact details on how we poll the device.
      
      ndo_busy_poll() if implemented can avoid touching
      napi state, but it adds extra synchronization between
      normal napi->poll() and busy poll handler, slowing down
      the common path (non busy polling) with extra atomic operations.
      In practice few drivers ever got busy poll because of the complexity.
      
      We could go one step further, and make busy polling
      available for all NAPI drivers, but this would require
      that all netif_napi_del() calls are done in process context
      so that we can call synchronize_rcu().
      Full audit would be required.
      
      Before this is done, a driver still needs to call :
      
      - skb_mark_napi_id() for each skb provided to the stack.
      - napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
      - Make sure RCU grace period is respected after napi_hash_del() before
        memory containing napi structure is freed.
      
      Followup patch implements busy poll for mlx5 driver as an example.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce6aea93
    • E
      net: allow BH servicing in sk_busy_loop() · 2a028ecb
      Eric Dumazet 提交于
      Instead of blocking BH in whole sk_busy_loop(), block them
      only around ->ndo_busy_poll() calls.
      
      This has many benefits.
      
      1) allow tunneled traffic to use busy poll as well as native traffic.
         Tunnels handlers usually call netif_rx() and depend on net_rx_action()
         being run (from sofirq handler)
      
      2) allow RFS/RPS being used (sending IPI to other cpus if needed)
      
      3) use the 'lets burn cpu cycles' budget to do useful work
         (like TX completions, timers, RCU callbacks...)
      
      4) reduce BH latencies, making busy poll a better citizen.
      
      Tested:
      
      Tested with SIT tunnel
      
      lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    37373.93
      16384  87380
      
      Now enable busy poll on both hosts
      
      lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa6:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    58314.77
      16384  87380
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a028ecb
    • E
      net: un-inline sk_busy_loop() · 02d62e86
      Eric Dumazet 提交于
      There is really little gain from inlining this big function.
      We'll soon make it even bigger in following patches.
      
      This means we no longer need to export napi_by_id()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02d62e86
    • E
      net: better skb->sender_cpu and skb->napi_id cohabitation · 52bd2d62
      Eric Dumazet 提交于
      skb->sender_cpu and skb->napi_id share a common storage,
      and we had various bugs about this.
      
      We had to call skb_sender_cpu_clear() in some places to
      not leave a prior skb->napi_id and fool netdev_pick_tx()
      
      As suggested by Alexei, we could split the space so that
      these errors can not happen.
      
      0 value being reserved as the common (not initialized) value,
      let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
      and [NR_CPUS+1 .. ~0U] for valid napi_id.
      
      This will allow proper busy polling support over tunnels.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Suggested-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52bd2d62
    • B
      net ipv4: use preferred log methods · 09605cc1
      Bastian Stender 提交于
      Replace printk calls with preferred unconditional log method calls to keep
      kernel messages clean.
      
      Added newline to "too small MTU" message.
      Signed-off-by: NBastian Stender <bst@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09605cc1
  3. 18 11月, 2015 8 次提交
  4. 17 11月, 2015 7 次提交
  5. 16 11月, 2015 9 次提交