1. 13 Nov, 2014 4 commits
  2. 12 Nov, 2014 6 commits
    • irda: Remove IRDA_<TYPE> logging macros · 6c91023d
      Joe Perches committed
      And use the more common mechanisms directly.
      
      Other miscellanea:
      
      o Coalesce formats
      o Add missing newlines
      o Realign arguments
      o Remove unnecessary OOM message logging as
        there's a generic stack dump already on OOM.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: remove dynamic neigh table registration support · d7480fd3
      WANG Cong committed
      Currently there are only three neigh tables in the whole kernel:
      arp table, ndisc table and decnet neigh table. What's more,
      we don't support registering multiple tables per family.
      Therefore we can just make these tables statically built-in.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Joe Perches committed
      Use the more common dynamic_debug capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were
      emitted at KERN_INFO, which are now not enabled at all unless
      DEBUG is defined or dynamic_debug is enabled.  Even so,
      these messages are now _not_ emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it has no function.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dsa: Use netdev_<level> instead of printk · a2ae6007
      Joe Perches committed
      Neaten and standardize the logging output.
      
      Other miscellanea:
      
      o Use pr_notice_once instead of a guard flag.
      o Convert existing pr_<level> uses too.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet committed
      An alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of millions of sockets among worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering in the socket structure which cpu delivered the
      last packet is enough to solve the problem.
      
      After accept(), connect(), or even after passing a file descriptor
      between processes, applications can use:
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, as the whole networking stack should then
      run on the appropriate cpu, without the need to send IPIs (RPS/RFS).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
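The SO_INCOMING_CPU commit message above describes querying the CPU after accept() or connect() and then placing the socket into the matching "silo". A minimal sketch of that flow follows. The helper names `query_incoming_cpu` and `pick_worker` are hypothetical, the fallback value 49 for `SO_INCOMING_CPU` is an assumption taken from the generic Linux socket headers (a few architectures use a different number), and the modulo policy is just one plausible way to map CPUs to worker threads, not something the commit prescribes.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Older libc headers may not define SO_INCOMING_CPU. The value 49 is the
 * asm-generic one; this fallback is an assumption, not part of the commit. */
#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49
#endif

/* Ask the kernel which CPU last delivered a packet for this socket.
 * Returns 0 and stores the CPU number in *cpu on success, -1 on failure. */
int query_incoming_cpu(int fd, int *cpu)
{
    socklen_t len = sizeof(*cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, cpu, &len) < 0)
        return -1;
    return 0;
}

/* Hypothetical placement policy: one worker per CPU, wrapping around when
 * there are fewer workers than CPUs. A negative CPU (for example, a failed
 * query) falls back to worker 0. Returns -1 if there are no workers. */
int pick_worker(int cpu, int nworkers)
{
    if (nworkers <= 0)
        return -1;
    if (cpu < 0)
        return 0;
    return cpu % nworkers;
}
```

A caller would typically run `query_incoming_cpu()` on the descriptor returned by accept(), then hand the socket to the epoll set owned by worker `pick_worker(cpu, nworkers)`.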
    • tcp: move sk_mark_napi_id() at the right place · 3d97379a
      Eric Dumazet committed
      sk_mark_napi_id() records, for a flow, the napi id of incoming
      packets for the sake of busy polling.
      We should do this only on established flows, not on listeners.
      
      This was 'working' by virtue of the socket cloning, but doing
      this on SYN packets is unnecessary cache line dirtying.
      
      Even if we move sk_napi_id into the same cache line as sk_lock,
      we are working to make SYN processing lockless, so it is desirable
      to set sk_napi_id only for established flows.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 Nov, 2014 3 commits
    • ipv4: Avoid reading user iov twice after raw_probe_proto_opt · c008ba5b
      Herbert Xu committed
      Ever since raw_probe_proto_opt was added it had the problem of
      causing the user iov to be read twice, once during the probe for
      the protocol header and once again in ip_append_data.
      
      This is a potential security problem since it means that whatever
      we're probing may be invalid.  This patch plugs the hole by
      firstly advancing the iov so we don't read the same spot again,
      and secondly saving what we read the first time around for use
      by ip_append_data.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Use standard iovec primitive in raw_probe_proto_opt · 32b5913a
      Herbert Xu committed
      The function raw_probe_proto_opt tries to extract the first two
      bytes from the user input in order to seed the IPsec lookup for
      ICMP packets.  In doing so it's processing iovec by hand and
      overcomplicating things.
      
      This patch replaces the manual iovec processing with a call to
      memcpy_fromiovecend.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
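The two raw_probe_proto_opt commits above rest on one idea: copy the leading bytes out of the scattered user buffer with a single generic primitive, keep that copy, and continue from the offset just past it instead of reading the same spot again. The sketch below is a userspace analogue only. `iov_copy_from` is a hypothetical name, it is not the kernel's memcpy_fromiovecend, and it works on ordinary memory rather than user pointers.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy len bytes, starting at byte offset off within the iovec array,
 * into dst. Handles data that straddles segment boundaries.
 * Returns 0 on success, -1 if the array holds fewer than off + len bytes. */
int iov_copy_from(const struct iovec *iov, int iovcnt,
                  size_t off, void *dst, size_t len)
{
    unsigned char *out = dst;
    int i;

    for (i = 0; i < iovcnt && len > 0; i++) {
        size_t seg = iov[i].iov_len;
        size_t n;

        if (off >= seg) {       /* segment lies entirely before the offset */
            off -= seg;
            continue;
        }
        n = seg - off;          /* bytes available in this segment */
        if (n > len)
            n = len;
        memcpy(out, (const unsigned char *)iov[i].iov_base + off, n);
        out += n;
        len -= n;
        off = 0;                /* later segments are read from their start */
    }
    return len == 0 ? 0 : -1;
}
```

In this analogue, a caller would copy the two header bytes once with `off = 0`, keep them in a local buffer, and read the rest of the payload with `off = 2`, so no byte of the source is fetched twice.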
    • net: gro: add a per device gro flush timer · 3b47d303
      Eric Dumazet committed
      Tuning coalescing parameters on a NIC can be really hard.
      
      Servers can handle both bulk and RPC-like traffic, with conflicting
      goals: bulk flows want GRO packets that are as big as possible, RPC
      wants minimal latencies.
      
      To reach big GRO packets on a 10Gbe NIC, one can use:
      
      ethtool -C eth0 rx-usecs 4 rx-frames 44
      
      But this penalizes RPC sessions, with an increase in latencies of up
      to 50% in some cases, as NICs generally do not force an interrupt when
      a packet with the TCP Push flag is received.
      
      Some NICs do not have an absolute timer, only a timer rearmed for every
      incoming packet.
      
      This patch uses a different strategy: let the GRO stack decide what
      to do, based on the traffic pattern.
      
      Packets with the Push flag won't be delayed.
      Packets without the Push flag might be held in the GRO engine, if we
      keep receiving data.
      
      This new mechanism is off by default, and is enabled by setting
      /sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.
      
      To fully enable this mechanism, drivers should use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACKs per second.
      
      GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce the number of ACK packets. (811/19.2 = 42)
      
      The receiver performs fewer calls to upper stacks and fewer wakeups.
      This also reduces cpu usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases cpu efficiency, but
      can decrease QPS, as applications won't have the chance to warm up cpu
      caches doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
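The aggregation figures quoted in the gro flush timer commit above (811/305 = 2.65 and 811/19.2 = 42) can be reproduced from the two sar samples by dividing received packets per second by transmitted packets per second. The sketch below assumes, as the commit message does, that the transmitted packets on the receiver are essentially ACKs and that roughly one ACK goes out per GRO packet. The function name `gro_segments_per_ack` is hypothetical.

```c
/* Approximate the number of segments per GRO packet as received packets
 * per second divided by transmitted (mostly ACK) packets per second.
 * Returns 0.0 when tx_pps is not positive, to avoid dividing by zero. */
double gro_segments_per_ack(double rx_pps, double tx_pps)
{
    if (tx_pps <= 0.0)
        return 0.0;
    return rx_pps / tx_pps;
}
```

With the rxpck/s and txpck/s columns from the sar output, `gro_segments_per_ack(811269.80, 305732.30)` gives about 2.65 before the timer is set, and `gro_segments_per_ack(811577.30, 19230.80)` gives about 42 with a 2000 nsec timeout.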
  4. 10 Nov, 2014 6 commits
  5. 09 Nov, 2014 1 commit
  6. 08 Nov, 2014 3 commits
  7. 07 Nov, 2014 6 commits
  8. 06 Nov, 2014 11 commits