1. 30 8月, 2013 3 次提交
    • D
      net: packet: document available fanout policies · 7ec06da8
      Daniel Borkmann 提交于
      Update documentation to add fanout policies that are available.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ec06da8
    • E
      tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet 提交于
      After hearing many people over past years complaining against TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses an heuristic
      relying on upcoming ACKS instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
      
      v2: Neal Cardwell reported a suspect deferring of last two segments on
      initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
      into account tp->xmit_size_goal_segs
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95bd09eb
    • H
      ipv6: drop fragmented ndisc packets by default (RFC 6980) · b800c3b9
      Hannes Frederic Sowa 提交于
      This patch implements RFC6980: Drop fragmented ndisc packets by
      default. If a fragmented ndisc packet is received the user is informed
      that it is possible to disable the check.
      
      Cc: Fernando Gont <fernando@gont.com.ar>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b800c3b9
  2. 28 8月, 2013 1 次提交
  3. 24 8月, 2013 2 次提交
  4. 21 8月, 2013 1 次提交
  5. 14 8月, 2013 1 次提交
    • H
      ipv6: make unsolicited report intervals configurable for mld · fc4eba58
      Hannes Frederic Sowa 提交于
      Commit cab70040 ("net: igmp:
      Reduce Unsolicited report interval to 1s when using IGMPv3") and
      2690048c ("net: igmp: Allow user-space
      configuration of igmp unsolicited report interval") by William Manley made
      igmp unsolicited report intervals configurable per interface and corrected
      the interval of unsolicited igmpv3 report messages resendings to 1s.
      
      Same needs to be done for IPv6:
      
      MLDv1 (RFC2710 7.10.): 10 seconds
      MLDv2 (RFC3810 9.11.): 1 second
      
      Both intervals are configurable via new procfs knobs
      mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
      
      (also added .force_mld_version to ipv6_devconf_dflt to bring structs in
      line without semantic changes)
      
      v2:
      a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
         unsolicited_report_interval procfs knobs.
      b) incorporate stylistic feedback from William Manley
      
      v3:
      a) add new DEVCONF_* values to the end of the enum (thanks to David
         Miller)
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: William Manley <william.manley@youview.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4eba58
  6. 12 8月, 2013 1 次提交
  7. 10 8月, 2013 2 次提交
  8. 07 8月, 2013 1 次提交
  9. 02 8月, 2013 1 次提交
  10. 01 8月, 2013 2 次提交
  11. 31 7月, 2013 2 次提交
  12. 26 7月, 2013 1 次提交
  13. 25 7月, 2013 3 次提交
    • T
      HOWTO ja_JP sync · 9b0ee8cf
      Tsugikazu Shibata 提交于
      Attached is Documentation/ja_JP/HOWTO sync patch for 3.10.
      This patch was reviewed by Japanese translation community called JF.
      Signed-off-by: NTsugikazu Shibata <tshibata@ab.jp.nec.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b0ee8cf
    • E
      tcp: TCP_NOTSENT_LOWAT socket option · c9bee3b7
      Eric Dumazet 提交于
      Idea of this patch is to add optional limitation of number of
      unsent bytes in TCP sockets, to reduce usage of kernel memory.
      
      TCP receiver might announce a big window, and TCP sender autotuning
      might allow a large amount of bytes in write queue, but this has little
      performance impact if a large part of this buffering is wasted :
      
      Write queue needs to be large only to deal with large BDP, not
      necessarily to cope with scheduling delays (incoming ACKS make room
      for the application to queue more bytes)
      
      For most workloads, using a value of 128 KB or less is OK to give
      applications enough time to react to POLLOUT events in time
      (or being awaken in a blocking sendmsg())
      
      This patch adds two ways to set the limit :
      
      1) Per socket option TCP_NOTSENT_LOWAT
      
      2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
      not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
      Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
      
      This changes poll()/select()/epoll() to report POLLOUT
      only if number of unsent bytes is below tp->nosent_lowat
      
      Note this might increase number of sendmsg()/sendfile() calls
      when using non blocking sockets,
      and increase number of context switches for blocking sockets.
      
      Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
      defined as :
       Specify the minimum number of bytes in the buffer until
       the socket layer will pass the data to the protocol)
      
      Tested:
      
      netperf sessions, and watching /proc/net/protocols "memory" column for TCP
      
      With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
      used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      Using 128KB has no bad effect on the throughput or cpu usage
      of a single flow, although there is an increase of context switches.
      
      A bonus is that we hold socket lock for a shorter amount
      of time and should improve latencies of ACK processing.
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
                 412,514 context-switches
      
           200.034645535 seconds time elapsed
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
               2,675,818 context-switches
      
           200.029651391 seconds time elapsed
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-By: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9bee3b7
    • D
      net: sctp: trivial: update mailing list address · 91705c61
      Daniel Borkmann 提交于
      The SCTP mailing list address to send patches or questions
      to is linux-sctp@vger.kernel.org and not
      lksctp-developers@lists.sourceforge.net anymore. Therefore,
      update all occurences.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91705c61
  14. 23 7月, 2013 2 次提交
  15. 17 7月, 2013 2 次提交
  16. 15 7月, 2013 2 次提交
  17. 12 7月, 2013 3 次提交
  18. 11 7月, 2013 4 次提交
    • S
      zswap: add documentation · 61b0d760
      Seth Jennings 提交于
      Add the documentation file for the zswap functionality
      Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Jenifer Hopper <jhopper@us.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Joe Perches <joe@perches.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: Hugh Dickens <hughd@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61b0d760
    • E
      net: rename busy poll socket op and globals · 64b0dc51
      Eliezer Tamir 提交于
      Rename LL_SO to BUSY_POLL_SO
      Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
      Fix up users of these variables.
      Fix documentation for sysctl.
      
      a patch for the socket.7  man page will follow separately,
      because of limitations of my mail setup.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64b0dc51
    • J
      dm: add switch target · 9d0eb0ab
      Jim Ramsay 提交于
      dm-switch is a new target that maps IO to underlying block devices
      efficiently when there is a large number of fixed-sized address regions
      but there is no simple pattern to allow for a compact mapping
      representation such as dm-stripe.
      
      Though we have developed this target for a specific storage device, Dell
      EqualLogic, we have made an effort to keep it as general purpose as
      possible in the hope that others may benefit.
      
      Originally developed by Jim Ramsay. Simplified by Mikulas Patocka.
      Signed-off-by: NJim Ramsay <jim_ramsay@dell.com>
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      9d0eb0ab
    • M
      Add support driver for Intel Rapid Start Technology · 34a956db
      Matthew Garrett 提交于
      Intel Rapid Start Technology is a firmware-based suspend-to-disk
      implementation. Once placed in S3, the device will wake once either a
      timeout elapses or the battery reaches a critical level. It will then resume
      to the firmware and copy the contents of RAM to a specialised partition, and
      then power off the machine. If the user turns the machine back on the
      firmware will copy the contents of the partition back to RAM and then resume
      from S3 as normal.
      
      This driver provides an interface for configuring the wakeup events and
      timeout. It still requires firmware support and an appropriate suspend
      partition.
      Signed-off-by: NMatthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: NMatthew Garrett <matthew.garrett@nebula.com>
      34a956db
  19. 10 7月, 2013 6 次提交