1. 28 7月, 2013 5 次提交
  2. 25 7月, 2013 4 次提交
    • T
      net: Make devnet_rename_seq static · 18afa4b0
      Thomas Gleixner 提交于
      No users outside net/core/dev.c.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18afa4b0
    • E
      tcp: TCP_NOTSENT_LOWAT socket option · c9bee3b7
      Eric Dumazet 提交于
      Idea of this patch is to add optional limitation of number of
      unsent bytes in TCP sockets, to reduce usage of kernel memory.
      
      TCP receiver might announce a big window, and TCP sender autotuning
      might allow a large amount of bytes in write queue, but this has little
      performance impact if a large part of this buffering is wasted :
      
      Write queue needs to be large only to deal with large BDP, not
      necessarily to cope with scheduling delays (incoming ACKS make room
      for the application to queue more bytes)
      
      For most workloads, using a value of 128 KB or less is OK to give
      applications enough time to react to POLLOUT events in time
      (or being awaken in a blocking sendmsg())
      
      This patch adds two ways to set the limit :
      
      1) Per socket option TCP_NOTSENT_LOWAT
      
      2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
      not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
      Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
      
      This changes poll()/select()/epoll() to report POLLOUT
      only if number of unsent bytes is below tp->nosent_lowat
      
      Note this might increase number of sendmsg()/sendfile() calls
      when using non blocking sockets,
      and increase number of context switches for blocking sockets.
      
      Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
      defined as :
       Specify the minimum number of bytes in the buffer until
       the socket layer will pass the data to the protocol)
      
      Tested:
      
      netperf sessions, and watching /proc/net/protocols "memory" column for TCP
      
      With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
      used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      Using 128KB has no bad effect on the throughput or cpu usage
      of a single flow, although there is an increase of context switches.
      
      A bonus is that we hold socket lock for a shorter amount
      of time and should improve latencies of ACK processing.
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
                 412,514 context-switches
      
           200.034645535 seconds time elapsed
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
               2,675,818 context-switches
      
           200.029651391 seconds time elapsed
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-By: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9bee3b7
    • E
      net: add sk_stream_is_writeable() helper · 64dc6130
      Eric Dumazet 提交于
      Several call sites use the hardcoded following condition :
      
      sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)
      
      Lets use a helper because TCP_NOTSENT_LOWAT support will change this
      condition for TCP sockets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64dc6130
    • D
      net: sctp: trivial: update mailing list address · 91705c61
      Daniel Borkmann 提交于
      The SCTP mailing list address to send patches or questions
      to is linux-sctp@vger.kernel.org and not
      lksctp-developers@lists.sourceforge.net anymore. Therefore,
      update all occurences.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91705c61
  3. 24 7月, 2013 3 次提交
  4. 23 7月, 2013 4 次提交
  5. 19 7月, 2013 1 次提交
    • E
      vlan: mask vlan prio bits · d4b812de
      Eric Dumazet 提交于
      In commit 48cc32d3
      ("vlan: don't deliver frames for unknown vlans to protocols")
      Florian made sure we set pkt_type to PACKET_OTHERHOST
      if the vlan id is set and we could find a vlan device for this
      particular id.
      
      But we also have a problem if prio bits are set.
      
      Steinar reported an issue on a router receiving IPv6 frames with a
      vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
      because skb->vlan_tci is set.
      
      Forwarded frame is completely corrupted : We can see (8100:4000)
      being inserted in the middle of IPv6 source address :
      
      16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
      9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
             0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
             0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
             0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
             0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
      
      It seems we are not really ready to properly cope with this right now.
      
      We can probably do better in future kernels :
      vlan_get_ingress_priority() should be a netdev property instead of
      a per vlan_dev one.
      
      For stable kernels, lets clear vlan_tci to fix the bugs.
      Reported-by: NSteinar H. Gunderson <sesse@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4b812de
  6. 17 7月, 2013 10 次提交
  7. 15 7月, 2013 1 次提交
    • P
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  8. 14 7月, 2013 1 次提交
  9. 13 7月, 2013 4 次提交
  10. 12 7月, 2013 2 次提交
  11. 11 7月, 2013 5 次提交
    • M
      mm: remove free_area_cache · 98d1e64f
      Michel Lespinasse 提交于
      Since all architectures have been converted to use vm_unmapped_area(),
      there is no remaining use for the free_area_cache.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98d1e64f
    • S
      zbud: add to mm/ · 4e2e2770
      Seth Jennings 提交于
      zbud is an special purpose allocator for storing compressed pages.  It
      is designed to store up to two compressed pages per physical page.
      While this design limits storage density, it has simple and
      deterministic reclaim properties that make it preferable to a higher
      density approach when reclaim will be used.
      
      zbud works by storing compressed pages, or "zpages", together in pairs
      in a single memory page called a "zbud page".  The first buddy is "left
      justifed" at the beginning of the zbud page, and the last buddy is
      "right justified" at the end of the zbud page.  The benefit is that if
      either buddy is freed, the freed buddy space, coalesced with whatever
      slack space that existed between the buddies, results in the largest
      possible free region within the zbud page.
      
      zbud also provides an attractive lower bound on density.  The ratio of
      zpages to zbud pages can not be less than 1.  This ensures that zbud can
      never "do harm" by using more pages to store zpages than the
      uncompressed zpages would have used on their own.
      
      This implementation is a rewrite of the zbud allocator internally used
      by zcache in the driver/staging tree.  The rewrite was necessary to
      remove some of the zcache specific elements that were ingrained
      throughout and provide a generic allocation interface that can later be
      used by zsmalloc and others.
      
      This patch adds zbud to mm/ for later use by zswap.
      Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Jenifer Hopper <jhopper@us.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Joe Perches <joe@perches.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: Hugh Dickens <hughd@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e2e2770
    • E
      net: rename busy poll socket op and globals · 64b0dc51
      Eliezer Tamir 提交于
      Rename LL_SO to BUSY_POLL_SO
      Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
      Fix up users of these variables.
      Fix documentation for sysctl.
      
      a patch for the socket.7  man page will follow separately,
      because of limitations of my mail setup.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64b0dc51
    • E
      net: rename ll methods to busy-poll · 8b80cda5
      Eliezer Tamir 提交于
      Rename ndo_ll_poll to ndo_busy_poll.
      Rename sk_mark_ll to sk_mark_napi_id.
      Rename skb_mark_ll to skb_mark_napi_id.
      Correct all useres of these functions.
      Update comments and defines  in include/net/busy_poll.h
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b80cda5
    • E
      net: rename include/net/ll_poll.h to include/net/busy_poll.h · 076bb0c8
      Eliezer Tamir 提交于
      Rename the file and correct all the places where it is included.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      076bb0c8