1. 26 Mar, 2006 1 commit
    •
      [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a
      Davide Libenzi authored
      Implement half-closed device notification by adding a new POLLRDHUP
      (and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Because
      changing the existing POLLHUP handling, which does not correctly report
      half-closed devices, was considered risky, this implementation leaves the
      current POLLHUP reporting unchanged and simply adds a new bit that is set
      in the few places where it makes sense.  The same thing was discussed and
      conceptually agreed on quite some time ago:
      
      http://lkml.org/lkml/2003/7/12/116
      
      Since this new event bit is added to the existing Linux poll infrastructure,
      even the existing poll/select system calls will be able to use it.  The
      existing POLLHUP handling is left as is.  The
      pollrdhup-2.6.16.rc5-0.10.diff defines POLLRDHUP for all the existing
      archs and sets the bit in the six relevant files.  The other attached diff
      is the simple change required to sys/epoll.h to add the EPOLLRDHUP
      definition.
      
      There is "a stupid program" to test POLLRDHUP delivery here:
      
       http://www.xmailserver.org/pollrdhup-test.c
      
      It tests poll(2), but since the delivery is the same, epoll(2) will work equally.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
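The half-close notification described in the commit above can be observed from user space roughly as follows. This is a minimal sketch, not the author's test program: `half_close_revents()` is a hypothetical helper name, the AF_UNIX socketpair is chosen only to keep the example short, and the fallback value for POLLRDHUP is an assumption that matches the generic Linux definition but may differ on some architectures.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* glibc exposes POLLRDHUP under _GNU_SOURCE */
#endif
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000         /* assumption: generic Linux value */
#endif

/*
 * Half-close one end of a stream socketpair and return the revents that
 * poll(2) reports for the other end, or -1 on error.
 */
static int half_close_revents(void)
{
    int sv[2];
    struct pollfd pfd;
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The peer stops writing but keeps its read side open. */
    if (shutdown(sv[1], SHUT_WR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pfd.fd = sv[0];
    pfd.events = POLLIN | POLLRDHUP;   /* POLLRDHUP has to be requested */
    pfd.revents = 0;
    rc = poll(&pfd, 1, 1000);          /* wait at most one second */

    close(sv[0]);
    close(sv[1]);
    return rc == 1 ? pfd.revents : -1;
}
```

On a kernel that carries this change, the returned mask should contain POLLRDHUP without POLLHUP, since only one direction of the connection has been shut down.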
  2. 21 Mar, 2006 3 commits
  3. 04 Jan, 2006 2 commits
  4. 30 Nov, 2005 2 commits
    •
      [NET]: Add const markers to various variables. · 9b5b5cff
      Arjan van de Ven authored
      The patch below marks various variables in net/ const; the goal is to
      move them to the .rodata section so that they can't false-share
      cachelines with things that get written to, as well as potentially
      helping gcc a bit with optimisations.  (These were found using a gcc
      patch to warn about such variables.)
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      [IPV4] tcp/route: Another look at hash table sizes · 18955cfc
      Mike Stroyan authored
        The tcp_ehash hash table gets too big on systems with really big memory.
      It is worse on systems with pages larger than 4KB.  It wastes memory that
      could be better used.  It also makes the netstat command slow because reading
      /proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.
      
        The default value should not be larger for larger page sizes.  It seems
      that the effect of page size is an unintended error dating back a long
      time.  I also wonder if the default value really should be a larger
      fraction of memory for systems with more memory.  While systems with
      really big ram can afford more space for hash tables, it is not clear to
      me that they benefit from increasing the allocation ratio for this table.
      
        The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
      mm/page_alloc.c:alloc_large_system_hash.
      
      tcp_init calls alloc_large_system_hash passing parameters-
          bucketsize=sizeof(struct tcp_ehash_bucket)
          numentries=thash_entries
          scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
          limit=0
      
      On i386, PAGE_SHIFT is 12 for a page size of 4K
      On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K
      
      The num_physpages test above makes the allocation take a larger fraction
      of the total memory on systems with larger memory.  The threshold size
      for an i386 system is 512MB.  For an ia64 system with 16KB pages the
      threshold is 2GB.
      
      For smaller memory systems-
      On i386, scale = (27 - 12) = 15
      On ia64, scale = (27 - 14) = 13
      For larger memory systems-
      On i386, scale = (25 - 12) = 13
      On ia64, scale = (25 - 14) = 11
      
        For the rest of this discussion, I'll just track the larger memory case.
      
        The default behavior has numentries=thash_entries=0, so the allocated
      size is determined by either scale or by the default limit of 1/16 of
      total memory.
      
      In alloc_large_system_hash-
      |	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
      |	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
      |	numentries >>= 20 - PAGE_SHIFT;
      |	numentries <<= 20 - PAGE_SHIFT;
      
        At this point, numentries is pages for all of memory, rounded up to the
      nearest megabyte boundary.
      
      |	/* limit to 1 bucket per 2^scale bytes of low memory */
      |	if (scale > PAGE_SHIFT)
      |		numentries >>= (scale - PAGE_SHIFT);
      |	else
      |		numentries <<= (PAGE_SHIFT - scale);
      
      On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
      bytes of total memory.
      On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
      bytes of total memory.
      
      |        log2qty = long_log2(numentries);
      |
      |        do {
      |                size = bucketsize << log2qty;
      
      bucketsize is 16, so size is 16 times numentries, rounded
      down to a power of two.
      
      On i386, size is 1/512 of bytes of total memory.
      On ia64, size is 1/128 of bytes of total memory.
      
      For smaller systems the results are
      On i386, size is 1/2048 of bytes of total memory.
      On ia64, size is 1/512 of bytes of total memory.
      
        The large page effect can be removed by just replacing
      the use of PAGE_SHIFT with a constant of 12 in the calls to
      alloc_large_system_hash.  That makes them more like the other uses of
      that function from fs/inode.c and fs/dcache.c
      Signed-off-by: David S. Miller <davem@davemloft.net>
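The sizing arithmetic walked through in the commit above can be replayed in user space. The sketch below is an illustration under stated assumptions, not the kernel code: `ehash_bytes()` and `floor_log2()` are hypothetical names, total memory stands in for both num_physpages and the page count used by alloc_large_system_hash, and the 1/16-of-memory limit and the allocation retry loop are left out. The `scale_shift` parameter is the value subtracted in the scale expression: PAGE_SHIFT before the change, the constant 12 after it.

```c
#include <stdint.h>

/* floor(log2(n)) for n > 0; stands in for the kernel's long_log2(). */
static int floor_log2(uint64_t n)
{
    int log2 = 0;

    while (n >>= 1)
        log2++;
    return log2;
}

/* Bytes allocated for the table with the default thash_entries=0. */
static uint64_t ehash_bytes(uint64_t mem_bytes, int page_shift, int scale_shift)
{
    const uint64_t bucketsize = 16;     /* value quoted in the commit message */
    uint64_t pages = mem_bytes >> page_shift;
    int scale = (pages >= 128 * 1024) ? 25 - scale_shift : 27 - scale_shift;
    uint64_t numentries = pages;

    /* Round up to a whole number of megabytes, expressed in pages. */
    numentries += (1ULL << (20 - page_shift)) - 1;
    numentries >>= 20 - page_shift;
    numentries <<= 20 - page_shift;

    /* One bucket per 2^scale bytes of memory. */
    if (scale > page_shift)
        numentries >>= scale - page_shift;
    else
        numentries <<= page_shift - scale;

    /* Round the bucket count down to a power of two. */
    return bucketsize << floor_log2(numentries);
}
```

With these assumptions, `ehash_bytes(1ULL << 30, 12, 12)` gives 1/512 of memory for a 1GB i386 system and `ehash_bytes(4ULL << 30, 14, 14)` gives 1/128 for a 4GB ia64 system, matching the figures in the message. Passing 12 as `scale_shift` for ia64 brings it down to the same 1/512 ratio as i386.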
  5. 11 Nov, 2005 2 commits
  6. 06 Nov, 2005 1 commit
  7. 06 Sep, 2005 1 commit
  8. 02 Sep, 2005 2 commits
  9. 30 Aug, 2005 17 commits
  10. 24 Aug, 2005 1 commit
    •
      [TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail(). · 89ebd197
      David S. Miller authored
      The intention of this bit is to force pushing of the existing
      send queue when the TCP_CORK or TCP_NODELAY state changes via
      setsockopt().
      
      But it's easy to create a situation where the bit never
      clears.  For example, if the send queue starts empty:
      
      1) set TCP_NODELAY
      2) clear TCP_NODELAY
      3) set TCP_CORK
      4) do small write()
      
      The current code will leave TCP_NAGLE_PUSH set after that
      sequence.  Unconditionally clearing the bit when new data
      is added via skb_entail() solves the problem.
      Signed-off-by: David S. Miller <davem@davemloft.net>
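The four-step sequence from the commit above corresponds to the following user-space calls. This is only a sketch of the trigger: TCP_NAGLE_PUSH is internal kernel state and cannot be read back from user space, so the code shows how the sequence is issued rather than how the stale bit is detected. `set_tcp_opt()`, `nagle_push_sequence()` and `loopback_tcp_pair()` are hypothetical helper names, and the loopback pair exists only to provide a connected socket with an empty send queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set or clear an integer TCP-level socket option. */
static int set_tcp_opt(int fd, int opt, int on)
{
    return setsockopt(fd, IPPROTO_TCP, opt, &on, sizeof(on));
}

/* Steps 1-4 on a connected TCP socket; returns 0 on success, -1 on error. */
static int nagle_push_sequence(int fd)
{
    if (set_tcp_opt(fd, TCP_NODELAY, 1) < 0)   /* 1) set TCP_NODELAY */
        return -1;
    if (set_tcp_opt(fd, TCP_NODELAY, 0) < 0)   /* 2) clear TCP_NODELAY */
        return -1;
    if (set_tcp_opt(fd, TCP_CORK, 1) < 0)      /* 3) set TCP_CORK */
        return -1;
    return write(fd, "x", 1) == 1 ? 0 : -1;    /* 4) do small write() */
}

/* Create a connected TCP pair over 127.0.0.1; returns 0 on success. */
static int loopback_tcp_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* let the kernel pick a port */
    fds[0] = -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        (fds[0] = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (fds[1] = accept(lfd, NULL, NULL)) < 0) {
        if (fds[0] >= 0)
            close(fds[0]);
        close(lfd);
        return -1;
    }
    close(lfd);
    return 0;
}
```

Calling `loopback_tcp_pair(fds)` and then `nagle_push_sequence(fds[0])` issues the sequence on a fresh connection.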
  11. 09 Jul, 2005 1 commit
  12. 06 Jul, 2005 3 commits
    •
      [TCP]: Move to new TSO segmenting scheme. · c1b4a7e6
      David S. Miller authored
      Make TSO segment transmit size decisions at send time, not earlier.
      
      The basic scheme is that we try to build as large a TSO frame as
      possible when pulling in the user data, but the size of the TSO frame
      output to the card is determined at transmit time.
      
      This is guided by tp->xmit_size_goal.  It is always set to a multiple
      of MSS and tells sendmsg/sendpage how large an SKB to try and build.
      
      Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
      necessary and conditions warrant.  These routines can also decide to
      "defer" in order to wait for more ACKs to arrive and thus allow larger
      TSO frames to be emitted.
      
      A general observation is that TSO elongates the pipe, thus requiring a
      larger congestion window and larger buffering especially at the sender
      side.  Therefore, it is important that applications 1) get a large
      enough socket send buffer (this is accomplished by our dynamic send
      buffer expansion code) 2) do large enough writes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
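The commit above says tp->xmit_size_goal is always a multiple of MSS. One way to keep a size goal MSS-aligned is to round an upper limit down to the nearest multiple, as sketched below. `size_goal()` and its `limit` parameter are hypothetical; how the kernel actually derives the limit is not described in the message, and returning one full segment when the limit is smaller than the MSS is an assumption made for this illustration.

```c
/*
 * Largest multiple of mss that fits in limit.
 * Assumption: never go below one full segment.
 */
static unsigned int size_goal(unsigned int limit, unsigned int mss)
{
    if (mss == 0)
        return 0;                 /* no valid segment size */
    if (limit < mss)
        return mss;               /* at least one full segment */
    return limit - limit % mss;   /* round down to a multiple of mss */
}
```

For example, with a limit of 65535 bytes and an MSS of 1448, the goal is 45 segments, or 65160 bytes.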
    •
      [TCP]: Fix send-side cpu utilization regression. · b4e26f5e
      David S. Miller authored
      Only put user data purely to pages when doing TSO.
      
      The extra page allocations cause two problems:
      
      1) Add the overhead of the page allocations themselves.
      2) Make us do small user copies when we get to the end
         of the TCP socket cache page.
      
      It is still beneficial to purely use pages for TSO,
      so we will do it for that case.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      [TCP]: Simplify SKB data portion allocation with NETIF_F_SG. · c65f7f00
      David S. Miller authored
      The ideal layout for an SKB when doing
      scatter-gather is to put all the headers at skb->data, and
      all the user data in the page array.
      
      This makes SKB splitting and combining extremely simple,
      especially before a packet goes onto the wire the first
      time.
      
      So, when sk_stream_alloc_pskb() is given a zero size, make
      sure there is no skb_tailroom().  This is achieved by applying
      SKB_DATA_ALIGN() to the header length used here.
      
      Next, make select_size() in TCP output segmentation use a
      length of zero when NETIF_F_SG is true on the outgoing
      interface.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 24 Jun, 2005 2 commits
  14. 19 Jun, 2005 2 commits