1. 29 Jan, 2008 13 commits
    • I
      [TCP]: Uninline tcp_is_cwnd_limited · cea14e0e
      Ilpo Järvinen committed
      net/ipv4/tcp_cong.c:
        tcp_reno_cong_avoid |  -65
       1 function changed, 65 bytes removed, diff: -65
      
      net/ipv4/arp.c:
        arp_ignore |   -5
       1 function changed, 5 bytes removed, diff: -5
      
      net/ipv4/tcp_bic.c:
        bictcp_cong_avoid |  -57
       1 function changed, 57 bytes removed, diff: -57
      
      net/ipv4/tcp_cubic.c:
        bictcp_cong_avoid |  -61
       1 function changed, 61 bytes removed, diff: -61
      
      net/ipv4/tcp_highspeed.c:
        hstcp_cong_avoid |  -63
       1 function changed, 63 bytes removed, diff: -63
      
      net/ipv4/tcp_hybla.c:
        hybla_cong_avoid |  -85
       1 function changed, 85 bytes removed, diff: -85
      
      net/ipv4/tcp_htcp.c:
        htcp_cong_avoid |  -57
       1 function changed, 57 bytes removed, diff: -57
      
      net/ipv4/tcp_veno.c:
        tcp_veno_cong_avoid |  -52
       1 function changed, 52 bytes removed, diff: -52
      
      net/ipv4/tcp_scalable.c:
        tcp_scalable_cong_avoid |  -61
       1 function changed, 61 bytes removed, diff: -61
      
      net/ipv4/tcp_yeah.c:
        tcp_yeah_cong_avoid |  -75
       1 function changed, 75 bytes removed, diff: -75
      
      net/ipv4/tcp_illinois.c:
        tcp_illinois_cong_avoid |  -54
       1 function changed, 54 bytes removed, diff: -54
      
      net/dccp/ccids/ccid3.c:
        ccid3_update_send_interval |   -7
        ccid3_hc_tx_packet_recv    |   +7
       2 functions changed, 7 bytes added, 7 bytes removed, diff: +0
      
      net/ipv4/tcp_cong.c:
        tcp_is_cwnd_limited |  +88
       1 function changed, 88 bytes added, diff: +88
      
      built-in.o:
       14 functions changed, 95 bytes added, 642 bytes removed, diff: -547
      
      ...Again some gcc artifacts visible as well.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cea14e0e
    • I
      [TCP]: Uninline tcp_set_state · 490d5046
      Ilpo Järvinen committed
      net/ipv4/tcp.c:
        tcp_close_state | -226
        tcp_done        | -145
        tcp_close       | -564
        tcp_disconnect  | -141
       4 functions changed, 1076 bytes removed, diff: -1076
      
      net/ipv4/tcp_input.c:
        tcp_fin               |  -86
        tcp_rcv_state_process | -164
       2 functions changed, 250 bytes removed, diff: -250
      
      net/ipv4/tcp_ipv4.c:
        tcp_v4_connect | -209
       1 function changed, 209 bytes removed, diff: -209
      
      net/ipv4/arp.c:
        arp_ignore |   +5
       1 function changed, 5 bytes added, diff: +5
      
      net/ipv6/tcp_ipv6.c:
        tcp_v6_connect | -158
       1 function changed, 158 bytes removed, diff: -158
      
      net/sunrpc/xprtsock.c:
        xs_sendpages |   -2
       1 function changed, 2 bytes removed, diff: -2
      
      net/dccp/ccids/ccid3.c:
        ccid3_update_send_interval |   +7
       1 function changed, 7 bytes added, diff: +7
      
      net/ipv4/tcp.c:
        tcp_set_state | +238
       1 function changed, 238 bytes added, diff: +238
      
      built-in.o:
       12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445
      
      I've no explanation why some unrelated changes seem to occur
      consistently as well (arp_ignore, ccid3_update_send_interval;
      I checked the arp_ignore asm and it seems to be due to some
      reordering of operations causing some extra opcodes to be
      generated). Still, the benefits are pretty obvious from the
      codiff results.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      490d5046
    • I
      [TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary · 4828e7f4
      Ilpo Järvinen committed
      The snd_up check should be enough. I suspect this has been
      there to provide a minor optimization in clean_rtx_queue which
      used to have a small if (!->sacked) block which could skip
      snd_up check among the other work.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4828e7f4
    • H
      [NET] CORE: Introducing new memory accounting interface. · 3ab224be
      Hideo Aoki committed
      This patch introduces new memory accounting functions for each network
      protocol. Most of them are renamed from memory accounting functions
      for stream protocols. At the same time, some stream memory accounting
      functions are removed since other functions do the same thing.
      
      Renaming:
      	sk_stream_free_skb()		->	sk_wmem_free_skb()
      	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
      	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
      	sk_stream_mem_schedule()	->	__sk_mem_schedule()
      	sk_stream_pages()      		->	sk_mem_pages()
      	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
      	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
      	sk_charge_skb()			->	sk_mem_charge()
      
      Removing:
      	sk_stream_rfree():	consolidates into sock_rfree()
      	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
      	sk_stream_mem_schedule()
      
      The following functions are added.
          	sk_has_account(): check if the protocol supports accounting
      	sk_mem_uncharge(): do the opposite of sk_mem_charge()
      
      In addition, to achieve consolidation, updating sk_wmem_queued is
      removed from sk_mem_charge().
      
      Next, to consolidate memory accounting functions, this patch adds
      memory accounting calls to network core functions. Moreover, present
      memory accounting call is renamed to new accounting call.
      
      Finally we replace present memory accounting calls with new interface
      in TCP and SCTP.
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ab224be
    • Y
      [TCP]: Convert several length variable to unsigned. · 9cb5734e
      YOSHIFUJI Hideaki committed
      Several length variables cannot be negative, so convert int to
      unsigned int.  This also allows us to do sane shift operations
      on those variables.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cb5734e
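The remark about "sane shift operations" can be illustrated with a small standalone sketch. This is not code from the commit: `half_len` and `round_down_pow2` are hypothetical helpers. In C, right-shifting a negative signed value is implementation-defined and left-shifting one is undefined, whereas shifts on an unsigned length are fully defined as long as the shift count is smaller than the type width.

```c
/* Hypothetical helpers, not taken from the kernel. With unsigned
 * operands the results below are fully defined by the C standard. */

/* Halve a length with a right shift. */
static unsigned int half_len(unsigned int len)
{
	return len >> 1;
}

/* Round a length down to a multiple of 2^order using shifts only.
 * Assumes order is smaller than the bit width of unsigned int. */
static unsigned int round_down_pow2(unsigned int len, unsigned int order)
{
	return (len >> order) << order;
}
```

For example, `round_down_pow2(1500, 3)` rounds 1500 down to the nearest multiple of 8.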
    • I
      [TCP]: Abstract tp->highest_sack accessing & point to next skb · 6859d494
      Ilpo Järvinen committed
      Pointing to the next skb is necessary to avoid referencing
      already SACKed skbs which will soon be on a separate list.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6859d494
    • I
      [TCP]: Move FRTO checks out from write queue abstraction funcs · 8512430e
      Ilpo Järvinen committed
      Better place exists in update_send_head (other non-queue related
      adjustments are done there as well) which is the only caller of
      tcp_advance_send_head (now that the bogus call from mtu_probe is
      gone).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8512430e
    • I
      [TCP]: Rewrite SACK block processing & sack_recv_cache use · 68f8353b
      Ilpo Järvinen committed
      Key points of this patch are:
      
        - In case new SACK information is advance only type, no skb
          processing below previously discovered highest point is done
        - Optimize cases below highest point too since there's no need
          to always go up to highest point (which is very likely still
          present in that SACK), this is not entirely true though
          because I'm dropping the fastpath_skb_hint which could
          previously optimize those cases even better. Whether that's
          significant, I'm not too sure.
      
      Currently it will provide skipping by walking. Combined with
      RB-tree, all skipping would become fast too regardless of window
      size (can be done incrementally later).
      
      Previously a number of cases in TCP SACK processing failed to
      take advantage of the costly stored information in sack_recv_cache,
      most importantly expected events such as cumulative ACKs and new
      hole ACKs. Processing such ACKs results in rather long walks
      that build up latencies (which easily get nasty when the window is
      huge). Those latencies are often completely unnecessary
      compared with the amount of _new_ information received. Usually
      for a cumulative ACK there's no new information at all, yet TCP
      walks the whole queue unnecessarily, potentially taking a number of
      costly cache misses on the way.
      
      Since the inclusion of highest_sack, there's a lot of information
      that is very likely redundant (SACK fastpath hint stuff,
      fackets_out, highest_sack), though there's no ultimate guarantee
      that they'll remain the same the whole time (in all unearthly
      scenarios). Take advantage of this knowledge here and drop
      the fastpath hint and use direct access to the highest SACKed skb as
      a replacement.
      
      Effectively the "special cased" fastpath is dropped. This change
      adds some complexity to introduce a "fastpath" with better coverage,
      though the added complexity should make TCP behave in a more
      cache-friendly way.
      
      The current ACK's SACK blocks are compared against each cached
      block individually and only ranges that are new are then scanned
      by the high constant walk. For other parts of write queue, even
      when in previously known part of the SACK blocks, a faster skip
      function is used (if necessary at all). In addition, whenever
      possible, TCP fast-forwards to highest_sack skb that was made
      available by an earlier patch. In typical case, no other things
      but this fast-forward and mandatory markings after that occur
      making the access pattern quite similar to the former fastpath
      "special case".
      
      DSACKs are special case that must always be walked.
      
      The local to recv_sack_cache copying could be more intelligent
      w.r.t DSACKs which are likely to be there only once but that
      is left to a separate patch.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68f8353b
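The idea of comparing the current ACK's SACK blocks against cached blocks, so that only new ranges need the expensive walk, can be sketched in isolation. The following is a simplified illustration and not the patch's actual code: `struct sack_block`, `struct seq_range`, `seq_before` and `new_ranges` are hypothetical names, blocks are treated as half-open `[start, end)` intervals, and only one cached block is considered at a time. Sequence numbers are compared with wraparound-safe arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types: a SACK block covers [start_seq, end_seq). */
struct sack_block { uint32_t start_seq, end_seq; };
struct seq_range  { uint32_t start, end; };

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Store the parts of 'cur' not covered by 'cached' in out[] and return
 * how many ranges (0, 1 or 2) were written. Assumes 'cur' is non-empty. */
static int new_ranges(const struct sack_block *cur,
		      const struct sack_block *cached,
		      struct seq_range out[2])
{
	int n = 0;

	/* No overlap at all: the whole block is new information. */
	if (!seq_before(cur->start_seq, cached->end_seq) ||
	    !seq_before(cached->start_seq, cur->end_seq)) {
		out[n].start = cur->start_seq;
		out[n].end = cur->end_seq;
		return n + 1;
	}

	/* New part below the cached block. */
	if (seq_before(cur->start_seq, cached->start_seq)) {
		out[n].start = cur->start_seq;
		out[n].end = cached->start_seq;
		n++;
	}

	/* New part above the cached block (the "advance only" case). */
	if (seq_before(cached->end_seq, cur->end_seq)) {
		out[n].start = cached->end_seq;
		out[n].end = cur->end_seq;
		n++;
	}

	return n;
}
```

When a block only grows at its upper edge, a single short range comes back, which corresponds to the case the commit message describes as needing no processing below the previously discovered highest point.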
    • I
      [TCP]: Convert highest_sack to sk_buff to allow direct access · a47e5a98
      Ilpo Järvinen committed
      It is going to replace the sack fastpath hint quite soon... :-)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a47e5a98
    • J
      [TCP]: Splice receive support. · 9c55e01c
      Jens Axboe committed
      Support for network splice receive.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c55e01c
  2. 20 Nov, 2007 1 commit
  3. 24 Oct, 2007 1 commit
  4. 11 Oct, 2007 13 commits
  5. 29 Sep, 2007 1 commit
    • D
      [TCP]: Fix MD5 signature handling on big-endian. · f8ab18d2
      David S. Miller committed
      Based upon a report and initial patch by Peter Lieven.
      
      tcp4_md5sig_key and tcp6_md5sig_key need to start with
      the exact same members as tcp_md5sig_key, because they
      are both cast to that type by tcp_v{4,6}_md5_do_lookup().
      
      Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
      length instead of a u8, which is what tcp_md5sig_key
      uses.  This just so happens to work by accident on
      little-endian, but on big-endian it doesn't.
      
      Instead of casting, just place tcp_md5sig_key as the first member of
      the address-family specific structures, adjust the access sites, and
      kill off the ugly casts.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8ab18d2
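A minimal sketch of the layout problem and of the embedding fix, using simplified stand-in structures rather than the real kernel definitions (the names `md5sig_key`, `md5sig_key4`, `key4_base` and `first_byte_of_u16` are illustrative assumptions). Reading a 16-bit length through an 8-bit field only sees the first byte in memory, which is the low byte on little-endian and the high byte on big-endian. Embedding the generic structure as the first member removes the need for the cast entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the generic key structure. */
struct md5sig_key {
	uint8_t *key;
	uint8_t  keylen;
};

/* Address-family specific key with the generic part embedded as the
 * first member, so both views share one definition of the layout. */
struct md5sig_key4 {
	struct md5sig_key base;
	uint32_t addr;
};

/* Access the generic part without any cast. */
static const struct md5sig_key *key4_base(const struct md5sig_key4 *k)
{
	return &k->base;
}

/* What a u8 read of a u16 field observes: the first byte in memory.
 * On little-endian that is the low byte (a small length survives), on
 * big-endian it is the high byte (a small length reads as zero). */
static uint8_t first_byte_of_u16(uint16_t v)
{
	uint8_t b;

	memcpy(&b, &v, 1);
	return b;
}
```

With this arrangement a lookup routine can return `&k->base` for either address family, and a change to the generic structure cannot silently drift out of sync with the family-specific ones.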
  6. 03 Aug, 2007 1 commit
    • D
      [TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). · 3516ffb0
      David S. Miller committed
      As discovered by Evegniy Polyakov, if we try to sendmsg after
      a connection reset, we can do incredibly stupid things.
      
      The core issue is that inet_sendmsg() tries to autobind the
      socket, but we should never do that for TCP.  Instead we should
      just go straight into TCP's sendmsg() code which will do all
      of the necessary state and pending socket error checks.
      
      TCP's sendpage already directly vectors to tcp_sendpage(), so this
      merely brings sendmsg() in line with that.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3516ffb0
  7. 31 Jul, 2007 1 commit
  8. 18 Jul, 2007 1 commit
  9. 31 May, 2007 1 commit
  10. 03 May, 2007 1 commit
  11. 30 Apr, 2007 2 commits
    • I
      [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent · d551e454
      Ilpo Järvinen committed
      This is a corner case where less than MSS sized new data thingie
      is awaiting in the send queue. For F-RTO to work correctly, a
      new data segment must be sent at certain point or F-RTO cannot
      be used at all. RFC4138 allows overriding of Nagle at that
      point.
      
      Implementation uses frto_counter states 2 and 3 to distinguish
      when Nagle override is needed.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d551e454
    • I
      [TCP]: Catch skb with S+L bugs earlier · 34588b4c
      Ilpo Järvinen committed
      SACKED_ACKED and LOST are mutually exclusive with SACK, thus
      having their sum larger than packets_out is a bug with SACK.
      Eventually these bugs trigger traps in tcp_clean_rtx_queue
      with SACK, but it's much more informative to do this here.
      
      Non-SACK TCP, however, could get more than packets_out duplicate
      ACKs which each increment sacked_out, so it makes sense to do
      this kind of limiting for non-SACK TCP but not for SACK enabled
      one. Perhaps the author had the opposite in mind but got the
      logic accidentally the wrong way around? Anyway, the sacked_out
      incrementer code for non-SACK already deals with this issue before
      calling sync_left_out, so this trapping can be done
      unconditionally.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34588b4c
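The invariant described above (S+L must not exceed packets_out) can be written down as a tiny standalone check. This is a sketch under stated assumptions, not the kernel's code: `struct tcp_counters` and `sl_counters_sane` are hypothetical names that simply mirror the counters mentioned in the commit message.

```c
#include <stdbool.h>

/* Hypothetical, simplified counters named after those in the commit
 * message; the real fields live in the kernel's TCP socket state. */
struct tcp_counters {
	unsigned int packets_out; /* segments currently outstanding */
	unsigned int sacked_out;  /* segments marked SACKED_ACKED (S) */
	unsigned int lost_out;    /* segments marked LOST (L) */
};

/* A segment cannot be both S and L, so S + L can never legitimately
 * exceed the number of outstanding segments. */
static bool sl_counters_sane(const struct tcp_counters *c)
{
	return c->sacked_out + c->lost_out <= c->packets_out;
}
```

Checking this right where the counters are synchronized reports the inconsistency close to its cause, rather than later when the retransmit queue is cleaned.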
  12. 26 Apr, 2007 4 commits
    • S
      [TCP]: Congestion control API update. · 164891aa
      Stephen Hemminger committed
      Do some simple changes to make congestion control API faster/cleaner.
      * use ktime_t rather than timeval
      * merge rtt sampling into existing ack callback
        this means one indirect call versus two per ack.
      * use flags bits to store options/settings
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      164891aa
    • I
      [TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...) · 9e412ba7
      Ilpo Järvinen committed
      This is (mostly) automated change using magic:
      
      sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
          -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
          -e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp\([^{]*\n{\n\)|
      	  struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g'
          -e 's|struct sock \*sk, struct tcp_sock \*tp|
      	  struct sock \*sk|g' -e 's|sk, tp\([^-]\)|sk\1|g'
      
      Fixed four unused variable (tp) warnings that were introduced.
      
      In addition, manually added newlines after local variables and
      tweaked function arguments positioning.
      
      $ gcc --version
      gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
      ...
      $ codiff -fV built-in.o.old built-in.o.new
      net/ipv4/route.c:
        rt_cache_flush |  +14
       1 function changed, 14 bytes added
      
      net/ipv4/tcp.c:
        tcp_setsockopt |   -5
        tcp_sendpage   |  -25
        tcp_sendmsg    |  -16
       3 functions changed, 46 bytes removed
      
      net/ipv4/tcp_input.c:
        tcp_try_undo_recovery |   +3
        tcp_try_undo_dsack    |   +2
        tcp_mark_head_lost    |  -12
        tcp_ack               |  -15
        tcp_event_data_recv   |  -32
        tcp_rcv_state_process |  -10
        tcp_rcv_established   |   +1
       7 functions changed, 6 bytes added, 69 bytes removed, diff: -63
      
      net/ipv4/tcp_output.c:
        update_send_head          |   -9
        tcp_transmit_skb          |  +19
        tcp_cwnd_validate         |   +1
        tcp_write_wakeup          |  -17
        __tcp_push_pending_frames |  -25
        tcp_push_one              |   -8
        tcp_send_fin              |   -4
       7 functions changed, 20 bytes added, 63 bytes removed, diff: -43
      
      built-in.o.new:
       18 functions changed, 40 bytes added, 178 bytes removed, diff: -138
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e412ba7
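The effect of the conversion on a single function can be sketched with simplified stand-in types. The structures below are not the kernel's real `struct sock` and `struct tcp_sock`, and `cwnd_old` and `cwnd_new` are hypothetical functions; only the pattern (drop the `tp` parameter and derive it from `sk` via `tcp_sk()` as a local variable) follows the commit message.

```c
/* Simplified stand-ins: the TCP socket embeds the generic socket as
 * its first member, so a pointer to one can be converted to the other. */
struct sock {
	int sk_state;
};

struct tcp_sock {
	struct sock sk;        /* must stay the first member */
	unsigned int snd_cwnd;
};

/* Stand-in for the kernel helper of the same name. */
static struct tcp_sock *tcp_sk(struct sock *sk)
{
	return (struct tcp_sock *)sk;
}

/* Before: callers pass both views of the same socket. */
static unsigned int cwnd_old(struct sock *sk, struct tcp_sock *tp)
{
	(void)sk;
	return tp->snd_cwnd;
}

/* After: only sk is passed, tp is derived locally. */
static unsigned int cwnd_new(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);

	return tp->snd_cwnd;
}
```

Passing one argument fewer at every call site is a plausible source of the small per-function size reductions in the codiff output above.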
    • A
      [TCP]: Uninline tcp_done(). · 4ac02bab
      Andi Kleen committed
      The function is quite big and has several call sites and nothing
      to collapse by compiler optimization on inlining.
      
      Besides, it's nicer to read in a .c file.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ac02bab
    • H
      [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY · 60476372
      Herbert Xu committed
      When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
      maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
      treat it as such in the stack.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60476372