1. 24 8月, 2010 8 次提交
    • E
      net: copy_rtnl_link_stats64() simplification · afdcba37
      Eric Dumazet 提交于
      No need to use a temporary struct rtnl_link_stats64 variable,
      just copy the source to skb buffer.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afdcba37
    • C
      net_sched: act_csum: coding style cleanup · 0eec32ff
      Changli Gao 提交于
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eec32ff
    • D
      pkt_sched: Make act_csum depend upon INET. · 7abac686
      David S. Miller 提交于
      It uses ip_send_check() and stuff like that.
      Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7abac686
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa
      Gerrit Renker 提交于
      The current CCID-2 RTT estimator code is in parts broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
      
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code.
      
      Further details:
      ----------------
       1. The minimum RTO of previously one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4).
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
          RFC2988, (2.5).
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:
      ------------------------------------------------------
      Some time was spent to find a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07), 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      231cc2aa
    • G
      dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · c38c92a8
      Gerrit Renker 提交于
      This removes the dec_pipe function and improves the way the RTO timer is rearmed
      when a new acknowledgment comes in.
      
      Details and justification for removal:
      --------------------------------------
       1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
          history entries between tail and head, for which it had previously been
          incremented in tx_packet_sent; and it is not decremented twice for the same
          entry, since it is
          - either decremented when a corresponding Ack Vector cell in state 0 or 1
            was received (and then ccid2s_acked==1),
          - or it is decremented when ccid2s_acked==0, as part of the loss detection
            in tx_packet_recv (and hence it can not have been decremented earlier).
      
       2) Restarting the RTO timer happens for every single entry in each Ack Vector
          parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
          16192 times per Ack Vector).
      
       3) The RTO timer should not be restarted when all outstanding data has been
          acknowledged. This is currently done similar to (2), in dec_pipe, when
          pipe has reached 0.
      
      The patch onsolidates the code which rearms the RTO timer, combining the
      segments from new_ack and dec_pipe. As a result, the code becomes clearer
      (compare with tcp_rearm_rto()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c38c92a8
    • G
      dccp ccid-2: Remove redundant sanity tests · 30564e35
      Gerrit Renker 提交于
      This removes the ccid2_hc_tx_check_sanity function: it is redundant.
      
      Details:
      
      The tx_check_sanity function performs three tests:
       1) it checks that the circular TX list is sorted
          - in ascending order of sequence number (ccid2s_seq)
          - and time (ccid2s_sent),
          - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
       2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
       3) it ensures that pipe equals the number of packets that were not
          marked `acked' (ccid2s_acked) between `tail' and `head'.
      
      The following argues that each of these tests is redundant, this can be verified
      by going through the code.
      
      (1) is not necessary, since both time and GSS increase from one packet to the
      next, so that subsequent insertions in tx_packet_sent (which advance the `head'
      pointer) will be in ascending order of time and sequence number.
      
      In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
      (set to 1024) unless allocation caused an earlier failure, because:
       * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
       * subsequent calls to tx_alloc_seq take place whenever head->next == tail in
         tx_packet_sent; then a new chunk of size 1024 is inserted between head and
         tail, and seqbufc is incremented by one.
      
      To show that (3) is redundant requires looking at two cases.
      
      The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
      decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
      should be 0, which is the case directly after initialisation and after a
      retransmission timeout has occurred (ccid2_hc_tx_rto_expire).
      
      The first case involves parsing Ack Vectors for packets recorded in the live
      portion of the buffer, between tail and head. For each packet marked by the
      receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
      one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.
      
      The second case is the loss detection in the second half of tx_packet_recv,
      below the comment "Check for NUMDUPACK".
      
      The first while-loop here ensures that the sequence number of `seqp' is either
      above or equal to `high_ack', or otherwise equal to the highest sequence number
      sent so far (of the entry head->prev, as head points to the next unsent entry).
      The next while-loop ("while (1)") counts the number of acked packets starting
      from that position of seqp, going backwards in the direction from head->prev to
      tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
      to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
      is entered next.
      The while-loop contained within that block in turn traverses the list backwards,
      from head to tail; the position of `seqp' is saved in the variable `last_acked'.
      For each packet not marked as `acked', a congestion event is triggered within
      the loop, and pipe is decremented. The loop terminates when `seqp' has reached
      `tail', whereupon tail is set to the position previously stored in `last_acked'.
      Thus, between `last_acked' and the previous position of `tail',
       - pipe has been decremented earlier if the packet was marked as state 0 or 1;
       - pipe was decremented if the packet was not marked as acked.
      That is, pipe has been decremented by the number of packets between `last_acked'
      and the previous position of `tail'. As a consequence, pipe now again reflects
      the number of packets which have not (yet) been acked between the new position
      of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
      the BUG_ON condition in check_sanity will also not be triggered, hence the test
      (3) is also redundant.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30564e35
    • G
      dccp ccid-3: No more CCID control blocks in LISTEN state · 51c22bb5
      Gerrit Renker 提交于
      The CCIDs are activated as last of the features, at the end of the handshake,
      were the LISTEN state of the master socket is inherited into the server
      state of the child socket. Thus, the only states visible to CCIDs now are
      OPEN/PARTOPEN, and the closing states.
      
      This allows to remove tests which were previously necessary to protect
      against referencing a socket in the listening state (in CCID-3), but which
      now have become redundant.
      
      As a further byproduct of enabling the CCIDs only after the connection has been
      fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
      can now be eliminated:
       * the CCID is loaded, so it is not necessary to test if it is NULL,
       * if it is possible to load a CCID and leave the private area NULL, then this
          is a bug, which should crash loudly - and earlier,
       * the test for state==OPEN || state==PARTOPEN now reduces only to the closing
         phase (e.g. when the node has received an unexpected Reset).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51c22bb5
    • G
      ccid: ccid-2/3 code cosmetics · 67b67e36
      Gerrit Renker 提交于
      This patch collects cosmetics-only changes to separate these from
      code changes:
       * update with regard to CodingStyle and whitespace changes,
       * documentation:
         - adding/revising comments,
         - remove CCID-3 RX socket documentation which is either
           duplicate or refers to fields that no longer exist,
       * expand embedded tfrc_tx_info struct inline for consistency,
         removing indirections via #define.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67b67e36
  2. 23 8月, 2010 5 次提交
    • D
      net: Rename skb_has_frags to skb_has_frag_list · 21dc3301
      David S. Miller 提交于
      SKBs can be "fragmented" in two ways, via a page array (called
      skb_shinfo(skb)->frags[]) and via a list of SKBs (called
      skb_shinfo(skb)->frag_list).
      
      Since skb_has_frags() tests the latter, it's name is confusing
      since it sounds more like it's testing the former.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21dc3301
    • H
      tcp: allow effective reduction of TCP's rcv-buffer via setsockopt · e88c64f0
      Hagen Paul Pfeifer 提交于
      Via setsockopt it is possible to reduce the socket RX buffer
      (SO_RCVBUF). TCP method to select the initial window and window scaling
      option in tcp_select_initial_window() currently misbehaves and do not
      consider a reduced RX socket buffer via setsockopt.
      
      Even though the server's RX buffer is reduced via setsockopt() to 256
      byte (Initial Window 384 byte => 256 * 2 - (256 * 2 / 4)) the window
      scale option is still 7:
      
      192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0
      78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0
      192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0
      
      Within tcp_select_initial_window() the original space argument - a
      representation of the rx buffer size - is expanded during
      tcp_select_initial_window(). Only sysctl_tcp_rmem[2], sysctl_rmem_max
      and window_clamp are considered to calculate the initial window.
      
      This patch adjust the window_clamp argument if the user explicitly
      reduce the receive buffer.
      Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e88c64f0
    • S
      bridge: is PACKET_LOOPBACK unlikely()? · c2368e79
      Simon Horman 提交于
      While looking at using netdev_rx_handler_register for openvswitch Jesse
      Gross suggested that an unlikely() might be worthwhile in that code.
      I'm interested to see if its appropriate for the bridge code.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2368e79
    • C
      net: 802.1q: make vlan_hwaccel_do_receive() return void · 05532121
      Changli Gao 提交于
      vlan_hwaccel_do_receive() always returns 0, so make it return void.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05532121
    • S
      net/sched: need to include net/ip6_checksum.h · 2436243a
      Stephen Rothwell 提交于
      for the declararion of csum_ipv6_magic.
      
      Fixes this build error on PowerPC (at least):
      
      net/sched/act_csum.c: In function 'tcf_csum_ipv6_icmp':
      net/sched/act_csum.c:178: error: implicit declaration of function 'csum_ipv6_magic'
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2436243a
  3. 22 8月, 2010 4 次提交
  4. 20 8月, 2010 9 次提交
  5. 19 8月, 2010 9 次提交
  6. 18 8月, 2010 5 次提交