1. 23 Jan 2012, 5 commits
    • net: introduce res_counter_charge_nofail() for socket allocations · 0e90b31f
      Authored by Glauber Costa
      There is a case in __sk_mem_schedule() where an allocation is
      beyond the maximum, yet we are allowed to proceed. It happens
      under the following condition:
      
      	sk->sk_wmem_queued + size >= sk->sk_sndbuf
      
      The network code won't revert the allocation in this case; it will
      uncharge it at some later point. Since this is never communicated
      to the underlying res_counter code, the charge and uncharge
      operations become imbalanced.
      
      I see two ways of fixing this:
      
      1) storing the information about those allocations somewhere
         in memcg, and then deducting from that first, before
         we start draining the res_counter,
      2) providing a slightly different allocation function for
         the res_counter, that matches the original behavior of
         the network code more closely.
      
      I decided to go for #2 here, believing it to be more elegant,
      since #1 would amount to doing the same thing in a more obscure
      way.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Laurent Chavey <chavey@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
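
      A minimal user-space sketch of the idea (simplified types and names,
      not the actual kernel API): a "nofail" charge always records the
      usage, even past the limit, so the eventual uncharge stays balanced;
      the caller learns only through the return value that the limit was
      exceeded.

          #include <stdio.h>

          struct res_counter {
                  unsigned long usage;
                  unsigned long limit;
          };

          /* Charge unconditionally; return -1 if the limit was exceeded. */
          static int res_counter_charge_nofail(struct res_counter *c,
                                               unsigned long val)
          {
                  c->usage += val;  /* always account, so charge/uncharge stay paired */
                  return c->usage > c->limit ? -1 : 0;
          }

          static void res_counter_uncharge(struct res_counter *c, unsigned long val)
          {
                  c->usage -= val;  /* cannot underflow: the charge was recorded */
          }

          int main(void)
          {
                  struct res_counter rc = { .usage = 90, .limit = 100 };

                  if (res_counter_charge_nofail(&rc, 20))
                          printf("over limit, proceeding anyway (usage=%lu)\n",
                                 rc.usage);
                  res_counter_uncharge(&rc, 20);  /* balanced: usage is 90 again */
                  return 0;
          }
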
    • tcp: md5: using remote address for md5 lookup in rst packet · 8a622e71
      Authored by shawnlu
      An md5 key is added to a socket keyed by the remote address, so the
      remote address should also be used to find the md5 key when sending
      out a reset packet.
      Signed-off-by: shawnlu <shawn.lu@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
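
      A toy model of the lookup direction (hypothetical addresses and
      helper, not the kernel code): the incoming segment's source address
      is the remote peer, so the RST path must look up the key by saddr;
      looking up by daddr (our own address) finds nothing and the RST goes
      out unsigned.

          #include <stdio.h>
          #include <string.h>

          struct pkt { const char *saddr, *daddr; };

          /* A key was added for remote peer 10.0.0.2. */
          static const char *md5_lookup(const char *peer)
          {
                  return strcmp(peer, "10.0.0.2") == 0 ? "secret" : NULL;
          }

          int main(void)
          {
                  /* the incoming segment that triggers a RST reply */
                  struct pkt in = { .saddr = "10.0.0.2", .daddr = "10.0.0.1" };

                  printf("lookup by daddr (buggy): %s\n",
                         md5_lookup(in.daddr) ? "key found" : "no key");
                  printf("lookup by saddr (fixed): %s\n",
                         md5_lookup(in.saddr) ? "key found" : "no key");
                  return 0;
          }
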
    • pktgen: Fix unsigned function that is returning negative vals · bf0813bd
      Authored by Paul Gortmaker
      Every call to num_args() immediately checks whether the return value
      is less than zero, since it returns -EFAULT for a failed get_user()
      call. So it makes no sense for the function to be declared as
      returning an unsigned long.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
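
      A user-space illustration of this bug class (not the pktgen code
      itself): an error value such as -EFAULT stored in an unsigned long
      wraps to a huge positive number, so the caller's "< 0" check can
      never fire.

          #include <stdio.h>

          static unsigned long broken_num_args(void)
          {
                  return -14;  /* -EFAULT wraps to a huge positive value */
          }

          static long fixed_num_args(void)
          {
                  return -14;  /* signed: the caller's check works */
          }

          int main(void)
          {
                  if (broken_num_args() < 0)  /* always false; compilers warn */
                          puts("broken: error detected");
                  if (fixed_num_args() < 0)
                          puts("fixed: error detected");  /* prints */
                  return 0;
          }
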
    • tcp: detect loss above high_seq in recovery · 974c1236
      Authored by Yuchung Cheng
      Correctly implement a loss detection heuristic: New sequences (above
      high_seq) sent during the fast recovery are deemed lost when higher
      sequences are SACKed.
      
      Current code does not catch these losses because tcp_mark_head_lost()
      does not check packets beyond high_seq. The fix is straightforward:
      check packets up to the highest SACKed packet. In addition, all the
      FLAG_DATA_LOST logic is ineffective and redundant and can be removed.
      
      Update the loss heuristic comments. The algorithm above is documented
      as heuristic B, but it is redundant too because heuristic A already
      covers B.
      
      Note that this change only marks some forward-retransmitted packets
      LOST. It does NOT forbid TCP from performing further CWR on new
      losses. A potential follow-up patch under preparation is to perform
      another CWR on "new" losses such as
      1) a sequence above high_seq is lost (by resetting high_seq to snd_nxt)
      2) a retransmission is lost.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
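
      A toy model of the heuristic (hypothetical sequence numbers, not the
      kernel code): any un-SACKed segment below the highest SACKed sequence
      is deemed lost, including segments above high_seq that were first
      sent during fast recovery.

          #include <stdbool.h>
          #include <stdio.h>

          int main(void)
          {
                  bool sacked[10] = { [7] = true, [9] = true };  /* peer SACKed 7, 9 */
                  int highest_sacked = 9;
                  int high_seq = 5;  /* the old code stopped scanning here */

                  for (int seq = 0; seq < highest_sacked; seq++) {
                          if (!sacked[seq])
                                  printf("seq %d deemed lost%s\n", seq,
                                         seq >= high_seq
                                                 ? " (missed by the old code)"
                                                 : "");
                  }
                  return 0;
          }
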
    • netem: Fix off-by-one bug in reordering · a42b4799
      Authored by Vijay Subramanian
      With netem reordering, a gap of N is supposed to reorder every Nth
      packet with the given reorder probability. However, the code
      currently skips N packets and reorders every (N+1)th packet.
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
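
      A user-space sketch of the counter pattern (an assumed simplification
      of netem's gap handling, not its actual code): with gap N, every Nth
      packet should become a reorder candidate; comparing with ">" instead
      of ">=" skips N packets and fires on the (N+1)th.

          #include <stdio.h>

          int main(void)
          {
                  const int gap = 4;
                  int counter = 0;

                  for (int pkt = 1; pkt <= 12; pkt++) {
                          /* buggy: ++counter > gap   -> fires on packets 5, 10 */
                          /* fixed: ++counter >= gap  -> fires on packets 4, 8, 12 */
                          if (++counter >= gap) {
                                  printf("packet %d: reorder candidate\n", pkt);
                                  counter = 0;
                          }
                  }
                  return 0;
          }
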
  2. 21 Jan 2012, 2 commits
    • tcp: fix undo after RTO for CUBIC · 5a45f008
      Authored by Neal Cardwell
      This patch fixes CUBIC so that cwnd reductions made during RTOs can be
      undone (just as they already can be undone when using the default/Reno
      behavior).
      
      When undoing cwnd reductions, BIC-derived congestion control modules
      were restoring the cwnd from last_max_cwnd. There were two problems
      with using last_max_cwnd to restore a cwnd during undo:
      
      (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
      (by calling the module's reset() functions), so cwnd reductions from
      RTOs could not be undone.
      
      (b) when fast_convergence is enabled (which it is by default),
      last_max_cwnd does not actually hold the value of snd_cwnd before the
      loss; instead, it holds a scaled-down version of snd_cwnd.
      
      This patch makes the following changes:
      
      (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
      the existing comment notes, the "congestion window at last loss"
      
      (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
      
      (3) use ca->last_max_cwnd to check if we're in slow start
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Acked-by: Sangtae Ha <sangtae.ha@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix undo after RTO for BIC · fc16dcd8
      Authored by Neal Cardwell
      This patch fixes BIC so that cwnd reductions made during RTOs can be
      undone (just as they already can be undone when using the default/Reno
      behavior).
      
      When undoing cwnd reductions, BIC-derived congestion control modules
      were restoring the cwnd from last_max_cwnd. There were two problems
      with using last_max_cwnd to restore a cwnd during undo:
      
      (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
      (by calling the module's reset() functions), so cwnd reductions from
      RTOs could not be undone.
      
      (b) when fast_convergence is enabled (which it is by default),
      last_max_cwnd does not actually hold the value of snd_cwnd before the
      loss; instead, it holds a scaled-down version of snd_cwnd.
      
      This patch makes the following changes:
      
      (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
      the existing comment notes, the "congestion window at last loss"
      
      (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
      
      (3) use ca->last_max_cwnd to check if we're in slow start
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
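
      A simplified sketch of the undo logic described in the two entries
      above (struct and field names taken from the commit text; the body is
      a stand-in, not the kernel implementation): undo restores snd_cwnd
      from loss_cwnd, the congestion window at the last loss, rather than
      from last_max_cwnd, which is zeroed on TCP_CA_Loss and scaled down
      under fast_convergence.

          #include <stdio.h>

          struct bictcp {
                  unsigned int last_max_cwnd;  /* unreliable for undo */
                  unsigned int loss_cwnd;      /* cwnd at last loss: use this */
          };

          static unsigned int undo_cwnd(const struct bictcp *ca,
                                        unsigned int snd_cwnd)
          {
                  return snd_cwnd > ca->loss_cwnd ? snd_cwnd : ca->loss_cwnd;
          }

          int main(void)
          {
                  /* after an RTO, last_max_cwnd was reset to 0 */
                  struct bictcp ca = { .last_max_cwnd = 0, .loss_cwnd = 40 };

                  printf("cwnd after undo: %u\n", undo_cwnd(&ca, 10));  /* 40 */
                  return 0;
          }
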
  3. 19 Jan 2012, 4 commits
  4. 18 Jan 2012, 5 commits
  5. 17 Jan 2012, 13 commits
  6. 16 Jan 2012, 2 commits
  7. 14 Jan 2012, 2 commits
  8. 13 Jan 2012, 4 commits
    • RDS: Remove some unused iWARP code · 5b7bf42e
      Authored by Roland Dreier
      rds_iw_flush_goal() just returns a count, but it is only called in one
      place and its return value is ignored there.  So delete all the dead code.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sfq: add optional RED on top of SFQ · ddecf0f4
      Authored by Eric Dumazet
      Adds an optional Random Early Detection on each SFQ flow queue.
      
      Traditional SFQ limits the count of packets, while RED also permits
      controlling the number of bytes per flow, and adds ECN capability as
      well.
      
      1) We don't handle idle time management in this RED implementation,
      since each 'new flow' begins with a null qavg. We really want to
      address backlogged flows.
      
      2) If headdrop is selected, we try to ECN-mark the first packet
      instead of the currently enqueued packet. This gives faster feedback
      for TCP flows compared to traditional RED [which marks the last
      packet in the queue].
      
      Example of use:
      
      tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
      	limit 3000 headdrop flows 512 divisor 16384 \
      	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn
      
      qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
      flows 512/16384 divisor 16384
       ewma 6 min 8000b max 60000b probability 0.2 ecn
       prob_mark 0 prob_mark_head 4876 prob_drop 6131
       forced_mark 0 forced_mark_head 0 forced_drop 0
       Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
      requeues 0)
       rate 99483Kbit 8219pps backlog 689392b 456p requeues 0
      
      In this test, with 64 netperf TCP_STREAM sessions, 50% using
      ECN-enabled flows, we can see that the number of CE-marked packets is
      smaller than the number of drops (for non-ECN flows).
      
      If the same test is run without RED, we can see the backlog is much
      bigger.
      
      qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
      flows 512/16384 divisor 16384
       Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
       rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Dave Taht <dave.taht@gmail.com>
      Tested-by: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: decrement memcg jump label when limit, not usage, is changed · 1398eee0
      Authored by Glauber Costa
      The logic of the current code is that whenever we destroy a cgroup
      that had its limit set ("set" meaning different from the maximum), we
      should decrement the jump_label counter. Otherwise we assume it was
      never incremented.
      
      But what the code actually does is test for RES_USAGE instead of
      RES_LIMIT. Usage being different from the maximum is likely to be
      true most of the time.
      
      The effect of this is that the key count can go negative, and since
      the jump_label test says:
      
              !!atomic_read(&key->enabled);
      
      we'll have jump labels still on when no one else is using this
      functionality.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
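
      A user-space model of the accounting error (RES_COUNTER_MAX and the
      helpers here are hypothetical stand-ins, not the kernel code):
      testing usage instead of limit decrements the key for cgroups that
      never incremented it, so the count goes negative and the static
      branch stays enabled with no users.

          #include <stdio.h>

          #define RES_COUNTER_MAX (~0UL)

          struct cgroup {
                  unsigned long usage;
                  unsigned long limit;
          };

          static int key_count;  /* stands in for the jump label's enable count */

          static void destroy_buggy(const struct cgroup *cg)
          {
                  if (cg->usage != RES_COUNTER_MAX)  /* almost always true */
                          key_count--;
          }

          static void destroy_fixed(const struct cgroup *cg)
          {
                  if (cg->limit != RES_COUNTER_MAX)  /* only if a limit was set */
                          key_count--;
          }

          int main(void)
          {
                  /* this cgroup never set a limit, so it never incremented the key */
                  struct cgroup cg = { .usage = 4096, .limit = RES_COUNTER_MAX };

                  destroy_buggy(&cg);
                  printf("buggy: key count = %d (negative: branch stuck on)\n",
                         key_count);

                  key_count = 0;
                  destroy_fixed(&cg);
                  printf("fixed: key count = %d\n", key_count);
                  return 0;
          }
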
    • net: reintroduce missing rcu_assign_pointer() calls · cf778b00
      Authored by Eric Dumazet
      commit a9b3cd7f ("rcu: convert uses of rcu_assign_pointer(x, NULL) to
      RCU_INIT_POINTER") made a lot of incorrect changes, since it blindly
      converted every rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y).
      
      This loses needed memory barriers, even on x86, when y is not NULL.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
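
      A user-space analogy using C11 atomics (an illustration by analogy,
      not the kernel primitives): rcu_assign_pointer() behaves like a
      release store and RCU_INIT_POINTER() like a relaxed one, so
      publishing a non-NULL pointer with the relaxed form lets a concurrent
      reader see the pointer before the pointed-to data is initialized.

          #include <stdatomic.h>
          #include <stdlib.h>

          struct foo { int a; };

          static _Atomic(struct foo *) gp;

          void publish(void)
          {
                  struct foo *f = malloc(sizeof(*f));

                  if (!f)
                          return;
                  f->a = 42;
                  /* like rcu_assign_pointer(gp, f): the init above cannot be
                   * reordered past the store, so readers see f->a == 42 */
                  atomic_store_explicit(&gp, f, memory_order_release);
                  /* like RCU_INIT_POINTER(gp, f) -- no ordering, wrong here:
                   * atomic_store_explicit(&gp, f, memory_order_relaxed); */
          }

          struct foo *reader(void)
          {
                  /* acquire pairs with the release store above */
                  return atomic_load_explicit(&gp, memory_order_acquire);
          }
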
  9. 12 Jan 2012, 3 commits