1. 11 5月, 2012 12 次提交
    • E
      codel: Controlled Delay AQM · 76e3cc12
      Eric Dumazet 提交于
      An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
      
      http://queue.acm.org/detail.cfm?id=2209336
      
      This AQM main input is no longer queue size in bytes or packets, but the
      delay packets stay in (FIFO) queue.
      
      As we don't have infinite memory, we still can drop packets in enqueue()
      in case of massive load, but mean of CoDel is to drop packets in
      dequeue(), using a control law based on two simple parameters :
      
      target : target sojourn time (default 5ms)
      interval : width of moving time window (default 100ms)
      
      Based on initial work from Dave Taht.
      
      Refactored to help future codel inclusion as a plugin for other linux
      qdisc (FQ_CODEL, ...), like RED.
      
      include/net/codel.h contains codel algorithm as close as possible than
      Kathleen reference.
      
      net/sched/sch_codel.c contains the linux qdisc specific glue.
      
      Separate structures permit a memory efficient implementation of fq_codel
      (to be sent as a separate work) : Each flow has its own struct
      codel_vars.
      
      timestamps are taken at enqueue() time with 1024 ns precision, allowing
      a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
      usec as base unit.
      
      Selected packets are dropped, unless ECN is enabled and packets can get
      ECN mark instead.
      
      Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
      tg3 drivers (BQL enabled).
      
      Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                                [ interval TIME ] [ ecn ]
      
      qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
       Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
       rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
        count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
        maxpacket 1514 ecn_mark 84399 drop_overlimit 0
      
      CoDel must be seen as a base module, and should be used keeping in mind
      there is still a FIFO queue. So a typical setup will probably need a
      hierarchy of several qdiscs and packet classifiers to be able to meet
      whatever constraints a user might have.
      
      One possible example would be to use fq_codel, which combines Fair
      Queueing and CoDel, in replacement of sfq / sfq_red.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDave Taht <dave.taht@bufferbloat.net>
      Cc: Kathleen Nichols <nichols@pollere.com>
      Cc: Van Jacobson <van@pollere.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76e3cc12
    • E
      net_sched: update bstats in dequeue() · 2dd875ff
      Eric Dumazet 提交于
      Class bytes/packets stats can be misleading because they are updated in
      enqueue() while packet might be dropped later.
      
      We already fixed all qdiscs but sch_atm.
      
      This patch makes the final cleanup.
      
      class rate estimators can now match qdisc ones.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2dd875ff
    • J
      net, drivers/net: Convert compare_ether_addr_64bits to ether_addr_equal_64bits · a6700db1
      Joe Perches 提交于
      Use the new bool function ether_addr_equal_64bits to add
      some clarity and reduce the likelihood for misuse of
      compare_ether_addr_64bits for sorting.
      
      Done via cocci script:
      
      $ cat compare_ether_addr_64bits.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr_64bits(a, b)
      +	ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	compare_ether_addr_64bits(a, b)
      +	!ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal_64bits(a, b) == 0
      +	ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal_64bits(a, b) != 0
      +	!ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal_64bits(a, b) == 0
      +	!ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal_64bits(a, b) != 0
      +	ether_addr_equal_64bits(a, b)
      
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal_64bits(a, b)
      +	ether_addr_equal_64bits(a, b)
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6700db1
    • J
      etherdevice.h: Add ether_addr_equal_64bits · baf523c9
      Joe Perches 提交于
      Add an optimized boolean function to check if
      2 ethernet addresses are the same.
      
      This is to avoid any confusion about compare_ether_addr_64bits
      returning an unsigned, and not being able to use the
      compare_ether_addr_64bits function for sorting ala memcmp.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      baf523c9
    • J
      drivers/net: Convert compare_ether_addr to ether_addr_equal · 2e42e474
      Joe Perches 提交于
      Use the new bool function ether_addr_equal to add
      some clarity and reduce the likelihood for misuse
      of compare_ether_addr for sorting.
      
      Done via cocci script:
      
      $ cat compare_ether_addr.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr(a, b)
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	compare_ether_addr(a, b)
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) == 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) != 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) == 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) != 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal(a, b)
      +	ether_addr_equal(a, b)
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e42e474
    • S
      be2net: avoid disabling sriov while VFs are assigned · 39f1d94d
      Sathya Perla 提交于
      Calling pci_disable_sriov() while VFs are assigned to VMs causes
      kernel panic. This patch uses PCI_DEV_FLAGS_ASSIGNED bit state of the
      VF's pci_dev to avoid this. Also, the unconditional function reset cmd
      issued on a PF probe can delete the VF configuration for the
      previously enabled VFs. A scratchpad register is now used to issue a
      function reset only when needed (i.e., in a crash dump scenario.)
      Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39f1d94d
    • J
      l2tp: fix data packet sequence number handling · d301e325
      James Chapman 提交于
      If enabled, L2TP data packets have sequence numbers which a receiver
      can use to drop out of sequence frames or try to reorder them. The
      first frame has sequence number 0, but the L2TP code currently expects
      it to be 1. This results in the first data frame being handled as out
      of sequence.
      
      This one-line patch fixes the problem.
      Signed-off-by: NJames Chapman <jchapman@katalix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d301e325
    • J
      l2tp: fix reorder timeout recovery · 38d40b3f
      James Chapman 提交于
      When L2TP data packet reordering is enabled, packets are held in a
      queue while waiting for out-of-sequence packets. If a packet gets
      lost, packets will be held until the reorder timeout expires, when we
      are supposed to then advance to the sequence number of the next packet
      but we don't currently do so. As a result, the data channel is stuck
      because we are waiting for a packet that will never arrive - all
      packets age out and none are passed.
      
      The fix is to add a flag to the session context, which is set when the
      reorder timeout expires and tells the receive code to reset the next
      expected sequence number to that of the next packet in the queue.
      
      Tested in a production L2TP network with Starent and Nortel L2TP gear.
      Signed-off-by: NJames Chapman <jchapman@katalix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38d40b3f
    • P
      tcp: Out-line tcp_try_rmem_schedule · 1070b1b8
      Pavel Emelyanov 提交于
      As proposed by Eric, make the tcp_input.o thinner.
      
      add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461)
      function                                     old     new   delta
      tcp_try_rmem_schedule                          -     864    +864
      tcp_ack                                     4811    4815      +4
      tcp_validate_incoming                        817     815      -2
      tcp_collapse                                 860     858      -2
      tcp_send_rcvq                                555     353    -202
      tcp_data_queue                              3435    3033    -402
      tcp_prune_queue                              721       -    -721
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1070b1b8
    • P
      tcp: Schedule rmem for rcvq repair send · 3c961afe
      Pavel Emelyanov 提交于
      As noted by Eric, no checks are performed on the data size we're
      putting in the read queue during repair. Thus, validate the given
      data size with the common rmem management routine.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c961afe
    • P
      tcp: Move rcvq sending to tcp_input.c · 292e8d8c
      Pavel Emelyanov 提交于
      It actually works on the input queue and will use its read mem
      routines, thus it's better to have in in the tcp_input.c file.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      292e8d8c
    • D
  2. 10 5月, 2012 28 次提交