1. 27 6月, 2011 1 次提交
    • J
      net_sched: fix dequeuer fairness · d5b8aa1d
      jamal 提交于
      Results on dummy device can be seen in my netconf 2011
      slides. These results are for a 10Gige IXGBE intel
      nic - on another i5 machine, very similar specs to
      the one used in the netconf2011 results.
      It turns out - this is a hell lot worse than dummy
      and so this patch is even more beneficial for 10G.
      
      Test setup:
      ----------
      
      System under test sending packets out.
      Additional box connected directly dropping packets.
      Installed prio qdisc on the eth device and default
      netdev default length of 1000 used as is.
      The 3 prio bands each were set to 100 (didnt factor in
      the results).
      
      5 packet runs were made and the middle 3 picked.
      
      results
      -------
      
      The "cpu" column indicates the which cpu the sample
      was taken on,
      The "Pkt runx" carries the number of packets a cpu
      dequeued when forced to be in the "dequeuer" role.
      The "avg" for each run is the number of times each
      cpu should be a "dequeuer" if the system was fair.
      
      3.0-rc4      (plain)
      cpu         Pkt run1        Pkt run2        Pkt run3
      ================================================
      cpu0        21853354        21598183        22199900
      cpu1          431058          473476          393159
      cpu2          481975          477529          458466
      cpu3        23261406        23412299        22894315
      avg         11506948        11490372        11486460
      
      3.0-rc4 with patch and default weight 64
      cpu 	     Pkt run1        Pkt run2        Pkt run3
      ================================================
      cpu0        13205312        13109359        13132333
      cpu1        10189914        10159127        10122270
      cpu2        10213871        10124367        10168722
      cpu3        13165760        13164767        13096705
      avg         11693714        11639405        11630008
      
      As you can see the system is still not perfect but
      is a lot better than what it was before...
      
      At the moment we use the old backlog weight, weight_p
      which is 64 packets. It seems to be reasonably fine
      with that value.
      The system could be made more fair if we reduce the
      weight_p (as per my presentation), but we are going
      to affect the shared backlog weight. Unless deemed
      necessary, I think the default value is fine. If not
      we could add yet another knob.
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5b8aa1d
  2. 22 6月, 2011 2 次提交
  3. 10 6月, 2011 1 次提交
    • G
      rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose 提交于
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c7ac8679
  4. 07 6月, 2011 2 次提交
  5. 26 5月, 2011 1 次提交
  6. 24 5月, 2011 1 次提交
    • E
      sch_sfq: avoid giving spurious NET_XMIT_CN signals · 8efa8854
      Eric Dumazet 提交于
      While chasing a possible net_sched bug, I found that IP fragments have
      litle chance to pass a congestioned SFQ qdisc :
      
      - Say SFQ qdisc is full because one flow is non responsive.
      - ip_fragment() wants to send two fragments belonging to an idle flow.
      - sfq_enqueue() queues first packet, but see queue limit reached :
      - sfq_enqueue() drops one packet from 'big consumer', and returns
      NET_XMIT_CN.
      - ip_fragment() cancel remaining fragments.
      
      This patch restores fairness, making sure we return NET_XMIT_CN only if
      we dropped a packet from the same flow.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8efa8854
  7. 23 5月, 2011 1 次提交
    • E
      net: avoid synchronize_rcu() in dev_deactivate_many · 3137663d
      Eric Dumazet 提交于
      dev_deactivate_many() issues one synchronize_rcu() call after qdiscs set
      to noop_qdisc.
      
      This call is here to make sure they are no outstanding qdisc-less
      dev_queue_xmit calls before returning to caller.
      
      But in dismantle phase, we dont have to wait, because we wont activate
      again the device, and we are going to wait one rcu grace period later in
      rollback_registered_many().
      
      After this patch, device dismantle uses one synchronize_net() and one
      rcu_barrier() call only, so we have a ~30% speedup and a smaller RTNL
      latency.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>,
      CC: Ben Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3137663d
  8. 20 5月, 2011 2 次提交
  9. 08 5月, 2011 3 次提交
  10. 23 4月, 2011 1 次提交
  11. 05 4月, 2011 1 次提交
  12. 31 3月, 2011 1 次提交
  13. 05 3月, 2011 1 次提交
  14. 04 3月, 2011 1 次提交
  15. 26 2月, 2011 1 次提交
  16. 25 2月, 2011 8 次提交
  17. 24 2月, 2011 3 次提交
    • S
      em_meta: fix sparse warning · e0c56310
      stephen hemminger 提交于
      gfp_t needs to be cast to integer.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0c56310
    • S
      mqprio: cleanups · ea18fd95
      stephen hemminger 提交于
      * make qdisc_ops local
      * add sparse annotation about expected unlock/unlock in dump_class_stats
      * fix indentation
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea18fd95
    • E
      net_sched: SFB flow scheduler · e13e02a3
      Eric Dumazet 提交于
      This is the Stochastic Fair Blue scheduler, based on work from :
      
      W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
      Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.
      
      http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
      
      This implementation is based on work done by Juliusz Chroboczek
      
      General SFB algorithm can be found in figure 14, page 15:
      
      B[l][n] : L x N array of bins (L levels, N bins per level)
      enqueue()
      Calculate hash function values h{0}, h{1}, .. h{L-1}
      Update bins at each level
      for i = 0 to L - 1
         if (B[i][h{i}].qlen > bin_size)
            B[i][h{i}].p_mark += p_increment;
         else if (B[i][h{i}].qlen == 0)
            B[i][h{i}].p_mark -= p_decrement;
      p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
      if (p_min == 1.0)
          ratelimit();
      else
          mark/drop with probabilty p_min;
      
      I did the adaptation of Juliusz code to meet current kernel standards,
      and various changes to address previous comments :
      
      http://thread.gmane.org/gmane.linux.network/90225
      http://thread.gmane.org/gmane.linux.network/90375
      
      Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
      we can use an external flow classifier if wanted.
      
      tc qdisc add dev $DEV parent 1:11 handle 11:  \
              est 0.5sec 2sec sfb limit 128
      
      tc filter add dev $DEV protocol ip parent 11: handle 3 \
              flow hash keys dst divisor 1024
      
      Notes:
      
      1) SFB default child qdisc is pfifo_fast. It can be changed by another
      qdisc but a child qdisc MUST not drop a packet previously queued. This
      is because SFB needs to handle a dequeued packet in order to maintain
      its virtual queue states. pfifo_head_drop or CHOKe should not be used.
      
      2) ECN is enabled by default, unlike RED/CHOKe/GRED
      
      With help from Patrick McHardy & Andi Kleen
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e13e02a3
  18. 23 2月, 2011 1 次提交
  19. 21 2月, 2011 1 次提交
    • E
      net: Fix more stale on-stack list_head objects. · 5f04d506
      Eric W. Biederman 提交于
      From: Eric W. Biederman <ebiederm@xmission.com>
      
      In the beginning with batching unreg_list was a list that was used only
      once in the lifetime of a network device (I think).  Now we have calls
      using the unreg_list that can happen multiple times in the life of a
      network device like dev_deactivate and dev_close that are also using the
      unreg_list.  In addition in unregister_netdevice_queue we also do a
      list_move because for devices like veth pairs it is possible that
      unregister_netdevice_queue will be called multiple times.
      
      So I think the change below to fix dev_deactivate which Eric D. missed
      will fix this problem.  Now to go test that.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f04d506
  20. 15 2月, 2011 1 次提交
  21. 03 2月, 2011 3 次提交
  22. 27 1月, 2011 1 次提交
  23. 22 1月, 2011 1 次提交
    • E
      net_sched: TCQ_F_CAN_BYPASS generalization · 23624935
      Eric Dumazet 提交于
      Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in
      __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs
      than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq
      
      SFQ is special because it can have external classifiers, and in these
      cases, we cannot bypass queue discipline (packet could be dropped by
      classifier) without admin asking it, or further changes.
      
      Its worth doing this, especially for SFQ, avoiding dirtying memory in
      case no packets are already waiting in queue.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23624935
  24. 21 1月, 2011 1 次提交