1. 04 Oct, 2014 2 commits
    • qdisc: dequeue bulking also pickup GSO/TSO packets · 808e7ac0
      Jesper Dangaard Brouer committed
      The TSO and GSO segmented packets already benefit from bulking
      on their own.
      
      The TSO packets have always taken advantage of only updating
      the tailptr once for a large packet.
      
      The GSO segmented packets have recently taken advantage of
      bulking xmit_more API, via merge commit 53fda7f7 ("Merge
      branch 'xmit_list'"), specifically via commit 7f2e870f ("net:
      Move main gso loop out of dev_hard_start_xmit() into helper.")
      allowing qdisc requeue of remaining list.  And via commit
      ce93718f ("net: Don't keep around original SKB when we
      software segment GSO frames.").
      
      This patch allows further bulking of TSO/GSO packets together,
      when dequeueing from the qdisc.
      
      Testing:
       Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
      netperf-wrapper. Bulking several TSO packets shows no performance
      regression (requeues were around 32 requeues/sec).
      
      Bulking several GSO packets shows either a small regression or a very
      small improvement (requeues were around 8000 requeues/sec).
      
       Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
      latency. Base-case, which is "normal" GSO bulking, sees varying
      high-prio queue delay between 0.38ms and 0.47ms.  Bulking several GSOs
      together results in a stable high-prio queue delay of 0.50ms.
      
       Using igb at 100Mbit/s with GSO bulking, shows an improvement.
      Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE · 5772e9a3
      Jesper Dangaard Brouer committed
      Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
      sending/processing an entire skb list.
      
      This patch implements qdisc bulk dequeue, by allowing multiple packets
      to be dequeued in dequeue_skb().
      
      The optimization principle for this is twofold: (1) to amortize
      locking cost and (2) to avoid expensive tailptr updates for notifying HW.
       (1) Several packets are dequeued while holding the qdisc root_lock,
      amortizing locking cost over several packets.  The dequeued SKB list is
      processed under the TXQ lock in dev_hard_start_xmit(), thus also
      amortizing the cost of the TXQ lock.
       (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
      API to delay the HW tailptr update, which also reduces the cost per
      packet.
      
      One restriction of the new API is that every SKB must belong to the
      same TXQ.  This patch takes the easy way out, by restricting bulk
      dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
      the qdisc has only a single TXQ attached.
      
      Some detail about the flow; dev_hard_start_xmit() will process the skb
      list, and transmit packets individually towards the driver (see
      xmit_one()).  In case the driver stops midway in the list, the
      remaining skb list is returned by dev_hard_start_xmit().  In
      sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
      
      To avoid overshooting the HW limits, which results in requeuing, the
      patch limits the number of bytes dequeued, based on the driver's BQL
      limits.  In effect, bulking will only happen for BQL-enabled drivers.
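
The byte-limited bulk dequeue described above can be sketched in user space roughly as follows. This is a simplified model, not the kernel code: `struct pkt`, `struct fifo`, `fifo_push()`, `fifo_pop()` and `bulk_dequeue()` are hypothetical names, and `byte_budget` stands in for whatever allowance the driver's BQL state currently grants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SKB: only a length and a list link. */
struct pkt {
    unsigned int len;
    struct pkt *next;
};

/* Hypothetical FIFO standing in for the qdisc's internal queue. */
struct fifo {
    struct pkt *head;
    struct pkt *tail;
};

static void fifo_push(struct fifo *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

static struct pkt *fifo_pop(struct fifo *q)
{
    struct pkt *p = q->head;

    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    p->next = NULL;
    return p;
}

/*
 * Always dequeue one packet, then keep dequeuing while the remaining
 * byte budget is positive. Returns the head of the chained list and
 * stores the number of chained packets in *count.
 */
static struct pkt *bulk_dequeue(struct fifo *q, int byte_budget, int *count)
{
    struct pkt *head = fifo_pop(q);
    struct pkt *last = head;

    assert(count != NULL);
    *count = 0;
    if (!head)
        return NULL;
    *count = 1;
    byte_budget -= (int)head->len;

    while (byte_budget > 0) {
        struct pkt *next = fifo_pop(q);

        if (!next)
            break;
        byte_budget -= (int)next->len;
        last->next = next;   /* chain onto the list handed to the driver */
        last = next;
        (*count)++;
    }
    return head;
}
```

With a budget of zero (modelling a driver without BQL) the loop never runs and exactly one packet is returned, which matches the statement that bulking only takes effect for BQL-enabled drivers.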
      
      Small amounts of extra HoL blocking (2x MTU/0.24ms) were
      measured at 100Mbit/s, with bulking 8 packets, but the
      oscillating nature of the measurement indicates that something
      like sched latency might be causing this effect. More comparisons
      show that this oscillation goes away occasionally. Thus, we
      disregard this artifact completely and remove any "magic" bulking
      limit.
      
      For now, as a conservative approach, stop bulking when seeing TSO and
      segmented GSO packets.  They already benefit from bulking on their own.
      A followup patch adds this, to allow easier bisect-ability for finding
      regressions.
      
      Joint work with Hannes, Daniel and Florian.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Sep, 2014 1 commit
    • net: sched: make bstats per cpu and estimator RCU safe · 22e0f8b9
      John Fastabend committed
      In order to run qdiscs without locking, statistics and estimators
      need to be handled correctly.
      
      To resolve bstats, make the statistics per cpu. Because this is
      only needed for qdiscs that run without locks, which will not be
      the case for most qdiscs in the near future, only create percpu
      stats when qdiscs set the TCQ_F_CPUSTATS flag.
      
      Next, because estimators use the bstats to calculate packets per
      second and bytes per second, the estimator code paths are updated
      to use the per cpu statistics.
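
The per-cpu idea can be illustrated with a small user-space model. It is only a sketch under assumptions: `MODEL_NR_CPUS`, `struct model_bstats`, `model_bstats_update()` and `model_bstats_sum()` are hypothetical names, and the synchronization a real kernel reader needs to get a consistent snapshot is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4  /* hypothetical fixed CPU count for the sketch */

struct model_bstats {
    uint64_t bytes;
    uint64_t packets;
};

/* Writer side: each CPU touches only its own slot, so no shared lock is needed. */
static void model_bstats_update(struct model_bstats percpu[MODEL_NR_CPUS],
                                int cpu, uint32_t len)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    percpu[cpu].bytes += len;
    percpu[cpu].packets += 1;
}

/* Reader side (for example a rate estimator): fold all slots into one total. */
static struct model_bstats model_bstats_sum(const struct model_bstats percpu[MODEL_NR_CPUS])
{
    struct model_bstats total = { 0, 0 };

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        total.bytes += percpu[cpu].bytes;
        total.packets += percpu[cpu].packets;
    }
    return total;
}
```

The trade-off shown here is that updates become cheap and contention-free, while every reader, including the estimator, has to walk all slots to obtain a total.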
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Sep, 2014 1 commit
  4. 14 Sep, 2014 2 commits
  5. 04 Sep, 2014 1 commit
  6. 03 Sep, 2014 1 commit
  7. 02 Sep, 2014 2 commits
  8. 30 Aug, 2014 1 commit
  9. 02 Jul, 2014 1 commit
  10. 01 Apr, 2014 1 commit
  11. 11 Jan, 2014 1 commit
    • net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang committed
      Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit(). This
      causes several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so the txq lock is taken for macvlan
        instead of the lower device, which misses the necessary txq synchronization
        for the lower device, such as the txq stopping or freezing required by the
        dev watchdog or control path.
      - dev_hard_start_xmit() was called with a NULL txq, which bypasses the net
        device watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which will lead to a
        crash when tso is disabled for the lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() just for
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() is also
      extended to accept this parameter, and dev_queue_xmit_accel() is used to do l2
      forwarding transmission.
      
      With these fixes, NETIF_F_LLTX can be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, this will also be required for macvtap l2 forwarding support,
      since it provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 14 Dec, 2013 1 commit
  13. 11 Dec, 2013 1 commit
  14. 08 Nov, 2013 1 commit
  15. 08 Oct, 2013 1 commit
    • net: Separate the close_list and the unreg_list v2 · 5cde2829
      Eric W. Biederman committed
      Separate the unreg_list and the close_list in dev_close_many, preventing
      dev_close_many from permuting the unreg_list.  The permutations of the
      unreg_list have resulted in cases where the loopback device is accessed
      after it has been freed in code such as dst_ifdown, resulting in subtle
      memory corruption.
      
      This is the second bug from sharing the storage between the close_list
      and the unreg_list.  The issues that crop up with sharing are
      apparently too subtle to show up in normal testing or usage, so let's
      forget about being clever and use two separate lists.
      
      v2: Make all callers pass in a close_list to dev_close_many
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 21 Sep, 2013 1 commit
  17. 01 Sep, 2013 2 commits
  18. 31 Aug, 2013 1 commit
    • qdisc: allow setting default queuing discipline · 6da7c8fc
      stephen hemminger committed
      Until now, the pfifo_fast queue discipline has been used by default
      for all devices. But we have better choices now.
      
      This patch allows setting the default queueing discipline with sysctl.
      This allows easy use of better queueing disciplines on all devices
      without having to use tc qdisc scripts. It is intended to allow
      an easy path for distributions to make fq_codel or sfq the default
      qdisc.
      
      This patch also makes pfifo_fast more of a first class qdisc, since
      it is now possible to manually override the default and explicitly
      use pfifo_fast. The behavior for systems that do not use the sysctl
      is unchanged; they still get pfifo_fast.
      
      Also removes leftover random # in sysctl net core.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 15 Aug, 2013 1 commit
    • net_sched: restore "linklayer atm" handling · 8a8e3d84
      Jesper Dangaard Brouer committed
      commit 56b765b7 ("htb: improved accuracy at high rates")
      broke the "linklayer atm" handling.
      
       tc class add ... htb rate X ceil Y linklayer atm
      
      The linklayer setting is implemented by modifying the rate table
      which is sent to the kernel.  No direct parameter was
      transferred to the kernel indicating the linklayer setting.
      
      The commit 56b765b7 ("htb: improved accuracy at high rates")
      removed the use of the rate table system.
      
      To stay compatible with older iproute2 utils, this patch detects
      the linklayer by parsing the rate table.  It also supports future
      versions of iproute2 sending this linklayer parameter to the
      kernel directly. This is done by using the __reserved field in
      struct tc_ratespec to convey the chosen linklayer option, but
      only using the lower 4 bits of this field.
      
      Linklayer detection is limited to speeds below 100Mbit/s, because
      at high rates the rtab gets too inaccurate, so bad that
      several fields contain the same values, thus resembling the ATM
      detect.  Fields even start to contain "0" time to send, e.g. at
      1000Mbit/s sending a 96 bytes packet costs "0", thus the rtab has
      been more broken than we first realized.
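
Packing the linklayer choice into the lower 4 bits of a one-byte field could look roughly like the sketch below. The names `enum linklayer`, `LL_MASK`, `ll_encode()` and `ll_decode()`, the numeric values, and the assumption that the field is 8 bits wide are all illustrative and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; hypothetical names, not the kernel's UAPI. */
enum linklayer { LL_UNAWARE = 0, LL_ETHERNET = 1, LL_ATM = 2 };

#define LL_MASK 0x0Fu  /* only the lower 4 bits carry the linklayer */

/* Store the linklayer in the low 4 bits, leaving the upper 4 bits untouched. */
static uint8_t ll_encode(uint8_t field, enum linklayer ll)
{
    assert((unsigned int)ll <= LL_MASK);
    return (uint8_t)((field & ~LL_MASK) | ((unsigned int)ll & LL_MASK));
}

/* Read the linklayer back, ignoring whatever the upper 4 bits contain. */
static enum linklayer ll_decode(uint8_t field)
{
    return (enum linklayer)(field & LL_MASK);
}
```

In this model a field of zero decodes as "unaware", which is one way a receiver could tell that an older sender never set the field and that it must fall back to rate table detection.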
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 06 Aug, 2013 1 commit
  21. 12 Jun, 2013 1 commit
  22. 03 Jun, 2013 1 commit
  23. 28 Mar, 2013 1 commit
    • sch: add missing u64 in psched_ratecfg_precompute() · ea872d77
      Sergey Popovich committed
      It seems that commit
      
      commit 292f1c7f
      Author: Jiri Pirko <jiri@resnulli.us>
      Date:   Tue Feb 12 00:12:03 2013 +0000
      
          sch: make htb_rate_cfg and functions around that generic
      
      adds a small regression.
      
      Before:
      
      # tc qdisc add dev eth0 root handle 1: htb default ffff
      # tc class add dev eth0 classid 1:ffff htb rate 5Gbit
      # tc -s class show dev eth0
      class htb 1:ffff root prio 0 rate 5000Mbit ceil 5000Mbit burst 625b cburst 625b
       Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
       lended: 0 borrowed: 0 giants: 0
       tokens: 31 ctokens: 31
      
      After:
      
      # tc qdisc add dev eth0 root handle 1: htb default ffff
      # tc class add dev eth0 classid 1:ffff htb rate 5Gbit
      # tc -s class show dev eth0
      class htb 1:ffff root prio 0 rate 1544Mbit ceil 1544Mbit burst 625b cburst 625b
       Sent 5073 bytes 41 pkt (dropped 0, overlimits 0 requeues 0)
       rate 1976bit 2pps backlog 0b 0p requeues 0
       lended: 41 borrowed: 0 giants: 0
       tokens: 1802 ctokens: 1802
      
      This is probably due to the lost u64 cast of the rate parameter in
      psched_ratecfg_precompute() (net/sched/sch_generic.c).
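
The general class of bug, a 32-bit intermediate that silently wraps when a rate is scaled, can be shown with a small sketch. The function names are hypothetical and the arithmetic is an assumption for illustration: it converts bytes per second to bits per second and is not claimed to reproduce the exact kernel computation or the 1544Mbit figure above.

```c
#include <assert.h>
#include <stdint.h>

/* Shift done in 32 bits: anything above about 4.29 Gbit/s wraps around. */
static uint64_t bits_per_sec_truncated(uint32_t bytes_per_sec)
{
    uint32_t bps = bytes_per_sec << 3;  /* high bits are lost here */

    return bps;
}

/* Widen first, then shift: the full 64-bit result is kept. */
static uint64_t bits_per_sec_widened(uint32_t bytes_per_sec)
{
    uint64_t bps = (uint64_t)bytes_per_sec << 3;

    assert(bps >= bytes_per_sec);  /* widening can never shrink the value */
    return bps;
}
```

For 5Gbit/s, which is 625000000 bytes per second, the widened version yields 5000000000 while the truncated one yields 705032704, because the result no longer fits in 32 bits.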
      Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 13 Feb, 2013 1 commit
  25. 12 Dec, 2012 1 commit
    • pkt_sched: avoid requeues if possible · 1abbe139
      Eric Dumazet committed
      With BQL being deployed, we are more likely to see the following behavior:
      
      We dequeue a packet from qdisc in dequeue_skb(), then we realize target
      tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
      skb into gso_skb for later.
      
      This shows in stats (tc -s qdisc dev eth0) as requeues.
      
      The problem with these requeues is that high priority packets cannot be
      dequeued as long as this (possibly low prio and big TSO) packet is not
      removed from gso_skb.
      
      At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
      
      In some cases, we know that all packets dequeued from a qdisc are
      for a particular and known txq :
      
      - If device is non multi queue
      - For all MQ/MQPRIO slave qdiscs
      
      This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
      this capability, so that dequeue_skb() is allowed to dequeue a packet
      only if the associated txq is not stopped.
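
The effect of the flag can be modelled in a few lines of user-space C. This is a deliberately simplified sketch: `FLAG_ONE_TXQ`, `struct model_txq`, `struct model_qdisc`, `model_dequeue()` and `model_xmit()` are hypothetical names, and a requeued packet is simply put back into the backlog count rather than parked in a separate slot as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_ONE_TXQ 0x1u  /* hypothetical stand-in for TCQ_F_ONETXQUEUE */

struct model_txq {
    bool stopped;
};

struct model_qdisc {
    unsigned int flags;
    int backlog;   /* packets still waiting in the qdisc */
    int requeues;  /* packets dequeued and then handed back */
};

/* Returns true if a packet was taken out of the queue. */
static bool model_dequeue(struct model_qdisc *q, const struct model_txq *txq)
{
    assert(q != NULL && txq != NULL);
    if (q->backlog == 0)
        return false;
    /* With a single known txq, refuse to dequeue while it is stopped. */
    if ((q->flags & FLAG_ONE_TXQ) && txq->stopped)
        return false;
    q->backlog--;
    return true;
}

/* Dequeue and try to transmit; returns true if the packet went out. */
static bool model_xmit(struct model_qdisc *q, const struct model_txq *txq)
{
    if (!model_dequeue(q, txq))
        return false;
    if (txq->stopped) {
        /* Too late: the packet is already out of the qdisc. */
        q->requeues++;
        q->backlog++;  /* simplification: hand it back to the backlog */
        return false;
    }
    return true;
}
```

Without the flag, a stopped txq is only noticed after the dequeue and shows up as a requeue; with the flag, the packet stays inside the qdisc, where a later high priority arrival can still be scheduled ahead of it.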
      
      This indeed reduces latencies for high prio packets (or improves fairness
      with sfq/fq_codel), and almost removes qdisc 'requeues'.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 06 Sep, 2012 1 commit
    • net: qdisc busylock needs lockdep annotations · 23d3b8bf
      Eric Dumazet committed
      It seems we need to provide the ability for stacked devices
      to use a specific lock_class_key for sch->busylock.
      
      We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but
      a user might use a qdisc anyway.
      
      (So same fixes are probably needed on non LLTX stacked drivers)
      
      Noticed while stressing L2TPV3 setup :
      
      ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.6.0-rc3+ #788 Not tainted
       -------------------------------------------------------
       netperf/4660 is trying to acquire lock:
        (l2tpsock){+.-...}, at: [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
      
       but task is already holding lock:
        (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&(&sch->busylock)->rlock){+.-...}:
              [<ffffffff810a5df0>] lock_acquire+0x90/0x200
              [<ffffffff817499fc>] _raw_spin_lock_irqsave+0x4c/0x60
              [<ffffffff81074872>] __wake_up+0x32/0x70
              [<ffffffff8136d39e>] tty_wakeup+0x3e/0x80
              [<ffffffff81378fb3>] pty_write+0x73/0x80
              [<ffffffff8136cb4c>] tty_put_char+0x3c/0x40
              [<ffffffff813722b2>] process_echoes+0x142/0x330
              [<ffffffff813742ab>] n_tty_receive_buf+0x8fb/0x1230
              [<ffffffff813777b2>] flush_to_ldisc+0x142/0x1c0
              [<ffffffff81062818>] process_one_work+0x198/0x760
              [<ffffffff81063236>] worker_thread+0x186/0x4b0
              [<ffffffff810694d3>] kthread+0x93/0xa0
              [<ffffffff81753e24>] kernel_thread_helper+0x4/0x10
      
       -> #0 (l2tpsock){+.-...}:
              [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
              [<ffffffff810a5df0>] lock_acquire+0x90/0x200
              [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
              [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
              [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
              [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
              [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
              [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
              [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
              [<ffffffff815db019>] ip_output+0x59/0xf0
              [<ffffffff815da36d>] ip_local_out+0x2d/0xa0
              [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
              [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
              [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
              [<ffffffff815f5300>] tcp_push_one+0x30/0x40
              [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
              [<ffffffff81614495>] inet_sendmsg+0x125/0x230
              [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
              [<ffffffff81579ece>] sys_sendto+0xfe/0x130
              [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&(&sch->busylock)->rlock);
                                      lock(l2tpsock);
                                      lock(&(&sch->busylock)->rlock);
         lock(l2tpsock);
      
        *** DEADLOCK ***
      
       5 locks held by netperf/4660:
        #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e581c>] tcp_sendmsg+0x2c/0x1040
        #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da3e0>] ip_queue_xmit+0x0/0x680
        #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9ac5>] ip_finish_output+0x135/0x890
        #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595820>] dev_queue_xmit+0x0/0xe00
        #4:  (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00
      
       stack backtrace:
       Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788
       Call Trace:
        [<ffffffff8173dbf8>] print_circular_bug+0x1fb/0x20c
        [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
        [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
        [<ffffffff810a3f44>] ? __lock_acquire+0x2e4/0x1b10
        [<ffffffff810a5df0>] lock_acquire+0x90/0x200
        [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
        [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
        [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
        [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
        [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
        [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
        [<ffffffff81594e0e>] ? dev_hard_start_xmit+0x5e/0xa70
        [<ffffffff81595961>] ? dev_queue_xmit+0x141/0xe00
        [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
        [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
        [<ffffffff81595820>] ? dev_hard_start_xmit+0xa70/0xa70
        [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
        [<ffffffff815d9ac5>] ? ip_finish_output+0x135/0x890
        [<ffffffff815db019>] ip_output+0x59/0xf0
        [<ffffffff815da36d>] ip_local_out+0x2d/0xa0
        [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
        [<ffffffff815da3e0>] ? ip_local_out+0xa0/0xa0
        [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
        [<ffffffff815fa25e>] ? tcp_md5_do_lookup+0x18e/0x1a0
        [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
        [<ffffffff815f5300>] tcp_push_one+0x30/0x40
        [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
        [<ffffffff81614495>] inet_sendmsg+0x125/0x230
        [<ffffffff81614370>] ? inet_create+0x6b0/0x6b0
        [<ffffffff8157e6e2>] ? sock_update_classid+0xc2/0x3b0
        [<ffffffff8157e750>] ? sock_update_classid+0x130/0x3b0
        [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
        [<ffffffff81162579>] ? fget_light+0x3f9/0x4f0
        [<ffffffff81579ece>] sys_sendto+0xfe/0x130
        [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
        [<ffffffff8174a0b0>] ? _raw_spin_unlock_irq+0x30/0x50
        [<ffffffff810757e3>] ? finish_task_switch+0x83/0xf0
        [<ffffffff810757a6>] ? finish_task_switch+0x46/0xf0
        [<ffffffff81752cb7>] ? sysret_check+0x1b/0x56
        [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 15 Aug, 2012 1 commit
  28. 16 May, 2012 1 commit
  29. 02 Apr, 2012 1 commit
  30. 30 Nov, 2011 1 commit
  31. 17 Nov, 2011 1 commit
  32. 15 Jul, 2011 1 commit
  33. 27 Jun, 2011 1 commit
    • net_sched: fix dequeuer fairness · d5b8aa1d
      jamal committed
      Results on dummy device can be seen in my netconf 2011
      slides. These results are for a 10Gige IXGBE intel
      nic - on another i5 machine, very similar specs to
      the one used in the netconf2011 results.
      It turns out - this is a hell lot worse than dummy
      and so this patch is even more beneficial for 10G.
      
      Test setup:
      ----------
      
      System under test sending packets out.
      Additional box connected directly dropping packets.
      Installed prio qdisc on the eth device and default
      netdev default length of 1000 used as is.
      The 3 prio bands were each set to 100 (this didn't factor into
      the results).
      
      5 packet runs were made and the middle 3 picked.
      
      results
      -------
      
      The "cpu" column indicates which cpu the sample
      was taken on.
      The "Pkt runx" carries the number of packets a cpu
      dequeued when forced to be in the "dequeuer" role.
      The "avg" for each run is the number of times each
      cpu should be a "dequeuer" if the system was fair.
      
      3.0-rc4      (plain)
      cpu         Pkt run1        Pkt run2        Pkt run3
      ================================================
      cpu0        21853354        21598183        22199900
      cpu1          431058          473476          393159
      cpu2          481975          477529          458466
      cpu3        23261406        23412299        22894315
      avg         11506948        11490372        11486460
      
      3.0-rc4 with patch and default weight 64
      cpu         Pkt run1        Pkt run2        Pkt run3
      ================================================
      cpu0        13205312        13109359        13132333
      cpu1        10189914        10159127        10122270
      cpu2        10213871        10124367        10168722
      cpu3        13165760        13164767        13096705
      avg         11693714        11639405        11630008
      
      As you can see the system is still not perfect but
      is a lot better than what it was before...
      
      At the moment we use the old backlog weight, weight_p
      which is 64 packets. It seems to be reasonably fine
      with that value.
      The system could be made more fair if we reduce the
      weight_p (as per my presentation), but we are going
      to affect the shared backlog weight. Unless deemed
      necessary, I think the default value is fine. If not
      we could add yet another knob.
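
One way to picture a weight-bounded dequeuer is the sketch below. It assumes the fairness fix works by capping how many packets one CPU sends per turn in the "dequeuer" role before it yields and reschedules the queue; `struct run_result` and `run_with_quota()` are hypothetical names, and the real code path has further exit conditions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct run_result {
    int sent;          /* packets sent during this turn */
    bool rescheduled;  /* true if the turn ended because the quota ran out */
};

/*
 * Send packets from *backlog until it is empty or `weight` packets have
 * gone out in this turn, whichever comes first.
 */
static struct run_result run_with_quota(int *backlog, int weight)
{
    struct run_result r = { 0, false };
    int quota = weight;

    assert(backlog != NULL && *backlog >= 0 && weight > 0);
    while (*backlog > 0) {
        (*backlog)--;
        r.sent++;
        if (*backlog == 0)
            break;              /* nothing left, no need to reschedule */
        if (--quota <= 0) {
            r.rescheduled = true;  /* yield so another CPU can dequeue */
            break;
        }
    }
    return r;
}
```

With a weight of 64 and 200 packets queued, a single turn sends 64 packets and leaves 136 behind for whichever CPU picks up the rescheduled queue next, which is the kind of turn-taking the second results table suggests.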
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 07 Jun, 2011 1 commit
  35. 23 May, 2011 1 commit
    • net: avoid synchronize_rcu() in dev_deactivate_many · 3137663d
      Eric Dumazet committed
      dev_deactivate_many() issues one synchronize_rcu() call after qdiscs are
      set to noop_qdisc.
      
      This call is here to make sure there are no outstanding qdisc-less
      dev_queue_xmit calls before returning to the caller.
      
      But in the dismantle phase, we don't have to wait, because we won't activate
      the device again, and we are going to wait one rcu grace period later in
      rollback_registered_many().
      
      After this patch, device dismantle uses one synchronize_net() and one
      rcu_barrier() call only, so we have a ~30% speedup and a smaller RTNL
      latency.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>,
      CC: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. 04 Mar, 2011 1 commit