1. 07 Oct, 2014 11 commits
  2. 06 Oct, 2014 12 commits
  3. 05 Oct, 2014 3 commits
    • J
      net: sched: suspicious RCU usage in qdisc_watchdog · 1e203c1a
      John Fastabend committed
      The qdisc_watchdog() timer callback triggers a suspicious RCU usage
      warning; its RCU dereference needs to be done inside
      rcu_read_lock/rcu_read_unlock. In addition, Qdisc destroy operations
      need to ensure the timer is cancelled before removing the qdisc structure.
      
      [ 3992.191339] ===============================
      [ 3992.191340] [ INFO: suspicious RCU usage. ]
      [ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
      [ 3992.191345] -------------------------------
      [ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
      [ 3992.191348]
      [ 3992.191348] other info that might help us debug this:
      [ 3992.191348]
      [ 3992.191351]
      [ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
      [ 3992.191353] no locks held by swapper/1/0.
      [ 3992.191355]
      [ 3992.191355] stack backtrace:
      [ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
      [ 3992.191360] Hardware name:                  /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
      [ 3992.191362]  0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
      [ 3992.191366]  ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
      [ 3992.191370]  ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
      [ 3992.191374] Call Trace:
      [ 3992.191376]  <IRQ>  [<ffffffff8178f92c>] dump_stack+0x4e/0x68
      [ 3992.191387]  [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130
      [ 3992.191392]  [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0
      [ 3992.191396]  [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420
      [ 3992.191399]  [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240
      [ 3992.191403]  [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0
      [ 3992.191406]  [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240
      [ 3992.191410]  [<ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140
      [ 3992.191415]  [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60
      [ 3992.191419]  [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60
      [ 3992.191422]  [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80
      [ 3992.191424]  <EOI>  [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0
      [ 3992.191432]  [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0
      [ 3992.191437]  [<ffffffff815ed567>] cpuidle_enter+0x17/0x20
      [ 3992.191441]  [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0
      [ 3992.191445]  [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30
      [ 3992.191448]  [<ffffffff81033c16>] start_secondary+0x1b6/0x260
      
      Fixes: b26b0d1e ("net: qdisc: use rcu prefix and silence sparse warnings")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      net: dsa: do not call phy_start_aneg · f7d6b96f
      Florian Fainelli committed
      Commit f7f1de51 ("net: dsa: start and stop the PHY state machine")
      added calls to phy_start() in dsa_slave_open() and to phy_stop() in
      dsa_slave_close().

      We also call phy_start_aneg() in dsa_slave_create(), and this call
      interferes with the PHY state machine, since we start
      auto-negotiation there and later restart it when calling phy_start().
      phy_start() does not currently handle the PHY_FORCING or PHY_AN states
      properly, but such a fix would be too invasive for this window.
      
      Fixes: f7f1de51 ("net: dsa: start and stop the PHY state machine")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: Cleanup skb cloning by adding SKB_FCLONE_FREE · c8753d55
      Vijay Subramanian committed
      SKB_FCLONE_UNAVAILABLE has an overloaded meaning depending on the type of skb.
      1: If the skb is allocated from head_cache, it indicates that fclone is not available.
      2: If the skb is a companion fclone skb (allocated from fclone_cache), it indicates
      that it is available to be used.

      To avoid confusion for case 2 above, this patch replaces
      SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
      companion skbs, this indicates it is free for use.
      
      SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
      cannot / will not have a companion fclone.
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
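The state split described in the SKB_FCLONE_FREE commit above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel code: the `toy_` structs and helpers are hypothetical, and SKB_FCLONE_ORIG and SKB_FCLONE_CLONE are assumed from the kernel's other fclone states rather than taken from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the fclone states; ordering of the values is an assumption. */
enum fclone_state {
    SKB_FCLONE_UNAVAILABLE, /* skb from head_cache: no companion exists */
    SKB_FCLONE_ORIG,        /* parent skb allocated from fclone_cache */
    SKB_FCLONE_CLONE,       /* companion skb currently in use as a clone */
    SKB_FCLONE_FREE,        /* companion skb exists and is free for use */
};

/* Hypothetical stand-ins for an skb and an fclone_cache pair. */
struct toy_skb {
    enum fclone_state fclone;
};

struct toy_fclone_pair {
    struct toy_skb orig;
    struct toy_skb companion;
};

/* A fresh pair: the parent is ORIG, the companion is FREE (not UNAVAILABLE). */
static void toy_init_pair(struct toy_fclone_pair *p)
{
    assert(p != NULL);
    p->orig.fclone = SKB_FCLONE_ORIG;
    p->companion.fclone = SKB_FCLONE_FREE;
}

/* Use the companion as a clone only if it is explicitly marked free. */
static bool toy_try_fast_clone(struct toy_fclone_pair *p)
{
    if (p->orig.fclone == SKB_FCLONE_ORIG &&
        p->companion.fclone == SKB_FCLONE_FREE) {
        p->companion.fclone = SKB_FCLONE_CLONE;
        return true;
    }
    return false;
}

/* Releasing the clone makes the companion free again. */
static void toy_release_clone(struct toy_fclone_pair *p)
{
    p->companion.fclone = SKB_FCLONE_FREE;
}
```

With a separate SKB_FCLONE_FREE value, the check in `toy_try_fast_clone()` no longer has to know which cache the skb came from in order to interpret its state.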
  4. 04 Oct, 2014 8 commits
    • T
      ip_tunnel: Add GUE support · bc1fc390
      Tom Herbert committed
      This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
      This is very similar to fou, except that we need to insert the GUE header
      in addition to the UDP header on transmit.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      gue: Receive side for Generic UDP Encapsulation · 37dd0247
      Tom Herbert committed
      This patch adds support for receiving GUE packets in the fou module. The
      fou module now supports direct foo-over-udp (no encapsulation header)
      and GUE. To support this, a type parameter is added to the fou netlink
      parameters.

      For a GUE socket we define gue_udp_recv, gue_gro_receive, and
      gue_gro_complete to handle the specifics of the GUE protocol. Most
      of the code to manage and configure sockets is common with fou.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      fou: eliminate IPv4,v6 specific GRO functions · efc98d08
      Tom Herbert committed
      This patch removes the fou[46]_gro_receive and fou[46]_gro_complete
      functions. The v4 or v6 variants were chosen for the UDP offloads
      based on the address family of the socket; this is neither necessary
      nor correct. Instead, this patch adds is_ipv6 to napi_gro_skb.
      This is set in udp6_gro_receive and unset in udp4_gro_receive. In
      fou_gro_receive the value is used to select the correct inet_offloads
      for the protocol of the outer IP header.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      ip_tunnel: Account for secondary encapsulation header in max_headroom · 7371e022
      Tom Herbert committed
      When adjusting max_header for the tunnel interface based on the egress
      device, we need to account for any extra bytes in secondary encapsulation
      (e.g. FOU).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: do not export skb_gro_receive() · 01291202
      Eric Dumazet committed
      skb_gro_receive() is only called from tcp_gro_receive() which is
      not in a module.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      qdisc: validate skb without holding lock · 55a93b3e
      Eric Dumazet committed
      Validation of an skb can be pretty expensive: it may involve
      GSO segmentation and/or checksum computations.

      We can do this without holding the qdisc lock, so that other CPUs
      can queue additional packets.

      The trick is that requeued packets were already validated, so we carry
      a boolean so that sch_direct_xmit() can validate a fresh skb list,
      or directly use an old one.

      Tested on a 40Gb NIC (8 TX queues) with 200 concurrent flows on a
      48-thread host.

      Turning TSO on or off had no effect on throughput, costing only a few
      more CPU cycles. Lock contention on the qdisc lock disappeared.

      The same holds when disabling TX checksum offload.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
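The idea in the "validate skb without holding lock" commit above can be sketched in userspace as follows. This is a toy model under stated assumptions, not the kernel implementation: a pthread mutex stands in for the qdisc lock, and every `toy_` name is hypothetical. The dequeue step reports whether the packet is fresh, and the expensive validation then runs after the lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet: counts how often the expensive validation ran. */
struct toy_pkt {
    struct toy_pkt *next;
    int validate_calls;
};

/* Hypothetical qdisc: a FIFO plus one slot for a requeued packet. */
struct toy_qdisc {
    pthread_mutex_t lock;
    struct toy_pkt *head;
    struct toy_pkt *requeued;
};

/* Stand-in for GSO segmentation and/or checksum computation. */
static void toy_validate(struct toy_pkt *p)
{
    p->validate_calls++;
}

/* Under the lock: pick a packet and report whether it still needs validation. */
static struct toy_pkt *toy_dequeue(struct toy_qdisc *q, bool *validate)
{
    struct toy_pkt *p;

    pthread_mutex_lock(&q->lock);
    if (q->requeued) {
        p = q->requeued;          /* already validated earlier */
        q->requeued = NULL;
        *validate = false;
    } else {
        p = q->head;              /* fresh packet */
        if (p)
            q->head = p->next;
        *validate = true;
    }
    pthread_mutex_unlock(&q->lock);

    if (p)
        p->next = NULL;
    return p;
}

/* The expensive step happens here, with the lock no longer held. */
static struct toy_pkt *toy_xmit_prepare(struct toy_qdisc *q)
{
    bool validate;
    struct toy_pkt *p = toy_dequeue(q, &validate);

    if (p && validate)
        toy_validate(p);
    return p;
}

/* Put back a packet that could not be transmitted. */
static void toy_requeue(struct toy_qdisc *q, struct toy_pkt *p)
{
    pthread_mutex_lock(&q->lock);
    q->requeued = p;
    pthread_mutex_unlock(&q->lock);
}
```

In this model a packet that goes through `toy_requeue()` and is picked up again is not validated a second time, which is the role the boolean plays in the commit message.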
    • J
      qdisc: dequeue bulking also pickup GSO/TSO packets · 808e7ac0
      Jesper Dangaard Brouer committed
      The TSO and GSO segmented packets already benefit from bulking
      on their own.

      The TSO packets have always taken advantage of only updating
      the tailptr once for a large packet.

      The GSO segmented packets have recently taken advantage of
      the bulking xmit_more API, via merge commit 53fda7f7 ("Merge
      branch 'xmit_list'"), specifically via commit 7f2e870f ("net:
      Move main gso loop out of dev_hard_start_xmit() into helper.")
      allowing qdisc requeue of the remaining list, and via commit
      ce93718f ("net: Don't keep around original SKB when we
      software segment GSO frames.").

      This patch allows further bulking of TSO/GSO packets together
      when dequeueing from the qdisc.
      
      Testing:
       Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
      netperf-wrapper. Bulking several TSO packets shows no performance regressions
      (requeues were in the area of 32 requeues/sec).

      Bulking several GSOs does show a small regression or a very small
      improvement (requeues were in the area of 8000 requeues/sec).

       Using ixgbe at 10Gbit/s with GSO bulking, we can measure some additional
      latency. The base case, which is "normal" GSO bulking, sees a varying
      high-prio queue delay between 0.38ms and 0.47ms.  Bulking several GSOs
      together results in a stable high-prio queue delay of 0.50ms.

       Using igb at 100Mbit/s with GSO bulking shows an improvement.
      The base case sees a varying high-prio queue delay between 2.23ms and 2.35ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE · 5772e9a3
      Jesper Dangaard Brouer committed
      Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
      sending/processing an entire skb list.
      
      This patch implements qdisc bulk dequeue, by allowing multiple packets
      to be dequeued in dequeue_skb().
      
      The optimization principle for this is twofold: (1) to amortize
      locking cost and (2) to avoid expensive tailptr updates for notifying HW.
       (1) Several packets are dequeued while holding the qdisc root_lock,
      amortizing locking cost over several packets.  The dequeued SKB list is
      processed under the TXQ lock in dev_hard_start_xmit(), thus also
      amortizing the cost of the TXQ lock.
       (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
      API to delay the HW tailptr update, which also reduces the cost per
      packet.
      
      One restriction of the new API is that every SKB must belong to the
      same TXQ.  This patch takes the easy way out by restricting bulk
      dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that the
      qdisc has only a single TXQ attached.

      Some detail about the flow: dev_hard_start_xmit() will process the skb
      list, and transmit packets individually towards the driver (see
      xmit_one()).  In case the driver stops midway in the list, the
      remaining skb list is returned by dev_hard_start_xmit().  In
      sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
      
      To avoid overshooting the HW limits, which results in requeuing, the
      patch limits the number of bytes dequeued, based on the driver's BQL
      limits.  In effect, bulking will only happen for BQL-enabled drivers.

      Small amounts of extra HoL blocking (2x MTU/0.24ms) were
      measured at 100Mbit/s, with bulking 8 packets, but the
      oscillating nature of the measurement indicates that something like
      sched latency might be causing this effect. More comparisons
      show that this oscillation goes away occasionally. Thus, we
      disregard this artifact completely and remove any "magic" bulking
      limit.

      For now, as a conservative approach, stop bulking when seeing TSO and
      segmented GSO packets.  They already benefit from bulking on their own.
      A followup patch adds this, to allow easier bisect-ability for finding
      regressions.

      Joint work with Hannes, Daniel and Florian.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
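The byte-limited bulk dequeue described in the commit above can be sketched as a userspace toy. It is a sketch under stated assumptions rather than the kernel code: `toy_skb`, `toy_queue` and the helpers are hypothetical, and the `byte_budget` argument stands in for whatever the driver's BQL limit currently allows. The head packet is always dequeued, and further packets are chained onto it while budget remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet and single-TXQ queue. */
struct toy_skb {
    struct toy_skb *next;
    unsigned int len;
};

struct toy_queue {
    struct toy_skb *head;
};

/* Remove and return the head packet, or NULL if the queue is empty. */
static struct toy_skb *toy_pop(struct toy_queue *q)
{
    struct toy_skb *skb = q->head;

    if (skb) {
        q->head = skb->next;
        skb->next = NULL;
    }
    return skb;
}

/*
 * Dequeue one packet, then keep chaining more packets onto it while the
 * byte budget (standing in for the BQL limit) has not been used up.
 */
static struct toy_skb *toy_dequeue_bulk(struct toy_queue *q, int byte_budget)
{
    struct toy_skb *first = toy_pop(q);
    struct toy_skb *tail = first;

    if (!first)
        return NULL;

    byte_budget -= (int)first->len;
    while (byte_budget > 0) {
        struct toy_skb *nskb = toy_pop(q);

        if (!nskb)
            break;
        byte_budget -= (int)nskb->len;
        tail->next = nskb;
        tail = nskb;
    }
    return first;
}

/* Count the packets in a dequeued list. */
static int toy_list_len(const struct toy_skb *skb)
{
    int n = 0;

    for (; skb; skb = skb->next)
        n++;
    return n;
}
```

In a real transmit path the whole returned list would be handed over under a single lock acquisition, which is where the amortization described in the commit message comes from; a budget of zero or less degenerates to the old one-packet-per-dequeue behaviour.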
  5. 03 Oct, 2014 6 commits