1. 08 6月, 2016 3 次提交
  2. 25 5月, 2016 2 次提交
  3. 18 5月, 2016 1 次提交
    • W
      net_sched: close another race condition in tcf_mirred_release() · dc327f89
      WANG Cong 提交于
      We saw the following extra refcount release on veth device:
      
        kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1
      
      Since we heavily use mirred action to redirect packets to veth, I think
      this is caused by the following race condition:
      
      CPU0:
      tcf_mirred_release(): (in RCU callback)
      	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);
      
      CPU1:
      mirred_device_event():
              spin_lock_bh(&mirred_list_lock);
              list_for_each_entry(m, &mirred_list, tcfm_list) {
                      if (rcu_access_pointer(m->tcfm_dev) == dev) {
                              dev_put(dev);
                              /* Note : no rcu grace period necessary, as
                               * net_device are already rcu protected.
                               */
                              RCU_INIT_POINTER(m->tcfm_dev, NULL);
                      }
              }
              spin_unlock_bh(&mirred_list_lock);
      
      CPU0:
      tcf_mirred_release():
              spin_lock_bh(&mirred_list_lock);
              list_del(&m->tcfm_list);
              spin_unlock_bh(&mirred_list_lock);
              if (dev)               // <======== Stil refers to the old m->tcfm_dev
                      dev_put(dev);  // <======== dev_put() is called on it again
      
      The action init code path is good because it is impossible to modify
      an action that is being removed.
      
      So, fix this by moving everything under the spinlock.
      
      Fixes: 2ee22a90 ("net_sched: act_mirred: remove spinlock in fast path")
      Fixes: 6bd00b85 ("act_mirred: fix a race condition on mirred_list")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc327f89
  4. 17 5月, 2016 4 次提交
  5. 11 5月, 2016 6 次提交
  6. 09 5月, 2016 1 次提交
    • E
      fq_codel: add memory limitation per queue · 95b58430
      Eric Dumazet 提交于
      On small embedded routers, one wants to control maximal amount of
      memory used by fq_codel, instead of controlling number of packets or
      bytes, since GRO/TSO make these not practical.
      
      Assuming skb->truesize is accurate, we have to keep track of
      skb->truesize sum for skbs in queue.
      
      This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
      
      I chose a default value of 32 MBytes, which looks reasonable even
      for heavy duty usages. (Prior fq_codel users should not be hurt
      when they upgrade their kernels)
      
      Two fields are added to tc_fq_codel_qd_stats to report :
       - Current memory usage
       - Number of drops caused by memory limits
      
      # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
      ..
      # tc -s -d qd sh dev eth1
      qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
       quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
       Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
      requeues 21705223)
       rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
        maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
        ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
        new_flows_len 1 old_flows_len 177
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Dave Täht <dave.taht@gmail.com>
      Cc: Sebastian Möller <moeller0@gmx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95b58430
  7. 07 5月, 2016 1 次提交
  8. 05 5月, 2016 2 次提交
    • F
      net: remove dev->trans_start · 9b36627a
      Florian Westphal 提交于
      previous patches removed all direct accesses to dev->trans_start,
      so change the netif_trans_update helper to update trans_start of
      netdev queue 0 instead and then remove trans_start from struct net_device.
      
      AFAICS a lot of the netif_trans_update() invocations are now useless
      because they occur in ndo_start_xmit and driver doesn't set LLTX
      (i.e. stack already took care of the update).
      
      As I can't test any of them it seems better to just leave them alone.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b36627a
    • F
      treewide: replace dev->trans_start update with helper · 860e9538
      Florian Westphal 提交于
      Replace all trans_start updates with netif_trans_update helper.
      change was done via spatch:
      
      struct net_device *d;
      @@
      - d->trans_start = jiffies
      + netif_trans_update(d)
      
      Compile tested only.
      
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: linux-rdma@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: MPT-FusionLinux.pdl@broadcom.com
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-can@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-omap@vger.kernel.org
      Cc: linux-hams@vger.kernel.org
      Cc: linux-usb@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: b.a.t.m.a.n@lists.open-mesh.org
      Cc: linux-bluetooth@vger.kernel.org
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
      Acked-by: NMugunthan V N <mugunthanvnm@ti.com>
      Acked-by: NAntonio Quartulli <a@unstable.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      860e9538
  9. 04 5月, 2016 1 次提交
    • E
      fq_codel: add batch ability to fq_codel_drop() · 9d18562a
      Eric Dumazet 提交于
      In presence of inelastic flows and stress, we can call
      fq_codel_drop() for every packet entering fq_codel qdisc.
      
      fq_codel_drop() is quite expensive, as it does a linear scan
      of 4 KB of memory to find a fat flow.
      Once found, it drops the oldest packet of this flow.
      
      Instead of dropping a single packet, try to drop 50% of the backlog
      of this fat flow, with a configurable limit of 64 packets per round.
      
      TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
      limit configurable.
      
      With this strategy the 4 KB search is amortized to a single cache line
      per drop [1], so fq_codel_drop() no longer appears at the top of kernel
      profile in presence of few inelastic flows.
      
      [1] Assuming a 64byte cache line, and 1024 buckets
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDave Taht <dave.taht@gmail.com>
      Cc: Jonathan Morton <chromatix99@gmail.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Dave Taht
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d18562a
  10. 03 5月, 2016 1 次提交
    • N
      netem: Segment GSO packets on enqueue · 6071bd1a
      Neil Horman 提交于
      This was recently reported to me, and reproduced on the latest net kernel,
      when attempting to run netperf from a host that had a netem qdisc attached
      to the egress interface:
      
      [  788.073771] ---------------------[ cut here ]---------------------------
      [  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
      [  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
      data_len=0 gso_size=1448 gso_type=1 ip_summed=3
      [  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
      ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
      glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
      i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
      pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
      sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
      i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
      crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
      serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
      dm_mod
      [  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
      ------------   3.10.0-327.el7.x86_64 #1
      [  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
      [  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
      ffffffff816351f1
      [  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
      ffff880231674000
      [  788.611943]  0000000000000001 0000000000000003 0000000000000000
      ffff880437c03710
      [  788.647241] Call Trace:
      [  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
      [  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
      [  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
      [  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
      [  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
      [  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
      [  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
      [  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
      [  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
      ...
      
      The problem occurs because netem is not prepared to handle GSO packets (as it
      uses skb_checksum_help in its enqueue path, which cannot manipulate these
      frames).
      
      The solution I think is to simply segment the skb in a simmilar fashion to the
      way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
      When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
      the first segment, and enqueue the remaining ones.
      
      tested successfully by myself on the latest net kernel, to which this applies
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netem@lists.linux-foundation.org
      CC: eric.dumazet@gmail.com
      CC: stephen@networkplumber.org
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6071bd1a
  11. 27 4月, 2016 2 次提交
  12. 26 4月, 2016 3 次提交
  13. 14 4月, 2016 2 次提交
  14. 19 3月, 2016 1 次提交
    • D
      cls_bpf: reset class and reuse major in da · 3a461da1
      Daniel Borkmann 提交于
      There are two issues with the current code. First one is that we need
      to set res->class to 0 in case we use non-default classid matching.
      
      This is important for the case where cls_bpf was initially set up with
      an optional binding to a default class with tcf_bind_filter(), where
      the underlying qdisc implements bind_tcf() that fills res->class and
      tests for it later on when doing the classification. Convention for
      these cases is that after tc_classify() was called, such qdiscs (atm,
      drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup
      based on classid.
      
      Second, there's a bug with da mode, where res->classid is only assigned
      a 16 bit minor, but it needs to expand to the full 32 bit major/minor
      combination instead, therefore we need to expand with the bound major.
      This is fine as classes belonging to a classful qdisc must share the
      same major.
      
      Fixes: 045efa82 ("cls_bpf: introduce integrated actions")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a461da1
  15. 12 3月, 2016 1 次提交
  16. 11 3月, 2016 1 次提交
  17. 09 3月, 2016 1 次提交
    • K
      net_sched: dsmark: use qdisc_dequeue_peeked() · f8b33d8e
      Kyeong Yoo 提交于
      This fix is for dsmark similar to commit 3557619f
      ("net_sched: prio: use qdisc_dequeue_peeked")
      and makes use of qdisc_dequeue_peeked() instead of direct dequeue() call.
      
      First time, wrr peeks dsmark, which will then peek into sfq.
      sfq dequeues an skb and it's stored in sch->gso_skb.
      Next time, wrr tries to dequeue from dsmark, which will call sfq dequeue
      directly. This results skipping the previously peeked skb.
      
      So changed dsmark dequeue to call qdisc_dequeue_peeked() instead to use
      peeked skb if exists.
      Signed-off-by: NKyeong Yoo <kyeong.yoo@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8b33d8e
  18. 07 3月, 2016 2 次提交
  19. 04 3月, 2016 1 次提交
  20. 02 3月, 2016 4 次提交
    • D
      sch_mqprio: Fix build with older gcc. · 241deec9
      David S. Miller 提交于
        CC [M]  net/sched/sch_mqprio.o
      net/sched/sch_mqprio.c: In function ?mqprio_init?:
      net/sched/sch_mqprio.c:145: error: unknown field ?tc? specified in initializer
      net/sched/sch_mqprio.c:145: warning: missing braces around initializer
      net/sched/sch_mqprio.c:145: warning: (near initialization for ?tc.<anonymous>?)
      make[2]: *** [net/sched/sch_mqprio.o] Error 1
      make[1]: *** [net/sched] Error 2
      make: *** [net] Error 2
      
      Several people reported this, surround the unnamed union
      member initialization with braces to fix.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      241deec9
    • W
      net: remove skb_sender_cpu_clear() · 64d4e343
      WANG Cong 提交于
      After commit 52bd2d62 ("net: better skb->sender_cpu and skb->napi_id cohabitation")
      skb_sender_cpu_clear() becomes empty and can be removed.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64d4e343
    • J
      Support to encoding decoding skb prio on IFE action · 200e10f4
      Jamal Hadi Salim 提交于
          Example usage:
          Set the skb priority using skbedit then allow it to be encoded
      
          sudo tc qdisc add dev $ETH root handle 1: prio
          sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
          u32 match ip protocol 1 0xff flowid 1:2 \
          action skbedit prio 17 \
          action ife encode \
          allow prio \
          dst 02:15:15:15:15:15
      
          Note: You dont need the skbedit action if you are already encoding the
          skb priority earlier. A zero skb priority will not be sent
      
          Alternative hard code static priority of decimal 33 (unlike skbedit)
          then mark of 0x12 every time the filter matches
      
          sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
          u32 match ip protocol 1 0xff flowid 1:2 \
          action ife encode \
          type 0xDEAD \
          use prio 33 \
          use mark 0x12 \
          dst 02:15:15:15:15:15
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      200e10f4
    • J
      Support to encoding decoding skb mark on IFE action · 084e2f65
      Jamal Hadi Salim 提交于
      Example usage:
      Set the skb using skbedit then allow it to be encoded
      
      sudo tc qdisc add dev $ETH root handle 1: prio
      sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
      u32 match ip protocol 1 0xff flowid 1:2 \
      action skbedit mark 17 \
      action ife encode \
      allow mark \
      dst 02:15:15:15:15:15
      
      Note: You dont need the skbedit action if you are already encoding the
      skb mark earlier. A zero skb mark, when seen, will not be encoded.
      
      Alternative hard code static mark of 0x12 every time the filter matches
      
      sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
      u32 match ip protocol 1 0xff flowid 1:2 \
      action ife encode \
      type 0xDEAD \
      use mark 0x12 \
      dst 02:15:15:15:15:15
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      084e2f65