1. 30 Oct 2014, 1 commit
  2. 22 Oct 2014, 1 commit
  3. 10 Oct 2014, 1 commit
    • net_sched: restore qdisc quota fairness limits after bulk dequeue · b8358d70
      Authored by Jesper Dangaard Brouer
      Restore the quota fairness between qdiscs that we broke with commit
      5772e9a3 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE").
      
      Before that commit, the quota in __qdisc_run() was in packets, as
      dequeue_skb() would only dequeue a single packet; that assumption
      broke with bulk dequeue.
      
      We choose not to account for the number of packets inside TSO/GSO
      packets (accessible via "skb_gso_segs"), as the previous fairness
      also had this "defect". Thus, a GSO/TSO packet counts as a single
      packet.
      
      Furthermore, we choose to slack on accuracy by allowing a bulk
      dequeue in try_bulk_dequeue_skb() to exceed the "packets" limit,
      bounded only by the BQL byte limit.  This is done because BQL prefers
      to get its full budget for appropriate feedback from TX completion.
      
      In the future, we might consider reworking this further and, if it
      allows, switching to a time-based model, as suggested by Eric. Right
      now, we only restore the old semantics.
      
      Joint work with Eric, Hannes, Daniel and Jesper.  Hannes wrote the
      first patch in cooperation with Daniel and Jesper.  Eric rewrote the
      patch.
      
      Fixes: 5772e9a3 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
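      A minimal C sketch of the restored accounting (illustrative only, not
      the kernel's code; the ex_ names and the budget value are made up):
      the run loop is charged with however many packets each restart
      actually dequeued, instead of one per iteration.
      
      struct ex_qdisc { int backlog; };
      
      /* Stand-in for qdisc_restart(): returns how many packets were
       * dequeued and handed to the driver; 0 means the qdisc is empty. */
      static int ex_qdisc_restart(struct ex_qdisc *q)
      {
              int bulk = q->backlog < 4 ? q->backlog : 4; /* bulk dequeue */
      
              q->backlog -= bulk;
              return bulk;
      }
      
      static void ex_qdisc_run(struct ex_qdisc *q)
      {
              int quota = 64; /* dev_weight-style packet budget */
      
              while (quota > 0) {
                      int packets = ex_qdisc_restart(q);
      
                      if (packets == 0)
                              break;
                      /* the fix: charge the whole bulk, not one packet;
                       * quota may go slightly negative, since a bulk is
                       * bounded only by the BQL byte limit */
                      quota -= packets;
              }
      }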
  4. 09 Oct 2014, 1 commit
  5. 08 Oct 2014, 1 commit
    • net: better IFF_XMIT_DST_RELEASE support · 02875878
      Authored by Eric Dumazet
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      The current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping the dst in validate_xmit_skb() is certainly too late in case
      the packet was queued by cpu X but dequeued by cpu Y.
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking the qdisc lock.
      
      As Julian Anastasov pointed out, the need for skb_dst() might come
      from some packet schedulers or classifiers.
      
      This patch adds a new helper to cleanly express the needs of various
      drivers and qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call the
      following helper in their setup instead of the prior:
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits: one that may
      eventually be rebuilt by the bonding/team drivers.
      
      The other one is permanent and blocks IFF_XMIT_DST_RELEASE from
      being rebuilt in bonding/team. We could add something
      smarter later.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
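      A minimal sketch of the two-bit scheme, with hypothetical ex_ names
      (the real flags and netif_keep_dst() live in netdevice.h): one bit
      the bonding/team drivers may rebuild, and a permanent bit that
      blocks that rebuild.
      
      /* One bit the bonding/team drivers may set back; a second,
       * permanent bit that blocks that rebuild. The keep-dst helper
       * clears both, so the device keeps skb_dst() across xmit. */
      #define EX_IFF_XMIT_DST_RELEASE       (1u << 0) /* rebuildable */
      #define EX_IFF_XMIT_DST_RELEASE_PERM  (1u << 1) /* permanent   */
      
      struct ex_netdev { unsigned int priv_flags; };
      
      static void ex_netif_keep_dst(struct ex_netdev *dev)
      {
              dev->priv_flags &= ~(EX_IFF_XMIT_DST_RELEASE |
                                   EX_IFF_XMIT_DST_RELEASE_PERM);
      }
      
      /* In the xmit path: drop the dst early, before the qdisc lock,
       * unless the device asked to keep it. */
      static int ex_should_drop_dst(const struct ex_netdev *dev)
      {
              return dev->priv_flags & EX_IFF_XMIT_DST_RELEASE;
      }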
  6. 07 Oct 2014, 3 commits
  7. 06 Oct 2014, 1 commit
  8. 05 Oct 2014, 2 commits
    • ematch: Fix early ending of inverted containers. · 34a419d4
      Authored by Ignacy Gawędzki
      The result of a negated container has to be inverted before checking for
      early ending.
      
      This fixes my previous attempt (17c9c823) to
      make inverted containers work correctly.
      Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
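      A small C model of the evaluation order the fix enforces (illustrative
      ex_ names, not the ematch API): the container's result is inverted
      first, and only then tested against the chain's early-ending rule.
      
      #include <stdbool.h>
      
      struct ex_match {
              bool inverted;      /* container carries a logical NOT */
              bool is_or_chain;   /* relation to the following match */
              bool raw;           /* stand-in for walking the subtree */
      };
      
      static bool ex_eval_container(const struct ex_match *m)
      {
              return m->raw;
      }
      
      static bool ex_run_chain(const struct ex_match *m, int n)
      {
              bool res = false;
      
              for (int i = 0; i < n; i++) {
                      res = ex_eval_container(&m[i]);
                      if (m[i].inverted)
                              res = !res;  /* the fix: invert BEFORE the test */
                      /* early ending: an OR chain may stop on true, AND on false */
                      if (i < n - 1 && (m[i].is_or_chain ? res : !res))
                              return res;
              }
              return res;
      }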
    • net: sched: suspicious RCU usage in qdisc_watchdog · 1e203c1a
      Authored by John Fastabend
      The RCU dereference in the qdisc_watchdog call needs to be done inside
      rcu_read_lock()/rcu_read_unlock(). Qdisc destroy operations then need
      to ensure the timer is cancelled before removing the qdisc structure.
      
      [ 3992.191339] ===============================
      [ 3992.191340] [ INFO: suspicious RCU usage. ]
      [ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
      [ 3992.191345] -------------------------------
      [ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
      [ 3992.191348]
      [ 3992.191348] other info that might help us debug this:
      [ 3992.191348]
      [ 3992.191351]
      [ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
      [ 3992.191353] no locks held by swapper/1/0.
      [ 3992.191355]
      [ 3992.191355] stack backtrace:
      [ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
      [ 3992.191360] Hardware name:                  /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
      [ 3992.191362]  0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
      [ 3992.191366]  ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
      [ 3992.191370]  ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
      [ 3992.191374] Call Trace:
      [ 3992.191376]  <IRQ>  [<ffffffff8178f92c>] dump_stack+0x4e/0x68
      [ 3992.191387]  [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130
      [ 3992.191392]  [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0
      [ 3992.191396]  [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420
      [ 3992.191399]  [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240
      [ 3992.191403]  [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0
      [ 3992.191406]  [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240
      [ 3992.191410]  [<ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140
      [ 3992.191415]  [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60
      [ 3992.191419]  [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60
      [ 3992.191422]  [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80
      [ 3992.191424]  <EOI>  [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0
      [ 3992.191432]  [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0
      [ 3992.191437]  [<ffffffff815ed567>] cpuidle_enter+0x17/0x20
      [ 3992.191441]  [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0
      [ 3992.191445]  [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30
      [ 3992.191448]  [<ffffffff81033c16>] start_secondary+0x1b6/0x260
      
      Fixes: b26b0d1e ("net: qdisc: use rcu prefix and silence sparse warnings")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
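      A hedged, condensed sketch of the pattern (kernel-style C; the ex_
      names are stand-ins, and ex_reschedule() stands in for the real
      unthrottle/reschedule work): the hrtimer callback wraps its
      RCU-protected accesses in an explicit read-side section.
      
      struct ex_watchdog {
              struct hrtimer timer;
              struct Qdisc *qdisc;
      };
      
      static enum hrtimer_restart ex_qdisc_watchdog(struct hrtimer *timer)
      {
              struct ex_watchdog *wd = container_of(timer, struct ex_watchdog,
                                                    timer);
      
              rcu_read_lock();     /* the fix: the rcu_dereference()-based
                                    * accesses below are now protected   */
              ex_reschedule(wd->qdisc);
              rcu_read_unlock();
      
              return HRTIMER_NORESTART;
      }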
  9. 04 Oct 2014, 3 commits
    • qdisc: validate skb without holding lock · 55a93b3e
      Authored by Eric Dumazet
      Validation of an skb can be pretty expensive:
      
      GSO segmentation and/or checksum computations.
      
      We can do this without holding the qdisc lock, so that other cpus
      can queue additional packets.
      
      The trick is that requeued packets were already validated, so we carry
      a boolean letting sch_direct_xmit() either validate a fresh skb list
      or directly use an old one.
      
      Tested on a 40Gb NIC (8 TX queues) with 200 concurrent flows on a
      48-thread host.
      
      Turning TSO on or off had no effect on throughput, only a few more cpu
      cycles. Lock contention on the qdisc lock disappeared.
      
      Same when disabling TX checksum offload.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
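      A condensed sketch of the resulting control flow (kernel-style C; the
      ex_ names and the trimmed signature are illustrative, and
      ex_validate_xmit_skb_list()/ex_xmit_under_txq_lock() stand in for the
      real helpers): validation runs after the qdisc lock is dropped, and
      only for fresh lists.
      
      static int ex_sch_direct_xmit(struct sk_buff *skb, struct ex_qdisc *q,
                                    spinlock_t *root_lock, bool validate)
      {
              int ret = 0;
      
              spin_unlock(root_lock);        /* other cpus may enqueue now */
      
              if (validate)                  /* requeued lists skip this:
                                              * they were validated already */
                      skb = ex_validate_xmit_skb_list(skb); /* GSO + csum */
      
              if (skb)
                      ret = ex_xmit_under_txq_lock(skb);
      
              spin_lock(root_lock);          /* re-take for the caller */
              return ret;
      }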
    • qdisc: dequeue bulking also pickup GSO/TSO packets · 808e7ac0
      Authored by Jesper Dangaard Brouer
      TSO and GSO segmented packets already benefit from bulking
      on their own.
      
      TSO packets have always taken advantage of updating the
      tailptr only once for a large packet.
      
      GSO segmented packets have recently taken advantage of the
      bulking xmit_more API, via merge commit 53fda7f7 ("Merge
      branch 'xmit_list'"), specifically via commit 7f2e870f ("net:
      Move main gso loop out of dev_hard_start_xmit() into helper."),
      allowing qdisc requeue of the remaining list, and via commit
      ce93718f ("net: Don't keep around original SKB when we
      software segment GSO frames.").
      
      This patch allows further bulking of TSO/GSO packets together
      when dequeueing from the qdisc.
      
      Testing:
       Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
      netperf-wrapper. Bulking several TSO packets shows no performance
      regressions (requeues were in the area of 32 requeues/sec).
      
      Bulking several GSO packets shows a small regression or a very small
      improvement (requeues were in the area of 8000 requeues/sec).
      
       Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
      latency. The base case, which is "normal" GSO bulking, sees varying
      high-prio queue delay between 0.38ms and 0.47ms.  Bulking several GSOs
      together results in a stable high-prio queue delay of 0.50ms.
      
       Using igb at 100Mbit/s with GSO bulking shows an improvement. The
      base case sees varying high-prio queue delay between 2.23ms and 2.35ms.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE · 5772e9a3
      Authored by Jesper Dangaard Brouer
      Based on DaveM's recent API work on dev_hard_start_xmit(), which allows
      sending/processing an entire skb list.
      
      This patch implements qdisc bulk dequeue, by allowing multiple packets
      to be dequeued in dequeue_skb().
      
      The optimization principle for this is twofold: (1) amortize locking
      cost and (2) avoid the expensive tailptr update for notifying HW.
       (1) Several packets are dequeued while holding the qdisc root_lock,
      amortizing locking cost over several packets.  The dequeued SKB list is
      processed under the TXQ lock in dev_hard_start_xmit(), thus also
      amortizing the cost of the TXQ lock.
       (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
      API to delay the HW tailptr update, which also reduces the cost per
      packet.
      
      One restriction of the new API is that every SKB must belong to the
      same TXQ.  This patch takes the easy way out, by restricting bulk
      dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
      the qdisc has only a single TXQ attached.
      
      Some detail about the flow: dev_hard_start_xmit() will process the skb
      list, and transmit packets individually towards the driver (see
      xmit_one()).  In case the driver stops midway through the list, the
      remaining skb list is returned by dev_hard_start_xmit().  In
      sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
      
      To avoid overshooting the HW limits, which results in requeuing, the
      patch limits the amount of bytes dequeued, based on the driver's BQL
      limits.  In effect, bulking will only happen for BQL-enabled drivers.
      
      Small amounts of extra HoL blocking (2x MTU/0.24ms) were
      measured at 100Mbit/s, with bulking 8 packets, but the
      oscillating nature of the measurement indicates that something
      like scheduler latency might be causing this effect. More comparisons
      show that this oscillation occasionally goes away. Thus, we
      disregard this artifact completely and remove any "magic" bulking
      limit.
      
      For now, as a conservative approach, stop bulking when seeing TSO and
      segmented GSO packets.  They already benefit from bulking on their own.
      A followup patch adds this, to allow easier bisectability for finding
      regressions.
      
      Joint work with Hannes, Daniel and Florian.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
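      A hedged sketch of the BQL-bounded bulk dequeue described above
      (kernel-style C; ex_ names are illustrative and the real
      try_bulk_dequeue_skb() differs in detail): dequeued skbs are chained
      while the byte budget holds.
      
      struct ex_qdisc {
              struct sk_buff *(*dequeue)(struct ex_qdisc *q);
      };
      
      static void ex_try_bulk_dequeue_skb(struct ex_qdisc *q,
                                          struct sk_buff *head,
                                          int bytelimit)    /* from BQL */
      {
              struct sk_buff *skb = head;
      
              while (bytelimit > 0) {
                      struct sk_buff *next = q->dequeue(q);
      
                      if (!next)
                              break;
                      bytelimit -= next->len;  /* may overshoot once: the
                                                * last packet is taken even
                                                * if it exceeds the budget */
                      skb->next = next;        /* chain for the driver's
                                                * xmit_more processing    */
                      skb = next;
              }
              skb->next = NULL;
      }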
  10. 02 Oct 2014, 2 commits
    • net_sched: avoid calling tcf_unbind_filter() in call_rcu callback · a0efb80c
      Authored by WANG Cong
      This fixes the following crash:
      
      [   63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [   63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
      [   63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
      [   63.980094] RIP: 0010:[<ffffffff817e6d07>]  [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
      [   63.980094] RSP: 0018:ffff880117dffcc0  EFLAGS: 00010202
      [   63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
      [   63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
      [   63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
      [   63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
      [   63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
      [   63.980094] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
      [   63.980094] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
      [   63.980094] Stack:
      [   63.980094]  ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
      [   63.980094]  ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
      [   63.980094]  ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
      [   63.980094] Call Trace:
      [   63.980094]  [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
      [   63.980094]  [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
      [   63.980094]  [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
      [   63.980094]  [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
      [   63.980094]  [<ffffffff810780a4>] __do_softirq+0x142/0x323
      [   63.980094]  [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
      [   63.980094]  [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
      [   63.980094]  [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
      [   63.980094]  [<ffffffff8108e44d>] kthread+0xc9/0xd1
      [   63.980094]  [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
      [   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
      [   63.980094]  [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
      [   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
      
      tp could be freed in a call_rcu callback too; the order is not guaranteed.
      
      John Fastabend says:
      
      ====================
      It's worth noting why this is safe. Any running schedulers will either
      read the valid class field or it will be zeroed.
      
      All schedulers today, when the class is 0, do a lookup using the
      same call used by tcf_exts_bind(). So even if a running
      classifier hits the null class pointer, it will do a lookup and get
      to the same result. This is particularly fragile at the moment because
      the only way to verify this is to audit the scheduler call sites.
      ====================
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
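      A hedged sketch of the resulting pattern (tcf_unbind_filter() and
      call_rcu() are the real helpers; the ex_ types are illustrative): the
      unbind happens synchronously under RTNL, and the RCU callback only
      frees memory, since the ordering of two independent call_rcu
      callbacks is not guaranteed.
      
      struct ex_key {
              struct tcf_result res;
              struct rcu_head rcu;
      };
      
      static void ex_free_key_rcu(struct rcu_head *head)
      {
              kfree(container_of(head, struct ex_key, rcu)); /* memory only */
      }
      
      static void ex_delete_key(struct tcf_proto *tp, struct ex_key *key)
      {
              /* safe here: RTNL is held and tp is still alive */
              tcf_unbind_filter(tp, &key->res);
      
              /* the deferred callback must not touch tp; tp itself may be
               * freed by its own call_rcu, and the two orders are unrelated */
              call_rcu(&key->rcu, ex_free_key_rcu);
      }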
    • net_sched: fix another crash in cls_tcindex · 6e056569
      Authored by WANG Cong
      This patch fixes the following crash:
      
      [  166.670795] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  166.674230] IP: [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
      [  166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0
      [  166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [  166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642
      [  166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000
      [  166.674230] RIP: 0010:[<ffffffff814b739f>]  [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
      [  166.674230] RSP: 0018:ffff8800cae7f7d0  EFLAGS: 00010207
      [  166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700
      [  166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700
      [  166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001
      [  166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8
      [  166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700
      [  166.674230] FS:  00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
      [  166.674230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0
      [  166.674230] Stack:
      [  166.674230]  ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828
      [  166.674230]  ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8
      [  166.674230]  0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848
      [  166.674230] Call Trace:
      [  166.674230]  [<ffffffff814b73e8>] list_del+0xd/0x2b
      [  166.674230]  [<ffffffff817caeec>] tcf_action_destroy+0x4c/0x71
      [  166.674230]  [<ffffffff817ca0ce>] tcf_exts_destroy+0x20/0x2d
      [  166.674230]  [<ffffffff817ec2b5>] tcindex_delete+0x196/0x1b7
      
      A struct list_head cannot simply be copied; we should always init it.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
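      A small userspace C sketch of why the plain copy is wrong
      (INIT_LIST_HEAD mirrors the kernel macro; ex_ names are illustrative):
      after a struct copy, nodes linked to the old head still point at the
      old address, so the copied head must be re-initialized.
      
      struct list_head { struct list_head *next, *prev; };
      
      static void INIT_LIST_HEAD(struct list_head *h)
      {
              h->next = h->prev = h;
      }
      
      struct ex_filter {
              struct list_head actions;   /* embedded list head */
              int classid;
      };
      
      static void ex_copy_filter(struct ex_filter *new,
                                 const struct ex_filter *old)
      {
              *new = *old;   /* copies stale next/prev: any node linked to
                              * old->actions still points at &old->actions */
              INIT_LIST_HEAD(&new->actions);   /* the fix: re-init */
      }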
  11. 30 Sep 2014, 5 commits
  12. 29 Sep 2014, 4 commits
  13. 26 Sep 2014, 1 commit
    • net: sched: use pinned timers · 4a8e320c
      Authored by Eric Dumazet
      While using an MQ + NETEM setup, I had confirmation that the default
      timer migration (/proc/sys/kernel/timer_migration) is killing us.
      
      Installing this on the receiver side of a TCP_STREAM test (NIC has 8 TX
      queues):
      
      EST="est 1sec 4sec"
      for ETH in eth1
      do
       tc qd del dev $ETH root 2>/dev/null
       tc qd add dev $ETH root handle 1: mq
       tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
       tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
       tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
       tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
       tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
       tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
       tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
       tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
      done
      
      We can see that timers get migrated onto a single cpu, presumably one
      that was idle at the time the timers were set up.
      All qdisc dequeues then run from this cpu and huge lock contention
      follows. This single cpu is stuck in softirq mode and cannot dequeue
      fast enough.
      
          39.24%  [kernel]          [k] _raw_spin_lock
           2.65%  [kernel]          [k] netem_enqueue
           1.80%  [kernel]          [k] netem_dequeue
           1.63%  [kernel]          [k] copy_user_enhanced_fast_string
           1.45%  [kernel]          [k] _raw_spin_lock_bh
      
      By pinning qdisc timers on the cpu running the qdisc, we respect proper
      XPS setting and remove this lock contention.
      
           5.84%  [kernel]          [k] netem_enqueue
           4.83%  [kernel]          [k] _raw_spin_lock
           2.92%  [kernel]          [k] copy_user_enhanced_fast_string
      
      Current qdiscs that benefit from this change are:
      
      	netem, cbq, fq, hfsc, tbf, htb.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
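      A hedged sketch of the change (kernel-style C; the ex_ wrapper is
      condensed from the real qdisc watchdog code and may differ in
      detail): arming the hrtimer in pinned mode keeps it on the CPU that
      armed it, instead of letting timer_migration move it to an "idle" cpu.
      
      static void ex_watchdog_init(struct ex_watchdog *wd)
      {
              hrtimer_init(&wd->timer, CLOCK_MONOTONIC,
                           HRTIMER_MODE_ABS_PINNED);   /* was _ABS */
      }
      
      static void ex_watchdog_schedule_ns(struct ex_watchdog *wd, u64 expires)
      {
              hrtimer_start(&wd->timer, ns_to_ktime(expires),
                            HRTIMER_MODE_ABS_PINNED);  /* stay on this cpu */
      }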
  14. 23 Sep 2014, 4 commits
  15. 20 Sep 2014, 2 commits
  16. 17 Sep 2014, 7 commits
    • net: sched: cls_cgroup need tcf_exts_init in all cases · 9f6c38e7
      Authored by John Fastabend
      This ensures tcf_exts_init() is called in all cases.
      
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_fw: add missing tcf_exts_init call in fw_change() · e1f93eb0
      Authored by John Fastabend
      When allocating a new structure, we also need to call tcf_exts_init
      to initialize the exts.
      
      A follow-up patch might be in order to remove some of this code
      and do tcf_exts_assign(). With this, we could remove the
      tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
      As part of the future tcf_actions RCU series this will need to be
      done. For now, fix the call here.
      
      Fixes: e35a8ee5 ("net: sched: fw use RCU")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
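      A hedged sketch of the pattern that this fix and the cls_cgroup fix
      above enforce (kernel-style C; the ex_ filter type is illustrative,
      tcf_exts_init() is the real helper): a freshly allocated filter gets
      its exts initialized before any change or destroy path can run on it.
      
      static struct ex_fw_filter *ex_fw_alloc(void)
      {
              struct ex_fw_filter *f = kzalloc(sizeof(*f), GFP_KERNEL);
      
              if (!f)
                      return NULL;
              /* the missing call: without it, tcf_exts_change() and the
               * destroy path operate on uninitialized exts */
              tcf_exts_init(&f->exts, TCA_FW_ACT, TCA_FW_POLICE);
              return f;
      }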
    • net: sched: cls_cgroup fix possible memory leak of 'new' · d14cbfc8
      Authored by John Fastabend
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   54996b52
      commit: c7953ef2 [625/646] net: sched: cls_cgroup use RCU
      
      net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new'
      net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new'
      net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new'
      
      Fixes: c7953ef2 ("net: sched: cls_cgroup use RCU")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
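      A hedged sketch of the usual fix for this class of smatch warning
      (ex_ names are illustrative; the real cls_cgroup_change() differs):
      every early return after the allocation funnels through a label that
      frees 'new'.
      
      static int ex_cls_cgroup_change(void)
      {
              struct ex_cgroup_head *new = kzalloc(sizeof(*new), GFP_KERNEL);
              int err;
      
              if (!new)
                      return -ENOBUFS;
      
              err = ex_parse_and_validate(new);
              if (err < 0)
                      goto errout;   /* previously: returned, leaking new */
              return 0;
      
      errout:
              kfree(new);
              return err;
      }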
    • net: sched: cls_u32 add missing rcu_assign_pointer and annotation · a96366bf
      Authored by John Fastabend
      Add the missing rcu_assign_pointer and the missing annotation for
      ht_up in cls_u32.c.
      
      Caught by the kbuild bot:
      
      >> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
         net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
         net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
      >> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
         net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
         net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
      >> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
         net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
         net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
      >> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
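      A hedged sketch of the sparse-clean pattern behind the fix
      (kernel-style C; the ex_ types are illustrative): the shared field is
      annotated __rcu, writers publish via rcu_assign_pointer(), and the
      config path unwraps via rtnl_dereference().
      
      struct ex_hnode;
      
      struct ex_knode {
              struct ex_hnode __rcu *ht_up;  /* the annotation sparse wanted */
      };
      
      static void ex_link(struct ex_knode *n, struct ex_hnode *ht)
      {
              rcu_assign_pointer(n->ht_up, ht);   /* publish with barrier */
      }
      
      static struct ex_hnode *ex_get(struct ex_knode *n)
      {
              return rtnl_dereference(n->ht_up);  /* config path, RTNL held */
      }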
    • net: sched: fix unused cpu variable · 80aab73d
      Authored by John Fastabend
      The kbuild test robot reported an unused variable cpu in cls_u32.c
      after the patch below. This happens when the PERF and MARK config
      variables are disabled.
      
      The fix is to use separate variables for perf and mark,
      and to define the cpu variable inside the ifdef logic.
      
      Fixes: 459d5f62 ("net: sched: make cls_u32 per cpu")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
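      A hedged sketch of the shape of the fix (the CONFIG symbols are real;
      the function and ex_ helpers are illustrative): each ifdef block
      declares its own cpu variable, so configurations that compile the
      block out never see an unused variable.
      
      static void ex_dump_stats(struct ex_u32_node *n)
      {
      #ifdef CONFIG_CLS_U32_PERF
              {
                      int cpu;   /* exists only when the code using it does */
      
                      for_each_possible_cpu(cpu)
                              ex_sum_perf_counters(n, cpu);
              }
      #endif
      #ifdef CONFIG_CLS_U32_MARK
              {
                      int cpu;
      
                      for_each_possible_cpu(cpu)
                              ex_sum_mark_counters(n, cpu);
              }
      #endif
      }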
    • net_sched: fix a null pointer dereference in tcindex_set_parms() · 69301eaa
      Authored by WANG Cong
      This patch fixes the following crash:
      
      [   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
      [   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
      [   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
      [   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
      [   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
      [   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
      [   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
      [   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
      [   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
      [   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
      [   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
      [   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
      [   42.200027] Stack:
      [   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
      [   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
      [   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
      [   42.200027] Call Trace:
      [   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
      [   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
      [   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
      [   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
      [   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
      [   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
      [   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
      [   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
      [   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
      [   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
      [   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
      [   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
      [   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
      [   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
      [   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
      [   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
      [   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
      [   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
      [   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
      [   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
      [   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
      [   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
      [   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
      [   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
      [   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
      [   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
      [   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
      [   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b
      
      'p->h' could be NULL, while 'cp->h' is always up to date.
      
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
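      A two-line illustrative sketch of what that analysis implies (hedged;
      this is not the actual diff, and walk() is a stand-in): iterate the
      new copy's table, which is always populated, rather than the old one's.
      
      /* before (crashes when the old struct never allocated its table):
       *     for (i = 0; i < cp->hash; i++) walk(p->h[i]);            */
      for (i = 0; i < cp->hash; i++)
              walk(cp->h[i]);    /* 'cp->h' is always up to date */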
    • net_sched: fix memory leak in cls_tcindex · 44b75e43
      Authored by WANG Cong
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 16 Sep 2014, 1 commit