1. 20 Sep, 2016 4 commits
  2. 19 Sep, 2016 6 commits
  3. 16 Sep, 2016 6 commits
  4. 11 Sep, 2016 3 commits
  5. 29 Aug, 2016 1 commit
    • net_sched: fix use of uninitialized ethertype variable in cls_flower · 0b498a52
      Committed by Arnd Bergmann
      The addition of VLAN support caused a possible use of uninitialized
      data if we encounter a zero TCA_FLOWER_KEY_ETH_TYPE key, as pointed
      out by "gcc -Wmaybe-uninitialized":
      
      net/sched/cls_flower.c: In function 'fl_change':
      net/sched/cls_flower.c:366:22: error: 'ethertype' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This changes the code to only set the ethertype field if it
      was nonzero, as before the patch.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 9399ae9a ("net_sched: flower: Add vlan support")
      Cc: Hadar Hen Zion <hadarh@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 26 Aug, 2016 1 commit
  7. 23 Aug, 2016 2 commits
  8. 19 Aug, 2016 5 commits
    • net_sched: act_vlan: Add priority option · 956af371
      Committed by Hadar Hen Zion
      The current vlan push action supports only vid and protocol options.
      Add priority option.
      
      Example script that adds vlan push action with vid and
      priority:
      
      tc filter add dev veth0 protocol ip parent ffff: \
              flower \
              indev veth0 \
              action vlan push id 100 priority 5
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: flower: Add vlan support · 9399ae9a
      Committed by Hadar Hen Zion
      Enhance flower to support 802.1Q vlan protocol classification.
      Currently, the supported fields are vlan_id and vlan_priority.
      
      Example:
      
      	# add a flower filter with vlan id and priority classification
      	tc filter add dev ens4f0 protocol 802.1Q parent ffff: \
      		flower \
      		indev ens4f0 \
      		vlan_ethtype ipv4 \
      		vlan_id 100 \
      		vlan_prio 3 \
      	action vlan pop
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: flower: Avoid dissection of unmasked keys · 339ba878
      Committed by Hadar Hen Zion
      The current flower implementation checks the mask range and sets all the
      keys included in that range as "used_keys", even if a specific key in
      the range has a zero mask.

      This behavior can cause a false positive return value from the
      dissector_uses_key() function and unnecessary dissection in
      __skb_flow_dissect().

      This patch explicitly checks the mask of each key and sets "used_keys"
      accordingly.
      
      Fixes: 77b9900e ('tc: introduce Flower classifier')
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
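
The difference between the range-based and the per-key check described in commit 339ba878 can be sketched with a small Python model. This is not the kernel code: the key names, the dict-based mask layout and both helper functions are hypothetical and exist only to illustrate the idea.

```python
# Simplified model of deriving "used_keys" from a flower mask.
# Key names and the dict layout are illustrative assumptions,
# not the kernel's data structures.

KEY_ORDER = ["eth_type", "ip_proto", "ipv4_src", "ipv4_dst", "tcp_ports"]


def used_keys_by_range(mask):
    """Modeled old behavior: every key between the first and the last
    nonzero mask is marked as used, even if its own mask is zero."""
    nonzero = [i for i, key in enumerate(KEY_ORDER) if mask.get(key, 0)]
    if not nonzero:
        return set()
    return set(KEY_ORDER[nonzero[0]:nonzero[-1] + 1])


def used_keys_by_mask(mask):
    """Modeled new behavior: a key is used only if its own mask is nonzero."""
    return {key for key in KEY_ORDER if mask.get(key, 0)}


mask = {"eth_type": 0xFFFF, "ip_proto": 0, "ipv4_src": 0, "ipv4_dst": 0xFFFFFFFF}
print(sorted(used_keys_by_range(mask)))  # also contains ip_proto and ipv4_src
print(sorted(used_keys_by_mask(mask)))   # only eth_type and ipv4_dst
```

In this model the range-based variate reports two keys whose masks are zero, which corresponds to the false positive the commit message attributes to dissector_uses_key.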
    • net: sched: avoid duplicates in qdisc dump · ea327469
      Committed by Jiri Kosina
      tc_dump_qdisc() dumps the per-device qdiscs in two phases: first, the
      "standard" dev->qdisc is dumped. Second, if there are any ingress
      queues, they are dumped as well.

      After the conversion of the netdevice's qdisc linked list into a hashtable,
      these two sets are no longer disjoint lists; both are "reachable"
      directly from the netdevice's hashtable. As a consequence, the
      "full-depth" dump of the ingress qdiscs immediately hits the
      netdevice hashtable again, duplicating the dump that has already been
      performed for dev->qdisc.
      What actually needs to be dumped for an ingress queue is "just" the
      top-level ingress qdisc, as everything else has been dumped already.
      
      Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
      whether it should (while performing the "full" per-netdev qdisc dump) perform
      the whole recursion, or just dump "additional" top-level (ingress) qdiscs
      without performing any kind of recursion.
      
      This fixes duplicate dumps such as
      
      	qdisc mq 0: root
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc clsact ffff: parent ffff:fff1
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
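
The duplication described in commit ea327469, and the effect of a "recurse or not" switch, can be illustrated with a toy Python model. The class, the function names and the `recur` parameter are hypothetical stand-ins, not the kernel's tc_dump_qdisc_root() signature.

```python
# Toy model of a per-device qdisc dump; names and data layout are hypothetical.

class Dev:
    def __init__(self, root, ingress, children):
        self.root = root          # "standard" root qdisc
        self.ingress = ingress    # top-level ingress qdisc, or None
        # Per-device hashtable: child qdiscs are reachable from the device,
        # not from one particular root.
        self.qdisc_hash = list(children)


def dump_qdisc_root(dev, root, out, recur):
    """Dump `root`; walk the device hashtable only when `recur` is true."""
    out.append(root)
    if recur:
        out.extend(dev.qdisc_hash)


def dump_qdiscs(dev, recur_on_ingress):
    out = []
    dump_qdisc_root(dev, dev.root, out, recur=True)
    if dev.ingress is not None:
        dump_qdisc_root(dev, dev.ingress, out, recur=recur_on_ingress)
    return out


dev = Dev("mq", "clsact", ["pfifo_fast :1", "pfifo_fast :2"])
print(dump_qdiscs(dev, recur_on_ingress=True))   # children are listed twice
print(dump_qdiscs(dev, recur_on_ingress=False))  # every qdisc is listed once
```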
    • net: sched: fix handling of singleton qdiscs with qdisc_hash · 69012ae4
      Committed by Jiri Kosina
      qdisc_match_from_root() now iterates over the per-netdevice qdisc
      hashtable instead of going through a linked list of qdiscs (independently
      of the actual underlying netdev), which was the case before the switch to
      a hashtable for qdiscs.
      
      For singleton qdiscs, there is no underlying netdev associated though, and
      therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
      always be NULL.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
       IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       PGD 1aceba067 PUD 1aceb7067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
      [ ... ]
       task: ffff8801ec996e00 task.stack: ffff8801ec934000
       RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
       RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
       RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
       RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
       R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
       R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
       FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
       Stack:
        ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
        ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
        ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
       Call Trace:
        [<ffffffff81681210>] qdisc_lookup+0x40/0x50
        [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
        [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
        [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
        [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
        [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
        [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
        [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
        [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
        [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
        [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
        [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
        [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
        [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
        [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
        [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
        [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
        [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
        [<ffffffff810038e7>] do_syscall_64+0x57/0xb0
      
      Fix this by special-casing singleton qdiscs (those that don't have an
      underlying netdevice) and handling them immediately rather
      than trying to go over an underlying netdevice. We're in the same
      situation in tc_dump_qdisc_root() and tc_dump_tclass_root().

      Ultimately, this will have to be slightly reworked so that we are actually
      able to show singleton qdiscs (noop) in the dump properly; but we're not
      currently doing that anyway, so there is no regression, and it is better to
      do this gradually.
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Reported-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
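
The special-casing of singleton qdiscs described in commit 69012ae4 can be modeled roughly as follows. This is a Python sketch under stated assumptions: the `Netdev` and `Qdisc` classes and their fields are hypothetical, and `dev is None` stands in for a qdisc without an underlying netdevice.

```python
# Toy model of a qdisc lookup; class and field names are hypothetical.

class Netdev:
    def __init__(self):
        self.qdisc_hash = {}  # handle -> Qdisc


class Qdisc:
    def __init__(self, handle, dev=None):
        self.handle = handle
        self.dev = dev  # None models a singleton qdisc with no netdevice
        if dev is not None:
            dev.qdisc_hash[handle] = self


def qdisc_match_from_root(root, handle):
    # Singleton: there is no device hashtable to walk, so compare directly
    # instead of dereferencing a missing device.
    if root.dev is None:
        return root if root.handle == handle else None
    return root.dev.qdisc_hash.get(handle)


noop = Qdisc(0)                      # singleton, no device attached
dev = Netdev()
root = Qdisc(0x10000, dev)
child = Qdisc(0x20000, dev)
print(qdisc_match_from_root(noop, 0) is noop)        # singleton handled directly
print(qdisc_match_from_root(root, 0x20000) is child) # found via the hashtable
```

Without the early return, the lookup on `noop` would try to reach `root.dev.qdisc_hash` through a missing device, which is the model's analogue of the NULL dereference in the trace.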
  9. 18 Aug, 2016 5 commits
  10. 11 Aug, 2016 1 commit
  11. 09 Aug, 2016 2 commits
    • net/sched/sch_hfsc.c: remove unused cl_myfadj · 37088f61
      Committed by Michal Soltys
      The code using this variable has been commented out in the past as it
      was causing issues in upperlimited link-sharing scenarios.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: keep fsc and virtual times in sync; fix an old bug · 678a6241
      Committed by Michal Soltys
      This patch simplifies how we update fsc and calculate vt from it, while
      keeping the expected functionality identical to how hfsc currently
      behaves. It also fixes an issue introduced by a very old patch.
      
      The idea is, that instead of correcting cl_vt before fsc curve update
      (rtsc_min) and correcting cl_vt after calculation (rtsc_y2x) to keep
      cl_vt local to the current period - we can simply rely on virtual times
      and curve values always being in sync - analogously to how rsc and usc
      function, except that we use virtual time here.
      
      Why hasn't it been done this way from the beginning? The likely reason
      (based on the code trying to correct curves whenever possible) was to
      keep the virtual times as small as possible, as they have a tendency to
      "gallop" forward whenever their siblings and other fair sharing
      subtrees are idling. On top of that, the current code is subtly bugged, so
      cumulative time (without any corrections) is always kept and used in
      init_vf() when a new backlog period begins (using cl_cvtoff).

      Is a cumulative value safe? Generally yes, though corner cases are easy
      to create. For example consider:
      
      1gbit interface
      some 100kbit leaf, everything else idle
      
      With the current tick (64ns), 1s is 15625000 ticks, but the leaf is alone and
      this is virtual time, so in reality it's 10000 times more. In other words, 38
      bits are needed to hold 1 second, 54 for 1 day, 59 for 1 month and 63 for
      1 year (all logarithms rounded up). It's getting somewhat dangerous, but it
      also requires a setup that justifies such values, not to mention a class
      permanently backlogged for a year. In a near-extreme case (10gbit, 10kbit
      leaf), we have "enough" to hold ~13.6 days in 64 bits.
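
The bit-width figures above can be reproduced with a few lines of Python. This is only a back-of-the-envelope check: the 30-day month and 365-day year are assumptions, and the helper names are made up for the example.

```python
# Reproduces the bit-width estimates from the text, assuming a 64 ns tick,
# a 30-day month and a 365-day year (month/year lengths are assumptions).

TICK_NS = 64
TICKS_PER_SECOND = 1_000_000_000 // TICK_NS  # 15625000


def bits_needed(link_bps, leaf_bps, seconds):
    """Bits needed to hold the virtual-time tick count of a lone backlogged leaf."""
    speedup = link_bps // leaf_bps  # virtual time runs this many times faster
    return (TICKS_PER_SECOND * speedup * seconds).bit_length()


def seconds_in_64_bits(link_bps, leaf_bps):
    """Seconds of virtual time that fit into an unsigned 64-bit counter."""
    speedup = link_bps // leaf_bps
    return 2**64 / (TICKS_PER_SECOND * speedup)


GBIT, KBIT, DAY = 10**9, 10**3, 86400

periods = [("1 second", 1), ("1 day", DAY), ("1 month", 30 * DAY), ("1 year", 365 * DAY)]
for label, secs in periods:
    # 1gbit interface, 100kbit leaf, everything else idle
    print(label, bits_needed(GBIT, 100 * KBIT, secs))

# 10gbit interface, 10kbit leaf: how many days fit into 64 bits
print(seconds_in_64_bits(10 * GBIT, 10 * KBIT) / DAY)
```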
      
      Well, the issue remains mostly theoretical and cl_cvtoff has been
      working fine for all those years. Sensible configurations are de facto
      immune to this issue, and not-so-sensible ones can solve it with a cron job
      whose period is inversely proportional to the insanity of the setup =)
      
      Now let's explain the subtle bug mentioned earlier.
      
      The issue is related to how offsets are kept and how we calculate
      virtual times and update fair service curve(s). The issue itself is
      subtle, but easy to observe with long m1 segments. It was introduced in
      a rather old patch:
      
      Commit 99296150c7: "[NET_SCHED]: O(1) children vtoff adjustment
      in HFSC scheduler"
      
      (available in git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git)
      
      Originally, when a new backlog period was started, cl_vtoff of each
      sibling was updated with cl_cvtmax from the past period, naturally moving
      all cl_vt to the proper starting point. That patch adjusted it so that the
      cumulative offset is kept in the parent, and there is no need to traverse
      the list (as any subsequent child activation derives its new vt from
      already active sibling(s)).
      
      But with this change, cl_vtoff (of each sibling) is no longer persistent
      across the inactivity periods, as it's calculated from parent's
      cl_cvtoff on a new backlog period, conflicting with the following curve
      correction from the previous period:
      
      if (cl->cl_virtual.x == vt) {
              cl->cl_virtual.x -= cl->cl_vtoff;
              cl->cl_vtoff = 0;
      }
      
      This essentially tries to keep the curve as if it were local to the period
      and resets cl_vtoff (the cumulative vt offset of the class) to 0 when
      possible (read: when we have an intersection or if a new curve is below
      the old one). But then it's recalculated from cl_cvtoff on the next active
      period. The rtsc_min() call preceding the above if() then doesn't really
      do what we expect it to do in such a scenario, as it calculates the
      minimum of the corrected curve (from the previous backlog period) and the
      new uncorrected curve (with the offset derived from cl_cvtoff).
      
      Example:
      
      tc class add dev $ife parent 1:0 classid 1:1  hfsc ls m2 100mbit ul m2 100mbit
      tc class add dev $ife parent 1:1 classid 1:10 hfsc ls m1 80mbit d 10s m2 20mbit
      tc class add dev $ife parent 1:1 classid 1:11 hfsc ls m2 20mbit
      
      start B, keep it backlogged, let it run 6s (30s worth of vt as A is idle)
      pause B briefly to force cl_cvtoff update in parent (whole 1:1 going idle)
      start A, let it run 10s
      pause A briefly to force rtsc_min()
      
      At this point we would expect A to continue at 20mbit after a brief
      moment of 80mbit. But instead A will use 80mbit for the full 10s again. It's
      the effect of first correcting A (during 'start A'), and then, after
      unpausing, calculating rtsc_min() from the old corrected and new uncorrected
      curves.

      The patch fixes this bug and keeps vt and fsc in sync (virtual times
      are cumulative, not local to the backlog period).
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 26 Jul, 2016 2 commits
    • net_sched: get rid of struct tcf_common · ec0595cc
      Committed by WANG Cong
      After the previous patch, struct tc_action should be enough
      to represent the generic tc action, so tcf_common is no longer
      necessary. This patch gets rid of it to make the tc action code
      more readable.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move tc_action into tcf_common · a85a970a
      Committed by WANG Cong
      struct tc_action is confusing; currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions

      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use by moving
      tc_action into tcf_common, so that they are allocated together
      in the hashtable and can be cast easily.

      And together with the following patch, we can make
      tc_action a truly generic representation for all tc actions, and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 25 Jul, 2016 2 commits