- 20 6月, 2018 2 次提交
-
-
由 Davide Caratti 提交于
in the following script # tc actions add action ife encode allow prio pass index 42 # tc actions replace action ife encode allow tcindex drop index 42 the action control should remain equal to 'pass', if the kernel failed to replace the TC action. Pospone the assignment of the action control, to ensure it is not overwritten in the error path of tcf_ife_init(). Fixes: ef6980b6 ("introduce IFE action") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
a recursive lock warning [1] can be observed with the following script, # $TC actions add action ife encode allow prio pass index 42 IFE type 0xED3E # $TC actions replace action ife encode allow tcindex pass index 42 in case the kernel was unable to run the last command (e.g. because of the impossibility to load 'act_meta_skbtcindex'). For a similar reason, the kernel can leak idr in the error path of tcf_ife_init(), because tcf_idr_release() is not called after successful idr reservation: # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No such file or directory We have an error talking to the kernel # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No space left on device We have an error talking to the kernel # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47 IFE type 0xFEFE RTNETLINK answers: No space left on device We have an error talking to the kernel Since tcfa_lock is already taken when the action is being edited, a call to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock again. On the other hand, tcf_idr_release() needs to be called in the error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation. Fix both problems in tcf_ife_init(). Since the cleanup() routine can now be called when ife->params is NULL, also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu). [1] ============================================ WARNING: possible recursive locking detected 4.17.0-rc4.kasan+ #417 Tainted: G E -------------------------------------------- tc/3932 is trying to acquire lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife] but task is already holding lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&p->tcfa_lock)->rlock); lock(&(&p->tcfa_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by tc/3932: #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife] #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] stack backtrace: CPU: 3 PID: 3932 Comm: tc Tainted: G E 4.17.0-rc4.kasan+ #417 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x9a/0xeb __lock_acquire+0xf43/0x34a0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? __mutex_lock+0x62f/0x1240 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_acquire+0x10b/0x330 lock_acquire+0x10b/0x330 ? tcf_ife_cleanup+0x19/0x80 [act_ife] _raw_spin_lock_bh+0x38/0x70 ? tcf_ife_cleanup+0x19/0x80 [act_ife] tcf_ife_cleanup+0x19/0x80 [act_ife] __tcf_idr_release+0xff/0x350 tcf_ife_init+0xdde/0x13c0 [act_ife] ? ife_exit_net+0x290/0x290 [act_ife] ? __lock_is_held+0xb4/0x140 tcf_action_init_1+0x67b/0xad0 ? tcf_action_dump_old+0xa0/0xa0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? memset+0x1f/0x40 tcf_action_init+0x30f/0x590 ? tcf_action_init_1+0xad0/0xad0 ? memset+0x1f/0x40 tc_ctl_action+0x48e/0x5e0 ? mutex_lock_io_nested+0x1160/0x1160 ? tca_action_gd+0x990/0x990 ? sched_clock+0x5/0x10 ? find_held_lock+0x39/0x1d0 rtnetlink_rcv_msg+0x4da/0x990 ? validate_linkmsg+0x680/0x680 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 netlink_rcv_skb+0x127/0x350 ? validate_linkmsg+0x680/0x680 ? netlink_ack+0x970/0x970 ? __kmalloc_node_track_caller+0x304/0x3a0 netlink_unicast+0x40f/0x5d0 ? netlink_attachskb+0x580/0x580 ? _copy_from_iter_full+0x187/0x760 ? import_iovec+0x90/0x390 netlink_sendmsg+0x67f/0xb50 ? netlink_unicast+0x5d0/0x5d0 ? copy_msghdr_from_user+0x206/0x340 ? netlink_unicast+0x5d0/0x5d0 sock_sendmsg+0xb3/0xf0 ___sys_sendmsg+0x60a/0x8b0 ? copy_msghdr_from_user+0x340/0x340 ? lock_downgrade+0x5e0/0x5e0 ? tty_write_lock+0x18/0x50 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_downgrade+0x5e0/0x5e0 ? lock_acquire+0x10b/0x330 ? __audit_syscall_entry+0x316/0x690 ? current_kernel_time64+0x6b/0xd0 ? __fget_light+0x55/0x1f0 ? __sys_sendmsg+0xd2/0x170 __sys_sendmsg+0xd2/0x170 ? __ia32_sys_shutdown+0x70/0x70 ? syscall_trace_enter+0x57a/0xd60 ? rcu_read_lock_sched_held+0xdc/0x110 ? __bpf_trace_sys_enter+0x10/0x10 ? do_syscall_64+0x22/0x480 do_syscall_64+0xa5/0x480 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fd646988ba0 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100 Fixes: 4e8c8615 ("net sched: net sched: ife action fix late binding") Fixes: ef6980b6 ("introduce IFE action") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 6月, 2018 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice. In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly. This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This patch replaces cases of: kvzalloc(a * b, gfp) with: kvcalloc(a * b, gfp) as well as handling cases of: kvzalloc(a * b * c, gfp) with: kvzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvcalloc(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(char) * COUNT + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvzalloc + kvcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvzalloc(C1 * C2 * C3, ...) | kvzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvzalloc(sizeof(THING) * C2, ...) | kvzalloc(sizeof(TYPE) * C2, ...) | kvzalloc(C1 * C2 * C3, ...) | kvzalloc(C1 * C2, ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - (E1) * E2 + E1, E2 , ...) | - kvzalloc + kvcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kvzalloc + kvcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 09 6月, 2018 1 次提交
-
-
由 Davide Caratti 提交于
use nla_strlcpy() to avoid copying data beyond the length of TCA_DEF_DATA netlink attribute, in case it is less than SIMP_MAX_DATA and it does not end with '\0' character. v2: fix errors in the commit message, thanks Hangbin Liu Fixes: fa1b1cff ("net_cls_act: Make act_simple use of netlink policy.") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2018 1 次提交
-
-
由 David S. Miller 提交于
This reverts commit d96a43c6. This potentially breaks things, so reverting as per request by Jakub Kicinski. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 6月, 2018 4 次提交
-
-
由 Paul Blakey 提交于
When using a vxlan device as the ingress dev, we count it as a "no offload dev", so when such a rule comes and err stop is true, we fail early and don't try the egdev route which can offload it through the egress device. Fix that by not calling the block offload if one of the devices attached to it is not offload capable, but make sure egress on such case is capable instead. Fixes: caa72601 ("net: sched: keep track of offloaded filters [..]") Reviewed-by: NRoi Dayan <roid@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NPaul Blakey <paulb@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
If requested tcf proto is not found, get and del filter netlink protocol handlers output error message to extack, but do not return actual error code. Add check to return ENOENT when result of tp find function is NULL pointer. Fixes: c431f89b ("net: sched: split tc_ctl_tfilter into three handlers") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Blakey 提交于
We incorrectly compare the mask and the result is that we can't modify an already existing rule. Fix that by comparing correctly. Fixes: 05cd271f ("cls_flower: Support multiple masks per priority") Reported-by: NVlad Buslov <vladbu@mellanox.com> Reviewed-by: NRoi Dayan <roid@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NPaul Blakey <paulb@mellanox.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Blakey 提交于
When destroying the instance, destroy the head rhashtable. Fixes: 05cd271f ("cls_flower: Support multiple masks per priority") Reported-by: NVlad Buslov <vladbu@mellanox.com> Reviewed-by: NRoi Dayan <roid@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NPaul Blakey <paulb@mellanox.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2018 3 次提交
-
-
由 Vlad Buslov 提交于
tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER, RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function involves a lot of branching on specific message type because most of the code is message-specific. This significantly complicates adding new functionality and doesn't provide much benefit of code reuse. Split tc_ctl_tfilter to three standalone functions that handle filter new, delete and get requests. The only truly protocol independent part of tc_ctl_tfilter is code that looks up queue, class, and block. Refactor this code to standalone tcf_block_find function that is used by all three new handlers. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Blakey 提交于
When we fail to modify a rule, we incorrectly release the idr handle of the unmodified old rule. Fix that by checking if we need to release it. Fixes: fe2502e4 ("net_sched: remove cls_flower idr on failure") Reported-by: NVlad Buslov <vladbu@mellanox.com> Reviewed-by: NRoi Dayan <roid@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NPaul Blakey <paulb@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Song Liu 提交于
Checking netif_xmit_frozen_or_stopped() at the end of sch_direct_xmit() is being bypassed. This is because "ret" from sch_direct_xmit() will be either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0 will reach the condition: if (ret && netif_xmit_frozen_or_stopped(txq)) return false; This patch cleans up the code by removing the whole condition. For more discussion about this, please refer to https://marc.info/?t=152727195700008Signed-off-by: NSong Liu <songliubraving@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 5月, 2018 2 次提交
-
-
由 Jakub Kicinski 提交于
MQ doesn't hold any statistics on its own, however, statistic from offloads are requested starting from the root, hence MQ will read the old values for its sums. Call into the drivers, because of the additive nature of the stats drivers are aware of how much "pending updates" they have to children of the MQ. Since MQ reset its stats on every dump we can simply offset the stats, predicting how stats of offloaded children will change. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
mq offload is trivial, we just need to let the device know that the root qdisc is mq. Alternative approach would be to export qdisc_lookup() and make drivers check the root type themselves, but notification via ndo_setup_tc is more in line with other qdiscs. Note that mq doesn't hold any stats on it's own, it just adds up stats of its children. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 5月, 2018 2 次提交
-
-
由 Cong Wang 提交于
Commit 05f0fe6b ("RCU, workqueue: Implement rcu_work") introduces new API's for dispatching work in a RCU callback. Now we can just switch to the new API's for tc filters. This could get rid of a lot of code. Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
When dealing with ingress rule on a netdev, if we did fine through the conventional path, there's no need to continue into the egdev route, and we can stop right there. Not doing so may cause a 2nd rule to be added by the cls api layer with the ingress being the egdev. For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B Fixes: 208c0f4b ('net: sched: use tc_setup_cb_call to call per-block callbacks') Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 5月, 2018 1 次提交
-
-
由 Vlad Buslov 提交于
Initial net_device implementation used ingress_lock spinlock to synchronize ingress path of device. This lock was used in both process and bh context. In some code paths action map lock was obtained while holding ingress_lock. Commit e1e992e5 ("[NET_SCHED] protect action config/dump from irqs") modified actions to always disable bh, while using action map lock, in order to prevent deadlock on ingress_lock in softirq. This lock was removed from net_device, so disabling bh, while accessing action map, is no longer necessary. Replace all action idr spinlock usage with regular calls that do not disable bh. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2018 1 次提交
-
-
由 Paolo Abeni 提交于
Hangbin reported an Oops triggered by the syzkaller qdisc rules: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Modules linked in: sch_red CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:qdisc_hash_add+0x26/0xa0 RSP: 0018:ffff8800589cf470 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971 RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0 R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0 R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0 FS: 00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: red_change+0x2d2/0xed0 [sch_red] qdisc_create+0x57e/0xef0 tc_modify_qdisc+0x47f/0x14e0 rtnetlink_rcv_msg+0x6a8/0x920 netlink_rcv_skb+0x2a2/0x3c0 netlink_unicast+0x511/0x740 netlink_sendmsg+0x825/0xc30 sock_sendmsg+0xc5/0x100 ___sys_sendmsg+0x778/0x8e0 __sys_sendmsg+0xf5/0x1b0 do_syscall_64+0xbd/0x3b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x450869 RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869 RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700 Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51 RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470 When a red qdisc is updated with a 0 limit, the child qdisc is left unmodified, no additional scheduler is created in red_change(), the 'child' local variable is rightfully NULL and must not add it to the hash table. This change addresses the above issue moving qdisc_hash_add() right after the child qdisc creation. It additionally removes unneeded checks for noop_qdisc. Reported-by: NHangbin Liu <liuhangbin@gmail.com> Fixes: 49b49971 ("net: sched: make default fifo qdiscs appear in the dump") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2018 2 次提交
-
-
由 Paolo Abeni 提交于
After the previous patch, for NOLOCK qdiscs, q->seqlock is always held when the dequeue() is invoked, we can drop any additional locking to protect such operation. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
So that we can use lockdep on it. The newly introduced sequence lock has the same scope of busylock, so it shares the same lockdep annotation, but it's only used for NOLOCK qdiscs. With this changeset we acquire such lock in the control path around flushing operation (qdisc reset), to allow more NOLOCK qdisc perf improvement in the next patch. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2018 2 次提交
-
-
由 Davide Caratti 提交于
Similarly to what was done with commit a52956df ("net sched actions: fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given. Fixes: 5026c9b1 ("net sched: vlan action fix late binding") CC: Roman Mashak <mrv@mojatatu.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently NOLOCK qdiscs pay a measurable overhead to atomically manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per packet in the uncontended scenario with packet rate below the line rate: on packed dequeue and on the next, failing dequeue attempt. This changeset moves the bit manipulation into the qdisc_run_{begin,end} helpers, so that the bit is now flipped only once per packet, with measurable performance improvement in the uncontended scenario. This also allows simplifying the qdisc teardown code path - since qdisc_is_running() is now effective for each qdisc type - and avoid a possible race between qdisc_run() and dev_deactivate_many(), as now the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy dequeuing packets. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 12 5月, 2018 3 次提交
-
-
由 Roman Mashak 提交于
When application fails to pass flags in netlink TLV when replacing existing skbmod action, the kernel will leak refcnt: $ tc actions get action skbmod index 1 total acts 0 action order 0: skbmod pipe set smac 00:11:22:33:44:55 index 1 ref 1 bind 0 For example, at this point a buggy application replaces the action with index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags, however refcnt gets bumped: $ tc actions get actions skbmod index 1 total acts 0 action order 0: skbmod pipe set smac 00:11:22:33:44:55 index 1 ref 2 bind 0 $ Tha patch fixes this by calling tcf_idr_release() on existing actions. Fixes: 86da71b5 ("net_sched: Introduce skbmod action") Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
In case modules are not configured, error out when tp->ops is null and prevent later null pointer dereference. Fixes: 33a48927 ("sched: push TC filter protocol creation into a separate function") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Mashak 提交于
When application fails to pass flags in netlink TLV for a new skbedit action, the kernel results in the following oops: [ 8.307732] BUG: unable to handle kernel paging request at 0000000000021130 [ 8.309167] PGD 80000000193d1067 P4D 80000000193d1067 PUD 180e0067 PMD 0 [ 8.310595] Oops: 0000 [#1] SMP PTI [ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw [ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357 [ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140 [ 8.316203] RSP: 0018:ffffa0718038f840 EFLAGS: 00010246 [ 8.317123] RAX: 0000000000000001 RBX: 0000000000021100 RCX: 0000000000000000 [ 8.319831] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000021100 [ 8.321181] RBP: 0000000000000000 R08: 000000000004adf8 R09: 0000000000000122 [ 8.322645] R10: 0000000000000000 R11: ffffffff9e5b01ed R12: 0000000000000000 [ 8.324157] R13: ffffffff9e0d3cc0 R14: 0000000000000000 R15: 0000000000000000 [ 8.325590] FS: 00007f591292e700(0000) GS:ffff8fcf5bc40000(0000) knlGS:0000000000000000 [ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.327987] CR2: 0000000000021130 CR3: 00000000180e6004 CR4: 00000000001606a0 [ 8.329289] Call Trace: [ 8.329735] tcf_skbedit_init+0xa7/0xb0 [ 8.330423] tcf_action_init_1+0x362/0x410 [ 8.331139] ? try_to_wake_up+0x44/0x430 [ 8.331817] tcf_action_init+0x103/0x190 [ 8.332511] tc_ctl_action+0x11a/0x220 [ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0 [ 8.333902] ? _cond_resched+0x16/0x40 [ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0 [ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0 [ 8.336178] netlink_rcv_skb+0xdb/0x110 [ 8.336855] netlink_unicast+0x167/0x220 [ 8.337550] netlink_sendmsg+0x2a7/0x390 [ 8.338258] sock_sendmsg+0x30/0x40 [ 8.338865] ___sys_sendmsg+0x2c5/0x2e0 [ 8.339531] ? pagecache_get_page+0x27/0x210 [ 8.340271] ? filemap_fault+0xa2/0x630 [ 8.340943] ? page_add_file_rmap+0x108/0x200 [ 8.341732] ? alloc_set_pte+0x2aa/0x530 [ 8.342573] ? finish_fault+0x4e/0x70 [ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0 [ 8.344337] ? __sys_sendmsg+0x53/0x80 [ 8.345040] __sys_sendmsg+0x53/0x80 [ 8.345678] do_syscall_64+0x4f/0x100 [ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.347206] RIP: 0033:0x7f591191da67 [ 8.347831] RSP: 002b:00007fff745abd48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 8.349179] RAX: ffffffffffffffda RBX: 00007fff745abe70 RCX: 00007f591191da67 [ 8.350431] RDX: 0000000000000000 RSI: 00007fff745abdc0 RDI: 0000000000000003 [ 8.351659] RBP: 000000005af35251 R08: 0000000000000001 R09: 0000000000000000 [ 8.352922] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 8.354183] R13: 00007fff745afed0 R14: 0000000000000001 R15: 00000000006767c0 [ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00 00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 <8b> 53 30 74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c [ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP: ffffa0718038f840 [ 8.359770] CR2: 0000000000021130 [ 8.360438] ---[ end trace 60c66be45dfc14f0 ]--- The caller calls action's ->init() and passes pointer to "struct tc_action *a", which later may be initialized to point at the existing action, otherwise "struct tc_action *a" is still invalid, and therefore dereferencing it is an error as happens in tcf_idr_release, where refcnt is decremented. So in case of missing flags tcf_idr_release must be called only for existing actions. v2: - prepare patch for net tree Fixes: 5e1567ae ("net sched: skbedit action fix late binding") Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 5月, 2018 2 次提交
-
-
由 Craig Dillabaugh 提交于
Signed-off-by: NCraig Dillabaugh <cdillaba@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Normally, a socket can not be freed/reused unless all its TX packets left qdisc and were TX-completed. However connect(AF_UNSPEC) allows this to happen. With commit fc59d5bd ("pkt_sched: fq: clear time_next_packet for reused flows") we cleared f->time_next_packet but took no special action if the flow was still in the throttled rb-tree. Since f->time_next_packet is the key used in the rb-tree searches, blindly clearing it might break rb-tree integrity. We need to make sure the flow is no longer in the rb-tree to avoid this problem. Fixes: fc59d5bd ("pkt_sched: fq: clear time_next_packet for reused flows") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2018 1 次提交
-
-
由 Paul Blakey 提交于
Currently flower doesn't support inserting filters with different masks on a single priority, even if the actual flows (key + mask) inserted aren't overlapping, as with the use case of offloading openvswitch datapath flows. Instead one must go up one level, and assign different priorities for each mask, which will create a different flower instances. This patch opens flower to support more than one mask per priority, and a single flower instance. It does so by adding another hash table on top of the existing one which will store the different masks, and the filters that share it. The user is left with the responsibility of ensuring non overlapping flows, otherwise precedence is not guaranteed. Signed-off-by: NPaul Blakey <paulb@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 4月, 2018 2 次提交
-
-
由 Alexander Aring 提交于
There is currently no handling to check on a invalid tlv length. This patch adds such handling to avoid killing the kernel with a malformed ife packet. Signed-off-by: NAlexander Aring <aring@mojatatu.com> Reviewed-by: NYotam Gigi <yotam.gi@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
We need to record stats for received metadata that we dont know how to process. Have find_decode_metaid() return -ENOENT to capture this. Signed-off-by: NAlexander Aring <aring@mojatatu.com> Reviewed-by: NYotam Gigi <yotam.gi@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 4月, 2018 1 次提交
-
-
由 Cong Wang 提交于
When we delete a u32 key via u32_delete_key(), we forget to call idr_remove() to remove its handle from IDR. Fixes: e7614370 ("net_sched: use idr to allocate u32 filter handles") Reported-by: NMarcin Kabiesz <admin@hostcenter.eu> Tested-by: NMarcin Kabiesz <admin@hostcenter.eu> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 4月, 2018 1 次提交
-
-
由 Davide Caratti 提交于
when tcf_bpf_init_from_ops() fails (e.g. because of program having invalid number of instructions), tcf_bpf_cfg_cleanup() calls bpf_prog_put(NULL) or bpf_prog_destroy(NULL). Unless CONFIG_BPF_SYSCALL is unset, this causes the following error: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 PGD 800000007345a067 P4D 800000007345a067 PUD 340e1067 PMD 0 Oops: 0000 [#1] SMP PTI Modules linked in: act_bpf(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd glue_helper cryptd joydev snd_timer snd virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console i2c_core crc32c_intel serio_raw virtio_pci ata_piix libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf] CPU: 3 PID: 5654 Comm: tc Tainted: G E 4.16.0.bpf_test+ #408 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__bpf_prog_put+0xc/0xc0 RSP: 0018:ffff9594003ef728 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff9594003ef758 RCX: 0000000000000024 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044 R10: 0000000000000220 R11: ffff8a7ab9f17131 R12: 0000000000000000 R13: ffff8a7ab7c3c8e0 R14: 0000000000000001 R15: ffff8a7ab88f1054 FS: 00007fcb2f17c740(0000) GS:ffff8a7abfd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 000000007c888006 CR4: 00000000001606e0 Call Trace: tcf_bpf_cfg_cleanup+0x2f/0x40 [act_bpf] tcf_bpf_cleanup+0x4c/0x70 [act_bpf] __tcf_idr_release+0x79/0x140 tcf_bpf_init+0x125/0x330 [act_bpf] tcf_action_init_1+0x2cc/0x430 ? get_page_from_freelist+0x3f0/0x11b0 tcf_action_init+0xd3/0x1b0 tc_ctl_action+0x18b/0x240 rtnetlink_rcv_msg+0x29c/0x310 ? _cond_resched+0x15/0x30 ? __kmalloc_node_track_caller+0x1b9/0x270 ? rtnl_calcit.isra.29+0x100/0x100 netlink_rcv_skb+0xd2/0x110 netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2cd/0x3c0 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x27a/0x290 ? mem_cgroup_commit_charge+0x80/0x130 ? page_add_new_anon_rmap+0x73/0xc0 ? do_anonymous_page+0x2a2/0x560 ? __handle_mm_fault+0xc75/0xe20 __sys_sendmsg+0x58/0xa0 do_syscall_64+0x6e/0x1a0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fcb2e58eba0 RSP: 002b:00007ffc93c496c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffc93c497f0 RCX: 00007fcb2e58eba0 RDX: 0000000000000000 RSI: 00007ffc93c49740 RDI: 0000000000000003 RBP: 000000005ac6a646 R08: 0000000000000002 R09: 0000000000000000 R10: 00007ffc93c49120 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc93c49804 R14: 0000000000000001 R15: 000000000066afa0 Code: 5f 00 48 8b 43 20 48 c7 c7 70 2f 7c b8 c7 40 10 00 00 00 00 5b e9 a5 8b 61 00 0f 1f 44 00 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 <48> 8b 47 20 f0 ff 08 74 05 5b 5d 41 5c c3 41 89 f4 0f 1f 44 00 RIP: __bpf_prog_put+0xc/0xc0 RSP: ffff9594003ef728 CR2: 0000000000000020 Fix it in tcf_bpf_cfg_cleanup(), ensuring that bpf_prog_{put,destroy}(f) is called only when f is not NULL. Fixes: bbc09e78 ("net/sched: fix idr leak on the error path of tcf_bpf_init()") Reported-by: NLucas Bates <lucasb@mojatatu.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 4月, 2018 1 次提交
-
-
由 Li RongQing 提交于
move messages emitting out of sch_tree_lock to avoid holding this lock too long. Signed-off-by: NLi RongQing <lirongqing@baidu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 3月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 3月, 2018 2 次提交
-
-
由 Craig Dillabaugh 提交于
Fixes a bug in the tcf_dump_walker function that can cause some actions to not be reported when dumping a large number of actions. This issue became more aggrevated when cookies feature was added. In particular this issue is manifest when large cookie values are assigned to the actions and when enough actions are created that the resulting table must be dumped in multiple batches. The number of actions returned in each batch is limited by the total number of actions and the memory buffer size. With small cookies the numeric limit is reached before the buffer size limit, which avoids the code path triggering this bug. When large cookies are used buffer fills before the numeric limit, and the erroneous code path is hit. For example after creating 32 csum actions with the cookie aaaabbbbccccdddd $ tc actions ls action csum total acts 26 action order 0: csum (tcp) action continue index 1 ref 1 bind 0 cookie aaaabbbbccccdddd ..... action order 25: csum (tcp) action continue index 26 ref 1 bind 0 cookie aaaabbbbccccdddd total acts 6 action order 0: csum (tcp) action continue index 28 ref 1 bind 0 cookie aaaabbbbccccdddd ...... action order 5: csum (tcp) action continue index 32 ref 1 bind 0 cookie aaaabbbbccccdddd Note that the action with index 27 is omitted from the report. Fixes: 4b3550ef ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")" Signed-off-by: NCraig Dillabaugh <cdillaba@mojatatu.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
After the qdisc lock was dropped in pfifo_fast we allow multiple enqueue threads and dequeue threads to run in parallel. On the enqueue side the skb bit ooo_okay is used to ensure all related skbs are enqueued in-order. On the dequeue side though there is no similar logic. What we observe is with fewer queues than CPUs it is possible to re-order packets when two instances of __qdisc_run() are running in parallel. Each thread will dequeue a skb and then whichever thread calls the ndo op first will be sent on the wire. This doesn't typically happen because qdisc_run() is usually triggered by the same core that did the enqueue. However, drivers will trigger __netif_schedule() when queues are transitioning from stopped to awake using the netif_tx_wake_* APIs. When this happens netif_schedule() calls qdisc_run() on the same CPU that did the netif_tx_wake_* which is usually done in the interrupt completion context. This CPU is selected with the irq affinity which is unrelated to the enqueue operations. To resolve this we add a RUNNING bit to the qdisc to ensure only a single dequeue per qdisc is running. Enqueue and dequeue operations can still run in parallel and also on multi queue NICs we can still have a dequeue in-flight per qdisc, which is typically per CPU. Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array") Reported-by: NJakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2018 2 次提交
-
-
由 Davide Caratti 提交于
use u16 in place of __be16 to suppress the following sparse warnings: net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types) net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid net/sched/act_vlan.c:150:26: got unsigned short net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types) net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
tcf_idr_cleanup() is no more used, so remove it. Suggested-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-