- 02 9月, 2017 10 次提交
-
-
由 Eric Dumazet 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. v2: added the change in drivers/vhost/net.c as spotted by Willem. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In order to convert this atomic_t refcnt to refcount_t, we need to init the refcount to one to not trigger a 0 -> 1 transition. This also removes one atomic operation in fast path. v2: removed dead code in sock_zerocopy_put_abort() as suggested by Willem. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ivan Delalande 提交于
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is not possible to retrieve these from the kernel once they have been configured on sockets. Signed-off-by: NIvan Delalande <colona@arista.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ivan Delalande 提交于
Extend inet_diag_handler to allow individual protocols to report additional data on INET_DIAG_INFO through idiag_get_aux. The size can be dynamic and is computed by idiag_get_aux_size. Signed-off-by: NIvan Delalande <colona@arista.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Meyer 提交于
Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first candidates. Maybe a coccinelle can catch all of those. Signed-off-by: NThomas Meyer <thomas@m3y3r.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Excess of seafood or something happened while I cooked the commit adding RB tree to inetpeer. Of course, RCU rules need to be respected or bad things can happen. In this particular loop, we need to read *pp once per iteration, not twice. Fixes: b145425f ("inetpeer: remove AVL implementation in favor of RB tree") Reported-by: NJohn Sperbeck <jsperbeck@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Make sock_filter_is_valid_access consistent with other is_valid_access helpers. Requested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yossi Kuperman 提交于
After commit dce4551c ("udp: preserve head state for IP_CMSG_PASSSEC") we preserve the secpath for the whole skb lifecycle, but we also end up leaking a reference to it. We must clear the head state on skb reception, if secpath is present. Fixes: dce4551c ("udp: preserve head state for IP_CMSG_PASSSEC") Signed-off-by: NYossi Kuperman <yossiku@mellanox.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
Commit 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices") added the 'offload_fwd_mark' bit to the skb in order to allow drivers to indicate to the bridge driver that they already forwarded the packet in L2. In case the bit is set, before transmitting the packet from each port, the port's mark is compared with the mark stored in the skb's control block. If both marks are equal, we know the packet arrived from a switch device that already forwarded the packet and it's not re-transmitted. However, if the packet is transmitted from the bridge device itself (e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark stored in the skb's control block isn't valid. This scenario can happen in rare cases where a packet was trapped during L3 forwarding and forwarded by the kernel to a bridge device. Fixes: 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reported-by: NYotam Gigi <yotamg@mellanox.com> Tested-by: NYotam Gigi <yotamg@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the device in case a port is enslaved to a master netdev such as bridge or bond. Since the driver ignores events unrelated to its ports and their uppers, it's possible to engineer situations in which the device's data path differs from the kernel's. One example to such a situation is when a port is enslaved to a bond that is already enslaved to a bridge. When the bond was enslaved the driver ignored the event - as the bond wasn't one of its uppers - and therefore a bridge port instance isn't created in the device. Until such configurations are supported forbid them by checking that the upper device doesn't have uppers of its own. Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reported-by: NNogah Frankel <nogahf@mellanox.com> Tested-by: NNogah Frankel <nogahf@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2017 6 次提交
-
-
由 David Ahern 提交于
Allow BPF programs run on sock create to use the get_current_uid_gid helper. IPv4 and IPv6 sockets are created in a process context so there is always a valid uid/gid Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add socket mark and priority to fields that can be set by ebpf program when a socket is created. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arkadi Sharshevsky 提交于
This will be used by the IPv6 host table which will be introduced in the following patches. The fields in the header are added per-use. This header is global and can be reused by many drivers. Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrii 提交于
Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c, similar to the handling in net/ipv6/tcp_ipv6.c Signed-off-by: NAndrii Vladyka <tulup@mail.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
This extends bridge fdb table tracepoints to also cover learned fdb entries in the br_fdb_update path. Note that unlike other tracepoints I have moved this to when the fdb is modified because this is in the datapath and can generate a lot of noise in the trace output. br_fdb_update is also called from added_by_user context in the NTF_USE case which is already traced ..hence the !added_by_user check. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
TC filters when used as classifiers are bound to TC classes. However, there is a hidden difference when adding them in different orders: 1. If we add tc classes before its filters, everything is fine. Logically, the classes exist before we specify their ID's in filters, it is easy to bind them together, just as in the current code base. 2. If we add tc filters before the tc classes they bind, we have to do dynamic lookup in fast path. What's worse, this happens all the time not just once, because on fast path tcf_result is passed on stack, there is no way to propagate back to the one in tc filters. This hidden difference hurts performance silently if we have many tc classes in hierarchy. This patch intends to close this gap by doing the reverse binding when we create a new class, in this case we can actually search all the filters in its parent, match and fixup by classid. And because tcf_result is specific to each type of tc filter, we have to introduce a new ops for each filter to tell how to bind the class. Note, we still can NOT totally get rid of those class lookup in ->enqueue() because cgroup and flow filters have no way to determine the classid at setup time, they still have to go through dynamic lookup. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2017 18 次提交
-
-
由 Steffen Klassert 提交于
A recent commit added an output_mark. When copying this output_mark, the return value of copy_sec_ctx is overwitten without a check. Fix this by copying the output_mark before the security context. Fixes: 077fbac4 ("net: xfrm: support setting an output mark.") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Yossi Kuperman 提交于
In conjunction with crypto offload [1], removing the ESP trailer by hardware can potentially improve the performance by avoiding (1) a cache miss incurred by reading the nexthdr field and (2) the necessity to calculate the csum value of the trailer in order to keep skb->csum valid. This patch introduces the changes to the xfrm stack and merely serves as an infrastructure. Subsequent patch to mlx5 driver will put this to a good use. [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.htmlSigned-off-by: NYossi Kuperman <yossiku@mellanox.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 David Ahern 提交于
IPv4 name uses "destination ip" as does the IPv6 patch set. Make the mac field consistent. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
syzkaller had no problem to trigger a deadlock, attaching a KCM socket to another one (or itself). (original syzkaller report was a very confusing lockdep splat during a sendmsg()) It seems KCM claims to only support TCP, but no enforcement is done, so we might need to add additional checks. Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NTom Herbert <tom@quantonium.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy callbacks but it may fail before the timer is initialized due to missing options (either not supplied by user-space or set as a default qdisc), also q->qdisc is used by ->reset and ->destroy so we need it initialized. Reproduce: $ sysctl net.core.default_qdisc=tbf $ ip l set ethX up Crash log: [ 959.160172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 959.160323] IP: qdisc_reset+0xa/0x5c [ 959.160400] PGD 59cdb067 [ 959.160401] P4D 59cdb067 [ 959.160466] PUD 59ccb067 [ 959.160532] PMD 0 [ 959.160597] [ 959.160706] Oops: 0000 [#1] SMP [ 959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem [ 959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62 [ 959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 959.161157] task: ffff880059c9a700 task.stack: ffff8800376d0000 [ 959.161263] RIP: 0010:qdisc_reset+0xa/0x5c [ 959.161347] RSP: 0018:ffff8800376d3610 EFLAGS: 00010286 [ 959.161531] RAX: ffffffffa001b1dd RBX: ffff8800373a2800 RCX: 0000000000000000 [ 959.161733] RDX: ffffffff8215f160 RSI: ffffffff8215f160 RDI: 0000000000000000 [ 959.161939] RBP: ffff8800376d3618 R08: 00000000014080c0 R09: 00000000ffffffff [ 959.162141] R10: ffff8800376d3578 R11: 0000000000000020 R12: ffffffffa001d2c0 [ 959.162343] R13: ffff880037538000 R14: 00000000ffffffff R15: 0000000000000001 [ 959.162546] FS: 00007fcc5126b740(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000 [ 959.162844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 959.163030] CR2: 0000000000000018 CR3: 000000005abc4000 CR4: 00000000000406e0 [ 959.163233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 959.163436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 959.163638] Call Trace: [ 959.163788] tbf_reset+0x19/0x64 [sch_tbf] [ 959.163957] qdisc_destroy+0x8b/0xe5 [ 959.164119] qdisc_create_dflt+0x86/0x94 [ 959.164284] ? dev_activate+0x129/0x129 [ 959.164449] attach_one_default_qdisc+0x36/0x63 [ 959.164623] netdev_for_each_tx_queue+0x3d/0x48 [ 959.164795] dev_activate+0x4b/0x129 [ 959.164957] __dev_open+0xe7/0x104 [ 959.165118] __dev_change_flags+0xc6/0x15c [ 959.165287] dev_change_flags+0x25/0x59 [ 959.165451] do_setlink+0x30c/0xb3f [ 959.165613] ? check_chain_key+0xb0/0xfd [ 959.165782] rtnl_newlink+0x3a4/0x729 [ 959.165947] ? rtnl_newlink+0x117/0x729 [ 959.166121] ? ns_capable_common+0xd/0xb1 [ 959.166288] ? ns_capable+0x13/0x15 [ 959.166450] rtnetlink_rcv_msg+0x188/0x197 [ 959.166617] ? rcu_read_unlock+0x3e/0x5f [ 959.166783] ? rtnl_newlink+0x729/0x729 [ 959.166948] netlink_rcv_skb+0x6c/0xce [ 959.167113] rtnetlink_rcv+0x23/0x2a [ 959.167273] netlink_unicast+0x103/0x181 [ 959.167439] netlink_sendmsg+0x326/0x337 [ 959.167607] sock_sendmsg_nosec+0x14/0x3f [ 959.167772] sock_sendmsg+0x29/0x2e [ 959.167932] ___sys_sendmsg+0x209/0x28b [ 959.168098] ? do_raw_spin_unlock+0xcd/0xf8 [ 959.168267] ? _raw_spin_unlock+0x27/0x31 [ 959.168432] ? __handle_mm_fault+0x651/0xdb1 [ 959.168602] ? check_chain_key+0xb0/0xfd [ 959.168773] __sys_sendmsg+0x45/0x63 [ 959.168934] ? __sys_sendmsg+0x45/0x63 [ 959.169100] SyS_sendmsg+0x19/0x1b [ 959.169260] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 959.169432] RIP: 0033:0x7fcc5097e690 [ 959.169592] RSP: 002b:00007ffd0d5c7b48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 959.169887] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007fcc5097e690 [ 959.170089] RDX: 0000000000000000 RSI: 00007ffd0d5c7b90 RDI: 0000000000000003 [ 959.170292] RBP: ffff8800376d3f98 R08: 0000000000000001 R09: 0000000000000003 [ 959.170494] R10: 00007ffd0d5c7910 R11: 0000000000000246 R12: 0000000000000006 [ 959.170697] R13: 000000000066f1a0 R14: 00007ffd0d5cfc40 R15: 0000000000000000 [ 959.170900] ? trace_hardirqs_off_caller+0xa7/0xcf [ 959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24 98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89 e5 53 <48> 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb [ 959.171637] RIP: qdisc_reset+0xa/0x5c RSP: ffff8800376d3610 [ 959.171821] CR2: 0000000000000018 Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Currently only a memory allocation failure can lead to this, so let's initialize the timer first. Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
netem can fail in ->init due to missing options (either not supplied by user-space or used as a default qdisc) causing a timer->base null pointer deref in its ->destroy() and ->reset() callbacks. Reproduce: $ sysctl net.core.default_qdisc=netem $ ip l set ethX up Crash log: [ 1814.846943] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1814.847181] IP: hrtimer_active+0x17/0x8a [ 1814.847270] PGD 59c34067 [ 1814.847271] P4D 59c34067 [ 1814.847337] PUD 37374067 [ 1814.847403] PMD 0 [ 1814.847468] [ 1814.847582] Oops: 0000 [#1] SMP [ 1814.847655] Modules linked in: sch_netem(O) sch_fq_codel(O) [ 1814.847761] CPU: 3 PID: 1573 Comm: ip Tainted: G O 4.13.0-rc6+ #62 [ 1814.847884] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1814.848043] task: ffff88003723a700 task.stack: ffff88005adc8000 [ 1814.848235] RIP: 0010:hrtimer_active+0x17/0x8a [ 1814.848407] RSP: 0018:ffff88005adcb590 EFLAGS: 00010246 [ 1814.848590] RAX: 0000000000000000 RBX: ffff880058e359d8 RCX: 0000000000000000 [ 1814.848793] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880058e359d8 [ 1814.848998] RBP: ffff88005adcb5b0 R08: 00000000014080c0 R09: 00000000ffffffff [ 1814.849204] R10: ffff88005adcb660 R11: 0000000000000020 R12: 0000000000000000 [ 1814.849410] R13: ffff880058e359d8 R14: 00000000ffffffff R15: 0000000000000001 [ 1814.849616] FS: 00007f733bbca740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 1814.849919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1814.850107] CR2: 0000000000000000 CR3: 0000000059f0d000 CR4: 00000000000406e0 [ 1814.850313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1814.850518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1814.850723] Call Trace: [ 1814.850875] hrtimer_try_to_cancel+0x1a/0x93 [ 1814.851047] hrtimer_cancel+0x15/0x20 [ 1814.851211] qdisc_watchdog_cancel+0x12/0x14 [ 1814.851383] netem_reset+0xe6/0xed [sch_netem] [ 1814.851561] qdisc_destroy+0x8b/0xe5 [ 1814.851723] qdisc_create_dflt+0x86/0x94 [ 1814.851890] ? dev_activate+0x129/0x129 [ 1814.852057] attach_one_default_qdisc+0x36/0x63 [ 1814.852232] netdev_for_each_tx_queue+0x3d/0x48 [ 1814.852406] dev_activate+0x4b/0x129 [ 1814.852569] __dev_open+0xe7/0x104 [ 1814.852730] __dev_change_flags+0xc6/0x15c [ 1814.852899] dev_change_flags+0x25/0x59 [ 1814.853064] do_setlink+0x30c/0xb3f [ 1814.853228] ? check_chain_key+0xb0/0xfd [ 1814.853396] ? check_chain_key+0xb0/0xfd [ 1814.853565] rtnl_newlink+0x3a4/0x729 [ 1814.853728] ? rtnl_newlink+0x117/0x729 [ 1814.853905] ? ns_capable_common+0xd/0xb1 [ 1814.854072] ? ns_capable+0x13/0x15 [ 1814.854234] rtnetlink_rcv_msg+0x188/0x197 [ 1814.854404] ? rcu_read_unlock+0x3e/0x5f [ 1814.854572] ? rtnl_newlink+0x729/0x729 [ 1814.854737] netlink_rcv_skb+0x6c/0xce [ 1814.854902] rtnetlink_rcv+0x23/0x2a [ 1814.855064] netlink_unicast+0x103/0x181 [ 1814.855230] netlink_sendmsg+0x326/0x337 [ 1814.855398] sock_sendmsg_nosec+0x14/0x3f [ 1814.855584] sock_sendmsg+0x29/0x2e [ 1814.855747] ___sys_sendmsg+0x209/0x28b [ 1814.855912] ? do_raw_spin_unlock+0xcd/0xf8 [ 1814.856082] ? _raw_spin_unlock+0x27/0x31 [ 1814.856251] ? __handle_mm_fault+0x651/0xdb1 [ 1814.856421] ? check_chain_key+0xb0/0xfd [ 1814.856592] __sys_sendmsg+0x45/0x63 [ 1814.856755] ? __sys_sendmsg+0x45/0x63 [ 1814.856923] SyS_sendmsg+0x19/0x1b [ 1814.857083] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 1814.857256] RIP: 0033:0x7f733b2dd690 [ 1814.857419] RSP: 002b:00007ffe1d3387d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1814.858238] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f733b2dd690 [ 1814.858445] RDX: 0000000000000000 RSI: 00007ffe1d338820 RDI: 0000000000000003 [ 1814.858651] RBP: ffff88005adcbf98 R08: 0000000000000001 R09: 0000000000000003 [ 1814.858856] R10: 00007ffe1d3385a0 R11: 0000000000000246 R12: 0000000000000002 [ 1814.859060] R13: 000000000066f1a0 R14: 00007ffe1d3408d0 R15: 0000000000000000 [ 1814.859267] ? trace_hardirqs_off_caller+0xa7/0xcf [ 1814.859446] Code: 10 55 48 89 c7 48 89 e5 e8 45 a1 fb ff 31 c0 5d c3 31 c0 c3 66 66 66 66 90 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 8b 45 30 <4c> 8b 20 41 8b 5c 24 38 31 c9 31 d2 48 c7 c7 50 8e 1d 82 41 89 [ 1814.860022] RIP: hrtimer_active+0x17/0x8a RSP: ffff88005adcb590 [ 1814.860214] CR2: 0000000000000000 Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
It is very unlikely to happen but the backlogs memory allocation could fail and will free q->flows, but then ->destroy() will free q->flows too. For correctness remove the first free and let ->destroy clean up. Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
CBQ can fail on ->init by wrong nl attributes or simply for missing any, f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL when it is activated. The first thing init does is parse opt but it will dereference a null pointer if used as a default qdisc, also since init failure at default qdisc invokes ->reset() which cancels all timers then we'll also dereference two more null pointers (timer->base) as they were never initialized. To reproduce: $ sysctl net.core.default_qdisc=cbq $ ip l set ethX up Crash log of the first null ptr deref: [44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null) [44727.907600] IP: cbq_init+0x27/0x205 [44727.907676] PGD 59ff4067 [44727.907677] P4D 59ff4067 [44727.907742] PUD 59c70067 [44727.907807] PMD 0 [44727.907873] [44727.907982] Oops: 0000 [#1] SMP [44727.908054] Modules linked in: [44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60 [44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [44727.908477] task: ffff88005ad42700 task.stack: ffff880037214000 [44727.908672] RIP: 0010:cbq_init+0x27/0x205 [44727.908838] RSP: 0018:ffff8800372175f0 EFLAGS: 00010286 [44727.909018] RAX: ffffffff816c3852 RBX: ffff880058c53800 RCX: 0000000000000000 [44727.909222] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff8800372175f8 [44727.909427] RBP: ffff880037217650 R08: ffffffff81b0f380 R09: 0000000000000000 [44727.909631] R10: ffff880037217660 R11: 0000000000000020 R12: ffffffff822a44c0 [44727.909835] R13: ffff880058b92000 R14: 00000000ffffffff R15: 0000000000000001 [44727.910040] FS: 00007ff8bc583740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000 [44727.910339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44727.910525] CR2: 0000000000000000 CR3: 00000000371e5000 CR4: 00000000000406e0 [44727.910731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44727.910936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [44727.911141] Call Trace: [44727.911291] ? lockdep_init_map+0xb6/0x1ba [44727.911461] ? qdisc_alloc+0x14e/0x187 [44727.911626] qdisc_create_dflt+0x7a/0x94 [44727.911794] ? dev_activate+0x129/0x129 [44727.911959] attach_one_default_qdisc+0x36/0x63 [44727.912132] netdev_for_each_tx_queue+0x3d/0x48 [44727.912305] dev_activate+0x4b/0x129 [44727.912468] __dev_open+0xe7/0x104 [44727.912631] __dev_change_flags+0xc6/0x15c [44727.912799] dev_change_flags+0x25/0x59 [44727.912966] do_setlink+0x30c/0xb3f [44727.913129] ? check_chain_key+0xb0/0xfd [44727.913294] ? check_chain_key+0xb0/0xfd [44727.913463] rtnl_newlink+0x3a4/0x729 [44727.913626] ? rtnl_newlink+0x117/0x729 [44727.913801] ? ns_capable_common+0xd/0xb1 [44727.913968] ? ns_capable+0x13/0x15 [44727.914131] rtnetlink_rcv_msg+0x188/0x197 [44727.914300] ? rcu_read_unlock+0x3e/0x5f [44727.914465] ? rtnl_newlink+0x729/0x729 [44727.914630] netlink_rcv_skb+0x6c/0xce [44727.914796] rtnetlink_rcv+0x23/0x2a [44727.914956] netlink_unicast+0x103/0x181 [44727.915122] netlink_sendmsg+0x326/0x337 [44727.915291] sock_sendmsg_nosec+0x14/0x3f [44727.915459] sock_sendmsg+0x29/0x2e [44727.915619] ___sys_sendmsg+0x209/0x28b [44727.915784] ? do_raw_spin_unlock+0xcd/0xf8 [44727.915954] ? _raw_spin_unlock+0x27/0x31 [44727.916121] ? __handle_mm_fault+0x651/0xdb1 [44727.916290] ? check_chain_key+0xb0/0xfd [44727.916461] __sys_sendmsg+0x45/0x63 [44727.916626] ? __sys_sendmsg+0x45/0x63 [44727.916792] SyS_sendmsg+0x19/0x1b [44727.916950] entry_SYSCALL_64_fastpath+0x23/0xc2 [44727.917125] RIP: 0033:0x7ff8bbc96690 [44727.917286] RSP: 002b:00007ffc360991e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [44727.917579] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007ff8bbc96690 [44727.917783] RDX: 0000000000000000 RSI: 00007ffc36099230 RDI: 0000000000000003 [44727.917987] RBP: ffff880037217f98 R08: 0000000000000001 R09: 0000000000000003 [44727.918190] R10: 00007ffc36098fb0 R11: 0000000000000246 R12: 0000000000000006 [44727.918393] R13: 000000000066f1a0 R14: 00007ffc360a12e0 R15: 0000000000000000 [44727.918597] ? trace_hardirqs_off_caller+0xa7/0xcf [44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9 49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83 ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb [44727.919332] RIP: cbq_init+0x27/0x205 RSP: ffff8800372175f0 [44727.919516] CR2: 0000000000000000 Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Depending on where ->init fails we can get a null pointer deref due to uninitialized hires timer (watchdog) or a double free of the qdisc hash because it is already freed by ->destroy(). Fixes: 8d553738 ("net/sched/hfsc: allocate tcf block for hfsc root class") Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
If sch_hhf fails in its ->init() function (either due to wrong user-space arguments as below or memory alloc failure of hh_flows) it will do a null pointer deref of q->hh_flows in its ->destroy() function. To reproduce the crash: $ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000 Crash log: [ 690.654882] BUG: unable to handle kernel NULL pointer dereference at (null) [ 690.655565] IP: hhf_destroy+0x48/0xbc [ 690.655944] PGD 37345067 [ 690.655948] P4D 37345067 [ 690.656252] PUD 58402067 [ 690.656554] PMD 0 [ 690.656857] [ 690.657362] Oops: 0000 [#1] SMP [ 690.657696] Modules linked in: [ 690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57 [ 690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 690.659255] task: ffff880058578000 task.stack: ffff88005acbc000 [ 690.659747] RIP: 0010:hhf_destroy+0x48/0xbc [ 690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246 [ 690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000 [ 690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0 [ 690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000 [ 690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea [ 690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000 [ 690.663769] FS: 00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 690.667069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0 [ 690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 690.671003] Call Trace: [ 690.671743] qdisc_create+0x377/0x3fd [ 690.672534] tc_modify_qdisc+0x4d2/0x4fd [ 690.673324] rtnetlink_rcv_msg+0x188/0x197 [ 690.674204] ? rcu_read_unlock+0x3e/0x5f [ 690.675091] ? rtnl_newlink+0x729/0x729 [ 690.675877] netlink_rcv_skb+0x6c/0xce [ 690.676648] rtnetlink_rcv+0x23/0x2a [ 690.677405] netlink_unicast+0x103/0x181 [ 690.678179] netlink_sendmsg+0x326/0x337 [ 690.678958] sock_sendmsg_nosec+0x14/0x3f [ 690.679743] sock_sendmsg+0x29/0x2e [ 690.680506] ___sys_sendmsg+0x209/0x28b [ 690.681283] ? __handle_mm_fault+0xc7d/0xdb1 [ 690.681915] ? check_chain_key+0xb0/0xfd [ 690.682449] __sys_sendmsg+0x45/0x63 [ 690.682954] ? __sys_sendmsg+0x45/0x63 [ 690.683471] SyS_sendmsg+0x19/0x1b [ 690.683974] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 690.684516] RIP: 0033:0x7f8ae529d690 [ 690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690 [ 690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003 [ 690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000 [ 690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002 [ 690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000 [ 690.688475] ? trace_hardirqs_off_caller+0xa7/0xcf [ 690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83 c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02 00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1 [ 690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0 [ 690.690636] CR2: 0000000000000000 Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Fixes: 10239edf ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
The below commit added a call to ->destroy() on init failure, but multiq still frees ->queues on error in init, but ->queues is also freed by ->destroy() thus we get double free and corrupted memory. Very easy to reproduce (eth0 not multiqueue): $ tc qdisc add dev eth0 root multiq RTNETLINK answers: Operation not supported $ ip l add dumdum type dummy (crash) Trace log: [ 3929.467747] general protection fault: 0000 [#1] SMP [ 3929.468083] Modules linked in: [ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56 [ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000 [ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be [ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246 [ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df [ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020 [ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000 [ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564 [ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00 [ 3929.471869] FS: 00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000 [ 3929.472286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0 [ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3929.474873] Call Trace: [ 3929.475337] ? kstrdup_const+0x23/0x25 [ 3929.475863] kstrdup+0x2e/0x4b [ 3929.476338] kstrdup_const+0x23/0x25 [ 3929.478084] __kernfs_new_node+0x28/0xbc [ 3929.478478] kernfs_new_node+0x35/0x55 [ 3929.478929] kernfs_create_link+0x23/0x76 [ 3929.479478] sysfs_do_create_link_sd.isra.2+0x85/0xd7 [ 3929.480096] sysfs_create_link+0x33/0x35 [ 3929.480649] device_add+0x200/0x589 [ 3929.481184] netdev_register_kobject+0x7c/0x12f [ 3929.481711] register_netdevice+0x373/0x471 [ 3929.482174] rtnl_newlink+0x614/0x729 [ 3929.482610] ? rtnl_newlink+0x17f/0x729 [ 3929.483080] rtnetlink_rcv_msg+0x188/0x197 [ 3929.483533] ? rcu_read_unlock+0x3e/0x5f [ 3929.483984] ? rtnl_newlink+0x729/0x729 [ 3929.484420] netlink_rcv_skb+0x6c/0xce [ 3929.484858] rtnetlink_rcv+0x23/0x2a [ 3929.485291] netlink_unicast+0x103/0x181 [ 3929.485735] netlink_sendmsg+0x326/0x337 [ 3929.486181] sock_sendmsg_nosec+0x14/0x3f [ 3929.486614] sock_sendmsg+0x29/0x2e [ 3929.486973] ___sys_sendmsg+0x209/0x28b [ 3929.487340] ? do_raw_spin_unlock+0xcd/0xf8 [ 3929.487719] ? _raw_spin_unlock+0x27/0x31 [ 3929.488092] ? __handle_mm_fault+0x651/0xdb1 [ 3929.488471] ? check_chain_key+0xb0/0xfd [ 3929.488847] __sys_sendmsg+0x45/0x63 [ 3929.489206] ? __sys_sendmsg+0x45/0x63 [ 3929.489576] SyS_sendmsg+0x19/0x1b [ 3929.489901] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 3929.490172] RIP: 0033:0x7f0b6fb93690 [ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690 [ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003 [ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000 [ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002 [ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000 [ 3929.492352] ? trace_hardirqs_off_caller+0xa7/0xcf [ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44 89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d 8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01 [ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0 Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Fixes: f07d1501 ("multiq: Further multiqueue cleanup") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
The commit below added a call to the ->destroy() callback for all qdiscs which failed in their ->init(), but some were not prepared for such change and can't handle partially initialized qdisc. HTB is one of them and if any error occurs before the qdisc watchdog timer and qdisc work are initialized then we can hit either a null ptr deref (timer->base) when canceling in ->destroy or lockdep error info about trying to register a non-static key and a stack dump. So to fix these two move the watchdog timer and workqueue init before anything that can err out. To reproduce userspace needs to send broken htb qdisc create request, tested with a modified tc (q_htb.c). Trace log: [ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2710.897977] IP: hrtimer_active+0x17/0x8a [ 2710.898174] PGD 58fab067 [ 2710.898175] P4D 58fab067 [ 2710.898353] PUD 586c0067 [ 2710.898531] PMD 0 [ 2710.898710] [ 2710.899045] Oops: 0000 [#1] SMP [ 2710.899232] Modules linked in: [ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54 [ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000 [ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a [ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246 [ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000 [ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298 [ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001 [ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000 [ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0 [ 2710.901907] FS: 00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000 [ 2710.902277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0 [ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2710.903180] Call Trace: [ 2710.903332] hrtimer_try_to_cancel+0x1a/0x93 [ 2710.903504] hrtimer_cancel+0x15/0x20 [ 2710.903667] qdisc_watchdog_cancel+0x12/0x14 [ 2710.903866] htb_destroy+0x2e/0xf7 [ 2710.904097] qdisc_create+0x377/0x3fd [ 2710.904330] tc_modify_qdisc+0x4d2/0x4fd [ 2710.904511] rtnetlink_rcv_msg+0x188/0x197 [ 2710.904682] ? rcu_read_unlock+0x3e/0x5f [ 2710.904849] ? rtnl_newlink+0x729/0x729 [ 2710.905017] netlink_rcv_skb+0x6c/0xce [ 2710.905183] rtnetlink_rcv+0x23/0x2a [ 2710.905345] netlink_unicast+0x103/0x181 [ 2710.905511] netlink_sendmsg+0x326/0x337 [ 2710.905679] sock_sendmsg_nosec+0x14/0x3f [ 2710.905847] sock_sendmsg+0x29/0x2e [ 2710.906010] ___sys_sendmsg+0x209/0x28b [ 2710.906176] ? do_raw_spin_unlock+0xcd/0xf8 [ 2710.906346] ? _raw_spin_unlock+0x27/0x31 [ 2710.906514] ? __handle_mm_fault+0x651/0xdb1 [ 2710.906685] ? check_chain_key+0xb0/0xfd [ 2710.906855] __sys_sendmsg+0x45/0x63 [ 2710.907018] ? __sys_sendmsg+0x45/0x63 [ 2710.907185] SyS_sendmsg+0x19/0x1b [ 2710.907344] entry_SYSCALL_64_fastpath+0x23/0xc2 Note that probably this bug goes further back because the default qdisc handling always calls ->destroy on init failure too. Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ahmed Abdelsalam 提交于
IPv6 packet may carry more than one extension header, and IPv6 nodes must accept and attempt to process extension headers in any order and occurring any number of times in the same packet. Hence, there should be no assumption that Segment Routing extension header is to appear immediately after the IPv6 header. Moreover, section 4.1 of RFC 8200 gives a recommendation on the order of appearance of those extension headers within an IPv6 packet. According to this recommendation, Segment Routing extension header should appear after Hop-by-Hop and Destination Options headers (if they present). This patch fixes the get_srh(), so it gets the segment routing header regardless of its position in the chain of the extension headers in IPv6 packet, and makes sure that the IPv6 routing extension header is of Type 4. Signed-off-by: NAhmed Abdelsalam <amsalam20@gmail.com> Acked-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Mi 提交于
Typically, each TC filter has its own action. All the actions of the same type are saved in its hash table. But the hash buckets are too small that it degrades to a list. And the performance is greatly affected. For example, it takes about 0m11.914s to insert 64K rules. If we convert the hash table to IDR, it only takes about 0m1.500s. The improvement is huge. But please note that the test result is based on previous patch that cls_flower uses IDR. Signed-off-by: NChris Mi <chrism@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Mi 提交于
Currently, all filters with the same priority are linked in a doubly linked list. Every filter should have a unique handle. To make the handle unique, we need to iterate the list every time to see if the handle exists or not when inserting a new filter. It is time-consuming. For example, it takes about 5m3.169s to insert 64K rules. This patch changes cls_flower to use IDR. With this patch, it takes about 0m1.127s to insert 64K rules. The improvement is huge. But please note that in this testing, all filters share the same action. If every filter has a unique action, that is another bottleneck. Follow-up patch in this patchset addresses that. Signed-off-by: NChris Mi <chrism@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This reverts commit 45f119bf. Eric Dumazet says: We found at Google a significant regression caused by 45f119bf tcp: remove header prediction In typical RPC (TCP_RR), when a TCP socket receives data, we now call tcp_ack() while we used to not call it. This touches enough cache lines to cause a slowdown. so problem does not seem to be HP removal itself but the tcp_ack() call. Therefore, it might be possible to remove HP after all, provided one finds a way to elide tcp_ack for most cases. Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This change was a followup to the header prediction removal, so first revert this as a prerequisite to back out hp removal. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2017 6 次提交
-
-
由 Eric Dumazet 提交于
Florian reported UDP xmit drops that could be root caused to the too small neigh limit. Current limit is 64 KB, meaning that even a single UDP socket would hit it, since its default sk_sndbuf comes from net.core.wmem_default (~212992 bytes on 64bit arches). Once ARP/ND resolution is in progress, we should allow a little more packets to be queued, at least for one producer. Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf limit and either block in sendmsg() or return -EAGAIN. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Tariq repored local pings to linklocal address is failing: $ ifconfig ens8 ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255 inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link> ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet) RX packets 12 bytes 1164 (1.1 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 30 bytes 2484 (2.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 $ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8 PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
Add a new nsh/ directory. It currently holds only GSO functions but more will come: in particular, code shared by openvswitch and tc to manipulate NSH headers. For now, assume there's no hardware support for NSH segmentation. We can always introduce netdev->nsh_features later. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
This patch handles a default IFE type if it's not given by user space netlink api. The default IFE type will be the registered ethertype by IEEE for IFE ForCES. Signed-off-by: NAlexander Aring <aring@mojatatu.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Benjamin Poirier 提交于
... which may happen with certain values of tp_reserve and maclen. Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: NBenjamin Poirier <bpoirier@suse.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
For a bond slave device as a tipc bearer, the dev represents the bond interface and orig_dev represents the slave in tipc_l2_rcv_msg(). Since we decode the tipc_ptr from bonding device (dev), we fail to find the bearer and thus tipc links are not established. In this commit, we register the tipc protocol callback per device and look for tipc bearer from both the devices. Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-