- 21 7月, 2020 2 次提交
-
-
由 Florian Fainelli 提交于
Now that we have all the infrastructure in place for calling into the dsa_ptr->netdev_ops function pointers, install them when we configure the DSA CPU/management interface and tear them down. The flow is unchanged from before, but now we preserve equality of tests when network device drivers do tests like dev->netdev_ops == &foo_ops which was not the case before since we were allocating an entirely new structure. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
Add definitions for the dsa_netdevice_ops structure which is a subset of the net_device_ops structure for the specific operations that we care about overlaying on top of the DSA CPU port net_device and provide inline stubs that take core managing whether DSA code is reachable. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2020 7 次提交
-
-
由 Willem de Bruijn 提交于
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an extension struct if present. ICMP messages may include an extension structure after the original datagram. RFC 4884 standardized this behavior. It stores the offset in words to the extension header in u8 icmphdr.un.reserved[1]. The field is valid only for ICMP types destination unreachable, time exceeded and parameter problem, if length is at least 128 bytes and entire packet does not exceed 576 bytes. Return the offset to the start of the extension struct when reading an ICMP error from the error queue, if it matches the above constraints. Do not return the raw u8 field. Return the offset from the start of the user buffer, in bytes. The kernel does not return the network and transport headers, so subtract those. Also validate the headers. Return the offset regardless of validation, as an invalid extension must still not be misinterpreted as part of the original datagram. Note that !invalid does not imply valid. If the extension version does not match, no validation can take place, for instance. For backward compatibility, make this optional, set by setsockopt SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c For forward compatibility, reserve only setsockopt value 1, leaving other bits for additional icmp extensions. Changes v1->v2: - convert word offset to byte offset from start of user buffer - return in ee_data as u8 may be insufficient - define extension struct and object header structs - return len only if constraints met - if returning len, also validate Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lorenzo Bianconi 提交于
Introduce xdp_get_shared_info_from_{buff,frame} utility routines to get skb_shared_info from xdp buffer/frame pointer. xdp_get_shared_info_from_{buff,frame} will be used to implement xdp multi-buffer support Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NMarc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Handle the few cases that need special treatment in-line using in_compat_syscall(). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Add the compat handling to sock_common_{get,set}sockopt instead, keyed of in_compat_syscall(). This allow to remove the now unused ->compat_{get,set}sockopt methods from struct proto_ops. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Acked-by: NStefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Add a helper that copies either a native or compat bpf_fprog from userspace after verifying the length, and remove the compat setsockopt handlers that now aren't required. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 7月, 2020 2 次提交
-
-
由 Petr Machata 提交于
This reverts commit aebe4426. Signed-off-by: NPetr Machata <petrm@mellanox.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Petr Machata 提交于
Mirred currently does not mix well with blocks executed after the qdisc root lock is taken. This includes classification blocks (such as in PRIO, ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by mirred can cause deadlocks: either when the thread of execution attempts to take the lock a second time, or when two threads end up waiting on each other's locks. The qevent patchset attempted to not introduce further badness of this sort, and dropped the lock before executing the qevent block. However this lead to too little locking and races between qdisc configuration and packet enqueue in the RED qdisc. Before the deadlock issues are solved in a way that can be applied across many qdiscs reasonably easily, do for qevents what is done for the classification blocks and just keep holding the root lock. Signed-off-by: NPetr Machata <petrm@mellanox.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 16 7月, 2020 6 次提交
-
-
由 Randy Dunlap 提交于
Drop doubled words in several comments. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Randy Dunlap 提交于
Drop doubled word "the" in a comment. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Randy Dunlap 提交于
Drop doubled word "to" in a comment. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Randy Dunlap 提交于
Drop doubled words "or" and "the" in several comments. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Randy Dunlap 提交于
Drop doubled word "not" in a comment. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Randy Dunlap 提交于
Drop doubled words in two comments. Fix a spello/typo. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 7月, 2020 4 次提交
-
-
由 YueHaibing 提交于
commit 263e1201 ("mptcp: consolidate synack processing.") left behind this, remove it. Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 YueHaibing 提交于
It is not used since commit 09c75704 ("xfrm: remove flow cache") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 YueHaibing 提交于
They are not used any more since commit b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Acked-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Horatiu Vultur 提交于
Extend switchdev API to add support for MRP interconnect. The HW is notified in the following cases: SWITCHDEV_OBJ_ID_IN_ROLE_MRP: This is used when the interconnect role of the node changes. The supported roles are MIM and MIC. SWITCHDEV_OBJ_ID_IN_STATE_MRP: This is used when the interconnect ring changes it states to open or closed. SWITCHDEV_OBJ_ID_IN_TEST_MRP: This is used to start/stop sending MRP_InTest frames on all MRP ports. This is called only on nodes that have the interconnect role MIM. Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 7月, 2020 2 次提交
-
-
由 Petr Machata 提交于
Previously, shared blocks were only relevant for the pseudo-qdiscs ingress and clsact. Recently, a qevent facility was introduced, which allows to bind blocks to well-defined slots of a qdisc instance. RED in particular got two qevents: early_drop and mark. Drivers that wish to offload these blocks will be sent the usual notification, and need to know which qdisc it is related to. To that end, extend flow_block_offload with a "sch" pointer, and initialize as appropriate. This prompts changes in the indirect block facility, which now tracks the scheduler in addition to the netdevice. Update signatures of several functions similarly. Signed-off-by: NPetr Machata <petrm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ciara Loftus 提交于
It can be useful for the user to know the reason behind a dropped packet. Introduce new counters which track drops on the receive path caused by: 1. rx ring being full 2. fill ring being empty Also, on the tx path introduce a counter which tracks the number of times we attempt pull from the tx ring when it is empty. Signed-off-by: NCiara Loftus <ciara.loftus@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200708072835.4427-2-ciara.loftus@intel.com
-
- 11 7月, 2020 5 次提交
-
-
由 Vladyslav Tarasiuk 提交于
In order to use new devlink port health reporters infrastructure, add corresponding constructor and destructor functions. Signed-off-by: NVladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladyslav Tarasiuk 提交于
Add devlink-health reporter support on per-port basis. The main difference existing devlink-health is that port reporters are stored in per-devlink_port lists. Upon creation of such health reporter the reference to a port it belongs to is stored in reporter struct. Fill the port index attribute in devlink-health response to allow devlink userspace utility to distinguish between device and port reporters. Signed-off-by: NVladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Add an interface to report offloaded UDP ports via ethtool netlink. Now that core takes care of tracking which UDP tunnel ports the NICs are aware of we can quite easily export this information out to user space. The responsibility of writing the netlink dumps is split between ethtool code and udp_tunnel_nic.c - since udp_tunnel module may not always be loaded, yet we should always report the capabilities of the NIC. $ ethtool --show-tunnels eth0 Tunnel information for eth0: UDP port table 0: Size: 4 Types: vxlan No entries UDP port table 1: Size: 4 Types: geneve, vxlan-gpe Entries (1): port 1230, vxlan-gpe v4: - back to v2, build fix is now directly in udp_tunnel.h v3: - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET not set. v2: - fix string set count, - reorder enums in the uAPI, - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset in docs and comments. Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Cater to devices which: (a) may want to sleep in the callbacks; (b) only have IPv4 support; (c) need all the programming to happen while the netdev is up. Drivers attach UDP tunnel offload info struct to their netdevs, where they declare how many UDP ports of various tunnel types they support. Core takes care of tracking which ports to offload. Use a fixed-size array since this matches what almost all drivers do, and avoids a complexity and uncertainty around memory allocations in an atomic context. Make sure that tunnel drivers don't try to replay the ports when new NIC netdev is registered. Automatic replays would mess up reference counting, and will be removed completely once all drivers are converted. v4: - use a #define NULL to avoid build issues with CONFIG_INET=n. Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Make it possible to use tunnel types as flags more easily. There doesn't appear to be any user using the type as an array index, so this should make no difference. Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 7月, 2020 6 次提交
-
-
由 Danielle Ratson 提交于
Add a new attribute that indicates the split ability of devlink port. Drivers are expected to set it via devlink_port_attrs_set(), before registering the port. Signed-off-by: NDanielle Ratson <danieller@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Danielle Ratson 提交于
Add a new devlink port attribute that indicates the port's number of lanes. Drivers are expected to set it via devlink_port_attrs_set(), before registering the port. The attribute is not passed to user space in case the number of lanes is invalid (0). Signed-off-by: NDanielle Ratson <danieller@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Danielle Ratson 提交于
Currently, devlink_port_attrs_set accepts a long list of parameters, that most of them are devlink port's attributes. Use the devlink_port_attrs struct to replace the relevant parameters. Signed-off-by: NDanielle Ratson <danieller@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Danielle Ratson 提交于
The struct devlink_port_attrs holds the attributes of devlink_port. Similarly to the previous patch, 'switch_port' attribute is another exception. Move 'switch_port' to be devlink_port's field. Signed-off-by: NDanielle Ratson <danieller@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Danielle Ratson 提交于
The struct devlink_port_attrs holds the attributes of devlink_port. The 'set' field is not devlink_port's attribute as opposed to most of the others. Move 'set' to be devlink_port's field called 'attrs_set'. Signed-off-by: NDanielle Ratson <danieller@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
bpf_sk_reuseport_detach is currently called when sk->sk_user_data is not NULL. It is incorrect because sk->sk_user_data may not be managed by the bpf's reuseport_array. It has been reported in [1] that, the bpf_sk_reuseport_detach() which is called from udp_lib_unhash() has corrupted the sk_user_data managed by l2tp. This patch solves it by using another bit (defined as SK_USER_DATA_BPF) of the sk_user_data pointer value. It marks that a sk_user_data is managed/owned by BPF. The patch depends on a PTRMASK introduced in commit f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). [ Note: sk->sk_user_data is used by bpf's reuseport_array only when a sk is added to the bpf's reuseport_array. i.e. doing setsockopt(SO_REUSEPORT) and having "sk->sk_reuseport == 1" alone will not stop sk->sk_user_data being used by other means. ] [1]: https://lore.kernel.org/netdev/20200706121259.GA20199@katalix.com/ Fixes: 5dc4c4b7 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY") Reported-by: NJames Chapman <jchapman@katalix.com> Reported-by: syzbot+9f092552ba9a5efca5df@syzkaller.appspotmail.com Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Tested-by: NJames Chapman <jchapman@katalix.com> Acked-by: NJames Chapman <jchapman@katalix.com> Link: https://lore.kernel.org/bpf/20200709061110.4019316-1-kafai@fb.com
-
- 09 7月, 2020 1 次提交
-
-
由 Linus Walleij 提交于
This implements the known parts of the Realtek 4 byte tag protocol version 0xA, as found in the RTL8366RB DSA switch. It is designated as protocol version 0xA as a different Realtek 4 byte tag format with protocol version 0x9 is known to exist in the Realtek RTL8306 chips. The tag and switch chip lacks public documentation, so the tag format has been reverse-engineered from packet dumps. As only ingress traffic has been available for analysis an egress tag has not been possible to develop (even using educated guesses about bit fields) so this is as far as it gets. It is not known if the switch even supports egress tagging. Excessive attempts to figure out the egress tag format was made. When nothing else worked, I just tried all bit combinations with 0xannp where a is protocol and p is port. I looped through all values several times trying to get a response from ping, without any positive result. Using just these ingress tags however, the switch functionality is vastly improved and the packets find their way into the destination port without any tricky VLAN configuration. On the D-Link DIR-685 the LAN ports now come up and respond to ping without any command line configuration so this is a real improvement for users. Egress packets need to be restricted to the proper target ports using VLAN, which the RTL8366RB DSA switch driver already sets up. Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2020 1 次提交
-
-
由 Martin Varghese 提交于
The packets from tunnel devices (eg bareudp) may have only metadata in the dst pointer of skb. Hence a pointer check of neigh_lookup is needed in dst_neigh_lookup_skb Kernel crashes when packets from bareudp device is processed in the kernel neighbour subsytem. [ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 133.385240] #PF: supervisor instruction fetch in kernel mode [ 133.385828] #PF: error_code(0x0010) - not-present page [ 133.386603] PGD 0 P4D 0 [ 133.386875] Oops: 0010 [#1] SMP PTI [ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15 [ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 133.391076] RIP: 0010:0x0 [ 133.392401] Code: Bad RIP value. [ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246 [ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00 [ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400 [ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000 [ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400 [ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003 [ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000 [ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0 [ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 133.404933] Call Trace: [ 133.405169] <IRQ> [ 133.405367] __neigh_update+0x5a4/0x8f0 [ 133.405734] arp_process+0x294/0x820 [ 133.406076] ? __netif_receive_skb_core+0x866/0xe70 [ 133.406557] arp_rcv+0x129/0x1c0 [ 133.406882] __netif_receive_skb_one_core+0x95/0xb0 [ 133.407340] process_backlog+0xa7/0x150 [ 133.407705] net_rx_action+0x2af/0x420 [ 133.408457] __do_softirq+0xda/0x2a8 [ 133.408813] asm_call_on_stack+0x12/0x20 [ 133.409290] </IRQ> [ 133.409519] do_softirq_own_stack+0x39/0x50 [ 133.410036] do_softirq+0x50/0x60 [ 133.410401] __local_bh_enable_ip+0x50/0x60 [ 133.410871] ip_finish_output2+0x195/0x530 [ 133.411288] ip_output+0x72/0xf0 [ 133.411673] ? __ip_finish_output+0x1f0/0x1f0 [ 133.412122] ip_send_skb+0x15/0x40 [ 133.412471] raw_sendmsg+0x853/0xab0 [ 133.412855] ? insert_pfn+0xfe/0x270 [ 133.413827] ? vvar_fault+0xec/0x190 [ 133.414772] sock_sendmsg+0x57/0x80 [ 133.415685] __sys_sendto+0xdc/0x160 [ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0 [ 133.417679] ? __audit_syscall_exit+0x1d9/0x280 [ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0 [ 133.419819] __x64_sys_sendto+0x24/0x30 [ 133.420848] do_syscall_64+0x4d/0x90 [ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 133.422833] RIP: 0033:0x7fe013689c03 [ 133.423749] Code: Bad RIP value. [ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03 [ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003 [ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010 [ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080 [ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod [ 133.444045] CR2: 0000000000000000 [ 133.445082] ---[ end trace f4aeee1958fd1638 ]--- [ 133.446236] RIP: 0010:0x0 [ 133.447180] Code: Bad RIP value. [ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246 [ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00 [ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400 [ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000 [ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400 [ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003 [ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000 [ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0 [ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt [ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: aaa0c23c ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug") Signed-off-by: NMartin Varghese <martin.varghese@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 7月, 2020 4 次提交
-
-
由 Pablo Neira Ayuso 提交于
This new chain flag specifies that: * the kernel dynamically allocates the chain name, if no chain name is specified. * If the immediate expression that refers to this chain is removed, then this bound chain (and its content) is destroyed. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
This enum definition was never exposed through UAPI. Rename NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
This netlink attribute allows you to refer to chains inside a transaction as an alternative to the name and the handle. The chain binding support requires this new chain ID approach. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Julian Anastasov 提交于
YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 https://github.com/kubernetes/kubernetes/issues/70747 - Apache Bench can fill up ipvs service proxy in seconds #544 https://github.com/cloudnativelabs/kube-router/issues/544 - Additional 1s latency in `host -> service IP -> pod` https://github.com/kubernetes/kubernetes/issues/90854 Fixes: f719e375 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: NYangYuxi <yx.atom1@gmail.com> Signed-off-by: NYangYuxi <yx.atom1@gmail.com> Signed-off-by: NJulian Anastasov <ja@ssi.bg> Reviewed-by: NSimon Horman <horms@verge.net.au> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-