- 19 11月, 2019 2 次提交
-
-
由 Steven Rostedt (VMware) 提交于
The ftrace_graph_stub was created and points to ftrace_stub as a way to assign the functon graph tracer function pointer to a stub function with a different prototype than what ftrace_stub has and not trigger the C verifier. The ftrace_graph_stub was created via the linker script vmlinux.lds.h. Unfortunately, powerpc already uses the name ftrace_graph_stub for its internal implementation of the function graph tracer, and even though powerpc would still build, the change via the linker script broke function tracer on powerpc from working. By using the name ftrace_stub_graph, which does not exist anywhere else in the kernel, this should not be a problem. Link: https://lore.kernel.org/r/1573849732.5937.136.camel@lca.pw Fixes: b83b43ff ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub") Reorted-by: NQian Cai <cai@lca.pw> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
If a direct ftrace callback is at a location that does not have any other ftrace helpers attached to it, it is possible to simply just change the text to call the new caller (if the architecture supports it). But this requires special architecture code. Currently, modify_ftrace_direct() uses a trick to add a stub ftrace callback to the location forcing it to call the ftrace iterator. Then it can change the direct helper to call the new function in C, and then remove the stub. Removing the stub will have the location now call the new location that the direct helper is using. The new helper function does the registering the stub trick, but is a weak function, allowing an architecture to override it to do something a bit more direct. Link: https://lore.kernel.org/r/20191115215125.mbqv7taqnx376yed@ast-mbp.dhcp.thefacebook.comSuggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 15 11月, 2019 6 次提交
-
-
由 Steven Rostedt (VMware) 提交于
Add a new function modify_ftrace_direct() that will allow a user to update an existing direct caller to a new trampoline, without missing hits due to unregistering one and then adding another. Link: https://lore.kernel.org/r/20191109022907.6zzo6orhxpt5n2sv@ast-mbp.dhcp.thefacebook.comSuggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Piotr Maziarz 提交于
Without this, buffers can be printed with __print_array macro that has no formatting options and can be hard to read. The other way is to mimic formatting capability with multiple calls of trace event with one call per row which gives performance impact and different timestamp in each row. Link: http://lkml.kernel.org/r/1573130738-29390-2-git-send-email-piotrx.maziarz@linux.intel.comSigned-off-by: NPiotr Maziarz <piotrx.maziarz@linux.intel.com> Signed-off-by: NCezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Piotr Maziarz 提交于
Provided function is an analogue of print_hex_dump(). Implementing this function in seq_buf allows using for multiple purposes (e.g. for tracing) and therefore prevents from code duplication in every layer that uses seq_buf. print_hex_dump() is an essential part of logging data to dmesg. Adding similar capability for other purposes is beneficial to all users. Example usage: seq_buf_hex_dump(seq, "", DUMP_PREFIX_OFFSET, 16, 4, buf, ARRAY_SIZE(buf), true); Example output: 00000000: 00000000 ffffff10 ffffff32 ffff3210 ........2....2.. 00000010: ffff3210 83d00437 c0700000 00000000 .2..7.....p..... 00000020: 02010004 0000000f 0000000f 00004002 .............@.. 00000030: 00000fff 00000000 ........ Link: http://lkml.kernel.org/r/1573130738-29390-1-git-send-email-piotrx.maziarz@linux.intel.comSigned-off-by: NPiotr Maziarz <piotrx.maziarz@linux.intel.com> Signed-off-by: NCezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Andy Shevchenko 提交于
Comparator function type, cmp_func_t, is defined in the types.h, use it in bsearch() and, thus, add more sense to the corresponding comment in the code. Link: http://lkml.kernel.org/r/20191007135656.37734-2-andriy.shevchenko@linux.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Andy Shevchenko 提交于
The function types for swap, cmp and cmp_r functions are already being in use by modules. Move them to types.h that everybody in kernel will be able to use generic types instead of custom ones. This adds more sense to the comment in bsearch() later on. Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
The C compiler is allowing more checks to make sure that function pointers are assigned to the correct prototype function. Unfortunately, the function graph tracer uses a special name with its assigned ftrace_graph_return function pointer that maps to a stub function used by the function tracer (ftrace_stub). The ftrace_graph_return variable is compared to the ftrace_stub in some archs to know if the function graph tracer is enabled or not. This means we can not just simply create a new function stub that compares it without modifying all the archs. Instead, have the linker script create a function_graph_stub that maps to ftrace_stub, and this way we can define the prototype for it to match the prototype of ftrace_graph_return, and make the compiler checks all happy! Link: http://lkml.kernel.org/r/20191015090055.789a0aed@gandalf.local.home Cc: linux-sh@vger.kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Reported-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 13 11月, 2019 5 次提交
-
-
由 Divya Indi 提交于
Declare the newly introduced and exported APIs in the header file - include/linux/trace.h. Moving previous declarations from kernel/trace/trace.h to include/linux/trace.h. Link: http://lkml.kernel.org/r/1565805327-579-2-git-send-email-divya.indi@oracle.comSigned-off-by: NDivya Indi <divya.indi@oracle.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
As testing for direct calls from the function graph tracer adds a little overhead (which is a lot when tracing every function), add a counter that can be used to test if function_graph tracer needs to test for a direct caller or not. It would have been nicer if we could use a static branch, but the static branch logic fails when used within the function graph tracer trampoline. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Enable x86 to allow for register_ftrace_direct(), where a custom trampoline may be called directly from an ftrace mcount/fentry location. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
As function_graph tracer modifies the return address to insert a trampoline to trace the return of a function, it must be aware of a direct caller, as when it gets called, the function's return address may not be at on the stack where it expects. It may have to see if that return address points to the a direct caller and adjust if it is. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Add the start of the functionality to allow other trampolines to use the ftrace mcount/fentry/nop location. This adds two new functions: register_ftrace_direct() and unregister_ftrace_direct() Both take two parameters: the first is the instruction address of where the mcount/fentry/nop exists, and the second is the trampoline to have that location called. This will handle cases where ftrace is already used on that same location, and will make it still work, where the registered direct called trampoline will get called after all the registered ftrace callers are handled. Currently, it will not allow for IP_MODIFY functions to be called at the same locations, which include some kprobes and live kernel patching. At this point, no architecture supports this. This is only the start of implementing the framework. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 04 11月, 2019 1 次提交
-
-
由 Miroslav Benes 提交于
Livepatch uses ftrace for redirection to new patched functions. It means that if ftrace is disabled, all live patched functions are disabled as well. Toggling global 'ftrace_enabled' sysctl thus affect it directly. It is not a problem per se, because only administrator can set sysctl values, but it still may be surprising. Introduce PERMANENT ftrace_ops flag to amend this. If the FTRACE_OPS_FL_PERMANENT is set on any ftrace ops, the tracing cannot be disabled by disabling ftrace_enabled. Equally, a callback with the flag set cannot be registered if ftrace_enabled is disabled. Link: http://lkml.kernel.org/r/20191016113316.13415-2-mbenes@suse.czReviewed-by: NPetr Mladek <pmladek@suse.com> Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: NMiroslav Benes <mbenes@suse.cz> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 01 11月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
SOMAXCONN is /proc/sys/net/core/somaxconn default value. It has been defined as 128 more than 20 years ago. Since it caps the listen() backlog values, the very small value has caused numerous problems over the years, and many people had to raise it on their hosts after beeing hit by problems. Google has been using 1024 for at least 15 years, and we increased this to 4096 after TCP listener rework has been completed, more than 4 years ago. We got no complain of this change breaking any legacy application. Many applications indeed setup a TCP listener with listen(fd, -1); meaning they let the system select the backlog. Raising SOMAXCONN lowers chance of the port being unavailable under even small SYNFLOOD attack, and reduces possibilities of side channel vulnerabilities. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2019 5 次提交
-
-
由 Javier Martinez Canillas 提交于
The driver exposes EFI runtime services to user-space through an IOCTL interface, calling the EFI services function pointers directly without using the efivar API. Disallow access to the /dev/efi_test character device when the kernel is locked down to prevent arbitrary user-space to call EFI runtime services. Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged users to call the EFI runtime services, instead of just relying on the chardev file mode bits for this. The main user of this driver is the fwts [0] tool that already checks if the effective user ID is 0 and fails otherwise. So this change shouldn't cause any regression to this tool. [0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfoSigned-off-by: NJavier Martinez Canillas <javierm@redhat.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NLaszlo Ersek <lersek@redhat.com> Acked-by: NMatthew Garrett <mjg59@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kairui Song 提交于
Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: NKairui Song <kasong@redhat.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Eric Dumazet 提交于
We already annotated most accesses to sk->sk_napi_id We missed sk_mark_napi_id() and sk_mark_napi_id_once() which might be called without socket lock held in UDP stack. KCSAN reported : BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0: sk_mark_napi_id include/net/busy_poll.h:125 [inline] __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline] udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459 dst_input include/net/dst.h:442 [inline] ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1: sk_mark_napi_id include/net/busy_poll.h:125 [inline] __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline] udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459 dst_input include/net/dst.h:442 [inline] ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: e68b6e50 ("udp: enable busy polling for all sockets") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This socket field can be read and written by concurrent cpus. Use READ_ONCE() and WRITE_ONCE() annotations to document this, and avoid some compiler 'optimizations'. KCSAN reported : BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0: sk_incoming_cpu_update include/net/sock.h:953 [inline] tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082 do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337 do_softirq kernel/softirq.c:329 [inline] __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189 read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1: sk_incoming_cpu_update include/net/sock.h:952 [inline] tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Trond Myklebust 提交于
When we're destroying the host transport mechanism, we should ensure that we do not leak memory by failing to release any back channel slots that might still exist. Reported-by: NNeil Brown <neilb@suse.de> Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 30 10月, 2019 1 次提交
-
-
由 Roi Dayan 提交于
The union should contain the extended dest and counter list. Remove the resevered 0x40 bits which is redundant. This change doesn't break any functionally. Everything works today because the code in fs_cmd.c is using the correct structs if extended dest or the basic dest. Fixes: 1b115498 ("net/mlx5: Introduce extended destination fields") Signed-off-by: NRoi Dayan <roid@mellanox.com> Reviewed-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
- 29 10月, 2019 2 次提交
-
-
由 Tejun Heo 提交于
sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Some paths call skb_queue_empty() without holding the queue lock. We must use a barrier in order to not let the compiler do strange things, and avoid KCSAN splats. Adding a barrier in skb_queue_empty() might be overkill, I prefer adding a new helper to clearly identify points where the callers might be lockless. This might help us finding real bugs. The corresponding WRITE_ONCE() should add zero cost for current compilers. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 10月, 2019 2 次提交
-
-
由 Geert Uytterhoeven 提交于
As per POSIX, the correct spelling of the error code is EACCES: include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */ Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20191024122904.12463-1-geert+renesas@glider.beSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stefano Garzarella 提交于
The 'work' field was introduced with commit 06a8fc78 ("VSOCK: Introduce virtio_vsock_common.ko") but it is never used in the code, so we can remove it to save memory allocated in the per-packet 'struct virtio_vsock_pkt' Suggested-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 26 10月, 2019 2 次提交
-
-
由 Guillaume Nault 提交于
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, but there are a few paths calling rtnl_net_notifyid() from atomic context or from RCU critical sections. The later also precludes the use of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() call is wrong too, as it uses GFP_KERNEL unconditionally. Therefore, we need to pass the GFP flags as parameter and propagate it through function calls until the proper flags can be determined. In most cases, GFP_KERNEL is fine. The exceptions are: * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() indirectly call rtnl_net_notifyid() from RCU critical section, * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as parameter. Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used by nlmsg_new(). The function is allowed to sleep, so better make the flags consistent with the ones used in the following ovs_vport_cmd_fill_info() call. Found by code inspection. Fixes: 9a963454 ("netns: notify netns id events") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Dooks (Codethink) 提交于
If CONFIG_NET_HWBM is not set, then these stub functions in <net/hwbm.h> should be declared static to avoid trying to export them from any driver that includes this. Fixes the following sparse warnings: ./include/net/hwbm.h:24:6: warning: symbol 'hwbm_buf_free' was not declared. Should it be static? ./include/net/hwbm.h:25:5: warning: symbol 'hwbm_pool_refill' was not declared. Should it be static? ./include/net/hwbm.h:26:5: warning: symbol 'hwbm_pool_add' was not declared. Should it be static? Signed-off-by: NBen Dooks (Codethink) <ben.dooks@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 10月, 2019 7 次提交
-
-
由 Taehee Yoo 提交于
This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
Current vxlan code doesn't limit the number of nested devices. Nested devices would be handled recursively and this routine needs huge stack memory. So, unlimited nested devices could make stack overflow. In order to fix this issue, this patch adds adjacent links. The adjacent link APIs internally check the depth level. Test commands: ip link add dummy0 type dummy ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \ dstport 4789 for i in {1..100} do let A=$i-1 ip link add vxlan$i type vxlan id $i group 239.1.1.1 \ dev vxlan$A dstport 4789 done ip link del dummy0 The top upper link is vxlan100 and the lowest link is vxlan0. When vxlan0 is deleting, the upper devices will be deleted recursively. It needs huge stack memory so it makes stack overflow. Splat looks like: [ 229.628477] ============================================================================= [ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2 [ 229.629785] ----------------------------------------------------------------------------- [ 229.629785] [ 229.655439] ================================================================== [ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200 [ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0 [ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334 [ 229.655688] [ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h....... [ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h....... [ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`".... [ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`.............. [ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!...... [ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................ [ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............ [ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`.............. [ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A.... [ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a [ ... ] Fixes: acaf4e70 ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
In order to link an adjacent node, netdev_upper_dev_link() is used and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used. unlink operation does not fail, but link operation can fail. In order to exchange adjacent nodes, we should unlink an old adjacent node first. then, link a new adjacent node. If link operation is failed, we should link an old adjacent node again. But this link operation can fail too. It eventually breaks the adjacent link relationship. This patch adds an ignore flag into the netdev_adjacent structure. If this flag is set, netdev_upper_dev_link() ignores an old adjacent node for a moment. This patch also adds new functions for other modules. netdev_adjacent_change_prepare() netdev_adjacent_change_commit() netdev_adjacent_change_abort() netdev_adjacent_change_prepare() inserts new device into adjacent list but new device is not allowed to use immediately. If netdev_adjacent_change_prepare() fails, it internally rollbacks adjacent list so that we don't need any other action. netdev_adjacent_change_commit() deletes old device in the adjacent list and allows new device to use. netdev_adjacent_change_abort() rollbacks adjacent list. Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
team interface could be nested and it's lock variable could be nested too. But this lock uses static lockdep key and there is no nested locking handling code such as mutex_lock_nested() and so on. so the Lockdep would warn about the circular locking scenario that couldn't happen. In order to fix, this patch makes the team module to use dynamic lock key instead of static key. Test commands: ip link add team0 type team ip link add team1 type team ip link set team0 master team1 ip link set team0 nomaster ip link set team1 master team0 ip link set team1 nomaster Splat that looks like: [ 40.364352] WARNING: possible recursive locking detected [ 40.364964] 5.4.0-rc3+ #96 Not tainted [ 40.365405] -------------------------------------------- [ 40.365973] ip/750 is trying to acquire lock: [ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team] [ 40.367689] but task is already holding lock: [ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team] [ 40.370280] other info that might help us debug this: [ 40.371159] Possible unsafe locking scenario: [ 40.371942] CPU0 [ 40.372338] ---- [ 40.372673] lock(&team->lock); [ 40.373115] lock(&team->lock); [ 40.373549] *** DEADLOCK *** [ 40.374432] May be due to missing lock nesting notation [ 40.375338] 2 locks held by ip/750: [ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team] [ 40.377989] stack backtrace: [ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 40.380574] Call Trace: [ 40.381208] dump_stack+0x7c/0xbb [ 40.381959] __lock_acquire+0x269d/0x3de0 [ 40.382817] ? register_lock_class+0x14d0/0x14d0 [ 40.383784] ? check_chain_key+0x236/0x5d0 [ 40.384518] lock_acquire+0x164/0x3b0 [ 40.385074] ? team_set_mac_address+0x151/0x290 [team] [ 40.385805] __mutex_lock+0x14d/0x14c0 [ 40.386371] ? team_set_mac_address+0x151/0x290 [team] [ 40.387038] ? team_set_mac_address+0x151/0x290 [team] [ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380 [ 40.388245] ? team_del_slave+0x60/0x60 [team] [ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0 [ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 40.389819] ? lock_acquire+0x164/0x3b0 [ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20 [ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team] [ 40.391353] ? __module_text_address+0x13/0x140 [ 40.391886] ? team_set_mac_address+0x151/0x290 [team] [ 40.392547] team_set_mac_address+0x151/0x290 [team] [ 40.393111] dev_set_mac_address+0x1f0/0x3f0 [ ... ] Fixes: 3d249d4c ("net: introduce ethernet teaming device") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
All bonding device has same lockdep key and subclass is initialized with nest_level. But actual nest_level value can be changed when a lower device is attached. And at this moment, the subclass should be updated but it seems to be unsafe. So this patch makes bonding use dynamic lockdep key instead of the subclass. Test commands: ip link add bond0 type bond for i in {1..5} do let A=$i-1 ip link add bond$i type bond ip link set bond$i master bond$A done ip link set bond5 master bond0 Splat looks like: [ 307.992912] WARNING: possible recursive locking detected [ 307.993656] 5.4.0-rc3+ #96 Tainted: G W [ 307.994367] -------------------------------------------- [ 307.995092] ip/761 is trying to acquire lock: [ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.997045] but task is already holding lock: [ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.999215] other info that might help us debug this: [ 308.000251] Possible unsafe locking scenario: [ 308.001137] CPU0 [ 308.001533] ---- [ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.003302] *** DEADLOCK *** [ 308.004310] May be due to missing lock nesting notation [ 308.005319] 3 locks held by ip/761: [ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding] [ 308.009422] stack backtrace: [ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96 [ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 308.012179] Call Trace: [ 308.012601] dump_stack+0x7c/0xbb [ 308.013089] __lock_acquire+0x269d/0x3de0 [ 308.013669] ? register_lock_class+0x14d0/0x14d0 [ 308.014318] lock_acquire+0x164/0x3b0 [ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.015520] _raw_spin_lock_nested+0x2e/0x60 [ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.017215] bond_get_stats+0xb8/0x500 [bonding] [ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 308.019710] ? rcu_read_lock_held+0x90/0xa0 [ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding] [ 308.021953] dev_get_stats+0x1ec/0x270 [ 308.022508] bond_get_stats+0x1d1/0x500 [bonding] Fixes: d3fff6c4 ("net: add netdev_lockdep_set_classes() helper") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
Current code doesn't limit the number of nested devices. Nested devices would be handled recursively and this needs huge stack memory. So, unlimited nested devices could make stack overflow. This patch adds upper_level and lower_level, they are common variables and represent maximum lower/upper depth. When upper/lower device is attached or dettached, {lower/upper}_level are updated. and if maximum depth is bigger than 8, attach routine fails and returns -EMLINK. In addition, this patch converts recursive routine of netdev_walk_all_{lower/upper} to iterator routine. Test commands: ip link add dummy0 type dummy ip link add link dummy0 name vlan1 type vlan id 1 ip link set vlan1 up for i in {2..55} do let A=$i-1 ip link add vlan$i link vlan$A type vlan id $i done ip link del dummy0 Splat looks like: [ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850 [ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908 [ 155.515048][ T908] [ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 155.517233][ T908] Call Trace: [ 155.517627][ T908] [ 155.517918][ T908] Allocated by task 0: [ 155.518412][ T908] (stack is not available) [ 155.518955][ T908] [ 155.519228][ T908] Freed by task 0: [ 155.519885][ T908] (stack is not available) [ 155.520452][ T908] [ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0 [ 155.520729][ T908] which belongs to the cache names_cache of size 4096 [ 155.522387][ T908] The buggy address is located 512 bytes inside of [ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0) [ 155.523920][ T908] The buggy address belongs to the page: [ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0 [ 155.525836][ T908] flags: 0x100000000010200(slab|head) [ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0 [ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 155.528429][ T908] page dumped because: kasan: bad access detected [ 155.529158][ T908] [ 155.529410][ T908] Memory state around the buggy address: [ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3 [ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 155.532806][ T908] ^ [ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00 [ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00 [ ... ] Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2019 2 次提交
-
-
由 Eric Dumazet 提交于
syzbot reported the following issue : BUG: KCSAN: data-race in update_defense_level / update_defense_level read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1: update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0: update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events defense_work_handler Indeed, old_secure_tcp is currently a static variable, while it needs to be a per netns variable. Fixes: a0840e2e ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Eric Dumazet 提交于
UDP IPv6 packets auto flowlabels are using a 32bit secret (static u32 hashrnd in net/core/flow_dissector.c) and apply jhash() over fields known by the receivers. Attackers can easily infer the 32bit secret and use this information to identify a device and/or user, since this 32bit secret is only set at boot time. Really, using jhash() to generate cookies sent on the wire is a serious security concern. Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be a dead end. Trying to periodically change the secret (like in sch_sfq.c) could change paths taken in the network for long lived flows. Let's switch to siphash, as we did in commit df453700 ("inet: switch IP ID generator to siphash") Using a cryptographically strong pseudo random function will solve this privacy issue and more generally remove other weak points in the stack. Packet schedulers using skb_get_hash_perturb() benefit from this change. Fixes: b5677416 ("ipv6: Enable auto flow labels by default") Fixes: 42240901 ("ipv6: Implement different admin modes for automatic flow labels") Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NJonathan Berger <jonathann1@walla.com> Reported-by: NAmit Klein <aksecurity@gmail.com> Reported-by: NBenny Pinkas <benny@pinkas.net> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2019 4 次提交
-
-
由 Alan Somers 提交于
Retroactively add changelog entry for FUSE protocols 7.1 through 7.8. Signed-off-by: NAlan Somers <asomers@FreeBSD.org> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Arnd Bergmann 提交于
The ionic driver started using dymamic_hex_dump(), but that is not always defined: drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration] Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG is disabled, printing nothing. Fixes: 938962d5 ("ionic: Add adminq action") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NShannon Nelson <snelson@pensando.io> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Daniel Borkmann 提交于
syzkaller managed to trigger the following crash: [...] BUG: unable to handle page fault for address: ffffc90001923030 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline] RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline] RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline] RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline] RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline] RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline] RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709 [...] Call Trace: kernel_text_address kernel/extable.c:147 [inline] __kernel_text_address+0x9a/0x110 kernel/extable.c:102 unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19 arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26 stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123 save_stack mm/kasan/common.c:69 [inline] set_track mm/kasan/common.c:77 [inline] __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc mm/slab.c:3319 [inline] kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483 getname_flags+0xba/0x640 fs/namei.c:138 getname+0x19/0x20 fs/namei.c:209 do_sys_open+0x261/0x560 fs/open.c:1091 __do_sys_open fs/open.c:1115 [inline] __se_sys_open fs/open.c:1110 [inline] __x64_sys_open+0x87/0x90 fs/open.c:1110 do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] After further debugging it turns out that we walk kallsyms while in parallel we tear down a BPF program which contains subprograms that have been JITed though the program itself has not been fully exposed and is eventually bailing out with error. The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes the symbols, however, bpf_prog_free() tears down the JIT memory too early via scheduled work. Instead, it needs to properly respect RCU grace period as the kallsyms walk for BPF is under RCU. Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error path where we defer final destruction when we have subprogs in the program. Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup") Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs") Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
-
由 Dan Carpenter 提交于
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that: if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) { But we don't check if "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really "attr.comp_vector" is appears as a u32 to user space so that's the right type to use. Fixes: 9ee79fce ("IB/core: Add completion queue (cq) object actions") Link: https://lore.kernel.org/r/20191011133419.GA22905@mwandaSigned-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-