- 14 12月, 2019 4 次提交
-
-
由 Björn Töpel 提交于
>From Intel 64 and IA-32 Architectures Optimization Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch targets should be 16-byte aligned. This commits aligns branch targets according to the Intel manual. The nops used to align branch targets make the dispatcher larger, and therefore the number of supported dispatch points/programs are descreased from 64 to 48. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com
-
由 Björn Töpel 提交于
This commit adds a BPF dispatcher for XDP. The dispatcher is updated from the XDP control-path, dev_xdp_install(), and used when an XDP program is run via bpf_prog_run_xdp(). Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
-
由 Björn Töpel 提交于
The BPF dispatcher is a multi-way branch code generator, mainly targeted for XDP programs. When an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect call has a substantial performance impact, when retpolines are enabled. The dispatcher transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 64 dispatch points. A user creates a dispatcher with its corresponding trampoline with the DEFINE_BPF_DISPATCHER macro. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com
-
由 Björn Töpel 提交于
Refactor the image allocation in the BPF trampoline code into a separate function, so it can be shared with the BPF dispatcher in upcoming commits. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com
-
- 08 12月, 2019 2 次提交
-
-
由 Linus Torvalds 提交于
This code is ancient, and goes back to when we only had a single page for the pipe buffers. The exact history is hidden in the mists of time (ie "before git", and in fact predates the BK repository too). At that long-ago point in time, it actually helped to try to merge big back-and-forth pipe reads and writes, and not limit pipe reads to the single pipe buffer in length just because that was all we had at a time. However, since then we've expanded the pipe buffers to multiple pages, and this logic really doesn't seem to make sense. And a lot of it is somewhat questionable (ie "hmm, the user asked for a non-blocking read, but we see that there's a writer pending, so let's wait anyway to get the extra data that the writer will have"). But more importantly, it makes the "go to sleep" logic much less obvious, and considering the wakeup issues we've had, I want to make for less of those kinds of things. Cc: David Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Dumazet 提交于
syzbot was once again able to crash a host by setting a very small mtu on loopback device. Let's make inetdev_valid_mtu() available in include/net/ip.h, and use it in ip_setup_cork(), so that we protect both ip_append_page() and __ip_append_data() Also add a READ_ONCE() when the device mtu is read. Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(), even if other code paths might write over this field. Add a big comment in include/linux/netdevice.h about dev->mtu needing READ_ONCE()/WRITE_ONCE() annotations. Hopefully we will add the missing ones in followup patches. [1] refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89 RSP: 0018:ffff88809689f550 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1 R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001 R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40 refcount_add include/linux/refcount.h:193 [inline] skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821 kernel_sendpage+0x92/0xf0 net/socket.c:3794 sock_sendpage+0x8b/0xc0 net/socket.c:936 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636 splice_from_pipe+0x108/0x170 fs/splice.c:671 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842 do_splice_from fs/splice.c:861 [inline] direct_splice_actor+0x123/0x190 fs/splice.c:1035 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441409 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005 RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010 R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180 R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2019 1 次提交
-
-
由 Guillaume Nault 提交于
If no synflood happens for a long enough period of time, then the synflood timestamp isn't refreshed and jiffies can advance so much that time_after32() can't accurately compare them any more. Therefore, we can end up in a situation where time_after32(now, last_overflow + HZ) returns false, just because these two values are too far apart. In that case, the synflood timestamp isn't updated as it should be, which can trick tcp_synq_no_recent_overflow() into rejecting valid syncookies. For example, let's consider the following scenario on a system with HZ=1000: * The synflood timestamp is 0, either because that's the timestamp of the last synflood or, more commonly, because we're working with a freshly created socket. * We receive a new SYN, which triggers synflood protection. Let's say that this happens when jiffies == 2147484649 (that is, 'synflood timestamp' + HZ + 2^31 + 1). * Then tcp_synq_overflow() doesn't update the synflood timestamp, because time_after32(2147484649, 1000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 1000: the value of 'last_overflow' + HZ. * A bit later, we receive the ACK completing the 3WHS. But cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow() says that we're not under synflood. That's because time_after32(2147484649, 120000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID. Of course, in reality jiffies would have increased a bit, but this condition will last for the next 119 seconds, which is far enough to accommodate for jiffie's growth. Fix this by updating the overflow timestamp whenever jiffies isn't within the [last_overflow, last_overflow + HZ] range. That shouldn't have any performance impact since the update still happens at most once per second. Now we're guaranteed to have fresh timestamps while under synflood, so tcp_synq_no_recent_overflow() can safely use it with time_after32() in such situations. Stale timestamps can still make tcp_synq_no_recent_overflow() return the wrong verdict when not under synflood. This will be handled in the next patch. For 64 bits architectures, the problem was introduced with the conversion of ->tw_ts_recent_stamp to 32 bits integer by commit cca9bab1 ("tcp: use monotonic timestamps for PAWS"). The problem has always been there on 32 bits architectures. Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 12月, 2019 16 次提交
-
-
由 Mike Rapoport 提交于
There are no architectures that use include/asm-generic/4level-fixup.h therefore it can be removed along with __ARCH_HAS_4LEVEL_HACK define. Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.orgSigned-off-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Anatoly Pugachev <matorola@gmail.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Peter Rosin <peda@axentia.se> Cc: Richard Weinberger <richard@nod.at> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
In some drivers we want to have a single operation over bitmap which is an equivalent to: *dst = (*old & ~(*mask)) | (*new & *mask) Introduce bitmap_replace() helper for this. Link: http://lkml.kernel.org/r/20191022172922.61232-8-andriy.shevchenko@linux.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Marek Vasut <marek.vasut+renesas@gmail.com> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
Patch series " kcov: collect coverage from usb and vhost", v3. This patchset extends kcov to allow collecting coverage from backgound kernel threads. This extension requires custom annotations for each of the places where coverage collection is desired. This patchset implements this for hub events in the USB subsystem and for vhost workers. See the first patch description for details about the kcov extension. The other two patches apply this kcov extension to USB and vhost. Examples of other subsystems that might potentially benefit from this when custom annotations are added (the list is based on process_one_work() callers for bugs recently reported by syzbot): 1. fs: writeback wb_workfn() worker, 2. net: addrconf_dad_work()/addrconf_verify_work() workers, 3. net: neigh_periodic_work() worker, 4. net/p9: p9_write_work()/p9_read_work() workers, 5. block: blk_mq_run_work_fn() worker. These patches have been used to enable coverage-guided USB fuzzing with syzkaller for the last few years, see the details here: https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md This patchset has been pushed to the public Linux kernel Gerrit instance: https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524 This patch (of 3): Add background thread coverage collection ability to kcov. With KCOV_ENABLE coverage is collected only for syscalls that are issued from the current process. With KCOV_REMOTE_ENABLE it's possible to collect coverage for arbitrary parts of the kernel code, provided that those parts are annotated with kcov_remote_start()/kcov_remote_stop(). This allows to collect coverage from two types of kernel background threads: the global ones, that are spawned during kernel boot in a limited number of instances (e.g. one USB hub_event() worker thread is spawned per USB HCD); and the local ones, that are spawned when a user interacts with some kernel interface (e.g. vhost workers). To enable collecting coverage from a global background thread, a unique global handle must be assigned and passed to the corresponding kcov_remote_start() call. Then a userspace process can pass a list of such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field of the kcov_remote_arg struct. This will attach the used kcov device to the code sections, that are referenced by those handles. Since there might be many local background threads spawned from different userspace processes, we can't use a single global handle per annotation. Instead, the userspace process passes a non-zero handle through the common_handle field of the kcov_remote_arg struct. This common handle gets saved to the kcov_handle field in the current task_struct and needs to be passed to the newly spawned threads via custom annotations. Those threads should in turn be annotated with kcov_remote_start()/kcov_remote_stop(). Internally kcov stores handles as u64 integers. The top byte of a handle is used to denote the id of a subsystem that this handle belongs to, and the lower 4 bytes are used to denote the id of a thread instance within that subsystem. A reserved value 0 is used as a subsystem id for common handles as they don't belong to a particular subsystem. The bytes 4-7 are currently reserved and must be zero. In the future the number of bytes used for the subsystem or handle ids might be increased. When a particular userspace process collects coverage by via a common handle, kcov will collect coverage for each code section that is annotated to use the common handle obtained as kcov_handle from the current task_struct. However non common handles allow to collect coverage selectively from different subsystems. Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David Windsor <dwindsor@gmail.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
As we've done with VFS, string operations, etc, reject usercopy sizes larger than INT_MAX, which would be nice to have for catching bugs related to size calculation overflows[1]. This adds 10 bytes to x86_64 defconfig text and 1980 bytes to the data section: text data bss dec hex filename 19691167 5134320 1646664 26472151 193eed7 vmlinux.before 19691177 5136300 1646664 26474141 193f69d vmlinux.after [1] https://marc.info/?l=linux-s390&m=156631939010493&w=2 Link: http://lkml.kernel.org/r/201908251612.F9902D7A@keescookSigned-off-by: NKees Cook <keescook@chromium.org> Suggested-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Shijie 提交于
Follow the kernel conventions, rename addr_in_gen_pool to gen_pool_has_addr. [sjhuang@iluvatar.ai: fix Documentation/ too] Link: http://lkml.kernel.org/r/20181229015914.5573-1-sjhuang@iluvatar.ai Link: http://lkml.kernel.org/r/20181228083950.20398-1-sjhuang@iluvatar.aiSigned-off-by: NHuang Shijie <sjhuang@iluvatar.ai> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
After move parent assignment out, we can check the color directly. Link: http://lkml.kernel.org/r/20191028021442.5450-2-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
Both in Case 2 and 3, we exchange n and s. This mean no matter whether child2 is NULL or not, successor's parent should be assigned to node's. This patch takes this step out to make it explicit and reduce the ambiguity. Besides, this step reduces some symbol size like rb_erase(). KERN_CONFIG upstream patched OPT_FOR_PERF 877 870 OPT_FOR_SIZE 635 621 Link: http://lkml.kernel.org/r/20191028021442.5450-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 William Breathitt Gray 提交于
Pach series "Introduce the for_each_set_clump8 macro", v18. While adding GPIO get_multiple/set_multiple callback support for various drivers, I noticed a pattern of looping manifesting that would be useful standardized as a macro. This patchset introduces the for_each_set_clump8 macro and utilizes it in several GPIO drivers. The for_each_set_clump macro8 facilitates a for-loop syntax that iterates over a memory region entire groups of set bits at a time. For example, suppose you would like to iterate over a 32-bit integer 8 bits at a time, skipping over 8-bit groups with no set bit, where XXXXXXXX represents the current 8-bit group: Example: 10111110 00000000 11111111 00110011 First loop: 10111110 00000000 11111111 XXXXXXXX Second loop: 10111110 00000000 XXXXXXXX 00110011 Third loop: XXXXXXXX 00000000 11111111 00110011 Each iteration of the loop returns the next 8-bit group that has at least one set bit. The for_each_set_clump8 macro has four parameters: * start: set to the bit offset of the current clump * clump: set to the current clump value * bits: bitmap to search within * size: bitmap size in number of bits In this version of the patchset, the for_each_set_clump macro has been reimplemented and simplified based on the suggestions provided by Rasmus Villemoes and Andy Shevchenko in the version 4 submission. In particular, the function of the for_each_set_clump macro has been restricted to handle only 8-bit clumps; the drivers that use the for_each_set_clump macro only handle 8-bit ports so a generic for_each_set_clump implementation is not necessary. Thus, a solution for large clumps (i.e. those larger than the width of a bitmap word) can be postponed until a driver appears that actually requires such a generic for_each_set_clump implementation. For what it's worth, a semi-generic for_each_set_clump (i.e. for clumps smaller than the width of a bitmap word) can be implemented by simply replacing the hardcoded '8' and '0xFF' instances with respective variables. I have not yet had a need for such an implementation, and since it falls short of a true generic for_each_set_clump function, I have decided to forgo such an implementation for now. In addition, the bitmap_get_value8 and bitmap_set_value8 functions are introduced to get and set 8-bit values respectively. Their use is based on the behavior suggested in the patchset version 4 review. This patch (of 14): This macro iterates for each 8-bit group of bits (clump) with set bits, within a bitmap memory region. For each iteration, "start" is set to the bit offset of the found clump, while the respective clump value is stored to the location pointed by "clump". Additionally, the bitmap_get_value8 and bitmap_set_value8 functions are introduced to respectively get and set an 8-bit value in a bitmap memory region. [gustavo@embeddedor.com: fix potential sign-extension overflow] Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor [akpm@linux-foundation.org: s/ULL/UL/, per Joe] [vilhelm.gray@gmail.com: add for_each_set_clump8 documentation] Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.comSigned-off-by: NWilliam Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Suggested-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Suggested-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Suggested-by: NLukas Wunner <lukas@wunner.de> Tested-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Phil Reid <preid@electromag.com.au> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Mathias Duckeck <m.duckeck@kunbus.de> Cc: Morten Hein Tiljeset <morten.tiljeset@prevas.dk> Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xiaoming Ni 提交于
blocking_notifier_chain_cond_register() does not consider system_booting state, which is the only difference between this function and blocking_notifier_cain_register(). This can be a bug and is a piece of duplicate code. Delete blocking_notifier_chain_cond_register() Link: http://lkml.kernel.org/r/1568861888-34045-4-git-send-email-nixiaoming@huawei.comSigned-off-by: NXiaoming Ni <nixiaoming@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
There were discussions in the past about use cases for simple_strto<foo>() functions and, in some rare cases, they have a benefit over kstrto<foo>() ones. Update a comment to reduce confusion about special use cases. Link: http://lkml.kernel.org/r/20190801192904.41087-1-andriy.shevchenko@linux.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Mans Rullgard <mans@mansr.com> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rikard Falkeborn 提交于
Having BUILD_BUG_ON_ZERO produce a value of type size_t leads to awkward casts in cases where the result needs to be signed, or of smaller type than size_t. To avoid this, cast the value to int instead and rely on implicit type conversions when a larger or unsigned type is needed. Link: http://lkml.kernel.org/r/20190811184938.1796-3-rikard.falkeborn@gmail.comSigned-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com> Suggested-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Cc: Joe Perches <joe@perches.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alessio Balsini 提交于
Fix coding style of "struct ctl_table" and "struct ctl_table_header" to have inline braces. Besides the wide use of this proposed cose style, this change helps to find at a glance the struct definition when navigating the code. Link: http://lkml.kernel.org/r/20190903154906.188651-1-balsini@android.comSigned-off-by: NAlessio Balsini <balsini@android.com> Acked-by: NLuis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miaohe Lin 提交于
state_size and ops are in the wrong position. Link: http://lkml.kernel.org/r/20190910021747.11216-1-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NAleksa Sarai <cyphar@cyphar.com> Reviewed-by: NChristian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Use common names from vmstat array when possible. This gives not much difference in code size for now, but should help in keeping interfaces consistent. add/remove: 0/2 grow/shrink: 2/0 up/down: 70/-72 (-2) Function old new delta memory_stat_format 984 1050 +66 memcg_stat_show 957 961 +4 memcg1_event_names 32 - -32 mem_cgroup_lru_names 40 - -40 Total: Before=14485337, After=14485335, chg -0.00% Link: http://lkml.kernel.org/r/157113012508.453.80391533767219371.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Statistics in vmstat is combined from counters with different structure, but names for them are merged into one array. This patch adds trivial helpers to get name for each item: const char *zone_stat_name(enum zone_stat_item item); const char *numa_stat_name(enum numa_stat_item item); const char *node_stat_name(enum node_stat_item item); const char *writeback_stat_name(enum writeback_stat_item item); const char *vm_event_name(enum vm_event_item item); Names for enum writeback_stat_item are folded in the middle of vmstat_text so this patch moves declaration into header to calculate offset of following items. Also this patch reuses piece of node stat names for lru list names: const char *lru_list_name(enum lru_list lru); This returns common lru list names: "inactive_anon", "active_anon", "inactive_file", "active_file", "unevictable". [khlebnikov@yandex-team.ru: do not use size of vmstat_text as count of /proc/vmstat items] Link: http://lkml.kernel.org/r/157152151769.4139.15423465513138349343.stgit@buzz Link: https://lore.kernel.org/linux-mm/cd1c42ae-281f-c8a8-70ac-1d01d417b2e1@infradead.org/T/#u Link: http://lkml.kernel.org/r/157113012325.453.562783073839432766.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Martin Varghese 提交于
The skb_mpls_push was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4 Fixes: 8822e270 ("net: core: move push MPLS functionality from OvS to core helper") Signed-off-by: NMartin Varghese <martin.varghese@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 12月, 2019 1 次提交
-
-
由 Corentin Labbe 提交于
This patch move the define for AGPGART_MINOR to include/linux/miscdevice.h. It is better that all minor number definitions are in the same place. Signed-off-by: NCorentin Labbe <clabbe@baylibre.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/1574324085-4338-3-git-send-email-clabbe@baylibre.com
-
- 03 12月, 2019 9 次提交
-
-
由 Christoph Hellwig 提交于
bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: NJavier González <javier@javigon.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Invert the meaning of seq_zones_bitmap by keeping a bitmap of conventional zones. This allows not having a bitmap for devices that do not have conventional zones. Reviewed-by: NJavier González <javier@javigon.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
bvec_iter_advance() is quite popular, but compilers fail to do proper alias analysis and optimise it good enough. The assembly is checked for gcc 9.2, x86-64. - remove @iter->bi_size from min(...), as it's always less than @bytes. Modify at the beginning and forget about it. - the compiler isn't able to collapse memory dependencies and remove writes in the loop. Help it by explicitely using local vars. Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Just like commit f67676d1 for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Just like commit f67676d1 for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jean Delvare 提交于
Add a utility function dmi_memdev_handle() which returns the DMI handle associated with a given memory slot. This will allow kernel drivers to iterate over the memory slots. Signed-off-by: NJean Delvare <jdelvare@suse.de>
-
由 Jean Delvare 提交于
Store the memory type while walking the memory slots, and provide a way to retrieve it later. Signed-off-by: NJean Delvare <jdelvare@suse.de>
-
由 Martin Varghese 提交于
The skb_mpls_pop was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x8847), mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), pop_mpls(eth_type=0x800),4 Fixes: ed246cee ("net: core: move pop MPLS functionality from OvS to core helper") Signed-off-by: NMartin Varghese <martin.varghese@nokia.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 12月, 2019 7 次提交
-
-
由 Souptick Joarder 提交于
__online_page_set_limits() is a dummy function - remove it and all callers. Link: http://lkml.kernel.org/r/8e1bc9d3b492f6bde16e95ebc1dee11d6aefabd7.1567889743.git.jrdr.linux@gmail.com Link: http://lkml.kernel.org/r/854db2cf8145d9635249c95584d9a91fd774a229.1567889743.git.jrdr.linux@gmail.com Link: http://lkml.kernel.org/r/9afe6c5a18158f3884a6b302ac2c772f3da49ccc.1567889743.git.jrdr.linux@gmail.comSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hao Lee 提交于
The member in struct zonelist is _zonerefs instead of zones. Link: http://lkml.kernel.org/r/20190927144049.GA29622@haolee.github.ioSigned-off-by: NHao Lee <haolee.swjtu@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NWei Yang <richardw.yang@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
The first parameter hstate in function hugetlb_fault_mutex_hash() is not used anymore. This patch removes it. [akpm@linux-foundation.org: various build fixes] [cai@lca.pw: fix a GCC compilation warning] Link: http://lkml.kernel.org/r/1570544108-32331-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20191005003302.785-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com> Signed-off-by: NQian Cai <cai@lca.pw> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
huge_pte_offset() produced a sparse warning due to an improper return type when the kernel was built with !CONFIG_HUGETLB_PAGE. Fix the bad type and also convert all the macros in this block to static inline wrappers. Two existing wrappers in this block had lines in excess of 80 columns so clean those up as well. No functional change. Link: http://lkml.kernel.org/r/20191112194558.139389-3-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reported-by: NBen Dooks <ben.dooks@codethink.co.uk> Suggested-by: NJason Gunthorpe <jgg@ziepe.ca> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
A new clang diagnostic (-Wsizeof-array-div) warns about the calculation to determine the number of u32's in an array of unsigned longs. Suppress warning by adding parentheses. While looking at the above issue, noticed that the 'address' parameter to hugetlb_fault_mutex_hash is no longer used. So, remove it from the definition and all callers. No functional change. Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reported-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Ilie Halip <ilie.halip@gmail.com> Cc: David Bolvansky <david.bolvansky@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yunfeng Ye 提交于
sparse_buffer_init() use memblock_alloc_try_nid_raw() to allocate memory for page management structure, if memory allocation fails from specified node, it will fall back to allocate from other nodes. Normally, the page management structure will not exceed 2% of the total memory, but a large continuous block of allocation is needed. In most cases, memory allocation from the specified node will succeed, but a node memory become highly fragmented will fail. we expect to allocate memory base section rather than by allocating a large block of memory from other NUMA nodes Add memblock_alloc_exact_nid_raw() for this situation, which allocate boot memory block on the exact node. If a large contiguous block memory allocate fail in sparse_buffer_init(), it will fall back to allocate small block memory base section. Link: http://lkml.kernel.org/r/66755ea7-ab10-8882-36fd-3e02b03775d5@huawei.comSigned-off-by: NYunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
We split the LRU lists into inactive and an active parts to maximize workingset protection while allowing just enough inactive cache space to faciltate readahead and writeback for one-off file accesses (e.g. a linear scan through a file, or logging); or just enough inactive anon to maintain recent reference information when reclaim needs to swap. With cgroups and their nested LRU lists, we currently don't do this correctly. While recursive cgroup reclaim establishes a relative LRU order among the pages of all involved cgroups, inactive:active size decisions are done on a per-cgroup level. As a result, we'll reclaim a cgroup's workingset when it doesn't have cold pages, even when one of its siblings has plenty of it that should be reclaimed first. For example: workload A has 50M worth of hot cache but doesn't do any one-off file accesses; meanwhile, parallel workload B scans files and rarely accesses the same page twice. If these workloads were to run in an uncgrouped system, A would be protected from the high rate of cache faults from B. But if they were put in parallel cgroups for memory accounting purposes, B's fast cache fault rate would push out the hot cache pages of A. This is unexpected and undesirable - the "scan resistance" of the page cache is broken. This patch moves inactive:active size balancing decisions to the root of reclaim - the same level where the LRU order is established. It does this by looking at the recursive size of the inactive and the active file sets of the cgroup subtree at the beginning of the reclaim cycle, and then making a decision - scan or skip active pages - that applies throughout the entire run and to every cgroup involved. With that in place, in the test above, the VM will recognize that there are plenty of inactive pages in the combined cache set of workloads A and B and prefer the one-off cache in B over the hot pages in A. The scan resistance of the cache is restored. [cai@lca.pw: fix some -Wenum-conversion warnings] Link: http://lkml.kernel.org/r/1573848697-29262-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/20191107205334.158354-4-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NSuren Baghdasaryan <surenb@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-