- 25 May 2019: 3 commits
-
-
By Jiong Wang
After previous patches, the verifier will mark an insn if it really needs zero extension on dst_reg. It is then up to back-ends to decide how to use such information to eliminate unnecessary zero extension code-gen during JIT compilation.

One approach is for the verifier to insert explicit zero extensions for those insns that need them, in a generic way; JIT back-ends then do not generate zero extension for sub-register writes by default. However, only those back-ends which do not have hardware zero extension want this optimization. Back-ends like x86_64 and AArch64 have hardware zero extension support, so the insertion should be disabled for them.

This patch introduces the new target hook "bpf_jit_needs_zext", which returns false by default, meaning verifier zero extension insertion is disabled by default. A back-end can override this hook to return true if it doesn't have hardware support and wants the verifier to insert zero extensions explicitly. Offload targets do not use this native target hook; instead, they can get the optimization results using bpf_prog_offload_ops.finalize.

NOTE: arches can have diversified features; it is possible for one arch to have hardware zero extension support for some sub-register write insns but not for all. For example, PowerPC and SPARC have zero-extended loads, but not for alu32. So when verifier zero extension insertion is enabled, these JIT back-ends need to peephole insns to remove those zero extensions inserted for insns that actually have hardware zero extension support. The peephole can be as simple as looking at the next insn: if it is the special zero extension insn, it is safe to eliminate it when the current insn has hardware zero extension support.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
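As a rough sketch of the hook shape described above: a weak default in the core that individual JIT back-ends override with a strong definition (file placement here is assumed for illustration):

```c
#include <linux/types.h>

/* Core default (e.g. in kernel/bpf/core.c): verifier inserts no zext. */
bool __weak bpf_jit_needs_zext(void)
{
	return false;
}

/* A back-end without hardware zero extension, in its own bpf_jit_comp.c,
 * provides a strong definition that overrides the weak default:
 */
bool bpf_jit_needs_zext(void)
{
	return true;
}
```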
-
By Jiong Wang
The encoding for this new variant is based on the BPF_X format. The "imm" field could previously only be 0; now it can also be 1, which means doing zero extension unconditionally:

  .code    = BPF_ALU | BPF_MOV | BPF_X
  .dst_reg = DST
  .src_reg = SRC
  .imm     = 1

We use this new form for doing zero extension, for which the verifier will guarantee SRC == DST.

Implications on JIT back-ends when doing code-gen for BPF_ALU | BPF_MOV | BPF_X:

1. No change if hardware already does zero extension unconditionally for sub-register writes.

2. Otherwise, when seeing imm == 1, just generate insns to clear the high 32 bits. There is no need to generate insns for the move, because when imm == 1, dst_reg is the same as src_reg at that moment.

The interpreter doesn't need changes either; it already does unconditional zero extension for mov32.

One helper macro, BPF_ZEXT_REG, is added to help create a zero extension insn using this new mov32 variant. One helper function, insn_is_zext, is added for checking whether an insn is a zero extension on dst. This will be widely used by a few JIT back-ends in later patches in this set.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
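A sketch of what the two helpers could look like, following the encoding above (field names per struct bpf_insn; treat the exact definitions as illustrative rather than verbatim kernel code):

```c
#include <linux/filter.h>

/* Build the special mov32: imm == 1 requests unconditional zero extension;
 * the verifier guarantees SRC == DST, so DST is used for both fields.
 */
#define BPF_ZEXT_REG(DST)					\
	((struct bpf_insn) {					\
		.code    = BPF_ALU | BPF_MOV | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = DST,					\
		.off     = 0,					\
		.imm     = 1 })

/* True if @insn is the explicit zero extension insn described above. */
static inline bool insn_is_zext(const struct bpf_insn *insn)
{
	return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1;
}
```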
-
By Jiong Wang
The eBPF ISA specification requires the high 32 bits to be cleared when a low 32-bit sub-register is written. This applies to the destination register of ALU32 etc. JIT back-ends must guarantee this semantic when doing code-gen. The x86_64 and AArch64 ISAs have the same semantics, so the corresponding JIT back-ends don't need to do extra work.

However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches (PowerPC, SPARC etc.) need to do explicit zero extension to meet this requirement, otherwise code like the following will fail:

  u64_value = (u64) u32_value
  ... other uses of u64_value

This is because the compiler can exploit the semantic described above and omit the zero extensions when extending u32_value to u64_value. These JIT back-ends are expected to guarantee the semantic by inserting extra zero extensions, which however can be a significant increase in code size. Some benchmarks show there can be ~40% sub-register writes out of total insns, meaning at least ~40% extra code-gen.

One observation is that these extra zero extensions are not always necessary. Take the code snippet above for example: it is possible that u32_value is never cast into a u64, in which case the high 32 bits of u32_value can be ignored and the extra zero extension eliminated. This patch implements this idea: insns defining sub-registers are marked when the high 32 bits of the defined sub-register matter. For unmarked insns, it is safe to eliminate the high 32-bit clearance.

Algo:
 - Split read flags into READ32 and READ64.
 - Record the index of the insn that does a sub-register write. Keep the index inside the reg state and update it during verifier insn walking.
 - A full register read on a sub-register marks its definition insn as needing zero extension on the dst register. A new sub-register write overrides the old one.
 - When propagating read64 during path pruning, also mark any insn defining a sub-register that is read in the pruned path as a full-register read.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
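A simplified sketch of the bookkeeping in the algo above; the names and types here are hypothetical, not the actual verifier structures:

```c
#define DEF_NOT_SUBREG	0	/* last def wrote the full 64-bit register */

struct reg_liveness_sketch {
	int subreg_def;		/* insn index of the last sub-register write */
};

/* A full 64-bit read means the defining sub-register write must leave the
 * high 32 bits cleared: mark that insn as needing explicit zero extension.
 */
static void mark_read64_sketch(struct reg_liveness_sketch *reg,
			       bool *needs_zext)
{
	if (reg->subreg_def != DEF_NOT_SUBREG)
		needs_zext[reg->subreg_def] = true;
}
```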
-
- 24 May 2019: 2 commits
-
-
By Alexei Starovoitov
All prune points inside a callee bpf function most likely will have different callsites. For example, if function foo() is called from two callsites, half of the explored states in all prune points in foo() will be useless for subsequent walking of one of those callsites. Fortunately the explored_states pruning heuristic keeps the number of states per prune point small, but walking these states is still a waste of cpu time when the callsite of the current state differs from the callsite of the explored state.

To improve the pruning logic, convert explored_states into a hash table and use a simple insn_idx ^ callsite hash to select the hash bucket. This optimization has no effect on programs without bpf2bpf calls and drastically improves programs with calls. In the latter case it reduces total memory consumption in 1M scale tests by almost 3 times (peak_states drops from 5752 to 2016).

Care should be taken when comparing the states for equivalency. Since the same hash bucket can now contain states with different indices, the insn_idx has to be part of verifier_state and compared.

Different hash table sizes and different hash functions were explored, but the results were not significantly better vs this patch. They can be improved in the future.

The hit/miss heuristic does not count an index miscompare as a miss; otherwise verifier stats become unstable when experimenting with different hash functions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
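A sketch of the bucket selection (table size, types and helper names are illustrative, not the real constants):

```c
#include <linux/list.h>

#define STATE_HTAB_SIZE	1024	/* illustrative size, not the real constant */

struct explored_htab_sketch {
	struct hlist_head buckets[STATE_HTAB_SIZE];
};

/* States explored at the same insn from the same callsite share a bucket,
 * so a walk from a different callsite never touches them.
 */
static struct hlist_head *explored_bucket(struct explored_htab_sketch *ht,
					  int insn_idx, int callsite)
{
	return &ht->buckets[(insn_idx ^ callsite) % STATE_HTAB_SIZE];
}
```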
-
By Alexei Starovoitov
Split explored_states into a prune_point boolean mark and a linked list of explored states. This removes the STATE_LIST_MARK hack and allows marks to be separate from states.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 23 May 2019: 1 commit
-
-
By Eric Dumazet
Removing two 4-byte holes allows using the kmalloc-32 kmem cache instead of kmalloc-64 on 64-bit kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
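A hypothetical layout (not the actual struct from this commit) illustrating how grouping the 4-byte members removes the padding holes and drops the object into the next-smaller kmem cache:

```c
#include <linux/types.h>

struct before_sketch {		/* 40 bytes on 64-bit -> kmalloc-64 */
	void *a;		/* 0..7   */
	u32 x;			/* 8..11, 4-byte hole at 12..15 */
	void *b;		/* 16..23 */
	u32 y;			/* 24..27, 4-byte hole at 28..31 */
	void *c;		/* 32..39 */
};

struct after_sketch {		/* 32 bytes -> fits kmalloc-32 */
	void *a;
	void *b;
	void *c;		/* pointers packed at 0..23 */
	u32 x;			/* 24..27 */
	u32 y;			/* 28..31, no holes left */
};
```

The `pahole` tool reports such holes directly, which is how layouts like this are usually found.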
-
- 21 May 2019: 8 commits
-
-
By Thomas Gleixner
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program see the file copying if not write to the free software foundation 675 mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 52 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 13 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.236620792@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details [based] [from] [clk] [highbank] [c] you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 355 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 1 normalized pattern(s):

  licensed under the fsf s gnu public license v2 or later

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 2 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.526489261@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details the full gnu general public license is included in this distribution in the file called copying

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 9 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.244154651@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 1 normalized pattern(s):

  licensed under gplv2 or later

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 118 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154040.961286471@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option [no]_[pad]_[ctrl] any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 176 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154040.652910950@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
By Thomas Gleixner
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside, which was used in the initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 20 May 2019: 1 commit
-
-
By Petr Štetiar
of_get_mac_address prior to commit d01f449c ("of_net: add NVMEM support to of_get_mac_address") could return only a valid pointer or NULL; after this change it can return only a valid pointer or an ERR_PTR encoded error value. I forgot to change the return value of of_get_mac_address for the case where the kernel is compiled without CONFIG_OF, so I'm doing so now.

Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
Reported-by: Octavio Alvarez <octallk1@alvarezp.org>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
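The !CONFIG_OF stub presumably ends up looking like this sketch; the exact errno is an assumption of this illustration, the point being that callers can now test uniformly with IS_ERR():

```c
#include <linux/err.h>

#if !defined(CONFIG_OF)
/* Return convention now matches the CONFIG_OF build: valid pointer or
 * ERR_PTR, never NULL (errno assumed for illustration).
 */
static inline const void *of_get_mac_address(struct device_node *np)
{
	return ERR_PTR(-ENODEV);
}
#endif
```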
-
- 19 May 2019: 2 commits
-
-
By Feng Tang
Currently on panic, the kernel will lower the loglevel and print out only pending printk messages, via console_flush_on_panic(). Add an option for users to configure "panic_print" to replay all dmesg in the buffer, some of which they may never have seen due to the loglevel setting; this will help panic debugging.

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
  Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
  Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Uladzislau Rezki (Sony)
Patch series "improve vmap allocation", v3. Objective --------- Please have a look for the description at: https://lkml.org/lkml/2018/10/19/786 but let me also summarize it a bit here as well. The current implementation has O(N) complexity. Requests with different permissive parameters can lead to long allocation time. When i say "long" i mean milliseconds. Description ----------- This approach organizes the KVA memory layout into free areas of the 1-ULONG_MAX range, i.e. an allocation is done over free areas lookups, instead of finding a hole between two busy blocks. It allows to have lower number of objects which represent the free space, therefore to have less fragmented memory allocator. Because free blocks are always as large as possible. It uses the augment tree where all free areas are sorted in ascending order of va->va_start address in pair with linked list that provides O(1) access to prev/next elements. Since the tree is augment, we also maintain the "subtree_max_size" of VA that reflects a maximum available free block in its left or right sub-tree. Knowing that, we can easily traversal toward the lowest (left most path) free area. Allocation: ~O(log(N)) complexity. It is sequential allocation method therefore tends to maximize locality. The search is done until a first suitable block is large enough to encompass the requested parameters. Bigger areas are split. I copy paste here the description of how the area is split, since i described it in https://lkml.org/lkml/2018/10/19/786 <snip> A free block can be split by three different ways. Their names are FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e. they correspond to how requested size and alignment fit to a free block. FL_FIT_TYPE - in this case a free block is just removed from the free list/tree because it fully fits. Comparing with current design there is an extra work with rb-tree updating. LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit. In this case what we do is just cutting a free block. It is as fast as a current design. Most of the vmalloc allocations just end up with this case, because the edge is always aligned to 1. NE_FIT_TYPE - Is much less common case. Basically it happens when requested size and alignment does not fit left nor right edges, i.e. it is between them. In this case during splitting we have to build a remaining left free area and place it back to the free list/tree. Comparing with current design there are two extra steps. First one is we have to allocate a new vmap_area structure. Second one we have to insert that remaining free block to the address sorted list/tree. In order to optimize a first case there is a cache with free_vmap objects. Instead of allocating from slab we just take an object from the cache and reuse it. Second one is pretty optimized. Since we know a start point in the tree we do not do a search from the top. Instead a traversal begins from a rb-tree node we split. <snip> De-allocation. ~O(log(N)) complexity. An area is not inserted straight away to the tree/list, instead we identify the spot first, checking if it can be merged around neighbors. The list provides O(1) access to prev/next, so it is pretty fast to check it. Summarizing. If merged then large coalesced areas are created, if not the area is just linked making more fragments. There is one more thing that i should mention here. After modification of VA node, its subtree_max_size is updated if it was/is the biggest area in its left or right sub-tree. 
Apart of that it can also be populated back to upper levels to fix the tree. For more details please have a look at the __augment_tree_propagate_from() function and the description. Tests and stressing ------------------- I use the "test_vmalloc.sh" test driver available under "tools/testing/selftests/vm/" since 5.1-rc1 kernel. Just trigger "sudo ./test_vmalloc.sh" to find out how to deal with it. Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA. Regarding last one, i do not have any physical access to NUMA system, therefore i emulated it. The time of stressing is days. If you run the test driver in "stress mode", you also need the patch that is in Andrew's tree but not in Linux 5.1-rc1. So, please apply it: http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c After massive testing, i have not identified any problems like memory leaks, crashes or kernel panics. I find it stable, but more testing would be good. Performance analysis -------------------- I have used two systems to test. One is i5-3320M CPU @ 2.60GHz and another is HiKey960(arm64) board. i5-3320M runs on 4.20 kernel, whereas Hikey960 uses 4.15 kernel. I have both system which could run on 5.1-rc1 as well, but the results have not been ready by time i an writing this. Currently it consist of 8 tests. There are three of them which correspond to different types of splitting(to compare with default). We have 3 ones(see above). Another 5 do allocations in different conditions. a) sudo ./test_vmalloc.sh performance When the test driver is run in "performance" mode, it runs all available tests pinned to first online CPU with sequential execution test order. We do it in order to get stable and repeatable results. Take a look at time difference in "long_busy_list_alloc_test". It is not surprising because the worst case is O(N). # i5-3320M How many cycles all tests took: CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt # Hikey960 8x CPUs How many cycles all tests took: CPU0=3478683207 cycles vs CPU0=463767978 cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt b) time sudo ./test_vmalloc.sh test_repeat_count=1 With this configuration, all tests are run on all available online CPUs. Before running each CPU shuffles its tests execution order. It gives random allocation behaviour. So it is rough comparison, but it puts in the picture for sure. # i5-3320M <default> vs <patched> real 101m22.813s real 0m56.805s user 0m0.011s user 0m0.015s sys 0m5.076s sys 0m0.023s # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt # Hikey960 8x CPUs <default> vs <patched> real unknown real 4m25.214s user unknown user 0m0.011s sys unknown sys 0m0.670s I did not manage to complete this test on "default Hikey960" kernel version. After 24 hours it was still running, therefore i had to cancel it. That is why real/user/sys are "unknown". 
This patch (of 3): Currently an allocation of the new vmap area is done over busy list iteration(complexity O(n)) until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown. Due to over fragmented list and different permissive parameters an allocation can take a long time. For example on embedded devices it is milliseconds. This patch organizes the KVA memory layout into free areas of the 1-ULONG_MAX range. It uses an augment red-black tree that keeps blocks sorted by their offsets in pair with linked list keeping the free space in order of increasing addresses. Nodes are augmented with the size of the maximum available free block in its left or right sub-tree. Thus, that allows to take a decision and traversal toward the block that will fit and will have the lowest start address, i.e. it is sequential allocation. Allocation: to allocate a new block a search is done over the tree until a suitable lowest(left most) block is large enough to encompass: the requested size, alignment and vstart point. If the block is bigger than requested size - it is split. De-allocation: when a busy vmap area is freed it can either be merged or inserted to the tree. Red-black tree allows efficiently find a spot whereas a linked list provides a constant-time access to previous and next blocks to check if merging can be done. In case of merging of de-allocated memory chunk a large coalesced area is created. Complexity: ~O(log(N)) [urezki@gmail.com: v3] Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
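A hedged sketch of the augmented lookup described above; field and helper names are simplified, not the exact mm/vmalloc.c implementation:

```c
#include <linux/rbtree.h>

struct free_area {
	struct rb_node node;
	unsigned long va_start, va_end;
	unsigned long subtree_max_size;	/* largest free block in this subtree */
};

#define fa_size(fa)	((fa)->va_end - (fa)->va_start)

static unsigned long subtree_max(struct rb_node *n)
{
	return n ? rb_entry(n, struct free_area, node)->subtree_max_size : 0;
}

/* Find the free block with the lowest start address that can hold @size. */
static struct free_area *find_lowest_fit(struct rb_node *n, unsigned long size)
{
	while (n) {
		struct free_area *fa = rb_entry(n, struct free_area, node);

		if (subtree_max(n->rb_left) >= size)
			n = n->rb_left;		/* a fit exists at a lower address */
		else if (fa_size(fa) >= size)
			return fa;		/* this node is the lowest fit */
		else
			n = n->rb_right;	/* only higher addresses can fit */
	}
	return NULL;
}
```

The subtree_max_size augmentation is what keeps this traversal O(log(N)): at each node, one comparison decides whether a fit exists anywhere in the cheaper (lower-address) half.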
-
- 18 May 2019: 2 commits
-
-
By Parav Pandit
To avoid any ambiguity between vport index and vport number, rename functions that had vport in the name to vport_num or vport_index, as appropriate.

vport_num is u16, hence change the mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int index into the vport array, hence change the input type of the vport index in mlx5_eswitch_index_to_vport_num() to int.

Correct the multiple eswitch representor interfaces that used the u16 rep->vport as an int vport_index. Send vport FW commands with the correct u16 eswitch vport_num instead of the host's int vport_index.

Fixes: 5ae51620 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
By Heiner Kallweit
i2c_new_dummy is typically called from the probe function of the driver for the primary i2c client. It requires calls to i2c_unregister_device in the error path of the probe function and in the remove function. This can be simplified by introducing a device-managed version.

Note the changed error-case return value type: i2c_new_dummy returns NULL whilst devm_i2c_new_dummy_device returns an ERR_PTR.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Peter Rosin <peda@axentia.se>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
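A usage sketch (the driver and second-address choice are hypothetical); the key point is the IS_ERR() check replacing the NULL check that i2c_new_dummy required:

```c
#include <linux/i2c.h>
#include <linux/err.h>

static int foo_probe(struct i2c_client *client,
		     const struct i2c_device_id *id)
{
	struct i2c_client *bank2;

	/* Second address of a hypothetical two-address chip. */
	bank2 = devm_i2c_new_dummy_device(&client->dev, client->adapter,
					  client->addr + 1);
	if (IS_ERR(bank2))
		return PTR_ERR(bank2);	/* ERR_PTR convention, not NULL */

	/* No i2c_unregister_device() needed in error paths or remove():
	 * devres tears the dummy client down with the primary device.
	 */
	return 0;
}
```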
-
- 17 May 2019: 3 commits
-
-
By Qian Cai
It turned out that DEBUG_SLAB_LEAK is still broken even after recent rescue efforts. When there is a large number of objects, like kmemleak_object, which is normal on a debug kernel:

  # grep kmemleak /proc/slabinfo
  kmemleak_object   2243606 3436210 ...

reading /proc/slab_allocators could easily loop forever while processing the kmemleak_object cache, and any additional freeing or allocating of objects will trigger a reprocessing. To make the situation worse, soft-lockups could easily happen, which will call printk() to allocate more kmemleak objects, guaranteeing an infinite loop.

Also, since it seems no one noticed when it was totally broken more than 2 years ago - see commit fcf88917 ("slab: fix a crash by reading /proc/slab_allocators") - probably nobody cares about it anymore due to the decline of SLAB. Just remove it entirely.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Willem de Bruijn
Zerocopy skbs without completion notification were added for packet sockets with PACKET_TX_RING user buffers. Those signal completion through the TP_STATUS_USER bit in the ring. Zerocopy annotation was added only to avoid premature notification after clone or orphan, by triggering a copy on these paths for these packets.

The mechanism had to define a special "no-uarg" mode because packet sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg for a different pointer.

Before dereferencing skb_uarg(skb), verify that it is a real pointer.

Fixes: 5cd8d46e ("packet: copy user buffers before orphan or clone")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Herbert Xu
The opaque type rhash_lock_head should not be marked with __rcu because it can never be dereferenced. We should apply the RCU marking when we turn it into a pointer which can be dereferenced. This patch does exactly that.

This fixes a number of sparse warnings as well as getting rid of some unnecessary RCU checking.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 May 2019: 3 commits
-
-
By Stephen Boyd
Now that we've gotten rid of clk_readl() we can remove io.h from the clk-provider header and push out the io.h include to any code that isn't already including the io.h header but using things like readl/writel, etc. Found with this grep: git grep -l clk-provider.h | grep '.c$' | xargs git grep -L 'linux/io.h' | \ xargs git grep -l \ -e '\<__iowrite32_copy\>' --or \ -e '\<__ioread32_copy\>' --or \ -e '\<__iowrite64_copy\>' --or \ -e '\<ioremap_page_range\>' --or \ -e '\<ioremap_huge_init\>' --or \ -e '\<arch_ioremap_pud_supported\>' --or \ -e '\<arch_ioremap_pmd_supported\>' --or \ -e '\<devm_ioport_map\>' --or \ -e '\<devm_ioport_unmap\>' --or \ -e '\<IOMEM_ERR_PTR\>' --or \ -e '\<devm_ioremap\>' --or \ -e '\<devm_ioremap_nocache\>' --or \ -e '\<devm_ioremap_wc\>' --or \ -e '\<devm_iounmap\>' --or \ -e '\<devm_ioremap_release\>' --or \ -e '\<devm_memremap\>' --or \ -e '\<devm_memunmap\>' --or \ -e '\<__devm_memremap_pages\>' --or \ -e '\<pci_remap_cfgspace\>' --or \ -e '\<arch_has_dev_port\>' --or \ -e '\<arch_phys_wc_add\>' --or \ -e '\<arch_phys_wc_del\>' --or \ -e '\<memremap\>' --or \ -e '\<memunmap\>' --or \ -e '\<arch_io_reserve_memtype_wc\>' --or \ -e '\<arch_io_free_memtype_wc\>' --or \ -e '\<__io_aw\>' --or \ -e '\<__io_pbw\>' --or \ -e '\<__io_paw\>' --or \ -e '\<__io_pbr\>' --or \ -e '\<__io_par\>' --or \ -e '\<__raw_readb\>' --or \ -e '\<__raw_readw\>' --or \ -e '\<__raw_readl\>' --or \ -e '\<__raw_readq\>' --or \ -e '\<__raw_writeb\>' --or \ -e '\<__raw_writew\>' --or \ -e '\<__raw_writel\>' --or \ -e '\<__raw_writeq\>' --or \ -e '\<readb\>' --or \ -e '\<readw\>' --or \ -e '\<readl\>' --or \ -e '\<readq\>' --or \ -e '\<writeb\>' --or \ -e '\<writew\>' --or \ -e '\<writel\>' --or \ -e '\<writeq\>' --or \ -e '\<readb_relaxed\>' --or \ -e '\<readw_relaxed\>' --or \ -e '\<readl_relaxed\>' --or \ -e '\<readq_relaxed\>' --or \ -e '\<writeb_relaxed\>' --or \ -e '\<writew_relaxed\>' --or \ -e '\<writel_relaxed\>' --or \ -e '\<writeq_relaxed\>' --or \ -e '\<readsb\>' --or \ -e '\<readsw\>' --or \ -e '\<readsl\>' --or \ -e '\<readsq\>' --or \ -e '\<writesb\>' --or \ -e '\<writesw\>' --or \ -e '\<writesl\>' --or \ -e '\<writesq\>' --or \ -e '\<inb\>' --or \ -e '\<inw\>' --or \ -e '\<inl\>' --or \ -e '\<outb\>' --or \ -e '\<outw\>' --or \ -e '\<outl\>' --or \ -e '\<inb_p\>' --or \ -e '\<inw_p\>' --or \ -e '\<inl_p\>' --or \ -e '\<outb_p\>' --or \ -e '\<outw_p\>' --or \ -e '\<outl_p\>' --or \ -e '\<insb\>' --or \ -e '\<insw\>' --or \ -e '\<insl\>' --or \ -e '\<outsb\>' --or \ -e '\<outsw\>' --or \ -e '\<outsl\>' --or \ -e '\<insb_p\>' --or \ -e '\<insw_p\>' --or \ -e '\<insl_p\>' --or \ -e '\<outsb_p\>' --or \ -e '\<outsw_p\>' --or \ -e '\<outsl_p\>' --or \ -e '\<ioread8\>' --or \ -e '\<ioread16\>' --or \ -e '\<ioread32\>' --or \ -e '\<ioread64\>' --or \ -e '\<iowrite8\>' --or \ -e '\<iowrite16\>' --or \ -e '\<iowrite32\>' --or \ -e '\<iowrite64\>' --or \ -e '\<ioread16be\>' --or \ -e '\<ioread32be\>' --or \ -e '\<ioread64be\>' --or \ -e '\<iowrite16be\>' --or \ -e '\<iowrite32be\>' --or \ -e '\<iowrite64be\>' --or \ -e '\<ioread8_rep\>' --or \ -e '\<ioread16_rep\>' --or \ -e '\<ioread32_rep\>' --or \ -e '\<ioread64_rep\>' --or \ -e '\<iowrite8_rep\>' --or \ -e '\<iowrite16_rep\>' --or \ -e '\<iowrite32_rep\>' --or \ -e '\<iowrite64_rep\>' --or \ -e '\<__io_virt\>' --or \ -e '\<pci_iounmap\>' --or \ -e '\<virt_to_phys\>' --or \ -e '\<phys_to_virt\>' --or \ -e '\<ioremap_uc\>' --or \ -e '\<ioremap\>' --or \ -e '\<__ioremap\>' --or \ -e '\<iounmap\>' --or \ -e '\<ioremap\>' --or \ -e 
'\<ioremap_nocache\>' --or \ -e '\<ioremap_uc\>' --or \ -e '\<ioremap_wc\>' --or \ -e '\<ioremap_wc\>' --or \ -e '\<ioremap_wt\>' --or \ -e '\<ioport_map\>' --or \ -e '\<ioport_unmap\>' --or \ -e '\<ioport_map\>' --or \ -e '\<ioport_unmap\>' --or \ -e '\<xlate_dev_kmem_ptr\>' --or \ -e '\<xlate_dev_mem_ptr\>' --or \ -e '\<unxlate_dev_mem_ptr\>' --or \ -e '\<virt_to_bus\>' --or \ -e '\<bus_to_virt\>' --or \ -e '\<memset_io\>' --or \ -e '\<memcpy_fromio\>' --or \ -e '\<memcpy_toio\>'

I also reordered a couple of includes when they weren't alphabetical and removed clk.h from kona, replacing it with clk-provider.h, because that driver doesn't use clk consumer APIs.

Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: John Crispin <john@phrozen.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
-
By David Howells
Add wait_var_event_interruptible() to allow interruptible waits for events.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
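A usage sketch (the surrounding struct and flag are hypothetical); the waiter returns 0 when the condition becomes true and -ERESTARTSYS if interrupted by a signal, and the waker pairs with wake_up_var() on the same address:

```c
#include <linux/wait_bit.h>
#include <linux/bitops.h>

struct foo {
	unsigned long flags;
};
#define FOO_READY 0

/* Sleep interruptibly until FOO_READY is set. */
static int foo_wait_ready(struct foo *f)
{
	return wait_var_event_interruptible(&f->flags,
					    test_bit(FOO_READY, &f->flags));
}

/* Waker side: set the condition, then wake waiters on the variable. */
static void foo_mark_ready(struct foo *f)
{
	set_bit(FOO_READY, &f->flags);
	wake_up_var(&f->flags);
}
```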
-
By David Howells
Allow used DNS resolver keys to be invalidated after use if the caller is doing its own caching of the results. This reduces the amount of resources required.

Fix AFS to invalidate DNS results to kill off permanent failure records that get lodged in the resolver keyring and prevent future lookups from happening.

Fixes: 0a5143f2 ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
-
- 15 May 2019: 15 commits
-
-
By Johannes Weiner
Right now, when somebody needs to know the recursive memory statistics and events of a cgroup subtree, they need to walk the entire subtree and sum up the counters manually. There are two issues with this:

1. When a cgroup gets deleted, its stats are lost. The state counters should all be 0 at that point, of course, but the events are not. When this happens, the event counters, which are supposed to be monotonic, can go backwards in the parent cgroups.

2. During regular operation, we always have a certain number of lazily freed cgroups sitting around that have been deleted, have no tasks, but have a few cache pages remaining. These groups' statistics do not change until we eventually hit memory pressure, but somebody watching, say, memory.stat on an ancestor has to iterate those every time.

This patch addresses both issues by introducing recursive counters at each level that are propagated from the write side when stats change. Upward propagation happens when the per-cpu caches spill over into the local atomic counter. This is the same thing we do during charge and uncharge, except that the latter uses atomic RMWs, which are more expensive; stat changes happen at around the same rate.

In a sparse file test (page faults and reclaim at maximum CPU speed) with 5 cgroup nesting levels, perf shows __mod_memcg_page_state at ~1%.

Link: http://lkml.kernel.org/r/20190412151507.2769-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
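A simplified sketch of the write-side spill described above; the batch constant and field names only approximate the memcg code, so treat this as an illustration rather than the actual implementation:

```c
static void mod_memcg_state_sketch(struct mem_cgroup *memcg,
				   int idx, int val)
{
	long x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);

	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
		struct mem_cgroup *mi;

		/* Spill the batched delta into this level and every
		 * ancestor, so readers see recursive totals without
		 * walking the subtree.
		 */
		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
			atomic_long_add(x, &mi->vmstats[idx]);
		x = 0;
	}
	__this_cpu_write(memcg->stat_cpu->count[idx], x);
}
```

Batching through the existing per-cpu cache is what keeps the propagation cheap: the atomic adds up the hierarchy happen only once per MEMCG_CHARGE_BATCH worth of changes, not per update.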
-
By Johannes Weiner
These are getting too big to be inlined at every callsite. They were stolen from vmstat.c, which already out-of-lines them, and they have only been growing since. The callsites aren't that hot, either.

Move __mod_memcg_state(), __mod_lruvec_state() and __count_memcg_events() out of line and add kerneldoc comments.

Link: http://lkml.kernel.org/r/20190412151507.2769-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Johannes Weiner
Patch series "mm: memcontrol: memory.stat cost & correctness". The cgroup memory.stat file holds recursive statistics for the entire subtree. The current implementation does this tree walk on-demand whenever the file is read. This is giving us problems in production. 1. The cost of aggregating the statistics on-demand is high. A lot of system service cgroups are mostly idle and their stats don't change between reads, yet we always have to check them. There are also always some lazily-dying cgroups sitting around that are pinned by a handful of remaining page cache; the same applies to them. In an application that periodically monitors memory.stat in our fleet, we have seen the aggregation consume up to 5% CPU time. 2. When cgroups die and disappear from the cgroup tree, so do their accumulated vm events. The result is that the event counters at higher-level cgroups can go backwards and confuse some of our automation, let alone people looking at the graphs over time. To address both issues, this patch series changes the stat implementation to spill counts upwards when the counters change. The upward spilling is batched using the existing per-cpu cache. In a sparse file stress test with 5 level cgroup nesting, the additional cost of the flushing was negligible (a little under 1% of CPU at 100% CPU utilization, compared to the 5% of reading memory.stat during regular operation). This patch (of 4): memcg_page_state(), lruvec_page_state(), memcg_sum_events() are currently returning the state of the local memcg or lruvec, not the recursive state. In practice there is a demand for both versions, although the callers that want the recursive counts currently sum them up by hand. Per default, cgroups are considered recursive entities and generally we expect more users of the recursive counters, with the local counts being special cases. To reflect that in the name, add a _local suffix to the current implementations. The following patch will re-incarnate these functions with recursive semantics, but with an O(1) implementation. [hannes@cmpxchg.org: fix bisection hole] Link: http://lkml.kernel.org/r/20190417160347.GC23013@cmpxchg.org Link: http://lkml.kernel.org/r/20190412151507.2769-2-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
By Chris Down
I spent literally an hour trying to work out why an earlier version of my memory.events aggregation code didn't work properly, only to find out I was calling memcg->events instead of memcg->memory_events, which is fairly confusing.

This naming seems in need of reworking, so make it harder to do the wrong thing by using vmevents instead of events, which makes it clearer that these are vm counters rather than memcg-specific counters.

There are also a few other inconsistent names in both the percpu and aggregated structs, so these are all cleaned up to be more coherent and easier to understand.

This commit contains code cleanup only: there are no logic changes.

[akpm@linux-foundation.org: fix it for preceding changes]
Link: http://lkml.kernel.org/r/20190208224319.GA23801@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Andrei Vagin
This file uses "task" 85 times and "tsk" 25 times. It is better to be consistent. Link: http://lkml.kernel.org/r/20181129180547.15976-1-avagin@gmail.comSigned-off-by: NAndrei Vagin <avagin@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
By Manfred Spraul
Rewrite, based on the patch from Waiman Long:

The mixing of a sequence number into the IPC IDs is probably meant to avoid ID reuse in userspace as much as possible. With ipcmni_extend mode, the number of usable sequence numbers is greatly reduced, leading to a higher chance of ID reuse. To address this issue, we need to conserve the sequence number space as much as possible.

Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when the newly allocated ID is not greater than the last one allocated; it is only in such a case that the new ID may collide with an existing one. This is done irrespective of the ipcmni mode.

In order to avoid any races, the index is first allocated and then the pointer is replaced.

Changes compared to the initial patch:
 - Handle failures from idr_alloc().
 - Avoid that concurrent operations can see the wrong sequence number. (This is achieved by using idr_replace().)
 - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to ipcmni_seq_shift().
 - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().

Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
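A sketch of the conserving rule (struct and helper names follow the description above but are illustrative, not the actual ipc/util.c code):

```c
extern unsigned int ipcmni_seq_max(void);	/* per the commit text */

struct ipc_ids_sketch {
	int last_idx;
	unsigned int seq;
};

static void ipc_update_seq_sketch(struct ipc_ids_sketch *ids, int idx)
{
	/* Only a new index at or below the last one handed out can collide
	 * with a still-remembered ID, so only then burn a sequence number.
	 */
	if (idx <= ids->last_idx) {
		ids->seq++;
		if (ids->seq > ipcmni_seq_max())
			ids->seq = 0;
	}
	ids->last_idx = idx;
}
```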
-
By Tom Burkart
This patch implements the PPS ECHO functionality for pps-gpio that sysfs claims is available already. Configuration is done via device tree bindings. No changes are made to userspace interfaces.

This patch was originally written by Lukas Senger as part of a master's thesis project and modified for inclusion into the linux kernel by Tom Burkart.

Link: http://lkml.kernel.org/r/20190324043305.6627-4-tom@aussec.com
Signed-off-by: Tom Burkart <tom@aussec.com>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Lukas Senger <lukas@fridolin.com>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Tom Burkart
This patch changes the GPIO access for the pps-gpio driver from the integer-based API to the descriptor-based API. The integer-based API is considered deprecated, and the descriptor-based API is the preferred way to access GPIOs, as per Documentation/driver-api/gpio/intro.rst. No changes are made to userspace interfaces.

Link: http://lkml.kernel.org/r/20190324043305.6627-2-tom@aussec.com
Signed-off-by: Tom Burkart <tom@aussec.com>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
Cc: Lukas Senger <lukas@fridolin.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Aaro Koskinen
Allow specifying reboot_mode for panic only. This is needed on systems where ramoops is used to store panic logs, and the user wants to use warm reset to preserve those, while still having cold reset on normal reboots.

Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Feng Tang
When a kernel panic happens, it will first print the panic call stack, then the ending msg like:

  [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
  [   35.749975] ------------[ cut here ]------------

The above messages are very useful for debugging. But if the system is configured not to reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy messages, like WARN() call stacks for each and every CPU except the panic one, messages like below:

  WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
  Call Trace:
   <IRQ>
   try_to_wake_up
   default_wake_function
   autoremove_wake_function
   __wake_up_common
   __wake_up_common_lock
   __wake_up
   wake_up_klogd_work_func
   irq_work_run_list
   irq_work_tick
   update_process_times
   tick_sched_timer
   __hrtimer_run_queues
   hrtimer_interrupt
   smp_apic_timer_interrupt
   apic_timer_interrupt

For people working in console mode, the screen will first show the panic call stack, but it is immediately overridden by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen. Also these noisy messages will confuse some users, as I have seen many bug reporters post the noisy messages into bugzilla instead of the real panic call stack and context.

Add a flag "suppress_printk" which gets set in panic() to avoid those noisy messages, without changing current kernel behavior: both panic blinking and the sysrq magic key work as before, as suggested by Petr Mladek.

To verify this, make sure the kernel is not configured to reboot on panic, and in a console run:

  # echo c > /proc/sysrq-trigger

to check that the console only prints out the panic call stack.

Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
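A sketch of the mechanism, simplified from the description above (function bodies here are placeholders, not the real panic()/vprintk_emit() code):

```c
#include <linux/compiler.h>

int suppress_printk;

/* In panic(): once the stack trace and ending msg are out and the
 * system is configured not to reboot, stop further console noise.
 */
static void panic_quiesce_sketch(void)
{
	suppress_printk = 1;
}

/* In vprintk_emit(): new messages are dropped while suppressed. */
static int vprintk_emit_sketch(const char *fmt)
{
	if (unlikely(suppress_printk))
		return 0;
	/* ... normal printk path ... */
	return 1;
}
```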
-
By Yury Norov
cpumask_parse() finds the first occurrence of either '\n' or '\0' using strchr() and strlen(). We can do it better with a single call of strchrnul().

[akpm@linux-foundation.org: remove unneeded cast]
Link: http://lkml.kernel.org/r/20190409204208.12190-1-ynorov@marvell.com
Signed-off-by: Yury Norov <ynorov@marvell.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
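The resulting helper presumably looks something like this one-scan version (sketch, not the verbatim header code):

```c
#include <linux/string.h>
#include <linux/cpumask.h>

static inline int cpumask_parse_sketch(const char *buf, struct cpumask *dstp)
{
	/* strchrnul() returns a pointer to the first '\n', or to the
	 * trailing '\0' if there is none, so one scan replaces the old
	 * strchr() + strlen() pair.
	 */
	unsigned int len = strchrnul(buf, '\n') - buf;

	return bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
}
```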
-
By Alexey Dobriyan
struct linux_binprm::buf is the first field and is exactly 128 bytes in size. That means on x86_64 all accesses to other fields will go through the [r64 + disp32] addressing mode, which is 3 bytes bloatier than the [r64 + disp8] addressing mode. Given that accesses to other fields outnumber accesses to ->buf, move it down.

Space savings on x86_64 defconfig (more on distro configs, because LSMs actively dereference "bprm" but do not care about the first 128 bytes of the executable itself):

  add/remove: 0/0 grow/shrink: 0/24 up/down: 0/-492 (-492)
  Function                                     old     new   delta
  selinux_bprm_committing_creds                552     549      -3
  finalize_exec                                 94      91      -3
  __audit_log_bprm_fcaps                       283     280      -3
  __audit_bprm                                  39      36      -3
  perf_trace_sched_process_exec                347     341      -6
  install_exec_creds                           105      99      -6
  cap_bprm_set_creds.cold                       60      54      -6
  would_dump                                   137     128      -9
  load_script                                  637     628      -9
  bprm_change_interp                            61      52      -9
  trace_event_raw_event_sched_process_exec     260     250     -10
  search_binary_handler                        255     240     -15
  remove_arg_zero                              295     277     -18
  free_bprm                                    119     101     -18
  prepare_binprm                               379     360     -19
  setup_new_exec                               336     315     -21
  flush_old_exec                              1638    1617     -21
  copy_strings.isra                            746     724     -22
  setup_arg_pages                              559     530     -29
  load_misc_binary                            1151    1118     -33
  selinux_bprm_set_creds                       792     753     -39
  load_elf_binary                            11111   11072     -39
  cap_bprm_set_creds                          1496    1454     -42
  __do_execve_file.isra                       2395    2286    -109

Link: http://lkml.kernel.org/r/20190421165025.GA26843@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Rasmus Villemoes
The ror32 implementation (word >> shift) | (word << (32 - shift)) has undefined behaviour if shift is outside the [1, 31] range. Similarly for the 64-bit variants. Most callers pass a compile-time constant (naturally in that range), but there's a UBSAN report that these may actually be called with a shift count of 0.

Instead of special-casing that, we can make them DTRT for all values of shift while also avoiding UB. For some reason, this was already partly done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes these patterns as rotates, so for example

  __u32 rol32(__u32 word, unsigned int shift)
  {
          return (word << (shift & 31)) | (word >> ((-shift) & 31));
  }

compiles to

  0000000000000020 <rol32>:
    20:   89 f8                   mov    %edi,%eax
    22:   89 f1                   mov    %esi,%ecx
    24:   d3 c0                   rol    %cl,%eax
    26:   c3                      retq

Older compilers unfortunately do not do as well, but this only affects the small minority of users that don't pass constants.

Due to integer promotions, ro[lr]8 were already well-defined for shifts in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16] (only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set, word << 16 is undefined). For consistency, update those as well.

Link: http://lkml.kernel.org/r/20190410211906.2190-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Vadim Pasternak <vadimp@mellanox.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
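Under the same scheme, the ror32 counterpart would be the mirror image; masking keeps both shift counts in [0, 31], so shift == 0 degenerates to a plain copy with no UB:

```c
#include <linux/types.h>

static inline __u32 ror32(__u32 word, unsigned int shift)
{
	/* For shift == 0: (word >> 0) | (word << 0) == word, well-defined. */
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}
```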
-
By Andy Shevchenko
The integer exponentiation is used in a few places and might be used in the future by other call sites. Move it to wider use.

Link: http://lkml.kernel.org/r/20190323172531.80025-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
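For reference, integer exponentiation by squaring takes O(log(exp)) multiplies; a sketch consistent with the described helper:

```c
#include <linux/types.h>

u64 int_pow(u64 base, unsigned int exp)
{
	u64 result = 1;

	while (exp) {
		if (exp & 1)		/* low bit set: fold in current base */
			result *= base;
		exp >>= 1;
		base *= base;		/* square for the next bit */
	}
	return result;
}
```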
-
By George Spelvin
Rather than a fixed-size array of pending sorted runs, use the ->prev links to keep track of things. This reduces stack usage, eliminates some ugly overflow handling, and reduces the code size.

Also:
 * merge() no longer needs to handle NULL inputs, so simplify.
 * The same applies to merge_and_restore_back_links(), which is renamed to the less ponderous merge_final(). (It's a static helper function, so we don't need a super-descriptive name; comments will do.)
 * Document the actual return value requirements on the (*cmp)() function; some callers are already using this feature.

x86-64 code size 1086 -> 739 bytes (-347)

(Yes, I see checkpatch complaining about no space after the comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.)

Feedback from Rasmus Villemoes, Andy Shevchenko and Geert Uytterhoeven.

[akpm@linux-foundation.org: remove __pure usage due to mysterious warning]
Link: http://lkml.kernel.org/r/f63c410e0ff76009c9b58e01027e751ff7fdb749.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
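With NULL inputs excluded, merge() reduces to a straightforward stable two-way merge; a sketch (the exact kernel signature may differ):

```c
#include <linux/list.h>

typedef int (*cmp_func)(void *priv, struct list_head *a, struct list_head *b);

/* Merge two non-empty, NULL-terminated runs linked through ->next only;
 * merge_final() later restores the ->prev links and the list head.
 */
static struct list_head *merge_sketch(void *priv, cmp_func cmp,
				      struct list_head *a, struct list_head *b)
{
	struct list_head *head, **tail = &head;

	for (;;) {
		/* cmp() <= 0 keeps the sort stable: ties pick from @a. */
		if (cmp(priv, a, b) <= 0) {
			*tail = a;
			tail = &a->next;
			a = a->next;
			if (!a) {
				*tail = b;	/* append the rest of @b */
				break;
			}
		} else {
			*tail = b;
			tail = &b->next;
			b = b->next;
			if (!b) {
				*tail = a;	/* append the rest of @a */
				break;
			}
		}
	}
	return head;
}
```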
-