- 23 9月, 2017 1 次提交
-
-
由 Helge Deller 提交于
The parisc architecture has larger stack frames than most other architectures on 32-bit kernels. Increase the maximum allowed stack frame to 1280 bytes for parisc to avoid warnings in the do_sys_poll() and pat_memconfig() functions. Signed-off-by: NHelge Deller <deller@gmx.de>
-
- 21 9月, 2017 1 次提交
-
-
由 Petar Penkov 提交于
Issue is that if the data crosses a page boundary inside a compound page, this check will incorrectly trigger a WARN_ON. To fix this, compute the order using the head of the compound page and adjust the offset to be relative to that head. Fixes: 72e809ed ("iov_iter: sanity checks for copy to/from page primitives") Signed-off-by: NPetar Penkov <ppenkov@google.com> CC: Al Viro <viro@zeniv.linux.org.uk> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 9月, 2017 1 次提交
-
-
由 Andreas Gruenbacher 提交于
Clarify that rhashtable_walk_{stop,start} will not reset the iterator to the beginning of the hash table. Confusion between rhashtable_walk_enter and rhashtable_walk_start has already lead to a bug. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 9月, 2017 3 次提交
-
-
由 Michal Hocko 提交于
GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NMel Gorman <mgorman@suse.de> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Geert Uytterhoeven 提交于
With gcc 4.1.2: lib/test_bitmap.c:189: warning: integer constant is too large for `long' type lib/test_bitmap.c:190: warning: integer constant is too large for `long' type lib/test_bitmap.c:194: warning: integer constant is too large for `long' type lib/test_bitmap.c:195: warning: integer constant is too large for `long' type Add the missing "ULL" suffix to fix this. Link: http://lkml.kernel.org/r/1505040523-31230-1-git-send-email-geert@linux-m68k.org Fixes: 60ef6900 ("bitmap: introduce BITMAP_FROM_U64()") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NYury Norov <ynorov@caviumnetworks.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Biggers 提交于
IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id < 0)' in idr_replace(), but it was intentionally removed by commit 2e1c9b28 ("idr: remove WARN_ON_ONCE() on negative IDs"). Then it was added back by commit 0a835c4f ("Reimplement IDR and IDA using the radix tree"). However it seems that adding it back was a mistake, given that some users such as drm_gem_handle_delete() (DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(), allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete() actually just wants idr_replace() to return an error code if the ID is not allocated, including in the case where the ID is invalid (negative). So once again remove the bogus WARN_ON_ONCE(). This bug was found by syzkaller, which encountered the following warning: WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157 Kernel panic - not syncing: panic_on_warn set ... CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930 RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157 RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297 RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78 RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000 R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8 drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297 drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671 drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729 drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe Here is a C reproducer: #include <fcntl.h> #include <stddef.h> #include <stdint.h> #include <sys/ioctl.h> #include <drm/drm.h> int main(void) { int cardfd = open("/dev/dri/card0", O_RDONLY); ioctl(cardfd, DRM_IOCTL_GEM_CLOSE, &(struct drm_gem_close) { .handle = -1 } ); } Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree") Signed-off-by: NEric Biggers <ebiggers@google.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> [v4.11+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2017 21 次提交
-
-
由 Alexey Dobriyan 提交于
Every for_each_XXX_cpu() invocation calls cpumask_next() which is an inline function: static inline unsigned int cpumask_next(int n, const struct cpumask *srcp) { /* -1 is a legal arg here. */ if (n != -1) cpumask_check(n); return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1); } However! find_next_bit() is regular out-of-line function which means "nr_cpu_ids" load and increment happen at the caller resulting in a lot of bloat x86_64 defconfig: add/remove: 3/0 grow/shrink: 8/373 up/down: 155/-5668 (-5513) x86_64 allyesconfig-ish: add/remove: 3/1 grow/shrink: 57/634 up/down: 3515/-28177 (-24662) !!! Some archs redefine find_next_bit() but it is OK: m68k inline but SMP is not supported arm out-of-line unicore32 out-of-line Function call will happen anyway, so move load and increment into callee. Link: http://lkml.kernel.org/r/20170824230010.GA1593@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Carpenter 提交于
Most checks will check for min and then max, except the int check. Flip the checks to be consistent with the other code. [mcgrof@kernel.org: massaged commit log] Link: http://lkml.kernel.org/r/20170802211707.28020-3-mcgrof@kernel.orgSigned-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Binderman <dcb314@hotmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Carpenter 提交于
The UINT_MAX comparison is not needed because "max" is already an unsigned int, and we expect developer C code max value input to have a sensible 0 - UINT_MAX range. Note that if it so happens to be UINT_MAX + 1 it would lead to an issue, but we expect the developer to know this. [mcgrof@kernel.org: massaged commit log] Link: http://lkml.kernel.org/r/20170802211707.28020-2-mcgrof@kernel.orgSigned-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Binderman <dcb314@hotmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Takashi Iwai 提交于
The sprint_oid() utility function doesn't properly check the buffer size that it causes that the warning in vsnprintf() be triggered. For example on v4.1 kernel: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0() ... We can trigger this issue by injecting maliciously crafted x509 cert in DER format. Just using hex editor to change the length of OID to over the length of the SEQUENCE container. For example: 0:d=0 hl=4 l= 980 cons: SEQUENCE 4:d=1 hl=4 l= 700 cons: SEQUENCE 8:d=2 hl=2 l= 3 cons: cont [ 0 ] 10:d=3 hl=2 l= 1 prim: INTEGER :02 13:d=2 hl=2 l= 9 prim: INTEGER :9B47FAF791E7D1E3 24:d=2 hl=2 l= 13 cons: SEQUENCE 26:d=3 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption 37:d=3 hl=2 l= 0 prim: NULL 39:d=2 hl=2 l= 121 cons: SEQUENCE 41:d=3 hl=2 l= 22 cons: SET 43:d=4 hl=2 l= 20 cons: SEQUENCE <=== the SEQ length is 20 45:d=5 hl=2 l= 3 prim: OBJECT :organizationName <=== the original length is 3, change the length of OID to over the length of SEQUENCE Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided patch to fix it by checking the bufsize in sprint_oid(). Link: http://lkml.kernel.org/r/20170903021646.2080-1-jlee@suse.comSigned-off-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: N"Lee, Chun-Yi" <jlee@suse.com> Reported-by: NPawel Wieczorkiewicz <pwieczorkiewicz@suse.com> Cc: David Howells <dhowells@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Dumazet 提交于
__radix_tree_preload() only disables preemption if no error is returned. So we really need to make sure callers always check the return value. idr_preload() contract is to always disable preemption, so we need to add a missing preempt_disable() if an error happened. Similarly, ida_pre_get() only needs to call preempt_enable() in the case no error happened. Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree") Fixes: 7ad3d4d8 ("ida: Move ida_bitmap to a percpu variable") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
One line of code was commented out by c++ style comment for debugging, but forgot removing it. Clean it up. Link: http://lkml.kernel.org/r/1503312113-11843-1-git-send-email-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Carpenter 提交于
This is mostly to keep the number of static checker warnings down so we can spot new bugs instead of them being drowned in noise. This function doesn't return normal kernel error codes but instead the return value is used to display exactly which memory failed. I chose -1 as hopefully that's a helpful thing to print. Link: http://lkml.kernel.org/r/20170817115420.uikisjvfmtrqkzjn@mwandaSigned-off-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yury Norov 提交于
The macro is the compile-time analogue of bitmap_from_u64() with the same purpose: convert the 64-bit number to the properly ordered pair of 32-bit parts, suitable for filling the bitmap in 32-bit BE environment. Use it to make test_bitmap_parselist() correct for 32-bit BE ABIs. Tested on BE mips/qemu. [akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/20170810172916.24144-1-ynorov@caviumnetworks.comSigned-off-by: NYury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yury Norov 提交于
Do some basic checks for bitmap_parselist(). [akpm@linux-foundation.org: fix printk warning] Link: http://lkml.kernel.org/r/20170807225438.16161-2-ynorov@caviumnetworks.comSigned-off-by: NYury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yury Norov 提交于
Current implementation of bitmap_parselist() uses a static variable to save local state while setting bits in the bitmap. It is obviously wrong if we assume execution in multiprocessor environment. Fortunately, it's possible to rewrite this portion of code to avoid using the static variable. It is also possible to set bits in the mask per-range with bitmap_set(), not per-bit, as it is implemented now, with set_bit(); which is way faster. The important side effect of this change is that setting bits in this function from now is not per-bit atomic and less memory-ordered. This is because set_bit() guarantees the order of memory accesses, while bitmap_set() does not. I think that it is the advantage of the new approach, because the bitmap_parselist() is intended to initialise bit arrays, and user should protect the whole bitmap during initialisation if needed. So protecting individual bits looks expensive and useless. Also, other range-oriented functions in lib/bitmap.c don't worry much about atomicity. With all that, setting 2k bits in map with the pattern like 0-2047:128/256 becomes ~50 times faster after applying the patch in my testing environment (arm64 hosted on qemu). The second patch of the series adds the test for bitmap_parselist(). It's not intended to cover all tricky cases, just to make sure that I didn't screw up during rework. Link: http://lkml.kernel.org/r/20170807225438.16161-1-ynorov@caviumnetworks.comSigned-off-by: NYury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Florian Fainelli 提交于
Add a test module that allows testing that CONFIG_DEBUG_VIRTUAL works correctly, at least that it can catch invalid calls to virt_to_phys() against the non-linear kernel virtual address map. Link: http://lkml.kernel.org/r/20170808164035.26725-1-f.fainelli@gmail.comSigned-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
In some cases caller would like to use error code directly without shadowing. -EINVAL feels a rightful code to return in case of error in hex2bin(). Link: http://lkml.kernel.org/r/20170731135510.68023-1-andriy.shevchenko@linux.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Allow interval trees to quickly check for overlaps to avoid unnecesary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDoug Ledford <dledford@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
We can work with a single rb_root_cached root to test both cached and non-cached rbtrees. In addition, also add a test to measure latencies between rb_first and its fast counterpart. Link: http://lkml.kernel.org/r/20170719014603.19029-7-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
This adds a second test for regular rb-tree testing in that there is no need to repeat it for the augmented flavor. Link: http://lkml.kernel.org/r/20170719014603.19029-6-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Allows for more flexible debugging. Link: http://lkml.kernel.org/r/20170719014603.19029-5-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
While overall the code is very nicely commented, it might not be immediately obvious from the diagrams what is going on. Add a very brief summary of each case. Opposite cases where the node is the left child are left untouched. Link: http://lkml.kernel.org/r/20170719014603.19029-4-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
The only times the nil-parent (root node) condition is true is when the node is the first in the tree, or after fixing rbtree rule #4 and the case 1 rebalancing made the node the root. Such conditions do not apply most of the time: (i) The common case in an rbtree is to have more than a single node, so this is only true for the first rb_insert(). (ii) While there is a chance only one first rotation is needed, cases where the node's uncle is black (cases 2,3) are more common as we can have the following scenarios during the rotation looping: case1 only, case1+1, case2+3, case1+2+3, case3 only, etc. This patch, therefore, adds an unlikely() optimization to this conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a kernel build shows that the incorrect rate is less than 15%, and for workloads that involve insert mostly trees overtime tend to have less than 2% incorrect rate. Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Patch series "rbtree: Cache leftmost node internally", v4. A series to extending rbtrees to internally cache the leftmost node such that we can have fast overlap check optimization for all interval tree users[1]. The benefits of this series are that: (i) Unify users that do internal leftmost node caching. (ii) Optimize all interval tree users. (iii) Convert at least two new users (epoll and procfs) to the new interface. This patch (of 16): Red-black tree semantics imply that nodes with smaller or greater (or equal for duplicates) keys always be to the left and right, respectively. For the kernel this is extremely evident when considering our rb_first() semantics. Enabling lookups for the smallest node in the tree in O(1) can save a good chunk of cycles in not having to walk down the tree each time. To this end there are a few core users that explicitly do this, such as the scheduler and rtmutexes. There is also the desire for interval trees to have this optimization allowing faster overlap checking. This patch introduces a new 'struct rb_root_cached' which is just the root with a cached pointer to the leftmost node. The reason why the regular rb_root was not extended instead of adding a new structure was that this allows the user to have the choice between memory footprint and actual tree performance. The new wrappers on top of the regular rb_root calls are: - rb_first_cached(cached_root) -- which is a fast replacement for rb_first. - rb_insert_color_cached(node, cached_root, new) - rb_erase_cached(node, cached_root) In addition, augmented cached interfaces are also added for basic insertion and deletion operations; which becomes important for the interval tree changes. With the exception of the inserts, which adds a bool for updating the new leftmost, the interfaces are kept the same. To this end, porting rb users to the cached version becomes really trivial, and keeping current rbtree semantics for users that don't care about the optimization requires zero overhead. Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
[akpm@linux-foundation.org: minor tweaks] Link: http://lkml.kernel.org/r/20170720184539.31609-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
Patch series "Multibyte memset variations", v4. A relatively common idiom we're missing is a function to fill an area of memory with a pattern which is larger than a single byte. I first noticed this with a zram patch which wanted to fill a page with an 'unsigned long' value. There turn out to be quite a few places in the kernel which can benefit from using an optimised function rather than a loop; sometimes text size, sometimes speed, and sometimes both. The optimised PowerPC version (not included here) improves performance by about 30% on POWER8 on just the raw memset_l(). Most of the extra lines of code come from the three testcases I added. This patch (of 8): memset16(), memset32() and memset64() are like memset(), but allow the caller to fill the destination with a value larger than a single byte. memset_l() and memset_p() allow the caller to use unsigned long and pointer values respectively. Link: http://lkml.kernel.org/r/20170720184539.31609-2-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 9月, 2017 1 次提交
-
-
由 Robin Murphy 提交于
mmio_flush_range() suffers from a lack of clearly-defined semantics, and is somewhat ambiguous to port to other architectures where the scope of the writeback implied by "flush" and ordering might matter, but MMIO would tend to imply non-cacheable anyway. Per the rationale in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the only existing use is actually to invalidate clean cache lines for ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent cleanup of the pmem API, that also now happens to be the exact purpose of arch_invalidate_pmem(), which would be a far more well-defined tool for the job. Rather than risk potentially inconsistent implementations of mmio_flush_range() for the sake of one callsite, streamline things by removing it entirely and instead move the ARCH_MEMREMAP_PMEM related definitions up to the libnvdimm level, so they can be shared by NFIT as well. This allows NFIT to be enabled for arm64. Signed-off-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 31 8月, 2017 2 次提交
-
-
由 Alexander Kuleshov 提交于
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NJonathan Corbet <corbet@lwn.net>
-
由 Chris Mi 提交于
The following new APIs are added: int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index, unsigned long start, unsigned long end, gfp_t gfp); void *idr_remove_ext(struct idr *idr, unsigned long id); void *idr_find_ext(const struct idr *idr, unsigned long id); void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id); void *idr_get_next_ext(struct idr *idr, unsigned long *nextid); Signed-off-by: NChris Mi <chrism@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 8月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
Commit: e9149858 ("locking/lockdep/selftests: Add mixed read-write ABBA tests") adds an explicit FAILURE to the locking selftest but overlooked the fact that this kills lockdep. Fudge the test to avoid this. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/20170828124245.xlo2yshxq2btgmuf@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 8月, 2017 1 次提交
-
-
由 Denys Vlasenko 提交于
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: mingo@redhat.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Megha Dey <megha.dey@linux.intel.com> Cc: Gayatri Kammela <gayatri.kammela@intel.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NShaohua Li <shli@fb.com>
-
- 25 8月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
Currently lockdep has limited support for recursive readers, add a few mixed read-write ABBA selftests to show the extend of these limitations. [ 0.000000] ---------------------------------------------------------------------------- [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem | [ 0.000000] -------------------------------------------------------------------------- [ 0.000000] mixed read-lock/lock-write ABBA: |FAILED| | ok | [ 0.000000] mixed read-lock/lock-read ABBA: | ok | | ok | [ 0.000000] mixed write-lock/lock-write ABBA: | ok | | ok | This clearly illustrates the case where lockdep fails to find a deadlock. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: byungchul.park@lge.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: oleg@redhat.com Cc: tj@kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 8月, 2017 1 次提交
-
-
由 Stephan Mueller 提交于
Using sg_miter_start and sg_miter_next, the buffer of an SG is kmap'ed to *buff. The current code calls sg_miter_stop (and thus kunmap) on the SG entry before the last access of *buff. The patch moves the sg_miter_stop call after the last access to *buff to ensure that the memory pointed to by *buff is still mapped. Fixes: 4816c940 ("lib/mpi: Fix SG miter leak") Cc: <stable@vger.kernel.org> Signed-off-by: NStephan Mueller <smueller@chronox.de> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 18 8月, 2017 2 次提交
-
-
由 Chris Wilson 提交于
This was the competing idea long ago, but it was only with the rewrite of the idr as an radixtree and using the radixtree directly ourselves, along with the realisation that we can store the vma directly in the radixtree and only need a list for the reverse mapping, that made the patch performant enough to displace using a hashtable. Though the vma ht is fast and doesn't require any extra allocation (as we can embed the node inside the vma), it does require a thread for resizing and serialization and will have the occasional slow lookup. That is hairy enough to investigate alternatives and favour them if equivalent in peak performance. One advantage of allocating an indirection entry is that we can support a single shared bo between many clients, something that was done on a first-come first-serve basis for shared GGTT vma previously. To offset the extra allocations, we create yet another kmem_cache for them. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
-
由 Thomas Gleixner 提交于
The hardlockup detector on x86 uses a performance counter based on unhalted CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the performance counter period, so the hrtimer should fire 2-3 times before the performance counter NMI fires. The NMI code checks whether the hrtimer fired since the last invocation. If not, it assumess a hard lockup. The calculation of those periods is based on the nominal CPU frequency. Turbo modes increase the CPU clock frequency and therefore shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x nominal frequency) the perf/NMI period is shorter than the hrtimer period which leads to false positives. A simple fix would be to shorten the hrtimer period, but that comes with the side effect of more frequent hrtimer and softlockup thread wakeups, which is not desired. Implement a low pass filter, which checks the perf/NMI period against kernel time. If the perf/NMI fires before 4/5 of the watchdog period has elapsed then the event is ignored and postponed to the next perf/NMI. That solves the problem and avoids the overhead of shorter hrtimer periods and more frequent softlockup thread wakeups. Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector") Reported-and-tested-by: NKan Liang <Kan.liang@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: dzickus@redhat.com Cc: prarit@redhat.com Cc: ak@linux.intel.com Cc: babu.moger@oracle.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: acme@redhat.com Cc: stable@vger.kernel.org Cc: atomlin@redhat.com Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
-
- 17 8月, 2017 4 次提交
-
-
由 Ingo Molnar 提交于
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive The syntax to turn Kconfig options into non-interactive ones is to not offer interactive prompt help texts. Remove them. Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
'complete' is an adjective and LOCKDEP_COMPLETE sounds like 'lockdep is complete', so pick a better name that uses a noun. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/1502960261-16206-3-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Lockdep doesn't have to be made to work with crossrelease and just works with them. Reword the title so that what the option does is clear. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/1502960261-16206-2-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Crossrelease support added the CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETE options. It makes little sense to enable them when PROVE_LOCKING is disabled. Make them non-interative options and part of PROVE_LOCKING to simplify the UI. Signed-off-by: NByungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/1502960261-16206-1-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-