- 28 10月, 2016 40 次提交
-
-
由 Jan Höppner 提交于
The function dasd_ro_store() calls set_disk_ro() to set the device in question read-only. Since set_disk_ro() might sleep, we can't call it while holding a lock. However, we also can't simply check if the device, block, and gdp references are valid before we call set_disk_ro() because an offline processing might have been started in the meanwhile which will destroy those references. In order to reliably call set_disk_ro() we have to ensure several things: - Still check validity of the mentioned references but additionally check if offline processing is running and bail out accordingly. Also, do this while holding the device lock. - To ensure that the block device is still safe after the lock, increase the open_count while still holding the device lock. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
The reference to a device in question may get lost when the extended error reporting (EER) attribute is being enabled/disabled while the device is set offline at the same time. This is due to missing refcounting and incorrect locking. Fix this by the following: - In dasd_eer_store() get the device directly and handle the refcount accordingly. - Move the lock in dasd_eer_enable() up so we can ensure safe processing. - Check if the device is being set offline and return with -EBUSY if so. - While at it, change the return code from -EPERM to -EMEDIUMTYPE as suggested by a FIXME, since that is what we're actually checking. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
Before we set a device offline, the open_count for the block device is checked and certain flags are checked and set as well. However, this is all done without holding any lock. Potentially, if the open_count was checked but the DASD_FLAG_OFFLINE wasn't set yet, a different process might want to increase the open_count depending on whether DASD_FLAG_OFFLINE is set or not in the meanwhile. This is quite racy and can lead to the loss of the device for that process and subsequently lead to a panic. Fix this by checking the open_count and setting the offline flags while holding the ccwdev lock. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
When setting certain attributes, we actually set the according feature flag. Do this by using dasd_set_feature() at a few occurrences and remove duplicate code. In dasd_set_feature() dasd_find_busid() is used to retrieve the devmap for the device in question. Combined with the change above, this would require the device to be set online at least once so that a devmap is being created. Change that by using dasd_devmap_from_cdev() instead, which uses dasd_find_busid() first and will create a devmap accordingly if there is none yet. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
simple_strtoul() has been marked obsolete for quite some time now. Replace a few last occurrences with kstrtouint(). Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Dong Jia Shi 提交于
Clean up DEV_STATE_SENSE_PGID related code, since it's not used anymore. Everything related to path verification is handled within DEV_STATE_VERIFY. Signed-off-by: NDong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Acked-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
On STP sync events the TOD clock will jump in time, either forward or backward. The TOD clocksource claims to be continuous but in case of an STP sync with a negative offset it is not. Subtract the offset injected by the STP sync check from the result of the TOD clocksource to make it continuous again. Add code to drift the offset towards zero with a fixed rate, steering 1 second in ~9 hours. Suggested-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The last_update_clock time stamp in the lowcore should be adjusted by the TOD clock delta that is created by the clock synchronization. Otherwise the calculation of the steal time will be incorrect. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Merge clock_sync_cpu into stp_sync_clock and split out the update of the global and per-CPU clock fields into clock_sync_global and clock_sync_local. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
block->request_queue is used many times in dasd_setup_queue. Define a separate variable to increase readability a bit and to make it better reusable. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jan Höppner 提交于
Currently the block queue value max_segments is set to -1L, which is then implicitly casted to unsigned short in blk_queue_max_segments. This results in 65535 (64k) max_segments. Even though the resulting value is correct, setting it implicitly using -1L is rather confusing. Set the value explicitly using the USHRT_MAX macro instead. Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Michael Holzheu 提交于
Since commit d86bd1be ("mm/slub: support left redzone") it is no longer guaranteed that kmalloc(PAGE_SIZE) returns page aligned memory. After the above commit we get an error for diag224 because aligned memory is required. This leads to the following user visible error: # mount none -t s390_hypfs /sys/hypervisor/ mount: unknown filesystem type 's390_hypfs' # dmesg | grep hypfs hypfs.cccfb8: The hardware system does not provide all functions required by hypfs hypfs.7a79f0: Initialization of hypfs failed with rc=-61 Fix this problem and use get_free_page() instead of kmalloc() to get correctly aligned memory. Cc: stable@vger.kernel.org # v3.6+ Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Linus Torvalds 提交于
Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk cris/arch-v32: cryptocop: print a hex number after a 0x prefix ipack: print a hex number after a 0x prefix block: DAC960: print a hex number after a 0x prefix fs: exofs: print a hex number after a 0x prefix lib/genalloc.c: start search from start of chunk mm: memcontrol: do not recurse in direct reclaim CREDITS: update credit information for Martin Kepplinger proc: fix NULL dereference when reading /proc/<pid>/auxv mm: kmemleak: ensure that the task stack is not freed during scanning lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB latent_entropy: raise CONFIG_FRAME_WARN by default kconfig.h: remove config_enabled() macro ipc: account for kmem usage on mqueue and msg mm/slab: improve performance of gathering slabinfo stats mm: page_alloc: use KERN_CONT where appropriate mm/list_lru.c: avoid error-path NULL pointer deref h8300: fix syscall restarting kcov: properly check if we are in an interrupt mm/slab: fix kmemcg cache creation delayed issue
-
由 Dimitri Sivanich 提交于
Would like to have this be a decimal number. Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.comSigned-off-by: NDimitri Sivanich <sivanich@sgi.com> Reported-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@primarydata.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Mentz 提交于
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.comSigned-off-by: NDaniel Mentz <danielmentz@google.com> Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
On 4.0, we saw a stack corruption from a page fault entering direct memory cgroup reclaim, calling into btrfs_releasepage(), which then tried to allocate an extent and recursed back into a kmem charge ad nauseam: [...] btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 memcg_charge_kmem+0x40/0x80 new_slab+0x2d9/0x5a0 __slab_alloc+0x2fd/0x44f kmem_cache_alloc+0x193/0x1e0 alloc_extent_state+0x21/0xc0 __clear_extent_bit+0x2b5/0x400 try_release_extent_mapping+0x1a3/0x220 __btrfs_releasepage+0x31/0x70 btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 mem_cgroup_try_charge+0x65/0x1c0 handle_mm_fault+0x117f/0x1510 __do_page_fault+0x177/0x420 do_page_fault+0xc/0x10 page_fault+0x22/0x30 On later kernels, kmem charging is opt-in rather than opt-out, and that particular kmem allocation in btrfs_releasepage() is no longer being charged and won't recurse and overrun the stack anymore. But it's not impossible for an accounted allocation to happen from the memcg direct reclaim context, and we needed to reproduce this crash many times before we even got a useful stack trace out of it. Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to avoid recursing into any other form of direct reclaim. Then let recursive charges from PF_MEMALLOC contexts bypass the cgroup limit. Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Martin Kepplinger 提交于
Content and employer changed. Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.comSigned-off-by: NMartin Kepplinger <martink@posteo.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Leon Yu 提交于
Reading auxv of any kernel thread results in NULL pointer dereferencing in auxv_read() where mm can be NULL. Fix that by checking for NULL mm and bailing out early. This is also the original behavior changed by recent commit c5317167 ("proc: switch auxv to use of __mem_open()"). # cat /proc/2/auxv Unable to handle kernel NULL pointer dereference at virtual address 000000a8 Internal error: Oops: 17 [#1] PREEMPT SMP ARM CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1 Hardware name: BCM2709 task: ea3b0b00 task.stack: e99b2000 PC is at auxv_read+0x24/0x4c LR is at do_readv_writev+0x2fc/0x37c Process cat (pid: 113, stack limit = 0xe99b2210) Call chain: auxv_read do_readv_writev vfs_readv default_file_splice_read splice_direct_to_actor do_splice_direct do_sendfile SyS_sendfile64 ret_fast_syscall Fixes: c5317167 ("proc: switch auxv to use of __mem_open()") Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.comSigned-off-by: NLeon Yu <chianglungyu@gmail.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Janis Danisevskis <jdanis@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Catalin Marinas 提交于
Commit 68f24b08 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed during kmemleak_scan() execution, leading to either a NULL pointer fault (if task->stack is NULL) or kmemleak accessing already freed memory. This patch uses the new try_get_task_stack() API to ensure that the task stack is not freed during kmemleak stack scanning. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901. Fixes: 68f24b08 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com> Reported-by: NCAI Qian <caiqian@redhat.com> Tested-by: NCAI Qian <caiqian@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: CAI Qian <caiqian@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Vyukov 提交于
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls. Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on second level). Size of each stack is (num_frames + 3) * sizeof(long). Which gives us ~84K stacks. This capacity was chosen empirically and it is enough to run kernel normally. However, when lots of configs are enabled and a fuzzer tries to maximize code coverage, it easily hits the limit within tens of minutes. I've tested for long a time with number of top level entries bumped 4x (4096). And I think I've seen overflow only once. But I don't have all configs enabled and code coverage has not reached maximum yet. So bump it 8x to 8192. Since we have two-level table, memory cost of this is very moderate -- currently the top-level table is 8KB, with this patch it is 64KB, which is negligible under KASAN. Here is some approx math. 128MB allows us to memorize ~670K stacks (assuming stack is ~200b). I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free| kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of alloc/free call sites are reachable with only one stack. But some utility functions can have large fanout. Assuming average fanout is 5x, total number of alloc/free stacks is ~300K. Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.comSigned-off-by: NDmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
When building with the latent_entropy plugin, set the default CONFIG_FRAME_WARN to 2048, since some __init functions have many basic blocks that, when instrumented by the latent_entropy plugin, grow beyond 1024 byte stack size on 32-bit builds. Link: http://lkml.kernel.org/r/20161018211216.GA39687@beastSigned-off-by: NKees Cook <keescook@chromium.org> Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
The use of config_enabled() is ambiguous. For config options, IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer. Sometimes config_enabled() has been used for non-config options because it is useful to check whether the given symbol is defined or not. I have been tackling on deprecating config_enabled(), and now is the time to finish this work. Some new users have appeared for v4.9-rc1, but it is trivial to replace them: - arch/x86/mm/kaslr.c replace config_enabled() with IS_ENABLED() because CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean. - include/asm-generic/export.h replace config_enabled() with __is_defined(). Then, config_enabled() can be removed now. Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config options, and __is_defined() for non-config symbols. Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NIngo Molnar <mingo@kernel.org> Acked-by: NNicolas Pitre <nicolas.pitre@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aristeu Rozanski 提交于
When kmem accounting switched from account by default to only account if flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out. The production use case at hand is that mqueues should be customizable via sysctls in Docker containers in a Kubernetes cluster. This can only be safely allowed to the users of the cluster (without the risk that they can cause resource shortage on a node, influencing other users' containers) if all resources they control are bounded, i.e. accounted for. Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.comSigned-off-by: NAristeu Rozanski <arozansk@redhat.com> Reported-by: NStefan Schimanski <sttts@redhat.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Schimanski <sttts@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aruna Ramakrishna 提交于
On large systems, when some slab caches grow to millions of objects (and many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2 seconds. During this time, interrupts are disabled while walking the slab lists (slabs_full, slabs_partial, and slabs_free) for each node, and this sometimes causes timeouts in other drivers (for instance, Infiniband). This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for total number of allocated slabs per node, per cache. This counter is updated when a slab is created or destroyed. This enables us to skip traversing the slabs_full list while gathering slabinfo statistics, and since slabs_full tends to be the biggest list when the cache is large, it results in a dramatic performance improvement. Getting slabinfo statistics now only requires walking the slabs_free and slabs_partial lists, and those lists are usually much smaller than slabs_full. We tested this after growing the dentry cache to 70GB, and the performance improved from 2s to 5ms. Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.comSigned-off-by: NAruna Ramakrishna <aruna.ramakrishna@oracle.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Recent changes to printk require KERN_CONT uses to continue logging messages. So add KERN_CONT where necessary. [akpm@linux-foundation.org: coding-style fixes] Fixes: 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.comReported-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Polakov 提交于
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821: After some analysis it seems to be that the problem is in alloc_super(). In case list_lru_init_memcg() fails it goes into destroy_super(), which calls list_lru_destroy(). And in list_lru_init() we see that in case memcg_init_list_lru() fails, lru->node is freed, but not set NULL, which then leads list_lru_destroy() to believe it is initialized and call memcg_destroy_list_lru(). memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus, which is NULL. [akpm@linux-foundation.org: add comment] Signed-off-by: NAlexander Polakov <apolyakov@beget.ru> Acked-by: NVladimir Davydov <vdavydov.dev@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mark Rutland 提交于
Back in commit f56141e3 ("all arches, signal: move restart_block to struct task_struct"), all architectures and core code were changed to use task_struct::restart_block. However, when h8300 support was subsequently restored in v4.2, it was not updated to account for this, and maintains thread_info::restart_block, which is not kept in sync. This patch drops the redundant restart_block from thread_info, and moves h8300 to the common one in task_struct, ensuring that syscall restarting always works as expected. Fixes: f56141e3 ("all arches, signal: move restart_block to struct task_struct") Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.comSigned-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
in_interrupt() returns a nonzero value when we are either in an interrupt or have bh disabled via local_bh_disable(). Since we are interested in only ignoring coverage from actual interrupts, do a proper check instead of just calling in_interrupt(). As a result of this change, kcov will start to collect coverage from within local_bh_disable()/local_bh_enable() sections. Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Acked-by: NDmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: James Morse <james.morse@arm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
There is a bug report that SLAB makes extreme load average due to over 2000 kworker thread. https://bugzilla.kernel.org/show_bug.cgi?id=172981 This issue is caused by kmemcg feature that try to create new set of kmem_caches for each memcg. Recently, kmem_cache creation is slowed by synchronize_sched() and futher kmem_cache creation is also delayed since kmem_cache creation is synchronized by a global slab_mutex lock. So, the number of kworker that try to create kmem_cache increases quietly. synchronize_sched() is for lockless access to node's shared array but it's not needed when a new kmem_cache is created. So, this patch rules out that case. Fixes: 801faf0d ("mm/slab: lockless decision to grow cache") Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.comReported-by: NDoug Smythies <dsmythies@telus.net> Tested-by: NDoug Smythies <dsmythies@telus.net> Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime, but for build testing there is no reason not to allow them together. This hopefully means better build coverage and fewer embarrasing silly problems like the one fixed by commit 9db4f36e ("mm: remove unused variable in memory hotplug") in the future. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
When I removed the per-zone bitlock hashed waitqueues in commit 9dcb8b68 ("mm: remove per-zone hashtable of bitlock waitqueues"), I removed all the magic hotplug memory initialization of said waitqueues too. But when I actually _tested_ the resulting build, I stupidly assumed that "allmodconfig" would enable memory hotplug. And it doesn't, because it enables KASAN instead, which then disables hotplug memory support. As a result, my build test of the per-zone waitqueues was totally broken, and I didn't notice that the compiler warns about the now unused iterator variable 'i'. I guess I should be happy that that seems to be the worst breakage from my clearly horribly failed test coverage. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux由 Linus Torvalds 提交于
Pull i2c fixes from Wolfram Sang: "I2C has some driver bugfixes, module autoload fixes, and driver enablement on some architectures" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: defer probe if bus recovery GPIOs are not ready i2c: designware: Avoid aborted transfers with fast reacting I2C slaves i2c: i801: Fix I2C Block Read on 8-Series/C220 and later i2c: xgene: Avoid dma_buffer overrun i2c: digicolor: Fix module autoload i2c: xlr: Fix module autoload for OF registration i2c: xlp9xx: Fix module autoload i2c: jz4780: Fix module autoload i2c: allow configuration of imx driver for ColdFire architecture i2c: mark device nodes only in case of successful instantiation i2c: rk3x: Give the tuning value 0 during rk3x_i2c_v0_calc_timings i2c: hix5hd2: allow build with ARCH_HISI
-
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux由 Linus Torvalds 提交于
Pull thermal updates from Zhang Rui: "The latest Thermal Management updates for v4.9-rc3: - Fix a regression introduced by commit b721ca0d(thermal/powerclamp: remove cpu whitelist), that powerclamp driver checks cpu support in a wrong way. From: Eric Ernst. - Fix a problem that intel_pch_thermal driver misses passive trip point when the PCH thermal device has an ACPI companion device associated. From: Srinivas Pandruvada. - Add missing support for Haswell PCH thermal sensor. From: Srinivas Pandruvada" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal/powerclamp: correct cpu support check thermal: intel_pch_thermal: Enable Haswell PCH thermal: intel_pch_thermal: Add an ACPI passive trip
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux由 Linus Torvalds 提交于
Pull s390 fixes from Martin Schwidefsky: "A few more s390 patches for 4.9: - a fix for an overflow in the dasd driver reported by UBSAN - fix a regression and add hotplug memory to the zone movable again - add ignore defines for the pkey system calls - fix the ouput of the merged stack tracer - replace printk with pr_cont in arch/s390 where appropriate - remove the arch specific return_address function again - ignore reserved channel paths at boot time - add a missing hugetlb_bad_size call to the arch backend" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: fix zone calculation in arch_add_memory() s390/dumpstack: use pr_cont within show_stack and die s390/dumpstack: get rid of return_address again s390/disassambler: use pr_cont where appropriate s390/dumpstack: use pr_cont where appropriate s390/dumpstack: restore reliable indicator for call traces s390/mm: use hugetlb_bad_size() s390/cio: don't register chpids in reserved state s390: ignore pkey system calls s390/dasd: avoid undefined behaviour
-
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux由 Linus Torvalds 提交于
Pull module maintainership updates from Rusty Russell: "(Quoting from the MAINTAINERS commit:) Being a Linux kernel maintainer has been my proudest professional accomplishment, spanning the last 19 years. But now we have a surfeit of excellent hackers, and I can hand this over without regret. I'll still be around as co-maintainer for another cycle, but Jessica is now the one to convince if you want your patches applied. She rocks, and is far more timely than me too!" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MAINTAINERS: Begin module maintainer transition
-
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux由 Linus Torvalds 提交于
Pull oreangefs updates from Mike Marshall: "A couple of orangefs cleanups sent in by other developers: - use d_fsdata instead of d_time (Miklos Szeredi) - use file_inode(file) instead of file->f_path.dentry->d_inode (Amir Goldstein)" * tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: don't use d_time orangefs: user file_inode() where it is due
-