- 14 12月, 2012 12 次提交
-
-
由 David Howells 提交于
Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants in the ASN.1 general decoder instead of the equivalent numbers. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
Define a constant to hold the marker value seen in an indefinite-length element. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
Jan Beulich points out __COUNTER__ (gcc 4.3 and above), so let's use that to create unique ids. This is better than __LINE__ which we use today, so provide a wrapper. Stanislaw Gruszka <sgruszka@redhat.com> reported that some module parameters start with a digit, so we need to prepend when we for the unique id. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NJan Beulich <jbeulich@suse.com>
-
由 Josh Boyer 提交于
If CONFIG_MODULE_SIG is set, and 'make modules_sign' is called then this patch will cause the modules to get a signature appended. The make target is intended to be run after 'make modules_install', and will modify the modules in-place in the installed location. It can be used to produce signed modules after they have been processed by distribution build scripts. Signed-off-by: NJosh Boyer <jwboyer@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor typo fix)
-
由 Rusty Russell 提交于
(This is just for Acks: this won't work without the actual syscall patches, sitting in my tree for -next at the moment). Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Mimi Zohar 提交于
With the addition of the new kernel module syscall, which defines two arguments - a file descriptor to the kernel module and a pointer to a NULL terminated string of module arguments - it is now possible to measure and appraise kernel modules like any other file on the file system. This patch adds support to measure and appraise kernel modules in an extensible and consistent manner. To support filesystems without extended attribute support, additional patches could pass the signature as the first parameter. Signed-off-by: NMimi Zohar <zohar@us.ibm.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Kees Cook 提交于
This adds the finit_module syscall to the generic syscall list. Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Kees Cook 提交于
Add finit_module syscall to the ARM syscall list. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Kees Cook 提交于
Now that kernel module origins can be reasoned about, provide a hook to the LSMs to make policy decisions about the module file. This will let Chrome OS enforce that loadable kernel modules can only come from its read-only hash-verified root filesystem. Other LSMs can, for example, read extended attributes for signatures, etc. Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Acked-by: NEric Paris <eparis@redhat.com> Acked-by: NMimi Zohar <zohar@us.ibm.com> Acked-by: NJames Morris <james.l.morris@oracle.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
Thanks to Michael Kerrisk for keeping us honest. These flags are actually useful for eliminating the only case where kmod has to mangle a module's internals: for overriding module versioning. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Acked-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
As part of the effort to create a stronger boundary between root and kernel, Chrome OS wants to be able to enforce that kernel modules are being loaded only from our read-only crypto-hash verified (dm_verity) root filesystem. Since the init_module syscall hands the kernel a module as a memory blob, no reasoning about the origin of the blob can be made. Earlier proposals for appending signatures to kernel modules would not be useful in Chrome OS, since it would involve adding an additional set of keys to our kernel and builds for no good reason: we already trust the contents of our root filesystem. We don't need to verify those kernel modules a second time. Having to do signature checking on module loading would slow us down and be redundant. All we need to know is where a module is coming from so we can say yes/no to loading it. If a file descriptor is used as the source of a kernel module, many more things can be reasoned about. In Chrome OS's case, we could enforce that the module lives on the filesystem we expect it to live on. In the case of IMA (or other LSMs), it would be possible, for example, to examine extended attributes that may contain signatures over the contents of the module. This introduces a new syscall (on x86), similar to init_module, that has only two arguments. The first argument is used as a file descriptor to the module and the second argument is a pointer to the NULL terminated string of module arguments. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (merge fixes)
-
- 03 12月, 2012 2 次提交
-
-
由 James Hogan 提交于
Add the arch symbol prefix (if applicable) to the asm definition of modsign_certificate_list and modsign_certificate_list_end. This uses the recently defined SYMBOL_PREFIX which is derived from CONFIG_SYMBOL_PREFIX. This fixes the build of module signing on the blackfin and metag architectures. Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: Mike Frysinger <vapier@gentoo.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 James Hogan 提交于
Define SYMBOL_PREFIX to be the same as CONFIG_SYMBOL_PREFIX if set by the architecture, or "" otherwise. This avoids the need for ugly #ifdefs whenever symbols are referenced in asm blocks. Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Mike Frysinger <vapier@gentoo.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 02 12月, 2012 6 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq由 Linus Torvalds 提交于
Pull late workqueue fixes from Tejun Heo: "Unfortunately, I have two really late fixes. One was for a long-standing bug and queued for 3.8 but I found out about a regression introduced during 3.7-rc1 two days ago, so I'm sending out the two fixes together. The first (long-standing) one is rescuer_thread() entering exit path w/ TASK_INTERRUPTIBLE. It only triggers on workqueue destructions which isn't very frequent and the exit path can usually survive being called with TASK_INTERRUPT, so it was hidden pretty well. Apparently, if you're reiserfs, this could lead to the exiting kthread sleeping indefinitely holding a mutex, which is never good. The fix is simple - restoring TASK_RUNNING before returning from the kthread function. The second one is introduced by the new mod_delayed_work(). mod_delayed_work() was missing special case handling for 0 delay. Instead of queueing the work item immediately, it queued the timer which expires on the closest next tick. Some users of the new function converted from "[__]cancel_delayed_work() + queue_delayed_work()" combination became unhappy with the extra delay. Block unplugging led to noticeably higher number of context switches and intel 6250 wireless failed to associate with WPA-Enterprise network. The fix, again, is fairly simple. The 0 delay special case logic from queue_delayed_work_on() should be moved to __queue_delayed_work() which is shared by both queue_delayed_work_on() and mod_delayed_work_on(). The first one is difficult to trigger and the failure mode for the latter isn't completely catastrophic, so missing these two for 3.7 wouldn't make it a disastrous release, but both bugs are nasty and the fixes are fairly safe" * 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay workqueue: exit rescuer_thread() as TASK_RUNNING
-
由 Tejun Heo 提交于
8376fe22 ("workqueue: implement mod_delayed_work[_on]()") implemented mod_delayed_work[_on]() using the improved try_to_grab_pending(). The function is later used, among others, to replace [__]candel_delayed_work() + queue_delayed_work() combinations. Unfortunately, a delayed_work item w/ zero @delay is handled slightly differently by mod_delayed_work_on() compared to queue_delayed_work_on(). The latter skips timer altogether and directly queues it using queue_work_on() while the former schedules timer which will expire on the closest tick. This means, when @delay is zero, that [__]cancel_delayed_work() + queue_delayed_work_on() makes the target item immediately executable while mod_delayed_work_on() may induce delay of upto a full tick. This somewhat subtle difference breaks some of the converted users. e.g. block queue plugging uses delayed_work for deferred processing and uses mod_delayed_work_on() when the queue needs to be immediately unplugged. The above problem manifested as noticeably higher number of context switches under certain circumstances. The difference in behavior was caused by missing special case handling for 0 delay in mod_delayed_work_on() compared to queue_delayed_work_on(). Joonsoo Kim posted a patch to add it - ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1]. The patch was queued for 3.8 but it was described as optimization and I missed that it was a correctness issue. As both queue_delayed_work_on() and mod_delayed_work_on() use __queue_delayed_work() for queueing, it seems that the better approach is to move the 0 delay special handling to the function instead of duplicating it in mod_delayed_work_on(). Fix the problem by moving 0 delay special case handling from queue_delayed_work_on() to __queue_delayed_work(). This replaces Joonsoo's patch. [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012Signed-off-by: NTejun Heo <tj@kernel.org> Reported-and-tested-by: NAnders Kaseorg <andersk@MIT.EDU> Reported-and-tested-by: NZlatko Calusic <zlatko.calusic@iskon.hr> LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu> LKML-Reference: <50A78AA9.5040904@iskon.hr> Cc: Joonsoo Kim <js1304@gmail.com>
-
由 Mike Galbraith 提交于
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling off, never to be seen again. In the case where this occurred, an exiting thread hit reiserfs homebrew conditional resched while holding a mutex, bringing the box to its knees. PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush" #0 [ffff8808157e7670] schedule at ffffffff8143f489 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs] #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs] #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f [exception RIP: kernel_thread_helper] RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Signed-off-by: NMike Galbraith <mgalbraith@suse.de> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs由 Linus Torvalds 提交于
Pull vfs fixes from Al Viro: "A bunch of fixes; the last one is this cycle regression, the rest are -stable fodder." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix off-by-one in argument passed by iterate_fd() to callbacks lookup_one_len: don't accept . and .. cifs: get rid of blind d_drop() in readdir nfs_lookup_revalidate(): fix a leak don't do blind d_drop() in nfs_prime_dcache()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull RCU fix from Ingo Molnar: "Fix leaking RCU extended quiescent state, which might trigger warnings and mess up the extended quiescent state tracking logic into thinking that we are in "RCU user mode" while we aren't." * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Fix unrecovered RCU user mode in syscall_trace_leave()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull perf fixes from Ingo Molnar: "This is mostly about unbreaking architectures that took the UAPI changes in the v3.7 cycle, plus misc fixes." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix building perf kvm on non x86 arches perf kvm: Rename perf_kvm to perf_kvm_stat perf: Make perf build for x86 with UAPI disintegration applied perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing x86: Export asm/{svm.h,vmx.h,perf_regs.h} perf tools: Fix strbuf_addf() when the buffer needs to grow perf header: Fix numa topology printing perf, powerpc: Fix hw breakpoints returning -ENOSPC
-
- 01 12月, 2012 20 次提交
-
-
由 Ingo Molnar 提交于
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Don't build 'perf kvm stat" on non-x86 arches, fix from Xiao Guangrong. - UAPI fixes to get perf building again in non-x86 arches, from David Howells. Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull x86 fixes from Peter Anvin. This includes the resume-time FPU corruption fix from the chromeos guys, marked for stable. * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: Avoid FPU lazy restore after suspend x86-32: Unbreak booting on some 486 clones x86, kvm: Remove incorrect redundant assembly constraint
-
git://linux-c6x.org/git/projects/linux-c6x-upstreaming由 Linus Torvalds 提交于
Pull C6X fixes from Mark Salter. * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: c6x: use generic kvm_para.h c6x: remove internal kernel symbols from exported setup.h c6x: fix misleading comment c6x: run do_notify_resume with interrupts enabled
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal由 Linus Torvalds 提交于
Pull assorted signal-related fixes from Al Viro: "uml regression fix (braino in sys_execve() patch) + a bunch of fucked sigaltstack-on-rt_sigreturn uses, similar to sparc64 fix that went in through davem's tree. m32r horrors not included - that one's waiting for maintainer." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: microblaze: rt_sigreturn is too trigger-happy about sigaltstack errors score: do_sigaltstack() expects a userland pointer... sh64: fix altstack switching on sigreturn openrisk: fix altstack switching on sigreturn um: get_safe_registers() should be done in flush_thread(), not start_thread()
-
git://git.samba.org/sfrench/cifs-2.6由 Linus Torvalds 提交于
Pull CIFS fixes from Steve French: "Two low risk, small fixes, that fix cifs regressions introduced in 3.7." * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix wrong buffer pointer usage in smb_set_file_info cifs: fix writeback race with file that is growing
-
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc由 Linus Torvalds 提交于
Pull remoteproc fix from Ohad Ben-Cohen: "A single remoteproc fix for an error path issue reported by Ido Yariv." * tag 'rproc-3.7-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc: remoteproc: fix error path of ->find_vqs
-
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending由 Linus Torvalds 提交于
Pull target fix from Nicholas Bellinger: "So just a single target fix for v3.7.0 this time around from Roland to address a aborted command bug w/ tcm_qla2xxx fabric ports. Also, there is one outstanding IBLOCK + virtio-blk bug that is still being tracked down effecting v3.6.x, but AFAICT thus far this appears to be a bug outside of target code." * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix handling of aborted commands
-
由 Vincent Palatin 提交于
When a cpu enters S3 state, the FPU state is lost. After resuming for S3, if we try to lazy restore the FPU for a process running on the same CPU, this will result in a corrupted FPU context. Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU, so nobody will try to lazy restore a state which doesn't exist in the hardware. Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off, by doing thousands of suspend/resume cycles with 4 processes doing FPU operations running. Without the patch, a process is killed after a few hundreds cycles by a SIGFPE. Cc: Duncan Laurie <dlaurie@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write Signed-off-by: NVincent Palatin <vpalatin@chromium.org> Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
git://people.freedesktop.org/~airlied/linux由 Linus Torvalds 提交于
Pull DRM fixes from Dave Airlie: "Just driver fixes, nothing major, except maybe the Ironlake rc6 disable: - intel: * revert ironlake rc6 - we still have one ilk regression, but this gets rid of one big one * turn off cloning * a directed fix for Apple edp - radeon: one modesetting fix - exynos: minor fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: radeon: fix pll/ctrc mapping on dce2 and dce3 hardware Revert "drm/i915: enable rc6 on ilk again" drm/i915: do not default to 18 bpp for eDP if missing from VBT drm/exynos: Fix potential NULL pointer dereference in exynos_drm_encoder.c drm/exynos: Make exynos4/5_fimd_driver_data static drm/exynos: fix overlay updating issue drm/exynos: remove unnecessary code. drm/exynos: fix linux framebuffer address setting. drm/i915: disable cloning on sdvo
-
由 Linus Torvalds 提交于
Merge misc fixes from Andrew Morton: "Seven fixes, some of them fingers-crossed :(" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (7 patches) drivers/rtc/rtc-tps65910.c: fix invalid pointer access on _remove() mm: soft offline: split thp at the beginning of soft_offline_page() mm: avoid waking kswapd for THP allocations when compaction is deferred or contended revert "Revert "mm: remove __GFP_NO_KSWAPD"" mm: vmscan: fix endless loop in kswapd balancing mm/vmemmap: fix wrong use of virt_to_page mm: compaction: fix return value of capture_free_page()
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC fixes from Arnd Bergmann: "These are three fixes for the Marvell EBU family and one for the Samsung s3c platforms. All of them are obvious should still make it into 3.7." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: Kirkwood: Update PCI-E fixup Dove: Fix irq_to_pmu() Dove: Attempt to fix PMU/RTC interrupts ARM: S3C24XX: Fix potential NULL pointer dereference error
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM ixp4xx bug fixes from Arnd Bergmann: "These were originally prepared by Krzysztof Halasa but not submitted in time for v3.7 due to some confusion about how ixp4xx patches should be handled. Jason Cooper thankfully offered to help out sending the patches upstream through arm-soc now, but given the timing, we could as well delay them for 3.8." * tag 'ixp4xx-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: IXP4xx: use __iomem for MMIO IXP4xx: map CPU config registers within VMALLOC region. IXP4xx: Always ioremap() Queue Manager MMIO region at boot. ixp4xx: Declare MODULE_FIRMWARE usage IXP4xx crypto: MOD_AES{128,192,256} already include key size. WAN: Remove redundant HDLC info printed by IXP4xx HSS driver. IXP4xx: Remove time limit for PCI TRDY to enable use of slow devices. IXP4xx: ixp4xx_crypto driver requires Queue Manager and NPE drivers. IXP4xx: HW pseudo-random generator is available on IXP45x/46x only. IXP4xx: Fix off-by-one bug in Goramo MultiLink platform. IXP4xx: Fix Goramo MultiLink platform compilation.
-
git://git.linaro.org/people/rmk/linux-arm由 Linus Torvalds 提交于
Pull final ARM fix from Russell King: "One final fix, spotted by Will, to do with what happens when we boot a SMP kernel on UP." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7586/1: sp804: set cpumask to cpu_possible_mask for clock event device
-
由 Kim, Milo 提交于
The tps65910_rtc data is registered as the platform driver data in _probe(= ). Therefore the tps65910_rtc should be used on unregistering the rtc device. And device pointer should be retrieved from the platform_device structure. This patch fixes the below oops: Unable to handle kernel NULL pointer dereference at virtual address 00000008 Modules linked in: rtc_tps65910(-) CPU: 0 Not tainted (3.7.0-rc7-next-20121128-g6b1f974-dirty #7) PC is at tps65910_rtc_alarm_irq_enable+0x20/0x2c [rtc_tps65910] (tps65910_rtc_alarm_irq_enable+0x20/0x2c [rtc_tps65910]) (tps65910_rtc_remove+0x18/0x28 [rtc_tps65910]) (platform_drv_remove+0x18/0x1c) (__device_release_driver+0x70/0xcc) (driver_detach+0xb4/0xb8) (bus_remove_driver+0x7c/0xc0) (sys_delete_module+0x148/0x21c) Signed-off-by: NMilo(Woogyom) Kim <milo.kim@ti.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
When we try to soft-offline a thp tail page, put_page() is called on the tail page unthinkingly and VM_BUG_ON is triggered in put_compound_page(). This patch splits thp before going into the main body of soft-offlining. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 Call Trace: preempt_schedule+0x42/0x60 _raw_spin_unlock+0x55/0x60 put_super+0x31/0x40 drop_super+0x22/0x30 prune_super+0x149/0x1b0 shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. However, if there are a storm of THP requests that are simply rejected, it will still be the the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loopp, shrinking a small number of pages and calling shrink_slab() on each iteration. This patch defers when kswapd gets woken up for THP allocations. For !THP allocations, kswapd is always woken up. For THP allocations, kswapd is woken up iff the process is willing to enter into direct reclaim/compaction. [akpm@linux-foundation.org: fix typo in comment] Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Zdenek Kabelac <zkabelac@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
It apepars that this patch was innocent, and we hope that "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended" will fix the final kswapd-spinning cause. Cc: Zdenek Kabelac <zkabelac@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Kswapd does not in all places have the same criteria for a balanced zone. Zones are only being reclaimed when their high watermark is breached, but compaction checks loop over the zonelist again when the zone does not meet the low watermark plus two times the size of the allocation. This gets kswapd stuck in an endless loop over a small zone, like the DMA zone, where the high watermark is smaller than the compaction requirement. Add a function, zone_balanced(), that checks the watermark, and, for higher order allocations, if compaction has enough free memory. Then use it uniformly to check for balanced zones. This makes sure that when the compaction watermark is not met, at least reclaim happens and progress is made - or the zone is declared unreclaimable at some point and skipped entirely. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NGeorge Spelvin <linux@horizon.com> Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reported-by: NTomas Racek <tracek@redhat.com> Tested-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jianguo Wu 提交于
I enable CONFIG_DEBUG_VIRTUAL and CONFIG_SPARSEMEM_VMEMMAP, when doing memory hotremove, there is a kernel BUG at arch/x86/mm/physaddr.c:20. It is caused by free_section_usemap()->virt_to_page(), virt_to_page() is only used for kernel direct mapping address, but sparse-vmemmap uses vmemmap address, so it is going wrong here. ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:20! invalid opcode: 0000 [#1] SMP Modules linked in: acpihp_drv acpihp_slot edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse vfat fat loop dm_mod coretemp kvm crc32c_intel ipv6 ixgbe igb iTCO_wdt i7core_edac edac_core pcspkr iTCO_vendor_support ioatdma microcode joydev sr_mod i2c_i801 dca lpc_ich mfd_core mdio tpm_tis i2c_core hid_generic tpm cdrom sg tpm_bios rtc_cmos button ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh ata_generic ata_piix libata megaraid_sas scsi_mod CPU 39 Pid: 6454, comm: sh Not tainted 3.7.0-rc1-acpihp-final+ #45 QCI QSSC-S4R/QSSC-S4R RIP: 0010:[<ffffffff8103c908>] [<ffffffff8103c908>] __phys_addr+0x88/0x90 RSP: 0018:ffff8804440d7c08 EFLAGS: 00010006 RAX: 0000000000000006 RBX: ffffea0012000000 RCX: 000000000000002c ... Signed-off-by: NJianguo Wu <wujianguo@huawei.com> Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Reviewd-by: NWen Congyang <wency@cn.fujitsu.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Commit ef6c5be6 ("fix incorrect NR_FREE_PAGES accounting (appears like memory leak)") fixes a NR_FREE_PAGE accounting leak but missed the return value which was also missed by this reviewer until today. That return value is used by compaction when adding pages to a list of isolated free pages and without this follow-up fix, there is a risk of free list corruption. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-