- 25 4月, 2012 1 次提交
-
-
由 Greg Pearson 提交于
Provide systems that do not support x2apic cluster mode a mechanism to select x2apic physical mode using the FADT FORCE_APIC_PHYSICAL_DESTINATION_MODE bit. Changes from v1: (based on Suresh's comments) - removed #ifdef CONFIG_ACPI - removed #include <linux/acpi.h> Signed-off-by: NGreg Pearson <greg.pearson@hp.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1335313436-32020-1-git-send-email-greg.pearson@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 4月, 2012 2 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
With commit a2ef5c4f "ACPI: Move module parameter gts and bfs to sleep.c" the wake_sleep_flags is required when calling acpi_enter_sleep_state. The assembler code in wakeup_*.S did not do that. One solution is to call it from assembler and stick the wake_sleep_flags on the stack (for 32-bit) or in %esi (for 64-bit). hpa and rafael both suggested however to create a wrapper function to call acpi_enter_sleep_state and call said wrapper function ("acpi_enter_s3") from assembler. For 32-bit, the acpi_enter_s3 ends up looking as so: push %ebp mov %esp,%ebp sub $0x8,%esp movzbl 0xc1809314,%eax [wake_sleep_flags] movl $0x3,(%esp) mov %eax,0x4(%esp) call 0xc12d1fa0 <acpi_enter_sleep_state> leave ret And 64-bit: movzbl 0x9afde1(%rip),%esi [wake_sleep_flags] push %rbp mov $0x3,%edi mov %rsp,%rbp callq 0xffffffff812e9800 <acpi_enter_sleep_state> leaveq retq Reviewed-by: NH. Peter Anvin <hpa@zytor.com> Suggested-by: NH. Peter Anvin <hpa@zytor.com> [v2: Remove extra assembler operations, per hpa review] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1335150198-21899-3-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Konrad Rzeszutek Wilk 提交于
With commit a2ef5c4f "ACPI: Move module parameter gts and bfs to sleep.c" the wake_sleep_flags is required when calling acpi_enter_sleep_state, which means that if there are functions outside the sleep.c code they can't get the wake_sleep_flags values. This converts the function in to a exported value and converts the module config operands to a function. Acked-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NLin Ming <ming.m.lin@intel.com> [v2: Parameters can be turned on/off dynamically] [v3: unsigned char -> u8] [v4: val -> kp->arg] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1335150198-21899-2-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 20 4月, 2012 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
If the L3 disable slot is already in use, return -EEXIST instead of -EINVAL. The caller, store_cache_disable(), checks this return value to print an appropriate warning. Also, we want to signal with -EEXIST that the current index we're disabling has actually been already disabled on the node: $ echo 12 > /sys/devices/system/cpu/cpu3/cache/index3/cache_disable_0 $ echo 12 > /sys/devices/system/cpu/cpu3/cache/index3/cache_disable_0 -bash: echo: write error: File exists $ echo 12 > /sys/devices/system/cpu/cpu3/cache/index3/cache_disable_1 -bash: echo: write error: File exists $ echo 12 > /sys/devices/system/cpu/cpu5/cache/index3/cache_disable_1 -bash: echo: write error: File exists The old code would say -bash: echo: write error: Invalid argument for disable slot 1 when playing the example above with no output in dmesg, which is clearly misleading. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120419070053.GB16645@elgon.mountain [Boris: add testing for the other index too] Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 19 4月, 2012 1 次提交
-
-
由 Bryan O'Donoghue 提交于
Current APIC code assumes MSR_IA32_APICBASE is present for all systems. Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE was introduced as an architectural MSR by Intel @ P6. Code paths that can touch this MSR invalidly are when vendor == Intel && cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass lapic on the kernel command line, on a P5. The below patch stops Linux incorrectly interfering with the MSR_IA32_APICBASE for P5 class machines. Other code paths exist that touch the MSR - however those paths are not currently reachable for a conformant P5. Signed-off-by: NBryan O'Donoghue <bryan.odonoghue@linux.intel.com> Link: http://lkml.kernel.org/r/4F8EEDD3.1080404@linux.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
-
- 17 4月, 2012 2 次提交
-
-
由 Oleg Nesterov 提交于
Starting from 7e16838d "i387: support lazy restore of FPU state" we assume that fpu_owner_task doesn't need restore_fpu_checking() on the context switch, its FPU state should match what we already have in the FPU on this CPU. However, debugger can change the tracee's FPU state, in this case we should reset fpu.last_cpu to ensure fpu_lazy_restore() can't return true. Change init_fpu() to do this, it is called by user_regset->set() methods. Reported-by: NJan Kratochvil <jan.kratochvil@redhat.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20120416204815.GB24884@redhat.com Cc: <stable@vger.kernel.org> v3.3 Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andreas Herrmann 提交于
It's only called from amd.c:srat_detect_node(). The introduced condition for calling the fixup code is true for all AMD multi-node processors, e.g. Magny-Cours and Interlagos. There we have 2 NUMA nodes on one socket. Thus there are cores having different numa-node-id but with equal phys_proc_id. There is no point to print error messages in such a situation. The confusing/misleading error message was introduced with commit 64be4c1c ("x86: Add x86_init platform override to fix up NUMA core numbering"). Remove the default fixup function (especially the error message) and replace it by a NULL pointer check, move the Numascale-specific condition for calling the fixup into the fixup-function itself and slightly adapt the comment. Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Acked-by: NBorislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Cc: <sp@numascale.com> Cc: <bp@amd64.org> Cc: <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/20120402160648.GR27684@alberich.amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 4月, 2012 1 次提交
-
-
由 Andreas Herrmann 提交于
Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/20120411151238.GA4794@alberich.amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 4月, 2012 2 次提交
-
-
由 Andreas Herrmann 提交于
Exit early when there's no support for a particular CPU family. Also, fixup the "no support for this CPU vendor" to be issued only when the driver is attempted to be loaded on an unsupported vendor. Cc: stable@vger.kernel.org Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.com [Boris: add a commit msg because Andreas is lazy] Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Andreas Herrmann 提交于
Loading the microcode driver on an unsupported CPU and subsequently unloading the driver causes WARNING: at fs/sysfs/group.c:138 mc_device_remove+0x5f/0x70 [microcode]() Hardware name: 01972NG sysfs group ffffffffa00013d0 not found for kobject 'cpu0' Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel btusb snd_hda_codec bluetooth thinkpad_acpi rfkill microcode(-) [last unloaded: cfg80211] Pid: 4560, comm: modprobe Not tainted 3.4.0-rc2-00002-g258f7426 #5 Call Trace: [<ffffffff8103113b>] ? warn_slowpath_common+0x7b/0xc0 [<ffffffff81031235>] ? warn_slowpath_fmt+0x45/0x50 [<ffffffff81120e74>] ? sysfs_remove_group+0x34/0x120 [<ffffffffa00000ef>] ? mc_device_remove+0x5f/0x70 [microcode] [<ffffffff81331eb9>] ? subsys_interface_unregister+0x69/0xa0 [<ffffffff81563526>] ? mutex_lock+0x16/0x40 [<ffffffffa0000c3e>] ? microcode_exit+0x50/0x92 [microcode] [<ffffffff8107051d>] ? sys_delete_module+0x16d/0x260 [<ffffffff810a0065>] ? wait_iff_congested+0x45/0x110 [<ffffffff815656af>] ? page_fault+0x1f/0x30 [<ffffffff81565ba2>] ? system_call_fastpath+0x16/0x1b on recent kernels. This is due to commit 8a25a2fd ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") which renders commit 6c53cbfc ("x86, microcode: Correct sysdev_add error path") useless. See http://marc.info/?l=linux-kernel&m=133416246406478 Avoid above warning by restoring the old driver behaviour before 6c53cbfc ("x86, microcode: Correct sysdev_add error path"). Cc: stable@vger.kernel.org Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20120411163849.GE4794@alberich.amd.comSigned-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 06 4月, 2012 3 次提交
-
-
由 Emil Goode 提交于
This patch silences the following sparse warning: arch/x86/kernel/vsyscall_64.c:250:34: warning: Using plain integer as NULL pointer Signed-off-by: NEmil Goode <emilgoode@gmail.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1333306084-3776-1-git-send-email-emilgoode@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Stephen Boyd 提交于
Many users of debugfs copy the implementation of default_open() when they want to support a custom read/write function op. This leads to a proliferation of the default_open() implementation across the entire tree. Now that the common implementation has been consolidated into libfs we can replace all the users of this function with simple_open(). This replacement was done with the following semantic patch: <smpl> @ open @ identifier open_f != simple_open; identifier i, f; @@ -int open_f(struct inode *i, struct file *f) -{ ( -if (i->i_private) -f->private_data = i->i_private; | -f->private_data = i->i_private; ) -return 0; -} @ has_open depends on open @ identifier fops; identifier open.open_f; @@ struct file_operations fops = { ... -.open = open_f, +.open = simple_open, ... }; </smpl> [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gleb Natapov 提交于
"Page ready" async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this. Reported-by: NSasha Levin <levinsasha928@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 03 4月, 2012 1 次提交
-
-
由 Peter Zijlstra 提交于
Steven reported his P4 not booting properly, the missing format attributes cause a NULL ptr deref. Cure this by adding the missing format specification. I took the format description out of the comment near p4_config_pack*() and hope that comment is still relatively accurate. Reported-by: NSteven Rostedt <rostedt@goodmis.org> Reported-by: NBruno Prémont <bonbons@linux-vserver.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1332859842.16159.227.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 3月, 2012 4 次提交
-
-
由 Petr Vandrovec 提交于
When processor is being hot-added to the system, acpi_map_lsapic invokes ACPI _MAT method to find APIC ID and flags, verifies that returned structure is indeed ACPI's local APIC structure, and that flags contain MADT_ENABLED bit. Then saves APIC ID, frees structure - and accesses structure when computing arguments for acpi_register_lapic call. Which sometime leads to acpi_register_lapic call being made with second argument zero, failing to bring processor online with error 'Unable to map lapic to logical cpu number'. As lapic->lapic_flags & ACPI_MADT_ENABLED was already confirmed to be non-zero few lines above, we can just pass unconditional ACPI_MADT_ENABLED to the acpi_register_lapic. Signed-off-by: NPetr Vandrovec <petr@vmware.com> Signed-off-by: NAlok N Kataria <akataria@vmware.com> Reviewed-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Boris Ostrovsky 提交于
Currently when a CPU is off-lined it enters either MWAIT-based idle or, if MWAIT is not desired or supported, HLT-based idle (which places the processor in C1 state). This patch allows processors without MWAIT support to stay in states deeper than C1. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Len Brown 提交于
The X86_32-only disable_hlt/enable_hlt mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT. This workaround was commented in the code as: "disable hlt during certain critical i/o operations" "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove." H. Peter Anvin additionally adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much hhigher noise levels you got when Linux used HLT caused some of these systems to fail. They were by far in the minority even back then." Alan Cox further says: "Also for the Cyrix 5510 which tended to go castors up if a HLT occurred during a DMA cycle and on a few other boxes HLT during DMA tended to go astray. Do we care ? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use." So, let's finally drop this. Signed-off-by: NLen Brown <len.brown@intel.com> Signed-off-by: NJosh Boyer <jwboyer@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephen Hemminger <shemminger@vyatta.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org [ If anyone cares then alternative instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jason Wessel 提交于
There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For this particular patch, it will apply cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and next try using a the text_poke() function. The break point install method is tracked such that the correct break point removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # >= 2.6.36 Inspried-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
- 29 3月, 2012 5 次提交
-
-
由 Liu, Chuansheng 提交于
The default irq_disable() sematics are to mark the interrupt disabled, but keep it unmasked. If the interrupt is delivered while marked disabled, the low level interrupt handler masks it and marks it pending. This is important for detecting wakeup interrupts during suspend and for edge type interrupts to avoid losing interrupts. fixup_irqs() moves the interrupts away from an offlined cpu. For certain interrupt types it needs to mask the interrupt line before changing the affinity. After affinity has changed the interrupt line is unmasked again, but only if it is not marked disabled. This breaks the lazy irq disable semantics and causes problems in suspend as the interrupt can be lost or wakeup functionality is broken. Check irqd_irq_masked() instead of irqd_irq_disabled() because irqd_irq_masked() is only set, when the core code actually masked the interrupt line. If it's not set, we unmask the interrupt and let the lazy irq disable logic deal with an eventually incoming interrupt. [ tglx: Massaged changelog and added a comment ] Signed-off-by: Nliu chuansheng <chuansheng.liu@intel.com> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A05DFB3@SHSMSX101.ccr.corp.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Dave Young 提交于
crashkernel reservation need know the total memory size. Current get_total_mem simply use max_pfn - min_low_pfn. It is wrong because it will including memory holes in the middle. Especially for kvm guest with memory > 0xe0000000, there's below in qemu code: qemu split memory as below: if (ram_size >= 0xe0000000 ) { above_4g_mem_size = ram_size - 0xe0000000; below_4g_mem_size = 0xe0000000; } else { below_4g_mem_size = ram_size; } So for 4G mem guest, seabios will insert a 512M usable region beyond of 4G. Thus in above case max_pfn - min_low_pfn will be more than original memsize. Fixing this issue by using memblock_phys_mem_size() to get the total memsize. Signed-off-by: NDave Young <dyoung@redhat.com> Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com> Reviewed-by: NSimon Horman <horms@verge.net.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robert Richter 提交于
Add information about LVT offset assignments to better debug firmware bugs related to this. See following examples. # dmesg | grep -i 'offset\|ibs' LVT offset 0 assigned for vector 0xf9 [Firmware Bug]: cpu 0, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu [Firmware Bug]: cpu 0, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100) Failed to setup IBS, -22 In this case the BIOS assigns both offsets for MCE (0xf9) and IBS (0x400) vectors to offset 0, which is why the second APIC setup (IBS) failed. With correct setup you get: # dmesg | grep -i 'offset\|ibs' LVT offset 0 assigned for vector 0xf9 LVT offset 1 assigned for vector 0x400 IBS: LVT offset 1 assigned perf: AMD IBS detected (0x00000007) oprofile: AMD IBS detected (0x00000007) Note: The vector includes also the message type to handle also NMIs (0x400). In the firmware bug message the format is the same as of the APIC500 register and includes the mask bit (bit 16) in addition. Signed-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 David Howells 提交于
Disintegrate asm/system.h for X86. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> cc: x86@kernel.org
-
由 Dan Carpenter 提交于
These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members so GDT_ENTRY_TLS_ENTRIES is one past the end of the array. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain Cc: <stable@vger.kernel.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 28 3月, 2012 2 次提交
-
-
由 Andrzej Pietrasiewicz 提交于
Adapt core x86 and IA64 architecture code for dma_map_ops changes: replace alloc/free_coherent with generic alloc/free methods. Signed-off-by: NAndrzej Pietrasiewicz <andrzej.p@samsung.com> Acked-by: NKyungmin Park <kyungmin.park@samsung.com> [removed swiotlb related changes and replaced it with wrappers, merged with IA64 patch to avoid inter-patch dependences in intel-iommu code] Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NTony Luck <tony.luck@intel.com>
-
由 Jeremy Fitzhardinge 提交于
Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add a io_apic_ops for it to intercept. Do this as ops structure because there's at least some chance that another paravirtualized environment may want to intercept these. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: jwboyer@redhat.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/1332385090-18056-2-git-send-email-konrad.wilk@oracle.com [ Made all the affected code easier on the eyes ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 3月, 2012 1 次提交
-
-
由 Peter Zijlstra 提交于
We should not ever enable IRQs until we're fully set up. This opens up a window where interrupts can hit the cpu and interrupts can do wakeups, wakeups need state that isn't set-up yet, in particular this cpu isn't elegible to run tasks, so if any cpu-affine task that got created in CPU_UP_PREPARE manages to get a wakeup, its affinity mask will get broken and we'll run into lots of 'interesting' problems. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-yaezmlbriluh166tfkgni22m@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 3月, 2012 1 次提交
-
-
由 Richard Weinberger 提交于
Both functions are mostly identical. The differences are: - x86_32's cpu_idle() makes use of check_pgt_cache(), which is a nop on both x86_32 and x86_64. - x86_64's cpu_idle() uses enter/__exit_idle/(), on x86_32 these function are a nop. - In contrast to x86_32, x86_64 calls rcu_idle_enter/exit() in the innermost loop because idle notifications need RCU. Calling these function on x86_32 also in the innermost loop does not hurt. So we can merge both functions. Signed-off-by: NRichard Weinberger <richard@nod.at> Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: paulmck@linux.vnet.ibm.com Cc: josh@joshtriplett.org Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1332709204-22496-1-git-send-email-richard@nod.atSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 3月, 2012 5 次提交
-
-
由 Thomas Gleixner 提交于
Sigh, warnings are there for a reason. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: John Stultz <john.stultz@linaro.org>
-
由 Hugh Dickins 提交于
After printing out the first line of a stack backtrace, print_context_stack() calls print_ftrace_graph_addr() to check if it's making a graph of function calls, usually not the case. But unfortunate ordering of assignments causes this to oops if an earlier stack overflow corrupted threadinfo->task. Reorder to avoid that irritation. ( The fact that there was a stack overflow may often be more interesting than the stack that can now be shown; but integrating that information with this stacktrace is awkward, so leave it to overflow reporting. ) Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20120323225648.15DD5A033B@akpm.mtv.corp.google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Akinobu Mita 提交于
Use for_each_clear_bit() to iterate over all the cleared bit in a memory region. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This renames for_each_set_bit_cont() to for_each_set_bit_from() because it is analogous to list_for_each_entry_from() in list.h rather than list_for_each_entry_continue(). This doesn't remove for_each_set_bit_cont() for now. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Lutomirski 提交于
We used to store the wall-to-monotonic offset and the realtime base. It's faster to precompute the monotonic base. This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC. It's much more impressive for CLOCK_MONOTONIC_COARSE. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
- 23 3月, 2012 6 次提交
-
-
由 Alexander Gordeev 提交于
This patch removes dead code from certain .config variations. When CONFIG_GENERIC_PENDING_IRQ=n irq move and reenable code is never get executed, nor do_unmask_irq variable updates its init value. Move the code under CONFIG_GENERIC_PENDING_IRQ macro. Signed-off-by: NAlexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120320141935.GA24806@dhcp-26-207.brq.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steffen Persvold 提交于
As suggested by Suresh Siddha and Yinghai Lu: For x2apic pre-enabled systems, apic driver is set already early through early_acpi_boot_init()/early_acpi_process_madt()/ acpi_parse_madt()/default_acpi_madt_oem_check() path so that apic_id_valid() checking will be sufficient during MADT and SRAT parsing. For non-x2apic pre-enabled systems, all apic ids should be less than 255. This allows us to substitute the checks in arch/x86/kernel/acpi/boot.c::acpi_parse_x2apic() and arch/x86/mm/srat.c::acpi_numa_x2apic_affinity_init() with apic->apic_id_valid(). In addition we can avoid feigning the x2apic cpu feature in the NumaChip apic code. The following apic drivers have separate apic_id_valid() functions which will accept x2apic type IDs : x2apic_phys x2apic_cluster x2apic_uv_x apic_numachip Signed-off-by: NSteffen Persvold <sp@numascale.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1331925935-13372-1-git-send-email-sp@numascale.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yinghai Lu 提交于
Dave found: | During bootup, I now have 162 messages like this.. | [ 0.227346] MSR0000001b: 00000000fee00900 | [ 0.227465] MSR00000021: 0000000000000001 | [ 0.227584] MSR0000002a: 00000000c1c81400 | | commit 21c3fcf3 looks suspect. | It claims that it will only print these out if show_msr= is | passed, but that doesn't seem to be the case. Fix it by changing to the version that checks the index. Reported-and-tested-by: NDave Jones <davej@redhat.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1332477103-4595-1-git-send-email-yinghai@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Complete the syscall-less self-profiling feature and address all complaints, namely: - capabilities, so we can detect what is actually available at runtime Add a capabilities field to perf_event_mmap_page to indicate what is actually available for use. - on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending properly. - ABI documentation as to how all this stuff works. Also improve the documentation for the new features. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vweaver1@eecs.utk.edu> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dmitry Adamushko 提交于
The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task returns from a system call with a pending signal. A real-life scenario is a child of 'khelper' returning from a failed kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ]. kernel_execve() fails due to a pending SIGKILL, which is the result of "kill -9 -1" (at least, busybox's init does it upon reboot). The loop is as follows: * syscall_exit_work: - work_pending: // start_of_the_loop - work_notify_sig: - do_notify_resume() - do_signal() - if (!user_mode(regs)) return; - resume_userspace // TIF_SIGPENDING is still set - work_pending // so we call work_pending => goto // start_of_the_loop More information can be found in another LKML thread: http://www.serverphorums.com/read.php?12,457826 [1] the problem was also seen on MIPS. Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com> Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Jan Kiszka 提交于
Even if the content is always 0, gdb expects us to return also ds, es, fs, and gs while in x86-64 mode. Do this to avoid ugly errors on "info registers". [jason.wessel@windriver.com: adjust NUMREGBYTES for two new regs] Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
- 22 3月, 2012 2 次提交
-
-
由 Xiao Guangrong 提交于
If the required size is bigger than cached_hole_size it is better to search from free_area_cache - it is easier to get a free region, specifically for the 64 bit process whose address space is large enough Do it just as hugetlb_get_unmapped_area_topdown() in arch/x86/mm/hugetlbpage.c Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge. It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge(). Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped. Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()). The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad). if (pmd_trans_huge(*pmd)) { if (next-addr != HPAGE_PMD_SIZE) { VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)); split_huge_page_pmd(vma->vm_mm, pmd); } else if (zap_huge_pmd(tlb, vma, pmd, addr)) continue; /* fall through */ } if (pmd_none_or_clear_bad(pmd)) Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179. The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference. ====== start quote ======= mapcount 0 page_mapcount 1 kernel BUG at mm/huge_memory.c:1384! At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console: mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7). The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry. 143 void pmd_clear_bad(pmd_t *pmd) 144 { -> 145 pmd_ERROR(*pmd); 146 pmd_clear(pmd); 147 } After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency. 1381 if (mapcount != page_mapcount(page)) 1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n", 1383 mapcount, page_mapcount(page)); -> 1384 BUG_ON(mapcount != page_mapcount(page)); The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range. virtual address space .---------------------. | | | | .-|---------------------| | | | | | |<-- B(fault) | | | 2 MB | |/////////////////////|-. huge < |/////////////////////| > A(range) page | |/////////////////////|-' | | | | | | '-|---------------------| | | | | '---------------------' - Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture. sys_madvise // Acquire the semaphore in shared mode. down_read(¤t->mm->mmap_sem) ... madvise_vma switch (behavior) case MADV_DONTNEED: madvise_dontneed zap_page_range unmap_vmas unmap_page_range zap_pud_range zap_pmd_range // // Assume that this huge page has never been accessed. // I.e. content of the PMD entry is zero (not mapped). // if (pmd_trans_huge(*pmd)) { // We don't get here due to the above assumption. } // // Assume that Thread B incurred a page fault and .---------> // sneaks in here as shown below. | // | if (pmd_none_or_clear_bad(pmd)) | { | if (unlikely(pmd_bad(*pmd))) | pmd_clear_bad | { | pmd_ERROR | // Log "bad pmd ..." message here. | pmd_clear | // Clear the page's PMD entry. | // Thread B incremented the map count | // in page_add_new_anon_rmap(), but | // now the page is no longer mapped | // by a PMD entry (-> inconsistency). | } | } | v - Thread B is handling a page fault on virtual address "B(fault)" shown in the picture. ... do_page_fault __do_page_fault // Acquire the semaphore in shared mode. down_read_trylock(&mm->mmap_sem) ... handle_mm_fault if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) // We get here due to the above assumption (PMD entry is zero). do_huge_pmd_anonymous_page alloc_hugepage_vma // Allocate a new transparent huge page here. ... __do_huge_pmd_anonymous_page ... spin_lock(&mm->page_table_lock) ... page_add_new_anon_rmap // Here we increment the page's map count (starts at -1). atomic_set(&page->_mapcount, 0) set_pmd_at // Here we set the page's PMD entry which will be cleared // when Thread A calls pmd_clear_bad(). ... spin_unlock(&mm->page_table_lock) The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock. ====== end quote ======= [akpm@linux-foundation.org: checkpatch fixes] Reported-by: NUlrich Obergfell <uobergfe@redhat.com> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Jones <davej@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.38+] Cc: Mark Salter <msalter@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-