- 22 6月, 2013 1 次提交
-
-
由 Al Viro 提交于
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout handles it correctly (seeks by PAGE_SIZE - sizeof(struct user), getting the current position to PAGE_SIZE), compat one seeks by PAGE_SIZE and ends up at PAGE_SIZE + already written... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 6月, 2013 2 次提交
-
-
由 Kees Cook 提交于
Fixes a typo in register clearing code. Thanks to PaX Team for fixing this originally, and James Troup for pointing it out. Signed-off-by: NKees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net Cc: <stable@vger.kernel.org> v2.6.30+ Cc: PaX Team <pageexec@freemail.hu> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Kees Cook 提交于
The __vvar_page relocation should actually be listed in S_REL instead of S_ABS. Oddly, this didn't always cause things to break, presumably because there are no users for relocation information on 64 bits yet. [ hpa: Not for stable - new code in 3.10 ] Signed-off-by: NKees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.netReported-by: NMichael Davidson <md@google.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 6月, 2013 1 次提交
-
-
由 Matthew Garrett 提交于
This patch reworks the UEFI anti-bricking code, including an effective reversion of cc5a080c and 31ff2f20. It turns out that calling QueryVariableInfo() from boot services results in some firmware implementations jumping to physical addresses even after entering virtual mode, so until we have 1:1 mappings for UEFI runtime space this isn't going to work so well. Reverting these gets us back to the situation where we'd refuse to create variables on some systems because they classify deleted variables as "used" until the firmware triggers a garbage collection run, which they won't do until they reach a lower threshold. This results in it being impossible to install a bootloader, which is unhelpful. Feedback from Samsung indicates that the firmware doesn't need more than 5KB of storage space for its own purposes, so that seems like a reasonable threshold. However, there's still no guarantee that a platform will attempt garbage collection merely because it drops below this threshold. It seems that this is often only triggered if an attempt to write generates a genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to create a variable larger than the remaining space. This should fail, but if it somehow succeeds we can then immediately delete it. I've tested this on the UEFI machines I have available, but I don't have a Samsung and so can't verify that it avoids the bricking problem. Signed-off-by: NMatthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ] Cc: <stable@vger.kernel.org> Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
-
- 06 6月, 2013 1 次提交
-
-
由 Matt Fleming 提交于
f9a37be0 ("x86: Use PCI setup data") added support for using PCI ROM images from setup_data. This used phys_to_virt(), which is not valid for highmem addresses, and can cause a crash when booting a 32-bit kernel via the EFI boot stub. pcibios_add_device() assumes that the physical addresses stored in setup_data are accessible via the direct kernel mapping, and that calling phys_to_virt() is valid. This isn't guaranteed to be true on x86 where the direct mapping range is much smaller than on x86-64. Calling phys_to_virt() on a highmem address results in the following: BUG: unable to handle kernel paging request at 39a3c198 IP: [<c262be0f>] pcibios_add_device+0x2f/0x90 ... Call Trace: [<c2370c73>] pci_device_add+0xe3/0x130 [<c274640b>] pci_scan_single_device+0x8b/0xb0 [<c2370d08>] pci_scan_slot+0x48/0x100 [<c2371904>] pci_scan_child_bus+0x24/0xc0 [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490 [<c23b7203>] acpi_pci_root_add+0x312/0x42f ... The solution is to use ioremap() instead of phys_to_virt() to map the setup data into the kernel address space. [bhelgaas: changelog] Tested-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: stable@vger.kernel.org # v3.8+
-
- 04 6月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
The xen_play_dead is an undead function. When the vCPU is told to offline it ends up calling xen_play_dead wherin it calls the VCPUOP_down hypercall which offlines the vCPU. However, when the vCPU is onlined back, it resumes execution right after VCPUOP_down hypercall. That was OK (albeit the API for play_dead assumes that the CPU stays dead and never returns) but with commit 4b0c0f29 (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe as said commit resets the ts->inidle which at the start of the cpu_idle loop was set. The net effect is that we get this warn: Broke affinity for irq 16 installing Xen timer for CPU 1 cpu 1 spinlock event irq 48 ------------[ cut here ]------------ WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33a #1 Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0 Call Trace: [<ffffffff816707c8>] dump_stack+0x19/0x1b [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0 [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20 [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0 [<ffffffff810da755>] cpu_startup_entry+0x205/0x250 [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15 ---[ end trace 915c8c486004dda1 ]--- b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround to call tick_nohz_idle_enter before returning from xen_play_dead() - and that is what this patch does and fixes the issue. We also add the stable part b/c git commit 4b0c0f29 is on the stable tree. CC: stable@vger.kernel.org Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 03 6月, 2013 3 次提交
-
-
由 Gleb Natapov 提交于
apic->pending_events processing has a race that may cause INIT and SIPI processing to be reordered: vpu0: vcpu1: set INIT test_and_clear_bit(KVM_APIC_INIT) process INIT set INIT set SIPI test_and_clear_bit(KVM_APIC_SIPI) process SIPI At the end INIT is left pending in pending_events. The following patch fixes this by latching pending event before processing them. Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
The x86-64 extended low-byte registers were fetched correctly from reg, but not from mod/rm. This fixes another bug in the boot of RHEL5.9 64-bit, but it is still not enough. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
This is encountered when booting RHEL5.9 64-bit. There is another bug after this one that is not a simple emulation failure, but this one lets the boot proceed a bit. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 01 6月, 2013 1 次提交
-
-
由 Yinghai Lu 提交于
Commit 8d57470d x86, mm: setup page table in top-down causes a kernel panic while setting mem=2G. [mem 0x00000000-0x000fffff] page 4k [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G [mem 0x00100000-0x001fffff] page 4k [mem 0x00200000-0x7bffffff] page 2M for last entry is not what we want, we should have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 1G Actually we merge the continuous ranges with same page size too early. in this case, before merging we have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 2M after merging them, will get [mem 0x00200000-0x7bffffff] page 2M even we can use 1G page to map [mem 0x40000000-0x7bffffff] that will cause problem, because we already map [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already. During phys_pud_init() for [0x40000000-0x7bffffff], it will not reuse existing that pud page, and allocate new one then try to use 2M page to map it instead, as page_size_mask does not include PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop in phys_pmd_init stop mapping at 0x7bffffff. That is right behavoir, it maps exact range with exact page size that we ask, and we should explicitly call it to map [7c000000-0x7fffffff] before or after mapping 0x40000000-0x7bffffff. Anyway we need to make sure ranges' page_size_mask correct and consistent after split_mem_range for each range. Fix that by calling adjust_range_size_mask before merging range with same page size. -v2: update change log. -v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and it causes panic. Bisected-by: N"Xie, ChanglongX" <changlongx.xie@intel.com> Bisected-by: NYuanhan Liu <yuanhan.liu@linux.intel.com> Reported-and-tested-by: NYuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 31 5月, 2013 2 次提交
-
-
由 Pekka Riikonen 提交于
With the addition of eagerfpu the irq_fpu_usable() now returns false negatives especially in the case of ksoftirqd and interrupted idle task, two common cases for FPU use for example in networking/crypto. With eagerfpu=off FPU use is possible in those contexts. This is because of the eagerfpu check in interrupted_kernel_fpu_idle(): ... * For now, with eagerfpu we will return interrupted kernel FPU * state as not-idle. TBD: Ideally we can change the return value * to something like __thread_has_fpu(current). But we need to * be careful of doing __thread_clear_has_fpu() before saving * the FPU etc for supporting nested uses etc. For now, take * the simple route! ... if (use_eager_fpu()) return 0; As eagerfpu is automatically "on" on those CPUs that also have the features like AES-NI this patch changes the eagerfpu check to return 1 in case the kernel_fpu_begin() has not been said yet. Once it has been the __thread_has_fpu() will start returning 0. Notice that with eagerfpu the __thread_has_fpu is always true initially. FPU use is thus always possible no matter what task is under us, unless the state has already been saved with kernel_fpu_begin(). [ hpa: this is a performance regression, not a correctness regression, but since it can be quite serious on CPUs which need encryption at interrupt time I am marking this for urgent/stable. ] Signed-off-by: NPekka Riikonen <priikone@iki.fi> Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org Cc: <stable@vger.kernel.org> v3.7+ Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jan Beulich 提交于
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: NJan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 29 5月, 2013 2 次提交
-
-
由 Stefan Bader 提交于
Commit f447d56d introduced the implementation of the PV apic ipi interface. But there were some odd things (it seems none of which cause really any issue but maybe they should be cleaned up anyway): - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself) ignore the passed in vector and only use the CALL_FUNCTION_SINGLE vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector. - physflat_send_IPI_allbutself is declared unnecessarily. It is never used. This patch tries to clean up those things. Signed-off-by: NStefan Bader <stefan.bader@canonical.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Zhang Yanfei 提交于
In head_64.S, a switchover has been used to handle kernel crossing 1G, 512G boundaries. And commit 8170e6be x86, 64bit: Use a #PF handler to materialize early mappings on demand said: During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. But from the switchover code, when we set up the PUD table: 114 addq $4096, %rdx 115 movq %rdi, %rax 116 shrq $PUD_SHIFT, %rax 117 andl $(PTRS_PER_PUD-1), %eax 118 movq %rdx, (4096+0)(%rbx,%rax,8) 119 movq %rdx, (4096+8)(%rbx,%rax,8) It seems line 119 has a potential bug there. For example, if the kernel is loaded at physical address 511G+1008M, that is 000000000 111111111 111111000 000000000000000000000 and the kernel _end is 512G+2M, that is 000000001 000000000 000000001 000000000000000000000 So in this example, when using the 2nd page to setup PUD (line 114~119), rax is 511. In line 118, we put rdx which is the address of the PMD page (the 3rd page) into entry 511 of the PUD table. But in line 119, the entry we calculate from (4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line 119 should be wraparound into entry 0 of the PUD table. The patch fixes the bug. Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.comSigned-off-by: NYinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 28 5月, 2013 1 次提交
-
-
由 Jussi Kivilinna 提交于
The _XFER stack element size was set too small, 8 bytes, when it needs to be 16 bytes. As _XFER is the last stack element used by these implementations, the 16 byte stores with 'movdqa' corrupt the stack where the value of register %r12 is temporarily stored. As these implementations align the stack pointer to 16 bytes, this corruption did not happen every time. Patch corrects this issue. Reported-by: NJulian Wollrath <jwollrath@web.de> Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi> Tested-by: NJulian Wollrath <jwollrath@web.de> Acked-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 21 5月, 2013 2 次提交
-
-
由 Linus Torvalds 提交于
In commit 78d77df7 ("x86-64, init: Do not set NX bits on non-NX capable hardware") we added the early_pmd_flags that gets the NX bit set when a CPU supports NX. However, the new variable was marked __initdata, because the main _use_ of this is in an __init routine. However, the bit setting happens from secondary_startup_64(), which is called not only at bootup, but on every secondary CPU start. Including resuming from STR and at CPU hotplug time. So the value cannot be __initdata. Reported-bisected-and-tested-by: NMichal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org # v3.9 Acked-by: NPeter Anvin <hpa@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Bjorn Helgaas 提交于
This reverts commit dd72be99. Andy Shevchenko <andy.shevchenko@gmail.com> reported that this commit broke Intel Medfield devices. Reference: https://lkml.kernel.org/r/CAHp75Vdf6gFZChS47=grUygHBDWcoOWDYPzw+Zj5bdVCWj85Jw@mail.gmail.comReported-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 15 5月, 2013 1 次提交
-
-
由 John Stultz 提交于
Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which enables some minor compile time optimization to avoid uncessary code in mostly the suspend/resume path could cause problems for userland. In particular, the dependency for RTC_HCTOSYS on !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time twice and simplifies suspend/resume, has the side effect of causing the /sys/class/rtc/rtcN/hctosys flag to always be zero, and this flag is commonly used by udev to setup the /dev/rtc symlink to /dev/rtcN, which can cause pain for older applications. While the udev rules could use some work to be less fragile, breaking userland should strongly be avoided. Additionally the compile time optimizations are fairly minor, and the code being optimized is likely to be reworked in the future, so lets revert this change. Reported-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Cc: stable <stable@vger.kernel.org> #3.9 Cc: Feng Tang <feng.tang@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 14 5月, 2013 1 次提交
-
-
由 Lee, Chun-Yi 提交于
That will be better initial the value of DataSize to zero for the input of GetVariable(), otherwise we will feed a random value. The debug log of input DataSize like this: ... [ 195.915612] EFI Variables Facility v0.08 2004-May-17 [ 195.915819] efi: size: 18446744071581821342 [ 195.915969] efi: size': 18446744071581821342 [ 195.916324] efi: size: 18446612150714306560 [ 195.916632] efi: size': 18446612150714306560 [ 195.917159] efi: size: 18446612150714306560 [ 195.917453] efi: size': 18446612150714306560 ... The size' is value that was returned by BIOS. After applied this patch: [ 82.442042] EFI Variables Facility v0.08 2004-May-17 [ 82.442202] efi: size: 0 [ 82.442360] efi: size': 1039 [ 82.443828] efi: size: 0 [ 82.444127] efi: size': 2616 [ 82.447057] efi: size: 0 [ 82.447356] efi: size': 5832 ... Found on Acer Aspire V3 BIOS, it will not return the size of data if we input a non-zero DataSize. Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: NLee, Chun-Yi <jlee@suse.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
-
- 10 5月, 2013 4 次提交
-
-
由 Bjorn Helgaas 提交于
We now cache the MSI-X capability offset in the struct pci_dev, so no need to find the capability again. Acked-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Bjorn Helgaas 提交于
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Acked-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Zhang Yanfei 提交于
Two sets of comments were lost during patch-series shuffling: - comments for init_range_memory_mapping() - comments in init_mem_mapping that is helpful for reminding people that the pagetable is setup top-down The comments were written by Yinghai in his patch in: https://lkml.org/lkml/2012/11/28/620 This patch reintroduces them. Originally-From: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/518BC776.7010506@gmail.com [ Tidied it all up a bit. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 5月, 2013 5 次提交
-
-
由 Paolo Bonzini 提交于
This is an almost-undocumented instruction available in 32-bit mode. I say "almost" undocumented because AMD documents it in their opcode maps just to say that it is unavailable in 64-bit mode (sections "A.2.1 One-Byte Opcodes" and "B.3 Invalid and Reassigned Instructions in 64-Bit Mode"). It is roughly equivalent to "sbb %al, %al" except it does not set the flags. Use fastop to emulate it, but do not use the opcode directly because it would fail if the host is 64-bit! Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1. It is just a MOV in disguise, with a funny source address. Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
This is used by SGABIOS, KVM breaks with emulate_invalid_guest_state=1. AAM needs the source operand to be unsigned; do the same in AAD as well for consistency, even though it does not affect the result. Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Konrad Rzeszutek Wilk 提交于
This can easily be triggered if a new CPU is added (via ACPI hotplug mechanism) and from user-space you do: echo 1 > /sys/devices/system/cpu/cpu3/online (or wait for UDEV to do it) on a newly appeared physical CPU. The deadlock is that the "store_online" in drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early" which also takes the cpu_hotplug_driver_lock() lock. And here is that lockdep thinks of it: smpboot: Stack at about ffff880075c39f44 smpboot: CPU3: has booted. microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25 ============================================= [ INFO: possible recursive locking detected ] 3.9.0upstream-10129-g167af0e #1 Not tainted --------------------------------------------- sh/2487 is trying to acquire lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 but task is already holding lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(x86_cpu_hotplug_driver_mutex); lock(x86_cpu_hotplug_driver_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by sh/2487: #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190 #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160 #2: (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160 #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20 #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60 Suggested-and-Acked-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: fenghua.yu@intel.com Cc: xen-devel@lists.xensource.com Cc: stable@vger.kernel.org # for v3.9 Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Gleb Natapov 提交于
The invalid guest state emulation loop does not check halt_request which causes 100% cpu loop while guest is in halt and in invalid state, but more serious issue is that this leaves halt_request set, so random instruction emulated by vm86 #GP exit can be interpreted as halt which causes guest hang. Fix both problems by handling halt_request in emulation loop. Reported-by: NTomas Papan <tomas.papan@gmail.com> Tested-by: NTomas Papan <tomas.papan@gmail.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> CC: stable@vger.kernel.org Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 08 5月, 2013 4 次提交
-
-
由 Zhenzhong Duan 提交于
On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below. SysRq : Show backtrace of all active CPUs BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120 Call Trace: [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160 [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20 [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0 [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50 [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10 [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140 [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140 [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60 [<ffffffff811ca996>] proc_reg_write+0x86/0xc0 [<ffffffff8116dd8e>] vfs_write+0xce/0x190 [<ffffffff8116e3e5>] sys_write+0x55/0x90 [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b That's because apic points to apic_x2apic_cluster or apic_x2apic_phys but the basic element like cpumask isn't initialized. Mask x2APIC feature in pvm to avoid overwrite of apic pointer, update commit message per Konrad's suggestion. Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com> Tested-by: NTamon Shiose <tamon.shiose@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
During review of git commit cb9c6f15 ("xen/spinlock: Check against default value of -1 for IRQ line.") Stefano pointed out a bug in the patch. Unfortunatly due to vacation timing the fix was not applied and this patch fixes it up. Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
As it will point to some data, but not event channel data (the shared_info has an array limited to 32). This means that for PVHVM guests with more than 32 VCPUs without the usage of VCPUOP_register_info any interrupts to VCPUs larger than 32 would have gone unnoticed during early bootup. That is OK, as during early bootup, in smp_init we end up calling the hotplug mechanism (xen_hvm_cpu_notify) which makes the VCPUOP_register_vcpu_info call for all VCPUs and we can receive interrupts on VCPUs 33 and further. This is just a cleanup. Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Marcelo Tosatti 提交于
Emulation of xcr0 writes zero guest_xcr0_loaded variable so that subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value. However, this is incorrect because guest_xcr0_loaded variable is read to decide whether to reload hosts xcr0. In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0 assignment, and scheduler decides to preload FPU: switch_to { __switch_to __math_state_restore restore_fpu_checking fpu_restore_checking if (use_xsave()) fpu_xrstor_checking xrstor64 with CPU's xcr0 == guests xcr0 Fix by properly restoring hosts xcr0 during emulation of xcr0 writes. Analyzed-by: NUlrich Obergfell <uobergfe@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 07 5月, 2013 3 次提交
-
-
由 Thomas Gleixner 提交于
The core code expects the arch idle code to return with interrupts enabled. The conversion missed two x86 cases which fail to do that. Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Tested-by: NBorislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305021557030.3972@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Michel Lespinasse 提交于
modify __down_write[_nested] and __down_write_trylock to grab the write lock whenever the active count is 0, even if there are queued waiters (they must be writers pending wakeup, since the active count is 0). Note that this is an optimization only; architectures without this optimization will still work fine: - __down_write() would take the slow path which would take the wait_lock and then try stealing the lock (as in the spinlocked rwsem implementation) - __down_write_trylock() would fail, but callers must be ready to deal with that - since there are some writers pending wakeup, they could have raced with us and obtained the lock before we steal it. Signed-off-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NPeter Hurley <peter@hurleysoftware.com> Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konrad Rzeszutek Wilk 提交于
They are important structures and it is not clear at first look what they are for. The xen_vcpu is a pointer. By default it points to the shared_info structure (at the CPU offset location). However if the VCPUOP_register_vcpu_info hypercall is implemented we can make the xen_vcpu pointer point to a per-CPU location. Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Added comments from Ian Campbell] Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 06 5月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
If a user did: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online we would (this a build with DEBUG enabled) get to: smpboot: ++++++++++++++++++++=_---CPU UP 1 .. snip.. smpboot: Stack at about ffff880074c0ff44 smpboot: CPU1: has booted. and hang. The RCU mechanism would kick in an try to IPI the CPU1 but the IPIs (and all other interrupts) would never arrive at the CPU1. At first glance at least. A bit digging in the hypervisor trace shows that (using xenanalyze): [vla] d4v1 vec 243 injecting 0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043163639 --|x d4v1 vmentry cycles 1468 ] 0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043164913 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting 0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043165526 --|x d4v1 vmentry cycles 1472 ] 0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043166800 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting there is a pending event (subsequent debugging shows it is the IPI from the VCPU0 when smpboot.c on VCPU1 has done "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is interrupted with the callback IPI (0xf3 aka 243) which ends up calling __xen_evtchn_do_upcall. The __xen_evtchn_do_upcall seems to do *something* but not acknowledge the pending events. And the moment the guest does a 'cli' (that is the ffffffff81673254 in the log above) the hypervisor is invoked again to inject the IPI (0xf3) to tell the guest it has pending interrupts. This repeats itself forever. The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup we set each per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info to register per-CPU structures (xen_vcpu_setup). This is used to allow events for more than 32 VCPUs and for performance optimizations reasons. When the user performs the VCPU hotplug we end up calling the the xen_vcpu_setup once more. We make the hypercall which returns -EINVAL as it does not allow multiple registration calls (and already has re-assigned where the events are being set). We pick the fallback case and set per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] (which is a good fallback during bootup). However the hypervisor is still setting events in the register per-cpu structure (per_cpu(xen_vcpu_info, cpu)). As such when the events are set by the hypervisor (such as timer one), and when we iterate in __xen_evtchn_do_upcall we end up reading stale events from the shared_info->vcpu_info[vcpu] instead of the per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the events that the hypervisor has set and the hypervisor keeps on reminding us to ack the events which we never do. The fix is simple. Don't on the second time when xen_vcpu_setup is called over-write the per_cpu(xen_vcpu, cpu) if it points to per_cpu(xen_vcpu_info). Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com> CC: stable@vger.kernel.org Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 05 5月, 2013 1 次提交
-
-
由 Peter Zijlstra 提交于
We should always have proper privileges when requesting kernel data. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.230745028@chello.nl [ Fix build error reported by fengguang.wu@intel.com, propagate error code back. ] Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-v0x9ky3ahzr6nm3c6ilwrili@git.kernel.org
-
- 04 5月, 2013 2 次提交
-
-
由 Peter Zijlstra 提交于
The LBR 'from' adddress is under full userspace control; ensure we validate it before reading from it. Note: is_module_text_address() can potentially be quite expensive; for those running into that with high overhead in modules optimize it using an RCU backed rb-tree. Reported-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.158211806@chello.nlSigned-off-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-mk8i82ffzax01cnqo829iy1q@git.kernel.org
-
由 Peter Zijlstra 提交于
Errata BV98 states that all MEM_*_RETIRED events corrupt the counter value of the SMT sibling's counters. Blacklist these events Reported-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/20130503121256.083340271@chello.nlSigned-off-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/n/tip-jwra43mujrv1oq9xk6mfe57v@git.kernel.org
-
- 03 5月, 2013 1 次提交
-
-
由 Jan Kiszka 提交于
With VMX, enable_irq_window can now return -EBUSY, in which case an immediate exit shall be requested before entering the guest. Account for this also in enable_nmi_window which uses enable_irq_window in absence of vnmi support, e.g. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-