- 09 6月, 2017 1 次提交
-
-
由 Bilal Amarni 提交于
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for several 64-bit architectures : mips, parisc, tile. At the moment and for those architectures, calling in 32-bit userspace the keyctl syscall would return an ENOSYS error. This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to make sure the compatibility wrapper is registered by default for any 64-bit architecture as long as it is configured with CONFIG_COMPAT. [DH: Modified to remove arm64 compat enablement also as requested by Eric Biggers] Signed-off-by: NBilal Amarni <bilal.amarni@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> cc: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 07 6月, 2017 10 次提交
-
-
由 Pavel Tatashin 提交于
The old method that is using xcall and softint to get new context id is deleted, as it is replaced by a method of using per_cpu_secondary_mm without xcall to perform the context wrap. Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Tatashin 提交于
The current wrap implementation has a race issue: it is called outside of the ctx_alloc_lock, and also does not wait for all CPUs to complete the wrap. This means that a thread can get a new context with a new version and another thread might still be running with the same context. The problem is especially severe on CPUs with shared TLBs, like sun4v. I used the following test to very quickly reproduce the problem: - start over 8K processes (must be more than context IDs) - write and read values at a memory location in every process. Very quickly memory corruptions start happening, and what we read back does not equal what we wrote. Several approaches were explored before settling on this one: Approach 1: Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for every process to complete the wrap. (Note: every CPU must WAIT before leaving smp_new_mmu_context_version_client() until every one arrives). This approach ends up with deadlocks, as some threads own locks which other threads are waiting for, and they never receive softint until these threads exit smp_new_mmu_context_version_client(). Since we do not allow the exit, deadlock happens. Approach 2: Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into into C code, and issue new versions to every CPU. This approach adds some overhead to runtime: in switch_mm() we must add some checks to make sure that versions have not changed due to wrap while we were loading the new secondary context. (could be protected by PSTATE_IE but that degrades performance as on M7 and older CPUs as it takes 50 cycles for each access). Also, we still need a global per-cpu array of MMs to know where we need to load new contexts, otherwise we can change context to a thread that is going way (if we received mondo between switch_mm() and switch_to() time). Finally, there are some issues with window registers in rtrap() when context IDs are changed during CPU mondo time. The approach in this patch is the simplest and has almost no impact on runtime. We use the array with mm's where last secondary contexts were loaded onto CPUs and bump their versions to the new generation without changing context IDs. If a new process comes in to get a context ID, it will go through get_new_mmu_context() because of version mismatch. But the running processes do not need to be interrupted. And wrap is quicker as we do not need to xcall and wait for everyone to receive and complete wrap. Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Tatashin 提交于
The new wrap is going to use information from this array to figure out mm's that currently have valid secondary contexts setup. Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Tatashin 提交于
CTX_FIRST_VERSION defines the first context version, but also it defines first context. This patch redefines it to only include the first context version. Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Tatashin 提交于
The only difference between these two functions is that in activate_mm we unconditionally flush context. However, there is no need to keep this difference after fixing a bug where cpumask was not reset on a wrap. So, in this patch we combine these. Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Tatashin 提交于
After a wrap (getting a new context version) a process must get a new context id, which means that we would need to flush the context id from the TLB before running for the first time with this ID on every CPU. But, we use mm_cpumask to determine if this process has been running on this CPU before, and this mask is not reset after a wrap. So, there are two possible fixes for this issue: 1. Clear mm cpumask whenever mm gets a new context id 2. Unconditionally flush context every time process is running on a CPU This patch implements the first solution Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NSteven Sistare <steven.sistare@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Liam R. Howlett 提交于
hugetlb_bad_size needs to be called on invalid values. Also change the pr_warn to a pr_err to better align with other platforms. Signed-off-by: NLiam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 James Clarke 提交于
VIO devices were being looked up by their index in the machine description node block, but this often varies over time as devices are added and removed. Instead, store the ID and look up using the type, config handle and ID. Signed-off-by: NJames Clarke <jrtc27@jrtc27.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mike Kravetz 提交于
When a TSB grows beyond its current capacity, a new TSB is allocated and copy_tsb is called to copy entries from the old TSB to the new. A hash shift based on page size is used to calculate the index of an entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT should be used. As a result, when copy_tsb is called for a huge page TSB the entries are placed at the incorrect index in the newly allocated TSB. When doing hardware table walk, the MMU does not match these entries and we end up in the TSB miss handling code. This code will then create and write an entry to the correct index in the TSB. We take a performance hit for the table walk miss and recreation of these entries. Pass a new parameter to copy_tsb that is the page size shift to be used when copying the TSB. Suggested-by: NAnthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jane Chu 提交于
Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info() only allocates a single page for NR_CPUS mondo entries. Thus we cannot use all 4096 CPUs on some SPARC platforms. To fix, allocate (2^order) pages where order is set according to the size of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa are not used in asm code, there are no imm13 offsets from the base PA that will break because they can only reach one page. Orabug: 25505750 Signed-off-by: NJane Chu <jane.chu@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NAtish Patra <atish.patra@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 6月, 2017 1 次提交
-
-
由 David S. Miller 提交于
Reported-by: NWaldemar Brodkorb <wbx@openadk.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 6月, 2017 2 次提交
-
-
由 Ard Biesheuvel 提交于
As reported by Patrice, the header layout of the decompressor is incorrect when building for v7-M. In this case, the __nop macro resolves to 'mov r0, r0', which is emitted as a narrow encoding, resulting in the header data fields to end up at lower offsets than required. Given the variety of targets we need to support with the same code, the startup sequence is a bit of a jumble, and uses instructions and macros whose encoding widths cannot be specified (badr), or only exist in a narrow encoding (bx) So force the use of a wide encoding in __nop, and replace the start sequence with a simple jump to the label marking the start of code, preceded by a Thumb2 mode switch if required (using explicit wide encodings where appropriate). The label itself can be moved to the start of code [where it belongs] due to the larger range of branch instructions as compared to adr instructions. Reported-by: NPatrice CHOTARD <patrice.chotard@st.com> Acked-by: NNicolas Pitre <nico@linaro.org> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
-
由 Vladimir Murzin 提交于
NOMMU build leads to the following error: CC drivers/pci/mmap.o drivers/pci/mmap.c: In function 'pci_mmap_resource_range': drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration] vma->vm_page_prot = pgprot_device(vma->vm_page_prot); ^ cc1: some warnings being treated as errors scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed make[2]: *** [drivers/pci/mmap.o] Error 1 scripts/Makefile.build:561: recipe for target 'drivers/pci' failed make[1]: *** [drivers/pci] Error 2 Makefile:1016: recipe for target 'drivers' failed make: *** [drivers] Error 2 Fix it with support of pgprot_device() macro for NOMMU. Fixes: 00d2904f ("ARM/PCI: Use generic pci_mmap_resource_range()") Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
-
- 03 6月, 2017 1 次提交
-
-
由 Matthias Kaehlcke 提交于
Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp") removed a section specification from the jiffies declaration that caused conflicts on some platforms. Unfortunately this change broke the build for frv: kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1 ... Add __jiffy_arch_data to the declaration of jiffies and use it on frv to include the section specification. For all other platforms __jiffy_arch_data (currently) has no effect. Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp") Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.orgSigned-off-by: NMatthias Kaehlcke <mka@chromium.org> Reported-by: NGuenter Roeck <linux@roeck-us.net> Tested-by: NGuenter Roeck <linux@roeck-us.net> Reviewed-by: NDavid Howells <dhowells@redhat.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 6月, 2017 3 次提交
-
-
由 Lorenzo Pieralisi 提交于
The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes muster from an ACPI specification standpoint. Current macro detects the MADT GICC entry length through ACPI firmware version (it changed from 76 to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification) but always uses (erroneously) the ACPICA (latest) struct (ie struct acpi_madt_generic_interrupt - that is 80-bytes long) length to check if the current GICC entry memory record exceeds the MADT table end in memory as defined by the MADT table header itself, which may result in false negatives depending on the ACPI firmware version and how the MADT entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC entries are 76 bytes long, so by adding 80 to a GICC entry start address in memory the resulting address may well be past the actual MADT end, triggering a false negative). Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks and update them to always use the firmware version specific MADT GICC entry length in order to carry out boundary checks. Fixes: b6cfb277 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro") Reported-by: NJulien Grall <julien.grall@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Al Stone <ahs3@redhat.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Leonard Crestez 提交于
Right now mach-imx6ul registers a fixup for the ksz8081 phy. The same register values can be set through the micrel phy driver by using dts properties. This seems preferable and allows cleanly fixing suspend/resume. Signed-off-by: NLeonard Crestez <leonard.crestez@nxp.com> Reviewed-by: NFabio Estevam <fabio.estevam@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
arch/sparc/kernel/ds.c: In function ‘register_services’: arch/sparc/kernel/ds.c:912:3: error: ‘strcpy’: writing at least 1 byte into a region of size 0 overflows the destination Reported-by: NAnatoly Pugachev <matorola@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2017 3 次提交
-
-
由 Ingo Molnar 提交于
This reverts commit cbed27cd. As Andy Lutomirski observed: "I think this patch is bogus. pat_enabled() sure looks like it's supposed to return true if PAT is *enabled*, and these days PAT is 'enabled' even if there's no HW PAT support." Reported-by: NBernhard Held <berny156@gmx.de> Reported-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: stable@vger.kernel.org # v4.2+ Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 ZhuangYanying 提交于
When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads, other than the lock-holding one, would enter into S state because of pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could not be injected into vm. The reason is: 1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile. 2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because cpu->kvm_vcpu_dirty is true. It's not enough just to check nmi_queued to decide whether to stay in vcpu_block() or not. NMI should be injected immediately at any situation. Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued in vm_vcpu_has_events(). Do the same change for SMIs. Signed-off-by: NZhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Roman Pen 提交于
This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt was taken on userspace stack. The root cause lies in the specific AMD CPU behaviour which manifests itself as unusable segment attributes on SYSRET. The corresponding work around for the kernel is the following: 61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") In other turn virtualization side treated unusable segment incorrectly and restored CPL from SS attributes, which were zeroed out few lines above. In current patch it is assured only that P bit is cleared in VMCB.save state and segment attributes are not zeroed out if segment is not presented or is unusable, therefore CPL can be safely restored from DPL field. This is only one part of the fix, since QEMU side should be fixed accordingly not to zero out attributes on its side. Corresponding patch will follow. [1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com Signed-off-by: NRoman Pen <roman.penyaev@profitbricks.com> Signed-off-by: NMikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 5月, 2017 3 次提交
-
-
由 Gioh Kim 提交于
Commit 19bca6ab ("KVM: SVM: Fix cross vendor migration issue with unusable bit") added checking type when setting unusable. So unusable can be set if present is 0 OR type is 0. According to the AMD processor manual, long mode ignores the type value in segment descriptor. And type can be 0 if it is read-only data segment. Therefore type value is not related to unusable flag. This patch is based on linux-next v4.12.0-rc3. Signed-off-by: NGioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
kvm_skip_emulated_instruction() will return 0 if userspace is single-stepping the guest. kvm_skip_emulated_instruction() uses return status convention of exit handler: 0 means "exit to userspace" and 1 means "continue vm entries". The problem is that nested_vmx_check_vmptr() return status means something else: 0 is ok, 1 is error. This means we would continue executing after a failure. Static checker noticed it because vmptr was not initialized. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Fixes: 6affcbed ("KVM: x86: Add kvm_skip_emulated_instruction and use it.") Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vegard Nossum 提交于
This fixes a regression in commit 4d6501dc where I didn't notice that MIPS and OpenRISC were reinitialising p->{set,clear}_child_tid to NULL after our initialisation in copy_process(). We can simply get rid of the arch-specific initialisation here since it is now always done in copy_process() before hitting copy_thread{,_tls}(). Review notes: - As far as I can tell, copy_process() is the only user of copy_thread_tls(), which is the only caller of copy_thread() for architectures that don't implement copy_thread_tls(). - After this patch, there is no arch-specific code touching p->set_child_tid or p->clear_child_tid whatsoever. - It may look like MIPS/OpenRISC wanted to always have these fields be NULL, but that's not true, as copy_process() would unconditionally set them again _after_ calling copy_thread_tls() before commit 4d6501dc. Fixes: 4d6501dc ("kthread: Fix use-after-free if kthread fork fails") Reported-by: NGuenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> # MIPS only Acked-by: NStafford Horne <shorne@gmail.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: openrisc@lists.librecores.org Cc: Jamie Iles <jamie.iles@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 5月, 2017 2 次提交
-
-
由 Borislav Petkov 提交于
... to raw_smp_processor_id() to not trip the BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 check. The reasoning behind it is that __warn() already uses the raw_ variants but the show_regs() path on 32-bit doesn't. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528092212.fiod7kygpjm23m3o@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 5月, 2017 3 次提交
-
-
由 Baoquan He 提交于
For EFI with the 'efi=old_map' kernel option specified, the kernel will panic when KASLR is enabled: BUG: unable to handle kernel paging request at 000000007febd57e IP: 0x7febd57e PGD 1025a067 PUD 0 Oops: 0010 [#1] SMP Call Trace: efi_enter_virtual_mode() start_kernel() x86_64_start_reservations() x86_64_start_kernel() start_cpu() The root cause is that the identity mapping is not built correctly in the 'efi=old_map' case. On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE aligned. We can borrow the PUD table from the direct mappings safely. Given a physical address X, we have pud_index(X) == pud_index(__va(X)). However, on KASLR kernels, PAGE_OFFSET is PUD_SIZE aligned. For a given physical address X, pud_index(X) != pud_index(__va(X)). We can't just copy the PGD entry from direct mapping to build identity mapping, instead we need to copy the PUD entries one by one from the direct mapping. Fix it. Signed-off-by: NBaoquan He <bhe@redhat.com> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Frank Ramsay <frank.ramsay@hpe.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <rja@sgi.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170526113652.21339-5-matt@codeblueprint.co.uk [ Fixed and reworded the changelog and code comments to be more readable. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sai Praneeth 提交于
Booting kexec kernel with "efi=old_map" in kernel command line hits kernel panic as shown below. BUG: unable to handle kernel paging request at ffff88007fe78070 IP: virt_efi_set_variable.part.7+0x63/0x1b0 PGD 7ea28067 PUD 7ea2b067 PMD 7ea2d067 PTE 0 [...] Call Trace: virt_efi_set_variable() efi_delete_dummy_variable() efi_enter_virtual_mode() start_kernel() x86_64_start_reservations() x86_64_start_kernel() start_cpu() [ efi=old_map was never intended to work with kexec. The problem with using efi=old_map is that the virtual addresses are assigned from the memory region used by other kernel mappings; vmalloc() space. Potentially there could be collisions when booting kexec if something else is mapped at the virtual address we allocated for runtime service regions in the initial boot - Matt Fleming ] Since kexec was never intended to work with efi=old_map, disable runtime services in kexec if booted with efi=old_map, so that we don't panic. Tested-by: NLee Chun-Yi <jlee@suse.com> Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Acked-by: NDave Young <dyoung@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170526113652.21339-4-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juergen Gross 提交于
When booted as Xen dom0 there won't be an EFI memmap allocated. Avoid issuing an error message in this case: [ 0.144079] efi: Failed to allocate new EFI memmap Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170526113652.21339-2-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 5月, 2017 5 次提交
-
-
由 Thomas Gleixner 提交于
ftrace use module_alloc() to allocate trampoline pages. The mapping of module_alloc() is RWX, which makes sense as the memory is written to right after allocation. But nothing makes these pages RO after writing to them. Add proper set_memory_rw/ro() calls to protect the trampolines after modification. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
With function tracing starting in early bootup and having its trampoline pages being read only, a bug triggered with the following: kernel BUG at arch/x86/mm/pageattr.c:189! invalid opcode: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 task: ffffffffb4222500 task.stack: ffffffffb4200000 RIP: 0010:change_page_attr_set_clr+0x269/0x302 RSP: 0000:ffffffffb4203c88 EFLAGS: 00010046 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 00000001b6000000 RDX: ffffffffb4203d40 RSI: 0000000000000000 RDI: ffffffffb4240d60 RBP: ffffffffb4203d18 R08: 00000001b6000000 R09: 0000000000000001 R10: ffffffffb4203aa8 R11: 0000000000000003 R12: ffffffffc029b000 R13: ffffffffb4203d40 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9a639ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff9a636b384000 CR3: 00000001ea21d000 CR4: 00000000000406b0 Call Trace: change_page_attr_clear+0x1f/0x21 set_memory_ro+0x1e/0x20 arch_ftrace_update_trampoline+0x207/0x21c ? ftrace_caller+0x64/0x64 ? 0xffffffffc029b000 ftrace_startup+0xf4/0x198 register_ftrace_function+0x26/0x3c function_trace_init+0x5e/0x73 tracer_init+0x1e/0x23 tracing_set_tracer+0x127/0x15a register_tracer+0x19b/0x1bc init_function_trace+0x90/0x92 early_trace_init+0x236/0x2b3 start_kernel+0x200/0x3f5 x86_64_start_reservations+0x29/0x2b x86_64_start_kernel+0x17c/0x18f secondary_startup_64+0x9f/0x9f ? secondary_startup_64+0x9f/0x9f Interrupts should not be enabled at this early in the boot process. It is also fine to leave interrupts enabled during this time as there's only one CPU running, and on_each_cpu() means to only run on the current CPU. If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with interrupts disabled. Don't trigger a BUG_ON() in that case. Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.homeSuggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Fix kprobes to set(recover) RWX bits correctly on trampoline buffer before releasing it. Releasing readonly page to module_memfree() crash the kernel. Without this fix, if kprobes user register a bunch of kprobes in function body (since kprobes on function entry usually use ftrace) and unregister it, kernel hits a BUG and crash. Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devboxSigned-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Fixes: d0381c81 ("kprobes/x86: Set kprobes pages read-only") Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Jane Chu 提交于
SPARC M6-32 platform has (2^5) NUMA nodes, so need to bump up the CONFIG_NODES_SHIFT to 5. Orabug: 25577754 Signed-off-by: NJane Chu <jane.chu@oracle.com> Reviewed-by: NBob Picco <bob.picco@oracle.com> Reviewed-by: NAtish Patra <atish.patra@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jan H. Schönherr 提交于
Intel SDM says, that at most one LAPIC should be configured with ExtINT delivery. KVM configures all LAPICs this way. This causes pic_unlock() to kick the first available vCPU from the internal KVM data structures. If this vCPU is not the BSP, but some not-yet-booted AP, the BSP may never realize that there is an interrupt. Fix that by enabling ExtINT delivery only for the BSP. This allows booting a Linux guest without a TSC in the above situation. Otherwise the BSP gets stuck in calibrate_delay_converge(). Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 5月, 2017 3 次提交
-
-
由 Jan H. Schönherr 提交于
The decision whether or not to exit from L2 to L1 on an lmsw instruction is based on bogus values: instead of using the information encoded within the exit qualification, it uses the data also used for the mov-to-cr instruction, which boils down to using whatever is in %eax at that point. Use the correct values instead. Without this fix, an L1 may not get notified when a 32-bit Linux L2 switches its secondary CPUs to protected mode; the L1 is only notified on the next modification of CR0. This short time window poses a problem, when there is some other reason to exit to L1 in between. Then, L2 will be resumed in real mode and chaos ensues. Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Preemption can occur during cancel preemption timer, and there will be inconsistent status in lapic, vmx and vmcs field. CPU0 CPU1 preemption timer vmexit handle_preemption_timer(vCPU0) kvm_lapic_expired_hv_timer vmx_cancel_hv_timer vmx->hv_deadline_tsc = -1 vmcs_clear_bits /* hv_timer_in_use still true */ sched_out sched_in kvm_arch_vcpu_load vmx_set_hv_timer write vmx->hv_deadline_tsc vmcs_set_bits /* back in kvm_lapic_expired_hv_timer */ hv_timer_in_use = false ... vmx_vcpu_run vmx_arm_hv_run write preemption timer deadline spurious preemption timer vmexit handle_preemption_timer(vCPU0) kvm_lapic_expired_hv_timer WARN_ON(!apic->lapic_timer.hv_timer_in_use); This can be reproduced sporadically during boot of L2 on a preemptible L1, causing a splat on L1. WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm] CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm] Call Trace: handle_preemption_timer+0xe/0x20 [kvm_intel] vmx_handle_exit+0xc9/0x15f0 [kvm_intel] ? lock_acquire+0xdb/0x250 ? lock_acquire+0xdb/0x250 ? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm] kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm] kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? __fget+0xf3/0x210 do_vfs_ioctl+0xa4/0x700 ? __fget+0x114/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This patch fixes it by disabling preemption while cancelling preemption timer. This way cancel_hv_timer is atomic with respect to kvm_arch_vcpu_load. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Kiszka 提交于
This ensures that adjustments to x86_platform done by the hypervisor setup is already respected by this simple calibration. The current user of this, introduced by 1b5aeebf ("x86/earlyprintk: Add support for earlyprintk via USB3 debug port"), comes much later into play. Fixes: dd759d93 ("x86/timers: Add simple udelay calibration") Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLu Baolu <baolu.lu@linux.intel.com> Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com
-
- 25 5月, 2017 3 次提交
-
-
由 Timmy Li 提交于
Commit 093d24a2 ("arm64: PCI: Manage controller-specific data on per-controller basis") added code to allocate ACPI PCI root_ops dynamically on a per host bridge basis but failed to update the corresponding memory allocation failure path in pci_acpi_scan_root() leading to a potential memory leakage. Fix it by adding the required kfree call. Fixes: 093d24a2 ("arm64: PCI: Manage controller-specific data on per-controller basis") Reviewed-by: NTomasz Nowicki <tn@semihalf.com> Signed-off-by: NTimmy Li <lixiaoping3@huawei.com> [lorenzo.pieralisi@arm.com: refactored code, rewrote commit log] Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Nicholas Piggin 提交于
Providing "scv" support to userspace requires kernel support, so it must be advertised as independently to the base ISA 3 instruction set. The darn instruction relies on firmware enablement, so it has been decided to split this out from the core ISA 3 feature as well. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Jeremy Kerr 提交于
Commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and introduced check_pte_access() which denied kernel access to non-_PAGE_PRIVILEGED pages. However, it didn't add _PAGE_PRIVILEGED to the hash fault handler for spufs' kernel accesses, so the DMAs required to establish SPE memory no longer work. This change adds _PAGE_PRIVILEGED to the hash fault handler for kernel accesses. Fixes: ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: NJeremy Kerr <jk@ozlabs.org> Reported-by: NSombat Tragolgosol <sombat3960@gmail.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-