- 17 3月, 2017 2 次提交
-
-
由 Andy Lutomirski 提交于
Naively, it looks racy, but ->mmap_sem saves it. Add a comment and a lockdep assertion. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/03a1e629063899168dfc4707f3bb6e581e21f5c6.1489694270.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
If one thread mmaps a perf event while another thread in the same mm is in some context where active_mm != mm (which can happen in the scheduler, for example), refresh_pce() would write the wrong value to CR4.PCE. This broke some PAPI tests. Reported-and-tested-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 7911d3f7 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 3月, 2017 3 次提交
-
-
由 Alexander Shishkin 提交于
Since commit: 1c5ac21a ("perf/x86/intel/pt: Don't die on VMXON") ... PT events depend on re-scheduling to get enabled after a VMX session has taken place. This is, in particular, a problem for CPU context events, which don't normally get re-scheduled, unless there is a reason. This patch changes the VMX handling so that PT event gets re-enabled when VMX root mode exits. Also, notify the user when there's a gap in PT data due to VMX root mode by flagging AUX records as partial. In combination with vmm_exclusive=0 parameter of the kvm_intel driver, this will result in trace gaps only for the duration of the guest's timeslices. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-5-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Will Deacon 提交于
In preparation for adding more flags to perf AUX records, introduce a separate API for setting the flags for a session, rather than appending more bool arguments to perf_aux_output_end. This allows to set each flag at the time a corresponding condition is detected, instead of tracking it in each driver's private state. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Goldmont supports full Top Down level 1 metrics (FrontendBound, Retiring, Backend Bound and Bad Speculation). It has 3 wide pipeline. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1486711438-80058-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 3月, 2017 2 次提交
-
-
由 Daniel Borkmann 提交于
Fengguang reported random corruptions from various locations on x86-32 after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79 ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e6 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95 ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge") Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Guenter Roeck 提交于
After changing from module.h to extable.h, score builds fail with: arch/score/kernel/traps.c: In function 'do_ri': arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step' arch/score/mm/extable.c: In function 'fixup_exception': arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type because extable.h doesn't drag in the same amount of headers as the module.h did. Add in the headers which were implicitly expected. Fixes: 90858794 ("module.h: remove extable.h include now users have migrated") Signed-off-by: NGuenter Roeck <linux@roeck-us.net> [PG: tweak commit log; refresh for sched header refactoring.] Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 11 3月, 2017 1 次提交
-
-
由 Thomas Gleixner 提交于
The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and cleanup the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewn reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: 72042a8c ("x86/purgatory: Make functions and variables static") Reported-by: NMike Galbraith <<efault@gmx.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 10 3月, 2017 13 次提交
-
-
由 Matjaz Hegedic 提交于
The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: 90b28ded ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: NMatjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Nicholas Piggin 提交于
Add POWER9 machine check handler. There are several new types of errors added, so logging messages for those are also added. This doesn't attempt to reuse any of the P7/8 defines or functions, because that becomes too complex. The better option in future is to use a table driven approach. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
Currently severity and initiator are always set to MCE_SEV_ERROR_SYNC and MCE_INITIATOR_CPU in the core mce code. Allow them to be set by the machine specific mce handlers. No functional change for existing handlers. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
A synchronous machine check is an exception raised by the attempt to execute the current instruction. If the error can't be corrected, it can make sense to SIGBUS the currently running process. In other cases, the error condition is not related to the current instruction, so killing the current process is not the right thing to do. Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no practical change. It will be used to handle POWER9 asynchronous machine checks. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
Remove incorrect CONFIG_IDE ifdef (CONFIG_IDE config option is for internal drivers/ide/ use) and make IDE hardware interface always initialized (not only when IDE subsystem is built-in). This patch allows Cayman board to work with modular IDE subsystem support and removes the requirement of having the whole core IDE subsystem built-in when using libata PATA support. Link: http://lkml.kernel.org/r/1990884.yFoE6lSB9G@amdc3058Signed-off-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
All exit paths from gup_pte_range() require pte_unmap() of the original pte page before returning. Refactor the code to have a single exit point to do the unmap. This mirrors the flow of the generic gup_pte_range() in mm/gup.c. Link: http://lkml.kernel.org/r/148804251828.36605.14910389618497006945.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
gup_pte_range() fails to check pte_allows_gup() before translating a DAX pte entry, pte_devmap(), to a page. This allows writes to read-only mappings, and bypasses the DAX cacheline dirty tracking due to missed 'mkwrite' faults. The gup_huge_pmd() path and the gup_huge_pud() path correctly check pte_allows_gup() before checking for _devmap() entries. Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings") Link: http://lkml.kernel.org/r/148804251312.36605.12665024794196605053.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reported-by: NDave Hansen <dave.hansen@linux.intel.com> Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
We use pte_write() to check whethwer the pte entry is writable. This is mostly used to later mark the pte read only if it is writable. The other use of pte_write() is to check whether the pte_entry is writable so that hardware page table entry can be marked accordingly. This is used in kvm where we look at qemu page table entry and update hardware hash page table for the guest with correct write enable bit. With the above, for the first usage we should also check the savedwrite bit so that we can correctly clear the savedwite bit. For the later, we add a new variant __pte_write(). With this we can revert write_protect_page part of 595cd8f2 ("mm/ksm: handle protnone saved writes when making page write protect"). But I left it as it is as an example code for savedwrite check. Fixes: c137a275 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
We need to mark pages of parent process read only on fork. Numa fault pte needs a protnone ptes variant with saved write flag set. On fork we need to make sure we remove the saved write bit. Instead of adding the protnone check in the caller update ptep_set_wrprotect variants to clear savedwrite bit. Without this we see random segfaults in application on fork. Fixes: c137a275 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-1-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
Fix typos and add the following to the scripts/spelling.txt: disble||disable disbled||disabled I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c untouched. The macro is not referenced at all, but this commit is touching only comment blocks just in case. Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Kirill A. Shutemov 提交于
If an architecture uses 4level-fixup.h we don't need to do anything as it includes 5level-fixup.h. If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK before inclusion of the header. It makes asm-generic code to use 5level-fixup.h. If an architecture has 4-level paging or folds levels on its own, include 5level-fixup.h directly. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Look for 'la57' in /proc/cpuinfo to see if your machine supports 5-level paging. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 3月, 2017 9 次提交
-
-
由 Radim Krčmář 提交于
Before trying to do nested_get_page() in nested_vmx_merge_msr_bitmap(), we have already checked that the MSR bitmap address is valid (4k aligned and within physical limits). SDM doesn't specify what happens if the there is no memory mapped at the valid address, but Intel CPUs treat the situation as if the bitmap was configured to trap all MSRs. KVM already does that by returning false and a correct handling doesn't need the guest-trigerrable warning that was reported by syzkaller: (The warning was originally there to catch some possible bugs in nVMX.) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709 nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline] WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709 nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 7832 Comm: syz-executor1 Not tainted 4.10.0+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 panic+0x1fb/0x412 kernel/panic.c:179 __warn+0x1c4/0x1e0 kernel/panic.c:540 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline] nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640 enter_vmx_non_root_mode arch/x86/kvm/vmx.c:10471 [inline] nested_vmx_run+0x6186/0xaab0 arch/x86/kvm/vmx.c:10561 handle_vmlaunch+0x1a/0x20 arch/x86/kvm/vmx.c:7312 vmx_handle_exit+0xfc0/0x3f00 arch/x86/kvm/vmx.c:8526 vcpu_enter_guest arch/x86/kvm/x86.c:6982 [inline] vcpu_run arch/x86/kvm/x86.c:7044 [inline] kvm_arch_vcpu_ioctl_run+0x1418/0x4840 arch/x86/kvm/x86.c:7205 kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2570 Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> [Jim Mattson explained the bare metal behavior: "I believe this behavior would be documented in the chipset data sheet rather than the SDM, since the chipset returns all 1s for an unclaimed read."] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Alexey Kardashevskiy 提交于
On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in our case), a device needs an iommu_table pointer set via set_iommu_table_base(). The codeflow is: - pnv_pci_ioda2_setup_dma_pe() - pnv_pci_ioda2_setup_default_config() - pnv_ioda_setup_bus_dma() [1] pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups, pnv_pci_ioda2_setup_default_config() does default DMA setup, pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function PEs as bus PEs except NPU), walks through all underlying buses and devices, adds all devices to an IOMMU group and sets iommu_table. On IODA2, when VFIO is used, it takes ownership over a PE which means it removes all tables and creates new ones (with a possibility of sharing them among PEs). So when the ownership is returned from VFIO to the kernel, the iommu_table pointer written to a device at [1] is stale and needs an update. This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma() (in fact re-adds as it used to be there a while ago for different reasons) to tell the helper if a device needs to be added to an IOMMU group with an iommu_table update or just the latter. This calls pnv_ioda_setup_bus_dma(..., false) from pnv_ioda2_release_ownership() so when the ownership is restored, 32bit DMA can work again for a device. This does the same thing on obtaining ownership as the iommu_table point is stale at this point anyway and it is safer to have NULL there. We did not hit this earlier as all tested devices in recent years were only using 64bit DMA; the rare exception for this is MPT3 SAS adapter which uses both 32bit and 64bit DMA access and it has not been tested with VFIO much. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Linu Cherian 提交于
Having only 32 memslots is a real constraint for the maximum number of PCI devices that can be assigned to a single guest. Assuming each PCI device/virtual function having two memory BAR regions, we could assign only 15 devices/virtual functions to a guest. Hence increase KVM_USER_MEM_SLOTS to 512 as done in other archs like powerpc. Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NLinu Cherian <linu.cherian@cavium.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Linu Cherian 提交于
arm/arm64 architecture doesnt use private memslots, hence removing KVM_PRIVATE_MEM_SLOTS macro definition. Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NLinu Cherian <linu.cherian@cavium.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Linu Cherian 提交于
Return KVM_USER_MEM_SLOTS for userspace capability query on NR_MEMSLOTS. Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NLinu Cherian <linu.cherian@cavium.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Alexey Kardashevskiy 提交于
The IODA2 specification says that a 64 DMA address cannot use top 4 bits (3 are reserved and one is a "TVE select"); bottom page_shift bits cannot be used for multilevel table addressing either. The existing IODA2 table allocation code aligns the minimum TCE table size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages, we have 64-4-12=48 bits. Since 64K page stores 8192 TCEs, i.e. needs 13 bits, the maximum number of levels is 48/13 = 3 so we physically cannot address more and EEH happens on DMA accesses. This adds a check that too many levels were requested. It is still possible to have 5 levels in the case of 4K system page size. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Madhavan Srinivasan 提交于
MMCRA[SDAR_MODE] specifices how the SDAR should be updated in continous sampling mode. On P9 it must be set to 0b00 when MMCRA[63] is set. Fixes: c7c3f568 ('powerpc/perf: macros for power9 format encoding') Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Madhavan Srinivasan 提交于
Power9 DD1 do not support PMU_HAS_SIER flag and sdsync in perf_get_data_addr() defaults to MMCRA_SDSYNC which is wrong. Since power9 MMCRA does not support SDSYNC bit, patch includes PPMU_NO_SIAR flag to the check and set the sdsync with MMCRA_SAMPLE_ENABLE; Fixes: 27593d72 ("powerpc/perf: Use MSR to report privilege level on P9 DD1") Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Jan Kara 提交于
It is invalid to call del_gendisk() when disk->queue is NULL. Fix error handling in axon_ram_probe() to avoid doing that. Also del_gendisk() does not drop a reference to gendisk allocated by alloc_disk(). That has to be done by put_disk(). Add that call where needed. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 08 3月, 2017 3 次提交
-
-
由 Arnd Bergmann 提交于
After the split of linux/sched.h, several platforms in arch/mips stopped building. Add the respective additional #include statements to fix the problem I first tried adding these into asm/processor.h, but ran into circular header dependencies with that which I could not figure out. The commit I listed as causing the problem is the branch merge, as there is likely a combination of multiple patches in that branch. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: ralf@linux-mips.org Fixes: 1827adb1 ("Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") Link: http://lkml.kernel.org/r/20170308072931.3836696-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Axtens 提交于
Turning on crypto self-tests on a POWER8 shows: alg: hash: Test 1 failed for crc32c-vpmsum 00000000: ff ff ff ff Comparing the code with the Intel CRC32c implementation on which ours is based shows that we are doing an init with 0, not ~0 as CRC32c requires. This probably wasn't caught because btrfs does its own weird open-coded initialisation. Initialise our internal context to ~0 on init. This makes the self-tests pass, and btrfs continues to work. Fixes: 6dd7a82c ("crypto: powerpc - Add POWER8 optimised crc32c") Cc: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: NDaniel Axtens <dja@axtens.net> Acked-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Michael Ellerman 提交于
Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in the zImage linker script, otherwise pointers to our TOC variables (__toc_start) could be incorrect. If the actual start of the TOC and __toc_start don't have the same value we crash early in the zImage wrapper. Cc: stable@vger.kernel.org Suggested-by: NAlan Modra <amodra@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 07 3月, 2017 7 次提交
-
-
由 Mark Rutland 提交于
Currently we BUG() if we see an ESR_EL2.EC value we don't recognise. As configurable disables/enables are added to the architecture (controlled by RES1/RES0 bits respectively), with associated synchronous exceptions, it may be possible for a guest to trigger exceptions with classes that we don't recognise. While we can't service these exceptions in a manner useful to the guest, we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page D7-1937, EC values within the range 0x00 - 0x2c are reserved for future use with synchronous exceptions, and EC values within the range 0x2d - 0x3f may be used for either synchronous or asynchronous exceptions. The patch makes KVM handle any unknown EC by injecting an UNDEFINED exception into the guest, with a corresponding (ratelimited) warning in the host dmesg. We could later improve on this with with a new (opt-in) exit to the host userspace. Cc: Dave Martin <dave.martin@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Mark Rutland 提交于
Currently we BUG() if we see a HSR.EC value we don't recognise. As configurable disables/enables are added to the architecture (controlled by RES1/RES0 bits respectively), with associated synchronous exceptions, it may be possible for a guest to trigger exceptions with classes that we don't recognise. While we can't service these exceptions in a manner useful to the guest, we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently unallocated HSR EC encodings are reserved, and per ARM DDI 0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c are reserved for future use with synchronous exceptions, and EC values within the range 0x2d - 0x3f may be used for either synchronous or asynchronous exceptions. The patch makes KVM handle any unknown EC by injecting an UNDEFINED exception into the guest, with a corresponding (ratelimited) warning in the host dmesg. We could later improve on this with with a new (opt-in) exit to the host userspace. Cc: Dave Martin <dave.martin@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Wanpeng Li 提交于
Reported by syzkaller: WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029 nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 panic+0x1fb/0x412 kernel/panic.c:179 __warn+0x1c4/0x1e0 kernel/panic.c:540 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline] vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324 kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099 do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128 __msr_io arch/x86/kvm/x86.c:2577 [inline] msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614 kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497 kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721 vfs_ioctl fs/ioctl.c:43 [inline] do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683 SYSC_ioctl fs/ioctl.c:698 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689 entry_SYSCALL_64_fastpath+0x1f/0xc2 The syzkaller folks reported a nested_run_pending warning during userspace clear VMX capability which is exposed to L1 before. The warning gets thrown while doing (*(uint32_t*)0x20aecfe8 = (uint32_t)0x1); (*(uint32_t*)0x20aecfec = (uint32_t)0x0); (*(uint32_t*)0x20aecff0 = (uint32_t)0x3a); (*(uint32_t*)0x20aecff4 = (uint32_t)0x0); (*(uint64_t*)0x20aecff8 = (uint64_t)0x0); r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul, 0x20aecfe8ul, 0, 0, 0, 0, 0, 0); i.e. KVM_SET_MSR ioctl with struct kvm_msrs { .nmsrs = 1, .pad = 0, .entries = { {.index = MSR_IA32_FEATURE_CONTROL, .reserved = 0, .data = 0} } } The VMLANCH/VMRESUME emulation should be stopped since the CPU is going to reset here. This patch resets the nested_run_pending since the CPU is going to be reset hence there should be nothing pending. Reported-by: NDmitry Vyukov <dvyukov@google.com> Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Shanker Donthineni 提交于
On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware implementation uses 16Bytes for Interrupt Translation Entry (ITE), but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size. It might cause kernel memory corruption depending on the number of MSI(x) that are configured and the amount of memory that has been allocated for ITEs in its_create_device(). This patch fixes the potential memory corruption by setting the correct ITE size to 16Bytes. Cc: stable@vger.kernel.org Signed-off-by: NShanker Donthineni <shankerd@codeaurora.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Guenter Roeck 提交于
Fix the following h8300 build failures: arch/h8300/kernel/ptrace_h.c: In function ‘trace_trap’: arch/h8300/kernel/ptrace_h.c:253:3: error: implicit declaration of function ‘force_sig’ Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp Fixes: c3edc401 ("sched/headers: Move task_struct::signal and ...") Link: http://lkml.kernel.org/r/1488738434-3504-1-git-send-email-linux@roeck-us.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Guenter Roeck 提交于
Various avr32 builds fail: arch/avr32/oprofile/backtrace.c:58: error: dereferencing pointer to incomplete type arch/avr32/oprofile/backtrace.c:60: error: implicit declaration of function 'user_mode' Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oprofile-list@lists.sf.net Fixes: f780d89a ("sched/headers: Remove <asm/ptrace.h> from ...") Link: http://lkml.kernel.org/r/1488762357-4500-1-git-send-email-linux@roeck-us.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jim Mattson 提交于
VMCLEAR should silently ignore a failure to clear the launch state of the VMCS referenced by the operand. Signed-off-by: NJim Mattson <jmattson@google.com> [Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-