- 19 3月, 2018 6 次提交
-
-
由 Christoffer Dall 提交于
As we are about to move a bunch of save/restore logic for VHE kernels to the load and put functions, we need some infrastructure to do this. Reviewed-by: NAndrew Jones <drjones@redhat.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We currently have a separate read-modify-write of the HCR_EL2 on entry to the guest for the sole purpose of setting the VF and VI bits, if set. Since this is most rarely the case (only when using userspace IRQ chip and interrupts are in flight), let's get rid of this operation and instead modify the bits in the vcpu->arch.hcr[_el2] directly when needed. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Shih-Wei Li 提交于
We always set the IMO and FMO bits in the HCR_EL2 when running the guest, regardless if we use the vgic or not. By moving these flags to HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of HCR_EL2 in the world switch code, and we can also soon get rid of the other one. This is safe, because even though the IMO and FMO bits control both taking the interrupts to EL2 and remapping ICC_*_EL1 to ICV_*_EL1 when executed at EL1, as long as we ensure that these bits are clear when running the EL1 host, we're OK, because we reset the HCR_EL2 to only have the HCR_RW bit set when returning to EL1 on non-VHE systems. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
VHE actually doesn't rely on clearing the VTTBR when returning to the host kernel, and that is the current key mechanism of hyp_panic to figure out how to attempt to return to a state good enough to print a panic statement. Therefore, we split the hyp_panic function into two functions, a VHE and a non-VHE, keeping the non-VHE version intact, but changing the VHE behavior. The vttbr_el2 check on VHE doesn't really make that much sense, because the only situation where we can get here on VHE is when the hypervisor assembly code actually called into hyp_panic, which only happens when VBAR_EL2 has been set to the KVM exception vectors. On VHE, we can always safely disable the traps and restore the host registers at this point, so we simply do that unconditionally and call into the panic function directly. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context. This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest. This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref(). The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Calling vcpu_load() registers preempt notifiers for this vcpu and calls kvm_arch_vcpu_load(). The latter will soon be doing a lot of heavy lifting on arm/arm64 and will try to do things such as enabling the virtual timer and setting us up to handle interrupts from the timer hardware. Loading state onto hardware registers and enabling hardware to signal interrupts can be problematic when we're not actually about to run the VCPU, because it makes it difficult to establish the right context when handling interrupts from the timer, and it makes the register access code difficult to reason about. Luckily, now when we call vcpu_load in each ioctl implementation, we can simply remove the call from the non-KVM_RUN vcpu ioctls, and our kvm_arch_vcpu_load() is only used for loading vcpu content to the physical CPU when we're actually going to run the vcpu. Reviewed-by: NJulien Grall <julien.grall@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 26 2月, 2018 4 次提交
-
-
由 Jérémy Fanguède 提交于
Set the handlers to emulate read and write operations for CNTP_CTL, CNTP_CVAL and CNTP_TVAL registers in such a way that VMs can use the physical timer. Signed-off-by: NJérémy Fanguède <j.fanguede@virtualopensystems.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Jérémy Fanguède 提交于
Some 32bits guest OS can use the CNTP timer, however KVM does not handle the accesses, injecting a fault instead. Use the proper handlers to emulate the EL1 Physical Timer (CNTP) register accesses of AArch32 guests. Signed-off-by: NJérémy Fanguède <j.fanguede@virtualopensystems.com> Signed-off-by: NAlvise Rigo <a.rigo@virtualopensystems.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Dave Martin 提交于
The HCR_EL2.TID3 flag needs to be set when trapping guest access to the CPU ID registers is required. However, the decision about whether to set this bit does not need to be repeated at every switch to the guest. Instead, it's sufficient to make this decision once and record the outcome. This patch moves the decision to vcpu_reset_hcr() and records the choice made in vcpu->arch.hcr_el2. The world switch code can then load this directly when switching to the guest without the need for conditional logic on the critical path. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Suggested-by: NChristoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Mark Rutland 提交于
We don't currently limit guest accesses to the LOR registers, which we neither virtualize nor context-switch. As such, guests are provided with unusable information/controls, and are not isolated from each other (or the host). To prevent these issues, we can trap register accesses and present the illusion LORegions are unssupported by the CPU. To do this, we mask ID_AA64MMFR1.LO, and set HCR_EL2.TLOR to trap accesses to the following registers: * LORC_EL1 * LOREA_EL1 * LORID_EL1 * LORN_EL1 * LORSA_EL1 ... when trapped, we inject an UNDEFINED exception to EL1, simulating their non-existence. As noted in D7.2.67, when no LORegions are implemented, LoadLOAcquire and StoreLORelease must behave as LoadAcquire and StoreRelease respectively. We can ensure this by clearing LORC_EL1.EN when a CPU's EL2 is first initialized, as the host kernel will not modify this. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 23 2月, 2018 13 次提交
-
-
由 Pratyush Anand 提交于
do_task_stat() calls get_wchan(), which further does unwind_frame(). unwind_frame() restores frame->pc to original value in case function graph tracer has modified a return address (LR) in a stack frame to hook a function return. However, if function graph tracer has hit a filtered function, then we can't unwind it as ftrace_push_return_trace() has biased the index(frame->graph) with a 'huge negative' offset(-FTRACE_NOTRACE_DEPTH). Moreover, arm64 stack walker defines index(frame->graph) as unsigned int, which can not compare a -ve number. Similar problem we can have with calling of walk_stackframe() from save_stack_trace_tsk() or dump_backtrace(). This patch fixes unwind_frame() to test the index for -ve value and restore index accordingly before we can restore frame->pc. Reproducer: cd /sys/kernel/debug/tracing/ echo schedule > set_graph_notrace echo 1 > options/display-graph echo wakeup > current_tracer ps -ef | grep -i agent Above commands result in: Unable to handle kernel paging request at virtual address ffff801bd3d1e000 pgd = ffff8003cbe97c00 [ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000 Internal error: Oops: 96000006 [#1] SMP [...] CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33 [...] task: ffff8003c21ba000 task.stack: ffff8003cc6c0000 PC is at unwind_frame+0x12c/0x180 LR is at get_wchan+0xd4/0x134 pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145 sp : ffff8003cc6c3ab0 x29: ffff8003cc6c3ab0 x28: 0000000000000001 x27: 0000000000000026 x26: 0000000000000026 x25: 00000000000012d8 x24: 0000000000000000 x23: ffff8003c1c04000 x22: ffff000008c83000 x21: ffff8003c1c00000 x20: 000000000000000f x19: ffff8003c1bc0000 x18: 0000fffffc593690 x17: 0000000000000000 x16: 0000000000000001 x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f x13: 0000000000000001 x12: 0000000000000000 x11: 00000000e8f4883e x10: 0000000154f47ec8 x9 : 0000000070f367c0 x8 : 0000000000000000 x7 : 00008003f7290000 x6 : 0000000000000018 x5 : 0000000000000000 x4 : ffff8003c1c03cb0 x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000 x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000 Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000) Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000) [...] [<ffff00000808892c>] unwind_frame+0x12c/0x180 [<ffff000008305008>] do_task_stat+0x864/0x870 [<ffff000008305c44>] proc_tgid_stat+0x3c/0x48 [<ffff0000082fde0c>] proc_single_show+0x5c/0xb8 [<ffff0000082b27e0>] seq_read+0x160/0x414 [<ffff000008289e6c>] __vfs_read+0x58/0x164 [<ffff00000828b164>] vfs_read+0x88/0x144 [<ffff00000828c2e8>] SyS_read+0x60/0xc0 [<ffff0000080834a0>] __sys_trace_return+0x0/0x4 Fixes: 20380bb3 (arm64: ftrace: fix a stack tracer's output under function graph tracer) Signed-off-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NJerome Marchand <jmarchan@redhat.com> [catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE] Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Samuel Neves 提交于
Without this fix, /proc/cpuinfo will display an incorrect amount of CPU cores, after bringing them offline and online again, as exemplified below: $ cat /proc/cpuinfo | grep cores cpu cores : 4 cpu cores : 8 cpu cores : 8 cpu cores : 20 cpu cores : 4 cpu cores : 3 cpu cores : 2 cpu cores : 2 This patch fixes this by always zeroing the booted_cores variable upon turning off a logical CPU. Tested-by: NDou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: NSamuel Neves <sneves@dei.uc.pt> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jgross@suse.com Cc: luto@kernel.org Cc: prarit@redhat.com Cc: vkuznets@redhat.com Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.ptSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andrea Parri 提交于
Successful RMW operations are supposed to be fully ordered, but Alpha's xchg() and cmpxchg() do not meet this requirement. Will Deacon noticed the bug: > So MP using xchg: > > WRITE_ONCE(x, 1) > xchg(y, 1) > > smp_load_acquire(y) == 1 > READ_ONCE(x) == 0 > > would be allowed. ... which thus violates the above requirement. Fix it by adding a leading smp_mb() to the xchg() and cmpxchg() implementations. Reported-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAndrea Parri <parri.andrea@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-alpha@vger.kernel.org Link: http://lkml.kernel.org/r/1519291488-5752-1-git-send-email-parri.andrea@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andrea Parri 提交于
Replace each occurrence of __ASM__MB with a (trailing) smp_mb() in xchg(), cmpxchg(), and remove the now unused __ASM__MB definitions; this improves readability, with no additional synchronization cost. Suggested-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAndrea Parri <parri.andrea@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-alpha@vger.kernel.org Link: http://lkml.kernel.org/r/1519291469-5702-1-git-send-email-parri.andrea@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wang Hui 提交于
x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system If no monitoring feature is detected because all monitoring features are disabled during boot time or there is no monitoring feature in hardware, creating rdtgroup sub-directory by "mkdir" command reports error: mkdir: cannot create directory ‘/sys/fs/resctrl/p1’: No such file or directory But the sub-directory actually is generated and content is correct: cpus cpus_list schemata tasks The error is because rdtgroup_mkdir_ctrl_mon() returns non zero value after the sub-directory is created and the returned value is reported as an error to user. Clear the returned value to report to user that the sub-directory is actually created successfully. Signed-off-by: NWang Hui <john.wanghui@huawei.com> Signed-off-by: NZhang Yanfei <yanfei.zhang@huawei.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vikas <vikas.shivappa@intel.com> Cc: Xiaochen Shen <xiaochen.shen@intel.com> Link: http://lkml.kernel.org/r/1519356363-133085-1-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
When a irq vector is replaced, then the previous vector is normally released when the first interrupt happens on the new vector. If the target CPU of the previous vector is already offline when the new vector is installed, then the previous vector is silently discarded, which leads to accounting issues causing suspend failures and other problems. Adjust the logic so that the previous vector is freed in the underlying matrix allocator to ensure that the accounting stays correct. Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment") Reported-by: NYuriy Vostrikov <delamonpansie@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NYuriy Vostrikov <delamonpansie@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180222112316.930791749@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Michael Ellerman 提交于
Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 6e032b35 ("powerpc/powernv: Check device-tree for RFI flush settings") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 8989d568 ("powerpc/pseries: Query hypervisor for RFI flush settings") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bharata B Rao 提交于
Memory addtion and removal by count and indexed-count methods temporarily mark the LMBs that are being added/removed by a special flag value DRMEM_LMB_RESERVED. Accessing flags value directly at a few places without proper accessor method is causing two unexpected side-effects: - DRMEM_LMB_RESERVED bit is becoming part of the flags word of drconf_cell_v2 entries in ibm,dynamic-memory-v2 DT property. - This results in extra drconf_cell entries in ibm,dynamic-memory-v2. For example if 1G memory is added, it leads to one entry for 3 LMBs and 1 separate entry for the last LMB. All the 4 LMBs should be defined by one entry here. Fix this by always accessing the flags by its accessor method drmem_lmb_flags(). Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property") Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Kees Cook 提交于
The MIPS %.its.S compiler command did not define __ASSEMBLY__, which meant when compiler_types.h was added to kconfig.h, unexpected things appeared (e.g. struct declarations) which should not have been present. As done in the general %.S compiler command, __ASSEMBLY__ is now included here too. The failure was: Error: arch/mips/boot/vmlinux.gz.its:201.1-2 syntax error FATAL ERROR: Unable to parse input tree /usr/bin/mkimage: Can't read arch/mips/boot/vmlinux.gz.itb.tmp: Invalid argument /usr/bin/mkimage Can't add hashes to FIT blob Reported-by: Nkbuild test robot <lkp@intel.com> Fixes: 28128c61 ("kconfig.h: Include compiler types to avoid missed struct attributes") Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Borkmann 提交于
I recently noticed a crash on arm64 when feeding a bogus index into BPF tail call helper. The crash would not occur when the interpreter is used, but only in case of JIT. Output looks as follows: [ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510 [...] [ 347.043065] [fffb850e96492510] address between user and kernel address ranges [ 347.050205] Internal error: Oops: 96000004 [#1] SMP [...] [ 347.190829] x13: 0000000000000000 x12: 0000000000000000 [ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10 [ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000 [ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000 [ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a [ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500 [ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500 [ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61) [ 347.235221] Call trace: [ 347.237656] 0xffff000002f3a4fc [ 347.240784] bpf_test_run+0x78/0xf8 [ 347.244260] bpf_prog_test_run_skb+0x148/0x230 [ 347.248694] SyS_bpf+0x77c/0x1110 [ 347.251999] el0_svc_naked+0x30/0x34 [ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b) [...] In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as index; here, it had the value 0xffff808fd7cf0500 which sits in x2. While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission: # bpftool p d j i 988 [...] 38: ldr w10, [x1,x10] 3c: cmp w2, w10 40: b.ge 0x000000000000007c <-- signed cmp 44: mov x10, #0x20 // #32 48: cmp x26, x10 4c: b.gt 0x000000000000007c 50: add x26, x26, #0x1 54: mov x10, #0x110 // #272 58: add x10, x1, x10 5c: lsl x11, x2, #3 60: ldr x11, [x10,x11] <-- faulting insn (f86b694b) 64: cbz x11, 0x000000000000007c [...] Meaning, the tests passed because commit ddb55992 ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned which as a result had the test wrongly passing. Change this but also the tail call count test both into unsigned and cap the index as u32. Latter we did as well in 90caccdd ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here, too. Tested on HiSilicon Hi1616. Result after patch: # bpftool p d j i 268 [...] 38: ldr w10, [x1,x10] 3c: add w2, w2, #0x0 40: cmp w2, w10 44: b.cs 0x0000000000000080 48: mov x10, #0x20 // #32 4c: cmp x26, x10 50: b.hi 0x0000000000000080 54: add x26, x26, #0x1 58: mov x10, #0x110 // #272 5c: add x10, x1, x10 60: lsl x11, x2, #3 64: ldr x11, [x10,x11] 68: cbz x11, 0x0000000000000080 [...] Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
Implement a retpoline [0] for the BPF tail call JIT'ing that converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter aligns also with what gcc / clang emits (e.g. [1]). JIT dump after patch: # bpftool p d x i 1 0: (18) r2 = map[id:1] 2: (b7) r3 = 0 3: (85) call bpf_tail_call#12 4: (b7) r0 = 2 5: (95) exit With CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000072 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000072 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000072 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: callq 0x000000000000006d |+ 66: pause | 68: lfence | 6b: jmp 0x0000000000000066 | 6d: mov %rax,(%rsp) | 71: retq | 72: mov $0x2,%eax [...] * relative fall-through jumps in error case + retpoline for indirect jump Without CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000063 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000063 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000063 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: jmpq *%rax |- 63: mov $0x2,%eax [...] * relative fall-through jumps in error case - plain indirect jump as before [0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 H.J. Lu 提交于
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. On x86-64, for 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation. As far as Linux kernel is concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since Linux kernel doesn't use PLT. R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in binutils master branch which will become binutils 2.31. [ hjl is working on having better documentation on this all, but a few more notes from him: "PLT32 relocation is used as marker for PC-relative branches. Because of EBX, it looks odd to generate PLT32 relocation on i386 when EBX doesn't have GOT. As for symbol resolution, PLT32 and PC32 relocations are almost interchangeable. But when linker sees PLT32 relocation against a protected symbol, it can resolved locally at link-time since it is used on a branch instruction. Linker can't do that for PC32 relocation" but for the kernel use, the two are basically the same, and this commit gets things building and working with the current binutils master - Linus ] Signed-off-by: NH.J. Lu <hjl.tools@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 2月, 2018 6 次提交
-
-
由 Will Deacon 提交于
ioremap_page_range doesn't honour break-before-make and attempts to put down huge mappings (using p*d_set_huge) over the top of pre-existing table entries. This leads to us leaking page table memory and also gives rise to TLB conflicts and spurious aborts, which have been seen in practice on Cortex-A75. Until this has been resolved, refuse to put block mappings when the existing entry is found to be present. Fixes: 324420bf ("arm64: add support for ioremap() block mappings") Reported-by: NHanjun Guo <hanjun.guo@linaro.org> Reported-by: NLei Li <lious.lilei@hisilicon.com> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Ingo Molnar 提交于
On lkml suggestions were made to split up such trivial typo fixes into per subsystem patches: --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height) struct efi_uga_draw_protocol *uga = NULL, *first_uga; efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID; unsigned long nr_ugas; - u32 *handles = (u32 *)uga_handle;; + u32 *handles = (u32 *)uga_handle; efi_status_t status = EFI_INVALID_PARAMETER; int i; This patch is the result of the following script: $ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq) ... followed by manual review to make sure it's all good. Splitting this up is just crazy talk, let's get over with this and just do it. Reported-by: NPavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mark Lord 提交于
I am using SECCOMP to filter syscalls on a ppc32 platform, and noticed that the JIT compiler was failing on the BPF even though the interpreter was working fine. The issue was that the compiler was missing one of the instructions used by SECCOMP, so here is a patch to enable JIT for that instruction. Fixes: eb84bab0 ("ppc: Kconfig: Enable BPF JIT on ppc32") Signed-off-by: NMark Lord <mlord@pobox.com> Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Bringmann 提交于
This reverts commit 02ef6dd8. The earlier patch tried to enable support for a new property "ibm,drc-info" on powerpc systems. Unfortunately, some errors in the associated patch set break things in some of the DLPAR operations. In particular when attempting to hot-add a new CPU or set of CPUs, the original patch failed to properly calculate the available resources, and aborted the operation. In addition, the original set missed several opportunities to compress and reuse common code. As the associated patch set was meant to provide an optimization of storage and performance of a set of device-tree properties for future systems with large amounts of resources, reverting just restores the previous behavior for existing systems. It seems unnecessary to enable this feature and introduce the consequent problems in the field that it will cause at this time, so please revert it for now until testing of the corrections are finished properly. Fixes: 02ef6dd8 ("powerpc: Enable support for ibm,drc-info devtree property") Signed-off-by: NMichael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
We had a mid-air collision between two new firmware features, DRMEM_V2 and DRC_INFO, and they ended up with the same value. No one's actually reported any problems, presumably because the new firmware that supports both properties is not widely available, and the two properties tend to be enabled together. Still if we ever had one enabled but not the other, the bugs that could result are many and varied. So fix it. Fixes: 3f38000e ("powerpc/firmware: Add definitions for new drc-info firmware feature") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
-
由 Arnd Bergmann 提交于
Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christopher Li <sparse@chrisli.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 2月, 2018 8 次提交
-
-
由 Andrea Parri 提交于
Continuing along with the fight against smp_read_barrier_depends() [1] (or rather, against its improper use), add an unconditional barrier to cmpxchg. This guarantees that dependency ordering is preserved when a dependency is headed by an unsuccessful cmpxchg. As it turns out, the change could enable further simplification of LKMM as proposed in [2]. [1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2 https://marc.info/?l=linux-kernel&m=150884946319353&w=2 https://marc.info/?l=linux-kernel&m=151215810824468&w=2 https://marc.info/?l=linux-kernel&m=151215816324484&w=2 [2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2Signed-off-by: NAndrea Parri <parri.andrea@gmail.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-alpha@vger.kernel.org Link: http://lkml.kernel.org/r/1519152356-4804-1-git-send-email-parri.andrea@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Arnd Bergmann 提交于
GCC-8 shows a warning for the x86 oprofile code that copies per-CPU data from CPU 0 to all other CPUs, which when building a non-SMP kernel turns into a memcpy() with identical source and destination pointers: arch/x86/oprofile/nmi_int.c: In function 'mux_clone': arch/x86/oprofile/nmi_int.c:285:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict] memcpy(per_cpu(cpu_msrs, cpu).multiplex, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ per_cpu(cpu_msrs, 0).multiplex, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ sizeof(struct op_msr) * model->num_virt_counters); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/oprofile/nmi_int.c: In function 'nmi_setup': arch/x86/oprofile/nmi_int.c:466:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] arch/x86/oprofile/nmi_int.c:470:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] I have analyzed a number of such warnings now: some are valid and the GCC warning is welcome. Others turned out to be false-positives, and GCC was changed to not warn about those any more. This is a corner case that is a false-positive but the GCC developers feel it's better to keep warning about it. In this case, it seems best to work around it by telling GCC a little more clearly that this code path is never hit with an IS_ENABLED() configuration check. Cc:stable as we also want old kernels to build cleanly with GCC-8. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Sebor <msebor@gcc.gnu.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oprofile-list@lists.sf.net Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180220205826.2008875-1-arnd@arndb.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84095Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juan J. Alvarez 提交于
The notify_resume() callback in eeh_ops is NULL on powernv, leading to crashes: NIP (null) LR eeh_report_resume+0x218/0x220 Call Trace: eeh_report_resume+0x1f0/0x220 (unreliable) eeh_pe_dev_traverse+0x98/0x170 eeh_handle_normal_event+0x3f4/0x650 eeh_handle_event+0x54/0x380 eeh_event_handler+0x14c/0x210 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xb4 Fix it by adding a check before calling it. Fixes: 856e1eb9 ("PCI/AER: Add uevents in AER and EEH error/resume") Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Reviewed-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: NCarol L. Soto <clsoto@us.ibm.com> Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: NMauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Acked-by: NMichael Neuling <mikey@neuling.org> [mpe: Rewrite change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Clark 提交于
The sbi_ prefix would seem to indicate an SBI interface, and save is not very specific. After applying this patch, reading head.S makes more sense. Signed-off-by: NMichael Clark <michaeljclark@mac.com> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
由 zongbox@gmail.com 提交于
Interrupt is allowed during exception handling. There are warning messages if the kernel enables the configuration 'CONFIG_DEBUG_ATOMIC_SLEEP=y'. BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:23 in_atomic(): 0, irqs_disabled(): 1, pid: 43, name: ash CPU: 0 PID: 43 Comm: ash Tainted: G W 4.15.0-rc8-00089-g89ffdae-dirty #17 Call Trace: [<000000009abb1587>] walk_stackframe+0x0/0x7a [<00000000d4f3d088>] ___might_sleep+0x102/0x11a [<00000000b1fd792a>] down_read+0x18/0x28 [<000000000289ec01>] do_page_fault+0x86/0x2f6 [<00000000012441f6>] _do_fork+0x1b4/0x1e0 [<00000000f46c3e3b>] ret_from_syscall+0xa/0xe Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NZong Li <zong@andestech.com> Signed-off-by: NPalmer Dabbelt <palmer@dabbelt.com> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
由 Ulf Magnusson 提交于
The ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE symbol was removed in commit 51a02124 ("atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE"). Remove the ARCH_HAS_ATOMIC64_DEC_IS_POSITIVE select from RISCV. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com> Reviewed-by: NJonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
由 Ulf Magnusson 提交于
The RISCV_IRQ_INTC configuration symbol is undefined, but RISCV selects it. Quoting Palmer Dabbelt: It looks like this slipped through, the symbol has been renamed RISCV_INTC. No RISCV_INTC configuration symbol has been merged either. Just remove the RISCV_IRQ_INTC select for now. Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
由 Ulf Magnusson 提交于
The ARCH_WANT_OPTIONAL_GPIOLIB symbol was removed in commit 65053e1a ("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB"). GPIOLIB should just be selected explicitly if needed. Remove the ARCH_WANT_OPTIONAL_GPIOLIB select from RISCV. See commit 0145071b ("x86: Do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB") and commit da9a1c67 ("arm64: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB") as well. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Reviewed-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com> Signed-off-by: NPalmer Dabbelt <palmer@dabbelt.com>
-
- 20 2月, 2018 3 次提交
-
-
由 James Hogan 提交于
MIPS' struct compat_flock doesn't match the 32-bit struct flock, as it has an extra short __unused before pad[4], which combined with alignment increases the size to 40 bytes compared with struct flock's 36 bytes. Since commit 8c6657cb ("Switch flock copyin/copyout primitives to copy_{from,to}_user()"), put_compat_flock() writes the full compat_flock struct to userland, which results in corruption of the userland word after the struct flock when running 32-bit userlands on 64-bit kernels. This was observed to cause a bus error exception when starting Firefox on Debian 8 (Jessie). Reported-by: NPeter Mamonov <pmamonov@gmail.com> Signed-off-by: NJames Hogan <jhogan@kernel.org> Tested-by: NPeter Mamonov <pmamonov@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.13+ Patchwork: https://patchwork.linux-mips.org/patch/18646/
-
由 Mark Rutland 提交于
The ID_AA64DFR0_EL1.PMUVer field doesn't follow the usual ID registers scheme. While value 0xf indicates a non-architected PMU is implemented, values 0x1 to 0xe indicate an increasingly featureful architected PMU, as if the field were unsigned. For more details, see ARM DDI 0487C.a, D10.1.4, "Alternative ID scheme used for the Performance Monitors Extension version". Currently, we treat the field as signed, and erroneously bail out for values 0x8 to 0xe. Let's correct that. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NRobin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Mark Rutland 提交于
The ux500 PMU IRQ bouncer is getting in the way of some fundametnal changes to the ARM PMU driver, and it's the only special case that exists today. Let's remove it. Reviewed-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-