- 13 2月, 2018 3 次提交
-
-
由 Alexey Kardashevskiy 提交于
Radix guests do normally invalidate process-scoped translations when a new pid is allocated but migrated guests do not invalidate these so migrated guests crash sometime, especially easy to reproduce with migration happening within first 10 seconds after the guest boot start on the same machine. This adds the "Invalidate process-scoped translations" flush to fix radix guests migration. Fixes: 2ee13be3 ("KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Tested-by: NLaurent Vivier <lvivier@redhat.com> Tested-by: NDaniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
cp_abort is only required for user windows, because kernel context must not be preempted between a copy/paste pair. Without this patch, the init task gets used_vas set when it runs the nx842_powernv_init initcall, which opens windows for kernel usage. used_vas is then never cleared anywhere, so it gets propagated into all other tasks. It's a property of the address space, so it should really be cleared when a new mm is created (or in dup_mmap if the mmaps are marked as VM_DONTCOPY). For now we seem to have no such driver, so leave that for another patch. Fixes: 6c8e6bb2 ("powerpc/vas: Add support for user receive window") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Sam Bobroff 提交于
Currently if the kernel receives a memory hot-unplug event early enough, it may get stuck in an infinite loop in dissolve_free_huge_pages(). This appears as a stall just after: pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY It appears to be caused by "minimum_order" being uninitialized, due to init_ras_IRQ() executing before hugetlb_init(). To correct this, extract the part of init_ras_IRQ() that enables hotplug event processing and place it in the machine_late_initcall phase, which is guaranteed to be after hugetlb_init() is called. Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: NBalbir Singh <bsingharora@gmail.com> [mpe: Reorder the functions to make the diff readable] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 2月, 2018 5 次提交
-
-
由 Balbir Singh 提交于
This patch splits the linear mapping if the hot-unplug range is smaller than the mapping size. The code detects if the mapping needs to be split into a smaller size and if so, uses the stop machine infrastructure to clear the existing mapping and then remap the remaining range using a smaller page size. The code will skip any region of the mapping that overlaps with kernel text and warn about it once. We don't want to remove a mapping where the kernel text and the LMB we intend to remove overlap in the same TLB mapping as it may affect the currently executing code. I've tested these changes under a kvm guest with 2 vcpus, from a split mapping point of view, some of the caveats mentioned above applied to the testing I did. Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: NBalbir Singh <bsingharora@gmail.com> [mpe: Tweak change log to match updated behaviour] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
This change restores and formalises the behaviour that access to NULL or other user addresses by the kernel during boot should fault rather than succeed and modify memory. This was inadvertently broken when fixing another bug, because it was previously not well defined and only worked by chance. powerpc/64s/radix uses high address bits to select an address space "quadrant", which determines which PID and LPID are used to translate the rest of the address (effective PID, effective LPID). The kernel mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0. So the kernel page tables are installed in the PID 0 process table entry. An address at 0x0... selects quadrant 0, which uses PID=PIDR for translating the rest of the address (that is, it uses the value of the PIDR register as the effective PID). If PIDR=0, then the translation is performed with the PID 0 process table entry page tables. This is the kernel mapping, so we effectively get another copy of the kernel address space at 0. A NULL pointer access will access physical memory address 0. To prevent duplicating the kernel address space in quadrant 0, this patch allocates a guard PID containing no translations, and initializes PIDR with this during boot, before the MMU is switched on. Any kernel access to quadrant 0 will use this guard PID for translation and find no valid mappings, and therefore fault. After boot, this PID will be switchd away to user context PIDs, but those contain user mappings (and usually NULL pointer protection) rather than kernel mapping, which is much safer (and by design). It may be in future this is tightened further, which the guard PID could be used for. Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table"), introduced this problem because it zeroes PIDR at boot. However previously the value was inherited from firmware or kexec, which is not robust and can be zero (e.g., mambo). Fixes: 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table") Cc: stable@vger.kernel.org # v4.15+ Reported-by: NFlorian Weimer <fweimer@redhat.com> Tested-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
The soft IRQ masking code has to hard-disable interrupts in cases where the exception is not cleared by the masked handler. External interrupts used this approach for soft masking. Now recently PMU interrupts do the same thing. The soft IRQ masking code additionally allowed for interrupt handlers to hard-enable interrupts after soft-disabling them. The idea is to allow PMU interrupts through to profile interrupt handlers. So when interrupts are being replayed when there is a pending interrupt that requires hard-disabling, there is a test to prevent those handlers from hard-enabling them if there is a pending external interrupt. may_hard_irq_enable() handles this. After f442d004 ("powerpc/64s: Add support to mask perf interrupts and replay them"), may_hard_irq_enable() could prematurely enable MSR[EE] when a PMU exception exists, which would result in the interrupt firing again while masked, and MSR[EE] being disabled again. I haven't seen that this could cause a serious problem, but it's more consistent to handle these soft-masked interrupts in the same way. So introduce a define for all types of interrupts that require MSR[EE] masking in their soft-disable handlers, and use that in may_hard_irq_enable(). Fixes: f442d004 ("powerpc/64s: Add support to mask perf interrupts and replay them") Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Madhavan Srinivasan 提交于
Commit f14e953b ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") messed up MASKABLE_RELON_EXCEPTION_HV_OOL macro by adding the wrong SOFTEN test which caused guest kernel crash at boot. Patch to fix the macro to use SOFTEN_TEST_HV instead of SOFTEN_NOTEST_HV. Fixes: f14e953b ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Fix-Suggested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nathan Fontenot 提交于
When DLPAR removing a CPU, the unmapping of the cpu from a node in unmap_cpu_from_node() should also invalidate the CPUs entry in the numa_cpu_lookup_table. There is not a guarantee that on a subsequent DLPAR add of the CPU the associativity will be the same and thus could be in a different node. Invalidating the entry in the numa_cpu_lookup_table causes the associativity to be read from the device tree at the time of the add. The current behavior of not invalidating the CPUs entry in the numa_cpu_lookup_table can result in scenarios where the the topology layout of CPUs in the partition does not match the device tree or the topology reported by the HMC. This bug looks like it was introduced in 2004 in the commit titled "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the linux-fullhist tree. Hence tag it for all stable releases. Cc: stable@vger.kernel.org Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 06 2月, 2018 2 次提交
-
-
由 Mathieu Desnoyers 提交于
Allow expedited membarrier to be used for data shared between processes through shared memory. Processes wishing to receive the membarriers register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED. This allows extremely simple kernel-level implementation: we have almost everything we need with the PRIVATE_EXPEDITED barrier code. All we need to do is to add a flag in the mm_struct that will be used to check whether we need to send the IPI to the current thread of each CPU. There is a slight downside to this approach compared to targeting specific shared memory users: when performing a membarrier operation, all registered "global" receivers will get the barrier, even if they don't share a memory mapping with the sender issuing MEMBARRIER_CMD_GLOBAL_EXPEDITED. This registration approach seems to fit the requirement of not disturbing processes that really deeply care about real-time: they simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED" command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward compatibility. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathieu Desnoyers 提交于
Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Threads targeting the same VM but which belong to different thread groups is a tricky case. It has a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates on the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 01 2月, 2018 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
Instead of marking the pmd ready for split, invalidate the pmd. This should take care of powerpc requirement. Only side effect is that we mark the pmd invalid early. This can result in us blocking access to the page a bit longer if we race against a thp split. [kirill.shutemov@linux.intel.com: rebased, dirty THP once] Link: http://lkml.kernel.org/r/20171213105756.69879-13-kirill.shutemov@linux.intel.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
It's required to avoid losing dirty and accessed bits. Link: http://lkml.kernel.org/r/20171213105756.69879-7-kirill.shutemov@linux.intel.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Tatashin 提交于
There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT, as all the page initialization code is in common code. Also, there is no need to depend on MEMORY_HOTPLUG, as initialization code does not really use hotplug memory functionality. So, we can remove this requirement as well. This patch allows to use deferred struct page initialization on all platforms with memblock allocator. Tested on x86, arm64, and sparc. Also, verified that code compiles on PPC with CONFIG_MEMORY_HOTPLUG disabled. Link: http://lkml.kernel.org/r/20171117014601.31606-1-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 1月, 2018 1 次提交
-
-
由 Michael Ellerman 提交于
The recent TLB flush rework broke the build when the Radix MMU is disabled at build time, eg: (.text+0x264): undefined reference to `.radix__tlbiel_all' We could add an empty version, but if we ever called it by accident that would indicate a bad bug, so add a stub that just WARNs if we do. Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 28 1月, 2018 4 次提交
-
-
由 Michael Ellerman 提交于
When a CPU detects its locked up via soft_nmi_interrupt() we have pt_regs, so print the regs->nip, which points to where we took the soft-NMI. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
soft_nmi_interrupt() is called directly from the asm exception handling code, which passes regs as a pointer to the stack. So regs can't be NULL, it may be full of junk, but that's a separate problem. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Use pr_fmt() in the watchdog code, so we don't have to say "Watchdog" so many times. Rather than "CPU:%d" just spell it "CPU %d", "Hard" doesn't need a capital in the middle of a sentence, and "LOCKUP other CPUS" should be "LOCKUP on other CPUS". Also make it clear when a CPU self detects a lockup by spelling it out. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
The QS21/22 IBM Cell blades had a southbridge chip called Axon. This could have DDR DIMMs attached to it, though they were not directly usable as RAM, instead they could be used as some sort of buffer, if applications were written specifically to use the block device provided by the driver. Although the driver supposedly had direct access support, it was apparently never tested (see commit 91117a20 ("axonram: Fix bug in direct_access")). These machines have not been available for over 5 years, and were never widely in use. It seems highly unlikely anyone is using this driver. In general we're happy to leave old drivers in the tree, but because DAX is involved this driver is caught up in the ongoing work in that area, but none of the DAX folks are able to test it. So remove the driver, if any one *is* using it, we'll be happy to put it back. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 1月, 2018 15 次提交
-
-
由 Julia Cartwright 提交于
The mpc52xx_gpt code currently implements an irq_chip for handling interrupts; due to how irq_chip handling is done, it's necessary for the irq_chip methods to be invoked from hardirq context, even on a a real-time kernel. Because the spinlock_t type becomes a "sleeping" spinlock w/ RT kernels, it is not suitable to be used with irq_chips. A quick audit of the operations under the lock reveal that they do only minimal, bounded work, and are therefore safe to do under a raw spinlock. Signed-off-by: NJulia Cartwright <julia@ni.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Bringmann 提交于
On powerpc systems with shared configurations of CPUs and memory and memoryless nodes at boot, an event ordering problem was observed on a SLES12 build platforms with the hot-add of CPUs to the memoryless nodes. * The most common error occurred when the memory SLAB driver attempted to reference the memoryless node to which a CPU was being added before the kernel had finished initializing all of the data structures for the CPU and exited 'device_online' under DLPAR/hot-add. Normally the memoryless node would be initialized through the call path device_online ... arch_update_cpu_topology ... find_cpu_nid ... try_online_node. This patch ensures that the powerpc node will be initialized as early as possible, even if it was memoryless and CPU-less at the point when we are trying to hot-add a new CPU to it. Signed-off-by: NMichael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Bringmann 提交于
This patch fixes some problems encountered at runtime with configurations that support memory-less nodes, or that hot-add CPUs into nodes that are memoryless during system execution after boot. The problems of interest include: * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory until and if memory is hot-added to the node. * Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting the references to one of the set of nodes known to have memory at boot. Note that this software operates under the context of CPU hotplug. We are not doing memory hotplug in this code, but rather updating the kernel's CPU topology (i.e. arch_update_cpu_topology / numa_update_cpu_topology). We are initializing a node that may be used by CPUs or memory before it can be referenced as invalid by a CPU hotplug operation. CPU hotplug operations are protected by a range of APIs including cpu_maps_update_begin/cpu_maps_update_done, cpus_read/write_lock / cpus_read/write_unlock, device locks, and more. Memory hotplug operations, including try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the case of CPUs being hot-added to a previously memoryless node, the try_online_node operation occurs wholly within the CPU locks with no overlap. Using HMC hot-add/hot-remove operations, we have been able to add and remove CPUs to any possible node without failures. HMC operations involve a degree self-serialization, though. Signed-off-by: NMichael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Bringmann 提交于
On powerpc systems which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized. These empty nodes may occur when, * Dedicated vs. shared resources. Shared resources require information such as the VPHN hcall for CPU assignment to nodes. Associativity decisions made based on dedicated resource rules, such as associativity properties in the device tree, may vary from decisions made using the values returned by the VPHN hcall. * memoryless nodes at boot. Nodes need to be defined as 'possible' at boot for operation with other code modules. Previously, the powerpc code would limit the set of possible nodes to those which have memory assigned at boot, and were thus online. Subsequent add/remove of CPUs or memory would only work with this subset of possible nodes. * memoryless nodes with CPUs at boot. Due to the previous restriction on nodes, nodes that had CPUs but no memory were being collapsed into other nodes that did have memory at boot. In practice this meant that the node assignment presented by the runtime kernel differed from the affinity and associativity attributes presented by the device tree or VPHN hcalls. Nodes that might be known to the pHyp were not 'possible' in the runtime kernel because they did not have memory at boot. This patch ensures that sufficient nodes are defined to support configuration requirements after boot, as well as at boot. This patch set fixes a couple of problems. * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory until and if memory is hot-added to the node. * Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting to one of the set of nodes that were known to have memory at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. This new setting will override the instruction: nodes_and(node_possible_map, node_possible_map, node_online_map); presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the "ibm,max-associativity-domains" property is not present at boot, no operation will be performed to define or enable additional nodes, or enable the above 'nodes_and()'. Signed-off-by: NMichael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Sukadev Bhattiprolu 提交于
clear_thread_tidr() is called in interrupt context as a part of delayed put of the task structure (i.e as a part of timer interrupt). To prevent a deadlock, block interrupts when holding vas_thread_id_lock to set/ clear TIDR for a task. Fixes: ec233ede ("powerpc: Add support for setting SPRN_TIDR") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
The pcidev value stored in pci_dn is only used for NPU/NPU2 initialization. We can easily drop the cached pointer and use an ancient helper - pci_get_domain_bus_and_slot() instead in order to reduce complexity. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christophe Leroy 提交于
Most of the time, flush_tlb_range() is called on single pages. At the time being, flush_tlb_range() inconditionnaly calls flush_tlb_mm() which flushes at least the entire PID pages and on older CPUs like 4xx or 8xx it flushes the entire TLB table. This patch calls flush_tlb_page() instead of flush_tlb_mm() when the range is a single page. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
When enabling SR-IOV in pseries platform, the VF bar properties for a PF are reported on the device node in the device tree. This patch adds the IOV Bar resources to Linux structures from the device tree for later use when configuring SR-IOV by PF driver. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
After initial validation of SR-IOV resources, firmware will associate PEs to the dynamic VFs created within this call. This patch adds the association of PEs to the PF array of PE numbers indexed by VF. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
Introduce a method for notify resume to be called from sysfs. In this patch one can now call notify resume from sysfs when is supported by platform. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> [mpe: Add NULL check, add empty versions to avoid #ifdefs] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
When pseries SR-IOV is enabled and after a PF driver has resumed from EEH, platform has to be notified of the event so the child VFs can be allowed to resume their normal recovery path. This patch makes the EEH operation allow unfreeze platform dependent code and adds the call to pseries EEH code. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
To correctly use EEH code one has to make sure that the EEH_PE_VF is set for dynamic created VFs. Therefore this patch allocates an eeh_pe of eeh type EEH_PE_VF and associates PE with parent. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
Devices can go offline when erors reported. This patch adds a change to the kernel object and lets udev know of error. When device resumes, a change is also set reporting device as online. Therefore, EEH and AER events are better propagated to user space for PCI devices in all arches. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Bryant G. Ly 提交于
Add EEH platform operations for pseries to update VF config space. With this change after EEH, the VF will have updated config space for pseries platform. Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: NJuan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Daniel Borkmann 提交于
Since we've changed div/mod exception handling for src_reg in eBPF verifier itself, remove the leftovers from ppc64 JIT. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 25 1月, 2018 2 次提交
-
-
由 Benjamin Gilbert 提交于
It doesn't actually do anything. Merge its help text into EXTRA_FIRMWARE. Fixes: 5620a0d1 ("firmware: delete in-kernel firmware") Fixes: 0946b2fb ("firmware: cleanup FIRMWARE_IN_KERNEL message") Signed-off-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com> Signed-off-by: NRobin H. Johnson <robbat2@gentoo.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Benjamin Gilbert 提交于
The USB_SERIAL_KEYSPAN_* firmware options no longer do anything. Fixes: 5620a0d1 ("firmware: delete in-kernel firmware") Signed-off-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 1月, 2018 5 次提交
-
-
由 Marc Zyngier 提交于
CONFIG_IRQ_DOMAIN_DEBUG is similar to CONFIG_GENERIC_IRQ_DEBUGFS, just with less information. Spring cleanup time. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Yang Shunyong <shunyong.yang@hxt-semitech.com> Link: https://lkml.kernel.org/r/20180117142647.23622-1-marc.zyngier@arm.com
-
由 Frederic Barrat 提交于
Add user APIs through ioctl to allocate, free, and be notified of an AFU interrupt. For opencapi, an AFU can trigger an interrupt on the host by sending a specific command targeting a 64-bit object handle. On POWER9, this is implemented by mapping a special page in the address space of a process and a write to that page will trigger an interrupt. Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Frederic Barrat 提交于
In the opencapi protocol, host memory contexts are referenced by a 'actag'. During setup, a driver must tell the device how many actags it can used, and what values are acceptable. On POWER9, the NPU can handle 64 actags per link, so they must be shared between all the PCI functions of the link. To get a global picture of how many actags are used by each AFU of every function, we capture some data at the end of PCI enumeration, so that actags can be shared fairly if needed. This is not powernv specific per say, but rather a consequence of the opencapi configuration specification being quite general. The number of available actags on POWER9 makes it more likely to be hit. This is somewhat mitigated by the fact that existing AFUs are coded by requesting a reasonable count of actags and existing devices carry only one AFU. Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Frederic Barrat 提交于
Implement a few platform-specific calls which can be used by drivers: - provide the Transaction Layer capabilities of the host, so that the driver can find some common ground and configure the device and host appropriately. - provide the hw interrupt to be used for translation faults raised by the NPU - map/unmap some NPU mmio registers to get the fault context when the NPU raises an address translation fault The rest are wrappers around the previously-introduced opal calls. Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Frederic Barrat 提交于
Add opal calls to interact with the NPU: OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA) The SPA is a table containing one entry (Process Element) per memory context which can be accessed by the opencapi device. OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared. OPAL_NPU_TL_SET: configure the Transaction Layer The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent. Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-