- 02 3月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
At the moment the kvmppc_spapr_tce_table struct can only describe 4GB windows and handle fixed size (4K) pages. Dynamic DMA windows support more so these limits need to be extended. This replaces window_size (in bytes, 4GB max) with page_shift (32bit) and size (64bit, in pages). This should cause no behavioural change as this is changing the internal structures only - the user interface still only allows one to create a 32-bit table with 4KiB pages at this stage. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 29 2月, 2016 9 次提交
-
-
由 Suresh E. Warrier 提交于
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea, most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1 - that is enable the feature. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh E. Warrier 提交于
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by: - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm(). - Any PPC_MSG_RM_HOST_ACTION messages must be executed first before any other PPC_MSG_CALL_FUNCTION messages. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
This patch adds the support for the kick VCPU operation for kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function provides the function to be invoked for a host side operation when poked by the real mode KVM. This is initiated by KVM by sending an IPI to any free host core. KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and rm_data to point to the VCPU to be woken up before sending the IPI. Note that we have allocated one kvmppc_host_rm_core structure per core. The above values need to be set in the structure corresponding to the core to which the IPI will be sent. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
The kvmppc_host_rm_ops structure keeps track of which cores are are in the host by maintaining a bitmask of active/runnable online CPUs that have not entered the guest. This patch adds support to manage the bitmask when a CPU is offlined or onlined in the host. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
Update the core host state in kvmppc_host_rm_ops whenever the primary thread of the core enters the guest or returns back. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
This patch defines the data structures to support the setting up of host side operations while running in real mode in the guest, and also the functions to allocate and free it. The operations are for now limited to virtual XICS operations. Currently, we have only defined one operation in the data structure: - Wake up a VCPU sleeping in the host when it receives a virtual interrupt The operations are assigned at the core level because PowerKVM requires that the host run in SMT off mode. For each core, we will need to manage its state atomically - where the state is defined by: 1. Is the core running in the host? 2. Is there a Real Mode (RM) operation pending on the host? Currently, core state is only managed at the whole-core level even when the system is in split-core mode. This just limits the number of free or "available" cores in the host to perform any host-side operations. The kvmppc_host_rm_core.rm_data allows any data to be passed by KVM in real mode to the host core along with the operation to be performed. The kvmppc_host_rm_ops structure is allocated the very first time a guest VM is started. Initial core state is also set - all online cores are in the host. This structure is never deleted, not even when there are no active guests. However, it needs to be freed when the module is unloaded because the kvmppc_host_rm_ops_hv can contain function pointers to kvm-hv.ko functions for the different supported host operations. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
Function to cause an IPI by directly updating the MFFR register in the XICS. The function is meant for real-mode callers since they cannot use the smp_ops->cause_ipi function which uses an ioremapped address. Normal usage is for the the KVM real mode code to set the IPI message using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm to cause the actual IPI. The function requires kvm_hstate.xics_phys to have been initialized with the physical address of XICS. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which uses an ioremapped address to access registers on the XICS interrupt controller to cause the IPI. Because of this real mode callers cannot call smp_muxed_ipi_message_pass() for IPI messaging. This patch creates a separate function smp_muxed_ipi_set_message just to set the IPI message without the cause_ipi routine. After calling this function to set the IPI message, real mode callers must cause the IPI by writing to the XICS registers directly. As part of this, we also change smp_muxed_ipi_message_pass to call smp_muxed_ipi_set_message to set the message instead of doing it directly inside the routine. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Suresh Warrier 提交于
This patch increases the number of demuxed messages for a controller with a single ipi to 8 for 64-bit systems. This is required because we want to use the IPI mechanism to send messages from a CPU running in KVM real mode in a guest to a CPU in the host to take some action. Currently, we only support 4 messages and all 4 are already taken. Define a fifth message PPC_MSG_RM_HOST_ACTION for this purpose. Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 16 2月, 2016 7 次提交
-
-
由 Alexey Kardashevskiy 提交于
This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO devices or emulated PCI. These calls allow adding multiple entries (up to 512) into the TCE table in one call which saves time on transition between kernel and user space. The current implementation of kvmppc_h_stuff_tce() allows it to be executed in both real and virtual modes so there is one helper. The kvmppc_rm_h_put_tce_indirect() needs to translate the guest address to the host address and since the translation is different, there are 2 helpers - one for each mode. This implements the KVM_CAP_PPC_MULTITCE capability. When present, the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE if these are enabled by the userspace via KVM_CAP_PPC_ENABLE_HCALL. If they can not be handled by the kernel, they are passed on to the user space. The user space still has to have an implementation for these. Both HV and PR-syle KVM are supported. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls) will validate TCE (not to have unexpected bits) and IO address (to be within the DMA window boundaries). This introduces helpers to validate TCE and IO address. The helpers are exported as they compile into vmlinux (to work in realmode) and will be used later by KVM kernel module in virtual mode. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
SPAPR_TCE_SHIFT is used in few places only and since IOMMU_PAGE_SHIFT_4K can be easily used instead, remove SPAPR_TCE_SHIFT. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
At the moment pages used for TCE tables (in addition to pages addressed by TCEs) are not counted in locked_vm counter so a malicious userspace tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and lock a lot of memory. This adds counting for pages used for TCE tables. This counts the number of pages required for a table plus pages for the kvmppc_spapr_tce_table struct (TCE table descriptor) itself. This changes release_spapr_tce_table() to store @npages on stack to avoid calling kvmppc_stt_npages() in the loop (tiny optimization, probably). This does not change the amount of used memory. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
At the moment only spapr_tce_tables updates are protected against races but not lookups. This fixes missing protection by using RCU for the list. As lookups also happen in real mode, this uses list_for_each_entry_lockless() (which is expected not to access any vmalloc'd memory). This converts release_spapr_tce_table() to a RCU scheduled handler. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have following patches applied nicer. This moves the ioba boundaries check to a helper and adds a check for least bits which have to be zeros. The patch is pretty mechanical (only check for least ioba bits is added) so no change in behaviour is expected. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Alexey Kardashevskiy 提交于
This makes vmalloc_to_phys() public as there will be another user (KVM in-kernel VFIO acceleration) for it soon. As this new user can be compiled as a module, this exports the symbol. As a little optimization, this changes the helper to call vmalloc_to_pfn() instead of vmalloc_to_page() as the size of the struct page may not be power-of-two aligned which will make gcc use multiply instructions instead of shifts. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 15 2月, 2016 1 次提交
-
-
由 David Gibson 提交于
ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is used to handle any necessary interactions between KVM and VFIO. Currently that device is built on x86 and ARM, but not powerpc, although powerpc does support both KVM and VFIO. This makes things awkward in userspace Currently qemu prints an alarming error message if you attempt to use VFIO and it can't initialize the KVM VFIO device. We don't want to remove the warning, because lack of the KVM VFIO device could mean coherency problems on x86. On powerpc, however, the error is harmless but looks disturbing, and a test based on host architecture in qemu would be ugly, and break if we do need the KVM VFIO device for something important in future. There's nothing preventing the KVM VFIO device from being built for powerpc, so this patch turns it on. It won't actually do anything, since we don't define any of the arch_*() hooks, but it will make qemu happy and we can extend it in future if we need to. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NEric Auger <eric.auger@linaro.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 28 1月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
This was wrongly updated by commit 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs") during the last merge window. Fix it up. This could lead to incorrect behaviour in THP and/or mprotect(), at a minimum. Fixes: 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Madhavan Srinivasan 提交于
Commit 7a786832 ("powerpc/perf: Add an explict flag indicating presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not set. That commit's changelog also mentions that Power8 does not support MMCRA[SLOT]. However when the Power8 PMU support was merged, it errnoeously included the PPMU_HAS_SSLOT flag. So remove PPMU_HAS_SSLOT from the Power8 flags. mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39 (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with the high bits of the Threshold Event Counter Mantissa. I am not aware of any published events which use the threshold counting mechanism, which would cause the mantissa bits to be set. So in practice this bug is unlikely to trigger. Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support") Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 1月, 2016 1 次提交
-
-
由 Gavin Shan 提交于
In eeh_pe_loc_get(), the PE location code is retrieved from the "ibm,loc-code" property of the device node for the bridge of the PE's primary bus. It's not correct because the property indicates the parent PE's location code. This reads the correct PE location code from "ibm,io-base-loc-code" or "ibm,slot-location-code" property of PE parent bus's device node. Cc: stable@vger.kernel.org # v3.16+ Fixes: 357b2f3d ("powerpc/eeh: Dump PE location code") Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 25 1月, 2016 1 次提交
-
-
由 Vasant Hegde 提交于
With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping rtas_rmo_buf from user space is failing. Hence we are not able to make RTAS syscall. This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive in devmem_is_allowed(). This will allow user space to map rtas_rmo_buf and we are able to make RTAS syscall. Reported-by: NBharata B Rao <bharata@linux.vnet.ibm.com> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 1月, 2016 1 次提交
-
-
由 Al Viro 提交于
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 1月, 2016 6 次提交
-
-
由 Stephen Rothwell 提交于
Commit d5d6a443 ("arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP") added a new identical definition of pmd_dirty(). Remove it again. Cc: Minchan Kim <minchan@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alan Modra 提交于
PowerPC64 uses the symbol .TOC. much as other targets use _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in powerpc parlance, the TOC pointer). Global offset tables are generally local to an executable or shared library, or in the kernel, module. Thus it does not make sense for a module to resolve a relocation against .TOC. to the kernel's .TOC. value. A module has its own .TOC., and indeed the powerpc64 module relocation processing ignores the kernel value of .TOC. and instead calculates a module-local value. This patch removes code involved in exporting the kernel .TOC., tweaks modpost to ignore an undefined .TOC., and the module loader to twiddle the section symbol so that .TOC. isn't seen as undefined. Note that if the kernel was compiled with -msingle-pic-base then ELFv2 would not have function global entry code setting up r2. In that case the module call stubs would need to be modified to set up r2 using the kernel .TOC. value, requiring some of this code to be reinstated. mpe: Furthermore a change in binutils master (not yet released) causes the current way we handle the TOC to no longer work when building with MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be loaded due to there being no version found for TOC. Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: NAlan Modra <amodra@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Chandan Rajendra 提交于
Test runs on a ppc64 BE guest succeeded using modified fstests. Also tested on ppc64 LE using a home made test - mpe. Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Christoph Hellwig 提交于
Move the generic implementation to <linux/dma-mapping.h> now that all architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now that everyone supports them. [valentinrothberg@gmail.com: remove leftovers in Kconfig] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Helge Deller <deller@gmx.de> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Axtens 提交于
This hooks up UBSAN support for PowerPC. So far it's found some interesting cases where we don't properly sanitise input to shifts, including one in our futex handling. Nothing critical, but interesting and worth fixing. [valentinrothberg@gmail.com: arch/powerpc/Kconfig: fix typo in select statement] Signed-off-by: NDaniel Axtens <dja@axtens.net> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
The four cpumasks cpu_{possible,online,present,active}_bits are exposed readonly via the corresponding const variables cpu_xyz_mask. But they are also accessible for arbitrary writing via the exposed functions set_cpu_xyz. There's quite a bit of code throughout the kernel which iterates over or otherwise accesses these bitmaps, and having the access go via the cpu_xyz_mask variables is nowadays [1] simply a useless indirection. It may be that any problem in CS can be solved by an extra level of indirection, but that doesn't mean every extra indirection solves a problem. In this case, it even necessitates some minor ugliness (see 4/6). Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a struct member, to avoid problems when the identifier cpu_online_mask becomes a macro later in the series. The next four patches eliminate the cpu_xyz_mask variables by simply exposing the actual bitmaps, after renaming them to discourage direct access - that still happens through cpu_xyz_mask, which are now simply macros with the same type and value as they used to have. After that, there's no longer any reason to have the setter functions be out-of-line: The boolean parameter is almost always a literal true or false, so by making them static inlines they will usually compile to one or two instructions. For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes. We also save a little stack (stackdelta says 127 functions have a 16 byte smaller stack frame, while two grow by that amount). Mostly because, when iterating over the mask, gcc typically loads the value of cpu_xyz_mask into a callee-saved register and from there into %rdi before each find_next_bit call - now it can just load the appropriate immediate address into %rdi before each call. [1] See Rusty's kind explanation http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for some historic context. This patch (of 6): As preparation for eliminating the indirect access to the various global cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the cpu_online_mask member of struct fadump_crash_info_header to simply online_mask, thus allowing cpu_online_mask to become a macro. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 1月, 2016 1 次提交
-
-
由 Will Deacon 提交于
As illustrated by commit a3afe70b ("[S390] latencytop s390 support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to advertise an implementation of save_stack_trace_tsk. However, as of 9212ddb5 ("stacktrace: provide save_stack_trace_tsk() weak alias") a dummy implementation is provided if STACKTRACE=y. Given that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether. Signed-off-by: NWill Deacon <will.deacon@arm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Helge Deller <deller@gmx.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 1月, 2016 6 次提交
-
-
由 Dan Williams 提交于
For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave@sr71.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Shaohua Li <shli@kernel.org> Cc: <yalin.wang2010@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Evans <je@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttil <mika.penttila@nextfour.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@redhat.com> Cc: Roland Dreier <roland@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Shaohua Li <shli@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Tail page refcounting is utterly complicated and painful to support. It uses ->_mapcount on tail pages to store how many times this page is pinned. get_page() bumps ->_mapcount on tail page in addition to ->_count on head. This information is required by split_huge_page() to be able to distribute pins from head of compound page to tails during the split. We will need ->_mapcount to account PTE mappings of subpages of the compound page. We eliminate need in current meaning of ->_mapcount in tail pages by forbidding split entirely if the page is pinned. The only user of tail page refcounting is THP which is marked BROKEN for now. Let's drop all this mess. It makes get_page() and put_page() much simpler. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NSasha Levin <sasha.levin@oracle.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NJerome Marchand <jmarchan@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We are going to decouple splitting THP PMD from splitting underlying compound page. This patch renames split_huge_page_pmd*() functions to split_huge_pmd*() to reflect the fact that it doesn't imply page splitting, only PMD. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NSasha Levin <sasha.levin@oracle.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NJerome Marchand <jmarchan@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 1月, 2016 1 次提交
-
-
由 Vladimir Davydov 提交于
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2016 1 次提交
-
-
由 Greg Kurz 提交于
The get and set operations got exchanged by mistake when moving the code from book3s.c to powerpc.c. Fixes: 3840edc8 Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 1月, 2016 2 次提交
-
-
由 Ulrich Weigand 提交于
GCC 6 will include changes to generated code with -mcmodel=large, which is used to build kernel modules on powerpc64le. This was necessary because the large model is supposed to allow arbitrary sizes and locations of the code and data sections, but the ELFv2 global entry point prolog still made the unconditional assumption that the TOC associated with any particular function can be found within 2 GB of the function entry point: func: addis r2,r12,(.TOC.-func)@ha addi r2,r2,(.TOC.-func)@l .localentry func, .-func To remove this assumption, GCC will now generate instead this global entry point prolog sequence when using -mcmodel=large: .quad .TOC.-func func: .reloc ., R_PPC64_ENTRY ld r2, -8(r12) add r2, r2, r12 .localentry func, .-func The new .reloc triggers an optimization in the linker that will replace this new prolog with the original code (see above) if the linker determines that the distance between .TOC. and func is in range after all. Since this new relocation is now present in module object files, the kernel module loader is required to handle them too. This patch adds support for the new relocation and implements the same optimization done by the GNU linker. Cc: stable@vger.kernel.org Signed-off-by: NUlrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Russell Currey 提交于
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no parameters and returned nothing. The call was updated to accept the terminal number to flush, and returned various values depending on the state of the output buffer. The prototype has been updated and its usage in the OPAL kmsg dumper has been modified to support its new behaviour as an incremental flush. Signed-off-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-