- 22 4月, 2014 3 次提交
-
-
由 Martin Schwidefsky 提交于
Switch the user dirty bit detection used for migration from the hardware provided host change-bit in the pgste to a fault based detection method. This reduced the dependency of the host from the storage key to a point where it becomes possible to enable the RCP bypass for KVM guests. The fault based dirty detection will only indicate changes caused by accesses via the guest address space. The hardware based method can detect all changes, even those caused by I/O or accesses via the kernel page table. The KVM/qemu code needs to take this into account. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Dominik Dingel 提交于
Introduce a new function s390_enable_skey(), which enables storage key handling via setting the use_skey flag in the mmu context. This function is only useful within the context of kvm. Note that enabling storage keys will cause a one-time hickup when walking the page table; however, it saves us special effort for cases like clear reset while making it possible for us to be architecture conform. s390_enable_skey() takes the page table lock to prevent reseting storage keys triggered from multiple vcpus. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
由 Dominik Dingel 提交于
For lazy storage key handling, we need a mechanism to track if the process ever issued a storage key operation. This patch adds the basic infrastructure for making the storage key handling optional, but still leaves it enabled for now by default. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 03 4月, 2014 1 次提交
-
-
由 Martin Schwidefsky 提交于
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcasted to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in the need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 21 3月, 2014 2 次提交
-
-
由 Dominik Dingel 提交于
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Dominik Dingel 提交于
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 21 2月, 2014 2 次提交
-
-
由 Konstantin Weitz 提交于
This patch enables Collaborative Memory Management (CMM) for kvm on s390. CMM allows the guest to inform the host about page usage (see arch/s390/mm/cmm.c). The host uses this information to avoid swapping in unused pages in the page fault handler. Further, a CPU provided list of unused invalid pages is processed to reclaim swap space of not yet accessed unused pages. [ Martin Schwidefsky: patch reordering and cleanup ] Signed-off-by: NKonstantin Weitz <konstantin.weitz@gmail.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Git commit 050eef36 "[S390] fix tlb flushing vs. concurrent /proc accesses" introduced the attach counter to avoid using the mm_users value to decide between IPTE for every PTE and lazy TLB flushing with IDTE. That fixed the problem with mm_users but it introduced another subtle race, fortunately one that is very hard to hit. The background is the requirement of the architecture that a valid PTE may not be changed while it can be used concurrently by another cpu. The decision between IPTE and lazy TLB flushing needs to be done while the PTE is still valid. Now if the virtual cpu is temporarily stopped after the decision to use lazy TLB flushing but before the invalid bit of the PTE has been set, another cpu can attach the mm, find that flush_mm is set, do the IDTE, return to userspace, and recreate a TLB that uses the PTE in question. When the first, stopped cpu continues it will change the PTE while it is attached on another cpu. The first cpu will do another IDTE shortly after the modification of the PTE which makes the race window quite short. To fix this race the CPU that wants to attach the address space of a user space thread needs to wait for the end of the PTE modification. The number of concurrent TLB flushers for an mm is tracked in the upper 16 bits of the attach_count and finish_arch_post_lock_switch is used to wait for the end of the flush operation if required. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 30 1月, 2014 1 次提交
-
-
由 Dominik Dingel 提交于
In the case of a fault, we will retry to exit sie64 but with gmap fault indication for this thread set. This makes it possible to handle async page faults. Based on a patch from Martin Schwidefsky. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 15 10月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
For machines without enhanced supression on protection the software dirty bit code forces the pte dirty bit and clears the page protection bit in pgste_set_pte. This is done for all pte types, the check for present ptes is missing. As a result swap ptes and other not-present ptes can get corrupted. Add a check for the _PAGE_PRESENT bit to pgste_set_pte before modifying the pte value. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 8月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 28 8月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
For a single-threaded KVM guest ptep_modify_prot_start will not use IPTE, the invalid bit will therefore not be set. If DEBUG_VM is set pgste_set_key called by ptep_modify_prot_commit will complain about the missing invalid bit. ptep_modify_prot_start should set the invalid bit in all cases. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 22 8月, 2013 4 次提交
-
-
由 Martin Schwidefsky 提交于
Isolate the logic of IDTE vs. IPTE flushing of ptes in two functions, ptep_flush_lazy and __tlb_flush_mm_lazy. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
ptep_modify_prot_start uses the pgste_set helper to store the pgste, while ptep_modify_prot_commit uses its own pointer magic to retrieve the value again. Add the pgste_get help function to keep things symmetrical and improve readability. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
On process exit there is no more need for the pgste information. Skip the expensive storage key operations which should speed up termination of KVM processes. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 7月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
Improve the code to upgrade the standard 2K page tables to 4K page tables with PGSTEs to allow the operation to happen when the program is already multi-threaded. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 6月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 6月, 2013 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 6月, 2013 3 次提交
-
-
由 Christian Borntraeger 提交于
Getting and Releasing the pgste lock has lock semantics. Make the code an explicit barrier. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Christian Borntraeger 提交于
In modify_prot_start we update the pgste value but never store it back into the original location. Lets save the calculated result, since modify_prot_commit will use the value of the pgste. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Christian Borntraeger 提交于
When doing the transition invalid->valid in the host page table for a guest, then the guest view of C/R is in the pgste. After validation the view is pgste OR real key. We must zero out the real key C/R to avoid guest over-indication for change (and reference). Touching the real key is ok also for the host: The change bit is tracked via write protection and the reference bit is also ok because set_pte_at was called and the page will be touched anyway soon. Furthermore architecture defines reference as "substantially accurate", over- and underindication are ok. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 28 5月, 2013 1 次提交
-
-
由 Christian Borntraeger 提交于
pte_present might return true on PAGE_TYPE_NONE, even if the invalid bit is on. Modify the existing check of the pgste functions to avoid crashes. [ Martin Schwidefsky: added ptep_modify_prot_[start|commit] bits ] Reported-by: NMartin Schwidefky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> CC: stable@vger.kernel.org Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 21 5月, 2013 3 次提交
-
-
由 Christian Borntraeger 提交于
The guest prefix pages must be mapped writeable all the time while SIE is running, otherwise the guest might see random behaviour. (pinned at the pte level) Turns out that mlocking is not enough, the page table entry (not the page) might change or become r/o. This patch uses the gmap notifiers to kick guest cpus out of SIE. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Martin Schwidefsky 提交于
The RCP byte is a part of the PGSTE value, the existing RCP_xxx names are inaccurate. As the defines describe bits and pieces of the PGSTE, the names should start with PGSTE_. The KVM_UR_BIT and KVM_UC_BIT are part of the PGSTE as well, give them better names as well. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Christian Borntraeger 提交于
Dont use the same bit as user referenced. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 17 5月, 2013 1 次提交
-
-
由 Christian Borntraeger 提交于
Dont use the same bit as user referenced. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 03 5月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
Add a notifier for kvm to get control before a page table entry is invalidated. The notifier is only called for ptes of an address space with pgstes that have been explicitly marked to require notification. Kvm will use this to get control before prefix pages of virtual CPU are unmapped. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 30 4月, 2013 1 次提交
-
-
由 Gerald Schaefer 提交于
Commit abf09bed ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 4月, 2013 1 次提交
-
-
由 Heiko Carstens 提交于
Implement gmap_translate() function which translates a guest absolute address to a user space process address without establishing the guest page table entries. This is useful for kvm guest address translations where no memory access is expected to happen soon (e.g. tprot exception handler). Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 17 4月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
Commit b4cbb197 ("vm: add vm_iomap_memory() helper function") added a helper function wrapper around io_remap_pfn_range(), and every other architecture defined it in <asm/pgtable.h>. The s390 choice of <asm/io.h> may make sense, but is not very convenient for this case, and gratuitous differences like that cause unexpected errors like this: mm/memory.c: In function 'vm_iomap_memory': mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration] Glory be the kbuild test robot who noticed this, bisected it, and reported it to the guilty parties (ie me). Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 4月, 2013 2 次提交
-
-
由 Heiko Carstens 提交于
All architectures need to provide a check_pgt_cache() function. The s390 one got lost somewhere. So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's idle loop rework patches get merged. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
When translating user space addresses to kernel addresses the follow_table() function had two bugs: - PROT_NONE mappings could be read accessed via the kernel mapping. That is e.g. putting a filename into a user page, then protecting the page with PROT_NONE and afterwards issuing the "open" syscall with a pointer to the filename would incorrectly succeed. - when walking the page tables it used the pgd/pud/pmd/pte primitives which with dynamic page tables give no indication which real level of page tables is being walked (region2, region3, segment or page table). So in case of an exception the translation exception code passed to __handle_fault() is not necessarily correct. This is not really an issue since __handle_fault() doesn't evaluate the code. Only in case of e.g. a SIGBUS this code gets passed to user space. If user space can do something sane with the value is a different question though. To fix these issues don't use any Linux primitives. Only walk the page tables like the hardware would do it, however we leave quite some checks away since we know that we only have full size page tables and each index is within bounds. In theory this should fix all issues... Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 08 3月, 2013 1 次提交
-
-
由 Heiko Carstens 提交于
Implement gmap_translate() function which translates a guest absolute address to a user space process address without establishing the guest page table entries. This is useful for kvm guest address translations where no memory access is expected to happen soon (e.g. tprot exception handler). Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 28 2月, 2013 1 次提交
-
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 14 2月, 2013 2 次提交
-
-
由 Martin Schwidefsky 提交于
The s390 architecture is unique in respect to dirty page detection, it uses the change bit in the per-page storage key to track page modifications. All other architectures track dirty bits by means of page table entries. This property of s390 has caused numerous problems in the past, e.g. see git commit ef5d437f "mm: fix XFS oops due to dirty pages without buffers on s390". To avoid future issues in regard to per-page dirty bits convert s390 to a fault based software dirty bit detection mechanism. All user page table entries which are marked as clean will be hardware read-only, even if the pte is supposed to be writable. A write by the user process will trigger a protection fault which will cause the user pte to be marked as dirty and the hardware read-only bit is removed. With this change the dirty bit in the storage key is irrelevant for Linux as a host, but the storage key is still required for KVM guests. The effect is that page_test_and_clear_dirty and the related code can be removed. The referenced bit in the storage key is still used by the page_test_and_clear_young primitive to provide page age information. For page cache pages of mappings with mapping_cap_account_dirty there will not be any change in behavior as the dirty bit tracking already uses read-only ptes to control the amount of dirty pages. Only for swap cache pages and pages of mappings without mapping_cap_account_dirty there can be additional protection faults. To avoid an excessive number of additional faults the mk_pte primitive checks for PageDirty if the pgprot value allows for writes and pre-dirties the pte. That avoids all additional faults for tmpfs and shmem pages until these pages are added to the swap cache. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Only needed to make some drivers compile... Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 22 1月, 2013 1 次提交
-
-
由 Gerald Schaefer 提交于
On s390, an architecture-specific implementation of the function pmdp_set_wrprotect() is missing and the generic version is currently being used. The generic version does not flush the tlb as it would be needed on s390 when modifying an active pmd, which can lead to subtle tlb errors on s390 when using transparent hugepages. This patch adds an s390-specific implementation of pmdp_set_wrprotect() including the missing tlb flush. Cc: stable@vger.kernel.org Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 13 1月, 2013 1 次提交
-
-
由 Gerald Schaefer 提交于
The pfn calculation in pmd_pfn() is broken for thp, because it uses HPAGE_SHIFT instead of the normal PAGE_SHIFT. This is fixed by removing the distinction between thp and normal pmds in that function, and always using PAGE_SHIFT. Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 13 12月, 2012 1 次提交
-
-
由 Kirill A. Shutemov 提交于
We have two different implementation of is_zero_pfn() and my_zero_pfn() helpers: for architectures with and without zero page coloring. Let's consolidate them in <asm-generic/pgtable.h>. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-