- 21 7月, 2017 1 次提交
-
-
由 Kirill A. Shutemov 提交于
MPX (without MAWA extension) cannot handle addresses above 47 bits, so we need to make sure that MPX cannot be enabled if we already have a VMA above the boundary and forbid creating such VMAs once MPX is enabled. The patch implements mpx_unmapped_area_check() which is called from all variants of get_unmapped_area() to check if the requested address fits mpx. On enabling MPX, we check if we already have any vma above 47-bit boundary and forbit the enabling if we do. As long as DEFAULT_MAP_WINDOW is equal to TASK_SIZE_MAX, the change is nop. It will change when we allow userspace to have mappings above 47-bits. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-6-kirill.shutemov@linux.intel.com [ Readability edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 4月, 2017 1 次提交
-
-
由 Joerg Roedel 提交于
When this function fails it just sends a SIGSEGV signal to user-space using force_sig(). This signal is missing essential information about the cause, e.g. the trap_nr or an error code. Fix this by propagating the error to the only caller of mpx_handle_bd_fault(), do_bounds(), which sends the correct SIGSEGV signal to the process. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: fe3d197f ('x86, mpx: On-demand kernel allocation of bounds tables') Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 3月, 2017 1 次提交
-
-
由 Tobias Klauser 提交于
Make the function get_user_bd_entry() static as it is not used outside of arch/x86/mm/mpx.c This fixes a sparse warning. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 2月, 2017 1 次提交
-
-
由 Mike Rapoport 提交于
When a non-cooperative userfaultfd monitor copies pages in the background, it may encounter regions that were already unmapped. Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely changes in the virtual memory layout. Since there might be different uffd contexts for the affected VMAs, we first should create a temporary representation for the unmap event for each uffd context and then notify them one by one to the appropriate userfault file descriptors. The event notification occurs after the mmap_sem has been released. [arnd@arndb.de: fix nommu build] Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de [mhocko@suse.com: fix nommu build] Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 2月, 2017 1 次提交
-
-
由 Kirill A. Shutemov 提交于
There's no users of zap_page_range() who wants non-NULL 'details'. Let's drop it. Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2017 1 次提交
-
-
由 Tobias Klauser 提交于
info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 12月, 2016 1 次提交
-
-
由 Mark Rutland 提交于
Currently bd_addr lives in mm_struct, which is otherwise architecture independent. Architecture-specific data is supposed to live within mm_context_t (itself contained in mm_struct). Other x86-specific context like the pkey accounting data lives in mm_context_t, and there's no readon the MPX data can't also live there. So as to keep the arch-specific data togather, and to set a good example for others, this patch moves bd_addr into x86's mm_context_t. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutland@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 19 10月, 2016 1 次提交
-
-
由 Lorenzo Stoakes 提交于
This removes the 'write' and 'force' from get_user_pages() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com> Acked-by: NChristian König <christian.koenig@amd.com> Acked-by: NJesper Nilsson <jesper.nilsson@axis.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2016 1 次提交
-
-
由 Colin Ian King 提交于
In the unlikely event that regno == nr_registers then we get an array overrun on regoff because the invalid register check is currently off-by-one. Fix this with a check that regno is >= nr_registers instead. Detected with static analysis using CoverityScan. Fixes: fcc7ffd6 "x86, mpx: Decode MPX instruction to get bound violation information" Signed-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1456512931-3388-1-git-send-email-colin.king@canonical.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 24 2月, 2016 1 次提交
-
-
由 Adam Buchbinder 提交于
Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 2月, 2016 1 次提交
-
-
由 Dave Hansen 提交于
We will soon modify the vanilla get_user_pages() so it can no longer be used on mm/tasks other than 'current/current->mm', which is by far the most common way it is called. For now, we allow the old-style calls, but warn when they are used. (implemented in previous patch) This patch switches all callers of: get_user_pages() get_user_pages_unlocked() get_user_pages_locked() to stop passing tsk/mm so they will no longer see the warnings. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: jack@suse.cz Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 12月, 2015 1 次提交
-
-
由 Dave Hansen 提交于
MPX decodes instructions in order to tell which bounds register was violated. Part of this decoding involves looking at the "REX prefix" which is a special instrucion prefix used to retrofit support for new registers in to old instructions. The X86_REX_*() macros are defined to return actual bit values: #define X86_REX_R(rex) ((rex) & 4) *not* boolean values. However, the MPX code was checking for them like they were booleans. This might have led to us mis-decoding the "REX prefix" and giving false information out to userspace about bounds violations. X86_REX_B() actually is bit 1, so this is really only broken for the X86_REX_X() case. Fix the conditionals up to tolerate the non-boolean values. Fixes: fcc7ffd6 "x86, mpx: Decode MPX instruction to get bound violation information" Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 12 11月, 2015 2 次提交
-
-
由 Dave Hansen 提交于
I received a bug report that running 32-bit MPX binaries on 64-bit kernels was broken. I traced it down to this little code snippet. We were switching our "number of bounds directory entries" calculation correctly. But, we didn't switch the other side of the calculation: the virtual space size. This meant that we were calculating an absurd size for bd_entry_virt_space() on 32-bit because we used the 64-bit virt_space. This was _also_ broken for 32-bit kernels running on 64-bit hardware since boot_cpu_data.x86_virt_bits=48 even when running in 32-bit mode. Correct that and properly handle all 3 possible cases: 1. 32-bit binary on 64-bit kernel 2. 64-bit binary on 64-bit kernel 3. 32-bit binary on 32-bit kernel This manifested in having bounds tables not properly unmapped. It "leaked" memory but had no functional impact otherwise. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
When you call get_user(foo, bar), you effectively do a copy_from_user(&foo, bar, sizeof(*bar)); Note that the sizeof() is implicit. When we reach out to userspace to try to zap an entire "bounds table" we need to go read a "bounds directory entry" in order to locate the table's address. The size of a "directory entry" depends on the binary being run and is always the size of a pointer. But, when we have a 64-bit kernel and a 32-bit application, the directory entry is still only 32-bits long, but we fetch it with a 64-bit pointer which makes get_user() does a 64-bit fetch. Reading 4 extra bytes isn't harmful, unless we are at the end of and run off the table. It might also cause the zero page to get faulted in unnecessarily even if you are not at the end. Fix it up by doing a special 32-bit get_user() via a cast when we have 32-bit userspace. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 9月, 2015 2 次提交
-
-
由 Dave Hansen 提交于
MPX includes two separate "extended state components". There is no real need to have an 'mpx_struct' because we never really manage the states together. We also separate out the actual data in 'mpx_bndcsr_state' from the padding. We will shortly be checking the state sizes against our structures and need them to match. For consistency, we also ensure to prefix these types with 'mpx_'. Lastly, we add some comments to mirror some of the descriptions in the Intel documents (SDM) of the various state components. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233129.384B73EB@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
There are two concepts that have some confusing naming: 1. Extended State Component numbers (currently called XFEATURE_BIT_*) 2. Extended State Component masks (currently called XSTATE_*) The numbers are (currently) from 0-9. State component 3 is the bounds registers for MPX, for instance. But when we want to enable "state component 3", we go set a bit in XCR0. The bit we set is 1<<3. We can check to see if a state component feature is enabled by looking at its bit. The current 'xfeature_bit's are at best xfeature bit _numbers_. Calling them bits is at best inconsistent with ending the enum list with 'XFEATURES_NR_MAX'. This patch renames the enum to be 'xfeature'. These also happen to be what the Intel documentation calls a "state component". We also want to differentiate these from the "XSTATE_*" macros. The "XSTATE_*" macros are a mask, and we rename them to match. These macros are reasonably widely used so this patch is a wee bit big, but this really is just a rename. The only non-mechanical part of this is the s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/ We need a better name for it, but that's another patch. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: dave@sr71.net Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com [ Ported to v4.3-rc1. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 9月, 2015 1 次提交
-
-
由 Oleg Nesterov 提交于
Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(), rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple wrapper on top of do_mmap(). Perhaps we should update the callers of do_mmap_pgoff() and kill it later. This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not play with vm internals. After this change mmap_region() has a single user outside of mmap.c, arch/tile/mm/elf.c:arch_setup_additional_pages(). It would be nice to change arch/tile/ and unexport mmap_region(). [kirill@shutemov.name: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Tested-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 7月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
MPX setups private anonymous mapping, but uses vma->vm_ops too. This can confuse core VM, as it relies on vm->vm_ops to distinguish file VMAs from anonymous. As result we will get SIGBUS, because handle_pte_fault() thinks it's file VMA without vm_ops->fault and it doesn't know how to handle the situation properly. Let's fix that by not setting ->vm_ops. We don't really need ->vm_ops here: MPX VMA can be detected with VM_MPX flag. And vma_merge() will not merge MPX VMA with non-MPX VMA, because ->vm_flags won't match. The only thing left is name of VMA. I'm not sure if it's part of ABI, or we can just drop it. The patch keep it by providing arch_vma_name() on x86. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> # Fixes: 6b7339f4 (mm: avoid setting up anonymous pages into file mapping) Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 6月, 2015 14 次提交
-
-
由 Dave Hansen 提交于
Now that the bugs in mixed mode MPX handling are fixed, re-allow 32-bit binaries on 64-bit kernels again. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.70277DAD@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The comment pretty much says it all. I wrote a test program that does lots of random allocations and forces bounds tables to be created. It came up with a layout like this: .... | BOUNDS DIRECTORY ENTRY COVERS | .... | BOUNDS TABLE COVERS | | BOUNDS TABLE | REAL ALLOC | BOUNDS TABLE | Unmapping "REAL ALLOC" should have been able to free the bounds table "covering" the "REAL ALLOC" because it was the last real user. But, the neighboring VMA bounds tables were found, considered as real neighbors, and we declined to free the bounds table covering the area. Doing this over and over left a small but significant number of these orphans. Handling them is fairly straighforward. All we have to do is walk the VMAs and skip all of the MPX ones when looking for neighbors. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.A6BD90BF@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The MPX code needs to clear out bounds tables for memory which is no longer in use. We do this when a userspace mapping is torn down (unmapped). There are two modes: 1. An entire bounds table becomes unused, and can be freed and its pointer removed from the bounds directory. This happens either when a large mapping is torn down, or when a small mapping is torn down and it is the last mapping "covered" by a bounds table. 2. Only part of a bounds table becomes unused, in which case we free the backing memory as if MADV_DONTNEED was called. The old code was a spaghetti mess of "edge" bounds tables where the edges were handled specially, even if we were unmapping an entire one. Non-edge bounds tables are always fully unmapped, but share a different code path from the edge ones. The old code had a bug where it was unmapping too much memory. I worked on fixing it for two days and gave up. I didn't write the original code. I didn't particularly like it, but it worked, so I left it. After my debug session, I realized it was undebuggagle *and* buggy, so out it went. I also wrote a new unmapping test program which uncovers bugs pretty nicely. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.DCAEC67D@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Right now, the kernel can only switch between 64-bit and 32-bit binaries at compile time. This patch adds support for 32-bit binaries on 64-bit kernels when we support ia32 emulation. We essentially choose which set of table sizes to use when doing arithmetic for the bounds table calculations. This also uses a different approach for calculating the table indexes than before. I think the new one makes it much more clear what is going on, and allows us to share more code between the 32-bit and 64-bit cases. Based-on-patch-by: NQiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
user_atomic_cmpxchg_inatomic() actually looks at sizeof(*ptr) to figure out how many bytes to copy. If we run it on a 64-bit kernel with a 64-bit pointer, it will copy a 64-bit bounds directory entry. That's fine, except when we have 32-bit programs with 32-bit bounds directory entries and we only *want* 32-bits. This patch breaks the cmpxchg() operation out in to its own function and performs the 32-bit type swizzling in there. Note, the "64-bit" version of this code _would_ work on a 32-bit-only kernel. The issue this patch addresses is only for when the kernel's 'long' is mismatched from the size of the bounds directory entry of the process we are working on. The new helper modifies 'actual_old_val' or returns an error. But gcc doesn't know this, so it warns about 'actual_old_val' being unused. Shut it up with an uninitialized_var(). Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.672B115E@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Currently, to get from a bounds directory entry to the virtual address of a bounds table, we simply mask off a few low bits. However, the set of bits we mask off is different for 32-bit and 64-bit binaries. This breaks the operation out in to a helper function and also adds a temporary variable to store the result until we are sure we are returning one. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
When we allocate a bounds table, we call mmap(), then add a "valid" bit to the value before storing it in to the bounds directory. If we fail along the way, we go and mask that valid bit _back_ out. That seems a little silly, and this makes it much more clear when we have a plain address versus an actual table _entry_. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.3D69D5F4@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Bounds tables are a significant consumer of memory. It is important to know when they are being allocated. Add a trace point to trace whenever an allocation occurs and also its virtual address. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
There are two different events being traced here. They are doing similar things so share a trace "EVENT_CLASS" and are presented together. 1. Trace when MPX is zapping pages "mpx_unmap_zap": When MPX can not free an entire bounds table, it will instead try to zap unused parts of a bounds table to free the backing memory. This decreases RSS (resident set size) without decreasing the virtual space allocated for bounds tables. 2. Trace attempts to find bounds tables "mpx_unmap_search": This event traces any time we go looking to unmap a bounds table for a given virtual address range. This is useful to ensure that the kernel actually "tried" to free a bounds table versus times it succeeded in finding one. It might try and fail if it realized that a table was shared with an adjacent VMA which is not being unmapped. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
There are two basic things that can happen as the result of a bounds exception (#BR): 1. We allocate a new bounds table 2. We pass up a bounds exception to userspace. This patch adds a trace point for the case where we are passing the exception up to userspace with a signal. We are also explicit that we're printing out the inverse of the 'upper' that we encounter. If you want to filter, for instance, you need to ~ the value first. The reason we do this is because of how 'upper' is stored in the bounds table. If a pointer's range is: 0x1000 -> 0x2000 it is stored in the bounds table as (32-bits here for brevity): lower: 0x00001000 upper: 0xffffdfff That is so that an all 0's entry: lower: 0x00000000 upper: 0x00000000 corresponds to the "init" bounds which store a *range* of: 0x00000000 -> 0xffffffff That is, by far, the common case, and that lets us use the zero page, or deduplicate the memory, etc... The 'upper' stored in the table is gibberish to print by itself, so we print ~upper to get the *actual*, logical, human-readable value printed out. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
This is the first in a series of MPX tracing patches. I've found these extremely useful in the process of debugging applications and the kernel code itself. This exception hooks in to the bounds (#BR) exception very early and allows capturing the key registers which would influence how the exception is handled. Note that bndcfgu/bndstatus are technically still 64-bit registers even in 32-bit mode. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The comment and code here are confusing. We do not currently allocate the bounds directory in the kernel. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183702.222CEC2A@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The MPX code can only work on the current task. You can not, for instance, enable MPX management in another process or thread. You can also not handle a fault for another process or thread. Despite this, we pass a task_struct around prolifically. This patch removes all of the task struct passing for code paths where the code can not deal with another task (which turns out to be all of them). This has no functional changes. It's just a cleanup. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly accessible via normal instructions. They essentially act as if they were floating point registers and are saved/restored along with those registers. There are two main paths in the MPX code where we care about the contents of these registers: 1. #BR (bounds) faults 2. the prctl() code where we are setting MPX up Both of those paths _might_ be called without the FPU having been used. That means that 'tsk->thread.fpu.state' might never be allocated. Also, fpu_save_init() is not preempt-safe. It was a bug to call it without disabling preemption. The new get_xsave_addr() calls unlazy_fpu() instead and properly disables preemption. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 5月, 2015 7 次提交
-
-
由 Ingo Molnar 提交于
Use these consistent names: struct fregs_state # was: i387_fsave_struct struct fxregs_state # was: i387_fxsave_struct struct swregs_state # was: i387_soft_struct struct xregs_state # was: xsave_struct union fpregs_state # was: thread_xstate Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Standardize the naming of the various functions that copy register content in specific FPU context formats: copy_fxregs_to_kernel() # was: fpu_fxsave() copy_xregs_to_kernel() # was: xsave_state() copy_kernel_to_fregs() # was: frstor_checking() copy_kernel_to_fxregs() # was: fxrstor_checking() copy_kernel_to_xregs() # was: fpu_xrstor_checking() copy_kernel_to_xregs_booting() # was: xrstor_state_booting() copy_fregs_to_user() # was: fsave_user() copy_fxregs_to_user() # was: fxsave_user() copy_xregs_to_user() # was: xsave_user() copy_user_to_fregs() # was: frstor_user() copy_user_to_fxregs() # was: fxrstor_user() copy_user_to_xregs() # was: xrestore_user() copy_user_to_fpregs_zeroing() # was: restore_user_xstate() Eliminate fpu_xrstor_checking(), because it was just a wrapper. No change in functionality. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
So 6 years ago we made the FPU fpstate dynamically allocated: aa283f49 ("x86, fpu: lazy allocation of FPU area - v5") 61c4628b ("x86, fpu: split FPU state from task struct - v5") In hindsight this was a mistake: - it complicated context allocation failure handling, such as: /* kthread execs. TODO: cleanup this horror. */ if (WARN_ON(fpstate_alloc_init(fpu))) force_sig(SIGKILL, tsk); - it caused us to enable irqs in fpu__restore(): local_irq_enable(); /* * does a slab alloc which can sleep */ if (fpstate_alloc_init(fpu)) { /* * ran out of memory! */ do_group_exit(SIGKILL); return; } local_irq_disable(); - it (slightly) slowed down task creation/destruction by adding slab allocation/free pattens. - it made access to context contents (slightly) slower by adding one more pointer dereference. The motivation for the dynamic allocation was two-fold: - reduce memory consumption by non-FPU tasks - allocate and handle only the necessary amount of context for various XSAVE processors that have varying hardware frame sizes. These days, with glibc using SSE memcpy by default and GCC optimizing for SSE/AVX by default, the scope of FPU using apps on an x86 system is much larger than it was 6 years ago. For example on a freshly installed Fedora 21 desktop system, with a recent kernel, all non-kthread tasks have used the FPU shortly after bootup. Also, even modern embedded x86 CPUs try to support the latest vector instruction set - so they'll too often use the larger xstate frame sizes. So remove the dynamic allocation complication by embedding the FPU fpstate in task_struct again. This should make the FPU a lot more accessible to all sorts of atomic contexts. We could still optimize for the xstate frame size in the future, by moving the state structure to the last element of task_struct, and allocating only a part of that. This change is kept minimal by still keeping the ctx_alloc()/free() routines (that now do nothing substantial) - we'll remove them in the following patches. Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
So fpu_save_init() is a historic name that got its name when the only way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU state after saving it. Nowadays the name is misleading, because ever since the introduction of FXSAVE (and more modern FPU saving instructions) the 'we need to reload the FPU state' part is only true if there's a pending FPU exception [*], which is almost never the case. So rename it to copy_fpregs_to_fpstate() to make it clear what's happening. Also add a few comments about why we cannot keep registers in certain cases. Also clean up the control flow a bit, to make it more apparent when we are dropping/keeping FP registers, and to optimize the common case (of keeping fpregs) some more. [*] Probably not true anymore, modern instructions always leave the FPU state intact, even if exceptions are pending: because pending FP exceptions are posted on the next FP instruction, not asynchronously. They were truly asynchronous back in the IRQ13 case, and we had to synchronize with them, but that code is not working anymore: we don't have IRQ13 mapped in the IDT anymore. But a cleanup patch is obviously not the place to change subtle behavior. Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
This unifies all the FPU related header files under a unified, hiearchical naming scheme: - asm/fpu/types.h: FPU related data types, needed for 'struct task_struct', widely included in almost all kernel code, and hence kept as small as possible. - asm/fpu/api.h: FPU related 'public' methods exported to other subsystems. - asm/fpu/internal.h: FPU subsystem internal methods - asm/fpu/xsave.h: XSAVE support internal methods (Also standardize the header guard in asm/fpu/internal.h.) Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
It's a pointless wrapper now - use xsave_state(). Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Fix a minor header file dependency bug in asm/fpu-internal.h: it relies on i387.h but does not include it. All users of fpu-internal.h included it explicitly. Also remove unnecessary includes, to reduce compilation time. This also makes it easier to use it as a standalone header file for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/. Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-